Pandemic Legion  
 
 
 
 
 
 
 
 
 
 
 
 

EVE Devblog: Internet spaceship crashes are serious business
Posted 2011-01-24, 15:30 by Tommy Suharto (Sniggerdly - Asia)

Greetings Capsuleer,

This blog might be slightly odd compared to the normal dev blogs you read around here, but in a good way. After the Council of Stellar Management was here some weeks ago and had the opportunity to sit down with senior testers and Quality Assurance leads, we realized that we rarely talk about Quality Assurance as the ongoing process it is; rather, we refer to it once in a while when something goes wrong. This is a terrible shame, so I figured it might be time to sit down and share some insights into one of the things we rarely talk about, but which tends to create bad experiences for players: crashes.

One of the most annoying things I can think of, especially in games like EVE Online, is a crash. You might be in a fleet moving around, doing a mission, fighting an officer spawn, or otherwise doing something where a crash is exactly what you didn't need. And it might cause you to lose your shiny new ship, which can ruin both your experience and your wallet. It's almost the worst case scenario.

Crashes are serious business

As a direct consequence of the immersion-shattering experience a crash is, we go to great lengths to try and avoid them. One event that especially comes to mind was the deployment of Tyrannis 1.1 ("UICore"). During the dry runs, where all of Quality Assurance runs tests to make sure that critical functionality works on Tranquility post-deployment, a tester found a crash which would happen if you closed the Fitting Window before it had fully rendered the scene which renders your ship. While it's a bug that's easily avoidable, it is still an example of one with the potential to cause a lot of grief. Thus, the call was made by the people in charge to keep Tranquility down for an extra 5 hours and 5 minutes to build and test a set of new patches to resolve this issue.

Unfortunately, a scenario like this is a luxury we rarely have. In this case, a clear set of reproduction steps was available, which meant a fix could be tested and verified very quickly. But in most cases, we have little information to go on. We get the occasional bug report (which we greatly appreciate, keep them coming!), and sometimes we can reproduce the problem from it. Despite these issues, we have a couple of tricks up our sleeve, which I'll now show you.

Client statistics and Winqual

You know how, when an application crashes, Windows will prompt you to submit the crash report to Microsoft? Yeah, that thing. Most people tell you not to submit it, because it doesn't make a difference and is a potential privacy risk. Yes, I too thought that back in the day when it was introduced with Windows XP. Back when I was an EVE Online player and experienced the occasional crash, I'd go "pfft, this won't make a difference even if I submit it." Boy, was I wrong!

As it turns out, the data you're prompted to submit is invaluable for tracking down crashes. In the absence of clear reproduction steps, it's the second best thing we can hope for when we find a crash issue. We get the data you submit through a Microsoft program called "Winqual," which allows us to see statistics about certain crash "signatures" and to get crash dumps, which allow us to see in which part of the code the crash occurred and subsequently fix it.

For instance, when we were deploying Incursion 1.1.0, we were deploying no fewer than seven fixes to different crashes in different subsystems of EVE Online. This is only possible thanks to the men and women who, when prompted by Windows to submit the crash report, actually submit it.

When we deploy any patch, we keep a very close eye not just on forum and in-game channels, but also on other channels we have. One is Winqual, and the other is called “Client Statistics.” This is data we sample from Tranquility and write to our database once an hour, covering things such as crashes, memory usage, CPU time and ping times. From that, we’re able to see if a patch has had an impact on crashes. Here’s an example of what a normal day on Tranquility looks like:

http://cdn1.eveonline.com/community/...n-Picture1.png

The percentage of logins to Tranquility that eventually end in a crash is between 0.5% and 0.9%. An interesting visible feature of this graph is that the percentage of crashes increases significantly around downtime. There is a very good explanation for this.
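As a rough illustration of how a figure like that could be derived from hourly samples, here is a minimal Python sketch. The `HourlySample` schema, the function names and every number in it are invented for the example and are not taken from the actual Client Statistics data; hour 11 is simply assumed to be the hour near downtime.

```python
from dataclasses import dataclass


@dataclass
class HourlySample:
    """One hourly row of client statistics (hypothetical schema)."""
    hour: int      # hour of day, 0-23
    logins: int    # sessions sampled in that hour
    crashes: int   # sessions that ended in a crash


def crash_rate_percent(sample: HourlySample) -> float:
    """Percentage of sampled logins that ended in a crash."""
    if sample.logins == 0:
        return 0.0
    return 100.0 * sample.crashes / sample.logins


def hours_above(samples, threshold_percent):
    """Return the hours whose crash rate exceeds the threshold."""
    return [s.hour for s in samples if crash_rate_percent(s) > threshold_percent]


# Made-up numbers, chosen only to mimic the shape described above.
samples = [
    HourlySample(hour=9, logins=20000, crashes=120),   # 0.6%
    HourlySample(hour=10, logins=18000, crashes=150),  # about 0.83%
    HourlySample(hour=11, logins=4000, crashes=90),    # 2.25%, assumed near downtime
]
print(hours_above(samples, 0.9))  # [11]
```

With these invented samples, only the hour assumed to be near downtime rises above the 0.9% upper bound of a normal day.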

One of the most common causes of a crash in some of our sub-systems, such as the Carbon graphics engine called Trinity, is code attempting to access memory which is no longer there. When a client shuts down, it needs to ensure that it cleans up after itself, which means it needs to clear up memory and shut everything down correctly and, preferably, as fast as possible. This leaves room for code to try and access a memory resource which has been removed already, which can result in a crash.
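The shutdown problem described above can be sketched as a small Python analogy. This is not how Trinity or the EVE client is written; `Texture` and `Renderer` are hypothetical stand-ins, and Python raises an exception where native code would typically crash. The point is only the ordering: stop whatever uses a resource before releasing the resource.

```python
class Texture:
    """Stand-in for a resource owned by a graphics engine (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.released = False

    def release(self):
        self.released = True

    def draw(self):
        if self.released:
            # In native code this would be an access to freed memory,
            # which typically crashes instead of raising a tidy error.
            raise RuntimeError(f"{self.name} used after release")
        return f"drew {self.name}"


class Renderer:
    """Holds a texture and refuses to draw once shutdown has begun."""

    def __init__(self, texture):
        self.texture = texture
        self.shutting_down = False

    def render_frame(self):
        if self.shutting_down:
            return None  # skip work instead of touching released resources
        return self.texture.draw()

    def shutdown(self):
        # Stop the users of a resource before releasing the resource itself.
        self.shutting_down = True
        self.texture.release()


renderer = Renderer(Texture("ship_hull"))
print(renderer.render_frame())  # drew ship_hull
renderer.shutdown()
print(renderer.render_frame())  # None
```

Without the `shutting_down` check, the second `render_frame()` call would reach a released texture, which is the kind of late access that becomes more likely when many clients shut down at once.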

If we see that a patch has had an impact on crash rates, we use Winqual to locate new "Crash Signatures." As I mentioned earlier, these come from the crash reports you submit when your client crashes, and we use them to pinpoint the specific causes which result in a crash. Here’s an example of a typical issue as it appears in Winqual:


http://cdn1.eveonline.com/community/...n-Picture2.png


Here we get a basic idea of when a crash was first observed by Microsoft, which versions of Windows crash the most and which language edition the operating system uses. There are some other interesting features in this graph. For instance, you can see that it took some days for the crash to really start happening. Notice the small peak before the big spike? That was a mass-test, which is when we take hundreds of people onto Singularity and test things. These often help us track down the patch in which a specific crash was introduced.

In situations like these, when we observe this kind of crash behavior, a developer is put on the case to fix the issue. As we don't always find reproduction steps, we schedule crash fixes into the next possible patch and monitor the situation once again. And as you can see in the above graph, we can also confirm that the specific crash was fixed (although there’s still some noise left).
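To make the idea of watching crash signatures across patches a little more concrete, here is a minimal Python sketch. It does not use Winqual's data format or any real API; the report dictionaries, build numbers and signature strings are all invented placeholders, and the two helper functions are hypothetical.

```python
from collections import Counter


def count_by_signature(reports, build):
    """Count crash reports per signature for one client build."""
    return Counter(r["signature"] for r in reports if r["build"] == build)


def new_signatures(reports, old_build, new_build):
    """Signatures seen in new_build that never appeared in old_build."""
    before = count_by_signature(reports, old_build)
    after = count_by_signature(reports, new_build)
    return sorted(sig for sig in after if sig not in before)


# Invented reports; build numbers and signature strings are placeholders.
reports = [
    {"build": 100, "signature": "gfx!Render"},
    {"build": 100, "signature": "audio!Mix"},
    {"build": 101, "signature": "gfx!Render"},
    {"build": 101, "signature": "ui!CloseWindow"},
    {"build": 101, "signature": "ui!CloseWindow"},
]
print(new_signatures(reports, old_build=100, new_build=101))  # ['ui!CloseWindow']
print(count_by_signature(reports, 101).most_common(1))        # [('ui!CloseWindow', 2)]
```

The same counting, run again on the build that carries a fix, would show whether reports for a signature have dropped off, which is the kind of confirmation described above.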

One of the problems, of course, is how much visibility we have into crashes and where they happen. Crashes are hard to deal with, really hard. If you experience a crash and get prompted to submit a crash report, please do send it. It gives us greater visibility into crashes and eventually helps us publish a timely fix. It also helps us separate the noise of crashes caused by bad hardware from issues we can fix. It’s a win-win situation for everybody.



So do not fear submitting crash reports, and remember to fly safe!




