Live Games Have Evolving Performance
Running a live social game can be a lot like playing a game of Jenga; the more moves you make, the higher the chance the whole thing comes crashing down. Earlier this year, noting an increase in user complaints, we realized that the web game I work on was threatening to collapse.
Years ago, we had implemented a company-standard performance tracking system. It generated an immense amount of logging, our numbers looked reasonable, and we had scarier fish to deal with, so we stopped monitoring performance closely. This was a huge mistake, and we had to act quickly to remedy it.
Once we took a deeper look into performance, we were in for a rude awakening. Even though the game ran well for us in our development environment, it was running like a dog in the real world, and the numbers we saw were significantly worse than expected.
Using profiling tools, we discovered that while the game ran fine before our ‘sale’ page was displayed, it ran poorly at any point after that page had been shown.
The problem was actually two bugs combined with a user flow issue.
The first bug was that the sale page had been built in a horribly inefficient manner. The second bug was worse. Due to a flaw in our asset loader, we were never removing dialogs from memory after we closed them, so they just sat around using up resources.
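To make the second bug concrete, here is a rough sketch of the pattern that bit us, with made-up class and method names rather than our actual loader code: the loader held a reference to every dialog it built, and “closing” a dialog only hid it. The fix was to actually tear the dialog down and drop the loader’s reference so it could be garbage collected.

```typescript
// Hypothetical sketch of the leak and its fix; names are invented for illustration.
interface Dialog {
  hide(): void;
  destroy(): void; // releases textures, DOM nodes, event listeners, etc.
}

class AssetLoader {
  private liveDialogs = new Map<string, Dialog>();

  openDialog(id: string, build: () => Dialog): Dialog {
    const dialog = build();
    this.liveDialogs.set(id, dialog);
    return dialog;
  }

  // The buggy version only called hide(), so every dialog (and its assets)
  // stayed referenced for the rest of the session. Destroying it and deleting
  // the map entry lets the garbage collector reclaim the memory.
  closeDialog(id: string): void {
    const dialog = this.liveDialogs.get(id);
    if (!dialog) return;
    dialog.hide();
    dialog.destroy();
    this.liveDialogs.delete(id);
  }
}
```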
And the user flow turned it from bad to terrible. When we run sales, one of the first things we do is show the sale page to the player, and since the bugs combined to negatively impact performance after the dialog was shown, we were effectively killing performance for the entire play session whenever we were running sales!
Once we understood the problem, it was not difficult to resolve: we fixed the loader issue and optimized the sale page. But the longer-term effect on our game came not from those code fixes, but from the mental reset the incident gave us:
1) Performance is never a solved problem. I don’t care how great your game ran when you launched it. As you add features and update content, you will need to refocus on performance and fix it all over again.
2) Any Key Performance Indicator that isn’t actively monitored is degrading. Ok, maybe not always, but you have to assume it is unless you can actively prove it isn’t.
3) Automatic tracking and alert systems are essential. If you rely on manually verified numbers, you will miss the moment that they shift (a minimal sketch of this kind of automated check follows this list).
4) It never makes sense until it does. Don’t dismiss data points that make no sense but keep coming up; you might just be looking at them wrong.
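To make point 3 concrete, here is a minimal sketch of the sort of automated check we mean: compare the latest rolled-up frame-rate number against a baseline and raise an alert the moment it slips. The function names and the 10% threshold below are hypothetical stand-ins, not our production alerting system.

```typescript
// Minimal sketch of an automated KPI alert; names and threshold are hypothetical.
interface FpsMetric {
  label: string;       // e.g. a date or build number
  averageFps: number;  // rolled-up average frame rate for that period
}

function checkFpsRegression(
  baseline: FpsMetric,
  latest: FpsMetric,
  maxDropRatio = 0.10,                          // alert on a drop of more than 10%
  notify: (msg: string) => void = console.warn  // swap in email/chat hooks here
): void {
  const drop = (baseline.averageFps - latest.averageFps) / baseline.averageFps;
  if (drop > maxDropRatio) {
    notify(
      `Average FPS fell ${(drop * 100).toFixed(1)}% ` +
      `(${baseline.averageFps} -> ${latest.averageFps}) since ${baseline.label}`
    );
  }
}

// Example: fires because 24 FPS is a 20% drop from the 30 FPS baseline.
checkFpsRegression(
  { label: "last release", averageFps: 30 },
  { label: "this release", averageFps: 24 }
);
```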

Previous Performance Tracking Dashboard
Moving forward, we have implemented much more thorough performance tracking, and we fully intend to run it through the life of the product. As we gain confidence that we have achieved the performance we demand of our product, we will turn the rate of logging down, but never off. These enhanced performance logs give us a clearer picture of our players’ experiences by tracking the following:
1) Whole session average performance
2) Rolling 30-second average frame rate (a rough sampling sketch follows this list)
3) Percentile breakdowns to identify outliers
4) Session time variance breakdowns to identify performance degradation over time
5) Game sub-section specifics to identify troubled areas of the game
6) A/B performance comparison capability for new content roll-outs
7) Various game state comparisons such as in or out of fullscreen mode
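As a rough illustration of how a couple of these can be gathered on the client, here is a sketch of frame-time sampling built around a requestAnimationFrame loop, covering the rolling 30-second average and the percentile breakdowns. Every identifier is invented for this article; it is not our production tracker.

```typescript
// Illustrative client-side sampler for a rolling 30-second average frame rate
// and percentile breakdowns. All names are hypothetical.
class FrameRateSampler {
  private frameTimes: number[] = [];   // ms per frame, most recent last
  private lastTimestamp?: number;
  private readonly windowMs = 30_000;  // 30-second rolling window

  // Call once per rendered frame, e.g. from a requestAnimationFrame loop.
  onFrame(timestamp: number): void {
    if (this.lastTimestamp !== undefined) {
      this.frameTimes.push(timestamp - this.lastTimestamp);
    }
    this.lastTimestamp = timestamp;

    // Drop samples that have fallen outside the 30-second window.
    let total = this.frameTimes.reduce((sum, t) => sum + t, 0);
    while (total > this.windowMs && this.frameTimes.length > 1) {
      total -= this.frameTimes.shift()!;
    }
  }

  rollingAverageFps(): number {
    if (this.frameTimes.length === 0) return 0;
    const avgMs =
      this.frameTimes.reduce((sum, t) => sum + t, 0) / this.frameTimes.length;
    return 1000 / avgMs;
  }

  // e.g. percentileFrameTime(0.95) approximates the frame time that 95% of
  // frames come in under, which is how outliers show up in the dashboard.
  percentileFrameTime(p: number): number {
    if (this.frameTimes.length === 0) return 0;
    const sorted = [...this.frameTimes].sort((a, b) => a - b);
    const index = Math.min(sorted.length - 1, Math.floor(p * sorted.length));
    return sorted[index];
  }
}

// Usage sketch in a browser game loop:
// const sampler = new FrameRateSampler();
// const loop = (ts: number) => { sampler.onFrame(ts); requestAnimationFrame(loop); };
// requestAnimationFrame(loop);
```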

Updated Performance Tracking Dashboard
All these and more have been added to a performance dashboard that is displayed 24/7 on a monitor in our ops room. With every single feature release and content update, we track the impact on player performance.
Your players’ experience is the lifeblood of your product, and you should treat it with the same level of caution that you apply to your personal finances. Check it regularly, and assume it is broken unless you can actively prove otherwise.