I spent a day hosting Clinton Keith in the IMVU offices during the Game Developer’s Conference in March, and I spent time describing IMVU’s product development process and our implementation of Agile and Lean Startup methodologies. He heard a lot of interest in our success from people at GDC he spoke to about IMVU, and he asked if I would do a Q&A session. Here is an excerpt of our interview posted to his Agile Game Development Blog:
Agile Game Interview – Agile Engineering for Online Communities
James Birchler is an Engineering Director at IMVU, where he implemented and manages IMVU’s product development process using Agile and Lean Startup methodologies. Prior to joining IMVU in 2006, he redesigned the product development process at Bargain Network, Inc., helping complete the company’s acquisition by Vertrue, Inc. Previously, James was a Director at There.com, responsible for technical design, creative design, and content of the web applications and GUI for There’s virtual world application. James will be presenting on Lean Startup methodologies at the Startup Lessons Learned Conference on April 23 in San Francisco, and is a frequent contributor to the IMVU Engineering Blog.
Clinton Keith: What is IMVU?
James Birchler: IMVU, Inc. (www.imvu.com) is an online community where members use 3D avatars to meet new people, chat, create and play with their friends. IMVU has reached 40 million registered users, 10 million unique visitors per month and a $30+ million annualized revenue run rate. IMVU has the world’s largest virtual goods catalog of more than 3 million items, almost all of which are created by its own members. IMVU achieved profitability in 2009 and is hiring aggressively with plans to double the size of its team in 2010.
CK: What is the overview of the IMVU engineering process?
JB: Our engineering team practices what people refer to as “Agile” or “Lean Startup” methodologies, including Test-Driven Development (TDD), pair programming, collective code ownership, and continuous integration. We refactor code relentlessly, and try to ship code in small batches. And we’ve taken continuous integration a step further: every time we check in server side code, we push the changes to our production servers immediately after the code passes our automated tests. In practice, this means we push code live to our production web servers 35-50 times per day. Taken together with our A/B split-testing framework, which we use to collect data to inform our product development decisions, we have the ability to quickly see and test the effects of new features live in production. We also practice “5 Whys” root cause analysis when something goes wrong, to avoid making the same mistakes twice.
CK: How do you get so many changes out in front of customers without causing problems, either for your customers or your production web servers?
JB: I think it’s important to point out that sometimes we actually do introduce regressions and new bugs that impact our customers. However, we try to strike a balance between minimizing negative customer impacts and maximizing our ability to innovate and test new features with real customers as early as possible. We have several mechanisms we use to help us do that, and to control how customers experience our changes. It boils down to automation on one hand, and our QA engineers on the other.
First the automation: we take TDD very seriously, and it’s an important part of our engineering culture. We try to write tests for all of our code, and when we fix bugs, we write tests to ensure they don’t regress. Next, we built our code deployment process to include what we call our “cluster immune system,” which monitors and alerts on statistically significant changes in dozens of key error rates and business metrics. If a problem is detected, the code is automatically rolled back and our engineering and operations teams are notified. Next, we have the ability to limit rollout of a feature to one or more groups of customers–so we can expose new features to only QA or Admin users, or ad-hoc customer sets. We also built an incremental rollout function that allows us to slowly increase exposure to customers while we monitor both technical and business metrics to ensure there are no big problems with how the features work in production. Finally, we build in a “kill switch” to most of our applications, so that if any problems occur later, for example, scaling issues, we have fine-grained control to turn off problematic features while we fix them.
Read the rest of Agile Game Interview – Agile Engineering for Online Communities …