We’ve been working hard on the Uplift of Second Life. If you haven’t been following this project, Uplift is what we’re calling the migration of our Second Life simulators, services, and websites from a private data center to hosting in The Cloud (Amazon Web Services). It’s a massive, complicated project that I’ve previously compared to converting a steam-driven railroad to a maglev monorail without ever stopping the train. At times this undertaking has been smooth sailing; at other times it has been a very bumpy ride. We want to share more of the story with you.
Our goal has been to move SL incrementally, giving ourselves the best chance of making the changes as unnoticeable to residents as possible. We feel we’ve done better than we expected, but of course it’s the bumps in the road that residents notice most. We apologize for the recent service disruptions. What is perhaps less apparent is the progress we’ve made, and the performance improvements that have quietly taken place.
First, the rough spots:
- Region Crossings
One of the first problems we found was that region crossings were significantly worse between a cloud region and a datacenter region. We did a deep dive into the code that handles crossings for objects (boats, cars, planes, etc.) and produced an improvement that made crossings significantly faster and more reliable, even between regions within the datacenter. This improvement has already been applied to all regions and was a good step forward.
- Group Chat stalls
Many users have reported that they are not receiving messages in some of their groups; we’re very much aware of the problem. The problems did begin around the time the chat service was uplifted; unfortunately, they did not become clear until moving that service back to the datacenter was no longer an option. We haven’t been able to fix this as quickly as we would like, but the good news is that we have some changes nearly ready that we think may improve the service and that, if they don’t fix it, will certainly give us better information to diagnose it. Those changes are live on the Beta grid now and should move to the main grid very soon.
- Bake Failures
Wednesday and especially Thursday of this past week were bad days for avatar appearance, and we’re very much aware of how important that is. The avatar bake service has actually been uplifted for some time; the problem was caused not by moving it but by another change to a related service. The good news is that, thanks to a great cross-team effort during those two days, we were able to determine why an apparently unrelated simulator update triggered the problem and deploy a fix Thursday night.
- Increased Teleport Failures
We have seen a slight increase in the frequency of teleport failures. I know that if it’s happened to you it probably doesn’t feel like a “slight” problem, especially since it appears that once it has happened to someone, it tends to keep happening for a while. Measured over the entire grid, the increase is just under two percentage points, but even that is unacceptable. We’re less sure of the specific causes (including whether or not they are Uplift-related), but we are improving our ability to collect data on the failures and are very much focused on finding and fixing the problem, whatever it is.
- Marketplace & Stipend Glitches
We’ve had some Uplift-related challenges with both the Marketplace and the service that pays Premium Stipends. The Marketplace had to be returned to the datacenter yesterday, but we’ll correct the problems that required the rollback and complete its uplift soon. The Stipend issues were both good and bad for users: there were some delays, but we also sent some users extra stipends (our fault, you win; we aren’t taking them back). We believe those problems are now solved.
Perhaps the above makes it sound as though Uplift is in trouble. While this week in particular has seen some bumps in the road, the project is actually going well overall. Much of the infrastructure you don’t interact with directly, and some that you do, has been uplifted and is working smoothly.
For a few weeks now, almost all of the regions on the Beta grid have been running in the cloud, and over the last couple of weeks we’ve uplifted around a hundred regions on the main grid. Performance of those regions has been very good, and stability has been excellent. We expect to uplift more regions in the next few working days (if you own a region you’d like included, submit a Support Ticket and we’ll make it happen). Uplift of the Release Candidate regions, which will bring the count into the thousands, will begin soon. When we’re confident that uplifted regions are working well at that larger scale, we’ll be in a position to resume region sales, so if you’ve been waiting, the wait is almost over.
Overall, the Uplift project is on track to be complete, or very nearly so, by the end of this year (yes, 2020… I know I’ve said “fall” before and people have noted that I didn’t say what year 🙂; the leaves haven’t finished falling at my house yet…). It’s likely that there will be other (hopefully small) temporary disruptions during this process, but we promise we’ll do all we can to avoid them and to fix them quickly when they do occur. This migration sets the stage for some significant improvements to Second Life and positions us to grow the world well into the future.