Community Relations Archive
Thread: Suggestions on Patch Day angst mitigation
Calandryll_SOE wrote:
Bohdi-Tzu wrote:
Thank you for your candor, it is appreciated. Not to beat a dead horse, but I would also like to chime in on the notice for patches. Personally, I would like to see a full day notice before a major patch is pushed out. We (the players) all like new content, and nerfs aside, we're an impatient lot and want the new stuff as soon as we can get it. I suspect that you (the dev team) want to push out your work as soon as it's done.
That said, and in light of the history of patch day glitches, is there some urgent and unavoidable reason that when a patch is "done", and has passed your testing and Q&A, that you can't wait one more day to push it up and give some notice so that the players can prepare? Hotfixes like credit dupes and other exploits--sure, whack them with all due haste, but it won't hurt me to have to wait one more day to sample from my bantha, and if I knew a patch was coming today, I could plan (or not plan) my play accordingly.
Thanks for listening.
The issue with waiting an extra day after the publish is green lit is that it delays future publishes. It's always our goal to green light a publish as early in the day as possible, but sometimes one or two tricky issues can delay it until later into the night. In this particular case, waiting an extra day or even an extra week wouldn't have uncovered the problems that caused the downtime.
GangaWolf wrote:
JimerLins wrote:
GangaWolf wrote:
5 * for you!
In all seriousness, these are basic rules. I have run a large IT shop before, with an ERP system handling thousands of users and billions of dollars. In 4 years, I can only remember twice when we had emergency maintenance (one was a patch we had to roll back, total unplanned downtime: 45 minutes; the other was a new Oracle bug my company discovered by accident, time for us to resolve with Oracle's help: 7 hours).
In every patch/migration/upgrade we planned for the worst, and thankfully almost never had to rely on those contingency plans. Even this past year, we did a *major* overhaul that touched every system, interface, etc. we had. We planned for 72 hours downtime, did 5 simulated upgrades first on our backup servers, and finished the upgrade without incident in 36 hours. I don't suggest that a huge ERP running a corporation has the same complexities that an MMORPG does, but it does have its own set of complexities nonetheless.
It is a shame that this is the first major "patch" since being told that the new Devs, in cooperation with the existing ones, were committed to getting things in right the first time. Maybe Patch 13 will be better. /crosses fingers
As you say, an ERP system, while complex (and I've worked with them too), doesn't even begin to touch the complexity of an MMORPG. You're comparing apples to oranges and it's not a fair comparison by any stretch of the imagination.
I would challenge that. It has a different set of complexities, but complexities nonetheless. Supporting thousands of users all around the world, in an environment that is validated for multiple jurisdictions and must maintain said validation or else we don't do business, is a very complex undertaking. Until I work on an MMOG (which probably will never happen), I cannot say which is more complex. I just know intuitively that both are complex and that their complexities are different, but proper testing and upgrade procedures can minimize emergency outages (you can never eliminate them completely; even the best planned scenarios occasionally fail).
And as FrankLee has stated, there needs to be a point where SOE decides to roll back the patch rather than keep the servers down. Since launch, many of the major patches have had issues that caused emergency maintenance, and to my recollection only one has been rolled back (I could be wrong).
It is nice, though, to see JustG's postmortem, as well as Calandryll_SOE's posts. The communication has definitely improved, which is a step in the right direction. Despite this outage, things *are* starting to look better.
Message Edited by GangaWolf on 01-26-2005 11:59 AM
You could challenge it, but that wouldn't mean you were correct. Let's look at what's going on here- with an ERP system, you have a fairly stable environment. Patches are infrequent and don't generally take a long time to implement. The system runs, people use it, and most importantly- the people who run the system (the "devs") aren't in charge of what goes in it (the "content"). If a change to the system comes along, it doesn't involve changing the client (which many ERP systems don't have, using web browsers instead of a big fat client), the server, the server code, the database schema AND the database contents all at once.
I'd be willing to bet that a database for a single galaxy is at least a terabyte in size, very probably much larger. I shudder to think about the vendor servers. Here's a few things that would need to be done during a patch:
Shut down login servers, then close game servers to get players offline.
Shut down game servers gracefully, making sure all transactions are completed. That includes harvester actions, inventories, banks, loot picked up, etc. This phase alone could take quite a while; there's a LOT of activity in this game that would cause database actions that would require some sort of commit or further action that would, in the normal course of game activity, get handled over time by housekeeping processes. This isn't done to just one database server- I'm guessing that different "zones" have different servers, which is why we sometimes see bugs where items vanish crossing server boundaries. So add another level of complexity to managing the transactions between multiple servers and cleanly committing them all, so you don't get a flood of complaints about lost items or xp or whatever when the servers come back up. Oh yeah, and this has to be done on EVERY galaxy. So multiply the database size by 25 or whatever the number of galaxies is.
Roll new code out to all servers- probably doesn't take long, but you do have to get it onto all servers in a cluster properly and then validate it. And you gotta do it for every galaxy.
Modify the database- add new content, correct flaws in existing content that's being patched, update objects in the database for new stuff in the patch, and so on. I wonder how long it will take the update to remove infinite DoT weapons to run (UPDATE player_objects set charges=3000 where charges<0)? Imagine how long that query would take to complete.
Review the database for consistency, checking to make sure that there's no objects that aren't in a container (as an example) and all the other things that would need to be validated before you let it go back to live.
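As a rough illustration of the two database steps above, here is a minimal Python sketch using the standard-library sqlite3 module. The player_objects table and charges column come from the query quoted above. The id and container_id columns, the batch size, and the demo rows are assumptions made up for illustration; nothing here reflects SWG's actual schema or tooling. It applies the charges fix in small batches so each commit stays short, then looks for objects whose container does not exist.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for the (hypothetical) object table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE player_objects ("
        "id INTEGER PRIMARY KEY, container_id INTEGER, charges INTEGER)"
    )
    conn.executemany(
        "INSERT INTO player_objects VALUES (?, ?, ?)",
        [
            (1, None, 10),  # top-level container
            (2, 1, -1),     # weapon with negative ("infinite") charges
            (3, 1, -5),
            (4, 99, 50),    # points at container 99, which does not exist
            (5, 1, -1),
        ],
    )
    conn.commit()
    return conn


def fix_negative_charges(conn, batch_size=1000, new_charges=3000):
    """Reset negative charges in batches; return the number of rows changed."""
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE player_objects SET charges = ? "
            "WHERE rowid IN (SELECT rowid FROM player_objects "
            "WHERE charges < 0 LIMIT ?)",
            (new_charges, batch_size),
        )
        conn.commit()  # commit per batch so no single transaction grows huge
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


def orphaned_objects(conn):
    """Return ids of objects whose container_id matches no existing object."""
    rows = conn.execute(
        "SELECT o.id FROM player_objects o "
        "LEFT JOIN player_objects c ON o.container_id = c.id "
        "WHERE o.container_id IS NOT NULL AND c.id IS NULL "
        "ORDER BY o.id"
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print("rows fixed:", fix_negative_charges(conn, batch_size=2))
    print("orphans:", orphaned_objects(conn))
```

On a real multi-terabyte database the single UPDATE and the batched version touch the same rows; batching only changes how long each transaction holds its locks, not the total amount of work.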
Remember, during this process, you're working with a VERY large database. Even if you want to roll back after a problem, that takes time too- you don't copy a few terabytes over even the fastest links without some delay.
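To put rough numbers on the rollback point, here is a back-of-the-envelope Python sketch. The one-terabyte size and the 25 galaxies are the guesses made in this post; the link speed and the efficiency factor are assumptions chosen only for illustration, not known figures for SOE's infrastructure.

```python
def transfer_hours(size_terabytes, link_gbps, efficiency=0.7):
    """Estimate hours needed to copy size_terabytes over a link_gbps link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually achieved after protocol and storage overhead.
    """
    bits = size_terabytes * 1e12 * 8            # decimal terabytes to bits
    effective_bps = link_gbps * 1e9 * efficiency
    return bits / effective_bps / 3600


if __name__ == "__main__":
    per_galaxy = transfer_hours(size_terabytes=1, link_gbps=1)
    print(f"one galaxy:  {per_galaxy:.1f} hours")
    # If all 25 galaxies had to share that one link, the time scales linearly.
    print(f"25 galaxies: {per_galaxy * 25:.1f} hours")
```

Restores that run in parallel on separate links would not add up this way, so treat the second figure as a worst case under the stated assumptions.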
I realize that ERP systems are complex, but it's still apples and oranges, and I think it's inappropriate to expect that the solutions that work with an ERP system will work with an MMO (or vice-versa). Let's consider this for a second logically- if the developer/patch team were even half as incompetent as many people seem to think because of problems like this, SWG wouldn't have ever launched.
I'm a software developer- I don't work on MMOs (although I'd like to), but I've studied them quite a bit as well as being a player. The challenges posed by building, running and managing an MMO are unique in the IT world, and probably among the toughest challenges to overcome. The only reason MMOs have ever succeeded beyond text MUDs is because, unlike other areas that present similar challenges, like flight control and healthcare systems, when the system crashes, people don't die.
They just act like they're going to.
I'm not saying the dev team has done everything right- I doubt they would make that claim either. But it's always gotten on my nerves when people make complaints about the way things are handled and utterly trivialize the accomplishment that SWG really is- the fact that this game runs AT ALL takes my breath away every time I log into it. Problems, yes, and they need to be fixed. But let's not forget that these problems wouldn't even be here to be discussed if the SWG dev team hadn't surmounted one of the greatest technical challenges possible in the software development world and given all of us a place to stand while we file our complaints.
Tiggs posted a bug thread today, including the issues already reported to the Development Team. The warping problem is included in those issues. I don't have specific information for fixes for these yet, but they are being investigated and should we schedule a publish to fix any of these we will update you all.
LadyLeala wrote:
Calandryll... I'm very surprised that I haven't heard any mention from the devs about some of the horrendous bugs that surfaced with yesterday's publish. The biggest one that is affecting people seems to be the rubber-banding effect. I know a lot of people who are logging out of the game because of this.
I was expecting to see the servers taken down this morning to fix this bug, but I saw no such thing... any word on what exactly is going on to remedy this?
Calandryll_SOE wrote:
Tiggs posted a bug thread today, including the issues already reported to the Development Team. The warping problem is included in those issues. I don't have specific information for fixes for these yet, but they are being investigated and should we schedule a publish to fix any of these we will update you all.
LadyLeala wrote:
Calandryll... I'm very surprised that I haven't heard any mention from the devs about some of the horrendous bugs that surfaced with yesterday's publish. The biggest one that is affecting people seems to be the rubber-banding effect. I know a lot of people who are logging out of the game because of this.
I was expecting to see the servers taken down this morning to fix this bug, but I saw no such thing... any word on what exactly is going on to remedy this?
Hmm, that is odd. I read the list of what was reported yesterday, and the number one complaint that I have seen and heard from the people that I play with is the dance/musician bugs associated with getting buffs. I know that I saw many people posting about this all day long. Seeing as it is not listed in that thread, are we to assume that it is working as intended? Or is the problem being looked into?
Calandryll_SOE wrote:
Tiggs posted a bug thread today, including the issues already reported to the Development Team. The warping problem is included in those issues. I don't have specific information for fixes for these yet, but they are being investigated and should we schedule a publish to fix any of these we will update you all.
LadyLeala wrote:
Calandryll... I'm very surprised that I haven't heard any mention from the devs about some of the horrendous bugs that surfaced with yesterday's publish. The biggest one that is affecting people seems to be the rubber-banding effect. I know a lot of people who are logging out of the game because of this.
I was expecting to see the servers taken down this morning to fix this bug, but I saw no such thing... any word on what exactly is going on to remedy this?
Thanks for the rapid response! I'll keep checking for updates.
That was reported by players in the thread and passed along as well.
Arialias wrote:
Calandryll_SOE wrote:
Tiggs posted a bug thread today, including the issues already reported to the Development Team. The warping problem is included in those issues. I don't have specific information for fixes for these yet, but they are being investigated and should we schedule a publish to fix any of these we will update you all.
LadyLeala wrote:
Calandryll... I'm very surprised that I haven't heard any mention from the devs about some of the horrendous bugs that surfaced with yesterday's publish. The biggest one that is affecting people seems to be the rubber-banding effect. I know a lot of people who are logging out of the game because of this.
I was expecting to see the servers taken down this morning to fix this bug, but I saw no such thing... any word on what exactly is going on to remedy this?
Hmm, that is odd. I read the list of what was reported yesterday, and the number one complaint that I have seen and heard from the people that I play with is the dance/musician bugs associated with getting buffs. I know that I saw many people posting about this all day long. Seeing as it is not listed in that thread, are we to assume that it is working as intended? Or is the problem being looked into?
Message Edited by Wolveryne40 on 01-26-2005 05:19 PM
Message Edited by Wolveryne40 on 01-26-2005 05:20 PM
Calandryll_SOE wrote:
That was reported by players in the thread and passed along as well.
Arialias wrote:
Calandryll_SOE wrote:
Tiggs posted a bug thread today, including the issues already reported to the Development Team. The warping problem is included in those issues. I don't have specific information for fixes for these yet, but they are being investigated and should we schedule a publish to fix any of these we will update you all.
LadyLeala wrote:
Calandryll... I'm very surprised that I haven't heard any mention from the devs about some of the horrendous bugs that surfaced with yesterday's publish. The biggest one that is affecting people seems to be the rubber-banding effect. I know a lot of people who are logging out of the game because of this.
I was expecting to see the servers taken down this morning to fix this bug, but I saw no such thing... any word on what exactly is going on to remedy this?
Hmm, that is odd. I read the list of what was reported yesterday, and the number one complaint that I have seen and heard from the people that I play with is the dance/musician bugs associated with getting buffs. I know that I saw many people posting about this all day long. Seeing as it is not listed in that thread, are we to assume that it is working as intended? Or is the problem being looked into?
Thank you
Wolveryne40 wrote:
funny how the rubberband effect was reported in tc along with ctds and warping and yet it was still pushed to live eh? i believe that would tie into that whole qa thing and being ignored thing that people are kinda grouchy about, wouldn't you say?
Message Edited by Wolveryne40 on 01-26-2005 05:19 PM
Message Edited by Wolveryne40 on 01-26-2005 05:20 PM
shows how much they care about our opinion, i dunno why we should still test at all if they don't listen to us and push a buggy version onto live servers.
The whole rubberband is hilarious, i warped about 5 times on 200 meters today. QA? Don't think they have such a thing, they wanted to go live with this patch at any cost, looking at their testing marathon at the weekend.
KombatCamKombat wrote:
Calandryll_SOE wrote:
Bohdi-Tzu wrote:
Thank you for your candor, it is appreciated. Not to beat a dead horse, but I would also like to chime in on the notice for patches. Personally, I would like to see a full day notice before a major patch is pushed out. We (the players) all like new content, and nerfs aside, we're an impatient lot and want the new stuff as soon as we can get it. I suspect that you (the dev team) want to push out your work as soon as it's done.
That said, and in light of the history of patch day glitches, is there some urgent and unavoidable reason that when a patch is "done", and has passed your testing and Q&A, that you can't wait one more day to push it up and give some notice so that the players can prepare? Hotfixes like credit dupes and other exploits--sure, whack them with all due haste, but it won't hurt me to have to wait one more day to sample from my bantha, and if I knew a patch was coming today, I could plan (or not plan) my play accordingly.
Thanks for listening.
The issue with waiting an extra day after the publish is green lit is that it delays future publishes. It's always our goal to green light a publish as early in the day as possible, but sometimes one or two tricky issues can delay it until later into the night. In this particular case, waiting an extra day or even an extra week wouldn't have uncovered the problems that caused the downtime.
I have to comment on this as well. You're saying you can't afford a single day to test the publish on a live database? Think about how much time you have wasted now. Look at the live issues thread: it's huge, and every issue relates to what you added in. So think about it proactively: if you had a better defense you would save time. Sometimes a single day can do more than you think. Plus, like I said before, the smuggler revamp thread is over a year old, so don't tell us about delays...
Calandryll_SOE wrote:
KombatCamKombat wrote:
Calandryll_SOE wrote:
Bohdi-Tzu wrote:
Thank you for your candor, it is appreciated. Not to beat a dead horse, but I would also like to chime in on the notice for patches. Personally, I would like to see a full day notice before a major patch is pushed out. We (the players) all like new content, and nerfs aside, we're an impatient lot and want the new stuff as soon as we can get it. I suspect that you (the dev team) want to push out your work as soon as it's done.
That said, and in light of the history of patch day glitches, is there some urgent and unavoidable reason that when a patch is "done", and has passed your testing and Q&A, that you can't wait one more day to push it up and give some notice so that the players can prepare? Hotfixes like credit dupes and other exploits--sure, whack them with all due haste, but it won't hurt me to have to wait one more day to sample from my bantha, and if I knew a patch was coming today, I could plan (or not plan) my play accordingly.
Thanks for listening.
The issue with waiting an extra day after the publish is green lit is that it delays future publishes. It's always our goal to green light a publish as early in the day as possible, but sometimes one or two tricky issues can delay it until later into the night. In this particular case, waiting an extra day or even an extra week wouldn't have uncovered the problems that caused the downtime.
I have to comment on this as well. You're saying you can't afford a single day to test the publish on a live database? Think about how much time you have wasted now. Look at the live issues thread: it's huge, and every issue relates to what you added in. So think about it proactively: if you had a better defense you would save time. Sometimes a single day can do more than you think. Plus, like I said before, the smuggler revamp thread is over a year old, so don't tell us about delays...
Actually, we are going to make sure publishes are tested on a copy of a large production database from now on.
SWEET!!!!!!!!! Now that made my day.
Fact is there is no excuse for a bug like no group music buffs escaping the testing phase. That's just a giant flashing neon sign that says "THE TESTING IS NOT WORKING!!!!!"
I am glad to see you're taking the right steps to correct the problems. GO DEVS!!!
Calandryll_SOE wrote:
Actually, we are going to make sure publishes are tested on a copy of a large production database from now on.
AWESOME