Community Relations Archive

Thread: Suggestions on Patch Day angst mitigation

Draklaa
Tue Jan 25, 2005 6:45 pm
#14






KyleKnox wrote:

JustG's post was good, and it's even progress that he is making such a post. However, the idea or excuse that you didn't plan for differences in a sparse TC database versus the loaded live ones makes me fall out of my chair in utter disbelief.... come on, guys...






I, too, find it strange that they test things on a server database that doesn't resemble what they actually push it out to. How can this even be described as valid testing?



_______________________________________________
Remember that you are unique, just like everyone else.
Draklaa
FrankLee
Tue Jan 25, 2005 7:14 pm
#15



Calandryll_SOE wrote:

The issue with waiting an extra day after the publish is green lit is that it delays future publishes. It's always our goal to green light a publish as early in the day as possible, but sometimes one or two tricky issues can delay it until later into the night. In this particular case, waiting an extra day or even an extra week wouldn't have uncovered the problems that caused the downtime.




Let's say your patch gets a green light Monday night at 9PM, and you decide you'll put it up to Live Tuesday morning at 4AM. That's little notice for us, so try holding it until Wednesday morning. Tuesday, you're free to pursue the _next_ patch, or start development on it; it doesn't change the timetable for future patches at all, except for the arbitrary 1-day hold for announcements. That is, unless your announcement lead time grows to weeks, or your development somehow depends on the new patch being live immediately. Alternatively, you could give your _general impression_ that you'll be patching tomorrow, if the bugs get worked out. If you decide to hold off for a day because something didn't get fixed, we've had a lengthy warning, and if it does get a green light at midnight, we're still basically prepared for it, and have alternate plans.

Especially after JustG's description, it is obvious that the kind of testing done on TestCenter wouldn't have turned up the bugs. Similar to when I see an analysis or process problem in my lab, I'll ask you the same questions I get asked: (and I know you're the Community contact, not a developer, so consider them rhetorical)

Can you attribute cause?
Can it be addressed and corrected?
What are you doing to be sure it doesn't happen again?



FrankLee
--------------------------------------------------------------------------------
Everything I tell you is a lie. - Vergere
Jedi = Luke Skywalker - What friggin' genius designed this PR campaign?
Humans are SUPERIOR! - John Crichton
The Dallet Series (ongoing story)
Qui-Gonzalez
Tue Jan 25, 2005 7:21 pm
#16






Calandryll_SOE wrote:


The issue with waiting an extra day after the publish is green lit is that it delays future publishes. It's always our goal to green light a publish as early in the day as possible, but sometimes one or two tricky issues can delay it until later into the night. In this particular case, waiting an extra day or even an extra week wouldn't have uncovered the problems that caused the downtime.




I can guess that the point of doing these "on schedule" is to work them in around the Tuesday/Thursday/Saturday maintenance cycles. Now my question is, when there were daily server restarts, was "waiting for a patch day" really an issue? I know I personally keep beating this horse, but I have noticed more foibles and odd things happening since this change and am curious as to what reason was behind the change to begin with.


"If it ain't broke, don't fix it".





Gonz
~ Eclipse's resident Stick-in-the-Mud!~
The "Edit" feature is of the Dark Side..

JimerLins
Wed Jan 26, 2005 12:31 am
#17



KyleKnox wrote:
JustG's post was good, and it's even progress that he is making such a post. However, the idea or excuse that you didn't plan for differences in a sparse TC database versus the loaded live ones makes me fall out of my chair in utter disbelief.... come on, guys...





Out of line- as a developer myself, I can tell you that no one and no team- ever- catches everything, and stuff like what JustG described will happen from time to time. If there was no reason to believe that (as with previous patches) the size of the database would cause things like out of memory conditions and so on, then the consequence could not be anticipated. Having worked on teams that do work of the scale being done at SOE with these patches, I can tell you unequivocally that the best process has flaws.

Stuff happens. I suppose people will complain regardless- I mean, if there wasn't any communication, people would complain too.



Jimer's Bug Reporting Guide - Gonna file bugs? Read it!


"A man may fight for many things. His country, his friends, his principles, the glistening tear on the cheek of a golden child. But personally, I'd mud-wrestle my own mother for a ton of cash, an amusing clock and a sack of French porn." -Edmund Blackadder
JimerLins
Wed Jan 26, 2005 12:33 am
#18



GangaWolf wrote:
5 * for you!
In all seriousness, these are basic rules. I have run a large IT shop before, with an ERP system handling thousands of users and billions of dollars. In 4 years, I can only remember twice when we had emergency maintenance (one was a patch we had to roll back, total unplanned downtime: 45 minutes; the other was a new Oracle bug my company discovered by accident, time for us to resolve with Oracle's help: 7 hours).
In every patch/migration/upgrade we planned for the worst, and thankfully almost never had to rely on those contingency plans. Even this past year, we did a *major* overhaul that touched every system, interface, etc. we had. We planned for 72 hours downtime, did 5 simulated upgrades first on our backup servers, and finished the upgrade without incident in 36 hours. I don't suggest that a huge ERP running a Corporation has the same complexities that a MMORPG does, but it does have its own set of complexities nonetheless.
It is a shame that this is the first major "patch" since being told that the new Devs, in cooperation with the existing ones, were committed to getting things in right the first time. Maybe Patch 13 will be better. /crosses fingers





As you say- an ERP system, while complex (and I've worked with them too) doesn't even begin to touch the complexity of an MMORPG. You're comparing apples to oranges and it's not a fair comparison by any stretch of the imagination.



Jimer's Bug Reporting Guide - Gonna file bugs? Read it!


"A man may fight for many things. His country, his friends, his principles, the glistening tear on the cheek of a golden child. But personally, I'd mud-wrestle my own mother for a ton of cash, an amusing clock and a sack of French porn." -Edmund Blackadder
GangaWolf
Wed Jan 26, 2005 1:15 am
#19

5 * for you!


In all seriousness, these are basic rules. I have run a large IT shop before, with an ERP system handling thousands of users and billions of dollars. In 4 years, I can only remember twice when we had emergency maintenance (one was a patch we had to roll back, total unplanned downtime: 45 minutes; the other was a new Oracle bug my company discovered by accident, time for us to resolve with Oracle's help: 7 hours).


In every patch/migration/upgrade we planned for the worst, and thankfully almost never had to rely on those contingency plans. Even this past year, we did a *major* overhaul that touched every system, interface, etc. we had. We planned for 72 hours downtime, did 5 simulated upgrades first on our backup servers, and finished the upgrade without incident in 36 hours. I don't suggest that a huge ERP running a Corporation has the same complexities that a MMORPG does, but it does have its own set of complexities nonetheless.


It is a shame that this is the first major "patch" since being told that the new Devs, in cooperation with the existing ones, were committed to getting things in right the first time. Maybe Patch 13 will be better. /crosses fingers



/GANGA\
Ganga Wolf | Crymsin | Riten Sayer
Mercenary | Smuggler/BH | PvE-only Jedi
"There are 3 sides to every story - yours, mine, and the truth."
SWG will go down in gaming history as the MMOG with the most potential that achieved the least.

krupps58
Wed Jan 26, 2005 3:21 am
#20

I'd like to thank Calandryll for taking time to address what happened. It's communication like this and the improvements being made to the game that are going to keep me onboard for another year when my subscription runs out.



-----------------------------------------------
http://www.chillywintersnight.com
NewEco
Wed Jan 26, 2005 6:22 am
#21


FrankLee wrote:
...... see above .....



To my eyes, you are expecting a little bit too much. Sometimes in process implementation unexpected things happen; actually, if you do process implementation on your own, you will find that unexpected issues almost always need a prompt fix before things run smoothly again ....


In conclusion, I do appreciate the communication about the issues with getting 12.1 to Live!

I have one comment/suggestion that you might consider for your (SOE/SWG) QA process:

1. TC DBs being smaller than live servers is an inherent consequence of the TC community. Even if you frequently copy a live server DB to TC, these DBs WILL SHRINK, because fewer people play on TC.

2. You might want to consider performing a 2-day stress test on each patch and publish by getting a live server copy AND implementing the blue frogs (char builders). I am 100% positive that this would lead to an improved QA process, although the TC would go nuts for those 2 days and the original TC community would have to hibernate for 2 days.



Yes! I really liked TC2 and would like to have it back ... or, if that's too expensive, take my suggestion as a compromise.

Message Edited by NewEco on 01-26-2005 05:49 PM



___________________________________________________________________
my vision of a starwarsy integration of massive Jedi presence into SWG :
The Force Planet
concept draft on how to solve problems with balancing Jedi,
role of Jedi in GCW, Jedi Visibility, Jedi "Rarity" & the Force Ranking System.
No nerfs, but (hopefully) smart additions to SWG to solve the core dilemma:
"Keep Jedi rare, except for on my account"
Leana_Txorana
Wed Jan 26, 2005 7:19 am
#22

1) Test all patches/fixes on the Test Center before you push them to Live, even the little ones. Make sure that nobody can point to a thread and post in the Test Center Bugs Thread that repeatedly and accurately describes this bug.

===================================================================

So every spelling error, every trivial bug that may take days to find and fix even though it does not affect game play, every broken skill that will take months to redesign the entire combat system to fix...


If that was the case, we would still be waiting for the first publish ever. They have explained that bugs are prioritized. All bugs are planned to be fixed but to delay a publish for non-critical bugs would be a bad idea.



www.usa4usa.blogspot.com
=========================================
There are 10 kinds of people, those who understand binary and those that don't
There's no place like 127.0.0.1
================================
3.14159 + Ice Cream = Pi ala mode
KombatCamKombat
Wed Jan 26, 2005 8:09 am
#23

Geez every dev response is an excuse to dance around the issue. Just admit you were wrong. You NEED to give notice before pushing publishes, you NEED to test with a live type database. These are things you NEED to do, and honestly why do your customers need to tell you this?



---Signature---------------------------------------
Bounty Hunter 3-4-4-4
Combat Prowess 2-2-0-0

FrankLee
Wed Jan 26, 2005 8:57 am
#24



NewEco wrote:

FrankLee wrote:
...... see above .....



To my eyes, you are expecting a little bit too much. Sometimes in process implementation unexpected things happen; actually, if you do process implementation on your own, you will find that unexpected issues almost always need a prompt fix before things run smoothly again ....





My code runs the production system's data. I know all about unusual snags and snarls; my programs span 5 platforms and 22 years of computers and equipment. It's a spiderweb in a windstorm.
That's precisely why I have a copy of a solid, no-problems version of the code installed as a backup. No failure of my program should cause more than a 5 minute delay to production, unless we have some kind of hardware failure. We've got duplicates of everything, so hardware is seldom an issue.
I'm not saying that unexpected things _don't_ happen. I'm saying we need to be better prepared. If they have some kind of a rollback policy after 2 hours, they wouldn't have to take every server down for 6 or 7 hours, devote the entire dev team, and risk the ire of all their customers... they roll back the upgrade, let the servers run on yesterday's code, and fix what happened at their leisure.
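
A minimal sketch of that kind of rollback rule, in Python. Everything in it is hypothetical: the names PublishAttempt and should_roll_back, the health flag, and the start time are made up for illustration, and the 2-hour cutoff is only the figure suggested above, not a policy anyone is known to use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PublishAttempt:
    """One attempt to push a patch to Live (hypothetical structure)."""
    started_at: datetime        # when the servers came down for the publish
    rollback_after: timedelta   # assumed cutoff, e.g. the 2 hours suggested above


def should_roll_back(attempt: PublishAttempt, now: datetime, servers_healthy: bool) -> bool:
    """Return True once the publish has stayed broken for at least the cutoff."""
    if servers_healthy:
        # The publish succeeded, so there is nothing to roll back.
        return False
    return now - attempt.started_at >= attempt.rollback_after


# Arbitrary example start time, chosen only for illustration.
start = datetime(2005, 1, 1, 4, 0)
attempt = PublishAttempt(started_at=start, rollback_after=timedelta(hours=2))

# One hour in and still broken: keep debugging.
early = should_roll_back(attempt, start + timedelta(hours=1), servers_healthy=False)
# Three hours in and still broken: restore yesterday's code.
late = should_roll_back(attempt, start + timedelta(hours=3), servers_healthy=False)
print(early, late)
```

The point of a fixed cutoff is that the decision is made before the publish starts, not at 3AM by whoever is still trying to fix it.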



FrankLee
--------------------------------------------------------------------------------
Everything I tell you is a lie. - Vergere
Jedi = Luke Skywalker - What friggin' genius designed this PR campaign?
Humans are SUPERIOR! - John Crichton
The Dallet Series (ongoing story)
PixellJ
Wed Jan 26, 2005 9:57 am
#25






Calandryll_SOE wrote:





Bohdi-Tzu wrote:
Thank you for your candor, it is appreciated. Not to beat a dead horse, but I would also like to chime in on the notice for patches. Personally, I would like to see a full day notice before a major patch is pushed out. We (the players) all like new content, and nerfs aside, we're an impatient lot and want the new stuff as soon as we can get it. I suspect that you (the dev team) want to push out your work as soon as it's done.

That said, and in light of the history of patch day glitches, is there some urgent and unavoidable reason that when a patch is "done", and has passed your testing and Q&A, that you can't wait one more day to push it up and give some notice so that the players can prepare? Hotfixes like credit dupes and other exploits--sure, whack them with all due haste, but it won't hurt me to have to wait one more day to sample from my bantha, and if I knew a patch was coming today, I could plan (or not plan) my play accordingly.

Thanks for listening.





The issue with waiting an extra day after the publish is green lit is that it delays future publishes. It's always our goal to green light a publish as early in the day as possible, but sometimes one or two tricky issues can delay it until later into the night. In this particular case, waiting an extra day or even an extra week wouldn't have uncovered the problems that caused the downtime.




At the risk of sounding trite, I think the community here is pretty much used to delays with respect to the dev team. CURB? Smuggler revamp? Etc.... Pushing out code prematurely certainly didn't save time for future publishes, did it? If you ran a poll asking the customer base whether:


a) patch on time on schedule, with little to no QA


b) wait a day to make sure it's been tested 'reasonably' fully


I would think b) the popular choice. As for delaying publishes, this exact thing was done back in August I think it was, for a week or more (the Jedi trials, if memory serves). Sure, some people got upset, but it sets a precedent that publishes can be and sometimes are held back, and from my view that entire week hasn't left me feeling that you're any farther behind than you already are.





--------------------------------------

"Doc" Porl Fik'ya
The only thing SOE could make that doesn't suck is a vacuum cleaner. Fix the effing game, and I'll fix my account.

GangaWolf
Wed Jan 26, 2005 9:57 am
#26






JimerLins wrote:





GangaWolf wrote:

5 * for you!


In all seriousness, these are basic rules. I have run a large IT shop before, with an ERP system handling thousands of users and billions of dollars. In 4 years, I can only remember twice when we had emergency maintenance (one was a patch we had to roll back, total unplanned downtime: 45 minutes; the other was a new Oracle bug my company discovered by accident, time for us to resolve with Oracle's help: 7 hours).


In every patch/migration/upgrade we planned for the worst, and thankfully almost never had to rely on those contingency plans. Even this past year, we did a *major* overhaul that touched every system, interface, etc. we had. We planned for 72 hours downtime, did 5 simulated upgrades first on our backup servers, and finished the upgrade without incident in 36 hours. I don't suggest that a huge ERP running a Corporation has the same complexities that a MMORPG does, but it does have its own set of complexities nonetheless.


It is a shame that this is the first major "patch" since being told that the new Devs, in cooperation with the existing ones, were committed to getting things in right the first time. Maybe Patch 13 will be better. /crosses fingers







As you say- an ERP system, while complex (and I've worked with them too) doesn't even begin to touch the complexity of an MMORPG. You're comparing apples to oranges and it's not a fair comparison by any stretch of the imagination.




I would challenge that. It has a different set of complexities - but complexities nonetheless. Supporting thousands of users all around the world, in an environment that is validated for multiple jurisdictions, and must maintain said validation else we don't do business, is a very complex environment. Until I work on a MMOG (which probably will never happen), I cannot say which is more complex. I just know intuitively that both are complex, their complexities are different, but proper testing and upgrade procedures can minimize emergency outages (you can never eliminate them completely - even the best planned scenarios occasionally fail).


And as FrankLee has stated, there needs to be a point where SOE decides to roll back the patch rather than keep the servers down. Since launch, many of the major patches have had issues that caused emergency maintenance, and to my recollection only one has been rolled back (I could be wrong).


It is nice, though, to see JustG's postmortem, as well as Calandryll_SOE's posts. The communication has definitely improved, which is a step in the right direction. Despite this outage, things *are* starting to look better.

Message Edited by GangaWolf on 01-26-2005 11:59 AM



/GANGA\
Ganga Wolf | Crymsin | Riten Sayer
Mercenary | Smuggler/BH | PvE-only Jedi
"There are 3 sides to every story - yours, mine, and the truth."
SWG will go down in gaming history as the MMOG with the most potential that achieved the least.
