Community Relations Archive

Thread: Suggestions on Patch Day angst mitigation

Scoooter
Fri Jan 30, 2004 9:31 am
#1






Leana_Txorana wrote:







Don't be too quick to bash the testers. Very often the TC people report bugs that do not get fixed before it goes live. They do not have a No Bug Policy in place, hence they feel it is acceptable to push out buggy code to a live server. That is their choice I suppose.







There will always be bugs, even a large list of known bugs. But to delay a publish with critical fixes, needed content, and many non-critical bug fixes just because there are minor bugs is not a good idea. Bugs are given a priority, and publishes get pushed live based on the priority of outstanding known bugs.








Also, a year ago they put TC-Bria in place to do a live server patch test before any new patch went out. It worked wonders, and all the patches since then have worked fine. They dropped TC-Bria from their testing schedule and of course got burned. As soon as you remove a safety feature from any system, you have an accident.







This has been addressed. The TC-Bria hardware is supporting the CU sandbox. It was not pulled because they did not want to do the extra testing; it was pulled to do lots of extra testing.








And yes, waiting an extra day to make sure you did your job right is ALWAYS a right decision. Not making sure your ass is covered is a sure fire way to waste a lot of time.







This is an odd statement, as if a SINGLE day delay would fix every problem. An extra day, in many cases, does not allow much extra fixing to occur. It seems many think a single-day slip will fix all the outstanding problems.



As far as bugs getting to live: most bugs are found by Sony internal testers, but priorities get them pushed to Test Center. Most bugs, including some new ones, are found and prioritized. So the bugs that make it to live are in many cases known and are scheduled to be fixed in a future build. But no matter how much you test or how long you test, having thousands of people doing things that would never be anticipated will find new bugs. It is an unfortunate consequence of the size and complexity of the code.









It's a given that with complex software it's hard to find all the bugs.


But let's take 12.1, for example.


Look at the issues list, it's huge. Then look at the TC issues thread. You will see 75% of the issues were reported on TC.


Now, if the limited testing that is done on TC shows the bugs, it only means there will be more in live.


It is a complete disregard of quality to push bugs to live that the TC (acceptance test) discovers. It will actually make things worse internally at SOE. They all of a sudden get thousands of bug reports from common things that people do, which TC had already discovered.


I am a professional developer and if I did this I would get fired.


Also, the devs refuse to acknowledge the players' concerns about impact. For example, publish 12.1 went live with player structures not deducting any maintenance at all. It was immediately brought up that this also means no property tax for cities. And with sales tax bugged since publish 9, player cities will now have lost 75% of their income for maintenance. Posts were made in TC, and there were no dev comments at all.


It boils down to all TC issues need to be resolved prior to patches going live. TC only tests a subset of the game and if the TC testers find it, pretty much every player in the game will when it goes live.


As an example of your statements on priorities: sales tax was bugged in publish 9, and mayors have been struggling with maintenance changes ever since. This was found on TC. We still have no fix.


Throwing bugs live that were found in TC so they can be prioritized and fixed later means they may not get fixed.


It takes more time to correct bugs as time goes on, as opposed to right when they were found. When bugs are fixed on TC, the coders have a source base immediately available and know all the changes.









Scoooter - Master Pilot/Master Politician
ScootBacca - Master Creature Handler/Master Rifleman
Co-Leader - mVa
Mayor of Mos Vegas, Tatooine, Valcyn
FrankLee
Tue Jan 25, 2005 11:38 am
#2

Roadmap to a happier playerbase:



1) Test all patches/fixes on the Test Center before you push them to Live, even the little ones. Make sure that nobody can point to a thread and post in the Test Center Bugs Thread that repeatedly and accurately describes this bug. If you’re not getting enough feedback, wait another week until folks start testing it. Tell the people what you want tested. (I think you’re beginning to do this now anyway)

2) Announce your intention to put a publish up to live more than 24 hours in advance. I can’t describe (without obscenity) how angry I get when I check the boards as I leave work (10PM EST) to make sure nothing’s planned for the next morning when I can play, only to find out that a patch or hotfix, or downtime has been scheduled unannounced - or, it’s been scheduled with a 40 minute lead time, giving me just enough time to waste 15k on a set of buffs that I won’t be able to recoup. Announcing with anything less than 8 hours notice has 2 huge pitfalls: First, nobody that has an otherwise occupied schedule can plan any play time around you, and Second, it makes you look as if your development team can’t set or meet any deadlines or schedules. It’s like knowing you’re playing a game run by an adult attention-deficit-disorder support group. “Soon” is not an accurate measurement of time. If you get the patch done tonight, that doesn’t mean patch it up tomorrow. It means tell folks it’s ready, and you’ll be putting it in the day after tomorrow, or 24 hours from now, or 48. Putting it in on the fly, then having it break looks terrible.

3) Have an exit strategy if things blow up. If you load super-patch-99 up to Live and it explodes, houses burn and women weep, make an ‘undo’ script that rolls things back to the backup copy you made before you patched. Make a backup copy before you patch. Sure it’s huge, sure it’s redundant; in 24 hours though you can ditch it when you know things are working. Of course it sounds like a major PITA to code. Count the downtime hours on a per-server/per-player basis every time your publish breaks, and your return on investment is 1 publish.

4) Expect the unexpected. “Unexpected Maintenance” as an excuse is an insult, and you should be ashamed to use it. Once in a blue moon, sure. Every patch day, no. On patch day, EVERYONE expects issues. If you’re not expecting some kind of bug, take a long, hard look at the previous 12 patches. When I’m watching America’s Funniest Videos and I see a kid standing beside dad with a wiffle bat, I know damned well that in about 2.5 seconds dad’s taking a groin shot. There is nothing surprising about this event, and it feels strikingly similar.


The game is a huge, complex thing, and we’re pretty much resigned to the fact that it will probably not patch in right, and require some tweaking. Feigning perpetual surprise that something broke when you changed things is… dumb. We know you’re working hard, but let’s be reasonable. This isn’t week two, and some of us have sunk hundreds of dollars and thousands of hours into this enterprise. Go slow, get it right; we’ll wait. I think you guys have been making slow, steady improvements for months now, and I've been generally satisfied with the results. Today however I scheduled 4 hours to play, and logged on to see that there was an ETA of 1 hour to fix a problem. Annoying, yes, but I can live with an hour. That was almost 4 hours ago now, and the server's been down for triple your estimate. We know things sometimes go bad, but why is there no device in play to allow you to roll things back until you've got it right?



FrankLee
--------------------------------------------------------------------------------
Everything I tell you is a lie. - Vergere
Jedi = Luke Skywalker - What friggin' genius designed this PR campaign?
Humans are SUPERIOR! - John Crichton
The Dallet Series (ongoing story)
FrankLee
Tue Jan 25, 2005 3:28 pm
#3

I'm a chemist, but I write code for instrument interfacing and data acquisition. I'll be the first to admit that the kind of database programming I might do pales in comparison to the sheer volume of transactions that SWG must use... I freely and happily admit it's a monumental effort they undertake.
But if I shut production down for 8 hours, I'd probably get one shot at redeeming myself, a second offense would have me looking for a new job. Granted, by my continued subscription I'm implicitly not firing anyone, but c'mon.
Everyone can understand that unforeseen things happen. Make a rule: If the downtime is going to take over 45 minutes, roll back the servers and work on fixing your code while you have a copy running. To me, losing an hour of play to a rollback is preferable to losing a day of play because I couldn't log on.
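
A minimal sketch of the 45-minute rule above, in Python. The function name, the threshold constant, and the `backup_available` flag are all hypothetical, chosen only for illustration; nothing here reflects how the live servers are actually operated.

```python
from datetime import timedelta

# Hypothetical threshold taken from the rule above; not an actual SOE setting.
ROLLBACK_THRESHOLD = timedelta(minutes=45)


def should_roll_back(estimated_downtime: timedelta, backup_available: bool) -> bool:
    """Return True when the estimated fix time exceeds the threshold
    and a pre-patch backup exists to roll back to."""
    return backup_available and estimated_downtime > ROLLBACK_THRESHOLD


# A one-hour estimate with a backup on hand triggers a rollback;
# a 30-minute estimate does not, and neither does a missing backup.
print(should_roll_back(timedelta(hours=1), True))
print(should_roll_back(timedelta(minutes=30), True))
print(should_roll_back(timedelta(hours=4), False))
```

The rule only works if the backup is made before patching, which is why the check refuses to roll back without one.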



FrankLee
--------------------------------------------------------------------------------
Everything I tell you is a lie. - Vergere
Jedi = Luke Skywalker - What friggin' genius designed this PR campaign?
Humans are SUPERIOR! - John Crichton
The Dallet Series (ongoing story)
Calandryll_SOE
Tue Jan 25, 2005 4:04 pm
#4






FrankLee wrote:
Roadmap to a happier playerbase:



1) Test all patches/fixes on the Test Center before you push them to Live, even the little ones. Make sure that nobody can point to a thread and post in the Test Center Bugs Thread that repeatedly and accurately describes this bug. If you’re not getting enough feedback, wait another week until folks start testing it. Tell the people what you want tested. (I think you’re beginning to do this now anyway)

2) Announce your intention to put a publish up to live more than 24 hours in advance. I can’t describe (without obscenity) how angry I get when I check the boards as I leave work (10PM EST) to make sure nothing’s planned for the next morning when I can play, only to find out that a patch or hotfix, or downtime has been scheduled unannounced - or, it’s been scheduled with a 40 minute lead time, giving me just enough time to waste 15k on a set of buffs that I won’t be able to recoup. Announcing with anything less than 8 hours notice has 2 huge pitfalls: First, nobody that has an otherwise occupied schedule can plan any play time around you, and Second, it makes you look as if your development team can’t set or meet any deadlines or schedules. It’s like knowing you’re playing a game run by an adult attention-deficit-disorder support group. “Soon” is not an accurate measurement of time. If you get the patch done tonight, that doesn’t mean patch it up tomorrow. It means tell folks it’s ready, and you’ll be putting it in the day after tomorrow, or 24 hours from now, or 48. Putting it in on the fly, then having it break looks terrible.

3) Have an exit strategy if things blow up. If you load super-patch-99 up to Live and it explodes, houses burn and women weep, make an ‘undo’ script that rolls things back to the backup copy you made before you patched. Make a backup copy before you patch. Sure it’s huge, sure it’s redundant; in 24 hours though you can ditch it when you know things are working. Of course it sounds like a major PITA to code. Count the downtime hours on a per-server/per-player basis every time your publish breaks, and your return on investment is 1 publish.

4) Expect the unexpected. “Unexpected Maintenance” as an excuse is an insult, and you should be ashamed to use it. Once in a blue moon, sure. Every patch day, no. On patch day, EVERYONE expects issues. If you’re not expecting some kind of bug, take a long, hard look at the previous 12 patches. When I’m watching America’s Funniest Videos and I see a kid standing beside dad with a wiffle bat, I know damned well that in about 2.5 seconds dad’s taking a groin shot. There is nothing surprising about this event, and it feels strikingly similar.


The game is a huge, complex thing, and we’re pretty much resigned to the fact that it will probably not patch in right, and require some tweaking. Feigning perpetual surprise that something broke when you changed things is… dumb. We know you’re working hard, but let’s be reasonable. This isn’t week two, and some of us have sunk hundreds of dollars and thousands of hours into this enterprise. Go slow, get it right; we’ll wait. I think you guys have been making slow, steady improvements for months now, and I've been generally satisfied with the results. Today however I scheduled 4 hours to play, and logged on to see that there was an ETA of 1 hour to fix a problem. Annoying, yes, but I can live with an hour. That was almost 4 hours ago now, and the server's been down for triple your estimate. We know things sometimes go bad, but why is there no device in play to allow you to roll things back until you've got it right?






I can't really respond to 1 or 3 since those are beyond my area of responsibility; however, JustG wrote a post explaining the downtime. It was tested on TC, but the problems that occurred didn't happen on TC.


We announced the publish yesterday at around 9:30pm. That said, we do want to give more notice on publishes, but sometimes a couple of issues need to be resolved before we can confirm the publish so the announcement goes later than we'd like. We're also going to be putting the publish announcement on the launchpad from now on so people who don't check the site at night will see it too.


Agreed that the explanation didn't correctly explain the problems. That's my fault for not catching the wording. During the downtime our number one priority is to get it resolved and to let you know that we are working on it. Once everything was resolved we posted an update about what happened with more detail.

Karquile
Tue Jan 25, 2005 4:33 pm
#5






FrankLee wrote:
Roadmap to a happier playerbase:

1) Test all patches/fixes on the Test Center before you push them to Live, even the little ones. Make sure that nobody can point to a thread and post in the Test Center Bugs Thread that repeatedly and accurately describes this bug. If you’re not getting enough feedback, wait another week until folks start testing it. Tell the people what you want tested. (I think you’re beginning to do this now anyway)





This would be wonderful, but I've been watching these guys for 18 months now, and I long ago concluded that they have target dates for the publishes (although they don't announce them and may not even share them with Community Relations) -- and that the only kind of bug they will ever hold a publish for is an exploit bug. Hair-tearing UI glitches, broken game mechanisms, what have you - TC can report them day in and day out, right up until the eve of publish, and it doesn't matter; they will go straight to Live and maybe get hotfixed a week later, or two, or never. But if there's an invulnerable doorway somewhere, or a melon dupe, everything stops until it's fixed.


I'm not saying they don't hear us when we report bugs. I'm just saying that they feel absolutely no obligation to fix them before the publish. Except exploits.


For my own grim amusement, I just logged into Live and ran down my outstanding bug list; every one is still there. Enjoy.

FrankLee
Tue Jan 25, 2005 4:40 pm
#6

I appreciate the response. It's definitely a step in the right direction. I also appreciate JustG's explanation, even if it did come several hours after the problem(s) surfaced.

The post from Tiggs announcing the patch shows up as having come after midnight here in the EST zone. Some or most of the playing population (if they work days) is already abed by midnight, or at least not surfing the forums. To my mind, we're not getting enough notice. I don't see anything wrong with a post that goes something like: "We're working out a last few issues; our _tentative_ patch day is Thursday." If Thursday comes and the issues don't get fixed, that's alright; but if Wednesday night comes and there's no word, and so I organize a group hunt for Thursday morning, I'm in trouble.

Again, improved communication is to be applauded. I can't tell you how much more satisfying "X, Y, and Z were messed up, so we fixed them, sorry about that" is than silence, or the CSR's "We have no legal obligation to provide continuous or error-free service".

Having said all of that, are there any plans for tomorrow we should know about?



FrankLee
--------------------------------------------------------------------------------
Everything I tell you is a lie. - Vergere
Jedi = Luke Skywalker - What friggin' genius designed this PR campaign?
Humans are SUPERIOR! - John Crichton
The Dallet Series (ongoing story)
Calandryll_SOE
Tue Jan 25, 2005 4:58 pm
#7

You are correct. It was late at night. Again, we do our best to post as soon as we get confirmation. Generally the timing is earlier than this one.


There are no updates planned for tomorrow, barring any unexpected needs.
DustusNavar
Tue Jan 25, 2005 5:10 pm
#8






Calandryll_SOE wrote:

You are correct. It was late at night. Again, we do our best to post as soon as we get confirmation. Generally the timing is earlier than this one.


There are no updates planned for tomorrow, barring any unexpected needs.






I think a more amenable solution (and this publish is a great example) is to continue to post "late at night", but don't say "We're publishing it tomorrow". Instead, say "We're publishing it the day after tomorrow." You'll make a lot more friends that way.



22 Professions Mastered Jedi Padawan Rebel Colonel
Killing In The Name Of


drjjwow
Tue Jan 25, 2005 5:11 pm
#9

I'm still crashing to desktop. Can you guys please get this fixed before you send out a large patch with new stuff that only half@$$ works?

Master dancer buffs don't work for me either. I don't understand why this keeps happening; one would think you have plenty of info to just fix this crashing stuff.



__________________________________________________________________
ga Remnants of the Jedi ga
"If you run; you have your wisdom. If you fight you have your thoughts. If you hide; you have your weakness. If you have anger; you have the darkside. If you died... you have your self respect."
ga Drjjwow Mazap ga
Jedi Combatant || Steward Councilman || Jedi Guardian
__________________________________________________________________
Bohdi-Tzu
Tue Jan 25, 2005 5:17 pm
#10

Thank you for your candor, it is appreciated. Not to beat a dead horse, but I would also like to chime in on the notice for patches. Personally, I would like to see a full day notice before a major patch is pushed out. We (the players) all like new content, and nerfs aside, we're an impatient lot and want the new stuff as soon as we can get it. I suspect that you (the dev team) want to push out your work as soon as it's done.

That said, and in light of the history of patch day glitches, is there some urgent and unavoidable reason that when a patch is "done", and has passed your testing and Q&A, that you can't wait one more day to push it up and give some notice so that the players can prepare? Hotfixes like credit dupes and other exploits--sure, whack them with all due haste, but it won't hurt me to have to wait one more day to sample from my bantha, and if I knew a patch was coming today, I could plan (or not plan) my play accordingly.

Thanks for listening.





Sun-Tzu  Jedi Knight
"I sense a plot to destroy the jedi..."

g7rpg
Tue Jan 25, 2005 5:45 pm
#11

One thing that wasn't mentioned that I think might be of use to us is communication. When I logged in and found that the servers were down, I went to chat to see if there was an ETA; there was a 1 hr ETA notice.

Fine, OK, 1 hr is not too bad. That hour passes, I go to check again, and there is a further hour added to the ETA; annoying, but still not too bad (remembering that the servers had been down for a good few hours prior to me trying to connect).

Then, at 50 mins into the 2nd hour, it suddenly becomes NO ETA. This then really becomes infuriating, as by this time I had wasted almost 2 hours believing (maybe naively) that the downtime was nearly over.

Questions to the poor CSRs proved fruitless as they seemed to know about the same as we did, which then made their lives harder as again they appeared to have no true function other than to repeatedly say 'NO ETA'.

If they had been given something to tell us I am sure that the players would have been less disgruntled and hence they would have had an easier time of it.

There will always be people who vent their feelings a little more vociferously than they should, but some kind of explanation about what was going on would, I believe, have made so much of a difference to the atmosphere in the chat rooms, and I know it would have made me happier about the whole debacle.



Ekoh Rebel Elder Jedi
Ekho Ex Chef/BE not sure about it all now
Bobbette Ex DE/Merchant
Ocra Rebel Elder Jedi
T'amat Noob With No direction
E'koh Rebel Master Spy
Calandryll_SOE
Tue Jan 25, 2005 6:26 pm
#12






Bohdi-Tzu wrote:
Thank you for your candor, it is appreciated. Not to beat a dead horse, but I would also like to chime in on the notice for patches. Personally, I would like to see a full day notice before a major patch is pushed out. We (the players) all like new content, and nerfs aside, we're an impatient lot and want the new stuff as soon as we can get it. I suspect that you (the dev team) want to push out your work as soon as it's done.

That said, and in light of the history of patch day glitches, is there some urgent and unavoidable reason that when a patch is "done", and has passed your testing and Q&A, that you can't wait one more day to push it up and give some notice so that the players can prepare? Hotfixes like credit dupes and other exploits--sure, whack them with all due haste, but it won't hurt me to have to wait one more day to sample from my bantha, and if I knew a patch was coming today, I could plan (or not plan) my play accordingly.

Thanks for listening.





The issue with waiting an extra day after the publish is green lit is that it delays future publishes. It's always our goal to green light a publish as early in the day as possible, but sometimes one or two tricky issues can delay it until later into the night. In this particular case, waiting an extra day or even an extra week wouldn't have uncovered the problems that caused the downtime.
KyleKnox
Tue Jan 25, 2005 6:28 pm
#13

JustG's post was good, and it's even progress that he is making such a post. However, the idea or excuse that you didn't plan for differences in a sparse TC database versus the loaded live ones makes me fall out of my chair in utter disbelief... c'mon guys...



Dyvim Storm - Eclipse
PrePatch9 4444 Guardian
Force Master