Community Relations Archive
Thread: Suggestions on Patch Day angst mitigation
Leana_Txorana wrote:
Don't be too quick to bash the testers. Very often the TC people report bugs that do not get fixed before it goes live. They do not have a No Bug Policy in place, hence they feel it is acceptable to push out buggy code to a live server. That is their choice I suppose.
There will always be bugs, even a large list of known bugs. But delaying a publish with critical fixes, needed content, and many non-critical bug fixes just because there are minor bugs is not a good idea. Bugs are given a priority, and publishes get pushed live based on the priority of outstanding known bugs.
Also a year ago they put TC-Bria in place to do a live server patch test before any new patch went out. It worked wonders and all the patches since then have worked fine. They dropped TC-Bria from their testing schedule and of course got burned. As soon as you remove a safety feature from any system you have an accident.
This has been addressed. The TC-Bria hardware is supporting the CU sandbox. It was not pulled because they did not want to do the extra testing, it was pulled to do lots of extra testing.
And yes, waiting an extra day to make sure you did your job right is ALWAYS a right decision. Not making sure your ass is covered is a sure fire way to waste a lot of time.
This is an odd statement, as if a SINGLE day delay would fix every problem. An extra day, in many cases, does not allow much extra fixing to occur. It seems many think a single day slip will fix all the outstanding problems.
As far as bugs getting to live: most bugs are found by Sony internal testers, but priorities get them pushed to test center. Most bugs, including some new ones, are found and prioritized. So the bugs that make it live are in many cases known and are scheduled to be fixed in a future build. But no matter how much you test or how long you test, having thousands of people doing things that would never be anticipated will find new bugs. It is an unfortunate nature of the size and complexity of the code.
1) Test all patches/fixes on the Test Center before you push them to Live, even the little ones. Make sure that nobody can point to a thread and post in the Test Center Bugs Thread that repeatedly and accurately describes this bug. If you’re not getting enough feedback, wait another week until folks start testing it. Tell the people what you want tested. (I think you’re beginning to do this now anyway)
2) Announce your intention to put a publish up to live more than 24 hours in advance. I can’t describe (without obscenity) how angry I get when I check the boards as I leave work (10PM EST) to make sure nothing’s planned for the next morning when I can play, only to find out that a patch or hotfix, or downtime has been scheduled unannounced - or, it’s been scheduled with a 40 minute lead time, giving me just enough time to waste 15k on a set of buffs that I won’t be able to recoup. Announcing with anything less than 8 hours notice has 2 huge pitfalls: First, nobody that has an otherwise occupied schedule can plan any play time around you, and Second, it makes you look as if your development team can’t set or meet any deadlines or schedules. It’s like knowing you’re playing a game run by an adult attention-deficit-disorder support group. “Soon” is not an accurate measurement of time. If you get the patch done tonight, that doesn’t mean patch it up tomorrow. It means tell folks it’s ready, and you’ll be putting it in the day after tomorrow, or 24 hours from now, or 48. Putting it in on the fly, then having it break looks terrible.
3) Have an exit strategy if things blow up. If you load super-patch-99 up to Live and it explodes, houses burn and women weep, make an ‘undo’ script that rolls things back to the backup copy you made before you patched. Make a backup copy before you patch. Sure it’s huge, sure it’s redundant; in 24 hours though you can ditch it when you know things are working. Of course it sounds like a major PITA to code. Count the downtime hours on a per-server/per-player basis every time your publish breaks, and your return on investment is 1 publish.
4) Expect the unexpected. “Unexpected Maintenance” as an excuse is an insult, and you should be ashamed to use it. Once in a blue moon, sure. Every patch day, no. On patch day, EVERYONE expects issues. If you’re not expecting some kind of bug, take a long, hard look at the previous 12 patches. When I’m watching America’s Funniest Videos and I see a kid standing beside dad with a wiffle bat, I know damned well that in about 2.5 seconds dad’s taking a groin shot. There is nothing surprising about this event, and it feels strikingly similar.
The game is a huge, complex thing, and we’re pretty much resigned to the fact that it will probably not patch in right, and require some tweaking. Feigning perpetual surprise that something broke when you changed things is… dumb. We know you’re working hard, but let’s be reasonable. This isn’t week two, and some of us have sunk hundreds of dollars and thousands of hours into this enterprise. Go slow, get it right; we’ll wait. I think you guys have been making slow, steady improvements for months now, and I've been generally satisfied with the results. Today however I scheduled 4 hours to play, and logged on to see that there was an ETA of 1 hour to fix a problem. Annoying, yes, but I can live with an hour. That was almost 4 hours ago now, and the server's been down for triple your estimate. We know things sometimes go bad, but why is there no device in play to allow you to roll things back until you've got it right?
But if I shut production down for 8 hours, I'd probably get one shot at redeeming myself, a second offense would have me looking for a new job. Granted, by my continued subscription I'm implicitly not firing anyone, but c'mon.
Everyone can understand that unforeseen things happen. Make a rule: If the downtime is going to take over 45 minutes, roll back the servers and work on fixing your code while you have a copy running. To me, losing an hour of play to a rollback is preferable to losing a day of play because I couldn't log on.
FrankLee wrote:
Roadmap to a happier playerbase:
1) Test all patches/fixes on the Test Center before you push them to Live, even the little ones. Make sure that nobody can point to a thread and post in the Test Center Bugs Thread that repeatedly and accurately describes this bug. If you’re not getting enough feedback, wait another week until folks start testing it. Tell the people what you want tested. (I think you’re beginning to do this now anyway)
2) Announce your intention to put a publish up to live more than 24 hours in advance. I can’t describe (without obscenity) how angry I get when I check the boards as I leave work (10PM EST) to make sure nothing’s planned for the next morning when I can play, only to find out that a patch or hotfix, or downtime has been scheduled unannounced - or, it’s been scheduled with a 40 minute lead time, giving me just enough time to waste 15k on a set of buffs that I won’t be able to recoup. Announcing with anything less than 8 hours notice has 2 huge pitfalls: First, nobody that has an otherwise occupied schedule can plan any play time around you, and Second, it makes you look as if your development team can’t set or meet any deadlines or schedules. It’s like knowing you’re playing a game run by an adult attention-deficit-disorder support group. “Soon” is not an accurate measurement of time. If you get the patch done tonight, that doesn’t mean patch it up tomorrow. It means tell folks it’s ready, and you’ll be putting it in the day after tomorrow, or 24 hours from now, or 48. Putting it in on the fly, then having it break looks terrible.
3) Have an exit strategy if things blow up. If you load super-patch-99 up to Live and it explodes, houses burn and women weep, make an ‘undo’ script that rolls things back to the backup copy you made before you patched. Make a backup copy before you patch. Sure it’s huge, sure it’s redundant; in 24 hours though you can ditch it when you know things are working. Of course it sounds like a major PITA to code. Count the downtime hours on a per-server/per-player basis every time your publish breaks, and your return on investment is 1 publish.
4) Expect the unexpected. “Unexpected Maintenance” as an excuse is an insult, and you should be ashamed to use it. Once in a blue moon, sure. Every patch day, no. On patch day, EVERYONE expects issues. If you’re not expecting some kind of bug, take a long, hard look at the previous 12 patches. When I’m watching America’s Funniest Videos and I see a kid standing beside dad with a wiffle bat, I know damned well that in about 2.5 seconds dad’s taking a groin shot. There is nothing surprising about this event, and it feels strikingly similar.
The game is a huge, complex thing, and we’re pretty much resigned to the fact that it will probably not patch in right, and require some tweaking. Feigning perpetual surprise that something broke when you changed things is… dumb. We know you’re working hard, but let’s be reasonable. This isn’t week two, and some of us have sunk hundreds of dollars and thousands of hours into this enterprise. Go slow, get it right; we’ll wait. I think you guys have been making slow, steady improvements for months now, and I've been generally satisfied with the results. Today however I scheduled 4 hours to play, and logged on to see that there was an ETA of 1 hour to fix a problem. Annoying, yes, but I can live with an hour. That was almost 4 hours ago now, and the server's been down for triple your estimate. We know things sometimes go bad, but why is there no device in play to allow you to roll things back until you've got it right?
I can't really respond to 1 or 3 since those are beyond my area of responsibility; however, JustG wrote a post explaining the downtime. It was tested on TC, but the problems that occurred didn't happen on TC.
We announced the publish yesterday at around 9:30pm. That said, we do want to give more notice on publishes, but sometimes a couple of issues need to be resolved before we can confirm the publish so the announcement goes later than we'd like. We're also going to be putting the publish announcement on the launchpad from now on so people who don't check the site at night will see it too.
Agreed that the explanation didn't correctly explain the problems. That's my fault for not catching the wording. During the downtime our number one priority is to get it resolved and to let you know that we are working on it. Once everything was resolved we posted an update about what happened with more detail.
FrankLee wrote:
Roadmap to a happier playerbase:
1) Test all patches/fixes on the Test Center before you push them to Live, even the little ones. Make sure that nobody can point to a thread and post in the Test Center Bugs Thread that repeatedly and accurately describes this bug. If you’re not getting enough feedback, wait another week until folks start testing it. Tell the people what you want tested. (I think you’re beginning to do this now anyway)
The post from Tiggs announcing the patch shows up as having come after midnight here in the EST zone. Some or most of the playing population (if they work days) is already abed by midnight, or at least not surfing the forums. To my mind, we're not getting enough notice. I don't see anything wrong with a post that goes something like: We're working out a last few issues, our _tentative_ patch day is Thursday. If Thursday comes and the issues don't get fixed, that's alright; but if Wednesday night comes and there's no word, and I'm organizing a group hunt Thursday morning, I'm in trouble.
Again, improved communication is to be applauded. I can't tell you how much more satisfying "X, Y, and Z were messed up, so we fixed them, sorry about that" is than silence, or the CSR's "We have no legal obligation to provide continuous or error-free service".
Having said all of that, are there any plans for tomorrow we should know about?
Calandryll_SOE wrote:
You are correct. It was late at night. Again, we do our best to post as soon as we get confirmation. Generally the timing is earlier than this one.
There are no updates planned for tomorrow, barring any unexpected needs.
I think a more amenable solution (and this publish is a great example) is to continue to post "late at night", but don't say "We're publishing it tomorrow". Instead, say "We're publishing it the day after tomorrow." You'll make a lot more friends that way.
That said, and in light of the history of patch day glitches, is there some urgent and unavoidable reason that when a patch is "done", and has passed your testing and Q&A, that you can't wait one more day to push it up and give some notice so that the players can prepare? Hotfixes like credit dupes and other exploits--sure, whack them with all due haste, but it won't hurt me to have to wait one more day to sample from my bantha, and if I knew a patch was coming today, I could plan (or not plan) my play accordingly.
Thanks for listening.
Fine ok 1 hr is not too bad, that hour passes, go to check again and there is a further hour added to the ETA, annoying but still not too bad (Remembering that the servers had been down for a good few hours prior to me trying to connect).
Then, at 50 mins into the 2nd hour, it suddenly becomes NO ETA. This then really becomes infuriating, as by this time I had wasted almost 2 hours believing (maybe naively) that the downtime was nearly over.
Questions to the poor CSRs proved fruitless as they seemed to know about the same as we did, which then made their lives harder as again they appeared to have no true function other than to repeatedly say 'NO ETA'.
If they had been given something to tell us I am sure that the players would have been less disgruntled and hence they would have had an easier time of it.
There will always be people who vent their feelings a little more vociferously than they should, but I believe some kind of explanation about what was going on would have made so much of a difference to the atmosphere in the chat rooms, and I know it would have made me happier about the whole debacle.
Bohdi-Tzu wrote:
Thank you for your candor, it is appreciated. Not to beat a dead horse, but I would also like to chime in on the notice for patches. Personally, I would like to see a full day notice before a major patch is pushed out. We (the players) all like new content, and nerfs aside, we're an impatient lot and want the new stuff as soon as we can get it. I suspect that you (the dev team) want to push out your work as soon as it's done.
That said, and in light of the history of patch day glitches, is there some urgent and unavoidable reason that when a patch is "done", and has passed your testing and Q&A, that you can't wait one more day to push it up and give some notice so that the players can prepare? Hotfixes like credit dupes and other exploits--sure, whack them with all due haste, but it won't hurt me to have to wait one more day to sample from my bantha, and if I knew a patch was coming today, I could plan (or not plan) my play accordingly.
Thanks for listening.
The issue with waiting an extra day after the publish is green lit is that it delays future publishes. It's always our goal to green light a publish as early in the day as possible, but sometimes one or two tricky issues can delay it until later into the night. In this particular case, waiting an extra day or even an extra week wouldn't have uncovered the problems that caused the downtime.