  1. #1
    Joined
    18.11.2013
    Posts
    1

    Help me understand

    As an IT professional, I am struggling to understand how a power outage completely cripples all the game servers. Was there no backup plan? Today's sophisticated data centers handle mission-critical operations and processes, and it's not feasible to shut them down -- even for a short duration. This means that power needs to be available continuously. A properly designed and regularly tested emergency power system will ensure that critical data center operations are protected and continuously operational. Your customers deserve no less.

    Extremely bad timing, considering many of us pre-paid for Helms Deep. There was certainly no delay in the transfer of funds from my bank to Turbine.

  2. #2
    Joined
    23.05.2013
    Posts
    344
    The Facebook post, I believe, noted that the 'back-up' power failed as well. I deal with IT too (several servers, battery back-ups, etc.), and no matter what gets put in place, things happen beyond our control. In this case you would think the datacenter had a beefy enough generator to kick in and handle the load, but if the power outage is fairly widespread, networking/internet gear outside your control will go down and everyone will lose access anyway. Thankfully I've never had that big of an issue *knock on wood*

  3. #3
    Joined
    06.10.2010
    Posts
    701
    Quote Originally Posted by Smaugnakh View Post
    As an IT professional, I am struggling to understand how a power outage completely cripples all the game servers. Was there no backup plan? Today's sophisticated data centers handle mission-critical operations and processes, and it's not feasible to shut them down -- even for a short duration. This means that power needs to be available continuously. A properly designed and regularly tested emergency power system will ensure that critical data center operations are protected and continuously operational. Your customers deserve no less.

    Extremely bad timing, considering many of us pre-paid for Helms Deep. There was certainly no delay in the transfer of funds from my bank to Turbine.
    Turbine doesn't own the datacenter. The datacenter's backup power failed to work, and all servers, including non-Turbine ones, were affected. Maybe they'll switch to a different host, or maybe not.

  4. #4
    Joined
    29.05.2007
    Posts
    1.353
    In case the OP does not realise, this is a GAME...hardly mission critical. Back ups fail, there were far worse things happening in the States last night than loss of a game!

  5. #5
    Joined
    19.04.2007
    Posts
    797
    Quote Originally Posted by Ceejay90 View Post
    In case the OP does not realise, this is a GAME...hardly mission critical. Back ups fail, there were far worse things happening in the States last night than loss of a game!
    You obviously have no clue what you're talking about. It is a SERVICE that their customers pay for. Keeping the service up and running is absolutely, positively mission critical. Flaky servers when presented with stressful conditions is a great way to lose customers. Turbine is in no position to lose customers. Your apathy towards this problem just because the planet is rife with other bigger problems in no way gives Turbine a free pass to host this game on shoddy gear.

  6. #6
    Joined
    04.06.2011
    Location
    In the Ninky Nonk
    Posts
    4.336
    Quote Originally Posted by Ceejay90 View Post
    In case the OP does not realise, this is a GAME...hardly mission critical. Back ups fail, there were far worse things happening in the States last night than loss of a game!
    I think you'll find that as far as Turbine are concerned their games are mission-critical. It may be a game for us but to them it's what generates their revenue, pays their salaries and feeds their families.

    I too work in IT. Even the best-laid plans can go wrong. I've seen systems with extremely well-rehearsed recovery plans go awry when the 0.0001% chance of something occurring actually occurs. What most people don't appreciate when they use their IT (whether corporate or cloud services) is just how reliable these things are nowadays, so when a failure does occur it's as often as not the result of a problem that could not be foreseen.

    Put it this way: back in 1999 I worked for a company and we spent the latter half of that year testing and re-testing the critical systems to make sure that things would work on 01/01/2000. The number of issues we found and fixed meant that when the date cut over everything went smoothly to plan, with the result that our director of IT was told, in effect, that "Y2K was a waste of money and it could have been better spent elsewhere". Sometimes people just take for granted that when they flick a switch the light turns on.

    So, rather than dwell on this issue, let's be grateful that because Turbine treat their systems as mission-critical the games are back up and running and we've got a new date scheduled for HD. Otherwise, if they had been more lax, things could be a lot worse.
    <A sig goes here>

  7. #7
    Joined
    28.03.2007
    Posts
    12.622
    Yes, I have the feeling that there are going to be INN-teresting conversations between Turbine's
    people and the guys running the datacenter. Like, why didn't their backups kick in?

    We'll probably never know the details, though I would like to be a fly on the wall to hear the
    conversation.

    There's an old, old, joke among engineers, which has been adapted (as a metaphor) by programmers:

    "But the automatic stop didn't stop!"

    "Well, why weren't you WATCHING the automatic stop?"
    Eruanne - Shards of Narsil-1 - Elendilmir -> Arkenstone

  8. #8
    Joined
    23.07.2011
    Location
    Germany
    Posts
    612
    The datacenter probably outsourced their Admins to India.
    VoIP Admins rule!

    You "real" IT Pros know I´m only half joking..

  9. #9
    Joined
    04.06.2011
    Location
    In the Ninky Nonk
    Posts
    4.336
    Quote Originally Posted by Aldeld View Post
    You obviously have no clue what you're talking about. It is a SERVICE that their customers pay for. Keeping the service up and running is absolutely, positively mission critical. Flaky servers when presented with stressful conditions is a great way to lose customers. Turbine is in no position to lose customers. Your apathy towards this problem just because the planet is rife with other bigger problems in no way gives Turbine a free pass to host this game on shoddy gear.
    Don't think the issue is shoddy hardware. One can have the most up-to-date servers in the best data centres with all the redundant power one could ever need - but if someone fails to regularly test the failover to the backup generator then all that money has gone down the drain.
    <A sig goes here>

  10. #10
    Joined
    22.02.2007
    Posts
    326
    Quote Originally Posted by Smaugnakh View Post
    As an IT professional, I am struggling to understand how a power outage completely cripples all the game servers. Was there no backup plan? Today's sophisticated data centers handle mission-critical operations and processes, and it's not feasible to shut them down -- even for a short duration. This means that power needs to be available continuously. A properly designed and regularly tested emergency power system will ensure that critical data center operations are protected and continuously operational. Your customers deserve no less.

    Extremely bad timing, considering many of us pre-paid for Helms Deep. There was certainly no delay in the transfer of funds from my bank to Turbine.
    Yes I also agree. To me this is very unprofessional.

  11. #11
    Joined
    28.03.2007
    Posts
    12.622
    Quote Originally Posted by Startrekman1of9 View Post
    Yes I also agree. To me this is very unprofessional.
    As others have pointed out, it was not unprofessional behavior on Turbine's part.

    It may have been unprofessional behavior on their data center's part, if it was due to
    someone's negligence that their backups didn't come up; we may never find out.

    When power drops like a stone for a large number of computers, some of them will suffer damage
    to their data -- as witnessed by the fact that several servers are/have been unable to come up or
    stay up.

    This used to be called "an act of God." Nowadays, many people substitute "Murphy" for "God."
    Eruanne - Shards of Narsil-1 - Elendilmir -> Arkenstone

  12. #12
    Joined
    02.06.2011
    Posts
    61
    Quote Originally Posted by Smaugnakh View Post
    As an IT professional, ...
    No, you're obviously not.

  13. #13
    Joined
    23.05.2013
    Posts
    344
    Quote Originally Posted by Flatfoot789 View Post
    The datacenter probably outsourced their Admins to India.
    VoIP Admins rule!

    You "real" IT Pros know I´m only half joking..
    LOL, the company I work for is currently 'experimenting' with what they can outsource. So far so good... everything they've tried has resulted in poor results and late projects, when/if they were completed at all.

  14. #14
    Joined
    17.02.2007
    Location
    Sarasota, FL, USA
    Posts
    3.206
    Quote Originally Posted by Aldeld View Post
    You obviously have no clue what you're talking about. It is a SERVICE that their customers pay for. Keeping the service up and running is absolutely, positively mission critical. Flaky servers when presented with stressful conditions is a great way to lose customers. Turbine is in no position to lose customers. Your apathy towards this problem just because the planet is rife with other bigger problems in no way gives Turbine a free pass to host this game on shoddy gear.
    The 'shoddy gear' is provided by a third party who also handles other companies and their processes. When the power goes down, all the hardware is not brought back online at the same time; it is done in a specific order. I would guess that, despite the size of the client (Turbine), hardware dedicated to running games is not at the top of the list.
    << Co-founder of The Firebrands of Caruja on Landroval >>
    Ceolford of Dale, Dorolin, Tordag, Garberend Bellheather, Colfinn Belegorn, Garmo Butterbuckles, Calensarn Nimlos, Langtiriel, Bergteir

  15. #15
    Joined
    05.06.2007
    Posts
    35.538
    Quote Originally Posted by Smaugnakh View Post
    As an IT professional, I am struggling to understand how a power outage completely cripples all the game servers.
    I believe what occurred is that the power company stopped supplying power and the data center had to switch over to battery backup and/or generators. One problem we have here in South Florida is heat, and very few data centers have a robust backup for their cooling systems. One company I work with has three generators but only needs two. There is enough fuel to run each generator for a week, and the capacity is high enough for both the data center and the cooling systems. The battery plant is designed for 24 hours of operation in case none of the generators start.

    Other companies have no backup for the cooling system and only one hour of battery life. In this case the battery backup is enough to last through a very short power outage before a controlled shutdown begins.

    We are dealing with a game. It is not a critical service like an energy control center for a power company. Games are a low-profit product on a per-unit basis, so there are very few spare dollars to put into backup systems or duplicated hardware. These kinds of applications are designed to go down when a sub-server fails or the power company stops supplying power.

    What is interesting is that all this trouble restarting the service is most likely due to a failure to perform a controlled shutdown. Looking on from the outside, it appears that some or all of the units in the Lotro server complex simply powered off. Restarting complex systems in this situation is very painful.

    We ran into this situation at one of the companies I worked at. Our engineering servers were handled by the engineering department employees. When the power went off, they had enough sense to properly power down the servers, but it never occurred to them that the huge UPS has a big computer in it. They let the UPS turn off when its battery ran out, and they did not have any clue as to how to properly cold-start a discharged UPS. They brought it back online, got it charged a little bit, and turned on the servers. As the servers booted up, they completely drained the little charge left in the UPS, and the UPS shut down again, taking the servers with it. I have both a software engineering and an electrical engineering background, so I ended up going in there and restarting the UPS properly for them. It took them hours to get the servers back up after a few power cycles, and even more time to resolve data mismatches between machines that shared a workload. All because they had just enough knowledge to be dangerous.
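
    To put the "controlled shutdown before the batteries go flat" idea into concrete terms, here is a minimal sketch in Python. It assumes a NUT-style upsc client is installed and that the UPS reports the standard ups.status and battery.charge variables; the UPS name and the 20% threshold are made-up values, not anything specific to Turbine's setup.

    [CODE]
    import subprocess
    import time

    UPS = "myups@localhost"   # hypothetical UPS name, adjust for your own setup
    THRESHOLD = 20            # assumed: begin shutdown when charge drops below 20%

    def ups_var(name):
        # Query one variable (e.g. "battery.charge") from the NUT upsc client.
        out = subprocess.run(["upsc", UPS, name], capture_output=True, text=True)
        return out.stdout.strip()

    while True:
        status = ups_var("ups.status")              # "OL" = on line, "OB" = on battery
        charge = int(ups_var("battery.charge") or 0)
        if "OB" in status and charge < THRESHOLD:
            # Graceful shutdown with a warning, instead of waiting for the UPS to die.
            subprocess.run(["/sbin/shutdown", "-h", "+1", "UPS battery low"])
            break
        time.sleep(30)
    [/CODE]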
    Last edited by Yula_the_Mighty; 18.11.2013 at 15:58.
    Unless stated otherwise, all content in this post is My Personal Opinion.

  16. #16
    Joined
    27.10.2010
    Posts
    491
    Quote Originally Posted by Smaugnakh View Post
    Extremely bad timing, considering many of us pre-paid for Helms Deep.
    I fail to see what this has to do with anything.

    The fact that you pre-paid for Helms Deep didn't contain any sort of promise that you'd be able to play it on the 18th, any more than pre-ordering any other software means you'll be able to play it on release day. This is especially true when you consider that this is an expansion to an existing game.

    LotRO is still there; you can play it the same as you always did before. So the fact that you paid in advance for Helms Deep means pretty much nothing...

    Unless you honestly think they'd be better off trying to force such a massive upgrade on systems that aren't even running correctly with the stable version of the code, because that would work so much better....

  17. #17
    Joined
    01.06.2011
    Location
    Local cluster
    Posts
    521
    Quote Originally Posted by Smaugnakh View Post
    Extremely bad timing, considering many of us pre-paid for Helms Deep. There was certainly no delay in the transfer of funds from my bank to Turbine.
    Good lord. Are you really saying that you feel the value of your purchase has been decreased because a 48h delay was introduced in delivery? Do you use instant availability as your only criterion for everything you buy, or do you only expect that when you are buying a digital product? Because as an IT professional you should probably know that #### happens..

  18. #18
    Joined
    27.10.2010
    Posts
    491
    Quote Originally Posted by Yula_the_Mighty View Post
    In this case the battery backup is enough to last through a very short power outage before a controlled shutdown begins.
    This is the part I don't get... I work in IT infrastructure, and when you lose power the first thing you do is start to power down the servers, because the battery backups don't last very long. Yet it seems like the LotRO servers went down hard, and that's why they're having issues getting them back up.

    I've also seen reports of lost data and rollbacks, which further points to an uncontrolled shutdown.

    No reasonable person should expect the game to stay up and running if the datacenter lost power. It's unlikely they'd have the resources to keep the whole thing running on backup generators. But I think it's very reasonable to ask why the systems didn't go down gracefully, if that is in fact what happened.
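
    For what it's worth, "power down the servers first" doesn't have to be anything fancy. Here's a rough sketch of the kind of ordered shutdown I mean; the hostnames and tiers are invented for illustration, and it assumes passwordless SSH with sudo rights on each box.

    [CODE]
    import subprocess

    # Shut down stateless front ends first and databases last,
    # so nothing is still writing when the data stores go away.
    SHUTDOWN_ORDER = [
        ["world-01", "world-02"],   # tier 1: game world / front-end servers
        ["app-01", "app-02"],       # tier 2: application servers
        ["db-01"],                  # tier 3: databases, last to go
    ]

    for tier in SHUTDOWN_ORDER:
        for host in tier:
            # Assumes passwordless SSH and sudo on each host (illustrative only).
            subprocess.run(["ssh", host, "sudo", "shutdown", "-h", "now"])
    [/CODE]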

  19. #19
    Joined
    27.10.2007
    Posts
    264
    Quote Originally Posted by rannion View Post
    Good lord. Are you really saying that you feel the value of your purchase has been decreased because a 48h delay was introduced in delivery? Do you use instant availability as your only criterion for everything you buy, or do you only expect that when you are buying a digital product? Because as an IT professional you should probably know that #### happens..

    Not fair!! You beat me to the rant.

  20. #20
    Joined
    26.01.2009
    Posts
    1.512
    Quote Originally Posted by Aldeld View Post
    You obviously have no clue what you're talking about. It is a SERVICE that their customers pay for. Keeping the service up and running is absolutely, positively mission critical. Flaky servers when presented with stressful conditions is a great way to lose customers. Turbine is in no position to lose customers. Your apathy towards this problem just because the planet is rife with other bigger problems in no way gives Turbine a free pass to host this game on shoddy gear.
    Lighten up Francis.

    I've worked in IT professionally since 1986. All of that time has been at data centres, and I still work in IT at a data centre today.

    It is a physical impossibility to give an absolute 100% guarantee that a data centre can never completely lose power. There is ALWAYS a chance it can happen. If preparations and planning are done right, that chance is small, but it still exists and it is always there. 99.9x% of the time, the planning and preparations will work, but sometimes, in that 0.0x% of the time, you get bitten in the ### instead.
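
    To put some numbers on that, here's a back-of-the-envelope sketch; it's just the standard "nines" arithmetic, nothing specific to Turbine's data centre.

    [CODE]
    # Allowed downtime per year for a given availability target.
    MINUTES_PER_YEAR = 365 * 24 * 60

    for availability in (0.999, 0.9999, 0.99999):
        downtime = MINUTES_PER_YEAR * (1 - availability)
        print(f"{availability:.3%} uptime -> roughly {downtime:.0f} minutes of downtime per year")

    # 99.9%   -> ~526 minutes (~8.8 hours) per year
    # 99.99%  -> ~53 minutes per year
    # 99.999% -> ~5 minutes per year
    [/CODE]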

    Here's one example I personally witnessed (and not the only example).

    The data centre in question has electronically locked doors as part of its security. Staff use pass cards to open the doors, but in an emergency such as a fire, the doors have a button to unlock them behind a small "break glass" panel.
    Inside the main control room for the data centre is another "break glass" panel with a button behind it, but this button turns off ALL OF THE POWER to the data centre and DISABLES THE BACKUP GENERATOR from starting.

    One day a technician was doing contract maintenance work on the door locking system, including the break-glass buttons that unlock the doors. He mistook the emergency power shutdown button for a door release button, even though the button was clearly labelled as an emergency power shutdown and sat in a clearly different type of break-glass box. The entire data centre shut down in seconds, and it took many hours to get all the equipment in the data centre running again. That technician was banned from ever setting foot in that data centre again, and extra measures were taken to make sure it would be impossible to trip the master power shutdown button by accident a second time.

    Yes, some of us, including me, pay real world money to play this game. A certain level of reliability is expected. 100% reliability with absolutely no possibility of an unexpected interruption just isn't possible while keeping the costs of running the game at a reasonable level. Even with good planning and preparation, problems sometimes happen that are outside the scope of what has been prepared and planned for. When that happens, you do the best you can under the circumstances you're faced with.

    I'm not saying I know the exact details of the data centre where the LOTRO servers run from, but in all probability, neither do you. What I can say is that I've been in similar situations, more than once, in multiple data centres. This isn't a perfect world; sometimes things go wrong. Do yourself a favour and learn to live with that little fact.

  21. #21
    Joined
    28.03.2007
    Posts
    12.622
    Is everybody familiar with Clarke's Third Law?

    "Any sufficiently advanced technology is indistinguishable from magic."

    There are many people who use computers every day for whom, essentially, those
    computers run by magic.

    Heck, there are people for whom a light switch is magic. Anything you don't understand is magic.

    If someone who had all the information were to explain to me what actually went wrong last night,
    I *might* understand it. Whh certainly would. But in the absence of such information, let us assume
    that the data center didn't sacrifice enough chickens to Murphy last night, and get on with life, the
    game, and everything.
    Eruanne - Shards of Narsil-1 - Elendilmir -> Arkenstone

  22. #22
    Joined
    27.10.2010
    Posts
    491
    Quote Originally Posted by GarethB View Post
    but this button turns off ALL OF THE POWER to the data centre and DISABLES THE BACKUP GENERATOR from starting.
    As soon as I saw this I saw where it was going

    100% reliability with absolutely no possibility of an unexpected interruption just isn't possible while keeping the costs of running the game at a reasonable level.
    I remember my boss once telling me that we had a 100% data recovery policy: that 100% of the data on the systems was recoverable from within the last 6 weeks. I told her she was a fool if she was making that kind of promise. Backup tapes can be lost or damaged, so there's no way you can say that 100% of data can be recovered.

    But clearly something weird happened here, because IMO the issue isn't the power outage, or even the failure of the backup systems. What I can't figure out, is how the systems went down so hard, if that is in fact what happened.

  23. #23
    Joined
    05.11.2008
    Location
    Utah
    Posts
    15
    I'm just returning after a four year hiatus. Got software installed Sat, played a little, played a little Sun, then servers fall down go boom. I think the most surprising thing to me is how hard things went down and that Turbine seemed as poleaxed as the rest of us. It seems it took an hour or two for them just to get in contact with the data center. I wonder if they're in the market for a different center now... I'd sure hate to be the guys going through all the debris in lost+found and elsewhere trying to get things largely back together.

    Edit: I see the post above me wonders the same thing. There was a serious protocol failing of some sort at the data center.

    Edit2: Forgot to mention another newbie/returnee impression - I don't know what bounder's tokens are, but the rage over them has been crazy nuts.
    Last edited by biodegraded; 18.11.2013 at 17:33.

  24. #24
    Joined
    27.10.2007
    Posts
    264
    Quote Originally Posted by Solarfox View Post
    But clearly something weird happened here, because IMO the issue isn't the power outage, or even the failure of the backup systems. What I can't figure out, is how the systems went down so hard, if that is in fact what happened.
    My hunch is there are multiple physical servers for each "server". If there was a sudden power outage and zero battery backup, then they aren't just going to go down hard; it would be like losing RAID integrity.

    THAT sucks big time.

    My hunch is the batteries that should be there as a stop gap between power failure and backup generator starting failed miserably.

    Someone probably forgot to put diesel in the generator. I've actually seen that happen before.

  25. #25
    Joined
    29.03.2007
    Posts
    10.510
    There are more things that can go wrong than most people imagine, even with systems that are expected to be "always on".

    After the Loma Prieta earthquake in 1989, the phone system in the area stayed up and functioning, except for one central office (CO) that went down and stayed down a few days after the 'quake. (The relevance here is that COs were--and are--all run on computers.)

    So what happened? To follow the story, you have to know that COs have battery backups that are in turn backed up by generators powered by internal combustion engines, usually diesels. The CO doesn't actually ever run directly off the generator. When the external power goes down (as it did throughout the affected area), the CO runs off the batteries. When the batteries get low, the generator kicks in to recharge the batteries. When the batteries are recharged, the generator shuts down until the next time it needs to recharge. COs are designed to operate for a minimum of two weeks without external power. The generators are tested regularly and fuel supplies are checked as well.

    In the CO that went down, when the batteries got low, the generator failed to start. When the batteries went flat, the CO shut down.
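
    If you want to see how quickly that scheme falls over when the generator doesn't start, a toy simulation makes the point; every number below is made up purely for illustration.

    [CODE]
    # Toy model of the CO power scheme described above: run on batteries,
    # let the generator top the batteries up whenever they get low.
    battery = 100.0            # percent charge
    DRAIN_PER_HOUR = 2.0       # load on the batteries
    RECHARGE_PER_HOUR = 10.0   # generator recharges faster than the load drains
    GENERATOR_WORKS = False    # the failure mode in the story above

    hours = 0
    while battery > 0 and hours < 24 * 14:   # two-week design target
        hours += 1
        battery -= DRAIN_PER_HOUR
        if battery < 30 and GENERATOR_WORKS:
            battery = min(100.0, battery + RECHARGE_PER_HOUR)

    if battery > 0:
        print("CO still running after two weeks")
    else:
        print(f"CO went dark after about {hours} hours on batteries alone")
    [/CODE]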

 

 
