T-Mobile down due to flooding?

We’re just going off some tips right now, but it sounds like T-Mobile might be experiencing some issues with their website and call centers. The problems seem to stem from significant flooding at their central data center in Bothell, WA. Western Washington has been experiencing something close to a torrential downpour, so it’s no surprise to see that local businesses are affected. What isn’t clear, however, is just how widespread the outage is and whether it’s affecting customers outside of the Western Washington area. T-Mobile CS is acknowledging the issue, but won’t (or can’t) give any info on the specifics. We’ll have to wait a bit to see how this pans out, but for now…are any of you experiencing issues with T-Mobile call centers? Maybe this would be a good time for ole’ T-Mobile to pull the trigger on some 3G action, right?

89 Responses to “T-Mobile down due to flooding?”

  1. 51
    jdh says:

    My UMA works fine. I talked to others here in PA using Wi-Fi and there doesn’t seem to be an issue. Now, I know that when the 8320 came out the software was buggy with UMA, but a software upgrade fixed that issue. If you’re having trouble with UMA on the Curve, I would suggest getting the newest OS once t-mobile.com/bbupgrade is back up and running.

  2. 52
    costo says:

    I’ve needed the PUK code for my SIM card since 8pm Monday. BTW, I’ve tried the default “1234”. No PUK code == no phone. A day or two of outage happens; any more is inexcusable. Otherwise they’ve been fine (barring long-past coverage issues) for the last 10 years (from back in the VS days).

    I feel for the front line t-mobile customer service reps. Bet they’re getting unreasonably dumped on.

    Thanks for blogging about this mess.

  3. 53
    Techie says:

    Like several people have said, UMA wasn’t in the area affected – I have a Curve and it is working just fine with UMA.

    As for having all servers in one spot, while I’ll admit that wasn’t the brightest idea T-Mobile ever had, as others have said, it was just the website and the employees’ internal systems. T-Mobile.com is a lot easier to bring back up, since it’s just code, than the 20 – 30 systems used on a daily basis to service customers.

    Read-only systems are back up and can provide data up until the time the outage occurred – but anything else, sorry – you have to wait.

    Keep in mind, T-Mobile is the only one open 24 hours a day, 7 days a week. They only close Cust Care for Thanksgiving and Christmas, unlike others who close down whenever they can. Case in point: I called AT&T last night around 4am because I wanted to see if they were affected too, since their data center is in the same area. They were closed – they gave an emergency number to call if your phone wasn’t working, but I called that one and it was closed too! So again – remind me how bad it is that the one carrier that is always there for its customers is having issues. Please – I dare you to…

    Because in my opinion, it’s not corp America that has the issues – it’s the greediness of the American population, who seem to expect that everything works 100% of the time at 100% operating efficiency. We as a people have gotten far too concerned with the material aspects of life and the “I want it and I want it NOW!!!” way of thinking.

    Get your heads out of your rears and be thankful that you are alive and kicking and can afford to have a cell phone.

  4. 54
    t-mobile rep says:

    In response to that remark about T-Mobile’s response to weather-related disasters: you’d better check out who was helping those in need when the hurricanes hit the southeast part of the USA. T-Mobile was the most accommodating to the people in that region and helped with voluntary relief efforts for the emergencies that affected that region, and will continue to do so… There’s nothing you can do when the primary server is located in a region that is in a state of emergency… FYI, we also tried the backup server, but that crashed when trying to handle the workload. We have limited systems running on the backup now and don’t want to increase the workload b/c it may cause it to crash again.

  5. 55
    Bobby says:

    I live just south of Bothell. Bothell doesn’t flood like that normally. I have never seen it rain this much, and I have lived in the area for over 20 years. Bothell can flood, but not to that extent… Did T-Mobile’s servers go down last year when it flooded? No, they didn’t. How about in 2002? Nope. This is a once-in-100-years storm. Nobody knew it was going to happen…

    First, on Saturday, we got around 2 feet of snow in the mountains, 4-6 inches on the Eastside, and a trace to 2 inches in Seattle. Then a typhoon hit and dropped a billion gallons of water with 80-100 MPH winds. So all that snow melted, and even now that the rain has stopped and it is sunny, the conditions are getting worse. Snow is melting and going downstream, and it has to go to the nearest body of water (aka Lake Samm. or Lake Washington). Bothell has one of the waterways that runs to Lake Washington. Woodinville was hit even harder… Check out komo4 or king5.com. Check out the flooding…

    Also, we had a huge wind storm last December, did T-mobile servers go out, no. Cause they have generators. You can’t stop water..and I don’t work for T-Mobile.

  6. 56
    twothirds says:

    My UMA went down sometime this morning and it is still down. (Yes, I do have the latest curve BB software)

    (sf bay area)

  7. 57
    Drew says:

    I will agree that when we talk about data center redundancy, having an offsite center is an idea. However, most data centers are designed to a “five nines” standard (meaning they will be up 99.999 percent of the time).
    These redundancy features typically include failover SANs in redundant RAID volumes, failover connections (typically 3 – 4 possible failovers),
    battery backup and generators on all equipment, and fire suppression that is non-detrimental to equipment. Designing even a simple data center to meet these standards is not an easy process, and then duplicating it offsite is quite often neither cost-effective nor feasible.
    What you have to consider is that EVERY transaction, EVERY change, and EVERY piece of data that goes across to one has to go across to the other, then be verified by each that it went across and was stored (not to mention by each redundant system in each office). Looking at it from a network design standpoint, it is not typically a viable option, especially given the sheer amount of data that T-Mobile encompasses. Granted, there could be a redundant offsite backup of some sort, but with any “backup” you lose data because it is time-based, and therefore you are still stuck with missing information.
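
    To put the “five nines” figure in concrete terms, here is a minimal sketch of the arithmetic. It is an illustration only, not anything taken from T-Mobile; the function name `downtime_budget_minutes` and the 365.25-day average year are assumptions.

```python
# Downtime allowed per year by an availability target ("N nines").
# Assumption: an average year of 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year permitted at this availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        budget = downtime_budget_minutes(target)
        print(f"{target}% uptime allows about {budget:.1f} minutes of downtime per year")
```

    At 99.999 percent the budget is a little over five minutes per year, so a single outage measured in hours uses it up many times over.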

    LEAVE T-MOBILE ALONE!

  8. 58
    not surprised says:

    Not surprised that tmobile is down at all…still can’t reach the mytmobile site.

    At least Tmobile tech support can’t offer to do a full wipe on a blackberry as first level support.

    Blame it on the rain!

  9. 59
    Chad says:

    I’m in Baton Rouge, LA and I am having service problems: no names or anything saved. I called and they said it was because of the severe weather, so it reached all the way to Louisiana.

  10. 60
    Charlie says:

    It reached the whole United States because the US headquarters or whatever is in Bothell.

  11. 61
    Claude says:

    UMA is working fine in upstate NY. Can’t access #646 or #225, but that’s fine; they’ve managed to keep the critical services up, so I can’t complain.

  12. 62
    Rudy says:

    Prob explains why I had simcard not recognized error earlier in the day. Freaked me out.

  13. 63
    Electric Shoots says:

    Yes, T-Mobile is trying real hard, but the minutes aren’t showing up yet..

  14. 64
    nathan says:

    UMA service restored in Las Vegas around 1500 pst… to those of you whose UMA service wasn’t affected, the WWW is as discombobulated as a bowl of spaghetti, so your ISP may have been able to resolve UMA servers, while others couldn’t. Also, each region seems to have different servers to authenticate and be serviced from, so the differences are to be expected.

    I just wish I could get the three hours back that I lost today because of stupidly trying to figure out what was wrong with my UMA (since it supposedly wasn’t affected).

    So far as comments about customer service, my only other experience in the US is with the “nation’s most reliable network”. I like dealing with CSRs who appreciate their customers and aren’t condescending pricks. So far, tmo has exceeded my expectations in that regard. Very friendly people, and nearly no wait time to speak with someone. I cannot say the same about the “other” carrier, save one friendly rep who gave me his ext and told me I could call him any time. He appreciated the thousands of dollars I’ve spent on his company’s services. Too bad the rest didn’t. Good riddance.

  15. 65
    Julie C says:

    I AM a customer care rep with T-Mobile for past 3 years and can attest that T-Mobile jumps through hoops to help customers and our fellow employees stay in place during disasters to help when their own families have been in jeopardy. T-Mobile always makes things right for the customer as far as I’ve seen. We do get a small percentage of customers who cannot be helped as they are trying to get something for nothing. I want to ask that type of customer if they’d call their local water department or electric company and ask for a credit for overage? This outage was not expected and no, the servers were not where floodwater could get to them. A flood can take place anywhere at anytime, don’t kid yourself otherwise. I’ve worked with several companies in the past 25 years and T-Mobile is tops! I’d like to thank all of our customers who are patiently waiting for everything to get back to normal. Cell phone service is a great convenience these days but not life-threatening to be without for a short time. Blessings to all!

  16. 66
    gigi says:

    Julie,
    Thanks for always helping. That is very true of t-mobile. Do you think TMO will be getting their unbilled data working soon? Or will those telephone numbers be shown at all? just curious.

    happy holidays

    gigi

  17. 67
    JULIO says:

    For those of you defending TMO, that’s entirely your prerogative. And don’t even try to spin my comments as discounting human life, or maybe you need to get a life. The simple fact here is that another company trying to make as much money as they can, regardless of how “good” they are or how long their CS lines are open, has failed yet again to take into account very basic IT risk initiatives. I’m not even a TMO customer; I just commented because it astounds me when companies get burned after making a conscious decision to not address disaster recovery or business continuity.

  18. 68
    Someone says:

    Julio, stop trying to defend yourself. Tmobile has 3 data centers, however they have so many call centers and so many outsourced centers that routing all of them to the other data centers wasn’t possible.

  19. 69
    T-Mo DC Employee says:

    So I’ve been keeping up with this thread, it was the first and only place I saw any reporting of the problems.

    OK, people, so here is the full, real skinny, from someone who actually works for T-Mo IN the T-Mo datacenter in Bothell. I experienced all this first hand, so I know it’s true.

    Monday, coming in to work, the rain (and the real problem, adding to the runoff, was a lot of melting snow from last weekend) had caused all kinds of problems all over the city, mostly traffic snarls due to washed-out roads. The usually 4-ft wide creek that runs behind the building had turned into a raging river several hundred feet wide in places, and was threatening to overflow the levees that hold it in. Bothell City/PD officials and T-Mo facilities made the decision at around 10am to evac the building. Most technical folks went home to VPN in, and some stayed to help sandbag. Sandbagging efforts were started early, both up on the levees as well as around the building itself. Unfortunately, as the afternoon wore on and the rain continued, the levees themselves began breaking down and leaking, and water began pouring over into the parking lot for the building, which is surrounded on two sides by levees.

    So the real cause of the outage was not flooding IN the datacenter itself, but due to the fact that water had risen outside the building and wound up causing our transformer to literally blow up. It actually exploded. Had never seen that before. So we lost street power. We switched immediately to generator power with no downtime, but basically that only lasted @15 minutes, due mostly to the fact that the generator itself was also becoming submerged. Some teams had a chance to gracefully shut down systems (ours among them), but many systems in the DC just lost power abruptly – not the ideal way to bring down a multi-node oracle cluster… So the Bothell DC lost all power at @4:25pm PST. Confusion reigned for about 20-30 minutes after that.

    We have a second Bothell DC that was not flooding (in the Canyon Park area), and all emergency incident operations switched over to there. Some IT systems were able to swing service to systems there, but many (most) were not. Efforts were made to turn up critical servers at the other DC, and bring up a read-only subset of services (mainly we tried to get the website to do something other than just time out…) At one point, as the flood waters continued to rise, and power was still out, there were executive level decisions being considered to physically remove servers from the DC and at the very least take them upstairs (yes, our DC is on the first floor of the building – more on that in a sec). Yes, we actually considered going into the dark DC and by the light of flashlights, un-racking several tons of equipment (mostly drive cages) and trying to move them physically upstairs (no elevators of course) where they would not get water damage. Thank god the water never made it past the front door, or we would have been doing that.

    So that was actual outage. Bothell city, Puget Sound Energy, and emergency crews worked all night long and got street power restored to the building at @3:20am 12/4 (PST). So we spent a total of approx 11 hours without power to our main datacenter.
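
    As a rough check on the “approx 11 hours” figure, the two timestamps given above can simply be subtracted. This is only a sketch: the year 2007 is an assumption (the comment gives just “Monday” and “12/4”), and the variable names are illustrative.

```python
from datetime import datetime

# Assumption: the year is not stated in the comment; it does not change the result.
YEAR = 2007

power_lost = datetime(YEAR, 12, 3, 16, 25)     # "@4:25pm PST" on Monday
power_restored = datetime(YEAR, 12, 4, 3, 20)  # "@3:20am 12/4 (PST)"

outage = power_restored - power_lost
outage_hours = outage.total_seconds() / 3600

# Availability over one average year if this were the only outage.
hours_per_year = 365.25 * 24
availability = 100 * (1 - outage_hours / hours_per_year)

if __name__ == "__main__":
    print(f"Power outage: {outage} ({outage_hours:.2f} hours)")
    print(f"Implied yearly availability: {availability:.3f}%")
```

    The gap works out to 10 hours 55 minutes. That counts only the loss of power to the building, not the longer period before services were fully restored.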

    So let me tell you what was in there. Basically most all T-Mo IT systems and administrative function are represented there. Ideally all the systems there have an analog in our other data center (not the other Bothell DC, but our second DC in Tampa, FL). In practice this is not the case – again, more on that in a sec. NO engineering gear is here, so all the switches, all the SS7 gear, all the radio and cell equipment – all that resides in different DC’s that are geographically diverse (both from each other as well as from Bothell). This is why none of you lost cell service or vmail. However… Blackberry service, the T-Mo website, billing reconciliation (note, NOT billing, just reconciliation for billing – sorry, not likely to be getting any free calls), credit card processing, store connectivity, call center connectivity, HR systems, corp email systems, etc are all a different story.

    Those systems were affected hard and are still being brought back up. As of noon PST today, most are back up and functioning nominally again, with just a few issues among them. Really, for as much poor planning and bad architecture as there was, and for as hard as these systems went down, I am impressed with our technical teams’ ability to do what needs to be done to provide workarounds and fixes quickly. I would have predicted a week-long, painful period of getting everything back up and running, but in fact it has happened much faster than that, and it’s because our technical teams have been working their asses off.

    To those that commented that we (as in T-Mo) don’t care about our customers or that T-Mo employees themselves did not do anything to help, you couldn’t be more wrong. (Specifically JULIO: “Yet another failure by corporate America in pursuit of the almighty dollar while simultaneously not giving a rat’s a$$ about the customers that pay the bills that keep their companies going.”) I can tell you for a fact that MANY T-Mo employees went WAY above and beyond the call of duty, staying at the DC, sandbagging, working all night trying to restore service, etc. Some people even slept at the DC, and managers/leadership went out and bought a bunch of air mattresses on their own dime. I won’t go so far as to say ‘heroic’ but I can tell you that a LOT of people worked their ASSES off Monday night and all day Tuesday, trying to get things restored ASAP. So get the full story before you rail on T-Mo for “not caring”. I am not going overboard and talking about “how much t-mobile does for society” or anything sappy like that. I am just saying that the regular folks that work here are people too and we did and are doing everything we can to get things back up and working.

    To those that have commented about the fact that we should be geographically diverse and loss of a DC should not take down services the way that it did – well, all I can say is that I couldnt agree more. All said and done, a lot of people (the DR team especially) do have a lot to be embarrassed about. I am not disputing that at all. Yes it was an act of god, yadda yadda. But best practices and correct architecture should always be able to mitigate those things – the fact that our shit was not together enough to handle an outage like this IS inexcusable and I am not making any claims that there are not areas of this company (like all companies) that have their head up their asses. Someone mentioned the “5 9’s uptime” thing, and we (T-Mo IT – not engineering) have been struggling with that anyway, irrespective of flooding. And going back further than that, the choice of location for the DC in the first place is just downright STUPID. (I mean, 1st floor of a shared building in an office park that sits in what amounts to the middle of a swamp? C’mon, retards!) I absolutely will not dispute that. And many of us who work there and have decades of IT experience have been saying it for years. No matter what else happened, the fact is that this was a problem waiting to happen. And Monday, it finally did. What can I say – us worker bees do our best and give our technical recommendations, but C-level’s and execs make their decisions and we are stuck with them. Such is life. Hopefully this will be a BIG wakeup call to some folks who work in Bellevue to re-evaluate what DR means and how to run a world-class IT organization. We will see.

    “But someone at t-mobile should have made a backup plan. After 9/11, most real companies have a disaster recovery plan that prevents and works around even acts of terrorism and acts of God.” You are absolutely correct and T-Mo DID fall down on the job there. I am not making any excuses for that, but all I can say is that most of the underlings here know full well just how fast and loose we play with system availability, and while we do what we can to mitigate it, there is only so much that can be done if the execs and C-level management don’t see a problem. That’s all I can say about that.

    So there you go. That is the real deal. By COB tomorrow, I imagine this will all be 100% restored. At this moment I would say we are about 95% restored with only a handful of problems lingering. FWIW.

  20. 70
    Julie C says:

    Gigi: I’ve been off work since Monday evening when our systems went down (due to regularly scheduled days off), but I’ve been checking my self-help features such as #646# send and my personal mytmobile account, and they are both working again.
    Julio: Companies take risks in business and of course gain if they are successful. I’ve quit working with several large companies (whose names I will not divulge) because when it comes to customer service they only wanted me to do one thing – ask for money! That is NOT customer service (and I was not working in collections with those companies.) T-Mobile could not in any way foresee this disaster and I’m sure had backup but exactly how long does backup power last and what happens when no one can get to the equipment to keep it online? During Katrina T-Mobile employees stayed in the disaster areas to give out free phones and service. T-Mobile engineers and technicians nationwide went into those areas and set up portable generators to provide power to the towers. T-Mobile opened up the services for all carriers to use at that time also but guess what? You didn’t hear T-Mobile advertising this humanitarian effort on the news as you did with their competitors. We worked in the call centers during that time to pass along information to loved ones who could not be reached otherwise and this was done for a caller no matter which carrier they did business with. T-Mobile believes there can be a win-win-win situation between owners, employees, and customers and you just don’t find that in most corporations. The moment that T-Mobile does business the typical way in the US or is bought out by such a company, I will no longer be an employee. It’s not always about being “good” but about doing the right thing without someone forcing you to do so.

  21. 71
    tricky says:

    Billed Minutes on tmobile.com are STILL DOWN!

  22. 72
    frustrated says:

    In MN – frustrated that I still can’t see my current used, unbilled minutes. Very frustrated.

    When will this thing be fixed? It’s been DAYS!

  23. 73
    Libby says:

    In TX – unbilled minutes are still down for me also! I’m trying to get a number that I dialed yesterday and can’t! Waiting impatiently for it to come back! Any news?

  24. 74
    I says:

    try checking the call log that’s actually in the phone.

  25. 75
    Libby says:

    Of course, common sense would tell me that, thank you. Unfortunately, my call logs were deleted when I did a reset. I’m not that dense, thank you.
