Recovery Disaster: PayPal Crash Strands Merchants
Written by Frank Hayes
Two major technology glitches in a row knocked PayPal offline on Friday (Oct. 29), preventing the alternative payment giant from processing any E-tailer transactions for 80 minutes. First a network hardware failure shut down all PayPal payments. Then the backup plan failed when a handoff to a secondary datacenter didn’t go smoothly. The result was a worldwide shutdown of PayPal’s $40 billion merchant-services business that left E-tailers scrambling to limit damage from the outage.
PayPal’s outage again spotlights the problem of backup strategies that simply don’t. It’s painfully reminiscent of recent datacenter fiascos at American Eagle Outfitters and Wal-Mart. And while some major retailers were kept apprised of the progress of PayPal’s outage and disabled PayPal payment functionality on their E-Commerce sites to minimize problems, most of PayPal’s customers got the word late or not at all. Apparently there was no effective plan for dealing with that side of the outage, either.
PayPal isn’t saying much about the outage beyond its official statement by Scott Guilfoyle, the company’s senior VP for platform services: “At around 8:07 AM [San Francisco time Friday], a network hardware failure in one of our datacenters resulted in a service interruption for all PayPal users worldwide. Everyone in our organization was immediately engaged to identify the issue and get PayPal back up and running. We were not able to switch over to our backup systems as quickly as planned. We partially restored service by approximately 8:45 AM and the issue was fully resolved by 9:24 AM. A second service interruption started at around 11:30 AM and was partially resolved at 11:55 AM with full recovery at 12:21 PM.”
But the company’s “Live Site Status” blog tells a more detailed story. According to the technical blog, the incident (and PayPal) went down like this:
November 4th, 2010 at 10:57 am
This whole incident inspires two thoughts. The first concerns backups in general. The simple answer is “practice, practice, practice”. Backup plans have to be exercised on a regular basis, and they must go full circle: transferring to the backup site and then bringing services back to the primary location.
But the other thing that comes to mind is “too big to fail”, to borrow a phrase from the financial crisis. A lot of retailers are considering Cloud Computing, and they should. Cloud Computing makes significant sense economically, but it also introduces a whole new set of risk factors. The backup plan becomes even more significant because the retailer is counting on its service provider to be practicing it. As processing becomes more centralized, the impact of a single outage becomes more significant. At the same time, the processes necessary to ensure adequate backup are becoming more opaque. Retailers considering Cloud solutions should weigh this in their evaluations.