
This is page 2 of:

Human Error Is Still Amazon Cloud’s Achilles Heel

January 2nd, 2013

At 5:02 PM on December 24, the Amazon team disabled the ability of the load balancers to scale up or down or be modified. That stopped the spread of the problem. “At the peak of the event, 6.8 percent of running ELB load balancers were impacted,” Amazon said. That’s a bit coy, though: the other 93.2 percent were technically operating correctly but outside the control of customers.

The team manually recovered some of the affected load balancers, but the main plan was to rebuild the deleted state data as of 12:24 PM, then merge in all the API calls after that point to create an uncorrupted configuration for each load balancer and get them working correctly again.
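The idea behind that plan, restoring a known-good snapshot and then replaying every later change on top of it, can be sketched in a few lines of Python. This is only an illustration of the general technique, not Amazon's actual tooling: the `ApiCall` record, the `rebuild_state` function and the dictionary layout of the state data are all hypothetical names and structures invented for the example.

```python
from copy import deepcopy
from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class ApiCall:
    """Hypothetical record of one logged configuration change."""
    timestamp: datetime
    balancer_id: str
    setting: str
    value: Any  # None means "remove this setting"


def rebuild_state(snapshot, snapshot_time, api_log):
    """Return a fresh state: the snapshot plus every call logged after it."""
    state = deepcopy(snapshot)  # never modify the known-good snapshot itself
    for call in sorted(api_log, key=lambda c: c.timestamp):
        if call.timestamp <= snapshot_time:
            continue  # already reflected in the snapshot
        config = state.setdefault(call.balancer_id, {})
        if call.value is None:
            config.pop(call.setting, None)
        else:
            config[call.setting] = call.value
    return state


# Illustrative data only; the balancer names and settings are made up.
snapshot_time = datetime(2012, 12, 24, 12, 24)
snapshot = {"lb-1": {"instances": 2}}
api_log = [
    ApiCall(datetime(2012, 12, 24, 11, 0), "lb-1", "instances", 1),   # before snapshot
    ApiCall(datetime(2012, 12, 24, 14, 0), "lb-2", "port", 443),      # new balancer
    ApiCall(datetime(2012, 12, 24, 13, 0), "lb-1", "instances", 4),   # later change
]
rebuilt = rebuild_state(snapshot, snapshot_time, api_log)
```

The sketch sorts the log by timestamp and skips anything at or before the snapshot time, so each balancer ends up with its snapshot configuration plus only the later changes, applied in order. A real system would also have to verify the result, which is the step Amazon says took until 5:40 AM.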

The first try at doing that took several hours. It failed.

At 2:45 AM on December 25, a different approach finally made it possible to restore a snapshot of the ELB state data to what it was more than 12 hours earlier.

At 5:40 AM on December 25, 15 hours’ worth of API calls and state changes were finally merged in and verified, and the team began to slowly re-enable the APIs and to recover the load balancers.

At 8:15 AM on December 25, most of the APIs and workflows had been re-enabled.

At 10:30 AM on December 25, almost all load balancers were working correctly.

At 12:05 PM on December 25, Amazon announced that its U.S.-East cloud was operating normally again.

Yes, Amazon has learned from the experience—changing access controls so a programmer can’t do that again, adding checks for the health of state data to its data-recovery process and starting to work up ways for load balancers to heal themselves in the future. And to be fair, this incident wasn’t as lengthy as the April 2011 incident in which a change in network configuration (another human error) paralyzed much of Amazon’s cloud storage for three days.

But coming on Christmas Eve, the timing was really lousy. Fortunately for most big chains, they’re not using Amazon’s cloud—yet.

But these high-profile Amazon problems are actually a good sign, at least if you’re not Netflix or one of the smaller E-tailers who were slammed by the load-balancing failure. The catastrophes are shorter, fewer and further between, and Amazon is getting better at dealing with them.

The problem that remains: Amazon is still learning. And all that learning is still happening in datacenters that never shut down for maintenance.

There’s an irony here: We know from our own experience that everything in a cloud can be moved out of a particular datacenter, so it doesn’t have to run 24/7 forever. When Hurricane Sandy was heading up the East Coast of the U.S., StorefrontBacktalk’s cloud provider (not Amazon) moved our virtual production server from Virginia to Chicago, apparently without a glitch, and presumably did the same with all the other cloud servers in that threatened Virginia datacenter. We have no doubt Amazon can do the same thing (and quite possibly did).

That dodges some maintenance-related problems. But it wouldn’t have helped in this case. Things are getting better. But the cloud still may not be mature enough for a big chain’s production E-Commerce system, with or without humans.

