StorefrontBacktalk

Best Buy's Cloud: Wild West Gives Way To Making The Same Data Mistakes Again

Written by Frank Hayes and Evan Schuman
December 7th, 2011
Many chains have seen the cloud as a nice way to get unlimited data storage on the cheap. But Best Buy's initial cloud efforts revealed something much more fun: a lawless area where IT management didn't have any rules.

A funny thing happened, though: "Everybody has always said that if we could do the datacenter over again, we'd make no mistakes and everything would be perfect. It would be this incredible Utopian datacenter, except that we're all making the same mistakes that we made in the datacenter originally, because you go to the cloud like the Wild West," said Thomas Kelly, Best Buy's enterprise architect for cloud services.

Kelly, who made his comments at a VentureBeat conference last week in Redwood City, Calif., spoke initially of the short-lived joy of lack of regulation. "That which was not regulated by arch gov [architecture governance] is technically something you don't have to tell arch gov about. So it was hilarious, because people would say 'oh, we'll put it on the cloud.' We had architects who were interested in the cloud, or we had a consultancy or a vendor that was interested in the cloud, so we ended up with 50 or so applications running on the cloud with absolutely no governance whatsoever," he said.

That couldn't last. But once it was time to bring governance back in—not on a project-by-project basis but to build an enterprise infrastructure in the cloud—it was possible to rethink both the hows and whys of IT governance.

"For the first time, we have a clean slate," Kelly said. "What if we brought all of the existing cloud applications under governance and refactored them into—before it's too late—a cohesive enterprise infrastructure? We laid down the requirements for a core repository, an ETL2 dataview, mandates for atomic service development. We brought in a really powerhouse gateway with L7. We selectively take advantage of pretty much all of the Amazon data services, for example. We run an L2-backed EMS bus to our back-end datacenter, so our cloud is actually now becoming a hybrid cloud where we're directly connected for localization to our back-end bus."

In short, Best Buy didn't just shove some easy IT functions up into the cloud. It didn't even simply come up with an IT architecture more suitable to the cloud than the one the company had used for decades. Instead, Best Buy rethought not only governance, but many of the assumptions that had never before needed to be questioned about how to do retail IT.

The problem isn't just that the Wild West of the cloud needs a sheriff. It's also what types of rules are appropriate, and what considerations drive them. That's going to take some trial and error. Fortunately, it's relatively cheap trial and error. Case in point: Capacity management.

"We are legitimately in a situation where we can be doing as much as 10 to 15 times the amount of volume on our data systems in November than we are in the middle of the summertime," Kelly said. "We were always building out to Black Friday. However, Black Friday is one day a year, and you have to pay for software and hardware and everything else all year long."

For online retailers, capacity requirements for one month a year are insane. Historically, IT has had to pay for all that peak capacity the rest of the year, too. With the cloud, that's no longer necessary. Scaling up and back isn't a trivial process. But it beats acquiring, paying for, setting up and maintaining the hardware and software needed to handle that peak without the cloud.
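To make the arithmetic concrete, here is a rough sketch of the two cost models. Only the 10-to-15-times November peak comes from Kelly's comments. The monthly load profile, the December figure, the unit prices and the function names are all hypothetical, chosen purely for illustration, and are not Best Buy numbers.

```python
# Illustrative only: the load profile and unit costs below are assumptions.
# The sole figure taken from the article is the 10x-15x November peak.

def fixed_capacity_cost(monthly_load, cost_per_unit_month):
    """Build out to the peak month and pay for that capacity all year."""
    return max(monthly_load) * cost_per_unit_month * len(monthly_load)

def elastic_capacity_cost(monthly_load, cost_per_unit_month):
    """Pay each month only for the capacity that month actually needs."""
    return sum(load * cost_per_unit_month for load in monthly_load)

# Baseline load of 1 unit for ten months, 12x in November (inside the
# quoted 10x-15x range) and an assumed 6x in December.
monthly_load = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 12, 6]

# Assume the cloud charges a 50% premium per unit of capacity per month.
fixed = fixed_capacity_cost(monthly_load, cost_per_unit_month=100)
elastic = elastic_capacity_cost(monthly_load, cost_per_unit_month=150)

print(fixed, elastic)  # 14400 4200
```

Even with the assumed per-unit premium, paying only for what each month needs comes out far cheaper in this toy profile than building out to the peak. The real ratio depends entirely on actual prices and load, which the article does not give.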

That difference in acquisition grief doesn't stop there.

When you don't have to buy (or free up) a server to try an experiment or pilot a new project, anything seems possible to developers. "We initially went to the cloud in very, very small isolated pockets of application development," Kelly said. The cloud eliminated the developers' problems acquiring capacity for new projects. "We began to streamline that process, because we went to the cloud, which is awesome. And we realized, wow, there are things here that we never really thought we could do," he added.

"We didn't really have the concept of super-scalable data sources. We'd have to upgrade the hardware; we'd have to cluster; we'd have to do this and that. When things like SimpleDB and S3 came out, that was like the beginning of realizing that we're really in a situation where we can scale beyond anything that we thought possible, so we went back and started looking at what was possible."

See the problem? This is why IT governance has to be rethought when you hit the cloud. One major justification for governance in the past was cost—all that hardware had to be bought, serviced and written off over years. It was a long-term financial commitment that justified governance. And for many organizations, that cost was the real driver for governance.

With the cloud, the cost is still there, but it's easier to cut losses and the long-term accounting issues are simpler. If you've been using cost as the main reason for saying no to developers with clever ideas, it's harder to reject those ideas when they won't add capital spending and new racks in the server room.

But those projects still have to be governed—especially if they grow into production applications. "It is so difficult to support and maintain when you're delivering 500 projects a year at the enterprise level and you're trying to deal with lifecycle maintenance that goes back 8, 10, 12 years. You have 20, 30, 40, 50 different vendors in your space. You're dealing with incredible amounts of high-speed data," Kelly said.

Then there are the routine obstacles that corporate likes to foist on IT just to keep life interesting.

"Everybody lives in the world where it's really hard to get a bunch of servers spun up in environments. You're always battling everyone to get what you need, tooling, all of that. You have to go around security. The security issue is always going to be a huge issue in the datacenter. Even to get a VPN connection or datacenter-to-datacenter connection, it can take weeks," Kelly said.

And why does all that require full-blown IT governance, aside from a desire to keep tidy flowcharts of who's doing what? "Anything without governance immediately becomes a failure potential—oftentimes a catastrophe over the holiday season," he added.

Another important Best Buy concern was adding layers of control without demoralizing programmers and developers, who had fallen in love with the carefree cloud approach.

"We are not willing to subvert programmers and are not looking to subvert brilliant people. What we've tried to come in and say is, 'You can have whatever you want. You can use whatever software you want. But it's got to be best of breed.' So we brought in governance, but we brought in governance differently," Kelly said. "Once we establish the best of breed for a particular product family, we're going to mandate that. But that doesn't mean you can't bring in another product right next to it, as long as it's not an overlap. So we ended up with SQL, NoSQL databases. We have two or three different caches running. We have a lot of different messaging software running. We have our gateways. At the end of the day, you say to yourself that by trying to be all inclusive of technology, we actually have better governance than we had when we weren't inclusive."

That's what governance is really for. And forgetting that on the way to the cloud is the one mistake you really can't afford.