Business considerations for the move to the cloud

Blue Square

Migration implies change, and change implies risk. So, what hurdles does the decision maker have to clear before committing to a migration to the cloud?

First, what type of migration is it? Is it a migration to Infrastructure as a Service (IaaS), Platform as a Service (PaaS), or Software as a Service (SaaS) … or any of the other “fill in the blank as a Service” (XaaS) offerings? Wikipedia can provide sufficient definitions for IaaS, PaaS, and SaaS, but to quickly give examples: IaaS lets one “hotel” their computing environment – e.g., run Microsoft Server on someone else’s gear by renting it. PaaS provides a development environment to produce software on someone else’s gear, using their software development tools. SaaS lets one run a specific software application in someone else’s environment – “webmail” was SaaS before there was a term for it; now it could be online learning, Salesforce.com, etc.
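
To make the split concrete, here is a minimal, hypothetical sketch of who manages which layer under each model. The layer names and assignments are illustrative assumptions, not formal definitions.

    # Hypothetical illustration of the IaaS / PaaS / SaaS split.
    # Layer names and assignments are assumptions for illustration only.
    STACK = ["facility", "hardware", "virtualization", "operating_system",
             "runtime/middleware", "application", "data"]

    PROVIDER_MANAGES = {
        "on_premises": [],  # you build and run everything yourself
        "IaaS": ["facility", "hardware", "virtualization"],
        "PaaS": ["facility", "hardware", "virtualization",
                 "operating_system", "runtime/middleware"],
        "SaaS": ["facility", "hardware", "virtualization",
                 "operating_system", "runtime/middleware", "application"],
    }

    def customer_manages(model):
        """Whatever the provider does not manage is left to the customer."""
        return [layer for layer in STACK if layer not in PROVIDER_MANAGES[model]]

    for model in PROVIDER_MANAGES:
        print(model, "-> customer still manages:", customer_manages(model))

Note that even under SaaS the customer never stops being responsible for the data itself – which foreshadows the security discussion below.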

Second, what are the risks? One trades Capital Expenses (plus some Operational Expenses) for pure Operational Expenses. This also means that some control is turned over to the service provider. When I lose power to my house, since I haven’t built my own power plant, I’m at the mercy of the utility company. Power comes back when it comes back. I can’t re-prioritize the tasks the power company has set (e.g., bring my neighborhood back before the other neighborhood). The SLAs – Service Level Agreements – are where the expectations for uptime, performance, etc. get set.
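
As a quick worked example of what an SLA number actually buys, here is a minimal sketch that converts an uptime percentage into allowable downtime per year. The percentages are illustrative, not any particular provider’s terms.

    # Convert an SLA uptime percentage into allowable downtime per year.
    # The percentages below are illustrative, not any vendor's actual SLA.
    HOURS_PER_YEAR = 24 * 365

    def downtime_hours_per_year(uptime_percent):
        return HOURS_PER_YEAR * (1 - uptime_percent / 100.0)

    for sla in (99.0, 99.9, 99.99):
        hours = downtime_hours_per_year(sla)
        print(f"{sla}% uptime allows ~{hours:.1f} hours "
              f"(~{hours * 60:.0f} minutes) of downtime per year")

The jump from 99.9% to 99.99% is the difference between roughly a workday of outage per year and under an hour – which is exactly the kind of expectation the SLA pins down.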

I’ve worked with users who, when approached about the SLAs of internal systems, wanted to drive costs down. “Oh, I don’t need redundancy or highly available systems – these are test & development servers… except right before we do a code release; then the systems have to be up 24×7.” “Um, you don’t get to pick the time of your disaster or failure, so it sounds like you need to buy an HA system.”

As systems become more complex, firms struggle with the question: “how is the expertise maintained?” The acquisition cost of gear is only about 1/3 of its total cost; the rest is maintenance and then administration. Unless one runs a tech company, tech administration is not the company’s core competency. So, why would a company want to run that in its business?
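
Putting rough numbers on that rule of thumb (the dollar figure and the even split of the remainder between maintenance and administration are assumptions for illustration only):

    # Rough total-cost-of-ownership sketch based on the "acquisition is about
    # 1/3 of total cost" rule of thumb above. The dollar amount and the 50/50
    # split of the remainder are illustrative assumptions.
    acquisition = 100_000            # purchase price of the gear
    total_cost = acquisition * 3     # acquisition ~ one third of the total
    remainder = total_cost - acquisition

    maintenance = remainder / 2      # hardware/software maintenance contracts
    administration = remainder / 2   # staff time to run and care for the gear

    print(f"Acquisition:    ${acquisition:>10,.0f}")
    print(f"Maintenance:    ${maintenance:>10,.0f}")
    print(f"Administration: ${administration:>10,.0f}")
    print(f"Total (approx): ${total_cost:>10,.0f}")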

This is the classic buy v. build decision. Of course, with IT, the problem is that after one builds, one still has to administer. And after one buys, one still has to handle vendor relations.
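
One way to frame the decision is a simple multi-year cost comparison. The sketch below is purely illustrative – the CapEx, OpEx, subscription price, and time horizon are made-up assumptions, and it deliberately ignores the soft costs (vendor relations, staff turnover, etc.) discussed next.

    # Illustrative build-vs-buy (CapEx vs. OpEx) comparison over a planning
    # horizon. All figures are made-up assumptions; a real comparison needs
    # your own numbers plus the soft costs discussed in the text.
    YEARS = 5

    build_capex = 150_000          # up-front hardware/software purchase
    build_opex_per_year = 60_000   # maintenance contracts + administration labor

    buy_opex_per_year = 84_000     # e.g., a $7,000/month cloud subscription

    build_total = build_capex + build_opex_per_year * YEARS
    buy_total = buy_opex_per_year * YEARS

    print(f"Build total over {YEARS} years: ${build_total:,.0f}")
    print(f"Buy total over {YEARS} years:   ${buy_total:,.0f}")
    print("Cheaper on paper:", "build" if build_total < buy_total else "buy")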

In addition to vendor relations, one has the concern about vendor longevity. Is the vendor going to be there for as long as your company needs it to be? What happens when the vendor goes out of business or ends the line of business?

Of course, on the build side, what happens when the expert you hired finds a new job, or you wish to promote him to an alternate position?

Non-profits have a different problem: funding may not be regular, so ongoing OpEx costs ad infinitum might not be serviceable. But hardware/software maintenance costs and training are in the same boat.

A third consideration is security. How secure is your data in the cloud? Returning to the SaaS e-mail example, given recent revelations it is fair to assume that the NSA is mining your e-mail off Gmail, Yahoo Mail, Hotmail, and others. One would hope that those systems are secure from hackers and that this info only leaks to the government lawfully. But if you are concerned about hackers, how secure is your data in-house? So, there is a cost consideration for the build solution and a trust consideration with one’s provider.
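
One way to shrink the trust one must place in a provider is to encrypt data before it ever leaves the building, so the provider only stores ciphertext. A minimal sketch using the Python cryptography package follows; the upload step is a hypothetical placeholder, not any provider’s real API.

    # Sketch: encrypt locally so the cloud provider only ever sees ciphertext.
    # Requires the third-party "cryptography" package (pip install cryptography).
    # upload_to_cloud() is a hypothetical placeholder, not a real provider API.
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()   # keep this key in-house, never with the provider
    cipher = Fernet(key)

    plaintext = b"quarterly financials - internal only"
    ciphertext = cipher.encrypt(plaintext)

    # upload_to_cloud("backups/q3.bin", ciphertext)  # placeholder for a provider SDK

    # Later, after pulling the blob back down:
    assert cipher.decrypt(ciphertext) == plaintext

Of course, this just moves the problem: now the key has to be managed in-house, which circles back to the expertise and staffing questions above.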

The build v. buy decision is admittedly harder with technology, given the high rate of change. This is especially true where it ties to security. Feature implementation is based upon the service provider’s timetables and evaluation of risk. All of this again comes back to priorities: in the build solution, one gets to make one’s own calls and evaluations.

In summary, one can select at what level to move to the cloud. One needs to weigh the build v. buy decisions, but the move can be granular (we put this out there, we don’t put that). Security, Vendor Longevity, Vendor Relations, etc. are big factors. Time & Labor needs to be accounted for, whether doing it in-house or out-sourcing. And, of course, there are the decisions about CapEx & OpEx.

Jim

@itbycrayon

View Jim Surlow’s profile on LinkedIn: http://www.linkedin.com/pub/jim-surlow/7/913/b80


How often do disks fail these days?

Black Diamond

Good news & bad news & more bad news: The good news is that there have been a couple of exhaustive studies on hard drive failure rates. The bad news is that the studies are old. More bad news: with the growing acceptance of solid state drives, we may not see another one.

Google’s Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz Andre Barroso wrote a great paper, “Failure Trends in a Large Disk Drive Population”, submitted to the USENIX FAST conference of ’07. Reading this paper in today’s context, the takeaways are:

  • The study was on consumer-grade (aka SATA) drives.
  • Operating Temperature is not a good predictor of failure (i.e. it is not conclusive that drives operating in higher temp environments fail more frequently).
  • Disk utilization (wear) is not a good predictor of failure.
  • Failure rates over time graph more like a check mark or fish hook curve than a bathtub curve (see the sketch below).

    Fish hook v. Bathtub curves
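
Since the figure itself may not convey much out of context, here is a purely illustrative sketch of the two shapes; the numbers are invented to show the qualitative difference, not taken from either paper.

    # Illustrative only: the curve shapes are sketched from the papers'
    # qualitative findings, the numbers are made up.
    import numpy as np
    import matplotlib.pyplot as plt

    years = np.linspace(0, 5, 200)

    # Classic "bathtub": infant mortality, a long flat middle, late wear-out.
    bathtub = 0.05 * np.exp(-5 * years) + 0.02 + 0.10 * np.maximum(0, years - 4)

    # "Fish hook" / check mark: brief infant mortality, then failure rates
    # climb steadily with age instead of staying flat.
    fishhook = 0.05 * np.exp(-5 * years) + 0.02 + 0.03 * years

    plt.plot(years, bathtub, label="bathtub (textbook assumption)")
    plt.plot(years, fishhook, label="fish hook / check mark (observed trend)")
    plt.xlabel("drive age (years)")
    plt.ylabel("annualized failure rate (illustrative)")
    plt.legend()
    plt.show()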

Carnegie Mellon’s Bianca Schroeder & Garth A. Gibson also published a paper, “Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?”, at USENIX’s FAST 2007 conference. The takeaways from this paper are that their study covered FC, SCSI, & SATA disks and that in the first 5 years, it is reasonable to expect roughly a 3% failure rate, again following a check mark pattern. After 5 years, the failure rates are, of course, more significant.
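
To put that paper title in perspective, here is a quick worked conversion from a datasheet MTTF to an implied annual failure rate. The 1,000,000-hour figure is the one in the paper’s title; the ~3% comparison is the observed rate cited above.

    # What does an MTTF of 1,000,000 hours imply, taken at face value?
    HOURS_PER_YEAR = 24 * 365                  # 8,760 hours

    mttf_hours = 1_000_000                     # vendor-style datasheet MTTF
    nominal_afr = HOURS_PER_YEAR / mttf_hours  # naive annualized failure rate

    observed_afr = 0.03                        # ~3%/year observed in the CMU study

    print(f"An MTTF of {mttf_hours:,} hours implies an AFR of about {nominal_afr:.2%}")
    print(f"Observed rates were closer to {observed_afr:.0%} per year, "
          f"roughly {observed_afr / nominal_afr:.0f}x the datasheet figure")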

Here’s the problem: in 2007, they were analyzing disks, some of which were over 5 years old. So, can we expect these failure rates to be good proxies for the disks of the last 6 years? (The CMU paper has a list of the disks that were sampled, if the reader cares to see it.) It would be natural to assume that disk drive vendors have improved the durability of their products. But maybe the insertion of SAS (Serial Attached SCSI) drives into the technology mix introduces similar failure rates, the technology being new(er).

Another study of note is a University of Illinois paper by Weihang Jiang, Chongfeng Hu, Yuanyuan Zhou, and Arkady Kanevsky, “Are Disks the Dominant Contributor for Storage Failures?”, submitted to the ’08 FAST conference. Today’s takeaway is that after a single disk failure, another failure tends to follow sooner than independent odds would dictate (i.e., the events can be correlated).

This seems to hold today as well; the research presented here came about as a result of cascading failures initially stemming from a disk failure. So, the question is: when can we expect another disk failure?
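
For a rough sense of what “odds would dictate” if failures really were independent, here is a small sketch. The ~3% annualized failure rate (borrowed from the CMU figures above), the array size, and the one-week window are all illustrative assumptions.

    # If drive failures were independent, how likely is a second failure
    # shortly after the first? AFR (~3%, per the CMU figures above), array
    # size, and the one-week window are illustrative assumptions.
    afr = 0.03             # annualized failure rate per drive
    drives_remaining = 11  # e.g., a 12-drive shelf after one drive has failed
    window_weeks = 1       # exposure window, say during a rebuild

    weekly_rate = afr / 52
    p_no_failure = (1 - weekly_rate) ** (drives_remaining * window_weeks)
    p_second_failure = 1 - p_no_failure

    print(f"Independent-odds chance of a second failure within "
          f"{window_weeks} week(s): {p_second_failure:.2%}")

    # The FAST '08 paper's point is that real-world failures are correlated,
    # so the actual risk right after a failure is higher than this estimate.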

This research, being authoritative and vendor-neutral, would be more helpful if the data were current. Until newer studies appear, this data has to serve as a proxy for predictions – or one can fall back on anecdotal information. Sigh.

Jim

@itbycrayon

View Jim Surlow’s profile on LinkedIn: http://www.linkedin.com/pub/jim-surlow/7/913/b80