Notes from #NTAPInsight 2014


After a partial week at NetApp 2014 Insight US, here are my thoughts:
(full disclosure:  I was a presenter of one session at the conference)
  1. Keynote thought
  2. OnTap 8.3 announcement
  3. Hybrid Cloud
    1. Data is state-ful, unlike (cloud) computing
  4. Data locality
  5. Different UNIX variants – Different Cloud
  6. Laundry services similar to cloud computing (Jay Kidd / NA CTO)
Tom Mendoza (NetApp Vice Chairman) was fantastic in his keynote.  He focused on culture and wanting to build a culture of trust & candor.  CIOs understand that every company is going to have issues; the question is whether the customer’s CIO trusts the vendor to be there when there is a problem.
Lots of talk about OnTap 8.3 – though the fact that it is RC1 and not GA is disappointing.  Didn’t hear anyone mention that 8.3 is a Release Candidate.  8.3 provides full feature parity with 7-mode.  There was little discussion about 7-mode, except for how to move off it (the 7-mode transition tool, 7MTT).  7-mode transition still appears to be a large effort.  For 7MTT, the key term is “tool”.
The key focus in the keynotes was “Hybrid Cloud”.  One of the key takeaways is the need for data locality.  The data is ‘state-ful’ as opposed to cloud computing which is ‘stateless’ – in the sense that the compute resource need can be metered, but data cannot.  So, when moving from on-prem to cloud, the data would have to be replicated completely between the two.  Even more so, if you are working between clouds, or maybe between clouds in different countries, the full data set has to be replicated.  The concern is that government entities (Snowden effect) will require data to be housed in their respective countries.  This now becomes the digital equivalent of import/export laws and regulations.
With the notion of different clouds, it reminds me of all the different UNIX variants.  We had Solaris boxes and we had HP-UX boxes and we had DEC boxes and we struggled moving data between them.  Some were big endian, some little endian.  So, binaries were incompatible.
Finally and irreverently, during Jay Kidd’s (NetApp CTO) presentation, my mind wandered to cloud computing analogies.  I had never noticed before how much metered cloud computing is like the washing machines at the laundromat – pay per use.


Jim – 10/30/14 @itbycrayon View Jim Surlow's profile on LinkedIn (I don’t accept general LinkedIn invites – but if you say you read my blog, it will change my mind)


How to deal with exponential growth rates? And how does this relate to cloud computing?

What happens when demand exceeds the resources? Ah, raise prices. But, sometimes that is not available as a solution. And sometimes demand spikes far more than expected.

Example: Back in the early 2000s, Netflix allowed renters to have 3 DVDs at a time, but some customers churned those 3 DVDs more frequently than average and more frequently than Netflix expected. So, they throttled those customers and put them at the back of the line. (dug up this reference). This also appears to have happened in their streaming business.

Another example: Your web site gets linked on a site that generates a ton of traffic (I should be so lucky). This piece says that the Drudge Report sent 30,000-50,000 hits per hour, bringing down the US Senate’s web site. At 36,000 per hour, that is an average of 10 per second.
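The per-second figure above is just a unit conversion. A minimal sketch (the function name is mine, not from the referenced piece):

```python
SECONDS_PER_HOUR = 3600

def hits_per_second(hits_per_hour):
    """Convert an hourly hit count into an average per-second rate."""
    return hits_per_hour / SECONDS_PER_HOUR

# The low end, the 36,000 used in the text, and the high end of the range.
for hourly in (30_000, 36_000, 50_000):
    print(f"{hourly:>6} hits/hour = {hits_per_second(hourly):.1f} hits/second")
```

Keep in mind this is an average over the hour; it says nothing about how bursty the traffic was within that hour.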

Network bandwidth tends to be the constrained resource. Another example, from a service provider: this piece says that 2% of AT&T’s customers consume 20% of its network.

There are non-technical examples as well. The all-you-can-eat buffet is one. Some customers will consume significantly more than the average. (Unfortunately, I can’t find a youtube link to a commercial that VISA ran during the Olympics in the 80s or 90s where a sumo wrestler walks into a buffet – if you can find it for me, please reply).

Insurance companies deal with this as well. They try to spread out the risk so that if an event were to occur (e.g. a hurricane), not all of their customers are in the single affected area. Economists call this “adverse selection”. “How do we diversify the risk so that those that file claims aren’t the only ones paying in?”

How does this relate to computing? Well, quotas are an example. I used to run systems with home directory quotas. If I had 100GB, but 1000 users, I couldn’t divide this up evenly (an even slice would be about 100MB each). I had about 500 users who didn’t need 1MB, but I had 5 that needed 10GB. The other 500 or so users did need more than 1MB, and they needed more than an even slice.

So, the disk space had to be “oversubscribed”. I then could have a situation where everyone stayed under quota, but I could still run out of disk space.
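To make the oversubscription concrete, here is a minimal sketch. The 100GB capacity and 1000 users come from the example above; the individual quota values (10GB for 5 heavy users, 0.25GB for everyone else) and the 50% usage figure are hypothetical numbers picked for illustration.

```python
CAPACITY_GB = 100.0

# Hypothetical quotas: 5 heavy users at 10 GB, the other 995 at 0.25 GB.
quotas_gb = [10.0] * 5 + [0.25] * 995

def oversubscription_ratio(quotas, capacity):
    """Total space promised via quotas divided by physical space."""
    return sum(quotas) / capacity

def disk_full(usage, capacity):
    """True when combined usage reaches the physical capacity."""
    return sum(usage) >= capacity

ratio = oversubscription_ratio(quotas_gb, CAPACITY_GB)

# Suppose every user consumes only half of their quota (hypothetical).
usage_gb = [q * 0.5 for q in quotas_gb]
everyone_under_quota = all(u < q for u, q in zip(usage_gb, quotas_gb))

print(f"oversubscription ratio: {ratio:.2f}x")
print(f"everyone under quota:   {everyone_under_quota}")
print(f"disk full anyway:       {disk_full(usage_gb, CAPACITY_GB)}")
```

With these made-up numbers, nobody is over quota and the disk is still full, which is exactly the situation described above.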

Banks do this all the time. They have far less cash on hand than they have deposits. Banks compensate by having deposit insurance (in the US, through the FDIC), which should prevent a run on the bank.

In computing, this happens on network bandwidth, disk space, and compute power. At deeper levels, this deals with IO. As CPUs get faster, disks become the bottleneck and not everyone can afford solid state disks to keep up with the IO demand.

The demand in a cloud computing environment would hopefully follow a normal distribution (bell curve). But, that is not always what occurs. Demand tends to follow an exponential curve.


As a result, if the demand cannot be quenched by price increases, then throttling must be implemented to prevent full consumption of the resources. There are many algorithms to choose from when looking at the network, likewise there are algorithms for the compute.

Given a cloud architecture in which a VM on a host is connected to a switch, which is connected to storage that has a disk pool of some sort, there are many places to introduce throttles. The image below uses a VMware & NetApp vFiler environment (could be SVM aka vServer as well): there is a VM on an ESX host, connected to an Ethernet switch, connected to a Filer, which is split between a disk aggregate and a vFiler; the vFiler pulls from the volume sitting on the aggregate, and the volume holds the file.


Throttling at the switch may not do much good, as this would throttle all VMs on an ESX host or, if not filtering by IP, all ESX hosts. Throttling at the ESX server layer, again, affects multiple VMs. Imagine a single customer on 1 or many VMs. Likewise, throttling at the storage layer, specifically the vFiler, may impact multiple VMs. The logical thing to do for greatest granularity would be to throttle at the VM or vmdk level. Basically, throttle at the end-points. Since a VM could have multiple vmdks, it is probably best to throttle at the VM level. (NetApp Clustered OnTap 8.2 would allow for throttles at the file level). Not to favor NetApp: other vendors (e.g. EMC, SolidFire) who are introducing QoS are doing it at the LUN layer (they tend to be block vendors).

For manual throttling, some isolate the workloads to specific equipment – this could be compute, network, or disk. When I used to work at the University of California, Irvine, and we saw the dorms coming online with Ethernet to the rooms, I joked that we should drive their traffic through our slowest routers, as we feared they would bury the core network.

The question would be what type of throttle algorithm would be best? Since starving the main consumers to zero throughput is not acceptable, following a network model may be preferred. Something like a weighted fair queueing algorithm may be the most reasonable, though a simpler proposition would be to revert to the quota model used for disk space – just set higher thresholds for many users, which will not eliminate every problem, but will eliminate a majority of them. For extra credit (and maybe a headache) read this option which was a network solution to also maximize throughput.
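As a rough illustration of the "throttle but don't starve" idea, here is a minimal sketch. It is not a packet-level weighted fair queueing scheduler; it computes a weighted max-min fair split of a fixed capacity, which is the idealized outcome that fair queueing schemes aim to approximate. The VM names, the 100 units of capacity, and the demand figures are all hypothetical.

```python
def weighted_fair_share(capacity, demands, weights):
    """Split capacity by weight without giving anyone more than they asked for.

    Consumers whose demand fits inside their weighted share are fully served;
    whatever they leave unused is re-divided among the rest.
    """
    alloc = {name: 0.0 for name in demands}
    active = {name for name, demand in demands.items() if demand > 0}
    remaining = float(capacity)

    while active and remaining > 1e-9:
        total_weight = sum(weights[n] for n in active)
        share = {n: remaining * weights[n] / total_weight for n in active}
        satisfied = {n for n in active if demands[n] <= share[n]}
        if not satisfied:
            # Everyone left wants more than their share: cap them at it.
            for n in active:
                alloc[n] = share[n]
            break
        for n in satisfied:
            alloc[n] = float(demands[n])
            remaining -= demands[n]
        active -= satisfied
    return alloc

# Hypothetical demands against 100 units of throughput, equal weights.
demands = {"vm_a": 10, "vm_b": 30, "vm_c": 200}
weights = {"vm_a": 1, "vm_b": 1, "vm_c": 1}
print(weighted_fair_share(100, demands, weights))
```

In this run the two light consumers get everything they asked for and the heavy one is capped at what is left, rather than being pushed to zero. Raising a consumer's weight raises the share it is capped at.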

Jim – 11/03/13




Migrating to the Cloud – Technical Concerns of migrating to an IaaS Cloud

The thought of migrating to the Cloud can be flippant or daunting depending on where you sit on the optimist/pessimist scale. In reality, this is a matter of proportion to your environment.

In this week’s post, I’m talking specifically about Infrastructure-as-a-Service Cloud — rather than having a physical presence, your goal is to move to the cloud, so you don’t have to care for that hardware stuff.

My recommendation on where to begin is to ask the question: How would I migrate somewhere else?

It starts with: what services do I move 1st? When I worked at UC Irvine’s School of Humanities in the early 90s, we had to move into a new building and the finance staff needed to move 1st since they didn’t want to get caught closing the books at the very time we had to evacuate the old modular building. So, a server had to go over there to provide the NetWare routing that we were doing between a classroom network and the office network (it was summer, so I didn’t have to worry about student congestion on the network – though the empty classroom I put the server in was a victim of the painters unplugging it). After the office staff could move, then I could bring the office NetWare server over one evening. The important part of this story is that I needed networking to handle the people 1st.

Another move that I performed was similar. I had to move servers from downtown Denver to a new data center in south Denver. The users of those machines couldn’t deal with the network latency as our route went from downtown Denver to Massachusetts to south Denver. Those users had to move, then they had to get new AD (Microsoft Active Directory) credentials and new security tokens. So, the important part here is that the users needed an authentication infrastructure 1st.

While moving some servers from Denver to Aurora, into a new facility for us – we again were concerned about latency, so we needed to have authentication and name services also stood up in Aurora, so that authentication wouldn’t have to cross the WAN.

My point from these anecdotes is that it is not as simple as moving one OS instance. There are dependencies. Typically, those dependencies are infrastructure dependencies, and they typically exist so that latency can be avoided. [I haven’t defined network latency, but for those who need an example: think of TV interviews that occur when the anchor is in New York and the reporter is in the Middle East. The anchor asks a question and the reporter has to wait for all the audio to get to him while the viewer sees a pause in conversation. That is network latency: the amount of time it takes to travel the “wire”.]

Back to dependencies – I may need DNS (Domain Name System) at the new site, so that every time I look up a name, the server doesn’t have to talk back to my local network to get that information. I may need authentication services (e.g. AD). I may need a network route outbound. I may need a database server. Now, these start to add up.

In my experience, there is a 1st wave – infrastructure services.

Then there is a 2nd wave – actual systems used by users. Typically, these are some guinea pigs which can endure the kinks being smoothed out.

Eventually, there are a bunch of systems that are all interrelated. This wave ends up being quite an undertaking, as this bulk of systems takes time to move and users are going to want minimal downtime.

Then after the final user wave, the final clean-up occurs – decommissioning the old infrastructure servers.
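The waves above can be sketched as a dependency ordering problem. This is a minimal illustration using Python's standard `graphlib` module (Python 3.9+); the system names and the "must already be at the destination" relationships are hypothetical, not taken from any of the moves described earlier.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each item maps to what must already be
# in place at the destination before it can move.
move_after = {
    "dns": [],
    "outbound_route": [],
    "active_directory": ["dns"],
    "database": ["dns"],
    "pilot_app": ["active_directory", "database", "outbound_route"],
    "main_apps": ["pilot_app"],
    "decommission_old_infra": ["main_apps"],
}

def migration_waves(graph):
    """Group systems into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

for number, wave in enumerate(migration_waves(move_after), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

The infrastructure services land in the early waves, the guinea pig system follows, then the bulk, and the clean-up comes last. A circular dependency shows up as an error, which is worth knowing before the move rather than during it.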


What I’ve presented is more about how to do a migration than a migration into the cloud. For the cloud, there may be additional steps depending on your provider – maybe you have to convert VMware VMs using OVFtool and then import.

VM portability eases the task. The underlying hardware tends to be irrelevant – as opposed to moving physical servers where there may be different driver stacks, different devices, etc. Obviously, one has to be cognizant of compatibility. If one is running IBM AIX, then one must find a cloud provider that supports this.

My point is that it is still a migration of how to get from A to B, and high level requirements remain the same (How is my data going to move – over the wire or by truck? What can I live with? What systems depend on what other systems?). The big difference between an IaaS Cloud migration and a physical migration is that servers aren’t moving from site A to site B – so there isn’t the “swing gear” conversation or the “physical move” conversation. This is a migration of landing on pre-staged gear. The destination is ready. Figure out the transport requirements of your destination cloud and get going!

Business considerations for the move to the cloud

Migration implies change and change implies risk. So, what are the hurdles that the decision maker has to clear before committing to a migration to the cloud?

First, what type of migration is it? Is it a migration to Infrastructure as a Service (IaaS), Platform as a Service (PaaS), or Software as a Service (SaaS) … or any of the other “fill in the blank as a Service” (XaaS)? Wikipedia can provide sufficient definitions for IaaS, PaaS, and SaaS, but just to quickly provide examples: IaaS allows one to hotel their computing environment – e.g. run Microsoft Server on someone else’s gear by renting it out. PaaS allows for a development environment to produce software on someone else’s gear and use their software development tools. SaaS allows one to run a specific software app on someone else’s environment – “webmail” being SaaS before there was a term for it. Now, it could be online learning, etc.

IaaS, PaaS, SaaS


Second, what are the risks? In exchange for Capital Expenses and some Operational Expenses, one gets only Operational Expenses. This also means that some control is turned over to the service. When I lose power to my house, since I haven’t built my own power plant, I’m at the mercy of the utility company. Power comes back when it comes back. I can’t re-prioritize tasks that the power company has set (e.g. bring my neighborhood back before the other neighborhood). The SLAs – Service Level Agreements – are where the expectations for uptime, performance, etc. are set.

I’ve worked with some users who, when approached about the SLAs of internal systems, wanted to drive costs down. “Oh, I don’t need redundancy or highly available systems – these are test & development servers… except right before we do a code release, then the systems have to be up 24×7.” “Um, you don’t get to pick the time of your disaster or failure, so sounds like you need to buy an HA system.”

As systems become more complex, firms struggle with: “how is the expertise maintained?” Acquisition cost of gear is about 1/3 the total cost of gear. There is maintenance and then the administration. Unless one runs a tech company, the tech administration is not the company’s core competency. So, why would a company want to run that in their business?

This is the classic buy v. build decision. Of course, with IT, the problem is that after one builds, they still have to administer. And, after one buys, they still have to handle the vendor relations.

In addition to vendor relations, one has the concern about vendor longevity. Is the vendor going to be there for as long as your company needs it to be? What happens when the vendor goes out of business or ends the line of business?

Of course, on the build side, what happens when the expert you hired finds a new job or you wish to promote him to an alternate position?

Non-profits have alternate problems where funds may not be regular and OpEx costs ad infinitum might not be serviceable. But, hardware/software maintenance costs and training fall in the same boat.

A third consideration is security. How secure is your data in the cloud? Returning to the SaaS e-mail, it is fair to assume given recent revelations that the NSA is mining your e-mail off Gmail, Yahoo Mail, Hotmail, and others just to name a few. One would hope that the systems are secure from hackers and this info is only leaking to the government lawfully. But, if you are concerned about hackers, how secure is your data in-house? So, there is a cost consideration for the build solution and there is a trust consideration given one’s provider.

The build v. buy decision is admittedly harder with technology given the high rate of change. This is especially true as it ties to security. Feature implementation is based upon service provider timetables and evaluation of risk. All this again returns to priorities and that in the build solution, one gets to make their own calls and evaluations.

In summary, one can select at what level they wish to move to the cloud. One needs to be concerned about the build v. buy decisions, but the cloud move could allow for granular cloud moves (we put this out there, we don’t put that). Security, Vendor Longevity, Vendor Relations, etc. are big factors. Time & Labor need to be accounted for, whether doing it in-house or working to out-source. And, of course, there are the decisions about CapEx & OpEx.





What’s all this “cloud” stuff? What does it mean?

In the old days (I don’t really recall when exactly), techies used the term “cloud” – and back then it meant the network, specifically the WAN (wide area network). So, we would draw a picture on the board showing site A and site B and in between was this cloud (the network cloud). We didn’t know how we got from A to B and didn’t care – or if we did, we knew that the vendor, e.g. AT&T, could change it.


Now, “cloud” has a new meaning, stemming from cloud computing. One can now have application-as-a-service, platform-as-a-service, or infrastructure-as-a-service. An example of application-as-a-service cloud computing before such terms existed: webmail. Once we stopped downloading e-mail from e-mail servers and just used some web interface from Yahoo or Hotmail, and the like, we moved into Cloud Computing. Those web servers are somewhere and can be moved without us knowing it.


So, “cloud” is just a way of saying that the technical resource/service is out there and that the service is what is important, not the “where”.

I’ll leave the different types of cloud computing – application-as-a-service, platform-as-a-service, and infrastructure-as-a-service – for another day… Comment if there is specific interest…

Jim Rev 1.3