VMware Linked Clones & NetApp Partial writes fun


Double Black Diamond
NetApp OnTap writes data in 4k-blocks. As long as the writes to NetApp are in 4k increments, all is good.

Let me step back. Given all the fun that I’ve experienced recently, I am going to alter my topic schedule to bring this topic forward while it is hot.

One more step back: The environment consists of VMware ESX hosts getting their storage via NFS from NetApp storage. When data is not written to a storage system in increments matching its block size, misalignment occurs. For NetApp, that block size is 4k. If I write 32k, that breaks nicely into 8 4k blocks. If I write 10k, it doesn’t, as it ends up being 2 and a half blocks.
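The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the division, assuming the write starts exactly on a block boundary; the function name `split_into_blocks` is made up for this example.

```python
BLOCK_SIZE = 4096  # the 4k block size described above


def split_into_blocks(write_bytes, block_size=BLOCK_SIZE):
    """Return (full_blocks, leftover_bytes) for a write that starts on a block boundary."""
    return divmod(write_bytes, block_size)


for size_k in (32, 10):
    full, leftover = split_into_blocks(size_k * 1024)
    print(f"{size_k}k write: {full} full blocks, {leftover} bytes left over")
```

The 32k write comes out to 8 full blocks with nothing left over, while the 10k write fills 2 blocks and leaves 2048 bytes (half a block) dangling.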


The misalignment problem has been well documented. VMware has a doc. NetApp has a doc. Other vendors (e.g. HP/3PAR and EMC) reference alignment problems in their docs. The problem is well known – and easily googled. With misalignment, more read & write operations are required because the block the guest writes does not line up with the block on the storage.

And yay! VMware addresses it in their VMFS-5 file system by making the blocks 1MB in size. That will divide up nicely. And Microsoft, with Windows 2008, changed the starting block, which helped alignment.

So, all our problems are gone, right??

NO.

VMware introduced linked clones, which have a grain size of 512 bytes (see Cormac Hogan’s blog).

Once this issue is discovered, you end up reading more of Cormac’s blog, and then maybe some of Duncan Epping’s, and maybe some of Jack McLeod’s, not to mention Knowledge Base articles from both VMware & NetApp. The recommended solution is to use VAAI and let the NetApp handle clones on the backend. And these 512-byte writes are technically “partial” writes and not “misaligned” writes.

If everything is aligned, then a partial write requires 1 disk read operation (of 4k), an instruction to wedge the 512 bytes into the right place in that 4k block, and 1 write back out. If misalignment exists, then it requires twice the IO operations.
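Here is a rough way to count those operations. It is a simplified read-modify-write model, not a description of what ONTAP actually does internally: any 4k block that a write only partly covers costs one read and one write, and a fully covered block costs just a write. The function name and the offsets are made up for illustration, and the doubling case assumes the 512 bytes happen to straddle a 4k boundary.

```python
BLOCK_SIZE = 4096  # 4k storage block, as described above


def read_modify_write_ops(offset, length, block_size=BLOCK_SIZE):
    """Count (reads, writes) for one write under a simple read-modify-write model."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    reads = writes = 0
    for block in range(first, last + 1):
        block_start = block * block_size
        block_end = block_start + block_size
        fully_covered = offset <= block_start and offset + length >= block_end
        if not fully_covered:
            reads += 1  # must fetch the existing 4k before patching it
        writes += 1     # every touched block is written back out
    return reads, writes


print(read_modify_write_ops(0, 512))      # partial write inside one block
print(read_modify_write_ops(3840, 512))   # hypothetical write straddling a boundary
print(read_modify_write_ops(0, 4096))     # full, aligned 4k write
```

In this model the first case costs 1 read and 1 write, the straddling case costs 2 and 2, and the full aligned write needs no read at all.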

However, if you look at nfsstat -d, you’ll notice that there are a whole bunch of packets in the 0-511 range. Wait! My partial writes are 512 bytes, so those show up in the 512-1k range. What are all these?

At this point, I don’t entirely know (gee Jim, great piece – no answers), but according to VMware KB 1007909, VMware NFS is doing 84-byte (84!?) writes for NFS locking. Given the count in my 0-511 byte range, NFS locking can’t account for all of those – but what does this do to NetApp’s 4k blocks?

Jim – 08/30/13
@itbycrayon

View Jim Surlow's profile on LinkedIn (I don’t accept general LinkedIn invites – but if you say you read my blog, it will change my mind)


Migrating to the Cloud – Technical Concerns of migrating to an IaaS Cloud

Blue Square

The thoughts of migrating to the Cloud can be flippant or daunting depending on where you sit on the optimist/pessimist scale. In reality, this is a matter of proportion to your environment.

In this week’s post, I’m talking specifically about Infrastructure-as-a-Service Cloud — rather than having a physical presence, your goal is to move to the cloud, so you don’t have to care for that hardware stuff.

My recommendation on where to begin is to ask the question: How would I migrate somewhere else?

It starts with: what services do I move 1st? When I worked at UC Irvine’s School of Humanities in the early 90s, we had to move into a new building, and the finance staff needed to move 1st since they didn’t want to get caught closing the books at the very time we had to evacuate the old modular building. So, a server had to go over there to provide the Netware routing that we were doing between a classroom network and the office network (it was summer, so I didn’t have to worry about student congestion on the network – though the empty classroom I put the server in fell victim to the painters unplugging it). Once the office staff could move, I could bring the office Netware server over one evening. The important part of this story is that I needed networking in place 1st to handle the people.

Another move that I performed was similar. I had to move servers from downtown Denver to a new data center in south Denver. The users of those machines couldn’t deal with the network latency as our route went from downtown Denver to Massachusetts to south Denver. Those users had to move, then they had to get new AD (Microsoft Active Directory) credentials and new security tokens. So, the important part here is that the users needed an authentication infrastructure 1st.

While moving some servers from Denver to Aurora, into a new facility for us – we again were concerned about latency, so we needed to have authentication and name services also stood up in Aurora, so that authentication wouldn’t have to cross the WAN.

My point from these anecdotes is that it is not as simple as moving one OS instance. There are dependencies. Typically, those dependencies are infrastructure dependencies, and they typically exist so that latency can be avoided. [I haven’t defined network latency, but for those who need an example, think of TV interviews where the anchor is in New York and the reporter is in the Middle East. The anchor asks a question and the reporter has to wait for all the audio to get to him while the viewer sees a pause in conversation. That is network latency: the amount of time it takes to travel the “wire”.]

Back to dependencies – I may need DNS (domain name service) at the new site, so that every time I look up itbycrayon.com, I don’t have to have the server talk back to my local network to get that information. I may need authentication services (e.g. AD). I may need a network route outbound. I may need a database server. Now, these start to add up.

In my experience, there is a 1st wave – infrastructure services.

Then there is a 2nd wave – actual systems used by users. Typically, these are some guinea pigs which can endure the kinks being smoothed out.

Eventually, there are a bunch of systems that are all interrelated. This wave ends up being quite an undertaking, as this bulk of systems takes time to move and users are going to want minimal downtime.

Then after the final user wave, the final clean-up occurs – decommissioning the old infrastructure servers.
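The waves above fall out naturally if you write the dependencies down and sort them. The sketch below uses Python’s standard `graphlib` module (Python 3.9+); the service names and their dependencies are hypothetical, made up purely to illustrate the idea, and a real environment would have many more.

```python
from graphlib import TopologicalSorter


def migration_waves(depends_on):
    """Group systems into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical environment: each system maps to what it needs at the new site.
depends_on = {
    "database": {"dns"},
    "pilot-app": {"dns", "ad"},
    "erp": {"dns", "ad", "database"},
    "reporting": {"erp", "database"},
}

for number, wave in enumerate(migration_waves(depends_on), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

With this made-up input, DNS and AD land in the 1st wave, the guinea pig application can follow once they are up, and the interrelated systems come later. Decommissioning the old infrastructure is not modeled here; it happens after the last wave.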


What I’ve presented is more about how to do a migration than a migration into the cloud. For the cloud, there may be additional steps depending on your provider – maybe you have to convert VMware VMs using OVFtool and then import.

VM portability eases the task. The underlying hardware tends to be irrelevant – as opposed to moving physical servers where there may be different driver stacks, different devices, etc. Obviously, one has to be cognizant of compatibility. If one is running IBM AIX, then one must find a cloud provider that supports this.

My point is that it is still a migration of how to get from A to B, and high level requirements remain the same (How is my data going to move – over the wire or by truck? What can I live with? What systems depend on what other systems?). The big difference between an IaaS Cloud migration and a physical migration is that servers aren’t moving from site A to site B – so there isn’t the “swing gear” conversation or the “physical move” conversation. This is a migration of landing on pre-staged gear. The destination is ready. Figure out the transport requirements of your destination cloud and get going!
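For the “over the wire or by truck” question, a back-of-the-envelope calculation helps. This is a rough sketch: the 80% link efficiency, the data size, and the link speed are all assumptions picked for illustration, not measurements.

```python
def transfer_hours(data_terabytes, link_megabits_per_sec, efficiency=0.8):
    """Estimate hours to push data over a link, assuming a fixed usable fraction."""
    total_bits = data_terabytes * 1e12 * 8
    usable_bits_per_sec = link_megabits_per_sec * 1e6 * efficiency
    return total_bits / usable_bits_per_sec / 3600


# Hypothetical example: 50 TB over a 100 Mbit/s WAN link.
hours = transfer_hours(50, 100)
print(f"{hours:.0f} hours, or about {hours / 24:.0f} days")
```

If the answer comes out in weeks and the business can only live with a weekend of downtime, that starts to argue for shipping disks or seeding the bulk of the data ahead of time.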

Best of Breed v. One Throat to Choke

Best of Breed or One-Throat-to-Choke – Pros & Cons

Green Ball

In the industry, you hear the term, “Best of Breed” thrown around.  What this implies is that in a system, no vendor has a single solution that excels everywhere in the system – so you need to assemble parts like a cooking recipe to get the best.

One may see VARs (Value Added Resellers), System Integrators, and even vendors with broad product portfolios sell best-of-breed solutions.  Symantec has multiple backup products, EMC has multiple backup products, EMC has multiple storage arrays running different operating systems, IBM has multiple storage arrays with different operating systems.

The reason for this is that consumers have different preferences, and those preferences could be either product specific or vendor specific. As a consumer, I may want different feature functionality from my products: I may want my network routing from one vendor, my firewalls from another, and my WAN accelerators from another. Or, as a consumer, I may want it all from one vendor so that I have one company rep to complain to and I’m not stuck between 2 vendors pointing their fingers at one another.

With best of breed, the responsibility is on the purchaser to make it all work.  Unless the purchaser is buying from a vendor (VAR, System Integrator, etc.) who is going to make it all interoperate (and that still doesn’t address the long term maintenance), the purchaser lacks that single contact – that “one throat to choke”.


Companies know this, which is why larger companies will want to partner with a particular product vendor or OEM (brand the equipment as their own) or actually provide competing products.

I mentioned Symantec earlier. Symantec for years offered both BackupExec and NetBackup. Why would they do that? Some companies would need the full enterprise features of NetBackup and would accept the complexity. Others want something simpler and don’t need the enterprise features, so they could go with BackupExec.

To be clear, these are not the same product, with one having fewer licenses & features. These are two separate products which require two different command sets. I don’t mean to pick on Symantec, as many other companies do this as well – it was just that they came to mind first.

When companies purchase SAN storage arrays, if they use Brocade for their networking, they would purchase the Brocade Fibre Channel switches from their array vendor. Brocade sees no benefit in just selling switches; they would rather use the storage array vendors as their channel. Meanwhile, the array vendors front end the selling of the switches, and if the purchaser has an issue, whether with the array or the switches, they can talk directly with the array vendor. The “one throat to choke”.

Vendor relations can be very painful when one considers all the elements – sales, support contract maintenance, tech support, and upgrade planning.  That doesn’t sound too painful for one vendor, but when you consider that there are multiple hardware vendors and multiple software vendors – it can be very time consuming.

The trade-off here is: would you rather do less vendor relations or have a better mix of products?

Considering the mix of products, you do have to consider the interoperability and compatibility issues.

Each vendor is going to have their own strengths and weaknesses.

Is the premium Steakhouse going to do the same fantastic job on fish that they do on steak – when they sell 10x the number of steaks?  Or vice versa for the fish restaurant with steaks?

Companies that purchase other companies to fill or supplement product portfolio holes will inevitably have interoperability issues with their other products.  The benefit is that they are responsible and you aren’t.  The drawback is that they might not be completely interoperable.

 

Jim

@itbycrayon