NetApp cDOT ssh key config via CLI

Double Black Diamond
I had posted prior on how to configure SSH keys on 7-mode. I've been remiss in posting the equivalent steps for cDOT (NetApp's clustered Data ONTAP).

Before I get to the steps, let me list the assumptions:

  1. The steps below will be for a non-root user
  2. Root/Administrator privs are available to the user who is setting this up.
  3. The SSH key for the non-root user has already been generated on the client system.
  4. The SSH public key can be copied and pasted from whatever is displaying the key file (e.g. an xterm or Notepad) into a shell window logged in to the filer's CLI (e.g. an xterm or PuTTY)

The methodology is fairly simple (provided one has the admin privs), and a consolidated example follows the list:

  1. Log in to the filer via CLI with appropriate privileges.
  2. # go to the security login directory
    • security login
  3. # allow for ssh for the user
    • create -username <username> -application ssh -authmethod publickey
  4. # enter the public key
    • publickey create -username <username> -publickey "ssh-rsa <public-key> <username>@<ssh client hostname>"
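Put together, a full session might look something like the sketch below, assuming a hypothetical cluster "cluster1", user "jsmith", and the admin role; the full command paths are "security login create" and "security login publickey create", and exact parameter names can vary slightly between ONTAP releases:

  cluster1::> security login create -username jsmith -application ssh -authmethod publickey -role admin
  cluster1::> security login publickey create -username jsmith -publickey "ssh-rsa AAAA<rest of public key> jsmith@client1"
  cluster1::> security login publickey show -username jsmith

The publickey show at the end is just a sanity check that the key was stored for the right user.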

Jim – 09/29/14

@itbycrayon



Shellshock / Bashbug quick check

Black Diamond

Given the latest news on the Shellshock aka Bashbug vulnerability, I modified a public command line check.
Backstory: Unix systems (including Linux and Mac OS X) have shells for their command-line windows. Bash is a common one. A vulnerability was found in bash, and it has fairly large implications. More detail is available online.
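For reference, the widely circulated public one-liner (the "public command line check" mentioned above) works by exporting a crafted function definition and seeing whether bash executes the code appended after it:

  env x='() { :;}; echo vulnerable' bash -c "echo this is a test"

A patched bash prints only "this is a test" (possibly with a warning about the ignored function definition), while an unpatched one also prints "vulnerable".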

My modification to the command line script is:

Jim – 09/26/14

@itbycrayon

NetApp 7-mode ssh key config via CLI w/o NFS or CIFS

Double Black Diamond
Configuring NetApp to use SSH with keys, without having the root volume holding /etc NFS-exported or CIFS-shared, can be convoluted.

Before I get to the steps, let me list the assumptions:

  1. The steps below will be for a non-root user
  2. Root/Administrator privs are available to the user who is setting this up.
  3. The SSH key for the non-root user has already been generated on the client system.
  4. The SSH public key can be copied and pasted from whatever is displaying the key file (e.g. an xterm or Notepad) into a shell window logged in to the filer's CLI (e.g. an xterm or PuTTY)

Basically, the trick is to set up the empty user directories, since there isn't a command to create directories. Obviously, with NFS or CIFS, the directory could be made fairly easily. A consolidated example follows the numbered steps below.

  1. Log in to the filer via CLI with appropriate privileges.
  2. # go into advanced mode
    • priv set advanced
  3. # find an empty directory using ls – in some cases, /home/http may be empty.
    • ls /home/http
  4. # check ndmpd status
    • ndmpd status
  5. # if ndmp is not on, turn it on.
    • ndmpd on
  6. # When using ndmpcopy, the shortcut of dropping /vol/<root volume> does not work for the destination
    • ndmpcopy /home/http /vol/<root volume>/etc/sshd/<username>
      ndmpcopy /home/http /vol/<root volume>/etc/sshd/<username>/.ssh
  7. # Create the text file with wrfile, then cut and paste the key(s) from your other window, and finish with Ctrl-C
    • wrfile /vol/<root volume>/etc/sshd/<username>/.ssh/authorized_keys
  8. # if ndmpd was off, turn it off.
    • ndmpd off
  9. # ndmpcopy leaves behind a restore_symboltable file.  For cleanliness, remove it.
    • rm /vol/<root volume>/etc/sshd/<username>/restore_symboltable
    • rm /vol/<root volume>/etc/sshd/<username>/.ssh/restore_symboltable

Shortcut (if a user has already been set up, then their SSH keys and directory structure can be copied, which saves some steps; a consolidated sketch follows this list).
Warning: the permissions (Unix modes or Windows ACLs) will follow with the ndmpcopy, so there is a security risk here if /etc is NFS-mounted or CIFS-shared. Keep that in mind.

  1. # check ndmpd status
    • ndmpd status
  2. # if ndmp is not on, turn it on.
    • ndmpd on
  3. # When using ndmpcopy, the shortcut of dropping /vol/<root volume> does not work for the destination
    • ndmpcopy /vol/<root volume>/etc/sshd/<existing user with ssh keys> /vol/<root volume>/etc/sshd/<new ssh user>
  4. # Create the text file with wrfile, then cut and paste the key(s) from your other window, and finish with Ctrl-C
    • wrfile /vol/<root volume>/etc/sshd/<new ssh username>/.ssh/authorized_keys
  5. # if ndmpd was off, turn it off.
    • ndmpd off
  6. # ndmpcopy leaves behind a restore_symboltable file.  For cleanliness, remove it.
    • rm /vol/<root volume>/etc/sshd/<new ssh username>/restore_symboltable
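As a sketch, with a hypothetical existing user "jsmith", new user "kjones", and root volume "vol0":

  filer> ndmpd status
  filer> ndmpd on                       # only if it was off
  filer> ndmpcopy /vol/vol0/etc/sshd/jsmith /vol/vol0/etc/sshd/kjones
  filer> wrfile /vol/vol0/etc/sshd/kjones/.ssh/authorized_keys
         (paste kjones' public key, then Ctrl-C; wrfile overwrites the copied keys)
  filer> ndmpd off                      # only if it was off to begin with
  filer> rm /vol/vol0/etc/sshd/kjones/restore_symboltable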

Jim – 11/18/13

@itbycrayon


How to deal with exponential growth rates? And how does this relate to cloud computing?

Double Black Diamond
What happens when demand exceeds the resources? Ah, raise prices. But sometimes that is not available as a solution. And sometimes demand spikes far more than expected.

Example: Back in the early 2000s, Netflix allowed renters to have 3 DVDs at a time, but some customers churned those 3 DVDs more frequently than average and more frequently than Netflix expected. So, Netflix throttled those customers and put them at the back of the line (dug up this reference). This also appears to have happened in their streaming business.

Another example: your web site gets linked on a site that generates a ton of traffic (I should be so lucky). This piece says that the Drudge Report sent 30,000 to 50,000 hits per hour, bringing down the US Senate's web site. At 36,000 per hour, that is an average of 10 per second.

Network bandwidth tends to be the constrained resource. Another example from AT&T: as a service provider, this piece says that 2% of their customers consume 20% of their network.

There are non-technical examples as well. The all-you-can-eat buffet is one. Some customers will consume significantly more than the average. (Unfortunately, I can't find a YouTube link to a commercial that Visa ran during the Olympics in the 80s or 90s where a sumo wrestler walks into a buffet – if you can find it for me, please reply.)

Insurance companies deal with this as well. They try to spread out the risk so that if an event were to occur (e.g. a hurricane), not all of their customers are in a single area. Economists call this "adverse selection". "How do we diversify the risk so that those who file claims aren't the only ones paying in?"

How does this relate to computing? Well, quotas are an example. I used to run systems with home directory quotas. If I had 100GB and 1000 users, I couldn't just divide it up evenly. About 500 users didn't even need 1MB, but 5 needed 10GB each. And the roughly 500 users who did need more than 1MB needed more than an even slice.

So, the disk space had to be “oversubscribed”. I then could have a situation where everyone stayed under quota, but I could still run out of disk space.
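As a back-of-the-envelope sketch of that oversubscription, assume a hypothetical file quotas.txt with one "user quota-in-MB" pair per line (the numbers below are made up to roughly match the example above); summing the quotas shows how far the promises exceed the 100GB of physical disk:

  # e.g. 500 users at 1 MB, 495 users at 200 MB, 5 users at 10240 MB
  awk '{ total += $2 }
       END { printf "allocated %.1f GB of quota on 100 GB of disk (%.1fx oversubscribed)\n",
             total/1024, (total/1024)/100 }' quotas.txt

With those made-up numbers the quotas add up to roughly 150GB, about 1.5x the disk actually available, yet every individual user can still be comfortably under quota.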

Banks do this all the time. They have far less cash on hand in the bank than they have deposits. Banks compensate by having federal deposit insurance (the FDIC), which should prevent a run on the bank.

In computing, this happens with network bandwidth, disk space, and compute power. At a deeper level, it happens with IO. As CPUs get faster, disks become the bottleneck, and not everyone can afford solid state disks to keep up with the IO demand.

Demand in a cloud computing environment would hopefully follow a normal distribution (bell curve). But that is not what always occurs. Demand tends to follow an exponential curve.


As a result, if the demand cannot be quenched by price increases, then throttling must be implemented to prevent full consumption of the resources. There are many algorithms to choose from when looking at the network; likewise, there are algorithms for compute.

Given a cloud architecture where a VM sits on a host connected to a switch connected to storage with a disk pool of some sort, there are many places to introduce throttles. Take a VMware and NetApp vFiler environment (it could be an SVM, aka Vserver, as well): a VM on an ESX host, connected to an Ethernet switch, connected to a filer, which is split between a disk aggregate and a vFiler; the vFiler pulls from a volume sitting on the aggregate, which in turn holds the file.


Throttling at the switch may not do much good, as this would throttle all VMs on an ESX host or, if not filtering by IP, all ESX hosts. Throttling at the ESX server layer again affects multiple VMs. Imagine a single customer on one or many VMs. Likewise, filtering at the storage layer, specifically the vFiler, may impact multiple VMs. The logical thing to do for greatest granularity would be to throttle at the VM or vmdk level, basically at the end-points. Since a VM could have multiple vmdks, it is probably best to throttle at the VM level. (NetApp clustered ONTAP 8.2 would allow for throttles at the file level.) Not to favor NetApp: other vendors who are introducing QoS (e.g. EMC, SolidFire) are doing it at the LUN layer (they tend to be block vendors).
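For the NetApp case, a rough sketch of what a per-customer throttle could look like with clustered ONTAP 8.2 Storage QoS follows; the policy-group name, SVM, volume, and the 500 IOPS ceiling are all hypothetical, and option names may differ slightly by release (policy groups can also be attached to a LUN or, in 8.2, a file):

  cluster1::> qos policy-group create -policy-group customer42 -vserver svm1 -max-throughput 500iops
  cluster1::> volume modify -vserver svm1 -volume customer42_vol -qos-policy-group customer42
  cluster1::> qos statistics performance show

The last command just watches whether the workload is actually bumping against the ceiling.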

For manual throttling, some isolate the workloads to specific equipment – this could be compute, network, or disk. When I worked at the University of California, Irvine, and we saw the dorms coming online with Ethernet to the rooms, I joked that we should route their traffic through our slowest routers, as we feared they would bury the core network.

The question is what type of throttle algorithm would be best. Since starving the main consumers down to zero throughput is not acceptable, following a network model may be preferred. Something like a weighted fair queueing algorithm may be the most reasonable, though a simpler proposition would be to revert to the quota model used for disk space – just set higher thresholds for most users, which will not eliminate every problem, but will handle a majority. For extra credit (and maybe a headache), read this option, which was a network solution that also maximizes throughput.

Jim – 11/03/13
@itbycrayon



NetApp de-dupe, flexclone, fragmentation, and reallocation

Double Black Diamond
Since NetApp has its own filesystem, WAFL, disk layouts and fragmentation behave differently than on traditional filesystems. Before NetApp introduced de-duplication, fragmentation was less common. Real quickly, let me deviate a bit and describe some notions of WAFL:

WAFL stands for "Write Anywhere File Layout." The notion is that if inodes can be saved alongside parts of the file, then there doesn't need to be a file allocation table or inode table at one part of the volume and the data at another.

Another goal was to minimize disk head seeks – so data written to a RAID set is striped across all the disks in the set at essentially the same sectors. If I have 4 data disks and a parity disk (keeping it simple here), a stripe of data would be 25% on one disk, 25% on the next, and so on, with parity on the last. Assuming each disk has 100 sectors (for ease of calculation), I could instead scatter the data: sectors 0-24 on disk one, 25-49 on the next, 10-34 on the next, 95-100 and 0-3 on the next, then 88, then 87, then 5-10, etc. That would be rather slow, as the heads jump around the platters searching for those blocks. NetApp tries to keep all the data in one spot on each disk.

So, data is assembled in nice stripes, and if the writes are large, then a file takes up nice consecutive blocks in the stripe. The writes are in 4k blocks, but with a much larger file, they line up nicely and sequentially.

From their Technical Report on the subject, they state that the goal is to line up data for nice sequential reads. So, when a read request takes place, a group of blocks can be pre-fetched in anticipation of serving up a larger sequence.

Well, with NetApp FlexClone (virtual clones) and NetApp ASIS (de-duplication), the data gets a bit more jumbled. (Same with writes of non-contiguous blocks). Here’s why:

If I have data that doesn't live together, holes get created. Speaking of non-contiguous blocks, I've worked in environments with Oracle databases over NFS on NetApp. When Oracle writes a group of blocks, those may not be logically grouped (the way a text file might be). So, after some of the data is aged off, holes are created, as block 1 might be kept, but not blocks 2 & 3.

With FlexClone, the snapshot is made writeable, and then future writes to the volume are tracked separately and may get aged off separately. Clearer is the notion of de-duplication. NetApp de-dupe is post-process, so it accepts the entire write of a file and then later compares the contents of that file with other files on the system. It will then remove any duplicated blocks.

So, if I write a file and blocks 2, 3, & 4 match another file while blocks 1 & 5 do not, when the de-dupe process sweeps the file, it is going to remove those blocks 2, 3, & 4. So, that stripe now has holes, and that will impact the pre-fetch operation on the next read.

In the past, I only ran the reallocate command when I added disks to an aggregate, but now, while searching through performance-enhancing methods on NetApp, I've realized the impact of de-dupe on fragmentation.
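On 7-mode, a hedged sketch of checking and fixing the layout of a hypothetical volume /vol/dbvol looks like this (flags vary a bit by Data ONTAP release, and on deduplicated volumes a physical reallocate with -p is generally the option to look at):

  filer> reallocate measure /vol/dbvol      # report how well laid out the volume currently is
  filer> reallocate start -f /vol/dbvol     # force a one-time full reallocation scan
  filer> reallocate status -v               # watch progress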

Jim – 09/16/13
@itbycrayon


How often do disks fail these days?

Black Diamond
Good news, bad news, and more bad news: the good news is that there have been a couple of exhaustive studies on hard drive failure rates. The bad news is that the studies are old. More bad news: with the growing acceptance of solid state drives, we may not see another study.

Google's Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz Andre Barroso wrote a great paper, "Failure Trends in a Large Disk Drive Population", submitted to the USENIX FAST conference of '07. Reading this paper in today's context, the takeaways are:

  • The study was on consumer-grade (aka SATA) drives.
  • Operating Temperature is not a good predictor of failure (i.e. it is not conclusive that drives operating in higher temp environments fail more frequently).
  • Disk utilization (wear) is not a good predictor of failure.
  • Failure rates over time look more like a check-mark or fish-hook curve than a bathtub curve.

    Fish hook vs. bathtub curves

Carnegie Mellon's Bianca Schroeder & Garth A. Gibson also published a paper, "Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?", at USENIX's FAST 2007 conference. The takeaways from this paper are that their study covered FC, SCSI, & SATA disks and that in the first 5 years it is reasonable to expect about a 3% annual failure rate, again following a check-mark pattern. After 5 years, the failure rates are, of course, more significant.
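To put that 3% in context, here is a quick back-of-the-envelope conversion (my own sketch, not from the paper) of a 1,000,000-hour datasheet MTTF into the annualized failure rate it implies:

  awk 'BEGIN {
    mttf_hours = 1000000                  # datasheet MTTF from the paper title
    hours_per_year = 8760
    afr = hours_per_year / mttf_hours     # simple approximation: fraction failing per year
    printf "datasheet-implied AFR: about %.2f%% per year\n", afr * 100
  }'

That works out to roughly 0.9% per year, which is why observed rates around 3% (and higher for older drives) are worth noticing.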

Here's the problem: in 2007, they were analyzing disks, some of which were over 5 years old. So, is there any expectation that these failure rates are good proxies for the disks of the last 6 years? (The CMU paper has a list of the disks that were sampled, if the reader cares to see.) It would be natural to assume that disk drive vendors have improved the durability of their products. But maybe the insertion of SAS (Serial Attached SCSI) drives into the technology mix introduces similar failure rates, given that the technology is new(er).

Another study of note was a University of Illinois study by Weihang Jiang, Chongfeng Hu, Yuanyuan Zhou, and Arkady Kanevsky submitted to the '08 FAST conference, "Are Disks the Dominant Contributor for Storage Failures?" Today's takeaway is that when there is a single disk failure, another failure is likely to follow sooner than the odds would dictate (i.e. the events can be correlated).

This seems to hold today as well, as the research presented here was the result of cascading failures initially stemming from a disk failure. So, the question is: when can we expect another disk failure?

The research, being authoritative and vendor-neutral, would be more helpful if the data were current. Until then, this data has to be used as a proxy for predictions – or one can use anecdotal information. Sigh.

Jim

@itbycrayon

IOPS, Spinning Disk, and Performance – What’s the catch?

Black Diamond
For a quick introduction: IOPS means input/output operations per second.  Every hard drive has a certain IO performance.  So, forgive the oversimplification: add additional disks and one gets additional IOPS, which means one gets better performance.

Now, generally speaking, I hate IOPS as a performance characteristic.  I hate them because IOPS can be read or write, sequential or random, and of different IO sizes.  Unless one is trying to tune for a specific application and is dedicating specific disk drives to that application, the measurement breaks down, because the description of the assumed utilization lacks accuracy.  For instance, assume the workload is random reads & writes, but then the backups kick off and that ends up being a huge sequential read for a long duration.

But, I digress.

Every hard drive has an IOPS rating, whether SAS, SATA, or FibreChannel, or 7200, 10000, or 15000 RPM (see Wikipedia for a sample).  When a RAID set is established, drives of the same geometry (speed & size) are put together to stripe the data across the drives. For simplicity's sake, let's say one uses a RAID5 set with 6 drives: that is, the capacity of 1 drive is used for error (parity) checking and 5 for data.  And continuing the example, assume that these are 1TB (terabyte) drives with 100 IOPS per drive.  So, one has 5 TB of capacity and 500 IOPS.  [Let's imagine these are read IOPS and not writes, so I don't have to get into parity calculations, etc.]    If I could add a drive to the RAID set, then I get another TB and another 100 IOPS.  Nice and linear.

And, my IOPS per TB are constant.  [Again, to simplify, I’m going to assume that it falls in the same RAID set and so I don’t have to consider more parity drive space].  So, none of this should be earth shaking.

The huge implication here is: to increase performance, add more disks.   The more disks, the more IOPS; everyone's happy.  However, that assumes that consumption (and, more importantly, IOPS demand) has not increased.  So far, that is consistent with the linear growth described above.

The problem is that if one adds disks, which adds capacity, and then that capacity is consumed at the same IO rate as the original disk space, the performance curve flattens right back out.  If I'm consuming 100 IOPS per TB and I have 5 TB, that is 500 IOPS of demand.  So, I add a 1TB disk and now I have 600 IOPS with 5TB of used capacity on 6TB of disk.  So, I can spread that out and yippie, those 5TBs can get 120 IOPS per TB.  But if I also say, "hey, I got another TB of disk space" and then consume it, then I'm back to where I started and am still constrained at 100 IOPS/TB.  So, what good is this?
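The arithmetic above can be sketched in a few lines (the drive counts and the 100 IOPS / 1TB per-drive figures are the same made-up numbers as in the example): if consumption grows in step with capacity, the IOPS-per-TB ratio never moves.

  awk 'BEGIN {
    iops_per_drive = 100; tb_per_drive = 1
    for (drives = 5; drives <= 7; drives++) {
      iops = drives * iops_per_drive
      tb   = drives * tb_per_drive        # capacity consumed at the same rate it is added
      printf "%d data drives: %d IOPS / %d TB = %d IOPS per TB\n", drives, iops, tb, iops / tb
    }
  }'

Every row comes out at 100 IOPS per TB, which is exactly the flat line described above.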

The assumption is that one is adding to a heterogeneous array, i.e. multi-purpose (maybe multi-user or multi-system).  By being multi-purpose, the usage curve should hopefully become more normalized.  If the usage is more homogeneous – e.g. everyone who needs fast performance gets moved from the slow array to the fast array – well, that just means that the fast users are competing with other fast users.

Just like on a NASCAR track during time trials, if I have one race car start and then send another race car when the 1st is halfway around the track, I'm probably not going to have contention.  If one customer wants high performance in the evening and the other during the business day, I probably have no contention.

However, on race day after the start, all the cars are congested, and some can't go as fast as they want because someone is slow in front of them – gee, and we moved them off the freeway onto the race track for just this reason.   Well, on the storage array, this is like everyone running end-of-the-month reports, well, at the end of the month.

I need another analogy for the heterogeneous use.  Imagine a road that one guy uses daily, but his neighbor only uses monthly.  However, the neighbor still needs use of a road, so he pays for the consumption as well.  Overall, there may not be conflict for the road resource – as opposed to if both used it daily.

So, yes, overall, adding disks does add performance capacity.  And without knowing usage characteristics, the generality of adding disks still holds.  Why?  Because no one complains that the disks are going too fast; they only complain when they are too slow.  There is still the mindset that one buys disk for capacity and not for performance.  And then, once performance is an issue, the complaints start.  So, adding disks to a random workload means that the bell curve should get smoother overall.  This won't end all the headaches, but it should minimize them by minimizing the number of potential conflicts.

Let me know what you think
Jim
@itbycrayon