Notes from #NTAPInsight 2014

Green Ball

After a partial week at NetApp 2014 Insight US, here are my thoughts:
(full disclosure:  I was a presenter of one session at the conference)
  1. Keynote thought
  2. OnTap 8.3 announcement
  3. Hybrid Cloud
    1. Data is state-ful, unlike (cloud) computing
  4. Data locality
  5. Different UNIX variants – Different Clouds
  6. Laundry services similar to cloud computing (Jay Kidd / NetApp CTO)
Tom Mendoza (NetApp Vice Chairman) was fantastic in his keynote.  He focused on culture and wanting to build a culture of trust & candor.  CIOs understand that every company is going to have issues; the question is whether the customer's CIO trusts the vendor to be there when there is a problem.
Lots of talk about OnTap 8.3 – though the fact that it is RC1 and not GA is disappointing.  I didn't hear anyone reference that 8.3 is a Release Candidate.  8.3 provides full feature parity with 7-mode.  There was little discussion about 7-mode, except for how to move off of it (the 7-mode transition tool).  7-mode transition still appears to be a large effort.  For 7MTT, the key term is "tool".
The key focus in the keynotes was "Hybrid Cloud".  One of the key takeaways is the need for data locality.  Data is 'state-ful', as opposed to cloud computing, which is 'stateless' in the sense that the compute resource need can be metered, but the data cannot.  So, when moving from on-prem to cloud, the data has to be replicated completely between the two.  Even more so, if you are working between clouds, or between clouds in different countries, the full data set has to be replicated.  The concern is that government entities (the Snowden effect) will require data to be housed in their respective countries.  This becomes the digital equivalent of import/export laws and regulations.
With the notion of different clouds, I am reminded of all the different UNIX variants.  We had Solaris boxes, HP-UX boxes, and DEC boxes, and we struggled to move data between them.  Some were big endian, some little endian, so binaries were incompatible.
Finally, and irreverently, during Jay Kidd's (NetApp CTO) presentation my mind wandered to cloud computing analogies.  I had never noticed before how metered cloud computing is so much like the washing machines at the laundromat – pay per use.

 

Jim – 10/30/14 @itbycrayon View Jim Surlow's profile on LinkedIn (I don’t accept general LinkedIn invites – but if you say you read my blog, it will change my mind)


Problem calculating workloads on Storage, in this case NetApp

Double Black Diamond

With a centralized storage array, there can be front-side limitations (outside of the array to the host or client) and back-side limitations (the actual disk in the storage array).

The problem is that, from the storage array's point of view, the workloads at any given moment are random, and the details of those workloads are invisible to the array.  So, how to alleviate load on the array has to be determined from the client side, not the storage side.

Take for example a VMware environment with NFS storage on a NetApp array:

Each ESX host has some number of VMs and each ESX host is mounting the same export from the NetApp array.

 

Let I_A(t) = the storage array's front-side IOPS load at time t.
Let h_i(t) = the IOPS generated by ESX host i at time t, for i = 1..n, where n is the number of ESX hosts.

 

The array's front-side IOPS load at time t equals the sum of the IOPS loads of the ESX hosts at time t.

I_A(t) = Σ h_i(t)

 

An ESX host's IOPS load at time t equals the sum of the IOPS of each VM on that host at time t.

h_i(t) = Σ VM_j(t), summed over the VMs j on host i

 

A VM's IOPS load at time t equals the sum of the read IOPS and write IOPS of that VM at time t.

VM(t) = R(t) + W(t)

 

The read IOPS are composed of well-formed reads and not-well-formed reads.  "Well-formed reads" are reads which do not incur a penalty on the back side of the storage array.  "Not-well-formed reads" generate anywhere between one and four additional I/Os on the back side of the storage array.

Let r_1 = read I/Os which are well formed (no additional I/O on the back side of the array).

Let r_2 = read I/Os which cause 1 additional I/O on the back side of the array.

Let r_3 = read I/Os which cause 2 additional I/Os on the back side of the array.

Let r_4 = read I/Os which cause 3 additional I/Os on the back side of the array.

Let r_5 = read I/Os which cause 4 additional I/Os on the back side of the array.

Then

R(t) = a·r_1(t) + b·r_2(t) + c·r_3(t) + d·r_4(t) + e·r_5(t)

where a+b+c+d+e = 100% and a, b, c, d, e ≥ 0,

and, categorizing the write I/Os w_1 through w_5 the same way,

W(t) = f·w_1(t) + g·w_2(t) + h·w_3(t) + i·w_4(t) + j·w_5(t)

where f+g+h+i+j = 100% and f, g, h, i, j ≥ 0.

Now for the back-side IOPS (I'm ignoring block size here, which would just add a factor of array block size divided by I/O block size).  The difference is accounting for the additional I/Os:

R_back(t) = a·r_1(t) + 2b·r_2(t) + 3c·r_3(t) + 4d·r_4(t) + 5e·r_5(t)

and

W_back(t) = f·w_1(t) + 2g·w_2(t) + 3h·w_3(t) + 4i·w_4(t) + 5j·w_5(t)
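
As a quick illustration (the percentages here are assumed for the example, not measured): if the hosts issue 1,000 read IOPS of which 80% are well formed (a = 0.80), 10% fall into r_2 (b = 0.10), 5% into r_3 (c = 0.05), 3% into r_4 (d = 0.03), and 2% into r_5 (e = 0.02), the front side of the array sees 1,000 read IOPS, while the back side sees

0.80·1,000 + 2·0.10·1,000 + 3·0.05·1,000 + 4·0.03·1,000 + 5·0.02·1,000 = 800 + 200 + 150 + 120 + 100 = 1,370 I/Os.

A relatively small share of not-well-formed reads noticeably amplifies the back-side load.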

Since the array cannot predetermine the values of a through j, it cannot determine the effect of the additional I/O.  Likewise, it cannot determine whether the hosts are going to be sending sequential or random I/O.  The aggregate will trend toward random: with n machines writing concurrently, the likelihood of n-1 systems being quiet while one sends sequential I/O is low.

Visibility into these behaviors has to come from the host side.

 

Jim – 10/01/14

@itbycrayon

View Jim Surlow's profile on LinkedIn (I don’t accept general LinkedIn invites – but if you say you read my blog, it will change my mind)

NetApp cDOT ssh key config via CLI

Double Black Diamond

I had posted previously on how to configure SSH keys in 7-mode.  I've been remiss in covering the SSH key setup for cDOT (NetApp's clustered Data OnTap).

Before I get to the steps, let me list the assumptions:

  1. The steps below will be for a non-root user
  2. Root/Administrator privs are available to the user who is setting this up.
  3. The SSH key for the non-root user has already been generated on the client system (a minimal example follows this list).
  4. The SSH public key can be copied and pasted from something reading the key file (e.g. xterm or Notepad) into a shell window with a CLI login to the filer (e.g. xterm or PuTTY).
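
For reference, generating that key pair on a stock OpenSSH client looks roughly like the following (the key type, size, and file path are just example choices):

ssh-keygen -t rsa -b 2048 -f ~/.ssh/id_rsa -C "<username>@<ssh client hostname>"

The text to paste in step 4 below is the single line in ~/.ssh/id_rsa.pub.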

The methodology is fairly simple (provided one has the admin privs); a consolidated full-path example follows the steps:

  1. Log into the filer via CLI with appropriate privileges.
  2. # go to the security login section
    • security login
  3. # allow for ssh for the user
    • create -username <username> -application ssh -authmethod publickey
  4. # enter the public key
    • publickey create -username <username> -publickey "ssh-rsa <public-key> <username>@<ssh client hostname>"
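
Run from the top of the CLI instead of from within the security login directory, the same steps look roughly like this (a sketch only; the SVM name 'vs1' and role 'admin' are placeholders to adjust for your environment):

security login create -vserver vs1 -username <username> -application ssh -authmethod publickey -role admin
security login publickey create -vserver vs1 -username <username> -publickey "ssh-rsa <public-key> <username>@<ssh client hostname>"

Once the key is in place, ssh <username>@<cluster mgmt LIF> should log in without prompting for a password.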

Jim – 09/29/14

@itbycrayon

View Jim Surlow's profile on LinkedIn (I don’t accept general LinkedIn invites – but if you say you read my blog, it will change my mind)

Shellshock / Bashbug quick check

Black Diamond

Given the latest news on the Shellshock aka Bashbug vulnerability, I modified a public command line check.
Backstory:  Unix systems (including Linux and Mac OS X) have shells for their command line windows.  Bash is common.  A vulnerability was found, and it has fairly large implications.  More detail is available online.

My modification is based on the widely circulated public one-liner check.
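
For reference, that base check (not the modified script) is the one-liner below; a vulnerable bash prints "vulnerable":

env x='() { :;}; echo vulnerable' bash -c "echo this is a test"

A patched bash prints only "this is a test", along with a warning that the function definition attempt was ignored.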

Jim – 09/26/14 @itbycrayon View Jim Surlow's profile on LinkedIn (I don’t accept general LinkedIn invites – but if you say you read my blog, it will change my mind)

Lack of Tech Workforce Diversity in Silicon Valley – my $0.02

Green Ball

Earlier today, a Wall St. Journal tech blog published stats showing that a large majority of workers at well-known Silicon Valley tech companies are white or Asian.  This follows news over the last several weeks in which tech companies have acknowledged this.
The question is:  Is this a problem?
And the next:  If so, can it be solved?
And lastly:  If so, what is the one solution or what are the multiple solutions to the problem?
I'd argue that it is a problem.  The world is in a knowledge economy, and the more Americans that can participate in the knowledge economy, the better for America.  The lack of diversity reflects a lack of participation in the field, and thus portions of the country not participating in the economy as fully as possible.
Yes, there is extrapolation going on here – large companies predominantly housed in Silicon Valley are being used as a proxy for all tech, and tech as a proxy for the best portions of the nation's economy.
But when they say that small companies grow the economy, it isn't someone selling stamps or vitamins; it is venture-backed companies like the beginnings of Facebook and such.
Tech companies start with some tech guys with an idea.  They borrow.  Then they go for venture capital.  Venture Capitalists want to ensure that the plan is sound and/or that they have some proven leadership.  The companies try to staff up with the best staff they can.
Meanwhile, the tech companies are in fierce competition for talent (except when they collude to keep wages down). So, tech companies in Silicon Valley have glorious headquarters and are willing to shuttle staff down from San Francisco.
So, when selecting candidates from college, what would tech companies look for?  Graduates with STEM degrees, of course.  And what does that diversity look like?  According to this site, in 2011 75% of Comp Sci grads were white or Asian.
In addition, among those who start college pursuing STEM degrees, underrepresented minorities are less successful in completing those programs than others.  And this can be tied to how they perform in high school.  Minorities are known not to perform as well.  In 2013 it was said, "This year only 15 percent of blacks and 23 percent of Latinos met or exceeded the SAT benchmark for college and career readiness."
So, this does not really seem to be a problem with the tech companies.  You don’t hear how NFL teams aren’t recruiting enough from the Ivy League.  Going back to the question:  Is this a problem?  Yes.  More specifically, is it the tech companies’ problem?  No.
Can the problem of minority participation in tech be solved?  Maybe.  It needs to be addressed in earlier years.  In high school and earlier, logic and cause & effect need to be taught.  Taking on the subject of the problems with public schools is beyond this blog, but the point is that the diversity outcomes in tech are the result of issues long before candidates get to employers.

Off soapbox,

Jim – 06/19/14 @itbycrayon View Jim Surlow's profile on LinkedIn (I don’t accept general LinkedIn invites – but if you say you read my blog, it will change my mind)

IT Operational Excellence: Lone Ranger to NFL to CSI or is it marching band

Blue Square

In 1993, Frederik Wiersema et al. wrote their Harvard Business Review piece on Customer Intimacy, Operational Excellence, and Product Leadership.  IT Operations departments commonly focus on Operational Excellence.  And Change Management tends to be a common thread for avoiding the operational issues that arise during maintenance windows.

My intention was to quote statistics on human error during maintenance windows, but I found the statistics to be too specific to particular disciplines (e.g. telephony, data center).  So, trust me when I say it is easy to envision that managers would prefer less human error than average during maintenance windows or other types of change.  Certainly, downtime is something everyone wishes to avoid.  Microsoft did a good job explaining types of downtime.

I used to hear stories of C-level execs saying after an outage, "The Navy trains cadets to operate nuclear submarines, so why can't we get IT professionals not to cause outages?"

Let me start with how bureaucracies are formed.  Organizational maturity requires different skill sets.  Until there is enough organizational size, there is unique knowledge and thus the Lone Rangers emerge (forgive the oxymoron of a plural Lone Rangers).

Starting off, there needs to be an expert, Lone Ranger, who still might be a jack-of-all-trades.  “Hey, we need someone to do <blank>”.  At this point, there isn’t much operational rigor as that organization probably is not too sophisticated.  It is possible that the person who is responsible doesn’t even write anything down, they just execute when need be.  They evaluate risk, evaluate the solution, and decide.

Next, another person is added to the responsibility of the technology.  At this point, coordination may just be yelling over the cubicle wall – “Hey, I’m going to change this.”

As more people are added, the change management becomes a bit more sophisticated, as multiple people need to be notified.

Then the enterprise becomes more complex with more users, more dependencies, and/or more interactions.  So, change control now comes into place.  The Lone Ranger mentality no longer works.  “Is risk assessed properly?”  “Who is responsible and is that up to their pay grade?”

Enter the CSI Lab Technician.

It could be after the environment has grown, or it could be after the organization has entered a new audit scope, that significant operational rigor is added.  When a company falls under audit scope, for instance Sarbanes-Oxley (SOX), the Payment Card Industry (PCI) standard, or the Health Insurance Portability and Accountability Act (HIPAA), then more rigor must be applied.  Another body (usually the auditor) is trying to ensure that all the requirements are being performed to a certain standard.

In "CSI: Crime Scene Investigations", one sees the scientists in the lab analyzing trace evidence, usually under some pressure to analyze the sample because it is from the suspect in the interrogation room that they've been chasing all day.  Well, in real life, I doubt that the lab techs know the names of those they are sampling – they need to maintain neutrality and not be biased, because bias tends to get things thrown out in court, and there are legal standards.  Also, to withstand legal scrutiny, there are standard procedures for handling evidence.  For the chemist, there are standard procedures for how samples are placed under the microscope, so that they aren't dropped or contaminated.

I worked with a former chemist who transferred into IT.  I'd want him to switch between Excel and Word.  Rather than have them up simultaneously and task switch between them, he would go through the same routine:  File/Save.  File/Close.  File/Exit.  Then open the next program.  I could accept his concerns about RAM shortage given his vintage of hardware – but I struggled to be patient.  "You could just click the 'x' and it'll prompt you to save, then it will close it out."  "Yes, but I feel more comfortable doing it this way."  An adherence to procedure provided comfort.

Prior to this, I mentored two student workers.  One was a Computer Science major, the other a Biology major.  They were both very good.  I was always entertained with handing them the same hard problem to solve.  The computer science major was very intuitive in his problem solving — randomly trying different solutions based upon hunches and feel.  The biology major would attack problems very sequentially – trying the most frequent solution to similar problems first, then the next, and so on.

In my experience, computer programmers and engineers are much more geared to their careers because of the problem solving aspect of the jobs.  What has made them successful through college and early part of their careers has been the Lone Ranger aspect:  Identify the problem quickly and solve the problem.  But, now with rigorous change control, the organization is looking for methodical, repeatable, standardized solutions.  There ends up being an incongruity between the personality of the normal IT worker and the job to be performed.

In The Leadership Pipeline: How to Build the Leadership Powered Company, Ram Charan, Stephen Drotter, and Jim Noel discuss that when individuals move from leadership tier to leadership tier (individual contributor to manager to director and higher), they need to utilize different skills at each tier, and not the skills that helped them succeed at the last one.  In a similar vein, I posit that when significant changes come to an operating environment, IT workers and IT teams need to modify their skill sets to provide Operational Excellence.

When such changes are mandated, of course, it is important that teams be supplied with the resources necessary to be successful whether that be training or equipment.  And managers need to identify that the responsibilities have changed and communicate that to their staff accordingly.

Enter the football game

When one watches the NFL, it seems that even though these professionals are paid 6, 7, or 8 figures a year, you will still see dumb penalties.  These players have probably played football since Pop Warner as youths, yet you still see the occasional 12-men-on-the-field penalty by the defense prior to a field goal attempt.  How hard is it to get the right personnel on the field?  Or how hard is it for the offensive line not to false start – they know the signal for the ball snap.  So, there are still mental errors by professionals.  [I drafted this before the last game by the AFC-leading Broncos, where they were caught with 12 men on the field 3 times!  Once they avoided the penalty by calling a timeout before getting flagged.]

An NFL football game has changes on every play:  Different formations, different routes, and different yardage goals.  And during the snap count, maybe the quarterback changes the play because he doesn’t like the defense that he sees.  When things go bad after the snap, receivers may have to break off routes.  Lots of change – every single play.  And it doesn’t always go right.

Alternatively, there are the halftime routines.  For high school & college, there are the marching bands.  Everyone has their own place and may have unique music.  Zero improvising is required, as all of this is planned out ahead of time.  See this video for an example of the coordination required:  http://www.youtube.com/watch?v=DNe0ZUD19EE

Both the football game and the halftime routines require much practice.  The difference is where improvising is required.  The trick for Operational Excellence in IT is to ensure that maintenance windows have more rehearsal and less improvising, and that there is time to practice.  That rehearsal and discipline may be contrary to the methodologies of some IT workers.

I also recognize that the discipline to rehearse and to duplicate environments is easier said than done – lab environments struggle to perfectly match production, simulated workloads are difficult to match as well, and testing time is hard to come by.  However, the organizations that strive to drive human error out of their maintenance events decide it is better to spend the resources ahead of time, as opposed to reacting after the fact and spending potentially just as many resources post mortem.

Jim – 12/16/13

@itbycrayon

View Jim Surlow's profile on LinkedIn (I don’t accept general LinkedIn invites – but if you say you read my blog, it will change my mind)

NetApp 7-mode ssh key config via CLI w/o NFS or CIFS

Double Black Diamond

Configuring NetApp to use SSH with keys, without having the root volume holding /etc NFS exported or CIFS shared, can be convoluted.

Before I get to the steps, let me list the assumptions:

  1. The steps below will be for a non-root user
  2. Root/Administrator privs are available to the user who is setting this up.
  3. The SSH key for the non-root user has already been generated on the client system.
  4. The SSH public key can be copied and pasted from something reading the key file (e.g. xterm or Notepad) into a shell window with a CLI login to the filer (e.g. xterm or PuTTY).

Basically, the trick is to set up the empty user directories, since there isn't a command to create directories.  Obviously, with NFS or CIFS, the directory could be made fairly easily.

  1. Log into the filer via CLI with appropriate privileges.
  2. # go into advanced mode
    • priv set advanced
  3. # find an empty directory using ls – in some cases, /home/http may be empty.
    • ls /home/http
  4. # check ndmpd status
    • ndmpd status
  5. # if ndmp is not on, turn it on.
    • ndmpd on
  6. # When using ndmpcopy, the shortcut of dropping /vol/<root volume> does not work for the destination
    • ndmpcopy /home/http /vol/<root volume>/etc/sshd/<username>
      ndmpcopy /home/http /vol/<root volume>/etc/sshd/<username>/.ssh
  7. # Create the text file with wrfile, paste the key(s) from your other window, and then press ctrl-c
    • wrfile /vol/<root volume>/etc/sshd/<username>/.ssh/authorized_keys
  8. # if ndmpd was off, turn it off.
    • ndmpd off
  9. # ndmpd creates a restore_symboltable file.  For cleanliness, need to remove that.
    • rm /vol/<root volume>/etc/sshd/<username>/restore_symboltable
    • rm /vol/<root volume>/etc/sshd/<username>/.ssh/restore_symboltable

Shortcut: if a user has already been set up, then their ssh keys and directory structure can be copied, which saves some steps.
Warning: Technically, the permissions (unix or Windows ACLs) are going to follow with the ndmpcopy, so there is a security risk here, if /etc is NFS mounted or CIFS shared. Keep that in mind.

  1. # check ndmpd status
    • ndmpd status
  2. # if ndmp is not on, turn it on.
    • ndmpd on
  3. # When using ndmpcopy, the shortcut of dropping /vol/<root volume> does not work for the destination
    • ndmpcopy /vol/<root volume>/etc/sshd/<existing user with ssh keys> /vol/<root volume>/etc/sshd/<new ssh user>
  4. # Create the text file with wrfile, paste the key(s) from your other window, and then press ctrl-c
    • wrfile /vol/<root volume>/etc/sshd/<new ssh username>/.ssh/authorized_keys
  5. # if ndmpd was off, turn it off.
    • ndmpd off
  6. # ndmpd creates a restore_symboltable file.  For cleanliness, need to remove that.
    • rm /vol/<root volume>/etc/sshd/<new ssh username>/restore_symboltable
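
With either procedure complete, a quick check from the client (assuming the user's role allows a read-only command such as version) is:

ssh -i ~/.ssh/id_rsa <username>@<filer> version

If the keys are set up correctly, the Data ONTAP version string comes back without a password prompt.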

Jim – 11/18/13

@itbycrayon

View Jim Surlow's profile on LinkedIn (I don’t accept general LinkedIn invites – but if you say you read my blog, it will change my mind)