With a centralized storage array, there can be front-side limitations (between the array and the host or client) and back-side limitations (the actual disks in the storage array).
The problem is that, from the storage array’s point of view, the workloads at any given moment are random, and the details of those workloads are invisible to the array. So how to alleviate load on the array has to be determined from the client side, not the storage side.
Take, for example, a VMware environment with NFS storage on a NetApp array:
Each ESX host runs some number of VMs, and every ESX host mounts the same export from the NetApp array.
Let $I_A(t)$ = the storage array’s front-side IOPS load at time $t$.
Let $h_n(t)$ = the IOPS generated by ESX host $n$ at time $t$, where $n$ runs over all of the ESX hosts.
The array’s front-side IOPS load at time $t$ equals the sum of the IOPS loads of all ESX hosts at time $t$:
$$I_A(t) = \sum_{n} h_n(t)$$
An ESX host’s IOPS load at time $t$ equals the sum of the IOPS of each VM $m$ running on that host at time $t$:
$$h_n(t) = \sum_{m} \mathrm{VM}_m(t)$$
A VM’s IOPS load at time $t$ equals the sum of its read IOPS and write IOPS at time $t$:
$$\mathrm{VM}(t) = R(t) + W(t)$$
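The rollup from VMs to hosts to the array can be sketched in a few lines of Python. This is a minimal illustration, not a monitoring tool: the host names, VM figures, and dictionary layout are hypothetical, and every VM’s read and write IOPS are assumed to have been sampled at the same instant t.

```python
from typing import Dict, List

# One VM's sample at instant t: read and write IOPS (hypothetical layout).
VMSample = Dict[str, float]


def vm_iops(vm: VMSample) -> float:
    """VM(t) = R(t) + W(t)."""
    return vm["reads"] + vm["writes"]


def host_iops(vms: List[VMSample]) -> float:
    """h_n(t): sum of VM(t) over the VMs running on one ESX host."""
    return sum(vm_iops(vm) for vm in vms)


def array_front_side_iops(hosts: Dict[str, List[VMSample]]) -> float:
    """I_A(t): sum of h_n(t) over every ESX host mounting the export."""
    return sum(host_iops(vms) for vms in hosts.values())


# Hypothetical hosts and VM figures, all sampled at the same instant t.
hosts = {
    "esx01": [{"reads": 120.0, "writes": 80.0}, {"reads": 30.0, "writes": 10.0}],
    "esx02": [{"reads": 200.0, "writes": 150.0}],
}
print(array_front_side_iops(hosts))  # 590.0
```

The array only ever observes the final sum; the per-host and per-VM terms behind it are the detail that is invisible from the storage side.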
Read IOPS are composed of well-formed reads and not-well-formed reads. “Well-formed reads” are reads that do not incur a penalty on the back side of the storage array. “Not-well-formed reads” generate anywhere between 1 and 4 additional IOs on the back side of the storage array.
Let $r_1$ = well-formed IOs (no additional back-side IO).
Let $r_2$ = IOs that cause 1 additional IO on the back side of the array.
Let $r_3$ = IOs that cause 2 additional IOs on the back side of the array.
Let $r_4$ = IOs that cause 3 additional IOs on the back side of the array.
Let $r_5$ = IOs that cause 4 additional IOs on the back side of the array.
Then
$$R(t) = a\,r_1(t) + b\,r_2(t) + c\,r_3(t) + d\,r_4(t) + e\,r_5(t)$$
where $a + b + c + d + e = 100\%$ and $a, b, c, d, e > 0$,
and, with writes classified the same way into $w_1, \dots, w_5$,
$$W(t) = f\,w_1(t) + g\,w_2(t) + h\,w_3(t) + i\,w_4(t) + j\,w_5(t)$$
where $f + g + h + i + j = 100\%$ and $f, g, h, i, j > 0$.
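A minimal sketch of the front-side split, assuming a through e (or f through j for writes) are each class’s share of the total read (or write) IOPS, so that $a\,r_1(t)$ is the well-formed portion of R(t). The function name and the sample mix are hypothetical; the checks mirror the constraints above (five shares, each above zero, summing to 100%).

```python
import math
from typing import List, Sequence


def split_by_class(total_iops: float, shares: Sequence[float]) -> List[float]:
    """Split a front-side IOPS total (R(t) or W(t)) into its five classes.

    shares is (a, b, c, d, e) for reads or (f, g, h, i, j) for writes.
    Class 1 is well formed; classes 2-5 cause 1-4 additional back-side IOs.
    """
    if len(shares) != 5:
        raise ValueError("expected exactly five class shares")
    if any(s <= 0 for s in shares):
        raise ValueError("each share must be greater than 0")
    if not math.isclose(sum(shares), 1.0):
        raise ValueError("shares must sum to 100%")
    return [s * total_iops for s in shares]


# Hypothetical read mix at one instant: 1000 read IOPS in total.
read_shares = [0.60, 0.20, 0.10, 0.06, 0.04]
per_class = split_by_class(1000.0, read_shares)
print([round(x, 3) for x in per_class])
# The per-class IOPS always add back up to the front-side total.
print(math.isclose(sum(per_class), 1000.0))
```

Whatever the mix, the five per-class figures add back up to the same front-side total, so the total alone says nothing about the shares behind it.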
Now for the back-side IOPS. (I’m ignoring block size here; including it would just add a factor of the array block size divided by the I/O block size.) The only difference is that each class is weighted by the total number of IOs it produces on the back side: the original IO plus its additional IOs.
$$R_{\text{back}}(t) = a\,r_1(t) + 2b\,r_2(t) + 3c\,r_3(t) + 4d\,r_4(t) + 5e\,r_5(t)$$
and
$$W_{\text{back}}(t) = f\,w_1(t) + 2g\,w_2(t) + 3h\,w_3(t) + 4i\,w_4(t) + 5j\,w_5(t)$$
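Under the same assumption (the coefficients are class shares of a single front-side total), the back-side load is the front-side load times the weighted sum 1·a + 2·b + 3·c + 4·d + 5·e, with block size ignored as above. The sketch below compares two hypothetical read mixes that present an identical 1,000 IOPS to the front side; the mixes and the function name are illustrative, not measured.

```python
import math
from typing import Sequence

# Total back-side IOs generated per front-side IO in classes 1..5:
# the original IO plus 0, 1, 2, 3 or 4 additional IOs.
IO_WEIGHTS = (1, 2, 3, 4, 5)


def back_side_iops(front_iops: float, shares: Sequence[float]) -> float:
    """Back-side IOPS for reads or writes, ignoring block size.

    Computes front_iops * (1*a + 2*b + 3*c + 4*d + 5*e).
    """
    if len(shares) != len(IO_WEIGHTS):
        raise ValueError("expected exactly five class shares")
    if not math.isclose(sum(shares), 1.0):
        raise ValueError("shares must sum to 100%")
    return front_iops * sum(w * s for w, s in zip(IO_WEIGHTS, shares))


# Two hypothetical read mixes with identical front-side load (1000 IOPS).
mostly_well_formed = [0.96, 0.01, 0.01, 0.01, 0.01]
evenly_spread = [0.20, 0.20, 0.20, 0.20, 0.20]
print(round(back_side_iops(1000.0, mostly_well_formed), 3))
print(round(back_side_iops(1000.0, evenly_spread), 3))
```

The mostly well-formed mix works out to about 1,100 back-side IOPS and the evenly spread mix to about 3,000, nearly three times as much for the same front-side load.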
Since the array cannot know the values of $a$ through $j$ in advance, it cannot determine the effect of an additional amount of IO. Likewise, it cannot determine whether the host(s) are going to send sequential or random IO. With $n$ machines writing concurrently, the aggregate IO will trend toward random, because the likelihood of $n-1$ systems being quiet while one is sending sequential IO is low.
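That last probability argument can be made concrete under a simplifying assumption that is not from the original: each of the n hosts is independently active with probability p at a given instant. The chance that exactly one host is active while the other n-1 are quiet is then $n\,p\,(1-p)^{n-1}$, which roughly bounds the chance of the array seeing one clean sequential stream, since the one active host also has to be sequential. The function name and the value p = 0.5 are hypothetical.

```python
def prob_exactly_one_active(n: int, p: float) -> float:
    """Probability that exactly one of n hosts is active and n-1 are quiet.

    Assumes (hypothetically) that each host is active independently with
    probability p at the sampled instant: n * p * (1 - p) ** (n - 1).
    """
    if n < 1 or not 0.0 <= p <= 1.0:
        raise ValueError("need n >= 1 and 0 <= p <= 1")
    return n * p * (1.0 - p) ** (n - 1)


for hosts in (2, 8, 16, 32):
    print(hosts, prob_exactly_one_active(hosts, 0.5))
```

With p = 0.5 the value drops from 0.5 at two hosts to about 0.03 at eight and about 0.0002 at sixteen. For any fixed p > 0 the $(1-p)^{n-1}$ term eventually dominates, although for mostly idle hosts (small p) the decline starts later.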
Visibility into host-side behavior, gathered from the host side, is required.
Jim – 10/01/14
(I don’t accept general LinkedIn invites – but if you say you read my blog, it will change my mind)