With HPE Simplivity, each VM has its storage mirrored on one other specific host in the cluster. As an example, in a 3-host VMware cluster, a VM on ESXi host #1 will have a mirrored copy of its storage on ESXi host #2, but no data on ESXi host #3. Unlike other HyperConverged Infrastructure products in the marketplace, Simplivity mirrors the VM as opposed to striping the VM across other nodes in the cluster.
This has its advantages and disadvantages.
One result is that each node in the cluster ends up with different storage consumption, as the blend of VMs on each host varies. In the following example:
- ESXi Host #1 could have VMs: a, d, e, f
- ESXi Host #2 could have VMs: a, b, c, e
- ESXi Host #3 could have VMs: b, c, d, f
As a result, at any given time, 2 nodes will be closer to full than the remainder. “Closer” could be significant or insignificant, but the point is relevant to the script that I’ll include below.
The Simplivity storage environment is controlled by what are called Omnistack Virtual Controllers (“OVCs” for short). Each ESXi host has a dedicated OVC, but the OVCs in a VMware cluster work together.
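To make the example concrete, here is a minimal sketch that works out which VMs keep a copy on both of two given hosts. It only uses the made-up placement from the list above; the `placement` variable and the `shared_vms` function are names invented for this illustration and are not part of any Simplivity tooling.

```shell
# Hypothetical placement from the example above: one line per host,
# the host number followed by the VMs that keep a copy there.
placement='1 a d e f
2 a b c e
3 b c d f'

# Print the VMs that have a copy on both host $1 and host $2.
shared_vms() {
  printf '%s\n' "$placement" | awk -v h1="$1" -v h2="$2" '
    $1 == h1 || $1 == h2 { for (i = 2; i <= NF; i++) seen[$i]++ }
    END { for (vm in seen) if (seen[vm] == 2) print vm }
  ' | sort | paste -sd' ' -
}

shared_vms 1 2   # prints: a e
```

If hosts #1 and #2 were the two fullest nodes, VMs a and e would be the ones whose data (and backups) affect both of them.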
(I don’t know the Simplivity interface from Hyper-V, so what follows is VMware specific.) From vSphere, it is cumbersome to determine where Simplivity stores copies of a VM. Clearly, if ESXi Host 1 is nearing full, one will see a VMware alarm and one could go to the host and see the VMs with a primary copy of storage on that host – but what about a VM with a secondary (redundant mirrored) copy? There would have to be some deciphering of the DRS rules.
Simplivity offers a command line tool through the OVC which will list all the VMs in the cluster and where the primary and secondary copies are stored and the VM size: dsv-balance-manual
The drawbacks of this tool are: first, one can only see the size of the VM, not the size of the VM and all its associated backups. Second, it does not report remote backups copied to the hosts from another cluster, if there are any.
When storage runs tight, removing backups is the most likely path forward. Having a total size for a VM that includes its associated backups would be very helpful, but with de-duplication and compression across the VMs on a host, and with backups expiring at different intervals, this is very difficult to calculate and would only be valid at the moment it was calculated.
The first step to gaining space would be to search the backups on the cluster to determine if there are any backups which lack an expiration date, and remove as necessary.
The second step might be to identify which VMs are shared on the 2 most full hosts.
It is fairly easy to eyeball the output of dsv-balance-manual, but when one runs it often and there are many VMs, human error can kick in. I wrote the following CLI command pipeline to do this:
node=(`sudo /var/tmp/build/dsv/dsv-balance-show --ShowNodeIndex | sed 's/\(.B \)/ \1/;s/\.\([0-9][0-9]\) TB/\10 GB/' | awk '/^\| Node [0-9]/ {print $3,$15}' | sort -nr +1 | awk 'BEGIN {z=10^6}{b=a;a=$1;if ($1 < z) {z=$1}} END {print a-z+1,b-z+1}'`) ; cl=(`svt-federation-show | awk -F"\| " '/Alive/ {if ($3 ~ /^[a-zA-Z]/) {x=$3};{a=x} ; if ($4 ~ /^[a-zA-Z]/) {y=$4};{b=y} ; print a,b,$9}' | grep \`ifconfig eth0 | awk '/inet/ {print $2}'\``) ; sudo /var/tmp/build/dsv/dsv-balance-manual --datacenter ${cl[0]} --cluster ${cl[1]} > /dev/null ; awk -F, -v n=${node[0]} -v m=${node[1]} '/\]/ {offset=2 ; if (($(n+offset) ~ /s|p/) && ($(m+offset) ~ /s|p/)) print $(NF-2),$(NF-1)}' /tmp/balance/replica_distribution_file_${cl[0]}.csv | sort -n
Had I known it would be so long, I probably would have written a script, but then the script would have to be pushed to all OVCs on all the hosts, and because the Simplivity upgrade procedure wipes out the OVCs, the script would have to be pushed again after every upgrade.
Documentation:
This needs to be run on an OVC in the cluster that is short on space (i.e. won’t work across clusters).
Create the node array – 2 entries with the 2 nodes in the cluster with the least available space.
node=(`
Run the Simplivity command as root to determine which nodes lack space. Its output includes the space remaining (as opposed to consumed).
sudo /var/tmp/build/dsv/dsv-balance-show --ShowNodeIndex
Add a space between digits of storage and label of storage, and convert TB to GB by removing the decimal point and adding a zero
| sed 's/\(.B \)/ \1/;s/\.\([0-9][0-9]\) TB/\10 GB/'
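As a rough illustration of what this sed step does, the sketch below runs the same expression over two made-up fragments. The sample strings are assumptions; the real dsv-balance-show layout may differ. The `normalize_units` wrapper is a name introduced only for this example.

```shell
# Same sed expression as in the pipeline, wrapped in a function.
normalize_units() {
  sed 's/\(.B \)/ \1/;s/\.\([0-9][0-9]\) TB/\10 GB/'
}

# Made-up sample fragments, not real dsv-balance-show output.
printf '%s\n' '| 1.23TB |' '| 850GB |' | normalize_units
# prints:
# | 1230 GB |
# | 850 GB |
```

Note that the conversion relies on the TB figure having exactly two decimal places, and that only the first unit label on each line gets the extra space, since the first substitution has no g flag.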
Find the lines with only details about the nodes (throw out the headers) and only print the node index and the space remaining (the label above is now discarded).
| awk '/^\| Node [0-9]/ {print $3,$15}'
Sort numerically by space remaining in descending order, so the nodes with the least space end up last.
| sort -nr +1
Print the last 2 entries and reset the node numbers so that they count from 1. Depending on retired equipment, the node indexes in the output of the earlier Simplivity command might not start from 1. (I am unsure how this behaves on a cluster with fewer than 2 hosts, but one would not need to run this if there was only 1 host.)
| awk 'BEGIN {z=10^6}{b=a;a=$1;if ($1 < z) {z=$1}} END {print a-z+1,b-z+1}'
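The sort and renumbering steps can be tried in isolation on made-up data, as in the sketch below. The input pairs are invented, the `two_fullest` function name is hypothetical, and `sort -k2,2nr` is used here as the current spelling of the older `+1` key syntax, which newer versions of sort may reject.

```shell
# Keep the two entries with the smallest second column and renumber the
# node indexes so that the lowest index present becomes 1.
two_fullest() {
  sort -k2,2nr | awk 'BEGIN {z=10^6}{b=a;a=$1;if ($1 < z) {z=$1}} END {print a-z+1,b-z+1}'
}

# Made-up "node-index space-remaining" pairs; the indexes start at 3 to
# mimic a federation where earlier nodes were retired.
printf '%s\n' '5 900' '3 400' '4 150' | two_fullest
# prints: 2 1
```

Node 4 has the least space remaining and node 3 the next least; after renumbering from the lowest index (3) they become columns 2 and 1.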
This completes the array. The contents of the array are 2 numbers reflecting the nodes which have the least space available.
`);
Set the datacenter and cluster variables. This is a lot of code to include what is already known, but will reduce human error of misspellings. Set the “cl” array (cluster).
cl=(`
Run the Simplivity command to show all the nodes in the federation.
svt-federation-show
Find only the lines that include nodes. Given the output from above, if the datacenter field (#3) is empty, print what was in the line before and if the cluster field (#4) is empty, print what was in the line before. Finally, only print datacenter, cluster, and management IP.
| awk -F"\| " '/Alive/ {if ($3 ~ /^[a-zA-Z]/) {x=$3};{a=x} ; if ($4 ~ /^[a-zA-Z]/) {y=$4};{b=y} ; print a,b,$9}'
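The fill-down idea (reuse the datacenter and cluster from the previous row when the cell is blank) can be sketched on a small made-up table. This is a simplified variant, not the command above: it splits on a plain `|`, trims the cells, and assumes hypothetical column positions that do not match the real svt-federation-show output. The `fill_down` name is also invented for the example.

```shell
# Fill empty datacenter/cluster cells from the previous row, then print
# datacenter, cluster and IP. Column positions here are hypothetical.
fill_down() {
  awk -F'|' '
    function trim(s) { gsub(/^ +| +$/, "", s); return s }
    /Alive/ {
      dc = trim($2); cl = trim($3)
      if (dc != "") last_dc = dc   # remember the most recent datacenter
      if (cl != "") last_cl = cl   # remember the most recent cluster
      print last_dc, last_cl, trim($4)
    }'
}

# Made-up table in which repeated cells are left blank.
printf '%s\n' \
  '| DC1 | ClusterA | 10.0.0.1 | Alive |' \
  '|     |          | 10.0.0.2 | Alive |' \
  '| DC2 | ClusterB | 10.0.1.1 | Alive |' | fill_down
# prints:
# DC1 ClusterA 10.0.0.1
# DC1 ClusterA 10.0.0.2
# DC2 ClusterB 10.0.1.1
```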
Filter the output of the above, keeping only the line that contains the output of what follows.
| grep \`
Determine the management IP of this OVC.
ifconfig eth0 | awk '/inet/ {print $2}'\`
Finalize the array with datacenter, cluster, and IP (the latter won’t be used).
`) ;
Run the Simplivity command to list the VMs, passing the datacenter and cluster so that it can run unattended. The output is sent to /dev/null because the command leaves an output file behind.
sudo /var/tmp/build/dsv/dsv-balance-manual --datacenter ${cl[0]} --cluster ${cl[1]} > /dev/null ;
Parse the output file: /tmp/balance/replica_distribution_file_<cluster name>.csv. Use the 2 variables, n and m, to represent the 2 nodes to search for. In the CSV, the node columns are preceded by 2 other fields (hence the offset of 2), and each node column holds a “p” or “s” for a primary or secondary copy. The number of nodes determines the number of columns. The 2nd to last column is the VM name, shown by $(NF-1). The 3rd to last column is the VM size, shown by $(NF-2).
awk -F, -v n=${node[0]} -v m=${node[1]} '/\]/ {offset=2 ; if (($(n+offset) ~ /s|p/) && ($(m+offset) ~ /s|p/)) print $(NF-2),$(NF-1)}' /tmp/balance/replica_distribution_file_${cl[0]}.csv
Then sort the output numerically in ascending fashion, so the largest VMs are at the bottom.
| sort -n
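The final filter can be exercised on a made-up CSV, as sketched below. The sample rows, the header, and the trailing column are assumptions for illustration only; the real replica distribution file may be laid out differently. The `shared_on_nodes` wrapper is a hypothetical name.

```shell
# Same awk filter as in the pipeline, reading the CSV from stdin.
# $1 and $2 are the 1-based node columns to test.
shared_on_nodes() {
  awk -F, -v n="$1" -v m="$2" '/\]/ {offset=2 ; if (($(n+offset) ~ /s|p/) && ($(m+offset) ~ /s|p/)) print $(NF-2),$(NF-1)}' | sort -n
}

# Made-up rows: two leading fields, one column per node, then size,
# VM name and one trailing field.
printf '%s\n' \
  'id,datastore,node1,node2,node3,size,vm,note' \
  '[1],ds1,p,s,,300,vm-a,x' \
  '[2],ds1,,p,s,120,vm-b,x' \
  '[3],ds1,s,p,,80,vm-e,x' | shared_on_nodes 1 2
# prints:
# 80 vm-e
# 300 vm-a
```

Only the rows with a copy on both node 1 and node 2 survive, and the largest VM ends up at the bottom.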
Given all this, it could be turned into a proper script, which would be cleaner and easier to read.