Factors to consider when evaluating hyperconverged systems

After two recent events around HCI systems, I have spent quite some time explaining how to determine the design factors and decision points when someone is evaluating hyperconverged as a solution. That list will of course never be final, as every case is different; however, if these points are taken as a base, the chosen solution will deliver the proper performance and features it needs.

Know your VM workload in detail (IOPS, RAM, CPU, capacity)

Know your workload. OK, this is true – or should be – for siloed systems too, but it is quite basic: if we know the performance requirements of the workload, we can design a system that can properly host it. Be aware that this is not just a simple report that can be extracted from RVTools; it needs to be more detailed than that, as total RAM usage and vCPU count are good information but lack real depth. Virtual machine load characteristics should be described in every dimension: working set size, read/write ratio, random or sequential I/O, block size, and whether the workload is CPU intensive or not.
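To show what "more detailed than an RVTools export" means in practice, here is a minimal sketch of a per-VM workload descriptor rolled up into cluster-level sizing figures. The field names and sample numbers are purely illustrative, not taken from any vendor tool:

```python
from dataclasses import dataclass

@dataclass
class VMWorkload:
    # Hypothetical per-VM descriptor; all fields are illustrative.
    name: str
    vcpus: int
    ram_gb: int
    used_gb: int          # consumed capacity, not provisioned
    iops: int             # peak front-end IOPS the VM generates
    write_ratio: float    # 0.0 (all reads) .. 1.0 (all writes)
    working_set_gb: int   # hot data the cache layer must absorb

def cluster_requirements(vms):
    """Roll per-VM figures up into the numbers an HCI sizing exercise needs."""
    return {
        "vcpus": sum(v.vcpus for v in vms),
        "ram_gb": sum(v.ram_gb for v in vms),
        "used_gb": sum(v.used_gb for v in vms),
        "iops": sum(v.iops for v in vms),
        "write_iops": round(sum(v.iops * v.write_ratio for v in vms)),
        "working_set_gb": sum(v.working_set_gb for v in vms),
    }

vms = [
    VMWorkload("db01", 8, 64, 500, 4000, 0.7, 120),
    VMWorkload("web01", 4, 16, 100, 800, 0.2, 10),
]
print(cluster_requirements(vms))
```

The write IOPS and working set totals are the figures that drive the cache-tier decisions discussed below; a raw vCPU/RAM sum alone would miss them.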

This is very important in the case of HCI because it is a complex system with layered services, and each layer more or less depends on the one beneath it. So if we don't put in enough RAM, choose the wrong class of SSD for the cache layer, don't add enough capacity to the storage layer, or simply configure a CPU that is not powerful enough, the system will suffer in many ways. Resolving these issues afterwards is not an easy task, since homogeneous configurations are preferred across all cluster members; furthermore, a single change – a different SSD, for example – can easily turn into a procurement that scales up to the full cluster node count.

Hybrid and/or all-flash – dedup/compression?

The achievable IOPS depends heavily on the type of the system – hybrid or all-flash – and the number of participating nodes can change the performance considerably. Some vendors offer nothing but all-flash options – like HPE SimpliVity – some allow certain functions only if all-flash is used and licensed – VMware vSAN, where deduplication and compression are all-flash only – while others include these features out of the box – HPE SimpliVity and Cisco HyperFlex.


This is most important in the build-your-own case. Read it a million times, as I've seen many customers skip this part because they wanted to save money and reuse their 240 GB read-intensive SSDs from 2012… If no vSAN Ready Node is selected and the customer configures the server on his own, it is very important to select and size the cache SSD so that its capacity is at least 10% of the *used* capacity of the capacity layer. This is especially true if vSAN hybrid is on the table. SSD endurance is equally important; earlier this was often described as DWPD (drive writes per day) in specifications, but lately it has shifted to TBW (terabytes written). The majority of these solutions work in such a way that all write I/O always goes through this SSD, so every single bit that is written lands on the cache SSD first, until it is destaged.
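The two rules above are simple arithmetic, so here is a small sketch. The 10% figure follows the text; the DWPD-to-TBW conversion uses the common definition (rated capacity written DWPD times per day over the warranty period), with the 5-year warranty as an assumption:

```python
def cache_ssd_size_gb(used_capacity_gb, ratio=0.10):
    """Rule of thumb from the text: cache tier >= 10% of *consumed* capacity."""
    return used_capacity_gb * ratio

def dwpd_to_tbw(dwpd, capacity_gb, warranty_years=5):
    """Convert the older DWPD endurance rating to a TBW figure
    over the assumed warranty period."""
    return dwpd * (capacity_gb / 1000) * 365 * warranty_years

# 20 TB consumed -> roughly 2 TB of cache SSD across the cluster
print(cache_ssd_size_gb(20_000))
# A 3 DWPD, 800 GB SSD over a 5-year warranty -> roughly 4380 TBW
print(dwpd_to_tbw(3, 800))
```

Since every written bit passes through the cache SSD before destaging, the TBW figure, not the capacity, is what that reused 240 GB read-intensive drive would fail on first.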

VM swap – sparse swap (VSAN)

This again is not really HCI specific, but it should be taken into account in the required capacity calculations too. If we look at a virtual machine's directory on the datastore, we will notice several files, one of which has the extension .vswp. This is the virtual machine swap file; it is always there while the VM is powered on, and its size equals the amount of memory configured for that particular VM. The size of this file can be reduced, even down to zero – the file will disappear – if we reserve some or all memory for that VM, but this kills the consolidation ratio and tells VMware ESXi not to do the fancy/magical memory management it is capable of. So if we have configured 6 TB of RAM for our virtual machines, we need 6 TB of space on the datastore. This is often forgotten. Since HCI systems usually use mirroring, the swap files will be mirrored too, so we are now talking about a 12 TB requirement. A 3-way mirror adds another 6 TB as a must. In the case of vSAN, sparse swap is an option, but it should be used only if physical RAM will never be overprovisioned.

Overhead – CPU and RAM

Every HCI solution has CPU and RAM requirements. Period. If deduplication and compression are in use, they require additional resources. Erasure coding in vSAN also adds some load, so this should be taken into account.

The vSAN RAM requirements, for example:

The vSAN CPU requirement:

Surprisingly, the requirements are in the same ballpark across all vendor solutions. What differs is how they are presented to the administrator. Vendors with their own hypervisor have an advantage here, since all the others need to use a "service/control VM". Since that is a real VM entity, its CPU and RAM usage can be determined just by looking at it. Compare this to vSAN CPU and RAM usage, which is folded into the aggregated host CPU and RAM scale – a scale that is enormous, since hosts today have something like 24 x 2,800 MHz and hundreds of gigabytes of RAM. The order of magnitude is so big that the actual vSAN usage is just two pixels on the chart. This gives customers the feeling and the belief that kernel-integrated is so much better, since the SDS layer requires no visible resources, while the control-VM-based solutions are resource heavy.

HPE is often targeted with the comment that SimpliVity has a hardware component – a card with a special ASIC and some other components – which does the heavy lifting, yet delivers lower performance than the kernel-based solutions. I don't really share this view, as one is simply software based and the other is software combined with hardware. Let me add a picture here of how this card is placed at the disposal of the control VM – the SVC:

The control VM owns the physical card via DirectPath I/O, so the ASIC – and the RAID controller too – bypasses the hypervisor.
The VMware documentation states the following about DirectPath I/O in the context of a different, related component, the network card:

I am not making any statement here; I just want to let you think this through and consider whether this is a valid attack against HPE/Cisco or not.

Slack space

Slack space is the space required just to keep the system up and running. If snapshots are wanted, you need some spare space to be able to create them. Furthermore, it is a must when a node fails and a rebuild is needed. Slack is usually around 30%, and this can be accepted, as no one should run his/her storage at 95% used capacity.
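The effect on usable capacity is worth making explicit. A minimal sketch, assuming the ~30% slack figure from the text is simply carved off the top; the extra reserve for rebuilding a failed node is an assumption modeled as one node's worth of capacity:

```python
def usable_capacity_tb(raw_tb, slack=0.30):
    """Capacity left for VM data once ~30% slack is set aside."""
    return raw_tb * (1 - slack)

def usable_with_rebuild_reserve_tb(raw_tb, nodes, slack=0.30):
    """Also hold back one node's worth of capacity so a rebuild
    after a node failure has somewhere to go (illustrative model)."""
    per_node = raw_tb / nodes
    return (raw_tb - per_node) * (1 - slack)

# A 100 TB raw datastore leaves roughly 70 TB after slack...
print(usable_capacity_tb(100))
# ...and roughly 52.5 TB on a 4-node cluster if one node is held in reserve
print(usable_with_rebuild_reserve_tb(100, nodes=4))
```

Two different deductions, both invisible in the raw-capacity figure on the quote.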

Resiliency factor (vSAN, HyperFlex) – vSAN RAID level too

I touched on this slightly earlier: in the planning phase we need to plan for resiliency and define the level we need. Single node failure is the starting point – unless someone uses RAID0/RF1 on purpose, all vendors can take such an event without any impact. If two-node failure tolerance is required, many solutions need three times the used space as capacity – vSAN erasure coding is an exception.

As an example, a VM with 100 GB of used space will consume 200 GB with one failure to tolerate, and 300 GB with two failures to tolerate.

This can be mitigated by compression and deduplication, and in the case of vSAN, RAID5/6 – erasure coding – can certainly help. In return, these require more nodes – RAID5 requires four nodes – and more CPU and RAM for the storage layer. At the same time, they deliver fewer IOPS and can increase latency, not to mention the license they need.
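The capacity multipliers behind the 100 GB example can be tabulated. The mirror figures come straight from the text; the erasure-coding multipliers assume vSAN's usual 3+1 (RAID5) and 4+2 (RAID6) layouts:

```python
# Raw-capacity multiplier per protection scheme (vSAN terminology assumed).
PROTECTION_MULTIPLIER = {
    "RAID1, FTT=1": 2.0,    # plain mirror, one failure to tolerate
    "RAID1, FTT=2": 3.0,    # three-way mirror, two failures to tolerate
    "RAID5, FTT=1": 4 / 3,  # 3+1 erasure coding, needs at least 4 nodes
    "RAID6, FTT=2": 6 / 4,  # 4+2 erasure coding, needs at least 6 nodes
}

def raw_needed_gb(vm_used_gb, scheme):
    """Raw capacity consumed by a VM under the given protection scheme."""
    return vm_used_gb * PROTECTION_MULTIPLIER[scheme]

for scheme in PROTECTION_MULTIPLIER:
    print(f"{scheme}: 100 GB VM consumes {raw_needed_gb(100, scheme):.1f} GB raw")
```

The same two-failure protection drops from 300 GB (mirror) to 150 GB (RAID6), which is exactly the trade the text describes: capacity back, at the price of nodes, CPU, IOPS and licensing.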


Build your own, Validated system or turnkey ready solution

If you want to do BYO, the technology options are quite limited: VMware vSAN or Microsoft Storage Spaces Direct. I don't really recommend taking this route, as your homepage will surely become the VMware HCL in order to match the HBA firmware version with the driver version, and the SSD firmware to the etc.… I think you understand the complexity here. A hyperconverged solution should make life easier, not a million times more complicated.

The second alternative is vSAN Ready Nodes – there are HPE boxes for MS S2D too – which are joint solutions by the server vendor and VMware. They are validated, performance tested and supported. This sounds much better, but it is still not a turnkey-ready HCI.

Systems that are ready out of the box are usually called integrated solutions. HPE SimpliVity, Dell EMC VxRail, Cisco HyperFlex and Nutanix are in this league. Deployment is usually wizard guided; the wizard provisions and configures the systems – Cisco HyperFlex includes the Fabric Interconnect too – and in about an hour they are ready for VM hosting. A common factor of these solutions is that support is in one hand, so there will be no finger-pointing if there is an issue. Updates and upgrades are validated and tested by the vendor, so simply deploying all updates in a bundle – usually via another wizard workflow – results in a working system that performs as before.

The most important thing I can recommend: start small, but not too small!