My view on VMware Cloud Foundation

Before I go into any details, let me emphasize that I like VMware Cloud Foundation. It is the best way forward for clients who want a cloud on their premises that works much like the vSphere environment they already run.

There is no need to explain what VMware Cloud Foundation is, since there are many articles and tutorials about that already – my blog has some, but in Hungarian. It is enough to say that it is not a product. It is a bundle that uses the well-known VMware products in a way that is “standardized” and tested together.

Products:

  • vSphere ESXi
  • vSphere vCenter Server
  • NSX-T Data Center
  • vRealize Suite
  • vSAN
  • SDDC Manager

With the exception of SDDC Manager, you can build your own solution from these components to resemble VCF, but it will never be exactly the same. The power of VCF comes from the fact that it can be updated with a bundle whose components were tested together by VMware, and VMware stands behind it with support, whatever it takes. It is worth mentioning that if you have a fast-moving infrastructure, then you are probably not the main target, since VMware needs time for validation. If you expect a new VCF version to come out on the very same day a component goes GA with a new version, you will be disappointed.

The update of a VCF deployment starts with the management domain as step one. After that, the workload domains are updated. This will be important later.
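The ordering rule can be sketched in a few lines of Python. This is only an illustration of the sequence described above, not how SDDC Manager implements it; the domain names and the dictionary layout are made up, and the relative order of the workload domains is simply kept as given because nothing above prescribes it.

```python
def update_order(domains):
    """Return domain names in update order: management domain first,
    then the workload domains in the order they were listed."""
    management = [d["name"] for d in domains if d["type"] == "management"]
    workload = [d["name"] for d in domains if d["type"] == "workload"]
    return management + workload


# Hypothetical inventory, deliberately listed out of order.
domains = [
    {"name": "WLD1", "type": "workload"},
    {"name": "MGMT", "type": "management"},
    {"name": "WLD2", "type": "workload"},
]

print(update_order(domains))
```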

It is also important to realize that being allowed to do certain things with a given component in a non-VCF deployment does not mean the same is allowed in a VCF setup. I’ll talk about this later.

Comment 1 – How many NSX instances do we need? One or multiple?

In VMware Cloud Foundation there are two types of domains – I know we can create multiple sub-types:

  • management domain
  • workload domain

The management domain will have NSX-T deployed whatever you select in the deployment workbook. Three NSX appliances will be deployed to the management domain even if you don’t want to use application virtual networking (AVN).

For the first workload domain another set of three will be added. They run in the management domain but serve the first workload domain. For every subsequent workload domain you can either consume this existing instance or dedicate a separate NSX Manager instance to each of them.
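As a rough sizing aid, the appliance count can be written down as simple arithmetic. The sketch below assumes that every additional dedicated NSX Manager instance is also a set of three appliances, like the first two; that is my assumption, and the function name is hypothetical.

```python
APPLIANCES_PER_INSTANCE = 3  # one set of three, as described above


def nsx_appliance_count(workload_nsx_instances):
    """NSX appliances running in the management domain.

    workload_nsx_instances: number of distinct NSX instances serving
    workload domains (0 if no workload domain exists yet).
    Assumption: each of those instances is a set of three appliances.
    """
    if workload_nsx_instances < 0:
        raise ValueError("instance count cannot be negative")
    management_instance = 1  # always deployed with the management domain
    return APPLIANCES_PER_INSTANCE * (management_instance + workload_nsx_instances)


print(nsx_appliance_count(0))  # management domain only
print(nsx_appliance_count(2))  # two NSX instances serving workload domains
```

Workload domains that share an NSX instance count once, so three workload domains on two instances still give the second result.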

If you know NSX-T, you also know that an overlay transport zone defines the span of segments. If two hosts are members of the same overlay TZ, they will have the same segments defined, and a VM attached to one of those segments can talk layer 2 to another VM in the same segment on another host.
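A toy model makes the span rule concrete. This is not the NSX-T API, just plain dictionaries; all host, segment and transport zone names are invented, and for simplicity each host belongs to exactly one overlay TZ.

```python
# Hypothetical inventory: host -> overlay transport zone it is a member of.
host_tz = {"esx-a": "TZ_1", "esx-b": "TZ_1", "esx-c": "TZ_2"}

# Hypothetical segments: segment -> overlay transport zone it was created in.
segment_tz = {"seg-web": "TZ_1", "seg-db": "TZ_2"}


def segments_on_host(host):
    """Segments visible on a host: everything in the host's overlay TZ."""
    tz = host_tz[host]
    return sorted(seg for seg, zone in segment_tz.items() if zone == tz)


def l2_adjacent(host_x, host_y, segment):
    """True if VMs on both hosts can attach to the segment, i.e. both
    hosts are members of the overlay TZ that the segment lives in."""
    tz = segment_tz[segment]
    return host_tz[host_x] == tz and host_tz[host_y] == tz


print(segments_on_host("esx-a"))
print(l2_adjacent("esx-a", "esx-b", "seg-web"))
print(l2_adjacent("esx-a", "esx-c", "seg-web"))
```

In this model esx-a and esx-b share seg-web, while esx-c never sees it because it sits in a different overlay TZ.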

Above you can see two overlay zones:

  • TZ_WLD1_OVERLAY – the boundary is the same as that of its workload domain.
  • TZ_WLD2/3_OVERLAY – this spans over WLD2 and WLD3.

Any segment you create in WLD2 or WLD3 will be available in all clusters of both workload domains. In a VCF deployment all clusters/workload domains that use the same NSX instance – as WLD2 and WLD3 do – will be part of the SAME overlay. This immediately rules out isolation. I know the distributed firewall is there, but strictly speaking a workload that belongs to a distinct system placed on WLD3 can also run in WLD2. An update in the setup above also affects both WLD2 and WLD3, as they consume the same NSX Managers.
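The blast radius of a shared NSX instance can be expressed as a simple grouping. The sketch below only mirrors the WLD1/WLD2/WLD3 example; the NSX instance names are hypothetical and nothing here talks to SDDC Manager or NSX.

```python
from collections import defaultdict

# Workload domain -> NSX instance serving it (instance names are made up).
wld_nsx = {"WLD1": "nsx-wld1", "WLD2": "nsx-shared", "WLD3": "nsx-shared"}


def domains_by_nsx(mapping):
    """Group workload domains by the NSX instance they consume. Domains
    in the same group share one overlay and therefore the same segments."""
    groups = defaultdict(list)
    for wld, nsx in mapping.items():
        groups[nsx].append(wld)
    return {nsx: sorted(wlds) for nsx, wlds in groups.items()}


def affected_by_nsx_update(mapping, wld):
    """Workload domains touched when the NSX instance serving `wld` is updated."""
    return sorted(w for w, nsx in mapping.items() if nsx == mapping[wld])


print(domains_by_nsx(wld_nsx))
print(affected_by_nsx_update(wld_nsx, "WLD2"))
```

Updating NSX for WLD2 pulls in WLD3 as well, while WLD1 with its own instance stays on its own schedule.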

If you run NSX-T outside of VCF, you can even have clusters attached to multiple overlay transport zones – they will need physical interfaces for that – or have certain hosts/clusters in one NSX instance running different overlay TZs. The choice is yours, but in VCF you are not allowed to do the same.

Comment 2 – WLD for Edge Cluster

A month ago I received a comment from an engineer working at VMware: if I am so keen to have a dedicated vSphere cluster running my Edge VM workloads, I might want to consider a workload domain dedicated to Edge VMs.

Well… if you look at the picture above and translate it to this option, it will look like this.

I have no idea if this is possible to provision, but I don’t like it. If you don’t have enough hosts in the vSphere cluster of the Edge WLD – at least six in the example – the uplink bandwidth will be shared by multiple Edge VMs. It does not resolve isolation either: all workload domains will have “access” to all segments. If, in my view, a workload domain is one distinct system – like an Oracle license island – where I want full isolation, this does not fulfil that.

The upgrade sequence is even worse, since the whole infrastructure would be affected by it.

If you run pure NSX-T, this is not an issue; you can build this easily, but the upgrade would be pretty much the same as above.

Comment 3 – Edge VM failover in multi availability zone

It was a surprise to me that in a multi-AZ setup VCF requires stretched layer 2 for many VLANs. I would accept that for some of them, but surely not for Edge VM uplink networking.

It wants to do full mesh peering…

Here is the documentation

https://docs.vmware.com/en/VMware-Validated-Design/6.1/sddc-architecture-and-design-for-the-management-domain/GUID-C924E896-D9C4-47BF-91D5-DF72605EF63E.html

These networks are required because the system will try to fail over the dead Edge VM to the surviving availability zone. This is exactly what you should never do. It is far better to have an Edge VM already running in the other availability zone and let the T0/T1 instances fail over. If you have a carefully sized edge cluster – if you don’t run VCF – resiliency can be restored in the other availability zone once the missing T0/T1 standby parts are healed onto a “free” Edge VM.

https://docs.vmware.com/en/VMware-Validated-Design/6.0/sddc-architecture-and-design-for-a-virtal-infrastructure-workload-domain/GUID-A8C6A1A4-AEF3-40BA-82E7-E10349C3935F.html

Comment 4 – Full mesh vs non full mesh

If you do Edge networking in a pure NSX-T setup, you can run non-full mesh, meaning that one Edge VM – running a specific T0 – will have two uplinks, or one, towards a peer that has no connection to the other Edge VM at all. In VCF you can’t do this, as the bring-up and later the SDDC Manager initiated workflow will ask for two IPs that will be the uplink peers, and four IPs in those two VLANs. Period. This is full mesh.
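The difference can be counted with a small sketch. It reflects my reading of the numbers above: two Edge VMs, two uplink peers (one per uplink VLAN), and one Edge IP per Edge VM per VLAN. The names edge-1, edge-2, tor-a and tor-b are hypothetical.

```python
from itertools import product

edges = ["edge-1", "edge-2"]  # the two Edge VMs
peers = ["tor-a", "tor-b"]    # the two uplink peers, one per uplink VLAN


def full_mesh(edges, peers):
    """Every Edge VM has an uplink towards every peer."""
    return sorted(product(edges, peers))


def is_full_mesh(sessions, edges, peers):
    """True if the given (edge, peer) pairs cover every combination."""
    return set(sessions) == set(product(edges, peers))


# Non-full mesh, possible in pure NSX-T: each Edge VM talks to one peer only.
non_full = [("edge-1", "tor-a"), ("edge-2", "tor-b")]

sessions = full_mesh(edges, peers)
print(len(sessions))  # one Edge uplink IP per pair
print(is_full_mesh(non_full, edges, peers))
```

Two Edge VMs times two peers gives the four Edge IPs the workflow asks for, while the non-full mesh variant gets by with two.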

Summary

VMware Cloud Foundation is about standardization: delivering the most common setup in a repeatable way that is the same everywhere in the world, regardless of who deployed it, when, or where. This is not the case when the same puzzle pieces of VCF – except SDDC Manager – are used to build a system by hand. But this comes at a cost, as there are rules to follow, and while many of them make sense, a couple of them are certainly interesting, and it is hard to decode why VCF does things that way.
