NSX-T materials and new design guide

Some weeks ago I wrote a post about why we need a new design guide, as many new options came into the picture with the arrival of NSX-T 3.0 and vSphere 7. I just had to ask, and now it is out 😀 Before we look at that, let's see what else was released in the last few weeks.

NSX-T 3.0 Security Reference Guide (link)

A pretty good document; I highly recommend it. I found a couple of mistakes in it, which will be fixed in version 1.1 if they have not been already.

The last two lines in the attached picture say 11 predefined roles and, just below that, 10 predefined roles.


Page 17 – I would add the term “security focused design” to Option 1, as that was the name of this approach back in the NSX for vSphere days.

Page 20 – “In the physical representation, both T0 and the T1 firewalls are on the Edge Transport Node. Thus the packet does not leave the Edge host until it has passed through T1 Gateway Firewall.” This is true, but not always: the active instance of the T1 can be elsewhere, on another Edge node, and in that case the traffic will leave the Edge node hosting the T0 and flow to the one hosting the active T1.
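A toy model of this correction, assuming one active service router (SR) per gateway, no ECMP and no failover in progress: a southbound packet only stays on a single Edge node when the active T0 and the active T1 share it. The Edge node names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Edge node hosting the active SR of each gateway (hypothetical names)."""
    t0_active_edge: str
    t1_active_edge: str


def southbound_hops(p: Placement) -> list[str]:
    """Edge nodes a packet visits from the T0 gateway firewall to the T1 gateway firewall.

    Simplified model: one active SR per gateway, no ECMP, no failover in progress.
    """
    hops = [p.t0_active_edge]
    if p.t1_active_edge != p.t0_active_edge:
        # The active T1 lives on another Edge node, so the packet is tunnelled
        # over the overlay and leaves the T0-hosting Edge node.
        hops.append(p.t1_active_edge)
    return hops


def stays_on_one_edge(p: Placement) -> bool:
    return len(southbound_hops(p)) == 1


print(southbound_hops(Placement("edge-01", "edge-01")))  # ['edge-01']
print(southbound_hops(Placement("edge-01", "edge-02")))  # ['edge-01', 'edge-02']
```

The guide's statement holds only for the first placement; the second is the case it leaves out.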

Page 32 – Flow 3 is used as the example in the text, but the picture shows Flow 2.

Page 34 – I’d include a few lines noting that “applied to” can also be configured at the section level, which is important as well.
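A sketch of why the section-level setting matters. It assumes, as I understand NSX-T 3.x policy-mode behaviour, that an Applied To set at the section (policy) level takes precedence over the rule-level one, and that an empty Applied To means the rule is pushed to every vNIC (“DFW”); verify this against the documentation for your version. Section, rule and group names are hypothetical.

```python
from dataclasses import dataclass, field

# Placeholder meaning "no Applied To set": the rule is pushed to every vNIC.
ALL_DFW = frozenset({"DFW"})


@dataclass(frozen=True)
class Rule:
    name: str
    applied_to: frozenset = frozenset()


@dataclass
class Section:
    name: str
    rules: list = field(default_factory=list)
    applied_to: frozenset = frozenset()


def effective_scope(section: Section, rule: Rule) -> frozenset:
    """Groups whose vNICs receive this rule.

    Assumption: a section-level Applied To takes precedence over the rule-level one.
    """
    if section.applied_to:
        return section.applied_to
    if rule.applied_to:
        return rule.applied_to
    return ALL_DFW


web = Section(
    "web-tier",  # hypothetical section and group names
    rules=[Rule("allow-https"), Rule("allow-ssh", frozenset({"grp-jump"}))],
    applied_to=frozenset({"grp-web"}),
)
for r in web.rules:
    print(r.name, sorted(effective_scope(web, r)))
```

Scoping once at the section level keeps every rule in it off vNICs outside the group, without having to set Applied To on each rule.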

NSX-T 3.1 Multi-Location Design Guide (Federation + Multisite) (link)

Multisite is certainly a hot topic, and this is a must-read. It would not be me if I didn't do a quality check, and Dimitri has done a perfect job. Page 21 had a diagram where the green segment had no active T1/T0, but everything else, as far as I can judge and understand, is flawless.

VMware® NSX-T Reference Design (link)

It is finally here. I was waiting for this document more than for the release of Cyberpunk 2077. The latter is a huge disappointment, but the design document is useful and almost complete. It discusses vDS-related topics in some detail and states what is recommended, same as before.

Page 204 describes this using 2 pNICs; from the point of view of the vDS port group configuration, it makes little difference if you use 4 pNICs.

Here is the explanation from the document:

Transport zone – one overlay and VLAN – consistent compared to three N-VDS design where external VLANs have two specific VLAN transport zone due to unique N-VDS per peering

  • N-VDS-1 (derived from matching transport zone name – both overlay and VLAN) defined with dual uplinks that maps to unique vNICs, which maps to unique DVPG at VDS – duality is maintained end-to-end
  • N-VDS-1 carries multiple VLANs per vNIC – overlay and BGP peering
    ◦ The overlay VLAN must be same on both N-VDS uplink with source ID teaming
    ◦ BGP Peering VLAN is unique to each vNIC as it carries 1:1 mapping to ToR with named teaming policy with only one active pNIC in its uplink profile
  • VDS DVPG uplinks is active-standby (Failover Order teaming for the trunked DVPG) to leverage faster convergence of TEP failover. The failure of either pNIC/ToR will force the TEP IP (GARP) to register on alternate pNIC and TOR. This detection happens only after BFD from N-VDS times out, however the mapping of TEP reachability is maintained throughout the overlay and thus system wide update of TEP failure is avoided (host and controller have a mapping or VNI to this Edge TEP), resulting into reduced control plane update and better convergence. The BGP peering recovery is not needed as alternate BGP peering is alive, the BGP peering over the failed pNIC will be timed out based on either protocol timer or BFD detection.
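To make the N-VDS side of the quoted design concrete, here is a sketch of the Edge uplink profile as an NSX-T Manager API payload (an `UplinkHostSwitchProfile`): source port ID default teaming across both uplinks for the overlay, plus one named teaming per BGP peering VLAN with a single active uplink. The field names follow my reading of the NSX-T Manager API; the profile name, uplink names, named teaming names and VLAN 200 are hypothetical, so check everything against the API reference for your version before use.

```python
import json


def pnic_uplink(name: str) -> dict:
    # Edge VM uplinks are modelled as PNIC-type uplinks in the profile.
    return {"uplink_name": name, "uplink_type": "PNIC"}


def edge_uplink_profile(overlay_vlan: int) -> dict:
    """Uplink profile for a single-N-VDS Edge VM (hypothetical names and VLANs)."""
    return {
        "resource_type": "UplinkHostSwitchProfile",
        "display_name": "edge-single-nvds-profile",
        # The overlay (TEP) VLAN is the same on both uplinks.
        "transport_vlan": overlay_vlan,
        # Default teaming: source port ID across both uplinks, giving dual TEP.
        "teaming": {
            "policy": "LOADBALANCE_SRCID",
            "active_list": [pnic_uplink("uplink-1"), pnic_uplink("uplink-2")],
        },
        # One named teaming per BGP peering VLAN, each with a single active uplink,
        # which keeps the 1:1 mapping between peering VLAN, vNIC and ToR.
        "named_teamings": [
            {"name": "bgp-uplink-1", "policy": "FAILOVER_ORDER",
             "active_list": [pnic_uplink("uplink-1")]},
            {"name": "bgp-uplink-2", "policy": "FAILOVER_ORDER",
             "active_list": [pnic_uplink("uplink-2")]},
        ],
    }


print(json.dumps(edge_uplink_profile(overlay_vlan=200), indent=2))
```

As far as I understand, each BGP peering VLAN segment then selects one of the named teamings by name. The active-standby behaviour of the trunk DVPGs is configured on the vDS in vCenter, not in this profile.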

No change there. However, this still means that if, for example, the P1 port dies (disconnected cable, ToR switch issue), TEP-IP-1 will fail over to vNIC3, i.e. to P2 in the lower layer. VLAN 300, which is an uplink VLAN, will be rendered dormant: the T0 IP will fail over too, but the BGP peering over it will be dead, since the other ToR does not have that VLAN on its trunk. So even though Trunk DVPG-1 activates its standby adapter, it will not carry any traffic in VLAN 300. As I read it, VMware suggests leaving redundancy entirely to BGP and not trying to solve it anywhere else.

What I miss is what happens if the peers in a design are not the ToR switches but devices reachable over the layer 2 domain (actually two) that the ToR switches provide. In that case it is not a bad idea to still have both VLANs on both ToR switches.
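The failure case and the alternative can be compared with a small layer 2 model. It assumes two pNICs (P1 cabled to TOR-A, P2 to TOR-B), two trunk DVPGs with Failover Order teaming, and hypothetical VLAN 200 for the overlay and VLAN 400 for the second peering VLAN; VLAN 300 is the one from the text. It only checks whether a VLAN still reaches a ToR that trunks it. Whether BGP actually survives also depends on where the peer sits, so the stretched case only helps when the peers are beyond the ToRs.

```python
from __future__ import annotations

from dataclasses import dataclass

OVERLAY_VLAN = 200  # hypothetical TEP VLAN
PNIC_TO_TOR = {"P1": "TOR-A", "P2": "TOR-B"}  # cabling


@dataclass(frozen=True)
class TrunkDvpg:
    """A trunk port group with Failover Order teaming (one active, one standby pNIC)."""
    name: str
    active: str
    standby: str
    vlans: frozenset

    def carrying_pnic(self, failed: set) -> str | None:
        # Failover Order: use the active pNIC, fall back to the standby one.
        for pnic in (self.active, self.standby):
            if pnic not in failed:
                return pnic
        return None


DVPGS = (
    TrunkDvpg("Trunk-DVPG-1", "P1", "P2", frozenset({OVERLAY_VLAN, 300})),
    TrunkDvpg("Trunk-DVPG-2", "P2", "P1", frozenset({OVERLAY_VLAN, 400})),
)


def working_vlans(tor_trunks: dict[str, set[int]], failed: set[str]) -> dict[str, set[int]]:
    """For each DVPG, the VLANs still forwarded at layer 2 by the ToR behind its carrying pNIC."""
    result = {}
    for dvpg in DVPGS:
        pnic = dvpg.carrying_pnic(failed)
        if pnic is None:
            result[dvpg.name] = set()
        else:
            result[dvpg.name] = set(dvpg.vlans & tor_trunks[PNIC_TO_TOR[pnic]])
    return result


reference = {"TOR-A": {OVERLAY_VLAN, 300}, "TOR-B": {OVERLAY_VLAN, 400}}
stretched = {"TOR-A": {OVERLAY_VLAN, 300, 400}, "TOR-B": {OVERLAY_VLAN, 300, 400}}
print(working_vlans(reference, failed={"P1"}))  # VLAN 300 is lost, the TEP survives
print(working_vlans(stretched, failed={"P1"}))  # VLAN 300 survives at layer 2
```

With the reference trunking, losing P1 leaves Trunk-DVPG-1 with only the overlay VLAN, which is the dormant VLAN 300 case; with both VLANs trunked on both ToRs, VLAN 300 still has a layer 2 path to peers behind the ToRs.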

Section 7.3.2 discusses the “Teaming mode” options, but not from the perspective of a collapsed design; it covers a pure host transport node setup where the Edge (VM) runs somewhere else.

The single TEP vs. dual TEP decision is still a no-brainer: dual TEP.

I have deployed dual TEP in all my designs.

Summary

With the release of these documents, I believe the 3.x-related why/where/how questions are now pretty much answered. If something is still unclear, it is our duty to answer it; that is the role of an IT consultant.

The rule is still: test your design in every phase of the project and make sure it delivers the resiliency you need.