Sync replication and Peer Persistence setup – HPE Nimble

About a week ago my colleague, Peter Vadon – a.k.a. VaPe – presented why he/we think HPE Nimble has reached the level of functionality required in the enterprise segment to be taken seriously by everyone. With the help of HPE we got hold of two identical HF20 arrays, and VaPe configured them in Peer Persistence – referred to as PP in later sections.

The setup looks like this:

Lines colored green are used for iSCSI and Group traffic.

Lines colored red are used for management traffic.

Before we start, let’s kick the tires by repeating some very important prerequisites. These requirements must all be met in order to have a supported PP setup; if even one is not fulfilled, it either will not work or will not be supported by HPE.

Requirements and prerequisites

  • Models from the same class: Peer Persistence can only be configured between identical models. Currently it is not possible to sync replicate an HF20 to an HF40, for example, even though the latter is more powerful and could take the additional load.
  • Replication works over IP: as stated earlier, sync replication traffic can be carried over IP only; there is no option to use Fibre Channel. This should not be an issue, at least regarding the number of 10GBase-T connections, as all models have at least two onboard. Worth remembering: the front-end connectivity – array to host – can still be FC.
  • Arrays in a PP relationship must use the same front-end protocol, either Fibre Channel or iSCSI. Mixing protocols within one PP group is not allowed.
  • Group, replication and management traffic must each use the same L2 segment across the two arrays – not one shared L2 segment for everything – so the ports used for a given traffic type must belong to the same L2 segment on both arrays.
  • Maximum 5 ms round trip time between the two arrays (see the quick check after this list).
  • (optional) Installation of a witness, located at a third site. This is not optional if full Peer Persistence with automatic switchover is the goal, but it can be skipped if manually activating the volumes on the surviving array after a failover is acceptable.
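A quick way to sanity-check the round trip time requirement is a simple ping from one replication interface towards its partner. The IP below is only a placeholder for the partner array’s replication/data IP on your network:

    # Send 100 pings from a host on the replication segment and read the
    # avg/max values from the summary line; anything consistently above
    # 5 ms rules out synchronous replication.
    ping -c 100 192.168.10.20

    # Example summary line to look at:
    # rtt min/avg/max/mdev = 0.211/0.342/1.870/0.120 ms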

Configuration

Steps to take:

  1. Upgrade both arrays to Nimble OS 5.1. No array gets it by default through InfoSight yet, but you can ask Nimble Support to whitelist the array. The only thing they ask for is the serial number of the storage, and the release will appear shortly afterwards.
  2. Add both arrays to one group. I will not describe this step in detail, as it is nothing new; it works the same way whenever you have at least two Nimble boxes. The arrays become a single management entity: one acts as group leader and hosts the configuration and the management interface.

The screenshots above and below confirm that both arrays are now part of the same group, called “HF”. The CLI shows the same:

3. (only applicable if iSCSI is used as the front-end protocol) Both arrays should be set to use “Group Scoped Target” instead of “Volume Scoped Target”. The table below shows why:

This is required because with volume-scoped targets the export contains the array’s IQN as the target; since the volume itself can move between the two arrays, that would lead to unavailability, so the group name should be used in the IQN instead.

This can be done by entering “group --edit --default_iscsi_target_scope group” in the CLI. If some volumes already exist, they should be unexported by removing the initiator group from the export, and then the “vol --edit --iscsi_target_scope” command should be issued on them one by one.
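Putting the above together, the sequence looks roughly like this in the group CLI. The volume name “myvol01” is just an example, and the exact per-volume syntax may differ slightly between Nimble OS releases, so double-check with “vol --help”:

    # Make group scoped targets the default for newly created volumes
    group --edit --default_iscsi_target_scope group

    # For volumes that already exist: unexport them first (remove the initiator
    # group from the export), then switch the scope one by one
    vol --edit myvol01 --iscsi_target_scope group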

4. Nimble Connection Manager deployment. This is important, since it sets the Path Selection Policy in VMware to NIMBLE_PSP_DIRECTED on the Nimble LUNs.
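Once NCM is installed, the claiming can be verified from the ESXi shell. A small sketch, assuming the Nimble LUNs are already presented to the host:

    # List every NMP device with its Path Selection Policy; the Nimble LUNs
    # should report NIMBLE_PSP_DIRECTED after NCM has claimed them
    esxcli storage nmp device list | grep -E "Device Display Name|Path Selection Policy"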

5. Witness installation. As mentioned before, this is optional and required only if we want ASO (Automatic SwitchOver), which automatically brings the synchronously replicated Peer Persistence volumes to the “active” state on the surviving array. If a manual failover is enough, this step can be skipped. Currently the witness is an RPM package, which can be downloaded from HPE InfoSight and installed on a CentOS machine (minimum 7.2); an appliance version is planned but not yet public. It is important to open up the firewall – if there is one – as the arrays’ management IPs must reach the witness over port 5395.
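A rough outline of the witness installation on CentOS. The RPM file name below is made up – use whatever package you downloaded from InfoSight – and TCP plus firewalld are assumed here:

    # Install the witness package downloaded from HPE InfoSight
    sudo yum install -y ./nimble-witness-<version>.rpm   # example file name

    # Open port 5395 so both arrays' management IPs can reach the witness
    sudo firewall-cmd --permanent --add-port=5395/tcp
    sudo firewall-cmd --reload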

6. Set the witness

The connection can be tested through the GUI or via the CLI.

7. Creation of the necessary volume collections. One is enough, but then only the volumes that are members of that particular volume collection will be replicated, and the direction will be a single source-to-destination. In case of a failover and after recovery the direction is reversed, but it still runs one way. If we want some volumes to be active on HF01 and some on HF02, we need at least two volume collections, since the replication direction is defined at the volume collection level.

Let’s create HF01 to HF02 first:

It can be seen that even though I selected “No protection template”, the snapshot retention is still set to two. It can be lowered to one, but an error is shown if “Do not replicate” is selected. I am not totally clear on why this is at the moment, but I will find out.

It is more important to set the “Replication partner” properly. Above, the replication partner is HF02 for volume collection “NimbleHF01-to-HF02”. I created a second volume collection, named “NimbleHF02-to-HF01”, in which the partner is HF01. This can be seen here:

8. Almost at the chequered flag: create some volumes according to this table:

Label                  Pool        LUN ID   Size
Nimble-PP-LUN01-HF01   HF01-Pool   1        500 GB
Nimble-PP-LUN02-HF02   HF02-Pool   2        500 GB
Nimble-LUN03-HF01      HF01-Pool   3        500 GB
Nimble-LUN03-HF02      HF02-Pool   4        500 GB

The first two volumes will be replicated; the third will be local only on HF01, and the last local only on HF02.

After pressing the “Create” button, the volume appears twice: one instance is “upstream”, the other is “downstream”. Whatever HPE calls it, this is effectively active and standby.

Let’s create the “Nimble-PP-LUN02-HF02” volume:

Finally, the “local” volume creation:

If everything is done successfully, we get the window and state shown below:

9. I am using VMware, so I just need to format the volumes with VMFS – I will name the datastores the same way as the volumes:
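The datastore creation itself happens in the vSphere Client as usual; from the ESXi shell it is enough to rescan so the freshly exported volumes show up – just a sketch:

    # Rescan all storage adapters so the new Nimble volumes are discovered
    esxcli storage core adapter rescan --all

    # List the discovered devices and filter for the Nimble LUNs
    esxcli storage core device list | grep -i nimble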

If everything goes by the book, the end result in VMware will look like this – regarding the first four datastores your mileage may vary:

At this point the Nimble Peer Persistence setup is complete.

Testing

In the picture of the test architecture you can see two virtual machines: VM1, located on datastore “Nimble-PP-LUN01-HF01”, and VM2, located on datastore “Nimble-PP-LUN02-HF02”. I have two hosts in the EPYC cluster; VM1 runs on host1, VM2 runs on host2. I started IOmeter on both VMs with four workers and the following profile (16 KB, 50% random, 70/30 read/write):
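For reference, roughly the same workload expressed as an fio job – this is only an approximation of the IOmeter profile above, not what I actually ran, and it may be handy if your test VMs happen to run Linux:

    # ~16k blocks, 50% random, 70% read / 30% write, 4 workers
    # WARNING: writes directly to the named device, so point it at a scratch disk
    fio --name=pp-baseline --filename=/dev/sdb --direct=1 --ioengine=libaio \
        --bs=16k --rw=randrw --rwmixread=70 --percentage_random=50 \
        --numjobs=4 --iodepth=16 --time_based --runtime=600 --group_reporting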

This is not a performance benchmark; I just want to generate some activity so there is traffic between the two arrays. The load settles to a baseline:

The same “load” viewed through the Nimble management interface:

After each test round I wait for successful reconvergence, and if the upstream array has changed, I set it back to the original array.

First test: Management connection interruption

An array loses connectivity to the management network, so both of its controllers have their mgmt ports offline.

Result: no failover, no outage. The management interface displays a warning about the lost connection, but the array is still manageable.

Second test: Active controller failure in array HF02

Result: immediate failover to the standby controller in the same array, i.e. within HF02.

Since this is a local failover, it has some impact on the virtual machines – in this case VM2 – as IO is put on “hold” while the standby controller comes up and the volumes are activated. The SCSI timeout is 30 seconds; this hold time can vary, but it usually takes 10-20 seconds.

Peer replication is still healthy:

Third test: I remove the power cords from the HF02 array’s power supplies.

The response is a total failover between the arrays: the replicated, Peer Persistence protected volumes are placed in the “upstream” state on the surviving array, using the witness as tie breaker. Non-replicated volumes that are local to the failed array – in my case “Nimble-LUN03-HF02” – go offline.

The replication state surfaces the error as well:

From VMware, the state shows up as dead paths:
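The same can be confirmed from the ESXi shell; a quick, informal way to count path states after the array goes dark:

    # Count paths per state; paths to the powered-off array report "dead",
    # paths to the surviving array stay "active"
    esxcli storage core path list | grep "State:" | sort | uniq -c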

Virtual machines located on datastores backed by Peer Persistence protected, replicated volumes have their IO on hold for 10-20 seconds – just like in the second test – so VM2 in this case needs to hold its breath a little, until HF01 brings datastore Nimble-PP-LUN02-HF02 up in the upstream state.

Further what-if tests

What happens if I remove two capacity drives and swap them into each other’s bays? Well, nothing, as you can see below – don’t go looking for the fault 🙂

What happens if I pull two SSDs out of the array – since this is a hybrid array, I am removing cache – affecting read/write cache operations? Nothing, besides receiving alerts and seeing a decreased FDR – flash-to-disk ratio – and capacity. Note the capacity in the top right of the picture, which is reduced because two 480 GB cache drives have been removed.

What if I remove all capacity disks, shuffle them, and put them back in a totally different order? Nothing. I have no words to describe the surprise on my face.

Other things to remember

Sync replicated volumes are synchronized to the other array – from source to target – in non-deduplicated, non-compressed format, since deduplicating or compressing before replication would require additional CPU cycles on the source side. It is better to replicate the write immediately when it hits the NVDIMM of the source array’s active controller, and let both arrays take the load of deduplicating/compressing it. Latency matters more here than the amount of replication traffic.

It is also important that sync replicated volumes cannot be resized, so choose the size wisely, or later you may have to break the replication, do the resize, and add the volume back.

Summary

The solution works well. I tested failover around 20 times, and there was only a single occurrence when the automatic failover did not work. Remember, Nimble OS 5.1.1 is not in production yet, so the code will be improved further, and by the time it is released it should pass all of these tests. HPE Nimble is now advancing into an area where, until now, only more expensive and certainly more complicated storage systems have been.