You might recall some presentations from HPE, and some of those surely contained at least one slide about their cloud-based management platform called Infosight. It has been around for a while, and in recent years HPE has been adding more and more product lines to it. It is not uncommon to see this happening with other vendors as well. The point is that the age of management islands is over. Isolated, self-contained systems, each with its own management tool, were great for a long time, but they are not enough anymore. If HPE sells thousands of systems worldwide, why shouldn't these devices feed data back to a central location to build common knowledge? We see this happening with Tesla, for example: their cars are always shadowing the driver, even when the driver is in control. If a driver does something differently in a given situation than the self-driving system would have done, it is reported back to Tesla for further analysis. A route that one car has taken also helps other cars take the same route with more confidence. This is called fleet learning.
Basically, this is what HPE's Infosight is about, and this is where Nimble comes into play in this article. Reporting is optional, so it is fine if a Nimble array is not allowed to report back to Infosight, but allow it if you can. It is worth it.
This can be enabled with a single click from the Nimble UI.
What is being sent to Infosight?
Let's start with the most important point: no data from any volume is ever transferred to Infosight, so never a single block that is stored inside a VVol or a volume. Only configuration and performance data is sent.
HPE Nimble arrays, like HPE's Gen10 servers and 3PAR/Primera arrays, are full of sensors, and their measurements are sent to Infosight. How often each kind of data is sent depends on its nature:
- If an alert is raised, it is sent instantly.
- A heartbeat is sent every five minutes.
- Performance counters and statistics are sent every ten minutes.
- Diagnostics data is sent once per day. This is the configuration of the given array: the type, the capacity, the group setup. This one is really optional.
- VM streaming data, if configured. This is the best feature in my opinion. A Nimble array can have multiple vCenter Servers registered, from which it can fetch VM-specific configuration and performance data, allowing cross-stack analysis in Infosight.
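To get a feeling for how chatty an array is, the intervals above can be put into a small Python sketch. This is only my own illustration and not how NimbleOS implements it: the `TelemetryStream` class and the stream names are made up, and VM streaming is left out because no interval is given for it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass(frozen=True)
class TelemetryStream:
    """Hypothetical model of one kind of data an array reports."""
    name: str
    # None means event-driven: sent when something happens, not on a timer.
    interval_seconds: Optional[int]

    def messages_per_day(self) -> Optional[int]:
        """Number of scheduled transmissions per day, or None if event-driven."""
        if self.interval_seconds is None:
            return None
        return SECONDS_PER_DAY // self.interval_seconds


# Intervals taken from the list above; the names are illustrative only.
STREAMS = [
    TelemetryStream("alert", None),                       # sent instantly
    TelemetryStream("heartbeat", 5 * 60),                 # every five minutes
    TelemetryStream("performance statistics", 10 * 60),   # every ten minutes
    TelemetryStream("diagnostics", SECONDS_PER_DAY),      # once per day
]


def daily_schedule(streams: Iterable[TelemetryStream]) -> Dict[str, Optional[int]]:
    """Map each stream name to its number of transmissions per day."""
    return {s.name: s.messages_per_day() for s in streams}


if __name__ == "__main__":
    for name, count in daily_schedule(STREAMS).items():
        label = "on event" if count is None else f"{count} per day"
        print(f"{name}: {label}")
```

So the timer-driven traffic adds up to a few hundred small transmissions per array per day, and alerts come on top of that only when something actually happens.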
First, log in to Infosight at https://infosight.hpe.com. From there, for Nimble, choose the Infrastructure -> Nimble Storage tab.
I have one array added, and it is displayed here. This is an AF1000 model with the smallest capacity possible. Just by looking at the picture below, I can clearly see the model, the NimbleOS version, the status of the device, the raw and usable space figures (free and total), and the interface options it has installed.
Let's click on the array itself. This opens a slightly more detailed view. It tells me which controller is active and when the array last communicated with Infosight. There is no newer firmware available for this array, as I always keep it at the latest version, but if there were, it would tell me. The amazing part is the hardware recommendations. This array is quite full, above 80% capacity utilization. Currently it has 24 x 240GB SSDs, so one SSD per Dual Flash Carrier (DFC). It suggests that I populate the bank B side of all DFCs to increase capacity. If the controllers of this array were hammered by IOPS, or if latency exceeded 1 ms in certain cases, Infosight would tell me to upgrade the controllers to the AF3000 model, provided that would satisfy the performance need.
Let's select Volumes. It lists the volumes configured on this array with their names, data usage (before any dedupe/compression), physical usage, exported size, IO per week (!!) and protection, if available. So the Nimble-VMFS2 volume, for example, uses 1.5 TiB, which goes down to 870.2 GiB physically after DECO, and it is doing 4.1 million IOs per week.
Let's click on this Nimble-VMFS2 volume. This reports on this specific volume in a really detailed fashion.
What is visible below?
- 2 TiB is exported to the host and 1.5 TiB is used; this is what VMware shows for this datastore. Usage has increased by 1% in the last 60 days, but in the last 30 days there was no change in consumption. Dedupe is enabled, and physical usage is down to 870.2 GiB.
- As a weekly average, 573.3 IO/sec was measured with 9.0 MiB/s throughput. Read latency averaged 0.8 ms and write latency 0.1 ms.
- Diagrams show the latency, IOPS and throughput on a weekly timescale. This can be changed to anything from 1 day up to 3 months.
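The figures above invite a bit of arithmetic. The sketch below is my own and not something Infosight exposes: it derives the data reduction ratio from the used and physical sizes, and the average IO size from the throughput and IO rate. The function names are invented for illustration, and since the 1.5 TiB figure is rounded, the results are approximate.

```python
GIB_PER_TIB = 1024
KIB_PER_MIB = 1024


def reduction_ratio(logical_gib: float, physical_gib: float) -> float:
    """Data reduction expressed as logical : physical (for example 1.77 means 1.77:1)."""
    return logical_gib / physical_gib


def space_savings_percent(logical_gib: float, physical_gib: float) -> float:
    """Share of the logical data that does not have to be stored physically."""
    return (1 - physical_gib / logical_gib) * 100


def average_io_size_kib(throughput_mib_s: float, io_per_sec: float) -> float:
    """Average size of a single IO, derived from throughput and IO rate."""
    return throughput_mib_s * KIB_PER_MIB / io_per_sec


if __name__ == "__main__":
    # Values read from the volume report described above.
    logical_gib = 1.5 * GIB_PER_TIB   # 1.5 TiB used, as seen by VMware
    physical_gib = 870.2              # physical usage after data reduction

    print(f"reduction ratio: {reduction_ratio(logical_gib, physical_gib):.2f}:1")
    print(f"space savings:   {space_savings_percent(logical_gib, physical_gib):.1f}%")
    print(f"average IO size: {average_io_size_kib(9.0, 573.3):.1f} KiB")
```

For this volume that works out to a reduction ratio of about 1.77:1, roughly 43% of the space saved, and an average IO size of about 16 KiB.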
The Capacity tab is pretty much the same, but it can tell exactly when this volume will run out of space. Based on the 1% increase, this volume is not going to run out, but if consumption is increasing, it will tell you based on a projected forecast.
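I do not know which model Infosight uses for its forecast, so the following is only a minimal sketch of the idea, assuming plain straight-line growth: fit a line through a few usage samples and see when it reaches the volume size. The function names and the sample values are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def fit_daily_growth(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope (GiB per day) through (day, used_gib) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx


def days_until_full(used_gib: float, capacity_gib: float,
                    growth_gib_per_day: float) -> Optional[float]:
    """Days until the volume is full, or None if usage is flat or shrinking."""
    if used_gib >= capacity_gib:
        return 0.0
    if growth_gib_per_day <= 0:
        return None
    return (capacity_gib - used_gib) / growth_gib_per_day


if __name__ == "__main__":
    # Made-up samples: (day, used GiB) on a 2 TiB (2048 GiB) volume.
    samples = [(0, 1000.0), (30, 1015.0), (60, 1030.0)]
    growth = fit_daily_growth(samples)
    remaining = days_until_full(samples[-1][1], 2048.0, growth)
    print(f"growth: {growth:.2f} GiB/day")
    print("no projected fill date" if remaining is None
          else f"full in about {remaining:.0f} days")
```

A flat or shrinking trend returns `None`, which matches the case above where there is simply no run-out date to report.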
It opens up a single pane where all Infosight-registered arrays can be checked for their software version and support level, together with available recommendations. I believe, as the name of this tab suggests, that it is only available to partners/resellers and not to end customers.
The operational dashboard is the main one for customers. Above you can see there are four demo systems and two real ones. The picture below shows only the real systems, and you can see the performance on a scale that shows the potential impact score. The idea is simple, although difficult to calculate: based on the IOPS and latency measurements, how is the array performing, and what does this performance mean for your application?
Moving the mouse over a column displays this. No impact so far; otherwise it would be colored.
Let's deep dive into the details by clicking on the array itself. This is really useful. Earlier I showed the report for a single volume, but here I am showing the whole array. It reports in red that the array will run out of capacity in 2 weeks, and this is backed up by the diagram at the bottom. Performance degradation sets in at 90%, so in this case it is better to increase capacity before that. There are small wrenches in the events diagram, which in my case show when the firmware was upgraded. It is also good to know that the CPU is not saturated, so the controller's Xeon processor has plenty of time to do its necessary magic. If I wanted to add async or sync replication to a volume, that would increase the CPU load, and based on this I can estimate whether my array could handle it or not.
The Performance tab gives more detail on the IOPS and latency figures, while the Capacity tab shows free/consumed space in detail. It seems the array experienced a higher load on 23 October, right after lunch. There was a large read and write operation at the same time, which had a potential impact on performance, rated 3 on a scale of 10.
Let's go back to the Dashboards -> Executive tab. This is not for techies but for higher-level managers. It speaks for itself. If I replicated my volumes to another Nimble array and the replication did not succeed in some cases, this would tell management exactly that the RPO is violated, and also if a volume is not protected. The upgrade option is explicitly mentioned at the bottom. It also shows the opened and closed cases for this array. We had some opened when our datacenter was relocated and Nimble support called me because the array was no longer responding. Of course it was not, as we had unracked it, but this was my mistake.
I hope and believe that this article has convinced you, so if you have a Nimble array or are planning to buy one (or two), please enable Infosight for it, as there are tons of benefits attached to it.
In the upcoming article I will go through cross-stack analysis and visit Labs, a tab in Infosight with the craziest reports you can imagine.