Unusual topic here, but I am sure many of you have at least one VMware cluster with some excess capacity (and if you don’t use VMware, that gives you an excuse to close this window anyway). At least I really hope that my earlier posts (in Hungarian, sorry) describing which admission control method to use were understood, and that you have at least one host's worth of resources, one way or another, sitting there doing nothing.
We have this global pandemic situation, and while we can do a lot by washing our hands properly, staying home, and supporting health care and those who are in trouble after losing their jobs or having no access to food or clean water, spare compute capacity can help too. There is the Folding@home project, which I will not describe here as it has been in the news recently. It has reached some serious performance lately and became much faster than the other 7 HPC systems TOGETHER.
HPE lab capacity runs Folding@home:
VMware Fling has released a prepared appliance to import and run:
My employer, 99999 Informatika KFT., has decided to allocate as much capacity to the Folding@home project as possible. In our production clusters we have about 35% of compute available, and in our development/test systems we can allocate everything to the pool.
I totally understand if you don’t want to give up your failover capacity, but let’s say you have four hosts and use percentage-based admission control. This will give you some headroom. You deploy three folding machines and apply the settings below. If a host dies with a folding rig on it, that rig will not be restarted on the surviving nodes.
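To put a number on that headroom, here is a minimal Python sketch of the percentage calculation. It assumes all hosts are identical and that the reserve is sized to tolerate a given number of host failures; the function name is mine, not anything from VMware.

```python
def failover_reserve_percent(hosts: int, failures_tolerated: int = 1) -> float:
    """Percent of cluster capacity to hold back so that
    `failures_tolerated` hosts can fail.

    Assumption: every host contributes the same capacity.
    """
    if hosts < 2 or not 0 < failures_tolerated < hosts:
        raise ValueError("need at least 2 hosts and 0 < failures_tolerated < hosts")
    return 100.0 * failures_tolerated / hosts


if __name__ == "__main__":
    # Four identical hosts, one host failure tolerated.
    reserve = failover_reserve_percent(4)
    print(f"reserved for failover: {reserve:.0f}%")
    print(f"available to regular workloads: {100 - reserve:.0f}%")
```

With four hosts and one tolerated failure, a quarter of the cluster is held back. That is the idle capacity the folding machines can borrow, as long as HA does not try to bring them back after a failure.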
The quick route to success, if you have VMware HA and DRS enabled:
1. Select your cluster and go to Configure -> Configuration -> VM Overrides.
2. Click “Add” and select all your Folding@home VMs. It does not matter whether they are self-installed or deployed from the appliance.
3. After hitting “Next”, set DRS automation level to “Disabled” and VM restart priority to “Disabled”. The first prevents DRS from vMotioning the machines (alternatively, you can use VM/Host rules to keep those VMs apart); the second tells HA not to restart the machine after a host failure.
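The intended effect of these overrides can be sketched as a small Python model. This is only an illustration of the outcome, not a call into any VMware API; the VM names, host names and the `VmOverride` type are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmOverride:
    """Per-VM override as set in the VM Overrides dialog (hypothetical model)."""
    drs_automation: str = "cluster default"
    restart_priority: str = "cluster default"


def vms_to_restart(failed_host, placement, overrides):
    """Names of VMs on the failed host that HA would try to restart.

    A VM whose restart priority override is "Disabled" is skipped.
    """
    default = VmOverride()
    return sorted(
        vm
        for vm, host in placement.items()
        if host == failed_host
        and overrides.get(vm, default).restart_priority != "Disabled"
    )


# Hypothetical cluster: which VM currently runs on which host.
placement = {"web01": "esx1", "fah-1": "esx1", "db01": "esx2", "fah-2": "esx2"}

# Folding VMs get both overrides from step 3.
folding = VmOverride(drs_automation="Disabled", restart_priority="Disabled")
overrides = {"fah-1": folding, "fah-2": folding}

if __name__ == "__main__":
    print(vms_to_restart("esx1", placement, overrides))
```

In this model, when `esx1` fails, only `web01` is a restart candidate; `fah-1` stays down and the failover capacity goes to the production VM.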