Currently our requirements state that if a host fails, the affected VMs can be evacuated by the Consumer, and if a host is going into maintenance, the affected VMs can be migrated by the Consumer in response to a notification from the VIM.
I propose to support a third type of behavior.
- When a host fails, the affected VM can be rebooted after the host has recovered.
- When a host is going into maintenance, the affected VM can be shut down and then booted up on the same host after the maintenance is over.
This way we can support three classes of VMs:
- Controller VMs in an HA setup should be evacuated / migrated.
- Payload VM should not be evacuated / migrated but rebooted on the same host after the problem / maintenance is over. This way a failure (or maintenance) means capacity degradation only. Moreover we are not spending resources on evacuating (migrating) a payload VM.
This behavior is also good for applications in an N+M setup. If we lose one standby VM, we don't have to spend much effort recovering that VM right away, but eventually it is good to go from N + (M-1) back to N+M, for example when the fault is resolved or the maintenance is over.
- Other VMs, e.g. non-HA (best effort) VMs, should be neither evacuated / migrated nor rebooted automatically in case of a failure or maintenance.
The above use case is especially valid in an advanced implementation that allows pre-definition of VM evacuation / migration policies, so that there is no need to wait for the Consumer to decide on the action to be taken.
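
To make the three classes concrete, here is a minimal sketch of what such a pre-defined policy table could look like. All names in it (VmClass, HostEvent, Action, POLICY, action_for) are hypothetical and only illustrate the idea; none of them come from an existing VIM API.

```python
from enum import Enum


class VmClass(Enum):
    CONTROLLER = "controller"    # HA controller VM
    PAYLOAD = "payload"          # payload VM, or standby VM in an N+M setup
    BEST_EFFORT = "best_effort"  # non-HA VM


class HostEvent(Enum):
    FAILURE = "failure"
    MAINTENANCE = "maintenance"


class Action(Enum):
    EVACUATE = "evacuate"
    MIGRATE = "migrate"
    REBOOT_AFTER_RECOVERY = "reboot_after_recovery"
    SHUTDOWN_AND_BOOT_ON_SAME_HOST = "shutdown_and_boot_on_same_host"
    NONE = "none"


# Hypothetical pre-defined policy: (VM class, host event) -> action.
# With such a table the action can be chosen without waiting for the Consumer.
POLICY = {
    (VmClass.CONTROLLER, HostEvent.FAILURE): Action.EVACUATE,
    (VmClass.CONTROLLER, HostEvent.MAINTENANCE): Action.MIGRATE,
    (VmClass.PAYLOAD, HostEvent.FAILURE): Action.REBOOT_AFTER_RECOVERY,
    (VmClass.PAYLOAD, HostEvent.MAINTENANCE): Action.SHUTDOWN_AND_BOOT_ON_SAME_HOST,
    (VmClass.BEST_EFFORT, HostEvent.FAILURE): Action.NONE,
    (VmClass.BEST_EFFORT, HostEvent.MAINTENANCE): Action.NONE,
}


def action_for(vm_class: VmClass, event: HostEvent) -> Action:
    """Look up the pre-defined action for a VM class and a host event."""
    return POLICY[(vm_class, event)]
```

For example, action_for(VmClass.PAYLOAD, HostEvent.FAILURE) selects the reboot-after-recovery behavior proposed above, while a best-effort VM maps to no automatic action for either event.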