Plan recovery approach
Recover the leader first when the leader is malfunctioning
When recovering multiple nodes, check whether any of the nodes to be recovered are reported as being the leader, based on the output of the rvtconfig report-group-status command.
If any of the nodes to be recovered are the current leader, recover the leader node first.
This helps to speed up the handover of group leadership, so that the recovery will complete faster.
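The leader-first rule above can be sketched as a small ordering helper. This is only an illustration: the function name and the example VM names are hypothetical, and the leader's name is assumed to have been read by hand from the rvtconfig report-group-status output, whose format is not described here.

```python
from typing import Iterable, List, Optional


def order_for_recovery(nodes: Iterable[str], leader: Optional[str]) -> List[str]:
    """Return the nodes to recover, with the current leader first if it is among them.

    `leader` is the name of the current leader as read from the group status
    report (an assumption of this sketch), or None if no leader is reported.
    """
    ordered = list(nodes)
    if leader in ordered:
        # Move the leader to the front; keep the other nodes in their given order.
        ordered.remove(leader)
        ordered.insert(0, leader)
    return ordered
```

For example, order_for_recovery(["vm-1", "vm-2", "vm-3"], leader="vm-2") puts vm-2 first and leaves the other two in their original order. If the leader is not one of the nodes being recovered, the order is unchanged.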
Choose between csar heal and csar redeploy
In general, use the csar heal operation where possible instead of csar redeploy.
The csar heal operation requires that the initconf process is active on the VM, and that the VM can reach both the CDS and MDM services, as reported by rvtconfig report-group-status.
If any of those pre-requisites are not met for csar heal, use csar redeploy instead.
When report-group-status reports that a single node cannot connect to CDS or MDM, it should be considered a VM-specific fault. In that case, use csar redeploy instead of csar heal.
However, a widespread failure of all the VMs in the group to connect to CDS or MDM suggests a need to investigate the health of the CDS and MDM services themselves, or the connectivity to them.
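The decision rules above can be summarised in a short sketch. The NodeStatus type, its field names, and the returned labels are all hypothetical; the values are assumed to be filled in by hand from the report, since this page does not describe a machine-readable report format. The sketch also assumes that "widespread" means every VM in a group of more than one VM has lost CDS or MDM connectivity.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class NodeStatus:
    """Hypothetical per-VM summary, transcribed manually from the group status report."""
    initconf_active: bool
    reaches_cds: bool
    reaches_mdm: bool


def choose_action(vm_name: str, group: Dict[str, NodeStatus]) -> str:
    """Return 'heal', 'redeploy', or 'investigate' for one VM in the group."""
    connectivity_lost = [
        not (status.reaches_cds and status.reaches_mdm) for status in group.values()
    ]
    # Assumption: a failure on every VM points at CDS/MDM or the network,
    # not at the individual VMs, so recovery of VMs is not the first step.
    if len(group) > 1 and all(connectivity_lost):
        return "investigate"

    status = group[vm_name]
    # csar heal needs initconf running and both CDS and MDM reachable.
    if status.initconf_active and status.reaches_cds and status.reaches_mdm:
        return "heal"
    return "redeploy"
```

The function is evaluated per VM, so a group can end up with a mix of heal and redeploy actions.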
When recovering multiple VMs, you don’t have to use the same command (csar redeploy or csar heal) for all nodes.
Instead, choose the appropriate command for each VM according to the guidance on this page.
Recovering one node
Healing one node
VMs should be healed one at a time, reassessing the group status using the rvtconfig report-group-status command after each heal operation, as detailed below.
See the 'Healing a VM' section of the SIMPL VM Documentation for details on the csar heal command.
The command should be run as follows:
csar heal --vm <VM name> --sdf <path to SDF>
Make sure that you pass the SDF for the correct version, which is the version that the recovering VM is already on. This is especially important during an upgrade.
Redeploying one node
VMs should be redeployed one at a time, reassessing the group status using the rvtconfig report-group-status command after each redeploy operation, as detailed below.
Exceptions to this rule are noted on this page.
See the 'Healing a VM' section of the SIMPL VM Documentation for details on the csar redeploy command.
The command should be run as follows:
csar redeploy --vm <VM name> --sdf <path to SDF>
Make sure that you pass the SDF for the correct version, which is the version that the recovering VM is already on. This is especially important during an upgrade.
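If the recovery is scripted, the two command lines shown above can be assembled by one small helper. This is a minimal sketch: the function name is hypothetical, and only the csar heal and csar redeploy forms with the --vm and --sdf options given on this page are used.

```python
from typing import List


def build_csar_command(operation: str, vm_name: str, sdf_path: str) -> List[str]:
    """Build the argument list for 'csar heal' or 'csar redeploy' for a single VM.

    `sdf_path` must point at the SDF for the version the VM is already on.
    """
    if operation not in ("heal", "redeploy"):
        raise ValueError(f"unsupported operation: {operation!r}")
    # One VM per invocation, matching the one-at-a-time guidance.
    return ["csar", operation, "--vm", vm_name, "--sdf", sdf_path]
```

Returning an argument list rather than a single string avoids shell quoting problems with VM names or paths. The list could then be passed to subprocess.run(cmd, check=True) on the machine where csar is installed.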
Re-check status after recovering each node
To ensure a node has been successfully recovered, check the status of the VM in the report generated by rvtconfig report-group-status.
The csar heal command waits until heal is complete before indicating success, or times out in the awaiting_manual_intervention case (see below).
The csar redeploy command does not wait until recovery is complete before returning.
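Because csar redeploy returns before recovery is complete, the re-check usually has to be repeated until the VM reports as recovered. The following sketch shows one way to structure that wait. The is_recovered callable is a hypothetical placeholder for however the operator inspects the rvtconfig report-group-status output, and the timeout and interval values are assumptions, not documented defaults.

```python
import time
from typing import Callable


def wait_until_recovered(
    is_recovered: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `is_recovered` until it returns True or `timeout_s` seconds elapse.

    Returns True if the VM was seen as recovered, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_recovered():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A False result means the check timed out, not that recovery failed. In that case, review the full report before recovering the next VM.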
On accidental heal or redeploy to the wrong version
If the output of report-group-status indicates an unintended recovery to the wrong version, follow the procedure in Troubleshooting accidental VM recovery to recover.