Plan recovery approach

Recover the leader first when the leader is malfunctioning

When recovering multiple nodes, check the output of the rvtconfig report-group-status command to see whether any of the nodes to be recovered is reported as the leader. If the current leader is among the nodes to be recovered, recover the leader node first. This speeds up the handover of group leadership, so that the overall recovery completes faster.
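The ordering rule above can be sketched as a small helper. This is an illustrative Python sketch only, not part of the product tooling: the function name and the VM names are hypothetical, and the current leader is assumed to have been read manually from the rvtconfig report-group-status output.

```python
from typing import List, Optional


def order_for_recovery(nodes_to_recover: List[str],
                       current_leader: Optional[str]) -> List[str]:
    """Return the nodes in recovery order.

    If the current leader (as read from the group status report) is one of
    the nodes to be recovered, it is moved to the front. The relative order
    of the other nodes is left unchanged.
    """
    if current_leader in nodes_to_recover:
        others = [node for node in nodes_to_recover if node != current_leader]
        return [current_leader] + others
    # Leader is healthy or unknown: keep the operator's original order.
    return list(nodes_to_recover)
```

For example, calling order_for_recovery(["vm-1", "vm-2", "vm-3"], "vm-2") places vm-2 first and leaves vm-1 and vm-3 in their original order.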

Choose between csar heal and csar redeploy

In general, use the csar heal operation where possible instead of csar redeploy. The csar heal operation requires that the initconf process is active on the VM, and that the VM can reach both the CDS and MDM services, as reported by rvtconfig report-group-status. If any of those prerequisites is not met, use csar redeploy instead.

When report-group-status reports that a single node cannot connect to CDS or MDM, treat it as a VM-specific fault. In that case, use csar redeploy instead of csar heal. However, a widespread failure of all the VMs in the group to connect to CDS or MDM suggests a need to investigate the health of the CDS and MDM services themselves, or the connectivity to them.

When recovering multiple VMs, you don’t have to consistently use either csar redeploy or csar heal commands for all nodes. Choose the appropriate command for each VM according to the guidance on this page instead.
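The per-VM choice described in this section can be summarised as a short decision function. The following Python sketch is illustrative only: the VmStatus fields and function names are hypothetical, and the values are assumed to be filled in by an operator reading the rvtconfig report-group-status output. The sketch does not parse that output.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class VmStatus:
    """Per-VM facts read manually from the group status report (hypothetical model)."""
    initconf_active: bool
    cds_reachable: bool
    mdm_reachable: bool


def choose_recovery_command(status: VmStatus) -> str:
    """Pick csar heal only when all of its prerequisites are met."""
    if status.initconf_active and status.cds_reachable and status.mdm_reachable:
        return "csar heal"
    return "csar redeploy"


def group_wide_connectivity_failure(group: Dict[str, VmStatus]) -> bool:
    """Return True when every VM in a multi-VM group cannot reach CDS or MDM.

    In that situation, investigate the CDS and MDM services and the
    connectivity to them before recovering individual VMs. Requiring more
    than one VM is an assumption of this sketch: with a single VM, a
    VM-specific fault cannot be told apart from a service-wide one.
    """
    if len(group) < 2:
        return False
    return all(not (s.cds_reachable and s.mdm_reachable) for s in group.values())
```

Keeping the group-wide check separate from the per-VM choice mirrors the guidance above: check the services first, then decide heal or redeploy for each VM individually.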

Recovering one node

Healing one node

VMs should be healed one at a time, reassessing the group status using the rvtconfig report-group-status command after each heal operation, as detailed below.

See the 'Healing a VM' section of the SIMPL VM Documentation for details on the csar heal command.

The command should be run as follows:

csar heal --vm <VM name> --sdf <path to SDF>

Warning: Make sure that you pass the SDF for the correct version, that is, the same version that the recovering VM is already on. This is especially important during an upgrade.

Redeploying one node

VMs should be redeployed one at a time, reassessing the group status using the rvtconfig report-group-status command after each redeploy operation, as detailed below. Exceptions to this rule are noted on this page.

See the 'Healing a VM' section of the SIMPL VM Documentation for details on the csar redeploy command.

The command should be run as follows:

csar redeploy --vm <VM name> --sdf <path to SDF>

Warning: Make sure that you pass the SDF for the correct version, that is, the same version that the recovering VM is already on. This is especially important during an upgrade.

Re-check status after recovering each node

To ensure a node has been successfully recovered, check the status of the VM in the report generated by rvtconfig report-group-status.

Note: The csar heal command waits until the heal is complete before indicating success, or times out in the awaiting_manual_intervention case (see below). The csar redeploy command does not wait until recovery is complete before returning.
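The one-at-a-time procedure with a status re-check after each node can be sketched as a simple loop. This Python sketch is illustrative only and rests on several assumptions: the function names are hypothetical, run_command stands in for however the csar command is actually executed, and vm_is_recovered stands in for the operator's check of the rvtconfig report-group-status report. Because csar redeploy returns before recovery is complete, that check may need to be repeated until the report shows the VM as recovered. Only the csar heal and csar redeploy syntax shown on this page is used.

```python
from typing import Callable, Iterable, List, Tuple


def build_csar_command(operation: str, vm_name: str, sdf_path: str) -> List[str]:
    """Build the argument list for 'csar heal' or 'csar redeploy'."""
    if operation not in ("heal", "redeploy"):
        raise ValueError(f"unsupported csar operation: {operation}")
    return ["csar", operation, "--vm", vm_name, "--sdf", sdf_path]


def recover_one_at_a_time(
    plan: Iterable[Tuple[str, str, str]],
    run_command: Callable[[List[str]], None],
    vm_is_recovered: Callable[[str], bool],
) -> List[str]:
    """Recover VMs sequentially, stopping at the first VM that is not recovered.

    Each plan entry is (vm_name, operation, sdf_path). The SDF path is given
    per VM because it must match the version that VM is already on.
    Returns the names of the VMs confirmed as recovered.
    """
    recovered: List[str] = []
    for vm_name, operation, sdf_path in plan:
        run_command(build_csar_command(operation, vm_name, sdf_path))
        # Re-check the group status before moving on to the next VM.
        if not vm_is_recovered(vm_name):
            break
        recovered.append(vm_name)
    return recovered
```

Passing the command runner and the status check in as callables keeps the sketch independent of how the commands are run and how the report is inspected in a given deployment.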

On accidental heal or redeploy to the wrong version

If the output of report-group-status indicates an unintended recovery to the wrong version, follow the procedure in Troubleshooting accidental VM recovery to recover.

Rhino VoLTE TAS VMs Version 4.1