Disable scheduled tasks
Scheduled Rhino restarts, Cassandra repairs, and SBB/activity cleanups should be disabled before running recovery operations.
Run the rvtconfig enter-maintenance-window command to do this.
Gather group status
The recovery steps to follow are highly dependent on the status of each VM and the VM group as a whole.
Before choosing which steps to follow, run the rvtconfig report-group-status command and save the output to a local file.
Collect diagnostics from all of the VMs
Collect diagnostics from all the VMs to support later analysis of the fault that made recovery necessary. Gathering diagnostics from the VMs to be recovered takes priority over the non-recovering VMs: diagnostics can still be gathered from the healthy VMs after the recovery steps, whereas the VMs to be recovered will be destroyed along with all their logs. To gather diagnostics, follow the instructions in RVT Diagnostics Gatherer. After generating the diagnostics, transfer them from the VMs to a local machine.
Ensure that non-recovering VMs are responsive
Before recovering VM(s), use the output of the report-group-status command above to ensure that the other nodes, which are not the target of the recovery operation, are responsive and healthy.
Each of the other VMs must be able to see the CDS and MDM services, and the initconf process must be running and converged:
[ OK ] initconf is active (running) and converged
[ OK ] CDS connection successful
[ OK ] MDM connection successful
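When the saved report is long, it can help to check these three lines mechanically. The following is a minimal Python sketch, not part of rvtconfig. It assumes the status lines appear exactly as in the sample above, and that the text passed in is the portion of the saved report covering a single non-recovering VM. The function name missing_ok_checks is hypothetical.

```python
import re

# Check descriptions copied from the sample output in this section.
EXPECTED_CHECKS = [
    "initconf is active (running) and converged",
    "CDS connection successful",
    "MDM connection successful",
]

# Assumed line shape: "[ OK ] <description>". Any marker other than OK
# (or a missing line) is treated as a check that did not pass.
STATUS_LINE = re.compile(r"^\s*\[\s*(\w+)\s*\]\s*(.+?)\s*$")


def missing_ok_checks(report_text):
    """Return the expected checks that are not reported as OK in the text."""
    ok_descriptions = set()
    for line in report_text.splitlines():
        match = STATUS_LINE.match(line)
        if match and match.group(1) == "OK":
            ok_descriptions.add(match.group(2))
    return [check for check in EXPECTED_CHECKS if check not in ok_descriptions]
```

An empty list means all three checks were found with an OK marker. Anything else lists the checks to investigate before starting recovery.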
For TSN nodes, both Cassandra services (disk-based and RAM-disk) should be listed as being in the UN (up/normal) state on all the non-recovering nodes.
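The UN check can be sketched in the same way. This Python example assumes the node states are available as text in the row layout printed by Cassandra's nodetool status command: a two-letter status/state code followed by the node address. Whether report-group-status prints the states in exactly this layout is an assumption, and the function name nodes_not_up_normal is hypothetical.

```python
import re

# Assumed row shape: a two-letter code (U or D for up/down, then N, L, J or M
# for normal/leaving/joining/moving), whitespace, then the node address.
NODE_ROW = re.compile(r"^([UD][NLJM])\s+(\S+)")


def nodes_not_up_normal(status_text):
    """Return (address, state) pairs for every node row whose state is not UN."""
    problems = []
    for line in status_text.splitlines():
        match = NODE_ROW.match(line.strip())
        if match and match.group(1) != "UN":
            problems.append((match.group(2), match.group(1)))
    return problems
```

Run it once on the disk-based Cassandra output and once on the RAM-disk Cassandra output. Both results should be empty for the non-recovering nodes.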