The TSN nodes in RVT 4.1 support two versions of the Cassandra database: 3.11.13 and 4.1.3. The procedure Major Upgrade from 4.0.0 of TSN Nodes explains how to perform a major upgrade from TSN 4.0 to 4.1 while keeping the Cassandra major version at 3.11.

Note

The major upgrade procedure includes a minor Cassandra version upgrade from 3.11.4 to 3.11.13, but this is implicit and transparent during the VM upgrade and does not require any specific actions.

The procedure on this page, by contrast, describes the Cassandra upgrade from 3.11.13 to 4.1.3. This procedure can only be executed once all TSN VMs have been upgraded to version 4.1.

Important

The Cassandra version upgrade to 4.1.3 cannot be done at the same time as the TSN major upgrade from 4.0.0 to 4.1.

Make sure the procedure Major Upgrade from 4.0.0 of TSN Nodes has been successfully carried out before attempting the Cassandra version switch from 3.11.13 to 4.1.3.

The page is self-sufficient, that is, if you save or print this page, you have all the required information and instructions for upgrading TSN nodes to the newer Cassandra version. However, before starting the procedure, make sure you are familiar with the operation of Rhino VoLTE TAS nodes, this procedure, and the use of the SIMPL VM.

  • There are links in various places below to other parts of this book, which provide more detail about certain aspects of solution setup and configuration.

  • You can find more information about SIMPL VM commands in the SIMPL VM Documentation.

  • You can find more information on rvtconfig commands on the rvtconfig page.

Planning for the procedure

This procedure assumes that:

  • You are familiar with UNIX operating system basics, such as the use of vi and command-line tools like scp.

  • You have deployed a SIMPL VM, version 6.13.3 or later. Output shown on this page is correct for version 6.13.3 of the SIMPL VM; it may differ slightly on later versions.

Check you are using a supported VNFI version:

Platform          Supported versions

OpenStack         Newton to Wallaby

VMware vSphere    6.7 and 7.0

Important notes

Important

Do not use these instructions for target versions whose major version component differs from 4.1.

Important

If you have fewer than 3 TSN nodes (for example, a lab deployment of a single TSN node), upgrading them is not possible. This is because Cassandra data (such as VM configuration data and state, and registration data) will be lost if there are fewer than two nodes active at any one time.

To upgrade one or two TSN nodes, you need to destroy and recreate all VMs of all node types in the entire site.

Important

To upgrade the TSN nodes, it is required that:

  • The uplevel version of TSN to upgrade to is 4.1-5-1.0.0 or higher.

  • All other non-TSN nodes are already upgraded to 4.1-5-1.0.0 or higher.

Determine parameter values

In the steps below, replace parameters marked with angle brackets (such as <deployment ID>) with values as follows. (Remove the angle brackets as well, so that they are not included in the final command to be run.)

  • <deployment ID>: The deployment ID. You can find this at the top of the SDF. On this page, the example deployment ID mydeployment is used.

  • <site ID>: A number for the site in the form DC1 through DC32. You can find this at the top of the SDF.

  • <site name>: The name of the site. You can find this at the top of the SDF.

  • <MW duration in hours>: The duration of the reserved maintenance period in hours.

  • <CDS address>: The management IP address of the first TSN node.

  • <SIMPL VM IP address>: The management IP address of the SIMPL VM.

  • <CDS auth args> (authentication arguments): If your CDS has Cassandra authentication enabled, replace this with the parameters -u <username> -k <secret ID> to specify the configured Cassandra username and the secret ID of a secret containing the password for that Cassandra user. For example, ./rvtconfig -c 1.2.3.4 -u cassandra-user -k cassandra-password-secret-id …​.

    If your CDS is not using Cassandra authentication, omit these arguments.

  • <service group name>: The name of the service group (also known as a VNFC - a collection of VMs of the same type), which for Rhino VoLTE TAS nodes will consist of all TSN VMs in the site. This can be found in the SDF by identifying the TSN VNFC and looking for its name field.

  • <uplevel version>: The version of the VMs you are upgrading to. On this page, the example version 4.1-7-1.0.0 is used.
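
As a worked illustration of these substitutions, the parameters can be held in shell variables and spliced into a command. The values below are the examples used on this page, not real deployment values, and the command shown is only a sample:

```shell
# Illustrative substitution only - the values are this page's examples,
# not real deployment values.
DEPLOYMENT_ID="mydeployment"
SITE_ID="DC1"
CDS_ADDRESS="1.2.3.4"

# A documented command such as:
#   ./rvtconfig maintenance-window-status -c <CDS address> -d <deployment ID> --site-id <site ID>
# becomes, with the angle brackets removed:
echo "./rvtconfig maintenance-window-status -c ${CDS_ADDRESS} -d ${DEPLOYMENT_ID} --site-id ${SITE_ID}"
# prints: ./rvtconfig maintenance-window-status -c 1.2.3.4 -d mydeployment --site-id DC1
```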

Tools and access

You must have the SSH keys required to access the SIMPL VM and the TSN VMs that are to be upgraded.

The SIMPL VM must have the right permissions on the VNFI. Refer to the SIMPL VM documentation for more information.

Note

When starting an SSH session to the SIMPL VM, use a keepalive of 30 seconds. This prevents the session from timing out - SIMPL VM automatically closes idle connections after a few minutes.

When using OpenSSH (the SSH client on most Linux distributions), this can be controlled with the option ServerAliveInterval - for example, ssh -i <SSH private key file for SIMPL VM> -o ServerAliveInterval=30 admin@<SIMPL VM IP address>.
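
If you connect to the SIMPL VM often, the keepalive can instead be made persistent in the OpenSSH client configuration. This is a sketch: the simpl-vm host alias is arbitrary, and the placeholders must be replaced with your own values.

```
# ~/.ssh/config on the machine you connect from
Host simpl-vm
    HostName <SIMPL VM IP address>
    User admin
    IdentityFile <SSH private key file for SIMPL VM>
    ServerAliveInterval 30
```

With this in place, running ssh simpl-vm applies the keepalive automatically.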

rvtconfig is a command-line tool for configuring and managing Rhino VoLTE TAS VMs. All TSN CSARs include this tool; once the CSAR is unpacked, you can find rvtconfig in the resources directory, for example:

$ cdcsars
$ cd tsn/<uplevel version>
$ cd resources
$ ls rvtconfig
rvtconfig

The rest of this page assumes that you are running rvtconfig from the directory in which it resides, so that it can be invoked as ./rvtconfig. Unless instructed otherwise, use the uplevel version of rvtconfig. Where it is explicitly specified that you must use the downlevel version, you can find it here:

$ cdcsars
$ cd tsn/<downlevel version>
$ cd resources
$ ls rvtconfig
rvtconfig

1. Preparation for upgrade procedure

These steps can be carried out in advance of the upgrade maintenance window. They should take less than 30 minutes to complete.

1.1 Ensure the SIMPL version is at least 6.13.3

Log into the SIMPL VM and run the command simpl-version. The SIMPL VM version is displayed at the top of the output:

SIMPL VM, version 6.13.3

Ensure this is at least 6.13.3. If not, contact your Customer Care Representative to organise upgrading the SIMPL VM before proceeding with the upgrade of the TSN VMs.

Output shown on this page is correct for version 6.13.3 of the SIMPL VM; it may differ slightly on later versions.

1.2 Prepare downlevel config directory

If you keep the configuration hosted on the SIMPL VM, find it and rename it to /home/admin/current-config. Verify the contents by running ls /home/admin/current-config and checking that at least the SDF (sdf-rvt.yaml) is present there. If it isn’t, or you prefer to keep your configuration outside of the SIMPL VM, then create this directory on the SIMPL VM:

mkdir /home/admin/current-config

Use scp to upload the SDF (sdf-rvt.yaml) to this directory.

1.3 Prepare uplevel config directory including an SDF

On the SIMPL VM, run mkdir /home/admin/uplevel-config. This directory is for holding the uplevel configuration files.

Use scp (or cp if the files are already on the SIMPL VM, for example in /home/admin/current-config as detailed in the previous section) to copy the following files to this directory. Include configuration for the entire deployment, not just the TSN nodes.

  • The uplevel configuration files.

  • The current SDF for the deployment.

  • The Rhino license.

1.4 Update SDF

Open the /home/admin/uplevel-config/sdf-rvt.yaml file using vi. Find the vnfcs section, and within that the TSN VNFC. Within the VNFC, locate the custom-options field and remove the cassandra_version_3_11 line. Save and close the file.

You can verify the change you made by using diff -u2 /home/admin/current-config/sdf-rvt.yaml /home/admin/uplevel-config/sdf-rvt.yaml. The diff should look like this (context lines and line numbers may vary), with only one change:

--- sdf-rvt.yaml        2022-10-31 14:14:49.282166672 +1300
+++ sdf-rvt.yaml        2022-11-04 13:58:42.054003577 +1300
@@ -211,9 +211,8 @@
      product-options:
        tsn:
          cds-addresses:
          - 172.18.1.10
          - 172.18.1.11
          - 172.18.1.12
          custom-options:
          - log-passwords
-         - cassandra_version_3_11
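
If you prefer a non-interactive edit to using vi, the same change can be made with sed. This is a sketch that assumes cassandra_version_3_11 appears exactly once in the SDF, in the TSN VNFC's custom-options list; verify the result with the diff command above in either case.

```shell
# Delete the cassandra_version_3_11 entry from the uplevel SDF in place.
# Assumes the string occurs only once, in the TSN custom-options list.
sed -i '/- cassandra_version_3_11/d' /home/admin/uplevel-config/sdf-rvt.yaml
```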

1.5 Reserve maintenance period

The upgrade procedure requires a maintenance period. For upgrading nodes in a live network, implement measures to mitigate any unforeseen events.

Ensure you reserve enough time for the maintenance period, which must include the time for a potential rollback.

The Cassandra switch procedure is reasonably fast, but it must be performed sequentially. The VMs themselves are not updated; only the Cassandra containers inside the VMs are. However, the rollback procedure usually involves redeploying failed VMs, which takes longer.

To calculate the time required for the actual upgrade or roll back of the VMs, run rvtconfig calculate-maintenance-window -i /home/admin/uplevel-config -t tsn --site-id <site ID>.

The output will be similar to the following, stating how long it will take to do an upgrade or rollback of the TSN VMs.

Nodes will be upgraded sequentially

-----

Estimated time for a full upgrade of 3 VMs: 24 minutes
Estimated time for a full rollback of 3 VMs: 24 minutes

-----

Your maintenance window must include time for:

  • The preparation steps. Allow 15 minutes.

  • The Cassandra switch of the VMs, 5 minutes per VM.

  • The rollback of the VMs, as calculated above.

  • Post-upgrade or rollback steps. Allow 5 minutes, plus time for any prepared verification tests.

In the example above, this would be 59 minutes.
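
The arithmetic behind that figure can be checked directly; using the example numbers above (15 minutes of preparation, 5 minutes per VM for the switch across 3 VMs, the 24-minute rollback estimate, and 5 minutes of post-upgrade steps):

```shell
# Maintenance-window estimate for the worked example on this page.
PREP=15            # preparation steps
SWITCH_PER_VM=5    # Cassandra switch, per VM
VMS=3
ROLLBACK=24        # from 'rvtconfig calculate-maintenance-window'
POST=5             # post-upgrade or rollback steps

TOTAL=$((PREP + SWITCH_PER_VM * VMS + ROLLBACK + POST))
echo "${TOTAL} minutes"
# prints: 59 minutes
```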

Important

These numbers are a conservative best-effort estimate. Various factors, including IMS load levels, VNFI hardware configuration, VNFI load levels, and network congestion can all contribute to longer upgrade times.

These numbers only cover the time spent actually running the upgrade on SIMPL VM. You must add sufficient overhead for setting up the maintenance window, checking alarms, running validation tests, and so on.

You must also reserve time for:

  • Any validation testing needed to determine whether the upgrade succeeded.

2. Upgrade procedure

2.1 Check Cassandra Status

Run cassandra-status for both Cassandra clusters:

  • Primary: ./rvtconfig cassandra-status --ssh-key-secret-id <SSH key secret ID> --ip-addresses <TSN Addresses>

  • Ramdisk: ./rvtconfig cassandra-status --ssh-key-secret-id <SSH key secret ID> --ip-addresses <TSN Addresses> --ramdisk

Check the output and verify that:

  • All nodes run version 3.11.13.

  • All nodes are UP and Normal (UN).

  • All nodes use the same schema version.

=====> Checking cluster status on node 172.30.102.224
Setting up a connection to 172.30.102.224
Connected (version 2.0, client OpenSSH_7.4)
Auth banner: b'WARNING: Access to this system is for authorized users only.\n'
Authentication (publickey) successful!
ReleaseVersion: 3.11.13
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  1.2.3.4  1.59 MiB   256          100.0%            3381adf4-8277-4ade-90c7-eb27c9816258  rack1
UN  1.2.3.5  1.56 MiB   256          100.0%            3bb6f68f-0140-451f-90a9-f5881c3fc71e  rack1
UN  1.2.3.6  1.54 MiB   256          100.0%            dbafa670-a2d0-46a7-8ed8-9a5774212e4c  rack1

Cluster Information:
    Name: rvt41-tsn
    Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
    DynamicEndPointSnitch: enabled
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        29a54aa0-67af-39d2-bffe-16aed6eb021a: [1.2.3.4, 1.2.3.5, 1.2.3.6]
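
Eyeballing the status lines is error-prone on larger clusters. As a sketch, if the cassandra-status output is saved to a file (status.txt here), a short awk filter can confirm that every node row is UN and that exactly one schema version is listed; the filter assumes the nodetool-style layout shown above.

```shell
# Mechanically verify saved cassandra-status output: every node row must
# be 'UN' (Up/Normal) and exactly one schema version must be listed.
# Assumes the command output was saved to status.txt first.
awk '
  /^[A-Z?][A-Z?]  / { nodes++; if ($1 != "UN") { print "not UN: " $0; bad++ } }
  /^ +[0-9a-f-]+: \[/ { schemas++ }
  END {
    printf "%d nodes, %d not UN, %d schema version(s)\n", nodes, bad, schemas
    exit (bad > 0 || schemas != 1) ? 1 : 0
  }' status.txt
```

A non-zero exit status means at least one condition failed; investigate before continuing.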

2.2 Collect diagnostics

We recommend gathering diagnostic archives for all TSN VMs in the deployment.

On the SIMPL VM, run the command ./rvtconfig gather-diags --sdf /home/admin/uplevel-config/sdf-rvt.yaml -t tsn --ssh-key-secret-id <SSH key secret ID> --output-dir <diags-bundle>.

If <diags-bundle> does not exist, the command will create the directory for you.

Each diagnostic archive can be up to 200 MB per VM. Ensure you have enough disk space on the SIMPL VM to collect all diagnostics. The command will be aborted if the SIMPL VM does not have enough disk space to collect all diagnostic archives from all the VMs in your deployment specified in the provided SDF.
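
As a rough pre-flight check before gathering diagnostics, you can compare the free space with the worst-case archive size. This sketch assumes the example site of 3 TSN VMs and that you run it from the directory where <diags-bundle> will be created:

```shell
# Rough pre-flight space check: allow 200 MB per VM. The VM count of 3
# is this page's example site; adjust for your deployment.
VMS=3
NEEDED_MB=$((VMS * 200))
AVAIL_MB=$(df -Pm . | awk 'NR == 2 { print $4 }')
if [ "${AVAIL_MB}" -lt "${NEEDED_MB}" ]; then
    echo "Only ${AVAIL_MB} MB free; about ${NEEDED_MB} MB may be needed" >&2
    exit 1
fi
echo "${AVAIL_MB} MB free - enough for ${VMS} diagnostic archives"
```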

2.3 Disable scheduled tasks

Only perform this step if this is the first, or only, node type being upgraded.

Run ./rvtconfig enter-maintenance-window -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID> --hours <MW duration in hours>. The output will look similar to:

Maintenance window is now active until 04 Nov 2022 21:38:06 NZDT.
Use the leave-maintenance-window command once maintenance is complete.

This will prevent scheduled tasks running on the VMs until the time given in the output.

If at any point in the upgrade process you wish to confirm the end time of the maintenance window, you can run ./rvtconfig maintenance-window-status -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID>.

2.4 Pause Initconf in non-TSN nodes

Set the running state of initconf processes in non-TSN VMs to a paused state.

./rvtconfig set-desired-running-state --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --state Stopped.

You should see output similar to this, indicating that the initconf processes of the non-TSN nodes are in state Stopped.

Connected to MDM at 10.0.0.192
Put desired state = Stopped for Instance mydeployment-mag-1
Put desired state = Stopped for Instance mydeployment-shcm-1
Put desired state = Stopped for Instance mydeployment-mmt-gsm-1
Put desired state = Stopped for Instance mydeployment-smo-1
Getting desired state for each instance.
Final desired state for instances: {
    "mydeployment-mag-1": "Stopped",
    "mydeployment-shcm-1": "Stopped",
    "mydeployment-mmt-gsm-1": "Stopped",
    "mydeployment-smo-1": "Stopped"
}
Note

This desired running state does not mean the VMs, Rhino, SGC, etc., are started or stopped. This desired running state indicates the status of the initconf process.

  • When in Stopped state, the initconf will pause any configuration activities.

  • When in Started state, the initconf will resume any configuration activities.

2.5 Take a CDS backup

Take a backup of the CDS database by issuing the command below.

./rvtconfig backup-cds --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --output-dir <backup-cds-bundle> --ssh-key-secret-id <SSH key secret ID> -c <CDS address> <CDS auth args>

The output should look like this:

Capturing cds_keyspace_schema
Capturing ramdisk_keyspace_schema
cleaning snapshot metaswitch_tas_deployment_snapshot
...
...
...
running nodetool snapshot command
Requested creating snapshot(s) for [metaswitch_tas_deployment_info] with snapshot name [metaswitch_tas_deployment_snapshot] and options {skipFlush=false}
...
...
...

Final CDS backup archive has been created at <backup-cds-bundle>/tsn_cassandra_backup_20230711095409.tar

If the command ended successfully, you can continue with the procedure. If it failed, do not continue the procedure without a CDS backup and contact your Customer Care Representative to investigate the issue.
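
Before relying on the backup, it is worth confirming that the archive is readable. This sketch uses the example archive name from the output above; substitute your actual bundle path:

```shell
# List the archive contents without extracting: a truncated or corrupt
# tar file makes 'tar -tf' exit non-zero.
BACKUP="<backup-cds-bundle>/tsn_cassandra_backup_20230711095409.tar"
if tar -tf "${BACKUP}" > /dev/null; then
    echo "Backup archive is readable"
else
    echo "Backup archive unreadable - do not continue the upgrade" >&2
fi
```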

2.6 Begin the upgrade

Upgrade a single TSN node

Run ./rvtconfig cassandra-upgrade --ssh-key-secret-id <SSH key secret ID> --ip-addresses <TSN Address> for one TSN node. Do one node at a time.

====> Upgrading Cassandra on node 172.30.102.224
Setting up a connection to 172.30.102.224
Connected (version 2.0, client OpenSSH_7.4)
Auth banner: b'WARNING: Access to this system is for authorized users only.\n'
Authentication (publickey) successful!
Checking the current version of nodetool...
Checking if cassandra-ramdisk is running...
Stopping Initconf...
Flushing Cassandra memory tables...
Stopping Cassandra...
Untagged: registry.rhino.metaswitch.com/rhino/cassandra:3.11.13-0
...
Starting...
Initconf started
Waiting for Cassandra container(s) to be up and running...
Waiting for Cassandra container(s) to be up and running...
Waiting for Cassandra container(s) to be up and running...
Waiting for Cassandra container(s) to be up and running...
Cassandra container(s) started, release: docker-local-temp.metaswitch.com/rhino/cassandra:feature-468278-cassandra-41
Waiting for 'cassandra' to be UP/NORMAL...
...
Waiting for 'cassandra-ramdisk' to be UP/NORMAL...
...
Started
Run 'rvtconfig cassandra-upgrade-sstables' AFTER you finish upgrading ALL of your Cassandra nodes.

The following two errors can be ignored during the Cassandra switch:

  1. nodetool error

    nodetool: Failed to connect to '1.2.3.4:7199' - ConnectException: 'Connection refused (Connection refused)'.
    Waiting for 'cassandra' to be UP/NORMAL...
  2. java.lang.RuntimeException error

    error: No nodes present in the cluster. Has this node finished starting up?
    -- StackTrace --
    java.lang.RuntimeException: No nodes present in the cluster. Has this node finished starting up?
        at org.apache.cassandra.dht.Murmur3Partitioner.describeOwnership(Murmur3Partitioner.java:294)

Check status of switched TSN Node(s)

Run cassandra-status for both Cassandra clusters on the switched TSN node:

  • Primary: ./rvtconfig cassandra-status --ssh-key-secret-id <SSH key secret ID> --ip-addresses <TSN Address>

  • Ramdisk: ./rvtconfig cassandra-status --ssh-key-secret-id <SSH key secret ID> --ip-addresses <TSN Address> --ramdisk

Verify the output and check that:

  • The switched TSN node is running Cassandra 4.1.3.

  • All nodes are UP and Normal (UN).

  • All Cassandra 3.11.13 nodes use the same schema version.

  • All Cassandra 4.1.3 nodes use the same schema version.

  • The Cassandra cluster has exactly two database versions (3.11.13 and 4.1.3): some nodes are on 3.11.13, the rest on 4.1.3.

  • All nodes are reachable.

=====> Checking cluster status on node 1.2.3.4
Setting up a connection to 1.2.3.4
Connected (version 2.0, client OpenSSH_7.4)
Auth banner: b'WARNING: Access to this system is for authorized users only.\n'
Authentication (publickey) successful!
ReleaseVersion: 4.1.3
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Tokens  Owns (effective)  Host ID                               Rack
UN  1.2.3.4  1.59 MiB   256          100.0%            3381adf4-8277-4ade-90c7-eb27c9816258  rack1
UN  1.2.3.5  1.56 MiB   256          100.0%            3bb6f68f-0140-451f-90a9-f5881c3fc71e  rack1
UN  1.2.3.6  1.54 MiB   256          100.0%            dbafa670-a2d0-46a7-8ed8-9a5774212e4c  rack1

Cluster Information:
    Name: rvt41-tsn
    Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
    DynamicEndPointSnitch: enabled
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        1c15f3b1-3374-3597-bc45-a473179eab28: [1.2.3.5, 1.2.3.6]

        08e3d7be-452e-3112-840c-8702cd468b73: [1.2.3.4]

Stats for all nodes:
    Live: 3
    Joining: 0
    Moving: 0
    Leaving: 0
    Unreachable: 0

Data Centers:
    dc1 #Nodes: 3 #Down: 0

Database versions:
    3.11.13: [1.2.3.5:7000, 1.2.3.6:7000]

    4.1.3: [1.2.3.4:7000]

Keyspaces:
...

Continue with the remaining nodes

Repeat the Upgrade a single TSN node steps for the remaining nodes until all TSN nodes have been switched to Cassandra 4.1.3, then continue.
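
Since the switch must be strictly sequential, it can help to generate the per-node commands up front so none are skipped or run out of order. This sketch only prints the commands (the addresses and secret ID are placeholders); run each printed command by hand, and complete the status checks for one node before starting the next:

```shell
# Print the per-node upgrade commands in the order they must be run.
# The addresses and secret ID are placeholders; substitute your own.
SSH_KEY_SECRET_ID="my-ssh-key-secret"
for NODE in 1.2.3.4 1.2.3.5 1.2.3.6; do
    echo "./rvtconfig cassandra-upgrade --ssh-key-secret-id ${SSH_KEY_SECRET_ID} --ip-addresses ${NODE}"
done
```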

Check all nodes are running Cassandra 4.1.3

Run cassandra-status for both Cassandra clusters:

  • Primary: ./rvtconfig cassandra-status --ssh-key-secret-id <SSH key secret ID> --ip-addresses <TSN Address>

  • Ramdisk: ./rvtconfig cassandra-status --ssh-key-secret-id <SSH key secret ID> --ip-addresses <TSN Address> --ramdisk

Verify the output and check that:

  • All nodes run version 4.1.3.

  • All nodes are UP and Normal (UN).

  • All nodes use the same schema version.

Upgrade sstables for all TSN nodes

Finish the upgrade by running the command ./rvtconfig cassandra-upgrade-sstables --ssh-key-secret-id <SSH key secret ID> --ip-addresses <TSN Addresses>

Note

You should specify the addresses of all TSN VMs when running this command. <TSN Addresses> takes a list of the TSN IP addresses, separated by spaces.

For example, ./rvtconfig cassandra-upgrade-sstables --ssh-key-secret-id <SSH key secret ID> --ip-addresses 10.244.21.160 10.244.21.161 10.244.21.162.

2.7 Run basic validation tests

Run csar validate --vnf tsn --sdf /home/admin/uplevel-config/sdf-rvt.yaml to perform some basic validation tests against the uplevel nodes.

This command first performs a check that the nodes are connected to MDM and reporting that they have successfully applied the uplevel configuration:

========================
Performing healthchecks
========================
Commencing healthcheck of VNF 'tsn'
Performing health checks for service group mydeployment-tsn with a 0 second timeout
Running MDM status health-check for dc1-mydeployment-tsn-1
dc1-mydeployment-tsn-1: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
Running MDM status health-check for dc1-mydeployment-tsn-2
dc1-mydeployment-tsn-2: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
Running MDM status health-check for dc1-mydeployment-tsn-3
dc1-mydeployment-tsn-3: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'

After that, it performs various checks on the health of the VMs' networking and services:

================================
Running validation test scripts
================================
Running validation tests in CSAR 'tsn/4.1-7-1.0.0'
Test running for: mydeployment-tsn-1
Running script: check_ping_management_ip…​
Running script: check_can_sudo…​
Running script: check_converged…​
Running script: check_liveness…​
Detailed output can be found in /var/log/csar/ansible_output-2023-01-06-03-21-51.log

If all is well, then you should see the message All tests passed for CSAR 'tsn/<uplevel version>'!.

If the VM validation fails, you can find details in the log file at /var/log/csar/ansible_output-<timestamp>.log.

================================
Running validation test scripts
================================
Running validation tests in CSAR 'tsn/4.1-7-1.0.0'
Test running for: mydeployment-tsn-1
Running script: check_ping_management_ip...
Running script: check_can_sudo...
Running script: check_converged...
Running script: check_liveness...
ERROR: Script failed. Specific error lines from the ansible output will be logged to screen. For more details see the ansible_output file (/var/log/csar/ansible_output-2023-01-06-03-40-37.log). This file has only ansible output, unlike the main command log file.

fatal: [mydeployment-tsn-1]: FAILED! => {"ansible_facts": {"liveness_report": {"cassandra": true, "cassandra_ramdisk": true, "cassandra_repair_timer": true, "cdsreport": true, "cleanup_sbbs_activities": false, "config_hash_report": true, "docker": true, "initconf": true, "linkerd": true, "mdm_state_and_status_ok": true, "mdmreport": true, "nginx": true, "no_ocss7_alarms": true, "ocss7": true, "postgres": true, "rem": true, "restart_rhino": true, "rhino": true}}, "attempts": 1, "changed": false, "msg": "The following liveness checks failed: ['cleanup_sbbs_activities']", "supports_liveness_checks": true}
Running script: check_rhino_alarms...
Detailed output can be found in /var/log/csar/ansible_output-2023-01-06-03-40-37.log
***Some tests failed for CSAR 'tsn/4.1-7-1.0.0' - see output above***

----------------------------------------------------------


WARNING: Validation script tests failed for the following CSARs:
  - 'tsn/4.1-7-1.0.0'
See output above for full details

The msg field under each ansible task explains why the script failed.

If there are failures, investigate them with the help of your Customer Care Representative and the Troubleshooting pages.

3. Post-upgrade procedure

3.1 Check Cassandra version and status

Verify the status of the Cassandra clusters. First, check that the primary Cassandra cluster is healthy and running the correct version: run ./rvtconfig cassandra-status --ssh-key-secret-id <SSH key secret ID> --ip-addresses <CDS Address> for every TSN node.

Next, check that the ramdisk-based Cassandra cluster is healthy and running the correct version: run ./rvtconfig cassandra-status --ssh-key-secret-id <SSH key secret ID> --ip-addresses <CDS Address> --ramdisk for every TSN node.

For both Cassandra clusters, check the output and verify that the running Cassandra version is 4.1.3.

=====> Checking cluster status on node 1.2.3.4
Setting up a connection to 1.2.3.4
Connected (version 2.0, client OpenSSH_7.4)
Auth banner: b'WARNING: Access to this system is for authorized users only.\n'
Authentication (publickey) successful!
ReleaseVersion: 4.1.3
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Tokens  Owns (effective)  Host ID                               Rack
UN  1.2.3.4  1.59 MiB   256          100.0%            3381adf4-8277-4ade-90c7-eb27c9816258  rack1
UN  1.2.3.5  1.56 MiB   256          100.0%            3bb6f68f-0140-451f-90a9-f5881c3fc71e  rack1
UN  1.2.3.6  1.54 MiB   256          100.0%            dbafa670-a2d0-46a7-8ed8-9a5774212e4c  rack1

Cluster Information:
    Name: mydeployment-tsn
    Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
    DynamicEndPointSnitch: enabled
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        1c15f3b1-3374-3597-bc45-a473179eab28: [1.2.3.4, 1.2.3.5, 1.2.3.6]
Stats for all nodes:
    Live: 3
    Joining: 0
    Moving: 0
    Leaving: 0
    Unreachable: 0

Data Centers:
    dc1 #Nodes: 3 #Down: 0

Database versions:
    4.1.3: [1.2.3.4:7000, 1.2.3.5:7000, 1.2.3.6:7000]

Keyspaces:
...

3.2 Resume Initconf in non-TSN nodes

Run ./rvtconfig set-desired-running-state --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --state Started.

You should see output similar to this, indicating that the non-TSN nodes are in the desired running state Started.

Connected to MDM at 10.0.0.192
Put desired state = Started for Instance mydeployment-mag-1
Put desired state = Started for Instance mydeployment-shcm-1
Put desired state = Started for Instance mydeployment-mmt-gsm-1
Put desired state = Started for Instance mydeployment-smo-1
Getting desired state for each instance.
Final desired state for instances: {
    "mydeployment-mag-1": "Started",
    "mydeployment-shcm-1": "Started",
    "mydeployment-mmt-gsm-1": "Started",
    "mydeployment-smo-1": "Started"
}
Note

This desired running state does not mean the VMs, Rhino, SGC, etc., are started or stopped. This desired running state indicates the status of the initconf process.

  • When in Stopped state, the initconf will pause any configuration activities.

  • When in Started state, the initconf will resume any configuration activities.

3.3 Enable scheduled tasks

Run ./rvtconfig leave-maintenance-window -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID>. This will allow scheduled tasks to run on the VMs again. The output should look like this:

Maintenance window has been terminated.
The VMs will resume running scheduled tasks as per their configured schedules.

3.4 Run verification tests

If you have prepared verification tests for the deployment, run these now.

4. Post-acceptance

The upgrade of the TSN nodes is now complete.

5. Backout Method of Procedure

First, gather the log history of the downlevel VMs. Run mkdir -p /home/admin/rvt-log-history and ./rvtconfig export-log-history -c <CDS address> <CDS auth args> -d <deployment ID> --zip-destination-dir /home/admin/rvt-log-history --secrets-private-key-id <secret ID>. The secret ID you specify for --secrets-private-key-id should be the secret ID for the secrets private key (the one used to encrypt sensitive fields in CDS). You can find this in the product-options section of each VNFC in the SDF.

Note that the Cassandra switch procedure is not performed with csar update, and it does not involve upgrading the TSN VMs. How much of the backout procedure to run depends on how much progress was made with the switch. In general it is not possible to switch back to Cassandra version 3.11.13, but depending on the failure scenario, a redeploy may be possible.

The table below shows which procedure must be followed, depending on how much progress was made:

Scenario                          Rollback procedure

Switch failed for first node      Redeploy with custom option cassandra_version_3_11

Switch failed for Nth node        Redeploy without custom option cassandra_version_3_11 and continue the switch

Switch failed after completion    A whole redeploy is needed

If you encounter further failures during recovery or rollback, contact your Customer Care Representative to investigate and recover the deployment.

5.1 Collect diagnostics

We recommend gathering diagnostic archives for all TSN VMs in the deployment.

On the SIMPL VM, run the command ./rvtconfig gather-diags --sdf /home/admin/uplevel-config/sdf-rvt.yaml -t tsn --ssh-key-secret-id <SSH key secret ID> --output-dir <diags-bundle>.

If <diags-bundle> does not exist, the command will create the directory for you.

Each diagnostic archive can be up to 200 MB per VM. Ensure you have enough disk space on the SIMPL VM to collect all diagnostics. The command will be aborted if the SIMPL VM does not have enough disk space to collect all diagnostic archives from all the VMs in your deployment specified in the provided SDF.

5.2 Pause Initconf in non-TSN nodes

Set the running state of initconf processes in non-TSN VMs to a paused state.

./rvtconfig set-desired-running-state --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --state Stopped.

You should see output similar to this, indicating that the initconf processes of the non-TSN nodes are in state Stopped.

Connected to MDM at 10.0.0.192
Put desired state = Stopped for Instance mydeployment-mag-1
Put desired state = Stopped for Instance mydeployment-shcm-1
Put desired state = Stopped for Instance mydeployment-mmt-gsm-1
Put desired state = Stopped for Instance mydeployment-smo-1
Getting desired state for each instance.
Final desired state for instances: {
    "mydeployment-mag-1": "Stopped",
    "mydeployment-shcm-1": "Stopped",
    "mydeployment-mmt-gsm-1": "Stopped",
    "mydeployment-smo-1": "Stopped"
}
Note

This desired running state does not mean the VMs, Rhino, SGC, etc., are started or stopped. This desired running state indicates the status of the initconf process.

  • When in Stopped state, the initconf will pause any configuration activities.

  • When in Started state, the initconf will resume any configuration activities.

5.3 Take a CDS backup

Take a backup of the CDS database by issuing the command below.

./rvtconfig backup-cds --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --output-dir <backup-cds-bundle> --ssh-key-secret-id <SSH key secret ID> -c <CDS address> <CDS auth args>

The output should look like this:

Capturing cds_keyspace_schema
Capturing ramdisk_keyspace_schema
cleaning snapshot metaswitch_tas_deployment_snapshot
...
...
...
running nodetool snapshot command
Requested creating snapshot(s) for [metaswitch_tas_deployment_info] with snapshot name [metaswitch_tas_deployment_snapshot] and options {skipFlush=false}
...
...
...

Final CDS backup archive has been created at <backup-cds-bundle>/tsn_cassandra_backup_20230711095409.tar

If the command completed successfully, you can continue with the procedure. If it failed, do not continue the procedure without a CDS backup; contact your Customer Care Representative to investigate the issue.
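As a sanity check before continuing, you can verify that the backup archive exists and is a readable tar file. This is a minimal sketch only; the archive path below is taken from the example output above and must be replaced with the path reported by your own backup-cds run.

```shell
# Path to the CDS backup archive reported by rvtconfig backup-cds.
# The filename below is an example; substitute the one from your output.
BACKUP=${BACKUP:-backup-cds-bundle/tsn_cassandra_backup_20230711095409.tar}

# A valid backup should exist and list its contents without errors.
if [ -f "$BACKUP" ] && tar -tf "$BACKUP" >/dev/null 2>&1; then
    echo "Backup archive looks valid: $BACKUP"
else
    echo "Backup archive missing or unreadable: $BACKUP" >&2
fi
```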

5.4 Rollback VMs

Depending on the type of failure, choose one of the following ways to roll back your failed VMs, then continue with the next section.

Switch failed for first node

A redeploy is needed for the failed nodes. Make sure the cassandra_version_3_11 custom option is present in the SDF file under product-options: → tsn: → custom-options.

      product-options:
        tsn:
          cds-addresses:
          - 172.18.1.10
          - 172.18.1.11
          - 172.18.1.12
          custom-options:
          - log-passwords
          - cassandra_version_3_11
Important

The command csar update does not work in this case. Instead, csar redeploy must be used.

For the VM that failed, run csar redeploy --vm <failed VM name> --sdf /home/admin/downlevel-config/sdf-rvt.yaml.

If csar redeploy fails, contact your Customer Care Representative to start the recovery procedures (for example, Recovering all nodes from a total TSN cluster failure).

If csar redeploy completed successfully, you can reattempt the Cassandra version upgrade procedure.
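Before running csar redeploy for this case, you can quickly confirm that the downlevel SDF still pins Cassandra 3.11. The snippet below is an illustrative sketch; the SDF path is taken from the redeploy command above.

```shell
# The downlevel SDF used for the first-node rollback redeploy.
SDF=${SDF:-/home/admin/downlevel-config/sdf-rvt.yaml}

# For a first-node rollback the cassandra_version_3_11 custom option
# must be present under product-options -> tsn -> custom-options.
if grep -q 'cassandra_version_3_11' "$SDF" 2>/dev/null; then
    echo "SDF pins Cassandra 3.11: $SDF"
else
    echo "cassandra_version_3_11 not found in $SDF" >&2
fi
```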

Switch failed for Nth node

A redeploy is needed for the Nth node that failed. Make sure the following custom option is not present in the SDF file under product-options: → tsn: → custom-options.

      product-options:
        tsn:
          cds-addresses:
          - 172.18.1.10
          - 172.18.1.11
          - 172.18.1.12
          custom-options:
          - log-passwords

For the VM that failed, run csar redeploy --vm <failed VM name> --sdf /home/admin/uplevel-config/sdf-rvt.yaml.

If csar redeploy fails, contact your Customer Care Representative to start the recovery procedures (for example, Recovering all nodes from a total TSN cluster failure).

If csar redeploy completed successfully, the new TSN VM should start with Cassandra version 4.1.3, and the upgrade procedure to Cassandra 4.1.3 can continue.
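Conversely to the first-node case, for the Nth node the cassandra_version_3_11 option must be absent from the uplevel SDF. A quick check, sketched under the same path assumption as the redeploy command above:

```shell
# The uplevel SDF used for the Nth-node redeploy.
SDF=${SDF:-/home/admin/uplevel-config/sdf-rvt.yaml}

# For an Nth-node redeploy the option must NOT be present, so the
# redeployed VM starts with Cassandra 4.1.3.
if grep -q 'cassandra_version_3_11' "$SDF" 2>/dev/null; then
    echo "WARNING: cassandra_version_3_11 still present in $SDF" >&2
else
    echo "OK: $SDF does not pin Cassandra 3.11"
fi
```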

Switch failed after completion

Once all TSN VMs have been switched to Cassandra 4.1.3, it is not possible to go back to Cassandra 3.11.13. If the nodes are malfunctioning, contact your Customer Care Representative to investigate the cause of the upgrade failure and determine which TSN VM recovery procedures may apply (for example, Recovering all nodes from a total TSN cluster failure).

5.5 Delete uplevel CDS data

Run ./rvtconfig delete-node-type-version -c <CDS address> <CDS auth args> -t tsn --vm-version <uplevel version> -d <deployment ID> --site-id <site ID> --ssh-key-secret-id <SSH key secret ID> to remove data for the uplevel version from CDS.

Example output from the command:

The following versions will be deleted: 4.1-7-1.0.0
The following versions will be retained: {example-downlevel-version}
Do you wish to continue? Y/[N] Y

Check the versions are the correct way around, and then confirm this prompt to delete the uplevel data from CDS.

5.6 Cleanup after backout

Backout procedure

  • If desired, remove the uplevel CSAR. On the SIMPL VM, run csar remove tsn/<uplevel version>.

  • If desired, remove the uplevel config directories on the SIMPL VM with rm -rf /home/admin/uplevel-config. However, we recommend keeping these files in case the upgrade is attempted again at a later time.
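If you do decide to remove the uplevel config directory, one option is to archive it first so it can be restored if the upgrade is reattempted. This is a minimal sketch, assuming the directory path shown above; the archive name is an illustrative choice.

```shell
# Config directory named in this procedure; archive path is an assumption.
SRC=${SRC:-/home/admin/uplevel-config}
ARCHIVE=${ARCHIVE:-/home/admin/uplevel-config-$(date +%Y%m%d).tar.gz}

# Archive the config directory before deleting it, so the files can
# be restored if the upgrade is attempted again later.
if [ -d "$SRC" ]; then
    tar -czf "$ARCHIVE" -C "$(dirname "$SRC")" "$(basename "$SRC")" \
        && echo "Archived $SRC to $ARCHIVE"
fi
```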

5.7 Resume Initconf in non-TSN nodes

Run ./rvtconfig set-desired-running-state --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --state Started.

You should see output similar to this, indicating that the non-TSN nodes are in the desired running state Started.

Connected to MDM at 10.0.0.192
Put desired state = Started for Instance mydeployment-mag-1
Put desired state = Started for Instance mydeployment-shcm-1
Put desired state = Started for Instance mydeployment-mmt-gsm-1
Put desired state = Started for Instance mydeployment-smo-1
Getting desired state for each instance.
Final desired state for instances: {
    "mydeployment-mag-1": "Started",
    "mydeployment-shcm-1": "Started",
    "mydeployment-mmt-gsm-1": "Started",
    "mydeployment-smo-1": "Started"
}
Note

This desired running state does not mean the VMs, Rhino, SGC, etc., are started or stopped. This desired running state indicates the status of the initconf process.

  • When in Stopped state, the initconf will pause any configuration activities.

  • When in Started state, the initconf will resume any configuration activities.

5.8 Enable scheduled tasks

Run ./rvtconfig leave-maintenance-window -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID>. This will allow scheduled tasks to run on the VMs again. The output should look like this:

Maintenance window has been terminated.
The VMs will resume running scheduled tasks as per their configured schedules.

5.9 Verify service is restored

Perform verification tests to ensure the deployment is functioning as expected.

If applicable, contact your Customer Care Representative to investigate the cause of the upgrade failure.

Important

Before re-attempting the upgrade, ensure you have run the rvtconfig delete-node-type-version command. Attempting an upgrade while there is stale uplevel data in CDS can result in needing to completely redeploy one or more VMs.

You will also need to re-upload the uplevel configuration.

Rhino VoLTE TAS VMs Version 4.1