Rhino VoLTE TAS VMs 4.1 :: RVT VM Install Guide (CDMA) :: Major upgrade from 4.0.0 of TSN nodes

Table of Contents

Planning for the procedure
1. Preparation for upgrade procedure
2. Upgrade procedure
3. Post-upgrade procedure
4. Post-acceptance
5. Backout Method of Procedure

This page explains how to do a major upgrade from version 4.0.0 of the TSN nodes.

The page is self-sufficient, that is, if you save or print this page, you have all the required information and instructions for upgrading TSN nodes. However, before starting the procedure, make sure you are familiar with the operation of Rhino VoLTE TAS nodes, this procedure, and the use of the SIMPL VM.

There are links in various places below to other parts of this book, which provide more detail about certain aspects of solution setup and configuration.
You can find more information about SIMPL VM commands in the SIMPL VM Documentation.
You can find more information on rvtconfig commands on the rvtconfig page.

Once the TSN VMs have been upgraded from version 4.0.0 to 4.1, you can upgrade Cassandra from version 3.11.13 to 4.1.3 by following the Cassandra version switch procedure for TSN nodes.

Planning for the procedure

This procedure assumes that:

You are familiar with UNIX operating system basics, such as the use of vi and command-line tools like scp.
You are upgrading TSN VMs from version 4.0.0 to 4.1.
You have completed the steps in Prepare for the upgrade.
You have deployed a SIMPL VM, version 6.13.3 or later. Output shown on this page is correct for version 6.13.3 of the SIMPL VM; it may differ slightly on later versions.

Check you are using a supported VNFI version:

Platform	Supported versions
OpenStack	Newton to Wallaby
VMware vSphere	6.7 and 7.0

Important notes

Do not use these instructions for target versions whose major version component differs from 4.1.

If you have fewer than 3 TSN nodes (for example, a lab deployment of a single TSN node), upgrading them is not possible. This is because Cassandra data (such as VM configuration data and state, and registration data) will be lost if there are fewer than two nodes active at any one time.

To upgrade one or two TSN nodes, you need to destroy and recreate all VMs of all node types in the entire site.

To upgrade the TSN nodes, it is required that:

The uplevel version of TSN to upgrade to is 4.1-5-1.0.0 or higher.
All other non-TSN nodes are already upgraded to 4.1-5-1.0.0 or higher.

Determine parameter values

In the steps below, replace parameters marked with angle brackets (such as <deployment ID>) with values as follows. (Replace the angle brackets as well, so that they are not included in the final command.)

<deployment ID>: The deployment ID. You can find this at the top of the SDF. On this page, the example deployment ID mydeployment is used.
<site ID>: The site identifier, in the form DC1 through DC32. You can find this at the top of the SDF.
<site name>: The name of the site. You can find this at the top of the SDF.
<MW duration in hours>: The duration of the reserved maintenance period in hours.
<CDS address>: The management IP address of the first TSN node.
<SIMPL VM IP address>: The management IP address of the SIMPL VM.
<CDS auth args> (authentication arguments): If your CDS has Cassandra authentication enabled, replace this with the parameters -u <username> -k <secret ID> to specify the configured Cassandra username and the secret ID of a secret containing the password for that Cassandra user. For example, ./rvtconfig -c 1.2.3.4 -u cassandra-user -k cassandra-password-secret-id ….

If your CDS is not using Cassandra authentication, omit these arguments.
<service group name>: The name of the service group (also known as a VNFC - a collection of VMs of the same type), which for Rhino VoLTE TAS nodes will consist of all TSN VMs in the site. This can be found in the SDF by identifying the TSN VNFC and looking for its name field.
<downlevel version>: The current version of the VMs. On this page, the example version 4.0.0-14-1.0.0 is used.
<uplevel version>: The version of the VMs you are upgrading to. On this page, the example version 4.1-7-1.0.0 is used.

Tools and access

You must have the SSH keys required to access the SIMPL VM and the TSN VMs that are to be upgraded.

The SIMPL VM must have the right permissions on the VNFI. Refer to the SIMPL VM documentation for more information.

When starting an SSH session to the SIMPL VM, use a keepalive of 30 seconds. This prevents the session from timing out - SIMPL VM automatically closes idle connections after a few minutes.

When using OpenSSH (the SSH client on most Linux distributions), this can be controlled with the option ServerAliveInterval - for example, ssh -i <SSH private key file for SIMPL VM> -o ServerAliveInterval=30 admin@<SIMPL VM IP address>.

rvtconfig is a command-line tool for configuring and managing Rhino VoLTE TAS VMs. All TSN CSARs include this tool; once the CSAR is unpacked, you can find rvtconfig in the resources directory, for example:

$ cdcsars
$ cd tsn/<uplevel version>
$ cd resources
$ ls rvtconfig
rvtconfig

The rest of this page assumes that you are running rvtconfig from the directory in which it resides, so that it can be invoked as ./rvtconfig. It also assumes that you use the uplevel version of rvtconfig unless instructed otherwise. Where a step explicitly requires the downlevel version, you can find it here:

$ cdcsars
$ cd tsn/<downlevel version>
$ cd resources
$ ls rvtconfig
rvtconfig

1. Preparation for upgrade procedure

These steps can be carried out in advance of the upgrade maintenance window. They should take less than 30 minutes to complete.

1.1 Ensure the SIMPL version is at least 6.13.3

Log into the SIMPL VM and run the command simpl-version. The SIMPL VM version is displayed at the top of the output:

SIMPL VM, version 6.13.3

Ensure this is at least 6.13.3. If not, contact your Customer Care Representative to organise upgrading the SIMPL VM before proceeding with the upgrade of the TSN VMs.

Output shown on this page is correct for version 6.13.3 of the SIMPL VM; it may differ slightly on later versions.

1.2 Verify the downlevel CSAR is present

On the SIMPL VM, run csar list.

Each listed CSAR will be of the form <node type>/<version>, for example, tsn/4.0.0-14-1.0.0.

Ensure that there is a TSN CSAR listed there with the current downlevel version.

If the downlevel CSAR is not present, return to the pre-upgrade steps.

1.3 Reserve maintenance period

The upgrade procedure requires a maintenance period. For upgrading nodes in a live network, implement measures to mitigate any unforeseen events.

Ensure you reserve enough time for the maintenance period, which must include the time for a potential rollback.

To calculate the time required for the actual upgrade or rollback of the VMs, run ./rvtconfig calculate-maintenance-window -i /home/admin/uplevel-config -t tsn --site-id <site ID> --sequential. The output will be similar to the following, stating how long an upgrade or rollback of the TSN VMs will take.

Nodes will be upgraded sequentially

-----

Estimated time for a full upgrade of 3 VMs: 24 minutes
Estimated time for a full rollback of 3 VMs: 24 minutes

-----

Your maintenance window must include time for:

The preparation steps. Allow 45 minutes.
The upgrade of the VMs, as calculated above.
The rollback of the VMs, as calculated above.
Post-upgrade or rollback steps. Allow 15 minutes, plus time for any prepared verification tests.

In the example above, this would be 108 minutes.
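The arithmetic behind the 108-minute figure can be sketched as a small shell helper. The function name mw_total_minutes is hypothetical and not part of rvtconfig; the 45-minute and 15-minute allowances are the ones listed above, and the two arguments are the upgrade and rollback estimates reported by calculate-maintenance-window.

```shell
# Hypothetical helper (not part of rvtconfig): total maintenance window in minutes.
# $1 = estimated upgrade time, $2 = estimated rollback time, both in minutes.
mw_total_minutes() {
    prep=45    # preparation steps
    post=15    # post-upgrade or rollback steps
    echo $(( prep + $1 + $2 + post ))
}

# Example from this page: 24 minutes to upgrade, 24 minutes to roll back.
mw_total_minutes 24 24    # prints 108
```

Remember to add time for any prepared verification tests on top of this total.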

These numbers are a conservative best-effort estimate. Various factors, including IMS load levels, VNFI hardware configuration, VNFI load levels, and network congestion can all contribute to longer upgrade times.

These numbers only cover the time spent actually running the upgrade on SIMPL VM. You must add sufficient overhead for setting up the maintenance window, checking alarms, running validation tests, and so on.

The time required for an upgrade or rollback can also be manually calculated.

For node types that are upgraded sequentially, like this node type, calculate the upgrade time from the number of nodes: each node, including the first, takes 30 minutes.

You must also reserve time for:

The SIMPL VM to upload the image to the VNFI. Allow 2 minutes, unless the connectivity between SIMPL and the VNFI is particularly slow.
Any validation testing needed to determine whether the upgrade succeeded.

2. Upgrade procedure

2.1 Verify downlevel config has no changes

Skip this step if the downlevel version is 4.0.0-27-1.0.0 or below.

Run rm -rf /home/admin/config-output on the SIMPL VM to remove that directory if it already exists. Using rvtconfig from the downlevel CSAR, run ./rvtconfig compare-config -c <CDS address> -d <deployment ID> --input /home/admin/current-config --vm-version <downlevel version> --output-dir /home/admin/config-output -t tsn to compare the live configuration to the configuration in the /home/admin/current-config directory.

Example output is listed below:

Validating node type against the schema: tsn
Redacting secrets…
Comparing live config for (version=4.0.0-14-1.0.0, deployment=mydeployment, group=RVT-tsn.DC1) with local directory (version=4.1-7-1.0.0, deployment=mydeployment, group=RVT-tsn.DC1)
Getting per-level configuration for version '4.0.0-14-1.0.0', deployment 'mydeployment', and group 'RVT-tsn.DC1'
  - Found config with hash 7f6cc1f3df35b43d6286f19c252311e09216e6115f314d0cb9cc3f3a24814395

Wrote currently uploaded configuration to /tmp/tmprh2uavbh
Redacting secrets…
Redacting SDF…
No differences found in yaml files
Uploading this will have no effect unless secrets, certificates or licenses have changed, or --reload-resource-adaptors is specified

There should be no differences found, as the configuration in current-config should match the live configuration. If any differences are found, abort the upgrade process.
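If you capture the compare-config output in a file, the check can be scripted. The following is a minimal sketch: the helper name config_unchanged and the log file path in the usage comment are assumptions for illustration, and the match string is taken from the example output above.

```shell
# Hypothetical helper: succeeds only if compare-config output (read from stdin)
# reports that the YAML files are identical.
config_unchanged() {
    grep -q 'No differences found in yaml files'
}

# Usage sketch, assuming the output was saved to a file of your choosing:
#   config_unchanged < /home/admin/compare-config.log \
#       || echo "Differences found: abort the upgrade"
```

This only looks for the summary line, so still read the full output if the check fails.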

2.2 Disable Cassandra repairs

Disable the scheduled Cassandra repairs to minimise the possibility that they will interact with this procedure.

Complete the following procedure for each TSN node.

Establish an SSH session to the management IP of the node. Then run systemctl list-timers. This should give output of this form:

NEXT                          LEFT     LAST                          PASSED    UNIT                         ACTIVATES
Sat 2023-01-14 02:00:00 NZDT  12h left Fri 2023-01-13 02:00:00 NZDT  12h ago   cassandra-repair-daily.timer cassandra-repair-daily.service
Sat 2023-01-14 13:00:00 NZDT  23h left Fri 2023-01-13 13:00:00 NZDT  1h ago    systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service

If there is no line with UNIT as cassandra-repair-daily.timer, and also no line with UNIT as cassandra-repair-weekly.timer, move on to the next node. Otherwise, run the following commands:

sudo systemctl disable cassandra-repair-daily.timer
sudo systemctl stop cassandra-repair-daily.timer
sudo systemctl disable cassandra-repair-weekly.timer
sudo systemctl stop cassandra-repair-weekly.timer
systemctl list-timers

Depending on your version, you will either have cassandra-repair-daily.timer or cassandra-repair-weekly.timer. Therefore, exactly two of the commands will fail. This is expected.

This should give output of this form:

NEXT                          LEFT     LAST                          PASSED    UNIT                         ACTIVATES
Sat 2023-01-14 13:00:00 NZDT  23h left Fri 2023-01-13 13:00:00 NZDT  1h ago    systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service

You should no longer see an entry for cassandra-repair-daily.timer or cassandra-repair-weekly.timer.
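When repeating this on several TSN nodes, a small filter over the systemctl list-timers output can confirm that neither repair timer remains. This is a sketch only; the function name repair_timers_disabled is hypothetical, and it matches the two timer unit names given above.

```shell
# Hypothetical helper: reads `systemctl list-timers` output on stdin and
# succeeds only if no Cassandra repair timer is still listed.
repair_timers_disabled() {
    ! grep -Eq 'cassandra-repair-(daily|weekly)\.timer'
}

# Usage sketch, run on the TSN node:
#   systemctl list-timers | repair_timers_disabled \
#       || echo "A Cassandra repair timer is still active"
```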

Prepare for Maintenance Window

Only perform this step if this is the first, or only, node type being upgraded.

First, establish an SSH session to the management IP of the first TSN node. Type cqlsh to enter the Cassandra shell and execute the following CQL statement:

  CREATE TABLE IF NOT EXISTS
  metaswitch_tas_deployment_info.maintenance_window (
      deployment_id text, site_id text, end_timestamp int,
      PRIMARY KEY (deployment_id, site_id)
  );

Run ./rvtconfig enter-maintenance-window -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID> --hours <MW duration in hours>. The output will look similar to:

Maintenance window is now active until 04 Nov 2022 21:38:06 NZDT.
Use the leave-maintenance-window command once maintenance is complete.

This will prevent scheduled tasks running once the new VMs are started until the time given in the output.

If at any point in the upgrade process you wish to confirm the end time of the maintenance window, you can run ./rvtconfig maintenance-window-status -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID>.

2.3 Validate configuration

Run the command ./rvtconfig validate -t tsn -i /home/admin/uplevel-config to check that the configuration files are correctly formatted, contain valid values, and are self-consistent. Ensure you use the uplevel version of rvtconfig. A successful validation with no errors or warnings produces the following output.

Validating node type against the schema: tsn
YAML for node type(s) ['tsn'] validates against the schema

If the output contains validation errors, fix the configuration in the /home/admin/uplevel-config directory and go back to the Update the configuration files for RVT 4.1 step.

If the output contains validation warnings, consider whether you wish to address them before performing the upgrade. The VMs will accept configuration that has validation warnings, but certain functions may not work.

2.4 Upload configuration

Upload the configuration to CDS:

./rvtconfig upload-config -c <CDS address> <CDS auth args> -t tsn -i /home/admin/uplevel-config --vm-version <uplevel version>

Check that the output confirms that configuration exists in CDS for both the current (downlevel) version and the uplevel version:

Validating node type against the schema: tsn
Preparing configuration for node type tsn…
Checking differences between uploaded configuration and provided files
Getting per-level configuration for version '4.1-7-1.0.0', deployment 'mydeployment-tsn', and group 'RVT-tsn.DC1'
  - No configuration found
No uploaded configuration was found: this appears to be a new install or upgrade
Encrypting secrets…
Wrote config for version '4.1-7-1.0.0', deployment ID 'mydeployment', and group ID 'RVT-tsn.DC1'
Versions in group RVT-tsn.DC1
=============================
  - Version: 4.0.0-14-1.0.0
    Config hash: 7f6cc1f3df35b43d6286f19c252311e09216e6115f314d0cb9cc3f3a24814395
    Active: mydeployment-tsn-1, mydeployment-tsn-2, mydeployment-tsn-3
    Leader seed: 

  - Version: 4.1-7-1.0.0
    Config hash: f790cc96688452fdf871d4f743b927ce8c30a70e3ccb9e63773fc05c97c1d6ea
    Active: None
    Leader seed:
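To confirm that both versions appear in the listing, you could filter the saved upload-config output. A minimal sketch, assuming the output has the "- Version: <version>" lines shown above; the helper name cds_has_version is hypothetical.

```shell
# Hypothetical helper: $1 = version string; upload-config output on stdin.
# Succeeds if the "Versions in group" listing contains exactly that version.
cds_has_version() {
    awk -v v="$1" '
        $1 == "-" && $2 == "Version:" && $3 == v { found = 1 }
        END { exit !found }
    '
}

# Usage sketch, with the output saved in a shell variable named out:
#   printf '%s\n' "$out" | cds_has_version "<downlevel version>"
#   printf '%s\n' "$out" | cds_has_version "<uplevel version>"
```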

2.5 Verify the TSN clusters are healthy

First, establish an SSH session to the management IP of the first TSN node. To check that the primary Cassandra cluster is healthy, run nodetool status on the TSN node:

Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns    Host ID                               Rack
UN  1.2.3.4        678.58 KiB  256          ?       f81bc71d-4ba3-4400-bed5-77f317105cce  rack1
UN  1.2.3.5        935.66 KiB  256          ?       aa134a07-ef93-4e09-8631-0e438a341e57  rack1
UN  1.2.3.6        958.34 KiB  256          ?       8ce540ea-8b52-433f-9464-1581d32a99bc  rack1

Check that all TSN nodes are present and listed as UN (Up and Normal). The output in the Owns column may differ and is irrelevant.

Next, check that the ramdisk-based Cassandra cluster is healthy. Run nodetool status -p 17199 on the TSN node. Again, check that all TSN nodes are present and listed as UN.

If either the primary or ramdisk-based Cassandra cluster is not healthy (i.e. not all TSN nodes show up as UN in the output from nodetool status and nodetool status -p 17199), stop the upgrade process here and troubleshoot the node. Only continue after both the Cassandra clusters are healthy.
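The UN check can be automated for both clusters with a short filter. This is a sketch under the assumption that node lines begin with a two-letter status/state code, as in the example output above; the function name cassandra_all_un is hypothetical.

```shell
# Hypothetical helper: reads `nodetool status` output on stdin and prints the
# address of every node whose status/state is not UN. It fails if any such
# node is found, or if no node lines were seen at all.
cassandra_all_un() {
    awk '
        # Node lines start with a two-letter code such as UN or DN.
        $1 ~ /^[UD][NLJM]$/ {
            nodes++
            if ($1 != "UN") { print $2; bad++ }
        }
        END { exit (bad > 0 || nodes == 0) }
    '
}

# Usage sketch, run on the TSN node:
#   nodetool status          | cassandra_all_un || echo "primary cluster unhealthy"
#   nodetool status -p 17199 | cassandra_all_un || echo "ramdisk cluster unhealthy"
```

The filter does not know how many TSN nodes you expect, so still confirm that every node is present in the listing.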

2.6 Apply TSN efix for rollback

If the TSN is currently on version 4.0.0-35-1.0.0 or later, skip this step.

Once the TSN upgrade has commenced, it cannot be rolled back to the original TSN version. Instead, it needs to be rolled back to a patched version of the downlevel CSAR.

You will have been provided with an efix patch by your Customer Care Representative, in the form of a .tar.gz file. Use scp to transfer this file to /csar-volume/csar/ on the SIMPL VM. Apply it to the downlevel CSAR by running csar efix tsn/<downlevel version> <patch file>, for example, csar efix tsn/4.0.0-14-1.0.0 /csar-volume/csar/4.0.0-14-1.0.0-efix-from-41-rollback.tar.gz. This takes about five minutes to complete.

Check the output of the patching process states that SIMPL VM successfully created a patch. Example output for version 4.0.0-14-1.0.0 and a vSphere deployment is:

Applying efix to tsn/4.0.0-14-1.0.0
Patching tsn-4.0.0-14-1.0.0-vsphere-from-41-rollback.ova,  this may take several minutes
Updating manifest
Successfully created tsn/4.0.0-14-1.0.0-from-41-rollback

You can verify that a patched CSAR now exists by running csar list again - you should see a CSAR named tsn/<downlevel version>-from-41-rollback (for the above example that would be tsn/4.0.0-14-1.0.0-from-41-rollback).
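The name check can be scripted against the csar list output. A minimal sketch: the helper name rollback_csar_present is hypothetical, and it only assumes that the patched CSAR name appears somewhere in the listing in the form given above.

```shell
# Hypothetical helper: $1 = downlevel version; `csar list` output on stdin.
# Succeeds if tsn/<downlevel version>-from-41-rollback is listed.
rollback_csar_present() {
    grep -Fq -- "tsn/$1-from-41-rollback"
}

# Usage sketch, run on the SIMPL VM:
#   csar list | rollback_csar_present "<downlevel version>" \
#       || echo "Patched rollback CSAR not found"
```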

2.7 Apply TSN in-place patch for upgrade from 4.0.0

If the TSN is currently on version 4.0.0-35-1.0.0 or later, skip this step.

To prepare for the upgrade, the 4.0.0 TSN VMs need to be patched in-place.

Run /home/admin/.local/share/csar/tsn/<uplevel version>/resources/prepare-for-40-tsn-upgrade --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --ssh-key-secret-id rvt-simpl-private-key-id prepare-for-upgrade.

This will update the TSN VMs one by one. For each VM, you will see output similar to the output below:

Setting up a connection to 1.2.3.4
Connected (version 2.0, client OpenSSH_7.4)
Auth banner: b'WARNING: Access to this system is for authorized users only.\n'
Authentication (publickey) successful!
Preparing VM mydeployment-tsn-1 for upgrade from 4.0 to 4.1
Stopping initconf
Taking backup
Applying patch
[chan 9] Opened sftp connection (server version 3)
Starting initconf
Waiting for initconf to converge
Initconf has converged

When the message

Completed successfully

is printed, the process is complete on all VMs.

2.8 Collect diagnostics

We recommend gathering diagnostic archives for all TSN VMs in the deployment.

On the SIMPL VM, run the command ./rvtconfig gather-diags --sdf /home/admin/uplevel-config/sdf-rvt.yaml -t tsn --ssh-key-secret-id <SSH key secret ID> --ssh-username sentinel --output-dir <diags-bundle>.

If <diags-bundle> does not exist, the command will create the directory for you.

Each diagnostic archive can be up to 200 MB per VM. Ensure you have enough disk space on the SIMPL VM to collect all diagnostics. The command will be aborted if the SIMPL VM does not have enough disk space to collect all diagnostic archives from all the VMs in your deployment specified in the provided SDF.
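A quick worst-case estimate of the space needed can be made before running gather-diags. The sketch below uses the 200 MB upper bound from the paragraph above; the helper names diags_space_ok and free_mb are hypothetical, and free_mb assumes the archives will be written to the filesystem holding the current directory.

```shell
# Hypothetical helper: worst-case disk space check for gather-diags.
# $1 = number of TSN VMs, $2 = free space on the SIMPL VM in MB.
diags_space_ok() {
    per_vm_mb=200                     # upper bound per archive, from the text above
    needed_mb=$(( $1 * per_vm_mb ))
    echo "need up to ${needed_mb} MB, have $2 MB"
    [ "$2" -ge "$needed_mb" ]
}

# Free space, in MB, of the filesystem holding the current directory.
free_mb() {
    df -Pk . | awk 'NR == 2 { print int($4 / 1024) }'
}

# Usage sketch for a three-node deployment:
#   diags_space_ok 3 "$(free_mb)" || echo "Free up disk space first"
```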

You are using the uplevel rvtconfig and uplevel SDF to gather diagnostics for downlevel VMs. This is intentional, as the uplevel rvtconfig features the gather-diags command that only works with the uplevel SDF due to schema changes.

2.9 Pause Initconf in non-TSN nodes

Set the running state of initconf processes in non-TSN VMs to a paused state.

./rvtconfig set-desired-running-state --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --state Stopped

You should see output similar to this, indicating that the initconf processes of the non-TSN nodes are in state Stopped.

Connected to MDM at 10.0.0.192
Put desired state = Stopped for Instance mydeployment-mag-1
Put desired state = Stopped for Instance mydeployment-shcm-1
Put desired state = Stopped for Instance mydeployment-mmt-gsm-1
Put desired state = Stopped for Instance mydeployment-smo-1
Getting desired state for each instance.
Final desired state for instances: {
    "mydeployment-mag-1": "Stopped",
    "mydeployment-shcm-1": "Stopped",
    "mydeployment-mmt-gsm-1": "Stopped",
    "mydeployment-smo-1": "Stopped"
}

This desired running state does not mean the VMs, Rhino, SGC, etc., are started or stopped. This desired running state indicates the status of the initconf process.

When in Stopped state, the initconf will pause any configuration activities.
When in Started state, the initconf will resume any configuration activities.
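To confirm that every instance in the final summary reached the requested state, you could filter the saved command output. This is a sketch that assumes summary lines have the quoted "instance": "state" form shown in the example output; the helper name desired_state_mismatches is hypothetical.

```shell
# Hypothetical helper: reads set-desired-running-state output on stdin and
# prints every instance in the final summary whose state differs from $1.
# It fails if any mismatch is printed, or if no summary lines were found.
desired_state_mismatches() {
    awk -v want="$1" -F'"' '
        # Summary lines look like:    "mydeployment-mag-1": "Stopped",
        NF >= 5 && $3 ~ /^: *$/ {
            seen++
            if ($4 != want) { print $2 " is " $4; bad++ }
        }
        END { exit (bad > 0 || seen == 0) }
    '
}

# Usage sketch, with the command output saved in a shell variable named out:
#   printf '%s\n' "$out" | desired_state_mismatches Stopped
```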

2.10 Take a CDS backup

Take a backup of the CDS database by issuing the command below.

./rvtconfig backup-cds --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --output-dir <backup-cds-bundle> --ssh-key-secret-id <SSH key secret ID> -c <CDS address> <CDS auth args>

The output should look like this:

Capturing cds_keyspace_schema
Capturing ramdisk_keyspace_schema
cleaning snapshot metaswitch_tas_deployment_snapshot
...
...
...
running nodetool snapshot command
Requested creating snapshot(s) for [metaswitch_tas_deployment_info] with snapshot name [metaswitch_tas_deployment_snapshot] and options {skipFlush=false}
...
...
...

Final CDS backup archive has been created at <backup-cds-bundle>/tsn_cassandra_backup_20230711095409.tar

If the command ended successfully, you can continue with the procedure. If it failed, do not continue the procedure without a CDS backup and contact your Customer Care Representative to investigate the issue.
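If you save the backup-cds output, the archive path can be pulled out of the final line and checked. A minimal sketch, assuming the success line has the wording shown in the example output; the helper name cds_backup_archive is hypothetical.

```shell
# Hypothetical helper: reads backup-cds output on stdin and prints the path of
# the final archive. It fails if the success line is missing.
cds_backup_archive() {
    sed -n 's/^Final CDS backup archive has been created at //p' | grep .
}

# Usage sketch, with the command output saved in a shell variable named out:
#   archive=$(printf '%s\n' "$out" | cds_backup_archive) && ls -l "$archive"
```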

2.11 Increase replication factor

The 4.1 VMs have increased the replication factor for CDS tables to be more resilient to TSN failures when there are 5 or more TSNs deployed. The 4.0 VMs always had a replication factor of 3 for CDS tables. The replication factor is not automatically updated during the upgrade.

Perform this step only if the number of TSN nodes deployed is 5 or more.

SSH to any of the TSN nodes and run the cqlsh command:
```
[sentinel@my-tsn-1 ~]$ cqlsh
```

Update replication factor to 5 by running:

cqlsh> ALTER KEYSPACE "metaswitch_tas_deployment_info" with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : '5' };

Check that it has been updated:

cqlsh> DESCRIBE metaswitch_tas_deployment_info;

Note: this will output lots of information about the CDS tables but only the first line needs to be checked. The expected output is below:

CREATE KEYSPACE metaswitch_tas_deployment_info WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '5'}  AND durable_writes = true;
<snipped for brevity>

Exit out of cqlsh.
```
cqlsh> exit
```

Run the repair tool to apply the replication factor update on ALL of the TSNs. This must be done one TSN at a time: DO NOT run this in parallel.

[sentinel@my-tsn-1 ~]$ nodetool repair -full metaswitch_tas_deployment_info

An example of the expected output is below.

[sentinel@my-tsn-1 ~]$ nodetool repair -full metaswitch_tas_deployment_info
[2022-03-26 17:52:27,277] Starting repair command #1 .... <snipped for brevity>
[2022-03-26 17:52:30,617] Repair session .... <snipped for brevity>
[2022-03-26 17:52:30,644] Repair completed successfully
[2022-03-26 17:52:30,646] Repair command #1 finished in 3 seconds
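One way to keep the repairs strictly sequential is to drive them from a single loop that stops at the first failure. The sketch below is an illustration, not part of the product tooling: the function name repair_all_tsns and the RUN override are hypothetical, and it assumes you can SSH to each TSN management address as the sentinel user with your key already loaded.

```shell
# Hypothetical helper: run the full repair on each TSN in turn, stopping at the
# first failure. Arguments are the TSN management addresses.
# RUN defaults to ssh; set RUN=echo for a dry run that only prints the commands.
repair_all_tsns() {
    for host in "$@"; do
        echo "Repairing $host"
        ${RUN:-ssh} "sentinel@$host" nodetool repair -full metaswitch_tas_deployment_info || {
            echo "Repair failed on $host; stopping" >&2
            return 1
        }
    done
}

# Dry run with example addresses: prints the commands without connecting.
RUN=echo repair_all_tsns 1.2.3.4 1.2.3.5 1.2.3.6
```

Because each repair must finish before the loop moves on, two repairs never run at the same time.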

2.12 Begin the upgrade

Carry out a csar import of the tsn VMs

Prepare for the upgrade by running the command csar import --vnf tsn --sdf /home/admin/uplevel-config/sdf-rvt.yaml on the SIMPL VM to import the terraform templates.

First, SIMPL VM connects to your VNFI to check the credentials specified in the SDF and QSG are correct. If this is successful, it displays the message All validation checks passed..

Next, SIMPL VM compares the SDF you provided with the one that was previously used to deploy or upgrade the TSN nodes. Since the SDF was updated significantly in the pre-upgrade steps, you should see the following prompt (details may vary):

The following changes have been made to the SDF since it was used to deploy/update tsn
(Note: if only a subset of VMs were deployed or updated previously, then this diff won't fully reflect the changes that will be made to the other VMs in the deployment)

Ensure all changes are as expected. In particular, ensure the following change matches the changes to the tsn VNFC:

-          secrets-private-key: abcde
+          primary-user-password-id: rvt-primary-user-password
+          secrets-private-key-id: rvt-secrets-private-key
       type: tsn
-      version: 4.0.0-14-1.0.0
+      version: 4.1-7-1.0.0
       vim-configuration:
         vsphere:

Do you want to continue? [yes/no]:

If the differences are not as you expect, then:

Type no. The csar import will be aborted.
Investigate why there are unexpected changes in the SDF.
Correct the SDF as necessary.
Retry this step.

Otherwise, accept the prompt by typing yes.

After you do this, SIMPL VM will import the terraform state. If successful, it outputs this message:

Done. Imported all VNFs.

If the output does not look like this, investigate and resolve the underlying cause, then re-run the import command until it shows the expected output.

Begin the upgrade of the tsn VMs

Determine the csar update command

To upgrade the VMs, determine the correct csar update command. The required command is as follows:

SKIP_MW_CHECK=1 csar update --skip force-in-series-update-with-l3-permission --vnf tsn --sdf /home/admin/uplevel-config/sdf-rvt.yaml --use-target-version-csar-info

Execute the csar update command

Execute the csar update command on the SIMPL VM.

First, SIMPL VM connects to your VNFI to check the credentials specified in the SDF and QSG are correct. If this is successful, it displays the message All validation checks passed..

Next, SIMPL VM compares the specified SDF with the SDF used for the csar import command above. Since the contents have not changed since you ran the csar import, the output should indicate that the SDF has not changed.

If there are differences in the SDF, a message similar to this will be output:

Comparing current SDF with previously used SDF.
site site1:
    tsn:
        tsn-1:
             networks:
             - ip-addresses:
                 ip:
            -    - 10.244.21.106
            +    - 10.244.21.196
                 - 10.244.21.107
               name: Management
               subnet: mgmt-subnet
Do you want to continue? [yes/no]: yes

If you see this, you must:

Type no. The upgrade will be aborted.
Go back to the start of the upgrade section and run through the csar import section again, until the SDF differences are resolved.
Retry this step.

Next, you will see the following prompt:

You have chosen to force csar update to proceed in series. This option will slow down update for VNFs which support update in parallel and MUST only be used with L3 permission.
Please discuss this option with your support representative if you have not already done so.
Type 'l3permission' to continue

Type l3permission to continue.

Afterwards, the SIMPL VM displays the VMs that will be upgraded:

You are about to update VMs as follows:

- VNF tsn:
    - For site site1:
    - update all VMs in VNFC service group mydeployment-tsn/4.1-7-1.0.0:
        - mydeployment-tsn-1 (index 0)
        - mydeployment-tsn-2 (index 1)
        - mydeployment-tsn-3 (index 2)

Type 'yes' to continue, or run 'csar update --help' for more information.

Continue? [yes/no]:

Check this output displays the version you expect (the uplevel version) and exactly the set of VMs that you expect to be upgraded. If anything looks incorrect, type no to abort the upgrade process, and recheck the VMs listed and the version field in /home/admin/uplevel-config/sdf-rvt.yaml. Also check you are passing the correct SDF path and --vnf argument to the csar update command.

Otherwise, accept the prompt by typing yes.

Next, each VM in your cluster will perform health checks. If successful, the output will look similar to this.

Running ansible scripts in '/home/admin/.local/share/csar/tsn/4.1-1-1.0.0/update_healthcheck_scripts' for node 'mydeployment-tsn-1'
Running script: check_config_uploaded…
Running script: check_ping_management_ip…
Running script: check_maintenance_window…
Running script: check_can_sudo…
Running script: check_converged…
Running script: check_liveness…
Running script: check_rhino_alarms…
Detailed output can be found in /var/log/csar/ansible_output-2023-01-05-02-05-51.log
All ansible update healthchecks have passed successfully

If a script fails, you can find details in the log file. The log file can be found in /var/log/csar/ansible_output-<timestamp>.log.

Running ansible scripts in '/home/admin/.local/share/csar/tsn/4.1-1-1.0.0/update_healthcheck_scripts' for node 'mydeployment-tsn-1'
Running script: check_config_uploaded...
Running script: check_ping_management_ip...
Running script: check_maintenance_window...
Running script: check_can_sudo...
Running script: check_converged...
Running script: check_liveness...
ERROR: Script failed. Specific error lines from the ansible output will be logged to screen. For more details see the ansible_output file (/var/log/csar/ansible_output-2023-01-05-21-02-17.log). This file has only ansible output, unlike the main command log file.

fatal: [mydeployment-tsn-1]: FAILED! => {"ansible_facts": {"liveness_report": {"cassandra": true, "cassandra_ramdisk": true, "cassandra_repair_timer": true, "cdsreport": true, "cleanup_sbbs_activities": false, "config_hash_report": true, "docker": true, "initconf": true, "linkerd": true, "mdm_state_and_status_ok": true, "mdmreport": true, "nginx": true, "no_ocss7_alarms": true, "ocss7": true, "postgres": true, "rem": true, "restart_rhino": true, "rhino": true}}, "attempts": 1, "changed": false, "msg": "The following liveness checks failed: ['cleanup_sbbs_activities']", "supports_liveness_checks": true}
Running script: check_rhino_alarms...
Detailed output can be found in /var/log/csar/ansible_output-2023-01-05-21-02-17.log
***Some tests failed for CSAR 'tsn/4.1-1-1.0.0' - see output above***

The msg field under each ansible task explains why the script failed.

If there are failures, investigate them with the help of your Customer Care Representative and the Troubleshooting pages.
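The long fatal line can be hard to read on screen. As a sketch, the failed liveness checks can be extracted from the msg text; this assumes the message wording shown in the example above, and the helper name failed_liveness_checks is hypothetical.

```shell
# Hypothetical helper: reads csar update output (or the ansible log) on stdin
# and prints one failed liveness check name per line, taken from the msg field.
failed_liveness_checks() {
    sed -n 's/.*The following liveness checks failed: \[\([^]]*\)].*/\1/p' |
        tr -d "' " | tr ',' '\n'
}

# Usage sketch against the log file named in the error message:
#   failed_liveness_checks < /var/log/csar/ansible_output-<timestamp>.log
```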

Make sure the cassandra_version_3_11 option was added to the SDF as described in Make product-specific changes to the SDF for RVT 4.1. If the option was not added:

The first upgraded VM will start with Cassandra version 4.1.3.
It will not join the cluster as the other nodes are running Cassandra 3.11.13.

You can recover from this situation by executing the command sequence described below:

Add the option to the SDF and upload config with ./rvtconfig upload-config -c <CDS address> <CDS auth args> -t tsn -i /home/admin/uplevel-config --vm-version <uplevel version>.
Run csar import --vnf tsn --sdf /home/admin/uplevel-config/sdf-rvt.yaml.
For each failed VM, perform redeploy with csar redeploy --vm <failed-vm> --sdf /home/admin/uplevel-config/sdf-rvt.yaml.

Retry this step once all failures have been corrected by running the command SKIP_MW_CHECK=1 csar update … as described at the beginning of this section.

Once the pre-upgrade health checks have passed, SIMPL VM proceeds to upgrade each of the VMs. Monitor the further output of csar update as the upgrade progresses, as described in the next step.

2.13 Monitor `csar update` output

For each VM:

The VM will be quiesced and destroyed.
SIMPL VM will create a replacement VM using the uplevel version.
The VM will automatically start applying configuration from the files you uploaded to CDS in the above steps.
Once configuration is complete, the VM will be ready for service. At this point, the csar update command will move on to the next TSN VM.

The output of the csar update command will look something like the following, repeated for each VM.

Decommissioning 'dc1-mydeployment-tsn-1' in MDM, passing desired version 'vm.version=4.1-7-1.0.0', with a 900 second timeout
dc1-mydeployment-tsn-1: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'decommissioned'
dc1-mydeployment-tsn-1: Current status 'in_progress'- desired status 'complete'
…
dc1-mydeployment-tsn-1: Current status 'complete', current state 'decommissioned' - desired status 'complete', desired state 'decommissioned'
Running update for VM group [0]
Performing health checks for service group mydeployment-tsn with a 1200 second timeout
Running MDM status health-check for dc1-mydeployment-tsn-1
dc1-mydeployment-tsn-1: Current status 'in_progress'- desired status 'complete'
…
dc1-mydeployment-tsn-1: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'

If you see this error:

Failed to retrieve instance summary for 'dc1-<VM hostname>' from MDM
(404)
Reason: Not Found

it can be safely ignored, provided that you do eventually see a Current status 'in_progress'… line. This error is caused by the newly-created VM taking a few seconds to register itself with MDM when it boots up.

Once all VMs have been upgraded, you should see this success message, detailing all the VMs that were upgraded and the version they are now running, which should be the uplevel version.

Successful VNF with full per-VNFC upgrade state:

VNF: tsn
VNFC: mydeployment-tsn
    - Node name: mydeployment-tsn-1
      - Version: 4.1-7-1.0.0
      - Build Date: 2022-11-21T22:58:24+00:00
    - Node name: mydeployment-tsn-2
      - Version: 4.1-7-1.0.0
      - Build Date: 2022-11-21T22:58:24+00:00
    - Node name: mydeployment-tsn-3
      - Version: 4.1-7-1.0.0
      - Build Date: 2022-11-21T22:58:24+00:00

If the upgrade fails, you will see Failed VNF instead of Successful VNF in the above output. There will also be more details of what went wrong printed before that. Refer to the Backout procedure below.
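
If you keep a copy of the csar update terminal output, the success criteria above can be checked mechanically. The following is a minimal sketch in POSIX shell, not part of the product tooling: the function name check_update_summary and the idea of saving the output to a file are assumptions made for illustration. It reads the saved output on standard input and succeeds only if a Successful VNF summary is present, no Failed VNF summary is present, and every reported node version equals the uplevel version you pass in.

```shell
# Hypothetical helper: judge a saved copy of the `csar update` output.
# $1    : the expected uplevel version, for example 4.1-7-1.0.0
# stdin : the captured output
check_update_summary() {
    awk -v want="$1" '
        /Failed VNF/     { failed = 1 }
        /Successful VNF/ { ok = 1 }
        # Lines of the form "      - Version: 4.1-7-1.0.0"
        /^ *- Version: / {
            n++
            if ($NF != want) { print "unexpected version: " $NF; bad = 1 }
        }
        END { exit ((failed || !ok || bad || n == 0) ? 1 : 0) }
    '
}

# Example use (the file name is hypothetical):
#   check_update_summary "<uplevel version>" < /home/admin/csar-update-output.txt \
#       && echo "all TSN VMs report the uplevel version"
```

A non-zero exit status only means the summary did not match; the full csar update output remains the authoritative record of what happened.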

2.14 Run basic validation tests

Run csar validate --vnf tsn --sdf /home/admin/uplevel-config/sdf-rvt.yaml to perform some basic validation tests against the uplevel nodes.

This command first performs a check that the nodes are connected to MDM and reporting that they have successfully applied the uplevel configuration:

========================
Performing healthchecks
========================
Commencing healthcheck of VNF 'tsn'
Performing health checks for service group mydeployment-tsn with a 0 second timeout
Running MDM status health-check for dc1-mydeployment-tsn-1
dc1-mydeployment-tsn-1: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
Running MDM status health-check for dc1-mydeployment-tsn-2
dc1-mydeployment-tsn-2: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
Running MDM status health-check for dc1-mydeployment-tsn-3
dc1-mydeployment-tsn-3: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'

After that, it performs various checks on the health of the VMs' networking and services:

================================
Running validation test scripts
================================
Running validation tests in CSAR 'tsn/4.1-7-1.0.0'
Test running for: mydeployment-tsn-1
Running script: check_ping_management_ip…
Running script: check_can_sudo…
Running script: check_converged…
Running script: check_liveness…
Detailed output can be found in /var/log/csar/ansible_output-2023-01-06-03-21-51.log

If all is well, then you should see the message All tests passed for CSAR 'tsn/<uplevel version>'!.

If the VM validation fails, you can find details in the log file. The log file can be found in /var/log/csar/ansible_output-<timestamp>.log.

Running validation test scripts
================================
Running validation tests in CSAR 'tsn/4.1-7-1.0.0'
Test running for: mydeployment-tsn-1
Running script: check_ping_management_ip...
Running script: check_can_sudo...
Running script: check_converged...
Running script: check_liveness...
ERROR: Script failed. Specific error lines from the ansible output will be logged to screen. For more details see the ansible_output file (/var/log/csar/ansible_output-2023-01-06-03-40-37.log). This file has only ansible output, unlike the main command log file.

fatal: [mydeployment-tsn-1]: FAILED! => {"ansible_facts": {"liveness_report": {"cassandra": true, "cassandra_ramdisk": true, "cassandra_repair_timer": true, "cdsreport": true, "cleanup_sbbs_activities": false, "config_hash_report": true, "docker": true, "initconf": true, "linkerd": true, "mdm_state_and_status_ok": true, "mdmreport": true, "nginx": true, "no_ocss7_alarms": true, "ocss7": true, "postgres": true, "rem": true, "restart_rhino": true, "rhino": true}}, "attempts": 1, "changed": false, "msg": "The following liveness checks failed: ['cleanup_sbbs_activities']", "supports_liveness_checks": true}
Running script: check_rhino_alarms...
Detailed output can be found in /var/log/csar/ansible_output-2023-01-06-03-40-37.log
***Some tests failed for CSAR 'tsn/4.1-7-1.0.0' - see output above***

----------------------------------------------------------


WARNING: Validation script tests failed for the following CSARs:
  - 'tsn/4.1-7-1.0.0'
See output above for full details

The msg field under each ansible task explains why the script failed.

If there are failures, investigate them with the help of your Customer Care Representative and the Troubleshooting pages.
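
To pull just the msg text out of a long failure line such as the one shown above, a small filter can help. This is a sketch under assumptions: the function name failed_task_messages is hypothetical, and the filter assumes each failed task is reported on a single line containing FAILED! and a "msg" field with no embedded double quotes, as in the sample output. Messages that contain escaped quotes would be truncated.

```shell
# Hypothetical helper: print the "msg" text of every failed ansible task.
# stdin : validation output, or the ansible_output log file
failed_task_messages() {
    grep 'FAILED!' |
        grep -o '"msg": "[^"]*"' |
        sed 's/^"msg": "//; s/"$//'
}

# Example use:
#   failed_task_messages < /var/log/csar/ansible_output-<timestamp>.log
```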

3. Post-upgrade procedure

3.1 Check Cassandra version and status

Verify the status of the Cassandra clusters. First, check that the primary Cassandra cluster is healthy and running the correct version. Run ./rvtconfig cassandra-status --ssh-key-secret-id <SSH key secret ID> --ip-addresses <CDS Address> for every TSN node.

Next, check that the ramdisk-based Cassandra cluster is healthy and running the correct version. Run ./rvtconfig cassandra-status --ssh-key-secret-id <SSH key secret ID> --ip-addresses <CDS Address> --ramdisk for every TSN node.

For both Cassandra clusters, check the output and verify that the running Cassandra version is 3.11.13.

=====> Checking cluster status on node 1.2.3.4
Setting up a connection to 172.0.0.224
Connected (version 2.0, client OpenSSH_7.4)
Auth banner: b'WARNING: Access to this system is for authorized users only.\n'
Authentication (publickey) successful!
ReleaseVersion: 3.11.13
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Tokens  Owns (effective)  Host ID                               Rack
UN  1.2.3.4  1.59 MiB   256          100.0%            3381adf4-8277-4ade-90c7-eb27c9816258  rack1
UN  1.2.3.5  1.56 MiB   256          100.0%            3bb6f68f-0140-451f-90a9-f5881c3fc71e  rack1
UN  1.2.3.6  1.54 MiB   256          100.0%            dbafa670-a2d0-46a7-8ed8-9a5774212e4c  rack1

Cluster Information:
    Name: mydeployment-tsn
    Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
    DynamicEndPointSnitch: enabled
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        1c15f3b1-3374-3597-bc45-a473179eab28: [1.2.3.4, 1.2.3.5, 1.2.3.6]
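
The two things to look for in this output are the ReleaseVersion line and the two-letter status at the start of each node row (UN means Up and Normal). As an illustration only, the sketch below checks both on a saved copy of the output. The function name cassandra_status_ok is hypothetical, and the parsing assumes the output layout shown above.

```shell
# Hypothetical helper: check saved `rvtconfig cassandra-status` output.
# $1    : the expected Cassandra version, for example 3.11.13
# stdin : the command output
cassandra_status_ok() {
    awk -v want="$1" '
        $1 == "ReleaseVersion:" {
            seen = 1
            if ($2 != want) { print "unexpected version: " $2; bad = 1 }
        }
        # Node rows start with a status letter (U/D) and a state letter (N/L/J/M).
        /^[UD][NLJM] / {
            nodes++
            if ($1 != "UN") { print "node not UN: " $0; bad = 1 }
        }
        END { exit ((bad || !seen || nodes == 0) ? 1 : 0) }
    '
}

# Example use (remaining arguments as in the commands above):
#   ./rvtconfig cassandra-status ... | cassandra_status_ok 3.11.13
```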

3.2 Resume Initconf in non-TSN nodes

Run ./rvtconfig set-desired-running-state --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --state Started.

You should see output similar to this, indicating that the non-TSN nodes are in the desired running state Started.

Connected to MDM at 10.0.0.192
Put desired state = Started for Instance mydeployment-mag-1
Put desired state = Started for Instance mydeployment-shcm-1
Put desired state = Started for Instance mydeployment-mmt-gsm-1
Put desired state = Started for Instance mydeployment-smo-1
Getting desired state for each instance.
Final desired state for instances: {
    "mydeployment-mag-1": "Started",
    "mydeployment-shcm-1": "Started",
    "mydeployment-mmt-gsm-1": "Started",
    "mydeployment-smo-1": "Started"
}

This desired running state does not mean the VMs, Rhino, SGC, etc., are started or stopped. This desired running state indicates the status of the initconf process.

When in Stopped state, the initconf will pause any configuration activities.
When in Started state, the initconf will resume any configuration activities.
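
With many non-TSN instances, it can be easier to let a script confirm that every entry in the Final desired state block shows the state you asked for. The sketch below assumes the output format shown above; the function name all_instances_in_state is hypothetical. It prints any instance whose state differs and exits non-zero if there is a mismatch or if no instance lines were found.

```shell
# Hypothetical helper: check the "Final desired state" block.
# $1    : the expected state, Started or Stopped
# stdin : output of rvtconfig set-desired-running-state
all_instances_in_state() {
    awk -v want="$1" -F'"' '
        # Lines of the form:     "mydeployment-mag-1": "Started",
        /^ *"[^"]+": "[^"]+",?$/ {
            n++
            if ($4 != want) { print $2 " is " $4; bad = 1 }
        }
        END { exit ((bad || n == 0) ? 1 : 0) }
    '
}
```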

3.3 Enable Cassandra repairs

Since the upgrade has been successful, the scheduled Cassandra repairs will have been activated with the new VMs. Therefore, there is no need to recreate the systemd units.

Nevertheless, in Disable Cassandra repairs, the maintenance window mode was activated. To deactivate it, run ./rvtconfig leave-maintenance-window -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID>. This will allow scheduled tasks to run on the VMs again. The output should look like this:

Maintenance window has been terminated.
The VMs will resume running scheduled tasks as per their configured schedules.

3.4 Run verification tests

If you have prepared verification tests for the deployment, run these now.

4. Post-acceptance

The upgrade of the TSN nodes is now complete.

After you have been running with the TSN nodes at the uplevel version for a while, you may want to perform post-acceptance tasks.

5. Backout Method of Procedure

First, gather the log history of the downlevel VMs. Run mkdir -p /home/admin/rvt-log-history and ./rvtconfig export-log-history -c <CDS address> <CDS auth args> -d <deployment ID> --zip-destination-dir /home/admin/rvt-log-history --secrets-private-key-id <secret ID>. The secret ID you specify for --secrets-private-key-id should be the secret ID for the secrets private key (the one used to encrypt sensitive fields in CDS). You can find this in the product-options section of each VNFC in the SDF.

Make sure the <CDS address> used is one of the remaining available TSN nodes.

Next, how much of the backout procedure to run depends on how much progress was made with the upgrade. If you did not get to the point of running csar update, start from the Cleanup after backout section below.

If you ran csar update and it failed, the output will tell you which VMs failed to upgrade.

  Successfully updated tsn VMs with indexes: 0,1
  Not started updating tsn VMs with indexes: 3,4
  Failed whilst updating tsn VM with index: 2

Perform a rollback of all the VMs listed under "Successfully updated" and "Failed whilst updating".
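
For the example summary above, the VMs to roll back are indexes 0, 1 and 2. The sketch below derives that list from a saved copy of the summary; it assumes the three-line format shown above, and the function name rollback_index_range is hypothetical. The result is a comma-separated list with no spaces, the form that the --index-range argument expects.

```shell
# Hypothetical helper: list the VM indexes that need rolling back.
# stdin : the summary lines printed by a failed `csar update`
rollback_index_range() {
    awk -F': *' '/Successfully updated|Failed whilst updating/ { print $NF }' |
        tr ',' '\n' |      # one index per line
        tr -d ' ' |
        sed '/^$/d' |
        sort -n -u |
        paste -sd, -       # join back into a comma-separated list
}
```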

If you encounter further failures during recovery or rollback, contact your Customer Care Representative to investigate and recover the deployment.

5.1 Collect diagnostics

We recommend gathering diagnostic archives for all TSN VMs in the deployment.

If <diags-bundle> does not exist, the command will create the directory for you.

5.2 Disable Cassandra repairs

Disable the scheduled Cassandra repairs to minimise the possibility that they will interact with this procedure.

Complete the following procedure for each TSN node.

Establish an SSH session to the management IP of the node. Then run systemctl list-timers. This should give output of this form:

NEXT                          LEFT     LAST                          PASSED    UNIT                         ACTIVATES
Sat 2023-01-14 02:00:00 NZDT  12h left Fri 2023-01-13 02:00:00 NZDT  12h ago   cassandra-repair-daily.timer cassandra-repair-daily.service
Sat 2023-01-14 13:00:00 NZDT  23h left Fri 2023-01-13 13:00:00 NZDT  1h ago    systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service

If there is no line with UNIT as cassandra-repair-daily.timer, and also no line with UNIT as cassandra-repair-weekly.timer, move on to the next node. Otherwise, run the following commands:

sudo systemctl disable cassandra-repair-daily.timer
sudo systemctl stop cassandra-repair-daily.timer
sudo systemctl disable cassandra-repair-weekly.timer
sudo systemctl stop cassandra-repair-weekly.timer
systemctl list-timers

Depending on your version, you will either have cassandra-repair-daily.timer or cassandra-repair-weekly.timer. Therefore, exactly two of the commands will fail. This is expected.

This should give output of this form:

NEXT                          LEFT     LAST                          PASSED    UNIT                         ACTIVATES
Sat 2023-01-14 13:00:00 NZDT  23h left Fri 2023-01-13 13:00:00 NZDT  1h ago    systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service

You should no longer see an entry for cassandra-repair-daily.timer or cassandra-repair-weekly.timer.
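
If you prefer to avoid the expected command failures, you can act only on the timer that is actually present. The following is a sketch, assuming the systemctl list-timers output format shown above; both function names are hypothetical. The first function only parses text, and the second runs the same disable and stop commands as above for each timer it finds.

```shell
# Hypothetical helper: print the Cassandra repair timers found in
# `systemctl list-timers` output supplied on stdin.
active_repair_timers() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^cassandra-repair-(daily|weekly)\.timer$/) print $i
    }'
}

# Hypothetical wrapper: disable and stop only the timers that exist.
disable_active_repair_timers() {
    for timer in $(systemctl list-timers | active_repair_timers); do
        sudo systemctl disable "$timer"
        sudo systemctl stop "$timer"
    done
}
```

After running the wrapper, systemctl list-timers | active_repair_timers should print nothing.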

Prepare for Maintenance Window

Only perform this step if this is the first, or only, node type being upgraded.

First, establish an SSH session to the management IP of the first TSN node. Type cqlsh to enter the Cassandra shell and execute the following CQL statement:

CREATE TABLE IF NOT EXISTS
metaswitch_tas_deployment_info.maintenance_window (
    deployment_id text, site_id text, end_timestamp int,
    PRIMARY KEY (deployment_id, site_id)
);

Run ./rvtconfig enter-maintenance-window -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID> --hours <MW duration in hours>. The output will look similar to:

Maintenance window is now active until 04 Nov 2022 21:38:06 NZDT.
Use the leave-maintenance-window command once maintenance is complete.

This will prevent scheduled tasks running once the new VMs are started until the time given in the output.

5.3 Pause Initconf in non-TSN nodes

Set the running state of initconf processes in non-TSN VMs to a paused state.

Run ./rvtconfig set-desired-running-state --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --state Stopped.

You should see output similar to this, indicating that the initconf processes of the non-TSN nodes are in state Stopped.

Connected to MDM at 10.0.0.192
Put desired state = Stopped for Instance mydeployment-mag-1
Put desired state = Stopped for Instance mydeployment-shcm-1
Put desired state = Stopped for Instance mydeployment-mmt-gsm-1
Put desired state = Stopped for Instance mydeployment-smo-1
Getting desired state for each instance.
Final desired state for instances: {
    "mydeployment-mag-1": "Stopped",
    "mydeployment-shcm-1": "Stopped",
    "mydeployment-mmt-gsm-1": "Stopped",
    "mydeployment-smo-1": "Stopped"
}

This desired running state does not mean the VMs, Rhino, SGC, etc., are started or stopped. This desired running state indicates the status of the initconf process.

When in Stopped state, the initconf will pause any configuration activities.
When in Started state, the initconf will resume any configuration activities.

5.4 Take a CDS backup

Take a backup of the CDS database by issuing the command below.

./rvtconfig backup-cds --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --output-dir <backup-cds-bundle> --ssh-key-secret-id <SSH key secret ID> -c <CDS address> <CDS auth args>

The output should look like this:

Capturing cds_keyspace_schema
Capturing ramdisk_keyspace_schema
cleaning snapshot metaswitch_tas_deployment_snapshot
...
...
...
running nodetool snapshot command
Requested creating snapshot(s) for [metaswitch_tas_deployment_info] with snapshot name [metaswitch_tas_deployment_snapshot] and options {skipFlush=false}
...
...
...

Final CDS backup archive has been created at <backup-cds-bundle>/tsn_cassandra_backup_20230711095409.tar

5.5 Roll back VMs

To roll back the VMs, the procedure is essentially to perform an "upgrade" back to the downlevel version, that is, with <downlevel version> and <uplevel version> swapped. You can refer to the Begin the upgrade section above for details on the prompts and output of csar update.

As described above, we roll back to a patched version of the downlevel CSAR. That is, we do not just swap <downlevel version> and <uplevel version>, but also replace <downlevel version> by <downlevel version>-from-41-rollback. In other words, we perform an upgrade from <uplevel version> to <downlevel version>-from-41-rollback.

Please note that, unlike normal application of patches, there is no need to reupload configuration for the patched version, as this patch contains special handling to reuse the configuration for <downlevel version>.

Open the /home/admin/rvt-rollback-sdf/sdf-rvt.yaml file using vi. Find the vnfcs section, and within that the TSN VNFC. Within the VNFC, locate the version field and change it to <downlevel version>-from-41-rollback. Save and close the file.

Run SKIP_MW_CHECK=1 csar update --skip pre-update-checks,force-in-series-update-with-l3-permission --vnf tsn --sdf /home/admin/rvt-rollback-sdf/sdf-rvt.yaml --sites <site name> --service-group <service group name> --index-range <index range>.

Once the csar update command completes successfully, proceed with the next steps below.

The <index range> argument is a comma-separated list of VM indices, where the first VM has index 0. Only include the VMs you want to roll back. For example, suppose there are three TSN VMs named tsn-1, tsn-2 and tsn-3. If VMs tsn-1 and tsn-3 need to be rolled back, the index range is 0,2. Do not include any spaces in the index range.

Contiguous ranges can be expressed with a hyphen (-). For example, 1,2,3,4 can be abbreviated to 1-4.

If you want to roll back just one node, use --index-range 0 (or whichever index).

If you want to roll back all nodes, omit the --index-range argument completely.

The --index-range argument requires that a single site, service group and VNF are specified with --sites, --service-group and --vnf arguments.
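
As a worked example of the numbering described above, the sketch below converts VM names into an index range. It assumes that each VM name ends in -<n>, with n starting at 1 for the first VM, as in the tsn-1, tsn-2, tsn-3 example; the function name index_range_for_vms is hypothetical. It always emits a plain comma-separated list and does not abbreviate contiguous runs with a hyphen.

```shell
# Hypothetical helper: turn VM names into an --index-range value.
# $@ : VM names whose final hyphen-separated component is the VM number
index_range_for_vms() {
    printf '%s\n' "$@" |
        awk -F- '
            # Reject names that do not end in a positive number.
            $NF !~ /^[0-9]+$/ || $NF < 1 {
                print "cannot parse VM number from: " $0 > "/dev/stderr"
                bad = 1
                next
            }
            # VM number n corresponds to index n - 1.
            { out = out (out == "" ? "" : ",") ($NF - 1) }
            END { if (bad) exit 1; print out }
        '
}

# Example use:
#   index_range_for_vms tsn-1 tsn-3    # prints 0,2
```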

If csar update fails, check the output for which VMs failed. For each VM that failed, run csar redeploy --vm <failed VM name> --sdf /home/admin/current-config/sdf-rvt.yaml.

If csar redeploy fails, contact your Customer Care Representative to start the recovery procedures.

If all the csar redeploy commands were successful, then run the previously used csar update command on the VMs that were neither rolled back nor redeployed yet.

To help you determine which VMs were neither rolled back nor redeployed yet, connect via SSH to each VM and run cat /etc/msw-release and keep note of the version field.

5.6 Decrease replication factor

Perform this step only if the number of TSN nodes deployed is 5 or more.

SSH to any of the TSN nodes and run the cqlsh command:
```
[sentinel@my-tsn-1 ~]$ cqlsh
```

Update replication factor to 3 by running:

cqlsh> ALTER KEYSPACE "metaswitch_tas_deployment_info" with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : '3' };

Check that it has been updated:

cqlsh> DESCRIBE metaswitch_tas_deployment_info;

Note: this will output a lot of information about the CDS tables, but only the first line needs to be checked. The expected output is below:

CREATE KEYSPACE metaswitch_tas_deployment_info WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'}  AND durable_writes = true;
<snipped for brevity>

Exit out of cqlsh.
```
cqlsh> exit
```
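
The replication factor check above can also be done non-interactively. This is a sketch only: the function name keyspace_replication_factor is hypothetical, and the parsing assumes the first line of the DESCRIBE output has the form shown above. The example use relies on the -e option of cqlsh to run a single statement; whether cqlsh on the node needs further connection arguments is not covered here.

```shell
# Hypothetical helper: print the replication factor found in the
# CREATE KEYSPACE line of DESCRIBE output supplied on stdin.
keyspace_replication_factor() {
    sed -n "s/^CREATE KEYSPACE .*'replication_factor': *'\([0-9][0-9]*\)'.*/\1/p" |
        head -n 1
}

# Example use on a TSN node:
#   cqlsh -e 'DESCRIBE metaswitch_tas_deployment_info;' | keyspace_replication_factor
```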

Run the repair tool to apply the replication factor update on ALL of the TSNs. This must be done one TSN at a time: DO NOT run this in parallel.

[sentinel@my-tsn-1 ~]$ nodetool repair -full metaswitch_tas_deployment_info

An example of the expected output is below.

[sentinel@my-tsn-1 ~]$ nodetool repair -full metaswitch_tas_deployment_info
[2022-03-26 17:52:27,277] Starting repair command #1 .... <snipped for brevity>
[2022-03-26 17:52:30,617] Repair session .... <snipped for brevity>
[2022-03-26 17:52:30,644] Repair completed successfully
[2022-03-26 17:52:30,646] Repair command #1 finished in 3 seconds
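
Because the repair must run on one TSN at a time, a simple sequential loop is a natural way to script it. The sketch below is illustrative and makes several assumptions: that you can SSH to each TSN as the sentinel user (taken from the prompt shown above), that nodetool is on the PATH of a non-interactive SSH session, and that a non-zero exit status indicates a failed repair. The function name repair_all_tsns and the RUN_ON_NODE variable are hypothetical; RUN_ON_NODE exists only so the remote command runner can be swapped out.

```shell
# Hypothetical helper: run the full repair on each TSN in turn,
# stopping at the first failure so that repairs never overlap.
# $@ : management addresses of the TSN nodes
repair_all_tsns() {
    for host in "$@"; do
        echo "Repairing metaswitch_tas_deployment_info via $host"
        ${RUN_ON_NODE:-ssh} "sentinel@$host" \
            nodetool repair -full metaswitch_tas_deployment_info || return 1
    done
}

# Example use (addresses are placeholders):
#   repair_all_tsns 1.2.3.4 1.2.3.5 1.2.3.6
```

Each iteration waits for the previous repair to finish before starting the next, which is what keeps the repairs from running in parallel.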

5.7 Backout TSN efix for rollback

If desired, remove the patched CSAR. On the SIMPL VM, run csar remove tsn/<downlevel version>-from-41-rollback.

We recommend the patched CSAR is kept in case the upgrade is attempted again at a later time.

5.8 Backout TSN in-place patch for upgrade from 4.0.0

If desired, undo the patching of the 4.0.0 TSN VMs. On the SIMPL VM, run /home/admin/.local/share/csar/tsn/<uplevel version>/resources/prepare-for-40-tsn-upgrade --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --ssh-key-secret-id rvt-simpl-private-key-id rollback.

We recommend the patching of the 4.0.0 TSN VMs is kept in case the upgrade is attempted again at a later time.

5.9 Delete uplevel CDS data

Run ./rvtconfig delete-node-type-version -c <CDS address> <CDS auth args> -t tsn --vm-version <uplevel version> -d <deployment ID> --site-id <site ID> --ssh-key-secret-id <SSH key secret ID> to remove data for the uplevel version from CDS.

Example output from the command:

This will destroy all configuration and runtime state for the specified node type and version.
This must not be performed while VMs of this type and version are running.
Requested deletion of version '4.1-7-1.0.0'
VM status for version '4.1-7-1.0.0':
    - 1.2.3.4 (unknown) query failed: Unable to connect to 1.2.3.4
    - 1.2.3.5 (unknown) query failed: Unable to connect to 1.2.3.5
    - 1.2.3.6 (unknown) query failed: Unable to connect to 1.2.3.6
Delete version 4.1-7-1.0.0? Y/[N]

Type "Y" to confirm the deletion of the data for the uplevel version. The command will offer one further prompt for you to double-check that the uplevel version is being deleted and the downlevel version is being retained:

The following versions will be deleted: 4.1-7-1.0.0
The following versions will be retained: 4.0.0-14-1.0.0
Do you wish to continue? Y/[N] Y

Check the versions are the correct way around, and then confirm this prompt to delete the uplevel data from CDS.

5.10 Cleanup after backout

Backout procedure

If desired, remove the uplevel CSAR. On the SIMPL VM, run csar remove tsn/<uplevel version>.
If desired, remove the uplevel config directories on the SIMPL VM with rm -rf /home/admin/uplevel-config. We recommend these files are kept in case the upgrade is attempted again at a later time.

5.11 Resume Initconf in non-TSN nodes

Run ./rvtconfig set-desired-running-state --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --state Started.

You should see output similar to this, indicating that the non-TSN nodes are in the desired running state Started.

Connected to MDM at 10.0.0.192
Put desired state = Started for Instance mydeployment-mag-1
Put desired state = Started for Instance mydeployment-shcm-1
Put desired state = Started for Instance mydeployment-mmt-gsm-1
Put desired state = Started for Instance mydeployment-smo-1
Getting desired state for each instance.
Final desired state for instances: {
    "mydeployment-mag-1": "Started",
    "mydeployment-shcm-1": "Started",
    "mydeployment-mmt-gsm-1": "Started",
    "mydeployment-smo-1": "Started"
}

This desired running state does not mean the VMs, Rhino, SGC, etc., are started or stopped. This desired running state indicates the status of the initconf process.

When in Stopped state, the initconf will pause any configuration activities.
When in Started state, the initconf will resume any configuration activities.

5.12 Enable Cassandra repairs

Complete the following procedure for each TSN node that:

Is the first TSN node (i.e. the TSN node with the first lexicographical IP).
Is on the uplevel version.
Is on the downlevel version, if the version is at least 4.0.0-22-1.0.0.

Establish an SSH session to the management IP of the node. Then run systemctl list-timers. This should give output of this form:

NEXT                          LEFT     LAST                          PASSED    UNIT                         ACTIVATES
Sat 2023-01-14 13:00:00 NZDT  23h left Fri 2023-01-13 13:00:00 NZDT  1h ago    systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service

If there is a line with UNIT as cassandra-repair-daily.timer, or cassandra-repair-weekly.timer, move on to the next node. Otherwise, run the following commands:

sudo systemctl enable --now cassandra-repair-daily.timer
sudo systemctl enable --now cassandra-repair-weekly.timer
systemctl list-timers

Depending on your version, you will either have cassandra-repair-daily.timer or cassandra-repair-weekly.timer. Therefore, exactly one of the commands will fail. This is expected.

This should give output of this form:

NEXT                          LEFT     LAST                          PASSED    UNIT                         ACTIVATES
Sat 2023-01-14 02:00:00 NZDT  12h left n/a                           n/a       cassandra-repair-daily.timer cassandra-repair-daily.service
Sat 2023-01-14 13:00:00 NZDT  23h left Fri 2023-01-13 13:00:00 NZDT  1h ago    systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service

You should now see an entry for cassandra-repair-daily.timer or cassandra-repair-weekly.timer.

5.13 Verify service is restored

Perform verification tests to ensure the deployment is functioning as expected.

If applicable, contact your Customer Care Representative to investigate the cause of the upgrade failure.

Before re-attempting the upgrade, ensure you have run the rvtconfig delete-node-type-version command. Attempting an upgrade while there is stale uplevel data in CDS can result in needing to completely redeploy one or more VMs.

You will also need to re-upload the uplevel configuration.
