- Planning for the procedure
- 1. Preparation for upgrade procedure
- 2. Upgrade procedure
- 2.1 Verify downlevel config has no changes
- 2.2 Disable Rhino restarts
- 2.3 Disable SBB cleanup timers
- 2.4 Validate configuration
- 2.5 Upload configuration
- 2.6 Upload SAS bundles
- 2.7 Remove audit logs
- 2.8 Collect diagnostics
- 2.9 Begin the upgrade
- 2.10 Monitor csar update output
- 2.11 Run basic validation tests
- 3. Post-upgrade procedure
- 4. Post-acceptance
- 5. Backout Method of Procedure
This page explains how to perform a major upgrade of the MAG nodes from version 4.0.0 to version 4.1.
The page is self-sufficient, that is, if you save or print this page, you have all the required information and instructions for upgrading MAG nodes. However, before starting the procedure, make sure you are familiar with the operation of Rhino VoLTE TAS nodes, this procedure, and the use of the SIMPL VM.
- There are links in various places below to other parts of this book, which provide more detail about certain aspects of solution setup and configuration.
- You can find more information about SIMPL VM commands in the SIMPL VM Documentation.
- You can find more information on rvtconfig commands on the rvtconfig page.
Planning for the procedure
This procedure assumes that:
- You are familiar with UNIX operating system basics, such as the use of vi and command-line tools like scp.
- You are upgrading MAG VMs from version 4.0.0 to 4.1.
- You have completed the steps in Prepare for the upgrade.
- You have deployed a SIMPL VM, version 6.13.3 or later. Output shown on this page is correct for version 6.13.3 of the SIMPL VM; it may differ slightly on later versions.
Check you are using a supported VNFI version:
Platform | Supported versions |
---|---|
OpenStack | Newton to Wallaby |
VMware vSphere | 6.7 and 7.0 |
Important notes
Do not use these instructions for target versions whose major version component differs from 4.1.
If you are not upgrading the MMT nodes in the same maintenance window as the MAG nodes, a low-level override must be applied to the MMT nodes immediately after the MAG upgrades have completed. Please contact your customer support representative for details.
Determine parameter values
In the steps below, replace parameters marked with angle brackets (such as <deployment ID>) with values as follows. (Replace the angle brackets as well, so that they are not included in the final command to be run.)
- <deployment ID>: The deployment ID. You can find this at the top of the SDF. On this page, the example deployment ID mydeployment is used.
- <site ID>: A number for the site in the form DC1 through DC32. You can find this at the top of the SDF.
- <site name>: The name of the site. You can find this at the top of the SDF.
- <MW duration in hours>: The duration of the reserved maintenance period in hours.
- <CDS address>: The management IP address of the first TSN node.
- <SIMPL VM IP address>: The management IP address of the SIMPL VM.
- <CDS auth args> (authentication arguments): If your CDS has Cassandra authentication enabled, replace this with the parameters -u <username> -k <secret ID> to specify the configured Cassandra username and the secret ID of a secret containing the password for that Cassandra user. For example: ./rvtconfig -c 1.2.3.4 -u cassandra-user -k cassandra-password-secret-id … If your CDS is not using Cassandra authentication, omit these arguments.
- <service group name>: The name of the service group (also known as a VNFC, a collection of VMs of the same type), which for Rhino VoLTE TAS nodes will consist of all MAG VMs in the site. This can be found in the SDF by identifying the MAG VNFC and looking for its name field.
- <downlevel version>: The current version of the VMs. On this page, the example version 4.0.0-14-1.0.0 is used.
- <uplevel version>: The version of the VMs you are upgrading to. On this page, the example version 4.1-7-1.0.0 is used.
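As a minimal sketch of how the optional <CDS auth args> change a command line, the following shell function prints the CDS-related arguments with or without authentication. The function name cds_args is hypothetical and is not part of rvtconfig; the values are the example values used above.

```shell
# Hypothetical helper (not part of rvtconfig): print the CDS arguments,
# adding -u/-k only when a Cassandra username is supplied.
cds_args() {
  local address="$1" username="${2:-}" secret_id="${3:-}"
  if [ -n "$username" ]; then
    printf '%s\n' "-c $address -u $username -k $secret_id"
  else
    printf '%s\n' "-c $address"
  fi
}

# CDS with Cassandra authentication enabled.
cds_args 1.2.3.4 cassandra-user cassandra-password-secret-id

# CDS without Cassandra authentication: the auth arguments are omitted.
cds_args 1.2.3.4
```

The first call prints the arguments with -u and -k included; the second prints only the -c argument.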
Tools and access
You must have the SSH keys required to access the SIMPL VM and the MAG VMs that are to be upgraded.
The SIMPL VM must have the right permissions on the VNFI. Refer to the SIMPL VM documentation for more information:
When starting an SSH session to the SIMPL VM, use a keepalive of 30 seconds. This prevents the session from timing out - SIMPL VM automatically closes idle connections after a few minutes. When using OpenSSH (the SSH client on most Linux distributions), this can be controlled with the option
|
rvtconfig is a command-line tool for configuring and managing Rhino VoLTE TAS VMs.
All MAG CSARs include this tool; once the CSAR is unpacked, you can find rvtconfig in the resources directory, for example:
$ cdcsars
$ cd mag/<uplevel version>
$ cd resources
$ ls rvtconfig
rvtconfig
The rest of this page assumes that you are running rvtconfig from the directory in which it resides, so that it can be invoked as ./rvtconfig.
It assumes you use the uplevel version of rvtconfig, unless instructed otherwise.
If a step explicitly requires the downlevel version, you can find it here:
$ cdcsars
$ cd mag/<downlevel version>
$ cd resources
$ ls rvtconfig
rvtconfig
1. Preparation for upgrade procedure
These steps can be carried out in advance of the upgrade maintenance window. They should take less than 30 minutes to complete.
1.1 Ensure the SIMPL version is at least 6.13.3
Log in to the SIMPL VM and run the command simpl-version.
The SIMPL VM version is displayed at the top of the output:
SIMPL VM, version 6.13.3
Ensure this is at least 6.13.3. If not, contact your Customer Care Representative to organise upgrading the SIMPL VM before proceeding with the upgrade of the MAG VMs.
Output shown on this page is correct for version 6.13.3 of the SIMPL VM; it may differ slightly on later versions.
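The "at least 6.13.3" check can be sketched in shell as follows. This is an illustration only: the function name version_at_least is hypothetical, the comparison relies on the version sort (sort -V) provided by GNU coreutils, and the version line is a copy of the sample output above rather than live output from simpl-version.

```shell
# Hypothetical helper: succeeds if version $1 is greater than or equal to version $2.
# Assumes GNU coreutils 'sort -V' (version sort) is available.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the version number from a sample of the first line of output.
installed="$(printf '%s\n' 'SIMPL VM, version 6.13.3' | sed -n 's/^SIMPL VM, version //p')"

if version_at_least "$installed" 6.13.3; then
  echo "SIMPL VM version $installed is supported"
else
  echo "SIMPL VM version $installed is too old"
fi
```

A plain string comparison would rank 6.9.0 above 6.13.3, which is why a version-aware sort is used here.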
1.2 Verify the downlevel CSAR is present
On the SIMPL VM, run csar list.
Each listed CSAR will be of the form <node type>/<version>, for example, mag/4.0.0-14-1.0.0.
Ensure that there is a MAG CSAR listed there with the current downlevel version.
If the downlevel CSAR is not present, return to the pre-upgrade steps.
1.3 Reserve maintenance period
The upgrade procedure requires a maintenance period. For upgrading nodes in a live network, implement measures to mitigate any unforeseen events.
Ensure you reserve enough time for the maintenance period, which must include the time for a potential rollback.
To calculate the time required for the actual upgrade or rollback of the VMs, run
rvtconfig calculate-maintenance-window -i /home/admin/uplevel-config -t mag --site-id <site ID> --sequential
The output will be similar to the following, stating how long it will take to do an upgrade or rollback of the MAG VMs.
Nodes will be upgraded sequentially
-----
Estimated time for a full upgrade of 3 VMs: 24 minutes
Estimated time for a full rollback of 3 VMs: 24 minutes
-----
Your maintenance window must include time for:
- The preparation steps. Allow 30 minutes.
- The upgrade of the VMs, as calculated above.
- The rollback of the VMs, as calculated above.
- Post-upgrade or rollback steps. Allow 15 minutes, plus time for any prepared verification tests.
In the example above, this would be 93 minutes.
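The arithmetic behind the 93-minute figure can be sketched in shell as follows. The function name is illustrative and is not part of rvtconfig; the upgrade and rollback values are taken from the example output above.

```shell
# Sum the components of the maintenance window, in minutes.
# Arguments: preparation, upgrade, rollback, post-upgrade steps.
maintenance_window_minutes() {
  echo $(( $1 + $2 + $3 + $4 ))
}

# 30 min preparation + 24 min upgrade + 24 min rollback + 15 min post-upgrade steps.
maintenance_window_minutes 30 24 24 15   # prints 93
```

This total does not include time for your own prepared verification tests.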
These numbers are a conservative best-effort estimate. Various factors, including IMS load levels, VNFI hardware configuration, VNFI load levels, and network congestion can all contribute to longer upgrade times. These numbers only cover the time spent actually running the upgrade on SIMPL VM. You must add sufficient overhead for setting up the maintenance window, checking alarms, running validation tests, and so on.
The time required for an upgrade or rollback can also be manually calculated. For node types that are upgraded sequentially, like this node type, calculate the upgrade time by using the number of nodes. The first node takes 9 minutes, while later nodes take 9 minutes each.
You must also reserve time for:
- The SIMPL VM to upload the image to the VNFI. Allow 2 minutes, unless the connectivity between SIMPL and the VNFI is particularly slow.
- Any validation testing needed to determine whether the upgrade succeeded.
2. Upgrade procedure
2.1 Verify downlevel config has no changes
Skip this step if the downlevel version is 4.0.0-27-1.0.0 or below.
Run rm -rf /home/admin/config-output on the SIMPL VM to remove that directory if it already exists.
Using rvtconfig from the downlevel CSAR, run
./rvtconfig compare-config -c <CDS address> -d <deployment ID> --input /home/admin/current-config --vm-version <downlevel version> --output-dir /home/admin/config-output -t mag
to compare the live configuration to the configuration in the /home/admin/current-config directory.
Example output is listed below:
Validating node type against the schema: mag
Redacting secrets…
Comparing live config for (version=4.0.0-14-1.0.0, deployment=mydeployment, group=RVT-mag.DC1) with local directory (version=4.1-7-1.0.0, deployment=mydeployment, group=RVT-mag.DC1)
Getting per-level configuration for version '4.0.0-14-1.0.0', deployment 'mydeployment', and group 'RVT-mag.DC1'
- Found config with hash 7f6cc1f3df35b43d6286f19c252311e09216e6115f314d0cb9cc3f3a24814395
Wrote currently uploaded configuration to /tmp/tmprh2uavbh
Redacting secrets…
Redacting SDF…
No differences found in yaml files
Uploading this will have no effect unless secrets, certificates or licenses have changed, or --reload-resource-adaptors is specified
There should be no differences found, as the configuration in current-config should match the live configuration.
If any differences are found, abort the upgrade process.
2.2 Disable Rhino restarts
This step is only required if you have enabled restarts on the MAG nodes.
Restarts are enabled if the scheduled-rhino-restarts parameter has been configured in the mag-vmpool-config.yaml file.
On the SIMPL VM, open /home/admin/current-config/mag-vmpool-config.yaml using vi.
To disable scheduled restarts, comment out all scheduled-rhino-restarts sections.
For example:
virtual-machines:
  - vm-id: mag-1
    rhino-node-id: 201
    # scheduled-rhino-restarts:
    #   day-of-week: Saturday
    #   time-of-day: 02:32
  - vm-id: mag-2
    rhino-node-id: 202
    # scheduled-rhino-restarts:
    #   day-of-week: Saturday
    #   time-of-day: 04:32
Make the same changes to /home/admin/uplevel-config/mag-vmpool-config.yaml.
Using rvtconfig from the downlevel CSAR, run
./rvtconfig upload-config -c <CDS address> -t mag -i /home/admin/current-config --vm-version <downlevel version>
Assuming the previous command has succeeded, run csar validate to confirm the configuration has converged.
This command first performs a check that the nodes have successfully applied the downlevel configuration:
========================
Performing healthchecks
========================
Commencing healthcheck of VNF 'mag'
Performing health checks for service group mydeployment-mag with a 0 second timeout
Running MDM status health-check for dc1-mydeployment-mag-1
dc1-mydeployment-mag-1: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
Running MDM status health-check for dc1-mydeployment-mag-2
dc1-mydeployment-mag-2: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
Running MDM status health-check for dc1-mydeployment-mag-3
dc1-mydeployment-mag-3: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
After that, it performs various checks on the health of the VMs' networking and services:
================================
Running validation test scripts
================================
Running validation tests in CSAR 'mag/4.0.0-14-1.0.0'
Test running for: mydeployment-mag-1
Running script: check_ping_management_ip…
Running script: check_ping_management_gateway…
Running script: check_can_sudo…
Running script: check_converged…
Detailed output can be found in /var/log/csar/ansible_output-2023-01-06-03-21-51.log
If all is well, then you should see the message
All tests passed for CSAR 'mag/<downlevel version>'!
If there are failures, investigate them with the help of your Customer Care Representative and the Troubleshooting pages.
2.3 Disable SBB cleanup timers
Disable the SBB Activities Cleanup timer to minimise the possibility that this cleanup will interact with this procedure.
Complete the following procedure for each MAG node.
Establish an SSH session to the management IP of the node.
Then run systemctl list-timers.
This should give output of this form:
NEXT LEFT LAST PASSED UNIT ACTIVATES
Fri 2023-01-13 00:00:00 NZDT 10h left Thu 2023-01-12 00:00:00 NZDT 14h ago rhino-jstat-restart.timer rhino-jstat-restart.service
Sat 2023-01-14 02:00:00 NZDT 12h left Fri 2023-01-13 02:00:00 NZDT 12h ago cleanup-sbbs-activities.timer cleanup-sbbs-activities.service
Sat 2023-01-14 13:00:00 NZDT 23h left Fri 2023-01-13 13:00:00 NZDT 1h ago systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
If there is no line with UNIT as cleanup-sbbs-activities.timer, move on to the next node.
Otherwise, run the following commands:
sudo systemctl disable cleanup-sbbs-activities.timer
sudo systemctl stop cleanup-sbbs-activities.timer
systemctl list-timers
This should give output of this form:
NEXT LEFT LAST PASSED UNIT ACTIVATES
Fri 2023-01-13 00:00:00 NZDT 10h left Thu 2023-01-12 00:00:00 NZDT 14h ago rhino-jstat-restart.timer rhino-jstat-restart.service
Sat 2023-01-14 13:00:00 NZDT 23h left Fri 2023-01-13 13:00:00 NZDT 1h ago systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
You should no longer see an entry for cleanup-sbbs-activities.timer.
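The check for the timer entry can be sketched in shell as follows. The function name timer_is_listed is hypothetical; the sample line is copied from the example systemctl list-timers output above so that the sketch runs without systemd.

```shell
# Hypothetical helper: succeeds if the named timer unit appears in
# 'systemctl list-timers' output supplied on stdin.
timer_is_listed() {
  grep -qF -- "$1"
}

# Sample line taken from the example output above.
sample='Sat 2023-01-14 02:00:00 NZDT 12h left Fri 2023-01-13 02:00:00 NZDT 12h ago cleanup-sbbs-activities.timer cleanup-sbbs-activities.service'

if printf '%s\n' "$sample" | timer_is_listed cleanup-sbbs-activities.timer; then
  echo "cleanup-sbbs-activities.timer is still scheduled"
else
  echo "cleanup-sbbs-activities.timer is not scheduled"
fi
```

On a MAG node, the same function could be fed real output with systemctl list-timers | timer_is_listed cleanup-sbbs-activities.timer.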
2.4 Validate configuration
Run the command ./rvtconfig validate -t mag -i /home/admin/uplevel-config to check that the configuration files are correctly formatted, contain valid values, and are self-consistent.
Ensure you use the uplevel version of rvtconfig.
A successful validation with no errors or warnings produces the following output.
Validating node type against the schema: mag
YAML for node type(s) ['mag'] validates against the schema
If the output contains validation errors, fix the configuration in the /home/admin/uplevel-config directory and go back to the Update the configuration files for RVT 4.1 step.
If the output contains validation warnings, consider whether you wish to address them before performing the upgrade. The VMs will accept configuration that has validation warnings, but certain functions may not work.
2.5 Upload configuration
Upload the configuration to CDS:
./rvtconfig upload-config -c <CDS address> <CDS auth args> -t mag -i /home/admin/uplevel-config --vm-version <uplevel version>
Check that the output confirms that configuration exists in CDS for both the current (downlevel) version and the uplevel version:
Validating node type against the schema: mag
Preparing configuration for node type mag…
Checking differences between uploaded configuration and provided files
Getting per-level configuration for version '4.1-7-1.0.0', deployment 'mydeployment-mag', and group 'RVT-mag.DC1'
- No configuration found
No uploaded configuration was found: this appears to be a new install or upgrade
Encrypting secrets…
Wrote config for version '4.1-7-1.0.0', deployment ID 'mydeployment', and group ID 'RVT-mag.DC1'
Versions in group RVT-mag.DC1
=============================
- Version: 4.0.0-14-1.0.0
Config hash: 7f6cc1f3df35b43d6286f19c252311e09216e6115f314d0cb9cc3f3a24814395
Active: mydeployment-mag-1, mydeployment-mag-2, mydeployment-mag-3
Leader seed:
- Version: 4.1-7-1.0.0
Config hash: f790cc96688452fdf871d4f743b927ce8c30a70e3ccb9e63773fc05c97c1d6ea
Active: None
Leader seed:
2.6 Upload SAS bundles
Upload the MAG SAS bundle for the uplevel version to the master SAS server in any site(s) containing the VMs to be upgraded. Your Customer Care Representative can provide you with the SAS bundle file.
2.7 Remove audit logs
If you are upgrading from a VM of version 4.0.0-14-1.0.0 or newer, skip this step.
Versions prior to 4.0.0-14-1.0.0 do not correctly store audit logs during an upgrade. To avoid issues, the audit logs need to be removed just before the upgrade.
For each MAG node, establish an SSH session to the management IP of the node. Run:
cd rhino/node-*/work/log
rm audit.log*
ls -altr audit*
The output should confirm that no audit logs remain:
ls: cannot access 'audit*': No such file or directory
2.8 Collect diagnostics
We recommend gathering diagnostic archives for all MAG VMs in the deployment.
On the SIMPL VM, run the command
./rvtconfig gather-diags --sdf /home/admin/uplevel-config/sdf-rvt.yaml -t mag --ssh-key-secret-id <SSH key secret ID> --ssh-username sentinel --output-dir <diags-bundle>
If <diags-bundle> does not exist, the command will create the directory for you.
Each diagnostic archive can be up to 200 MB per VM. Ensure you have enough disk space on the SIMPL VM to collect all diagnostics. The command will be aborted if the SIMPL VM does not have enough disk space to collect all diagnostic archives from all the VMs in your deployment specified in the provided SDF.
You are using the uplevel |
2.9 Begin the upgrade
Carry out a csar import of the mag VMs
Prepare for the upgrade by running the following command on the SIMPL VM to import the Terraform templates:
csar import --vnf mag --sdf /home/admin/uplevel-config/sdf-rvt.yaml
First, SIMPL VM connects to your VNFI to check the credentials specified in the SDF and QSG are correct.
If this is successful, it displays the message All validation checks passed.
Next, SIMPL VM compares the SDF you provided with the one that was previously used to deploy or upgrade the MAG nodes. Since the SDF was updated significantly in the pre-upgrade steps, you should see the following prompt (details may vary):
The following changes have been made to the SDF since it was used to deploy/update mag
(Note: if only a subset of VMs were deployed or updated previously, then this diff won't fully reflect the changes that will be made to the other VMs in the deployment)
Ensure all changes are as expected.
In particular, ensure the following change matches the changes to the mag VNFC:
- secrets-private-key: abcde
+ primary-user-password-id: rvt-primary-user-password
+ secrets-private-key-id: rvt-secrets-private-key
type: mag
- version: 4.0.0-14-1.0.0
+ version: 4.1-7-1.0.0
vim-configuration:
vsphere:
Do you want to continue? [yes/no]:
If the differences are not as you expect, then:
- Type no. The csar import will be aborted.
- Investigate why there are unexpected changes in the SDF.
- Correct the SDF as necessary.
- Retry this step.
Otherwise, accept the prompt by typing yes.
After you do this, SIMPL VM will import the Terraform state. If successful, it outputs this message:
Done. Imported all VNFs.
If the output does not look like this, investigate and resolve the underlying cause, then re-run the import command until it shows the expected output.
Begin the upgrade of the mag VMs
The MAG VMs must be upgraded in two separate batches:
- All nodes but the node with the lowest ID
- The node with the lowest ID
To accomplish this, you should follow these steps:
- Determine the csar command to use to upgrade all but the lowest ID VM.
- Execute steps under section "Execute the csar update command" using this command.
- Determine the csar command to use to upgrade the lowest ID VM.
- Execute steps under section "Execute the csar update command" using this command.
Determine the csar update command to upgrade the first batch of mag VMs
The command to upgrade all but the lowest ID VM is:
SKIP_MW_CHECK=1 csar update --skip force-in-series-update-with-l3-permission --vnf mag --sites <site> --service-group <service_group> --index-range <range> --sdf /home/admin/uplevel-config/sdf-rvt.yaml --use-target-version-csar-info
The <range> value is of the form 1-X, where X is the highest node ID. Note that IDs start from 0. For example, if there are 3 nodes to upgrade, the value provided should be 1-2. For four nodes, use 1-3, and so on.
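The rule for the <range> value can be sketched in shell as follows. The function name first_batch_index_range is hypothetical and is not part of csar; it only applies the 1-X form described above, where indexes start at 0. For a two-node group it prints 1-1, which follows the stated form but is not an example given on this page.

```shell
# Hypothetical helper: print the --index-range value for the first batch
# (all VMs except index 0). Argument: total number of MAG VMs in the service group.
first_batch_index_range() {
  local count="$1"
  if [ "$count" -lt 2 ]; then
    # With a single VM there is only index 0, so there is no first batch.
    echo "no first batch: only index 0 exists" >&2
    return 1
  fi
  echo "1-$(( count - 1 ))"
}

first_batch_index_range 3   # prints 1-2
first_batch_index_range 4   # prints 1-3
```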
Determine the csar update command to upgrade the last mag VM
The command to upgrade the lowest ID VM is:
SKIP_MW_CHECK=1 csar update --skip force-in-series-update-with-l3-permission --vnf mag --sites <site> --service-group <service_group> --index-range 0 --sdf /home/admin/uplevel-config/sdf-rvt.yaml --use-target-version-csar-info
Execute the csar update command
Execute the csar update command on the SIMPL VM.
First, SIMPL VM connects to your VNFI to check the credentials specified in the SDF and QSG are correct.
If this is successful, it displays the message All validation checks passed.
Next, SIMPL VM compares the specified SDF with the SDF used for the csar import command above. Since the contents have not changed since you ran the csar import, the output should indicate that the SDF has not changed.
If there are differences in the SDF, a message similar to this will be output:
Comparing current SDF with previously used SDF.
site site1:
mag:
mag-1:
networks:
- ip-addresses:
ip:
- - 10.244.21.106
+ - 10.244.21.196
- 10.244.21.107
name: Management
subnet: mgmt-subnet
Do you want to continue? [yes/no]: yes
If you see this, you must:
- Type no. The upgrade will be aborted.
- Go back to the start of the upgrade section and run through the csar import section again, until the SDF differences are resolved.
- Retry this step.
Next, you will see the following prompt:
You have chosen to force csar update to proceed in series. This option will slow down update for VNFs which support update in parallel and MUST only be used with L3 permission.
Please discuss this option with your support representative if you have not already done so.
Type 'l3permission' to continue
Type l3permission to continue.
Afterwards, the SIMPL VM displays the VMs that will be upgraded:
You are about to update VMs as follows:
- VNF mag:
- For site site1:
- update all VMs in VNFC service group mydeployment-mag/4.1-7-1.0.0:
- mydeployment-mag-1 (index 0)
- mydeployment-mag-2 (index 1)
- mydeployment-mag-3 (index 2)
Type 'yes' to continue, or run 'csar update --help' for more information.
Continue? [yes/no]:
Check that this output displays the version you expect (the uplevel version) and exactly the set of VMs that you expect to be upgraded.
If anything looks incorrect, type no to abort the upgrade process, and recheck the VMs listed and the version field in /home/admin/uplevel-config/sdf-rvt.yaml.
Also check you are passing the correct SDF path and --vnf argument to the csar update command.
Otherwise, accept the prompt by typing yes.
Next, each VM in your cluster will perform health checks. If successful, the output will look similar to this.
Running ansible scripts in '/home/admin/.local/share/csar/mag/4.1-1-1.0.0/update_healthcheck_scripts' for node 'mydeployment-mag-1'
Running script: check_config_uploaded…
Running script: check_ping_management_ip…
Running script: check_maintenance_window…
Running script: check_can_sudo…
Running script: check_converged…
Running script: check_liveness…
Running script: check_rhino_alarms…
Detailed output can be found in /var/log/csar/ansible_output-2023-01-05-02-05-51.log
All ansible update healthchecks have passed successfully
If a script fails, you can find details in the log file /var/log/csar/ansible_output-<timestamp>.log. A failed run looks similar to this:
Running ansible scripts in '/home/admin/.local/share/csar/mag/4.1-1-1.0.0/update_healthcheck_scripts' for node 'mydeployment-mag-1'
Running script: check_config_uploaded...
Running script: check_ping_management_ip...
Running script: check_maintenance_window...
Running script: check_can_sudo...
Running script: check_converged...
Running script: check_liveness...
ERROR: Script failed. Specific error lines from the ansible output will be logged to screen. For more details see the ansible_output file (/var/log/csar/ansible_output-2023-01-05-21-02-17.log). This file has only ansible output, unlike the main command log file.
fatal: [mydeployment-mag-1]: FAILED! => {"ansible_facts": {"liveness_report": {"cassandra": true, "cassandra_ramdisk": true, "cassandra_repair_timer": true, "cdsreport": true, "cleanup_sbbs_activities": false, "config_hash_report": true, "docker": true, "initconf": true, "linkerd": true, "mdm_state_and_status_ok": true, "mdmreport": true, "nginx": true, "no_ocss7_alarms": true, "ocss7": true, "postgres": true, "rem": true, "restart_rhino": true, "rhino": true}}, "attempts": 1, "changed": false, "msg": "The following liveness checks failed: ['cleanup_sbbs_activities']", "supports_liveness_checks": true}
Running script: check_rhino_alarms...
Detailed output can be found in /var/log/csar/ansible_output-2023-01-05-21-02-17.log
***Some tests failed for CSAR 'mag/4.1-1-1.0.0' - see output above***
The msg field under each ansible task explains why the script failed.
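As a rough sketch, the names of the failed liveness checks can be pulled out of the msg field with standard text tools. The function name failed_liveness_checks is hypothetical, and the pattern assumes the message wording shown in the example failure output above; the input here is an abbreviated copy of that output.

```shell
# Hypothetical helper: print the names of failed liveness checks found in
# ansible output supplied on stdin. Assumes the message wording shown above.
failed_liveness_checks() {
  sed -n 's/.*liveness checks failed: \[\([^]]*\)].*/\1/p' | tr -d "'"
}

# Abbreviated sample of the failure line from the example output.
failed_liveness_checks <<'EOF'
"msg": "The following liveness checks failed: ['cleanup_sbbs_activities']", "supports_liveness_checks": true}
EOF
```

For this sample, the function prints cleanup_sbbs_activities.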
If the validation tests fail because of unexpected Rhino alarms, a good place to start investigating is by logging into each node and running rhino-console listactivealarms.
This will show you the alarm(s) in more detail.
Depending on your deployment, some Rhino alarms (such as connection alarms to other systems that may be temporarily offline, time warps and blocklist alarms) may be expected and can be ignored, as they do not block an upgrade.
In that case, to skip checking for unexpected Rhino alarms, run the command
SKIP_RHINO_ALARMS_CHECK=1 SKIP_MW_CHECK=1 csar update --skip force-in-series-update-with-l3-permission --vnf mag --sdf /home/admin/uplevel-config/sdf-rvt.yaml
If there are other failures, investigate them with the help of your Customer Care Representative and the Troubleshooting pages.
Once all failures have been corrected, retry this step by running the command SKIP_MW_CHECK=1 csar update … as described at the beginning of this section.
Once the pre-upgrade health checks have passed, SIMPL VM proceeds to upgrade each of the VMs.
Monitor the further output of csar update as the upgrade progresses, as described in the next step.
2.10 Monitor csar update output
For each VM:
- The VM will be quiesced and destroyed.
- SIMPL VM will create a replacement VM using the uplevel version.
- The VM will automatically start applying configuration from the files you uploaded to CDS in the above steps.
- Once configuration is complete, the VM will be ready for service. At this point, the csar update command will move on to the next MAG VM.
The output of the csar update command will look something like the following, repeated for each VM.
Decommissioning 'dc1-mydeployment-mag-1' in MDM, passing desired version 'vm.version=4.1-7-1.0.0', with a 900 second timeout
dc1-mydeployment-mag-1: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'decommissioned'
dc1-mydeployment-mag-1: Current status 'in_progress'- desired status 'complete'
…
dc1-mydeployment-mag-1: Current status 'complete', current state 'decommissioned' - desired status 'complete', desired state 'decommissioned'
Running update for VM group [0]
Performing health checks for service group mydeployment-mag with a 1200 second timeout
Running MDM status health-check for dc1-mydeployment-mag-1
dc1-mydeployment-mag-1: Current status 'in_progress'- desired status 'complete'
…
dc1-mydeployment-mag-1: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
If you see this error:
it can be safely ignored, provided that you do eventually see a |
Once all VMs have been upgraded, you should see this success message, detailing all the VMs that were upgraded and the version they are now running, which should be the uplevel version.
Successful VNF with full per-VNFC upgrade state:
VNF: mag
VNFC: mydeployment-mag
- Node name: mydeployment-mag-1
- Version: 4.1-7-1.0.0
- Build Date: 2022-11-21T22:58:24+00:00
- Node name: mydeployment-mag-2
- Version: 4.1-7-1.0.0
- Build Date: 2022-11-21T22:58:24+00:00
- Node name: mydeployment-mag-3
- Version: 4.1-7-1.0.0
- Build Date: 2022-11-21T22:58:24+00:00
If the upgrade fails, you will see Failed VNF instead of Successful VNF in the above output.
There will also be more details of what went wrong printed before that.
Refer to the Backout procedure below.
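As a rough sketch, the version lines of the success message can be checked with awk. The function name all_nodes_at_version is hypothetical, and it assumes the summary layout shown in the example above (one "- Version:" line per node); the input is an abbreviated copy of that example.

```shell
# Hypothetical helper: succeeds only if every "Version:" line in the summary
# supplied on stdin reports the expected version. Assumes the layout shown above.
all_nodes_at_version() {
  awk -v want="$1" '
    # Appending "" forces a string comparison of the version fields.
    /Version:/ { total++; if (($NF "") == (want "")) ok++ }
    END { exit (total > 0 && ok == total) ? 0 : 1 }
  '
}

if all_nodes_at_version 4.1-7-1.0.0 <<'EOF'
Successful VNF with full per-VNFC upgrade state:
VNF: mag
VNFC: mydeployment-mag
- Node name: mydeployment-mag-1
- Version: 4.1-7-1.0.0
- Node name: mydeployment-mag-2
- Version: 4.1-7-1.0.0
EOF
then
  echo "all listed nodes report the uplevel version"
else
  echo "at least one node does not report the uplevel version"
fi
```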
2.11 Run basic validation tests
Run csar validate --vnf mag --sdf /home/admin/uplevel-config/sdf-rvt.yaml to perform some basic validation tests against the uplevel nodes.
This command first performs a check that the nodes are connected to MDM and reporting that they have successfully applied the uplevel configuration:
========================
Performing healthchecks
========================
Commencing healthcheck of VNF 'mag'
Performing health checks for service group mydeployment-mag with a 0 second timeout
Running MDM status health-check for dc1-mydeployment-mag-1
dc1-mydeployment-mag-1: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
Running MDM status health-check for dc1-mydeployment-mag-2
dc1-mydeployment-mag-2: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
Running MDM status health-check for dc1-mydeployment-mag-3
dc1-mydeployment-mag-3: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
After that, it performs various checks on the health of the VMs' networking and services:
================================
Running validation test scripts
================================
Running validation tests in CSAR 'mag/4.1-7-1.0.0'
Test running for: mydeployment-mag-1
Running script: check_ping_management_ip…
Running script: check_can_sudo…
Running script: check_converged…
Running script: check_liveness…
Running script: check_rhino_alarms…
Detailed output can be found in /var/log/csar/ansible_output-2023-01-06-03-21-51.log
If all is well, then you should see the message
All tests passed for CSAR 'mag/<uplevel version>'!
If the VM validation fails, you can find details in the log file /var/log/csar/ansible_output-<timestamp>.log. A failed run looks similar to this:
Running validation test scripts
================================
Running validation tests in CSAR 'mag/4.1-7-1.0.0'
Test running for: mydeployment-mag-1
Running script: check_ping_management_ip...
Running script: check_can_sudo...
Running script: check_converged...
Running script: check_liveness...
ERROR: Script failed. Specific error lines from the ansible output will be logged to screen. For more details see the ansible_output file (/var/log/csar/ansible_output-2023-01-06-03-40-37.log). This file has only ansible output, unlike the main command log file.
fatal: [mydeployment-mag-1]: FAILED! => {"ansible_facts": {"liveness_report": {"cassandra": true, "cassandra_ramdisk": true, "cassandra_repair_timer": true, "cdsreport": true, "cleanup_sbbs_activities": false, "config_hash_report": true, "docker": true, "initconf": true, "linkerd": true, "mdm_state_and_status_ok": true, "mdmreport": true, "nginx": true, "no_ocss7_alarms": true, "ocss7": true, "postgres": true, "rem": true, "restart_rhino": true, "rhino": true}}, "attempts": 1, "changed": false, "msg": "The following liveness checks failed: ['cleanup_sbbs_activities']", "supports_liveness_checks": true}
Running script: check_rhino_alarms...
Detailed output can be found in /var/log/csar/ansible_output-2023-01-06-03-40-37.log
***Some tests failed for CSAR 'mag/4.1-7-1.0.0' - see output above***
----------------------------------------------------------
WARNING: Validation script tests failed for the following CSARs:
- 'mag/4.1-7-1.0.0'
See output above for full details
The msg field under each ansible task explains why the script failed.
If the validation tests fail because of unexpected Rhino alarms, a good place to start investigating is by logging into each node and running rhino-console listactivealarms.
This will show you the alarm(s) in more detail.
Depending on your deployment, some Rhino alarms (such as connection alarms to other systems that may be temporarily offline, time warps and blocklist alarms) may be expected and therefore can be ignored.
If there are other failures, investigate them with the help of your Customer Care Representative and the Troubleshooting pages.
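When the Ansible output file is long, it can help to pull out just the names of the failed liveness checks. The following is a minimal sketch, assuming the failure message has the same form as the sample output above; failed_liveness_checks is a hypothetical helper name, not part of csar or rvtconfig.

```shell
#!/bin/sh
# Sketch only: list the liveness checks named in failure messages of the form
#   "msg": "The following liveness checks failed: ['cleanup_sbbs_activities']"
# failed_liveness_checks is a hypothetical helper, not part of csar or rvtconfig.
# Pass one or more log files as arguments, or pipe the log text on stdin.
failed_liveness_checks() {
    grep -oh "The following liveness checks failed: \[[^]]*]" "$@" |
        sed -e 's/.*\[//' -e 's/]$//' |   # keep only the text between [ and ]
        tr -d "' " |                      # drop quotes and spaces
        tr ',' '\n' |                     # one check name per line
        sort -u
}
```

For example, `failed_liveness_checks /var/log/csar/ansible_output-<timestamp>.log` prints one failed check name per line, or nothing if no liveness failure message is present.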
3. Post-upgrade procedure
3.1 Apply MMT Config for VSC
Skip this step if you are upgrading the MMT nodes in the same maintenance window.
If you do not plan to upgrade the MMT nodes at this point, you need to update and upload the MMT configuration for the downlevel version.
On the SIMPL VM, open the file /home/admin/current-config/sentinel-volte-gsm-config.yaml and find the value for host under xcap-data-update.
Replace the prefix internal-xcap with xcap.internal.
Run ./rvtconfig upload-config -c <CDS address> <CDS auth args> -t mmt-gsm -i /home/admin/current-config --vm-version <downlevel version>
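The prefix replacement described above could also be scripted rather than done by hand. The following is a sketch under stated assumptions: it assumes the host value is an unquoted scalar on its own line, and that no other host key in the file starts with internal-xcap. The function name is hypothetical. Review the result before uploading it.

```shell
#!/bin/sh
# Sketch only: rewrite "host: internal-xcap..." to "host: xcap.internal...".
# use_xcap_internal_prefix is a hypothetical helper name.
# Assumption: the host value is unquoted and sits on the same line as "host:".
# Reads YAML on stdin and writes the edited YAML to stdout.
use_xcap_internal_prefix() {
    sed -E 's/^([[:space:]]*host:[[:space:]]*)internal-xcap/\1xcap.internal/'
}
```

A cautious way to use it is to write the output to a new file, compare it with the original using diff, and only then replace sentinel-volte-gsm-config.yaml.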
3.2 Enable Rhino restarts
This step is only required if you want to re-enable Rhino restarts that were disabled in the Disable Rhino restarts step.
On the SIMPL VM, open /home/admin/uplevel-config/mag-vmpool-config.yaml using vi.
To enable scheduled restarts, uncomment all scheduled-rhino-restarts sections you previously commented out in the Disable Rhino restarts step.
For example:
virtual-machines:
  - vm-id: mag-1
    rhino-node-id: 201
    scheduled-rhino-restarts:
      day-of-week: Saturday
      time-of-day: 02:32
  - vm-id: mag-2
    rhino-node-id: 202
    scheduled-rhino-restarts:
      day-of-week: Saturday
      time-of-day: 04:32
Run ./rvtconfig upload-config -c <CDS address> <CDS auth args> -t mag -i /home/admin/uplevel-config --vm-version <uplevel version>
.
Assuming the previous command has succeeded, re-run the basic validation tests to ensure the configuration has been applied correctly.
4. Post-acceptance
The upgrade of the MAG nodes is now complete.
After you have been running with the MAG nodes at the uplevel version for a while, you may want to perform post-acceptance tasks.
5. Backout Method of Procedure
First, gather the log history of the downlevel VMs. Run mkdir -p /home/admin/rvt-log-history and ./rvtconfig export-log-history -c <CDS address> <CDS auth args> -d <deployment ID> --zip-destination-dir /home/admin/rvt-log-history --secrets-private-key-id <secret ID>.
The secret ID you specify for --secrets-private-key-id should be the secret ID for the secrets private key (the one used to encrypt sensitive fields in CDS). You can find this in the product-options section of each VNFC in the SDF.
Make sure the <CDS address> used is one of the remaining available TSN nodes.
Next, how much of the backout procedure to run depends on how much progress was made with the upgrade.
If you did not get to the point of running csar update, start from the Cleanup after backout section below.
If you ran csar update and it failed, the output will tell you which VMs failed to upgrade.
Successfully updated mag VMs with indexes: 0,1
Not started updating mag VMs with indexes: 3,4
Failed whilst updating mag VM with index: 2
Perform a rollback of all the VMs listed under "Successfully updated" and "Failed whilst updating".
If you encounter further failures during recovery or rollback, contact your Customer Care Representative to investigate and recover the deployment.
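The set of VMs to roll back (those listed under "Successfully updated" and "Failed whilst updating") can be derived from the saved csar update output. The following is a minimal sketch, assuming the summary lines have the same form as the sample above and that the resulting comma-separated list is acceptable as the index range for the rollback; rollback_indexes is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: build a comma-separated list of the VM indexes that need
# rolling back, from csar update summary lines read on stdin.
# rollback_indexes is a hypothetical helper, not part of the csar tool.
rollback_indexes() {
    # Keep the "Successfully updated" and "Failed whilst updating" lines,
    # ignore "Not started updating", then keep the text after the last colon.
    grep -E '^[[:space:]]*(Successfully updated|Failed whilst updating) ' |
        sed 's/.*: *//' |
        tr -d ' ' |
        tr ',' '\n' |
        sort -n |
        paste -sd, -
}
```

Check the printed list against the csar update output by eye before using it, since a wrong index range would roll back the wrong VMs.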
5.1 Collect diagnostics
We recommend gathering diagnostic archives for all MAG VMs in the deployment.
On the SIMPL VM, run the command ./rvtconfig gather-diags --sdf /home/admin/uplevel-config/sdf-rvt.yaml -t mag --ssh-key-secret-id <SSH key secret ID> --ssh-username sentinel --output-dir <diags-bundle>.
If <diags-bundle> does not exist, the command will create the directory for you.
Each diagnostic archive can be up to 200 MB per VM. Ensure you have enough disk space on the SIMPL VM to collect all diagnostics. The command will be aborted if the SIMPL VM does not have enough disk space to collect all diagnostic archives from all the VMs in your deployment specified in the provided SDF.
You are using the uplevel |
5.2 Disable Rhino restarts
This step is only required if you have enabled restarts on the MAG nodes.
Restarts are enabled if the scheduled-rhino-restarts parameter has been configured in the mag-vmpool-config.yaml file.
On the SIMPL VM, open /home/admin/current-config/mag-vmpool-config.yaml using vi.
To disable scheduled restarts, comment out all scheduled-rhino-restarts sections.
For example:
virtual-machines:
  - vm-id: mag-1
    rhino-node-id: 201
    # scheduled-rhino-restarts:
    #   day-of-week: Saturday
    #   time-of-day: 02:32
  - vm-id: mag-2
    rhino-node-id: 202
    # scheduled-rhino-restarts:
    #   day-of-week: Saturday
    #   time-of-day: 04:32
Make the same changes to /home/admin/uplevel-config/mag-vmpool-config.yaml
.
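If you prefer not to edit both files by hand in vi, the commenting and uncommenting can be sketched with sed. This is an illustration only: the helper names are hypothetical, and it assumes the keys scheduled-rhino-restarts, day-of-week and time-of-day appear nowhere else in mag-vmpool-config.yaml. The uncomment helper removes exactly one "# " after the indentation, so it restores lines commented by the first helper.

```shell
#!/bin/sh
# Sketch only; comment_restarts and uncomment_restarts are hypothetical helpers.
# Assumption: these three keys occur only inside scheduled-rhino-restarts
# sections. Both helpers read YAML on stdin and write to stdout.

# Insert "# " after the indentation of each matching line.
comment_restarts() {
    sed -E 's/^([[:space:]]*)((scheduled-rhino-restarts|day-of-week|time-of-day):.*)$/\1# \2/'
}

# Remove one "# " that follows the indentation of each matching line.
uncomment_restarts() {
    sed -E 's/^([[:space:]]*)# ([[:space:]]*(scheduled-rhino-restarts|day-of-week|time-of-day):.*)$/\1\2/'
}
```

For example, `comment_restarts < mag-vmpool-config.yaml > mag-vmpool-config.yaml.new` writes an edited copy that you can compare with the original using diff before replacing it.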
Using rvtconfig
from the downlevel CSAR, run
./rvtconfig upload-config -c <CDS address> -t mag -i /home/admin/current-config --vm-version <downlevel version>
.
Assuming the previous command has succeeded, run csar validate
to confirm the configuration has converged.
This command first performs a check that the nodes have successfully applied the downlevel configuration:
========================
Performing healthchecks
========================
Commencing healthcheck of VNF 'mag'
Performing health checks for service group mydeployment-mag with a 0 second timeout
Running MDM status health-check for dc1-mydeployment-mag-1
dc1-mydeployment-mag-1: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
Running MDM status health-check for dc1-mydeployment-mag-2
dc1-mydeployment-mag-2: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
Running MDM status health-check for dc1-mydeployment-mag-3
dc1-mydeployment-mag-3: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
After that, it performs various checks on the health of the VMs' networking and services:
================================
Running validation test scripts
================================
Running validation tests in CSAR 'mag/4.0.0-14-1.0.0'
Test running for: mydeployment-mag-1
Running script: check_ping_management_ip...
Running script: check_ping_management_gateway...
Running script: check_can_sudo...
Running script: check_converged...
Detailed output can be found in /var/log/csar/ansible_output-2023-01-06-03-21-51.log
If all is well, then you should see the message
All tests passed for CSAR 'mag/<downlevel version>'!
.
If there are failures, investigate them with the help of your Customer Care Representative and the Troubleshooting pages.
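With many VMs, the MDM status health-check lines shown earlier in this step can be scanned for nodes whose current status or state differs from the desired one. The following is a minimal sketch, assuming each status line has the same form as in the sample output; unconverged_vms is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: print the names of VMs whose current status/state differs from
# the desired status/state, from csar validate output read on stdin.
# unconverged_vms is a hypothetical helper, not part of the csar tool.
unconverged_vms() {
    # Splitting on single quotes gives: $2 current status, $4 current state,
    # $6 desired status, $8 desired state.
    awk -F"'" '/: Current status / && ($2 != $6 || $4 != $8) {
        sub(/: Current status.*/, "")
        print
    }'
}
```

An empty result means every status line reported matching current and desired values; it does not replace reading the rest of the validation output.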
5.3 Disable SBB cleanup timers
Disable the SBB Activities Cleanup timer to minimise the possibility that this cleanup will interact with this procedure.
Complete the following procedure for each MAG node.
Establish an SSH session to the management IP of the node.
Then run systemctl list-timers
.
This should give output of this form:
NEXT LEFT LAST PASSED UNIT ACTIVATES
Fri 2023-01-13 00:00:00 NZDT 10h left Thu 2023-01-12 00:00:00 NZDT 14h ago rhino-jstat-restart.timer rhino-jstat-restart.service
Sat 2023-01-14 02:00:00 NZDT 12h left Fri 2023-01-13 02:00:00 NZDT 12h ago cleanup-sbbs-activities.timer cleanup-sbbs-activities.service
Sat 2023-01-14 13:00:00 NZDT 23h left Fri 2023-01-13 13:00:00 NZDT 1h ago systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
If there is no line with UNIT
as cleanup-sbbs-activities.timer
, move on to the next node.
Otherwise, run the following commands:
sudo systemctl disable cleanup-sbbs-activities.timer
sudo systemctl stop cleanup-sbbs-activities.timer
systemctl list-timers
This should give output of this form:
NEXT LEFT LAST PASSED UNIT ACTIVATES
Fri 2023-01-13 00:00:00 NZDT 10h left Thu 2023-01-12 00:00:00 NZDT 14h ago rhino-jstat-restart.timer rhino-jstat-restart.service
Sat 2023-01-14 13:00:00 NZDT 23h left Fri 2023-01-13 13:00:00 NZDT 1h ago systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
You should no longer see an entry for cleanup-sbbs-activities.timer
.
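The check-then-disable logic above can be sketched as a small shell helper to run on each node. This is an illustration only: has_timer is a hypothetical helper name, and it assumes the UNIT column is the second-to-last field of each systemctl list-timers line, as in the sample output.

```shell
#!/bin/sh
# Sketch only: succeed if the named timer unit appears in the UNIT column of
# `systemctl list-timers` output read on stdin. has_timer is a hypothetical
# helper. Assumption: UNIT is the second-to-last whitespace-separated field.
has_timer() {
    awk -v unit="$1" 'NF >= 2 && $(NF - 1) == unit { found = 1 } END { exit !found }'
}

# Usage on a MAG node (commented out so that sourcing this file changes nothing):
# if systemctl list-timers | has_timer cleanup-sbbs-activities.timer; then
#     sudo systemctl disable cleanup-sbbs-activities.timer
#     sudo systemctl stop cleanup-sbbs-activities.timer
# fi
```

Matching on the UNIT column rather than on the whole line avoids a false match on the ACTIVATES column, which holds the similarly named .service unit.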
5.4 Roll back VMs
To roll back the VMs, the procedure is essentially to perform an "upgrade" back to the downlevel version, that is, with <downlevel version> and <uplevel version> swapped.
You can refer to the Begin the upgrade section above for details on the prompts and output of csar update.
Run SKIP_MW_CHECK=1 csar update --skip pre-update-checks,force-in-series-update-with-l3-permission --vnf mag --sdf /home/admin/rvt-rollback-sdf/sdf-rvt.yaml --sites <site name> --service-group <service group name> --index-range <index range>
.
Once the csar update
command completes successfully, proceed with the next steps below.
Contiguous ranges of indexes can be expressed with a hyphen (-). If you want to roll back just one node, give only that node's index. If you want to roll back all nodes, omit the --index-range argument.
If csar update fails, check the output for which VMs failed.
For each VM that failed, run csar redeploy --vm <failed VM name> --sdf /home/admin/current-config/sdf-rvt.yaml.
If csar redeploy fails, contact your Customer Care Representative to start the recovery procedures.
If all the csar redeploy commands were successful, then run the previously used csar update command on the VMs that were neither rolled back nor redeployed yet.
To help you determine which VMs were neither rolled back nor redeployed yet,
connect via SSH to each VM and run |
5.5 Delete uplevel CDS data
Run ./rvtconfig delete-node-type-version -c <CDS address> <CDS auth args> -t mag --vm-version <uplevel version> -d <deployment ID> --site-id <site ID> --ssh-key-secret-id <SSH key secret ID> to remove data for the uplevel version from CDS.
Example output from the command:
This will destroy all configuration and runtime state for the specified node type and version.
This must not be performed while VMs of this type and version are running.
Requested deletion of version '4.1-7-1.0.0'
VM status for version '4.1-7-1.0.0':
- 1.2.3.4 (unknown) query failed: Unable to connect to 1.2.3.4
- 1.2.3.5 (unknown) query failed: Unable to connect to 1.2.3.5
- 1.2.3.6 (unknown) query failed: Unable to connect to 1.2.3.6
Delete version 4.1-7-1.0.0? Y/[N]
Type "Y" to confirm the deletion of the data for the uplevel version. The command will offer one further prompt for you to double-check that the uplevel version is being deleted and the downlevel version is being retained:
The following versions will be deleted: 4.1-7-1.0.0
The following versions will be retained: 4.0.0-14-1.0.0
Do you wish to continue? Y/[N] Y
Check the versions are the correct way around, and then confirm this prompt to delete the uplevel data from CDS.
5.6 Cleanup after backout
-
Revert any DNS changes that have been made to the DNS server.
-
Revert the value of xcap-data-update.host in /home/admin/current-config/sentinel-volte-gsm-config.yaml. Change the prefix "xcap.internal." back to "internal-xcap.". Using rvtconfig from the downlevel MMT CSAR, run ./rvtconfig upload-config -c <CDS address> -t mmt-gsm -i /home/admin/current-config --vm-version <downlevel version>.
-
If desired, remove the uplevel CSAR. On the SIMPL VM, run csar remove mag/<uplevel version>.
-
If desired, remove the uplevel config directories on the SIMPL VM with rm -rf /home/admin/uplevel-config. We recommend these files are kept in case the upgrade is attempted again at a later time.
5.7 Enable Rhino restarts
This step is only required if you want to re-enable Rhino restarts that were disabled in the Disable Rhino restarts step.
On the SIMPL VM, open /home/admin/current-config/mag-vmpool-config.yaml using vi.
To enable scheduled restarts, uncomment all scheduled-rhino-restarts sections you previously commented out in the Disable Rhino restarts step.
For example:
virtual-machines:
  - vm-id: mag-1
    rhino-node-id: 201
    scheduled-rhino-restarts:
      day-of-week: Saturday
      time-of-day: 02:32
  - vm-id: mag-2
    rhino-node-id: 202
    scheduled-rhino-restarts:
      day-of-week: Saturday
      time-of-day: 04:32
Using rvtconfig from the downlevel CSAR, run ./rvtconfig upload-config -c <CDS address> <CDS auth args> -t mag -i /home/admin/current-config --vm-version <downlevel version>.
Assuming the previous command has succeeded, re-run the basic validation tests to ensure the configuration has been applied correctly.
5.8 Enable SBB cleanups
Complete the following procedure for each MAG node.
Establish an SSH session to the management IP of the node.
Then run systemctl list-timers
.
This should give output of this form:
NEXT LEFT LAST PASSED UNIT ACTIVATES
Fri 2023-01-13 23:00:00 NZDT 9h left Thu 2023-01-12 23:00:00 NZDT 15h ago restart-rhino.timer restart-rhino.service
Fri 2023-01-13 00:00:00 NZDT 10h left Thu 2023-01-12 00:00:00 NZDT 14h ago rhino-jstat-restart.timer rhino-jstat-restart.service
Sat 2023-01-14 13:00:00 NZDT 23h left Fri 2023-01-13 13:00:00 NZDT 1h ago systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
If there is a line with UNIT
as cleanup-sbbs-activities.timer
, move on to the next node.
Otherwise, run the following commands:
sudo systemctl enable --now cleanup-sbbs-activities.timer
systemctl list-timers
This should give output of this form:
NEXT LEFT LAST PASSED UNIT ACTIVATES
Fri 2023-01-13 23:00:00 NZDT 9h left Thu 2023-01-12 23:00:00 NZDT 15h ago restart-rhino.timer restart-rhino.service
Fri 2023-01-13 00:00:00 NZDT 10h left Thu 2023-01-12 00:00:00 NZDT 14h ago rhino-jstat-restart.timer rhino-jstat-restart.service
Sat 2023-01-14 02:00:00 NZDT 12h left n/a n/a cleanup-sbbs-activities.timer cleanup-sbbs-activities.service
Sat 2023-01-14 13:00:00 NZDT 23h left Fri 2023-01-13 13:00:00 NZDT 1h ago systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
You should now see an entry for cleanup-sbbs-activities.timer
.
5.9 Verify service is restored
Perform verification tests to ensure the deployment is functioning as expected.
If applicable, contact your Customer Care Representative to investigate the cause of the upgrade failure.
Before re-attempting the upgrade, ensure you have run the You will also need to re-upload the uplevel configuration. |