Mobile Control Point Configuration Reference

This manual is a guide for configuring and upgrading the TSN and MCP nodes as virtual machines on OpenStack or VMware vSphere.

In this book

Notices
Changelogs
Introduction
VM types
- Flavors
  - TSN
  - MCP
- Open Listening Ports
  - TSN
  - MCP
Upgrades
- Rolling upgrades and patches
Verify the state of the nodes and processes
- TSN checks
- MCP checks
VM configuration
VM recovery
Troubleshooting node installation
Glossary

Notices

Copyright © 2023 Microsoft. All rights reserved. This manual is issued on a controlled basis to a specific person on the understanding that no part of the product code or documentation (including this manual) will be copied or distributed without prior agreement in writing from Metaswitch Networks and Microsoft.

Metaswitch Networks and Microsoft reserve the right to, without notice, modify or revise all or part of this document and/or change product features or specifications and shall not be responsible for any loss, cost, or damage, including consequential damage, caused by reliance on these materials. Metaswitch and the Metaswitch logo are trademarks of Metaswitch Networks. Other brands and products referenced herein are the trademarks or registered trademarks of their respective holders. Product(s) and features documented in this manual handle various forms of data relating to your users. You must comply with all laws and regulations applicable to your deployment, management, and use of said product(s), and you should take all appropriate technical and organizational measures to ensure you are handling this data appropriately according to any local legal and regulatory obligations.

You are responsible for determining whether said product(s) or feature(s) is/are appropriate for storage and processing of information subject to any specific law or regulation and for using said product(s) or feature(s) in a manner consistent with your own legal and regulatory obligations. You are also responsible for responding to any request from a third party regarding your use of said product(s), such as a request to take down content under the U.S. Digital Millennium Copyright Act or other applicable laws.

Changelogs

MCP VM Changelogs

This section contains MCP VM Build process changelogs specific to the MCP VM type.

This depends on the common VM build process used by all the Mobile Control Point VMs. For those changelogs, see Common VM Changelogs.

1.5.4

Updated the MCP VM to use the latest version of VMBC.
See the common VM changes in the 4.2-10-1.0.0 entry.

1.5.3

SecretValue, PrivateKey and Certificate are now stored in QSS secret store. (#1773389)
See the common VM changes in the 4.2-8-1.0.0 entry.

1.5.2

Added support for certificate revocation checking for the Microsoft Teams Phone System consultation API and AAD token API. (#1648574)
See the common VM changes in the 4.2-7-1.0.0 entry.

1.5.1

Updated the MCP VM to use the latest version of VMBC.
See the common VM changes in the 4.2-4-1.0.0 entry.

1.5.0

The MCP VM is now based off VMBC 3.3.
Compatibility with Redhat 8 based SIMPL V6.15 and MDM 3.8.
See the common VM changes in the 4.2-3-1.0.0 entry.

1.4.2

See the common VM changes in the 4.1-7-1.0.0 entry.

1.4.1

See the common VM changes in the 4.1-5-1.0.0 entry.

1.4.0

Support forced routing configuration (#77095)
Added secret decryption for secret store in Cassandra (#84256)
Stopped using linkerd for HTTPS requests to the Microsoft Phone System. (#217595)
See the common VM changes in the 4.1-3-1.0.0 entry.

1.3.1

Configured the client TLS protocol used by linkerd to only ever use TLS 1.2 or later. (#357797)
Added configuration options for using an HTTP proxy when sending HTTPS requests from MCP. (#421539, #432453)
Fixed an upgrade issue where MCP wouldn’t apply cluster-wide configuration on the node with the highest node ID. (#439273)
See the common VM changes in the 4.0.0-34-1.0.0 entry.

1.2.1

Updated configuration instructions for use of regional Teams Phone Mobile Consultation API server addresses. (#231035)
See the common VM changes in the 4.0.0-31-1.0.0 entry.

1.2.0

Changed the Java garbage collector from CMS to G1, and updated the heap size (4096MB → 8192 MB) and new size (512MB → 1024MB). (#233701)
See the common VM changes in the 4.0.0-30-1.0.0 entry.

1.1.0

First release of MCP.

Common VM Changelogs

This section contains VM Build process changelogs used by all the Mobile Control Point VMs. For changelogs specific to the MCP VM, see MCP VM Changelogs.

4.2-10-1.0.0

New functionality

Compatibility with SIMPL V6.17.0. (#1938379)
Redhat 8.10 is now the base operating system in all the VMs, including custom VMs. (#1921689)
Support encrypted rhino keystore passwords. (#1746028)

Fixes

Syslog will now release file handles belonging to old log files when logrotate rotates them. (#1996713)
Fix of rvtconfig validating nonexistent configuration files as valid. (#1567579)
Print a warning if rvtconfig delete-node-type does not find any configuration for given group/deployment ID (#1922977)

4.2-8-1.0.0

Fixes

Updated RHEL 8.8 base image and system package versions of bpftool, container-selinux, containerd.io, docker-ce, docker-ce-cli, iwl1000-firmware, kernel, linux-firmware, nss, openssl, perf, `postgresql, python39, wget.
Updated Cassandra version to 4.1.7 to address security vulnerabilities.
Updated NGINX container version to 1.22.0-5 to address critical CVEs (CVE-2024-45491 and CVE-2024-5535)
Updated Apache Tomcat version to 9.0.96.
Updated Microsoft JDK version to 11.0.24 to address security vulnerabilities (CVE-2024-21147)
Fixed csar ansible scripts so RVT upgrades don’t fail halfway through if you did not enter a MW at the start (#1745177)
RVT VMs raise an alarm when a Read Only partition is detected (#1865522)

New functionality

Compatibility with SIMPL V6.16.2.
REM Certificates require IP Addresses as Alternate Names (#1550033)
Updated rvtconfig to support references to secret store in configuration YAML files. (#1684972)
Updated rvtconfig compare-config command so secrets are not included on such config comparison. (#1867787)
Added new rvtconfig commands to support rotation of Cassandra user and password secrets: add-cds-user, remove-cds-user, rotate-cds-password. (#1760090 and #1760091)

4.2-7-1.0.0

Fixes

Updated RHEL 8.8 base image and system package versions of avahi-libs, bind, bpftool, container-selinux, containerd.io, cups, cups-client, cups-libs, dhcp, docker-ce, docker-ce-cli, expat-devel, glibc, iproute, iwl1000-firmware, kernel, less, libfastjson, libmaxminddb, libuuid, libxml, linux-firmware, net-snmp, NetworkManager, nss, openssh, openssl, perf, `perl, platform-python-pip, postgresql, python39-setuptools, python3-bind, python3-cryptography, python3-libxml, python3-pip, rpm-plugin-selinux, selinux-policy, sqlite, sudo, tcpdump, util-linx-user, to address security vulnerabilities. (#1586651 and #1650638)
Updated Cassandra version to 4.1.5 to address security vulnerabilities.
Updated Microsoft JDK version to 11.0.23 to address security vulnerabilities (CVE-2023-41993 and CVE-2024-21892)
Fix of rvtconfig to support paths with symlinks. (#1611148)
Fix of rvtconfig validate with SMO profile tables validation. (#1667728)
Updated Cassandra DB GC logging configuration to generate smaller files with required info for memory consumption analysis.

New functionality

Added a date field in the output of the "rvtconfig list-config" command that indicates when a specific version of a config was uploaded to the CDS. (#1508571)
Compatibility with SIMPL V6.16.1.

4.2-4-1.0.0

Fixes

Updated system package versions of bind, bpftool, container-selinux, containerd.io, cups, cups-libs, docker-ce, docker-ce-cli, glibc, kernel, less, libX11, libuuid, nss, perf, platform-python-pip, python3-bind, python3-pip, util-linux-user, NetworkManager, to address security vulnerabilities. (#1512780)
Removed SNMP alarm monitoring memAvailReal as this was frequently incorrectly alarming and we now monitor available memory in SIMon. (#1087865)
Enhanced NTP setup robustness during bootstrap. (#1521440)

4.2-3-1.0.0

Fixes

Updated system package versions of avahi-libs, bpftool, container-selinux, containerd.io, curl, docker-ce, docker-ce-cli, gnutls, iproute, iwl1000-firmware, kernel, libfastjson, libmaxminddb, linux-firmware, nss, openssh, perl, postgresql, python, rpm, sqlite, sudo, tcpdump and tzdata, to address security vulnerabilities. (#1336181)

4.2-1-1.0.0

New functionality

Redhat 8 is now the base operating system in all the VMs, including custom VMs.
Compatibility with Redhat 8 based SIMPL V6.15 and MDM 3.8.

Fixes

Added support in rvtconfig to handle SDF files that are symbolic links. (#1296638)
Added raising an alarm if MDM certificate is soon to be expired. (#1095098)

4.1-7-1.0.0

Fixes

Update Cassandra 4.1 gc.log configuration options to reduce logging printed information and to allow analysis by censum tool. (#1161334)
Updated rvconfig set-desired-running-state command so it lowercases instance names for MDM instance IDs (as SIMPL/MDM do) (#994044)
Initconf sets directory and file permissions to the primary user (instead of root) when extracting custom data from yaml configuration files. (#510353)

4.1-5-1.0.0

New functionality

Add new charging option 'cap-ro' to support mixed CAMEL and Diameter Ro deployment. (#701809)
Add support for configuring multiple destination realms for Diameter Ro. (#701814)

Fixes

Updated example configuration for conference-mrf-uri to force TCP (#737570)
Corrected the SNMP alarm that was previously monitoring totalFree memory, it now checks for availReal memory instead. (#853447)
Modified the validation scripts to avoid checking rhino liveness & alerts when IPSMGW is disabled. (#737963)
Allow upload config if there is no live node for a given VM type (#511300)
Cassandra 4 container upgraded to 4.1.3 (#987347)
Updated system package versions of libwebp, bind, bpftool, kernel, open-vm-tools, perf, and python to address security vulnerabilities. (#1023775)

4.1-3-1.0.0

New functionality

The minimum supported version of SIMPL is now 6.13.3. (#290889)
TSN upgrades are supported when all other non-TSN nodes are already upgraded to 4.1.3-1.0.0 or higher.
TSN VM supports 2 Cassandra releases - 3.11.13 and 4.1.1; the default is 4.1.1 for new deployments, 3.11.13 can be selected by setting the custom-options parameter to cassandra_version_3_11 during a VM deployment. New rvtconfig cassandra-upgrade allows one-way switch from 3.11.13 to 4.1.1 without outage.
New rvtconfig backup-cds and rvtconfig restore-cds commands allow backup and restore of CDS data.
New rvtconfig set-desired-running-state command to set the desired state of non-TSN initconf processes.

Fixes

Fixed a race condition during quiesce that could result in a VM being turned off before it had completed writing data to CDS. (#733646)
Improved the output when rvtconfig gather-diags is given hostname or site ID parameters that do not exist in the SDF, or when the SDF does not specify any VNFCs. (#515668)
Fixed an issue where rvtconfig would display an exception stack trace if given an invalid secrets ID. (#515672)
rvtconfig gather-diags now reports the correct location of the downloaded diagnostics. (#515671)
The version arguments to rvtconfig are now optional, defaulting to the version from the SDF if it matches that of rvtconfig. (#380063)
There is now reduced verbosity in the output of the upload-config command and logs are now written to a log file. (#334928)
Fixed service alarms so they will correctly clear after a reboot. (#672674)
Fixed rvtconfig gather-diags to be able to take ssh-keys that are outside the rvtcofig container. (#734624)
Fixed the rvtconfig validate command to only try to validate the optional files if they are all present. (#735591)
The CDS event check now compares the target versions of the most recent and new events before the new event is deemed to be already in the CDS. (#724431)
Extend OutputTreeDiagNode data that the non-TSN initconf reports to MDM based on the DesiredRunningState set from rvtconfig. (#290889)
Updated system package versions of nss, openssl, sudo, krb5, zlib, kpartx, bind, bpftool, kernel and perf to address security vulnerabilities. (#748702)

4.1-1-1.0.0

The minimum supported version of SIMPL is now 6.11.2. (#443131)
Added a csar validate test that runs the same liveness checks as rvtconfig report-group-status. (#397932)
Added MDM status to csar validate tests and report-group-status. (#397933)
Added the same healthchecks done in csar validate as part of the healthchecks for csar update. (#406261)
Added a healthcheck script that runs before upgrade to ensure config has been uploaded for the uplevel version. (#399673)
Added a healthcheck script that runs before upgrade and enforces the use of rvtconfig enter-maintenance-window. (#399670)
rvtconfig upload-config and related commands now ignore specific files that may be in the input directory unnecessarily. (#386665)
An error message is now output when incorrectly formatted override yaml files are inputted rather than a lengthy stack trace. (#381281)
Added a service to the VMs to allow SIMPL VM to query their version information. (#230585)
CSARs are now named with a -v6 suffix for compatibility with version 6.11 of SIMPL VM. (#396587)
Fixed an issue where the new rvtconfig calculate-maintenance-window command raised a KeyError. (#364387)
Fixed an issue where rvtconfig could not delete a node type if no config had been uploaded. (#379137)
Improved logging when calls to MDM fail. (#397974)
Update initconf zip hashes to hash file contents and names. (#399675)
Fixed an issue where rvtconfig maintenance-window-status would report that a maintenance window is active when the end time had already passed. (#399670)
Config check is now done once per node rather than unnecessarily repeated when multiple nodes are updated. (#334928)
Fixed an issue where csar validate, update or heal could fail if the target VM’s disk was full. (#468274)
The --vm-version-source argument now takes the option sdf-version that uses the version in the SDF for a given node. There is now a check that the inputted version matches the SDF version and an optional argument --skip-version-check that skips this check. (#380063)
rvtconfig now checks for, and reports, unsupported configuration changes. (#404791)
Fixed Rhino not restarting automatically if it exited unexpectedly. (#397976)
Updated system package versions of bind, bpftool, device-mapper-multipath, expat, krb5-devel, libkadm5 and python-ply to address security vulnerabilities. (#406275, #441719)

4.1-0-1.0.0

First release in the 4.1 series.

Major new functionality

Added support for VM Recovery. Depending on different situations, this allows you to recover from malfunctioning VM nodes without affecting other nodes in the same VM group.
Added a low-privilege user, named viewer. This user has read-only access to diagnostics on the VMs and no superuser capabilities. (OPT-4831)

Backwards-incompatible changes

Access to VMs is now restricted to SSH keys only (no password authentication permitted). (OPT-4341)
The minimum supported version of SIMPL is now 6.10.1. (OPT-4677, OPT-4740, OPT-4722, OPT-4726, #207131) This includes different handling of secrets, see Secrets in the SDF for more details.
Made the system-notification-enabled, rhino-notification-enabled, and sgc-notification-enabled configuration options mandatory. Ensure these are specified in snmp-config.yaml. (#270272)

Other new functionality

Added a list of expected open ports to the documentation. (OPT-3724)
Added enter-maintenance-window and leave-maintenance-window commands to rvtconfig to control scheduled tasks. (OPT-4805)
Added a command liveness-check to all VMs for a quick health overview. (OPT-4785)
Added a command rvtconfig report-group-status for a quick health overview of an entire group. (OPT-4790)
Split rvtconfig delete-node-type into rvtconfig delete-node-type-version and rvtconfig delete-node-type-all-versions commands to support different use cases. (OPT-4685)
Added rvtconfig delete-node-type-retain-version command to search for and delete configuration and state related to versions other than a specified VM version. (OPT-4685)
Added rvtconfig calculate-maintenance-window to calculate the suggested duration for an upgrade maintenance window. (#240973)
Added rvtconfig gather-diags to retrieve all diags from a deployment. This has been optimised to gather diags in parallel safely based on the node types alongside disk usage safety checks. (#399682, #454095, #454094)
Added support for Cassandra username/password authentication. (OPT-4846)
system-config.yaml and routing-config.yaml are now fully optional, rather than requiring the user to provide an empty file if they didn’t want to provide any configuration. (OPT-3614)
Added tool mdm_certificate_updater.py to allow the update of MDM certificates on a VM. (OPT-4599)
The VMs' infrastructure software now runs on Python 3.9. (OPT-4013, OPT-4210)
All RPMs and Python dependencies updated to the newest available versions.
Updated the linkerd version to 1.7.5. (#360288)

Fixes

Fixed issue with default gateway configuration.
initconf is now significantly faster. (OPT-3144, OPT-3969)
Added some additional clarifying text to the disk usage alarms. (OPT-4046)
Ensured tasks which only perform configuration actions on the leader do not complete too early. (OPT-3657)
Tightened the set of open ports used for SNMP, linkerd and the Prometheus stats reporter. (OPT-4061, OPT-4058)
Disabled NTP server function on the VMs (i.e. other devices cannot use the VM as a time source). (OPT-4061)
The report-initconf command now returns a meaningful exit code. (DEV-474)
Alarms sent from initconf will have the source value of RVT monitor. (OPT-4521)
Removed unnecessary logging about not needing to clear an alarm that hadn’t been previously raised. (OPT-4752)
Authorized site-wide SSH authorized public keys specified in the SDF on all VMs within the site. (OPT-4729)
Reduced coupling to specific SIMPL VM version, to improve forwards compatibility with SIMPL. (OPT-4699)
Moved initconf.log, mdm-quiesce-notifier.log and bootstrap.log to /var/log/tas, with symlinks from old file paths to new file paths for backwards compatibility. (OPT-4904)
Added the rvt-gather_diags script to all node types.
Increased bootstrap timeout from 5 to 15 minutes to allow time (10 minutes) to establish connectivity to NTP servers. (OPT-4917)
Increase logging from tasks which run continuously, such as Postgres and SSH key management. (OPT-2773)
Avoid a tight loop when the CDS server is unavailable, which caused a high volume of logging. (OPT-4925)
SNMPv3 authentication key and privacy key are now stored encrypted in CDS. (OPT-3822)
Added a 3-minute timeout to the quiesce task runner to prevent quiescing from hanging indefinitely if one of the tasks hangs (OPT-5053)
The report-initconf command now reports quiesce failure separately to quiesce timeout. (#235188)
Added a list of SSH authorized keys for the low-privilege user to the product options section of the SDF. (#259004)
Store the public SSH host keys for VMs in a group in CDS instead of using ssh-keyscan to discover them. (#262397)
Add mechanism to CDS state to support forward-compatible extensions. (#230677)
Logs stored in CDS during quiesce will be removed after 28 days. (#314937)
The VMs are now named "Metaswitch Virtual Appliance". (OPT-3686)
Updated system package versions of bpftool, kernel, perf, python and xz to address security vulnerabilities.
Fixed an issue where VMs would send DNS queries for the localhost hostname. (#206220)
Fixed issue that meant rvtconfig upload-config would fail when running in an environment where the input device is not a TTY. When this case is detected upload-config will default to non-interactive confirmation -y. This preserves 4.0.0-26-1.0.0 (and earlier versions) in environments where an appropriate input device is not available. (#258542)
Fixed an issue where scheduled tasks could incorrectly trigger on a reconfiguration of their schedules. (#167317)
Added rvtconfig compare-config command and made rvtconfig upload-config check config differences and request confirmation before upload. There is a new -f flag that can be used with upload-config to bypass the configuration comparison. -y flag can now be used with upload-config to provide non-interactive confirmation in the case that the comparison shows differences. (OPT-4517)

Added the rvt-gather_diags script to all node types. (#94043)
Increased bootstrap timeout from 5 to 15 minutes to allow time (10 minutes) to establish connectivity to NTP servers. (OPT-4917)
Make rvtconfig validate not fail if fields are present in the SDF it does not recognize. (OPT-4699)
Added 3 new traffic schemes: "all signaling together except SIP", "all signaling together except HTTP", and "all traffic types separated". (#60997)
Fixed an issue where updated routing rules with the same target were not correctly applied. (#169195)
Scheduled tasks can now be configured to run more than once per day, week or month; and at different frequencies on different nodes. (OPT-4373)
Updated subnet validation to be done per-site rather than across the entire SDF deployment. (OPT-4412)
Fixed an issue where unwanted notification categories can be sent to SNMP targets. (OPT-4543)
Hardened linkerd by closing the prometheus stats port and changing the proxy port to listen on localhost only. (OPT-4840)
Added an optional node types field in the routing rules YAML configuration. This ensures the routing rule is only attempted to apply to VMs that are of the specified node types. (OPT-4079)
initconf will not exit on invalid configuration. VM will be allowed to quiesce or upload new configuration. (OPT-4389)
rvtconfig now only uploads a single group’s configuration to that group’s entry in CDS. This means that initconf no longer fails if some other node type has invalid configuration. (OPT-4392)
Fixed a race condition that could result in the quiescence tasks failing to run. (OPT-4468)
The rvtconfig upload-config command now displays leader seed information as part of the printed config version summary. (OPT-3962)
Added rvtconfig print-leader-seed command to display the current leader seed for a deployment and group. (OPT-3962)
Enum types stored in CDS cross-level refactored to string types to enable backwards compatibility. (OPT-4072)
Updated system package versions of bind, dhclient, dhcp, bpftool, libX11, linux-firmware, kernel, nspr, nss, openjdk and perf to address security vulnerabilities. (OPT-4332)
Made ip-address.ip field optional during validation for non-RVT VNFCs. RVT and Custom VNFCs will still require the field. (OPT-4532)
Fix SSH daemon configuration to reduce system log sizes due to error messages. (OPT-4538)
Allowed the primary user’s password to be configured in the product options in the SDF. (OPT-4448)
Updated system package version of glib2 to address security vulnerabilities. (OPT-4198)
Updated NTP services to ensure the system time is set correctly on system boot. (OPT-4204)
Include deletion of leader-node state in rvtconfig delete-node-type, resolving an issue where the first node deployed after running that command wouldn’t deploy until the leader was re-deployed. (OPT-4213)
Rolled back SIMPL support to 6.6.3. (OPT-43176)
Disk and service monitor notification targets that use SNMPv3 are now configured correctly if both SNMPv2c and SNMPv3 are enabled. (OPT-4054)
Fixed issue where initconf would exit (and restart 15 minutes later) if it received a 400 response from the MDM. (OPT-4106)
The Sentinel GAA Cassandra keyspace is now created with a replication factor of 3. (OPT-4080)
snmptrapd is now enabled even if no targets are configured for system monitor notifications, in order to log any notifications that would have been sent. (OPT-4102)
Fixed bug where the SNMPv3 user’s authentication and/or privacy keys could not be changed. (OPT-4102)
Making SNMPv3 queries to the VMs now requires encryption. (OPT-4102)
Fixed bug where system monitor notification traps would not be sent if SNMPv3 is enabled but v2c is not. Note that these traps are still sent as v2c only, even when v2c is not otherwise in use. (OPT-4102)
Removed support for the signaling and signaling2 traffic type names. All traffic types should now be specified using the more granular names, such as ss7. Refer to the page Traffic types and traffic schemes in the Install Guide for a list of available traffic types. (OPT-3820)
Ensured ntpd is in slew mode, but always step the time on boot before Cassandra, Rhino and OCSS7 start. (OPT-4131, OPT-4143)

4.0.0-14-1.0.0

Changed the rvtconfig delete-node-type command to also delete OID mappings as well as all virtual machine events for the specified version from cross-level group state. (OPT-3745)
Fixed systemd units so that systemd does not restart Java applications after a systemctl kill. (OPT-3938)
Added additional validation rules for traffic types in the SDF. (OPT-3834)
Increased the severity of SNMP alarms raised by the disk monitor. (OPT-3987)
Added --cds-address and --cds-addresses aliases for the -c parameter in rvtconfig. (OPT-3785)

4.0.0-13-1.0.0

Added support for separation of traffic types onto different network interfaces. (OPT-3818)
Improved the validation of SDF and YAML configuration files, and the errors reported when validation fails. (OPT-3656)
Added logging of the instance ID of the leader while waiting during initconf. (OPT-3558)
Do not use YAML anchors/aliases in the example SDFs. (OPT-3606)
Fixed a race condition that could cause initconf to hang indefinitely. (OPT-3742)
Improved error reporting in rvtconfig.
Updated SIMPL VM dependency to 6.6.1. (OPT-3857)
Adjusted linkerd OOM score so it will no longer be terminated by the OOM killer (OPT-3780)
Disabled all yum repositories. (OPT-3781)
Disabled the TLSv1 and TLSv1.1 algorithms for Java. (OPT-3781)
Changed initconf to treat the reload-resource-adaptors flag passed to rvtconfig as an intrinsic part of the configuration, when determining if the configuration has been updated. (OPT-3766)
Updated system package versions of bind, bpftool, kernel, nettle, perf and screen to address security vulnerabilities. (OPT-3874)
Added an option to rvtconfig dump-config to dump the config to a specified directory. (OPT-3876)
Fixed the confirmation prompt for rvtconfig delete-node-type and rvtconfig delete-deployment commands when run on the SIMPL VM. (OPT-3707)
Corrected a regression and a race condition that prevented configuration being reapplied after a leader seed change. (OPT-3862)

4.0.0-9-1.0.0

All SDFs are now combined into a single SDF named sdf-rvt.yaml. (OPT-2286)
Added the ability to set certain OS-level (kernel) parameters via YAML configuration. (OPT-3403)
Updated to SIMPL 6.5.0. (OPT-3358, OPT-3545)
Make the default gateway optional for the clustering interface. (OPT-3417)
initconf will no longer block startup of a configured VM if MDM is unavailable. (OPT-3206)
Enforce a single secrets-private-key in the SDF. (OPT-3441)
Made the message logged when waiting for config be more detailed about which parameters are being used to determine which config to retrieve. (OPT-3418)
Removed image name from example SDFs, as this is derived automatically by SIMPL. (OPT-3485)
Make systemctl status output for containerised services not print benign errors. (OPT-3407)
Added a command delete-node-type to facilitate re-deploying a node type after a failed deployment. (OPT-3406)
Updated system package versions of glibc, iwl1000-firmware, net-snmp and perl to address security vulnerabilities. (OPT-3620)

4.0.0-8-1.0.0

Fix bug (affecting 4.0.0-7-1.0.0 only) where rvtconfig was not reporting the public version string, but rather the internal build version (OPT-3268).
Update sudo package for CVE-2021-3156 vulnerability (OPT-3497)
Validate the product-options for each node type in the SDF. (OPT-3321)
Clustered MDM installations are now supported. Initconf will failover across multiple configured MDMs. (OPT-3181)

4.0.0-7-1.0.0

If YAML validation fails, print the filename where an error was found alongside the error. (OPT-3108)
Improved support for backwards compatibility with future CDS changes. (OPT-3274)
Change the report-initconf script to check for convergence since the last time config was received. (OPT-3341)
Improved exception handling when CDS is not available. (OPT-3288)
Change rvtconfig upload-config and rvtconfig initial-configure to read the deployment ID from the SDFs and not a command line argument. (OPT-3111)
Publish imageless CSARs for all node types. (OPT-3410)
Added message to initconf.log explaining some Cassandra errors are expected. (OPT-3081)
Updated system package versions of bpftool, dbus, kernel, nss, openssl and perf to address security vulnerabilities.

4.0.0-6-1.0.0

Updated to SIMPL 6.4.3. (OPT-3254)
When using a release version of rvtconfig, the correct this-rvtconfig version is now used. (OPT-3268)
All REM setup is now completed before restarting REM, to avoid unnecessary restarts. (OPT-3189)
Updated system package versions of bind-*, curl, kernel, perf and python-* to address security vulnerabilities. (OPT-3208)
Added support for routing rules on the Signaling2 interface. (OPT-3191)
Configured routing rules are now ignored if a VM does not have that interface. (OPT-3191)
Added support for absolute paths in rvtconfig CSAR container. (OPT-3077)
The existing Rhino OIDs are now always imported for the current version. (OPT-3158)
Changed behaviour of initconf to not restart resource adaptors by default, to avoid an unexpected outage. A restart can be requested using the --reload-resource-adaptors parameter to rvtconfig upload-config. (OPT-2906)
Changed the SAS resource identifier to match the provided SAS resource bundles. (OPT-3322)
Added information about MDM and SIMPL to the documentation. (OPT-3074)

4.0.0-4-1.0.0

Added list-config and describe-config operations to rvtconfig to list configurations already in CDS and describe the meaning of the special this-vm and this-rvtconfig values. (OPT-3064)
Renamed rvtconfig initial-configure to rvtconfig upload-config, with the old command remaining as a synonym. (OPT-3064)
Fixed rvtconfig pre-upgrade-init-cds to create a necessary table for upgrades from 3.1.0. (OPT-3048)
Fixed crash due to missing Cassandra tables when using rvtconfig pre-upgrade-init-cds. (OPT-3094)
rvtconfig pre-upgrade-init-cds and rvtconfig push-pre-upgrade-state now supports absolute paths in arguments. (OPT-3094)
Reduced timeout for DNS server failover. (OPT-2934)
Updated rhino-node-id max to 32767. (OPT-3153)
Diagnostics at the top of initconf.log now include system version and CDS group ID. (OPT-3056)
Random passwords for the Rhino client and server keystores are now generated and stored in CDS. (OPT-2636)
Updated to SIMPL 6.4.0. (OPT-3179)
Increased the healthcheck and decommision timeouts to 20 minutes and 15 minutes respectively. (OPT-3143)
Updated example SDFs to work with MDM 2.28.0, which is now the supported MDM version. (OPT-3028)
Added support to report-initconf for handling rolled over initconf-json.log files. The script can now read historic log files when building a report if necessary. (OPT-1440)
Fixed potential data loss in Cassandra when doing an upgrade or rollback. (OPT-3004)

4.0.0-3-1.0.0

Introduction

This manual describes the configuration, recovery and upgrade of Mobile Control Point VMs.

Introduction to the Mobile Control Point product

This manual is a reference guide to configure and upgrade the Rhino nodes used in Metaswitch’s Mobile Control Point product. Follow procedures in this manual only when directed by the Microsoft Teams Phone Mobile and Metaswitch Products Integration Guide and/or by your support representative.

The Mobile Control Point product is deployed into an existing VoLTE network (with a third-party IMS core and VoLTE TAS) or alongside the Metaswitch VoLTE solution. In the latter case, the TSN and REM services are provided by existing components within the VoLTE solution.

If you are deploying into an existing network:
- Use this guide to configure MCP, TSN, and REM.
If you are deploying alongside the Metaswitch VoLTE solution:
- Use this guide to configure MCP.
- Obtain configuration files for TSN in the VoLTE Solution here.
- Configure the VoLTE TSN using the RVT VM Install Guide.

Upgrades

Terminology

The current version of the VMs being upgraded is known as the downlevel version, and the version that the VMs are being upgraded to is known as the uplevel version.

A rolling upgrade is a procedure where each VM is replaced, one at a time, with a new VM running the uplevel version of software. The Mobile Control Point nodes are designed to allow rolling upgrades with little or no service outage time.

Method

As with installation, upgrades and rollbacks use the SIMPL VM. The user starts the upgrade process by running csar update on the SIMPL VM. SIMPL VM destroys, in turn, each downlevel node and replaces it with an uplevel node. This is repeated until all nodes have been upgraded.

Configuration for the uplevel nodes is uploaded in advance. As nodes are recreated, they immediately pick up the uplevel configuration and resume service.

If an upgrade goes wrong, rollback to the previous version is also supported.

See the Rolling upgrades and patches page for detailed instructions on how to perform an upgrade.

CSAR EFIX patches

CSAR EFIX patches, also known as VM patches, are based on the SIMPL VM’s csar efix command. The command is used to combine a CSAR EFIX file (a tar file containing some metadata and files to update), and an existing unpacked CSAR on the SIMPL. This creates a new, patched CSAR on the SIMPL VM. It does not patch any VMs in-place, but instead patches the CSAR itself offline on the SIMPL VM. A normal rolling upgrade is then used to migrate to the patched version.

Once a CSAR has been patched, the newly created CSAR is entirely separate, with no linkage between them. Applying patch EFIX_1 to the original CSAR creates a new CSAR with the changes from patch EFIX_1.

In general:

Applying patch EFIX_2 to the original CSAR will yield a new CSAR without the changes from EFIX_1.

Applying EFIX_2 to the already patched CSAR will yield a new CSAR with the changes from both EFIX_1 and EFIX_2.

VM patches which target SLEE components (e.g. a service or feature change) contain the full deployment state of Rhino, including all SLEE components. As such, if applying multiple patches of this type, only the last such patch will take effect, because the last patch contains all the SLEE components. In other words, a patch to SLEE components should contain all the desired SLEE component changes, relative to the original release of the VM. For example, patch EFIX_1 contains a fix for the HTTP RA SLEE component X and patch EFIX_2 contains an fix for a SLEE Service component Y. When EFIX_2 is generated it will contain the component X and Y fixes for the VM.

However, it is possible to apply a specific patch with a generic CSAR EFIX patch that only contains files to update. For example, patch EFIX_1 contains a specific patch that contains a fix for the HTTP RA SLEE component, and patch EFIX_2 contains an update to the linkerd config file. We can apply patch EFIX_1 to the original CSAR, then patch EFIX_2 to the patched CSAR.

We can also apply EFIX_2 first then EFIX_1.

When a CSAR EFIX patch is applied, a new CSAR is created with the versions of the target CSAR and the CSAR EFIX version.

Configuration

The configuration model is "declarative". To change the configuration, you upload a complete set of files containing the entire configuration for all nodes, and the VMs will attempt to alter their configuration ("converge") to match. This allows for integration with GitOps (keeping configuration in a source control system), as well as ease of generating configuration via scripts.

Configuration is stored in a database called CDS, which is a set of tables in a Cassandra database. These tables contain version information, so that you can upload configuration in preparation for an upgrade without affecting the live system.

The TSN nodes provide the CDS database. The tables are created automatically when the TSN nodes start for the first time; no manual installation or configuration of Cassandra is required.

Configuration files are written in YAML format. Using the rvtconfig tool, their contents can be syntax-checked and verified for validity and self-consistency before uploading them to CDS.

See VM configuration for detailed information about writing configuration files and the (re)configuration process.

Recovery

When a VM malfunctions, recover it using commands run from the SIMPL VM.

Two approaches are available:

heal, for cases where the failing VM(s) are sufficiently responsive
redeploy, for cases where you cannot heal the failing VM(s)

In both cases, the failing VM(s) are destroyed, and then replaced with an equivalent VM.

See VM recovery for detailed information about which procedure to use, and the steps involved.

VM types

This page describes the different Mobile Control Point VM type(s) documented in this manual.

Node types

TSN

A TAS Storage Node (TSN) is a VM that runs two Cassandra databases and provides these databases' services to the other node types in a Rhino VoLTE TAS deployment. TSNs run in a cluster with between 3 and 30 nodes per cluster depending on deployment size; load-balancing is performed automatically.

MCP

A Mobile Control Point (MCP) node is a VM that runs the Rhino MCP application. It is an application server that streamlines integration between the mobile network and Microsoft Teams, delivering optimized voice quality and network reliability to Teams communications. The MCP node queries Microsoft Teams to determine whether a given call involves a subscriber, and if it does it updates the call signaling to instruct the core network to route the call to the Microsoft Phone System. The MCP node optionally includes Forced Routing functionality too, which allows the operator to route calls to other third party services by modifying the call signaling without involving Microsoft Teams. MCP nodes run in a 3 node cluster.

Flavors

Each node type has a set of specifications that defines RAM, storage, and CPU requirements for different deployment sizes, known as flavors. Refer to the pages of the individual node types for flavor specifications.

The term flavor is used in OpenStack terminology to define the virtual hardware sizing of a VM, but the term is used here in the context of any host platform. On OpenStack you must create a flavor with the specified properties before deploying the VMs; on VMware you reference the flavor as a configuration property.

The sizes given in this section are the same for all host platforms.

Node types

TSN
MCP

TSN

The TSN nodes can be installed using the following flavors. This option has to be selected in the SDF. The selected option determines the values for RAM, hard disk space and virtual CPU count.

New deployments must not use flavors marked as DEPRECATED. Existing deployments can upgrade to VMs with deprecated flavors if resizing the VMs at the time of upgrade is not feasible.

Deploying VMs with sizings outside of the defined flavors is not supported.

Spec Use case Resources

Spec	Use case	Resources
`tsnsmall`	Lab trials and small-size production environments	RAM: 16384MB Hard Disk: 30GB CPU: 4 vCPUs
`tsn`	DEPRECATED. Mid-size production environments	RAM: 16384MB Hard Disk: 30GB CPU: 8 vCPUs
`tsnlarge`	DEPRECATED. Large-size production environments	RAM: 24576MB Hard Disk: 30GB CPU: 8 vCPUs
`tsn-medium-v2`	Mid-size production environments	RAM: 16384MB Hard Disk: 100GB CPU: 10 vCPUs
`tsn-large-v2`	Large-size production environments	RAM: 24576MB Hard Disk: 100GB CPU: 12 vCPUs

tsnsmall

Lab trials and small-size production environments

RAM: 16384MB
Hard Disk: 30GB
CPU: 4 vCPUs

tsn

DEPRECATED. Mid-size production environments

RAM: 16384MB
Hard Disk: 30GB
CPU: 8 vCPUs

tsnlarge

DEPRECATED. Large-size production environments

RAM: 24576MB
Hard Disk: 30GB
CPU: 8 vCPUs

tsn-medium-v2

Mid-size production environments

RAM: 16384MB
Hard Disk: 100GB
CPU: 10 vCPUs

tsn-large-v2

Large-size production environments

RAM: 24576MB
Hard Disk: 100GB
CPU: 12 vCPUs

MCP

The MCP nodes can be installed using the following flavors. This option has to be selected in the SDF. The selected option determines the values for RAM, hard disk space and virtual CPU count.

New deployments must not use flavors marked as DEPRECATED. Existing deployments can upgrade to VMs with deprecated flavors if resizing the VMs at the time of upgrade is not feasible.

Deploying VMs with sizings outside of the defined flavors is not supported.

Spec Use case Resources

Spec	Use case	Resources
`small`	Lab and small-size production environments	RAM: 16384MB Hard Disk: 30GB CPU: 4 vCPUs
`medium`	Mid-sized production deployments	RAM: 16384MB Hard Disk: 30GB CPU: 8 vCPUs

small

Lab and small-size production environments

RAM: 16384MB
Hard Disk: 30GB
CPU: 4 vCPUs

medium

Mid-sized production deployments

RAM: 16384MB
Hard Disk: 30GB
CPU: 8 vCPUs

Open Listening Ports

Each node type opens a different set of listening ports. Please refer to the pages for the individual node types.

Node types

TSN
MCP

TSN

The TSN node opens the following listening ports. Please refer to the tables below to configure your firewall rules appropriately.

Static ports

This table describes listening ports that will normally always be open at the specified port number.

Purpose	Port Number	Transport Layer Protocol	Interface	Notes
Cassandra cqlsh	9042	TCP	global
Cassandra nodetool	7199	TCP	global
Nodetool for the ramdisk Cassandra	17199	TCP	global
Ramdisk Cassandra cqlsh	19042	TCP	global
Cassandra cluster communication	7000	TCP	internal
Cluster communication for the ramdisk Cassandra	17000	TCP	internal
NTP - local administration	323	UDP	localhost	ntpd listens on both the IPv4 and IPv6 localhost addresses
Receive and forward SNMP trap messages	162	UDP	localhost
SNMP Multiplexing protocol	199	TCP	localhost
Allow querying of system-level statistics using SNMP	161	UDP	management
NTP - time synchronisation with external server(s)	123	UDP	management	This port is only open to this node’s registered NTP server(s)
Port for serving version information to SIMPL VM over HTTP	3000	TCP	management
SSH connections	22	TCP	management
Stats collection for SIMon	9100	TCP	management

Purpose

Port Number

Transport Layer Protocol

Interface

Notes

Cassandra cqlsh

9042

TCP

global

Cassandra nodetool

7199

TCP

global

Nodetool for the ramdisk Cassandra

17199

TCP

global

Ramdisk Cassandra cqlsh

19042

TCP

global

Cassandra cluster communication

7000

TCP

internal

Cluster communication for the ramdisk Cassandra

17000

TCP

internal

NTP - local administration

323

UDP

localhost

ntpd listens on both the IPv4 and IPv6 localhost addresses

Receive and forward SNMP trap messages

162

UDP

localhost

SNMP Multiplexing protocol

199

TCP

localhost

Allow querying of system-level statistics using SNMP

161

UDP

management

NTP - time synchronisation with external server(s)

123

UDP

management

This port is only open to this node’s registered NTP server(s)

Port for serving version information to SIMPL VM over HTTP

3000

TCP

management

SSH connections

22

TCP

management

Stats collection for SIMon

9100

TCP

management

Port ranges

This table describes listening ports which may be open at any port number within a range. Unless otherwise specified, a single port in a range will be open.

These port numbers are often in the ephemeral port range of 32768 to 60999.

Purpose	Minimum Port Number	Maximum Port Number	Transport Layer Protocol	Interface	Notes
Outbound SNMP traps	32768	60999	udp	localhost

Purpose

Minimum Port Number

Maximum Port Number

Transport Layer Protocol

Interface

Notes

Outbound SNMP traps

32768

60999

udp

localhost

MCP

The MCP node opens the following listening ports. Please refer to the tables below to configure your firewall rules appropriately.

Unresolved directive in mcp-vm-configuration-guide/vm-types/open-listening-ports/listening-ports-custom.adoc - include::../../../mcp-vm-configuration-guide-resources/vm-documentation-common/autogen/listening-port-manifest-custom.adoc[]

Upgrades

The steps below describe how to upgrade the nodes that make up your deployment. Select the steps that are appropriate for your VM host: OpenStack or VMware vSphere.

The supported versions for the platforms are listed below:

Platform	Supported versions
OpenStack	Newton to Wallaby
VMware vSphere	6.7 and 7.0

Platform

Supported versions

OpenStack

Newton to Wallaby

VMware vSphere

6.7 and 7.0

Live migration of a node to a new VMware vSphere host or a new OpenStack compute node is not supported. To move such a node to a new host, remove it from the old host and add it again to the new host.

Notes on parallel vs sequential upgrade

Some node types support parallel upgrade, that is, SIMPL upgrades multiple VMs simultaneously. This can save a lot of time when you upgrade large deployments.

SIMPL VM upgrades one quarter of the nodes (rounding down any remaining fraction) simultaneously, up to a maximum of ten nodes. Once all those nodes have been upgraded, SIMPL VM upgrades the next set of nodes. For example, in a deployment of 26 nodes, SIMPL VM upgrades the first six nodes simultaneously, then six more, then six more, then six more and finally the last two.

The following node types support parallel upgrade: . All other node types are upgraded one VM at a time.

Preparing for an upgrade

Task	More information
Set up and/or verify your OpenStack or VMware vSphere deployment	The installation procedures assume that you are upgrading VMs on an existing OpenStack or VMware vSphere host(s). Ensure the host(s) have sufficient vCPU, RAM and disk space capacity for the VMs. Note that for upgrades, you will temporarily need approximately one more VM’s worth of vCPU and RAM, and potentially more than double the disk space, than your existing deployment currently uses. You can later clean up older images to save disk space once you are happy that the upgrade was successful. Perform health checks on your host(s), such as checking for active alarms, to ensure they are in a suitable state to perform VM lifecycle operations. Ensure the VM host credentials that you will use in your SDF are valid and have sufficient permission to create/destroy VMs, power them on and off, change their properties, and access a VM’s terminal via the console.
Prepare service configuration	VM configuration information can be found at VM Configuration.

Task

More information

Set up and/or verify your OpenStack or VMware vSphere deployment

The installation procedures assume that you are upgrading VMs on an existing OpenStack or VMware vSphere host(s).

Ensure the host(s) have sufficient vCPU, RAM and disk space capacity for the VMs. Note that for upgrades, you will temporarily need approximately one more VM’s worth of vCPU and RAM, and potentially more than double the disk space, than your existing deployment currently uses. You can later clean up older images to save disk space once you are happy that the upgrade was successful.

Perform health checks on your host(s), such as checking for active alarms, to ensure they are in a suitable state to perform VM lifecycle operations.

Ensure the VM host credentials that you will use in your SDF are valid and have sufficient permission to create/destroy VMs, power them on and off, change their properties, and access a VM’s terminal via the console.

Prepare service configuration

VM configuration information can be found at VM Configuration.

Upgrades

The following table sets out the steps you need to execute a rolling upgrade of an existing VM deployment.

Step	Task	Link
Rolling upgrade	Rolling upgrade of TSN nodes	Rolling upgrade of TSN nodes
	Rolling upgrade of MCP nodes	Rolling upgrade of MCP nodes
	Post-acceptance tasks	Post-acceptance tasks

Step

Task

Link

Rolling upgrade

Rolling upgrade of TSN nodes

Rolling upgrade of MCP nodes

Post-acceptance tasks

Rolling upgrades and patches

This section provides information on performing a rolling upgrade of the VMs. These instructions are intended for any upgrades from at least MCP 1.4 version. To upgrade from prior versions, first upgrade to MCP 1.4 following Major upgrade to 1.4 instructions.

Each of the links below contains standalone instructions for upgrading a particular node type. The normal procedure is to upgrade only one node type in any given maintenance window, though you can upgrade multiple node types if the maintenance window is long enough.

Most call traffic will function as normal when the nodes are running different versions of the software. However, do not leave a deployment in this state for an extended period of time:

Certain call types cannot function when the cluster is running mixed software versions.
Part of the upgrade procedure is to disable scheduled tasks for the duration of the upgrade. Without these tasks running, the performance and health of the system will degrade.

Always finish upgrading all nodes of one node type before starting on another node type.

To apply a patch, first use the csar efix command on the SIMPL VM. This command creates a copy of a specified CSAR but with the patch applied. You then upgrade to the patched CSAR using the procedure for a normal rolling upgrade. Detailed instructions for using csar efix can be found within the individual upgrade pages below.

Rolling upgrade of TSN nodes
Rolling upgrade of MCP nodes
Post-acceptance tasks

Rolling upgrade of TSN nodes

The page is self-sufficient, that is, if you save or print this page, you have all the required information and instructions for upgrading TSN nodes. However, before starting the procedure, make sure you are familiar with the operation of Mobile Control Point nodes, this procedure, and the use of the SIMPL VM.

There are links in various places below to other parts of this book, which provide more detail about certain aspects of solution setup and configuration.
You can find more information about SIMPL VM commands in the SIMPL VM Documentation.
You can find more information on rvtconfig commands on the rvtconfig page.

Planning for the procedure

This procedure assumes that:

You are familiar with UNIX operating system basics, such as the use of vi and command-line tools like scp.
You have deployed a SIMPL VM, version 6.15.3 or later. Output shown on this page is correct for version 6.15.3 of the SIMPL VM; it may differ slightly on later versions.

Check you are using a supported VNFI version:

Platform	Supported versions
OpenStack	Newton to Wallaby
VMware vSphere	6.7 and 7.0

Platform

Supported versions

OpenStack

Newton to Wallaby

VMware vSphere

6.7 and 7.0

Important notes

Do not use these instructions for target versions whose major version component differs from 1.5.

Determine parameter values

In the below steps, replace parameters marked with angle brackets (such as <deployment ID>) with values as follows. (Replace the angle brackets as well, so that they are not included in the final command to be run.)

<deployment ID>: The deployment ID. You can find this at the top of the SDF. On this page, the example deployment ID mydeployment is used.
<site ID>: A number for the site in the form DC1 through DC32. You can find this at the top of the SDF.
<site name>: The name of the site. You can find this at the top of the SDF.
<MW duration in hours>: The duration of the reserved maintenance period in hours.
<CDS address>: The management IP address of the first TSN node.
<SIMPL VM IP address>: The management IP address of the SIMPL VM.
<CDS auth args> (authentication arguments): If your CDS has Cassandra authentication enabled, replace this with the parameters -u <username> -k <secret ID> to specify the configured Cassandra username and the secret ID of a secret containing the password for that Cassandra user. For example, ./rvtconfig -c 1.2.3.4 -u cassandra-user -k cassandra-password-secret-id ….

If your CDS is not using Cassandra authentication, omit these arguments.
<service group name>: The name of the service group (also known as a VNFC - a collection of VMs of the same type), which for Mobile Control Point nodes will consist of all TSN VMs in the site. This can be found in the SDF by identifying the TSN VNFC and looking for its name field.
<uplevel version>: The version of the VMs you are upgrading to. On this page, the example version 4.2-10-1.0.0 is used.
<SSH key secret ID>: The secret store ID of the SSH key used to access the node. You can find this in the SDF, or by running csar secret status on the SIMPL VM.
<diags-bundle>`: The name of the diagnostics bundle directory. If this directory doesn’t already exist, it will be created.

Tools and access

You must have the SSH keys required to access the SIMPL VM and the TSN VMs that are to be upgraded.

The SIMPL VM must have the right permissions on the VNFI. Refer to the SIMPL VM documentation for more information:

When starting an SSH session to the SIMPL VM, use a keepalive of 30 seconds. This prevents the session from timing out - SIMPL VM automatically closes idle connections after a few minutes.

When using OpenSSH (the SSH client on most Linux distributions), this can be controlled with the option ServerAliveInterval - for example, ssh -i <SSH private key file for SIMPL VM> -o ServerAliveInterval=30 admin@<SIMPL VM IP address>.

rvtconfig is a command-line tool for configuring and managing Mobile Control Point VMs. All TSN CSARs include this tool; once the CSAR is unpacked, you can find rvtconfig in the resources directory, for example:

$ cdcsars
$ cd tsn/<uplevel version>
$ cd resources
$ ls rvtconfig
rvtconfig

The rest of this page assumes that you are running rvtconfig from the directory in which it resides, so that it can be invoked as ./rvtconfig. It assumes you use the uplevel version of rvtconfig, unless instructed otherwise. If it is explicitly specified you must use the downlevel version, you can find it here:

$ cdcsars
$ cd tsn/<downlevel version>
$ cd resources
$ ls rvtconfig
rvtconfig

1. Preparation for upgrade procedure

These steps can be carried out in advance of the upgrade maintenance window. They should take less than 30 minutes to complete.

1.1 Ensure the SIMPL version is at least 6.15.3

Log into the SIMPL VM and run the command simpl-version. The SIMPL VM version is displayed at the top of the output:

SIMPL VM, version 6.15.3

Ensure this is at least 6.15.3. If not, contact your Customer Care Representative to organise upgrading the SIMPL VM before proceeding with the upgrade of the TSN VMs.

Output shown on this page is correct for version 6.15.3 of the SIMPL VM; it may differ slightly on later versions.

1.2 Upload and unpack uplevel CSAR

Your Customer Care Representative will have provided you with the uplevel TSN CSAR. Use scp to copy this to /csar-volume/csar/ on the SIMPL VM.

Once the copy is complete, run csar unpack /csar-volume/csar/<filename> on the SIMPL VM (replacing <filename> with the filename of the CSAR, which will end with .zip).

The csar unpack command may fail if there is insufficient disk space available. If this occurs, SIMPL VM will report this with instructions to remove some CSARs to free up disk space. You can list all unpacked CSARs with csar list and remove a CSAR with csar remove <node type>/<version>.

1.3 Verify the downlevel CSAR is present

On the SIMPL VM, run csar list.

Ensure that there is a TSN CSAR listed there with the current downlevel version.

1.4 Apply patches (if appropriate)

If you are upgrading to an image that doesn’t require patching, or have already applied the patch, skip this step.

To patch a set of VMs, rather than modify the code directly on the VMs, the procedure is instead to patch the CSAR on SIMPL VM and then upgrade to the patched CSAR.

If you have a patch to apply, it will be provided to you in the form of a .tar.gz file. Use scp to transfer this file to /csar-volume/csar/ on the SIMPL VM. Apply it to the uplevel CSAR by running csar efix tsn/<uplevel version> <patch file>, for example, csar efix tsn/4.2-10-1.0.0/csar-volume/csar/mypatch.tar.gz. This takes about five minutes to complete.

Check the output of the patching process states that SIMPL VM successfully created a patch. Example output for a patch named mypatch on version 4.2-10-1.0.0 and a vSphere deployment is:

Applying efix to tsn/4.2-10-1.0.0
Patching tsn-4.2-10-1.0.0-vsphere-mypatch.ova,  this may take several minutes
Updating manifest
Successfully created tsn/4.2-10-1.0.0-mypatch

You can verify that a patched CSAR now exists by running csar list again - you should see a CSAR named tsn/<uplevel version>-<patch name> (for the above example that would be tsn/4.2-10-1.0.0-mypatch).

For all future steps on this page, wherever you type the <uplevel version>, be sure to include the suffix with the patch name, for example 4.2-10-1.0.0-mypatch.

If the csar efix command fails, be sure to delete any partially-created patched CSAR before retrying the patch process. Run csar list as above, and if you see the patched CSAR, delete it with csar remove <CSAR>.

1.5 Prepare downlevel config directory

If you keep the configuration hosted on the SIMPL VM, find it and rename it to /home/admin/current-config. Verify the contents by running ls /home/admin/current-config and checking that at least the SDF (sdf-rvt.yaml) is present there. If it isn’t, or you prefer to keep your configuration outside of the SIMPL VM, then create this directory on the SIMPL VM:

mkdir /home/admin/current-config

Use scp to upload the SDF (sdf-rvt.yaml) to this directory.

1.6 Prepare uplevel config directory including an SDF

On the SIMPL VM, run mkdir /home/admin/uplevel-config. This directory is for holding the uplevel configuration files.

Use scp (or cp if the files are already on the SIMPL VM, for example in /home/admin/current-config as detailed in the previous section) to copy the following files to this directory. Include configuration for the entire deployment, not just the TSN nodes.

The uplevel configuration files.
The current SDF for the deployment.

1.7 Update SDF

Open the /home/admin/uplevel-config/sdf-rvt.yaml file using vi. Find the vnfcs section, and within that the TSN VNFC. Within the VNFC, locate the version field and change its value to the uplevel version, for example 4.2-10-1.0.0. Save and close the file.

You can verify the change you made by using diff -u2 /home/admin/current-config/sdf-rvt.yaml /home/admin/uplevel-config/sdf-rvt.yaml. The diff should look like this (context lines and line numbers may vary), with only a change to the version for the relevant node type:

--- sdf-rvt.yaml        2022-10-31 14:14:49.282166672 +1300
+++ sdf-rvt.yaml        2022-11-04 13:58:42.054003577 +1300
@@ -211,5 +211,5 @@
           shcm-vnf: shcm
       type: tsn
-      version: {example-downlevel-version}
+      version: 4.2-10-1.0.0
       vim-configuration:
         vsphere:

1.8 Reserve maintenance period

The upgrade procedure requires a maintenance period. For upgrading nodes in a live network, implement measures to mitigate any unforeseen events.

Ensure you reserve enough time for the maintenance period, which must include the time for a potential rollback.

To calculate the time required for the actual upgrade or roll back of the VMs, run rvtconfig calculate-maintenance-window -i /home/admin/uplevel-config -t tsn --site-id <site ID>. The output will be similar to the following, stating how long it will take to do an upgrade or rollback of the TSN VMs.

Nodes will be upgraded sequentially

-----

Estimated time for a full upgrade of 3 VMs: 24 minutes
Estimated time for a full rollback of 3 VMs: 24 minutes

-----

These numbers are a conservative best-effort estimate. Various factors, including IMS load levels, VNFI hardware configuration, VNFI load levels, and network congestion can all contribute to longer upgrade times.

These numbers only cover the time spent actually running the upgrade on SIMPL VM. You must add sufficient overhead for setting up the maintenance window, checking alarms, running validation tests, and so on.

The time required for an upgrade or rollback can also be manually calculated.

For node types that are upgraded sequentially, like this node type, calculate the upgrade time by using the number of nodes. The first node takes 30 minutes, while later nodes take 30 minutes each.

You must also reserve time for:

The SIMPL VM to upload the image to the VNFI. Allow 2 minutes, unless the connectivity between SIMPL and the VNFI is particularly slow.
Any validation testing needed to determine whether the upgrade succeeded.

1.9 Carry out dry run

The csar update dry run command carries out more extensive validation of the SDF and VM states than rvtconfig validate does.

Carrying out this step now, before the upgrade is due to take place, ensures problems with the SDF files are identified early and can be rectified beforehand.

The --dry-run operation will not make any changes to your VMs, it is safe to run at any time, although we always recommend running it during a maintenance window if possible.

Please run the following command to execute the dry run.

csar update --sdf /home/admin/uplevel-config/sdf-rvt.yaml --vnf tsn --sites <site name> --service-group <service_group> --skip force-in-series-update-with-l3-permission --dry-run

Confirm the output does not flag any problems or errors. The end of the command output should look similar to this.

You are about to update VMs as follows:

- VNF tsn:
    - For site <site name>:
      - update all VMs in VNFC service group <service_group>/4.2-8-1.0.0:
        - tsn-1 (index 0)
        - tsn-2 (index 1)
        - tsn-3 (index 2)

Please confirm the set of nodes you are upgrading looks correct, and that the software version against the service group correctly indicates the software version you are planning to upgrade to.

If you see any errors, please address them, then re-run the dry run command until it indicates success.

2. Upgrade procedure

2.1 Run basic validation tests on downlevel nodes

Before starting the upgrade procedure, run VNF validation tests from the SIMPL VM against the downlevel nodes: csar validate --vnf tsn --sdf /home/admin/current-config/sdf-rvt.yaml

This command performs various checks on the health of the VMs' networking and services:

================================
Running validation test scripts
================================
Running validation tests in CSAR 'tsn/{example-downlevel-version}'
Test running for: mydeployment-tsn-1
Running script: check_ping_management_ip…
Running script: check_can_sudo…
Running script: check_converged…
Running script: check_liveness…
Detailed output can be found in /var/log/csar/ansible_output-2023-01-06-03-21-51.log

If all is well, then you should see the message All tests passed for CSAR 'tsn/{example-downlevel-version}'!.

If the VM validation fails, you can find details in the log file. The log file can be found in /var/log/csar/ansible_output-<timestamp>.log. The msg field under each ansible task explains why the script failed.

If there are failures, the upgrade cannot take place. Investigate them with the help of your Customer Care Representative and the Troubleshooting pages.

Once the VNF validation tests pass, you can proceed with the next step.

2.2 Disable scheduled tasks

Only perform this step if this is the first, or only, node type being upgraded.

Run ./rvtconfig enter-maintenance-window -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID> --hours <MW duration in hours>. The output will look similar to:

Maintenance window is now active until 04 Nov 2022 21:38:06 NZDT.
Use the leave-maintenance-window command once maintenance is complete.

This will prevent scheduled tasks running on the VMs until the time given in the output.

If at any point in the upgrade process you wish to confirm the end time of the maintenance window, you can run ./rvtconfig maintenance-window-status -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID>.

2.3 Verify uplevel config has no unexpected or prohibited changes

Run rm -rf /home/admin/config-output on the SIMPL VM to remove that directory if it already exists. Then use the command ./rvtconfig compare-config -c <CDS address> <CDS auth args> -d <deployment ID> --input /home/admin/uplevel-config --vm-version <downlevel version> --output-dir /home/admin/config-output -t tsn to compare the live configuration to the configuration in the /home/admin/uplevel-config directory.

Example output is listed below:

Validating node type against the schema: tsn
Redacting secrets…
Comparing live config for (version=4.2-8-1.0.0, deployment=mydeployment, group=RVT-tsn.DC1) with local directory (version=4.2-10-1.0.0, deployment=mydeployment, group=RVT-tsn.DC1)
Getting per-level configuration for version '4.2-8-1.0.0', deployment 'mydeployment', and group 'RVT-tsn.DC1'
  - Found config with hash 7f6cc1f3df35b43d6286f19c252311e09216e6115f314d0cb9cc3f3a24814395

Wrote currently uploaded configuration to /tmp/tmprh2uavbh
Redacting secrets…
Found
  - 1 difference in file sdf-rvt.yaml

Differences have been written to /home/admin/config-output
Error: Line 110 exited with status 3

You can then view the differences using commands such as cat /home/admin/config-output/sdf-rvt.yaml.diff (there will be one .diff file for every file that has differences). Aside from the version parameter in the SDF, there should normally be no other changes. If there are other unexpected changes, pause the procedure here and correct the configuration by editing the files in /home/admin/uplevel-config.

When performing a rolling upgrade, some elements of the uplevel configuration must remain identical to those in the downlevel configuration. The affected elements of the TSN configuration are described in the following list:

The secrets-private-key-id in the SDF must not be altered.
The ordering of the VM instances in the SDF must not be altered.
The IP addresses and other networking information in the SDF must not be altered.

The rvtconfig compare-config command reports any unsupported changes as errors, and may also emit warnings about other changes. For example:

Found
  - 1 difference in file sdf-rvt.yaml

The configuration changes have the following ERRORS.
File sdf-rvt.yaml:
  - Changing the IP addresses, subnets or traffic type assignments of live VMs is not supported. Restore the networks section of the tsn VNFC in the SDF to its original value before uploading configuration.

Ensure you address the reported errors, if any, before proceeding. rvtconfig will not upload a set of configuration files that contains unsupported changes.

2.4 Verify the TSN clusters are healthy

First, establish an SSH session to the management IP of the first TSN node. To check that the primary Cassandra cluster is healthy, run nodetool status on the TSN node:

Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns    Host ID                               Rack
UN  1.2.3.4        678.58 KiB  256          ?       f81bc71d-4ba3-4400-bed5-77f317105cce  rack1
UN  1.2.3.5        935.66 KiB  256          ?       aa134a07-ef93-4e09-8631-0e438a341e57  rack1
UN  1.2.3.6        958.34 KiB  256          ?       8ce540ea-8b52-433f-9464-1581d32a99bc  rack1

Check that all TSN nodes are present and listed as UN (Up and Normal). The output in the Owns colomn may differ and is irrelevant.

Next, check that the ramdisk-based Cassandra cluster is healthy. Run nodetool status -p 17199 on the TSN node. Again, check that all TSN nodes are present and listed as UN.

If either the primary or ramdisk-based Cassandra cluster is not healthy (i.e. not all TSN nodes show up as UN in the output from nodetool status and nodetool status -p 17199), stop the upgrade process here and troubleshoot the node. Only continue after both the Cassandra clusters are healthy.

2.5 Validate configuration

Run the command ./rvtconfig validate -t tsn -i /home/admin/uplevel-config to check that the configuration files are correctly formatted, contain valid values, and are self-consistent. A successful validation with no errors or warnings produces the following output.

Validating node type against the schema: tsn
YAML for node type(s) ['tsn'] validates against the schema

If the output contains validation errors, fix the configuration in the /home/admin/uplevel-config directory

If the output contains validation warnings, consider whether you wish to address them before performing the upgrade. The VMs will accept configuration that has validation warnings, but certain functions may not work.

2.6 Upload configuration

Upload the configuration to CDS:

./rvtconfig upload-config -c <CDS address> <CDS auth args> -t tsn -i /home/admin/uplevel-config --vm-version <uplevel version>

Check that the output confirms that configuration exists in CDS for both the current (downlevel) version and the uplevel version:

Validating node type against the schema: tsn
Preparing configuration for node type tsn…
Checking differences between uploaded configuration and provided files
Getting per-level configuration for version '4.2-10-1.0.0', deployment 'mydeployment-tsn', and group 'RVT-tsn.DC1'
  - No configuration found
No uploaded configuration was found: this appears to be a new install or upgrade
Encrypting secrets…
Wrote config for version '4.2-10-1.0.0', deployment ID 'mydeployment', and group ID 'RVT-tsn.DC1'
Versions in group RVT-tsn.DC1
=============================
  - Version: {example-downlevel-version}
    Config hash: 7f6cc1f3df35b43d6286f19c252311e09216e6115f314d0cb9cc3f3a24814395
    Active: mydeployment-tsn-1, mydeployment-tsn-2, mydeployment-tsn-3
    Leader seed: {downlevel-leader-seed}

  - Version: 4.2-10-1.0.0
    Config hash: f790cc96688452fdf871d4f743b927ce8c30a70e3ccb9e63773fc05c97c1d6ea
    Active: None
    Leader seed:

2.7 Collect diagnostics

We recommend gathering diagnostic archives for all TSN VMs in the deployment.

On the SIMPL VM, run the command

If <diags-bundle> does not exist, the command will create the directory for you.

Each diagnostic archive can be up to 200 MB per VM. Ensure you have enough disk space on the SIMPL VM to collect all diagnostics. The command will be aborted if the SIMPL VM does not have enough disk space to collect all diagnostic archives from all the VMs in your deployment specified in the provided SDF.

2.8 Pause Initconf in non-TSN nodes

Set the running state of initconf processes in non-TSN VMs to a paused state.

./rvtconfig set-desired-running-state --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --state Stopped.

You should see an output similar to this, indicating that the initconf process of non-TSN nodes are in state Stopped.

Connected to MDM at 10.0.0.192
Put desired state = Stopped for Instance mydeployment-mag-1
Put desired state = Stopped for Instance mydeployment-shcm-1
Put desired state = Stopped for Instance mydeployment-mmt-gsm-1
Put desired state = Stopped for Instance mydeployment-smo-1
Getting desired state for each instance.
Final desired state for instances: {
    "mydeployment-mag-1": "Stopped",
    "mydeployment-shcm-1": "Stopped",
    "mydeployment-mmt-gsm-1": "Stopped",
    "mydeployment-smo-1": "Stopped"
}

This desired running state does not mean the VMs, Rhino, SGC, etc., are started or stopped. This desired running state indicates the status of the initconf process.

When in Stopped state, the initconf will pause any configuration activities.
When in Started state, the initconf will resume any configuration activities.

2.9 Take a CDS backup

Take a backup of the CDS database by issuing the command below.

./rvtconfig backup-cds --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --output-dir <backup-cds-bundle> --ssh-key-secret-id <SSH key secret ID> -c <CDS address> <CDS auth args>

The output should look like this:

Capturing cds_keyspace_schema
Capturing ramdisk_keyspace_schema
cleaning snapshot metaswitch_tas_deployment_snapshot
...
...
...
running nodetool snapshot command
Requested creating snapshot(s) for [metaswitch_tas_deployment_info] with snapshot name [metaswitch_tas_deployment_snapshot] and options {skipFlush=false}
...
...
...

Final CDS backup archive has been created at <backup-cds-bundle>/tsn_cassandra_backup_20230711095409.tar

If the command ended successfully, you can continue with the procedure. If it failed, do not continue the procedure without a CDS backup and contact your Customer Care Representative to investigate the issue.

2.10 Begin the upgrade

Carry out a csar import of the tsn VMs

Prepare for the upgrade by running the following command on the SIMPL VM csar import --vnf tsn --sdf /home/admin/uplevel-config/sdf-rvt.yaml to import terraform templates.

First, SIMPL VM connects to your VNFI to check the credentials specified in the SDF and QSG are correct. If this is successful, it displays the message All validation checks passed..

Type no. The csar import will be aborted.
Investigate why there are unexpected changes in the SDF.
Correct the SDF as necessary.
Retry this step.

Otherwise, accept the prompt by typing yes.

After you do this, SIMPL VM will import the terraform state. If successful, it outputs this message:

Done. Imported all VNFs.

If the output does not look like this, investigate and resolve the underlying cause, then re-run the import command again until it shows the expected output.

Begin the upgrade of the tsn VMs

First, SIMPL VM connects to your VNFI to check the credentials specified in the SDF and QSG are correct. If this is successful, it displays the message All validation checks passed..

Next, SIMPL VM compares the specified SDF with the SDF used for the csar import command above. Since the contents have not changed since you ran the csar import, the output should indicate that the SDF has not changed.

If there are differences in the SDF, a message similar to this will be output:

Comparing current SDF with previously used SDF.
site site1:
    tsn:
        tsn-1:
             networks:
             - ip-addresses:
                 ip:
            -    - 10.244.21.106
            +    - 10.244.21.196
                 - 10.244.21.107
               name: Management
               subnet: mgmt-subnet
Do you want to continue? [yes/no]: yes

If you see this, you must:

Type no. The upgrade will be aborted.
Go back to the start of the upgrade section and run through the csar import section again, until the SDF differences are resolved.
Retry this step.

Afterwards, the SIMPL VM displays the VMs that will be upgraded:

You are about to update VMs as follows:

- VNF tsn:
    - For site site1:
    - update all VMs in VNFC service group mydeployment-tsn/4.2-10-1.0.0:
        - mydeployment-tsn-1 (index 0)
        - mydeployment-tsn-2 (index 1)
        - mydeployment-tsn-3 (index 2)

Type 'yes' to continue, or run 'csar update --help' for more information.

Continue? [yes/no]:

Check this output displays the version you expect (the uplevel version) and exactly the set of VMs that you expect to be upgraded. If anything looks incorrect, type no to abort the upgrade process, and recheck the VMs listed and the version field in /home/admin/uplevel-config/sdf-rvt.yaml. Also check you are passing the correct SDF path and --vnf argument to the csar update command.

Otherwise, accept the prompt by typing yes.

Next, each VM in your cluster will perform health checks. If successful, the output will look similar to this.

Running ansible scripts in '/home/admin/.local/share/csar/tsn/4.1-1-1.0.0/update_healthcheck_scripts' for node 'mydeployment-tsn-1'
Running script: check_config_uploaded…
Running script: check_ping_management_ip…
Running script: check_maintenance_window…
Running script: check_can_sudo…
Running script: check_converged…
Running script: check_liveness…
Running script: check_rhino_alarms…
Detailed output can be found in /var/log/csar/ansible_output-2023-01-05-02-05-51.log
All ansible update healthchecks have passed successfully

If a script fails, you can find details in the log file. The log file can be found in /var/log/csar/ansible_output-<timestamp>.log.

Running ansible scripts in '/home/admin/.local/share/csar/tsn/4.1-1-1.0.0/update_healthcheck_scripts' for node 'mydeployment-tsn-1'
Running script: check_config_uploaded...
Running script: check_ping_management_ip...
Running script: check_maintenance_window...
Running script: check_can_sudo...
Running script: check_converged...
Running script: check_liveness...
ERROR: Script failed. Specific error lines from the ansible output will be logged to screen. For more details see the ansible_output file (/var/log/csar/ansible_output-2023-01-05-21-02-17.log). This file has only ansible output, unlike the main command log file.

fatal: [mydeployment-tsn-1]: FAILED! => {"ansible_facts": {"liveness_report": {"cassandra": true, "cassandra_ramdisk": true, "cassandra_repair_timer": true, "cdsreport": true, "cleanup_sbbs_activities": false, "config_hash_report": true, "docker": true, "initconf": true, "linkerd": true, "mdm_state_and_status_ok": true, "mdmreport": true, "nginx": true, "no_ocss7_alarms": true, "ocss7": true, "postgres": true, "rem": true, "restart_rhino": true, "rhino": true}}, "attempts": 1, "changed": false, "msg": "The following liveness checks failed: ['cleanup_sbbs_activities']", "supports_liveness_checks": true}
Running script: check_rhino_alarms...
Detailed output can be found in /var/log/csar/ansible_output-2023-01-05-21-02-17.log
***Some tests failed for CSAR 'tsn/4.1-1-1.0.0' - see output above***

The msg field under each ansible task explains why the script failed.

If there are failures, investigate them with the help of your Customer Care Representative and the Troubleshooting pages.

Retry this step once all failures have been corrected by running the command csar update … as described at the begining of this section.

Once the pre-upgrade health checks have been verified, SIMPL VM now proceeds to upgrade each of the VMs. Monitor the further output of csar update as the upgrade progresses, as described in the next step.

2.11 Monitor `csar update` output

For each VM:

The VM will be quiesced and destroyed.
SIMPL VM will create a replacement VM using the uplevel version.
The VM will automatically start applying configuration from the files you uploaded to CDS in the above steps.
Once configuration is complete, the VM will be ready for service. At this point, the csar update command will move on to the next TSN VM.

The output of the csar update command will look something like the following, repeated for each VM.

Decommissioning 'dc1-mydeployment-tsn-1' in MDM, passing desired version 'vm.version=4.2-10-1.0.0', with a 900 second timeout
dc1-mydeployment-tsn-1: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'decommissioned'
dc1-mydeployment-tsn-1: Current status 'in_progress'- desired status 'complete'
…
dc1-mydeployment-tsn-1: Current status 'complete', current state 'decommissioned' - desired status 'complete', desired state 'decommissioned'
Running update for VM group [0]
Performing health checks for service group mydeployment-tsn with a 1200 second timeout
Running MDM status health-check for dc1-mydeployment-tsn-1
dc1-mydeployment-tsn-1: Current status 'in_progress'- desired status 'complete'
…
dc1-mydeployment-tsn-1: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'

If you see this error:

Failed to retrieve instance summary for 'dc1-<VM hostname>' from MDM
(404)
Reason: Not Found

it can be safely ignored, provided that you do eventually see a Current status 'in_progress'… line. This error is caused by the newly-created VM taking a few seconds to register itself with MDM when it boots up.

Once all VMs have been upgraded, you should see this success message, detailing all the VMs that were upgraded and the version they are now running, which should be the uplevel version.

Successful VNF with full per-VNFC upgrade state:

VNF: tsn
VNFC: mydeployment-tsn
    - Node name: mydeployment-tsn-1
      - Version: 4.2-10-1.0.0
      - Build Date: 2022-11-21T22:58:24+00:00
    - Node name: mydeployment-tsn-2
      - Version: 4.2-10-1.0.0
      - Build Date: 2022-11-21T22:58:24+00:00
    - Node name: mydeployment-tsn-3
     - Version: 4.2-10-1.0.0
      - Build Date: 2022-11-21T22:58:24+00:00

If the upgrade fails, you will see Failed VNF instead of Successful VNF in the above output. There will also be more details of what went wrong printed before that. Refer to the Backout procedure below.

2.12 Run basic validation tests

Run csar validate --vnf tsn --sdf /home/admin/uplevel-config/sdf-rvt.yaml to perform some basic validation tests against the uplevel nodes.

This command first performs a check that the nodes are connected to MDM and reporting that they have successfully applied the uplevel configuration:

========================
Performing healthchecks
========================
Commencing healthcheck of VNF 'tsn'
Performing health checks for service group mydeployment-tsn with a 0 second timeout
Running MDM status health-check for dc1-mydeployment-tsn-1
dc1-mydeployment-tsn-1: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
Running MDM status health-check for dc1-mydeployment-tsn-2
dc1-mydeployment-tsn-2: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
Running MDM status health-check for dc1-mydeployment-tsn-3
dc1-mydeployment-tsn-3: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'

After that, it performs various checks on the health of the VMs' networking and services:

================================
Running validation test scripts
================================
Running validation tests in CSAR 'tsn/4.2-10-1.0.0'
Test running for: mydeployment-tsn-1
Running script: check_ping_management_ip…
Running script: check_can_sudo…
Running script: check_converged…
Running script: check_liveness…
Detailed output can be found in /var/log/csar/ansible_output-2023-01-06-03-21-51.log

If all is well, then you should see the message All tests passed for CSAR 'tsn/<uplevel version>'!.

If the VM validation fails, you can find details in the log file. The log file can be found in /var/log/csar/ansible_output-<timestamp>.log.

Running validation test scripts
================================
Running validation tests in CSAR 'tsn/4.2-10-1.0.0'
Test running for: mydeployment-tsn-1
Running script: check_ping_management_ip...
Running script: check_can_sudo...
Running script: check_converged...
Running script: check_liveness...
ERROR: Script failed. Specific error lines from the ansible output will be logged to screen. For more details see the ansible_output file (/var/log/csar/ansible_output-2023-01-06-03-40-37.log). This file has only ansible output, unlike the main command log file.

fatal: [mydeployment-tsn-1]: FAILED! => {"ansible_facts": {"liveness_report": {"cassandra": true, "cassandra_ramdisk": true, "cassandra_repair_timer": true, "cdsreport": true, "cleanup_sbbs_activities": false, "config_hash_report": true, "docker": true, "initconf": true, "linkerd": true, "mdm_state_and_status_ok": true, "mdmreport": true, "nginx": true, "no_ocss7_alarms": true, "ocss7": true, "postgres": true, "rem": true, "restart_rhino": true, "rhino": true}}, "attempts": 1, "changed": false, "msg": "The following liveness checks failed: ['cleanup_sbbs_activities']", "supports_liveness_checks": true}
Running script: check_rhino_alarms...
Detailed output can be found in /var/log/csar/ansible_output-2023-01-06-03-40-37.log
***Some tests failed for CSAR 'tsn/4.2-10-1.0.0' - see output above***

----------------------------------------------------------


WARNING: Validation script tests failed for the following CSARs:
  - 'tsn/4.2-10-1.0.0'
See output above for full details

The msg field under each ansible task explains why the script failed.

If there are failures, investigate them with the help of your Customer Care Representative and the Troubleshooting pages.

3. Post-upgrade procedure

3.1 Check Cassandra version and status

Verify the status of the cassandra clusters. First, check that the primary Cassandra cluster is healthy and in the correct version. Run ./rvtconfig cassandra-status --ssh-key-secret-id <SSH key secret ID> --ip-addresses <CDS Address> for every TSN node.

Next, check that the ramdisk-based Cassandra cluster is healthy and in the correct version. Run ./rvtconfig cassandra-status --ssh-key-secret-id <SSH key secret ID> --ip-addresses <CDS Address> --ramdisk for every TSN node.

For both Cassandra clusters, check the output and verify the running cassandra version is {cassandra-version}

=====> Checking cluster status on node 1.2.3.4
Setting up a connection to 172.0.0.224
Connected (version 2.0, client OpenSSH_7.4)
Auth banner: b'WARNING: Access to this system is for authorized users only.\n'
Authentication (publickey) successful!
ReleaseVersion: {cassandra-version}
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Tokens  Owns (effective)  Host ID                               Rack
UN  1.2.3.4  1.59 MiB   256          100.0%            3381adf4-8277-4ade-90c7-eb27c9816258  rack1
UN  1.2.3.5  1.56 MiB   256          100.0%            3bb6f68f-0140-451f-90a9-f5881c3fc71e  rack1
UN  1.2.3.6  1.54 MiB   256          100.0%            dbafa670-a2d0-46a7-8ed8-9a5774212e4c  rack1

Cluster Information:
    Name: mydeployment-tsn
    Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
    DynamicEndPointSnitch: enabled
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        1c15f3b1-3374-3597-bc45-a473179eab28: [1.2.3.4, 1.2.3.5, 1.2.3.6]
Stats for all nodes:
    Live: 3
    Joining: 0
    Moving: 0
    Leaving: 0
    Unreachable: 0

Data Centers:
    dc1 #Nodes: 3 #Down: 0

Database versions:
    {cassandra-version}: [1.2.3.4:7000, 1.2.3.5:7000, 1.2.3.6:7000]

Keyspaces:
...

3.2 Resume Initconf in non-TSN nodes

Run ./rvtconfig set-desired-running-state --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --state Started.

You should see an output similar to this, indicating that the non-TSN nodes are un the desired running state Started.

Connected to MDM at 10.0.0.192
Put desired state = Started for Instance mydeployment-mag-1
Put desired state = Started for Instance mydeployment-shcm-1
Put desired state = Started for Instance mydeployment-mmt-gsm-1
Put desired state = Started for Instance mydeployment-smo-1
Getting desired state for each instance.
Final desired state for instances: {
    "mydeployment-mag-1": "Started",
    "mydeployment-shcm-1": "Started",
    "mydeployment-mmt-gsm-1": "Started",
    "mydeployment-smo-1": "Started"
}

This desired running state does not mean the VMs, Rhino, SGC, etc., are started or stopped. This desired running state indicates the status of the initconf process.

When in Stopped state, the initconf will pause any configuration activities.
When in Started state, the initconf will resume any configuration activities.

3.3 Enable scheduled tasks

Run ./rvtconfig leave-maintenance-window -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID>. This will allow scheduled tasks to run on the VMs again. The output should look like this:

Maintenance window has been terminated.
The VMs will resume running scheduled tasks as per their configured schedules.

3.4 Run verification tests

If you have prepared verification tests for the deployment, run these now.

4. Post-acceptance

The upgrade of the TSN nodes is now complete.

5. Backout Method of Procedure

First, gather the log history of the downlevel VMs. Run mkdir -p /home/admin/rvt-log-history and ./rvtconfig export-log-history -c <CDS address> <CDS auth args> -d <deployment ID> --zip-destination-dir /home/admin/rvt-log-history --secrets-private-key-id <secret ID>. The secret ID you specify for --secrets-private-key-id should be the secret ID for the secrets private key (the one used to encrypt sensitive fields in CDS). You can find this in the product-options section of each VNFC in the SDF.

Make sure the <CDS address> used is one of the remaining available TSN nodes.

Next, how much of the backout procedure to run depends on how much progress was made with the upgrade. If you did not get to the point of running csar update, start from the Cleanup after backout section below.

If you encounter further failures during recovery or rollback, contact your Customer Care Representative to investigate and recover the deployment.

5.1 Collect diagnostics

We recommend gathering diagnostic archives for all TSN VMs in the deployment.

On the SIMPL VM, run the command

If <diags-bundle> does not exist, the command will create the directory for you.

Each diagnostic archive can be up to 200 MB per VM. Ensure you have enough disk space on the SIMPL VM to collect all diagnostics. The command will be aborted if the SIMPL VM does not have enough disk space to collect all diagnostic archives from all the VMs in your deployment specified in the provided SDF.

5.2 Disable scheduled tasks

Only perform this step if this is the first, or only, node type being rolled back. You can also skip this step if the rollback is occurring immediately after a failed upgrade, such that the existing maintenance window is sufficient. You can check the remaining maintenance window time with ./rvtconfig maintenance-window-status -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID>.

To start a new maintenance window (or extend an existing one), run ./rvtconfig enter-maintenance-window -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID> --hours <MW duration in hours>. The output will look similar to:

Maintenance window is now active until 04 Nov 2022 21:38:06 NZDT.
Use the leave-maintenance-window command once maintenance is complete.

This will prevent scheduled tasks running on the VMs until the time given in the output.

If at any point in the rollback process you wish to confirm the end time of the maintenance window, you can run the above rvtconfig maintenance-window-status command.

5.3 Pause Initconf in non-TSN nodes

Set the running state of initconf processes in non-TSN VMs to a paused state.

./rvtconfig set-desired-running-state --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --state Stopped.

You should see an output similar to this, indicating that the initconf process of non-TSN nodes are in state Stopped.

Connected to MDM at 10.0.0.192
Put desired state = Stopped for Instance mydeployment-mag-1
Put desired state = Stopped for Instance mydeployment-shcm-1
Put desired state = Stopped for Instance mydeployment-mmt-gsm-1
Put desired state = Stopped for Instance mydeployment-smo-1
Getting desired state for each instance.
Final desired state for instances: {
    "mydeployment-mag-1": "Stopped",
    "mydeployment-shcm-1": "Stopped",
    "mydeployment-mmt-gsm-1": "Stopped",
    "mydeployment-smo-1": "Stopped"
}

This desired running state does not mean the VMs, Rhino, SGC, etc., are started or stopped. This desired running state indicates the status of the initconf process.

When in Stopped state, the initconf will pause any configuration activities.
When in Started state, the initconf will resume any configuration activities.

5.4 Take a CDS backup

Take a backup of the CDS database by issuing the command below.

./rvtconfig backup-cds --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --output-dir <backup-cds-bundle> --ssh-key-secret-id <SSH key secret ID> -c <CDS address> <CDS auth args>

The output should look like this:

Capturing cds_keyspace_schema
Capturing ramdisk_keyspace_schema
cleaning snapshot metaswitch_tas_deployment_snapshot
...
...
...
running nodetool snapshot command
Requested creating snapshot(s) for [metaswitch_tas_deployment_info] with snapshot name [metaswitch_tas_deployment_snapshot] and options {skipFlush=false}
...
...
...

Final CDS backup archive has been created at <backup-cds-bundle>/tsn_cassandra_backup_20230711095409.tar

If the command ended successfully, you can continue with the procedure. If it failed, do not continue the procedure without a CDS backup and contact your Customer Care Representative to investigate the issue.

5.5 Roll back VMs

To roll back the VMs, the procedure is essentially to perform an "upgrade" back to the downlevel version, that is, with <downlevel version> and <uplevel version> swapped. You can refer to the Begin the upgrade section above for details on the prompts and output of csar update.

Once the csar update command completes successfully, proceed with the next steps below.

The <index range> argument is a comma-separated list of VM indices, where the first VM has index 0. Only include the VMs you want to roll back. For example, suppose there are three TSN VMs named tsn-1, tsn-2 and tsn-3. If VMs tsn-1 and tsn-3 need to be rolled back, the index range is 0,2. Do not include any spaces in the index range.

Contiguous ranges can be expressed with a hyphen (-). For example, 1,2,3,4 can be abbreviated to 1-4.

If you want to roll back just one node, use --index-range 0 (or whichever index).

If you want to roll back all nodes, omit the --index-range argument completely.

The --index-range argument requires that a single site, service group and VNF are specified with --sites, --service-group and --vnf arguments.

If csar update fails, check the output for which VMs failed. For each VM that failed, run csar redeploy --vm <failed VM name> --sdf /home/admin/current-config/sdf-rvt.yaml.

If csar redeploy fails, contact your Customer Care Representative to start the recovery procedures.

If all the csar redeploy commands were successful, then run the previously used csar update command on the VMs that were neither rolled back nor redeployed yet.

To help you determine which VMs were neither rolled back nor redeployed yet,

5.6 Delete uplevel CDS data

Run ./rvtconfig delete-node-type-version -c <CDS address> <CDS auth args> -t tsn --vm-version <uplevel version> -d <deployment ID> --site-id <site ID> --ssh-key-secret-id <SSH key secret ID> to remove data for the uplevel version from CDS.

Example output from the command:

The following versions will be deleted: 4.2-10-1.0.0
The following versions will be retained: {example-downlevel-version}
Do you wish to continue? Y/[N] Y

Check the versions are the correct way around, and then confirm this prompt to delete the uplevel data from CDS.

5.7 Cleanup after backout

If desired, remove the uplevel CSAR. On the SIMPL VM, run csar remove tsn/<uplevel version>.
If desired, remove the uplevel config directories on the SIMPL VM with rm -rf /home/admin/uplevel-config. We recommend these files are kept in case the upgrade is attempted again at a later time.

5.8 Resume Initconf in non-TSN nodes

Run ./rvtconfig set-desired-running-state --sdf /home/admin/uplevel-config/sdf-rvt.yaml --site-id <site ID> --state Started.

You should see an output similar to this, indicating that the non-TSN nodes are un the desired running state Started.

Connected to MDM at 10.0.0.192
Put desired state = Started for Instance mydeployment-mag-1
Put desired state = Started for Instance mydeployment-shcm-1
Put desired state = Started for Instance mydeployment-mmt-gsm-1
Put desired state = Started for Instance mydeployment-smo-1
Getting desired state for each instance.
Final desired state for instances: {
    "mydeployment-mag-1": "Started",
    "mydeployment-shcm-1": "Started",
    "mydeployment-mmt-gsm-1": "Started",
    "mydeployment-smo-1": "Started"
}

This desired running state does not mean the VMs, Rhino, SGC, etc., are started or stopped. This desired running state indicates the status of the initconf process.

When in Stopped state, the initconf will pause any configuration activities.
When in Started state, the initconf will resume any configuration activities.

5.9 Enable scheduled tasks

Run ./rvtconfig leave-maintenance-window -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID>. This will allow scheduled tasks to run on the VMs again. The output should look like this:

Maintenance window has been terminated.
The VMs will resume running scheduled tasks as per their configured schedules.

5.10 Verify service is restored

Perform verification tests to ensure the deployment is functioning as expected.

If applicable, contact your Customer Care Representative to investigate the cause of the upgrade failure.

Before re-attempting the upgrade, ensure you have run the rvtconfig delete-node-type-version command, Attempting an upgrade while there is stale uplevel data in CDS can result in needing to completely redeploy one or more VMs.

You will also need to re-upload the uplevel configuration.

Rolling upgrade of MCP nodes

The page is self-sufficient, that is, if you save or print this page, you have all the required information and instructions for upgrading MCP nodes. However, before starting the procedure, make sure you are familiar with the operation of Mobile Control Point nodes, this procedure, and the use of the SIMPL VM.

There are links in various places below to other parts of this book, which provide more detail about certain aspects of solution setup and configuration.
You can find more information about SIMPL VM commands in the SIMPL VM Documentation.
You can find more information on rvtconfig commands on the rvtconfig page.

Planning for the procedure

This procedure assumes that:

You are familiar with UNIX operating system basics, such as the use of vi and command-line tools like scp.
You have deployed a SIMPL VM, version 6.15.3 or later. Output shown on this page is correct for version 6.15.3 of the SIMPL VM; it may differ slightly on later versions.

Check you are using a supported VNFI version:

Platform	Supported versions
OpenStack	Newton to Wallaby
VMware vSphere	6.7 and 7.0

Platform

Supported versions

OpenStack

Newton to Wallaby

VMware vSphere

6.7 and 7.0

Important notes

Do not use these instructions for target versions whose major version component differs from 1.5.

Determine parameter values

In the below steps, replace parameters marked with angle brackets (such as <deployment ID>) with values as follows. (Replace the angle brackets as well, so that they are not included in the final command to be run.)

<deployment ID>: The deployment ID. You can find this at the top of the SDF. On this page, the example deployment ID mydeployment is used.
<site ID>: A number for the site in the form DC1 through DC32. You can find this at the top of the SDF.
<site name>: The name of the site. You can find this at the top of the SDF.
<MW duration in hours>: The duration of the reserved maintenance period in hours.
<CDS address>: The management IP address of the first TSN node.
<SIMPL VM IP address>: The management IP address of the SIMPL VM.
<CDS auth args> (authentication arguments): If your CDS has Cassandra authentication enabled, replace this with the parameters -u <username> -k <secret ID> to specify the configured Cassandra username and the secret ID of a secret containing the password for that Cassandra user. For example, ./rvtconfig -c 1.2.3.4 -u cassandra-user -k cassandra-password-secret-id ….

If your CDS is not using Cassandra authentication, omit these arguments.
<service group name>: The name of the service group (also known as a VNFC - a collection of VMs of the same type), which for Mobile Control Point nodes will consist of all MCP VMs in the site. This can be found in the SDF by identifying the MCP VNFC and looking for its name field.
<uplevel version>: The version of the VMs you are upgrading to. On this page, the example version 4.2-10-1.0.0 is used.
<SSH key secret ID>: The secret store ID of the SSH key used to access the node. You can find this in the SDF, or by running csar secret status on the SIMPL VM.
<diags-bundle>`: The name of the diagnostics bundle directory. If this directory doesn’t already exist, it will be created.

Tools and access

You must have the SSH keys required to access the SIMPL VM and the MCP VMs that are to be upgraded.

The SIMPL VM must have the right permissions on the VNFI. Refer to the SIMPL VM documentation for more information:

When starting an SSH session to the SIMPL VM, use a keepalive of 30 seconds. This prevents the session from timing out - SIMPL VM automatically closes idle connections after a few minutes.

When using OpenSSH (the SSH client on most Linux distributions), this can be controlled with the option ServerAliveInterval - for example, ssh -i <SSH private key file for SIMPL VM> -o ServerAliveInterval=30 admin@<SIMPL VM IP address>.

rvtconfig is a command-line tool for configuring and managing Mobile Control Point VMs. All MCP CSARs include this tool; once the CSAR is unpacked, you can find rvtconfig in the resources directory, for example:

$ cdcsars
$ cd mcp/<uplevel version>
$ cd resources
$ ls rvtconfig
rvtconfig

The rest of this page assumes that you are running rvtconfig from the directory in which it resides, so that it can be invoked as ./rvtconfig. It assumes you use the uplevel version of rvtconfig, unless instructed otherwise. If it is explicitly specified you must use the downlevel version, you can find it here:

$ cdcsars
$ cd mcp/<downlevel version>
$ cd resources
$ ls rvtconfig
rvtconfig

1. Preparation for upgrade procedure

These steps can be carried out in advance of the upgrade maintenance window. They should take less than 30 minutes to complete.

1.1 Ensure the SIMPL version is at least 6.15.3

Log into the SIMPL VM and run the command simpl-version. The SIMPL VM version is displayed at the top of the output:

SIMPL VM, version 6.15.3

Ensure this is at least 6.15.3. If not, contact your Customer Care Representative to organise upgrading the SIMPL VM before proceeding with the upgrade of the MCP VMs.

Output shown on this page is correct for version 6.15.3 of the SIMPL VM; it may differ slightly on later versions.

1.2 Upload and unpack uplevel CSAR

Your Customer Care Representative will have provided you with the uplevel MCP CSAR. Use scp to copy this to /csar-volume/csar/ on the SIMPL VM.

Once the copy is complete, run csar unpack /csar-volume/csar/<filename> on the SIMPL VM (replacing <filename> with the filename of the CSAR, which will end with .zip).

The csar unpack command may fail if there is insufficient disk space available. If this occurs, SIMPL VM will report this with instructions to remove some CSARs to free up disk space. You can list all unpacked CSARs with csar list and remove a CSAR with csar remove <node type>/<version>.

1.3 Verify the downlevel CSAR is present

On the SIMPL VM, run csar list.

Ensure that there is a MCP CSAR listed there with the current downlevel version.

1.4 Prepare downlevel config directory

If you keep the configuration hosted on the SIMPL VM, find it and rename it to /home/admin/current-config. Verify the contents by running ls /home/admin/current-config and checking that at least the SDF (sdf-rvt.yaml) is present there. If it isn’t, or you prefer to keep your configuration outside of the SIMPL VM, then create this directory on the SIMPL VM:

mkdir /home/admin/current-config

Use scp to upload the SDF (sdf-rvt.yaml) to this directory.

1.5 Prepare uplevel config directory including an SDF

On the SIMPL VM, run mkdir /home/admin/uplevel-config. This directory is for holding the uplevel configuration files.

Use scp (or cp if the files are already on the SIMPL VM, for example in /home/admin/current-config as detailed in the previous section) to copy the following files to this directory. Include configuration for the entire deployment, not just the MCP nodes.

The uplevel configuration files.
The current SDF for the deployment.

1.6 Update SDF

Open the /home/admin/uplevel-config/sdf-rvt.yaml file using vi. Find the vnfcs section, and within that the MCP VNFC. Within the VNFC, locate the version field and change its value to the uplevel version, for example 4.2-10-1.0.0. Save and close the file.

You can verify the change you made by using diff -u2 /home/admin/current-config/sdf-rvt.yaml /home/admin/uplevel-config/sdf-rvt.yaml. The diff should look like this (context lines and line numbers may vary), with only a change to the version for the relevant node type:

--- sdf-rvt.yaml        2022-10-31 14:14:49.282166672 +1300
+++ sdf-rvt.yaml        2022-11-04 13:58:42.054003577 +1300
@@ -211,5 +211,5 @@
           shcm-vnf: shcm
       type: mcp
-      version: {example-downlevel-version}
+      version: 4.2-10-1.0.0
       vim-configuration:
         vsphere:

1.7 Reserve maintenance period

The upgrade procedure requires a maintenance period. For upgrading nodes in a live network, implement measures to mitigate any unforeseen events.

Ensure you reserve enough time for the maintenance period, which must include the time for a potential rollback.

To calculate the time required for the actual upgrade or roll back of the VMs, run rvtconfig calculate-maintenance-window -i /home/admin/uplevel-config -t custom --site-id <site ID>. The output will be similar to the following, stating how long it will take to do an upgrade or rollback of the MCP VMs.

Nodes will be upgraded sequentially

-----

Estimated time for a full upgrade of 3 VMs: 24 minutes
Estimated time for a full rollback of 3 VMs: 24 minutes

-----

These numbers are a conservative best-effort estimate. Various factors, including IMS load levels, VNFI hardware configuration, VNFI load levels, and network congestion can all contribute to longer upgrade times.

These numbers only cover the time spent actually running the upgrade on SIMPL VM. You must add sufficient overhead for setting up the maintenance window, checking alarms, running validation tests, and so on.

The time required for an upgrade or rollback can also be manually calculated.

For node types that are upgraded sequentially, like this node type, calculate the upgrade time by using the number of nodes. The first node takes 999 minutes, while later nodes take 999 minutes each.

You must also reserve time for:

The SIMPL VM to upload the image to the VNFI. Allow 2 minutes, unless the connectivity between SIMPL and the VNFI is particularly slow.
Any validation testing needed to determine whether the upgrade succeeded.

1.8 Carry out dry run

The csar update dry run command carries out more extensive validation of the SDF and VM states than rvtconfig validate does.

Carrying out this step now, before the upgrade is due to take place, ensures problems with the SDF files are identified early and can be rectified beforehand.

The --dry-run operation will not make any changes to your VMs, it is safe to run at any time, although we always recommend running it during a maintenance window if possible.

Please run the following command to execute the dry run.

csar update --sdf /home/admin/uplevel-config/sdf-rvt.yaml --vnf mcp --sites <site name> --service-group <service_group> --skip force-in-series-update-with-l3-permission --dry-run

Confirm the output does not flag any problems or errors. The end of the command output should look similar to this.

You are about to update VMs as follows:

- VNF mcp:
    - For site <site name>:
      - update all VMs in VNFC service group <service_group>/4.2-8-1.0.0:
        - mcp-1 (index 0)
        - mcp-2 (index 1)
        - mcp-3 (index 2)

Please confirm the set of nodes you are upgrading looks correct, and that the software version against the service group correctly indicates the software version you are planning to upgrade to.

If you see any errors, please address them, then re-run the dry run command until it indicates success.

2. Upgrade procedure

2.1 Run basic validation tests on downlevel nodes

Before starting the upgrade procedure, run VNF validation tests from the SIMPL VM against the downlevel nodes: csar validate --vnf mcp --sdf /home/admin/current-config/sdf-rvt.yaml

This command performs various checks on the health of the VMs' networking and services:

================================
Running validation test scripts
================================
Running validation tests in CSAR 'mcp/{example-downlevel-version}'
Test running for: mydeployment-mcp-1
Running script: check_ping_management_ip…
Running script: check_can_sudo…
Running script: check_converged…
Running script: check_liveness…
Detailed output can be found in /var/log/csar/ansible_output-2023-01-06-03-21-51.log

If all is well, then you should see the message All tests passed for CSAR 'mcp/{example-downlevel-version}'!.

If the VM validation fails, you can find details in the log file. The log file can be found in /var/log/csar/ansible_output-<timestamp>.log. The msg field under each ansible task explains why the script failed.

If there are failures, the upgrade cannot take place. Investigate them with the help of your Customer Care Representative and the Troubleshooting pages.

Once the VNF validation tests pass, you can proceed with the next step.

2.2 Disable scheduled tasks

Only perform this step if this is the first, or only, node type being upgraded.

Run ./rvtconfig enter-maintenance-window -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID> --hours <MW duration in hours>. The output will look similar to:

Maintenance window is now active until 04 Nov 2022 21:38:06 NZDT.
Use the leave-maintenance-window command once maintenance is complete.

This will prevent scheduled tasks running on the VMs until the time given in the output.

If at any point in the upgrade process you wish to confirm the end time of the maintenance window, you can run ./rvtconfig maintenance-window-status -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID>.

2.3 Verify uplevel config has no unexpected or prohibited changes

Run rm -rf /home/admin/config-output on the SIMPL VM to remove that directory if it already exists. Then use the command ./rvtconfig compare-config -c <CDS address> <CDS auth args> -d <deployment ID> --input /home/admin/uplevel-config --vm-version <downlevel version> --output-dir /home/admin/config-output -t custom to compare the live configuration to the configuration in the /home/admin/uplevel-config directory.

Example output is listed below:

Validating node type against the schema: custom
Redacting secrets…
Comparing live config for (version=4.2-8-1.0.0, deployment=mydeployment, group=RVT-custom.DC1) with local directory (version=4.2-10-1.0.0, deployment=mydeployment, group=RVT-custom.DC1)
Getting per-level configuration for version '4.2-8-1.0.0', deployment 'mydeployment', and group 'RVT-custom.DC1'
  - Found config with hash 7f6cc1f3df35b43d6286f19c252311e09216e6115f314d0cb9cc3f3a24814395

Wrote currently uploaded configuration to /tmp/tmprh2uavbh
Redacting secrets…
Found
  - 1 difference in file sdf-rvt.yaml

Differences have been written to /home/admin/config-output
Error: Line 110 exited with status 3

You can then view the differences using commands such as cat /home/admin/config-output/sdf-rvt.yaml.diff (there will be one .diff file for every file that has differences). Aside from the version parameter in the SDF, there should normally be no other changes. If there are other unexpected changes, pause the procedure here and correct the configuration by editing the files in /home/admin/uplevel-config.

When performing a rolling upgrade, some elements of the uplevel configuration must remain identical to those in the downlevel configuration. The affected elements of the MCP configuration are described in the following list:

The secrets-private-key-id in the SDF must not be altered.
The ordering of the VM instances in the SDF must not be altered.
The IP addresses and other networking information in the SDF must not be altered.

The rvtconfig compare-config command reports any unsupported changes as errors, and may also emit warnings about other changes. For example:

Found
  - 1 difference in file sdf-rvt.yaml

The configuration changes have the following ERRORS.
File sdf-rvt.yaml:
  - Changing the IP addresses, subnets or traffic type assignments of live VMs is not supported. Restore the networks section of the custom VNFC in the SDF to its original value before uploading configuration.

Ensure you address the reported errors, if any, before proceeding. rvtconfig will not upload a set of configuration files that contains unsupported changes.

2.4 Validate configuration

Run the command ./rvtconfig validate -t custom -i /home/admin/uplevel-config to check that the configuration files are correctly formatted, contain valid values, and are self-consistent. A successful validation with no errors or warnings produces the following output.

Validating node type against the schema: custom
YAML for node type(s) ['custom'] validates against the schema

If the output contains validation errors, fix the configuration in the /home/admin/uplevel-config directory

If the output contains validation warnings, consider whether you wish to address them before performing the upgrade. The VMs will accept configuration that has validation warnings, but certain functions may not work.

2.5 Upload configuration

Upload the configuration to CDS:

./rvtconfig upload-config -c <CDS address> <CDS auth args> -t custom -i /home/admin/uplevel-config --vm-version <uplevel version>

Check that the output confirms that configuration exists in CDS for both the current (downlevel) version and the uplevel version:

Validating node type against the schema: custom
Preparing configuration for node type custom…
Checking differences between uploaded configuration and provided files
Getting per-level configuration for version '4.2-10-1.0.0', deployment 'mydeployment-custom', and group 'RVT-custom.DC1'
  - No configuration found
No uploaded configuration was found: this appears to be a new install or upgrade
Encrypting secrets…
Wrote config for version '4.2-10-1.0.0', deployment ID 'mydeployment', and group ID 'RVT-custom.DC1'
Versions in group RVT-custom.DC1
=============================
  - Version: {example-downlevel-version}
    Config hash: 7f6cc1f3df35b43d6286f19c252311e09216e6115f314d0cb9cc3f3a24814395
    Active: mydeployment-custom-1, mydeployment-custom-2, mydeployment-custom-3
    Leader seed: {downlevel-leader-seed}

  - Version: 4.2-10-1.0.0
    Config hash: f790cc96688452fdf871d4f743b927ce8c30a70e3ccb9e63773fc05c97c1d6ea
    Active: None
    Leader seed:

2.6 Collect diagnostics

We recommend gathering diagnostic archives for all MCP VMs in the deployment.

On the SIMPL VM, run the command

If <diags-bundle> does not exist, the command will create the directory for you.

Each diagnostic archive can be up to 200 MB per VM. Ensure you have enough disk space on the SIMPL VM to collect all diagnostics. The command will be aborted if the SIMPL VM does not have enough disk space to collect all diagnostic archives from all the VMs in your deployment specified in the provided SDF.

2.7 Begin the upgrade

Carry out a csar import of the mcp VMs

Prepare for the upgrade by running the following command on the SIMPL VM csar import --vnf mcp --sdf /home/admin/uplevel-config/sdf-rvt.yaml to import terraform templates.

First, SIMPL VM connects to your VNFI to check the credentials specified in the SDF and QSG are correct. If this is successful, it displays the message All validation checks passed..

Type no. The csar import will be aborted.
Investigate why there are unexpected changes in the SDF.
Correct the SDF as necessary.
Retry this step.

Otherwise, accept the prompt by typing yes.

After you do this, SIMPL VM will import the terraform state. If successful, it outputs this message:

Done. Imported all VNFs.

If the output does not look like this, investigate and resolve the underlying cause, then re-run the import command again until it shows the expected output.

Begin the upgrade of the mcp VMs

First, SIMPL VM connects to your VNFI to check the credentials specified in the SDF and QSG are correct. If this is successful, it displays the message All validation checks passed..

Next, SIMPL VM compares the specified SDF with the SDF used for the csar import command above. Since the contents have not changed since you ran the csar import, the output should indicate that the SDF has not changed.

If there are differences in the SDF, a message similar to this will be output:

Comparing current SDF with previously used SDF.
site site1:
    mcp:
        mcp-1:
             networks:
             - ip-addresses:
                 ip:
            -    - 10.244.21.106
            +    - 10.244.21.196
                 - 10.244.21.107
               name: Management
               subnet: mgmt-subnet
Do you want to continue? [yes/no]: yes

If you see this, you must:

Type no. The upgrade will be aborted.
Go back to the start of the upgrade section and run through the csar import section again, until the SDF differences are resolved.
Retry this step.

Afterwards, the SIMPL VM displays the VMs that will be upgraded:

You are about to update VMs as follows:

- VNF mcp:
    - For site site1:
    - update all VMs in VNFC service group mydeployment-mcp/4.2-10-1.0.0:
        - mydeployment-mcp-1 (index 0)
        - mydeployment-mcp-2 (index 1)
        - mydeployment-mcp-3 (index 2)

Type 'yes' to continue, or run 'csar update --help' for more information.

Continue? [yes/no]:

Check this output displays the version you expect (the uplevel version) and exactly the set of VMs that you expect to be upgraded. If anything looks incorrect, type no to abort the upgrade process, and recheck the VMs listed and the version field in /home/admin/uplevel-config/sdf-rvt.yaml. Also check you are passing the correct SDF path and --vnf argument to the csar update command.

Otherwise, accept the prompt by typing yes.

Next, each VM in your cluster will perform health checks. If successful, the output will look similar to this.

Running ansible scripts in '/home/admin/.local/share/csar/mcp/4.1-1-1.0.0/update_healthcheck_scripts' for node 'mydeployment-custom-1'
Running script: check_config_uploaded…
Running script: check_ping_management_ip…
Running script: check_maintenance_window…
Running script: check_can_sudo…
Running script: check_converged…
Running script: check_liveness…
Running script: check_rhino_alarms…
Detailed output can be found in /var/log/csar/ansible_output-2023-01-05-02-05-51.log
All ansible update healthchecks have passed successfully

If a script fails, you can find details in the log file. The log file can be found in /var/log/csar/ansible_output-<timestamp>.log.

Running ansible scripts in '/home/admin/.local/share/csar/mcp/4.1-1-1.0.0/update_healthcheck_scripts' for node 'mydeployment-custom-1'
Running script: check_config_uploaded...
Running script: check_ping_management_ip...
Running script: check_maintenance_window...
Running script: check_can_sudo...
Running script: check_converged...
Running script: check_liveness...
ERROR: Script failed. Specific error lines from the ansible output will be logged to screen. For more details see the ansible_output file (/var/log/csar/ansible_output-2023-01-05-21-02-17.log). This file has only ansible output, unlike the main command log file.

fatal: [mydeployment-mcp-1]: FAILED! => {"ansible_facts": {"liveness_report": {"cassandra": true, "cassandra_ramdisk": true, "cassandra_repair_timer": true, "cdsreport": true, "cleanup_sbbs_activities": false, "config_hash_report": true, "docker": true, "initconf": true, "linkerd": true, "mdm_state_and_status_ok": true, "mdmreport": true, "nginx": true, "no_ocss7_alarms": true, "ocss7": true, "postgres": true, "rem": true, "restart_rhino": true, "rhino": true}}, "attempts": 1, "changed": false, "msg": "The following liveness checks failed: ['cleanup_sbbs_activities']", "supports_liveness_checks": true}
Running script: check_rhino_alarms...
Detailed output can be found in /var/log/csar/ansible_output-2023-01-05-21-02-17.log
***Some tests failed for CSAR 'mcp/4.1-1-1.0.0' - see output above***

The msg field under each ansible task explains why the script failed.

If there are failures, investigate them with the help of your Customer Care Representative and the Troubleshooting pages.

Retry this step once all failures have been corrected by running the command csar update … as described at the begining of this section.

Once the pre-upgrade health checks have been verified, SIMPL VM now proceeds to upgrade each of the VMs. Monitor the further output of csar update as the upgrade progresses, as described in the next step.

2.8 Monitor `csar update` output

For each VM:

The VM will be quiesced and destroyed.
SIMPL VM will create a replacement VM using the uplevel version.
The VM will automatically start applying configuration from the files you uploaded to CDS in the above steps.
Once configuration is complete, the VM will be ready for service. At this point, the csar update command will move on to the next MCP VM.

The output of the csar update command will look something like the following, repeated for each VM.

Decommissioning 'dc1-mydeployment-mcp-1' in MDM, passing desired version 'vm.version=4.2-10-1.0.0', with a 900 second timeout
dc1-mydeployment-mcp-1: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'decommissioned'
dc1-mydeployment-mcp-1: Current status 'in_progress'- desired status 'complete'
…
dc1-mydeployment-mcp-1: Current status 'complete', current state 'decommissioned' - desired status 'complete', desired state 'decommissioned'
Running update for VM group [0]
Performing health checks for service group mydeployment-mcp with a 1200 second timeout
Running MDM status health-check for dc1-mydeployment-mcp-1
dc1-mydeployment-mcp-1: Current status 'in_progress'- desired status 'complete'
…
dc1-mydeployment-mcp-1: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'

If you see this error:

Failed to retrieve instance summary for 'dc1-<VM hostname>' from MDM
(404)
Reason: Not Found

it can be safely ignored, provided that you do eventually see a Current status 'in_progress'… line. This error is caused by the newly-created VM taking a few seconds to register itself with MDM when it boots up.

Once all VMs have been upgraded, you should see this success message, detailing all the VMs that were upgraded and the version they are now running, which should be the uplevel version.

Successful VNF with full per-VNFC upgrade state:

VNF: mcp
VNFC: mydeployment-mcp
    - Node name: mydeployment-mcp-1
      - Version: 4.2-10-1.0.0
      - Build Date: 2022-11-21T22:58:24+00:00
    - Node name: mydeployment-mcp-2
      - Version: 4.2-10-1.0.0
      - Build Date: 2022-11-21T22:58:24+00:00
    - Node name: mydeployment-mcp-3
     - Version: 4.2-10-1.0.0
      - Build Date: 2022-11-21T22:58:24+00:00

If the upgrade fails, you will see Failed VNF instead of Successful VNF in the above output. There will also be more details of what went wrong printed before that. Refer to the Backout procedure below.

2.9 Run basic validation tests

Run csar validate --vnf mcp --sdf /home/admin/uplevel-config/sdf-rvt.yaml to perform some basic validation tests against the uplevel nodes.

This command first performs a check that the nodes are connected to MDM and reporting that they have successfully applied the uplevel configuration:

========================
Performing healthchecks
========================
Commencing healthcheck of VNF 'mcp'
Performing health checks for service group mydeployment-mcp with a 0 second timeout
Running MDM status health-check for dc1-mydeployment-mcp-1
dc1-mydeployment-mcp-1: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
Running MDM status health-check for dc1-mydeployment-mcp-2
dc1-mydeployment-mcp-2: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'
Running MDM status health-check for dc1-mydeployment-mcp-3
dc1-mydeployment-mcp-3: Current status 'complete', current state 'commissioned' - desired status 'complete', desired state 'commissioned'

After that, it performs various checks on the health of the VMs' networking and services:

================================
Running validation test scripts
================================
Running validation tests in CSAR 'mcp/4.2-10-1.0.0'
Test running for: mydeployment-mcp-1
Running script: check_ping_management_ip…
Running script: check_can_sudo…
Running script: check_converged…
Running script: check_liveness…
Detailed output can be found in /var/log/csar/ansible_output-2023-01-06-03-21-51.log

If all is well, then you should see the message All tests passed for CSAR 'mcp/<uplevel version>'!.

If the VM validation fails, you can find details in the log file. The log file can be found in /var/log/csar/ansible_output-<timestamp>.log.

Running validation test scripts
================================
Running validation tests in CSAR 'mcp/4.2-10-1.0.0'
Test running for: mydeployment-custom-1
Running script: check_ping_management_ip...
Running script: check_can_sudo...
Running script: check_converged...
Running script: check_liveness...
ERROR: Script failed. Specific error lines from the ansible output will be logged to screen. For more details see the ansible_output file (/var/log/csar/ansible_output-2023-01-06-03-40-37.log). This file has only ansible output, unlike the main command log file.

fatal: [mydeployment-custom-1]: FAILED! => {"ansible_facts": {"liveness_report": {"cassandra": true, "cassandra_ramdisk": true, "cassandra_repair_timer": true, "cdsreport": true, "cleanup_sbbs_activities": false, "config_hash_report": true, "docker": true, "initconf": true, "linkerd": true, "mdm_state_and_status_ok": true, "mdmreport": true, "nginx": true, "no_ocss7_alarms": true, "ocss7": true, "postgres": true, "rem": true, "restart_rhino": true, "rhino": true}}, "attempts": 1, "changed": false, "msg": "The following liveness checks failed: ['cleanup_sbbs_activities']", "supports_liveness_checks": true}
Running script: check_rhino_alarms...
Detailed output can be found in /var/log/csar/ansible_output-2023-01-06-03-40-37.log
***Some tests failed for CSAR 'mcp/4.2-10-1.0.0' - see output above***

----------------------------------------------------------


WARNING: Validation script tests failed for the following CSARs:
  - 'mcp/4.2-10-1.0.0'
See output above for full details

The msg field under each ansible task explains why the script failed.

If there are failures, investigate them with the help of your Customer Care Representative and the Troubleshooting pages.

3. Post-upgrade procedure

3.1 Enable scheduled tasks

Run ./rvtconfig leave-maintenance-window -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID>. This will allow scheduled tasks to run on the VMs again. The output should look like this:

Maintenance window has been terminated.
The VMs will resume running scheduled tasks as per their configured schedules.

3.2 Run verification tests

If you have prepared verification tests for the deployment, run these now.

4. Post-acceptance

The upgrade of the MCP nodes is now complete.

5. Backout Method of Procedure

First, gather the log history of the downlevel VMs. Run mkdir -p /home/admin/rvt-log-history and ./rvtconfig export-log-history -c <CDS address> <CDS auth args> -d <deployment ID> --zip-destination-dir /home/admin/rvt-log-history --secrets-private-key-id <secret ID>. The secret ID you specify for --secrets-private-key-id should be the secret ID for the secrets private key (the one used to encrypt sensitive fields in CDS). You can find this in the product-options section of each VNFC in the SDF.

Make sure the <CDS address> used is one of the remaining available TSN nodes.

Next, how much of the backout procedure to run depends on how much progress was made with the upgrade. If you did not get to the point of running csar update, start from the Cleanup after backout section below.

If you encounter further failures during recovery or rollback, contact your Customer Care Representative to investigate and recover the deployment.

5.1 Collect diagnostics

We recommend gathering diagnostic archives for all MCP VMs in the deployment.

On the SIMPL VM, run the command

If <diags-bundle> does not exist, the command will create the directory for you.

Each diagnostic archive can be up to 200 MB per VM. Ensure you have enough disk space on the SIMPL VM to collect all diagnostics. The command will be aborted if the SIMPL VM does not have enough disk space to collect all diagnostic archives from all the VMs in your deployment specified in the provided SDF.

5.2 Disable scheduled tasks

Only perform this step if this is the first, or only, node type being rolled back. You can also skip this step if the rollback is occurring immediately after a failed upgrade, such that the existing maintenance window is sufficient. You can check the remaining maintenance window time with ./rvtconfig maintenance-window-status -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID>.

To start a new maintenance window (or extend an existing one), run ./rvtconfig enter-maintenance-window -c <CDS address> <CDS auth args> -d <deployment ID> --site-id <site ID> --hours <MW duration in hours>. The output will look similar to:

Maintenance window is now active until 04 Nov 2022 21:38:06 NZDT.
Use the leave-maintenance-window command once maintenance is complete.

This will prevent scheduled tasks running on the VMs until the time given in the output.

If at any point in the rollback process you wish to confirm the end time of the maintenance window, you can run the above rvtconfig maintenance-window-status command.

5.3 Roll back VMs

To roll back the VMs, the procedure is essentially to perform an "upgrade" back to the downlevel version, that is, with <downlevel version> and <uplevel version> swapped. You can refer to the Begin the upgrade section above for details on the prompts and output of csar update.

Once the csar update command completes successfully, proceed with the next steps below.

The <index range> argument is a comma-separated list of VM indices, where the first VM has index 0. Only include the VMs you want to roll back. For example, suppose there are three MCP VMs named custom-1, custom-2 and custom-3. If VMs custom-1 and custom-3 need to be rolled back, the index range is 0,2. Do not include any spaces in the index range.