This guide explains how to upgrade Sentinel IP-SM-GW to the current version.

Operational Tools Architecture

This upgrade manual refers to elements of the Operational Tools Architecture, which explains the design of the upgrade process at a high level and gives low-level information on using the upgrade bundles and tools. Make an offline copy of this book to consult while executing an upgrade on customer sites.

It is recommended that the user familiarize themselves with the Cluster Migration Workflow before planning an upgrade.

Overview and Terminology

Terminology

  • The downlevel (software) version is the version being upgraded from.

  • The uplevel (software) version is the version being upgraded to.

  • The installed (software) version refers to the software currently running on the Rhino cluster, including any customizations, additions and patches that may have been made to deployable units, SLEE components and configuration.

  • orca is the tool, delivered within the upgrade package, which drives the whole upgrade process, connecting to each of the hosts and running commands on them remotely.

  • A Deployable Unit (DU) is a packaged component that provides a particular feature of the software.

  • A (Rhino) cluster is a group of nodes with the same cluster ID. The cluster ID is an integer between 1 and 255 used to identify nodes that are linked together and share traffic. The cluster is backed by a PostgreSQL database containing the DUs and configuration. This architecture has the following implications, which are the fundamentals of the Cluster Migration Workflow design for upgrades:

    • Changing the cluster ID of a node will remove a node from the cluster. This can be done without reconfiguring the other nodes in the cluster.

    • A new node can be added to a cluster simply by installing Rhino on it and ensuring the cluster ID of the new node is the same as the existing nodes. When Rhino is started, it will join the cluster and obtain the product DUs and configuration automatically from the shared PostgreSQL, then deploy them, thus becoming a fully-functional node that can handle traffic.

An upgrade can be one of two types: minor or major.

  • A minor upgrade is one where only the final component of the software version number changes, e.g. 2.7.0.2 to 2.7.0.3. It can include bugfixes, new minor functionality and new configuration, but the configuration schemas are always compatible with the downlevel software.

  • A major upgrade is one where any of the first three components of the version number changes, e.g. 2.7.0.12 to 2.8.0.0. It can include bugfixes, new features and new configuration. The new configuration may use a different schema; for example, some profile spec IDs and profile table schemas may change. As such, the configuration cannot simply be imported from the downlevel cluster and needs to be transformed by orca so that it is suitable for the uplevel cluster's schema.

Both minor and major upgrades can include a new Rhino version. Major upgrades may also include a new version of Java. Where present, these are deployed automatically by orca at the appropriate point in the upgrade process.
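The version-number rule above can be expressed as a small check. The following is a minimal sketch in Python 3, assuming four-component numeric version strings as in the examples; classify_upgrade is a hypothetical helper name used for illustration and is not part of orca.

```python
def classify_upgrade(downlevel, uplevel):
    """Classify an upgrade as 'major' or 'minor' from two version strings."""
    old = [int(part) for part in downlevel.split(".")]
    new = [int(part) for part in uplevel.split(".")]
    if len(old) != 4 or len(new) != 4:
        raise ValueError("expected four-component versions such as 2.7.0.3")
    # Any change in the first three components makes the upgrade major.
    if old[:3] != new[:3]:
        return "major"
    # Only the final component differs: a minor upgrade.
    if old[3] != new[3]:
        return "minor"
    return "same"
```

For the examples given above, classify_upgrade("2.7.0.2", "2.7.0.3") returns "minor" and classify_upgrade("2.7.0.12", "2.8.0.0") returns "major".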

Important
Limitation of one major version per major upgrade

orca only supports upgrading across a single major version (e.g. 2.6.0 to 2.7.0, 2.7.0 to 2.7.1, etc). It cannot perform an upgrade that bridges two or more major versions. To do this, you will need to perform several separate upgrades.

  • orca is a command-line tool which can perform a variety of maintenance operations on Rhino clusters. It implements the Cluster Migration Workflow.

  • An upgrade bundle is a zip file containing the uplevel software, orca, any new Rhino and/or Java version, plus ancillary resources (such as feature script configuration) required during the upgrade process. Upgrade bundles are provided to customers by Metaswitch Customer Care.

  • A customization package is a package included in an upgrade bundle which applies customer-specific modifications to the product. They come in two types: post-install (applied after product installation) or post-configure (applied after the configuration has been transformed and imported). Your Professional Services representative can assist in creation of customization packages and their inclusion in upgrade bundles.

Upgrade process overview

Upgrading a Sentinel product involves the following steps:

  • Read the product changelog to understand the changes introduced in the uplevel software, any new configuration required, and any workarounds/limitations you may encounter

  • Obtain the upgrade package from your Metaswitch Customer Care Representative, including any customization packages and licenses if required

  • Prepare for the upgrade

    • plan a maintenance window for when you expect traffic through the system to be low. Although the rolling nature of the upgrade means that the system can accept new traffic throughout the upgrade, traffic that is still present on a node at the end of its draining timeout will be curtailed, leading for example to terminated phone calls.

    • identify the first node to be upgraded

  • Upgrade the first node

  • Perform validation tests, if required

  • Upgrade the remainder of the nodes

  • Perform validation tests, if required

  • Post-upgrade steps including restoring other network element configuration

The uplevel software is provided in the form of a bundle (zip file). The procedure of upgrading the software is performed using the orca tool, which is included in the bundle. Run orca from a machine (either an operator’s PC or a node in the deployment to be upgraded) that meets the following requirements:

  • Linux OS, with Python 2.7 installed and accessible in the PATH as python2 (you can check by running python2 --version)

  • At least 3GB of free hard disk space (you can check with df -h)

  • Passwordless SSH connectivity (via SSH keys) to all Rhino nodes in the cluster to be upgraded

    • Configure the SSH agent to log in as the user under which Rhino is installed on the nodes, normally sentinel. This can be configured in ~/.ssh/config in the home directory of the user that will be running orca, for example:

Host host1
  User sentinel
  IdentityFile /path/to/ssh_private_key
  ...
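The requirements above can be checked before starting. The following is a minimal sketch in Python 3 (separate from the Python 2.7 that orca itself needs). The helper names are hypothetical, "3GB" is interpreted as 3 GiB as an assumption, and the SSH check relies on the standard OpenSSH BatchMode option, which makes ssh fail instead of prompting for a password.

```python
import shutil

# Assumption: "3GB" is read as 3 GiB.
MIN_FREE_BYTES = 3 * 1024 ** 3


def has_enough_space(path=".", min_free_bytes=MIN_FREE_BYTES):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= min_free_bytes


def python2_available():
    """Return True if a python2 executable can be found in the PATH."""
    return shutil.which("python2") is not None


def ssh_check_command(host):
    """Build an ssh command that succeeds only with key-based login."""
    # BatchMode=yes disables password prompts, so a missing key is an error.
    return ["ssh", "-o", "BatchMode=yes", host, "true"]
```

Running the list returned by ssh_check_command with subprocess.call should give an exit code of 0 for each host that accepts key-based login.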

Upgrading takes approximately 30 minutes for the first node and 10 minutes for each subsequent node, not including validation tests or manual reconfiguration steps. It is a current limitation that the upgrade proceeds sequentially across all cluster nodes.

Next steps

To plan your upgrade, see Preparing for a Sentinel IP-SM-GW Upgrade.

Preparing for a Sentinel IP-SM-GW upgrade

This section describes how to prepare for a Sentinel IP-SM-GW upgrade. Be sure you are familiar with the upgrade overview and terminology.

  • Information and Files Required gives useful information for planning the upgrade and describes how to validate the upgrade bundle.

  • Limitations describes limitations, known issues and other caveats to be aware of while performing the upgrade process.

After familiarising yourself with the above information, refer to the Executing a Sentinel IP-SM-GW Upgrade section to begin the upgrade process.

Information and Files Required

This section documents information concerning the upgrade. Be sure you are also familiar with the new features and new configuration introduced in the uplevel Sentinel IP-SM-GW software.

Information required

Maintenance window

In general an upgrade of a Rhino cluster running Sentinel IP-SM-GW can be completed in a single maintenance window, though for operational reasons you may wish to use more than one. The maintenance window should be of adequate length (at least 30 minutes for the first node, 10 minutes for subsequent nodes, plus additional time for traffic draining, and validation tests according to your test plan).
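As a worked example of the arithmetic above, the following minimal Python 3 sketch estimates a lower bound for the window length. The function name and the two optional parameters are hypothetical; how much time to allow for draining and validation depends on your own timeout and test plan.

```python
def minimum_window_minutes(node_count, drain_minutes_per_node=0,
                           validation_minutes=0):
    """Lower bound on the maintenance window length, in minutes."""
    if node_count < 1:
        raise ValueError("a cluster has at least one node")
    # 30 minutes for the first node, 10 minutes for each subsequent node.
    upgrade = 30 + 10 * (node_count - 1)
    # Assumption: draining is budgeted once per node.
    draining = drain_minutes_per_node * node_count
    return upgrade + draining + validation_minutes
```

For a four-node cluster this gives 60 minutes before draining and validation are added; with 5 minutes of draining per node and 30 minutes of validation it gives 110 minutes.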

Important
Manual reconfiguration steps

For deployments utilizing any or all of:

  • extensive customizations to the standard Sentinel IP-SM-GW software

  • feature script modifications

  • a cluster with 3 or more nodes

  • multiple clusters running the same product

it is strongly recommended that the upgrade is trialled in a lab deployment first. This will allow you to identify any breaking issues with the upgrade, and to note any customizations to feature scripts and other configuration that need to be applied after the upgrade.

Traffic impact

Traffic capacity will be reduced by a fraction of 1/N where N is the number of nodes in the cluster to be upgraded. That is to say, at any given point during the upgrade one node will be unavailable but the rest will still be able to handle traffic. As such the maintenance window should be planned for a period where call traffic volumes are quieter.
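The 1/N reduction can be turned into a quick planning check. This is a minimal Python 3 sketch; the function names are hypothetical, and the expected load is assumed to be expressed as a fraction of the full cluster capacity.

```python
from fractions import Fraction


def remaining_capacity(node_count):
    """Fraction of cluster capacity left while one node is unavailable."""
    if node_count < 1:
        raise ValueError("a cluster has at least one node")
    return 1 - Fraction(1, node_count)


def load_fits(expected_load, node_count):
    """True if the expected load fits within the reduced capacity."""
    return expected_load <= remaining_capacity(node_count)
```

With four nodes, three quarters of the capacity remains, so an expected load of 70% of full capacity fits while one of 80% does not.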

Product upgrade order

A Sentinel deployment may consist of multiple products: REM, GAA, VoLTE, and IPSMGW. The clusters must be upgraded in the following order: REM (with plugins), then GAA, then VoLTE, then IPSMGW. (For clarity, all GAA nodes must be upgraded before any VoLTE nodes, and all VoLTE before any IPSMGW.)

For major upgrades, upgrade all products in close succession. Once REM has been upgraded to a new version, the other products need to be upgraded soon afterwards so that the REM plugins for those products retain maximum compatibility and can provide the best management interface.

Node order

The first node in each Rhino cluster is handled separately to the rest, since the software upgrade only needs to actually be performed on one node and the rest can pick up the upgraded software by joining the new cluster (see Cluster Migration Workflow).

Always specify the hosts in reverse order of their node IDs, i.e. from highest to lowest. This is necessary since certain Rhino features use the node ID as a tie-breaker when they can otherwise not determine which node has preference, so we want to make sure the highest number node is processed first to avoid problems.

orca takes a list of hosts. The hosts are upgraded in the order specified, and only the nodes specified are upgraded. As such, orca supports splitting the upgrade across multiple maintenance windows by specifying only a subset of hosts on each window.
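The ordering rule can be applied mechanically when building the host list. The following minimal Python 3 sketch assumes you already know which node ID runs on each host; hosts_argument is a hypothetical helper name, and the node IDs in the example are made up.

```python
def hosts_argument(node_id_by_host):
    """Return the hosts as a comma-separated list, highest node ID first."""
    ordered = sorted(node_id_by_host, key=node_id_by_host.get, reverse=True)
    # No whitespace between entries, as orca expects.
    return ",".join(ordered)
```

For example, hosts_argument({"host1": 101, "host2": 102, "host3": 103}) returns "host3,host2,host1", which can be passed as the value of --hosts.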

Important
Limitation of one cluster

orca only supports working with a list of hosts where all hosts are part of the same cluster. If your deployment has multiple clusters to be upgraded, then these upgrades need to be done separately. This does mean that (if such an approach is suitable for your overall deployment architecture) you can use multiple machines to run orca, each upgrading a single cluster, to parallelize the upgrade; however, this will correspondingly increase the service impact.

Files required

Upgrade bundle

Your Metaswitch Customer Care Representative should have provided you with an upgrade bundle, normally named ipsmgw-<uplevel-version>-upgrade-bundle.zip or similar. Unzip the bundle to a directory of your choice and cd to it.

Important
orca working directory

orca must always be run from the directory where it resides. It will fail to operate correctly if run from any other location.

Verify the orca script is present and executable by running ./orca --help. You should see usage information.

Verify that orca can contact the hosts you wish to upgrade by running ./orca --hosts <host1,host2,…​> status. You should see status information, such as the current live cluster and lists of services, about all the hosts specified.

Note
SSH access to hosts

orca requires passwordless SSH access to hosts, i.e. all hosts in the cluster must be accessible from the machine running orca using SSH keys. If this is not the case, orca will throw an error saying it is unable to contact one or more of the hosts. Ensure SSH key-based access is set up to all such hosts and retry the status command until it works.

install.properties

Upgrading requires an install.properties file. Such a file was generated by the installer when installing the product manually (the "non-interactive mode"). When doing an upgrade, copy the install.properties file into the same directory as orca in the unzipped bundle, so as to associate it with this upgrade.

If you have run a previous upgrade for this product, you can reuse the file from that upgrade. If that upgrade followed the recommendations in this document, the file will be in the /home/sentinel/install directory on whichever remote Rhino host is regarded as the main one. If this installation has never been upgraded, you may be able to find the file in the hidden /home/sentinel/.install directory on the host where the installation was performed; if it exists, it will be prefixed with the product name, i.e. ipsmgw.install.properties. In both cases this remote Rhino host is probably not the machine from which you are running orca, so copy the file from the remote host to the machine where orca resides, renaming it from ipsmgw.install.properties to install.properties if necessary.

If you do not have an install.properties file from a previous installation or upgrade, you can regenerate it by running the installer on a scratch VM with Rhino installed - follow the instructions in the Administration Guide. When the installer asks "Install now?", you can select no and the installation procedure will terminate early, but still write the install.properties file. As above, copy the file from the scratch VM to the machine where you have unzipped the upgrade bundle to run orca from.

Limitations

This page describes limitations, known issues and workarounds to be aware of that relate to the Sentinel IP-SM-GW upgrade process.

General limitations

  • Each machine in the Rhino cluster must contain only one Rhino node and the Rhino installation must conform to orca's standardized path structure.

  • orca cannot auto-detect the cluster membership, so to upgrade the whole cluster it must always be given the full list of hosts in the cluster.

  • All products monitored by a single Rhino Element Manager (REM) instance must be upgraded together, and in a fixed order. See here for more information.

  • While orca is waiting to shut down a node, the node will continue to accept new traffic that is directed at it. If, because of new or pre-existing traffic, the node does not become completely free of traffic before the given shutdown timeout is reached, the node will be killed and any active calls on it will be terminated. The upgrade should therefore be done during a maintenance window when overall traffic is low, so that the fewest users are inconvenienced by terminated calls.

  • Tracer levels (and for Rhino 2.6.0 or later, notification sources and loggers) are not maintained across an upgrade. If you previously set some custom levels which you wish to retain, then you will need to manually reconfigure them after the upgrade is complete. This can be done either using rhino-console commands, or via the Monitoring menu in REM.

  • After upgrade, SNMP OIDs for some objects and their alarms/statistics may be different. You will need to reconfigure monitoring systems with the new OIDs. (This is a temporary limitation that will be addressed in a forthcoming release.)

  • Symmetric activation state mode must be enabled prior to an upgrade. If your cluster normally operates with symmetric activation mode disabled, you will need to enable it before the upgrade, and after the upgrade disable it again and possibly adjust service and RA entity activation states to the desired settings. Refer to the Rhino documentation here for more information.

  • orca only supports ASCII characters in pathnames (notably the Rhino installation path on each node).

Limitations when upgrade includes an installation of a new Rhino version

  • Rhino configuration in node-XXX/config/pernode-mlet.conf and node-XXX/config/permachine-mlet.conf is not preserved. Refer to the Rhino Administration guide for information on how to reconfigure this if required.

Limitations when upgrade includes an installation of a new Java version

  • The new Java installation will only be used for the Rhino cluster being upgraded. If there are other applications on the machine that also use Java, these will need to be manually reconfigured. See here for more information.

  • When upgrading from Java 7 to Java 8, deprecated perm-gen settings will not be removed from Rhino configuration. This may result in benign warnings appearing. You can manually remove the deprecated configuration at a later stage.

Executing a Sentinel IP-SM-GW Upgrade

This section describes the steps to perform the Sentinel IP-SM-GW upgrade. Be sure you are familiar with the upgrade overview and terminology.

Pre-Upgrade Checklist

This page describes steps to take before starting a Sentinel IP-SM-GW upgrade.

Note that there is no specific need to take an export prior to starting the upgrade. Taking an export is the first step that orca will perform when issued the upgrade command, and the export will be available at the end of the upgrade process.

Create validation test plans

Devise a validation test plan to ensure that, after the upgrade:

  • traffic is routed as expected and calls work (including audio and, where appropriate, video)

  • any features in the downlevel version that you rely on still work as expected

  • any new features you are expecting to use in the uplevel version work as expected.

Your network configuration may support setting up test numbers where calls to these numbers are routed to specific nodes. This is a particularly useful tool for verifying the upgrade early, as after upgrading the first node you can make calls using test numbers configured to route specifically to the first node.

During or after the upgrade, if you find the validation tests are failing, contact your Metaswitch Customer Care Representative to discuss next steps.

Ensure the nodes are configured with the standardized path structure

orca requires the nodes to have a standardized path structure. Within the HOME directory (normally /home/sentinel) there must be:

  • one or more Rhino installation directories with a name of the following format: ipsmgw-<version>-cluster-<id>, e.g. ipsmgw-2.7.1.2-cluster-41

  • a symbolic link named rhino pointing at the currently live Rhino installation

  • a directory named export

  • a directory named install.

See Standardized Path Structure for more information.

If your installation does not conform to this setup, consult your Metaswitch Customer Care Representative who can advise on how to convert it.
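The layout described above can be checked with a short script run on each node. This is a minimal Python 3 sketch, not an orca feature: path_structure_problems is a hypothetical helper, and the directory-name pattern is inferred from the example ipsmgw-2.7.1.2-cluster-41.

```python
import os
import re

# Pattern inferred from the example name ipsmgw-2.7.1.2-cluster-41.
CLUSTER_DIR = re.compile(r"^ipsmgw-\d+(\.\d+)*-cluster-\d+$")


def path_structure_problems(home):
    """Return a list of deviations from the standardized path structure."""
    problems = []
    has_cluster_dir = any(
        CLUSTER_DIR.match(name) and os.path.isdir(os.path.join(home, name))
        for name in os.listdir(home)
    )
    if not has_cluster_dir:
        problems.append("no ipsmgw-<version>-cluster-<id> directory")
    # The live installation is referenced through a symbolic link.
    if not os.path.islink(os.path.join(home, "rhino")):
        problems.append("rhino is not a symbolic link")
    for name in ("export", "install"):
        if not os.path.isdir(os.path.join(home, name)):
            problems.append("missing directory: " + name)
    return problems
```

An empty list from path_structure_problems("/home/sentinel") means the four items listed above are present; it does not verify anything else that orca may require.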

Record the old cluster ID

In case you need to roll back the upgrade, you will need the current cluster ID.

Execute the status command on all the hosts:

./orca --hosts <host1,host2,…> status

The first part of the output for each host will include the cluster status. One of the clusters should be marked as "LIVE".

Status of host host1

Clusters:
 - ipsmgw-2.7.1.2-cluster-41
 - ipsmgw-2.7.1.4-cluster-42 - LIVE

On the "LIVE" cluster, the last part of the name given is the cluster ID. For example, in the above output the cluster ID is 42.

Ensure all hosts report the same live cluster ID. Make a note of this value.
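If you save the status output to a file, the live cluster ID can be extracted automatically. The following minimal Python 3 sketch assumes the cluster lines look exactly like the sample output above; the function names are hypothetical, and the real output may contain other sections that this sketch simply ignores.

```python
import re

# Matches lines such as " - ipsmgw-2.7.1.4-cluster-42 - LIVE".
LIVE_LINE = re.compile(r"^\s*-\s+\S+-cluster-(\d+)\s+-\s+LIVE\s*$")


def live_cluster_ids(status_output):
    """Return the cluster ID of every line marked LIVE, in order."""
    ids = []
    for line in status_output.splitlines():
        match = LIVE_LINE.match(line)
        if match:
            ids.append(int(match.group(1)))
    return ids


def common_live_cluster_id(status_output):
    """Return the single live cluster ID shared by all hosts."""
    ids = set(live_cluster_ids(status_output))
    if len(ids) != 1:
        raise ValueError("expected exactly one live cluster ID, got %r" % ids)
    return ids.pop()
```

For the sample output above, common_live_cluster_id returns 42. A ValueError indicates that the hosts disagree or that no LIVE line was found, and the output should be inspected by hand.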

Verify license validity

In the status output from the previous section, check the License information section for each host to ensure the license is valid and will remain valid until after the planned finish time of the upgrade.

If your upgrade includes a new major Rhino version, e.g. 2.6.0 → 2.7.0, then you may require a new license, as all products deployed on Rhino are licensed against a Rhino version. Your Metaswitch Customer Care Representative should be able to provide you with one, in which case you should copy it to the directory where you extracted the upgrade bundle, and pass it to orca by appending the parameter --license <license file> when running the first upgrade command. Alternatively, they may advise that the license is already present in the upgrade bundle, or that no new license is needed (in which case orca will automatically transfer the existing license to the uplevel cluster).

Verify disk space

Log into each node in turn and verify the disk space using df -h. Ensure there is at least 3GB of free disk space. Refer to this section if you need to clean up old clusters and/or exports in order to free disk space.

Check for unexpected alarms

Log into REM and access the connection to the Rhino cluster. Check there are no unexpected alarms. Fix any issues you find. If required, make a note of expected active alarms for correlation with the post-upgrade list of alarms.

Upgrade Rhino Element Manager (REM) nodes

With REM upgraded, you will be able to monitor the nodes as they join the new cluster during the upgrade process.

Check that other network elements are configured to redirect traffic on failure

During the upgrade, at any given point one node of the cluster will be unavailable to handle traffic. As such, calls directed to this node will fail. Network elements such as load balancers and S-CSCFs generally utilize a blacklist mechanism, so that an INVITE that gets a timeout failure causes the call to be retried on a different node, and also blocks the failed node from being tried for further calls for a while afterwards.

Enable symmetric activation state mode if required

orca only supports upgrading a cluster with symmetric activation state mode enabled. If your cluster normally operates with it disabled, you will need to enable it here and restore the state after the upgrade.

To check if the cluster has symmetric activation state mode enabled, examine the orca status output. At the top under "Global info" it will say Symmetric activation state mode is currently enabled or Nodes with per-node activation state: <node list>. If you see the former output then symmetric activation state mode is enabled, while the latter means it is disabled.

To enable symmetric activation state mode, follow the instructions in the Rhino documentation here.

Rerun the orca status command ./orca --hosts <host1,host2,…> status and verify that the status reports Symmetric activation state mode is currently enabled.
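The check on the status output can be scripted. This minimal Python 3 sketch looks only for the two messages quoted in this section; symmetric_mode_enabled is a hypothetical helper, and it returns None when neither message is found so that the output can be inspected by hand.

```python
ENABLED_TEXT = "Symmetric activation state mode is currently enabled"
DISABLED_PREFIX = "Nodes with per-node activation state:"


def symmetric_mode_enabled(status_output):
    """Return True, False, or None if the status text is inconclusive."""
    if ENABLED_TEXT in status_output:
        return True
    if DISABLED_PREFIX in status_output:
        return False
    return None
```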

Transformation Rules

This page describes the migration performed via the transformation rules during a major upgrade of Sentinel IP-SM-GW.

Transformation rules are very version specific, so make sure that you are reading the docs for the specific version you are upgrading to.

The rules perform the following:

Transformation rules for upgrading IPSMGW 2.7.1 to 2.8.0

Detailed descriptions are yet to be written.

Post-Upgrade Checklist

This page describes the actions to take after the upgrade is completed and all relevant validation tests have passed.

Clean up REM connections

You should be able to access the entire Rhino cluster using the original connection configured in REM. If you created an additional connection to monitor the new cluster during the upgrade, delete it now.

Note that when refreshing the REM web application, be sure to do a "hard refresh" (Ctrl+F5 in most browsers) so that the browser retrieves up-to-date information from the REM server rather than reloading from its cache.

Check service state and alarms

Note
Symmetric activation state mode

Symmetric activation state mode must be enabled prior to the upgrade. If your deployment normally operates with it disabled, you will need to manually disable it at this point. See Symmetric Activation State in the Rhino documentation. You may also need to update RA entity activation states.

Log into REM and access the connection to the Rhino cluster. Check that all services are active on all nodes. Also check that the RA entity activation state is as expected on all nodes. Check for unexpected alarms.

Verify SAS configuration

If your deployment is now running Rhino 2.6.0 or later and includes a MetaView Service Assurance Server (SAS), verify the SAS configuration in Rhino is as expected. See SAS Tracing in the Rhino documentation.

Check that calls made in your validation tests appear in SAS.

Restore external network element configuration

If you made any changes to external network elements (for example to time out INVITEs more quickly or to redirect traffic), then restore the original configuration on those elements.

Archive the downlevel export generated during the upgrade, and generate a new export

On the first node, orca will have generated an export of the Rhino installation at the downlevel version, prior to any migration or upgrade steps. This can be found in the ~/export directory, labelled with the version and cluster ID, and is a useful "restore point" in case problems with the upgrade are encountered later that cannot be simply undone with the rollback command. Copy (for example using rsync) the downlevel export directory with all its contents to your backup storage, if you have one.

Follow the Rhino documentation to generate a post-upgrade export and archive this too.

Note
Uplevel exports generated during major upgrade

Note that during major upgrade orca will automatically generate an export of the uplevel installation. However, this export is intended for orca's use only and is not suitable to restore from - for example, it will not include a large amount of Rhino configuration nor any feature script changes made during the merge feature scripts step. There will also be an export labelled with transformed-for-<uplevel version>, to which the same caveat applies.

These exports can be ignored, and deleted at a later date.

Archive the upgrade logs

In the directory from which orca was run, there will be a directory logs containing many subdirectories with log files. Copy (for example using rsync) this logs directory with all its contents to your backup storage, if you have one. These logs can be useful for Metaswitch Customer Care in the event of a problem with the upgrade.

Save the install.properties file

It is a good idea to save the install.properties file for use in a future upgrade.

Remember that the file is currently on the machine that you chose to run orca from, and that may not be the same machine chosen by someone doing a future upgrade.

A good location to put the file is in a /home/sentinel/install directory on whatever you regard as the main Rhino host. This may be the one you always provide first in the list of hosts for example, or the one originally used to install Rhino in the first place.

Configure new features

If the uplevel software introduced new features that you plan to use in your deployment, you may wish to configure and verify them here. Refer to the Sentinel IP-SM-GW manuals and changelogs for more information.

If required, clean up downlevel clusters and unneeded exports

Once the upgrade is confirmed to be working, you may wish to clean up old downlevel cluster(s) to save disk space.

Run the status command to view existing clusters and exports

Run the status command and observe the clusters and exports sections of the output.

./orca --hosts <host1,host2,host3,…​> status

Tip

For upgrades, exports are only generated on the first node.

Identify any clusters or exports you no longer wish to keep. Note their cluster IDs; the ID is the last part of each cluster name. For example, given this output:

Status of host host1

Clusters:
 - ipsmgw-2.7.1.2-cluster-41
 - ipsmgw-2.7.1.4-cluster-42
 - ipsmgw-2.8.0.2-cluster-43 - LIVE

[...]

Exports:
 - ipsmgw-2.7.1.2-cluster-41
 - ipsmgw-2.7.1.4-cluster-42
 - ipsmgw-2.7.1.4-cluster-42-transformed-for-2.8.0.2
 - ipsmgw-2.8.0.2-cluster-43

[...]

Status of host host2

Clusters:
 - ipsmgw-2.7.1.2-cluster-41
 - ipsmgw-2.7.1.4-cluster-42
 - ipsmgw-2.8.0.2-cluster-43 - LIVE

[...]

you may decide to delete cluster 41 and exports 41 and 42.

Tip
Retain one old cluster

You are advised to always leave the most recent downlevel cluster in place as a fallback.

Be sure you have an external backup of any export directories you plan to delete, unless you are absolutely sure that you will not need them in the future.

Run the cleanup command to delete clusters and exports

Run the cleanup command, specifying the clusters and exports to delete as comma-separated lists of IDs (without whitespace). For example:

./orca --hosts <host1,host2,host3,...> cleanup --clusters 41
./orca --hosts <host1> cleanup --exports 41,42
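The selection and the command line can be derived from the cluster lines of the status output. The following minimal Python 3 sketch uses hypothetical helper names. It assumes that cluster lines follow the sample format shown earlier and that the non-live cluster with the highest ID is the most recent downlevel cluster, which may not hold after a rollback. It builds one option per command, as in the examples above.

```python
import re

# Matches " - ipsmgw-2.7.1.2-cluster-41" with an optional " - LIVE" suffix.
CLUSTER_LINE = re.compile(r"^\s*-\s+\S+-cluster-(\d+)(\s+-\s+LIVE)?\s*$")


def clusters_safe_to_delete(cluster_lines):
    """Return non-live cluster IDs, keeping the most recent one as fallback."""
    live_found = False
    others = []
    for line in cluster_lines:
        match = CLUSTER_LINE.match(line)
        if not match:
            continue
        if match.group(2):
            live_found = True
        else:
            others.append(int(match.group(1)))
    if not live_found:
        raise ValueError("no LIVE cluster found; refusing to suggest deletions")
    # Assumption: the highest non-live ID is the most recent downlevel cluster.
    return sorted(others)[:-1]


def cleanup_command(hosts, kind, ids):
    """Build the argument list for one orca cleanup command."""
    if kind not in ("clusters", "exports"):
        raise ValueError("kind must be 'clusters' or 'exports'")
    id_list = ",".join(str(i) for i in ids)
    return ["./orca", "--hosts", ",".join(hosts), "cleanup", "--" + kind, id_list]
```

For the three cluster lines in the example status output, clusters_safe_to_delete returns [41], matching the decision described above to delete cluster 41 and keep cluster 42 as the fallback.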

Update Java version of other applications running on the hosts

If performing a major upgrade with a new version of Java, orca will install the new Java but it will only be applied to the new Rhino installation. Global environment variables such as JAVA_HOME, or other applications that use Java, will not be updated.

The new Java installation can be found at ~/java/<version> where <version> is the JDK version that orca installed, e.g. 8u162. If you want other applications running on the node to use this new Java installation, update the appropriate environment variables and/or configuration files to reference this directory.

Upgrade Process

Before you begin the upgrade, ensure you have completed the pre-upgrade checklist.

Upgrade process

Important
Specify all hosts

Unless otherwise specified, in all orca commands, you should specify all the hosts in the cluster. orca will automatically handle the split between the first node and the others.

Hosts are specified as a comma-separated list (without whitespace), e.g. --hosts host1,host2,host3. They can be specified as IP addresses or hostnames. Specify the nodes in descending order of node ID.

Note that the following process is for a major upgrade. Minor and major upgrades are very similar; the differences for minor upgrade are listed inline.

Upgrade the first node

Start the upgrade using the following command. For minor upgrade, replace major-upgrade with minor-upgrade.

./orca --hosts <host1,host2,…> major-upgrade --stop-timeout <timeout> --pause packages <install.properties>

where

  • <timeout> is the maximum amount of time you wish to allow for all calls to stop on each node

  • <install.properties> is the path to the install.properties file

  • packages is a literal name, and should not be changed.

If your upgrade includes a new Rhino version and you have been given a separate license file, specify this here by appending the parameter --license <path to license file>.

This will take approximately 30 minutes. At the end of the upgrade process, you will be prompted to continue the operation when ready:

Major upgrade has been paused after applying it to just host1.
You should now test that the major upgrade has worked.
Once this is verified, use the following command to complete the major upgrade:
  ./orca --hosts host1,host2,host3 major-upgrade --continue packages install.properties

If the upgrade failed, refer to Troubleshooting to resolve the problem. You will normally need to roll back to the original cluster in order to try again.
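If you script the upgrade, the command line can be assembled from its parts so that the same inputs are reused for the later --continue step. This is a minimal Python 3 sketch; upgrade_command is a hypothetical helper, the timeout value is passed through unchanged because its accepted format is not described here, and only options that appear in this guide are used.

```python
def upgrade_command(hosts, stop_timeout, install_properties,
                    kind="major", stage="pause", license_file=None):
    """Build the orca argument list for a major or minor upgrade step."""
    if kind not in ("major", "minor"):
        raise ValueError("kind must be 'major' or 'minor'")
    if stage not in ("pause", "continue"):
        raise ValueError("stage must be 'pause' or 'continue'")
    argv = [
        "./orca", "--hosts", ",".join(hosts),
        kind + "-upgrade",
        "--stop-timeout", str(stop_timeout),
        "--" + stage,
        "packages",  # literal name, not to be changed
        install_properties,
    ]
    # The license option is appended only when a separate file was provided.
    if license_file is not None:
        argv += ["--license", license_file]
    return argv
```

Calling the function once with stage="pause" and later with stage="continue", using the same host list, helps keep the two commands consistent. Remember that orca must be run from the directory where it resides.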

Verify the new node is visible in REM

Log into the REM web application. Create a new connection to the first host and connect to it. You should be able to see information about the first node. Ensure there are no unexpected alarms.

Edit the connection to include the other hosts, by updating the address field from a single host address to a list of all the required hosts.

Merge and import feature scripts

If the output from orca prompted you to merge feature scripts, follow the instructions in Feature Scripts conflicts and resolution to merge and import the feature scripts into the uplevel installation. Note that when running orca's import-feature-scripts command, you need only specify the first host.

Perform first node validation tests

If your test plan includes validating the first node after upgrade, run these tests now.

Upgrade the rest of the nodes

Run the same command as in the "Upgrade the first node" section, but replace the --pause with --continue. For minor upgrade, replace major-upgrade with minor-upgrade.

./orca --hosts <host1,host2,…​> major-upgrade --stop-timeout <timeout> --continue packages <install.properties>

If the output from orca reports Given hosts are not in the correct state to continue, you have probably not issued the correct command to continue. In particular, the first host in the --hosts list must be the one hosting the first node that was upgraded, and all the other hosts must also have been present in the original list, since those are the only ones that have been correctly prepared for the upgrade. There are two simple ways to get the correct command:

  • either copy the command from the orca output, which tells you the exact command to use

  • or find the original command in your terminal command history, and simply replace the --pause with --continue.
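The second option, reusing the original command with --pause swapped for --continue, can be done mechanically. This is a minimal sketch; the function name to_continue_cmd is hypothetical, and it only rewrites text, so the result should still be checked against the command that orca printed.

```shell
#!/bin/sh
# Hypothetical helper: turn the original --pause command into its --continue form.
to_continue_cmd() {
    # Replace the first occurrence of --pause; everything else is left untouched.
    printf '%s\n' "$1" | sed 's/--pause/--continue/'
}

pause_cmd='./orca --hosts host1,host2,host3 major-upgrade --stop-timeout <timeout> --pause packages install.properties'
to_continue_cmd "$pause_cmd"
```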

During this stage, each remaining host is migrated to the new cluster one by one. If you remain logged into REM and are viewing the connection you created earlier, each node becomes visible in REM as it is migrated into the new cluster. Some unexpected alarms may temporarily appear on these new nodes until they have fully deployed the new software and configuration, because a mixture of old and new is in place. Once the new software and configuration are fully deployed, these alarms disappear; their temporary appearance is nothing to worry about.

Tip
Migrating in two or more maintenance windows

For the purposes of this example, suppose there are five hosts named node41 through node45, and that each host has a node whose ID matches the host name: host node41 has node 41, host node42 has node 42, and so on through host node45 with node 45. The node ID order is therefore the same as the host name order. Hosts must be specified in reverse node ID order, so we use node45 down to node41. You wish to upgrade three hosts in the first maintenance window and two in the second.

In the first maintenance window, you prepare all the hosts, and upgrade the first one using the --pause option

./orca --hosts node45,node44,node43,node42,node41 major-upgrade --pause packages <install.properties>

Once you have tested the first node is performing as expected, you issue the --continue form of the command, giving just the set of nodes that you want to be upgraded in this window

./orca --hosts node45,node44,node43 major-upgrade --continue packages <install.properties>

In the second maintenance window, the command you use must keep the first host the same (since that is the host holding the node whose upgrade is being duplicated to the other nodes), but you change the rest of the list to be the remainder of the hosts that need to be upgraded

./orca --hosts node45,node42,node41 major-upgrade --continue packages <install.properties>
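The rule for later windows (keep the first upgraded host at the front, then list only the hosts for this window) can be captured in a small helper. This is a sketch under the assumptions of the example above; the function name window_hosts is hypothetical, and it only builds the --hosts value.

```shell
#!/bin/sh
# Hypothetical helper: build the --hosts value for one maintenance window.
# Usage: window_hosts <first upgraded host> <host> [<host> ...]
window_hosts() {
    first=$1
    shift
    list=$first
    for h in "$@"; do
        # The first host is already at the front of the list; do not repeat it.
        if [ "$h" != "$first" ]; then
            list="$list,$h"
        fi
    done
    printf '%s\n' "$list"
}

window_hosts node45 node44 node43   # first window:  node45,node44,node43
window_hosts node45 node42 node41   # second window: node45,node42,node41
```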

Perform post-upgrade validation tests

Perform your post-upgrade test plan at this stage.

The upgrade is now complete.

Next steps

Aborting or Reverting an Upgrade

This page describes the steps required to abort an upgrade partway through, or revert back to the downlevel system after the upgrade is complete.

Rolling back the upgrade will take approximately 10 minutes per affected node. Which nodes are affected depends on which nodes were partially or fully upgraded.

Be sure you know the original cluster ID as noted in the pre-upgrade checklist.

Note that in the normal case there should be no need to restore the Rhino cluster from an export. The cluster directory being rolled back to will contain all relevant configuration. However, any configuration changes made since the upgrade started will be lost.

Restore network element configuration

If, in preparation for the upgrade, you made changes to other elements in your network, you should undo these changes before starting the rollback procedure.

Check the current cluster status

Use orca's status command to check the current cluster status.

./orca --hosts <host1,host2,host3,…​> status

For each host there may be multiple clusters listed, but only one of those will be marked as LIVE, so note its ID, which is the last part of the cluster name. If the various hosts have nodes in different 'LIVE' clusters, then any nodes in a 'LIVE' cluster that is not the original live cluster will need to have their hosts rolled back. (In most circumstances, the original live cluster will have a lower ID number, so rollback will normally be needed for any host that does not have that lowest number cluster ID marked as being 'LIVE').
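Since the cluster ID is the last part of the cluster name, it can be extracted with plain shell parameter expansion once you have read the LIVE cluster name from the status output. The sketch below assumes cluster names of the form <product>-<version>-cluster-<id>, such as volte-2.7.1.2-cluster-50. The function names are hypothetical, and the format of the status output itself is not parsed here.

```shell
#!/bin/sh
# Hypothetical helpers for comparing a LIVE cluster against the original cluster ID.
cluster_id() {
    # Strip everything up to and including the last hyphen of the cluster name.
    printf '%s\n' "${1##*-}"
}

needs_rollback() {
    # Succeeds (exit status 0) when the LIVE cluster is not the original one.
    # $1 = LIVE cluster name, $2 = original cluster ID from the pre-upgrade checklist
    [ "$(cluster_id "$1")" != "$2" ]
}

cluster_id volte-2.7.1.2-cluster-50
if needs_rollback volte-2.8.0.3-cluster-51 50; then
    echo "host needs rollback"
fi
```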

If required, run the rollback command

If you identified any hosts that need to be rolled back, run the rollback command. Hosts are specified as a comma-separated list without whitespace, e.g. --hosts host1,host2. Specify the hosts in reverse order of node ID (that is, highest node ID first).

./orca --hosts <hosts to be rolled back> rollback --stop-timeout <timeout>

where <timeout> is the time you wish to allow for traffic to drain on each node.

If you are rolling back every host in the cluster (or more specifically, the cluster ID that you are rolling back to currently contains no members), then append the parameter --special-first-node to the above command line. This instructs orca that the first node specified will need to be made primary so that the cluster starts correctly.

This procedure will take approximately 10 minutes per host specified.
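The --hosts value for rollback must list hosts with the highest node ID first. If you keep a list of node ID and host name pairs, the ordering can be produced with standard tools. This is a sketch only: the function name and the "node-id host" input format are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read "node-id host" pairs on stdin and print the hosts
# as a comma-separated list, highest node ID first.
hosts_highest_node_first() {
    sort -k1,1nr | awk '{ print $2 }' | paste -sd, -
}

printf '%s\n' '41 node41' '43 node43' '42 node42' | hosts_highest_node_first
```

The resulting list can be passed directly as the value of --hosts in the rollback command.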

If required, run the cleanup command

Follow the instructions here.

Perform validation tests

Ensure you can log into REM and connect to the cluster.

  • Ensure all services are started.

  • Ensure RA entity activation state is as expected. Start or stop any RA entities as required.

  • Check for and resolve unexpected alarms.

Perform test calls and other validation tests to ensure the rolled-back cluster is operating normally.

The rollback procedure is now complete.

Next steps

Discuss with your Metaswitch Customer Care Representative the next steps, for example obtaining a patch or more recent release that addresses any problems encountered during the upgrade, or retrying the upgrade at a later date.

Feature Scripts conflicts and resolution

Resolving the Feature Scripts conflicts

After orca finishes the major upgrade on the first node, there might be Feature Scripts conflicts which need to be resolved and applied to the system for correct operation.

The files in the feature-scripts path will contain the scripts with:

  • the proposed merged version

  • the installed version

  • the new version (uplevel version)

  • the original downlevel version

The cases that require full manual intervention are ones where the file presents this line:

<merge conflict: some message>, e.g, <merge conflict: all three versions differ>
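To find the files that still need manual intervention, you can search for the marker text. The following is a minimal sketch that assumes the scripts are plain files directly inside the directory you pass in; the function name conflicted_scripts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the files in a directory that still contain a
# "<merge conflict:" marker line.
conflicted_scripts() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # -l prints only the names of matching files; no match is not an error here.
    grep -l '<merge conflict:' "$dir"/* 2>/dev/null || true
}

# Example (run from the directory that contains the feature-scripts path):
#   conflicted_scripts feature-scripts
```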

The examples below show how to resolve conflicts in the Feature Scripts.

Example of default_Post_SipAccess_SubscriberCheck:

<merge conflict: all three versions differ>

### Write your script above. This line, and anything below it, will be removed

=== Currently installed version of script

featurescript SipAccessSubscriberCheck-SysPost-Default {
    if not session.MonitorCallOnly {
        run B2BUAScurPostFeature
    }
    run SDPRewriter
    run SDPMonitor mode "post"
    run RecordTimestamps mode "inbound"
    run ExtractNetworkInfo
    run RemoveHeadersFromOutgoingMessages
}

>>> New version of script

featurescript SipAccessSubscriberCheck-SysPost-Default {
    if not session.MonitorCallOnly {
        run B2BUAScurPostFeature
    }
    run SDPRewriter
    run SDPMonitor mode "post"
    run RecordTimestamps mode "outbound"
    run ExtractNetworkInfo
    run ExternalSessionTracking
    run RemoveHeadersFromOutgoingMessages
}

<<< Original version of script

featurescript SipAccessSubscriberCheck-SysPost-Default {
    if not session.MonitorCallOnly {
        run B2BUAScurPostFeature
    }
    run SDPRewriter
    run SDPMonitor mode "post"
    run ExtractNetworkInfo
    run RemoveHeadersFromOutgoingMessages
}

In this case all three versions differ: the installed version adds run RecordTimestamps mode "inbound" to the original script, whereas the new version uses run RecordTimestamps mode "outbound" and also adds run ExternalSessionTracking.

One correct solution would be to keep the new version of the script. The file after editing would have:

featurescript SipAccessSubscriberCheck-SysPost-Default {
    if not session.MonitorCallOnly {
        run B2BUAScurPostFeature
    }
    run SDPRewriter
    run SDPMonitor mode "post"
    run RecordTimestamps mode "outbound"
    run ExtractNetworkInfo
    run ExternalSessionTracking
    run RemoveHeadersFromOutgoingMessages
}

### Write your script above. This line, and anything below it, will be removed

=== Currently installed version of script

featurescript SipAccessSubscriberCheck-SysPost-Default {
    if not session.MonitorCallOnly {
        run B2BUAScurPostFeature
    }
    run SDPRewriter
    run SDPMonitor mode "post"
    run RecordTimestamps mode "inbound"
    run ExtractNetworkInfo
    run RemoveHeadersFromOutgoingMessages
}

>>> New version of script

featurescript SipAccessSubscriberCheck-SysPost-Default {
    if not session.MonitorCallOnly {
        run B2BUAScurPostFeature
    }
    run SDPRewriter
    run SDPMonitor mode "post"
    run RecordTimestamps mode "outbound"
    run ExtractNetworkInfo
    run ExternalSessionTracking
    run RemoveHeadersFromOutgoingMessages
}

<<< Original version of script

featurescript SipAccessSubscriberCheck-SysPost-Default {
    if not session.MonitorCallOnly {
        run B2BUAScurPostFeature
    }
    run SDPRewriter
    run SDPMonitor mode "post"
    run ExtractNetworkInfo
    run RemoveHeadersFromOutgoingMessages
}
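Because the marker line and everything below it are removed, it can be useful to preview only the part of an edited file that sits above the marker before importing. The sketch below approximates that with sed. It is not orca's own implementation, and the function name script_above_marker is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the part of a feature script file that lies
# above the "### Write your script above" marker line.
script_above_marker() {
    # Delete from the marker line through to the end of the file.
    sed '/^### Write your script above/,$d' "$1"
}

# Example:
#   script_above_marker feature-scripts/default_Post_SipAccess_SubscriberCheck
```

If the preview still begins with a <merge conflict: ...> line, the file has not been resolved yet.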

Example file MMTelTerm_SipAccess_PartyRequest:

featurescript SipAccessPartyRequest-User-MmtelTerm {
    if session.ICBBarredWithAnnouncement or session.PlayCDIVAnnouncement or session.PlayCWAnnouncement or session.EndSessionWithAnnouncement {
        run SipPlayAnnouncement
    }
    if not session.FlexibleAlertingMode.NONE {
        if session.FlexibleAlertingMode.PARALLEL {
            run MMTelParallelFA
        }
        else {
            run MMTelSequentialFA
        }
    }
    run MMTelOIP
    run MMTelECT
    run MMTelStodProcessHandover
    run DetermineChargeableLeg
    if session.AccessLegTrackingActive {
        run AccessLegTracking
    }
}

### Write your script above. This line, and anything below it, will be removed

=== Currently installed version of script

featurescript SipAccessPartyRequest-User-MmtelTerm {
    if session.ICBBarredWithAnnouncement or session.PlayCDIVAnnouncement or session.PlayCWAnnouncement or session.EndSessionWithAnnouncement {
        run SipPlayAnnouncement
    }
    if not session.FlexibleAlertingMode.NONE {
        if session.FlexibleAlertingMode.PARALLEL {
            run MMTelParallelFA
        }
        else {
            run MMTelSequentialFA
        }
    }
    run MMTelOIP
    run MMTelECT
    run MMTelStodProcessHandover
    run DetermineChargeableLeg
}

>>> New version of script

featurescript SipAccessPartyRequest-User-MmtelTerm {
    if session.ICBBarredWithAnnouncement or session.PlayCDIVAnnouncement or session.PlayCWAnnouncement or session.EndSessionWithAnnouncement {
        run SipPlayAnnouncement
    }
    if not session.FlexibleAlertingMode.NONE {
        if session.FlexibleAlertingMode.PARALLEL {
            run MMTelParallelFA
        }
        else {
            run MMTelSequentialFA
        }
    }
    run MMTelOIP
    run MMTelECT
    run MMTelStodProcessHandover
    run DetermineChargeableLeg
    if session.AccessLegTrackingActive {
        run AccessLegTracking
    }
}

<<< Original version of script

featurescript SipAccessPartyRequest-User-MmtelTerm {
    if session.ICBBarredWithAnnouncement or session.PlayCDIVAnnouncement or session.PlayCWAnnouncement or session.EndSessionWithAnnouncement {
        run SipPlayAnnouncement
    }
    if not session.FlexibleAlertingMode.NONE {
        if session.FlexibleAlertingMode.PARALLEL {
            run MMTelParallelFA
        }
        else {
            run MMTelSequentialFA
        }
    }
    run MMTelOIP
    run MMTelECT
    run MMTelStodProcessHandover
    run DetermineChargeableLeg
}

The only difference is that the new version adds the following lines:

    if session.AccessLegTrackingActive {
        run AccessLegTracking
    }

This matches case 2, so the uplevel version is the correct one to use and no changes to the file are required.

Importing the Feature Scripts after resolving the conflicts

After the conflicts are resolved, run the following command from the same directory from which you ran the major upgrade:

./orca --hosts <first host> import-feature-scripts

The output should be:

Importing feature script SCCTerm_HLR_SipAccess_ServiceTimer...
Importing feature script default_Post_SipMidSession_ChargingReauth...
Importing feature script MMTelOrig_SipMidSession_PartyRequest...
Importing feature script MMTelConf_SipAccess_SubscriberCheck...
Importing feature script SCCTermAnchor_SipAccess_ServiceTimer...
Importing feature script SCC_Post_SipEndSession...
Importing feature script MMTel_Pre_SipAccess_SessionStart...
Importing feature script SCC_SipAccess_PartyRequest...

... other feature scripts ...

Done on localhost

If some Feature Script files are not correct, orca will print warnings:

5 scripts could not be imported (see above for errors):
  - MMTel_Pre_SipMidSession_PartyRequest
  - SCC_Post_SipAccess_PartyRequest
  - MMTel_Post_SipMidSession_PartyRequest
  - default_Post_SipMidSession_PartyResponse
  - MMTelOrig_Post_SubscriptionSipResponse

Fix the listed scripts and repeat the import procedure above.

Troubleshooting

Besides the information on the console, orca writes detailed output of the actions taken to a log file. By default, the log file is located under the path logs on the host that executed the command.

orca can’t connect to the remote hosts

Check if the trusted connection via ssh is working. The command ssh <the host to connect to> should work without asking for a password.

You can add a trusted connection by following the steps below.

  • Create an SSH key using the default location and an empty passphrase. Just press Enter until you are done.

ssh-keygen -t rsa

  • Copy the SSH key onto all VMs that need to be accessed, including the node you are on. You will have to enter the password for the user.

ssh-copy-id -i $HOME/.ssh/id_rsa.pub sentinel@<VM_ADDRESS>

where VM_ADDRESS is the host name you want the key to be copied to.

To check, run:

ssh VM_ADDRESS

It should return a shell of the remote host (VM_ADDRESS).
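To check every host in one pass without risking a password prompt, the same test can be run in a loop with OpenSSH's BatchMode option, which makes ssh fail instead of asking for a password. This is a minimal sketch: the function name check_ssh_hosts is hypothetical, and the five-second connection timeout is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: verify passwordless SSH to each host in a
# comma-separated list (the same form as orca's --hosts value).
check_ssh_hosts() {
    failed=0
    old_ifs=$IFS
    IFS=,
    for host in $1; do
        IFS=$old_ifs
        # BatchMode=yes disables password prompts, so a missing key shows up as a failure.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    IFS=$old_ifs
    return $failed
}

# Example:
#   check_ssh_hosts host1,host2,host3
```

The function returns a non-zero status if any host could not be reached without a password.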

orca failed to create the management database

orca uses the psql client to connect to and do operations on the PostgreSQL database. Check the first host in the --hosts list has psql installed.

Get the database connection details from the file /home/sentinel/rhino/node-xxx/config/config_variables. Search for the properties MANAGEMENT_DATABASE_NAME, MANAGEMENT_DATABASE_HOST, MANAGEMENT_DATABASE_PORT, MANAGEMENT_DATABASE_USER and MANAGEMENT_DATABASE_PASSWORD.

Example:

MANAGEMENT_DATABASE_NAME=rhino_50
MANAGEMENT_DATABASE_HOST=postgresql
MANAGEMENT_DATABASE_PORT=5432
MANAGEMENT_DATABASE_USER=rhino_user
MANAGEMENT_DATABASE_PASSWORD=rhino_password

Test the connection:

psql -h postgresql -U rhino_user -p 5432 rhino_50

Enter the password. You should see output similar to the following:

Password for user rhino_user:
psql (9.5.14)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

rhino_50=>
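The psql command above can also be derived directly from config_variables, which avoids copying values by hand. The sketch below assumes the file contains plain KEY=value lines as in the example; the function names are hypothetical. It prints the command rather than running it, and it deliberately leaves the password for psql to prompt for.

```shell
#!/bin/sh
# Hypothetical helpers: build the psql test command from a config_variables file.
config_value() {
    # Print the value of KEY ($2) from a file ($1) of KEY=value lines (first match only).
    sed -n "s/^$2=//p" "$1" | head -n 1
}

psql_cmd_from_config() {
    cfg=$1
    printf 'psql -h %s -U %s -p %s %s\n' \
        "$(config_value "$cfg" MANAGEMENT_DATABASE_HOST)" \
        "$(config_value "$cfg" MANAGEMENT_DATABASE_USER)" \
        "$(config_value "$cfg" MANAGEMENT_DATABASE_PORT)" \
        "$(config_value "$cfg" MANAGEMENT_DATABASE_NAME)"
}

# Example:
#   psql_cmd_from_config /home/sentinel/rhino/node-xxx/config/config_variables
```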

If the connection fails, try again with the database name from the previous cluster. If it still fails, the procedure below is a guide to configuring the database to accept remote connections.

  • Log in to the database host.

  • Open up the pg_hba.conf file, using sudo.

sudo vim /var/lib/pgsql/data/pg_hba.conf

  • Replace the line that looks like this:

host  rhino-xxx  sentinel  xxx.xxx.xxx.xxx/xx  password

    with a line that looks like this:

host  all  sentinel  xxx.xxx.xxx.xxx/xx  password

where rhino-xxx is the name of cluster xxx’s PostgreSQL database, and xxx.xxx.xxx.xxx/xx covers the signalling addresses of the nodes.

  • Reload the configuration.

/usr/bin/pg_ctl reload

  • Or, failing that, restart PostgreSQL.

sudo service postgresql restart

Installer runs but does not install

The recommendation is to roll back the node to the previous cluster, clean up the prepared cluster and start again.

./orca --hosts host1 rollback
./orca --hosts host1,host2,host3 cleanup --cluster <cluster id>
./orca --hosts host1,host2,host3 major-upgrade --pause packages <path to the install.properties>

Rhino nodes shutdown or reboot during migration

orca requires the node being migrated to accept management commands via rhino-console. If the node is not active, orca will fail.

If while executing orca --hosts <list of hosts> major-upgrade packages install.properties --continue the node being currently migrated shuts down or reboots, there are 3 options:

1- Skip the node and proceed to the other nodes and migrate the failed node later

Remove from the orca command the hosts that were already migrated (except for the first host name) and the host that failed, then execute the command orca --hosts <list of hosts> major-upgrade packages install.properties --continue again to continue with the upgrade.

2- Restart the node manually and continue with the upgrade

If the node shut down and has not restarted, execute the rhino.sh start script. If the node rebooted, check whether the rhino-console command works:

<path to the old cluster>/client/bin/rhino-console

Except for the first host name, remove the hosts that were already migrated from the orca command and execute the command orca --hosts <list of hosts> major-upgrade packages install.properties --continue again to continue with the upgrade.

3- Do the migration manually

This is an extreme case, but the procedure is simple.

For each node that was not migrated:

Identify the new cluster

cd $HOME
ls -ls rhino

lrwxrwxrwx 1 sentinel sentinel 24 Nov 21 21:48 rhino -> volte-2.7.1.2-cluster-50

The target of the rhino link is the current cluster name, in the format <product>-<version>-cluster-<id>.

The new cluster has a similar name, <product>-<new version>-cluster-<id + 1>, with a cluster ID one higher. For example: volte-2.8.0.3-cluster-51
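The expected name of the new cluster can be worked out from the current link target and the uplevel version. This is a sketch under the naming pattern described above, and it assumes the version string contains no hyphen. The function name next_cluster_name is hypothetical. Always confirm that the resulting directory actually exists before using it.

```shell
#!/bin/sh
# Hypothetical helper: derive the expected new cluster name.
# Usage: next_cluster_name <current cluster name> <new version>
next_cluster_name() {
    current=$1
    new_version=$2
    id=${current##*-cluster-}       # e.g. 50
    prefix=${current%-cluster-*}    # e.g. volte-2.7.1.2
    product=${prefix%-*}            # e.g. volte (assumes no hyphen in the version)
    printf '%s\n' "$product-$new_version-cluster-$((id + 1))"
}

next_cluster_name volte-2.7.1.2-cluster-50 2.8.0.3   # prints volte-2.8.0.3-cluster-51

# On a host, the current name can be read from the link itself:
#   next_cluster_name "$(readlink rhino)" 2.8.0.3
```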

Kill the current node

rhino/rhino.sh kill

Link the rhino path to the new cluster

rm rhino
ln -s <new cluster> rhino

where <new cluster> is the path identified above. Example: volte-2.8.0.3-cluster-51
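Because the rm and ln steps replace the link that Rhino is started from, it can be worth guarding them with a couple of checks. The sketch below wraps the same two commands and refuses to proceed if the target directory is missing or if rhino is a real directory rather than a symbolic link. The function name relink_rhino is hypothetical, and it is meant to be run from the directory containing the rhino link.

```shell
#!/bin/sh
# Hypothetical helper: point the rhino symbolic link at a new cluster directory.
relink_rhino() {
    target=$1
    if [ ! -d "$target" ]; then
        echo "not a directory: $target" >&2
        return 1
    fi
    if [ -e rhino ] && [ ! -L rhino ]; then
        echo "rhino exists and is not a symbolic link; refusing to remove it" >&2
        return 1
    fi
    # Same steps as in the manual procedure: remove the old link, create the new one.
    rm -f rhino
    ln -s "$target" rhino
}

# Example (after killing the node):
#   cd "$HOME" && relink_rhino volte-2.8.0.3-cluster-51
```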

Start the node and check for connection

./rhino/rhino.sh start

Wait a few minutes, then check that you can connect to the node using REM or rhino-console.