This guide explains how to upgrade Sh Cache Microservice to the current version.

Operational Tools Architecture

This upgrade manual refers to elements of the Operational Tools Architecture, which explains at a high level the design of the upgrade process and gives low-level information on the use of the upgrade bundles and tools. Make an offline copy of this book to consult while executing an upgrade on customer sites.

Overview and Terminology

Terminology

  • The downlevel (software) version is the version being upgraded from.

  • The uplevel (software) version is the version being upgraded to.

  • The installed (software) version refers to the software currently running on the Rhino cluster, including any customizations, additions and patches that may have been made to deployable units, SLEE components and configuration.

  • orca is the tool, delivered within the upgrade package, that drives the whole upgrade process, connecting to each of the hosts and running commands on them remotely.

  • A Deployable Unit (DU) is a packaged component that provides a particular feature of the software.

  • A (Rhino) cluster is a group of nodes with the same cluster ID. The cluster ID is an integer between 1 and 255 used to identify nodes that are linked together and share traffic. The cluster is backed by a PostgreSQL database containing the DUs and configuration. This architecture has the following implications, which are the fundamentals of the Cluster Migration Workflow design for upgrades:

    Important: In the case of Sh Cache Microservice, there is only ever one node in the cluster.

An upgrade can be one of two types: minor or major.

  • A minor upgrade is one where only the final component of the software version number changes, e.g. 1.0.0.1 to 1.0.0.2. It can include bugfixes, new minor functionality and new configuration, but the configuration schemas are always compatible with the downlevel software.

  • A major upgrade (supported in future ShCM releases) is one where any of the first three components of the version number changes, e.g. 1.0.0.1 to 1.1.0.0. It can include bugfixes, new features and new configuration. The new configuration may use a different schema, for example some profile spec IDs and profile table schemas may change. As such, the configuration cannot simply be imported from the downlevel cluster and needs to be transformed by orca so it is suitable for the uplevel cluster’s schema.
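As a rough illustration of the version rule above, the following shell sketch classifies an upgrade by comparing two version strings. The function name classify_upgrade is hypothetical and is not part of orca; it assumes four dot-separated components, as in the examples above.

```shell
# Hypothetical helper (not part of orca): classify an upgrade from two
# four-component version strings, following the definitions above.
classify_upgrade() (
    down=$1
    up=$2
    # Everything before the final dot is the "first three components".
    down_prefix=${down%.*}
    up_prefix=${up%.*}
    if [ "$down" = "$up" ]; then
        echo "none"
    elif [ "$down_prefix" = "$up_prefix" ]; then
        echo "minor"
    else
        echo "major"
    fi
)

classify_upgrade 1.0.0.1 1.0.0.2   # prints "minor"
classify_upgrade 1.0.0.1 1.1.0.0   # prints "major"
```

The upgrade bundle you receive determines the actual upgrade type; this sketch only restates the numbering convention.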

Both minor and major upgrades can include a new Rhino version. Major upgrades may also include a new version of Java. Where present, these are deployed automatically by orca at the appropriate point in the upgrade process.

  • orca is a command-line tool which can perform a variety of maintenance operations on Rhino clusters. It implements the Cluster Migration Workflow.

  • An upgrade bundle is a zip file containing the uplevel software, orca, any new Rhino and/or Java version, plus ancillary resources (such as feature script configuration) required during the upgrade process. Upgrade bundles are provided to customers by Metaswitch Customer Care.

  • A customization package is a package included in an upgrade bundle which applies customer-specific modifications to the product. They come in two types: post-install (applied after product installation) or post-configure (applied after the configuration has been transformed and imported). Your Professional Services representative can assist in creation of customization packages and their inclusion in upgrade bundles.

Upgrade process overview

Upgrading an Sh Cache Microservice product involves the following steps:

  • Read the product changelog to understand the changes introduced in the uplevel software, any new configuration required, and any workarounds/limitations you may encounter

  • Obtain the upgrade package from your Metaswitch Customer Care Representative, including any customization packages and licenses if required

  • Prepare for the upgrade

    • plan a maintenance window for when you expect traffic through the system to be low. (Although the rolling nature of the upgrade means that the system is always able to accept new traffic throughout the upgrade, traffic that is still present on a specific node at the end of its draining timeout will be curtailed, leading for example to terminated phone calls).

    • identify the first node to be upgraded

  • Upgrade the first node

  • Perform validation tests, if required

  • Upgrade the remainder of the nodes

  • Perform validation tests, if required

  • Post-upgrade steps including restoring other network element configuration

The uplevel software is provided in the form of a bundle (zip file). The procedure of upgrading the software is performed using the orca tool, which is included in the bundle. Run orca from a machine (either an operator’s PC or a node in the deployment to be upgraded) that meets the following requirements:

  • Linux OS, with Python 2.7 installed and accessible in the PATH as python2 (you can check by running python2 --version)

  • At least {product-disk-space} of free hard disk space (you can check with df -h)

  • Passwordless SSH connectivity (via SSH keys) to all Rhino nodes in the cluster to be upgraded

    • Configure the SSH client to log in as the user under which Rhino is installed on the nodes, normally sentinel. This can be configured in ~/.ssh/config in the home directory of the user that will be running orca, for example:

Host host1
  User sentinel
  IdentityFile /path/to/ssh_private_key
  ...

Upgrading will take 1-2 hours for the first node and {product-time-other-nodes} for each subsequent node, not including validation tests or manual reconfiguration steps. It is a current limitation that the upgrade proceeds sequentially across all cluster nodes.

Preparing for an Sh Cache Microservice upgrade

This section describes how to prepare for an Sh Cache Microservice upgrade. Be sure you are familiar with the upgrade overview and terminology.

  • Information and Files Required gives useful information for planning the upgrade and describes how to validate the upgrade bundle.

  • Limitations describes limitations, known issues and other caveats to be aware of while performing the upgrade process.

After familiarising yourself with the above information, refer to the Executing an Sh Cache Microservice Upgrade section to begin the upgrade process.

Information and Files Required

This section documents information concerning the upgrade. Be sure you are also familiar with the new features and new configuration introduced in the uplevel Sh Cache Microservice software.

Information required

Maintenance window

In general, an upgrade of an Sh Cache Microservice node can be completed in a single maintenance window, though for operational reasons you may wish to use more than one. The maintenance window should be of adequate length (at least 1-2 hours for each node, plus time for validation tests according to your test plan).

Files required

Upgrade bundle

Your Metaswitch Customer Care Representative should have provided you with an upgrade bundle, normally named ShCM-<uplevel-version>-upgrade-bundle.zip or similar. Extract the bundle with unzip to a directory of your choice and cd to it.

Important
orca working directory

orca must always be run from the directory where it resides. It will fail to operate correctly if run from any other location.

Verify the orca script is present and executable by running ./orca --help. You should see usage information.

Verify that orca can contact the hosts you wish to upgrade by running ./orca --hosts host1 status. You should see status information, such as the current live cluster and lists of services, about all the hosts specified.

Note
SSH access to hosts

orca requires passwordless SSH access to the hosts. If this is not the case, orca will throw an error saying it is unable to contact one or more of the hosts. Ensure SSH key-based access is set up to all such hosts and retry the status command until it works.
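Before running orca, key-based access can be checked host by host. The sketch below is one possible way to do this with the standard OpenSSH client; check_ssh_hosts is a hypothetical helper name, and the host names in the usage comment are placeholders.

```shell
# Sketch: report hosts that cannot be reached with key-based SSH.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_ssh_hosts() (
    failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "OK      $host"
        else
            echo "FAILED  $host"
            failed=1
        fi
    done
    # The function succeeds only if every host was reachable.
    [ "$failed" -eq 0 ]
)

# Example (host1 and host2 are placeholder host names):
# check_ssh_hosts host1 host2
```

The login user is taken from ~/.ssh/config, as described in the requirements for the machine running orca.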

Limitations

This page describes limitations, known issues and workarounds to be aware of that relate to the Sh Cache Microservice upgrade process.

General limitations

  • While orca is waiting to shut down a node, the node will continue to accept new requests that are directed at it. If, because of such new requests or because pre-existing requests continue, the node never becomes completely free of requests before the given shutdown-timeout is reached, the node will be killed and any active requests will be terminated.

  • Tracer levels (and for Rhino 2.6.0 or later, notification sources and loggers) are not maintained across an upgrade. If you previously set some custom levels which you wish to retain, then you will need to manually reconfigure them after the upgrade is complete. This can be done either using rhino-console commands, or via the Monitoring menu in REM.

  • orca only supports ASCII characters in pathnames (notably the Rhino installation path on each node).

Limitations when upgrade includes an installation of a new Java version

  • The new Java installation will only be used for the host being upgraded. If there are other applications on the machine that also use Java, these will need to be manually reconfigured. See "Update Java version of other applications running on the host" in the Post-Upgrade Checklist for more information.

Executing an Sh Cache Microservice Upgrade

This section describes the steps to perform the Sh Cache Microservice upgrade. Be sure you are familiar with the upgrade overview and terminology.

Pre-Upgrade Checklist

This page describes steps to take before starting an Sh Cache Microservice upgrade.

Note that there is no specific need to take an export prior to starting the upgrade. Taking an export is the first step that orca will perform when issued the upgrade command, and the export will be available at the end of the upgrade process.

Create validation test plans

Devise a validation test plan to ensure that, after the upgrade:

  • Sh Cache Microservice API works

  • any features in the downlevel version that you rely on still work as expected

  • any new features you are expecting to use in the uplevel version work as expected.

During or after the upgrade, if you find the validation tests are failing, contact your Metaswitch Customer Care Representative to discuss next steps.

Ensure the nodes are configured with the standardized path structure

orca requires the nodes to have a standardized path structure. Within the HOME directory (normally /home/sentinel) there must be:

  • one or more Rhino installation directories with a name of the following format: ShCM-<version>-cluster-<id>, e.g. ShCM-1.0.0.1-cluster-41

  • a symbolic link named rhino pointing at the currently live Rhino installation

  • a directory named export

  • a directory named install.

See Standardized Path Structure for more information.

If your installation does not conform to this setup, consult your Metaswitch Customer Care Representative who can advise on how to convert it.
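The layout requirements above can be checked with a short script on each node. This is a minimal sketch, not an orca feature: check_path_structure is a hypothetical name, and it only tests for the presence of the four items listed above, not their contents.

```shell
# Sketch: check a home directory against the layout described above.
# Prints one line per problem found; fails if any problem is found.
check_path_structure() (
    home_dir=${1:-$HOME}
    problems=0
    # At least one directory named ShCM-<version>-cluster-<id>.
    found=0
    for dir in "$home_dir"/ShCM-*-cluster-*; do
        if [ -d "$dir" ]; then
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "missing: ShCM-<version>-cluster-<id> installation directory"
        problems=1
    fi
    # "rhino" must be a symbolic link (to the live installation).
    if [ ! -L "$home_dir/rhino" ]; then
        echo "missing: symbolic link rhino"
        problems=1
    fi
    for name in export install; do
        if [ ! -d "$home_dir/$name" ]; then
            echo "missing: directory $name"
            problems=1
        fi
    done
    [ "$problems" -eq 0 ]
)

# Example: check the home directory of the current user.
# check_path_structure "$HOME"
```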

Record the old cluster ID

In case you need to roll back the upgrade, you will need the current cluster ID.

Execute the status command:

./orca --hosts host1 status

The first part of the output for each host will include the cluster status. One of the clusters should be marked as "LIVE".

Status of host host1

Clusters:
 - ShCM-1.0.0.0-cluster-41
 - ShCM-1.0.0.1-cluster-42 - LIVE

On the "LIVE" cluster, the last part of the name given is the cluster ID. For example, in the above output the cluster ID is 42.

Ensure all hosts have the same live cluster ID. Make a note of this value.
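If you prefer to extract the ID mechanically, the sketch below filters status output for lines marked LIVE. It assumes the output has exactly the format of the sample above, which may differ between orca versions; live_cluster_ids is a hypothetical helper name.

```shell
# Sketch: print the ID of each cluster marked LIVE in status output
# read from standard input (format as in the sample above).
live_cluster_ids() {
    sed -n 's/^ *- .*-cluster-\([0-9][0-9]*\) - LIVE *$/\1/p'
}

# Example usage, run from the orca directory. With several hosts,
# "sort -u" prints a single ID when all hosts agree.
# ./orca --hosts host1 status | live_cluster_ids | sort -u
```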

Verify license validity

In the status output from the previous section, check the License information section for each host to ensure the license is valid and will remain valid until after the planned finish time of the upgrade.

If your upgrade includes a new major Rhino version, e.g. 2.6.0 → 2.7.0, then you may require a new license, as all products deployed on Rhino are licensed against a Rhino version. Your Metaswitch Customer Care Representative should be able to provide you with one, in which case you should copy it to the directory where you extracted the upgrade bundle, and pass it to orca by appending the parameter --license <license file> when running the first upgrade command. Alternatively, they may advise that the license is already present in the upgrade bundle, or that no new license is needed (in which case orca will automatically transfer the existing license to the uplevel cluster).

Verify disk space

Log into each node in turn and verify the disk space using df -h. Ensure there is at least {product-disk-space} of free disk space. Refer to "If required, clean up downlevel clusters and unneeded exports" in the Post-Upgrade Checklist if you need to clean up old clusters and/or exports in order to free disk space.
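The disk space check can also be scripted. The sketch below uses the POSIX output format of df; has_free_kb is a hypothetical helper, and REQUIRED_KB in the usage comment is a placeholder for the product-specific requirement expressed in kilobytes.

```shell
# Sketch: succeed if the filesystem holding a path has at least the
# given number of kilobytes available. "df -Pk" gives POSIX-format
# output in 1024-byte units; column 4 is the available space.
has_free_kb() (
    path=$1
    required_kb=$2
    available_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ "$available_kb" -ge "$required_kb" ]
)

# Example (REQUIRED_KB is a placeholder you must set yourself):
# has_free_kb "$HOME" "$REQUIRED_KB" || echo "not enough free space"
```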

Check for unexpected alarms

Log into REM and access the connection to the Rhino cluster. Check there are no unexpected alarms. Fix any issues you find. If required, make a note of expected active alarms for correlation with the post-upgrade list of alarms.

Enable symmetric activation state mode if required

orca only supports upgrading a cluster with symmetric activation state mode enabled. If your cluster normally operates with it disabled, you will need to enable it here and restore the state after the upgrade.

To check if the cluster has symmetric activation state mode enabled, examine the orca status output. At the top under "Global info" it will say Symmetric activation state mode is currently enabled or Nodes with per-node activation state: <node list>. If you see the former output then symmetric activation state mode is enabled, while the latter means it is disabled.

To enable symmetric activation state mode, follow the instructions in the Rhino documentation here.

Rerun the orca status command (./orca --hosts host1 status) and verify that the status reports Symmetric activation state mode is currently enabled.

Post-Upgrade Checklist

This page describes the actions to take after the upgrade is completed and all relevant validation tests have passed.

Check service state and alarms

Refer to the verification section in the Sh Cache Microservice Install guide to check the state of the Sh Cache Microservice.

Verify SAS configuration

If your deployment is now running Rhino 2.6.0 or later and includes a MetaView Service Assurance Server (SAS), verify the SAS configuration in Rhino is as expected. See Sh Cache Microservice Install guide in the Rhino documentation.

Check that Sh Cache Microservice requests made in your validation tests appear in SAS.

Archive the downlevel export generated during the upgrade, and generate a new export

On the first node, orca will have generated an export of the Rhino installation at the downlevel version, prior to any migration or upgrade steps. This can be found in the ~/export directory, labelled with the version and cluster ID, and is a useful "restore point" in case problems with the upgrade are encountered later that cannot be simply undone with the rollback command. Copy (for example using rsync) the downlevel export directory with all its contents to your backup storage, if you have one.

Follow the Rhino documentation to generate a post-upgrade export and archive this too.

Note
Uplevel exports generated during major upgrade

Note that during major upgrade orca will automatically generate an export of the uplevel installation. However, this export is intended for orca's use only and is not suitable to restore from - for example, it will not include a large amount of Rhino configuration. There will also be an export labelled with transformed-for-<uplevel version>, to which the same caveat applies.

These exports can be ignored, and deleted at a later date.

Archive the upgrade logs

In the directory from which orca was run, there will be a directory logs containing many subdirectories with log files. Copy (for example using rsync) this logs directory with all its contents to your backup storage, if you have one. These logs can be useful for Metaswitch Customer Care in the event of a problem with the upgrade.

Save the install.properties file

It is a good idea to save the install.properties file for use in a future upgrade.

Remember that the file is currently on the machine that you chose to run orca from, and that may not be the same machine chosen by someone doing a future upgrade.

A good location to put the file is in a /home/sentinel/install directory.

If required, clean up downlevel clusters and unneeded exports

Once the upgrade is confirmed to be working, you may wish to clean up old downlevel clusters to save disk space.

Run the status command to view existing clusters and exports

Run the status command and observe the clusters and exports sections of the output.

./orca --hosts host1 status

Identify any clusters or exports you no longer wish to keep. Note their cluster IDs; the ID is the last part of the name. For example, given this output:

Status of host host1

Clusters:
 - ShCM-1.0.0.1-cluster-41
 - ShCM-1.0.0.2-cluster-42
 - ShCM-1.0.0.3-cluster-43 - LIVE

[...]

Exports:
 - ShCM-1.0.0.1-cluster-41
 - ShCM-1.0.0.2-cluster-42
 - ShCM-1.0.0.2-cluster-42-transformed-for-1.0.0.3
 - ShCM-1.0.0.3-cluster-43

[...]

Status of host host2

Clusters:
 - ShCM-1.0.0.1-cluster-41
 - ShCM-1.0.0.2-cluster-42
 - ShCM-1.0.0.3-cluster-43 - LIVE

[...]

you may decide to delete cluster 41 and exports 41 and 42.

Tip
Retain one old cluster

You are advised to always leave the most recent downlevel cluster in place as a fallback.

Be sure you have an external backup of any export directories you plan to delete, unless you are absolutely sure that you will not need them in the future.
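The selection rule in the tip can be sketched as a filter over the status output for a single host. This assumes the output format shown in the sample above and that cluster IDs increase with each upgrade; cleanup_candidates is a hypothetical helper, and its output is only a suggestion to review before running any cleanup.

```shell
# Sketch: from one host's status output on standard input, print the
# IDs of clusters that are not LIVE, omitting the highest such ID so
# that the most recent downlevel cluster is retained.
cleanup_candidates() {
    awk '
        /^Clusters:/ { in_clusters = 1; next }
        /^[^ ]/      { in_clusters = 0 }
        in_clusters && /-cluster-[0-9]+$/ {
            id = $0
            sub(/.*-cluster-/, "", id)
            print id
        }
    ' | sort -nu | sed '$d'
}

# Example usage, run from the orca directory:
# ./orca --hosts host1 status | cleanup_candidates
```

For a host with clusters 41, 42 and 43 (LIVE), this prints 41, leaving cluster 42 in place as the fallback.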

Run the cleanup command to delete clusters and exports

Run the cleanup command, specifying the clusters and exports to delete as comma-separated lists of IDs (without whitespace). For example:

./orca --hosts host1 cleanup --clusters 41
./orca --hosts host1 cleanup --exports 41,42

Update Java version of other applications running on the host

If performing a major upgrade with a new version of Java, orca will install the new Java but it will only be applied to the new Rhino installation. Global environment variables such as JAVA_HOME, or other applications that use Java, will not be updated.

The new Java installation can be found at ~/java/<version> where <version> is the JDK version that orca installed, e.g. 8u162. If you want other applications running on the node to use this new Java installation, update the appropriate environment variables and/or configuration files to reference this directory.
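For an application started from a shell, the environment could be updated along the following lines. This is a sketch only: 8u162 is the example version from the text, and it assumes the directory has the usual JDK layout with a bin subdirectory.

```shell
# Sketch: point the current shell at the Java installed by orca.
# Substitute the version that was actually installed for 8u162.
JAVA_HOME="$HOME/java/8u162"
PATH="$JAVA_HOME/bin:$PATH"
export JAVA_HOME PATH
```

To make the change permanent for a user, the same lines would typically go into that user's shell profile; applications with their own configuration files need to be updated separately.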

Upgrade Process

Before you begin the upgrade, ensure you have completed the pre-upgrade checklist.

Upgrade process

Note that the following process is for a minor upgrade. Minor and major upgrades are very similar; the differences for major upgrade are listed inline.

Upgrade a ShCM Node

Start the upgrade using the following command. For major upgrade, replace minor-upgrade with major-upgrade.

./orca --hosts host1 minor-upgrade --stop-timeout <timeout> --pause {extra-parameter-minor-upgrade} packages <install.properties>

where

  • <timeout> is the maximum amount of time you wish to allow for all calls to stop on each node

  • <install.properties> is the path to the install.properties file

  • packages is a literal name, and should not be changed.

If your upgrade includes a new Rhino version and you have been given a separate license file, specify this here by appending the parameter --license <path to license file>.

This will take approximately 1-2 hours. At the end of the upgrade process, you will be prompted to continue the operation when ready.

If the upgrade failed, refer to orca troubleshooting to resolve the problem. You will normally need to roll back to the original cluster in order to try again.

Merge and import feature scripts

If the output from orca prompted you to merge feature scripts, follow the instructions in Feature Scripts conflicts and resolution to merge and import the feature scripts into the uplevel installation. Note that when running orca’s import-feature-scripts command, you need only specify the first host.

Perform node validation tests

If your test plan includes validating the first node after upgrade, run these tests now.

Perform post-upgrade validation tests

Perform your post-upgrade test plan at this stage.

The upgrade is now complete.

Aborting or Reverting an Upgrade

This page describes the steps required to abort an upgrade partway through, or revert back to the downlevel system after the upgrade is complete.

Rolling back the upgrade will take approximately {product-time-other-nodes} per affected node. Which nodes are affected depends on which nodes were partially or fully upgraded.

Be sure you know the original cluster ID as noted in the pre-upgrade checklist.

Note that in the normal case there should be no need to restore the Rhino cluster from an export. The cluster directory being rolled back to will contain all relevant configuration. However, any configuration changes made since the upgrade started will be lost.

Restore network element configuration

If, in preparation for the upgrade, you made changes to other elements in your network, you should undo these changes before starting the rollback procedure.

Check the current cluster status

Use orca's status command to check the current cluster status.

./orca --hosts <host1,host2,host3,…​> status

For each host there may be multiple clusters listed, but only one of those will be marked as LIVE, so note its ID, which is the last part of the cluster name. If the various hosts have nodes in different 'LIVE' clusters, then any nodes in a 'LIVE' cluster that is not the original live cluster will need to have their hosts rolled back. (In most circumstances, the original live cluster will have a lower ID number, so rollback will normally be needed for any host that does not have that lowest number cluster ID marked as being 'LIVE').

If required, run the rollback command

If you identified any hosts that need to be rolled back, run the rollback command. Hosts are specified as a comma-separated list without whitespace, e.g. --hosts host1,host2. Specify the hosts in reverse order of node ID (that is, highest node ID first).

./orca --hosts <hosts to be rolled back> rollback --stop-timeout <timeout>

where <timeout> is the time you wish to allow for traffic to drain on each node.

If you are rolling back every host in the cluster (or more specifically, the cluster ID that you are rolling back to currently contains no members), then append the parameter --special-first-node to the above command line. This instructs orca that the first node specified will need to be made primary so that the cluster starts correctly.

This procedure will take approximately {product-time-other-nodes} per host specified.
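The host ordering rule above (highest node ID first, comma-separated, no whitespace) can be sketched as follows. rollback_host_list is a hypothetical helper, and the node IDs and host names in the example are placeholders.

```shell
# Sketch: read "<node id> <host>" pairs on standard input and print the
# hosts as a comma-separated list, highest node ID first.
rollback_host_list() {
    sort -k1,1nr | awk '{ printf "%s%s", sep, $2; sep = "," } END { print "" }'
}

# Example with placeholder node IDs and host names:
printf '101 host1\n103 host3\n102 host2\n' | rollback_host_list
# prints "host3,host2,host1"
```

The resulting list is what you would pass to the --hosts parameter of the rollback command.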

If required, run the cleanup command

Follow the instructions in "If required, clean up downlevel clusters and unneeded exports" in the Post-Upgrade Checklist.

Perform validation tests

Ensure you can log into REM and connect to the cluster.

  • Ensure all services are started.

  • Ensure RA entity activation state is as expected. Start or stop any RA entities as required.

  • Check for and resolve unexpected alarms.

Perform test calls and other validation tests to ensure the rolled-back cluster is operating normally.

The rollback procedure is now complete.

Next steps

Discuss with your Metaswitch Customer Care Representative the next steps, for example obtaining a patch or more recent release that addresses any problems encountered during the upgrade, or retrying the upgrade at a later date.

Troubleshooting

Besides the information on the console, orca writes detailed output of the actions taken to a log file. By default, the log file is located on the host that executed the command, in the logs directory.

orca can’t connect to the remote hosts

Check if the trusted connection via ssh is working. The command ssh <the host to connect to> should work without asking for a password.

You can add a trusted connection by executing the steps below:

  • Create an SSH key using the default location and an empty passphrase. Just press Enter until you’re done.

ssh-keygen -t rsa

  • Copy the SSH key onto all VMs that need to be accessed, including the node you are on. You will have to enter the password for the user.

ssh-copy-id -i $HOME/.ssh/id_rsa.pub sentinel@<VM_ADDRESS>

where VM_ADDRESS is the host name you want the key to be copied to.

To check run:

ssh VM_ADDRESS

It should open a shell on the remote host (VM_ADDRESS). If it prompts for a password, ensure that the ~/.ssh directory is not accessible to group or other users:

chmod -R go= ~/.ssh

orca failed to create the management database

orca uses the psql client to connect to and do operations on the PostgreSQL database. Check the first host in the --hosts list has psql installed.

Get the database connection details from the file /home/sentinel/rhino/node-xxx/config/config_variables. Search for the properties MANAGEMENT_DATABASE_NAME, MANAGEMENT_DATABASE_HOST, MANAGEMENT_DATABASE_PORT, MANAGEMENT_DATABASE_USER and MANAGEMENT_DATABASE_PASSWORD.

Example:

MANAGEMENT_DATABASE_NAME=rhino_50
MANAGEMENT_DATABASE_HOST=postgresql
MANAGEMENT_DATABASE_PORT=5432
MANAGEMENT_DATABASE_USER=rhino_user
MANAGEMENT_DATABASE_PASSWORD=rhino_password

Test the connection:

psql -h postgresql -U rhino_user -p 5432 rhino_50

Enter the password. You should see output similar to the following:

Password for user rhino_user:
psql (9.5.14)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

rhino_50=>
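The lookup and connection test above can be sketched as a pair of shell helpers. Both function names are hypothetical, and the sketch assumes config_variables contains plain NAME=value lines as in the example; it prints the psql command rather than running it, so you can review it first.

```shell
# Sketch: read one property from a config_variables file
# (lines of the form NAME=value, as in the example above).
config_value() {
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# Build the psql connection test from the MANAGEMENT_DATABASE_* properties.
psql_test_command() (
    file=$1
    host=$(config_value MANAGEMENT_DATABASE_HOST "$file")
    port=$(config_value MANAGEMENT_DATABASE_PORT "$file")
    user=$(config_value MANAGEMENT_DATABASE_USER "$file")
    name=$(config_value MANAGEMENT_DATABASE_NAME "$file")
    echo "psql -h $host -U $user -p $port $name"
)

# Example usage (node-xxx stands for the node directory on your host):
# psql_test_command /home/sentinel/rhino/node-xxx/config/config_variables
```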

If the connection fails, try again with the database name from the previous cluster. If it still fails, the procedure below is a guide on how to configure the database to accept remote connections.

  • Log in to the database host.

  • Open the pg_hba.conf file, using sudo.

sudo vim /var/lib/pgsql/data/pg_hba.conf

  • Replace the line that looks like this:

host  rhino-xxx  sentinel  xxx.xxx.xxx.xxx/xx  password

    with a line that looks like this:

host  all  sentinel  xxx.xxx.xxx.xxx/xx  password

where rhino-xxx is the name of cluster xxx’s postgres database, and xxx.xxx.xxx.xxx/xx covers the signalling addresses of the nodes.

  • Reload the config:

/usr/bin/pg_ctl reload

  • Or, failing that, restart PostgreSQL:

sudo systemctl restart postgresql

Installer runs but does not install

The recommendation is to roll back the node to the previous cluster, clean up the prepared cluster and start again.

./orca --hosts host1 rollback
./orca --hosts host1 cleanup --clusters <cluster id>
./orca --hosts host1 minor-upgrade <path to the product sdk zip> <path to the install.properties> --pause

Rhino nodes shut down or restart during migration

orca requires the node being migrated to allow management commands via rhino-console. If the node is not active, orca will fail.

If, while executing orca --hosts <list of hosts> minor-upgrade packages install.properties --continue, the node currently being migrated shuts down or restarts, there are three options:

1- Skip the node and proceed to the other nodes and migrate the failed node later

Remove from the orca command the hosts that were already migrated (except for the first host name) and the host that failed, then execute the command orca --hosts <list of hosts> minor-upgrade packages install.properties --continue again to continue with the upgrade.

2- Restart the node manually and continue with the upgrade

If the node shut down and has not restarted, run the rhino.sh start script. If the node restarted, check that the rhino-console command works:

<path to the old cluster >/client/bin/rhino-console

Remove from the orca command the hosts that were already migrated (except for the first host name), then execute the command orca --hosts <list of hosts> minor-upgrade packages install.properties --continue again to continue with the upgrade.

3- Do the migration manually

This is an extreme case, but the procedure is simple.

For each node that was not migrated:

Identify the new cluster

cd $HOME
ls -l rhino

lrwxrwxrwx 1 sentinel sentinel 24 Nov 21 21:48 rhino -> shcm-1.0.0.0-cluster-50

will show you the current cluster name in the format <product>-<version>-cluster-<id>.

The new cluster will have a similar name, <product>-<new version>-cluster-<id + 1>, in which the cluster ID is one higher. For example: shcm-1.0.0.1-cluster-51
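The naming rule above can be applied mechanically. The sketch below reads the target of the rhino symbolic link, adds one to the cluster ID, and lists any directory whose name ends in that ID. next_cluster_dirs is a hypothetical helper, and it relies on the readlink utility being available on the node.

```shell
# Sketch: list directories whose cluster ID is one higher than that of
# the cluster the rhino symbolic link currently points at.
next_cluster_dirs() (
    home_dir=${1:-$HOME}
    current=$(readlink "$home_dir/rhino")
    current=${current%/}               # tolerate a trailing slash
    current_id=${current##*-cluster-}  # text after the last "-cluster-"
    next_id=$((current_id + 1))
    for dir in "$home_dir"/*-cluster-"$next_id"; do
        if [ -d "$dir" ]; then
            basename "$dir"
        fi
    done
)

# Example: list candidate new clusters in the current user's home.
# next_cluster_dirs "$HOME"
```

If nothing is printed, the new cluster was not prepared on this node and the manual migration cannot proceed.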

Kill the current node

rhino/rhino.sh kill

Link the rhino path to the new cluster

rm rhino
ln -s <new cluster> rhino

where <new cluster> is the path identified above. Example: shcm-1.0.0.1-cluster-51

Start the node and check for connection

./rhino/rhino.sh start

Wait a few minutes, then check that you can connect to the node using rhino-console.