This guide explains how to upgrade Sh Cache Microservice to the current version.
Operational Tools Architecture
This upgrade manual makes references to elements of the Operational Tools Architecture, which explains at a high level the design of the upgrade process as well as low-level information on use of the upgrade bundles and tools. Make an offline copy of this book to consult while executing upgrade on customer sites.
Overview and Terminology
Terminology
-
The downlevel (software) version is the version being upgraded from.
-
The uplevel (software) version is the version being upgraded to.
-
The installed (software) version refers to the software currently running on the Rhino cluster, including any customizations, additions and patches that may have been made to deployable units, SLEE components and configuration.
-
Orca is the tool, delivered within the upgrade package, which drives the whole upgrade process, connecting to each of the hosts and running commands on them remotely.
-
A Deployable Unit (DU) is a packaged component that provides a particular feature of the software.
-
A (Rhino) cluster is a group of nodes with the same cluster ID. The cluster ID is an integer between 1 and 255 used to identify nodes that are linked together and share traffic. The cluster is backed by a PostgreSQL database containing the DUs and configuration. This architecture has the following implications, which are the fundamentals of the Cluster Migration Workflow design for upgrades:
In the case of Sh Cache Microservice there is only ever one node in the cluster.
An upgrade can be one of two types: minor or major.
-
A minor upgrade is one where only the final component of the software version number changes, e.g. 1.0.0.1 to 1.0.0.2. It can include bugfixes, new minor functionality and new configuration, but the configuration schemas are always compatible with the downlevel software.
-
A major upgrade (supported in future ShCM releases) is one where any of the first three components of the version number changes, e.g. 1.0.0.1 to 1.1.0.0. It can include bugfixes, new features and new configuration. The new configuration may use a different schema, for example some profile spec IDs and profile table schemas may change. As such the configuration can not simply be imported from the downlevel cluster and needs to be transformed by
orca
so it is suitable for the uplevel cluster’s schema.
Both minor and major upgrades can include a new Rhino version. Major upgrades may also include a new version of Java. Where present, these are deployed automatically by orca
at the appropriate point in the upgrade process.
-
orca
is a command-line tool which can perform a variety of maintenance operations on Rhino clusters. It implements the Cluster Migration Workflow. -
An upgrade bundle is a
zip
file containing the uplevel software,orca
, any new Rhino and/or Java version, plus ancillary resources (such as feature script configuration) required during the upgrade process. Upgrade bundles are provided to customers by Metaswitch Customer Care. -
A customization package is a package included in an upgrade bundle which applies customer-specific modifications to the product. They come in two types: post-install (applied after product installation) or post-configure (applied after the configuration has been transformed and imported). Your Professional Services representative can assist in creation of customization packages and their inclusion in upgrade bundles.
Upgrade process overview
Upgrading an Sh Cache Microservice product involves the following steps:
-
Read the product changelog to understand the changes introduced in the uplevel software, any new configuration required, and any workarounds/limitations you may encounter
-
Obtain the upgrade package from your Metaswitch Customer Care Representative, including any customization packages and licenses if required
-
Prepare for the upgrade
-
plan a maintenance window for when you expect traffic through the system to be low. (Although the rolling nature of the upgrade means that the system is always able to accept new traffic throughout the upgrade, traffic that is still present on a specific node at the end of its draining timeout will be curtailed, leading for example to terminated phone calls).
-
identify the first node to be upgraded
-
-
Upgrade the first node
-
Perform validation tests, if required
-
Upgrade the remainder of the nodes
-
Perform validation tests, if required
-
Post-upgrade steps including restoring other network element configuration
The uplevel software is provided in the form of a bundle (zip
file). The procedure of upgrading the software is performed using the orca
tool, which is included in the bundle. Run orca
from a machine (either an operator’s PC or a node in the deployment to be upgraded) that meets the following requirements:
-
Linux OS, with Python 2.7 installed and accessible in the
PATH
aspython2
(you can check by runningpython2 --version
) -
At least {product-disk-space} of free hard disk space (you can check with
df -h
) -
Passwordless SSH connectivity (via SSH keys) to all Rhino nodes in the cluster to be upgraded
-
Configure the SSH agent to log in as the user under which Rhino is installed on the nodes, normally
sentinel
. This can be configured in~/.ssh/config
in the home directory of the user that will be runningorca
, for example:
-
Host host1
User sentinel
IdentityFile /path/to/ssh_private_key
...
Upgrading will take 1-2 hours for the first node and {product-time-other-nodes} for each subsequent node, not including validation tests or manual reconfiguration steps. It is a current limitation that the upgrade proceeds sequentially across all cluster nodes.
Next steps
To plan your upgrade, see Preparing for an Sh Cache Microservice Upgrade.
Preparing for an Sh Cache Microservice upgrade
This section describes how to prepare for an Sh Cache Microservice upgrade. Be sure you are familiar with the upgrade overview and terminology.
-
Information and Files Required gives useful information for planning the upgrade and describes how to validate the upgrade bundle.
-
Limitations describes limitations, known issues and other caveats to be aware of while performing the upgrade process.
After familiarising yourself with the above information, refer to the Executing an Sh Cache Microservice Upgrade section to begin the upgrade process.
Information and Files Required
This section documents information concerning the upgrade. Be sure you are also familiar with the new features and new configuration introduced in the uplevel Sh Cache Microservice software.
Information required
Maintenance window
In general an upgrade of an Sh Cache Microservice node can be completed in a single maintenance window, though for operational reasons you may wish to use more than one. The maintenance window should be of adequate length (at least 1-2 hours the each node and validation tests according to your test plan).
Files required
Upgrade bundle
Your Metaswitch Customer Care Representative should have provided you with an upgrade bundle, normally named ShCM-<uplevel-version>-upgrade-bundle.zip
or similar. unzip
the bundle to a directory of your choice and cd
to it.
orca working directory
orca must always be run from the directory where it resides. It will fail to operate correctly if run from any other location. |
Verify the orca
script is present and executable by running ./orca --help
. You should see usage information.
Verify that orca
can contact the hosts you wish to upgrade by running ./orca --hosts host1 status
. You should see status information, such as the current live cluster and lists of services, about all the hosts specified.
SSH access to hosts
orca requires passwordless SSH access to the hosts. If this is not the case, orca will throw an error saying it is unable to contact one or more of the hosts. Ensure SSH key-based access is set up to all such hosts and retry the status command until it works. |
Limitations
This page describes limitations, known issues and workarounds to be aware of that relate to the Sh Cache Microservice upgrade process.
General limitations
-
While
orca
is waiting to shut down a node, the node will continue to accept new requests that is directed at it. As such, or if pre-existing requests continues, if the node never becomes completely free of requests whilst draining before the givenshutdown-timeout
is reached, then the node will be killed resulting in any active requests being terminated. -
Tracer levels (and for Rhino 2.6.0 or later, notification sources and loggers) are not maintained across an upgrade. If you previously set some custom levels which you wish to retain, then you will need to manually reconfigured them after the upgrade is complete. This can be done either using
rhino-console
commands, or via the Monitoring menu in REM. -
orca
only supports ASCII characters in pathnames (notably the Rhino installation path on each node).
Limitations when upgrade includes an installation of a new Java version
-
The new Java installation will only be used for the host being upgraded. If there are other applications on the machine that also use Java, these will need to be manually reconfigured. See here for more information.
Executing an Sh Cache Microservice Upgrade
This section describes the steps to perform the Sh Cache Microservice upgrade. Be sure you are familiar with the upgrade overview and terminology.
-
Pre-Upgrade Checklist details the steps to take prior to starting the upgrade.
-
Upgrade Process details the steps to carry out the upgrade.
-
Post-Upgrade Checklist details the steps to finish and verify the upgrade after the uplevel cluster is fully ready.
-
Aborting or Reverting an Upgrade details steps to take if you wish to go back to the downlevel cluster, for example after discovering a breaking issue with the upgrade.
-
Troubleshooting details the known problems and how to solve them.
Pre-Upgrade Checklist
This page describes steps to take before starting an Sh Cache Microservice upgrade.
Note that there is no specific need to take an export prior to starting the upgrade. Taking an export is the first step that orca
will perform when issued the upgrade command, and the export will be available at the end of the upgrade process.
Create validation test plans
Devise a validation test plan to ensure that, after the upgrade:
-
Sh Cache Microservice API works
-
any features in the downlevel version that you rely on still work as expected
-
any new features you are expecting to use in the uplevel version work as expected.
During or after the upgrade, if you find the validation tests are failing, contact your Metaswitch Customer Care Representative to discuss next steps.
Ensure the nodes are configured with the standardized path structure
orca
requires the nodes to have a standardized path structure. Within the HOME
directory (normally /home/sentinel
) there must be:
-
one or more Rhino installation directories with a name of the following format:
ShCM-<version>-cluster-<id>
, e.g.ShCM-1.0.0.1-cluster-41
-
a symbolic link named
rhino
pointing at the currently live Rhino installation -
a directory named
export
-
a directory named
install
.
See Standardized Path Structure for more information.
If your installation does not conform to this setup, consult your Metaswitch Customer Care Representative who can advise on how to convert it.
Record the old cluster ID
In case you need to rollback the upgrade, you will need the current cluster ID.
Execute the status
command:
orca --hosts host1 status
The first part of the output for each host will include the cluster status. One of the clusters should be marked as "LIVE".
Status of host host1
Clusters:
- ShCM-1.0.0.0-cluster-41
- ShCM-1.0.0.1-cluster-42 - LIVE
On the "LIVE" cluster, the last part of the name given is the cluster ID. For example, in the above output the cluster ID is 42
.
Ensure all clusters have the same live cluster ID. Make a note of this value.
Verify license validity
In the status output from the previous section, check the License information
section for each host to ensure the license is valid and will remain valid until after the planned finish time of the upgrade.
If your upgrade includes a new major Rhino version, e.g 2.6.0 → 2.7.0 then you may require a new license, as all products deployed on Rhino are licensed against a Rhino version. Your Metaswitch Customer Care Representative should be able to provide you with one, in which case you should copy it to the directory where you extracted the upgrade bundle, and pass it to orca
by appending the parameter --license <license file>
when running the first upgrade command. Alternatively, they may advise that the license is already present in the upgrade bundle, or that no new license is needed (in which case orca
will automatically transfer the existing license to the uplevel cluster).
Verify disk space
Log into each node in turn and verify the disk space using df -h
. Ensure there is at least {product-disk-space} of free disk space. Refer to this section if you need to clean up old clusters and/or exports in order to free disk space.
Check for unexpected alarms
Log into REM and access the connection to the Rhino cluster. Check there are no unexpected alarms. Fix any issues you find. If required, make a note of expected active alarms for correlation with the post-upgrade list of alarms.
Enable symmetric activation state mode if required
orca
only supports upgrading a cluster with symmetric activation state mode enabled. If your cluster normally operates with it disabled, you will need to enable it here and restore the state after the upgrade.
To check if the cluster has symmetric activation state mode enabled, examine the orca
status output. At the top under "Global info" it will say Symmetric activation state mode is currently enabled
or Nodes with per-node activation state: <node list>
. If you see the former output then symmetric activation state mode is enabled, while the latter means it is disabled.
To enable symmetric activation state mode, follow the instructions in the Rhino documentation here.
Rerun the orca
status command orca --hosts host1 status
and verify that the status reports Symmetric activation state mode is currently enabled
.
Post-Upgrade Checklist
This page describes the actions to take after the upgrade is completed and all relevant validation tests have passed.
Check service state and alarms
Refer to the verification section in RVT VM Install guide to check the state of the Sh Cache Microservice.
Verify SAS configuration
If your deployment is now running Rhino 2.6.0 or later and includes a MetaView Service Assurance Server (SAS), verify the SAS configuration in Rhino is as expected. See RVT VM Install guide
in the Rhino documentation.
Check that Sh Cache Microservice requests made in your validation tests appear in SAS.
Archive the downlevel export generated during the upgrade, and generate a new export
On the first node, orca
will have generated an export of the Rhino installation at the downlevel version, prior to any migration or upgrade steps. This can be found in the ~/export
directory, labelled with the version and cluster ID, and is a useful "restore point" in case problems with the upgrade are encountered later that cannot be simply undone with the rollback
command. Copy (for example using rsync
) the downlevel export directory with all its contents to your backup storage, if you have one.
Follow the Rhino documentation to generate a post-upgrade export and archive this too.
Uplevel exports generated during major upgrade
Note that during major upgrade These exports can be ignored, and deleted at a later date. |
Archive the upgrade logs
In the directory from which orca
was run, there will be a directory logs
containing many subdirectories with log files. Copy (for example using rsync
) this logs
directory with all its contents to your backup storage, if you have one. These logs can be useful for Metaswitch Customer Care in the event of a problem with the upgrade.
Save the install.properties
file
It is a good idea to save the install.properties
file for use in a future upgrade.
Remember that the file is currently on the machine that you chose to run orca
from, and that may not be the same machine chosen by someone doing a future upgrade.
A good location to put the file is in a /home/sentinel/install
directory.
If required, clean up downlevel clusters and unneeded exports
Once the upgrade is confirmed to be working, you may wish to clean up old downlevel cluster to save disk space.
Run the status
command to view existing clusters and exports
Run the status
command and observe the clusters
and exports
sections of the output.
./orca --hosts host1 status
Identify any clusters or exports you no longer wish to keep. Note their cluster IDs, which is the last part of the name. For example, given this output:
Status of host host1
Clusters:
- ShCM-1.0.0.1-cluster-41
- ShCM-1.0.0.1-cluster-42
- ShCM-1.0.0.3-cluster-43 - LIVE
[...]
Exports:
- ShCM-1.0.0.1-cluster-41
- ShCM-1.0.0.2-cluster-42
- ShCM-1.0.0.2-cluster-42-transformed-for-1.0.0.3
- ShCM-1.0.0.3-cluster-43
[...]
Status of host host2
Clusters:
- ShCM-1.0.0.1-cluster-41
- ShCM-1.0.0.2-cluster-42
- ShCM-1.0.0.3-cluster-43 - LIVE
[...]
you may decide to delete cluster 41
and exports 41
and 42
.
Retain one old cluster
You are advised to always leave the most recent downlevel cluster in place as a fallback. Be sure you have an external backup of any export directories you plan to delete, unless you are absolutely sure that you will not need them in the future. |
Update Java version of other applications running on the host
If performing a major upgrade with a new version of Java, orca
will install the new Java but it will only be applied to the new Rhino installation. Global environment variables such as JAVA_HOME
, or other applications that use Java, will not be updated.
The new Java installation can be found at ~/java/<version>
where <version>
is the JDK version that orca installed, e.g. 8u162
. If you want other applications running on the node to use this new Java installation, update the appropriate environment variables and/or configuration files to reference this directory.
Upgrade Process
Before you begin the upgrade, ensure you have completed the pre-upgrade checklist.
Upgrade process
Note that the following process is for a minor upgrade. Minor and major upgrades are very similar; the differences for major upgrade are listed inline.
Upgrade a ShCM Node
Start the upgrade using the following command. For major upgrade, replace minor-upgrade
with major-upgrade
.
./orca --hosts host1 minor-upgrade --stop-timeout <timeout> --pause {extra-parameter-minor-upgrade} packages <install.properties>
where
-
<timeout> is the maximum amount of time you wish to allow for all calls to stop on each node
-
<install.properties> is the path to the install.properties file
-
packages is a literal name, and should not be changed.
If your upgrade includes a new Rhino version and you have been given a separate license file, specify this here by appending the parameter --license <path to license file>
.
This will take approximately 1-2 hours. At the end of the upgrade process, you will be prompted to continue the operation when ready:
If the upgrade failed, refer to orca troubleshooting to resolve the problem. You will normally need to rollback to the original cluster in order to try again.
Merge and import feature scripts
If the output from orca
prompted you to merge feature scripts, follow the instructions in Feature Scripts conflicts and resolution to merge and import the feature scripts into the uplevel installation. Note that when running orca’s `import-feature-scripts
command, you need only specify the first host.
Next steps
Follow the post-upgrade checklist.
Aborting or Reverting an Upgrade
This page describes the steps required to abort an upgrade partway through, or revert back to the downlevel system after the upgrade is complete.
Rolling back the upgrade will take approximately {product-time-other-nodes} per affected node. Which nodes are affected depends on which nodes were partially or fully upgraded.
Be sure you know the original cluster ID as noted in the pre-upgrade checklist.
Note that in the normal case there should be no need to restore the Rhino cluster from an export. The cluster directory being rolled back to will contain all relevant configuration. However, any configuration changes made since the upgrade started will be lost.
Restore network element configuration
If, in preparation to the upgrade, you made changes to other elements in your network, you should undo these changes before starting the rollback procedure.
Check the current cluster status
Use orca
's status
command to check the current cluster status.
./orca --hosts <host1,host2,host3,…> status
For each host there may be multiple clusters listed, but only one of those will be marked as LIVE
, so note its ID, which is the last part of the cluster name. If the various hosts have nodes in different 'LIVE' clusters, then any nodes in a 'LIVE' cluster that is not the original live cluster will need to have their hosts rolled back. (In most circumstances, the original live cluster will have a lower ID number, so rollback will normally be needed for any host that does not have that lowest number cluster ID marked as being 'LIVE').
If required, run the rollback
command
If you identified any hosts that need to be rolled back, run the rollback
command. Hosts are specified as a comma-separated list without whitespace, e.g. --hosts host1,host2
. Specify the hosts in reverse order of node ID (that is, highest node ID first).
./orca --hosts <hosts to be rolled back> rollback --stop-timeout <timeout>
where <timeout> is the time you wish to allow for traffic to drain on each node.
If you are rolling back every host in the cluster (or more specifically, the cluster ID that you are rolling back to currently contains no members), then append the parameter --special-first-node
to the above command line. This instructs orca
that the first node specified will need to be made primary so that the cluster starts correctly.
This procedure will take approximately {product-time-other-nodes} per host specified.
If required, run the cleanup
command
Follow the instructions here.
Perform validation tests
Ensure you can log into REM and connect to the cluster.
-
Ensure all services are started.
-
Ensure RA entity activation state is as expected. Start or stop any RA entities as required.
-
Check for and resolve unexpected alarms.
Perform test calls and other validation tests to ensure the rolled-back cluster is operating normally.
The rollback procedure is now complete.
Troubleshooting
Besides the information on the console, orca
provides detailed output of the actions taken in the log file. The log file by default is located on the host that executed the command under the path logs
.
orca
can’t connect to the remote hosts
Check if the trusted connection via ssh is working. The command ssh <the host to connect to>
should work without asking for a password.
You can add a trusted connection by executing the steps below
-
Create SSH key by using default locations and empty passphrase. Just hit enter until you’re done
ssh-keygen -t rsa
-
Copy the SSH key onto all VMs that need to be accessed including the node you are on. You will have to enter the password for the user
ssh-copy-id -i $HOME/.ssh/id_rsa.pub sentinel@<VM_ADDRESS>
where VM_ADDRESS is the host name you want the key to be copied to.
To check run:
ssh VM_ADDRESS
It should return a shell of the remote host (VM_ADDRESS). If it prompts for a password ensure that group permission is correct.
chmod -R go= ~/.ssh
orca
failed to create the management database
orca
uses the psql
client to connect to and do operations on the PostgreSQL database. Check the first host in the --hosts
list has psql installed.
Get the database hostname from the file /home/sentinel/rhino/node-xxx/config/config_variables
Search for the properties MANAGEMENT_DATABASE_NAME
, MANAGEMENT_DATABASE_HOST
, MANAGEMENT_DATABASE_USER
, MANAGEMENT_DATABASE_PASSWORD
example
MANAGEMENT_DATABASE_NAME=rhino_50 MANAGEMENT_DATABASE_HOST=postgresql MANAGEMENT_DATABASE_PORT=5432 MANAGEMENT_DATABASE_USER=rhino_user MANAGEMENT_DATABASE_PASSWORD=rhino_password
Test the connection:
psql -h postgresql -U rhino_user -p 5432 rhino_50 Enter the password and you expect to see Password for user rhino: psql (9.5.14) SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off) Type "help" for help. rhino_50=>
If the connection fails try with the database name from the previous cluster. If it still fails the procedure below is a guide on how to add permission to the database to accept remote connections.
-
Log in to the database host.
-
Open up the pg_hba.conf file, using sudo.
sudo vim /var/lib/pgsql/data/pg_hba.conf
-
Replace the line that looks like this… host rhino-xxx sentinel xxx.xxx.xxx.xxx/xx password with a line that looks like this…
host all sentinel xxx.xxx.xxx.xxx/xx password
Where 'rhino-xxx' is the name of cluster xxx’s postgres database, and xxx.xxx.xxx.xxx/xx covers the signalling addresses of the nodes.
-
Reload the config. /usr/bin/pg_ctl reload
-
Or failing that, try this command. sudo systemctl restart postgresql
Installer runs but does not install
The recommendation is to rollback the node to the previous cluster, cleanup the prepared cluster and start again.
./orca --hosts host1 rollback ./orca --hosts host1 cleanup --cluster <cluster id> ./orca --hosts host1 minor-upgrade <path to the product sdk zip> <path to the install.properties> --pause
Rhino nodes shutdown or restart during migration
orca
requires the node being migrated to allow management commands via rhino-console. If the node is not active orca
will fail.
If while executing orca --hosts <list of hosts> minor-upgrade packages install.properties --continue
the node being currently migrated shuts down or restart, there are 3 options:
1- Skip the node and proceed to the other nodes and migrate the failed node later
Except for the first host name, remove the hosts that were already migrated from the orca command, including the host that failed and execute the command orca --hosts <list of hosts> minor-upgrade packages install.properties --continue
again to continue with the upgrade.
2- Restart the node manually and continue with the upgrade
If the node shutdown and hadn’t restart, execute the rhino.sh
start script. If the node restarted, check if rhino-console command works:
<path to the old cluster >/client/bin/rhino-console
Except for the first host name, remove the hosts that were already migrated from the orca command and execute the command orca --hosts <list of hosts> minor-upgrade packages install.properties --continue
again to continue with the upgrade.
3- Do the migration manually
This is an extreme case, but the procedure is simple.
For each node that was not migrated:
Identify the new cluster
cd $HOME ls -ls rhino lrwxrwxrwx 1 sentinel sentinel 24 Nov 21 21:48 rhino -> shcm-1.0.0.0-cluster-50
will show you the current cluster name in the format <product>-<version>-cluster-<id>.
The new cluster will have a similar name <product>-<new version>-cluster-<id + 1>, but the cluster id will be one number higher. For example: shcm-1.0.0.1-cluster-51
Kill the current node
rhino/rhino.sh kill
Link the rhino path to the new cluster
rm rhino ln -s <new cluster> rhino
where <new cluster> is the path identified above. Example: shcm-1.0.0.1-cluster-51
Start the node and check for connection
./rhino/rhino.sh start
Wait some minutes and check if you can connect to the node using rhino-console.