This guide explains how to upgrade Sentinel Authentication Gateway to the current version.
Operational Tools Architecture
This upgrade manual refers to elements of the Operational Tools Architecture, which explains the high-level design of the upgrade process as well as low-level details of using the upgrade bundles and tools. Make an offline copy of this book to consult while executing upgrades on customer sites.
Familiarize yourself with the Cluster Migration Workflow before planning an upgrade.
Overview and Terminology
Terminology
-
The downlevel (software) version is the version being upgraded from.
-
The uplevel (software) version is the version being upgraded to.
-
The installed (software) version refers to the software currently running on the Rhino cluster, including any customizations, additions and patches that may have been made to deployable units, SLEE components and configuration.
-
orca is the tool, delivered within the upgrade package, which drives the whole upgrade process, connecting to each of the hosts and running commands on them remotely.
-
A Deployable Unit (DU) is a packaged component that provides a particular feature of the software.
-
A (Rhino) cluster is a group of nodes with the same cluster ID. The cluster ID is an integer between 1 and 255 used to identify nodes that are linked together and share traffic. The cluster is backed by a PostgreSQL database containing the DUs and configuration. This architecture has the following implications, which are the fundamentals of the Cluster Migration Workflow design for upgrades:
-
Changing the cluster ID of a node removes that node from the cluster. This can be done without reconfiguring the other nodes in the cluster.
-
A new node can be added to a cluster simply by installing Rhino on it and ensuring the cluster ID of the new node is the same as the existing nodes. When Rhino is started, it will join the cluster and obtain the product DUs and configuration automatically from the shared PostgreSQL, then deploy them, thus becoming a fully-functional node that can handle traffic.
-
An upgrade can be one of two types: minor or major.
-
A minor upgrade is one where only the final component of the software version number changes, e.g. 2.7.0.2 to 2.7.0.3. It can include bugfixes, new minor functionality and new configuration, but the configuration schemas are always compatible with the downlevel software.
-
A major upgrade is one where any of the first three components of the version number changes, e.g. 2.7.0.12 to 2.8.0.0. It can include bugfixes, new features and new configuration. The new configuration may use a different schema, for example some profile spec IDs and profile table schemas may change. As such the configuration cannot simply be imported from the downlevel cluster and needs to be transformed by orca so it is suitable for the uplevel cluster’s schema.
Both minor and major upgrades can include a new Rhino version. Major upgrades may also include a new version of Java. Where present, these are deployed automatically by orca at the appropriate point in the upgrade process.
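The minor/major distinction can be checked mechanically from the two version strings. The following is a minimal sketch, assuming four-component dotted versions as in the examples above. The function name upgrade_type is hypothetical and is not part of orca.

```shell
# Hypothetical helper (not part of orca): classify an upgrade from two
# four-component version strings such as 2.7.0.2 and 2.7.0.3.
upgrade_type() {
    local down="$1" up="$2"
    if [ "$down" = "$up" ]; then
        echo "none"
    elif [ "${down%.*}" = "${up%.*}" ]; then
        # The first three components match, so only the final one changed.
        echo "minor"
    else
        # At least one of the first three components changed.
        echo "major"
    fi
}
```

For example, `upgrade_type 2.7.0.12 2.8.0.0` prints major. The sketch does not validate that both arguments really have four components.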
Limitation of one major version per major upgrade
-
orca is a command-line tool which can perform a variety of maintenance operations on Rhino clusters. It implements the Cluster Migration Workflow.
-
An upgrade bundle is a zip file containing the uplevel software, orca, any new Rhino and/or Java version, plus ancillary resources (such as feature script configuration) required during the upgrade process. Upgrade bundles are provided to customers by Metaswitch Customer Care.
-
A customization package is a package included in an upgrade bundle which applies customer-specific modifications to the product. Customization packages come in two types: post-install (applied after product installation) and post-configure (applied after the configuration has been transformed and imported). Your Professional Services representative can assist in creating customization packages and including them in upgrade bundles.
Upgrade process overview
Upgrading a Sentinel product involves the following steps:
-
Read the product changelog to understand the changes introduced in the uplevel software, any new configuration required, and any workarounds/limitations you may encounter
-
Obtain the upgrade package from your Metaswitch Customer Care Representative, including any customization packages and licenses if required
-
Prepare for the upgrade
-
plan a maintenance window for when you expect traffic through the system to be low. Although the rolling nature of the upgrade means that the system is always able to accept new traffic throughout the upgrade, traffic that is still present on a specific node at the end of its draining timeout will be curtailed, leading for example to terminated phone calls.
-
identify the first node to be upgraded
-
Upgrade the first node
-
Perform validation tests, if required
-
Upgrade the remainder of the nodes
-
Perform validation tests, if required
-
Post-upgrade steps including restoring other network element configuration
The uplevel software is provided in the form of a bundle (zip file). The procedure of upgrading the software is performed using the orca tool, which is included in the bundle. Run orca from a machine (either an operator’s PC or a node in the deployment to be upgraded) that meets the following requirements:
-
Linux OS, with Python 2.7 installed and accessible in the PATH as python2 (you can check by running python2 --version)
-
At least 2GB of free hard disk space (you can check with df -h)
-
Passwordless SSH connectivity (via SSH keys) to all Rhino nodes in the cluster to be upgraded
-
Configure the SSH client to log in as the user under which Rhino is installed on the nodes, normally sentinel. This can be configured in ~/.ssh/config in the home directory of the user that will be running orca, for example:
-
Host host1
User sentinel
IdentityFile /path/to/ssh_private_key
...
Upgrading takes approximately 15 minutes for the first node and 10 minutes for each subsequent node, not including validation tests or manual reconfiguration steps. It is a current limitation that the upgrade proceeds sequentially across all cluster nodes.
Next steps
To plan your upgrade, see Preparing for a Sentinel Authentication Gateway Upgrade.
Preparing for a Sentinel Authentication Gateway upgrade
This section describes how to prepare for a Sentinel Authentication Gateway upgrade. Be sure you are familiar with the upgrade overview and terminology.
-
Information and Files Required gives useful information for planning the upgrade and describes how to validate the upgrade bundle.
-
Limitations describes limitations, known issues and other caveats to be aware of while performing the upgrade process.
After familiarising yourself with the above information, refer to the Executing a Sentinel Authentication Gateway Upgrade section to begin the upgrade process.
Information and Files Required
This section documents information concerning the upgrade. Be sure you are also familiar with the new features and new configuration introduced in the uplevel Sentinel Authentication Gateway software.
Information required
Maintenance window
In general an upgrade of a Rhino cluster running Sentinel Authentication Gateway can be completed in a single maintenance window, though for operational reasons you may wish to use more than one. The maintenance window should be of adequate length (at least 15 minutes for the first node, 10 minutes for subsequent nodes, plus additional time for traffic draining, and validation tests according to your test plan).
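As a worked example of the timing guidance above, the sketch below computes a rough lower bound for the maintenance window from the node count. It covers only the 15-minute and 10-minute figures quoted in this guide. Traffic draining and validation tests must be added on top. The function name estimate_upgrade_minutes is hypothetical.

```shell
# Hypothetical helper: rough lower bound, in minutes, for upgrading N nodes.
# Uses the figures from this guide: 15 minutes for the first node and
# 10 minutes for each subsequent node. Draining and validation are excluded.
estimate_upgrade_minutes() {
    local nodes="$1"
    if [ "$nodes" -lt 1 ]; then
        echo "node count must be at least 1" >&2
        return 1
    fi
    echo $(( 15 + 10 * (nodes - 1) ))
}
```

For a five-node cluster, `estimate_upgrade_minutes 5` gives 55 minutes before draining and validation time are added.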
Manual reconfiguration steps
For deployments utilizing any or all of:
it is strongly recommended that the upgrade is trialled in a lab deployment first. This will allow you to identify any breaking issues with the upgrade, and to note any customizations to feature scripts and other configuration that need to be applied after the upgrade.
Traffic impact
Traffic capacity will be reduced by a fraction of 1/N where N is the number of nodes in the cluster to be upgraded. That is to say, at any given point during the upgrade one node will be unavailable but the rest will still be able to handle traffic. As such the maintenance window should be planned for a period where call traffic volumes are quieter.
Product upgrade order
A Sentinel deployment may consist of multiple products: REM, GAA, VoLTE, and IPSMGW. The clusters must be upgraded in the following order: REM (with plugins), then GAA, then VoLTE, then IPSMGW. (For clarity, all GAA nodes must be upgraded before any VoLTE nodes, and all VoLTE before any IPSMGW.)
For major upgrades, upgrade all products in close succession. Once REM has been upgraded to a new version, the other products need to be upgraded soon after, so that the REM plugins for those products retain maximum compatibility and provide the best management interface.
Node order
The first node in each Rhino cluster is handled separately to the rest, since the software upgrade only needs to actually be performed on one node and the rest can pick up the upgraded software by joining the new cluster (see Cluster Migration Workflow).
Always specify the hosts in reverse order of their node IDs, i.e. from highest to lowest. This is necessary since certain Rhino features use the node ID as a tie-breaker when they can otherwise not determine which node has preference, so we want to make sure the highest number node is processed first to avoid problems.
orca takes a list of hosts. The hosts are upgraded in the order specified, and only the nodes specified are upgraded. As such, orca supports splitting the upgrade across multiple maintenance windows by specifying only a subset of hosts in each window.
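The ordering rule above (highest node ID first) can be applied mechanically when building the host list. The following is a minimal sketch, assuming you supply "host node-id" pairs yourself, one per line. The input format and the function name hosts_highest_node_first are hypothetical, and orca does not provide this mapping.

```shell
# Hypothetical helper: read "host node-id" pairs on stdin and print the hosts
# as a comma-separated list (no whitespace), highest node ID first.
hosts_highest_node_first() {
    sort -k2,2nr | awk '{ print $1 }' | paste -sd, -
}
```

For example, piping the three lines `host1 101`, `host2 102` and `host3 103` into the function yields a list suitable for the --hosts argument, with host3 first.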
Limitation of one cluster
orca only supports working with a list of hosts where all hosts are part of the same cluster. If your deployment has multiple clusters to be upgraded, then these upgrades need to be done separately. This does mean that (if such an approach is suitable for your overall deployment architecture) you can use multiple machines to run |
Files required
Upgrade bundle
Your Metaswitch Customer Care Representative should have provided you with an upgrade bundle, normally named gaa-<uplevel-version>-upgrade-bundle.zip or similar. unzip the bundle to a directory of your choice and cd to it.
orca working directory
orca must always be run from the directory where it resides. It will fail to operate correctly if run from any other location.
Verify the orca script is present and executable by running ./orca --help. You should see usage information.
Verify that orca can contact the hosts you wish to upgrade by running ./orca --hosts <host1,host2,…> status. You should see status information, such as the current live cluster and lists of services, about all the hosts specified.
SSH access to hosts
orca requires passwordless SSH access to hosts, i.e. all hosts in the cluster must be accessible from the machine running orca using SSH keys. If this is not the case, orca will throw an error saying it is unable to contact one or more of the hosts. Ensure SSH key-based access is set up to all such hosts and retry the status command until it works.
install.properties
Upgrading any product requires an install.properties file. If you are familiar with some of the other Sentinel products, specifically VoLTE or IPSMGW, then you may be aware that for those products an install.properties file is generated when you install them, and can serve as the basis for the one used for the upgrade.
However, Sentinel Authentication Gateway does not have such an installer, and thus does not generate an install.properties file in the same way.
Instead you should manually create the install.properties file containing the following lines:
platform.operator.name=<platform-operator-name>
deployrhino=false
doinstall=true
rhinoclientdir=<rhino-client-dir>
where <platform-operator-name> is the name your installation uses as the operator name, and <rhino-client-dir> is the full path of the directory where the Rhino client is installed in your installation.
In addition, you should include the other settings described in Default configuration for the BSF server.
The file can be placed anywhere, though normal practice would be to put it in the same directory as orca in the unzipped bundle so as to associate the install.properties file with this upgrade.
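Creating the baseline file can be scripted. The sketch below writes only the four lines listed above. Any further settings from Default configuration for the BSF server still need to be added by hand. The function name write_install_properties is hypothetical, and the operator name and client path shown in the usage note are placeholders, not recommended values.

```shell
# Hypothetical helper: write the four baseline install.properties lines.
# Arguments: <platform-operator-name> <rhino-client-dir> [output-file]
write_install_properties() {
    local operator="$1" client_dir="$2" out="${3:-install.properties}"
    cat > "$out" <<EOF
platform.operator.name=${operator}
deployrhino=false
doinstall=true
rhinoclientdir=${client_dir}
EOF
}
```

A call might look like `write_install_properties ExampleOperator /path/to/rhino/client`, run from the directory containing orca so that the file sits alongside it.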
Note that some new releases may introduce new mandatory install.properties fields. If so, and these are missing from the provided install.properties, orca will report this early in the upgrade process. Edit the install.properties file to add the missing values (your Support Representative can tell you the keys and values required).
Limitations
This page describes limitations, known issues and workarounds to be aware of that relate to the Sentinel Authentication Gateway upgrade process.
General limitations
-
Each machine in the Rhino cluster must contain only one Rhino node and the Rhino installation must conform to orca's standardized path structure.
-
orca cannot auto-detect the cluster membership, so to upgrade the whole cluster it must always be given the full list of hosts in the cluster.
-
All products monitored by a single Rhino Element Manager (REM) instance must be upgraded together, and in a fixed order. See here for more information.
-
While orca is waiting to shut down a node, the node will continue to accept new traffic that is directed at it. If, because of this or because pre-existing traffic continues, the node does not become completely free of traffic before the given shutdown-timeout is reached, the node will be killed, terminating any active calls. Thus the upgrade should be done during a maintenance window, when overall traffic is low and the fewest users will be inconvenienced by such terminated calls.
-
Tracer levels (and for Rhino 2.6.0 or later, notification sources and loggers) are not maintained across an upgrade. If you previously set some custom levels which you wish to retain, then you will need to manually reconfigure them after the upgrade is complete. This can be done either using rhino-console commands, or via the Monitoring menu in REM.
-
After upgrade, SNMP OIDs for some objects and their alarms/statistics may be different. You will need to reconfigure monitoring systems with the new OIDs. (This is a temporary limitation that will be addressed in a forthcoming release.)
-
Symmetric activation state mode must be enabled prior to an upgrade. If your cluster normally operates with symmetric activation state mode disabled, you will need to enable it before the upgrade, and after the upgrade disable it again and possibly adjust service and RA entity activation states to the desired settings. Refer to the Rhino documentation here for more information.
-
orca only supports ASCII characters in pathnames (notably the Rhino installation path on each node).
Limitations when upgrade includes an installation of a new Rhino version
-
Rhino configuration in node-XXX/config/pernode-mlet.conf and node-XXX/config/permachine-mlet.conf is not preserved. Refer to the Rhino Administration guide for information on how to reconfigure this if required.
Limitations when upgrade includes an installation of a new Java version
-
The new Java installation will only be used for the Rhino cluster being upgraded. If there are other applications on the machine that also use Java, these will need to be manually reconfigured. See here for more information.
-
When upgrading from Java 7 to Java 8, deprecated perm-gen settings will not be removed from Rhino configuration. This may result in benign warnings appearing. You can manually remove the deprecated configuration at a later stage.
Executing a Sentinel Authentication Gateway Upgrade
This section describes the steps to perform the Sentinel Authentication Gateway upgrade. Be sure you are familiar with the upgrade overview and terminology.
-
Pre-Upgrade Checklist details the steps to take prior to starting the upgrade.
-
Upgrade Process details the steps to carry out the upgrade.
-
Post-Upgrade Checklist details the steps to finish and verify the upgrade after the uplevel cluster is fully ready.
-
Transformation Rules details what steps the transformation rules take, to migrate the old configuration to the new during the upgrade.
-
Aborting or Reverting an Upgrade details steps to take if you wish to go back to the downlevel cluster, for example after discovering a breaking issue with the upgrade.
-
Feature Scripts conflicts and resolution details how to solve feature scripts conflicts and how to install the feature scripts after inspection.
-
Troubleshooting details the known problems and how to solve them.
Pre-Upgrade Checklist
This page describes steps to take before starting a Sentinel Authentication Gateway upgrade.
Note that there is no specific need to take an export prior to starting the upgrade. Taking an export is the first step that orca will perform when issued the upgrade command, and the export will be available at the end of the upgrade process.
Create validation test plans
Devise a validation test plan to ensure that, after the upgrade:
-
traffic is routed as expected and calls work (including audio and, where appropriate, video)
-
any features in the downlevel version that you rely on still work as expected
-
any new features you are expecting to use in the uplevel version work as expected.
Your network configuration may support setting up test numbers where calls to these numbers are routed to specific nodes. This is a particularly useful tool for verifying the upgrade early, as after upgrading the first node you can make calls using test numbers configured to route specifically to the first node.
During or after the upgrade, if you find the validation tests are failing, contact your Metaswitch Customer Care Representative to discuss next steps.
Ensure the nodes are configured with the standardized path structure
orca requires the nodes to have a standardized path structure. Within the HOME directory (normally /home/sentinel) there must be:
-
one or more Rhino installation directories with a name of the following format: gaa-<version>-cluster-<id>, e.g. gaa-2.7.1.2-cluster-41
-
a symbolic link named rhino pointing at the currently live Rhino installation
-
a directory named export
-
a directory named install.
See Standardized Path Structure for more information.
If your installation does not conform to this setup, consult your Metaswitch Customer Care Representative who can advise on how to convert it.
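The four requirements listed above can be checked on each node before the maintenance window. This is a minimal sketch that tests only those four items. It is not a substitute for the full Standardized Path Structure description, and the function name check_path_structure is hypothetical.

```shell
# Hypothetical helper: check a home directory against the four layout
# requirements listed in this section. Prints each problem found and
# returns non-zero if any check fails.
check_path_structure() {
    local home="${1:-$HOME}" failed=0 found=1 d
    # At least one installation directory named gaa-<version>-cluster-<id>.
    for d in "$home"/gaa-*-cluster-*; do
        if [ -d "$d" ]; then found=0; fi
    done
    if [ "$found" -ne 0 ]; then
        echo "no gaa-<version>-cluster-<id> directory in $home"; failed=1
    fi
    # The rhino symbolic link and the export and install directories.
    if [ ! -L "$home/rhino" ]; then echo "missing symbolic link: $home/rhino"; failed=1; fi
    if [ ! -d "$home/export" ]; then echo "missing directory: $home/export"; failed=1; fi
    if [ ! -d "$home/install" ]; then echo "missing directory: $home/install"; failed=1; fi
    return "$failed"
}
```

Run it on a node as the Rhino user, for example `check_path_structure /home/sentinel`. No output and a zero exit status mean the four checks passed.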
Record the old cluster ID
In case you need to roll back the upgrade, you will need the current cluster ID.
Execute the status command on all the hosts:
./orca --hosts <host1,host2,…> status
The first part of the output for each host will include the cluster status. One of the clusters should be marked as "LIVE".
Status of host host1
Clusters:
- gaa-2.7.1.2-cluster-41
- gaa-2.7.1.4-cluster-42 - LIVE
On the "LIVE" cluster, the last part of the name given is the cluster ID. For example, in the above output the cluster ID is 42.
Ensure all hosts report the same live cluster ID. Make a note of this value.
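Extracting the live cluster ID from the status output can be scripted. The sketch below assumes the output contains lines shaped like the excerpt above ("Status of host <name>" followed by cluster lines, one ending in "- LIVE"). Real output contains more sections, and the function name live_cluster_ids is hypothetical.

```shell
# Hypothetical helper: read orca status output on stdin and print
# "<host> <live-cluster-id>" for each host, based on the line format
# shown in the excerpt in this section.
live_cluster_ids() {
    awk '
        /Status of host / { host = $4 }
        / - LIVE *$/ {
            name = $2                      # e.g. gaa-2.7.1.4-cluster-42
            sub(/^.*-cluster-/, "", name)  # keep only the trailing ID
            print host, name
        }
    '
}
```

A possible use is `./orca --hosts <host1,host2,…> status | live_cluster_ids`. If the second column is not identical for every host, the hosts are not all in the same live cluster.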
Verify license validity
In the status output from the previous section, check the License information section for each host to ensure the license is valid and will remain valid until after the planned finish time of the upgrade.
If your upgrade includes a new major Rhino version, e.g. 2.6.0 → 2.7.0, then you may require a new license, as all products deployed on Rhino are licensed against a Rhino version. Your Metaswitch Customer Care Representative should be able to provide you with one, in which case you should copy it to the directory where you extracted the upgrade bundle, and pass it to orca by appending the parameter --license <license file> when running the first upgrade command. Alternatively, they may advise that the license is already present in the upgrade bundle, or that no new license is needed (in which case orca will automatically transfer the existing license to the uplevel cluster).
Verify disk space
Log into each node in turn and verify the disk space using df -h. Ensure there is at least 2GB of free disk space. Refer to this section if you need to clean up old clusters and/or exports in order to free disk space.
Check for unexpected alarms
Log into REM and access the connection to the Rhino cluster. Check there are no unexpected alarms. Fix any issues you find. If required, make a note of expected active alarms for correlation with the post-upgrade list of alarms.
Upgrade Rhino Element Manager (REM) nodes
With REM upgraded, you will be able to monitor the nodes as they join the new cluster during the upgrade process.
Check that other network elements are configured to redirect traffic on failure
During the upgrade, at any given point one node of the cluster will be unavailable to handle traffic. As such, calls directed to this node will fail. Network elements such as loadbalancers and S-CSCFs generally utilize a blacklist mechanism, so that INVITEs that get a timeout failure cause both a retry for the call on a different node, and also block that node from being tried for further calls for a while afterwards.
Enable symmetric activation state mode if required
orca only supports upgrading a cluster with symmetric activation state mode enabled. If your cluster normally operates with it disabled, you will need to enable it here and restore the state after the upgrade.
To check if the cluster has symmetric activation state mode enabled, examine the orca status output. At the top under "Global info" it will say Symmetric activation state mode is currently enabled or Nodes with per-node activation state: <node list>. If you see the former output then symmetric activation state mode is enabled, while the latter means it is disabled.
To enable symmetric activation state mode, follow the instructions in the Rhino documentation here.
Rerun the orca status command ./orca --hosts <host1,host2,…> status and verify that the status reports Symmetric activation state mode is currently enabled.
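The check described in this section can be reduced to a search for the enabled message in the status output. This is a minimal sketch that relies only on the message text quoted above. The function name symmetric_mode_enabled is hypothetical.

```shell
# Hypothetical helper: succeeds if the status output on stdin reports that
# symmetric activation state mode is enabled, using the message text quoted
# in this section.
symmetric_mode_enabled() {
    grep -q 'Symmetric activation state mode is currently enabled'
}
```

For example, `./orca --hosts <host1,host2,…> status | symmetric_mode_enabled || echo "enable symmetric activation state mode first"` prints a reminder when the enabled message is absent.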
Transformation Rules
This page describes the migration performed via the transformation rules during a major upgrade of Sentinel Authentication Gateway.
Transformation rules are very version specific, so make sure that you are reading the docs for the specific version you are upgrading to.
The rules perform the following:
Transformation rules for upgrading GAA 2.6.0 to 2.7.0
Detailed descriptions are yet to be written.
Post-Upgrade Checklist
This page describes the actions to take after the upgrade is completed and all relevant validation tests have passed.
Clean up REM connections
You should be able to access the entire Rhino cluster using the original connection configured in REM. If you created an additional connection to monitor the new cluster during the upgrade, delete it now.
Note that when refreshing the REM web application, be sure to do a "hard refresh" (Ctrl+F5 in most browsers) so that the browser retrieves up-to-date information from the REM server rather than reloading from its cache.
Check service state and alarms
Symmetric activation state mode
Symmetric activation state mode must be enabled prior to the upgrade. If your deployment normally operates with it disabled, you will need to manually disable it at this point. See Symmetric Activation State in the Rhino documentation. You may also need to update RA entity activation states.
Log into REM and access the connection to the Rhino cluster. Check that all services are active on all nodes. Also check that the RA entity activation state is as expected on all nodes. Check for unexpected alarms.
Verify SAS configuration
If your deployment is now running Rhino 2.6.0 or later and includes a MetaView Service Assurance Server (SAS), verify the SAS configuration in Rhino is as expected. See SAS Tracing in the Rhino documentation.
Check that calls made in your validation tests appear in SAS.
Restore external network element configuration
If you made any changes to external network elements (for example to timeout INVITEs more quickly or to redirect traffic), then restore the original configuration on those elements.
Archive the downlevel export generated during the upgrade, and generate a new export
On the first node, orca will have generated an export of the Rhino installation at the downlevel version, prior to any migration or upgrade steps. This can be found in the ~/export directory, labelled with the version and cluster ID, and is a useful "restore point" in case problems with the upgrade are encountered later that cannot be simply undone with the rollback command. Copy (for example using rsync) the downlevel export directory with all its contents to your backup storage, if you have one.
Follow the Rhino documentation to generate a post-upgrade export and archive this too.
Uplevel exports generated during major upgrade
Note that during major upgrade These exports can be ignored, and deleted at a later date. |
Archive the upgrade logs
In the directory from which orca was run, there will be a directory logs containing many subdirectories with log files. Copy (for example using rsync) this logs directory with all its contents to your backup storage, if you have one. These logs can be useful for Metaswitch Customer Care in the event of a problem with the upgrade.
Save the install.properties file
It is a good idea to save the install.properties file for use in a future upgrade.
Remember that the file is currently on the machine that you chose to run orca from, and that may not be the same machine chosen by someone doing a future upgrade.
A good location to put the file is in a /home/sentinel/install
directory on whatever you regard as the main Rhino host. This may be the one you always provide first in the list of hosts for example, or the one originally used to install Rhino in the first place.
Configure new features
If the uplevel software introduced new features that you plan to use in your deployment, you may wish to configure and verify them here. Refer to the Sentinel Authentication Gateway manuals and changelogs for more information.
If required, clean up downlevel clusters and unneeded exports
Once the upgrade is confirmed to be working, you may wish to clean up old downlevel cluster(s) to save disk space.
Run the status command to view existing clusters and exports
Run the status command and observe the clusters and exports sections of the output.
./orca --hosts <host1,host2,host3,…> status
For upgrades, exports are only generated on the first node.
Identify any clusters or exports you no longer wish to keep. Note their cluster IDs; the ID is the last part of the name. For example, given this output:
Status of host host1
Clusters:
- gaa-2.7.1.2-cluster-41
- gaa-2.7.1.4-cluster-42
- gaa-2.8.0.2-cluster-43 - LIVE
[...]
Exports:
- gaa-2.7.1.2-cluster-41
- gaa-2.7.1.4-cluster-42
- gaa-2.7.1.4-cluster-42-transformed-for-2.8.0.2
- gaa-2.8.0.2-cluster-43
[...]
Status of host host2
Clusters:
- gaa-2.7.1.2-cluster-41
- gaa-2.7.1.4-cluster-42
- gaa-2.8.0.2-cluster-43 - LIVE
[...]
you may decide to delete cluster 41 and exports 41 and 42.
Retain one old cluster
You are advised to always leave the most recent downlevel cluster in place as a fallback. Be sure you have an external backup of any export directories you plan to delete, unless you are absolutely sure that you will not need them in the future.
Run the cleanup command to delete clusters and exports
Run the cleanup command, specifying the clusters and exports to delete as comma-separated lists of IDs (without whitespace). For example:
./orca --hosts <host1,host2,host3,...> cleanup --clusters 41
./orca --hosts <host1> cleanup --exports 41,42
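Picking the cluster IDs to pass to cleanup can be partly automated. The sketch below reads status output, looks only at the Clusters section, drops the LIVE cluster, and keeps the highest-numbered remaining cluster as the fallback. Treating the highest non-live ID as the most recent downlevel cluster is an assumption made for this example, so always review the result before deleting anything. The function name cluster_cleanup_candidates is hypothetical.

```shell
# Hypothetical helper: read orca status output on stdin and print a
# comma-separated list of cluster IDs that are candidates for cleanup.
# Assumption: the highest non-live ID is the most recent downlevel cluster
# and is retained as the fallback.
cluster_cleanup_candidates() {
    awk '
        /^ *Clusters:/ { in_clusters = 1; next }
        /^ *[A-Za-z].*: *$/ { in_clusters = 0 }   # any other section header
        in_clusters && /-cluster-[0-9]+/ && !/ - LIVE/ {
            id = $2
            sub(/^.*-cluster-/, "", id)
            print id
        }
    ' | sort -nu | sed '$d' | paste -sd, -
}
```

With the example output shown earlier in this section, the function prints 41, matching the decision to delete cluster 41 and keep cluster 42 as the fallback. Exports are not handled by this sketch.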
Update Java version of other applications running on the hosts
If performing a major upgrade with a new version of Java, orca will install the new Java but it will only be applied to the new Rhino installation. Global environment variables such as JAVA_HOME, or other applications that use Java, will not be updated.
The new Java installation can be found at ~/java/<version> where <version> is the JDK version that orca installed, e.g. 8u162. If you want other applications running on the node to use this new Java installation, update the appropriate environment variables and/or configuration files to reference this directory.
Upgrade Process
Before you begin the upgrade, ensure you have completed the pre-upgrade checklist.
Upgrade process
Specify all hosts
Unless otherwise specified, in all Hosts are specified as a comma-separated list (without whitespace), e.g. |
Note that the following process is for a major upgrade. Minor and major upgrades are very similar; the differences for minor upgrade are listed inline.
Upgrade the first node
Start the upgrade using the following command. For minor upgrade, replace major-upgrade with minor-upgrade.
./orca --hosts <host1,host2,…> major-upgrade --stop-timeout <timeout> --pause {extra-parameter-major-upgrade} packages <install.properties>
where
-
<timeout> is the maximum amount of time you wish to allow for all calls to stop on each node
-
<install.properties> is the path to the install.properties file
-
packages is a literal name, and should not be changed.
If your upgrade includes a new Rhino version and you have been given a separate license file, specify this here by appending the parameter --license <path to license file>.
This will take approximately 15 minutes. At the end of the upgrade process, you will be prompted to continue the operation when ready:
Major upgrade has been paused after applying it to just host1.
You should now test that the major upgrade has worked.
Once this is verified, use the following command to complete the major upgrade:
./orca --hosts host1,host2,host3 major-upgrade --continue packages install.properties
If the upgrade failed, refer to orca troubleshooting to resolve the problem. You will normally need to rollback to the original cluster in order to try again.
Verify the new node is visible in REM
Log into the REM web application. Create a new connection to the first host and connect to it. You should be able to see information about the first node. Ensure there are no unexpected alarms.
Edit the connection to include the other hosts, by updating the address field from a single host address to a list of all the required hosts.
Perform first node validation tests
If your test plan includes validating the first node after upgrade, run these tests now.
Upgrade the rest of the nodes
Run the same command as in the "Upgrade the first node" section, but replace --pause with --continue. For minor upgrade, replace major-upgrade with minor-upgrade.
./orca --hosts <host1,host2,…> major-upgrade --stop-timeout <timeout> --continue packages <install.properties>
If the output from orca reports Given hosts are not in the correct state to continue, this probably indicates that you have not issued the correct command to continue. In particular, you must ensure that the first host listed in the --hosts list is the one representing the first node that was upgraded, and that all the other hosts were also present in the original list, since those are the only ones that have been correctly prepared for the upgrade. There are two simple ways to get the correct command:
-
either copy the command from the orca output, which tells you the exact command to use
-
or find the original command in your terminal command history, and simply replace --pause with --continue.
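The second approach, reusing the original command with --pause swapped for --continue, can be sketched as a small text substitution. The function name to_continue_command is hypothetical, and the timeout value in the usage note is a placeholder rather than a recommendation.

```shell
# Hypothetical helper: derive the continue command from the original
# first-node command by replacing the --pause option with --continue.
to_continue_command() {
    local original="$1"
    case "$original" in
        *" --pause "*) ;;
        *) echo "original command does not contain --pause" >&2; return 1 ;;
    esac
    printf '%s\n' "$original" | sed 's/ --pause / --continue /'
}
```

For example, passing `./orca --hosts host1,host2,host3 major-upgrade --stop-timeout 120 --pause packages install.properties` prints the same command with --continue in place of --pause. The host list is left untouched, which keeps the first host first as required.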
During this stage, each remaining host will be migrated to the new cluster one-by-one. If you remain logged into REM and viewing the connection you created earlier, each node will become visible in REM as it is migrated into the new cluster. Note that some unexpected alarms may temporarily appear on these new nodes until they have fully deployed the new software and configuration, due to having a mixture of old and new in place. Once the new software and configuration have fully deployed, alarms caused in this way will disappear, and their temporary appearance is nothing to worry about.
Migrating in two or more maintenance windows
For the purposes of example, suppose there are five hosts. In the first maintenance window, you prepare all the hosts, and upgrade the first one using the --pause option.
Once you have tested that the first node is performing as expected, you issue the --continue command for the hosts being upgraded in that window.
In the second maintenance window, the command you use must keep the first host the same (since that is the host holding the node whose upgrade is being duplicated to the other nodes), but you change the rest of the list to be the remainder of the hosts that need to be upgraded.
Next steps
Follow the post-upgrade checklist.
Aborting or Reverting an Upgrade
This page describes the steps required to abort an upgrade partway through, or revert back to the downlevel system after the upgrade is complete.
Rolling back the upgrade will take approximately 10 minutes per affected node. Which nodes are affected depends on which nodes were partially or fully upgraded.
Be sure you know the original cluster ID as noted in the pre-upgrade checklist.
Note that in the normal case there should be no need to restore the Rhino cluster from an export. The cluster directory being rolled back to will contain all relevant configuration. However, any configuration changes made since the upgrade started will be lost.
Restore network element configuration
If, in preparation for the upgrade, you made changes to other elements in your network, you should undo these changes before starting the rollback procedure.
Check the current cluster status
Use orca's status command to check the current cluster status.
./orca --hosts <host1,host2,host3,…> status
For each host there may be multiple clusters listed, but only one of those will be marked as LIVE, so note its ID, which is the last part of the cluster name. If the various hosts have nodes in different 'LIVE' clusters, then any nodes in a 'LIVE' cluster that is not the original live cluster will need to have their hosts rolled back. (In most circumstances, the original live cluster will have a lower ID number, so rollback will normally be needed for any host that does not have that lowest number cluster ID marked as being 'LIVE').
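The rule above can be expressed as a small Python sketch. It assumes you have already read the LIVE cluster ID for each host from the status output by hand; the function name and the host names are hypothetical, and no attempt is made to parse orca output, whose exact format is not covered here.

```python
def hosts_needing_rollback(live_cluster_by_host: dict[str, int],
                           original_cluster_id: int) -> list[str]:
    """Return the hosts whose LIVE cluster is not the original live cluster.

    live_cluster_by_host maps each host name to the ID of the cluster
    marked LIVE on that host, as noted manually from the status output.
    """
    return [
        host
        for host, cluster_id in live_cluster_by_host.items()
        if cluster_id != original_cluster_id
    ]


if __name__ == "__main__":
    # Hypothetical situation: two hosts already migrated to cluster 51.
    live = {"host1": 51, "host2": 51, "host3": 50}
    print(hosts_needing_rollback(live, original_cluster_id=50))
```

Passing the original cluster ID explicitly (as recorded in the pre-upgrade checklist) avoids relying on the "lowest ID" heuristic, which holds in most but not all circumstances.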
If required, run the rollback command
If you identified any hosts that need to be rolled back, run the rollback command. Hosts are specified as a comma-separated list without whitespace, e.g. --hosts host1,host2. Specify the hosts in reverse order of node ID (that is, highest node ID first).
./orca --hosts <hosts to be rolled back> rollback --stop-timeout <timeout>
where <timeout> is the time you wish to allow for traffic to drain on each node.
If you are rolling back every host in the cluster (or more specifically, the cluster ID that you are rolling back to currently contains no members), then append the parameter --special-first-node to the above command line. This instructs orca that the first node specified will need to be made primary so that the cluster starts correctly.
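As a rough illustration of how the rollback command line is assembled, the following Python sketch orders the hosts by node ID (highest first) and appends --special-first-node only when requested. The function name, the host-to-node-ID mapping and the timeout value are assumptions made for the example; the timeout is passed through as a string because its expected format is not described here.

```python
def build_rollback_command(node_id_by_host: dict[str, int],
                           stop_timeout: str,
                           target_cluster_is_empty: bool) -> list[str]:
    """Build the orca rollback argument list for the given hosts.

    Hosts are listed in reverse order of node ID, joined with commas and
    no whitespace. --special-first-node is appended only when the cluster
    being rolled back to currently has no members.
    """
    hosts = sorted(node_id_by_host, key=node_id_by_host.get, reverse=True)
    command = [
        "./orca", "--hosts", ",".join(hosts),
        "rollback", "--stop-timeout", stop_timeout,
    ]
    if target_cluster_is_empty:
        command.append("--special-first-node")
    return command


if __name__ == "__main__":
    # Hypothetical node IDs and timeout value.
    nodes = {"host1": 101, "host2": 102, "host3": 103}
    print(" ".join(build_rollback_command(nodes, "120", True)))
```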
This procedure will take approximately 10 minutes per host specified.
If required, run the cleanup command
Follow the instructions here.
Perform validation tests
Ensure you can log into REM and connect to the cluster.
-
Ensure all services are started.
-
Ensure RA entity activation state is as expected. Start or stop any RA entities as required.
-
Check for and resolve unexpected alarms.
Perform test calls and other validation tests to ensure the rolled-back cluster is operating normally.
The rollback procedure is now complete.
Feature Scripts conflicts and resolution
Resolving the Feature Scripts conflicts
After orca finishes the major upgrade on the first node, there might be Feature Scripts conflicts which need to be resolved and applied to the system for correct operation.
The files in the feature-scripts path will contain the scripts with:
-
the proposed merged version
-
the installed version
-
the new version (uplevel version)
-
the original downlevel version
The cases that require full manual intervention are ones where the file presents this line: <merge conflict: some message>, e.g. <merge conflict: all three versions differ>
To show how to resolve conflicts in the Feature Scripts, consider the examples below.
Example of default_Post_SipAccess_SubscriberCheck:
<merge conflict: all three versions differ>

### Write your script above. This line, and anything below it, will be removed

=== Currently installed version of script

featurescript SipAccessSubscriberCheck-SysPost-Default {
    if not session.MonitorCallOnly {
        run B2BUAScurPostFeature
    }
    run SDPRewriter
    run SDPMonitor mode "post"
    run RecordTimestamps mode "inbound"
    run ExtractNetworkInfo
    run RemoveHeadersFromOutgoingMessages
}

>>> New version of script

featurescript SipAccessSubscriberCheck-SysPost-Default {
    if not session.MonitorCallOnly {
        run B2BUAScurPostFeature
    }
    run SDPRewriter
    run SDPMonitor mode "post"
    run RecordTimestamps mode "outbound"
    run ExtractNetworkInfo
    run ExternalSessionTracking
    run RemoveHeadersFromOutgoingMessages
}

<<< Original version of script

featurescript SipAccessSubscriberCheck-SysPost-Default {
    if not session.MonitorCallOnly {
        run B2BUAScurPostFeature
    }
    run SDPRewriter
    run SDPMonitor mode "post"
    run ExtractNetworkInfo
    run RemoveHeadersFromOutgoingMessages
}
This case shows all three versions are different: specifically, run RecordTimestamps mode "inbound" changed to run RecordTimestamps mode "outbound" and run ExternalSessionTracking was added.
One correct solution would be to keep the new version of the script. The file after editing would have:
featurescript SipAccessSubscriberCheck-SysPost-Default {
    if not session.MonitorCallOnly {
        run B2BUAScurPostFeature
    }
    run SDPRewriter
    run SDPMonitor mode "post"
    run RecordTimestamps mode "outbound"
    run ExtractNetworkInfo
    run ExternalSessionTracking
    run RemoveHeadersFromOutgoingMessages
}

### Write your script above. This line, and anything below it, will be removed

=== Currently installed version of script

featurescript SipAccessSubscriberCheck-SysPost-Default {
    if not session.MonitorCallOnly {
        run B2BUAScurPostFeature
    }
    run SDPRewriter
    run SDPMonitor mode "post"
    run RecordTimestamps mode "inbound"
    run ExtractNetworkInfo
    run RemoveHeadersFromOutgoingMessages
}

>>> New version of script

featurescript SipAccessSubscriberCheck-SysPost-Default {
    if not session.MonitorCallOnly {
        run B2BUAScurPostFeature
    }
    run SDPRewriter
    run SDPMonitor mode "post"
    run RecordTimestamps mode "outbound"
    run ExtractNetworkInfo
    run ExternalSessionTracking
    run RemoveHeadersFromOutgoingMessages
}

<<< Original version of script

featurescript SipAccessSubscriberCheck-SysPost-Default {
    if not session.MonitorCallOnly {
        run B2BUAScurPostFeature
    }
    run SDPRewriter
    run SDPMonitor mode "post"
    run ExtractNetworkInfo
    run RemoveHeadersFromOutgoingMessages
}
Example file MMTelTerm_SipAccess_PartyRequest:
featurescript SipAccessPartyRequest-User-MmtelTerm {
    if session.ICBBarredWithAnnouncement or session.PlayCDIVAnnouncement or session.PlayCWAnnouncement or session.EndSessionWithAnnouncement {
        run SipPlayAnnouncement
    }
    if not session.FlexibleAlertingMode.NONE {
        if session.FlexibleAlertingMode.PARALLEL {
            run MMTelParallelFA
        } else {
            run MMTelSequentialFA
        }
    }
    run MMTelOIP
    run MMTelECT
    run MMTelStodProcessHandover
    run DetermineChargeableLeg
    if session.AccessLegTrackingActive {
        run AccessLegTracking
    }
}

### Write your script above. This line, and anything below it, will be removed

=== Currently installed version of script

featurescript SipAccessPartyRequest-User-MmtelTerm {
    if session.ICBBarredWithAnnouncement or session.PlayCDIVAnnouncement or session.PlayCWAnnouncement or session.EndSessionWithAnnouncement {
        run SipPlayAnnouncement
    }
    if not session.FlexibleAlertingMode.NONE {
        if session.FlexibleAlertingMode.PARALLEL {
            run MMTelParallelFA
        } else {
            run MMTelSequentialFA
        }
    }
    run MMTelOIP
    run MMTelECT
    run MMTelStodProcessHandover
    run DetermineChargeableLeg
}

>>> New version of script

featurescript SipAccessPartyRequest-User-MmtelTerm {
    if session.ICBBarredWithAnnouncement or session.PlayCDIVAnnouncement or session.PlayCWAnnouncement or session.EndSessionWithAnnouncement {
        run SipPlayAnnouncement
    }
    if not session.FlexibleAlertingMode.NONE {
        if session.FlexibleAlertingMode.PARALLEL {
            run MMTelParallelFA
        } else {
            run MMTelSequentialFA
        }
    }
    run MMTelOIP
    run MMTelECT
    run MMTelStodProcessHandover
    run DetermineChargeableLeg
    if session.AccessLegTrackingActive {
        run AccessLegTracking
    }
}

<<< Original version of script

featurescript SipAccessPartyRequest-User-MmtelTerm {
    if session.ICBBarredWithAnnouncement or session.PlayCDIVAnnouncement or session.PlayCWAnnouncement or session.EndSessionWithAnnouncement {
        run SipPlayAnnouncement
    }
    if not session.FlexibleAlertingMode.NONE {
        if session.FlexibleAlertingMode.PARALLEL {
            run MMTelParallelFA
        } else {
            run MMTelSequentialFA
        }
    }
    run MMTelOIP
    run MMTelECT
    run MMTelStodProcessHandover
    run DetermineChargeableLeg
}
The change is that the new version introduces the following:
if session.AccessLegTrackingActive { run AccessLegTracking }
This matches case 2, so the uplevel version is the correct one to use and no changes to the file are required.
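When many scripts need review, it can help to split each file into its four parts and flag the ones that still carry a merge conflict line. The Python sketch below is a minimal illustration based only on the marker lines visible in the examples above; it assumes each marker sits at the start of its own line, and the function names are hypothetical rather than part of orca.

```python
def split_feature_script(text: str) -> dict[str, str]:
    """Split a feature script file into proposed/installed/new/original parts.

    Everything above the "### Write your script above" line is the proposed
    script; the ===, >>> and <<< marker lines introduce the reference copies.
    """
    sections: dict[str, list[str]] = {
        "proposed": [], "installed": [], "new": [], "original": [],
    }
    current = "proposed"
    for line in text.splitlines():
        if line.startswith("### Write your script above"):
            current = None  # the cut-off line itself belongs to no section
        elif line.startswith("=== Currently installed version"):
            current = "installed"
        elif line.startswith(">>> New version"):
            current = "new"
        elif line.startswith("<<< Original version"):
            current = "original"
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


def has_unresolved_conflict(text: str) -> bool:
    """Return True if the proposed part still contains a merge conflict line."""
    proposed = split_feature_script(text)["proposed"]
    return any(
        line.lstrip().startswith("<merge conflict:")
        for line in proposed.splitlines()
    )
```

A file is ready for import once has_unresolved_conflict returns False and the proposed part holds the script you actually want deployed.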
Importing the Feature Scripts after resolving the conflicts
After the conflicts are resolved, run the following command from the same path you used to run the major upgrade:
./orca --hosts <first host> import-feature-scripts
The output should be:
Importing feature script SCCTerm_HLR_SipAccess_ServiceTimer...
Importing feature script default_Post_SipMidSession_ChargingReauth...
Importing feature script MMTelOrig_SipMidSession_PartyRequest...
Importing feature script MMTelConf_SipAccess_SubscriberCheck...
Importing feature script SCCTermAnchor_SipAccess_ServiceTimer...
Importing feature script SCC_Post_SipEndSession...
Importing feature script MMTel_Pre_SipAccess_SessionStart...
Importing feature script SCC_SipAccess_PartyRequest...
... other feature scripts ...
Done on localhost
If some Feature Script files are not correct, orca will print warnings:
5 scripts could not be imported (see above for errors):
- MMTel_Pre_SipMidSession_PartyRequest
- SCC_Post_SipAccess_PartyRequest
- MMTel_Post_SipMidSession_PartyRequest
- default_Post_SipMidSession_PartyResponse
- MMTelOrig_Post_SubscriptionSipResponse
Fix the listed scripts and repeat the import procedure above.
Troubleshooting
Besides the information on the console, orca provides detailed output of the actions taken in the log file. By default, the log file is located on the host that executed the command, under the path logs.
orca can’t connect to the remote hosts
Check if the trusted connection via ssh is working. The command ssh <the host to connect to> should work without asking for a password.
You can add a trusted connection by executing the steps below:
-
Create an SSH key using the default location and an empty passphrase. Just press Enter until you’re done.
ssh-keygen -t rsa
-
Copy the SSH key onto all VMs that need to be accessed, including the node you are on. You will have to enter the password for the user.
ssh-copy-id -i $HOME/.ssh/id_rsa.pub sentinel@<VM_ADDRESS>
where VM_ADDRESS is the host name you want the key to be copied to.
To check, run:
ssh VM_ADDRESS
It should return a shell of the remote host (VM_ADDRESS).
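To check several hosts without being prompted, the same test can be scripted. The sketch below is an assumption-laden illustration: it relies on the standard OpenSSH client options BatchMode (which disables password prompts) and ConnectTimeout, assumes the sentinel user shown above, and uses hypothetical function and host names.

```python
import subprocess


def ssh_check_command(host: str, user: str = "sentinel") -> list[str]:
    """Build an ssh invocation that fails instead of asking for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never prompt for a password
        "-o", "ConnectTimeout=5",   # give up quickly on unreachable hosts
        f"{user}@{host}",
        "true",                     # harmless remote command
    ]


def has_trusted_connection(host: str, user: str = "sentinel") -> bool:
    """Return True if key-based login to the host works without interaction."""
    try:
        result = subprocess.run(ssh_check_command(host, user), capture_output=True)
    except FileNotFoundError:
        return False  # no ssh client available on this machine
    return result.returncode == 0


if __name__ == "__main__":
    for name in ["host1", "host2"]:  # placeholder host names
        status = "ok" if has_trusted_connection(name) else "NOT trusted"
        print(f"{name}: {status}")
```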
orca failed to create the management database
orca uses the psql client to connect to and do operations on the PostgreSQL database. Check the first host in the --hosts list has psql installed.
Get the database hostname from the file /home/sentinel/rhino/node-xxx/config/config_variables
Search for the properties MANAGEMENT_DATABASE_NAME, MANAGEMENT_DATABASE_HOST, MANAGEMENT_DATABASE_USER, MANAGEMENT_DATABASE_PASSWORD.
Example:
MANAGEMENT_DATABASE_NAME=rhino_50
MANAGEMENT_DATABASE_HOST=postgresql
MANAGEMENT_DATABASE_PORT=5432
MANAGEMENT_DATABASE_USER=rhino_user
MANAGEMENT_DATABASE_PASSWORD=rhino_password
Test the connection:
psql -h postgresql -U rhino_user -p 5432 rhino_50
Enter the password when prompted. You should expect to see:
Password for user rhino:
psql (9.5.14)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.
rhino_50=>
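The lookup and the test command can be combined in a short Python sketch. It assumes config_variables consists of plain KEY=VALUE lines as in the example above (quoting or export prefixes are not handled), and the function names are hypothetical. The password is deliberately left out of the command so that psql prompts for it.

```python
def parse_config_variables(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def psql_command(values: dict[str, str]) -> list[str]:
    """Build the psql test command from the MANAGEMENT_DATABASE_* properties."""
    return [
        "psql",
        "-h", values["MANAGEMENT_DATABASE_HOST"],
        "-U", values["MANAGEMENT_DATABASE_USER"],
        # 5432 is the PostgreSQL default port, used if the property is absent.
        "-p", values.get("MANAGEMENT_DATABASE_PORT", "5432"),
        values["MANAGEMENT_DATABASE_NAME"],
    ]


if __name__ == "__main__":
    example = """
    MANAGEMENT_DATABASE_NAME=rhino_50
    MANAGEMENT_DATABASE_HOST=postgresql
    MANAGEMENT_DATABASE_PORT=5432
    MANAGEMENT_DATABASE_USER=rhino_user
    """
    print(" ".join(psql_command(parse_config_variables(example))))
```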
If the connection fails, try again with the database name from the previous cluster. If it still fails, the procedure below is a guide on how to grant the database permission to accept remote connections.
-
Log in to the database host.
-
Open up the pg_hba.conf file, using sudo.
sudo vim /var/lib/pgsql/data/pg_hba.conf
-
Replace the line that looks like this…
host rhino-xxx sentinel xxx.xxx.xxx.xxx/xx password
with a line that looks like this…
host all sentinel xxx.xxx.xxx.xxx/xx password
Where 'rhino-xxx' is the name of cluster xxx’s postgres database, and xxx.xxx.xxx.xxx/xx covers the signalling addresses of the nodes.
-
Reload the config.
/usr/bin/pg_ctl reload
-
Or failing that, try this command.
sudo service postgresql restart
Installer runs but does not install
The recommendation is to roll back the node to the previous cluster, clean up the prepared cluster, and start again.
./orca --hosts host1 rollback
./orca --hosts host1,host2,host3 cleanup --cluster <cluster id>
./orca --hosts host1,host2,host3 major-upgrade <path to the product sdk zip> <path to the install.properties> --pause
Rhino nodes shutdown or reboot during migration
orca requires the node being migrated to allow management commands via rhino-console. If the node is not active, orca will fail.
If, while executing orca --hosts <list of hosts> major-upgrade packages install.properties --continue, the node currently being migrated shuts down or reboots, there are 3 options:
1- Skip the node and proceed to the other nodes and migrate the failed node later
Remove from the orca command the hosts that were already migrated (except for the first host name) and the host that failed, then execute the command orca --hosts <list of hosts> major-upgrade packages install.properties --continue again to continue with the upgrade.
2- Restart the node manually and continue with the upgrade
If the node shut down and has not rebooted, execute the rhino.sh start script. If the node rebooted, check that the rhino-console command works:
<path to the old cluster>/client/bin/rhino-console
Except for the first host name, remove the hosts that were already migrated from the orca command, then execute the command orca --hosts <list of hosts> major-upgrade packages install.properties --continue again to continue with the upgrade.
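The host-list rule shared by options 1 and 2 can be sketched as follows. This is an illustrative helper, not an orca feature: the function name and host names are hypothetical, and it assumes you know which hosts have already been migrated.

```python
def hosts_for_continue(original_hosts: list[str],
                       migrated_hosts: list[str],
                       skip_hosts: tuple[str, ...] = ()) -> list[str]:
    """Return the --hosts list for re-running the --continue command.

    The first host of the original list is always kept in first position.
    Hosts already migrated are dropped, as are any hosts in skip_hosts
    (for option 1, the host that failed and will be migrated later).
    """
    first = original_hosts[0]
    excluded = (set(migrated_hosts) | set(skip_hosts)) - {first}
    return [host for host in original_hosts if host not in excluded]


if __name__ == "__main__":
    hosts = ["host1", "host2", "host3", "host4", "host5"]  # placeholders
    # Option 1: host3 failed and is skipped for now.
    print(",".join(hosts_for_continue(hosts, ["host1", "host2"], ("host3",))))
    # Option 2: host3 was restarted manually and is retried.
    print(",".join(hosts_for_continue(hosts, ["host1", "host2"])))
```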
3- Do the migration manually
This is an extreme case, but the procedure is simple.
For each node that was not migrated:
Identify the new cluster
cd $HOME
ls -ls rhino
lrwxrwxrwx 1 sentinel sentinel 24 Nov 21 21:48 rhino -> volte-2.7.1.2-cluster-50
The link target shows the current cluster name in the format <product>-<version>-cluster-<id>.
The new cluster will have a similar name, <product>-<new version>-cluster-<id + 1>, in which the cluster ID is one higher. For example: volte-2.8.0.3-cluster-51
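The naming rule can be illustrated with a small Python sketch. It assumes cluster directory names follow the <product>-<version>-cluster-<id> format exactly as described above and that the version contains no hyphen; the function names are hypothetical.

```python
import re

# Assumed format: <product>-<version>-cluster-<id>, version without hyphens.
_CLUSTER_NAME = re.compile(r"(?P<product>.+?)-(?P<version>[^-]+)-cluster-(?P<id>\d+)")


def parse_cluster_name(name: str) -> tuple[str, str, int]:
    """Split a cluster directory name into (product, version, cluster ID)."""
    match = _CLUSTER_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected cluster name: {name!r}")
    return match["product"], match["version"], int(match["id"])


def new_cluster_name(current_name: str, new_version: str) -> str:
    """Derive the expected uplevel cluster name: same product, ID plus one."""
    product, _old_version, cluster_id = parse_cluster_name(current_name)
    return f"{product}-{new_version}-cluster-{cluster_id + 1}"


if __name__ == "__main__":
    print(new_cluster_name("volte-2.7.1.2-cluster-50", "2.8.0.3"))
```

Always confirm that the derived directory actually exists on the host before relinking, since the name is only a prediction from the naming convention.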
Kill the current node
rhino/rhino.sh kill
Link the rhino path to the new cluster
rm rhino
ln -s <new cluster> rhino
where <new cluster> is the path identified above. Example: volte-2.8.0.3-cluster-51
Start the node and check for connection
./rhino/rhino.sh start
Wait a few minutes, then check that you can connect to the node using REM or rhino-console.