SIS 2.7.0 :: SIS Overview and Concepts :: SIP Session Replication

As of version 2.6.1, the SIS-SIP EasySIP Resource Adaptor supports session replication. This means that SIP dialogs can fail over to other cluster nodes if the original node fails.

On this page...

Overview of Operation
Configuration for session replication
Per-node SRV addresses
- Example
- Initial requests
EasySIP API changes

Overview of Operation

Session replication is disabled by default. It can be enabled for all dialogs, or alternatively an application can specify that a particular dialog must be replicated, using an API call.

In the SIS-SIP EasySIP API, a SIP dialog is represented by the SipSession type.

Normal Operation

When replication is enabled, the SIS initially creates the dialog state in local memory. When the dialog reaches the "Confirmed" state, the SIS writes the dialog state to its replicated store. This is either Rhino’s in-memory database, or an external key-value store, such as Cassandra.

Early dialogs (dialogs where the initial request has not yet received a 2xx response) are not replicated.

When the creating node receives mid-dialog requests, it processes the requests normally using the dialog state in local memory. When the SIP transaction completes, the updated dialog state is written to the replicated store.

When a dialog-terminating request such as BYE is processed, the local and replicated dialog state is removed when the transaction completes.

Failover and Recovery

If a node fails, it is assumed that an external mechanism, such as DNS or a load balancer, will direct SIP traffic to the surviving nodes in a cluster.

When a node receives a mid-dialog request for a dialog that does not exist in local memory, it attempts to load the dialog from its replicated state.

If the dialog is found, it is copied into local memory and the node can continue processing the mid-dialog request.
If the dialog is not found, then the SIS node rejects the request with a 481 Call/Transaction Does Not Exist response.
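
The lookup order described above can be sketched as follows. This is only an illustrative model, not the SIS implementation: the class DialogLookupSketch, its methods, and the use of plain maps and strings for dialog state are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model: dialog state is a String keyed by a dialog ID string.
final class DialogLookupSketch {
    private final Map<String, String> localMemory = new HashMap<>();
    private final Map<String, String> replicatedStore; // shared by all nodes

    DialogLookupSketch(Map<String, String> replicatedStore) {
        this.replicatedStore = replicatedStore;
    }

    /**
     * Returns the dialog state for a mid-dialog request, or empty if the
     * request would be rejected with 481 Call/Transaction Does Not Exist.
     */
    Optional<String> resolve(String dialogId) {
        String state = localMemory.get(dialogId);
        if (state == null) {
            // Not known locally: try the replicated state.
            state = replicatedStore.get(dialogId);
            if (state != null) {
                // Found: copy into local memory and continue processing.
                localMemory.put(dialogId, state);
            }
        }
        return Optional.ofNullable(state);
    }

    boolean isLocal(String dialogId) {
        return localMemory.containsKey(dialogId);
    }
}
```

An empty result corresponds to the 481 case; any other result means the node now holds the dialog in local memory.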

When the original node recovers, failed-over dialogs do not migrate back to that node. Rather, the dialog remains on the node that last took over ownership of the dialog, as long as that node is alive. This is managed by Session Ownership.

Session Ownership

The SIS uses Rhino’s Session Ownership facility to track which node is currently responsible for a dialog. This is to ensure that requests in a dialog are processed on the same node if possible, for consistency. Otherwise dialog state could be updated by several nodes, leading to errors. In other words, dialogs are "sticky" and will only migrate to another node in the event of a node failure.

When a SIS node receives a mid-dialog request for a dialog that it does not currently own, the request is automatically proxied to the owning node, as determined by the Session Ownership facility. If the owning node is down, then the current node may take over ownership of the dialog and resume processing the request. The Session Ownership facility ensures that subsequent requests for this dialog will be directed to the correct node.
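
As a rough sketch of the routing rule above, the decision for a mid-dialog request can be modelled as a function of the receiving node, the recorded owner and the set of live nodes. The names OwnershipRoutingSketch, Action and route are hypothetical, and node liveness is reduced to a simple set lookup; how the SIS detects a failed owner is not described here.

```java
import java.util.Set;

final class OwnershipRoutingSketch {
    enum Action { PROCESS_LOCALLY, PROXY_TO_OWNER, TAKE_OVER }

    /** Decides what the receiving node does with a mid-dialog request. */
    static Action route(String thisNode, String owner, Set<String> liveNodes) {
        if (thisNode.equals(owner)) {
            return Action.PROCESS_LOCALLY;   // this node already owns the dialog
        }
        if (liveNodes.contains(owner)) {
            return Action.PROXY_TO_OWNER;    // keep the dialog on its owning node
        }
        return Action.TAKE_OVER;             // owner is down: claim ownership here
    }
}
```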

Session Ownership records are not automatically created by the SIS; they must be created explicitly by the application processing the initial request. The Sentinel VoLTE session tracking features perform this function.

Configuration for session replication

Enabling Session Replication and Session Ownership requires some configuration changes in the Rhino platform, as well as in the SIS instance.

Session Replication and Session Ownership are available only in SIS 2.6.1 or later, which requires Rhino 2.6.1 or later.

Replication method

The replication method is determined by the Rhino namespace that the SIS instance is deployed into.

Rhino 2.6.1 supports two replication methods:

savanna — state is replicated to cluster nodes using Rhino’s Savanna reliable multicast protocol.
key-value-store — state is written to an external key-value store database, currently Cassandra.

The key-value-store method is preferred for large clusters.

Session ownership

Rhino’s Session Ownership facility is automatically available to the SIS instance if its namespace has session ownership enabled. This requires configuring a Session Ownership store in Rhino, and creating a namespace with session ownership support.

Session replication mode

The sessionReplicationMode configuration property on the SIS-SIP EasySIP RA determines whether session replication will be used. This property has three possible values:

disabled — no replication will be used.
enabled — replication is enabled.
automatic — replication is enabled if the replication method is key-value-store.

The default value is automatic.

Replicate by default

When replication is enabled in the SIS, there is also the option to selectively enable it on particular sessions. This is controlled by the replicateByDefault boolean configuration property:

If true, and session replication is enabled, all sessions will be replicated.
If false, and session replication is enabled, then a session is only replicated on demand, when the application calls SipSession.startReplicating().

The default value is true.

When startReplicating() is called, the SIS replicates the session as soon as it is able to. If the dialog is not yet confirmed, replication will begin when it reaches the confirmed state. If the dialog is already confirmed, replication begins immediately, storing the current dialog state.
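
The way these settings combine can be summarised in a small decision function. This is a sketch of the rules as stated on this page, not SIS code: the class ReplicationDecisionSketch, the enums and the method names are invented for illustration, and the inputs stand in for the namespace's replication method, the two RA properties, and the per-session state.

```java
final class ReplicationDecisionSketch {
    enum Mode { DISABLED, ENABLED, AUTOMATIC }       // sessionReplicationMode
    enum Method { SAVANNA, KEY_VALUE_STORE }          // namespace replication method

    /** True if replication is switched on for the RA as a whole. */
    static boolean replicationEnabled(Mode mode, Method method) {
        switch (mode) {
            case ENABLED:
                return true;
            case AUTOMATIC:
                return method == Method.KEY_VALUE_STORE;
            default:
                return false;
        }
    }

    /** True if this session's state should currently be written to the replicated store. */
    static boolean shouldReplicate(Mode mode, Method method, boolean replicateByDefault,
                                   boolean startReplicatingCalled, boolean dialogConfirmed) {
        if (!replicationEnabled(mode, method)) {
            return false;
        }
        if (!replicateByDefault && !startReplicatingCalled) {
            return false; // on-demand mode and the application has not asked yet
        }
        return dialogConfirmed; // early dialogs are not replicated
    }
}
```

With the default values (automatic and true), a confirmed dialog is therefore replicated only when the namespace uses the key-value-store method.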

Dynamic SRV Name Format

To make use of per-node SRV addresses, the DynamicSRVNameFormat network interface property must be configured.

The value of this property is a string that describes how a DNS SRV name is derived from the node’s IP address. The special token ${IP} is replaced with a DNS-safe encoding of the node’s IP address, which may be used as a domain name component.

For example, if DynamicSRVNameFormat is tas-${IP}.site1.home1.net, then the node with IP address 192.168.10.1 will use the hostname tas-192-168-10-1.site1.home1.net in its Contact or Record-Route URIs.

For IPv6 addresses, each 2-octet group is hex-encoded and delimited with a hyphen. For example, the address 1080::8:800:200C:417A is encoded as 1080-0000-0000-0000-0008-0800-200c-417a.
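
To make the encoding concrete, the following sketch reproduces the two examples above. It is an approximation written from the description on this page, not the SIS's own code; the class DynamicSrvNameSketch and its methods encodeIp and expand are hypothetical names.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.StringJoiner;

final class DynamicSrvNameSketch {
    /** Encodes an IP address literal as a DNS-safe domain name component. */
    static String encodeIp(String ipLiteral) {
        InetAddress addr;
        try {
            // For IP literals this only parses the text; no DNS lookup is made.
            addr = InetAddress.getByName(ipLiteral);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not an IP address: " + ipLiteral, e);
        }
        byte[] b = addr.getAddress();
        StringJoiner joined = new StringJoiner("-");
        if (addr instanceof Inet6Address) {
            // IPv6: each 2-octet group as four lower-case hex digits.
            for (int i = 0; i < b.length; i += 2) {
                int group = ((b[i] & 0xff) << 8) | (b[i + 1] & 0xff);
                joined.add(String.format("%04x", group));
            }
        } else {
            // IPv4: decimal octets, with '.' replaced by '-'.
            for (byte octet : b) {
                joined.add(Integer.toString(octet & 0xff));
            }
        }
        return joined.toString();
    }

    /** Substitutes the ${IP} token in a DynamicSRVNameFormat-style string. */
    static String expand(String format, String ipLiteral) {
        return format.replace("${IP}", encodeIp(ipLiteral));
    }
}
```

For instance, expand("tas-${IP}.site1.home1.net", "192.168.10.1") yields tas-192-168-10-1.site1.home1.net, matching the example above.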

Per-node SRV addresses

For dialogs created by the SIS, mid-dialog requests are routed to the SIP URI provided in the Contact or Record-Route headers of the SIS’s dialog-creating request or response. By default this URI contains the IP address of the SIS node. This means that if the node fails, mid-dialog requests cannot be redirected to another IP address, and so will fail.

The SIS already had the capability to use virtual addresses in its URIs, using the VirtualAddresses network interface property. The virtual address could be a DNS name that resolves to a load balancer address, or it might refer to a DNS NAPTR, SRV or A (address) record.

In the event of a node failure, some load balancers may be able to ensure that subsequent requests in a dialog are routed to the same surviving node. But for sites relying on DNS and SIP’s standard RFC 3263 DNS procedures for locating servers, there is no guarantee that subsequent requests in a dialog will be routed to the same node after a failure, because the single virtual address in the SIS’s URI can resolve to any SIS node.

To support DNS failover in a more predictable fashion, the SIS may use per-node SRV addresses in its own SIP URIs. These are DNS SRV names generated from the SIS node’s IP address, so they are specific to that node. The operator can provision corresponding DNS SRV records to specify the primary and backup nodes. Requests will be routed to the given node when it is available, but will fail over to the given backup nodes by the rules of RFC 3263.

The use of this feature requires that the Session Ownership subsystem is available, and that each SIP dialog has a Session Ownership record.

Example

Say we have three SIS nodes, hostnames sis-1.home1.net, sis-2.home1.net and sis-3.home1.net. Their address records in DNS are:

sis-1.home1.net    <ttl> IN    A     192.168.10.1
sis-2.home1.net    <ttl> IN    A     192.168.10.2
sis-3.home1.net    <ttl> IN    A     192.168.10.3

When the DynamicSRVNameFormat network interface property is set to ip-${IP}.home1.net, the SIS nodes use URIs of the form sip:ip-192-168-10-1.home1.net;lr;transport=tcp.

When DynamicSRVNameFormat is configured, the SIS URIs always specify a transport parameter but do not specify a port. This is to ensure that SIP clients perform a DNS SRV lookup on the host name as per RFC 3263.

The operator provisions the corresponding DNS SRV records:

;; SRV address                                           Priority  Weight  Port   Target
_sip._tcp.ip-192-168-10-1.home1.net.  <ttl> IN    SRV    0         1       5060   sis-1.home1.net.
_sip._tcp.ip-192-168-10-1.home1.net.  <ttl> IN    SRV    10        1       5060   sis-2.home1.net.
_sip._tcp.ip-192-168-10-1.home1.net.  <ttl> IN    SRV    10        1       5060   sis-3.home1.net.

_sip._tcp.ip-192-168-10-2.home1.net.  <ttl> IN    SRV    0         1       5060   sis-2.home1.net.
_sip._tcp.ip-192-168-10-2.home1.net.  <ttl> IN    SRV    10        1       5060   sis-3.home1.net.
_sip._tcp.ip-192-168-10-2.home1.net.  <ttl> IN    SRV    10        1       5060   sis-1.home1.net.

_sip._tcp.ip-192-168-10-3.home1.net.  <ttl> IN    SRV    0         1       5060   sis-3.home1.net.
_sip._tcp.ip-192-168-10-3.home1.net.  <ttl> IN    SRV    10        1       5060   sis-1.home1.net.
_sip._tcp.ip-192-168-10-3.home1.net.  <ttl> IN    SRV    10        1       5060   sis-2.home1.net.

Corresponding records would be needed for UDP (_sip._udp) and TLS (_sips._tcp) transports, if used.

For each per-node SRV address, there are three records. The first points to the node’s own host name, and the other two point to the host names of the other two nodes. The addresses are always tried in priority order (lowest priority value first), by the rules of RFC 3263 and RFC 2782.

If node sis-1 fails, mid-dialog requests routed to sip:ip-192-168-10-1.home1.net will go to either sis-2 or sis-3, assuming that both are available. This ensures that dialogs are "sticky" to the node that created them and are only sent to other nodes when that node has failed. Mid-dialog requests are automatically routed to one of the other nodes and, through the use of the Session Ownership subsystem, are proxied to a single node so that requests for the same dialog are always processed on one node.
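
The effect of the priority values can be illustrated with a simplified model of how a client orders the SRV targets. This sketch is an assumption-laden approximation: SrvFailoverSketch, SrvRecord and candidateTargets are hypothetical names, reachability is passed in as a set, and records of equal priority keep their input order, whereas RFC 2782 selects among them randomly according to weight.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

final class SrvFailoverSketch {
    /** One DNS SRV record, with the fields shown in the table above. */
    static final class SrvRecord {
        final int priority;
        final int weight;
        final int port;
        final String target;

        SrvRecord(int priority, int weight, int port, String target) {
            this.priority = priority;
            this.weight = weight;
            this.port = port;
            this.target = target;
        }
    }

    /** Returns the reachable targets in the order they would be tried. */
    static List<String> candidateTargets(List<SrvRecord> records, Set<String> reachable) {
        List<SrvRecord> sorted = new ArrayList<>(records);
        // Lowest priority value first; the sort is stable for equal priorities.
        sorted.sort(Comparator.comparingInt((SrvRecord r) -> r.priority));
        List<String> targets = new ArrayList<>();
        for (SrvRecord r : sorted) {
            if (reachable.contains(r.target)) {
                targets.add(r.target);
            }
        }
        return targets;
    }
}
```

Given the three records for ip-192-168-10-1.home1.net, the first candidate is sis-1.home1.net while that node is reachable. Once it is not, only sis-2.home1.net and sis-3.home1.net remain.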

Initial requests

Initial requests can be directed to a virtual address, DNS NAPTR or SRV address that distributes traffic over all nodes in the cluster. So for the example cluster above, using a DNS SRV address we can say that all nodes have equal priority and weight:

;; SRV address                               Priority  Weight  Port   Target
_sip._tcp.sis.home1.net.  <ttl> IN    SRV    10        1       5060   sis-1.home1.net.
_sip._tcp.sis.home1.net.  <ttl> IN    SRV    10        1       5060   sis-2.home1.net.
_sip._tcp.sis.home1.net.  <ttl> IN    SRV    10        1       5060   sis-3.home1.net.

When a SIP client sends an initial request to the URI sip:sis.home1.net;transport=tcp, it will automatically select one of the three nodes at random as the destination for that request.

The address of the SIS in the network, for example its IMS public service identity (PSI) or the address used in initial filter criteria (iFC), should be a virtual address as above that represents the SIS cluster.

EasySIP API changes

Several enhancements were made to the EasySIP API to support SIP session replication.

Encoding and decoding of SIP messages

SipMessage objects may now be easily encoded to byte streams and back. This makes it easy to store messages in SBB CMP fields, for example. See SipFactory.encodeMessage() and SipFactory.decodeMessage().

Using SipMessages stored in CMP

Using Rhino’s datatype codec support, it is possible to create CMP fields that store a SipMessage directly, rather than as a byte array. Sentinel’s SipMessageCodec class is an example.

If it is a requirement to store an IncomingSipRequest in CMP, and use that request to create responses after it has been stored, then the CMP field must satisfy the following requirements:

It must use a datatype codec such as Sentinel’s SipMessageCodec.
It must use the @PassByReference annotation with scope PERMANENT.

For example:

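import com.opencloud.rhino.cmp.PassByReference;
import com.opencloud.rhino.cmp.codecs.DatatypeCodecType;
import com.opencloud.sentinel.multileg.SipMessageCodec;
import static com.opencloud.rhino.cmp.PassByReference.Scope.PERMANENT;
// SipMessage must also be imported from the EasySIP API.

// CMP field accessors for the stored request.
// The datatype codec stores the SipMessage directly rather than as a byte array.
@DatatypeCodecType(SipMessageCodec.class)
// PERMANENT scope is required if the stored request is later used to create responses.
@PassByReference(scope = PERMANENT)
public abstract SipMessage getInitialRequest();
public abstract void setInitialRequest(SipMessage request);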

This allows the request to be used to create and send responses, as long as it is restored on the same node where it was created. Once the request is restored on a different node, however, it can no longer be used to create responses. The request can still be inspected to obtain header and body values.

Replicate sessions on demand

The method SipSession.startReplicating() requests that the session be replicated if possible. It is only meaningful if the replicateByDefault boolean configuration property is false.

The method may be called at any point in the session’s lifetime. The session will not actually be replicated until it reaches the "Confirmed" state. If the session is already in the "Confirmed" state when the method is invoked, replication begins immediately. Invoking the method more than once has no additional effect.

Obtain dialog ID for session

The SipSession.getDialogID() method returns the session’s DialogID, which consists of the Call-ID and the local and remote tags.

The string form of this dialog ID may be used as the tracking key in the Session Ownership facility.

Internal Application Server URI

Each SIS instance has its own unique URI, referred to as the internal application server URI (AS URI). This is a SIP URI used by the SIS for its own communication between SIS nodes. Currently this is used when Session Ownership determines that a mid-dialog request must be handled by another node.

Applications must use the internal AS URI as the "owner URI" for a session ownership record. The methods SipFactory.getInternalASURI() and SipFactory.isInternalASURI() are used to obtain and check these URIs.
