This book contains performance benchmarks of Sentinel VoLTE for a variety of scenarios and cluster configurations.

Note These benchmarks are not directly comparable to those of previous versions due to substantial changes to all elements of the benchmarking process.

Test methodology

This page describes the methodology used when running the benchmarks.

Rationale

Benchmarks were performed using a real VoLTE cluster, a real Cassandra DB cluster, and simulated network functions. Where present, SS7 routing is handled by a pair of real SGC clusters; network functions reachable via SS7 are simulated. The network functions (HSS, with data cached in ShCM; OCS; and S-CSCF) were simulated to abstract away their performance considerations. The simulated network functions run on separate hosts from the VoLTE cluster.

In our benchmarks, the VoLTE cluster processes calls for both originating and terminating triggers; each call is processed in exactly one trigger, either originating or terminating. SAS tracing is enabled for all benchmarks.

Benchmarks were run at the maximum sustainable load level for each node. In this configuration there is no tolerance for node failure; any additional incoming calls will be dropped. To allow for node failure, additional nodes must be added to provide an acceptable margin (an N+K configuration). As the load distribution from adding K nodes for redundancy over the minimum N nodes for stable operation depends strongly on the combined cluster size, we test, but do not publish, the performance of a cluster sized to support failover.

Capacity overhead to support node failure is calculated from the maximum acceptable number of failed nodes. Typically this is 10% of the cluster, rounded up to the nearest whole number. For example, an installation with up to 10 event-processing nodes should have sufficient spare capacity to accept a single node failure. This means that for a 10-node cluster where each node can handle 120 SPS, the maximum call rate per deployed node should be 0.9 × 120 = 108 SPS, for a whole-cluster rate of 1080 SPS. A three-node cluster sized to allow one node to fail could support only 240 SPS (2/3 × 120 × 3).

On virtualized systems, the number of failed nodes to tolerate is usually rounded up to the nearest multiple of the number of VMs on a single host. For example, a typical deployment with two Rhino nodes per physical host should accept an even number of failed nodes. With the same 10-node cluster and 120 SPS per node, the maximum call rate per node is 0.8 × 120 = 96 SPS, for a whole-cluster rate of 960 SPS.
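
The derating arithmetic above can be expressed as a short calculation. The following Python sketch is purely illustrative (the function name and the 10% rule-of-thumb parameter are ours, not part of the benchmark tooling):

  import math

  def derated_rates(nodes, sps_per_node, vms_per_host=1, failure_fraction=0.1):
      # Nodes we must be able to lose: ~10% of the cluster, rounded up.
      failed = math.ceil(nodes * failure_fraction)
      # On virtualized systems, round up to a whole host's worth of VMs.
      failed = math.ceil(failed / vms_per_host) * vms_per_host
      per_node = (nodes - failed) / nodes * sps_per_node
      return per_node, per_node * nodes

  derated_rates(10, 120)                  # (108.0, 1080.0) -- bare metal
  derated_rates(10, 120, vms_per_host=2)  # (96.0, 960.0)   -- 2 nodes per host
  derated_rates(3, 120)                   # (80.0, 240.0)   -- 3-node cluster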

Subscriber definition

We assume that a single subscriber is involved in one call attempt during the busy hour.

Call attempt handling requires four B2BUAs: Originating SCC, Originating MMTel, Terminating MMTel, and Terminating SCC. To help ensure performance does not fall short of expectations, we assume that all call attempts are on-net. This requires that our VoLTE cluster handle all four B2BUAs for every session.

At a BHCA of 1, 270K subscribers produce 270K calls per hour. Assuming calls are uniformly distributed through the busy hour, this is 75 call attempts per second. The MMTel results and SCC results show that a three-node cluster can support 75 call attempts per second.

Each benchmark scenario exercises one of the four B2BUAs. We run 300 sessions per second, so to convert from B2BUA sessions to call attempts, divide by 4.
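
Putting the two conversions together, a worked sketch in Python (the constants are taken from the paragraphs above):

  B2BUA_SESSIONS_PER_SECOND = 300  # benchmark load
  B2BUAS_PER_CALL = 4              # Orig SCC, Orig MMTel, Term MMTel, Term SCC
  BHCA = 1                         # busy-hour call attempts per subscriber

  call_attempts_per_second = B2BUA_SESSIONS_PER_SECOND / B2BUAS_PER_CALL  # 75.0
  subscribers_supported = call_attempts_per_second * 3600 / BHCA          # 270000.0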

Cluster configurations

We test a three-node cluster, with each node on a separate VM, and with replication and SAS tracing enabled. The Cassandra database is also configured as a three-node cluster, on three separate virtual hosts.

Test setup

Each test includes a ramp-up period of 15 minutes before full load is reached. This is necessary because the JVM includes a Just-In-Time (JIT) compiler, which compiles Java bytecode to machine code and recompiles it on the fly to take advantage of optimizations not otherwise possible. This dynamic compilation and optimization process takes some time to complete, and during its early stages the node cannot process full load. JVM garbage collection also does not reach full efficiency until several major garbage collection cycles have completed.

15 minutes of ramp-up allows several major garbage collection cycles and the majority of JIT compilation to complete. At this point, the node is ready to enter full service.
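
Warm-up progress can be confirmed from the JVM itself. For example, on the Java 11 JVM used here, the standard HotSpot options below log garbage collection cycles and JIT compilation events; they are illustrative only, and not part of the benchmark configuration:

  -Xlog:gc                 # unified GC logging (Java 9 and later)
  -XX:+PrintCompilation    # print JIT compilation events as they occur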

The tests are run for one hour after reaching full load. Load is not stopped between ramp-up and starting the test timer.

Benchmark scenarios

MMTel

A representative sample of commonly invoked MMTel features was used to select the callflows. Online charging via Diameter is active.

  • 51% of calls are originating

  • 49% of calls are terminating

Scenario                                            Percentage
VoLTE full preconditions                            50%
CDIV Success-response                               40%
CDIV Success-response with OIP                      4.5%
CDIV Busy-response                                  4.5%
VoLTE full preconditions with one session refresh   1%

Call setup time (latency) is measured by the simulator playing the initiating role. For all CDIV scenarios, latency is measured from INVITE to final response. For both preconditions scenarios, latency is measured from INVITE to ACK.

VoLTE full preconditions

A basic VoLTE call, with no active MMTel features. Radio bearer quality-of-service preconditions are used. The B party answers the call, and the A party hangs up. The call has a 95 second hold time.

Latency is measured from the A party INVITE, to A party ACKing the 200.

VoLTE full preconditions with one session refresh

A basic VoLTE call, with no active MMTel features. Radio bearer quality-of-service preconditions are used. The B party answers the call, and the A party hangs up. This variant lasts for the default SIP session refresh interval, 10 minutes.

Latency is measured from the A party INVITE, to A party ACKing the 200.

CDIV Success-response

Callflow with Call Diversion active. The B party rejects the call, the call is diverted to the C party, and the C party accepts it. The call has a 50 second hold time.

Latency is measured from the A party INVITE, to A party receiving the 200.

CDIV Success-response with OIP

Callflow with Call Diversion active and the Originating Identity Presentation (OIP) feature active. The B party rejects the call, the call is diverted to the C party, and the C party accepts it. The call has a 50 second hold time.

Latency is measured from the A party INVITE, to A party receiving the 200.

The Privacy header is set to Privacy: user;header;id to exercise the OIP feature.

CDIV Busy-response

Callflow with Call Diversion active. The B party rejects the call, the call is diverted to the C party, the C party is not found, and the call ends with a 404.

Latency is measured from the A party INVITE, to A party receiving the 404.

SCC

A representative sample of commonly invoked SCC features was used to select the callflows. Online charging is not active.

  • 50% of calls are originating

  • 50% of calls are terminating

Scenario                                  Percentage
SCC originating with provisional SRVCC    30%
SCC T-ADS                                 50%
Access-Transfer Originating               20%

Call setup time (latency) is measured by the simulator playing the initiating role.

SCC originating with provisional SRVCC

A simple originating SCC call with provisional SRVCC enabled. The B party accepts the call. The call has a variable hold time, from 10 seconds to 180 seconds.

Latency is measured from the A party INVITE, to the A party ACKing the 200.

SCC T-ADS

A terminating SCC call with T-ADS. The VoPS lookup returns a negative result, so the call is delivered via T-ADS to the CS domain and is answered. The call has a 25 second hold time.

Latency is measured from the A party INVITE, to the A party ACKing the 200.

Access-Transfer Originating

An originating call with access transfer. The call is answered, then transferred to another bearer after 15 seconds. The call continues for a further 120 seconds.

Latency is measured from the A party INVITE, to the A party receiving the 200.

Hardware and software

This page describes the hardware and software used when running the benchmarks.

Hardware

All machines used for benchmarking are provided by Amazon’s EC2 public cloud. We exclusively use c5.2xlarge instances. The c5.2xlarge instance type has 8 vCPUs (hardware threads) and 16 GiB of RAM, with up to 10 Gbps networking. Of particular note, either of two generations of Intel hardware may be used for any given instance, entirely at Amazon’s discretion.

Software

Software                                 Version
Sentinel VoLTE                           4.0.0.1
Rhino TAS - Telecom Application Server   3.0.0.5
Apache Cassandra                         3.11.2
Java                                     Oracle Corporation OpenJDK 64-Bit Server VM 11.0.4+11-LTS

Sentinel VoLTE configuration

Sentinel VoLTE configuration is unchanged from the out-of-the-box install, aside from the location of the various simulated network elements. Some customization of the Rhino and JVM configuration is required to suit the VoLTE application; the changes are detailed below.

JVM parameters
Parameter         Value
-Xmx              10240m
-Xms              10240m
-XX:MaxNewSize    1024m
-XX:NewSize       1024m
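
For reference, the settings above correspond to JVM command-line arguments of the following form (a sketch only; exactly how they are applied depends on the Rhino installation):

  java -Xms10240m -Xmx10240m -XX:NewSize=1024m -XX:MaxNewSize=1024m ...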

Rhino parameters
Parameter            Value
Staging queue size   5000

MMTel results

This page summarizes the results for the MMTel scenario group executed with three hosts, each running a single node. Replication to support failover using the Cassandra Key-Value store is enabled. Detailed metrics follow the summary tables.

Benchmarks

Call rate

300 calls per second, split unevenly across all scenarios

CPU usage

Node   Usage
101    61%
102    58%
103    63%

Heap usage

Node   Average heap
101    4600MB
102    4500MB
103    4600MB

Scenario latencies

Scenario                                            50th      75th      90th       95th       99th
CDIV Busy-response                                  43.0ms    76.1ms    129.1ms    165.78ms   277.6ms
CDIV Success-response                               42.7ms    76.5ms    128.6ms    165.6ms    274.5ms
CDIV Success-response with OIP                      43.2ms    76.8ms    128.6ms    164.9ms    270.64ms
VoLTE full preconditions with one session refresh   53.7ms    91.38ms   150.48ms   191.59ms   302.39ms
VoLTE full preconditions                            47.7ms    81.9ms    137.35ms   176.8ms    294.3ms

Note Callflows for each of the scenarios are available in Benchmark scenarios.

Detailed metrics

Graphs were captured for the following metrics:

  • CPU usage

  • Heap usage

  • CDIV Busy-response scenario latencies

  • CDIV Success-response scenario latencies

  • CDIV Success-response with OIP scenario latencies

  • VoLTE full preconditions scenario latencies

  • VoLTE full preconditions with one session refresh scenario latencies

SCC results

This page summarizes the results for the SCC+IMSSF scenario group executed with three hosts, each running a single node. Replication to support failover using the Cassandra Key-Value store is enabled. Detailed metrics follow the summary tables.

Benchmarks

Call rate

300 calls per second, split unevenly across all scenarios

CPU usage

Node   Usage
101    52%
102    45%
103    46%

Heap usage

Node   Average heap
101    4600MB
102    4600MB
103    4600MB

Scenario latencies

Scenario                                  50th     75th     90th      95th      99th
Access-Transfer Originating               10.8ms   20.4ms   44.4ms    76.4ms    148.1ms
SCC T-ADS                                 51.5ms   94.5ms   158.7ms   208.8ms   351.0ms
SCC originating with provisional SRVCC    40.9ms   81.3ms   145.4ms   195.8ms   348.3ms

Note Callflows for each of the scenarios are available in Benchmark scenarios.

Detailed metrics

Graphs were captured for the following metrics:

  • CPU usage

  • Heap usage

  • Access-Transfer Originating scenario latencies

  • SCC T-ADS scenario latencies

  • SCC originating with provisional SRVCC scenario latencies