This book contains performance benchmarks using Sentinel VoLTE for a variety of scenarios and cluster configurations.
It contains these sections:
- Test Methodology — details of methods used for benchmarks
- Benchmark Scenarios — the call flows for each of the benchmark scenarios
- Hardware and Software — details of the hardware, software, and configuration used for the benchmarks
- Benchmark Results — summaries of the benchmarks and links to detailed metrics.
For more information:
- review the other Sentinel VoLTE documentation available, especially the Sentinel VoLTE Administration Guide
- for documentation on the other OpenCloud products used in the benchmarks, see the OpenCloud DevPortal.
Test Methodology
This page describes the methodology used when running the benchmarks.
Rationale
Benchmarks were performed using simulated network functions and a real VoLTE cluster. The simulated network functions ran on separate physical hosts from the VoLTE cluster. The network functions (HSS, OCS, S-CSCF) were simulated to abstract away performance considerations for those functions.
In our benchmarks the VoLTE cluster processed calls for both originating and terminating triggers. Each call was processed in exactly one trigger — either originating or terminating:
- 50% of calls are originating (VoLTE full preconditions)
- 50% of calls are terminating (all other callflows)
Benchmarks were run at 50% of the maximum sustainable load. This gives good call setup times (approximately 15 milliseconds) and allows up to half of the cluster to fail without causing cascading failures: if half the nodes are lost, the surviving nodes absorb the full load at roughly their maximum sustainable level rather than beyond it.
Call rate is determined per physical host, not per Rhino node.
Callflows
A representative sample of commonly invoked MMTEL features was used to select the callflows for these scenarios:
For the full callflows, see Benchmark Scenarios.
Each test runs a total of x sessions per second across all callflows:
Scenario | Percentage
---|---
VoLTE full preconditions | 50%
CDIV Success-response | 40%
CDIV Success-response with OIP | 5%
CDIV-busy-response | 5%
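As a worked example (our arithmetic, not a figure from the benchmark report): at the 275 calls-per-second rate used in the single-host benchmarks, this mix corresponds to roughly 137.5 VoLTE full preconditions calls, 110 CDIV Success-response calls, and about 14 calls per second each for the OIP and busy-response variants.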
Call setup time (latency) is measured by the simulator playing the initiating role. For all CDIV scenarios, latency is measured from the INVITE to the final response. For the preconditions scenario, latency is measured from the INVITE to the 180 (Ringing) response.
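As an illustration of how the latency percentiles reported in Benchmark Results can be derived, the sketch below reduces a log of per-call setup times to nearest-rank percentiles. The latencies.log file and its one-value-per-line format are assumptions for this example, not part of the benchmark tooling.

```
# Hypothetical input: latencies.log, one call setup time in milliseconds per line.
# Sort numerically, then pick the nearest-rank 50th/90th/95th/99th percentiles.
sort -n latencies.log | awk '
    { v[NR] = $1 }
    END {
        n = split("50 90 95 99", q, " ")
        for (i = 1; i <= n; i++) {
            idx = int((NR * q[i] + 99) / 100)  # nearest-rank index: ceil(NR * p / 100)
            printf "%sth percentile: %sms\n", q[i], v[idx]
        }
    }'
```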
NUMA architecture
Most current commercial off-the-shelf systems have a non-uniform memory architecture (NUMA). In a NUMA system, memory access times are not equal for all sockets and all memory locations; access is fastest for memory directly attached to the socket. The Java virtual machine does not provide any mechanisms for controlling where memory is allocated on a NUMA machine.
NUMA combined with the JVM therefore has significant implications for performance. The best performance is achieved by always NUMA-binding each Rhino node to a single socket; in all cases this provides better performance and greater tolerance of node failure.
All hosts used for benchmarks have 2 CPUs using NUMA. 1-node-per-host test configurations do not use NUMA bindings. 2-node-per-host test configurations use NUMA binding to restrict each node to 1 CPU.
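NUMA binding of this kind is typically applied by launching each node under numactl. The sketch below is illustrative only: the node directories and start script names are hypothetical, not the paths used in the benchmarks.

```
# Illustrative only: bind each Rhino node's CPUs and memory to one socket.
# Node directories and start script names are hypothetical.
numactl --cpunodebind=0 --membind=0 /opt/rhino/node-101/start-rhino.sh &
numactl --cpunodebind=1 --membind=1 /opt/rhino/node-102/start-rhino.sh &
```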
Cluster configurations
These configurations were tested:
- 1 VoLTE node on 1 host machine
- 2 VoLTE nodes on 1 host machine
- 2 VoLTE nodes on 2 host machines
- 4 VoLTE nodes on 2 host machines.
All hosts are physical machines. Virtualisation is not used.
Test setup
Each test includes a ramp-up period of 15 minutes before full load is reached. This is needed because the Oracle JVM uses a Just-In-Time (JIT) compiler, which compiles Java bytecode to machine code and recompiles it on the fly to take advantage of optimizations not otherwise possible. This dynamic compilation and optimization takes some time to complete, and during its early stages a node cannot process full load. JVM garbage collection also does not reach full efficiency until several major garbage collection cycles have completed.
15 minutes of ramp-up allows several major garbage collection cycles and the majority of JIT compilation to complete. At this point, the node is ready to enter full service.
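As a side note (not part of the benchmark procedure), this warm-up can be observed directly with standard HotSpot options; both flags below exist in the Oracle JDK 7 used here, though the jar name is a placeholder.

```
# -XX:+PrintCompilation logs each method as the JIT compiles it.
# -verbose:gc logs each collection, making major GC cycles visible.
java -XX:+PrintCompilation -verbose:gc -jar node.jar
```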
The tests are run for one hour after reaching full load. Load is not stopped between ramp up and starting the test timer.
Benchmark Scenarios
The VoLTE benchmarks use four representative MMTEL scenarios, run repeatedly and simultaneously.
Red lines in the callflow diagrams mark the points between which call setup time is measured.
VoLTE full preconditions
A basic VoLTE call, with no active MMTEL features. Radio bearer quality of service preconditions are used.
CDIV Success-response
Callflow with Call Diversion active. The B party rejects the call, the call is diverted to the C party, the C party accepts, and the call lasts 50 seconds.
Hardware and Software
This page describes the hardware and software used when running the benchmarks.
Sentinel VoLTE hosts
Logical Processor Count | 24
CPU Count | 2
Cores per CPU | 6
CPU Vendor | Intel
CPU Type | Intel® Xeon® CPU X5660 @ 2.80GHz
Total RAM | 24,016M

Benchmarks with two hosts used two identical servers.

Operating System | CentOS 6.6
Kernel Version | Linux 2.6.32-504.8.1.el6.x86_64
Platform | 64-bit
Vendor | Software | Version
---|---|---
Oracle | Java | 1.7.0_71-b14
OpenCloud | | 2.6.0.1
OpenCloud | | 2.5.0.1-M4
OpenCloud | | 1.5.4.1
OpenCloud | | 2.5.4.1
Simulators
Logical Processor Count | 24
CPU Count | 2
Cores per CPU | 6
CPU Vendor | Intel
CPU Type | Intel® Xeon® CPU X5650 @ 2.67GHz
Total RAM — single host benchmark | 24,016M
Total RAM — two-host benchmark | 36,047M

Operating System | Red Hat 7.2
Kernel Version | Linux 3.10.0-327.28.3.el7.x86_64
Platform | 64-bit
Vendor | Software | Version
---|---|---
Oracle | Java | 1.7.0_71-b14
OpenCloud | | 2.3.0.10
OpenCloud | | 1.5.4.0
OpenCloud | | 1.0.3-TRUNK.0-M4
OpenCloud | | 2.6.0.3
Rhino configuration
One node per host:

JVM_ARCH=64 HEAP_SIZE=12288m MAX_NEW_SIZE=3072m NEW_SIZE=3072m PERMGEN_SIZE=384m

Two nodes per host:

JVM_ARCH=64 HEAP_SIZE=6656m MAX_NEW_SIZE=1664m NEW_SIZE=1664m PERMGEN_SIZE=384m

Memory database sizing:

ReplicatedMemoryDatabase committed-size = 400M
DomainedMemoryDatabase committed-size = 400M
ProfileDatabase committed-size = 400M
LocalMemoryDatabase committed-size = 400M
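As a rough consistency check of these settings against the host RAM (our arithmetic, not figures from the benchmark report):

```
# One node per host:  1 x 12288m heap = 12288m
# Two nodes per host: 2 x 6656m heap  = 13312m
# In both configurations roughly half of the 24,016M host RAM is Java heap,
# leaving the remainder for permgen, other JVM overhead, and the operating system.
```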
Benchmark Results
A summary of the benchmark results follows; detailed results for each benchmark are in the sections below.
Benchmark | Rate | CPU Usage | NUMA state
---|---|---|---
One Host with One Node | 275 calls per second | | NUMA bindings not used
One Host with Two Nodes | 275 calls per second | | NUMA bindings used. CPU usage and call setup times improved over One Host with One Node
Two Hosts with One Node per Host | 550 calls per second | | NUMA bindings not used
Two Hosts with Two Nodes per Host | 550 calls per second | | NUMA bindings used. Call setup times improved over Two Hosts with One Node per Host
One Host with One Node
This page summarises the results for benchmarks executed with a single host running a single node. Detailed metrics follow the summary tables.
Benchmarks
Call rate | 275 calls per second across the four scenarios
Scenario latencies
Scenario | 50th percentile | 90th percentile | 95th percentile | 99th percentile
---|---|---|---|---
 | 13.6ms | 22.7ms | 132.0ms | 295.0ms
 | 10.6ms | 18.3ms | 123.0ms | 286.0ms
 | 13.7ms | 20.2ms | 129.0ms | 291.0ms
 | 15.9ms | 30.6ms | 152.0ms | 311.0ms
Callflows for each of the scenarios are available in Benchmark Scenarios.
One Host with Two Nodes
This page summarises the results for benchmarks executed with a single host running two nodes. Detailed metrics follow the summary tables.
Benchmarks
Call rate | 275 calls per second across the four scenarios
Scenario latencies
Scenario | 50th percentile | 90th percentile | 95th percentile | 99th percentile
---|---|---|---|---
 | 11.8ms | 14.6ms | 22.9ms | 173.0ms
 | 11.7ms | 13.5ms | 23.7ms | 174.0ms
 | 11.9ms | 14.9ms | 23.9ms | 175.0ms
 | 17.4ms | 19.8ms | 34.1ms | 186.0ms
Callflows for each of the scenarios are available in Benchmark Scenarios.
Two Hosts with One Node per Host
This page summarises the results for benchmarks executed with two hosts, each running a single node. Detailed metrics follow the summary tables.
Benchmarks
Call rate | 550 calls per second across the four scenarios
Scenario latencies
Scenario | 50th percentile | 90th percentile | 95th percentile | 99th percentile
---|---|---|---|---
 | 10.8ms | 18.7ms | 79.0ms | 188.0ms
 | 10.7ms | 18.7ms | 79.1ms | 187.0ms
 | 10.9ms | 18.1ms | 78.2ms | 187.0ms
 | 16.1ms | 27.0ms | 96.7ms | 203.0ms
Callflows for each of the scenarios are available in Benchmark Scenarios.
Two Hosts with Two Nodes per Host
This page summarises the results for benchmarks executed with two hosts, each running two nodes. Detailed metrics follow the summary tables.
Benchmarks
Call rate | 550 calls per second across the four scenarios
Scenario latencies
Scenario | 50th percentile | 90th percentile | 95th percentile | 99th percentile
---|---|---|---|---
 | 12.1ms | 15.0ms | 36.7ms | 172.0ms
 | 12.0ms | 14.7ms | 36.2ms | 172.0ms
 | 12.2ms | 15.3ms | 38.2ms | 174.0ms
 | 18.0ms | 21.7ms | 48.2ms | 186.0ms
Callflows for each of the scenarios are available in Benchmark Scenarios.