This page describes the methodology used when running the benchmarks.
Rationale
Benchmarks were performed using simulated network functions, and a real VoLTE cluster. The simulated network functions are run on separate physical hosts from the VoLTE cluster. The network functions (HSS, OCS, SCSCF) were simulated to abstract away performance considerations for these functions.
In our benchmarks the VoLTE cluster processes calls for both originating and terminating triggers. Each call is processed in exactly one trigger, either originating or terminating:

- 50% of calls are originating (VoLTE full preconditions)
- 50% of calls are terminating (all other callflows)
Benchmarks were run at 50% of the maximum sustainable load level. This provides good call setup times (approximately 15 milliseconds), and leaves enough headroom for up to half of the cluster to fail without causing cascading failures.
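The headroom arithmetic behind the 50% figure can be illustrated with a short sketch. The host count and per-host capacity below are hypothetical numbers chosen for illustration, not measured values from these benchmarks:

```python
# Illustrative headroom calculation: running at 50% of maximum sustainable
# load means the surviving half of a cluster can absorb the full load.
# All numbers here are hypothetical.

hosts = 4                        # hypothetical cluster size
max_sustainable_per_host = 1000  # hypothetical sessions/sec a host can sustain

# Benchmark load: 50% of the cluster's maximum sustainable level.
total_load = hosts * max_sustainable_per_host * 0.5

# Suppose half the cluster fails:
surviving_hosts = hosts // 2
load_per_surviving_host = total_load / surviving_hosts

# Each survivor now runs at exactly its maximum sustainable rate,
# so the cluster degrades gracefully instead of cascading.
print(load_per_surviving_host)  # 1000.0
```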
Call rate is determined per physical host, not per Rhino node.
Callflows
A representative sample of commonly invoked MMTel features was used to select the callflows for these scenarios:
For the full callflows, see Benchmark Scenarios.
Each test runs a total of x sessions per second, across all callflows.
Scenario | Percentage
---|---
VoLTE full preconditions | 50%
CDIV Success-response | 40%
CDIV Success-response with OIP | 5%
CDIV-busy-response | 5%
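The traffic mix above can be reproduced in a load generator by weighted random selection. Below is a minimal sketch; the scenario names come from the table, but the generator itself is a hypothetical illustration, not the tooling used in these benchmarks:

```python
import random

# Scenario weights from the benchmark traffic mix (percentages).
MIX = {
    "VoLTE full preconditions": 50,
    "CDIV Success-response": 40,
    "CDIV Success-response with OIP": 5,
    "CDIV-busy-response": 5,
}

def pick_scenario(rng: random.Random) -> str:
    """Choose the callflow for the next session according to the mix."""
    scenarios = list(MIX)
    weights = list(MIX.values())
    return rng.choices(scenarios, weights=weights, k=1)[0]

rng = random.Random(42)  # seeded for reproducibility
sample = [pick_scenario(rng) for _ in range(10_000)]
```

Over many sessions the observed proportions converge on the configured percentages, while individual calls remain randomly interleaved.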
Call setup time (latency) is measured by the simulator playing the initiating role. For all CDIV scenarios, latency is measured from the INVITE to the final response. For the preconditions scenario, latency is measured from the INVITE to the 180 (Ringing) response.
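Given per-call timestamps captured at the initiating simulator, setup time reduces to a simple difference. A sketch, with hypothetical timestamp data:

```python
from statistics import median

# Hypothetical (invite_sent, trigger_response) timestamp pairs, in seconds.
# For CDIV scenarios the second timestamp is the final response;
# for the preconditions scenario it is the 180 (Ringing) response.
calls = [
    (0.000, 0.014),
    (1.000, 1.016),
    (2.000, 2.015),
]

# Call setup time per call, in milliseconds.
setup_times_ms = [(resp - inv) * 1000 for inv, resp in calls]
print(round(median(setup_times_ms), 1))  # 15.0
```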
NUMA architecture
Most commercial off-the-shelf systems currently available have a non-uniform memory architecture (NUMA). In a NUMA system, memory access times are not the same for all sockets and all memory locations: access is fastest to memory directly attached to the socket performing the access. The Java virtual machine does not provide any mechanism for controlling where memory is allocated on a NUMA machine.
NUMA combined with the JVM therefore has significant implications for performance. The best performance is achieved by always NUMA-binding each Rhino node to a single socket. In all cases this provides better performance and greater tolerance of node failure.
All hosts used for benchmarks have two CPU sockets in a NUMA configuration. 1-node-per-host test configurations do not use NUMA bindings; 2-node-per-host test configurations use NUMA binding to restrict each node to one CPU socket.
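On Linux, one common way to apply this kind of binding is to launch each node under `numactl`, whose `--cpunodebind` and `--membind` options restrict a process's CPUs and memory allocation to one NUMA node. A sketch that builds such a command line; the node start command shown is hypothetical, not the exact invocation used in these benchmarks:

```python
def numa_wrapped_command(node_cmd: list[str], numa_node: int) -> list[str]:
    """Wrap a node's start command so that its CPUs and memory
    allocations are restricted to a single NUMA node (socket)."""
    return [
        "numactl",
        f"--cpunodebind={numa_node}",  # run only on this socket's cores
        f"--membind={numa_node}",      # allocate only this socket's memory
    ] + node_cmd

# Hypothetical start commands for a 2-node-per-host configuration:
cmd_node_a = numa_wrapped_command(["./start-node.sh", "-n", "101"], 0)
cmd_node_b = numa_wrapped_command(["./start-node.sh", "-n", "102"], 1)
```

Each of the two nodes on a host is pinned to a different socket, so neither ever pays the cross-socket memory access penalty.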
Cluster configurations
These configurations were tested:
- 1 VoLTE node on 1 host machine
- 2 VoLTE nodes on 1 host machine
- 2 VoLTE nodes on 2 host machines
- 4 VoLTE nodes on 2 host machines
All hosts are physical machines. Virtualisation is not used.
Test setup
Each test includes a ramp-up period of 15 minutes before full load is reached. This is needed because the Oracle JVM uses a Just-In-Time (JIT) compiler. The JIT compiler compiles Java bytecode to machine code, and recompiles code on the fly to take advantage of optimizations not otherwise possible. This dynamic compilation and optimization process takes some time to complete, and during its early stages the node cannot process full load. JVM garbage collection also does not reach full efficiency until several major garbage collection cycles have completed.
15 minutes of ramp-up allows several major garbage collection cycles and the majority of JIT compilation to complete. At this point, the node is ready to enter full service.
The tests are run for one hour after reaching full load. Load is not stopped between ramp up and starting the test timer.
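The two phases can be expressed as a simple load schedule. A sketch, assuming a linear ramp (the ramp shape is an assumption; the source specifies only the durations and that load is continuous between the phases):

```python
RAMP_SECONDS = 15 * 60   # 15-minute ramp-up
TEST_SECONDS = 60 * 60   # one hour at full load

def target_rate(t: float, full_rate: float) -> float:
    """Sessions/sec the load generator should offer at time t (seconds).

    Load ramps linearly to full_rate over the ramp-up period, then holds
    steady for the one-hour measurement window; there is no pause
    between ramp-up and the start of the test timer.
    """
    if t < RAMP_SECONDS:
        return full_rate * t / RAMP_SECONDS
    if t <= RAMP_SECONDS + TEST_SECONDS:
        return full_rate
    return 0.0
```

Only the window from `RAMP_SECONDS` onward counts toward the reported results; everything before it exists to warm up the JIT compiler and garbage collector.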