This page describes the methodology used when running the benchmarks.

Rationale

Benchmarks were performed using a real VoLTE cluster, a real Cassandra DB cluster, and simulated network functions. Where present, SS7 routing is handled by a pair of real SGC clusters. Network functions reachable via SS7 are simulated. The network functions (HSS, with its data cached in ShCM; OCS; and SCSCF) are simulated to abstract away performance considerations for these functions. The simulated network functions run on separate hosts from the VoLTE cluster.

In our benchmarks the VoLTE cluster processes calls for both originating and terminating triggers. Each call is processed in exactly one trigger, either originating or terminating. SAS tracing is enabled for all benchmarks.

Benchmarks were run at the maximum sustainable load level for each node. In this configuration there is no tolerance for node failure: any additional incoming calls will be dropped. To allow for node failure, additional nodes need to be added to provide an acceptable margin (an N+K configuration). Because the load distribution gained by adding K redundant nodes on top of the minimum N nodes needed for stable operation depends strongly on the combined cluster size, we test, but do not publish, the performance of a cluster sized to support failover.

Capacity overhead to support node failure is calculated based on the maximum acceptable number of failed nodes. Typically this is 10% of the cluster, rounded up to the nearest whole number. For example, an installation with up to 10 event processing nodes should have sufficient spare capacity to accept a single node failure. This means that for a 10-node cluster, if each node can handle 120 SPS, the maximum call rate per deployed node should be 0.9 × 120, or 108 SPS, for a whole-cluster rate of 1080 SPS. A three-node cluster sized to allow one node to fail would only be able to support 240 SPS (2/3 × 120 × 3).

On virtualized systems, the number of tolerated node failures is usually rounded up to a multiple of the number of VMs on a single physical host. For example, a typical deployment with two Rhino nodes per physical host should tolerate an even number of failed nodes. With the same 10-node cluster and 120 SPS per node, the maximum call rate per node is 0.8 × 120, or 96 SPS, for a whole-cluster rate of 960 SPS.
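
To make this arithmetic concrete, the following Python sketch (purely illustrative; the function name and structure are our own, not part of any product tooling) derates the per-node and whole-cluster rates for a given cluster size, per-node capacity, and number of VMs per physical host:

    def derated_rates(nodes, sps_per_node, vms_per_host=1):
        # Tolerated failures: 10% of the cluster, rounded up to a whole node...
        failed = (nodes + 9) // 10
        # ...then rounded up to a whole number of physical hosts.
        failed = -(-failed // vms_per_host) * vms_per_host
        per_node = sps_per_node * (nodes - failed) / nodes
        return per_node, per_node * nodes

    derated_rates(10, 120)      # (108.0, 1080.0) -- one node may fail
    derated_rates(3, 120)       # (80.0, 240.0)   -- one node may fail
    derated_rates(10, 120, 2)   # (96.0, 960.0)   -- two nodes (one host) may fail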

Subscriber definition

We assume that a single subscriber will be involved in 1 call attempt during the busy hour.

Call attempt handling requires four B2BUAs: Originating SCC, Originating MMTel, Terminating MMTel, and Terminating SCC. To help ensure performance does not fall short of expectations, we assume that all call attempts are on-net. This requires the VoLTE cluster to handle all four B2BUAs for all sessions.

270K subscribers produce 270K calls per hour at a BHCA of 1. Assuming calls are uniformly distributed through the busy hour, this is 75 call attempts per second (270,000 ÷ 3600). The MMTel results and SCC results show that a three-node cluster can support 75 call attempts per second.

Each benchmark scenario exercises one of the four B2BUAs. We run 300 sessions per second, so to convert from B2BUA sessions to call attempts, divide by 4: 300 B2BUA sessions per second corresponds to 75 call attempts per second.
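
A minimal sketch of this dimensioning arithmetic, using only the figures quoted above (the variable names are our own, for illustration):

    SUBSCRIBERS = 270_000
    BHCA = 1                  # call attempts per subscriber in the busy hour
    B2BUAS_PER_CALL = 4       # Orig SCC, Orig MMTel, Term MMTel, Term SCC

    call_attempts_per_second = SUBSCRIBERS * BHCA / 3600                     # 75.0
    b2bua_sessions_per_second = call_attempts_per_second * B2BUAS_PER_CALL   # 300.0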

Cluster configurations

We test a three-node cluster, with each node on a separate VM, and with replication and SAS tracing enabled. The Cassandra database is also configured as a three-node cluster on three virtual hosts.

Test setup

Each test includes a ramp-up period of 15 minutes before full load is reached. This is needed because the Oracle JVM uses a Just-In-Time (JIT) compiler. The JIT compiler compiles Java bytecode to machine code, and recompiles code on the fly to take advantage of optimizations that are not otherwise possible. This dynamic compilation and optimization process takes some time to complete, and during its early stages the node cannot process full load. In addition, JVM garbage collection does not reach full efficiency until several major garbage collection cycles have completed.

Fifteen minutes of ramp-up allows several major garbage collection cycles and the majority of JIT compilation to complete. At this point, the node is ready to enter full service.

The tests are run for one hour after reaching full load. Load is not stopped between ramp-up and starting the test timer.
