Rhino maintains a single system image by preventing inconsistent nodes from forming a cluster. It determines cluster membership based on the set of cluster nodes reachable within a time-out period.
This page explains the strategies available for managing and configuring cluster membership.
Cluster membership is not a concern when using the Rhino SDK (see the Getting Started Guide), where the cluster membership is always just the single SDK node.
Below are descriptions of:
- how nodes "go primary"
- the DLV and 2-node component selectors, and how to configure them
- the communication modes available for cluster membership.
How nodes "go primary"
What is primary component selection?
A cluster node runs a primary component selection algorithm to determine whether the component it belongs to is primary or non-primary, without a priori global knowledge of the cluster.
The primary component is the authoritative set of nodes in the cluster. A node can only perform work when in the primary component. When a node enters the primary component, we say it "goes primary". Likewise, when a node leaves the primary component, we say it "goes non-primary".
The component selector manages which nodes are in the primary component. Rhino provides a choice of two component selectors: DLV and 2-node. In each of the scenarios below, the component selector must maintain a consistent view of the primary component to preserve the single system image provided by Rhino.
Segmentation and split-brain
Nodes can become isolated from each other if some networking failure causes a network segmentation. This carries the risk of a "split brain" scenario, where nodes on both sides of the segment consider themselves primary. Rhino, which is managed as a single system image, does not allow split brain scenarios. The DLV and 2-node selectors use different strategies for avoiding split-brain scenarios.
Starting and stopping nodes
Nodes may stop and start in the following ways:
- node failure — Individual cluster nodes may fail, for example due to a hardware failure. From the point of view of the remaining nodes, a node failure is indistinguishable from a network segmentation. The behaviour of the surviving members is determined by the component selector.
- automatic shutdown with restart — In some cases described in this guide, the component selector "shuts down" a node, for example to prevent a split-brain scenario. It does this by shifting the node from primary to non-primary. Whenever a node goes from primary to non-primary, it self-terminates. The node will still restart if the -k flag was passed to the start-rhino.sh script, and will become primary again as soon as the component selector determines it is safe to do so.
- node start or restart — When a booting node enters a cluster that is primary, the node also goes primary, and receives state from the existing nodes.
- remerge — A remerge happens after a network segmentation, when connectivity between network segments is restored. When a network segment of non-primary nodes merges with a segment of primary nodes, the non-primary nodes also go primary, and receive state from the other nodes. In the unlikely case that two primary segments try to merge, Rhino shuts down the nodes in one of the segments to maintain the single system image. This should only happen if both sides of a network segment are manually activated using the -p flag when using DLV (an administrative error), or after a network failure when using the 2-node selector.
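The shutdown-with-restart behaviour above can be sketched as a simple model (an illustration only; the function and names are invented for this example and are not Rhino's API):

```python
def on_go_nonprimary(started_with_k_flag):
    """Illustrative model of the automatic shutdown-with-restart behaviour.

    started_with_k_flag -- True if the node was started via start-rhino.sh -k
    """
    actions = ["self-terminate"]           # leaving primary always terminates
    if started_with_k_flag:
        # With -k the node restarts, then goes primary again once the
        # component selector determines it is safe to do so.
        actions.append("restart")
    return actions
```

Without -k, a node shut down by the component selector stays down until an operator restarts it.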
Specifying the component selector
The main configuration choice related to cluster membership is the choice of component selector. If no component selector is specified, Rhino uses DLV as the default.
To specify the component selector, set the system property com.opencloud.rhino.component_selection_strategy to 2node or dlv on each node.
To use the 2-node strategy, add this line near the end of the read-config-variables file under the node directory:
OPTIONS="$OPTIONS -Dcom.opencloud.rhino.component_selection_strategy=2node"
This property must be set consistently on every node in the cluster. Rhino will shut down a node that tries to enter a cluster using a different component selector.
The DLV component selector
What is DLV?
The DLV component selector is inspired by the dynamic-linear voting (DLV) algorithm described by Jajodia and Mutchler in their research paper Dynamic voting algorithms for maintaining the consistency of a replicated database.
DLV is the default primary component strategy. It is suitable for most deployments, and recommended when using three or more Rhino nodes, or two nodes plus a quorum node.
The DLV component selector uses a voting algorithm where the membership of previous primary components plays a role in the selection of the next primary component. Each node persists its knowledge of the last known primary component. When the cluster membership changes, each node exchanges a voting message that contains its own knowledge of previous primary components. Once voting completes, each node, independently, uses these votes to make the same decision on whether to be primary or non-primary. A component can be primary if there are enough members present from the last known configuration to form a quorum.
The DLV component selector guarantees that in the case of a network segmentation (where sets of nodes are isolated from each other), at most one of the segments remains primary, avoiding a 'split-brain' scenario where two segments consider themselves primary.
This is achieved by considering any component containing fewer than half of the last known primary component's members to be non-primary. In the case of an exactly even split (for example, a 4-node cluster where 2 nodes fail), the component containing the node with the lowest node ID stays primary.
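The selection rule above can be sketched as follows (a simplified illustration under the description in this section, not Rhino's actual implementation; the function and parameter names are invented for this example):

```python
def is_primary(component, last_primary):
    """Decide whether a component may go primary under the DLV-style rule.

    component    -- set of node IDs currently reachable in this component
    last_primary -- set of node IDs in the last known primary component
    """
    members = component & last_primary   # votes present from the last configuration
    half = len(last_primary) / 2
    if len(members) > half:
        return True                      # clear majority of the last primary component
    if len(members) == half:
        # Exactly even split: the half containing the lowest node ID survives.
        return min(last_primary) in members
    return False                         # minority components go non-primary

# A 4-node cluster {1,2,3,4} segments into {1,2} and {3,4}:
# only the half containing node 1 (the lowest node ID) may stay primary.
```

Because every node computes this decision from the same exchanged votes, all nodes in a component independently reach the same primary/non-primary answer.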
Manually activating DLV
Upon first starting a cluster using DLV, the primary component must be activated.
You do this by passing the -p flag to start-rhino.sh when booting the first node.
DLV persists the primary/non-primary state to disk, so specifying the -p flag is not required after the first time.
Using quorum nodes to distinguish node failure from network segmentation
What is a quorum node?
A quorum node is a lightweight node added to distinguish between network segmentation and node failure (as described above). It does not process events or run SLEE services (nodes that are not quorum nodes are sometimes called "event-router nodes"). Quorum nodes have much lighter hardware requirements than event-router nodes, and are started by passing a dedicated flag to the start-rhino.sh script.
A quorum node is useful to help distinguish between node failure and network segmentation when using just two event-router nodes.
Given a cluster of nodes {1,2}, there are two node-failure cases:
- If node {2} fails, the remaining node {1} will stay primary, because it is the distinguished node (having the lowest node ID).
- If node {1} fails, the remaining node {2} will go non-primary and shut down. DLV cannot distinguish this from a network segmentation, so it shuts down node {2} to prevent the possibility of a split-brain scenario. This usually is not desirable, and there are two approaches for solving this case: use the 2-node component selector, or add a single quorum node to the cluster.
The 2-node component selector
The 2-node selector is designed exclusively for configurations with exactly two Rhino nodes, with a redundant network connection between them. It differs from DLV in how it handles node failures. When one node fails, the other node stays primary, regardless of which of the nodes failed. Conceptually, the responsibility of avoiding a split-brain scenario shifts to the redundant network connection. For this reason, this strategy should only be used when a redundant connection is available. If network segmentation happens, and two primary components remerge, one side of the segment will be shut down.
Quorum nodes cannot be used with the 2-node selector. If you choose the 2-node selector, Rhino will prevent quorum nodes from booting.
Activating 2-node selectors automatically
2-node selectors automatically go primary when booting.
(The -p flag is not necessary when using the 2-node selector, and is ignored.)
When both nodes are visible, they become primary without delay.
When a single node boots, it waits for a short time (defaulting to five seconds) before going primary.
This prevents a different split-brain case when introducing new nodes.
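The boot-time behaviour described above can be sketched as follows (an illustrative model only; the function name and the injectable sleep hook are invented for this example, and the five-second default is the one stated above):

```python
import time

def boot_2node(peer_visible, delay=5, sleep=time.sleep):
    """Illustrative model of 2-node selector activation at boot.

    peer_visible -- callable returning True when the other node is reachable
    delay        -- seconds a lone node waits before going primary (default 5)
    """
    if not peer_visible():
        # A lone booting node waits briefly: if the other node is already
        # primary and merely slow to appear, this avoids briefly forming a
        # second primary component.
        sleep(delay)
    return "primary"   # when both nodes are visible, go primary without delay
```

Passing a stub for the sleep parameter makes the delay observable without actually waiting.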
Communications mode
Cluster membership runs over exactly one of two communication modes. The communication mode must be chosen at cluster creation time and cannot be reconfigured while the cluster is running.
Multicast
This communication mode uses UDP multicast for communication between nodes. This requires that UDP multicast be available and correctly working on all hosts in the cluster.
Scattercast
This communication mode uses UDP unicast in a mesh topology for communication between nodes. This mode is intended for use where multicast support is not available, such as in the cloud. Scattercast requires significantly more complex configuration, and incurs some network overhead. Thus we do not recommend scattercast where multicast is available.