What are SS7 SGC alarms?

Alarms in the SS7 SGC stack alert the administrator to exceptional conditions. Subsystems in the SS7 SGC stack raise them upon detecting an error condition or an event of high importance. The SS7 SGC stack clears most alarms automatically when the error condition is resolved (a few alarms, noted below, must be cleared manually); an administrator can clear any alarm at any time. When an alarm is raised or cleared, the SS7 SGC stack generates a notification that is sent both as a JMX Notification and as an SNMP trap/notification.

The SS7 SGC stack defines multiple alarm types. Each alarm type corresponds to a type of error condition or important event (such as "SCTP association down"). The SGC stack can raise multiple alarms of any type (for example, multiple "SCTP association down" alarms, one for each disconnected association).

Alarms are inspected and managed through a set of commands exposed by the Command-Line Management Console, which is distributed with the SS7 SGC stack.

Active alarms and event history

The SS7 SGC Stack stores and exposes two types of alarm-related information:

  • active alarms: a list of the alarms that are currently active

  • event history: a list of the alarms and notifications that were raised or emitted in the last 24 hours (the default retention period; see Configuring the SS7 SGC Stack).

At any time, an administrator can clear all or selected alarms.
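To make the relationship between the two views concrete, the following Python sketch models an active-alarm list alongside an event history pruned to a retention window. It is an illustration under stated assumptions, not the SGC's implementation: the class and method names (`AlarmRegistry`, `raise_alarm`, `clear`, `clear_all`) are hypothetical, and the 24-hour window simply mirrors the documented default.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import count


@dataclass
class Event:
    alarm_id: int
    name: str
    kind: str  # "raised" or "cleared"
    timestamp: datetime


class AlarmRegistry:
    """Hypothetical model: active alarms plus a time-bounded event history."""

    def __init__(self, retention=timedelta(hours=24)):
        self.retention = retention
        self.active = {}   # alarm id -> alarm type name
        self.history = []  # Event records, oldest first
        self._ids = count(1)

    def _record(self, alarm_id, name, kind, now):
        self.history.append(Event(alarm_id, name, kind, now))
        # Drop history entries that fall outside the retention window.
        cutoff = now - self.retention
        self.history = [e for e in self.history if e.timestamp >= cutoff]

    def raise_alarm(self, name, now):
        # Several alarms of the same type can be active at once,
        # each with its own unique id.
        alarm_id = next(self._ids)
        self.active[alarm_id] = name
        self._record(alarm_id, name, "raised", now)
        return alarm_id

    def clear(self, alarm_id, now):
        # Clearing a selected alarm; raises KeyError if it is not active.
        name = self.active.pop(alarm_id)
        self._record(alarm_id, name, "cleared", now)

    def clear_all(self, now):
        for alarm_id in list(self.active):
            self.clear(alarm_id, now)
```

Cleared alarms leave the active list immediately but remain visible in the history (as both a raise and a clear entry) until they age out of the window.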

Generic alarm attributes

Alarm attributes represent information about events that result in an alarm being raised. Each alarm type has the following generic attributes, plus a group of attributes specific to that alarm type (described in the following sections).

  • id: a unique alarm instance identifier, presented as a number. This identifier can be used to track alarms, for example to identify the raise and clear event entries for an alarm in the event history, or to refer to a specific alarm in the commands used to manipulate alarms.

  • name: the name of the alarm type. A catalogue of alarm types is given below.

  • severity: the alarm severity, one of:

      ◦ CRITICAL: the application encountered an error which prevents it from continuing (it can no longer provide services)

      ◦ MAJOR: the application encountered an event which significantly impacts delivered services; some services may no longer be available

      ◦ MINOR: the application reports an event which does not have significant impact on delivered services

      ◦ INFO: the application reports an information event which does not have any impact on delivered services

      ◦ CLEARED: the alarm has been cleared.

  • timestamp: the date and time at which the event occurred.
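One way to picture these generic attributes is as a record type, with the type-specific attributes (such as nodeId or connectionId) carried in a separate mapping because each alarm type adds its own set. The Python sketch below is illustrative only: the `Severity` and `Alarm` names, the numeric ordering of severities, and the `describe` output format are assumptions, not part of the SGC.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class Severity(Enum):
    # Ordered from most to least severe; CLEARED marks a cleared alarm.
    CRITICAL = 1
    MAJOR = 2
    MINOR = 3
    INFO = 4
    CLEARED = 5


@dataclass
class Alarm:
    id: int
    name: str
    severity: Severity
    timestamp: datetime
    # Type-specific attributes, e.g. {"connectionId": "C1"} (illustrative).
    attributes: dict = field(default_factory=dict)

    def describe(self):
        """One-line summary; the format is a hypothetical choice."""
        extras = ", ".join(f"{k}={v}" for k, v in sorted(self.attributes.items()))
        return (f"#{self.id} {self.name} [{self.severity.name}] "
                f"{self.timestamp.isoformat()} {extras}").rstrip()
```

Because the enum values follow the severity order, a list of alarms can be sorted most-severe-first with `sorted(alarms, key=lambda a: a.severity.value)`.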

Alarm types

This section describes all alarm types that can be raised in an SGC cluster.

General alarms

This section describes the alarms raised concerning the general operational state of the SGC or SGC cluster.

commswitchbindfailure

The commswitchbindfailure alarm is raised when the CommSwitch is unable to bind to the configured switch-local-address and switch-port for any reason. This is typically caused by misconfiguration; the administrator must ensure that the CommSwitch is configured to use a host and port pair which is always available for the SGC’s exclusive use. This alarm is cleared when the CommSwitch successfully binds to the configured port.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: commswitchbindfailure)

  • severity: alarm severity (constant value: CRITICAL)

  • timestamp: timestamp when the event occurred

  • nodeId: affected node

  • failureDescription: the cause of the bind failure

mapdatalosspossible

The mapdatalosspossible alarm is raised when the number of SGC nodes present in the cluster exceeds 1 plus the backup-count configured for Hazelcast map data structures. See Hazelcast cluster configuration for information on how to fix this. This alarm must be cleared manually since it indicates a configuration error requiring correction and a restart of the SGC.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: mapdatalosspossible)

  • severity: alarm severity (constant value: INFO)

  • timestamp: timestamp when the event occurred

  • nodeCount: the number of SGC nodes present in the Hazelcast cluster (which may be different from the number of nodes in the SGC configuration)

  • backupCount: the configured Hazelcast map backup count
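The raise condition reduces to one comparison between nodeCount and backupCount. In Hazelcast, each map entry is held by its owner plus backup-count backup members, so 1 + backup-count members hold a copy. The Python sketch below restates the condition and the arithmetic for the smallest backup count that avoids it; both function names are hypothetical, and the sketch says nothing about how the SGC itself evaluates the check.

```python
def map_data_loss_possible(node_count: int, backup_count: int) -> bool:
    """True when the cluster has more members than copies of each map entry.

    Each map entry lives on its owner plus `backup_count` backups,
    i.e. on 1 + backup_count members in total.
    """
    return node_count > 1 + backup_count


def min_backup_count(node_count: int) -> int:
    """Smallest backup count for which map_data_loss_possible() is False."""
    return max(node_count - 1, 0)
```

For example, a three-node cluster with a backup count of 1 meets the raise condition (3 > 2), while a backup count of 2 or more does not.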

distributedDataInconsistency

The distributedDataInconsistency alarm is raised when a distributed data inconsistency is detected. This alarm must be cleared manually, since it indicates a problem that may result in undefined behaviour within the SGC and that requires a restart of the SGC cluster to correct. To correct the problem properly, fully stop all SGC nodes first, and only then begin restarting them.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: distributedDataInconsistency)

  • severity: alarm severity (constant value: CRITICAL)

  • timestamp: timestamp when the event occurred

  • source: the location where the data inconsistency was detected

nodefailure

The nodefailure (node failed) alarm is raised whenever a node configured in the cluster is down. It is cleared when an SGC instance acting as that node becomes active.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: nodefailure)

  • severity: alarm severity (constant value: MAJOR)

  • timestamp: timestamp when the event occurred

  • nodeId: affected node

  • failureDescription: more information about the node failure

poolCongestion

The poolCongestion (Task Pool Congestion) alarm is raised whenever more than 80% of a pool’s pooled objects are in use. This is typically caused by misconfiguration; see Static SGC instance configuration. It is cleared when fewer than 50% of the pooled objects are in use.

Note
What is a task pool?

A task pool is a pool of objects used during message processing, where each allocated object represents a message that may be processing or waiting to be processed. Each SGC node uses separate task pools for outgoing and incoming messages.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: poolCongestion)

  • severity: alarm severity (constant value: MAJOR)

  • timestamp: timestamp when the event occurred

  • poolName: name of the affected task pool

  • nodeId: affected node
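The separate raise (over 80%) and clear (under 50%) thresholds form a hysteresis band: once raised, the alarm stays raised while usage hovers between the two, so it does not flap around a single threshold. The Python sketch below shows that pattern for one pool. `CongestionMonitor` is a hypothetical name, and treating exactly 80% and exactly 50% as "no change" follows the wording "over" and "less than" above; the SGC's exact comparisons are an assumption.

```python
class CongestionMonitor:
    """Hysteresis: raise above raise_pct usage, clear below clear_pct usage."""

    def __init__(self, capacity, raise_pct=80, clear_pct=50):
        self.capacity = capacity
        self.raise_pct = raise_pct
        self.clear_pct = clear_pct
        self.congested = False

    def update(self, in_use):
        """Feed a usage sample; return "raise", "clear", or None."""
        # Compare in_use/capacity against a percentage using integers only,
        # which avoids floating-point edge cases exactly at the thresholds.
        scaled = in_use * 100
        if not self.congested and scaled > self.raise_pct * self.capacity:
            self.congested = True
            return "raise"
        if self.congested and scaled < self.clear_pct * self.capacity:
            self.congested = False
            return "clear"
        return None
```

The workgroupCongestion and associationCongested alarms described below use the same 80%/50% thresholds (against a worker’s work queue and against out-queue-size respectively), so one such monitor per queue would model them too.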

poolExhaustion

The poolExhaustion (Task Pool Exhaustion) alarm is raised whenever a task allocation request is made on a pool whose objects are all already allocated. This is typically caused by misconfiguration; see Static SGC instance configuration. This alarm must be cleared manually.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: poolExhaustion)

  • severity: alarm severity (constant value: MAJOR)

  • timestamp: timestamp when the event occurred

  • poolName: name of the affected task pool

  • nodeId: affected node
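Unlike poolCongestion, this alarm is latched: returning objects to the pool does not clear it, and it stays raised until an administrator clears it. The Python sketch below illustrates that behaviour. The `TaskPool` class, its methods, and the choice to reject the allocation (return `False`) when the pool is exhausted are all hypothetical; the text does not say what the SGC does with the failed request.

```python
class TaskPool:
    """Fixed-size pool that latches an exhaustion flag until cleared manually."""

    def __init__(self, name, size):
        self.name = name
        self.size = size
        self.in_use = 0
        self.exhaustion_alarm = False  # stays True until clear_exhaustion()

    def allocate(self):
        """Return True if an object was allocated, False if the pool is exhausted."""
        if self.in_use >= self.size:
            # Allocation requested while every object is in use: raise and latch.
            self.exhaustion_alarm = True
            return False
        self.in_use += 1
        return True

    def release(self):
        if self.in_use == 0:
            raise ValueError("release without a matching allocate")
        self.in_use -= 1
        # Deliberately leaves exhaustion_alarm untouched: clearing is manual.

    def clear_exhaustion(self):
        """Models an administrator clearing the alarm."""
        self.exhaustion_alarm = False
```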

workgroupCongestion

The workgroupCongestion (Work Group Congestion) alarm is raised when a worker’s work queue is more than 80% occupied. It is cleared when that work queue is less than 50% occupied.

Note
What is a worker group?

A worker group is a group of workers (threads) that are responsible for processing tasks (incoming/outgoing messages). Each worker has a separate work queue.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: workgroupCongestion)

  • severity: alarm severity (constant value: MAJOR)

  • timestamp: timestamp when the event occurred

  • nodeID: affected node

  • threadIndex: affected worker index

M3UA alarms

This section describes the alarms raised concerning the M3UA layer of the SGC cluster.

asDown

The asDown (Application Server Down) alarm is raised whenever a configured M3UA Application Server is not active. This alarm is typically caused either by a misconfiguration at one or both ends of an M3UA association or by network failure. It is cleared when the Application Server becomes active again.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: asDown)

  • severity: alarm severity (constant value: MAJOR)

  • timestamp: timestamp when the event occurred

  • asId: name of the affected AS

asConnDown

The asConnDown alarm is raised when an AS connection which was active becomes inactive. This alarm can be caused either by misconfiguration at one or both ends of the M3UA association used, such as by a disagreement on the routing context to be used, or by network failure. It is cleared when the Application Server becomes active on the connection.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: asConnDown)

  • severity: alarm severity (constant value: MAJOR)

  • timestamp: timestamp when the event occurred

  • asId: name of the affected AS

  • connectionId: name of the connection on which the affected AS is down

associationCongested

The associationCongested (SCTP association congestion) alarm is raised whenever an SCTP association becomes congested. An association is considered congested if the outbound queue size grows to more than 80% of the configured out-queue-size for the connection. This alarm is cleared when the outbound queue size drops below 50% of the configured out-queue-size.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: associationCongested)

  • severity: alarm severity (constant value: MINOR)

  • timestamp: timestamp when the event occurred

  • connectionId: name of the affected connection

associationDown

The associationDown (SCTP association down) alarm is raised whenever a configured connection is not active. This alarm is typically caused either by a misconfiguration at one or both ends of the M3UA association or by network failure. It is cleared when the association becomes active again.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: associationDown)

  • severity: alarm severity (constant value: MAJOR)

  • timestamp: timestamp when the event occurred

  • connectionId: name of the affected connection

associationPathDown

The associationPathDown alarm is raised whenever a network path within an association becomes unreachable but the association as a whole remains functional, because at least one other path is still available. This alarm is raised only for associations using SCTP’s multi-homing feature (that is, connections with multiple connection IP addresses assigned). Association path failure is typically caused by misconfiguration at one or both ends, or by network failure. The alarm is cleared when SCTP signals that the path is available again, or when all paths have failed, in which case a single associationDown alarm replaces all of the former associationPathDown alarms.

Note
This alarm is also always raised briefly during association establishment, for every path in the association that SCTP does not consider primary, while SCTP tests the alternative paths.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: associationPathDown)

  • severity: alarm severity (constant value: MINOR)

  • timestamp: timestamp when the event occurred

  • connectionId: name of the affected connection

  • pathId: the peer address that has become unreachable
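The interplay between the per-path and per-association alarms can be expressed as a function of which paths are currently unreachable: each down path has its own associationPathDown alarm while at least one path survives, and once every path is down a single associationDown alarm takes their place. The Python sketch below is an illustrative model only. The function name and the tuple format are assumptions, the addresses are made up, and the transient alarms raised during association establishment are ignored.

```python
def association_alarms(connection_id, paths, down_paths):
    """Return the set of (alarm name, connectionId, pathId) tuples for one association.

    `paths` are the peer addresses of a multi-homed connection; `down_paths`
    is the subset SCTP currently reports as unreachable. pathId is None for
    associationDown, which is not path-specific.
    """
    down = set(down_paths) & set(paths)
    if paths and down == set(paths):
        # Every path has failed: one associationDown replaces the per-path alarms.
        return {("associationDown", connection_id, None)}
    # At least one path is still up: one associationPathDown per failed path.
    return {("associationPathDown", connection_id, p) for p in down}
```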

dpcRestricted

The dpcRestricted (destination point code restricted) alarm is raised when the SGC receives a Destination Restricted message from its remote SGP or IPSP peer for a remote destination point code. It is raised per SCTP association, and is cleared when the DPC restricted state abates on that association.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: dpcRestricted)

  • severity: alarm severity (constant value: MINOR)

  • timestamp: timestamp when the event occurred

  • asId: AS related to this DPC

  • dpcId: name of the affected DPC

  • connectionId: name of the affected connection

dpcUnavailable

The dpcUnavailable (destination point code unavailable) alarm is raised when a configured DPC is unreachable through a particular SCTP association. It is cleared when the DPC becomes reachable again through that SCTP association.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: dpcUnavailable)

  • severity: alarm severity (constant value: MINOR)

  • timestamp: timestamp when the event occurred

  • asId: AS related to this DPC

  • dpcId: name of the affected DPC

  • connectionId: name of the affected connection

SCCP alarms

This section describes the alarms raised concerning the SCCP layer of the SGC cluster.

sccpLocalSsnProhibited

The sccpLocalSsnProhibited (SCCP local SSN is prohibited) alarm is raised whenever all previously connected TCAP stacks (with the CGIN RA) using a particular SSN become disconnected. This is typically caused either by network failure or administrative action (such as deactivating an RA entity in Rhino). It is cleared whenever at least one TCAP stack using a given SSN connects.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: sccpLocalSsnProhibited)

  • severity: alarm severity (constant value: MAJOR)

  • timestamp: timestamp when the event occurred

  • ssn: SSN that is prohibited
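The raise and clear conditions depend only on how many TCAP stacks are connected for each SSN: the alarm is raised when that count falls from one to zero, and cleared as soon as it rises above zero again. The Python sketch below models this, assuming connections are reported as per-SSN connect and disconnect events. The `SsnTracker` name and its methods are hypothetical, and SSN 146 in the usage is only an example value.

```python
from collections import Counter


class SsnTracker:
    """Tracks connected TCAP stacks per SSN and derives prohibited SSNs."""

    def __init__(self):
        self.connected = Counter()  # ssn -> number of connected TCAP stacks
        self.prohibited = set()     # SSNs with an active sccpLocalSsnProhibited alarm

    def connect(self, ssn):
        self.connected[ssn] += 1
        # At least one stack using this SSN is connected: clear the alarm.
        self.prohibited.discard(ssn)

    def disconnect(self, ssn):
        if self.connected[ssn] == 0:
            raise ValueError(f"no connected TCAP stack for SSN {ssn}")
        self.connected[ssn] -= 1
        if self.connected[ssn] == 0:
            # The last previously connected stack has gone: raise the alarm.
            self.prohibited.add(ssn)
```

An SSN that has never had a stack connected is not reported as prohibited in this model, matching the "previously connected" wording above.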

sccpRemoteNodeCongestion

The sccpRemoteNodeCongestion (SCCP remote node is congested) alarm is raised whenever a remote SCCP node reports congestion. It is cleared when the congestion state abates.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: sccpRemoteNodeCongestion)

  • severity: alarm severity (constant value: MINOR)

  • timestamp: timestamp when the event occurred

  • dpc: affected DPC

sccpRemoteNodeNotAvailable

The sccpRemoteNodeNotAvailable (SCCP remote node is not available) alarm is raised whenever a remote SCCP node becomes unavailable. It is cleared when the remote node becomes available.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: sccpRemoteNodeNotAvailable)

  • severity: alarm severity (constant value: MINOR)

  • timestamp: timestamp when the event occurred

  • dpc: affected DPC

sccpRemoteSsnProhibited

The sccpRemoteSsnProhibited (SCCP remote SSN is prohibited) alarm is raised whenever a remote SCCP node reports that a particular SSN is prohibited. It is cleared when the remote SCCP node reports that the SSN is available again.

Attributes:

  • id: unique alarm identifier

  • name: name of alarm type (constant value: sccpRemoteSsnProhibited)

  • severity: alarm severity (constant value: MINOR)

  • timestamp: timestamp when the event occurred

  • dpc: affected DPC

  • ssn: affected SSN
