This document covers the alarms and statistics generated by the Sh Cache Microservice and its resource adapter.

Topics

This document includes the following topics:

  • Sh Cache Microservice Alarms and Statistics

  • Sh Cache Microservice RA Alarms and Statistics

For documentation about VM-wide SNMP alarms and statistics, see ShCM Services and Components.

Sh Cache Microservice Alarms and Statistics

Notices

Copyright © 2014-2019 Metaswitch Networks. All rights reserved

This manual is issued on a controlled basis to a specific person on the understanding that no part of the Metaswitch Networks product code or documentation (including this manual) will be copied or distributed without prior agreement in writing from Metaswitch Networks.

Metaswitch Networks reserves the right to, without notice, modify or revise all or part of this document and/or change product features or specifications and shall not be responsible for any loss, cost, or damage, including consequential damage, caused by reliance on these materials.

Metaswitch and the Metaswitch logo are trademarks of Metaswitch Networks. Other brands and products referenced herein are the trademarks or registered trademarks of their respective holders.

Sh Cache Microservice Alarms and Statistics Overview

The Sh Cache Microservice (ShCM) provides a caching layer in front of a Home Subscriber Server (HSS) for certain queries over Diameter Sh.

Note

For a high-level architecture overview, see: Sh Cache Service Overview

ShCM is realised as two Services running in a Rhino TAS.

  • The Cache Service processes Diameter Sh/HTTP triggers received from clients.

  • The Notification Service processes Diameter Sh triggers received from the HSS.

Each software component in the ShCM collects statistics. Components related to the interfaces between the ShCM and external entities may generate alarms.

The following diagram shows the software components used in the Cache Service.

Figure 1. Cache Service Components

The following diagram shows the software components used in the Notification Service.

Figure 2. Notification Service Components

This manual provides a complete list of alarms and statistics supported by the ShCM.

Sh Cache Microservice Statistics

Each component of the Sh Cache Microservice collects and reports statistics. These statistics can be monitored using Rhino statistics tools and via SNMP.

The following sections are a summary of the statistics collected by each component. Included is the name, the type of statistic, the OID (for monitoring via SNMP) and a brief description.
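
As a minimal sketch of how the tables in the following sections can be read, the Python below assumes that a statistic's full OID is formed by appending its OID suffix to the base OID of its component. That concatenation rule and the helper name `full_oid` are assumptions made for illustration and should be checked against the MIB definitions of the deployment.

```python
def full_oid(base_oid: str, suffix: str) -> str:
    """Join a component base OID and a statistic suffix such as '.2'."""
    return base_oid.rstrip(".") + "." + suffix.lstrip(".")


# Base OID and suffix as listed for the Cache Invalidator SBB.
CACHE_INVALIDATOR_BASE = "1.3.6.1.4.1.19808.11.1.3"

# Assumed full OID of the InvalidateRequest counter (suffix .2).
invalidate_request_oid = full_oid(CACHE_INVALIDATOR_BASE, ".2")
print(invalidate_request_oid)
```

The helper tolerates a suffix written with or without its leading dot, which matches the way suffixes are printed in the tables.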

The Cache Invalidator SBB

Base OID: 1.3.6.1.4.1.19808.11.1.3

Name Type OID Description

InvalidateRequest

Counter

.2

Incremented when a request is made to invalidate caches, using the Management API.

Cache Result Collator SBB

Base OID: 1.3.6.1.4.1.19808.11.1.4

Name Type OID Description

ReaderCreated

Counter

.2

Incremented when a Reader SBB is created, every time a Read request is made.

UpdaterCreated

Counter

.3

Incremented when an Updater SBB is created, every time an Update request is made.

SubscriberCreated

Counter

.4

Incremented when a Subscriber SBB is created, every time a Subscribe to UE Reachability request is made.

The Cassandra Cache Access SBB

Base OID: 1.3.6.1.4.1.19808.11.1.5

Name Type OID Description

Read

Counter

.2

Incremented on a request to read from the Cassandra cache.

Write

Counter

.3

Incremented on a request to write to the Cassandra cache.

ReadSubscription

Counter

.4

Incremented on a request to read a subscription from the Cassandra cache.

WriteAndSubscribe

Counter

.5

Incremented on a request to both write data and a subscription to the Cassandra cache.

Delete

Counter

.6

Incremented on a request to delete data from the Cassandra cache.

QuerySucceeded

Counter

.7

Incremented if a Cassandra query regarding the Cassandra cache was successful.

QueryFailed

Counter

.8

Incremented if a Cassandra query regarding the Cassandra cache failed. This indicates a problem with Cassandra that will impact service, and warrants further investigation using the Rhino logs.

InternalFailure

Counter

.9

Incremented on an internal failure in the part of the Sh Cache Microservice responsible for maintaining the Cassandra cache. This will impact service and warrants further investigation using the Rhino logs.

DeleteIncludingSubscription

Counter

.10

Incremented on a request to delete data and subscription from the cache.

The Cassandra Subscription Access SBB

Base OID: 1.3.6.1.4.1.19808.11.1.6

Name Type OID Description

Read

Counter

.2

Incremented on a request to read information about any existing subscriptions to the HSS for UE Reachability For IP from the Cassandra database.

Write

Counter

.3

Incremented on a request to write information about a new subscription to the HSS for UE Reachability For IP to the Cassandra database.

QuerySucceeded

Counter

.4

Incremented if a Cassandra query regarding UE Reachability For IP was successful.

QueryFailed

Counter

.5

Incremented if a Cassandra query regarding UE Reachability For IP failed.

InternalFailure

Counter

.6

Incremented on an internal failure in the part of the Sh Cache Microservice responsible for maintaining the Cassandra database with information about subscriptions to the HSS for UE Reachability For IP. This will impact service and warrants further investigation using the Rhino logs.

The HSS Reader SBB

Base OID: 1.3.6.1.4.1.19808.11.1.7

Name Type OID Description

ReadRequest

Counter

.2

Incremented on a read request, which implies the Sh Cache Microservice will either read from the cache or from the HSS.

ReadFromCache

Counter

.3

Incremented on attempting to read from the cache.

ReReadFromCache

Counter

.4

Incremented on attempting to re-read from the cache. This is done after an initial cache miss, to ensure proper handling of concurrent requests.

NotCaching

Counter

.5

Incremented if this request does not require caching, due to configuration.

CacheMiss

Counter

.6

Incremented if there was no data in the cache matching the read request.

CacheHit

Counter

.7

Incremented if data was read from the cache that matched the read request.

UpdateCache

Counter

.10

Incremented when trying to update the cache.

UpdateCacheAndSubscribe

Counter

.11

Incremented when trying to update the cache and subscribe for future updates.

CacheReadFailed

Counter

.12

Incremented on a failure reading from the cache. This indicates a problem with the cache, impacting service, which should be investigated using the Rhino logs.

CacheUpdateFailed

Counter

.13

Incremented on a failure to update the cache. This indicates a problem with the cache. While this does not directly impact service, it does affect performance as less data will be cached. The cause can be investigated using the Rhino logs.

CacheWriteAndSubscribeFailed

Counter

.14

Incremented on a failure to update the cache and subscribe for future updates. This indicates a problem with the cache. While this does not directly impact service, it does affect performance as less data will be cached. The cause can be investigated using the Rhino logs.

LockExpired

Counter

.15

Incremented if a lock that was acquired expires. This indicates that a request took too long, due to issues with the Sh Cache Microservice, Cassandra or the HSS. Repeated occurrences impact service and warrant investigation using the Rhino logs.

LockAcquireFailed

Counter

.16

Incremented if there was a failure trying to acquire a lock. Repeated occurrences impact service and should be investigated using the Lock Provider SBB statistics.

ShRequestTimeout

Counter

.17

Incremented if a request to the HSS timed out. This indicates a problem with the HSS impacting service, which should be investigated.

SendUDR

Counter

.18

Incremented if a Diameter Sh UDR is sent.

SendSNR

Counter

.19

Incremented if a Diameter Sh SNR is sent.

ReceivedUDA

Counter

.20

Incremented if a Diameter Sh UDA (success or failure) is received.

ReceivedSNA

Counter

.21

Incremented if a Diameter Sh SNA (success or failure) is received.

UDASuccessReceived

Counter

.22

Incremented if a Diameter UDA success was received.

UDAFailureReceived

Counter

.23

Incremented if a Diameter UDA failure was received. This could indicate either problems with the HSS, impacting service, or subscriber-specific issues such as data being absent for the requested subscriber.

SNASuccessReceived

Counter

.24

Incremented if a Diameter SNA success was received.

SNAFailureReceived

Counter

.25

Incremented if a Diameter SNA failure was received. This could indicate either problems with the HSS, impacting service, or subscriber-specific issues such as data being absent for the requested subscriber.

SNAFailureReceivedConvertedIntoSuccess

Counter

.26

Incremented if a Diameter SNA failure with result code 5106 was received, which will be passed upstream as a success response with empty body. This is expected behaviour and should not be treated as a failure.
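
The HSS Reader SBB counters can be combined into derived figures. The following is a rough sketch, not part of the product, that estimates a cache hit ratio from two samples of the CacheHit and CacheMiss counters. The function names, the sample values, and the handling of a counter reset are all assumptions made for illustration.

```python
from typing import Optional


def counter_delta(previous: int, current: int) -> int:
    """Return the increase of a counter between two samples.

    Assumption: a smaller current value means the counter was reset
    (for example after a restart), so the current value is the delta.
    """
    return current - previous if current >= previous else current


def cache_hit_ratio(prev: dict, curr: dict) -> Optional[float]:
    """Estimate hits / (hits + misses) over the sampling interval."""
    hits = counter_delta(prev["CacheHit"], curr["CacheHit"])
    misses = counter_delta(prev["CacheMiss"], curr["CacheMiss"])
    total = hits + misses
    # No cache lookups completed in the interval: the ratio is undefined.
    return hits / total if total else None


# Hypothetical counter samples taken at two points in time.
earlier = {"CacheHit": 100, "CacheMiss": 50}
later = {"CacheHit": 190, "CacheMiss": 60}
print(cache_hit_ratio(earlier, later))
```

Working on differences between samples rather than raw values keeps the figure meaningful for counters that only ever increase.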

The HSS Subscriber SBB

Base OID: 1.3.6.1.4.1.19808.11.1.8

Name Type OID Description

SubscribeRequest

Counter

.2

Incremented on a subscribe request for UE Reachability for IP.

SendSNR

Counter

.3

Incremented when sending an SNR for UE Reachability for IP.

SubscriptionUpdateSuccess

Counter

.4

Incremented when a success SNA is received for UE Reachability for IP.

SubscriptionUpdateFailed

Counter

.5

Incremented when a failure SNA is received for UE Reachability for IP.

UpdateCacheOnSuccessSNA

Counter

.6

Incremented when attempting to store the subscription for UE Reachability For IP in the database after receiving a success SNA.

ResultExpiredOnSuccessSNA

Counter

.7

Incremented if the result in a success SNA for UE Reachability For IP has expired by the time the Sh Cache Microservice receives it.

FailureSNA

Counter

.8

Incremented when a failure SNA for UE Reachability For IP is received. This could indicate either problems with the HSS, impacting service, or subscriber-specific issues such as data being absent for the requested subscriber.

ShRequestTimeout

Counter

.9

Incremented if a request to the HSS timed out. This indicates a problem with the HSS impacting service, which should be investigated.

The HSS Updater SBB

Base OID: 1.3.6.1.4.1.19808.11.1.9

Name Type OID Description

UpdateRequest

Counter

.2

Incremented on an update request, which implies that the Sh Cache Microservice will send a PUR to the HSS and possibly update the cache afterwards.

ReadFromCache

Counter

.3

Incremented on attempting to read from the cache while preparing to update the HSS.

PUASuccessReceived

Counter

.4

Incremented if a PUA success answer was received.

PUAFailureReceived

Counter

.5

Incremented if a PUA failure answer was received. This could indicate either problems with the HSS, impacting service, or subscriber-specific issues such as the requested subscriber not being provisioned.

TransparentDataOutOfSync

Counter

.6

Incremented if a Diameter failure answer 5105 was received. This implies that the caller of the service sent data with the wrong sequence number, or data was being concurrently updated by an external source.

UpdateCache

Counter

.7

Incremented when trying to update the cache after the HSS has been updated.

CacheReadFailed

Counter

.8

Incremented on a failure reading from the cache while preparing to update the HSS. This indicates a problem with the cache, impacting service, which should be investigated using the Rhino logs.

CacheUpdateFailed

Counter

.9

Incremented on a failure to update the cache. The cache entry for this subscriber will be purged when this happens, so service will not be affected. However, performance will be impacted because less data will be cached.

LockExpired

Counter

.10

Incremented if a lock that was acquired expires. This indicates that a request took too long, due to issues with the Sh Cache Microservice, Cassandra or the HSS. Repeated occurrences impact service and warrant investigation using the Rhino logs.

LockAcquireFailed

Counter

.11

Incremented if there was a failure trying to acquire a lock. Repeated occurrences impact service and should be investigated using the Lock Provider SBB statistics.

ShRequestTimeout

Counter

.12

Incremented if a request to the HSS timed out. This indicates a problem with the HSS impacting service, which should be investigated.

SentPUR

Counter

.13

Incremented if a Diameter Sh PUR is sent.

SubscriptionCacheUpdateDeferred

Counter

.14

Incremented when creation of a subscription cache entry was deferred, to force the next read to populate the cache. This is normal behaviour in some specific cases and does not indicate any problems.

DeleteCache

Counter

.15

Incremented when trying to delete the cache entry.

ReceivedPUA

Counter

.16

Incremented if a Diameter Sh PUA (success or failure) is received.

DeletingFromHSS

Counter

.17

Incremented if a Diameter PUR with no data is sent, deleting the entry from the HSS.

The HTTP Router Frontend SBB

Base OID: 1.3.6.1.4.1.19808.11.1.10

Name Type OID Description

GetRequest

Counter

.2

Incremented on receipt of an HTTP GET request (which could be either a User Data Request or a Health Check Request).

PutRequest

Counter

.3

Incremented on receipt of an HTTP PUT request (a Profile Update Request).

PostRequest

Counter

.4

Incremented on receipt of an HTTP POST request (a Subscribe to UE Reachability Request).

DeleteRequest

Counter

.5

Incremented on receipt of an HTTP DELETE request (a Cache Invalidation Request).

SendHttpResponse

Counter

.6

Incremented when an HTTP response is sent.

FailedToSendHttpResponse

Counter

.7

Incremented if an HTTP response could not be sent. This indicates an internal Sh Cache Microservice issue which will affect service. Diagnose the issue using the Rhino logs.

CacheReadSuccess

Counter

.8

Incremented if the reader component was successful in reading the requested data either from the cache or from the HSS.

CacheReadFailed

Counter

.9

Incremented if the reader component failed to read the requested data from the cache and from the HSS. This indicates an issue affecting service, which should be diagnosed using the Reader SBB statistics.

CacheUpdateSuccess

Counter

.10

Incremented if the update component was successful in updating the supplied data in the HSS and in the cache.

CacheUpdateFailed

Counter

.11

Incremented if the update component failed to update the supplied data in either the HSS or in the cache. This indicates an issue affecting service, which should be diagnosed using the Updater SBB statistics.

CacheSubscribeSuccess

Counter

.12

Incremented if the subscriber component was successful in performing a subscription for UE Reachability For IP.

CacheSubscribeFailed

Counter

.13

Incremented if the subscriber component failed to process a subscription for UE Reachability For IP. This indicates an issue affecting service, which should be diagnosed using the Subscriber SBB statistics.

CacheInvalidateSuccess

Counter

.14

Incremented if the invalidate component was successful in invalidating the cache.

CacheInvalidateFailed

Counter

.15

Incremented if the invalidate component failed to invalidate the cache. This issue can be diagnosed using the Cache Invalidator SBB statistics.

MswExchangeIdNotValid

Counter

.16

Incremented if the X_MSW_EXCHANGE_ID is not a valid UUID. This will not affect service, but will affect SAS tracing.

MswMessageIdNotValid

Counter

.17

Incremented if the X_MSW_MESSAGE_ID is not a valid UUID. This will not affect service, but will affect SAS tracing.

MswSpanIdNotValid

Counter

.18

Incremented if the X_MSW_SPAN_ID is not a valid UUID. This will not affect service, but will affect SAS tracing.

HealthCheckUp

Counter

.19

Incremented when a health check call to '/infra/up' is made.

HealthCheckReady

Counter

.20

Incremented when a health check call to '/infra/ready' is made.

The Lock Provider SBB

Base OID: 1.3.6.1.4.1.19808.11.1.11

Name Type OID Description

AcquireSuccess

Counter

.2

Incremented when a lock is successfully acquired.

AcquireFailed

Counter

.3

Incremented when a lock failed to be acquired, due to a failure within Cassandra or the Sh Cache Microservice. As service will be impacted, this should be investigated using the Rhino logs.

ReleaseSuccess

Counter

.4

Incremented when a lock is successfully released.

ReleaseFailed

Counter

.5

Incremented when an attempt to release a lock failed. This indicates an internal failure in the Sh Cache Microservice or a failure in Cassandra. As locks are not released properly, this will affect repeated requests for the same subscriber and will therefore affect performance.

RetryRead

Counter

.6

Incremented when the Sh Cache Microservice backs off and retries reading a lock. This occurs when a lock cannot immediately be acquired because a concurrent request for this subscriber and service indication is currently in progress. This is expected behaviour and does not indicate a failure.

RetryLimitReached

Counter

.7

Incremented when the Sh Cache Microservice reaches the limit of the number of times to retry reading a lock. This either indicates an internal failure, or indicates that too many concurrent requests are taking place for the same subscriber and service indication.

CASWriteTimeout

Counter

.8

Incremented when a Cassandra WriteTimeoutException occurs on a CAS operation. After receiving this timeout, the lock status gets verified and other failure statistics will indicate any issues.

The HSS Notification SBB

Base OID: 1.3.6.1.4.1.19808.11.1.12

Name Type OID Description

PushNotificationRequest

Counter

.2

Incremented on receipt of a Push Notification Request.

SendFailurePNA

Counter

.3

Incremented on sending a failure PNA. This indicates that the HSS sent the Sh Cache Microservice an invalid Push Notification Request, or that there is an internal failure which will cause one of the failure stats below to be increased as well. This will affect service.

RetrieveSubscribers

Counter

.4

Incremented when retrieving subscriber data from the database on receipt of a Push Notification Request for UE Reachability For IP.

FailedOnUnsupportedCacheStrategy

Counter

.5

Incremented when a PNR that is not for UE Reachability For IP relates to a cache strategy other than Subscription Cache. A failure PNA is also sent. This can occur when a cache strategy is switched from a subscription cache to a different cache type, and will not impact service.

CheckForRowInSubscriberCache

Counter

.6

Incremented when checking whether the Sh Cache Microservice holds a subscription record for the received Push Notification Request, for a data reference different from UE Reachability For IP.

NotSubscribed

Counter

.7

Incremented when the Sh Cache Microservice does not hold a subscription record to the data reference related to the PNR. This does not affect service, but could indicate an underlying problem in the communication between the Sh Cache Microservice and the HSS.

ReadFromSubscriptionCacheFailed

Counter

.8

Incremented on failing to read from the subscription database for a data reference different from UE Reachability For IP. This indicates a failure in the Sh Cache Microservice or Cassandra impacting service, and should be investigated using the Rhino logs.

ReadFromCacheFailed

Counter

.9

Incremented on failing to read from the cache. This indicates an issue affecting service, which should be diagnosed using the Reader SBB statistics.

SequenceNumberMatchesCacheIgnorePNR

Counter

.10

Incremented when the sequence number of repository data in the PNR matches the cache, and therefore the cache will not be updated. This could be due to a PNR following a PUR made by the Sh Cache Microservice, so does not necessarily imply a problem.

SequenceNumberIsNotNewerThanCache

Counter

.11

Incremented if the repository data in the PNR does not represent a more recent update than what is in the cache. The Sh Cache Microservice replies with a failure PNA. This could imply an issue with the HSS.

UpdateCache

Counter

.12

Incremented when updating the cache.

CreateNewCacheEntry

Counter

.13

Incremented when adding a new cache entry.

CacheUpdateSuccess

Counter

.14

Incremented when the cache was successfully updated.

CacheUpdateFailed

Counter

.15

Incremented when there was a failure updating the cache. This indicates an issue affecting service, which should be diagnosed using the Updater SBB statistics.

FailedOnCreateSbb

Counter

.16

Incremented if there is an internal error creating a child SBB. This will impact service and should be diagnosed using the Rhino logs.

LockAcquired

Counter

.17

Incremented when a lock is acquired.

RetrieveDataFromCache

Counter

.18

Incremented when retrieving data from the cache.

LockExpired

Counter

.19

Incremented if a lock that was acquired expires. This indicates that a request took too long, due to issues with the Sh Cache Microservice, Cassandra or the HSS. Repeated occurrences impact service and warrant investigation using the Rhino logs.

LockAcquireFailed

Counter

.20

Incremented if there was a failure trying to acquire a lock. Repeated occurrences impact service and should be investigated using the Lock Provider SBB statistics.

SubscriptionReadFailed

Counter

.22

Incremented when a failure occurred trying to read the subscription database for UE Reachability For IP. This issue, which will impact service, can be diagnosed using the Notification SBB statistics.

DeleteCache

Counter

.23

Incremented when trying to delete the cache and subscription entries.

DeleteCacheSuccess

Counter

.24

Incremented when cache and subscription entries are successfully deleted.

DeleteCacheFailed

Counter

.25

Incremented when cache and subscription entries failed to be deleted.

The Cache Health Check SBB

Base OID: 1.3.6.1.4.1.19808.11.1.13

Name Type OID Description

HealthCheckReadyStart

Counter

.2

Incremented when a ready request is made to the Sh Cache Microservice health check.

SentCassandraQuery

Counter

.3

Incremented when a test Cassandra query is made.

CassandraSuccess

Counter

.4

Incremented when a test Cassandra query was successful.

CassandraError

Counter

.5

Incremented when a test Cassandra query was not successful. This indicates a problem with Cassandra that could impact service, if actual traffic is sent to this instance.

SentHSSQuery

Counter

.6

Incremented when a test UserDataRequest is sent to the HSS.

HSSSuccess

Counter

.7

Incremented when a test UserDataRequest to the HSS was successful.

HSSError

Counter

.8

Incremented when a test UserDataRequest to the HSS was not successful. This indicates a problem with the HSS that could impact service, if actual traffic is sent to this instance.

The HTTP Notify SBB

Base OID: 1.3.6.1.4.1.19808.11.1.14

Name Type OID Description

MalformedURL

Counter

.2

Incremented if a notification for UE Reachability For IP cannot be sent because of a malformed URL. This indicates a problem with the client who sent the original subscription request.

SendNotifyRequest

Counter

.3

Incremented when sending a notify for UE Reachability For IP as an HTTP request.

Sh Cache Microservice Alarms

This document provides detailed information on the alarms raised by a ShCM deployment.

The following is described for each alarm:

  • Alarm Type — A type associated with the alarm

  • Level — The severity of the alarm

  • Message — The message associated with the alarm

  • Description — A description of what the alarm signifies

  • Raised — The condition that causes the alarm to be raised

  • Cleared — The condition that causes the alarm to be cleared
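
The fields above can be pictured as a simple record. The following Python is only an illustrative sketch: the `AlarmDefinition` class and its `render` method are hypothetical names, and the use of printf-style substitution for `%s` and `%d` is an assumption about how the message placeholders are filled. The field values are copied from the diameter.peer.connectiondown alarm listed later in this document, while the host name and port are made-up examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmDefinition:
    """One alarm entry, mirroring the fields described for each alarm."""
    alarm_type: str
    level: str
    message: str
    description: str
    raised: str
    cleared: str

    def render(self, *args) -> str:
        """Fill the message placeholders (assumed to be printf-style)."""
        return self.message % args


connection_down = AlarmDefinition(
    alarm_type="diameter.peer.connectiondown",
    level="WARNING",
    message="Connection to %s:%d is down",
    description="Raised when unable to establish a peer connection.",
    raised="When peer connection fails.",
    cleared="When peer connection is established or Resource Adaptor is deactivated.",
)

# Hypothetical peer address, used only to show the rendered message.
print(connection_down.render("hss.example.com", 3868))
```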

For a list of Rhino TAS alarms see: Rhino Alarm List.

For alarms related to each ShCM interface see:

Note

Create your own alarms based on statistics with Threshold Alarms.

Source: OpenCloud Diameter Sh

Alarm Type

active-reconfiguration

Level

WARNING

Message

Updates to config profile value ${instance} will not take effect until the RA entity is restarted.

Description

Configuration updates have been made to the RA entity that require it to be restarted

Raised

When a fixed configuration property (such as the host or realm) is changed while the RA entity is active

Cleared

When the RA entity is deactivated

Alarm Type

diameter.misconfiguration

Level

CRITICAL

Message

Diameter RA configuration error. Update the RA entity with a valid configuration to resolve. Reason(s): %s

Description

The RA entity could not be configured or activated due to a configuration error

Raised

When an exception occurs during RA entity creation or activation

Cleared

When a valid configuration is installed by a configuration update

Alarm Type

diameter.unlicensed

Level

CRITICAL

Message

RA entity %s does not have valid license

Description

RA entity does not have a valid license.

Raised

When trying to activate the RA entity without a valid license.

Cleared

When RA entity is deactivated.

Alarm Type

diameter.peer.connectiondown

Level

WARNING

Message

Connection to %s:%d is down

Description

Raised when unable to establish a peer connection.

Raised

When peer connection fails.

Cleared

When peer connection is established or Resource Adaptor is deactivated.

Alarm Type

active-reconfiguration

Level

WARNING

Message

Updates to %s ${instance} will not take effect until the RA entity is restarted.

Description

Configuration updates have been made to the RA entity that require it to be restarted

Raised

When a fixed configuration property is changed while the RA entity is active

Cleared

When the RA entity is deactivated

Alarm Type

misconfiguration

Level

CRITICAL

Message

RA configuration error, operational functionality disabled. Update the RA entity with a valid configuration to resolve.

Description

The RA entity could not be configured or activated due to a configuration error

Raised

When an exception occurs during RA entity creation or activation

Cleared

When a valid configuration is installed by a configuration update

Alarm Type

misconfiguration

Level

MINOR

Message

RA configuration update failed. Continuing operational functionality using the last valid configuration.

Description

The RA entity configuration was not updated due to a configuration error

Raised

When an exception occurs during a configuration update

Cleared

When the RA entity is deactivated, or a valid configuration update is applied

Alarm Type

update-ignored

Level

MINOR

Message

Update to %s ${instance} failed, continuing with its last valid configuration

Description

An RA component failed a configuration update, but the component can ignore failures

Raised

When an exception occurs updating a component that is allowed to ignore failed updates

Cleared

When the RA entity is deactivated, or the component is updated with a valid configuration

Source: OpenCloud HTTP

Alarm Type

active-reconfiguration

Level

WARNING

Message

Updates to %s ${instance} will not take effect until the RA entity is restarted.

Description

Configuration updates have been made to the RA entity that require it to be restarted

Raised

When a fixed configuration property is changed while the RA entity is active

Cleared

When the RA entity is deactivated

Alarm Type

misconfiguration

Level

CRITICAL

Message

RA configuration error, operational functionality disabled. Update the RA entity with a valid configuration to resolve.

Description

The RA entity could not be configured or activated due to a configuration error

Raised

When an exception occurs during RA entity creation or activation

Cleared

When a valid configuration is installed by a configuration update

Alarm Type

misconfiguration

Level

MINOR

Message

RA configuration update failed. Continuing operational functionality using the last valid configuration.

Description

The RA entity configuration was not updated due to a configuration error

Raised

When an exception occurs during a configuration update

Cleared

When the RA entity is deactivated, or a valid configuration update is applied

Alarm Type

update-ignored

Level

MINOR

Message

Update to %s ${instance} failed, continuing with its last valid configuration

Description

An RA component failed a configuration update, but the component can ignore failures

Raised

When an exception occurs updating a component that is allowed to ignore failed updates

Cleared

When the RA entity is deactivated, or the component is updated with a valid configuration

Source: OpenCloud cassandra-cql-ra

Alarm Type

CassandraCQLRA.ConnectToCluster

Level

CRITICAL

Message

Not connected to Cassandra. %s

Description

Unable to connect to the Cassandra cluster

Raised

When an attempt to query the Cassandra cluster fails due to no cluster hosts being available

Cleared

On the next successful attempt to connect to the Cassandra cluster

Alarm Type

CassandraCQLRA.ConnectToNode

Level

CRITICAL

Message

Not connected to Cassandra node: %s

Description

Unable to connect to the Cassandra node

Raised

When a specified Cassandra node in the cluster cannot be reached

Cleared

On the next successful attempt to connect to this node, or when the node is removed from the cluster

Alarm Type

active-reconfiguration

Level

WARNING

Message

Updates to %s ${instance} will not take effect until the RA entity is restarted.

Description

Configuration updates have been made to the RA entity that require it to be restarted

Raised

When a fixed configuration property is changed while the RA entity is active

Cleared

When the RA entity is deactivated

Alarm Type

misconfiguration

Level

CRITICAL

Message

RA configuration error, operational functionality disabled. Update the RA entity with a valid configuration to resolve.

Description

The RA entity could not be configured or activated due to a configuration error

Raised

When an exception occurs during RA entity creation or activation

Cleared

When a valid configuration is installed by a configuration update

Alarm Type

misconfiguration

Level

MINOR

Message

RA configuration update failed. Continuing operational functionality using the last valid configuration.

Description

The RA entity configuration was not updated due to a configuration error

Raised

When an exception occurs during a configuration update

Cleared

When the RA entity is deactivated, or a valid configuration update is applied

Alarm Type

update-ignored

Level

MINOR

Message

Update to %s ${instance} failed, continuing with its last valid configuration

Description

An RA component failed a configuration update, but the component can ignore failures

Raised

When an exception occurs updating a component that is allowed to ignore failed updates

Cleared

When the RA entity is deactivated, or the component is updated with a valid configuration

Sh Cache Microservice RA Alarms and Statistics

Sh Cache Microservice RA Statistics

The Sh Cache Microservice RA collects and reports statistics. These statistics can be monitored using Rhino statistics tools and via SNMP. The SNMP definitions for these statistics can be exported to the MIB format using the Rhino console export-mibs command.

The following sections summarize the statistics collected by the Sh Cache Microservice RA. Each entry lists the name, the type of statistic, the OID (for monitoring via SNMP), and a brief description.

Sh Cache Microservice RA

The Sh Cache Microservice RA uses the Rhino system of static SNMP OIDs to define mappings between OIDs and Rhino components. This model is discussed in further detail in the Rhino documentation section: Static OID Model.

The Sh Cache Microservice RA usage parameter set type OID is defined as follows:

Base OID:

1.3.6.1.4.1.19808.20.5.10.10.3.1

Intermediate OID Suffix for resource adaptor entity:

1 for a primary ShCM resource adaptor entity, 2 for a secondary ShCM resource adaptor entity, etc.

Parameter Set Type OID Suffix:

1

Overall OID for a primary ShCM resource adaptor entity’s usage statistics:

1.3.6.1.4.1.19808.20.5.10.10.3.1.1.1
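The OID composition above can be sketched in a few lines. The base OID and the parameter set type suffix come from this section; the helper name `usage_oid` is hypothetical, and appending a per-counter suffix (such as the `.2` listed for StartActivityOk) after the parameter set type suffix is an assumption based on how the suffixes are presented in the table that follows.

```python
from typing import Optional

# Values taken from this section.
BASE_OID = "1.3.6.1.4.1.19808.20.5.10.10.3.1"
PARAMETER_SET_TYPE_SUFFIX = 1


def usage_oid(entity_index: int, counter_suffix: Optional[int] = None) -> str:
    """Build the usage-statistics OID for a ShCM resource adaptor entity.

    entity_index: 1 for the primary entity, 2 for the secondary, and so on.
    counter_suffix: optional per-counter suffix, assumed to be appended last.
    """
    if entity_index < 1:
        raise ValueError("entity_index starts at 1 (the primary entity)")
    parts = [BASE_OID, str(entity_index), str(PARAMETER_SET_TYPE_SUFFIX)]
    if counter_suffix is not None:
        parts.append(str(counter_suffix))
    return ".".join(parts)
```

For example, `usage_oid(1)` reproduces the overall OID shown above for a primary entity, and `usage_oid(2)` gives the corresponding OID for a secondary entity.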

Usage Interface Counter statistics

Name Type OID Description

StartActivityOk

Counter

.2

Incremented when an activity is started successfully

StartActivityFail

Counter

.3

Incremented when an activity fails to start

FireOk

Counter

.4

Incremented when an event is fired successfully

FireFail

Counter

.5

Incremented when an event fails to fire

QLEnd

Counter

.6

Incremented when an activity is ended when queried for liveness

UDROut

Counter

.7

Incremented when a UDR (i.e. a GET request) is sent

UDAIn

Counter

.8

Incremented when a UDA (i.e. a GET response) is received

PUROut

Counter

.9

Incremented when a PUR (i.e. a PUT request) is sent

PUAIn

Counter

.10

Incremented when a PUA (i.e. a PUT response) is received

SNROut

Counter

.11

Incremented when an SNR (i.e. a POST request) is sent

SNAIn

Counter

.12

Incremented when an SNA (i.e. a POST response) is received

InvalidateOut

Counter

.13

Incremented when a cache invalidation request (i.e. a DELETE request) is sent

InvalidateIn

Counter

.14

Incremented when a cache invalidation response (i.e. a DELETE response) is received

PNRIn

Counter

.15

Incremented when a PNR (i.e. a POST request) is received

PNAOut

Counter

.16

Incremented when a PNA (i.e. a POST response) is sent
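Because the message counters come in request/response pairs, one way to use them is to compare each pair. The following is a minimal sketch, assuming the counter values have already been collected into a dictionary keyed by the counter names above; the function name `unanswered` and the pair labels are hypothetical. A non-zero difference may simply reflect requests still in flight at the time the counters were read, so it is an indicator rather than proof of lost messages.

```python
# Request counter paired with its response counter, using the names above.
REQUEST_RESPONSE_PAIRS = {
    "UDR/UDA": ("UDROut", "UDAIn"),
    "PUR/PUA": ("PUROut", "PUAIn"),
    "SNR/SNA": ("SNROut", "SNAIn"),
    "Invalidate": ("InvalidateOut", "InvalidateIn"),
    "PNR/PNA": ("PNRIn", "PNAOut"),
}


def unanswered(counters: dict) -> dict:
    """Return requests minus responses for each request/response pair.

    Counters missing from the input are treated as zero.
    """
    return {
        label: counters.get(request, 0) - counters.get(response, 0)
        for label, (request, response) in REQUEST_RESPONSE_PAIRS.items()
    }
```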

Sample statistics

Name Source units Display units Description

GetRequestResponseTime

Milliseconds

Milliseconds

Sample of the elapsed time between sending a GET request and receiving a GET response

UpdateRequestResponseTime

Milliseconds

Milliseconds

Sample of the elapsed time between sending an update request (i.e. a PUT request) and receiving a PUT response

SubscribeRequestResponseTime

Milliseconds

Milliseconds

Sample of the elapsed time between sending a subscribe request (i.e. a POST request) and receiving a POST response

InvalidateRequestResponseTime

Milliseconds

Milliseconds

Sample of the elapsed time between sending an invalidation request (i.e. a DELETE request) and receiving a DELETE response
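Each sample statistic above is an elapsed time in milliseconds between a request and its response. The sketch below shows how such a sample could be derived from two timestamps and how a set of samples might be summarized. It is illustrative only: the function names are hypothetical, timestamps are assumed to be in seconds, and this is not how the RA itself records samples.

```python
import statistics


def elapsed_ms(sent_at: float, received_at: float) -> float:
    """Elapsed time in milliseconds between two timestamps given in seconds."""
    return (received_at - sent_at) * 1000.0


def summarise(samples_ms):
    """Summarize a non-empty collection of response-time samples (milliseconds)."""
    ordered = sorted(samples_ms)
    if not ordered:
        raise ValueError("at least one sample is required")
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
    }
```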

Sh Cache Microservice RA Alarms

The Sh Cache Microservice RA can raise the following alarms if it encounters any issues.

Category Level Alarm Type Message

ShCMRA

CRITICAL

ShCMRA.ShCMConnectFailed

"Not connected to any instances of the configured Sh Cache Microservice host."

ShCMRA

CRITICAL

ShCMRA.ProxyConnectFailed

"Not connected to the configured proxy."

RA Framework

WARNING

active-reconfiguration

"Updates to %s "${instance}" will not take effect until the RA entity is restarted."

RA Framework

CRITICAL

misconfiguration

"RA configuration error, operational functionality disabled. Update the RA entity with a valid configuration to resolve."

RA Framework

MINOR

misconfiguration

"RA configuration update failed. Continuing operational functionality using the last valid configuration."

RA Framework

MINOR

update-ignored

"Update to %s "${instance}" failed, continuing with its last valid configuration"