This document covers the alarms and statistics generated by the Sh Cache Microservice and its resource adapter.
Topics
This document includes the following topics:
Topic | Explains… |
---|---|
Sh Cache Microservice Alarms and Statistics | |
Sh Cache Microservice RA Alarms and Statistics | |
For documentation about VM-wide SNMP alarms and statistics, see ShCM Services and Components.
Notices
Copyright © 2014-2019 Metaswitch Networks. All rights reserved
This manual is issued on a controlled basis to a specific person on the understanding that no part of the Metaswitch Networks product code or documentation (including this manual) will be copied or distributed without prior agreement in writing from Metaswitch Networks.
Metaswitch Networks reserves the right to, without notice, modify or revise all or part of this document and/or change product features or specifications and shall not be responsible for any loss, cost, or damage, including consequential damage, caused by reliance on these materials.
Metaswitch and the Metaswitch logo are trademarks of Metaswitch Networks. Other brands and products referenced herein are the trademarks or registered trademarks of their respective holders.
Sh Cache Microservice Alarms and Statistics Overview
The Sh Cache Microservice (ShCM) provides a caching layer in front of a Home Subscriber Server (HSS) for certain queries over Diameter Sh.
For a high-level architecture overview, see Sh Cache Service Overview.
ShCM is realised as two Services running in a Rhino TAS.
- The Cache Service processes Diameter Sh/HTTP triggers received from clients.
- The Notification Service processes Diameter Sh triggers received from the HSS.
Each software component in the ShCM collects statistics. Components related to the interfaces between the ShCM and external entities may generate alarms.
The following diagram shows the software components used in the Cache Service.
The following diagram shows the software components used in the Notification Service.
This manual provides a complete list of alarms and statistics supported by the ShCM.
Sh Cache Microservice Statistics
The following sections are a summary of the statistics collected by each component. Included is the name, the type of statistic, the OID (for monitoring via SNMP) and a brief description.
The Cache Invalidator SBB
Base OID: 1.3.6.1.4.1.19808.11.1.3
Name | Type | OID | Description |
---|---|---|---|
InvalidateRequest | Counter | .2 | Incremented when a request is made to invalidate caches, using the Management API. |
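Each table in this section lists OID suffixes relative to the component's base OID. As a minimal Python sketch, assuming the full OID is formed by appending the suffix to the base OID (the SNMP agent may expose further instance sub-identifiers, which this document does not specify), the InvalidateRequest counter could be addressed as follows. The helper name `full_oid` is hypothetical.

```python
def full_oid(base_oid: str, suffix: str) -> str:
    """Join a base OID and a table suffix such as '.2'.

    Assumption: the suffix is appended directly to the base OID.
    """
    return base_oid.rstrip(".") + "." + suffix.strip(".")


# Base OID of the Cache Invalidator SBB, taken from this document.
CACHE_INVALIDATOR_BASE = "1.3.6.1.4.1.19808.11.1.3"

print(full_oid(CACHE_INVALIDATOR_BASE, ".2"))  # 1.3.6.1.4.1.19808.11.1.3.2
```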
Cache Result Collator SBB
Base OID: 1.3.6.1.4.1.19808.11.1.4
Name | Type | OID | Description |
---|---|---|---|
ReaderCreated | Counter | .2 | Incremented when a Reader SBB is created, every time a Read request is made. |
UpdaterCreated | Counter | .3 | Incremented when an Updater SBB is created, every time an Update request is made. |
SubscriberCreated | Counter | .4 | Incremented when a Subscriber SBB is created, every time a Subscribe to UE Reachability request is made. |
The Cassandra Cache Access SBB
Base OID: 1.3.6.1.4.1.19808.11.1.5
Name | Type | OID | Description |
---|---|---|---|
Read | Counter | .2 | Incremented on a request to read from the Cassandra cache. |
Write | Counter | .3 | Incremented on a request to write to the Cassandra cache. |
ReadSubscription | Counter | .4 | Incremented on a request to read a subscription from the Cassandra cache. |
WriteAndSubscribe | Counter | .5 | Incremented on a request to both write data and a subscription to the Cassandra cache. |
Delete | Counter | .6 | Incremented on a request to delete data from the Cassandra cache. |
QuerySucceeded | Counter | .7 | Incremented if a Cassandra query regarding the Cassandra cache was successful. |
QueryFailed | Counter | .8 | Incremented if a Cassandra query regarding the Cassandra cache failed. This indicates a problem with Cassandra that will impact service, and warrants further investigation using the Rhino logs. |
InternalFailure | Counter | .9 | Incremented on an internal failure in the part of the Sh Cache Microservice responsible for maintaining the Cassandra cache. This will impact service and warrants further investigation using the Rhino logs. |
DeleteIncludingSubscription | Counter | .10 | Incremented on a request to delete data and subscription from the cache. |
The Cassandra Subscription Access SBB
Base OID: 1.3.6.1.4.1.19808.11.1.6
Name | Type | OID | Description |
---|---|---|---|
Read | Counter | .2 | Incremented on a request to read information about any existing subscriptions to the HSS for UE Reachability For IP from the Cassandra database. |
Write | Counter | .3 | Incremented on a request to write information about a new subscription to the HSS for UE Reachability For IP to the Cassandra database. |
QuerySucceeded | Counter | .4 | Incremented if a Cassandra query regarding UE Reachability For IP was successful. |
QueryFailed | Counter | .5 | Incremented if a Cassandra query regarding UE Reachability For IP failed. |
InternalFailure | Counter | .6 | Incremented on an internal failure in the part of the Sh Cache Microservice responsible for maintaining the Cassandra database with information about subscriptions to the HSS for UE Reachability For IP. This will impact service and warrants further investigation using the Rhino logs. |
The HSS Reader SBB
Base OID: 1.3.6.1.4.1.19808.11.1.7
Name | Type | OID | Description |
---|---|---|---|
ReadRequest | Counter | .2 | Incremented on a read request, which implies the Sh Cache Microservice will either read from the cache or from the HSS. |
ReadFromCache | Counter | .3 | Incremented on attempting to read from the cache. |
ReReadFromCache | Counter | .4 | Incremented on attempting to re-read from the cache. This is done after an initial cache miss, to ensure proper handling of concurrent requests. |
NotCaching | Counter | .5 | Incremented if this request does not require caching, due to configuration. |
CacheMiss | Counter | .6 | Incremented if there was no data in the cache matching the read request. |
CacheHit | Counter | .7 | Incremented if data was read from the cache that matched the read request. |
UpdateCache | Counter | .10 | Incremented when trying to update the cache. |
UpdateCacheAndSubscribe | Counter | .11 | Incremented when trying to update the cache and subscribe for future updates. |
CacheReadFailed | Counter | .12 | Incremented on a failure reading from the cache. This indicates a problem with the cache, impacting service, which should be investigated using the Rhino logs. |
CacheUpdateFailed | Counter | .13 | Incremented on a failure to update the cache. This indicates a problem with the cache. While this does not directly impact service, it does affect performance as less data will be cached. The cause can be investigated using the Rhino logs. |
CacheWriteAndSubscribeFailed | Counter | .14 | Incremented on a failure to update the cache and subscribe for future updates. This indicates a problem with the cache. While this does not directly impact service, it does affect performance as less data will be cached. The cause can be investigated using the Rhino logs. |
LockExpired | Counter | .15 | Incremented if a lock that was acquired expires. This indicates that a request took too long, due to issues with the Sh Cache Microservice, Cassandra or the HSS. Repeated occurrences impact service and warrant investigation using the Rhino logs. |
LockAcquireFailed | Counter | .16 | Incremented if there was a failure trying to acquire a lock. Repeated occurrences impact service and should be investigated using the Lock Provider SBB statistics. |
ShRequestTimeout | Counter | .17 | Incremented if a request to the HSS timed out. This indicates a problem with the HSS impacting service, which should be investigated. |
SendUDR | Counter | .18 | Incremented if a Diameter Sh UDR is sent. |
SendSNR | Counter | .19 | Incremented if a Diameter Sh SNR is sent. |
ReceivedUDA | Counter | .20 | Incremented if a Diameter Sh UDA (success or failure) is received. |
ReceivedSNA | Counter | .21 | Incremented if a Diameter Sh SNA (success or failure) is received. |
UDASuccessReceived | Counter | .22 | Incremented if a Diameter UDA success was received. |
UDAFailureReceived | Counter | .23 | Incremented if a Diameter UDA failure was received. This could indicate either problems with the HSS, impacting service, or subscriber-specific issues such as data being absent for the requested subscriber. |
SNASuccessReceived | Counter | .24 | Incremented if a Diameter SNA success was received. |
SNAFailureReceived | Counter | .25 | Incremented if a Diameter SNA failure was received. This could indicate either problems with the HSS, impacting service, or subscriber-specific issues such as data being absent for the requested subscriber. |
SNAFailureReceivedConvertedIntoSuccess | Counter | .26 | Incremented if a Diameter SNA failure with result code 5106 was received, which will be passed upstream as a success response with empty body. This is expected behaviour and should not be treated as a failure. |
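The CacheHit and CacheMiss counters above can be combined into a rough cache hit ratio. The following Python sketch is illustrative only: the function name `cache_hit_ratio` is hypothetical, and it assumes both values were sampled at the same moment. Whether re-reads (ReReadFromCache) also contribute to these two counters is not stated in this document, so treat the result as an approximation.

```python
from typing import Optional


def cache_hit_ratio(cache_hit: int, cache_miss: int) -> Optional[float]:
    """Return hits / (hits + misses), or None when nothing was counted yet."""
    total = cache_hit + cache_miss
    if total == 0:
        return None  # avoid dividing by zero before any cache lookups
    return cache_hit / total


# Example with made-up counter values.
print(cache_hit_ratio(900, 100))  # 0.9
```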
The HSS Subscriber SBB
Base OID: 1.3.6.1.4.1.19808.11.1.8
Name | Type | OID | Description |
---|---|---|---|
SubscribeRequest | Counter | .2 | Incremented on a subscribe request for UE Reachability for IP. |
SendSNR | Counter | .3 | Incremented when sending an SNR for UE Reachability for IP. |
SubscriptionUpdateSuccess | Counter | .4 | Incremented when a success SNA is received for UE Reachability for IP. |
SubscriptionUpdateFailed | Counter | .5 | Incremented when a failure SNA is received for UE Reachability for IP. |
UpdateCacheOnSuccessSNA | Counter | .6 | Incremented when attempting to store the subscription for UE Reachability For IP in the database after receiving a success SNA. |
ResultExpiredOnSuccessSNA | Counter | .7 | Incremented if the result in a success SNA for UE Reachability For IP has expired by the time the Sh Cache Microservice receives it. |
FailureSNA | Counter | .8 | Incremented when a failure SNA for UE Reachability For IP is received. This could indicate either problems with the HSS, impacting service, or subscriber-specific issues such as data being absent for the requested subscriber. |
ShRequestTimeout | Counter | .9 | Incremented if a request to the HSS timed out. This indicates a problem with the HSS impacting service, which should be investigated. |
The HSS Updater SBB
Base OID: 1.3.6.1.4.1.19808.11.1.9
Name | Type | OID | Description |
---|---|---|---|
UpdateRequest | Counter | .2 | Incremented on an update request, which implies that the Sh Cache Microservice will send a PUR to the HSS and possibly update the cache afterwards. |
ReadFromCache | Counter | .3 | Incremented on attempting to read from the cache while preparing to update the HSS. |
PUASuccessReceived | Counter | .4 | Incremented if a PUA success answer was received. |
PUAFailureReceived | Counter | .5 | Incremented if a PUA failure answer was received. This could indicate either problems with the HSS, impacting service, or subscriber-specific issues such as the requested subscriber not being provisioned. |
TransparentDataOutOfSync | Counter | .6 | Incremented if a Diameter failure answer 5105 was received. This implies that the caller of the service sent data with the wrong sequence number, or data was being concurrently updated by an external source. |
UpdateCache | Counter | .7 | Incremented when trying to update the cache after the HSS has been updated. |
CacheReadFailed | Counter | .8 | Incremented on a failure reading from the cache while preparing to update the HSS. This indicates a problem with the cache, impacting service, which should be investigated using the Rhino logs. |
CacheUpdateFailed | Counter | .9 | Incremented on a failure to update the cache. The cache entry for this subscriber will be purged when this happens, so service will not be affected. However, performance will be impacted because less data will be cached. |
LockExpired | Counter | .10 | Incremented if a lock that was acquired expires. This indicates that a request took too long, due to issues with the Sh Cache Microservice, Cassandra or the HSS. Repeated occurrences impact service and warrant investigation using the Rhino logs. |
LockAcquireFailed | Counter | .11 | Incremented if there was a failure trying to acquire a lock. Repeated occurrences impact service and should be investigated using the Lock Provider SBB statistics. |
ShRequestTimeout | Counter | .12 | Incremented if a request to the HSS timed out. This indicates a problem with the HSS impacting service, which should be investigated. |
SentPUR | Counter | .13 | Incremented if a Diameter Sh PUR is sent. |
SubscriptionCacheUpdateDeferred | Counter | .14 | Incremented when creation of a subscription cache entry was deferred, to force the next read to populate the cache. This is normal behaviour in some specific cases and does not indicate any problems. |
DeleteCache | Counter | .15 | Incremented when trying to delete the cache entry. |
ReceivedPUA | Counter | .16 | Incremented if a Diameter Sh PUA (success or failure) is received. |
DeletingFromHSS | Counter | .17 | Incremented if a Diameter PUR with no data is sent, deleting the entry from the HSS. |
The HTTP Router Frontend SBB
Base OID: 1.3.6.1.4.1.19808.11.1.10
Name | Type | OID | Description |
---|---|---|---|
GetRequest | Counter | .2 | Incremented on receipt of an HTTP GET request (which could be either a User Data Request or a Health Check Request). |
PutRequest | Counter | .3 | Incremented on receipt of an HTTP PUT request (a Profile Update Request). |
PostRequest | Counter | .4 | Incremented on receipt of an HTTP POST request (a Subscribe to UE Reachability Request). |
DeleteRequest | Counter | .5 | Incremented on receipt of an HTTP DELETE request (a Cache Invalidation Request). |
SendHttpResponse | Counter | .6 | Incremented when an HTTP response is sent. |
FailedToSendHttpResponse | Counter | .7 | Incremented if an HTTP response could not be sent. This indicates an internal Sh Cache Microservice issue which will affect service. Diagnose the issue using the Rhino logs. |
CacheReadSuccess | Counter | .8 | Incremented if the reader component was successful in reading the requested data either from the cache or from the HSS. |
CacheReadFailed | Counter | .9 | Incremented if the reader component failed to read the requested data from the cache and from the HSS. This indicates an issue affecting service, which should be diagnosed using the Reader SBB statistics. |
CacheUpdateSuccess | Counter | .10 | Incremented if the update component was successful in updating the supplied data in the HSS and in the cache. |
CacheUpdateFailed | Counter | .11 | Incremented if the update component failed to update the supplied data in either the HSS or in the cache. This indicates an issue affecting service, which should be diagnosed using the Updater SBB statistics. |
CacheSubscribeSuccess | Counter | .12 | Incremented if the subscriber component was successful in performing a subscription for UE Reachability For IP. |
CacheSubscribeFailed | Counter | .13 | Incremented if the subscriber component failed to process a subscription for UE Reachability For IP. This indicates an issue affecting service, which should be diagnosed using the Subscriber SBB statistics. |
CacheInvalidateSuccess | Counter | .14 | Incremented if the invalidate component was successful in invalidating the cache. |
CacheInvalidateFailed | Counter | .15 | Incremented if the invalidate component failed to invalidate the cache. This issue can be diagnosed using the Cache Invalidator SBB statistics. |
MswExchangeIdNotValid | Counter | .16 | Incremented if the X_MSW_EXCHANGE_ID is not a valid UUID. This will not affect service, but will affect SAS tracing. |
MswMessageIdNotValid | Counter | .17 | Incremented if the X_MSW_MESSAGE_ID is not a valid UUID. This will not affect service, but will affect SAS tracing. |
MswSpanIdNotValid | Counter | .18 | Incremented if the X_MSW_SPAN_ID is not a valid UUID. This will not affect service, but will affect SAS tracing. |
HealthCheckUp | Counter | .19 | Incremented when a health check call to '/infra/up' is made. |
HealthCheckReady | Counter | .20 | Incremented when a health check call to '/infra/ready' is made. |
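The MswExchangeIdNotValid, MswMessageIdNotValid and MswSpanIdNotValid counters above are incremented when the corresponding value is not a valid UUID. A client could check its values before sending them, as in this Python sketch. The function name `is_valid_uuid` is hypothetical, and requiring the canonical hyphenated form is an assumption; the exact validation applied by the Sh Cache Microservice is not described in this document.

```python
import uuid


def is_valid_uuid(value: str) -> bool:
    """Return True if value is a UUID in canonical hyphenated form.

    Assumption: only the canonical 8-4-4-4-12 form is acceptable.
    """
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts forms without hyphens; reject those here.
    return str(parsed) == value.lower()


print(is_valid_uuid("123e4567-e89b-12d3-a456-426614174000"))  # True
print(is_valid_uuid("not-a-uuid"))                            # False
```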
The Lock Provider SBB
Base OID: 1.3.6.1.4.1.19808.11.1.11
Name | Type | OID | Description |
---|---|---|---|
AcquireSuccess | Counter | .2 | Incremented when a lock is successfully acquired. |
AcquireFailed | Counter | .3 | Incremented when a lock failed to be acquired, due to a failure within Cassandra or the Sh Cache Microservice. As service will be impacted, this should be investigated using the Rhino logs. |
ReleaseSuccess | Counter | .4 | Incremented when a lock is successfully released. |
ReleaseFailed | Counter | .5 | Incremented when an attempt to release a lock failed. This indicates an internal failure in the Sh Cache Microservice or a failure in Cassandra. As locks are not released properly, this will affect repeated requests for the same subscriber and will therefore affect performance. |
RetryRead | Counter | .6 | Incremented when the Sh Cache Microservice backs off and retries reading a lock. This occurs when a lock cannot immediately be acquired because a concurrent request for this subscriber and service indication is currently in progress. This is expected behaviour and does not indicate a failure. |
RetryLimitReached | Counter | .7 | Incremented when the Sh Cache Microservice reaches the limit of the number of times to retry reading a lock. This either indicates an internal failure, or indicates that too many concurrent requests are taking place for the same subscriber and service indication. |
CASWriteTimeout | Counter | .8 | Incremented when a Cassandra WriteTimeoutException occurs on a CAS operation. After receiving this timeout, the lock status gets verified and other failure statistics will indicate any issues. |
The HSS Notification SBB
Base OID: 1.3.6.1.4.1.19808.11.1.12
Name | Type | OID | Description |
---|---|---|---|
PushNotificationRequest | Counter | .2 | Incremented on receipt of a Push Notification Request. |
SendFailurePNA | Counter | .3 | Incremented on sending a failure PNA. This indicates that the HSS sent the Sh Cache Microservice an invalid Push Notification Request, or that there is an internal failure which will cause one of the failure stats below to be increased as well. This will affect service. |
RetrieveSubscribers | Counter | .4 | Incremented when retrieving subscriber data from the database on receipt of a Push Notification Request for UE Reachability For IP. |
FailedOnUnsupportedCacheStrategy | Counter | .5 | Incremented when the cache strategy related to a PNR, which is not for UE Reachability For IP, is not Subscription Cache. A failure PNA is also sent. This can occur when a cache strategy is switched from a subscription cache to a different cache type, and will not impact service. |
CheckForRowInSubscriberCache | Counter | .6 | Incremented when checking whether the Sh Cache Microservice holds a subscription record for the received Push Notification Request, for a data reference different from UE Reachability For IP. |
NotSubscribed | Counter | .7 | Incremented when the Sh Cache Microservice does not hold a subscription record to the data reference related to the PNR. This does not affect service, but could indicate an underlying problem in the communication between the Sh Cache Microservice and the HSS. |
ReadFromSubscriptionCacheFailed | Counter | .8 | Incremented on failing to read from the subscription database for a data reference different from UE Reachability For IP. This indicates a failure in the Sh Cache Microservice or Cassandra impacting service, and should be investigated using the Rhino logs. |
ReadFromCacheFailed | Counter | .9 | Incremented on failing to read from the cache. This indicates an issue affecting service, which should be diagnosed using the Reader SBB statistics. |
SequenceNumberMatchesCacheIgnorePNR | Counter | .10 | Incremented when the sequence number of repository data in the PNR matches the cache, and therefore the cache will not be updated. This could be due to a PNR following a PUR made by the Sh Cache Microservice, so does not necessarily imply a problem. |
SequenceNumberIsNotNewerThanCache | Counter | .11 | Incremented if the repository data in the PNR does not represent a more recent update than what is in the cache. The Sh Cache Microservice replies with a failure PNA. This could imply an issue with the HSS. |
UpdateCache | Counter | .12 | Incremented when updating the cache. |
CreateNewCacheEntry | Counter | .13 | Incremented when adding a new cache entry. |
CacheUpdateSuccess | Counter | .14 | Incremented when the cache was successfully updated. |
CacheUpdateFailed | Counter | .15 | Incremented when there was a failure updating the cache. This indicates an issue affecting service, which should be diagnosed using the Updater SBB statistics. |
FailedOnCreateSbb | Counter | .16 | Incremented if there is an internal error creating a child SBB. This will impact service and should be diagnosed using the Rhino logs. |
LockAcquired | Counter | .17 | Incremented when a lock is acquired. |
RetrieveDataFromCache | Counter | .18 | Incremented when retrieving data from the cache. |
LockExpired | Counter | .19 | Incremented if a lock that was acquired expires. This indicates that a request took too long, due to issues with the Sh Cache Microservice, Cassandra or the HSS. Repeated occurrences impact service and warrant investigation using the Rhino logs. |
LockAcquireFailed | Counter | .20 | Incremented if there was a failure trying to acquire a lock. Repeated occurrences impact service and should be investigated using the Lock Provider SBB statistics. |
SubscriptionReadFailed | Counter | .22 | Incremented when a failure occurred trying to read the subscription database for UE Reachability For IP. This issue, which will impact service, can be diagnosed using the Notification SBB statistics. |
DeleteCache | Counter | .23 | Incremented when trying to delete the cache and subscription entries. |
DeleteCacheSuccess | Counter | .24 | Incremented when the cache and subscription entries are successfully deleted. |
DeleteCacheFailed | Counter | .25 | Incremented when the cache and subscription entries failed to be deleted. |
The Cache Health Check SBB
Base OID: 1.3.6.1.4.1.19808.11.1.13
Name | Type | OID | Description |
---|---|---|---|
HealthCheckReadyStart | Counter | .2 | Incremented when a ready request is made to the Sh Cache Microservice health check. |
SentCassandraQuery | Counter | .3 | Incremented when a test Cassandra query is made. |
CassandraSuccess | Counter | .4 | Incremented when a test Cassandra query was successful. |
CassandraError | Counter | .5 | Incremented when a test Cassandra query was not successful. This indicates a problem with Cassandra that could impact service, if actual traffic is sent to this instance. |
SentHSSQuery | Counter | .6 | Incremented when a test UserDataRequest is sent to the HSS. |
HSSSuccess | Counter | .7 | Incremented when a test UserDataRequest to the HSS was successful. |
HSSError | Counter | .8 | Incremented when a test UserDataRequest to the HSS was not successful. This indicates a problem with the HSS that could impact service, if actual traffic is sent to this instance. |
The HTTP Notify SBB
Base OID: 1.3.6.1.4.1.19808.11.1.14
Name | Type | OID | Description |
---|---|---|---|
MalformedURL | Counter | .2 | Incremented if a notification for UE Reachability For IP cannot be sent because of a malformed URL. This indicates a problem with the client who sent the original subscription request. |
SendNotifyRequest | Counter | .3 | Incremented when sending a notify for UE Reachability For IP as an HTTP request. |
Sh Cache Microservice Alarms
Create your own alarms based on statistics with Threshold Alarms.
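The idea behind alarming on a statistic can be illustrated outside Rhino with a short Python sketch. This is not the Threshold Alarms configuration syntax, which is covered in the Rhino documentation; it only shows the arithmetic of comparing the growth of a counter between two polls against a limit. The function names and the reset handling are assumptions made for this example.

```python
def counter_delta(previous: int, current: int) -> int:
    """Increase of a counter between two polls.

    Assumption: if the value went down, the counter was reset (for example
    by a restart), so the current value is taken as the increase.
    """
    return current - previous if current >= previous else current


def should_alarm(previous: int, current: int, threshold: int) -> bool:
    """True when the counter grew by at least `threshold` since the last poll."""
    return counter_delta(previous, current) >= threshold


# Example: a failure counter such as QueryFailed sampled twice (made-up values).
print(should_alarm(previous=10, current=15, threshold=5))  # True
print(should_alarm(previous=10, current=12, threshold=5))  # False
```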
Source: OpenCloud Diameter Sh
Alarm Type | active-reconfiguration |
---|---|
Level | WARNING |
Message | Updates to config profile value ${instance} will not take effect until the RA entity is restarted. |
Description | Configuration updates have been made to the RA entity that require it to be restarted |
Raised | When a fixed configuration property (such as the host or realm) is changed while the RA entity is active |
Cleared | When the RA entity is deactivated |
Alarm Type | diameter.misconfiguration |
---|---|
Level | CRITICAL |
Message | Diameter RA configuration error. Update the RA entity with a valid configuration to resolve. Reason(s): %s |
Description | The RA entity could not be configured or activated due to a configuration error |
Raised | When an exception occurs during RA entity creation or activation |
Cleared | When a valid configuration is installed by a configuration update |
Alarm Type | diameter.unlicensed |
---|---|
Level | CRITICAL |
Message | RA entity %s does not have valid license |
Description | The RA entity does not have a valid license. |
Raised | When trying to activate the RA entity without a valid license. |
Cleared | When the RA entity is deactivated. |
Alarm Type | diameter.peer.connectiondown |
---|---|
Level | WARNING |
Message | Connection to %s:%d is down |
Description | Unable to establish a peer connection. |
Raised | When a peer connection fails. |
Cleared | When the peer connection is established or the Resource Adaptor is deactivated. |
Alarm Type | active-reconfiguration |
---|---|
Level | WARNING |
Message | Updates to %s ${instance} will not take effect until the RA entity is restarted. |
Description | Configuration updates have been made to the RA entity that require it to be restarted |
Raised | When a fixed configuration property is changed while the RA entity is active |
Cleared | When the RA entity is deactivated |
Alarm Type | misconfiguration |
---|---|
Level | CRITICAL |
Message | RA configuration error, operational functionality disabled. Update the RA entity with a valid configuration to resolve. |
Description | The RA entity could not be configured or activated due to a configuration error |
Raised | When an exception occurs during RA entity creation or activation |
Cleared | When a valid configuration is installed by a configuration update |
Alarm Type | misconfiguration |
---|---|
Level | MINOR |
Message | RA configuration update failed. Continuing operational functionality using the last valid configuration. |
Description | The RA entity configuration was not updated due to a configuration error |
Raised | When an exception occurs during a configuration update |
Cleared | When the RA entity is deactivated, or a valid configuration update is applied |
Alarm Type | update-ignored |
---|---|
Level | MINOR |
Message | Update to %s ${instance} failed, continuing with its last valid configuration |
Description | An RA component failed a configuration update, but the component can ignore failures |
Raised | When an exception occurs updating a component that is allowed to ignore failed updates |
Cleared | When the RA entity is deactivated, or the component is updated with a valid configuration |
Source: OpenCloud HTTP
Alarm Type | active-reconfiguration |
---|---|
Level | WARNING |
Message | Updates to %s ${instance} will not take effect until the RA entity is restarted. |
Description | Configuration updates have been made to the RA entity that require it to be restarted |
Raised | When a fixed configuration property is changed while the RA entity is active |
Cleared | When the RA entity is deactivated |
Alarm Type | misconfiguration |
---|---|
Level | CRITICAL |
Message | RA configuration error, operational functionality disabled. Update the RA entity with a valid configuration to resolve. |
Description | The RA entity could not be configured or activated due to a configuration error |
Raised | When an exception occurs during RA entity creation or activation |
Cleared | When a valid configuration is installed by a configuration update |
Alarm Type | misconfiguration |
---|---|
Level | MINOR |
Message | RA configuration update failed. Continuing operational functionality using the last valid configuration. |
Description | The RA entity configuration was not updated due to a configuration error |
Raised | When an exception occurs during a configuration update |
Cleared | When the RA entity is deactivated, or a valid configuration update is applied |
Alarm Type | update-ignored |
---|---|
Level | MINOR |
Message | Update to %s ${instance} failed, continuing with its last valid configuration |
Description | An RA component failed a configuration update, but the component can ignore failures |
Raised | When an exception occurs updating a component that is allowed to ignore failed updates |
Cleared | When the RA entity is deactivated, or the component is updated with a valid configuration |
Source: OpenCloud cassandra-cql-ra
Alarm Type | CassandraCQLRA.ConnectToCluster |
---|---|
Level | CRITICAL |
Message | Not connected to Cassandra. %s |
Description | Unable to connect to the Cassandra cluster |
Raised | When an attempt to query the Cassandra cluster fails due to no cluster hosts being available |
Cleared | On the next successful attempt to connect to the Cassandra cluster |
Alarm Type | CassandraCQLRA.ConnectToNode |
---|---|
Level | CRITICAL |
Message | Not connected to Cassandra node: %s |
Description | Unable to connect to the Cassandra node |
Raised | When a specified Cassandra node in the cluster cannot be reached |
Cleared | On the next successful attempt to connect to this node, or when the node is removed from the cluster |
Alarm Type | active-reconfiguration |
---|---|
Level | WARNING |
Message | Updates to %s ${instance} will not take effect until the RA entity is restarted. |
Description | Configuration updates have been made to the RA entity that require it to be restarted |
Raised | When a fixed configuration property is changed while the RA entity is active |
Cleared | When the RA entity is deactivated |
Alarm Type | misconfiguration |
---|---|
Level | CRITICAL |
Message | RA configuration error, operational functionality disabled. Update the RA entity with a valid configuration to resolve. |
Description | The RA entity could not be configured or activated due to a configuration error |
Raised | When an exception occurs during RA entity creation or activation |
Cleared | When a valid configuration is installed by a configuration update |
Alarm Type | misconfiguration |
---|---|
Level | MINOR |
Message | RA configuration update failed. Continuing operational functionality using the last valid configuration. |
Description | The RA entity configuration was not updated due to a configuration error |
Raised | When an exception occurs during a configuration update |
Cleared | When the RA entity is deactivated, or a valid configuration update is applied |
Alarm Type | update-ignored |
---|---|
Level | MINOR |
Message | Update to %s ${instance} failed, continuing with its last valid configuration |
Description | An RA component failed a configuration update, but the component can ignore failures |
Raised | When an exception occurs updating a component that is allowed to ignore failed updates |
Cleared | When the RA entity is deactivated, or the component is updated with a valid configuration |
Sh Cache Microservice RA Statistics
The following sections summarize the statistics collected by the Sh Cache Microservice RA. Each entry lists the statistic's name, its type or units, its OID where applicable (for monitoring via SNMP), and a brief description.
Sh Cache Microservice RA
The Sh Cache Microservice RA uses the Rhino system of static SNMP OIDs to define mappings between OIDs and Rhino components. This model is discussed in further detail in the Rhino documentation section: Static OID Model.
The Sh Cache Microservice RA usage parameter set type OID is defined as follows:
Base OID: | 1.3.6.1.4.1.19808.20.5.10.10.3.1 |
Intermediate OID Suffix for resource adaptor entity: | 1 for a primary ShCM resource adaptor entity, 2 for a secondary ShCM resource adaptor entity, etc. |
Parameter Set Type OID Suffix: | 1 |
Overall OID for a primary ShCM resource adaptor entity’s usage statistics: | 1.3.6.1.4.1.19808.20.5.10.10.3.1.1.1 |
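As an illustration of how the parts above combine, the following minimal sketch joins the base OID, the entity suffix, and the parameter set type suffix. The function name `usage_oid` is hypothetical. The value for a secondary entity is inferred from the pattern above and is not stated explicitly in this document.

```python
# Sketch: compose the usage parameter set type OID from its documented parts.
# The function name is hypothetical and used only for illustration.

BASE_OID = "1.3.6.1.4.1.19808.20.5.10.10.3.1"
PARAMETER_SET_TYPE_SUFFIX = 1


def usage_oid(entity_index: int) -> str:
    """Return the usage statistics OID for the given RA entity.

    entity_index is 1 for a primary ShCM RA entity, 2 for a secondary, etc.
    """
    if entity_index < 1:
        raise ValueError("entity_index must be 1 or greater")
    return f"{BASE_OID}.{entity_index}.{PARAMETER_SET_TYPE_SUFFIX}"


print(usage_oid(1))  # primary entity
print(usage_oid(2))  # secondary entity (inferred from the pattern)
```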
Usage Interface Counter statistics
Name | Type | OID | Description |
---|---|---|---|
StartActivityOk | Counter | .2 | Incremented when an activity is started successfully |
StartActivityFail | Counter | .3 | Incremented when an activity fails to start |
FireOk | Counter | .4 | Incremented when an event is fired successfully |
FireFail | Counter | .5 | Incremented when an event fails to fire |
QLEnd | Counter | .6 | Incremented when an activity is ended when queried for liveness |
UDROut | Counter | .7 | Incremented when a UDR (i.e. a GET request) is sent |
UDAIn | Counter | .8 | Incremented when a UDA (i.e. a GET response) is received |
PUROut | Counter | .9 | Incremented when a PUR (i.e. a PUT request) is sent |
PUAIn | Counter | .10 | Incremented when a PUA (i.e. a PUT response) is received |
SNROut | Counter | .11 | Incremented when an SNR (i.e. a POST request) is sent |
SNAIn | Counter | .12 | Incremented when an SNA (i.e. a POST response) is received |
InvalidateOut | Counter | .13 | Incremented when a cache invalidation request (i.e. a DELETE request) is sent |
InvalidateIn | Counter | .14 | Incremented when a cache invalidation response (i.e. a DELETE response) is received |
PNRIn | Counter | .15 | Incremented when a PNR (i.e. a POST request) is received |
PNAOut | Counter | .16 | Incremented when a PNA (i.e. a POST response) is sent |
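The counter suffixes in the table are relative to the usage parameter set type OID. The following minimal sketch builds a lookup of candidate counter OIDs for the primary RA entity. It assumes that each suffix is appended directly to the overall OID `1.3.6.1.4.1.19808.20.5.10.10.3.1.1.1`; the exact OID that an SNMP agent exposes may carry further sub-identifiers, so check against the Rhino Static OID Model documentation. The names `COUNTER_SUFFIXES` and `counter_oid` are hypothetical.

```python
# Sketch: map each usage counter name to a candidate OID for the primary entity.
# Assumption: the suffix from the table is appended directly to the overall
# usage OID. A deployed SNMP agent may add further sub-identifiers.

USAGE_OID_PRIMARY = "1.3.6.1.4.1.19808.20.5.10.10.3.1.1.1"

COUNTER_SUFFIXES = {
    "StartActivityOk": 2,
    "StartActivityFail": 3,
    "FireOk": 4,
    "FireFail": 5,
    "QLEnd": 6,
    "UDROut": 7,
    "UDAIn": 8,
    "PUROut": 9,
    "PUAIn": 10,
    "SNROut": 11,
    "SNAIn": 12,
    "InvalidateOut": 13,
    "InvalidateIn": 14,
    "PNRIn": 15,
    "PNAOut": 16,
}


def counter_oid(name: str, usage_oid: str = USAGE_OID_PRIMARY) -> str:
    """Return the candidate OID for a named counter under the given usage OID."""
    return f"{usage_oid}.{COUNTER_SUFFIXES[name]}"


for counter_name in ("UDROut", "UDAIn"):
    print(counter_name, counter_oid(counter_name))
```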
Sample statistics
Name | Source units | Display units | Description |
---|---|---|---|
GetRequestResponseTime | Milliseconds | Milliseconds | Sample of the elapsed time between sending a GET request and receiving a GET response |
UpdateRequestResponseTime | Milliseconds | Milliseconds | Sample of the elapsed time between sending an update request (i.e. a PUT request) and receiving a PUT response |
SubscribeRequestResponseTime | Milliseconds | Milliseconds | Sample of the elapsed time between sending a subscribe request (i.e. a POST request) and receiving a POST response |
InvalidateRequestResponseTime | Milliseconds | Milliseconds | Sample of the elapsed time between sending an invalidation request (i.e. a DELETE request) and receiving a DELETE response |
Sh Cache Microservice RA Alarms
The Sh Cache Microservice RA can raise the following alarms if it encounters any issues.
Category | Level | Alarm Type | Message |
---|---|---|---|
ShCMRA | CRITICAL | ShCMRA.ShCMConnectFailed | "Not connected to any instances of the configured Sh Cache Microservice host." |
ShCMRA | CRITICAL | ShCMRA.ProxyConnectFailed | "Not connected to the configured proxy." |
RA Framework | WARNING | active-reconfiguration | "Updates to %s "${instance}" will not take effect until the RA entity is restarted." |
RA Framework | CRITICAL | misconfiguration | "RA configuration error, operational functionality disabled. Update the RA entity with a valid configuration to resolve." |
RA Framework | MINOR | misconfiguration | "RA configuration update failed. Continuing operational functionality using the last valid configuration." |
RA Framework | MINOR | update-ignored | "Update to %s "${instance}" failed, continuing with its last valid configuration" |