This section describes details of components and services running on the REM nodes.
systemd services
Rhino Element Manager
REM runs as a webapp inside Apache Tomcat, which runs as a systemd service called rhino-element-manager.
REM comes with the SIS EM and Sentinel Express plugins, which simplify management of SIS- and Sentinel-based services.
You can examine the state of the REM service by running sudo systemctl status rhino-element-manager.service.
This is an example of a healthy status:
```console
[sentinel@mag-1 ~]$ sudo systemctl status rhino-element-manager.service
● rhino-element-manager.service - Rhino Element Manager (REM)
   Loaded: loaded (/etc/systemd/system/rhino-element-manager.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2021-01-11 05:43:10 NZDT; 3s ago
     Docs: https://docs.opencloud.com/ocdoc/books/devportal-documentation/1.0/documentation-index/platforms/rhino-element-manager-rem.html
  Process: 4659 ExecStop=/home/sentinel/apache-tomcat/bin/systemd_relay.sh stop (code=exited, status=0/SUCCESS)
  Process: 4705 ExecStart=/home/sentinel/apache-tomcat/bin/systemd_relay.sh start (code=exited, status=0/SUCCESS)
 Main PID: 4713 (catalina.sh)
    Tasks: 89
   Memory: 962.1M
   CGroup: /system.slice/rhino-element-manager.service
           ├─4713 /bin/sh bin/catalina.sh start
           └─4715 /home/sentinel/java/current/bin/java -Djava.util.logging.config.file=/home/sentinel/apache-tomcat-8.5.38/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms2048m -Xmx2048m -...

Jan 11 05:43:00 mag-1 systemd[1]: Starting Rhino Element Manager (REM)...
Jan 11 05:43:00 mag-1 systemd_relay.sh[4705]: Tomcat started.
Jan 11 05:43:10 mag-1 systemd[1]: Started Rhino Element Manager (REM).
```
Alternatively, run jps: the Tomcat process shows up in its output as Bootstrap.
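If you want to automate these checks, for example from a monitoring script, the Python sketch below parses the Active: line of systemctl status output and the output of jps. It assumes Python 3.7 or later and the output formats shown above; the function names (parse_active_line, bootstrap_pids, rem_is_healthy) are hypothetical helpers, not part of REM.

```python
import re
import subprocess
from typing import List, Optional, Tuple

UNIT = "rhino-element-manager.service"


def parse_active_line(status_output: str) -> Optional[Tuple[str, str]]:
    """Return (state, substate) from the 'Active:' line, e.g. ('active', 'running')."""
    for line in status_output.splitlines():
        match = re.match(r"\s*Active:\s+(\S+)\s+\(([^)]+)\)", line)
        if match:
            return match.group(1), match.group(2)
    return None  # no Active: line found


def bootstrap_pids(jps_output: str) -> List[int]:
    """Return the PIDs that jps lists with the main class 'Bootstrap' (Tomcat)."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] == "Bootstrap":
            pids.append(int(parts[0]))
    return pids


def rem_is_healthy() -> bool:
    """Run the status command from the text and check for 'active (running)'."""
    # check=False: systemctl status exits non-zero when the unit is not running.
    result = subprocess.run(
        ["sudo", "systemctl", "status", UNIT],
        capture_output=True, text=True, check=False,
    )
    return parse_active_line(result.stdout) == ("active", "running")
```

rem_is_healthy() returns True only when the unit reports active (running). Matching on the class name Bootstrap is a heuristic: any other Tomcat instance on the node would match as well.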
For more information about REM, see the Rhino Element Manager (REM) Guide.
SNMP service monitor
The SNMP service monitor process is responsible for raising SNMP alarms when a disk partition gets too full.
The SNMP service monitor alarms are compatible with Rhino alarms and can be accessed in the same way. Refer to Accessing SNMP Statistics and Notifications for more information about this.
Alarms are sent to the SNMP targets configured in the YAML configuration files.
The following partitions are monitored:
- the root partition (/)
- the log partition (/var/log)
There are two thresholds for disk monitoring, expressed as a percentage of the total partition size. When disk usage exceeds:
- the lower threshold, a warning (MINOR severity) alarm is raised.
- the upper threshold, a MAJOR severity alarm is raised and, except for the root partition, files are automatically cleaned up where possible.
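The threshold rules above can be summarised as a small decision function. This Python sketch is illustrative only: it assumes usage is computed as used space divided by total partition size and that "exceeds" means strictly greater than. The real SNMP service monitor may compute usage differently (for example, df excludes reserved blocks), and automatic file cleanup is not modelled. The helper names are hypothetical.

```python
import shutil
from typing import Optional


def percent_used(path: str) -> float:
    """Usage of the filesystem containing `path`, as a percentage of its total size."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def alarm_severity(used_pct: float, lower: float, upper: float) -> Optional[str]:
    """Severity implied by the two thresholds; None means no alarm should be active."""
    if used_pct > upper:
        return "MAJOR"  # upper threshold exceeded
    if used_pct > lower:
        return "MINOR"  # lower threshold exceeded (warning)
    return None  # non-alarmable level: an existing alarm would be cleared
```

For example, alarm_severity(percent_used("/var/log"), 80, 90) applies the default log partition thresholds from the example /etc/disk_monitor.yaml in the configuration section. A result of None corresponds to the non-alarmable level at which the monitor clears an existing alarm on its next check.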
Once disk usage has returned to a non-alarmable level, the SNMP service monitor clears the associated alarm on its next check. By default, it checks disk usage once per day. Running sudo systemctl reload disk-monitor forces an immediate check of disk space. This is useful if, for example, an alarm was raised, you have since cleaned up the affected partition, and you want to clear the alarm without waiting for the next scheduled check.
Configuring the SNMP service monitor
The default monitoring settings should be appropriate for the vast majority of deployments.
Should your Metaswitch Customer Care Representative advise you to reconfigure the disk monitor, you can do so by editing the file /etc/disk_monitor.yaml (because of its permissions, you need to use sudo when editing this file):
```yaml
global:
  check_interval_seconds: 86400
log:
  lower_threshold: 80
  max_files_to_delete: 10
  upper_threshold: 90
root:
  lower_threshold: 90
  upper_threshold: 95
snmp:
  enabled: true
  notification_type: trap
  targets:
    - address: 192.168.50.50
      port: 162
      version: 2c
```
The file is in YAML format and specifies the alarm thresholds for each disk partition (as percentages), the interval between checks in seconds, and the SNMP targets.
- Supported SNMP versions are 2c and 3.
- Supported notification types are trap and notify.
- Supported values for the upper and lower thresholds are:
Partition | Lower threshold range | Upper threshold range | Minimum difference between thresholds
---|---|---|---
log (/var/log) | 50% to 80% | 60% to 90% | 10%
root (/) | 50% to 90% | 60% to 99% | 5%
- check_interval_seconds must be in the range 60 to 86400 seconds inclusive. Keep the interval as long as possible to minimise the performance impact of the checks.
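Before reloading the service, you may want to check an edited file against the rules above. The following Python sketch validates a configuration that has already been parsed into a dictionary. It assumes the top-level layout (global, log, root, snmp) of the example file, and it is not the disk monitor's own validation, which remains authoritative. The names validate and THRESHOLD_LIMITS are hypothetical.

```python
from typing import Any, Dict, List

# Documented limits per partition, in percent:
# (lower_min, lower_max, upper_min, upper_max, minimum gap between thresholds)
THRESHOLD_LIMITS = {
    "log": (50, 80, 60, 90, 10),
    "root": (50, 90, 60, 99, 5),
}
SNMP_VERSIONS = {"2c", "3"}
NOTIFICATION_TYPES = {"trap", "notify"}


def _is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate(config: Dict[str, Any]) -> List[str]:
    """Return the problems found in a parsed disk_monitor.yaml; empty means valid."""
    errors: List[str] = []

    interval = (config.get("global") or {}).get("check_interval_seconds")
    if not _is_number(interval) or not 60 <= interval <= 86400:
        errors.append("global.check_interval_seconds must be 60-86400")

    for part, (lo_min, lo_max, up_min, up_max, gap) in THRESHOLD_LIMITS.items():
        section = config.get(part) or {}
        lower = section.get("lower_threshold")
        upper = section.get("upper_threshold")
        if not _is_number(lower) or not lo_min <= lower <= lo_max:
            errors.append(f"{part}.lower_threshold must be {lo_min}-{lo_max}")
        if not _is_number(upper) or not up_min <= upper <= up_max:
            errors.append(f"{part}.upper_threshold must be {up_min}-{up_max}")
        if _is_number(lower) and _is_number(upper) and upper - lower < gap:
            errors.append(f"{part}: thresholds must differ by at least {gap}")

    snmp = config.get("snmp") or {}
    if snmp.get("notification_type") not in NOTIFICATION_TYPES:
        errors.append("snmp.notification_type must be trap or notify")
    for target in snmp.get("targets") or []:
        # YAML parses `version: 3` as an int and `version: 2c` as a string.
        if str(target.get("version")) not in SNMP_VERSIONS:
            errors.append(f"snmp target {target.get('address')}: version must be 2c or 3")
    return errors
```

To check the real file, parse it first, for example with PyYAML's yaml.safe_load (a third-party package, assumed to be installed), using an account that can read /etc/disk_monitor.yaml. An empty list only means this sketch found no problems; still confirm with sudo systemctl status disk-monitor after reloading.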
After editing the file, apply the configuration by running sudo systemctl reload disk-monitor.
Verify that the service has accepted the configuration by running sudo systemctl status disk-monitor. If it shows an error, run journalctl -u disk-monitor for more detailed information, correct the errors in the configuration, and apply it again.
Partitions
The nodes contain three partitions:
- /boot, with a size of 100MB. This contains the kernel and bootloader.
- /var/log, with a size of 7000MB. This is where the OS and Rhino store their log files. The Rhino logs are in the tas subdirectory, within which each cluster has its own directory.
- /, which uses the rest of the disk. This is the root filesystem.
Monitoring
System health statistics can be retrieved by performing an SNMP walk. They are available through the standard UCD-SNMP-MIB OIDs, which have the prefix 1.3.6.1.4.1.2021.
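As an illustration, the following Python sketch walks the UCD-SNMP-MIB subtree with the net-snmp snmpwalk command and parses its numeric output. It assumes that the net-snmp command-line tools are installed on the querying machine, that the node's agent accepts SNMP v2c queries, and that "public" is the community string; all three are assumptions, so substitute your own settings. The helper names are hypothetical; the OID-to-name mappings are taken from UCD-SNMP-MIB.

```python
import subprocess
from typing import Dict, Tuple

UCD_PREFIX = "1.3.6.1.4.1.2021"

# A few well-known UCD-SNMP-MIB objects (numeric OID prefix -> object name).
KNOWN_OBJECTS = {
    UCD_PREFIX + ".4.5": "memTotalReal",
    UCD_PREFIX + ".4.6": "memAvailReal",
    UCD_PREFIX + ".9.1.2": "dskPath",
    UCD_PREFIX + ".9.1.9": "dskPercent",
    UCD_PREFIX + ".10.1.3": "laLoad",
}


def walk(host: str, community: str = "public") -> str:
    """Run net-snmp's snmpwalk over v2c, printing OIDs numerically (-On)."""
    cmd = ["snmpwalk", "-v2c", "-c", community, "-On", host, UCD_PREFIX]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_walk(output: str) -> Dict[str, Tuple[str, str]]:
    """Map each numeric OID (without leading dot) to (type, value)."""
    result: Dict[str, Tuple[str, str]] = {}
    for line in output.splitlines():
        oid, sep, rest = line.partition(" = ")
        if not sep:
            continue  # skip continuation lines of multi-line values
        vtype, _, value = rest.partition(": ")
        result[oid.lstrip(".")] = (vtype, value.strip().strip('"'))
    return result


def name_of(oid: str) -> str:
    """Translate a numeric OID to 'objectName.index' when its object is known."""
    for prefix, name in KNOWN_OBJECTS.items():
        if oid.startswith(prefix + "."):
            return name + oid[len(prefix):]
    return oid
```

For example, applying name_of to each key of parse_walk(walk("mag-1")) yields names such as dskPercent.1 or memAvailReal.0. With net-snmp agents, disk table rows (dskPath, dskPercent) typically appear only for disks the agent has been configured to monitor.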