OmniHSS Metrics and Monitoring Guide

Monitoring Overview
Control Panel Monitoring
Database Monitoring
Log Monitoring
External Monitoring Integration
Key Performance Indicators
Alerting Strategies

Monitoring Overview

OmniHSS provides several mechanisms for monitoring system health, performance, and subscriber activity. Operations staff should combine these tools for comprehensive visibility.

Monitoring Layers

Control Panel Monitoring

The Control Panel provides the primary real-time monitoring interface.

Overview Page Monitoring

URL: https://[hostname]:7443/overview

Key Metrics Available

Monitored Subscriber States

State	Indicator	What It Means
Idle	No location info	Subscriber powered off or out of coverage
Attached	MME present	Subscriber registered to network
PDN Active	PDN session count > 0	Active data connection
IMS Registered	S-CSCF assigned	Voice services ready
In Call	Active call count > 0	VoLTE call in progress

Extracting Metrics from Overview

The Control Panel doesn't export metrics directly, but you can read them off the Overview page manually:

Count visible rows for total subscribers
Scan for green checkmarks to count enabled subscribers
Review expanded details for state information
Note last seen timestamps for responsiveness

Diameter Page Monitoring

URL: https://[hostname]:7443/diameter

Key Metrics

Critical Peer Monitoring

Identify critical peers and monitor their status:

Peer Type	Criticality	Impact if Down
MME	High	No new LTE attachments
P-GW	High	No data sessions
S-CSCF	High	No IMS registrations
P-CSCF	High	No VoLTE calls
I-CSCF	Medium	IMS routing issues
AS	Low-Medium	Specific service unavailable

Application Page Monitoring

URL: https://[hostname]:7443/application

Key Metrics

Metric	Description	Normal Range	Action Threshold
Process Count	Active Erlang processes	Varies by load	> 90% of limit
Memory Usage	Total memory consumed	< 80%	> 90%
Uptime	Time since last restart	N/A	Track for stability
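
As a rough illustration of the "> 90% of limit" action threshold for Process Count, the sketch below compares a process count against a limit using integer arithmetic. The function name `process_count_status` is hypothetical, and how you obtain the two numbers is deployment-specific. On an Erlang VM they are generally available via `erlang:system_info(process_count)` and `erlang:system_info(process_limit)`, but whether a shell on the OmniHSS node is reachable is an assumption.

```shell
# process_count_status COUNT LIMIT
# Prints ACTION when COUNT is strictly above 90% of LIMIT, otherwise OK.
# Hypothetical helper; the 90% figure comes from the Key Metrics table.
process_count_status() {
    count=$1
    limit=$2
    # Compare count/limit > 90/100 without floating point.
    if [ $((count * 100)) -gt $((limit * 90)) ]; then
        echo "ACTION"
    else
        echo "OK"
    fi
}

# Example: process_count_status 1200 262144
```

Cross-multiplying avoids the rounding that an integer percentage would introduce right at the threshold.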

Database Monitoring

Direct Database Queries

Connect to the SQL database to extract detailed metrics:

Subscriber Counts

Query the database to retrieve:

Total count of all subscribers
Count of enabled subscribers
Count of IMS-enabled subscribers

Session Statistics

Query the database to retrieve:

Count of active PDN sessions
Count of active VoLTE calls
Breakdown of PDN sessions by APN profile

Location Statistics

Query the database to retrieve:

Subscriber count grouped by visited network (MCC-MNC combination)
Count of subscribers currently roaming (not on home PLMN 001-001)
Distribution of subscribers across different visited networks
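
The location statistics above reduce to simple grouping once a query has returned one visited-network pair per subscriber. The sketch below assumes the query result has been exported as tab-separated `MCC<TAB>MNC` lines on stdin. That export format and the function names are hypothetical, since the database schema is not described here. The home PLMN defaults to 001-001 as stated above.

```shell
# count_roaming [HOME_MCC] [HOME_MNC]
# Reads "MCC<TAB>MNC" lines on stdin and prints how many are not on the
# home PLMN. Values are compared as strings so that MNC "01" and "001"
# are treated as different networks.
count_roaming() {
    home_mcc=${1:-001}
    home_mnc=${2:-001}
    awk -F '\t' -v mcc="$home_mcc" -v mnc="$home_mnc" '
        NF >= 2 && (($1 "") != (mcc "") || ($2 "") != (mnc "")) { roaming++ }
        END { print roaming + 0 }
    '
}

# count_by_plmn
# Reads the same input and prints "MCC-MNC<TAB>count", sorted by PLMN.
count_by_plmn() {
    awk -F '\t' '
        NF >= 2 { n[$1 "-" $2]++ }
        END { for (k in n) print k "\t" n[k] }
    ' | sort
}
```

In practice the grouping could equally be done in SQL. The shell version is shown only because it does not depend on table or column names.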

Recent Activity

Query the database to retrieve:

Count of subscribers seen in the last hour
Distribution of subscribers by serving MME
Timestamp analysis of last subscriber activity

Database Health Monitoring

Monitor database health by querying:

Total database size and growth trends
Individual table sizes and row counts
Current database connection count
Query performance and resource usage

Log Monitoring

Log Output

OmniHSS outputs logs to stdout/stderr, which should be captured by your process manager.

Log Levels

Key Log Patterns to Monitor

Diameter Peer Events:

[info] Diameter peer connected: mme01.epc.example.com
[warn] Diameter peer disconnected: pgw01.epc.example.com
[error] Diameter peer connection failed: timeout

Database Events:

[info] Database connection established
[error] Database connection lost: timeout
[error] Database query failed: deadlock detected

Authentication Events:

[info] Authentication successful: IMSI 001001123456789
[warn] Authentication failed: IMSI 001001123456789, invalid vector
[error] Roaming denied: IMSI 001001123456789, MCC 310 MNC 410
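
A minimal sketch of how the patterns above could be turned into numbers, assuming the captured log lines start with the bracketed level exactly as in the samples. If your process manager prepends timestamps, the matching would need adjusting. Both function names are hypothetical.

```shell
# count_log_levels
# Reads log lines on stdin and prints "<level> <count>" for info, warn
# and error. Only lines that begin with the bracketed level are counted.
count_log_levels() {
    awk '
        index($0, "[info]")  == 1 { n_info++ }
        index($0, "[warn]")  == 1 { n_warn++ }
        index($0, "[error]") == 1 { n_error++ }
        END { printf "info %d\nwarn %d\nerror %d\n", n_info, n_warn, n_error }
    '
}

# auth_failure_pct
# Prints the percentage (rounded down) of authentication events that
# failed, based on the "Authentication successful" and "Authentication
# failed" messages. Prints 0 when no authentication events were seen.
auth_failure_pct() {
    awk '
        /Authentication successful:/ { ok++ }
        /Authentication failed:/     { failed++ }
        END {
            total = ok + failed
            if (total == 0) { print 0; exit }
            printf "%d\n", failed * 100 / total
        }
    '
}
```

Feeding either function a time-bounded slice of the log (for example, the last hour) gives values that can be compared against alert thresholds.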

Log Aggregation

For production deployments, forward the captured stdout/stderr output to a central log aggregation system.

External Monitoring Integration

Health Check Endpoint

API Health Check: GET /api/status

curl -k https://hss.example.com:8443/api/status

Expected Response:

{"status": "ok"}

HTTP Status: 200 OK
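
A check can also look at the response body rather than only the HTTP status. The sketch below tests for the `"status": "ok"` pair shown above using `grep`. The function name `status_is_ok` is hypothetical, and the pattern match is a simplification that assumes the body is the small JSON object shown. A JSON-aware tool would be more robust if one is available on the monitoring host.

```shell
# status_is_ok
# Reads an /api/status response body on stdin. Succeeds (exit status 0)
# only if the body contains "status": "ok", with optional whitespace
# around the colon.
status_is_ok() {
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Example (not run here):
#   curl -k -s --max-time 5 https://hss.example.com:8443/api/status | status_is_ok
```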

Monitoring Tool Integration

Nagios/Icinga Example

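#!/bin/bash
# check_omnihss.sh
# Nagios/Icinga plugin: exits 0 (OK) when the OmniHSS API health check
# returns HTTP 200, and 2 (CRITICAL) otherwise.

API_URL="https://hss.example.com:8443/api/status"

# -k skips TLS certificate verification; drop it if the server certificate
# is trusted by this host. On timeout or connection failure curl reports 000.
response=$(curl -k -s -o /dev/null -w "%{http_code}" --max-time 5 "$API_URL")

if [ "$response" = "200" ]; then
    echo "OK - OmniHSS API responding"
    exit 0
else
    echo "CRITICAL - OmniHSS API not responding (HTTP $response)"
    exit 2
fi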

Prometheus Integration

Custom exporters can be created to export OmniHSS metrics to Prometheus by querying the API and database.
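
One lightweight way to build such an exporter is to write metrics in the Prometheus text exposition format and let an existing agent pick the file up (for example, the node_exporter textfile collector). The sketch below only shows the formatting step. The function name `emit_gauge` and the metric name `omnihss_up` are hypothetical and not part of OmniHSS.

```shell
# emit_gauge NAME HELP VALUE
# Prints one gauge metric in the Prometheus text exposition format.
emit_gauge() {
    name=$1
    help=$2
    value=$3
    printf '# HELP %s %s\n' "$name" "$help"
    printf '# TYPE %s gauge\n' "$name"
    printf '%s %s\n' "$name" "$value"
}

# Example (hypothetical metric name):
#   emit_gauge omnihss_up "1 if the API health check returned HTTP 200" 1
```

The value passed in would come from the API health check or from the database queries described earlier.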

SNMP Integration

For SNMP-based monitoring, custom SNMP extension scripts can query the database or API for metrics and return values via SNMP OIDs.

Key Performance Indicators

Operational KPIs

Recommended KPI Thresholds

KPI	Target	Warning	Critical
System Uptime	99.99%	< 99.95%	< 99.9%
Diameter Peer Uptime	99.9%	< 99.5%	< 99%
Authentication Success Rate	> 99%	< 99%	< 95%
Diameter Response Time	< 100ms	> 200ms	> 500ms
Database Query Time	< 50ms	> 100ms	> 500ms
Error Rate	< 0.1%	> 0.5%	> 1%
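
The latency rows of the table can be expressed as a small threshold function. This is a sketch with hypothetical function names. It assumes measurements are whole milliseconds, and it reports values between the target and the warning threshold as OK, which is an interpretation rather than something the table states.

```shell
# classify_above VALUE WARN CRIT
# For metrics where higher is worse: prints CRITICAL above CRIT,
# WARNING above WARN, otherwise OK. All arguments are integers.
classify_above() {
    value=$1
    warn=$2
    crit=$3
    if [ "$value" -gt "$crit" ]; then
        echo "CRITICAL"
    elif [ "$value" -gt "$warn" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# Thresholds taken from the table above, in milliseconds.
diameter_response_status() { classify_above "$1" 200 500; }
db_query_status()          { classify_above "$1" 100 500; }
```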

Capacity KPIs

Metric	Monitor	Plan Action At
Total Subscribers	Current count	80% of expected capacity
Concurrent PDN Sessions	Active sessions	70% of expected maximum
Database Size	MB used	80% of allocated storage
Database Connections	Active connections	80% of pool size
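
The capacity thresholds above can be checked with integer arithmetic once the current value and the expected capacity are known. The sketch below uses a hypothetical function name and made-up example figures. CAPACITY must be greater than zero.

```shell
# capacity_status LABEL CURRENT CAPACITY THRESHOLD_PCT
# Prints the utilisation (rounded down to a whole percent) and PLAN when
# it has reached THRESHOLD_PCT, otherwise OK.
capacity_status() {
    label=$1
    current=$2
    capacity=$3
    threshold=$4
    pct=$((current * 100 / capacity))
    if [ "$pct" -ge "$threshold" ]; then
        state="PLAN"
    else
        state="OK"
    fi
    printf '%s: %d%% of capacity (plan action at %d%%): %s\n' \
        "$label" "$pct" "$threshold" "$state"
}

# Example with made-up figures:
#   capacity_status "Total Subscribers" 41000 50000 80
```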

Alerting Strategies

Alert Priorities

Alert Definitions

Critical Alerts (P1)

System Unavailable:

API health check fails
Control Panel inaccessible
Database connection fails
Action: Immediate investigation and escalation

All Diameter Peers Disconnected:

Zero connected peers
Action: Check network, restart if necessary

Database Down:

Cannot connect to SQL Database
Action: Investigate database server, restart if necessary

High Priority Alerts (P2)

Critical Diameter Peer Down:

Primary MME disconnected
Primary P-GW disconnected
Primary S-CSCF disconnected
Action: Investigate peer connectivity within 15 minutes

High Memory Usage:

Memory > 95%
Action: Investigate memory leak, plan restart

High Authentication Failure Rate:

More than 10% of auth requests fail
Action: Check subscriber provisioning, investigate cause

Medium Priority Alerts (P3)

Non-Critical Peer Down:

Secondary peer disconnected
Application Server disconnected
Action: Investigate within 1 hour

Elevated Memory Usage:

Memory > 85%
Action: Monitor trend, plan capacity upgrade

Elevated Error Rate:

Error rate > 1%
Action: Review logs, identify root cause

Low Priority Alerts (P4)

Capacity Warning:

Subscribers > 80% of capacity
Database > 80% of allocated storage
Action: Plan capacity expansion

Performance Degradation:

Response times above target but below the warning threshold
Action: Monitor and optimize queries
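
To show how the definitions above might be encoded in an alerting script, the sketch below maps memory usage and peer disconnections to a priority. The function names are hypothetical. Peer types that the definitions do not name explicitly (for example, a primary P-CSCF or I-CSCF) are returned as "unclassified" rather than guessed.

```shell
# memory_alert_priority PERCENT
# Memory > 95% is P2, memory > 85% is P3, anything else raises no alert.
memory_alert_priority() {
    pct=$1
    if [ "$pct" -gt 95 ]; then
        echo "P2"
    elif [ "$pct" -gt 85 ]; then
        echo "P3"
    else
        echo "none"
    fi
}

# peer_alert_priority TYPE ROLE
# TYPE is a peer type such as MME, P-GW, S-CSCF or AS.
# ROLE is "primary" or "secondary".
peer_alert_priority() {
    case "$2:$1" in
        primary:MME|primary:P-GW|primary:S-CSCF) echo "P2" ;;
        secondary:*|*:AS)                        echo "P3" ;;
        *)                                       echo "unclassified" ;;
    esac
}
```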

Alert Notification Channels

Monitoring Checklist

Daily Checks

Review Control Panel Overview - subscriber counts normal
Review Diameter page - all critical peers connected
Review Application page - memory and processes within limits
Check for error logs - no critical errors in last 24 hours
Verify backup completed successfully

Weekly Checks

Review capacity trends - subscriber growth
Review performance trends - response times
Review database size - growth rate acceptable
Review error rates - identify patterns
Test alert notifications - ensure working

Monthly Checks

Capacity planning review - project 6 months ahead
Performance optimization review - identify slow queries
Security review - certificate expiration, vulnerabilities
Documentation review - update runbooks
Disaster recovery test - verify backups restore correctly
