
OmniHSS Metrics and Monitoring Guide





Monitoring Overview

OmniHSS provides several mechanisms for monitoring system health, performance, and subscriber activity. For comprehensive visibility, operations staff should combine these tools rather than rely on any single one.



Control Panel Monitoring

The Control Panel provides the primary real-time monitoring interface.

Overview Page Monitoring

URL: https://[hostname]:7443/overview


Monitored Subscriber States

| State          | Indicator              | What It Means                            |
|----------------|------------------------|------------------------------------------|
| Idle           | No location info       | Subscriber powered off or out of coverage |
| Attached       | MME present            | Subscriber registered to network         |
| PDN Active     | PDN session count > 0  | Active data connection                   |
| IMS Registered | S-CSCF assigned        | Voice services ready                     |
| In Call        | Active call count > 0  | VoLTE call in progress                   |
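The states in the table follow mechanically from the four indicators. The sketch below assumes the states are cumulative and reports the most active one that applies. The function name, its arguments, and that ordering are illustrative assumptions, not an OmniHSS interface.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map the Overview indicators to the most active
# subscriber state. Arguments are assumed inputs:
#   $1 mme_present     1 if an MME is recorded, else 0
#   $2 pdn_sessions    number of active PDN sessions
#   $3 scscf_assigned  1 if an S-CSCF is assigned, else 0
#   $4 active_calls    number of active calls
# Assumed ordering: Idle < Attached < PDN Active < IMS Registered < In Call.
subscriber_state() {
  local mme_present="$1" pdn_sessions="$2" scscf_assigned="$3" active_calls="$4"
  if (( active_calls > 0 )); then
    echo "In Call"
  elif (( scscf_assigned == 1 )); then
    echo "IMS Registered"
  elif (( pdn_sessions > 0 )); then
    echo "PDN Active"
  elif (( mme_present == 1 )); then
    echo "Attached"
  else
    echo "Idle"
  fi
}
```

For example, `subscriber_state 1 2 0 0` prints `PDN Active`: the subscriber has an MME and data sessions but no S-CSCF yet.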

Extracting Metrics from Overview

While the Control Panel doesn't export metrics directly, you can:

  1. Count visible rows for total subscribers
  2. Scan for green checkmarks to count enabled subscribers
  3. Review expanded details for state information
  4. Note last seen timestamps for responsiveness

Diameter Page Monitoring

URL: https://[hostname]:7443/diameter


Critical Peer Monitoring

Identify critical peers and monitor their status:

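| Peer Type | Criticality | Impact if Down              |
|-----------|-------------|-----------------------------|
| MME       | High        | No new LTE attachments      |
| P-GW      | High        | No data sessions            |
| S-CSCF    | High        | No IMS registrations        |
| P-CSCF    | High        | No VoLTE calls              |
| I-CSCF    | Medium      | IMS routing issues          |
| AS        | Low-Medium  | Specific service unavailable |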

Application Page Monitoring

URL: https://[hostname]:7443/application

Key Metrics

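| Metric        | Description             | Normal Range   | Action Threshold    |
|---------------|-------------------------|----------------|---------------------|
| Process Count | Active Erlang processes | Varies by load | > 90% of limit      |
| Memory Usage  | Total memory consumed   | < 80%          | > 90%               |
| Uptime        | Time since last restart | N/A            | Track for stability |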
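If you collect these values yourself, for example by recording them from the Application page, you can apply the table's thresholds in a script. This is a minimal sketch. The function names are hypothetical. The table does not label the 80 to 90% memory band, so this sketch calls it WATCH as an assumption.

```bash
#!/usr/bin/env bash
# Memory usage: normal below 80%, action above 90% (per the table).
# The unlabeled 80-90% band is reported as WATCH (assumption).
classify_memory() {
  local pct="$1"
  # awk compares fractional percentages such as 85.5 correctly
  awk -v p="$pct" 'BEGIN {
    if (p > 90)       print "ACTION"
    else if (p >= 80) print "WATCH"
    else              print "NORMAL"
  }'
}

# Process count: action when the count exceeds 90% of the configured limit.
classify_processes() {
  local count="$1" limit="$2"
  # count/limit > 0.9  <=>  count*10 > limit*9 (integer-only arithmetic)
  if (( count * 10 > limit * 9 )); then echo "ACTION"; else echo "NORMAL"; fi
}
```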

Database Monitoring

Direct Database Queries

Connect to the SQL database to extract detailed metrics:

Subscriber Counts

Query the database to retrieve:

  • Total count of all subscribers
  • Count of enabled subscribers
  • Count of IMS-enabled subscribers

Session Statistics

Query the database to retrieve:

  • Count of active PDN sessions
  • Count of active VoLTE calls
  • Breakdown of PDN sessions by APN profile

Location Statistics

Query the database to retrieve:

  • Subscriber count grouped by visited network (MCC-MNC combination)
  • Count of subscribers currently roaming (not on home PLMN 001-001)
  • Distribution of subscribers across different visited networks
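Once a query has returned the visited network for each subscriber, a short post-processing step can produce both the per-network distribution and the roaming count. The sketch below assumes the query output has been exported as one `MCC-MNC` value per line in the same `001-001` notation this guide uses. That export format and the `HOME_PLMN` variable are assumptions.

```bash
#!/usr/bin/env bash
# Summarise visited networks from one "MCC-MNC" value per line on stdin
# (assumed export format). HOME_PLMN defaults to the 001-001 home network.
summarize_locations() {
  local home="${HOME_PLMN:-001-001}"
  awk -v home="$home" '
    NF == 0 { next }                 # ignore blank lines
    { per_net[$1]++; total++ }       # distribution per visited network
    $1 != home { roaming++ }         # not on the home PLMN => roaming
    END {
      for (net in per_net) printf "%s %d\n", net, per_net[net]
      printf "roaming %d of %d\n", roaming, total
    }' | LC_ALL=C sort
}
```

For example, feeding it `001-001`, `310-410`, and `001-001` prints `001-001 2`, `310-410 1`, and `roaming 1 of 3`.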

Recent Activity

Query the database to retrieve:

  • Count of subscribers seen in the last hour
  • Distribution of subscribers by serving MME
  • Timestamp analysis of last subscriber activity

Database Health Monitoring

Monitor database health by querying:

  • Total database size and growth trends
  • Individual table sizes and row counts
  • Current database connection count
  • Query performance and resource usage

Log Monitoring

Log Output

OmniHSS outputs logs to stdout/stderr, which should be captured by your process manager.


Key Log Patterns to Monitor

Diameter Peer Events:

[info] Diameter peer connected: mme01.epc.example.com
[warn] Diameter peer disconnected: pgw01.epc.example.com
[error] Diameter peer connection failed: timeout
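Because each connect and disconnect line ends with the peer's hostname, you can recover the last known state of every peer from the captured log. This sketch assumes the log lines match the examples above, with the peer name as the last field. Lines that carry no peer name, such as the timeout example, are ignored.

```bash
#!/usr/bin/env bash
# Derive the last known state of each Diameter peer from log lines on
# stdin. Assumes the peer hostname is the last field of the line.
peer_states() {
  awk '
    /Diameter peer connected: /    { state[$NF] = "connected" }
    /Diameter peer disconnected: / { state[$NF] = "disconnected" }
    END { for (p in state) print p, state[p] }' | LC_ALL=C sort
}
```

Pipe in whatever your process manager captured, for example `peer_states < omnihss.log` (hypothetical file name). Then compare the result against the critical peers listed under Diameter Page Monitoring.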

Database Events:

[info] Database connection established
[error] Database connection lost: timeout
[error] Database query failed: deadlock detected

Authentication Events:

[info] Authentication successful: IMSI 001001123456789
[warn] Authentication failed: IMSI 001001123456789, invalid vector
[error] Roaming denied: IMSI 001001123456789, MCC 310 MNC 410
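The authentication lines can also feed the failure-rate check listed under Alert Definitions. This minimal sketch assumes the message text matches the examples above. It counts only `Authentication successful` and `Authentication failed` lines and deliberately excludes `Roaming denied`, which is a policy outcome rather than an authentication failure. The 10% threshold comes from the P2 alert. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Authentication failure rate from log lines on stdin, compared with
# the 10% P2 alert threshold. "Roaming denied" lines are not counted.
auth_failure_rate() {
  awk '
    /Authentication successful:/ { ok++ }
    /Authentication failed:/     { fail++ }
    END {
      total = ok + fail
      if (total == 0) { print "no-data"; exit }
      rate = 100 * fail / total
      status = "OK"
      if (rate > 10) status = "ALERT"
      printf "%.1f%% %s\n", rate, status
    }'
}
```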

Log Aggregation

For production deployments, forward the captured logs to a central log aggregation system.


External Monitoring Integration

Health Check Endpoint

API Health Check: GET /api/status

curl -k https://hss.example.com:8443/api/status

Expected Response:

{"status": "ok"}

HTTP Status: 200 OK
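A check that looks only at the status code can miss a server that answers 200 with an unexpected body. The sketch below requires both a 200 status and `"status": "ok"` in the body. The function names are hypothetical, and the `grep` regex is a lightweight stand-in for a real JSON parser such as `jq`.

```bash
#!/usr/bin/env bash
# Healthy only if HTTP 200 *and* the body reports "status": "ok".
evaluate_health() {
  local code="$1" body="$2"
  if [ "$code" = "200" ] &&
     printf '%s' "$body" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'; then
    echo "OK"
  else
    echo "CRITICAL"
  fi
}

# Fetch body and status code in one request (-k skips certificate
# verification, as in the curl example above).
check_health() {
  local url="$1" out
  out=$(curl -k -s --max-time 5 -w '\n%{http_code}' "$url")
  # Last line is the status code; everything before it is the body.
  evaluate_health "${out##*$'\n'}" "${out%$'\n'*}"
}
```

For example, `check_health https://hss.example.com:8443/api/status` prints `OK` or `CRITICAL`. If no response arrives, curl reports status `000`, which evaluates as `CRITICAL`.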

Monitoring Tool Integration

Nagios/Icinga Example

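#!/bin/bash
# check_omnihss.sh - Nagios/Icinga check for the OmniHSS API
# Exit codes follow the plugin convention: 0 = OK, 2 = CRITICAL.

API_URL="https://hss.example.com:8443/api/status"

# -k skips TLS certificate verification (e.g. self-signed certificates).
# curl prints 000 when no HTTP response arrives (e.g. after the 5 s timeout).
response=$(curl -k -s -o /dev/null -w "%{http_code}" --max-time 5 "$API_URL")

if [ "$response" = "200" ]; then
    echo "OK - OmniHSS API responding"
    exit 0
else
    echo "CRITICAL - OmniHSS API not responding (HTTP $response)"
    exit 2
fi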

Prometheus Integration

Custom exporters can be created to export OmniHSS metrics to Prometheus by querying the API and database.
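One low-effort way to build such an exporter is to use node_exporter's textfile collector. A script writes `.prom` files into the directory given by `--collector.textfile.directory`, and node_exporter serves them. The sketch below renders three gauges in the Prometheus text exposition format. The metric names are hypothetical, and the caller supplies the values, for example from the health check and the database queries described earlier.

```bash
#!/usr/bin/env bash
# Render OmniHSS gauges in Prometheus text format (hypothetical names).
render_metrics() {
  local up="$1" subscribers="$2" pdn_sessions="$3"
  cat <<EOF
# HELP omnihss_up 1 if the /api/status health check returned 200, else 0.
# TYPE omnihss_up gauge
omnihss_up ${up}
# HELP omnihss_subscribers Number of provisioned subscribers.
# TYPE omnihss_subscribers gauge
omnihss_subscribers ${subscribers}
# HELP omnihss_active_pdn_sessions Number of active PDN sessions.
# TYPE omnihss_active_pdn_sessions gauge
omnihss_active_pdn_sessions ${pdn_sessions}
EOF
}

# Write to a temporary file in the same directory, then rename it, so
# node_exporter never reads a half-written file.
write_textfile() {
  local dir="$1" tmp
  shift
  # The temporary name does not end in .prom, so the collector ignores it.
  tmp=$(mktemp "${dir}/omnihss.prom.XXXXXX") || return 1
  # mktemp creates files as 0600; widen so node_exporter can read it.
  if render_metrics "$@" > "$tmp" && chmod 0644 "$tmp"; then
    mv "$tmp" "${dir}/omnihss.prom"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Run `write_textfile` periodically, for example from cron. The rename keeps each update atomic because the temporary file lives on the same filesystem as the target.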

SNMP Integration

For SNMP-based monitoring, custom SNMP extension scripts can query the database or API for metrics and return values via SNMP OIDs.
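With Net-SNMP, one such extension mechanism is the `pass` directive in `snmpd.conf`. snmpd runs the script with `-g OID` for a GET or `-n OID` for a GETNEXT and reads back three lines: OID, type, and value. This is a minimal sketch under several assumptions. `99999` is a placeholder enterprise number, the metrics directory and its one-value-per-file layout are hypothetical, and GETNEXT only handles walks that start at the base OID. A production handler needs full OID ordering.

```bash
#!/usr/bin/env bash
# Hypothetical Net-SNMP "pass" handler. Register in snmpd.conf with e.g.:
#   pass .1.3.6.1.4.1.99999.1 /usr/local/bin/omnihss_snmp.sh
# 99999 is a placeholder; use your organisation's enterprise number.
BASE_OID=".1.3.6.1.4.1.99999.1"

# Hypothetical directory where another job writes one value per file.
METRICS_DIR="${METRICS_DIR:-/var/lib/omnihss-metrics}"

# Exported OIDs, in walk order, and the metric file behind each one.
OIDS=("${BASE_OID}.1" "${BASE_OID}.2")
NAMES=("subscribers" "active_pdn_sessions")

# Print OID, type, value; no output means "no such object".
emit() {
  local i="$1" value
  value=$(cat "${METRICS_DIR}/${NAMES[$i]}" 2>/dev/null) || return 0
  printf '%s\ngauge\n%s\n' "${OIDS[$i]}" "$value"
}

snmp_pass() {
  local mode="$1" oid="$2" i
  case "$mode" in
    -g)  # GET: exact match only
      for i in "${!OIDS[@]}"; do
        if [ "$oid" = "${OIDS[$i]}" ]; then emit "$i"; return; fi
      done ;;
    -n)  # GETNEXT: base OID -> first entry, entry N -> entry N+1
      if [ "$oid" = "$BASE_OID" ]; then emit 0; return; fi
      for i in "${!OIDS[@]}"; do
        if [ "$oid" = "${OIDS[$i]}" ] && [ $((i + 1)) -lt "${#OIDS[@]}" ]; then
          emit $((i + 1)); return
        fi
      done ;;
    -s)  # Values are read-only
      echo "not-writable" ;;
  esac
}

if [ "$#" -gt 0 ]; then snmp_pass "$@"; fi
```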


Key Performance Indicators

Operational KPIs

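| KPI                         | Target   | Warning   | Critical  |
|-----------------------------|----------|-----------|-----------|
| System Uptime               | 99.99%   | < 99.95%  | < 99.9%   |
| Diameter Peer Uptime        | 99.9%    | < 99.5%   | < 99%     |
| Authentication Success Rate | > 99%    | < 99%     | < 95%     |
| Diameter Response Time      | < 100 ms | > 200 ms  | > 500 ms  |
| Database Query Time         | < 50 ms  | > 100 ms  | > 500 ms  |
| Error Rate                  | < 0.1%   | > 0.5%    | > 1%      |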
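The KPIs come in two shapes. For uptime and success rate, higher is better, so an alert fires when the value falls below a threshold. For response time and error rate, lower is better, so an alert fires when the value rises above one. This minimal sketch covers both with a direction argument. The function name and argument order are illustrative. Values are plain numbers: percentages without `%`, times in milliseconds.

```bash
#!/usr/bin/env bash
# Classify a KPI reading against its Warning and Critical thresholds.
#   direction "min": higher is better (uptime, success rate)
#   direction "max": lower is better (response time, error rate)
kpi_status() {
  local value="$1" warn="$2" crit="$3" direction="$4"
  awk -v v="$value" -v w="$warn" -v c="$crit" -v d="$direction" 'BEGIN {
    if (d == "min") { bad_c = (v < c); bad_w = (v < w) }
    else            { bad_c = (v > c); bad_w = (v > w) }
    if (bad_c)      print "CRITICAL"
    else if (bad_w) print "WARNING"
    else            print "OK"
  }'
}
```

For example, `kpi_status 97 99 95 min` classifies a 97% authentication success rate as `WARNING`, and `kpi_status 250 200 500 max` does the same for a 250 ms Diameter response time.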

Capacity KPIs

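| Metric                  | Monitor            | Plan Action At          |
|-------------------------|--------------------|-------------------------|
| Total Subscribers       | Current count      | 80% of expected capacity |
| Concurrent PDN Sessions | Active sessions    | 70% of expected maximum |
| Database Size           | MB used            | 80% of allocated storage |
| Database Connections    | Active connections | 80% of pool size        |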
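Each capacity row compares a current value with a ceiling and a planning percentage. The arithmetic is simple but easy to get subtly wrong with shell integer division. This sketch compares `current × 100` with `capacity × threshold`, so it never truncates before comparing. The function name is hypothetical, and inputs must be integers (counts, MB, connections).

```bash
#!/usr/bin/env bash
# Compare current usage with a "Plan Action At" percentage.
capacity_check() {
  local current="$1" capacity="$2" threshold_pct="$3"
  local pct=$(( current * 100 / capacity ))   # for display, rounded down
  # current/capacity >= threshold/100  <=>  current*100 >= capacity*threshold
  if (( current * 100 >= capacity * threshold_pct )); then
    echo "PLAN ${pct}%"
  else
    echo "OK ${pct}%"
  fi
}
```

For example, `capacity_check 8000 10000 80` prints `PLAN 80%` for subscribers, and `capacity_check 700 1000 70` prints `PLAN 70%` for concurrent PDN sessions.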

Alerting Strategies


Alert Definitions

Critical Alerts (P1)

System Unavailable:

  • API health check fails
  • Control Panel inaccessible
  • Database connection fails
  • Action: Immediate investigation and escalation

All Diameter Peers Disconnected:

  • Zero connected peers
  • Action: Check network, restart if necessary

Database Down:

  • Cannot connect to the SQL database
  • Action: Investigate database server, restart if necessary

High Priority Alerts (P2)

Critical Diameter Peer Down:

  • Primary MME disconnected
  • Primary P-GW disconnected
  • Primary S-CSCF disconnected
  • Action: Investigate peer connectivity within 15 minutes

High Memory Usage:

  • Memory > 95%
  • Action: Investigate memory leak, plan restart

High Authentication Failure Rate:

  • Auth failure rate > 10%
  • Action: Check subscriber provisioning, investigate cause

Medium Priority Alerts (P3)

Non-Critical Peer Down:

  • Secondary peer disconnected
  • Application Server disconnected
  • Action: Investigate within 1 hour

Elevated Memory Usage:

  • Memory > 85%
  • Action: Monitor trend, plan capacity upgrade

Elevated Error Rate:

  • Error rate > 1%
  • Action: Review logs, identify root cause

Low Priority Alerts (P4)

Capacity Warning:

  • Subscribers > 80% of capacity
  • Database > 80% of allocated storage
  • Action: Plan capacity expansion

Performance Degradation:

  • Response times elevated but acceptable
  • Action: Monitor and optimize queries



Monitoring Checklist

Daily Checks

  • Review Control Panel Overview - subscriber counts normal
  • Review Diameter page - all critical peers connected
  • Review Application page - memory and processes within limits
  • Check for error logs - no critical errors in last 24 hours
  • Verify backup completed successfully

Weekly Checks

  • Review capacity trends - subscriber growth
  • Review performance trends - response times
  • Review database size - growth rate acceptable
  • Review error rates - identify patterns
  • Test alert notifications - ensure working

Monthly Checks

  • Capacity planning review - project 6 months ahead
  • Performance optimization review - identify slow queries
  • Security review - certificate expiration, vulnerabilities
  • Documentation review - update runbooks
  • Disaster recovery test - verify backups restore correctly
