Metrics Documentation

This document describes the Prometheus metrics exposed by the IMS Application Server components.

Metrics Endpoints
Port 9090 - System Metrics
Port 8080 - TAS Engine Metrics
Port 9093 - Media & Call Quality Metrics
Go Runtime Metrics
Process Metrics
Prometheus HTTP Metrics
Metric Types
Usage
Example Queries
Metric Time Unit Configuration
Grafana Dashboard Integration
Alerting Examples
Troubleshooting with Metrics
Performance Baselines
Best Practices

Metrics Endpoints

Port	Endpoint	Purpose
9090	`/metrics`	System, gateway, and core telephony metrics
8080	`/metrics`	TAS engine, Diameter, HLR, OCS, and Erlang VM metrics
9093	`/esl?module=default`	RTP/RTCP media quality and call statistics

Port 9090 - System Metrics

Call and Session Metrics

Metric Name	Port	Description
`freeswitch_bridged_calls`	9090	Number of bridged calls currently active
`freeswitch_detailed_bridged_calls`	9090	Number of detailed bridged calls active
`freeswitch_current_calls`	9090	Number of calls currently active
`freeswitch_detailed_calls`	9090	Number of detailed calls active
`freeswitch_current_channels`	9090	Number of channels currently active
`freeswitch_current_sessions`	9090	Number of sessions currently active
`freeswitch_current_sessions_peak`	9090	Peak number of sessions since startup
`freeswitch_current_sessions_peak_last_5min`	9090	Peak number of sessions in the last 5 minutes
`freeswitch_sessions_total`	9090	Total number of sessions since startup (counter)
`freeswitch_current_sps`	9090	Current sessions per second
`freeswitch_current_sps_peak`	9090	Peak sessions per second since startup
`freeswitch_current_sps_peak_last_5min`	9090	Peak sessions per second in the last 5 minutes
`freeswitch_max_sessions`	9090	Maximum number of sessions allowed
`freeswitch_max_sps`	9090	Maximum sessions per second allowed

System Resource Metrics

Metric Name	Port	Description
`freeswitch_current_idle_cpu`	9090	Current CPU idle percentage
`freeswitch_min_idle_cpu`	9090	Minimum CPU idle percentage recorded
`freeswitch_uptime_seconds`	9090	Uptime in seconds
`freeswitch_time_synced`	9090	Whether system time is in sync with exporter host time (1=synced, 0=not synced)

Memory Metrics

Metric Name	Port	Description
`freeswitch_memory_arena`	9090	Total non-mmapped bytes (malloc arena)
`freeswitch_memory_ordblks`	9090	Number of free chunks
`freeswitch_memory_smblks`	9090	Number of free fastbin blocks
`freeswitch_memory_hblks`	9090	Number of mapped regions
`freeswitch_memory_hblkhd`	9090	Bytes in mapped regions
`freeswitch_memory_usmblks`	9090	Maximum total allocated space
`freeswitch_memory_fsmblks`	9090	Free bytes held in fastbins
`freeswitch_memory_uordblks`	9090	Total allocated space
`freeswitch_memory_fordblks`	9090	Total free space
`freeswitch_memory_keepcost`	9090	Topmost releasable block

Codec Status Metrics

Metric Name	Port	Description
`freeswitch_codec_status`	9090	Codec status with labels: ikey (module), name (codec name), type (codec). Value=1 indicates codec is available

Available Codecs Include:

G.711 alaw/ulaw
PROXY PASS-THROUGH
PROXY VIDEO PASS-THROUGH
RAW Signed Linear (16 bit)
Speex
VP8/VP9 Video
AMR variants
B64
G.723.1, G.729, G.722, G.726 variants
OPUS
MP3
ADPCM, GSM, LPC-10

Endpoint Status Metrics

Metric Name	Port	Description
`freeswitch_endpoint_status`	9090	Endpoint status with labels: ikey (module), name (endpoint name), type (endpoint). Value=1 indicates endpoint is available

Available Endpoints Include:

error, group, pickup, user (mod_dptools)
loopback, null (mod_loopback)
rtc (mod_rtc)
rtp, sofia (mod_sofia)
modem (mod_spandsp)

Module Status Metrics

Metric Name	Port	Description
`freeswitch_load_module`	9090	Module load status (1=loaded, 0=not loaded) with label: module

Key Modules Monitored:

mod_sofia (SIP)
mod_conference, mod_conference_ims
mod_opus, mod_g729, mod_amr, etc.
mod_event_socket
mod_dptools
mod_python3
mod_rtc
And many more...

Registration Metrics

Metric Name	Port	Description
`freeswitch_registrations`	9090	Total number of active registrations
`freeswitch_registration_defails`	9090	Detailed registration information with labels: expires, hostname, network_ip, network_port, network_proto, realm, reg_user, token, url

Sofia Gateway Metrics

Metric Name	Port	Description
`freeswitch_sofia_gateway_status`	9090	Gateway status with labels: context, name, profile, proxy, scheme, status (UP/DOWN)
`freeswitch_sofia_gateway_call_in`	9090	Number of inbound calls through gateway
`freeswitch_sofia_gateway_call_out`	9090	Number of outbound calls through gateway
`freeswitch_sofia_gateway_failed_call_in`	9090	Number of failed inbound calls
`freeswitch_sofia_gateway_failed_call_out`	9090	Number of failed outbound calls
`freeswitch_sofia_gateway_ping`	9090	Last ping timestamp (Unix epoch)
`freeswitch_sofia_gateway_pingtime`	9090	Last ping time in milliseconds
`freeswitch_sofia_gateway_pingfreq`	9090	Ping frequency in seconds
`freeswitch_sofia_gateway_pingcount`	9090	Number of pings sent
`freeswitch_sofia_gateway_pingmin`	9090	Minimum ping time recorded
`freeswitch_sofia_gateway_pingmax`	9090	Maximum ping time recorded

Exporter Health Metrics

Metric Name	Port	Description
`freeswitch_up`	9090	Whether the last scrape was successful (1=success, 0=failure)
`freeswitch_exporter_total_scrapes`	9090	Total number of scrapes performed (counter)
`freeswitch_exporter_failed_scrapes`	9090	Total number of failed scrapes (counter)

Port 8080 - TAS Engine Metrics

These metrics are exposed by the Telephony Application Server engine and provide insight into call processing, database operations, and Erlang VM performance.

Application Call Metrics

Metric Name	Port	Description
`call_simulations_total`	8080	Total number of call simulations (counter)
`call_attempts_total`	8080	Total number of call attempts (counter)
`call_rejections_total`	8080	Total number of call rejections by reason (counter)
`call_param_errors_total`	8080	Total number of call parameter parsing errors (counter)
`active_calls`	8080	Number of currently active calls with labels: call_type (mo/mt/emergency)
`tracked_call_sessions`	8080	Number of currently tracked call sessions in ETS

Diameter Protocol Metrics

Metric Name	Port	Description
`diameter_peer_state`	8080	State of Diameter peers (1=up, 0=down) with labels: peer_host, peer_realm, application
`diameter_requests_total`	8080	Total number of Diameter requests (counter)
`diameter_responses_total`	8080	Total number of Diameter responses (counter)
`diameter_response_duration_milliseconds`	8080	Duration of Diameter requests in milliseconds (histogram)

Telephony Operations Metrics

Metric Name	Port	Description
`hlr_lookups_total`	8080	Total number of HLR lookups (counter)
`hlr_data_duration_milliseconds`	8080	Duration of HLR data retrieval in milliseconds (histogram)
`subscriber_data_lookups_total`	8080	Total number of subscriber data lookups (counter)
`subscriber_data_duration_milliseconds`	8080	Duration of Sh subscriber data retrieval in milliseconds (histogram)
`ss7_map_operations_total`	8080	Total number of SS7 MAP operations (counter)
`ss7_map_http_duration_milliseconds`	8080	Duration of SS7 MAP HTTP requests in milliseconds (histogram)
`tracked_registrations`	8080	Number of currently tracked SIP registrations

Online Charging System (OCS) Metrics

Metric Name	Port	Description
`ocs_authorization_attempts_total`	8080	Total number of OCS authorization attempts (counter)
`ocs_authorization_duration_milliseconds`	8080	Duration of OCS authorization in milliseconds (histogram)
`online_charging_events_total`	8080	Total number of online charging events (counter)
`authorization_decisions_total`	8080	Total number of authorization decisions (counter)

Dialplan & Processing Metrics

Metric Name	Port	Description
`http_requests_total`	8080	Total number of HTTP requests with labels: endpoint, status_code (counter)
`http_dialplan_request_duration_milliseconds`	8080	Duration of HTTP dialplan requests in milliseconds (histogram)
`dialplan_module_duration_milliseconds`	8080	Duration of individual dialplan module processing (histogram)
`freeswitch_variable_set_duration_milliseconds`	8080	Duration of variable setting operations (histogram)

Event Socket Metrics

Metric Name	Port	Description
`event_socket_connected`	8080	Event Socket connection state (1=connected, 0=disconnected) with label: connection_type
`event_socket_reconnections_total`	8080	Total number of Event Socket reconnection attempts (counter) with labels: connection_type, result
`event_socket_commands_total`	8080	Total number of Event Socket commands executed (counter) with labels: command_type, result
`event_socket_command_timeouts_total`	8080	Total number of Event Socket command timeouts (counter) with label: command_type

Command Types Tracked:

uuid_setvar, uuid_dump, uuid_kill, uuid_transfer
uuid_set_media_stats
sched_hangup, sched_transfer
vm_boxcount
status, echo, show, sofia

Result Values:

success: Command completed successfully
timeout: Command exceeded timeout threshold
error: Command returned unexpected response

Feature Usage Metrics

Metric Name	Port	Description
`feature_invocations_total`	8080	Total number of TAS feature invocations (counter) with labels: feature, call_type, result
`feature_data_source_total`	8080	Total number of feature data source usages (counter) with labels: feature, source

Features:

call_forward_all - Unconditional call forwarding
call_forward_not_reachable - Call forwarding when subscriber not reachable
call_forward_no_reply - Call forwarding on no reply
call_barring - OCS-based call barring (insufficient credit)
cli_withheld - CLI privacy/screening

Call Types: mo, mt

Data Sources: sh_interface, hlr, config_fallback

Result Values: success, error, skipped

SMS Trigger Metrics

Metric Name	Port	Description
`sms_trigger_attempts_total`	8080	Total number of SMS trigger attempts (counter) with labels: trigger_type, result
`sms_trigger_errors_total`	8080	Total number of SMS trigger errors (counter) with labels: trigger_type, error_stage
`smsc_requests_total`	8080	Total number of SMSC HTTP requests (counter) with labels: message_type, result

Trigger Types: voicemail_deposit, voicemail_clear

Error Stages: vm_boxcount, template_render, smsc_request

Message Types: notification, mwi

Result Values: success, error

Erlang Mnesia Database Metrics

Metric Name	Port	Description
`erlang_mnesia_held_locks`	8080	Number of held locks
`erlang_mnesia_lock_queue`	8080	Number of transactions waiting for a lock
`erlang_mnesia_transaction_participants`	8080	Number of participant transactions
`erlang_mnesia_transaction_coordinators`	8080	Number of coordinator transactions
`erlang_mnesia_failed_transactions`	8080	Number of failed (aborted) transactions (counter)
`erlang_mnesia_committed_transactions`	8080	Number of committed transactions (counter)
`erlang_mnesia_logged_transactions`	8080	Number of transactions logged (counter)
`erlang_mnesia_restarted_transactions`	8080	Total number of transaction restarts (counter)
`erlang_mnesia_memory_usage_bytes`	8080	Total bytes allocated by all mnesia tables
`erlang_mnesia_tablewise_memory_usage_bytes`	8080	Bytes allocated per mnesia table with label: table
`erlang_mnesia_tablewise_size`	8080	Number of rows per table with label: table

Erlang VM Memory Metrics

Metric Name	Port	Description
`erlang_vm_memory_atom_bytes_total`	8080	Memory allocated for atoms with label: usage (used/free)
`erlang_vm_memory_bytes_total`	8080	Total memory allocated with label: kind (system/processes)
`erlang_vm_memory_dets_tables`	8080	DETS tables count
`erlang_vm_memory_ets_tables`	8080	ETS tables count
`erlang_vm_memory_processes_bytes_total`	8080	Memory allocated for processes with label: usage (used/free)
`erlang_vm_memory_system_bytes_total`	8080	Memory for emulator (not process-related) with label: usage (atom/binary/code/ets/other)

Erlang VM Statistics

Metric Name	Port	Description
`erlang_vm_statistics_bytes_output_total`	8080	Total bytes output to ports (counter)
`erlang_vm_statistics_bytes_received_total`	8080	Total bytes received through ports (counter)
`erlang_vm_statistics_context_switches`	8080	Total context switches since startup (counter)
`erlang_vm_statistics_dirty_cpu_run_queue_length`	8080	Length of dirty CPU run-queue
`erlang_vm_statistics_dirty_io_run_queue_length`	8080	Length of dirty IO run-queue
`erlang_vm_statistics_garbage_collection_number_of_gcs`	8080	Number of garbage collections (counter)
`erlang_vm_statistics_garbage_collection_bytes_reclaimed`	8080	Bytes reclaimed by GC (counter)
`erlang_vm_statistics_garbage_collection_words_reclaimed`	8080	Words reclaimed by GC (counter)
`erlang_vm_statistics_reductions_total`	8080	Total reductions (counter)
`erlang_vm_statistics_run_queues_length`	8080	Length of normal run-queues
`erlang_vm_statistics_runtime_milliseconds`	8080	Sum of runtime for all threads (counter)
`erlang_vm_statistics_wallclock_time_milliseconds`	8080	Real time measured (counter)

Erlang VM System Information

Metric Name	Port	Description
`erlang_vm_dirty_cpu_schedulers`	8080	Number of dirty CPU scheduler threads
`erlang_vm_dirty_cpu_schedulers_online`	8080	Number of dirty CPU schedulers online
`erlang_vm_dirty_io_schedulers`	8080	Number of dirty I/O scheduler threads
`erlang_vm_ets_limit`	8080	Maximum number of ETS tables allowed
`erlang_vm_logical_processors`	8080	Number of logical processors configured
`erlang_vm_logical_processors_available`	8080	Number of logical processors available
`erlang_vm_logical_processors_online`	8080	Number of logical processors online
`erlang_vm_port_count`	8080	Number of ports currently existing
`erlang_vm_port_limit`	8080	Maximum number of ports allowed
`erlang_vm_process_count`	8080	Number of processes currently existing
`erlang_vm_process_limit`	8080	Maximum number of processes allowed
`erlang_vm_schedulers`	8080	Number of scheduler threads
`erlang_vm_schedulers_online`	8080	Number of schedulers online
`erlang_vm_smp_support`	8080	1 if compiled with SMP support, 0 otherwise
`erlang_vm_threads`	8080	1 if compiled with thread support, 0 otherwise
`erlang_vm_thread_pool_size`	8080	Number of async threads in pool
`erlang_vm_time_correction`	8080	1 if time correction enabled, 0 otherwise
`erlang_vm_wordsize_bytes`	8080	Size of Erlang term words in bytes
`erlang_vm_atom_count`	8080	Number of atoms currently existing
`erlang_vm_atom_limit`	8080	Maximum number of atoms allowed

Erlang VM Microstate Accounting (MSACC)

Detailed time tracking for scheduler activities with labels: type, id

Metric Name	Port	Description
`erlang_vm_msacc_aux_seconds_total`	8080	Time spent handling auxiliary jobs (counter)
`erlang_vm_msacc_check_io_seconds_total`	8080	Time spent checking for new I/O events (counter)
`erlang_vm_msacc_emulator_seconds_total`	8080	Time spent executing Erlang processes (counter)
`erlang_vm_msacc_gc_seconds_total`	8080	Time spent in garbage collection (counter)
`erlang_vm_msacc_other_seconds_total`	8080	Time spent on unaccounted activities (counter)
`erlang_vm_msacc_port_seconds_total`	8080	Time spent executing ports (counter)
`erlang_vm_msacc_sleep_seconds_total`	8080	Time spent sleeping (counter)
`erlang_vm_msacc_alloc_seconds_total`	8080	Time spent managing memory (counter)
`erlang_vm_msacc_bif_seconds_total`	8080	Time spent in BIFs (counter)
`erlang_vm_msacc_busy_wait_seconds_total`	8080	Time spent busy waiting (counter)
`erlang_vm_msacc_ets_seconds_total`	8080	Time spent in ETS BIFs (counter)
`erlang_vm_msacc_gc_full_seconds_total`	8080	Time spent in fullsweep GC (counter)
`erlang_vm_msacc_nif_seconds_total`	8080	Time spent in NIFs (counter)
`erlang_vm_msacc_send_seconds_total`	8080	Time spent sending messages (counter)
`erlang_vm_msacc_timers_seconds_total`	8080	Time spent managing timers (counter)

Erlang VM Allocators

Detailed memory allocator metrics with labels: alloc, instance_no, kind, usage

Metric Name	Port	Description
`erlang_vm_allocators`	8080	Allocated (carriers_size) and used (blocks_size) memory for different allocators. See erts_alloc(3).

Allocator types include: temp_alloc, sl_alloc, std_alloc, ll_alloc, eheap_alloc, ets_alloc, fix_alloc, literal_alloc, binary_alloc, driver_alloc

Port 9093 - Media & Call Quality Metrics

These metrics provide real-time RTP/RTCP statistics and call quality information per channel.

Metric Name	Port	Description
`freeswitch_info`	9093	System info with label: version
`freeswitch_up`	9093	Ready status (1=ready, 0=not ready)
`freeswitch_stack_bytes`	9093	Stack size in bytes
`freeswitch_session_total`	9093	Total number of sessions
`freeswitch_session_active`	9093	Active number of sessions
`freeswitch_session_limit`	9093	Session limit
`rtp_channel_info`	9093	RTP channel info with labels for channel details

RTP Audio - Byte Counters

Metric Name	Port	Description
`rtp_audio_in_raw_bytes_total`	9093	Total bytes received (including headers)
`rtp_audio_out_raw_bytes_total`	9093	Total bytes sent (including headers)
`rtp_audio_in_media_bytes_total`	9093	Total media bytes received (payload only)
`rtp_audio_out_media_bytes_total`	9093	Total media bytes sent (payload only)

RTP Audio - Packet Counters

Metric Name	Port	Description
`rtp_audio_in_packets_total`	9093	Total packets received
`rtp_audio_out_packets_total`	9093	Total packets sent
`rtp_audio_in_media_packets_total`	9093	Total media packets received
`rtp_audio_out_media_packets_total`	9093	Total media packets sent
`rtp_audio_in_skip_packets_total`	9093	Inbound packets discarded
`rtp_audio_out_skip_packets_total`	9093	Outbound packets discarded

RTP Audio - Special Packet Types

Metric Name	Port	Description
`rtp_audio_in_jitter_packets_total`	9093	Jitter buffer packets received
`rtp_audio_in_dtmf_packets_total`	9093	DTMF packets received
`rtp_audio_out_dtmf_packets_total`	9093	DTMF packets sent
`rtp_audio_in_cng_packets_total`	9093	Comfort Noise Generation packets received
`rtp_audio_out_cng_packets_total`	9093	Comfort Noise Generation packets sent
`rtp_audio_in_flush_packets_total`	9093	Flushed packets (buffer resets)

RTP Audio - Jitter & Quality Metrics

Metric Name	Port	Description
`rtp_audio_in_jitter_buffer_bytes_max`	9093	Largest jitter buffer size in bytes
`rtp_audio_in_jitter_seconds_min`	9093	Minimum jitter in seconds
`rtp_audio_in_jitter_seconds_max`	9093	Maximum jitter in seconds
`rtp_audio_in_jitter_loss_rate`	9093	Packet loss rate due to jitter (ratio)
`rtp_audio_in_jitter_burst_rate`	9093	Packet burst rate due to jitter (ratio)
`rtp_audio_in_mean_interval_seconds`	9093	Mean interval between inbound packets
`rtp_audio_in_flaw_total`	9093	Total audio flaws detected (glitches, artifacts)
`rtp_audio_in_quality_percent`	9093	Audio quality as percentage (0-100)
`rtp_audio_in_quality_mos`	9093	Mean Opinion Score (1-5, where 5 is best)

RTCP Metrics

Metric Name	Port	Description
`rtcp_audio_bytes_total`	9093	Total RTCP bytes
`rtcp_audio_packets_total`	9093	Total RTCP packets

Go Runtime Metrics

Metric Name	Port	Description
`go_goroutines`	9090	Number of goroutines currently running
`go_threads`	9090	Number of OS threads created
`go_info`	9090	Information about the Go environment (with version label)
`go_gc_duration_seconds`	9090	Pause duration of garbage collection cycles (summary)
`go_memstats_alloc_bytes`	9090	Number of bytes allocated and still in use
`go_memstats_alloc_bytes_total`	9090	Total number of bytes allocated (counter)
`go_memstats_heap_alloc_bytes`	9090	Heap bytes allocated and still in use
`go_memstats_heap_idle_bytes`	9090	Heap bytes waiting to be used
`go_memstats_heap_inuse_bytes`	9090	Heap bytes currently in use
`go_memstats_heap_objects`	9090	Number of allocated heap objects
`go_memstats_heap_released_bytes`	9090	Heap bytes released to OS
`go_memstats_heap_sys_bytes`	9090	Heap bytes obtained from system
`go_memstats_sys_bytes`	9090	Total bytes obtained from system

Process Metrics

Metric Name	Port	Description
`process_cpu_seconds_total`	9090	Total user and system CPU time spent (counter)
`process_max_fds`	9090	Maximum number of open file descriptors
`process_open_fds`	9090	Current number of open file descriptors
`process_resident_memory_bytes`	9090	Resident memory size in bytes
`process_virtual_memory_bytes`	9090	Virtual memory size in bytes
`process_virtual_memory_max_bytes`	9090	Maximum amount of virtual memory available
`process_start_time_seconds`	9090	Process start time since Unix epoch

Prometheus HTTP Metrics

Metric Name	Port	Description
`promhttp_metric_handler_requests_in_flight`	9090	Current number of scrapes being served
`promhttp_metric_handler_requests_total`	9090	Total number of scrapes by HTTP status code (counter)

Metric Types

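gauge: A metric that can go up or down (e.g., freeswitch_current_calls, freeswitch_current_idle_cpu)
counter: A metric that only increases, resetting to zero when the process restarts (e.g., freeswitch_sessions_total, freeswitch_exporter_failed_scrapes)
histogram: A metric that counts observations in cumulative buckets, from which quantiles are estimated at query time (e.g., diameter_response_duration_milliseconds)
summary: A metric that tracks quantiles over a sliding time window (e.g., go_gc_duration_seconds)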
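
The type of each metric is declared in the scrape output itself, on `# TYPE <name> <type>` comment lines of the Prometheus text exposition format. The following is a minimal sketch of reading those declarations; the sample text and its values are made up for illustration and are not output captured from this system.

```python
def metric_types(exposition_text):
    """Map metric name -> declared type from '# TYPE <name> <type>' lines."""
    types = {}
    for line in exposition_text.splitlines():
        parts = line.split()
        # A TYPE comment has exactly four tokens: '#', 'TYPE', name, type.
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types


# Hypothetical sample; the values are invented for illustration.
SAMPLE = """\
# HELP freeswitch_current_calls Number of calls currently active
# TYPE freeswitch_current_calls gauge
freeswitch_current_calls 12
# TYPE freeswitch_sessions_total counter
freeswitch_sessions_total 3456
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0.5"} 0.000042
"""

print(metric_types(SAMPLE))
```

Checking the declared type before writing a query helps avoid applying `rate()` to a gauge or reading a counter as an instantaneous value.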

Usage

To scrape these metrics, configure your Prometheus server to scrape all three endpoints:

scrape_configs:
  - job_name: 'ims_as_system'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'ims_as_engine'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'

  - job_name: 'ims_as_media'
    static_configs:
      - targets: ['localhost:9093']
    metrics_path: '/esl'
    params:
      module: ['default']
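
To check an endpoint by hand before pointing Prometheus at it, the same URLs can be built in a few lines. This is a sketch under assumptions: it uses plain `http` (the Prometheus default scheme), reuses the job names from the configuration above, and the helper names `scrape_url` and `fetch` are illustrative, not part of any component.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# (port, path, query parameters) taken from the endpoint table in this document.
ENDPOINTS = {
    "ims_as_system": (9090, "/metrics", {}),
    "ims_as_engine": (8080, "/metrics", {}),
    "ims_as_media": (9093, "/esl", {"module": "default"}),
}


def scrape_url(job, host="localhost"):
    """Build the URL Prometheus would request for one of the jobs above."""
    port, path, params = ENDPOINTS[job]
    query = "?" + urlencode(params) if params else ""
    return f"http://{host}:{port}{path}{query}"


def fetch(job, host="localhost", timeout=5):
    """Fetch the raw exposition text for a job (needs a reachable server)."""
    with urlopen(scrape_url(job, host), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


for job in ENDPOINTS:
    print(job, scrape_url(job))
```

Only `scrape_url` runs without a live server; `fetch` is included to show how the URL would be used.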

Example Queries

Quick Links:

General Metrics (Port 9090)
Media Quality Metrics (Port 9093)
TAS Engine Metrics (Port 8080)

General Metrics

Current call volume:

freeswitch_current_calls

Gateway health:

freeswitch_sofia_gateway_status{status="UP"}

Average ping time to gateways:

avg(freeswitch_sofia_gateway_pingtime)

Sessions per second rate:

freeswitch_current_sps

Memory usage:

freeswitch_memory_uordblks

Media Quality Metrics

Call quality (MOS score):

rtp_audio_in_quality_mos

Audio quality percentage:

rtp_audio_in_quality_percent

Jitter buffer packet rate:

rate(rtp_audio_in_jitter_packets_total[5m])

Packet loss rate:

rtp_audio_in_jitter_loss_rate

Average jitter spread (maximum minus minimum) across channels:

avg(rtp_audio_in_jitter_seconds_max - rtp_audio_in_jitter_seconds_min)

RTP bandwidth (inbound, in bits per second):

rate(rtp_audio_in_media_bytes_total[1m]) * 8

Audio flaws detected:

increase(rtp_audio_in_flaw_total[5m])
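
The RTP bandwidth query above turns a byte counter into bits per second: the per-second growth of the counter, multiplied by 8. A simplified sketch of that arithmetic from two counter readings follows; the real `rate()` function also extrapolates to the window edges, which is omitted here, and the sample numbers are illustrative.

```python
def bits_per_second(bytes_start, bytes_end, interval_seconds):
    """Average bit rate between two readings of a byte counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = bytes_end - bytes_start
    if delta < 0:
        # Counter went backwards: treat it as a reset and count from zero.
        delta = bytes_end
    return delta / interval_seconds * 8


# Illustrative numbers: one G.711 stream carries 160 payload bytes every 20 ms,
# i.e. 8000 bytes per second, so 60 seconds adds 480000 bytes to the counter.
print(bits_per_second(1_000_000, 1_480_000, 60))
```

For the sample readings this gives 64000 bits per second, the nominal G.711 payload rate.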

TAS Engine Metrics

Active calls by type:

active_calls

Diameter peer health:

diameter_peer_state{application="sh"}

Call attempt rate:

rate(call_attempts_total[5m])

HLR lookup latency (95th percentile):

histogram_quantile(0.95, rate(hlr_data_duration_milliseconds_bucket[5m]))

OCS authorization latency (99th percentile):

histogram_quantile(0.99, rate(ocs_authorization_duration_milliseconds_bucket[5m]))

Subscriber data lookup rate:

rate(subscriber_data_lookups_total[5m])

Diameter request success rate (successful responses as a fraction of requests):

sum(rate(diameter_responses_total{result="success"}[5m])) / sum(rate(diameter_requests_total[5m]))

Event Socket connection status:

event_socket_connected

Mnesia transaction performance:

rate(erlang_mnesia_committed_transactions[5m])

Mnesia failed transaction rate:

rate(erlang_mnesia_failed_transactions[5m])

Erlang VM process count:

erlang_vm_process_count

Erlang VM memory usage:

erlang_vm_memory_bytes_total

Garbage collection rate:

rate(erlang_vm_statistics_garbage_collection_number_of_gcs[5m])

Scheduler run queue length:

erlang_vm_statistics_run_queues_length

ETS table count:

erlang_vm_memory_ets_tables

HTTP dialplan request duration (median):

histogram_quantile(0.5, rate(http_dialplan_request_duration_milliseconds_bucket[5m]))

Grafana Dashboard Integration

The metrics can be visualized in Grafana using the Prometheus data source.

Recommended Dashboard Layout

Row 1: Call Volume & Health

Active calls gauge (active_calls)
Call attempts rate by type (rate(call_attempts_total[5m]))
Call rejection rate (rate(call_rejections_total[5m]))
Gateway health (freeswitch_sofia_gateway_status)

Row 2: Performance (Latency Percentiles)

P95 HTTP dialplan request time by call type
P95 Sh subscriber data lookup time
P95 HLR lookup time
P95 OCS authorization time
P95 Diameter response time by application

Row 3: Success Rates

Subscriber data lookup success rate
HLR lookup success rate
OCS authorization success rate
Diameter peer state

Row 4: Media Quality

Call quality MOS score (rtp_audio_in_quality_mos)
Audio quality percentage (rtp_audio_in_quality_percent)
Jitter statistics
Packet loss rate

Row 5: System Resources

Erlang VM process count
Erlang VM memory usage
ETS table count
Scheduler run queue length
Garbage collection rate

Row 6: Error Tracking

Call parameter errors
Authorization failures
Event Socket connection status
Mnesia transaction failures

Example Panel Queries

Active Calls by Type:

sum by (call_type) (active_calls)

P95 Dialplan Generation Latency:

histogram_quantile(0.95,
  rate(http_dialplan_request_duration_milliseconds_bucket[5m])
)
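
The percentile returned by `histogram_quantile` is an estimate: Prometheus finds the bucket that contains the requested rank and interpolates linearly inside it. A simplified sketch of that calculation follows, assuming cumulative bucket counts sorted by upper bound with at least one finite bucket before `+Inf`; the bucket boundaries and counts are invented for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for i, (bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return buckets[i - 1][0]
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical latency buckets in milliseconds: (upper bound, cumulative count).
BUCKETS = [(50, 10), (100, 60), (250, 90), (500, 100), (math.inf, 100)]
print(histogram_quantile(0.95, BUCKETS))
print(histogram_quantile(0.5, BUCKETS))
```

Because the result is interpolated, its precision depends on bucket width: a P95 that lands in the 250 to 500 ms bucket can only be located within that range.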

Diameter Success Rate:

sum(rate(diameter_responses_total{result="success"}[5m])) /
sum(rate(diameter_requests_total[5m])) * 100

Media Quality - Average MOS:

avg(rtp_audio_in_quality_mos)

Alerting Examples

Critical Alerts (Page Immediately)

System Down - No Call Attempts:

alert: SystemDown
expr: sum(rate(call_attempts_total[5m])) == 0 or absent(call_attempts_total)
for: 2m
labels:
  severity: critical
annotations:
  summary: "TAS system appears down - no call attempts"
  description: "No call attempts recorded, or the call_attempts_total metric is missing, for at least 2 minutes"

Diameter Peer Down:

alert: DiameterPeerDown
expr: diameter_peer_state == 0
for: 1m
labels:
  severity: critical
annotations:
  summary: "Diameter peer {{ $labels.peer_host }} is down"
  description: "Peer for {{ $labels.application }} application is unavailable"

Event Socket Disconnected:

alert: EventSocketDisconnected
expr: event_socket_connected == 0
for: 30s
labels:
  severity: critical
annotations:
  summary: "Event Socket {{ $labels.connection_type }} disconnected"
  description: "Critical communication channel down"

High Severity Alerts

High Diameter Latency:

alert: HighDiameterLatency
expr: |
  histogram_quantile(0.95,
    rate(diameter_response_duration_milliseconds_bucket[5m])
  ) > 1000
for: 5m
labels:
  severity: high
annotations:
  summary: "High Diameter latency detected"
  description: "P95 latency is {{ $value }}ms"

OCS Authorization Failures:

alert: OCSAuthFailures
expr: |
  sum(rate(ocs_authorization_attempts_total{result="no_credit"}[5m])) /
  sum(rate(ocs_authorization_attempts_total[5m])) > 0.1
for: 5m
labels:
  severity: high
annotations:
  summary: "High rate of OCS no-credit responses"
  description: "{{ $value | humanizePercentage }} of requests denied credit"

High Call Rejection Rate:

alert: HighCallRejectionRate
expr: |
  sum(rate(call_rejections_total[5m])) /
  sum(rate(call_attempts_total[5m])) > 0.05
for: 5m
labels:
  severity: high
annotations:
  summary: "Call rejection rate above 5%"
  description: "{{ $value | humanizePercentage }} of calls rejected"
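
The rejection-rate rule compares two counters with different label sets: rejections are counted by reason, attempts are not. The following sketch shows the intended arithmetic, adding up the per-reason rates before dividing; the reason names and rates are hypothetical.

```python
def rejection_ratio(rejection_rates_by_reason, attempt_rate):
    """Share of call attempts that were rejected, across all reasons."""
    if attempt_rate <= 0:
        return None  # no traffic, so the ratio is undefined
    return sum(rejection_rates_by_reason.values()) / attempt_rate


# Hypothetical per-second rates over the last 5 minutes.
rates = {"no_credit": 0.25, "barred": 0.5}
ratio = rejection_ratio(rates, attempt_rate=10.0)
print(ratio, ratio > 0.05)
```

In PromQL the same effect comes from wrapping each side of the division in `sum()`, so that both sides are totals rather than per-label series.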

Poor Media Quality:

alert: PoorMediaQuality
expr: avg(rtp_audio_in_quality_mos) < 3.5
for: 3m
labels:
  severity: high
annotations:
  summary: "Poor call quality detected"
  description: "Average MOS score is {{ $value }}"

Warning Alerts

High Memory Usage:

alert: HighMemoryUsage
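# The 1000000 factor treats each allowed process as a 1 MB budget; tune it for your deployment.
expr: |
  erlang_vm_memory_bytes_total{kind="processes"} /
  ignoring(kind) (erlang_vm_process_limit * 1000000) > 0.8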
for: 10m
labels:
  severity: warning
annotations:
  summary: "Erlang VM memory usage high"
  description: "Process memory at {{ $value | humanizePercentage }}"

High Scheduler Run Queue:

alert: HighSchedulerRunQueue
expr: erlang_vm_statistics_run_queues_length > 10
for: 5m
labels:
  severity: warning
annotations:
  summary: "High scheduler run queue length"
  description: "Run queue length is {{ $value }}"

Mnesia Transaction Failures:

alert: MnesiaTransactionFailures
expr: rate(erlang_mnesia_failed_transactions[5m]) > 1
for: 5m
labels:
  severity: warning
annotations:
  summary: "Mnesia transaction failures detected"
  description: "{{ $value }} failures per second"

Troubleshooting with Metrics

Problem: Calls are slow

Investigation Steps:

Check overall dialplan generation time:

histogram_quantile(0.95, rate(http_dialplan_request_duration_milliseconds_bucket[5m]))

Break down by component:

# Subscriber data lookup
histogram_quantile(0.95, rate(subscriber_data_duration_milliseconds_bucket[5m]))

# HLR lookup
histogram_quantile(0.95, rate(hlr_data_duration_milliseconds_bucket[5m]))

# OCS authorization
histogram_quantile(0.95, rate(ocs_authorization_duration_milliseconds_bucket[5m]))

Check module-specific delays:

histogram_quantile(0.95,
  sum by (le, module) (rate(dialplan_module_duration_milliseconds_bucket[5m]))
)

Common Causes:

External system latency (HSS, HLR, OCS)
Network issues
Database contention
High system load

Problem: Calls are failing

Investigation Steps:

Check call rejection reasons:

sum by (reason) (rate(call_rejections_total[5m]))

Check authorization decisions:

sum by (decision) (rate(authorization_decisions_total[5m]))

Check Diameter peer health:

diameter_peer_state

Check Event Socket connection:

event_socket_connected

Problem: High load

Investigation Steps:

Check call volume:

rate(call_attempts_total[5m])
active_calls

Check Erlang VM resources:

erlang_vm_process_count
erlang_vm_statistics_run_queues_length
erlang_vm_memory_bytes_total

Check garbage collection:

rate(erlang_vm_statistics_garbage_collection_number_of_gcs[5m])

Problem: Poor Media Quality

Investigation Steps:

Check MOS scores:

rtp_audio_in_quality_mos
rtp_audio_in_quality_percent

Check jitter:

rtp_audio_in_jitter_seconds_max
rtp_audio_in_jitter_loss_rate

Check packet loss:

rtp_audio_in_skip_packets_total
rtp_audio_in_flaw_total

Check bandwidth usage:

rate(rtp_audio_in_media_bytes_total[1m]) * 8

Performance Baselines

Typical Values (Well-Tuned System)

Latency (P95):

HTTP dialplan request: 200-500ms
Subscriber data (Sh) lookup: 50-150ms
HLR data lookup: 100-300ms
OCS authorization: 100-250ms
Diameter requests: 50-200ms
Dialplan module processing: 10-50ms per module

Success Rates:

Call completion: >95%
Subscriber data lookups: >99%
HLR lookups: >98%
OCS authorizations: >99% (excluding legitimate no-credit)
Diameter peer uptime: >99.9%

Media Quality:

MOS score: >4.0
Audio quality percentage: >80%
Jitter: <30ms
Packet loss rate: <1%

System Resources:

Erlang process count: <50% of limit
Erlang memory usage: <70% of available
Scheduler run queue: <5
ETS tables: <1000

Capacity Planning

Per-Server Capacity (recommended maximums):

Concurrent calls: 500-1000 (depends on hardware)
Calls per second: 20-50 CPS
Registered subscribers: 10,000-50,000

Scaling Indicators (add capacity when):

Active calls consistently >70% of capacity
Erlang process count >70% of limit
P95 latency degrading
Scheduler run queues consistently >10
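
The numeric scaling indicators above can be checked mechanically. The following is a minimal sketch that evaluates a single snapshot; "consistently" implies the condition should hold across many samples, and the P95 latency trend is left out because it needs history rather than one reading. The function name and sample values are hypothetical.

```python
def scaling_signals(active_calls, call_capacity, process_count, process_limit, run_queue_length):
    """Return the names of the scaling indicators tripped by one snapshot."""
    signals = []
    if active_calls > 0.70 * call_capacity:
        signals.append("active_calls")
    if process_count > 0.70 * process_limit:
        signals.append("erlang_process_count")
    if run_queue_length > 10:
        signals.append("run_queue")
    return signals


# Hypothetical snapshot: 800 active calls on a server sized for 1000.
print(scaling_signals(800, 1000, 100_000, 262_144, 3))
```

In practice the inputs would come from `active_calls`, `erlang_vm_process_count`, `erlang_vm_process_limit`, and `erlang_vm_statistics_run_queues_length`, with the call capacity taken from your own sizing.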

Best Practices

Monitoring Strategy

Set up dashboards for different audiences:
- Operations dashboard: Call volume, success rates, system health
- Engineering dashboard: Latency percentiles, error rates, resource usage
- Executive dashboard: High-level KPIs, uptime, cost metrics
Configure alerts at multiple levels:
- Critical: Page on-call (system down, major outage)
- High: Alert during business hours (degraded performance)
- Warning: Track in ticket system (potential issues)
Use appropriate time ranges:
- Real-time monitoring: 5-minute windows
- Troubleshooting: 15-minute to 1-hour windows
- Capacity planning: Daily/weekly aggregates
Focus on user impact:
- Prioritize end-to-end latency metrics
- Track success rates over individual error counters
- Monitor media quality for user experience

Query Performance

Use recording rules for frequently-used queries:

groups:
  - name: ims_as_aggregations
    interval: 30s
    rules:
      - record: job:call_attempts:rate5m
        expr: rate(call_attempts_total[5m])

      - record: job:dialplan_latency:p95
        expr: histogram_quantile(0.95, rate(http_dialplan_request_duration_milliseconds_bucket[5m]))

Avoid high-cardinality labels in queries (e.g., don't group by phone number)
Use appropriate rate intervals:
- Short-term trends: [5m]
- Medium-term trends: [1h]
- Long-term trends: [1d]

Metric Cardinality

Monitor cardinality to prevent Prometheus performance issues:

# Check metric cardinality
count by (__name__) ({__name__=~".+"})

High-cardinality risks:

Labels with unique values per call (phone numbers, call IDs)
Unbounded label values
Labels with >1000 unique values

Solution:

Use labels for categories, not unique identifiers
Aggregate high-cardinality data in external systems
Use recording rules to pre-aggregate
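
Cardinality can also be checked at the source, before Prometheus ingests anything, by counting series per metric name in the raw scrape output. The following is a minimal sketch that mirrors the `count by (__name__)` query on exposition text; the sample text and its values are invented for illustration.

```python
from collections import Counter


def series_per_metric(exposition_text):
    """Count sample lines per metric name in Prometheus text exposition format."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


# Hypothetical scrape output.
SCRAPE = """\
# TYPE active_calls gauge
active_calls{call_type="mo"} 4
active_calls{call_type="mt"} 7
active_calls{call_type="emergency"} 0
erlang_vm_process_count 512
"""

print(series_per_metric(SCRAPE).most_common(2))
```

A metric whose count grows with traffic, rather than staying fixed, usually carries a label with per-call or per-subscriber values.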
