Monitoring
This page is not applicable to ClickHouse Cloud. The procedure documented here is automated in ClickHouse Cloud services.
You can monitor:
- Utilization of hardware resources.
- ClickHouse server metrics.
Built-in observability dashboard
ClickHouse includes a built-in observability dashboard, available at $HOST:$PORT/dashboard (requires a user name and password). It shows the following metrics:
- Queries/second
- CPU usage (cores)
- Queries running
- Merges running
- Selected bytes/second
- IO wait
- CPU wait
- OS CPU Usage (userspace)
- OS CPU Usage (kernel)
- Read from disk
- Read from filesystem
- Memory (tracked)
- Inserted rows/second
- Total MergeTree parts
- Max parts for partition
Resource Utilization
ClickHouse also monitors the state of hardware resources itself, such as:
- Load and temperature on processors.
- Utilization of storage system, RAM and network.
This data is collected in the system.asynchronous_metric_log table.
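As an illustration, recent samples of a single metric can be read from system.asynchronous_metric_log over the HTTP interface. The following is a minimal sketch, not an official client: the function names are hypothetical, the base URL (for example http://localhost:8123) and the metric name LoadAverage1 are assumptions about your setup, and credentials are omitted, so the request runs as the default user.

```python
import urllib.parse
import urllib.request


def build_metric_query(metric, minutes=10):
    """Build SQL that selects recent samples of one metric (hypothetical helper)."""
    # Simplified validation so the name can be embedded in the SQL text safely.
    if not metric.replace("_", "").replace(".", "").isalnum():
        raise ValueError("unexpected characters in metric name")
    return (
        "SELECT event_time, value "
        "FROM system.asynchronous_metric_log "
        f"WHERE metric = '{metric}' "
        f"AND event_time > now() - INTERVAL {int(minutes)} MINUTE "
        "ORDER BY event_time FORMAT TabSeparated"
    )


def parse_rows(tsv):
    """Turn TabSeparated 'event_time<TAB>value' lines into (str, float) tuples."""
    rows = []
    for line in tsv.splitlines():
        if not line:
            continue
        event_time, value = line.split("\t")
        rows.append((event_time, float(value)))
    return rows


def fetch_metric(base_url, metric, minutes=10, timeout=5.0):
    """Run the query through the HTTP interface; base_url is an assumption."""
    params = urllib.parse.urlencode({"query": build_metric_query(metric, minutes)})
    url = base_url.rstrip("/") + "/?" + params
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_rows(resp.read().decode("utf-8"))
```

Calling fetch_metric("http://localhost:8123", "LoadAverage1") would return a list of (timestamp, value) pairs, provided the log table is enabled on the server and the metric exists there.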
ClickHouse Server Metrics
The ClickHouse server has embedded instruments for self-state monitoring.
To track server events, use the server logs. See the logger section of the configuration file.
ClickHouse collects:
- Different metrics of how the server uses computational resources.
- Common statistics on query processing.
You can find metrics in the system.metrics, system.events, and system.asynchronous_metrics tables.
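Because system.events holds cumulative counters, a per-second rate can be derived from two snapshots taken some time apart. The sketch below assumes each snapshot is the TabSeparated output of a query such as `SELECT event, value FROM system.events FORMAT TabSeparated`; the helper names are hypothetical, and how you fetch the snapshots is left to your tooling.

```python
def parse_counters(tsv):
    """Parse 'event<TAB>value' lines into a dict of integer counters."""
    counters = {}
    for line in tsv.splitlines():
        if line:
            name, value = line.split("\t")
            counters[name] = int(value)
    return counters


def per_second(prev, curr, seconds):
    """Return the rate of each counter present in both snapshots.

    Counters that went backwards (for example after a server restart)
    are skipped rather than reported as negative rates.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {
        name: (curr[name] - prev[name]) / seconds
        for name in curr
        if name in prev and curr[name] >= prev[name]
    }
```

The same parsing approach applies to system.metrics and system.asynchronous_metrics, which expose current values rather than counters, so no differencing is needed for them.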
You can configure ClickHouse to export metrics to Graphite. See the Graphite section in the ClickHouse server configuration file. Before configuring the export of metrics, set up Graphite by following its official guide.
You can configure ClickHouse to export metrics to Prometheus. See the Prometheus section in the ClickHouse server configuration file. Before configuring the export of metrics, set up Prometheus by following its official guide.
Additionally, you can monitor server availability through the HTTP API. Send an HTTP GET request to /ping. If the server is available, it responds with 200 OK.
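A liveness probe built on /ping might look like the following sketch. The function name and the injectable opener parameter are hypothetical conveniences (the latter makes the check testable without a server), and the base URL is an assumption about where the HTTP interface listens.

```python
import urllib.error
import urllib.request


def is_alive(base_url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if GET <base_url>/ping answers with HTTP 200."""
    url = base_url.rstrip("/") + "/ping"
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False
```

For example, is_alive("http://localhost:8123") could be called from a health-check script or a load balancer hook.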
To monitor servers in a cluster configuration, you should set the max_replica_delay_for_distributed_queries parameter and use the HTTP resource /replicas_status. A request to /replicas_status returns 200 OK if the replica is available and is not delayed behind the other replicas. If a replica is delayed, it returns 503 HTTP_SERVICE_UNAVAILABLE with information about the gap.
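The two outcomes of /replicas_status can be handled as in the sketch below. The function names are hypothetical and the base URL is an assumption; the body of a 503 response is passed through unparsed, since its exact layout is not described here.

```python
import urllib.error
import urllib.request


def classify_replica(status, body):
    """Map a /replicas_status response to (healthy, details)."""
    if status == 200:
        return True, ""
    if status == 503:
        # The body carries the information about the replication gap.
        return False, body.strip()
    raise ValueError(f"unexpected HTTP status {status}")


def check_replica(base_url, timeout=2.0):
    """Query /replicas_status on one replica; base_url is an assumption."""
    url = base_url.rstrip("/") + "/replicas_status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_replica(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:
        # urllib raises on 503, so read the status and body from the exception.
        return classify_replica(err.code, err.read().decode("utf-8", "replace"))
```

A monitoring job could call check_replica for each replica and raise an alert with the returned details whenever the first element is False.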