Monitoring

PuppyGraph provides a built-in health check endpoint and a metrics endpoint, so you can monitor how your PuppyGraph instances are running and catch issues early.

Health Check

PuppyGraph provides a health check endpoint that reports the status and readiness of a PuppyGraph instance. Orchestration platforms such as Kubernetes, as well as load balancers, can use it to determine whether an instance is ready to accept traffic.

Endpoint

The health check endpoint is served on port 8081 at the path /healthz (for example, http://localhost:8081/healthz when checked from the same host).

When healthy, the endpoint returns 200 OK. If the server is not ready or experiencing issues, the endpoint may return a non-200 status code or be unavailable.

Usage

When the server is starting up, the health check endpoint may not be immediately available. It's recommended to configure your health check with appropriate retries and timeouts to allow for the server to initialize properly.
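The retry-and-timeout advice above can be sketched in Python for use in a deployment script or smoke test. This is a minimal sketch, not part of PuppyGraph: the function names is_healthy and wait_until_healthy and the default retry values are assumptions chosen for illustration. Only the standard library is used.

```python
import http.client
import time
import urllib.request
from typing import Callable


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True only if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or an HTTP error status:
        # treat all of these as "not ready yet".
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    retries: int = 30,        # illustrative default, tune for your environment
    interval: float = 2.0,    # seconds between attempts (illustrative)
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() up to `retries` times, sleeping between failed attempts."""
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:
            sleep(interval)
    return False
```

For a local instance, the call might look like wait_until_healthy(lambda: is_healthy("http://localhost:8081/healthz")). Passing the probe as a callable keeps the retry loop independent of how the check is performed.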

Docker Container Health Check

For Docker deployments, you can configure a health check in your docker-compose.yml:

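services:
  puppygraph:
    healthcheck:
      # curl -f exits non-zero on HTTP error responses, which fails the check.
      # Requires curl to be available inside the container image.
      test: ["CMD", "curl", "-f", "http://localhost:8081/healthz"]
      interval: 10s      # time between checks
      timeout: 5s        # a check that takes longer counts as a failure
      retries: 3         # consecutive failures before the container is marked unhealthy
      start_period: 60s  # startup grace period; failures here do not count toward retries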

Kubernetes readiness probe (cluster deployments)

The health check endpoint is commonly used in Kubernetes deployments as a readiness probe. This ensures that traffic is only routed to PuppyGraph instances that are fully initialized and ready to process requests.

In cluster deployments with multiple nodes, each node exposes its own health check endpoint. The cluster management component and load balancers monitor the health of each node and stop routing traffic to nodes that are marked unhealthy, helping to maintain overall system stability.

Example Kubernetes readiness probe configuration:

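readinessProbe:
  httpGet:
    path: /healthz
    port: 8081
  initialDelaySeconds: 60  # wait before the first probe
  periodSeconds: 10        # interval between probes
  timeoutSeconds: 5        # per-probe timeout
  successThreshold: 1      # successes needed to mark the pod ready
  failureThreshold: 3      # consecutive failures before the pod is marked not ready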

For more information about cluster deployment, see Cluster Deployment.
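Because each node exposes its own health check endpoint, a quick script can report the state of every node outside of Kubernetes. The following is a minimal sketch under stated assumptions: the names probe_healthz and check_nodes are hypothetical, the host names in the usage note are placeholders, and every node is assumed to serve /healthz on port 8081 as described above.

```python
import http.client
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def probe_healthz(host: str, port: int = 8081, timeout: float = 5.0) -> bool:
    """Return True if http://<host>:<port>/healthz answers with HTTP 200."""
    url = f"http://{host}:{port}/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Unreachable, timed out, or HTTP error status: report as unhealthy.
        return False


def check_nodes(
    hosts: Iterable[str],
    probe: Callable[[str], bool] = probe_healthz,
) -> Dict[str, bool]:
    """Probe all hosts concurrently and map each host to its health result."""
    host_list = list(hosts)
    with ThreadPoolExecutor(max_workers=max(1, len(host_list))) as pool:
        # pool.map preserves input order, so results line up with host_list.
        results = list(pool.map(probe, host_list))
    return dict(zip(host_list, results))
```

For example, check_nodes(["<NODE_1_HOST>", "<NODE_2_HOST>"]) returns a dictionary such as one entry per host with True for healthy nodes and False otherwise.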

Metrics

PuppyGraph exposes metrics in the Prometheus / OpenMetrics format.

Specification

Endpoint

Once enabled (METRICS_ENABLED=true), the metrics endpoint is served on port 8081 at the path /metrics.

Available metrics

| Metric Name | Description |
| --- | --- |
| puppy_gremlin_errors_count | Gremlin server error count |
| puppy_gremlin_errors_fifteenminuterate | Gremlin server mean error rate over 15 minutes |
| puppy_gremlin_errors_fiveminuterate | Gremlin server mean error rate over 5 minutes |
| puppy_gremlin_errors_meanrate | Gremlin server mean error rate |
| puppy_gremlin_errors_oneminuterate | Gremlin server mean error rate over 1 minute |
| puppy_gremlin_op_eval_count | Gremlin server op eval count |
| puppy_gremlin_op_eval_fifteenminuterate | Gremlin server op eval mean rate over 15 minutes |
| puppy_gremlin_op_eval_fiveminuterate | Gremlin server op eval mean rate over 5 minutes |
| puppy_gremlin_op_eval_max | Gremlin server op eval maximum time cost |
| puppy_gremlin_op_eval_mean | Gremlin server op eval mean time cost |
| puppy_gremlin_op_eval_meanrate | Gremlin server op eval mean rate |
| puppy_gremlin_op_eval_min | Gremlin server op eval minimum time cost |
| puppy_gremlin_op_eval_oneminuterate | Gremlin server op eval mean rate over 1 minute |
| puppygraph_client_status | Liveness of clients, including gotty, bolt, and notebook |
| puppygraph_gremlin_server_status | Liveness of the Gremlin server |
| puppygraph_node_alive | Liveness of nodes in the PuppyGraph cluster |

Metrics whose names start with puppy_gremlin are available only when Gremlin server metrics are enabled (GREMLINSERVER_METRICS_ENABLED=true).

Configuration

| Environment Variable | Default Value | Description |
| --- | --- | --- |
| METRICS_ENABLED | false | Enables the metrics endpoint. |
| METRICS_AUTH_ENABLED | true | Enables basic auth (using PuppyGraph credentials) for the metrics endpoint. |
| GREMLINSERVER_METRICS_ENABLED | false | Enables metrics collection for the Gremlin server. |
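Before wiring up a scraper, it can help to confirm by hand that the endpoint responds with the settings above. The following Python sketch fetches /metrics, optionally with HTTP basic auth, and parses the sample lines into a dictionary. The function names fetch_metrics and parse_samples are hypothetical, and the parser is a deliberate simplification: it assumes each sample line ends with the value and carries no trailing timestamp.

```python
import base64
import urllib.request
from typing import Dict, Optional


def fetch_metrics(
    url: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
    timeout: float = 5.0,
) -> str:
    """Fetch the raw metrics text, adding HTTP basic auth if a username is given."""
    request = urllib.request.Request(url)
    if username is not None:
        credentials = f"{username}:{password or ''}".encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_samples(text: str) -> Dict[str, float]:
    """Parse 'name value' sample lines, skipping comments and blank lines.

    Simplification: assumes no trailing timestamp on sample lines.
    """
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # '# HELP' and '# TYPE' lines carry no sample value
        name, _, value = line.rpartition(" ")
        samples[name.strip()] = float(value)
    return samples
```

With authentication enabled, a call might look like parse_samples(fetch_metrics("http://localhost:8081/metrics", "<YOUR_PUPPYGRAPH_USERNAME>", "<YOUR_PUPPYGRAPH_PASSWORD>")). Filtering the resulting keys on the puppy_gremlin prefix is a quick way to confirm whether Gremlin server metrics are being collected.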

Prometheus Integration

Use the following Prometheus scrape configuration as a template for collecting PuppyGraph metrics. Replace the placeholder values to match your environment.

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: puppygraph
    metrics_path: /metrics
    static_configs:
      - targets: ["<PUPPYGRAPH_HOST>:8081"]
    basic_auth:
      username: <YOUR_PUPPYGRAPH_USERNAME>
      password: <YOUR_PUPPYGRAPH_PASSWORD>

• Set <PUPPYGRAPH_HOST> to the address where PuppyGraph exposes /metrics.
• Replace <YOUR_PUPPYGRAPH_USERNAME> and <YOUR_PUPPYGRAPH_PASSWORD> with valid PuppyGraph credentials.
• Adjust the scrape intervals to match your monitoring requirements.

Datadog Integration

The following example shows how to integrate PuppyGraph's Prometheus endpoint with Datadog.

Run Datadog agent

Follow the Datadog guide "Start the Datadog Agent with Docker" to start the Datadog Agent.

If metrics authentication is enabled, you need to set PUPPYGRAPH_USERNAME and PUPPYGRAPH_PASSWORD as environment variables in the Datadog agent container:

docker run -d --name dd-agent \
  -e PUPPYGRAPH_USERNAME="<YOUR_PUPPYGRAPH_USERNAME>" \
  -e PUPPYGRAPH_PASSWORD="<YOUR_PUPPYGRAPH_PASSWORD>" \
  -e DD_API_KEY="<DATADOG_API_KEY>" \
  -e DD_SITE="<YOUR_DATADOG_SITE>" \
  -e DD_DOGSTATSD_NON_LOCAL_TRAFFIC=true \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -v /var/lib/docker/containers:/var/lib/docker/containers:ro \
  gcr.io/datadoghq/agent:latest

Run PuppyGraph with Datadog labels

Add labels to the PuppyGraph container so that Datadog can discover it and scrape its metrics.

For example, if you are running PuppyGraph with Docker, you can add the following arguments to the docker run command:

With metrics authentication enabled:

-l com.datadoghq.ad.check_names='["openmetrics"]' \
-l com.datadoghq.ad.init_configs='[{}]' \
-l com.datadoghq.ad.instances="[{\"openmetrics_endpoint\":\"http://%%host%%:8081/metrics\",\"username\":\"%%env_PUPPYGRAPH_USERNAME%%\",\"password\":\"%%env_PUPPYGRAPH_PASSWORD%%\",\"namespace\":\"puppy\",\"metrics\":[\".*\"]}]" \

With metrics authentication disabled:

-l com.datadoghq.ad.check_names='["openmetrics"]' \
-l com.datadoghq.ad.init_configs='[{}]' \
-l com.datadoghq.ad.instances="[{\"openmetrics_endpoint\":\"http://%%host%%:8081/metrics\",\"namespace\":\"puppy\",\"metrics\":[\".*\"]}]" \
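The escaped JSON in these labels is easy to get wrong by hand. One option is to generate the label values with a small script and paste the result into the docker run command. This is a minimal sketch: the function names build_datadog_labels and docker_label_args are hypothetical, and the label keys and template variables are copied from the examples above.

```python
import json
import shlex
from typing import Dict


def build_datadog_labels(auth_enabled: bool, port: int = 8081) -> Dict[str, str]:
    """Build the three autodiscovery labels as compact JSON strings."""
    instance: Dict[str, object] = {
        "openmetrics_endpoint": f"http://%%host%%:{port}/metrics",
    }
    if auth_enabled:
        # Template variables resolved by the Datadog Agent from its environment.
        instance["username"] = "%%env_PUPPYGRAPH_USERNAME%%"
        instance["password"] = "%%env_PUPPYGRAPH_PASSWORD%%"
    instance["namespace"] = "puppy"
    instance["metrics"] = [".*"]

    def compact(value: object) -> str:
        # No spaces after separators, matching the hand-written labels.
        return json.dumps(value, separators=(",", ":"))

    return {
        "com.datadoghq.ad.check_names": compact(["openmetrics"]),
        "com.datadoghq.ad.init_configs": compact([{}]),
        "com.datadoghq.ad.instances": compact([instance]),
    }


def docker_label_args(labels: Dict[str, str]) -> str:
    """Render labels as shell-quoted '-l key=value' arguments for docker run."""
    args = []
    for key, value in labels.items():
        args += ["-l", f"{key}={value}"]
    return shlex.join(args)
```

Calling print(docker_label_args(build_datadog_labels(auth_enabled=True))) prints the arguments on one line with shell quoting applied, so the JSON does not need manual backslash escaping.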