
Querying Polaris Data as a Graph

Summary

In this tutorial, you will:

  1. Start Apache Polaris, MinIO, Spark, and PuppyGraph locally with Docker Compose.
  2. Create an Iceberg catalog named modern_catalog and load sample data into it.
  3. Upload a graph schema JSON to PuppyGraph and query the data as a graph.

Prerequisites

  • Docker
  • Docker Compose

▶ Verify that both commands are available:

docker version
docker compose version
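If you prefer to script this check, the following Python sketch runs the same two commands. It reports any command that is missing or that fails. It is a hypothetical helper and not one of the tutorial's files. Note that docker version typically exits with a non-zero status when the Docker daemon is unreachable, so the check also catches a stopped daemon.

```python
import shutil
import subprocess

# The two checks from the Prerequisites section.
REQUIRED_COMMANDS = [
    ["docker", "version"],
    ["docker", "compose", "version"],
]


def failing_commands(commands):
    """Return every command whose executable is missing or that exits non-zero."""
    failed = []
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            failed.append(cmd)  # executable is not on PATH
            continue
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(cmd)  # e.g. the Docker daemon is not reachable
    return failed
```

Calling failing_commands(REQUIRED_COMMANDS) returns an empty list when both checks pass.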

Accessing the PuppyGraph Web UI requires a browser. If you prefer a CLI-only workflow, you can upload the graph schema with curl and connect to Gremlin on port 18182.

Note

This tutorial uses demo credentials and local container hostnames for a self-contained setup:

  • Polaris bootstrap credential: root:s3cr3t
  • MinIO access key: minio_root
  • MinIO secret key: m1n1opwd
  • PuppyGraph login: puppygraph / puppygraph123

Deployment and Data Preparation

This tutorial uses four containers:

  • polaris-minio for S3-compatible object storage
  • polaris for the Iceberg REST catalog
  • polaris-spark-iceberg to create the sample Iceberg tables
  • polaris-puppygraph for the PuppyGraph server

Create docker-compose.yaml

▶ Create a file named docker-compose.yaml with the following content:

docker-compose.yaml
services:
  polaris-minio:
    container_name: polaris-minio
    image: quay.io/minio/minio:RELEASE.2025-09-07T16-13-09Z
    restart: always
    command: ["server", "/data", "--console-address", ":9001"]
    environment:
      MINIO_ROOT_USER: minio_root
      MINIO_ROOT_PASSWORD: m1n1opwd
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://127.0.0.1:9000/minio/health/live"]
      interval: 2s
      timeout: 10s
      retries: 30
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - polaris-minio-data:/data

  polaris:
    container_name: polaris
    image: apache/polaris:latest
    restart: always
    depends_on:
      polaris-minio:
        condition: service_healthy
    environment:
      AWS_REGION: us-west-2
      AWS_ACCESS_KEY_ID: minio_root
      AWS_SECRET_ACCESS_KEY: m1n1opwd
      POLARIS_BOOTSTRAP_CREDENTIALS: POLARIS,root,s3cr3t
      polaris.realm-context.realms: POLARIS
      quarkus.otel.sdk.disabled: "true"
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://127.0.0.1:8182/q/health"]
      interval: 2s
      timeout: 10s
      retries: 30
      start_period: 10s
    ports:
      - "8181:8181"
      - "8182:8182"

  polaris-spark-iceberg:
    container_name: polaris-spark-iceberg
    image: tabulario/spark-iceberg
    restart: always
    depends_on:
      polaris:
        condition: service_healthy
    entrypoint: ["/bin/bash", "-lc", "tail -f /dev/null"]

  polaris-puppygraph:
    image: puppygraph/puppygraph:stable
    pull_policy: always
    container_name: polaris-puppygraph
    restart: always
    networks:
      default:
    environment:
      - PUPPYGRAPH_USERNAME=puppygraph
      - PUPPYGRAPH_PASSWORD=puppygraph123
    ports:
      - "8081:8081"
      - "18182:8182"
      - "7687:7687"
    depends_on:
      - polaris-spark-iceberg

volumes:
  polaris-minio-data:

networks:
  default:
    name: polaris_net
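docker compose up -d already waits for these healthchecks before it starts services that depend on them. To confirm readiness from the host yourself, the Python sketch below polls the same health URLs that the Compose healthchecks use. It reaches them through the port mappings above: 9000 for MinIO and 8182 for the Polaris management port. The helper names are hypothetical, and the sketch treats any 2xx response as healthy.

```python
import time
import urllib.error
import urllib.request

# Host-side versions of the healthcheck URLs in docker-compose.yaml.
HEALTH_URLS = {
    "polaris-minio": "http://localhost:9000/minio/health/live",
    "polaris": "http://localhost:8182/q/health",
}


def is_up(url, request_timeout=2.0):
    """Return True if a GET on url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=request_timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts and non-2xx (HTTPError) all count as "not up".
        return False


def wait_until_up(url, timeout=120.0, interval=2.0):
    """Poll url until it is up or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if is_up(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, all(wait_until_up(u) for u in HEALTH_URLS.values()) returns True once both MinIO and Polaris report healthy.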

Create setup-polaris.sh

▶ Create a file named setup-polaris.sh with the following content. This script starts the containers, creates the MinIO bucket, provisions the Polaris catalog, grants permissions, and loads the sample Iceberg tables.

setup-polaris.sh
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)
cd "$SCRIPT_DIR"

docker compose up -d

ROOT_TOKEN=$(
  docker run --rm --network polaris_net --entrypoint /bin/sh alpine/curl:8.17.0 -lc '
    apk add --no-cache jq >/dev/null
    curl -s --user root:s3cr3t \
      -H "Polaris-Realm: POLARIS" \
      -d grant_type=client_credentials \
      -d scope=PRINCIPAL_ROLE:ALL \
      http://polaris:8181/api/catalog/v1/oauth/tokens | jq -r .access_token
  '
)

docker run --rm --network polaris_net --entrypoint /bin/sh quay.io/minio/mc:RELEASE.2025-08-13T08-35-41Z -c '
  mc alias set pol http://polaris-minio:9000 minio_root m1n1opwd >/dev/null &&
  mc mb --ignore-existing pol/bucket457
'

docker run --rm --network polaris_net -e ROOT_TOKEN="$ROOT_TOKEN" --entrypoint /bin/sh alpine/curl:8.17.0 -lc '
  apk add --no-cache jq >/dev/null

  if ! curl -s http://polaris:8181/api/management/v1/catalogs \
    -H "Authorization: Bearer ${ROOT_TOKEN}" \
    -H "Polaris-Realm: POLARIS" | jq -e ".catalogs[]? | select(.name == \"modern_catalog\")" >/dev/null; then
    curl -s -X POST http://polaris:8181/api/management/v1/catalogs \
      -H "Authorization: Bearer ${ROOT_TOKEN}" \
      -H "Polaris-Realm: POLARIS" \
      -H "Content-Type: application/json" \
      -d "{\"catalog\":{\"name\":\"modern_catalog\",\"type\":\"INTERNAL\",\"readOnly\":false,\"properties\":{\"default-base-location\":\"s3://bucket457/modern_catalog\"},\"storageConfigInfo\":{\"storageType\":\"S3\",\"allowedLocations\":[\"s3://bucket457/modern_catalog\",\"s3://bucket457\"],\"endpoint\":\"http://polaris-minio:9000\",\"endpointInternal\":\"http://polaris-minio:9000\",\"pathStyleAccess\":true}}}"
  fi

  curl -s -X PUT http://polaris:8181/api/management/v1/catalogs/modern_catalog/catalog-roles/catalog_admin/grants \
    -H "Authorization: Bearer ${ROOT_TOKEN}" \
    -H "Polaris-Realm: POLARIS" \
    -H "Content-Type: application/json" \
    -d "{\"type\":\"catalog\",\"privilege\":\"TABLE_WRITE_DATA\"}"

  curl -s -X PUT http://polaris:8181/api/management/v1/principal-roles/service_admin/catalog-roles/modern_catalog \
    -H "Authorization: Bearer ${ROOT_TOKEN}" \
    -H "Polaris-Realm: POLARIS" \
    -H "Content-Type: application/json" \
    -d "{\"name\":\"catalog_admin\"}"
'

docker compose exec -T -e ROOT_TOKEN="$ROOT_TOKEN" polaris-spark-iceberg bash -lc '
  cat >/tmp/prepare.sql
  /opt/spark/bin/spark-sql \
    --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.10.1,org.apache.iceberg:iceberg-aws-bundle:1.10.1 \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.polaris.type=rest \
    --conf spark.sql.catalog.polaris.uri=http://polaris:8181/api/catalog \
    --conf spark.sql.catalog.polaris.oauth2-server-uri=http://polaris:8181/api/catalog/v1/oauth/tokens \
    --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
    --conf spark.sql.catalog.polaris.client.region=us-west-2 \
    --conf spark.sql.catalog.polaris.token="${ROOT_TOKEN}" \
    --conf spark.sql.catalog.polaris.warehouse=modern_catalog \
    --conf spark.sql.defaultCatalog=polaris \
    -f /tmp/prepare.sql
' <<'SQL'
CREATE DATABASE IF NOT EXISTS modern;

CREATE TABLE modern.person (id string, name string, age int) USING iceberg;
INSERT INTO modern.person VALUES
    ('v1', 'marko', 29),
    ('v2', 'vadas', 27),
    ('v4', 'josh', 32),
    ('v6', 'peter', 35);

CREATE TABLE modern.software (id string, name string, lang string) USING iceberg;
INSERT INTO modern.software VALUES
    ('v3', 'lop', 'java'),
    ('v5', 'ripple', 'java');

CREATE TABLE modern.created (id string, from_id string, to_id string, weight double) USING iceberg;
INSERT INTO modern.created VALUES
    ('e9', 'v1', 'v3', 0.4),
    ('e10', 'v4', 'v5', 1.0),
    ('e11', 'v4', 'v3', 0.4),
    ('e12', 'v6', 'v3', 0.2);

CREATE TABLE modern.knows (id string, from_id string, to_id string, weight double) USING iceberg;
INSERT INTO modern.knows VALUES
    ('e7', 'v1', 'v2', 0.5),
    ('e8', 'v1', 'v4', 1.0);

SELECT COUNT(*) AS person_cnt FROM modern.person;
SELECT COUNT(*) AS software_cnt FROM modern.software;
SELECT COUNT(*) AS created_cnt FROM modern.created;
SELECT COUNT(*) AS knows_cnt FROM modern.knows;
SQL
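The first step of setup-polaris.sh exchanges the bootstrap credential for a bearer token, using the OAuth2 client-credentials grant. The same request can be built from the host with Python's standard library, through the 8181 port mapping. The function names below are illustrative. The request fields are copied from the curl call in the script: basic auth, the Polaris-Realm header, and the grant_type and scope form fields.

```python
import base64
import json
import urllib.parse
import urllib.request

TOKEN_URL = "http://localhost:8181/api/catalog/v1/oauth/tokens"


def build_token_request(client_id, client_secret, realm="POLARIS", url=TOKEN_URL):
    """Build the client-credentials token request that the script sends with curl."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "scope": "PRINCIPAL_ROLE:ALL",
    }).encode()
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Polaris-Realm": realm,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def fetch_token(client_id="root", client_secret="s3cr3t"):
    """Send the request and return the access_token field of the JSON reply."""
    req = build_token_request(client_id, client_secret)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["access_token"]
```

Once Polaris is running, fetch_token() returns the token string. Pass it in an Authorization: Bearer header, as the script does for the management API calls.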

Run the setup

▶ Run the following commands from the directory that contains both files:

chmod +x setup-polaris.sh
./setup-polaris.sh

▶ When the script completes:

  • Polaris is available at http://localhost:8181/api/catalog
  • PuppyGraph Web UI is available at http://localhost:8081
  • PuppyGraph Gremlin is available at localhost:18182
  • The sample Iceberg catalog is modern_catalog
  • The sample namespace is modern
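To confirm from the host that the tables exist, you can call the Iceberg REST list-tables endpoint under the catalog URI above. The sketch below makes two assumptions. First, Polaris uses the catalog name (modern_catalog) as the REST path prefix. Second, you already have a bearer token, such as the one setup-polaris.sh obtains. The helper names are hypothetical. The response shape, an identifiers array of namespace/name objects, follows the Iceberg REST catalog specification.

```python
import json
import urllib.request

CATALOG_URI = "http://localhost:8181/api/catalog"


def tables_url(catalog, namespace, base=CATALOG_URI):
    """List-tables path; assumes the catalog name is the REST prefix and a single-level namespace."""
    return f"{base}/v1/{catalog}/namespaces/{namespace}/tables"


def table_names(list_tables_response):
    """Extract sorted table names from an Iceberg REST ListTablesResponse body."""
    return sorted(ident["name"] for ident in list_tables_response.get("identifiers", []))


def list_tables(token, catalog="modern_catalog", namespace="modern"):
    """GET the table list with the bearer token and the Polaris realm header."""
    req = urllib.request.Request(
        tables_url(catalog, namespace),
        headers={"Authorization": f"Bearer {token}", "Polaris-Realm": "POLARIS"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return table_names(json.load(resp))
```

With the sample data, list_tables(token) should return created, knows, person, and software.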

The script creates the following tables in the modern namespace:

modern.person
id  name   age
v1  marko  29
v2  vadas  27
v4  josh   32
v6  peter  35

modern.software
id  name    lang
v3  lop     java
v5  ripple  java

modern.created
id   from_id  to_id  weight
e9   v1       v3     0.4
e10  v4       v5     1.0
e11  v4       v3     0.4
e12  v6       v3     0.2

modern.knows
id  from_id  to_id  weight
e7  v1       v2     0.5
e8  v1       v4     1.0
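The two edge tables refer to vertices only through from_id and to_id. Every endpoint must therefore exist in the corresponding vertex table for an edge to connect anything. The following Python sketch checks that property for the created edges (person to software) and the knows edges (person to person). It is a hypothetical helper, and its rows are copied from the tables above.

```python
# Sample rows from the tables above: vertex ids and (edge_id, from_id, to_id).
PERSON_IDS = {"v1", "v2", "v4", "v6"}
SOFTWARE_IDS = {"v3", "v5"}
CREATED = [("e9", "v1", "v3"), ("e10", "v4", "v5"), ("e11", "v4", "v3"), ("e12", "v6", "v3")]
KNOWS = [("e7", "v1", "v2"), ("e8", "v1", "v4")]


def dangling_edges(edges, from_ids, to_ids):
    """Return ids of edges whose source or target id is missing from the vertex tables."""
    return [eid for eid, src, dst in edges if src not in from_ids or dst not in to_ids]
```

Both dangling_edges(CREATED, PERSON_IDS, SOFTWARE_IDS) and dangling_edges(KNOWS, PERSON_IDS, PERSON_IDS) return an empty list for this data.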

Modeling the Graph

PuppyGraph models external data sources by using a graph schema JSON. In this tutorial, the schema connects PuppyGraph to Polaris through the Iceberg REST catalog and maps the four sample tables to the Modern graph.

Create schema.json

▶ Create a file named schema.json with the following content:

schema.json
{
  "catalogs": [
    {
      "name": "polaris_data",
      "type": "iceberg",
      "metastore": {
        "type": "rest",
        "uri": "http://polaris:8181/api/catalog",
        "warehouse": "modern_catalog",
        "security": "oauth2",
        "credential": "root:s3cr3t",
        "scope": "PRINCIPAL_ROLE:ALL",
        "oauthServerUri": "http://polaris:8181/api/catalog/v1/oauth/tokens"
      },
      "storage": {
        "type": "S3",
        "useInstanceProfile": "false",
        "accessKey": "minio_root",
        "secretKey": "m1n1opwd",
        "enableSsl": "false",
        "endpoint": "http://polaris-minio:9000",
        "enablePathStyleAccess": "true"
      }
    }
  ],
  "graph": {
    "vertices": [
      {
        "label": "person",
        "oneToOne": {
          "tableSource": {
            "catalog": "polaris_data",
            "schema": "modern",
            "table": "person"
          },
          "id": {
            "fields": [
              {
                "type": "String",
                "field": "id",
                "alias": "id"
              }
            ]
          },
          "attributes": [
            {
              "type": "String",
              "field": "name",
              "alias": "name"
            },
            {
              "type": "Int",
              "field": "age",
              "alias": "age"
            }
          ]
        }
      },
      {
        "label": "software",
        "oneToOne": {
          "tableSource": {
            "catalog": "polaris_data",
            "schema": "modern",
            "table": "software"
          },
          "id": {
            "fields": [
              {
                "type": "String",
                "field": "id",
                "alias": "id"
              }
            ]
          },
          "attributes": [
            {
              "type": "String",
              "field": "name",
              "alias": "name"
            },
            {
              "type": "String",
              "field": "lang",
              "alias": "lang"
            }
          ]
        }
      }
    ],
    "edges": [
      {
        "label": "created",
        "fromVertex": "person",
        "toVertex": "software",
        "tableSource": {
          "catalog": "polaris_data",
          "schema": "modern",
          "table": "created"
        },
        "id": {
          "fields": [
            {
              "type": "String",
              "field": "id",
              "alias": "id"
            }
          ]
        },
        "fromId": {
          "fields": [
            {
              "type": "String",
              "field": "from_id",
              "alias": "from_id"
            }
          ]
        },
        "toId": {
          "fields": [
            {
              "type": "String",
              "field": "to_id",
              "alias": "to_id"
            }
          ]
        },
        "attributes": [
          {
            "type": "Double",
            "field": "weight",
            "alias": "weight"
          }
        ]
      },
      {
        "label": "knows",
        "fromVertex": "person",
        "toVertex": "person",
        "tableSource": {
          "catalog": "polaris_data",
          "schema": "modern",
          "table": "knows"
        },
        "id": {
          "fields": [
            {
              "type": "String",
              "field": "id",
              "alias": "id"
            }
          ]
        },
        "fromId": {
          "fields": [
            {
              "type": "String",
              "field": "from_id",
              "alias": "from_id"
            }
          ]
        },
        "toId": {
          "fields": [
            {
              "type": "String",
              "field": "to_id",
              "alias": "to_id"
            }
          ]
        },
        "attributes": [
          {
            "type": "Double",
            "field": "weight",
            "alias": "weight"
          }
        ]
      }
    ]
  }
}
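A common mistake in a hand-edited schema is a broken cross-reference. Examples are an edge whose fromVertex or toVertex names an undeclared label, or a tableSource that points at a catalog name not listed under catalogs. The following Python sketch checks only those references before upload, and only for oneToOne vertices as used in this schema. It is a hypothetical helper, not PuppyGraph's own validation, which still runs when you upload the schema.

```python
import json


def schema_problems(schema):
    """List catalog and vertex-label cross-reference errors in a schema dict."""
    catalogs = {c["name"] for c in schema.get("catalogs", [])}
    graph = schema.get("graph", {})
    vertices = graph.get("vertices", [])
    labels = {v["label"] for v in vertices}
    problems = []
    for v in vertices:
        src = v["oneToOne"]["tableSource"]
        if src["catalog"] not in catalogs:
            problems.append(f"vertex {v['label']}: unknown catalog {src['catalog']}")
    for e in graph.get("edges", []):
        if e["tableSource"]["catalog"] not in catalogs:
            problems.append(f"edge {e['label']}: unknown catalog {e['tableSource']['catalog']}")
        for end in ("fromVertex", "toVertex"):
            if e[end] not in labels:
                problems.append(f"edge {e['label']}: unknown {end} {e[end]}")
    return problems


def check_file(path="schema.json"):
    """Load a schema file and return its cross-reference problems."""
    with open(path) as f:
        return schema_problems(json.load(f))
```

For the schema above, check_file() should return an empty list.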

Note

The schema uses the container hostnames polaris and polaris-minio. These names resolve because the PuppyGraph container is attached to the same Docker Compose network (polaris_net); they do not resolve from the host.

Upload the graph schema

You can upload the schema in either of the following ways.

Option 1: Web UI

▶ To upload the schema through the PuppyGraph Web UI:

  1. Open http://localhost:8081.
  2. Log in with username puppygraph and password puppygraph123.
  3. Go to the schema page.
  4. Use the Upload Graph Schema JSON section to upload schema.json.

Option 2: REST API

▶ Alternatively, run the following command to upload the schema file:

curl -X POST \
  -H "content-type: application/json" \
  --data-binary @./schema.json \
  --user "puppygraph:puppygraph123" \
  http://localhost:8081/schema

After the upload succeeds, PuppyGraph will visualize the schema and make the graph queryable.
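The REST upload can also be scripted in Python. This sketch mirrors the curl command above, a POST to /schema with a JSON body and HTTP basic auth, using only the standard library. The function names are illustrative.

```python
import base64
import urllib.request

SCHEMA_URL = "http://localhost:8081/schema"


def build_upload_request(schema_bytes, user="puppygraph", password="puppygraph123", url=SCHEMA_URL):
    """Mirror the curl call: POST the raw schema JSON with HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=schema_bytes,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


def upload_schema(path="schema.json"):
    """Upload the schema file and return (status, response body)."""
    with open(path, "rb") as f:
        req = build_upload_request(f.read())
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.status, resp.read().decode()
```

If PuppyGraph rejects the schema, urlopen raises urllib.error.HTTPError. The error's response body may explain the cause.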

Querying the Graph

▶ After the schema is loaded, open the PuppyGraph Dashboard to inspect the graph summary, or use the Query UI to explore the data interactively.

▶ Run the following example Gremlin queries:

g.V().hasLabel('person').valueMap('name', 'age')

g.V().hasLabel('person').has('name', 'marko').out('knows').values('name')

g.V().hasLabel('person').has('name', 'josh').
  outE('created').
  project('software', 'weight').
    by(inV().values('name')).
    by(values('weight'))

With the sample data in this tutorial, the graph contains:

  • 4 person vertices
  • 2 software vertices
  • 2 knows edges
  • 4 created edges
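To cross-check the query results and counts above without a running server, the following Python sketch holds the sample rows in memory. It reproduces the two traversal queries and the label counts by hand. It is only an emulation for checking expectations, not how PuppyGraph evaluates Gremlin, and PuppyGraph may return results in a different order.

```python
from collections import Counter

# Sample data from setup-polaris.sh: vertex id -> (label, properties).
VERTICES = {
    "v1": ("person", {"name": "marko", "age": 29}),
    "v2": ("person", {"name": "vadas", "age": 27}),
    "v4": ("person", {"name": "josh", "age": 32}),
    "v6": ("person", {"name": "peter", "age": 35}),
    "v3": ("software", {"name": "lop", "lang": "java"}),
    "v5": ("software", {"name": "ripple", "lang": "java"}),
}
EDGES = [  # (label, from_id, to_id, weight)
    ("created", "v1", "v3", 0.4), ("created", "v4", "v5", 1.0),
    ("created", "v4", "v3", 0.4), ("created", "v6", "v3", 0.2),
    ("knows", "v1", "v2", 0.5), ("knows", "v1", "v4", 1.0),
]


def person_id(name):
    """Like g.V().hasLabel('person').has('name', name): return the vertex id."""
    return next(vid for vid, (label, props) in VERTICES.items()
                if label == "person" and props["name"] == name)


def out_names(name, edge_label):
    """Like ...out(edge_label).values('name')."""
    src = person_id(name)
    return [VERTICES[dst][1]["name"] for lbl, s, dst, _ in EDGES
            if lbl == edge_label and s == src]


def created_with_weight(name):
    """Like the outE('created').project('software', 'weight') query."""
    src = person_id(name)
    return [{"software": VERTICES[dst][1]["name"], "weight": w}
            for lbl, s, dst, w in EDGES if lbl == "created" and s == src]


vertex_counts = Counter(label for label, _ in VERTICES.values())
edge_counts = Counter(edge[0] for edge in EDGES)
```

With this data, marko knows vadas and josh, and josh created ripple (weight 1.0) and lop (weight 0.4). The counts match the list above: 4 person and 2 software vertices, and 4 created and 2 knows edges.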

Cleaning up

▶ Run the following command to stop and remove the containers, network, and volume created by this tutorial:

docker compose down -v