Querying Polaris Data as a Graph
Summary
In this tutorial, you will:
- Start Apache Polaris, MinIO, Spark, and PuppyGraph locally with Docker Compose.
- Create an Iceberg catalog named modern_catalog and load sample data into it.
- Upload a graph schema JSON to PuppyGraph and query the data as a graph.
Prerequisites
- Docker
- Docker Compose
Verify that both commands are available:
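A minimal sketch of such a check, assuming a POSIX shell and the Compose v2 plugin (invoked as `docker compose`); the `require_cmd` helper name is illustrative and not part of either tool:

```shell
# Report whether a command is on PATH; returns non-zero if it is missing.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Print versions only when Docker itself is installed.
# "docker compose version" assumes the Compose v2 plugin.
if require_cmd docker; then
  docker --version
  docker compose version || echo "missing: docker compose plugin" >&2
fi
```

If either version line is absent from the output, install the missing tool before continuing.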
Accessing the PuppyGraph Web UI requires a browser. If you prefer a CLI-only workflow, you can upload the graph schema with curl and connect to Gremlin on port 18182.
Note
This tutorial uses demo credentials and local container hostnames for a self-contained setup:
- Polaris bootstrap credential: root:s3cr3t
- MinIO access key: minio_root
- MinIO secret key: m1n1opwd
- PuppyGraph login: puppygraph/puppygraph123
Deployment and Data Preparation
This tutorial uses four containers:
- polaris-minio for S3-compatible object storage
- polaris for the Iceberg REST catalog
- polaris-spark-iceberg to create the sample Iceberg tables
- polaris-puppygraph for the PuppyGraph server
Create docker-compose.yaml
Create a file named docker-compose.yaml with the following content:
docker-compose.yaml
services:
  polaris-minio:
    container_name: polaris-minio
    image: quay.io/minio/minio:RELEASE.2025-09-07T16-13-09Z
    restart: always
    command: ["server", "/data", "--console-address", ":9001"]
    environment:
      MINIO_ROOT_USER: minio_root
      MINIO_ROOT_PASSWORD: m1n1opwd
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://127.0.0.1:9000/minio/health/live"]
      interval: 2s
      timeout: 10s
      retries: 30
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - polaris-minio-data:/data
  polaris:
    container_name: polaris
    image: apache/polaris:latest
    restart: always
    depends_on:
      polaris-minio:
        condition: service_healthy
    environment:
      AWS_REGION: us-west-2
      AWS_ACCESS_KEY_ID: minio_root
      AWS_SECRET_ACCESS_KEY: m1n1opwd
      POLARIS_BOOTSTRAP_CREDENTIALS: POLARIS,root,s3cr3t
      polaris.realm-context.realms: POLARIS
      quarkus.otel.sdk.disabled: "true"
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://127.0.0.1:8182/q/health"]
      interval: 2s
      timeout: 10s
      retries: 30
      start_period: 10s
    ports:
      - "8181:8181"
      - "8182:8182"
  polaris-spark-iceberg:
    container_name: polaris-spark-iceberg
    image: tabulario/spark-iceberg
    restart: always
    depends_on:
      polaris:
        condition: service_healthy
    entrypoint: ["/bin/bash", "-lc", "tail -f /dev/null"]
  polaris-puppygraph:
    image: puppygraph/puppygraph:stable
    pull_policy: always
    container_name: puppy
    restart: always
    networks:
      default:
    environment:
      - PUPPYGRAPH_USERNAME=puppygraph
      - PUPPYGRAPH_PASSWORD=puppygraph123
    ports:
      - "8081:8081"
      - "18182:8182"
      - "7687:7687"
    depends_on:
      - polaris-spark-iceberg
volumes:
  polaris-minio-data:
networks:
  default:
    name: polaris_net
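Because YAML is indentation-sensitive, it can be worth validating the file before starting anything. The sketch below assumes the Compose v2 plugin, whose `config --quiet` option parses and validates the file and prints nothing on success; the `validate_compose` helper name is illustrative:

```shell
# Validate a Compose file without creating any containers.
# Defaults to docker-compose.yaml in the current directory.
validate_compose() {
  file="${1:-docker-compose.yaml}"
  if [ ! -f "$file" ]; then
    echo "not found: $file" >&2
    return 1
  fi
  # "config --quiet" only validates; errors are reported on stderr.
  docker compose -f "$file" config --quiet && echo "ok: $file"
}
```

Calling `validate_compose docker-compose.yaml` from the tutorial directory should report the file as ok; a parse error usually points to a mis-indented line.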
Create setup-polaris.sh
Create a file named setup-polaris.sh with the following content. This script starts the containers, creates the MinIO bucket, provisions the Polaris catalog, grants permissions, and loads the sample Iceberg tables.
setup-polaris.sh
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)
cd "$SCRIPT_DIR"
# Start all containers in the background.
docker compose up -d

# Request an OAuth token for the Polaris root principal.
ROOT_TOKEN=$(
  docker run --rm --network polaris_net alpine/curl:8.17.0 sh -lc '
    apk add --no-cache jq >/dev/null
    curl -s --user root:s3cr3t \
      -H "Polaris-Realm: POLARIS" \
      -d grant_type=client_credentials \
      -d scope=PRINCIPAL_ROLE:ALL \
      http://polaris:8181/api/catalog/v1/oauth/tokens | jq -r .access_token
  '
)

# Create the MinIO bucket that backs the catalog.
docker run --rm --network polaris_net --entrypoint /bin/sh quay.io/minio/mc:RELEASE.2025-08-13T08-35-41Z -c '
  mc alias set pol http://polaris-minio:9000 minio_root m1n1opwd >/dev/null &&
  mc mb --ignore-existing pol/bucket457
'

# Create the catalog if it does not exist yet, then grant permissions.
docker run --rm --network polaris_net -e ROOT_TOKEN="$ROOT_TOKEN" alpine/curl:8.17.0 sh -lc '
  apk add --no-cache jq >/dev/null

  if ! curl -s http://polaris:8181/api/management/v1/catalogs \
    -H "Authorization: Bearer ${ROOT_TOKEN}" \
    -H "Polaris-Realm: POLARIS" | jq -e ".catalogs[]? | select(.name == \"modern_catalog\")" >/dev/null; then
    curl -s -X POST http://polaris:8181/api/management/v1/catalogs \
      -H "Authorization: Bearer ${ROOT_TOKEN}" \
      -H "Polaris-Realm: POLARIS" \
      -H "Content-Type: application/json" \
      -d "{\"catalog\":{\"name\":\"modern_catalog\",\"type\":\"INTERNAL\",\"readOnly\":false,\"properties\":{\"default-base-location\":\"s3://bucket457/modern_catalog\"},\"storageConfigInfo\":{\"storageType\":\"S3\",\"allowedLocations\":[\"s3://bucket457/modern_catalog\",\"s3://bucket457\"],\"endpoint\":\"http://polaris-minio:9000\",\"endpointInternal\":\"http://polaris-minio:9000\",\"pathStyleAccess\":true}}}"
  fi

  curl -s -X PUT http://polaris:8181/api/management/v1/catalogs/modern_catalog/catalog-roles/catalog_admin/grants \
    -H "Authorization: Bearer ${ROOT_TOKEN}" \
    -H "Polaris-Realm: POLARIS" \
    -H "Content-Type: application/json" \
    -d "{\"type\":\"catalog\",\"privilege\":\"TABLE_WRITE_DATA\"}"

  curl -s -X PUT http://polaris:8181/api/management/v1/principal-roles/service_admin/catalog-roles/modern_catalog \
    -H "Authorization: Bearer ${ROOT_TOKEN}" \
    -H "Polaris-Realm: POLARIS" \
    -H "Content-Type: application/json" \
    -d "{\"name\":\"catalog_admin\"}"
'

# Save the SQL from the here-document below inside the Spark container,
# then run it against the Polaris REST catalog.
docker compose exec -T -e ROOT_TOKEN="$ROOT_TOKEN" polaris-spark-iceberg bash -lc '
  cat >/tmp/prepare.sql
  /opt/spark/bin/spark-sql \
    --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.10.1,org.apache.iceberg:iceberg-aws-bundle:1.10.1 \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.polaris.type=rest \
    --conf spark.sql.catalog.polaris.uri=http://polaris:8181/api/catalog \
    --conf spark.sql.catalog.polaris.oauth2-server-uri=http://polaris:8181/api/catalog/v1/oauth/tokens \
    --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
    --conf spark.sql.catalog.polaris.client.region=us-west-2 \
    --conf spark.sql.catalog.polaris.token="${ROOT_TOKEN}" \
    --conf spark.sql.catalog.polaris.warehouse=modern_catalog \
    --conf spark.sql.defaultCatalog=polaris \
    -f /tmp/prepare.sql
' <<'SQL'
CREATE DATABASE IF NOT EXISTS modern;
CREATE TABLE modern.person (id string, name string, age int) USING iceberg;
INSERT INTO modern.person VALUES
('v1', 'marko', 29),
('v2', 'vadas', 27),
('v4', 'josh', 32),
('v6', 'peter', 35);
CREATE TABLE modern.software (id string, name string, lang string) USING iceberg;
INSERT INTO modern.software VALUES
('v3', 'lop', 'java'),
('v5', 'ripple', 'java');
CREATE TABLE modern.created (id string, from_id string, to_id string, weight double) USING iceberg;
INSERT INTO modern.created VALUES
('e9', 'v1', 'v3', 0.4),
('e10', 'v4', 'v5', 1.0),
('e11', 'v4', 'v3', 0.4),
('e12', 'v6', 'v3', 0.2);
CREATE TABLE modern.knows (id string, from_id string, to_id string, weight double) USING iceberg;
INSERT INTO modern.knows VALUES
('e7', 'v1', 'v2', 0.5),
('e8', 'v1', 'v4', 1.0);
SELECT COUNT(*) AS person_cnt FROM modern.person;
SELECT COUNT(*) AS software_cnt FROM modern.software;
SELECT COUNT(*) AS created_cnt FROM modern.created;
SELECT COUNT(*) AS knows_cnt FROM modern.knows;
SQL
Run the setup
Run the following commands from the directory that contains both files:
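A minimal way to do this, assuming a bash-capable shell and that docker-compose.yaml and setup-polaris.sh both sit in the current directory. The `run_setup` wrapper is a hypothetical helper that only adds a check for the two files before running the script:

```shell
# Make the setup script executable and run it, but only when both
# tutorial files are present in the current directory.
run_setup() {
  for f in docker-compose.yaml setup-polaris.sh; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done
  chmod +x setup-polaris.sh
  ./setup-polaris.sh
}
```

Invoke it with `run_setup`. The first run can take several minutes because the images and the Iceberg Spark packages have to be downloaded.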
When the script completes:
- Polaris is available at http://localhost:8181/api/catalog
- PuppyGraph Web UI is available at http://localhost:8081
- PuppyGraph Gremlin is available at localhost:18182
- The sample Iceberg catalog is modern_catalog
- The sample namespace is modern
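To confirm that the endpoints are reachable from the host, a small polling helper can be used. This is a sketch that assumes curl is installed on the host; `wait_for_url` is an illustrative name, and the health URL is the one the Compose file already uses for the Polaris healthcheck (port 8182 is published to the host):

```shell
# Poll a URL until it answers with a successful status or the
# attempts run out. Usage: wait_for_url URL [ATTEMPTS]
wait_for_url() {
  url="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl return non-zero on HTTP error statuses.
    if curl --silent --fail --output /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "timed out: $url" >&2
  return 1
}
```

For example, `wait_for_url http://localhost:8182/q/health` checks Polaris and `wait_for_url http://localhost:8081` checks the PuppyGraph Web UI.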
The script creates the following tables in the modern namespace:

modern.person

| id | name | age |
|---|---|---|
| v1 | marko | 29 |
| v2 | vadas | 27 |
| v4 | josh | 32 |
| v6 | peter | 35 |

modern.software

| id | name | lang |
|---|---|---|
| v3 | lop | java |
| v5 | ripple | java |

modern.created

| id | from_id | to_id | weight |
|---|---|---|---|
| e9 | v1 | v3 | 0.4 |
| e10 | v4 | v5 | 1.0 |
| e11 | v4 | v3 | 0.4 |
| e12 | v6 | v3 | 0.2 |

modern.knows

| id | from_id | to_id | weight |
|---|---|---|---|
| e7 | v1 | v2 | 0.5 |
| e8 | v1 | v4 | 1.0 |
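The tables can also be listed from the host through the catalog's REST interface. The sketch below reuses the token request from setup-polaris.sh against the published port 8181 and assumes that curl and jq are installed on the host, and that Polaris exposes the standard Iceberg REST "list tables" route with the catalog name as the path prefix; `tables_url` and `list_tables` are illustrative names:

```shell
# Build the Iceberg REST "list tables" URL for a catalog and namespace.
tables_url() {
  echo "http://localhost:8181/api/catalog/v1/$1/namespaces/$2/tables"
}

# Request a token with the demo root credential, then print table names.
list_tables() {
  token=$(curl -s --user root:s3cr3t \
    -H "Polaris-Realm: POLARIS" \
    -d grant_type=client_credentials \
    -d scope=PRINCIPAL_ROLE:ALL \
    http://localhost:8181/api/catalog/v1/oauth/tokens | jq -r .access_token)

  # The response is expected to carry an "identifiers" array of
  # {namespace, name} objects, as in the Iceberg REST specification.
  curl -s "$(tables_url "$1" "$2")" \
    -H "Authorization: Bearer ${token}" \
    -H "Polaris-Realm: POLARIS" | jq -r '.identifiers[].name'
}
```

Running `list_tables modern_catalog modern` after the setup script should print the four table names shown above.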
Modeling the Graph
PuppyGraph models external data sources by using a graph schema JSON. In this tutorial, the schema connects PuppyGraph to Polaris through the Iceberg REST catalog and maps the four sample tables to the Modern graph.
Create schema.json
Create a file named schema.json with the following content:
schema.json
{
  "catalogs": [
    {
      "name": "polaris_data",
      "type": "iceberg",
      "metastore": {
        "type": "rest",
        "uri": "http://polaris:8181/api/catalog",
        "warehouse": "modern_catalog",
        "security": "oauth2",
        "credential": "root:s3cr3t",
        "scope": "PRINCIPAL_ROLE:ALL",
        "oauthServerUri": "http://polaris:8181/api/catalog/v1/oauth/tokens"
      },
      "storage": {
        "type": "S3",
        "useInstanceProfile": "false",
        "accessKey": "minio_root",
        "secretKey": "m1n1opwd",
        "enableSsl": "false",
        "endpoint": "http://polaris-minio:9000",
        "enablePathStyleAccess": "true"
      }
    }
  ],
  "graph": {
    "vertices": [
      {
        "label": "person",
        "oneToOne": {
          "tableSource": { "catalog": "polaris_data", "schema": "modern", "table": "person" },
          "id": {
            "fields": [{ "type": "String", "field": "id", "alias": "id" }]
          },
          "attributes": [
            { "type": "String", "field": "name", "alias": "name" },
            { "type": "Int", "field": "age", "alias": "age" }
          ]
        }
      },
      {
        "label": "software",
        "oneToOne": {
          "tableSource": { "catalog": "polaris_data", "schema": "modern", "table": "software" },
          "id": {
            "fields": [{ "type": "String", "field": "id", "alias": "id" }]
          },
          "attributes": [
            { "type": "String", "field": "name", "alias": "name" },
            { "type": "String", "field": "lang", "alias": "lang" }
          ]
        }
      }
    ],
    "edges": [
      {
        "label": "created",
        "fromVertex": "person",
        "toVertex": "software",
        "tableSource": { "catalog": "polaris_data", "schema": "modern", "table": "created" },
        "id": {
          "fields": [{ "type": "String", "field": "id", "alias": "id" }]
        },
        "fromId": {
          "fields": [{ "type": "String", "field": "from_id", "alias": "from_id" }]
        },
        "toId": {
          "fields": [{ "type": "String", "field": "to_id", "alias": "to_id" }]
        },
        "attributes": [
          { "type": "Double", "field": "weight", "alias": "weight" }
        ]
      },
      {
        "label": "knows",
        "fromVertex": "person",
        "toVertex": "person",
        "tableSource": { "catalog": "polaris_data", "schema": "modern", "table": "knows" },
        "id": {
          "fields": [{ "type": "String", "field": "id", "alias": "id" }]
        },
        "fromId": {
          "fields": [{ "type": "String", "field": "from_id", "alias": "from_id" }]
        },
        "toId": {
          "fields": [{ "type": "String", "field": "to_id", "alias": "to_id" }]
        },
        "attributes": [
          { "type": "Double", "field": "weight", "alias": "weight" }
        ]
      }
    ]
  }
}
Note
The schema uses the container hostnames polaris and polaris-minio. These names resolve correctly because PuppyGraph is running in the same Docker Compose network.
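Before uploading, it can help to confirm that the file parses as JSON and declares the expected labels. This sketch assumes jq is installed on the host; `schema_labels` is an illustrative helper name:

```shell
# Print the vertex labels followed by the edge labels declared in a
# graph schema file. jq exits non-zero if the file is not valid JSON.
schema_labels() {
  jq -r '.graph.vertices[].label, .graph.edges[].label' "$1"
}
```

For the schema in this tutorial, `schema_labels schema.json` should list person, software, created, and knows, one per line.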
Upload the graph schema
You can upload the schema in either of the following ways.
Option 1: Web UI
- Open http://localhost:8081.
- Log in with username puppygraph and password puppygraph123.
- Go to the schema page.
- Use the Upload Graph Schema JSON section to upload schema.json.
Option 2: REST API
Alternatively, run the following command to upload the schema file:
curl -X POST \
-H "content-type: application/json" \
--data-binary @./schema.json \
--user "puppygraph:puppygraph123" \
http://localhost:8081/schema
After the upload succeeds, PuppyGraph will visualize the schema and make the graph queryable.
Querying the Graph
After the schema is loaded, open the PuppyGraph Dashboard to inspect the graph summary, or use the Query UI to explore the data interactively.
Run the following example Gremlin queries:
g.V().hasLabel('person').valueMap('name', 'age')

g.V().hasLabel('person').has('name', 'marko').out('knows').values('name')

g.V().hasLabel('person').has('name', 'josh').
  outE('created').
  project('software', 'weight').
    by(inV().values('name')).
    by(values('weight'))
With the sample data in this tutorial, the graph contains:
- 4 person vertices
- 2 software vertices
- 2 knows edges
- 4 created edges
Cleaning up
Run the following command to stop and remove the containers, network, and volume created by this tutorial:
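A minimal sketch, assuming the Compose v2 plugin and that the command is run from the directory that contains docker-compose.yaml. The `--volumes` flag removes the named volume polaris-minio-data along with the containers and the polaris_net network; `stop_stack` is an illustrative wrapper name:

```shell
# Stop and remove the tutorial's containers, network, and named volume.
# Refuses to run outside the directory that holds the Compose file.
stop_stack() {
  if [ ! -f docker-compose.yaml ]; then
    echo "missing: docker-compose.yaml" >&2
    return 1
  fi
  docker compose down --volumes
}
```

Run `stop_stack` when you are finished. Removing the volume deletes the sample data stored in MinIO, so the setup script has to be run again before the tutorial can be repeated.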