Querying Polaris Data as a Graph

Summary

In this tutorial, you will:

  • Start an Apache Polaris server;
  • Create Apache Iceberg tables locally under the catalog and load them with example data;
  • Start a PuppyGraph Docker container and query the data as a graph.

Prerequisites

Docker

Docker is required to run the PuppyGraph server. You can download Docker from the official Docker website.

Please ensure that docker compose is available. The installation can be verified by running:

docker compose version

JDK 21

JDK 21 is required to build Polaris.

For Ubuntu, you can install it using the following commands:

sudo apt update
sudo apt install openjdk-21-jdk
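
Before building, it can help to confirm that the JDK on your PATH really is version 21. The following is a minimal sketch, not part of the Polaris tooling: the helper name java_major_from_line is hypothetical, and it assumes the usual version "21.x.y" format on the first line of the java -version output.

```shell
# Hypothetical helper: extract the major version from the first line of
# `java -version` output, e.g. 'openjdk version "21.0.2" 2024-01-16' gives 21.
# Assumes the modern numbering scheme (JDK 9+); legacy "1.8.0" strings give 1.
java_major_from_line() {
  printf '%s\n' "$1" | sed -E 's/.*version "([0-9]+).*/\1/'
}

# Usage (java prints its version to stderr, hence 2>&1):
#   java_major_from_line "$(java -version 2>&1 | head -n 1)"
```

If the printed number is not 21, check which JDK is first on your PATH before running the Gradle build.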

Polaris Preparation

Starting Server

▶ Check out the code from the Polaris repository.

git clone https://github.com/apache/polaris.git

▶ Start the Polaris server.

cd polaris
./gradlew run -Dpolaris.bootstrap.credentials=POLARIS,root,s3cr3t

The Polaris server will start. The command sets the bootstrap credentials to POLARIS,root,s3cr3t, where:

  • POLARIS is the realm
  • root is the CLIENT_ID
  • s3cr3t is the CLIENT_SECRET

▶ The credentials PuppyGraph needs to connect to Polaris are root:s3cr3t (CLIENT_ID:CLIENT_SECRET format).
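
To check that the server accepts these credentials, you can request an OAuth token from it. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the server is assumed to listen on localhost:8181 (the port used in the RestUri later in this tutorial), and the token endpoint path /api/catalog/v1/oauth/tokens is assumed to be enabled in your Polaris build.

```shell
# Assumed default: Polaris REST API reachable on localhost:8181.
POLARIS_URL="${POLARIS_URL:-http://localhost:8181}"

# Hypothetical helper: build the form body for a client-credentials request.
token_request_body() {
  printf 'grant_type=client_credentials&client_id=%s&client_secret=%s&scope=PRINCIPAL_ROLE:ALL' "$1" "$2"
}

# Hypothetical helper: send the request and print the server's JSON response.
request_polaris_token() {
  curl -s -X POST "$POLARIS_URL/api/catalog/v1/oauth/tokens" \
    -d "$(token_request_body "$1" "$2")"
}

# Usage, once the Polaris server is running:
#   request_polaris_token root s3cr3t
```

A JSON response containing an access token suggests the credentials are valid; an error response usually means the CLIENT_ID or CLIENT_SECRET does not match the bootstrap values.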

Data Preparation

▶ Open a second shell and run the following command in the polaris directory:

./regtests/run_spark_sql.sh

The script will download the required dependencies and start the Spark SQL shell.

▶ Then use the Spark SQL shell to create the database and tables and load the example data.

CREATE DATABASE IF NOT EXISTS modern;

CREATE TABLE modern.person (id string, name string, age int) USING iceberg;
INSERT INTO modern.person VALUES
    ('v1', 'marko', 29),
    ('v2', 'vadas', 27),
    ('v4', 'josh', 32),
    ('v6', 'peter', 35);

CREATE TABLE modern.software (id string, name string, lang string) USING iceberg;
INSERT INTO modern.software VALUES
    ('v3', 'lop', 'java'),
    ('v5', 'ripple', 'java');

CREATE TABLE modern.created (id string, from_id string, to_id string, weight double) USING iceberg;
INSERT INTO modern.created VALUES
    ('e9', 'v1', 'v3', 0.4),
    ('e10', 'v4', 'v5', 1.0),
    ('e11', 'v4', 'v3', 0.4),
    ('e12', 'v6', 'v3', 0.2);

CREATE TABLE modern.knows (id string, from_id string, to_id string, weight double) USING iceberg;
INSERT INTO modern.knows VALUES
    ('e7', 'v1', 'v2', 0.5),
    ('e8', 'v1', 'v4', 1.0);

Starting PuppyGraph

▶ Start the PuppyGraph server with the following command. Note that the Gremlin port is exposed on the host as 8183 instead of 8182, because Polaris also uses 8182.

docker run -p 8081:8081 -p 8183:8182 -p 7687:7687 -e PUPPYGRAPH_PASSWORD=puppygraph123 -v /tmp/polaris:/tmp/polaris --name puppy --rm -itd puppygraph/puppygraph:stable
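
On Linux without Docker Desktop, host.docker.internal is usually not resolvable inside a container unless it is mapped explicitly. The sketch below picks the extra docker run flag from the operating system name reported by uname -s. The helper name host_flag_for is hypothetical, and the sketch assumes a Docker version that supports the host-gateway value.

```shell
# Hypothetical helper: print the extra `docker run` flag needed on Linux so
# that host.docker.internal resolves to the host; print nothing elsewhere.
host_flag_for() {
  case "$1" in
    Linux) printf '%s' '--add-host=host.docker.internal:host-gateway' ;;
    *) printf '' ;;
  esac
}

# Usage: the same command as above, with the flag added only when needed.
#   docker run $(host_flag_for "$(uname -s)") -p 8081:8081 -p 8183:8182 -p 7687:7687 \
#     -e PUPPYGRAPH_PASSWORD=puppygraph123 -v /tmp/polaris:/tmp/polaris \
#     --name puppy --rm -itd puppygraph/puppygraph:stable
```

With the flag in place, the RestUri can use host.docker.internal on Linux as well, instead of a bridge IP such as 172.17.0.1.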

Modeling the Graph

Connecting to Polaris

▶ Log in to the PuppyGraph Web UI at http://localhost:8081 with puppygraph as the username and puppygraph123 as the password.

▶ Click on Create graph schema to create a new graph schema. Fill in the fields as follows.


Parameter | Value
Catalog type | Apache Iceberg
Catalog name | Any name you like for the catalog.
Metastore Type | Iceberg-Rest
RestUri | http://host.docker.internal:8181/api/catalog. On Linux, if you did not add --add-host=host.docker.internal:host-gateway to the docker run command, use the host's IP instead, which might be 172.17.0.1.
Warehouse | manual_spark. This was created by the run_spark_sql.sh script.
Credential | root:s3cr3t (CLIENT_ID:CLIENT_SECRET format, as set in the Polaris bootstrap credentials)
Scope | PRINCIPAL_ROLE:ALL
Storage type | Get from metastore

▶ Click on Submit to connect to the Polaris server.
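
If the connection fails, it can help to check from the host that the catalog endpoint answers for the warehouse. The following is a sketch only: the helper name catalog_config_url is hypothetical, the /v1/config path follows the Iceberg REST catalog convention, and TOKEN is assumed to hold an access token already obtained from Polaris for root:s3cr3t.

```shell
# Hypothetical helper: build the Iceberg REST "config" URL for a warehouse.
# $1 is the catalog base URI (the RestUri value), $2 is the warehouse name.
catalog_config_url() {
  printf '%s/v1/config?warehouse=%s' "$1" "$2"
}

# Usage, assuming TOKEN holds a valid Polaris access token:
#   curl -s -H "Authorization: Bearer $TOKEN" \
#     "$(catalog_config_url http://localhost:8181/api/catalog manual_spark)"
```

From the host the URI uses localhost, whereas PuppyGraph inside the container needs host.docker.internal or the host's IP, so a check that succeeds here but fails in the UI points to container networking rather than to Polaris.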

Building the Graph Schema

▶ In the Schema Builder, select the modern database and add the first node type to the graph from the table person.


Figure: Adding Person Node

▶ Then use Auto Suggestion to create the other nodes and edges. Select person as the start node (vertex) and add the auto-suggested nodes and edges.


Figure: Auto Suggestion

The graph schema should look like this:


Figure: Graph Schema

▶ Submit the schema to create the graph.

Querying the Graph

PuppyGraph provides a Dashboard that gives a summary of the graph.


You can also use the Interactive Query UI to explore the graph further by sending queries.


Cleaning up

▶ Run the following command to shut down the PuppyGraph container. Because it was started with --rm, Docker removes it once it stops.

docker stop puppy

▶ Press Ctrl+C in their respective shells to stop the Polaris server and the Spark SQL shell.