
Querying Polaris Data as a Graph

Summary

In this tutorial, you will:

  • Start an Apache Polaris server;
  • Create Apache Iceberg tables locally under the catalog and load them with example data;
  • Start a PuppyGraph Docker container and query the data as a graph.

Prerequisites

Docker

Docker is required to run the PuppyGraph server. You can download it from the official Docker website.

Please ensure that docker compose is available. The installation can be verified by running:

docker compose version

JDK 21

JDK 21 is required to build Polaris.

For Ubuntu, you can install it using the following commands:

sudo apt update
sudo apt install openjdk-21-jdk
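
Both prerequisites can be checked in one pass before moving on. The following is a minimal sketch rather than part of the official setup: the helper names `java_major_version` and `check_prereqs` are made up for illustration, and the parsing assumes the usual `version "X.Y.Z"` format printed by `java -version`.

```shell
# Extract the major version from `java -version` output.
# e.g. 'openjdk version "21.0.2" 2024-01-16' -> 21
# (legacy 1.x version strings such as "1.8.0_292" yield 1)
java_major_version() {
  printf '%s\n' "$1" | sed -n 's/.*version "\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Report whether docker compose and a JDK are available on this machine.
check_prereqs() {
  if docker compose version >/dev/null 2>&1; then
    echo "docker compose: found"
  else
    echo "docker compose: NOT found"
  fi
  if command -v java >/dev/null 2>&1; then
    # `java -version` writes to stderr, hence the 2>&1.
    echo "java major version: $(java_major_version "$(java -version 2>&1)") (this tutorial expects 21)"
  else
    echo "java: NOT found"
  fi
}

check_prereqs
```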

Polaris Preparation

Starting Server

▶ Check out the code from the Polaris repository.

git clone https://github.com/apache/polaris.git

▶ Start the Polaris server.

cd polaris
./gradlew runApp

The Polaris server will start and print the root principal credentials in its output. The line containing the credentials looks like this:

realm: default-realm root principal credentials: f6973789e5270e5d:dce8e8e53d8f770eb9804f22de923645
▶ Take note of the credentials, as they are required later to connect PuppyGraph to Polaris.
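
The credentials have the form `<client id>:<client secret>`. As a rough sketch, the two parts can be split out of the log line with plain shell parameter expansion. The function names `parse_root_credentials` and `request_token` are illustrative only. The token request is an assumption: it targets the OAuth token endpoint that Polaris exposes under the catalog API on port 8181 and is not a step this tutorial requires.

```shell
# Split a log line such as
#   realm: default-realm root principal credentials: <id>:<secret>
# into the client id (first output line) and client secret (second line).
parse_root_credentials() {
  creds=${1##*credentials: }   # drop everything up to "credentials: "
  printf '%s\n%s\n' "${creds%%:*}" "${creds#*:}"
}

# Optional sanity check (ASSUMPTION: Polaris is listening on localhost:8181
# and serves the token endpoint at this path). Not called automatically.
request_token() {
  # $1 = client id, $2 = client secret
  curl -s -X POST "http://localhost:8181/api/catalog/v1/oauth/tokens" \
    -d "grant_type=client_credentials" \
    -d "client_id=$1" \
    -d "client_secret=$2" \
    -d "scope=PRINCIPAL_ROLE:ALL"
}

line='realm: default-realm root principal credentials: f6973789e5270e5d:dce8e8e53d8f770eb9804f22de923645'
parse_root_credentials "$line"
```

The credentials are generated on each server start, so replace the sample line with the one from your own server output.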

Data Preparation

▶ Start a different shell and run the following command in the polaris directory:

./regtests/run_spark_sql.sh

The script will download the required dependencies and start the Spark SQL shell.

▶ Then use the provided Spark SQL shell to create the database and tables and load the example data.

CREATE DATABASE IF NOT EXISTS modern;

CREATE TABLE modern.person (id string, name string, age int) USING iceberg;
INSERT INTO modern.person VALUES
    ('v1', 'marko', 29),
    ('v2', 'vadas', 27),
    ('v4', 'josh', 32),
    ('v6', 'peter', 35);

CREATE TABLE modern.software (id string, name string, lang string) USING iceberg;
INSERT INTO modern.software VALUES
    ('v3', 'lop', 'java'),
    ('v5', 'ripple', 'java');

CREATE TABLE modern.created (id string, from_id string, to_id string, weight double) USING iceberg;
INSERT INTO modern.created VALUES
    ('e9', 'v1', 'v3', 0.4),
    ('e10', 'v4', 'v5', 1.0),
    ('e11', 'v4', 'v3', 0.4),
    ('e12', 'v6', 'v3', 0.2);

CREATE TABLE modern.knows (id string, from_id string, to_id string, weight double) USING iceberg;
INSERT INTO modern.knows VALUES
    ('e7', 'v1', 'v2', 0.5),
    ('e8', 'v1', 'v4', 1.0);

Starting PuppyGraph

▶ Start the PuppyGraph server with the following command. Note that the exposed Gremlin port is changed from 8182 to 8183 because Polaris also uses 8182.

docker run -p 8081:8081 -p 8183:8182 -p 7687:7687 -v /tmp/polaris:/tmp/polaris --name puppy --rm -itd puppygraph/puppygraph-dev:latest
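
On Linux, `host.docker.internal` does not resolve inside the container unless the `--add-host=host.docker.internal:host-gateway` flag is passed, as noted in the connection settings later in this tutorial. The sketch below builds the same command as above and adds that flag only on Linux. The function name `puppygraph_run_cmd` is hypothetical. The command is only printed; run it yourself once it looks right.

```shell
# Print the docker run command for PuppyGraph.
# $1 (optional) overrides the detected OS name, e.g. "Linux" or "Darwin".
puppygraph_run_cmd() {
  os=${1:-$(uname -s)}
  extra=""
  if [ "$os" = "Linux" ]; then
    # Map host.docker.internal to the Docker host gateway.
    extra="--add-host=host.docker.internal:host-gateway "
  fi
  printf 'docker run -p 8081:8081 -p 8183:8182 -p 7687:7687 %s-v /tmp/polaris:/tmp/polaris --name puppy --rm -itd puppygraph/puppygraph-dev:latest\n' "$extra"
}

# Show the command for this machine.
puppygraph_run_cmd
```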

Modeling the Graph

Connecting to Polaris

▶ Log in to the PuppyGraph Web UI (http://localhost:8081, the port published by the docker run command above) with puppygraph as the username and puppygraph123 as the password.

▶ Click on Create graph schema to create a new graph schema. Fill in the fields as follows.

| Parameter | Value |
| --- | --- |
| Catalog type | Apache Iceberg |
| Catalog name | Any name you like for the catalog. |
| Metastore Type | Iceberg-Rest |
| RestUri | http://host.docker.internal:8181/api/catalog. On Linux, the IP for the host might be 172.17.0.1 if you do not add --add-host=host.docker.internal:host-gateway to the Docker run command. |
| Warehouse | manual_spark. This was created by the run_spark_sql.sh script. |
| Credential | The root principal credentials from the Polaris server's output. For example, f6973789e5270e5d:dce8e8e53d8f770eb9804f22de923645. |
| Scope | PRINCIPAL_ROLE:ALL |
| Storage type | Get from metastore |

▶ Click on Submit to connect to the Polaris server.

Building the Graph Schema

▶ In the Schema Builder, select the modern database and add the first node type to the graph from the table person.

[Figure: Adding Person Node]

▶ After that, use Auto Suggestion to create the other nodes and edges. Select person as the start vertex (node) and add the auto-suggested nodes and edges.

[Figure: Auto Suggestion]

The graph schema should look like this:

[Figure: Graph Schema]

▶ Submit the schema to create the graph.

Querying the Graph

PuppyGraph provides a Dashboard that gives a summary of the graph.

You can also use the Interactive Query UI to explore the graph further by sending queries.
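
As a starting point, the snippet below prints a few Gremlin traversals that can be pasted into the Interactive Query UI. It is a sketch under assumptions: it presumes the schema uses the table names (person, software, knows, created) as vertex and edge labels and the column names as property names, which may differ from what Auto Suggestion generated in your setup. The function name `sample_queries` is illustrative.

```shell
# Print sample Gremlin traversals for the "modern" graph.
# ASSUMPTION: labels and property names mirror the table and column names.
sample_queries() {
  # Names of all person vertices.
  echo "g.V().hasLabel('person').values('name')"
  # People that marko knows.
  echo "g.V().has('person', 'name', 'marko').out('knows').values('name')"
  # People who created the software named lop.
  echo "g.V().has('software', 'name', 'lop').in('created').values('name')"
}

sample_queries
```

If the schema matches these assumptions, the rows inserted earlier imply that the second traversal returns vadas and josh, and the third returns marko, josh, and peter.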

Cleaning up

▶ Run the following command to shut down and remove the PuppyGraph container:

docker stop puppy

▶ Also press Ctrl+C in their respective terminals to stop the Polaris server and the Spark SQL shell.