Querying Polaris Data as a Graph
Summary
In this tutorial, you will:
- Start an Apache Polaris server;
- Create Apache Iceberg tables locally under the catalog and load them with example data;
- Start a PuppyGraph Docker container and query the data as a graph.
Prerequisites
Docker
Docker is required to run the PuppyGraph server. You can download it from the official Docker website.
Please ensure that docker compose is available. The installation can be verified by running docker compose version.
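A slightly more defensive check can be sketched as a small shell helper that reports which tool is missing. The function name check_tool is hypothetical and not part of Docker; the two commands it wraps are the standard version subcommands.

```shell
# Run a command only if its executable is on PATH; otherwise report it as missing.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Print the Docker and Compose versions, or a hint when Docker is absent.
check_tool docker --version || echo "Install Docker before continuing."
check_tool docker compose version || echo "The docker compose plugin is not available."
```

If both commands print a version string, the prerequisites for the PuppyGraph container are in place.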
JDK 21
JDK 21 is required to build Polaris.
For Ubuntu, you can install it using the following commands:
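One possible form is sketched below, assuming the openjdk-21-jdk package from the Ubuntu repositories is acceptable for your setup. The helper names install_jdk21 and java_major are hypothetical. The install step is wrapped in a function so that nothing is installed until you call it.

```shell
# Assumption: the openjdk-21-jdk package is available in your Ubuntu release.
install_jdk21() {
  sudo apt-get update && sudo apt-get install -y openjdk-21-jdk
}

# Extract the major version number from `java -version` output read on stdin.
java_major() {
  sed -n 's/.*version "\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Report the installed version, or point to the install helper.
if command -v java >/dev/null 2>&1; then
  major=$(java -version 2>&1 | java_major)
  echo "Detected Java major version: ${major}"
else
  echo "java not found; call install_jdk21 to install it"
fi
```

After installation, the detected major version should be 21. Legacy releases that report versions such as 1.8 show up as major version 1.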
Polaris Preparation
Starting Server
Check out the code from the Polaris repository (https://github.com/apache/polaris).
Start the Polaris server. Once it is running, look for the root principal credentials in the server's output; they are required to connect to the Polaris server later. The credentials are printed as a single colon-separated pair, for example f6973789e5270e5d:dce8e8e53d8f770eb9804f22de923645.
Data Preparation
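Start a different shell and, in the polaris directory, run the run_spark_sql.sh script to open a Spark SQL shell. Then run the following statements in that shell to create and populate the tables: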
CREATE DATABASE IF NOT EXISTS modern;
CREATE TABLE modern.person (id string, name string, age int) USING iceberg;
INSERT INTO modern.person VALUES
('v1', 'marko', 29),
('v2', 'vadas', 27),
('v4', 'josh', 32),
('v6', 'peter', 35);
CREATE TABLE modern.software (id string, name string, lang string) USING iceberg;
INSERT INTO modern.software VALUES
('v3', 'lop', 'java'),
('v5', 'ripple', 'java');
CREATE TABLE modern.created (id string, from_id string, to_id string, weight double) USING iceberg;
INSERT INTO modern.created VALUES
('e9', 'v1', 'v3', 0.4),
('e10', 'v4', 'v5', 1.0),
('e11', 'v4', 'v3', 0.4),
('e12', 'v6', 'v3', 0.2);
CREATE TABLE modern.knows (id string, from_id string, to_id string, weight double) USING iceberg;
INSERT INTO modern.knows VALUES
('e7', 'v1', 'v2', 0.5),
('e8', 'v1', 'v4', 1.0);
Starting PuppyGraph
Start the PuppyGraph server with the following command.
Note that the exposed Gremlin port is changed from 8182 to 8183 because Polaris also uses 8182.
docker run -p 8081:8081 -p 8183:8182 -p 7687:7687 -v /tmp/polaris:/tmp/polaris --name puppy --rm -itd puppygraph/puppygraph-dev:latest
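On Linux, host.docker.internal is usually not resolvable inside a container unless the --add-host flag named in the connection settings of this tutorial is passed. The sketch below prints a variant of the command above with that flag added on Linux only. The function name puppygraph_run_cmd is hypothetical; the ports, volume, and image are copied from the command above.

```shell
# Print the docker run command from this tutorial, adding the host-gateway
# mapping on Linux so that host.docker.internal resolves inside the container.
puppygraph_run_cmd() {
  extra=""
  if [ "$(uname -s)" = "Linux" ]; then
    extra="--add-host=host.docker.internal:host-gateway "
  fi
  echo "docker run -p 8081:8081 -p 8183:8182 -p 7687:7687 ${extra}-v /tmp/polaris:/tmp/polaris --name puppy --rm -itd puppygraph/puppygraph-dev:latest"
}

# Show the command; nothing is started until you run the printed line yourself.
puppygraph_run_cmd
```

Each -p option maps host-port:container-port, so the container's Gremlin port 8182 is reachable on the host at 8183.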
Modeling the Graph
Connecting to Polaris
Log in to the PuppyGraph Web UI at http://localhost:8081 with puppygraph as the username and puppygraph123 as the password.
Click on Create graph schema to create a new graph schema.
Fill in the fields as follows.
Parameter | Value |
---|---|
Catalog type | Apache Iceberg |
Catalog name | Any name you like for the catalog. |
Metastore Type | Iceberg-Rest |
RestUri | http://host.docker.internal:8181/api/catalog (on Linux, use the host IP, which might be 172.17.0.1, if you do not add --add-host=host.docker.internal:host-gateway to the docker run command) |
Warehouse | manual_spark (created by the run_spark_sql.sh script) |
Credential | The root principal credentials from the Polaris server's output, for example f6973789e5270e5d:dce8e8e53d8f770eb9804f22de923645 |
Scope | PRINCIPAL_ROLE:ALL |
Storage type | Get from metastore |
Click on Submit to connect to the Polaris server.
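If the connection fails, it can help to check the credentials and scope outside PuppyGraph first. The sketch below, run on the host, splits the example credential pair from the table above and requests a token from Polaris. The token path under /api/catalog is an assumption based on the Polaris REST API and may differ between versions; replace the example credential with your own.

```shell
# Example credential pair from the table above; replace it with your own.
CREDENTIAL="f6973789e5270e5d:dce8e8e53d8f770eb9804f22de923645"
CLIENT_ID="${CREDENTIAL%%:*}"      # text before the first colon
CLIENT_SECRET="${CREDENTIAL#*:}"   # text after the first colon

# Assumed endpoint: /api/catalog/v1/oauth/tokens on the Polaris port 8181.
curl -s --max-time 5 -X POST "http://localhost:8181/api/catalog/v1/oauth/tokens" \
  -d "grant_type=client_credentials" \
  -d "client_id=${CLIENT_ID}" \
  -d "client_secret=${CLIENT_SECRET}" \
  -d "scope=PRINCIPAL_ROLE:ALL" \
  || echo "Token request failed; is the Polaris server running?"
```

A JSON response containing an access token suggests that the Credential and Scope values are correct, which leaves RestUri or host networking as the likelier cause of a failed connection.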
Building the Graph Schema
In the Schema Builder, select the modern database and add the first node type to the graph from the table person.
After that, use the Auto Suggestion to create the other nodes and edges.
Select person as the start vertex (node) and add the auto-suggested nodes and edges.
The resulting graph schema should contain the person and software nodes, connected by the knows and created edges.
Submit the schema to create the graph.
Querying the Graph
PuppyGraph provides a Dashboard that gives a summary of the graph.
One can also use the Interactive Query UI to further explore the graph by sending queries.
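A few Gremlin queries to try are sketched below as a shell function that prints them for pasting into the Interactive Query UI. The function name sample_queries is hypothetical, and the labels person, knows, and created are assumed to match the table names chosen in the Schema Builder.

```shell
# Print sample Gremlin queries for the Interactive Query UI.
# Assumption: vertex and edge labels match the table names used in the schema.
sample_queries() {
  cat <<'EOF'
// Names of the people marko knows
g.V().has('person', 'name', 'marko').out('knows').values('name')

// Software created by the people marko knows
g.V().has('person', 'name', 'marko').out('knows').out('created').values('name')

// Total number of vertices and edges
g.V().count()
g.E().count()
EOF
}

sample_queries
```

With the example data loaded earlier, the first query should return vadas and josh, the second should return the software josh created (ripple and lop), and the counts should both be 6, provided the schema uses the assumed labels.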
Cleaning up
Run docker stop puppy to shut down the PuppyGraph container; because it was started with --rm, it is removed once it stops.
Then press Ctrl+C in their terminals to stop the Polaris server and the Spark SQL shell.