Querying DuckDB Data as a Graph
A step-by-step tutorial to using PuppyGraph to query data in DuckDB
Last updated
A step-by-step tutorial to using PuppyGraph to query data in DuckDB
Last updated
In this tutorial, you will:
Create a DuckDB database and load it with example data;
Start a PuppyGraph Docker container and query the DuckDB data as a graph.
Please ensure that docker compose
is available. The installation can be verified by running:
See https://docs.docker.com/compose/install/ for Docker Compose installation instructions and https://www.docker.com/get-started/ for more details on Docker.
Accessing the PuppyGraph Web UI requires a browser. However, the tutorial offers alternative instructions for those who wish to exclusively use the CLI.
Create a file docker-compose.yaml
with the following content:
This tutorial is designed to be comprehensive and standalone, so it includes steps to populate data in DuckDB. In practical scenarios, PuppyGraph can query data directly from your existing DuckDB databases.
This command also creates a database file /home/share/demo.db
.
The above SQL creates the following tables:
We then define a graph on top of the data tables we just created. Actually, this is the "Modern" graph defined by Apache Tinkerpop.
A schema instructs PuppyGraph on mapping data from the DuckDB into a graph. PuppyGraph offers various methods for schema creation. For this tutorial, we've already prepared a schema to help save time.
Once the schema is uploaded, the schema page shows the visualized graph schema as follows.
The response shows that graph schema has been uploaded successfully:
In this tutorial we will use the Gremlin query language to query the Graph. Gremlin is a graph query language developed by Apache TinkerPop. Prior knowledge of Gremlin is not necessary to follow the tutorial. To learn more about it, visit https://tinkerpop.apache.org/gremlin.html.
Queries are entered on the left side, and the right side displays the graph visualization.
The first query retrieves the property of the person named "marko".
The output is plain text like the following:
Now let's also leverage the visualization. The next query gets all the software created by people known to "marko".
The output is as follows. There are two paths in the result as "marko" knows "josh" who created "lop" and "ripple".
Alternatively, we can query the graph via CLI.
The welcome screen appears as follows:
Properties of the person named "marko":
Then run the following command to start DuckDB and PuppyGraph:
Run the following command to start a DuckDB shell:
Then execute the following SQL statements in the shell to create tables and insert data.
id | name | age |
---|---|---|
id | name | lang |
---|---|---|
id | from_id | to_id | weight |
---|---|---|---|
id | from_id | to_id | weight |
---|---|---|---|
Type .exit
to quit the DuckDB shell.
Create a PuppyGraph schema file schema.json
with the following content:
Log into PuppyGraph Web UI at http://localhost:8081 with username puppygraph
and password puppygraph123
.
Upload the schema by selecting the file schema.json
in the Upload Graph Schema JSON
block and clicking on Upload
.
Alternatively, run the following command to upload the schema file:
Click on the Query
panel the left side. The Gremlin Query tab offers an interactive environment for querying the graph using Gremlin.
Copy the following query, paste it in the query input, and click on the run button.
Copy the following query, paste it in the query input, and click on the run button.
Execute the following command to access the PuppyGraph Gremlin CLI
Run the following queries in the console to query the Graph.
To exit PuppyGraph Gremlin Console, enter the command:
Run the following command to shut down and remove the containers:
Remove the data volume if needed:
v1
marko
29
v2
vadas
27
v4
josh
32
v6
peter
35
v3
lop
java
v5
ripple
java
e7
v1
v2
0.5
e8
v1
v4
1.0
e9
v1
v3
0.4
e10
v4
v5
1.0
e11
v4
v3
0.4
e12
v6
v3
0.2