Querying DuckDB Data as a Graph

A step-by-step tutorial to using PuppyGraph to query data in DuckDB

Summary

In this tutorial, you will:

  • Create a DuckDB database and load it with example data;

  • Start a PuppyGraph Docker container and query the DuckDB data as a graph.

Prerequisites

Please ensure that docker compose is available. The installation can be verified by running:

docker compose version

See https://docs.docker.com/compose/install/ for Docker Compose installation instructions and https://www.docker.com/get-started/ for more details on Docker.

Accessing the PuppyGraph Web UI requires a browser. However, the tutorial offers alternative instructions for those who wish to exclusively use the CLI.

Deployment

docker-compose.yaml
version: "3"

services:
  puppygraph:
    image: puppygraph/puppygraph:stable
    pull_policy: always
    container_name: puppygraph
    environment:
      - PUPPYGRAPH_USERNAME=puppygraph
      - PUPPYGRAPH_PASSWORD=puppygraph123
    networks:
      duckdb_net:
    ports:
      - "8081:8081"
      - "8182:8182"
      - "7687:7687"
    volumes:
      - share_content:/home/share
  duckdb:
    image: puppygraph/duckdb-ubuntu:latest
    container_name: duckdb
    networks:
      duckdb_net:
    volumes:
      - share_content:/home/share
networks:
  duckdb_net:
    name: puppy-duckdb
volumes:
  share_content:
    name: puppy-duckdb
docker compose up -d
[+] Running 1/1
 ✔ puppygraph Pulled                                                                                                                                                                                                                                                              2.0s
[+] Running 4/4
 ✔ Network puppy-duckdb   Created                                                                                                                                                                                                                                                 0.1s
 ✔ Volume "puppy-duckdb"  Created                                                                                                                                                                                                                                                 0.0s
 ✔ Container puppygraph   Started                                                                                                                                                                                                                                                 0.1s
 ✔ Container duckdb       Started

Data Preparation

This tutorial is designed to be comprehensive and standalone, so it includes steps to populate data in DuckDB. In practical scenarios, PuppyGraph can query data directly from your existing DuckDB databases.

docker exec -it duckdb duckdb /home/share/demo.db

This command also creates a database file /home/share/demo.db.

v0.9.2 3c695d7ba9
Enter ".help" for usage hints.
D
create schema modern;
create table modern.person (id text, name text, age integer);
insert into modern.person values
                              ('v1', 'marko', 29),
                              ('v2', 'vadas', 27),
                              ('v4', 'josh', 32),
                              ('v6', 'peter', 35);

create table modern.software (id text, name text, lang text);
insert into modern.software values
                                ('v3', 'lop', 'java'),
                                ('v5', 'ripple', 'java');

create table modern.created (id text, from_id text, to_id text, weight double precision);
insert into modern.created values
                               ('e9', 'v1', 'v3', 0.4),
                               ('e10', 'v4', 'v5', 1.0),
                               ('e11', 'v4', 'v3', 0.4),
                               ('e12', 'v6', 'v3', 0.2);

create table modern.knows (id text, from_id text, to_id text, weight double precision);
insert into modern.knows values
                             ('e7', 'v1', 'v2', 0.5),
                             ('e8', 'v1', 'v4', 1.0);

The above SQL creates the following tables:

Modeling a Graph

We then define a graph on top of the data tables we just created. Actually, this is the "Modern" graph defined by Apache Tinkerpop.

A schema instructs PuppyGraph on mapping data from the DuckDB into a graph. PuppyGraph offers various methods for schema creation. For this tutorial, we've already prepared a schema to help save time.

schema.json
{
  "catalogs": [
    {
      "name": "duckdb_data",
      "type": "duckdb",
      "jdbc": {
        "jdbcUri": "jdbc:duckdb:/home/share/demo.db",
        "driverClass": "org.duckdb.DuckDBDriver"
      }
    }
  ],
  "vertices": [
    {
      "label": "person",
      "mappedTableSource": {
        "catalog": "duckdb_data",
        "schema": "modern",
        "table": "person",
        "metaFields": {
          "id": "id"
        }
      },
      "attributes": [
        {
          "type": "Int",
          "name": "age"
        },
        {
          "type": "String",
          "name": "name"
        }
      ]
    },
    {
      "label": "software",
      "mappedTableSource": {
        "catalog": "duckdb_data",
        "schema": "modern",
        "table": "software",
        "metaFields": {
          "id": "id"
        }
      },
      "attributes": [
        {
          "type": "String",
          "name": "lang"
        },
        {
          "type": "String",
          "name": "name"
        }
      ]
    }
  ],
  "edges": [
    {
      "label": "knows",
      "mappedTableSource": {
        "catalog": "duckdb_data",
        "schema": "modern",
        "table": "knows",
        "metaFields": {
          "id": "id",
          "from": "from_id",
          "to": "to_id"
        }
      },
      "from": "person",
      "to": "person",
      "attributes": [
        {
          "type": "Double",
          "name": "weight"
        }
      ]
    },
    {
      "label": "created",
      "mappedTableSource": {
        "catalog": "duckdb_data",
        "schema": "modern",
        "table": "created",
        "metaFields": {
          "id": "id",
          "from": "from_id",
          "to": "to_id"
        }
      },
      "from": "person",
      "to": "software",
      "attributes": [
        {
          "type": "Double",
          "name": "weight"
        }
      ]
    }
  ]
}

Once the schema is uploaded, the schema page shows the visualized graph schema as follows.

Alternative: Schema Uploading via CLI

curl -XPOST -H "content-type: application/json" --data-binary @./schema.json --user "puppygraph:puppygraph123" localhost:8081/schema

The response shows that graph schema has been uploaded successfully:

{"Status":"OK","Message":"Schema uploaded and gremlin server restarted"}

Querying the Graph

In this tutorial we will use the Gremlin query language to query the Graph. Gremlin is a graph query language developed by Apache TinkerPop. Prior knowledge of Gremlin is not necessary to follow the tutorial. To learn more about it, visit https://tinkerpop.apache.org/gremlin.html.

Queries are entered on the left side, and the right side displays the graph visualization.

The first query retrieves the property of the person named "marko".

g.V().has("name", "marko").valueMap()

The output is plain text like the following:

Rows: 1
age              29
name             marko

Now let's also leverage the visualization. The next query gets all the software created by people known to "marko".

g.V().has("name", "marko")
  .out("knows").out("created").path()

The output is as follows. There are two paths in the result as "marko" knows "josh" who created "lop" and "ripple".

Alternative: Querying the graph via CLI

Alternatively, we can query the graph via CLI.

docker exec -it puppygraph ./bin/console

The welcome screen appears as follows:

  ____                                     ____                          _
 |  _ \   _   _   _ __    _ __    _   _   / ___|  _ __    __ _   _ __   | |__
 | |_) | | | | | | '_ \  | '_ \  | | | | | |  _  | '__|  / _` | | '_ \  | '_ \
 |  __/  | |_| | | |_) | | |_) | | |_| | | |_| | | |    | (_| | | |_) | | | | |
 |_|      \__,_| | .__/  | .__/   \__, |  \____| |_|     \__,_| | .__/  |_| |_|
                 |_|     |_|      |___/                         |_|
Welcome to PuppyGraph!
version: 0.10

puppy-gremlin> 
g.V().has("name", "marko").valueMap()

Properties of the person named "marko":

puppy-gremlin> g.V().has("name", "marko").valueMap()
Done! Elapsed time: 0.059s, rows: 1
==>map[age:29 name:marko]
:exit

Cleaning up

docker compose down
docker volume rm puppy-duckdb

Last updated