Connecting to DuckDB

It is also possible to query existing data on DuckDB directly without loading it to PuppyGraph. Here is an demo of the feature.

In the demo, the DuckDB data source stores people and referral information. To query the data as a graph, we model people as vertices and the referral relationship between people as edges.

Prerequisites

The demo assumes that PuppyGraph has been deployed at localhost according to the instruction in Launching PuppyGraph from AWS Marketplace or Launching PuppyGraph in Docker. For Docker deployment, in order to read DuckDB file, we must use volumn to share data cross containers.

In this demo, we use the username puppygraph and password puppygraph123.

For docker deployment, please follow these steps

docker volume rm duckdb-data
docker volume create duckdb-data
docker pull puppygraph/puppygraph:stable
docker run -p 8080:8080 -p 8081:8081 -p 8182:8182 -v duckdb-data:/home/share -d --name puppy --rm puppygraph/puppygraph:stable

Data Preparation (Optional)

The demo uses people and referral information as shown above.

The following command starts a DuckDB container through Docker and writes data to DuckDB. If using AMI Version, we can download DuckDB client from DuckDB website.

docker run --name duckdb -v duckdb-data:/home/share -d --restart=always puppygraph/duckdb-ubuntu:latest

After waiting for the DuckDB container to start, run follow command to start DuckDB interactive shell.

docker exec -it duckdb duckdb /home/share/demo.db

Now we can create a database and data table, and then write the data to DuckDB.

CREATE SCHEMA graph;
CREATE TABLE graph.person
(
    ID   VARCHAR,
    age  INTEGER,
    name VARCHAR
);
INSERT INTO graph.person(ID, age, name) VALUES  ('v1', 29, 'marko'), ('v2', 27, 'vadas');
CREATE TABLE graph.referral
(
    refId    VARCHAR,
    source   VARCHAR,
    referred VARCHAR,
    weight   DOUBLE
);
INSERT INTO graph.referral(refId, source, referred, weight) VALUES ('e1', 'v1', 'v2', 0.5);

Finally, type .exit to close the DuckDB client.

After writing the data to duckdb, user must stop DuckDB interactive shell before next steps, or other program such as PuppyGraph will fail to read data from the demo database.

Upload the schema

Now the data are ready in DuckDB. We need a PuppyGraph schema before querying it. Let's create a schema file duckdb.json:

duckdb.json
{
  "catalogs": [
    {
      "name": "jdbc_duckdb",
      "type": "duckdb",
      "jdbc": {
        "jdbcUri": "jdbc:duckdb:/home/share/demo.db",
        "driverClass": "org.duckdb.DuckDBDriver",
        "driverUrl": "https://repo1.maven.org/maven2/org/duckdb/duckdb_jdbc/0.9.1/duckdb_jdbc-0.9.1.jar"
      }
    }
  ],
  "vertices": [
    {
      "label": "person",
      "mappedTableSource": {
        "catalog": "jdbc_duckdb",
        "schema": "graph",
        "table": "person",
        "metaFields": {
          "id": "ID"
        }
      },
      "attributes": [
        {
          "type": "Int",
          "name": "age"
        },
        {
          "type": "String",
          "name": "name"
        }
      ]
    }
  ],
  "edges": [
    {
      "label": "knows",
      "mappedTableSource": {
        "catalog": "jdbc_duckdb",
        "schema": "graph",
        "table": "referral",
        "metaFields": {
          "id": "refId",
          "from": "source",
          "to": "referred"
        }
      },
      "from": "person",
      "to": "person",
      "attributes": [
        {
          "type": "Double",
          "name": "weight"
        }
      ]
    }
  ]
}

Here are some notes on this schema:

  1. A catalog jdbc_duckdb is added to specify the remote data source in DuckDB.

    • Set type to duckdb.

    • Set driverClass to org.duckdb.DuckDBDriver.

    • driverUrl: Please provide a URL where PuppyGraph can find the DuckDB driver.

  2. The label of the vertices and edges do not have to be the same as the names of corresponding tables in DuckDB. There is a mappedTableSource field in each of the vertex and edge types specifying the actual schema (graph) and table (referral).

  3. Additionally, the mappedTableSource marks meta columns in the tables. For example, the fieldsfrom and to describe which columns in the table form the endpoints of edges.

Now we can upload the schema file duckdb.json to PuppyGraph with the following shell command, assuming that the PuppyGraph is running on localhost:

curl -XPOST -H "content-type: application/json" --data-binary @./duckdb.json --user "puppygraph:puppygraph123" localhost:8081/schema

Query the data

Connecting to PuppyGraph at http://localhost:8081 and start gremlin console from the "Query" section:

[PuppyGraph]> console
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph

Now we have connected to the Gremlin Console. We can query the graph:

gremlin> g.V().hasLabel("person").out("knows").values("name")
==>vadas

Last updated