Querying Unity Catalog Data as a Graph

A step-by-step tutorial on using PuppyGraph to query data in Unity Catalog

Summary

In this tutorial, you will:

  • Start a Unity Catalog server;

  • Create (Delta) tables locally under the catalog and load them with example data;

  • Start a PuppyGraph Docker container and query the data as a graph.

Prerequisites

Please ensure that Docker is available. The installation can be verified by running:

docker version

See https://www.docker.com/get-started/ for more details on Docker.

Accessing the PuppyGraph Web UI requires a browser.

Unity Catalog Preparation

Starting Server

git clone https://github.com/unitycatalog/unitycatalog
cd unitycatalog
build/sbt package
./bin/start-uc-server -p 9000
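
The steps that follow assume the server is already accepting requests. A minimal sketch of a readiness check, assuming curl is installed; wait_for_url is a hypothetical helper name, not part of Unity Catalog, and the retry counts are arbitrary defaults:

```shell
# Poll a URL until it answers successfully or the attempt budget runs out.
# Arguments: URL, max attempts (default 30), seconds between attempts (default 2).
wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP error codes as failure, -o /dev/null: discard the body
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      echo "ready after ${i} attempt(s): ${url}"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up after ${attempts} attempt(s): ${url}" >&2
  return 1
}
```

For example, wait_for_url http://localhost:9000/api/2.1/unity-catalog/catalogs would block until the server started above responds, assuming it serves the catalog listing at that REST path; any other URL the server answers works equally well.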

Data Preparation

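This tutorial is designed to be self-contained, so it includes the steps for populating local tables with the Unity Catalog CLI. The script below expects to be run from the directory that contains the unitycatalog checkout, while the server is still running.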

#!/bin/bash
# Run from the directory that contains the unitycatalog checkout.

unity_dir="$(pwd)/unitycatalog"
# Left unquoted on use so the shell splits it into the command and its flags.
cli="${unity_dir}/bin/uc --server http://localhost:9000"
data_dir="file://${unity_dir}/etc/data/external/puppygraph/modern"

# Create the catalog and schema that hold the example tables.
${cli} catalog create --name puppygraph
${cli} schema create --name modern --catalog puppygraph

# Create two vertex tables (person, software) and two edge tables (knowns, created).
${cli} table create --full_name puppygraph.modern.person --columns "id STRING, name STRING, age INT" --storage_location "${data_dir}/person/" --format DELTA
${cli} table create --full_name puppygraph.modern.knowns --columns "id STRING, from_id STRING, to_id STRING, weight DOUBLE" --storage_location "${data_dir}/knowns/" --format DELTA
${cli} table create --full_name puppygraph.modern.software --columns "id STRING, name STRING, lang STRING" --storage_location "${data_dir}/software/" --format DELTA
${cli} table create --full_name puppygraph.modern.created --columns "id STRING, from_id STRING, to_id STRING, weight DOUBLE" --storage_location "${data_dir}/created/" --format DELTA

# Fill each table with randomly generated rows.
${cli} table write --full_name puppygraph.modern.person
${cli} table write --full_name puppygraph.modern.knowns
${cli} table write --full_name puppygraph.modern.software
${cli} table write --full_name puppygraph.modern.created
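
Before moving on, it can be worth confirming that the tables exist and hold rows. A sketch under the same assumptions as the script above (run from the directory containing the unitycatalog checkout, server on port 9000); the uc wrapper and check_tables are hypothetical helper names introduced here:

```shell
unity_dir="$(pwd)/unitycatalog"

# Thin wrapper so every call targets the local server started earlier.
uc() { "${unity_dir}/bin/uc" --server http://localhost:9000 "$@"; }

# List the tables in puppygraph.modern, then read each one back.
# Stops at the first command that fails.
check_tables() {
  local table
  uc table list --catalog puppygraph --schema modern || return 1
  for table in person knowns software created; do
    echo "== puppygraph.modern.${table} =="
    uc table read --full_name "puppygraph.modern.${table}" || return 1
  done
}
```

Calling check_tables after the script finishes should print the table listing followed by the generated rows of each table.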

In this demo, the random value generator in the unitycatalog repository has been modified to draw from a smaller range of values, so that values repeat across tables and the rows connect into a graph. The demo will be updated soon to use user-provided data.

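The tables contain the following fields:

  • person: id (STRING), name (STRING), age (INT)

  • knowns: id (STRING), from_id (STRING), to_id (STRING), weight (DOUBLE)

  • software: id (STRING), name (STRING), lang (STRING)

  • created: id (STRING), from_id (STRING), to_id (STRING), weight (DOUBLE)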

Starting PuppyGraph

docker run -p 8081:8081 -p 8182:8182 -p 7687:7687 \
-v $(pwd)/unitycatalog:/home/ubuntu/unitycatalog \
--name puppy --rm -itd puppygraph/puppygraph:stable

Modeling a Graph

Next, we define a graph on top of the tables we just created. The graph has the same schema as the "Modern" graph defined by Apache TinkerPop, but it contains the random data written by the Unity Catalog table CLI.

A schema tells PuppyGraph how to map the Delta tables to a graph. PuppyGraph offers several ways to create a schema. To save time, this tutorial uses a schema that has already been prepared.

schema.json
{
  "catalogs": [
    {
      "name": "puppygraph",
      "type": "deltalake",
      "metastore": {
        "type": "unity",
        "host": "http://<unity-catalog-hostname>:9000",
        "token": "test",
        "databricksCatalogName": "puppygraph"
      }
    }
  ],
  "vertices": [
    {
      "label": "person",
      "attributes": [
        {
          "type": "String",
          "name": "name"
        },
        {
          "type": "Int",
          "name": "age"
        }
      ],
      "mappedTableSource": {
        "catalog": "puppygraph",
        "schema": "modern",
        "table": "person",
        "metaFields": {
          "id": "id"
        }
      }
    },
    {
      "label": "software",
      "attributes": [
        {
          "type": "String",
          "name": "name"
        },
        {
          "type": "String",
          "name": "lang"
        }
      ],
      "mappedTableSource": {
        "catalog": "puppygraph",
        "schema": "modern",
        "table": "software",
        "metaFields": {
          "id": "id"
        }
      }
    }
  ],
  "edges": [
    {
      "label": "knowns",
      "from": "person",
      "to": "person",
      "attributes": [
        {
          "type": "Double",
          "name": "weight"
        }
      ],
      "mappedTableSource": {
        "catalog": "puppygraph",
        "schema": "modern",
        "table": "knowns",
        "metaFields": {
          "from": "from_id",
          "id": "id",
          "to": "to_id"
        }
      }
    },
    {
      "label": "created",
      "from": "person",
      "to": "software",
      "attributes": [
        {
          "type": "Double",
          "name": "weight"
        }
      ],
      "mappedTableSource": {
        "catalog": "puppygraph",
        "schema": "modern",
        "table": "created",
        "metaFields": {
          "from": "from_id",
          "id": "id",
          "to": "to_id"
        }
      }
    }
  ]
}
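
The host value in the schema contains the placeholder <unity-catalog-hostname>, which has to be replaced with an address that the PuppyGraph container can use to reach the Unity Catalog server. A small sketch using sed; render_schema is a hypothetical helper, and the right address depends on your Docker networking setup:

```shell
# Print a copy of the schema with the metastore host placeholder filled in.
# Arguments: hostname or IP, schema file (default schema.json).
# The hostname must not contain "|", which is used as the sed delimiter.
render_schema() {
  local host="$1" template="${2:-schema.json}"
  sed "s|<unity-catalog-hostname>|${host}|g" "$template"
}
```

For instance, render_schema 192.168.1.10 > schema.rendered.json writes a filled-in copy and leaves the template untouched; the IP address here is only an illustration.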

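Upload schema.json through the PuppyGraph Web UI, which the docker run command above exposes at http://localhost:8081. Once the schema is uploaded, the schema page shows the visualized graph schema.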

Alternative: Schema Uploading via CLI

curl -XPOST -H "content-type: application/json" --data-binary @./schema.json --user "puppygraph:puppygraph123" localhost:8081/schema

The response shows that the graph schema has been uploaded successfully:

{"Status":"OK","Message":"Schema uploaded and gremlin server restarted"}
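
In a script, it helps to fail fast when the upload does not succeed. A minimal sketch that wraps the curl call above and checks the reply; upload_schema is a hypothetical helper name, and matching on the literal text "Status":"OK" assumes the response format shown above:

```shell
# Upload a schema file and succeed only if the reply reports Status OK.
# Argument: schema file (default schema.json).
upload_schema() {
  local file="${1:-schema.json}" response
  response="$(curl -sS -XPOST -H "content-type: application/json" \
    --data-binary "@${file}" --user "puppygraph:puppygraph123" \
    localhost:8081/schema)" || return 1
  echo "$response"
  # Any reply without the OK status counts as a failure.
  case "$response" in
    *'"Status":"OK"'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller can then write upload_schema schema.json || exit 1 to stop a setup script as soon as the upload fails.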

Querying the Graph

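PuppyGraph provides a Dashboard that summarizes the graph.

You can also use the Interactive Query UI to explore the graph further by sending queries. For example, a Gremlin query such as g.V().hasLabel('person').out('created').values('name') returns the names of the software vertices that person vertices created.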

Cleaning up

docker stop puppy
