Querying Unity Catalog Data as a Graph

A step-by-step tutorial to using PuppyGraph to query data in Unity Catalog

Summary

In this tutorial, you will:

  • Start a Unity Catalog server;

  • Create (Delta) tables locally under the catalog and load it with example data;

  • Start a PuppyGraph Docker container and query the data as a graph.

Prerequisites

Please ensure that docker is available. The installation can be verified by running:

docker version

See https://www.docker.com/get-started/ for more details on Docker.

Accessing the PuppyGraph Web UI requires a browser.

Unity Catalog Preparation

Starting Server

▶️ Build Unity Catalog from open source

git clone https://github.com/unitycatalog/unitycatalog
cd unitycatalog
build/sbt package

▶️ Run the following command under the same unitycatalog folder to start a Unity Catalog server at port 9000

./bin/start-uc-server -p 9000

Data Preparation

This tutorial is designed to be comprehensive and standalone, so it includes steps to populate local tables with Unity Catalog CLI.

▶️ Run the following shell script to populate tables and fill them with random data:

#!/bin/bash

unity_dir=`pwd`/unitycatalog
cli="${unity_dir}/bin/uc --server http://localhost:9000 "

${cli} catalog create --name puppygraph
${cli} schema create --name modern --catalog puppygraph
${cli} table create --full_name puppygraph.modern.person --columns "id STRING, name STRING, age INT" --storage_location file://${unity_dir}/etc/data/external/puppygraph/modern/person/ --format DELTA
${cli} table create --full_name puppygraph.modern.knowns --columns "id STRING, from_id STRING, to_id STRING, weight DOUBLE" --storage_location file://${unity_dir}/etc/data/external/puppygraph/modern/knowns/ --format DELTA
${cli} table create --full_name puppygraph.modern.software --columns "id STRING, name STRING, lang STRING" --storage_location file://${unity_dir}/etc/data/external/puppygraph/modern/software/ --format DELTA
${cli} table create --full_name puppygraph.modern.created --columns "id STRING, from_id STRING, to_id STRING, weight DOUBLE" --storage_location file://${unity_dir}/etc/data/external/puppygraph/modern/created/ --format DELTA
${cli} table write --full_name puppygraph.modern.person
${cli} table write --full_name puppygraph.modern.knowns
${cli} table write --full_name puppygraph.modern.software
${cli} table write --full_name puppygraph.modern.created

In this demo, the random value generator in the unitycatalog repository has been modified to produce a smaller range of values so that the values are interconnected to form a graph conceptually. The demo will be updated soon to use user-provided data.

The tables contain the following fields:

idnameage

String

String

Int

Starting PuppyGraph

▶️ Run the following command to start a PuppyGraph instance:

docker run -p 8081:8081 -p 8182:8182 -p 7687:7687 \
-v $(pwd)/unitycatalog:$(pwd)/unitycatalog \
--name puppy --rm -itd puppygraph/puppygraph:stable

Modeling a Graph

We then define a graph on top of the data tables we just created. Actually, this graph has the same schema as "Modern" graph defined by Apache Tinkerpop. Meanwhile, it contains random data populated from the Unity Catalog Table CLI.

A schema instructs PuppyGraph on mapping Delta tables into a graph. PuppyGraph offers various methods for schema creation. For this tutorial, we've already prepared a schema to help save time.

▶️ Create a PuppyGraph schema file schema.json with the following content:

schema.json
{
  "catalogs": [
    {
      "name": "puppygraph",
      "type": "deltalake",
      "metastore": {
        "type": "unity",
        "host": "http://<unity-catalog-hostname>:9000",
        "token": "test",
        "databricksCatalogName": "puppygraph"
      }
    }
  ],
  "graph": {
    "vertices": [
      {
        "label": "person",
        "oneToOne": {
          "tableSource": {
            "catalog": "puppygraph",
            "schema": "modern",
            "table": "person"
          },
          "id": {
            "fields": [
              {
                "type": "String",
                "field": "id",
                "alias": "id"
              }
            ]
          },
          "attributes": [
            {
              "type": "Int",
              "field": "age",
              "alias": "age"
            },
            {
              "type": "String",
              "field": "name",
              "alias": "name"
            }
          ]
        }
      },
      {
        "label": "software",
        "oneToOne": {
          "tableSource": {
            "catalog": "puppygraph",
            "schema": "modern",
            "table": "software"
          },
          "id": {
            "fields": [
              {
                "type": "String",
                "field": "id",
                "alias": "id"
              }
            ]
          },
          "attributes": [
            {
              "type": "String",
              "field": "lang",
              "alias": "lang"
            },
            {
              "type": "String",
              "field": "name",
              "alias": "name"
            }
          ]
        }
      }
    ],
    "edges": [
      {
        "label": "knowns",
        "fromVertex": "person",
        "toVertex": "person",
        "tableSource": {
          "catalog": "puppygraph",
          "schema": "modern",
          "table": "knowns"
        },
        "id": {
          "fields": [
            {
              "type": "String",
              "field": "id",
              "alias": "id"
            }
          ]
        },
        "fromId": {
          "fields": [
            {
              "type": "String",
              "field": "from_id",
              "alias": "from_id"
            }
          ]
        },
        "toId": {
          "fields": [
            {
              "type": "String",
              "field": "to_id",
              "alias": "to_id"
            }
          ]
        },
        "attributes": [
          {
            "type": "Double",
            "field": "weight",
            "alias": "weight"
          }
        ]
      },
      {
        "label": "created",
        "fromVertex": "person",
        "toVertex": "software",
        "tableSource": {
          "catalog": "puppygraph",
          "schema": "modern",
          "table": "created"
        },
        "id": {
          "fields": [
            {
              "type": "String",
              "field": "id",
              "alias": "id"
            }
          ]
        },
        "fromId": {
          "fields": [
            {
              "type": "String",
              "field": "from_id",
              "alias": "from_id"
            }
          ]
        },
        "toId": {
          "fields": [
            {
              "type": "String",
              "field": "to_id",
              "alias": "to_id"
            }
          ]
        },
        "attributes": [
          {
            "type": "Double",
            "field": "weight",
            "alias": "weight"
          }
        ]
      }
    ]
  }
}

▶️ Replace the <unity-catalog-hostname> by your machine host name or IP address.

▶️ Log into PuppyGraph Web UI at http://localhost:8081 with username and password of PuppyGraph.(default username: puppygraph, default password: puppygraph123)

▶️ Upload the schema by choosing the file schema.json in the Upload Graph Schema JSON block and clicking on Upload.

Once the schema is uploaded, the schema page shows the visualized graph schema as follows.

Alternative: Schema Uploading via CLI

▶️ Alternatively, run the following command to upload the schema file after filing <unity-catalog-hostname>:

curl -XPOST -H "content-type: application/json" --data-binary @./schema.json --user "puppygraph:puppygraph123" localhost:8081/schema

The response shows that graph schema has been uploaded successfully:

{"Status":"OK","Message":"Schema uploaded and gremlin server restarted"}

Querying the Graph

PuppyGraph provides a Dashboard that gives the summary of the graph.

One can also use the Interactive Query UI to further explore the graph by sending queries.

Cleaning up

▶️ Run the following command to shut down and remove the PuppyGraph

docker stop puppy

▶️ Also send Ctrl-C to stop the Unity Catalog Server.

Last updated