Querying Unity Catalog Data as a Graph
Summary
In this tutorial, you will:
- Start a Unity Catalog server;
- Create (Delta) tables locally under the catalog and load it with example data;
- Start a PuppyGraph Docker container and query the data as a graph.
Prerequisites
Please ensure that docker
is available. The installation can be verified by running:
See https://www.docker.com/get-started/ for more details on Docker.
Accessing the PuppyGraph Web UI requires a browser.
Unity Catalog Preparation
Starting Server
Build Unity Catalog from open source
Run the following command under the same unitycatalog
folder to start a Unity Catalog server at port 9000
Data Preparation
This tutorial is designed to be comprehensive and standalone, so it includes steps to populate local tables with Unity Catalog CLI.
Run the following shell script to populate tables and fill them with random data:
#!/bin/bash
unity_dir=`pwd`/unitycatalog
cli="${unity_dir}/bin/uc --server http://localhost:9000 "
${cli} catalog create --name puppygraph
${cli} schema create --name modern --catalog puppygraph
${cli} table create --full_name puppygraph.modern.person --columns "id STRING, name STRING, age INT" --storage_location file://${unity_dir}/etc/data/external/puppygraph/modern/person/ --format DELTA
${cli} table create --full_name puppygraph.modern.knowns --columns "id STRING, from_id STRING, to_id STRING, weight DOUBLE" --storage_location file://${unity_dir}/etc/data/external/puppygraph/modern/knowns/ --format DELTA
${cli} table create --full_name puppygraph.modern.software --columns "id STRING, name STRING, lang STRING" --storage_location file://${unity_dir}/etc/data/external/puppygraph/modern/software/ --format DELTA
${cli} table create --full_name puppygraph.modern.created --columns "id STRING, from_id STRING, to_id STRING, weight DOUBLE" --storage_location file://${unity_dir}/etc/data/external/puppygraph/modern/created/ --format DELTA
${cli} table write --full_name puppygraph.modern.person
${cli} table write --full_name puppygraph.modern.knowns
${cli} table write --full_name puppygraph.modern.software
${cli} table write --full_name puppygraph.modern.created
In this demo, the random value generator in the unitycatalog
repository has been modified to produce a smaller range of values so that the values are interconnected to form a graph conceptually. The demo will be updated soon to use user-provided data.
The tables contain the following fields:
id | name | age |
---|---|---|
String | String | Int |
id | name | lang |
---|---|---|
String | String | String |
id | from_id | to_id | weight |
---|---|---|---|
String | String | String | Double |
id | from_id | to_id | weight |
---|---|---|---|
String | String | String | Double |
Starting PuppyGraph
Run the following command to start a PuppyGraph instance:
docker run -p 8081:8081 -p 8182:8182 -p 7687:7687 \
-v $(pwd)/unitycatalog:$(pwd)/unitycatalog \
--name puppy --rm -itd puppygraph/puppygraph:stable
Modeling a Graph
We then define a graph on top of the data tables we just created. Actually, this graph has the same schema as "Modern" graph defined by Apache Tinkerpop. Meanwhile, it contains random data populated from the Unity Catalog Table CLI.
A schema instructs PuppyGraph on mapping Delta tables into a graph. PuppyGraph offers various methods for schema creation. For this tutorial, we've already prepared a schema to help save time.
Create a PuppyGraph schema file schema.json
with the following content:
schema.json
{
"catalogs": [
{
"name": "puppygraph",
"type": "deltalake",
"metastore": {
"type": "unity",
"host": "http://<unity-catalog-hostname>:9000",
"token": "test",
"databricksCatalogName": "puppygraph"
}
}
],
"graph": {
"vertices": [
{
"label": "person",
"oneToOne": {
"tableSource": {
"catalog": "puppygraph",
"schema": "modern",
"table": "person"
},
"id": {
"fields": [
{
"type": "String",
"field": "id",
"alias": "id"
}
]
},
"attributes": [
{
"type": "Int",
"field": "age",
"alias": "age"
},
{
"type": "String",
"field": "name",
"alias": "name"
}
]
}
},
{
"label": "software",
"oneToOne": {
"tableSource": {
"catalog": "puppygraph",
"schema": "modern",
"table": "software"
},
"id": {
"fields": [
{
"type": "String",
"field": "id",
"alias": "id"
}
]
},
"attributes": [
{
"type": "String",
"field": "lang",
"alias": "lang"
},
{
"type": "String",
"field": "name",
"alias": "name"
}
]
}
}
],
"edges": [
{
"label": "knowns",
"fromVertex": "person",
"toVertex": "person",
"tableSource": {
"catalog": "puppygraph",
"schema": "modern",
"table": "knowns"
},
"id": {
"fields": [
{
"type": "String",
"field": "id",
"alias": "id"
}
]
},
"fromId": {
"fields": [
{
"type": "String",
"field": "from_id",
"alias": "from_id"
}
]
},
"toId": {
"fields": [
{
"type": "String",
"field": "to_id",
"alias": "to_id"
}
]
},
"attributes": [
{
"type": "Double",
"field": "weight",
"alias": "weight"
}
]
},
{
"label": "created",
"fromVertex": "person",
"toVertex": "software",
"tableSource": {
"catalog": "puppygraph",
"schema": "modern",
"table": "created"
},
"id": {
"fields": [
{
"type": "String",
"field": "id",
"alias": "id"
}
]
},
"fromId": {
"fields": [
{
"type": "String",
"field": "from_id",
"alias": "from_id"
}
]
},
"toId": {
"fields": [
{
"type": "String",
"field": "to_id",
"alias": "to_id"
}
]
},
"attributes": [
{
"type": "Double",
"field": "weight",
"alias": "weight"
}
]
}
]
}
}
Replace the <unity-catalog-hostname>
by your machine host name or IP address.
Log into PuppyGraph Web UI at http://localhost:8081 with username and password of PuppyGraph.(default username: puppygraph
, default password: puppygraph123
)
Upload the schema by choosing the file schema.json
in the Upload Graph Schema JSON
block and clicking on Upload
.
Once the schema is uploaded, the schema page shows the visualized graph schema as follows.
Alternative: Schema Uploading via CLI
Alternatively, run the following command to upload the schema file after filing <unity-catalog-hostname>
:
curl -XPOST -H "content-type: application/json" --data-binary @./schema.json --user "puppygraph:puppygraph123" localhost:8081/schema
The response shows that graph schema has been uploaded successfully:
Querying the Graph
PuppyGraph provides a Dashboard that gives the summary of the graph.
One can also use the Interactive Query UI to further explore the graph by sending queries.
Cleaning up
Run the following command to shut down and remove the PuppyGraph
Also send Ctrl-C to stop the Unity Catalog Server.