Skip to content

Querying OneLake Data as a Graph

Summary

In this tutorial, you will:

Prerequisites

Data Preparation

▶ Enable automatic table virtualization of Delta Lake tables to the Iceberg format by turning on the delegated OneLake setting named Enable Delta Lake to Apache Iceberg table format virtualization in your workspace settings.

▶ Create a schema-enabled lakehouse in your Fabric workspace.

▶ Create a notebook in the lakehouse, choose Spark SQL as the language, and run the following SQL commands to create the schema and tables, then insert data.

Spark SQL
CREATE SCHEMA modern;

CREATE TABLE modern.person (
  id string,
  name string,
  age int
) USING DELTA;

INSERT INTO modern.person VALUES
  ('v1', 'marko', 29),
  ('v2', 'vadas', 27),
  ('v4', 'josh', 32),
  ('v6', 'peter', 35);

CREATE TABLE modern.software (
  id string,
  name string,
  lang string
) USING DELTA;

INSERT INTO modern.software VALUES
  ('v3', 'lop', 'java'),
  ('v5', 'ripple', 'java');

CREATE TABLE modern.created (
  id string,
  from_id string,
  to_id string,
  weight double
) USING DELTA;

INSERT INTO modern.created VALUES
  ('e9', 'v1', 'v3', 0.4),
  ('e10', 'v4', 'v5', 1.0),
  ('e11', 'v4', 'v3', 0.4),
  ('e12', 'v6', 'v3', 0.2);

CREATE TABLE modern.knows (
  id string,
  from_id string,
  to_id string,
  weight double
) USING DELTA;

INSERT INTO modern.knows VALUES
  ('e7', 'v1', 'v2', 0.5),
  ('e8', 'v1', 'v4', 1.0);

You should now see the tables created in the lakehouse under the modern schema. Confirm that your Delta Lake table has converted successfully to the virtual Iceberg format. You can do this by examining the directory behind the table. See more details in virtualizing Delta Lake tables as Iceberg.

The above SQL creates the following tables:

id name age
v1 marko 29
v2 vadas 27
v4 josh 32
v6 peter 35
id name lang
v3 lop java
v5 ripple java
id from_id to_id weight
e7 v1 v2 0.5
e8 v1 v4 1.0
id from_id to_id weight
e9 v1 v3 0.4
e10 v4 v5 1.0
e11 v4 v3 0.4
e12 v6 v3 0.2

Deployment

▶ Start a PuppyGraph docker container:

docker run -p 8081:8081 -p 8182:8182 -p 7687:7687 -e PUPPYGRAPH_PASSWORD=puppygraph123  -e QUERY_TIMEOUT=5m -d --name puppy --rm --pull=always puppygraph/puppygraph:0.101

▶ Log into the PuppyGraph Web UI at http://localhost:8081 with puppygraph as the username and puppygraph123 as the password.

Modeling a Graph

We then define a graph on top of the data tables we just created. Actually, this is the "Modern" graph defined by Apache Tinkerpop.

Modern Graph

Modern Graph

PuppyGraph defines graph modeling by using a schema. There are two methods to model the graph:

  1. Use the graph schema builder to build the schema in the Web UI.
    • Click on Create graph schema to create a new graph schema.
    • Configure the data source.
    • Map data to graph step by step.
  2. Compose a schema JSON file and upload it.
    • Replace the placeholders in the file schema.json below.
    • Choose the file and upload it.

▶ In this tutorial we use the second method, while for the first method, you essentially fill in the same information. Create a file named schema.json with the following content. Replace the placeholders with your actual values to configure the connection using the Iceberg REST Catalog via OneLake Table APIs for Iceberg:

Placeholder Description
<fabric_workspace_id> The ID of your Microsoft Fabric workspace
<fabric_data_item_id> The ID of the lakehouse data item
<client_id> The client ID of the service principal
<client_secret> The client secret of the service principal
<tenant_id> Your Microsoft Entra ID tenant ID - see endpoint
schema.json
{
  "catalogs": [
    {
      "name": "test_onelake_table_api",
      "type": "iceberg",
      "metastore": {
        "type": "rest",
        "uri": "https://onelake.table.fabric.microsoft.com/iceberg",
        "warehouse": "<fabric_workspace_id>/<fabric_data_item_id>",
        "security": "OAUTH2",
        "credential": "<client_id>:<client_secret>",
        "enableIcebergMetaCache": "true",
        "icebergMetaCacheTTL": "180",
        "scope": "https://storage.azure.com/.default",
        "oauthServerUri": "https://login.microsoftonline.com/<tenant_id>/oauth2/v2.0/token"
      },
      "storage": {
        "type": "AzureDLS2",
        "clientId": "<client_id>",
        "clientSecret": "<client_secret>",
        "clientEndpoint": "https://login.microsoftonline.com/<tenant_id>/oauth2/v2.0/token"
      }
    }
  ],
  "graph": {
    "vertices": [
      {
        "label": "person",
        "oneToOne": {
          "tableSource": {
            "catalog": "test_onelake_table_api",
            "schema": "modern",
            "table": "person"
          },
          "id": {
            "fields": [
              {
                "type": "STRING",
                "field": "id",
                "alias": "puppy_id_id"
              }
            ]
          },
          "attributes": [
            {
              "type": "STRING",
              "field": "name",
              "alias": "name"
            }
          ]
        }
      },
      {
        "label": "software",
        "oneToOne": {
          "tableSource": {
            "catalog": "test_onelake_table_api",
            "schema": "modern",
            "table": "software"
          },
          "id": {
            "fields": [
              {
                "type": "STRING",
                "field": "id",
                "alias": "puppy_id_id"
              }
            ]
          },
          "attributes": [
            {
              "type": "STRING",
              "field": "name",
              "alias": "name"
            },
            {
              "type": "STRING",
              "field": "lang",
              "alias": "lang"
            }
          ]
        }
      }
    ],
    "edges": [
      {
        "label": "created",
        "fromVertex": "person",
        "toVertex": "software",
        "tableSource": {
          "catalog": "test_onelake_table_api",
          "schema": "modern",
          "table": "created"
        },
        "id": {
          "fields": [
            {
              "type": "STRING",
              "field": "id",
              "alias": "puppy_id_id"
            }
          ]
        },
        "fromId": {
          "fields": [
            {
              "type": "STRING",
              "field": "from_id",
              "alias": "puppy_from_from_id"
            }
          ]
        },
        "toId": {
          "fields": [
            {
              "type": "STRING",
              "field": "to_id",
              "alias": "puppy_to_to_id"
            }
          ]
        }
      },
      {
        "label": "knows",
        "fromVertex": "person",
        "toVertex": "person",
        "tableSource": {
          "catalog": "test_onelake_table_api",
          "schema": "modern",
          "table": "knows"
        },
        "id": {
          "fields": [
            {
              "type": "STRING",
              "field": "id",
              "alias": "puppy_id_id"
            }
          ]
        },
        "fromId": {
          "fields": [
            {
              "type": "STRING",
              "field": "from_id",
              "alias": "puppy_from_from_id"
            }
          ]
        },
        "toId": {
          "fields": [
            {
              "type": "STRING",
              "field": "id",
              "alias": "puppy_to_id"
            }
          ]
        }
      }
    ]
  }
}

▶ Log into PuppyGraph Web UI at http://localhost:8081 with username puppygraph and password puppygraph123.

▶ Upload the schema by selecting the file schema.json in the Upload Graph Schema JSON block and clicking on Upload.

Once the schema is uploaded, the schema page shows the visualized graph schema.

Alternative: Schema Uploading via CLI

▶ Alternatively, run the following command to upload the schema file:

curl -XPOST -H "content-type: application/json" --data-binary @./schema.json --user "puppygraph:puppygraph123" localhost:8081/schema

The response shows that graph schema has been uploaded successfully:

{"Status":"OK","Message":"Schema uploaded and gremlin server restarted"}

Query the Graph

▶ Navigate to Query in the Web UI. Use Graph Query for Gremlin/openCypher queries with visualization.

Example Queries

  • Retrieve a vertex named 'marko'.

    Gremlin:

    g.V().has("name", "marko").valueMap()
    

    openCypher:

    MATCH (v {name: 'marko'}) RETURN v
    
  • Retrieve the paths from "marko" to the software created by those whom "marko" knows.

    Gremlin:

    g.V().has("name", "marko")
    .out("knows").out("created").path()
    

    openCypher:

    MATCH p=(v {name: 'marko'})-[:knows]->()-[:created]->()
    RETURN p
    

Cleanup

▶ To stop and remove the PuppyGraph Docker container, run:

docker stop puppy