Querying Azure OneLake Data as a Graph

Summary

In this tutorial, you will:

  • Create an Azure OneLake data lake and load example data;
  • Deploy PuppyGraph in Docker and query OneLake data as a graph.

Prerequisites

Before starting, you need OAuth 2.0 client credentials: a Client ID, a Client Secret, and a Tenant ID. Some configurations also require the client (token) endpoint, which has the form https://login.microsoftonline.com/<your-tenant-id>/oauth2/token.
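
Once you have collected these values (the steps below show where to find them), you can optionally sanity-check the credentials by requesting a token from this endpoint. The sketch below assumes the v1.0 token endpoint format shown above and the Azure Storage resource; replace the placeholders with your own values.

# Optional sanity check: request a token using the OAuth 2.0 client credentials grant.
curl -s -X POST "https://login.microsoftonline.com/<your-tenant-id>/oauth2/token" \
  -d "grant_type=client_credentials" \
  -d "client_id=<your-client-id>" \
  -d "client_secret=<your-client-secret>" \
  -d "resource=https://storage.azure.com/"
# A JSON response containing "access_token" indicates the credentials are valid.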

The steps for obtaining the credentials as well as the access to Azure storage are as follows:

▶ Go to https://portal.azure.com/#home

▶ Search for 'App registrations' and click 'New registration'.

App Registration

▶ Give the new app a name; the other settings can be left at their defaults.

App Configuration

▶ Open the app's overview page, where you can find the client ID and tenant ID.

App Overview

▶ Then, on the app overview page, click 'Certificates & secrets'.

Manage Secrets

▶ Click 'New client secret', fill in the description and expiration, then confirm. Record the secret value in a safe place; it is shown only once and will be used as the client secret credential later.

Create Client Secret

Next, grant your app access to Azure Storage:

▶ Search for 'storage accounts' in the portal, open the page, and click 'Create'.

Create Storage Accounts

It is fine to set only the account name and leave the other fields at their defaults.

An administrator can then go to your storage account, open 'Access Control (IAM)', and click 'Add role assignment' to grant your app access to the resource: choose the role Storage Blob Data Contributor (or a higher permission), set 'Assign access to' to 'User, group, or service principal', select your app, and save. Your app will then be able to access the storage through OAuth 2.0.
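
If you prefer the CLI, the same role assignment can be created with the Azure CLI. This is only a sketch; the subscription, resource group, and storage account names below are placeholders you would replace with your own.

# Illustrative only: grant the app 'Storage Blob Data Contributor' on the storage account.
az role assignment create \
  --assignee "<your-client-id>" \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"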

Grant Storage Access

If it succeeds, your administrator will see a confirmation as follows:

Success Confirmation

Once your credentials are ready, create your own workspace and lakehouse in Microsoft Fabric. This tutorial uses demo_ws as the example workspace and demo_lh as the example lakehouse. Remember that you also need to grant Fabric access to the app you created in the previous step, as described below.

To prepare the environment in Fabric, please follow the steps below:

▶ Go to https://app.fabric.microsoft.com/, log in (starting a trial if needed), navigate to the portal, click 'New workspace', fill in the name, and confirm.

Create Workspace

▶ Select and enter this workspace ('demo_ws' in this example).

Enter Workspace

▶ Then click 'New item', search for 'lakehouse', and click the tile to create a lakehouse. (Only a name is needed; it is 'demo_lh' in this example.)

Create Lakehouse

The path to the lakehouse now looks like abfss://demo_ws@onelake.dfs.fabric.microsoft.com/demo_lh.lakehouse; you will access the OneLake data through this path.

▶ Go back to the workspace page, and click 'Manage access'.

Manage Access

▶ Search for your app name, set the permission to 'Member', and click 'Add'.

Add Member

Now all the setup is ready. You can view the structure of folders and files on the lakehouse page.

Explore Data
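
To confirm outside of Fabric that the app can reach the lakehouse, you can optionally list the Files directory through the OneLake DFS endpoint, which is compatible with the ADLS Gen2 REST API. This is a rough sketch assuming the demo_ws/demo_lh names from this tutorial and an ACCESS_TOKEN obtained with the token request shown earlier; the query parameters follow the ADLS Gen2 "list paths" operation.

# Assumes ACCESS_TOKEN holds a token issued for resource https://storage.azure.com/.
curl -s \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "x-ms-version: 2021-06-08" \
  "https://onelake.dfs.fabric.microsoft.com/demo_ws?resource=filesystem&recursive=false&directory=demo_lh.lakehouse/Files"
# A JSON list of paths indicates the app has access to the lakehouse.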

Lastly, please ensure that docker compose is available. The installation can be verified by running:

docker compose version

See https://docs.docker.com/compose/install/ for Docker Compose installation instructions and https://www.docker.com/get-started/ for more details on Docker.

Accessing the PuppyGraph Web UI requires a browser. However, the tutorial offers alternative instructions for those who wish to exclusively use the CLI.

Deployment

▶ Before we start, we need to prepare some prerequisite files and folders for Docker Compose. Put them all in the folder from which you will run docker compose; an illustrative layout is sketched below.
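
The sketch below shows roughly how the working directory might look once everything from this section is in place. It is only for orientation: the folder name is arbitrary, and you only need the files for the variant (Iceberg or Delta Lake) you choose.

test_onelake/                                  # example working directory
├── docker-compose.yaml                        # Iceberg or Delta Lake variant (see below)
├── schema.json                                # PuppyGraph schema, created later in this tutorial
├── iceberg-spark-config/spark-defaults.conf   # Iceberg variant
├── iceberg-azure-jars/                        # Iceberg variant: JARs from iceberg-azure-download-jars.sh
├── deltalake-spark-config/spark-defaults.conf # Delta Lake variant
├── hive-azure-jars/                           # Delta Lake variant: JARs from hive-azure-download-jars.sh
├── pg_hba.conf                                # Delta Lake variant
├── init_hive.sh                               # Delta Lake variant
├── hive-entrypoint.sh                         # Delta Lake variant
├── hive-site.xml                              # Delta Lake variant
└── postgresql-42.7.3.jar                      # Delta Lake variant: PostgreSQL JDBC driver (see note below)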

iceberg-spark-config/spark-defaults.conf configures Spark to use Iceberg extensions, integrate with the REST Catalog, and access Azure OneLake storage using OAuth credentials.

spark-defaults.conf
# Iceberg Spark Extensions
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

spark.sql.catalog.iceberg_rest=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.iceberg_rest.type=rest
spark.sql.catalog.iceberg_rest.uri=http://iceberg-rest:8181
spark.sql.catalog.iceberg_rest.io-impl=org.apache.iceberg.hadoop.HadoopFileIO

spark.hadoop.fs.abfss.impl=org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem
spark.hadoop.fs.AbstractFileSystem.abfss.impl=org.apache.hadoop.fs.azurebfs.Abfs

spark.hadoop.fs.azure.account.auth.type.onelake.dfs.fabric.microsoft.com=OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.onelake.dfs.fabric.microsoft.com=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.onelake.dfs.fabric.microsoft.com=<oauth-2.0-client-id>
spark.hadoop.fs.azure.account.oauth2.client.secret.onelake.dfs.fabric.microsoft.com=<oauth-2.0-client-secret>
spark.hadoop.fs.azure.account.oauth2.client.endpoint.onelake.dfs.fabric.microsoft.com=<oauth-2.0-token-endpoint>

spark.master=local[*]

iceberg-azure-jars is a folder that contains the dependency JARs that enable Spark to access Azure OneLake (via the abfss:// protocol), including Azure SDKs, the Hadoop Azure connector, and the Iceberg runtime. To prepare the jars, run the following script inside iceberg-azure-jars:

iceberg-azure-download-jars.sh
#!/bin/bash

# Azure
wget https://repo1.maven.org/maven2/com/azure/azure-core/1.37.0/azure-core-1.37.0.jar
wget https://repo1.maven.org/maven2/com/azure/azure-core-http-netty/1.13.3/azure-core-http-netty-1.13.3.jar
wget https://repo1.maven.org/maven2/com/azure/azure-identity/1.8.0/azure-identity-1.8.0.jar
wget https://repo1.maven.org/maven2/com/azure/azure-storage-blob/12.31.3/azure-storage-blob-12.31.3.jar
wget https://repo1.maven.org/maven2/com/azure/azure-storage-common/12.30.2/azure-storage-common-12.30.2.jar
wget https://repo1.maven.org/maven2/com/azure/azure-storage-file-datalake/12.24.3/azure-storage-file-datalake-12.24.3.jar
wget https://repo1.maven.org/maven2/com/azure/azure-xml/1.2.0/azure-xml-1.2.0.jar

# Iceberg
wget https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-core/1.10.0/iceberg-core-1.10.0.jar
wget https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-azure/1.10.0/iceberg-azure-1.10.0.jar
wget https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.4_2.12/1.10.0/iceberg-spark-runtime-3.4_2.12-1.10.0.jar

# Netty
wget https://repo1.maven.org/maven2/io/netty/netty-buffer/4.1.93.Final/netty-buffer-4.1.93.Final.jar
wget https://repo1.maven.org/maven2/io/netty/netty-common/4.1.93.Final/netty-common-4.1.93.Final.jar
wget https://repo1.maven.org/maven2/io/netty/netty-codec/4.1.93.Final/netty-codec-4.1.93.Final.jar
wget https://repo1.maven.org/maven2/io/netty/netty-handler/4.1.93.Final/netty-handler-4.1.93.Final.jar
wget https://repo1.maven.org/maven2/io/netty/netty-resolver-dns/4.1.93.Final/netty-resolver-dns-4.1.93.Final.jar
wget https://repo1.maven.org/maven2/io/netty/netty-transport/4.1.93.Final/netty-transport-4.1.93.Final.jar

# Others
wget https://repo1.maven.org/maven2/net/minidev/json-smart/2.5.1/json-smart-2.5.1.jar
wget https://repo1.maven.org/maven2/com/nimbusds/lang-tag/1.6/lang-tag-1.6.jar
wget https://repo1.maven.org/maven2/net/minidev/accessors-smart/2.5.1/accessors-smart-2.5.1.jar
wget https://repo1.maven.org/maven2/com/nimbusds/nimbus-jose-jwt/9.37/nimbus-jose-jwt-9.37.jar
wget https://repo1.maven.org/maven2/com/nimbusds/oauth2-oidc-sdk/9.37.3/oauth2-oidc-sdk-9.37.3.jar
wget https://repo1.maven.org/maven2/org/reactivestreams/reactive-streams/1.0.4/reactive-streams-1.0.4.jar
wget https://repo1.maven.org/maven2/io/projectreactor/reactor-core/3.5.8/reactor-core-3.5.8.jar
wget https://repo1.maven.org/maven2/io/projectreactor/netty/reactor-netty-core/1.1.7/reactor-netty-core-1.1.7.jar
wget https://repo1.maven.org/maven2/io/projectreactor/netty/reactor-netty-http/1.1.7/reactor-netty-http-1.1.7.jar
wget https://repo1.maven.org/maven2/net/java/dev/jna/jna/5.13.0/jna-5.13.0.jar
wget https://repo1.maven.org/maven2/net/java/dev/jna/jna-platform/5.13.0/jna-platform-5.13.0.jar
wget https://repo1.maven.org/maven2/com/microsoft/azure/msal4j/1.14.1/msal4j-1.14.1.jar
wget https://repo1.maven.org/maven2/com/nimbusds/content-type/2.2/content-type-2.2.jar
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-azure/3.3.4/hadoop-azure-3.3.4.jar

The contents of the folder should then look like this:

iceberg-azure-jars
~/test_onelake/iceberg-azure-jars$ ls
accessors-smart-2.5.1.jar                iceberg-azure-1.10.0.jar                   netty-common-4.1.93.Final.jar
azure-core-1.37.0.jar                    iceberg-core-1.10.0.jar                    netty-handler-4.1.93.Final.jar
azure-core-http-netty-1.13.3.jar         iceberg-spark-runtime-3.4_2.12-1.10.0.jar  netty-resolver-dns-4.1.93.Final.jar
azure-identity-1.8.0.jar                 jna-5.13.0.jar                             netty-transport-4.1.93.Final.jar
azure-storage-blob-12.31.3.jar           jna-platform-5.13.0.jar                    nimbus-jose-jwt-9.37.jar
azure-storage-common-12.30.2.jar         json-smart-2.5.1.jar                       oauth2-oidc-sdk-9.37.3.jar
azure-storage-file-datalake-12.24.3.jar  lang-tag-1.6.jar                           reactive-streams-1.0.4.jar
azure-xml-1.2.0.jar                      msal4j-1.14.1.jar                          reactor-core-3.5.8.jar
content-type-2.2.jar                     netty-buffer-4.1.93.Final.jar              reactor-netty-core-1.1.7.jar
hadoop-azure-3.3.4.jar                   netty-codec-4.1.93.Final.jar               reactor-netty-http-1.1.7.jar

pg_hba.conf configures PostgreSQL client authentication rules, allowing the Hive Metastore container to connect to the database using the hive user over the network.

pg_hba.conf
# Allow Hive Metastore container to connect
host    metastore       hive            0.0.0.0/0               md5

# Keep defaults
local   all             all                                     trust
host    all             all             127.0.0.1/32            trust
host    all             all             ::1/128                 trust
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust

init_hive.sh runs when PostgreSQL starts and creates the hive user and the metastore database if they do not exist.

init_hive.sh
#!/bin/bash
set -e

psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "postgres" <<-EOSQL
DO \$\$
BEGIN
  IF NOT EXISTS (
      SELECT FROM pg_catalog.pg_roles
      WHERE rolname = 'hive') THEN
      CREATE USER hive WITH PASSWORD 'hive';
  END IF;
END
\$\$;
EOSQL

DB_EXISTS=$(psql -U "$POSTGRES_USER" -d "postgres" -tAc "SELECT 1 FROM pg_database WHERE datname='metastore'")
if [ "$DB_EXISTS" != "1" ]; then
    psql -U "$POSTGRES_USER" -d "postgres" -c "CREATE DATABASE metastore OWNER hive;"
fi

hive-entrypoint.sh is the entry script for the Hive Metastore: it waits for PostgreSQL to be ready, checks whether the Hive schema exists, initializes it if missing, and starts the Metastore service.

hive-entrypoint.sh
#!/bin/bash
set -e

echo "Waiting for PostgreSQL to be ready..."
while ! (echo > /dev/tcp/postgres/5432) >/dev/null 2>&1; do
  sleep 2
done

echo "PostgreSQL is ready. Checking Hive schema status..."

if /opt/hive/bin/schematool -dbType postgres -info >/dev/null 2>&1; then
  echo "Hive schema already exists. Skipping initialization."
else
  echo "Hive schema not found. Initializing..."
  /opt/hive/bin/schematool -dbType postgres -initSchema -verbose
fi

echo "Starting Hive Metastore Thrift service..."
exec /opt/hive/bin/hive --service metastore

hive-site.xml defines Hive’s database connection parameters (URL, driver, username, password) to specify where Hive metadata is stored in PostgreSQL.

hive-site.xml
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:postgresql://postgres:5432/metastore</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>org.postgresql.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
    </property>
</configuration>

deltalake-spark-config/spark-defaults.conf configures Spark to use Delta Lake extensions, integrate with the Hive Metastore, and access Azure OneLake storage using OAuth credentials.

spark-defaults.conf
spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

spark.sql.catalogImplementation=hive
spark.hadoop.hive.metastore.uris=thrift://hive-metastore:9083

spark.sql.warehouse.dir=/tmp/local-warehouse

spark.hadoop.fs.abfss.impl=org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem
spark.hadoop.fs.AbstractFileSystem.abfss.impl=org.apache.hadoop.fs.azurebfs.Abfs

spark.hadoop.fs.azure.account.auth.type.onelake.dfs.fabric.microsoft.com=OAuth

spark.hadoop.fs.azure.account.oauth.provider.type.onelake.dfs.fabric.microsoft.com=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.onelake.dfs.fabric.microsoft.com=<oauth-2.0-client-id>
spark.hadoop.fs.azure.account.oauth2.client.secret.onelake.dfs.fabric.microsoft.com=<oauth-2.0-client-secret>
spark.hadoop.fs.azure.account.oauth2.client.endpoint.onelake.dfs.fabric.microsoft.com=<oauth-2.0-token-endpoint>

spark.master=local[*]

hive-azure-jars is a folder that contains the dependency JARs that enable Spark to access Azure OneLake (via the abfss:// protocol) and write Delta Lake tables, including Azure SDKs, the Hadoop Azure connector, and the Delta Lake libraries. To prepare the jars, run the following script inside hive-azure-jars:

hive-azure-download-jars.sh
#!/bin/bash

# Azure SDK components
wget https://repo1.maven.org/maven2/com/azure/azure-core/1.37.0/azure-core-1.37.0.jar
wget https://repo1.maven.org/maven2/com/azure/azure-identity/1.8.0/azure-identity-1.8.0.jar
wget https://repo1.maven.org/maven2/com/azure/azure-storage-common/12.25.0/azure-storage-common-12.25.0.jar
wget https://repo1.maven.org/maven2/com/azure/azure-storage-blob/12.25.0/azure-storage-blob-12.25.0.jar
wget https://repo1.maven.org/maven2/com/azure/azure-storage-file-datalake/12.24.0/azure-storage-file-datalake-12.24.0.jar

# Hadoop Azure connector
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-azure/3.3.4/hadoop-azure-3.3.4.jar

# Delta Lake core components
wget https://repo1.maven.org/maven2/io/delta/delta-core_2.12/2.4.0/delta-core_2.12-2.4.0.jar
wget https://repo1.maven.org/maven2/io/delta/delta-storage/2.4.0/delta-storage-2.4.0.jar

The contents of the folder should then look like this:

hive-azure-jars
~/test_onelake$ ls hive-azure-jars

azure-core-1.37.0.jar           azure-storage-common-12.25.0.jar         delta-storage-2.4.0.jar
azure-identity-1.8.0.jar        azure-storage-file-datalake-12.24.0.jar  hadoop-azure-3.3.4.jar
azure-storage-blob-12.25.0.jar  delta-core_2.12-2.4.0.jar
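
The Delta Lake docker-compose.yaml below also mounts a PostgreSQL JDBC driver (postgresql-42.7.3.jar) from the working directory so that the Hive Metastore can connect to PostgreSQL. If you do not already have it, it can be downloaded from Maven Central, for example:

wget https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.3/postgresql-42.7.3.jar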

▶ Create a file named docker-compose.yaml in your working directory with the following content. Use the first variant for Iceberg and the second for Delta Lake.

docker-compose.yaml (Iceberg)
networks:
  iceberg-net:
    name: iceberg-net

services:
  # Iceberg REST Catalog
  iceberg-rest:
    image: tabulario/iceberg-rest
    container_name: iceberg-rest
    networks:
      - iceberg-net
    ports:
      - "8181:8181"
    environment:
      CATALOG_CATALOG__IMPL: "org.apache.iceberg.jdbc.JdbcCatalog"
      CATALOG_URI: "jdbc:sqlite:/iceberg/iceberg_catalog.db"
      CATALOG_JDBC_USER: "user"
      CATALOG_JDBC_PASSWORD: "password"

      CATALOG_WAREHOUSE: "abfss://demo_ws@onelake.dfs.fabric.microsoft.com/demo_lh.lakehouse/Files/iceberg/warehouse"
      CATALOG_IO__IMPL: "org.apache.iceberg.azure.adlsv2.ADLSFileIO"
      AZURE_CLIENT_ID: "<oauth-2.0-client-id>"
      AZURE_CLIENT_SECRET: "<oauth-2.0-client-secret>"
      AZURE_TENANT_ID: "<oauth-2.0-tenant-id>"
      AZURE_AUTHORITY_HOST: "https://login.microsoftonline.com/"
    volumes:
      - /home/ubuntu/iceberg-data:/iceberg
      # make sure '/home/ubuntu/iceberg-data' exists;
      # run 'sudo mkdir -p /home/ubuntu/iceberg-data' in advance if needed

  spark-master:
    image: apache/spark:3.4.2
    container_name: spark
    depends_on:
      - iceberg-rest
    environment:
      SPARK_MODE: master
      SPARK_MASTER_HOST: spark-master
    command: >
      /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
    ports:
      - "8080:8080"   # Spark UI
      - "7077:7077"   # Spark Master
    volumes:
      - ./iceberg-spark-config:/opt/spark/conf             # spark-defaults.conf
      - ./iceberg-azure-jars:/opt/spark/external-jars      # Azure jar
    networks:
      - iceberg-net

  puppygraph:
    image: puppygraph/puppygraph:stable
    container_name: puppygraph
    networks:
      - iceberg-net
    environment:
      - PUPPYGRAPH_USERNAME=puppygraph
      - PUPPYGRAPH_PASSWORD=puppygraph123
    ports:
      - "8081:8081"
      - "8182:8182"
      - "7687:7687"
    depends_on:
      - iceberg-rest

docker-compose.yaml (Delta Lake)

networks:
  deltacat-net:
    name: deltacat-net
    driver: bridge

volumes:
  pgdata:

services:
  postgres:
    image: postgres:15
    container_name: postgres
    environment:
      POSTGRES_DB: postgres
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./pg_hba.conf:/etc/postgresql/pg_hba.conf
      - ./init_hive.sh:/docker-entrypoint-initdb.d/init_hive.sh
    command: postgres -c 'hba_file=/etc/postgresql/pg_hba.conf'
    networks:
      - deltacat-net

  hive-metastore:
    image: apache/hive:4.0.0
    container_name: hive-metastore
    environment:
      SERVICE_NAME: metastore
      HIVE_CONF_DIR: /opt/hive/conf
    volumes:
      - ./hive-entrypoint.sh:/hive-entrypoint.sh 
      - ./postgresql-42.7.3.jar:/opt/hive/lib/postgresql-42.7.3.jar
      - ./hive-site.xml:/opt/hive/conf/hive-site.xml
    entrypoint: ["/bin/bash", "/hive-entrypoint.sh"] 
    depends_on:
      - postgres
    networks:
      - deltacat-net
    ports:
      - "9083:9083"

  spark-master:
    image: apache/spark:3.4.2
    container_name: spark
    depends_on:
      - hive-metastore
    environment:
      SPARK_MODE: master
      SPARK_MASTER_HOST: spark-master
    command: >
      /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
    ports:
      - "8080:8080"   # Spark UI
      - "7077:7077"   # Spark Master
    volumes:
      - ./deltalake-spark-config:/opt/spark/conf      # spark-defaults.conf
      - ./hive-azure-jars:/opt/spark/external-jars    # Azure jar
    networks:
      - deltacat-net

  puppygraph:
    image: puppygraph/puppygraph:stable
    container_name: puppygraph
    networks:
      - deltacat-net
    environment:
      - PUPPYGRAPH_USERNAME=puppygraph
      - PUPPYGRAPH_PASSWORD=puppygraph123
    ports:
      - "8081:8081"
      - "8182:8182"
      - "7687:7687"
    depends_on:
      - hive-metastore

▶ Then run the following command to start the services and PuppyGraph:

docker compose up -d

For the Iceberg setup, the output looks like:

[+] Running 4/4
✔ Network   iceberg-net         Created
✔ Container iceberg-rest        Started
✔ Container puppygraph          Started
✔ Container spark               Started

For the Delta Lake setup:

[+] Running 5/5
✔ Network   deltacat-net        Created
✔ Container postgres            Started
✔ Container hive-metastore      Started
✔ Container spark               Started
✔ Container puppygraph          Started
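
Before continuing, you can optionally confirm that the containers are up and inspect the PuppyGraph startup logs using standard Docker commands:

docker compose ps
docker logs --tail 20 puppygraph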

Data Preparation

This tutorial is designed to be comprehensive and standalone, so it includes steps to populate data in OneLake. In practical scenarios, PuppyGraph can query data directly from your existing OneLake Iceberg/Delta Lake tables.

▶ Run the following commands to start a Spark SQL shell with access to OneLake.

docker exec -it spark bash

cp /opt/spark/external-jars/* /opt/spark/jars/

/opt/spark/bin/spark-sql

The shell prompt looks like this:

spark-sql (default)> 

▶ To create the tables and insert the data for our example dataset, execute the SQL statements below that match the configuration you selected: the first block is for Iceberg, the second for Delta Lake.

Notes for Iceberg: SET CATALOG iceberg_rest; must be run first. If you deleted any data in Fabric before running the following SQL scripts, also clean /home/ubuntu/iceberg-data first. (For Delta Lake this is not mandatory, but in that situation we still recommend cleaning the pgdata volume in advance; see the reset sketch below.)
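
A minimal reset sketch, assuming the paths and volume names from the docker-compose.yaml files above; only do this if you want to discard the local catalog and metastore state:

docker compose down --volumes            # also removes named volumes such as pgdata (Delta Lake setup)
sudo rm -rf /home/ubuntu/iceberg-data    # removes the local Iceberg REST catalog state (Iceberg setup)
docker compose up -d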

SET CATALOG iceberg_rest;

CREATE DATABASE modern;

USE modern;

CREATE EXTERNAL TABLE person (
  id string,
  name string,
  age int
) USING iceberg LOCATION 'abfss://demo_ws@onelake.dfs.fabric.microsoft.com/demo_lh.lakehouse/Files/iceberg/warehouse/modern/person';

INSERT INTO person VALUES
  ('v1', 'marko', 29),
  ('v2', 'vadas', 27),
  ('v4', 'josh', 32),
  ('v6', 'peter', 35);

CREATE EXTERNAL TABLE software (
  id string,
  name string,
  lang string
) USING iceberg LOCATION 'abfss://demo_ws@onelake.dfs.fabric.microsoft.com/demo_lh.lakehouse/Files/iceberg/warehouse/modern/software';

INSERT INTO software VALUES
  ('v3', 'lop', 'java'),
  ('v5', 'ripple', 'java');

CREATE EXTERNAL TABLE created (
  id string,
  from_id string,
  to_id string,
  weight double
) USING iceberg LOCATION 'abfss://demo_ws@onelake.dfs.fabric.microsoft.com/demo_lh.lakehouse/Files/iceberg/warehouse/modern/created';

INSERT INTO created VALUES
  ('e9', 'v1', 'v3', 0.4),
  ('e10', 'v4', 'v5', 1.0),
  ('e11', 'v4', 'v3', 0.4),
  ('e12', 'v6', 'v3', 0.2);

CREATE EXTERNAL TABLE knows (
  id string,
  from_id string,
  to_id string,
  weight double
) USING iceberg LOCATION 'abfss://demo_ws@onelake.dfs.fabric.microsoft.com/demo_lh.lakehouse/Files/iceberg/warehouse/modern/knows';

INSERT INTO knows VALUES
  ('e7', 'v1', 'v2', 0.5),
  ('e8', 'v1', 'v4', 1.0);

For Delta Lake, run the following SQL instead:

CREATE DATABASE modern;

USE modern;

CREATE TABLE person (
  id string,
  name string,
  age int
) USING DELTA LOCATION 'abfss://demo_ws@onelake.dfs.fabric.microsoft.com/demo_lh.lakehouse/Files/deltalake/warehouse/modern/person';

INSERT INTO person VALUES
  ('v1', 'marko', 29),
  ('v2', 'vadas', 27),
  ('v4', 'josh', 32),
  ('v6', 'peter', 35);

CREATE EXTERNAL TABLE software (
  id string,
  name string,
  lang string
) USING DELTA LOCATION 'abfss://demo_ws@onelake.dfs.fabric.microsoft.com/demo_lh.lakehouse/Files/deltalake/warehouse/modern/software';

INSERT INTO software VALUES
  ('v3', 'lop', 'java'),
  ('v5', 'ripple', 'java');

CREATE EXTERNAL TABLE created (
  id string,
  from_id string,
  to_id string,
  weight double
) USING DELTA LOCATION 'abfss://demo_ws@onelake.dfs.fabric.microsoft.com/demo_lh.lakehouse/Files/deltalake/warehouse/modern/created';

INSERT INTO created VALUES
  ('e9', 'v1', 'v3', 0.4),
  ('e10', 'v4', 'v5', 1.0),
  ('e11', 'v4', 'v3', 0.4),
  ('e12', 'v6', 'v3', 0.2);

CREATE EXTERNAL TABLE knows (
  id string,
  from_id string,
  to_id string,
  weight double
) USING DELTA LOCATION 'abfss://demo_ws@onelake.dfs.fabric.microsoft.com/demo_lh.lakehouse/Files/deltalake/warehouse/modern/knows';

INSERT INTO knows VALUES
  ('e7', 'v1', 'v2', 0.5),
  ('e8', 'v1', 'v4', 1.0);

The above SQL creates the following tables:

person:
id  name   age
v1  marko  29
v2  vadas  27
v4  josh   32
v6  peter  35

software:
id  name    lang
v3  lop     java
v5  ripple  java

knows:
id  from_id  to_id  weight
e7  v1       v2     0.5
e8  v1       v4     1.0

created:
id   from_id  to_id  weight
e9   v1       v3     0.4
e10  v4       v5     1.0
e11  v4       v3     0.4
e12  v6       v3     0.2
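
To double-check that the rows actually landed in OneLake, you can optionally query a table back from the same Spark container. Below is a sketch for the Iceberg variant (for Delta Lake, query modern.person through the default Hive catalog instead); it assumes the container is still running and the JARs were copied as in the earlier step.

docker exec -it spark /opt/spark/bin/spark-sql -e "SELECT * FROM iceberg_rest.modern.person ORDER BY id;"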

Modeling a Graph

We then define a graph on top of the data tables we just created. This is the "Modern" graph defined by Apache TinkerPop.

Modern Graph

A schema tells PuppyGraph how to map the OneLake Iceberg/Delta Lake data into a graph. PuppyGraph offers various methods for schema creation; for this tutorial, we have already prepared a schema to save time.

▶ Create a PuppyGraph schema file schema.json with the following content, filling in your OAuth credentials. Use the first variant for Iceberg and the second for Delta Lake:

schema.json (Iceberg)

{
  "catalogs": [
    {
      "name": "onelake_iceberg",
      "type": "iceberg",
      "metastore": {
        "type": "rest",
        "uri": "http://iceberg-rest:8181"
      },
      "storage": {
        "type": "AzureDLS2",
        "clientId": "<oauth-2.0-client-id>",
        "clientSecret": "<oauth-2.0-client-secret>",
        "clientEndpoint": "<oauth-2.0-token-endpoint>"
      }
    }
  ],
  "graph": {
    "vertices": [
      {
        "label": "person",
        "oneToOne": {
          "tableSource": {
            "catalog": "onelake_iceberg",
            "schema": "modern",
            "table": "person"
          },
          "id": {
            "fields": [
              {
                "type": "String",
                "field": "id",
                "alias": "id"
              }
            ]
          },
          "attributes": [
            {
              "type": "Int",
              "field": "age",
              "alias": "age"
            },
            {
              "type": "String",
              "field": "name",
              "alias": "name"
            }
          ]
        }
      },
      {
        "label": "software",
        "oneToOne": {
          "tableSource": {
            "catalog": "onelake_iceberg",
            "schema": "modern",
            "table": "software"
          },
          "id": {
            "fields": [
              {
                "type": "String",
                "field": "id",
                "alias": "id"
              }
            ]
          },
          "attributes": [
            {
              "type": "String",
              "field": "lang",
              "alias": "lang"
            },
            {
              "type": "String",
              "field": "name",
              "alias": "name"
            }
          ]
        }
      }
    ],
    "edges": [
      {
        "label": "knows",
        "fromVertex": "person",
        "toVertex": "person",
        "tableSource": {
          "catalog": "onelake_iceberg",
          "schema": "modern",
          "table": "knows"
        },
        "id": {
          "fields": [
            {
              "type": "String",
              "field": "id",
              "alias": "id"
            }
          ]
        },
        "fromId": {
          "fields": [
            {
              "type": "String",
              "field": "from_id",
              "alias": "from_id"
            }
          ]
        },
        "toId": {
          "fields": [
            {
              "type": "String",
              "field": "to_id",
              "alias": "to_id"
            }
          ]
        },
        "attributes": [
          {
            "type": "Double",
            "field": "weight",
            "alias": "weight"
          }
        ]
      },
      {
        "label": "created",
        "fromVertex": "person",
        "toVertex": "software",
        "tableSource": {
          "catalog": "onelake_iceberg",
          "schema": "modern",
          "table": "created"
        },
        "id": {
          "fields": [
            {
              "type": "String",
              "field": "id",
              "alias": "id"
            }
          ]
        },
        "fromId": {
          "fields": [
            {
              "type": "String",
              "field": "from_id",
              "alias": "from_id"
            }
          ]
        },
        "toId": {
          "fields": [
            {
              "type": "String",
              "field": "to_id",
              "alias": "to_id"
            }
          ]
        },
        "attributes": [
          {
            "type": "Double",
            "field": "weight",
            "alias": "weight"
          }
        ]
      }
    ]
  }
}

schema.json (Delta Lake)

{
  "catalogs": [
    {
      "name": "delta_hms_onelake",
      "type": "deltalake",
      "metastore": {
        "type": "HMS",
        "hiveMetastoreUrl": "thrift://hive-metastore:9083"
      },
      "storage": {
        "type": "AzureDLS2",
        "clientId": "<oauth-2.0-client-id>",
        "clientSecret": "<oauth-2.0-client-secret>",
        "clientEndpoint": "<oauth-2.0-token-endpoint>"
      }
    }
  ],
  "graph": {
    "vertices": [
      {
        "label": "person",
        "oneToOne": {
          "tableSource": {
            "catalog": "delta_hms_onelake",
            "schema": "modern",
            "table": "person"
          },
          "id": {
            "fields": [
              {
                "type": "String",
                "field": "id",
                "alias": "id"
              }
            ]
          },
          "attributes": [
            {
              "type": "Int",
              "field": "age",
              "alias": "age"
            },
            {
              "type": "String",
              "field": "name",
              "alias": "name"
            }
          ]
        }
      },
      {
        "label": "software",
        "oneToOne": {
          "tableSource": {
            "catalog": "delta_hms_onelake",
            "schema": "modern",
            "table": "software"
          },
          "id": {
            "fields": [
              {
                "type": "String",
                "field": "id",
                "alias": "id"
              }
            ]
          },
          "attributes": [
            {
              "type": "String",
              "field": "lang",
              "alias": "lang"
            },
            {
              "type": "String",
              "field": "name",
              "alias": "name"
            }
          ]
        }
      }
    ],
    "edges": [
      {
        "label": "knows",
        "fromVertex": "person",
        "toVertex": "person",
        "tableSource": {
          "catalog": "delta_hms_onelake",
          "schema": "modern",
          "table": "knows"
        },
        "id": {
          "fields": [
            {
              "type": "String",
              "field": "id",
              "alias": "id"
            }
          ]
        },
        "fromId": {
          "fields": [
            {
              "type": "String",
              "field": "from_id",
              "alias": "from_id"
            }
          ]
        },
        "toId": {
          "fields": [
            {
              "type": "String",
              "field": "to_id",
              "alias": "to_id"
            }
          ]
        },
        "attributes": [
          {
            "type": "Double",
            "field": "weight",
            "alias": "weight"
          }
        ]
      },
      {
        "label": "created",
        "fromVertex": "person",
        "toVertex": "software",
        "tableSource": {
          "catalog": "delta_hms_onelake",
          "schema": "modern",
          "table": "created"
        },
        "id": {
          "fields": [
            {
              "type": "String",
              "field": "id",
              "alias": "id"
            }
          ]
        },
        "fromId": {
          "fields": [
            {
              "type": "String",
              "field": "from_id",
              "alias": "from_id"
            }
          ]
        },
        "toId": {
          "fields": [
            {
              "type": "String",
              "field": "to_id",
              "alias": "to_id"
            }
          ]
        },
        "attributes": [
          {
            "type": "Double",
            "field": "weight",
            "alias": "weight"
          }
        ]
      }
    ]
  }
}

▶ Log into PuppyGraph Web UI at http://localhost:8081 with username puppygraph and password puppygraph123.

PuppyGraph Login

▶ Upload the schema by selecting the file schema.json in the Upload Graph Schema JSON block and clicking on Upload.

Upload Schema Page

Once the schema is uploaded, the schema page shows the visualized graph schema as follows.

Visualized Schema

Alternative: Schema Uploading via CLI

▶ Alternatively, run the following command to upload the schema file:

curl -XPOST -H "content-type: application/json" --data-binary @./schema.json --user "puppygraph:puppygraph123" localhost:8081/schema

The response shows that the graph schema has been uploaded successfully:

{"Status":"OK","Message":"Schema uploaded and gremlin server restarted"}

Querying the Graph

In this tutorial we will use the Gremlin query language to query the Graph. Gremlin is a graph query language developed by Apache TinkerPop. Prior knowledge of Gremlin is not necessary to follow the tutorial. To learn more about it, visit https://tinkerpop.apache.org/gremlin.html.

▶ Click on the Query panel on the left side. The Gremlin Query tab offers an interactive environment for querying the graph using Gremlin.

Interactive Gremlin Query Page

Queries are entered on the left side, and the right side displays the graph visualization.

The first query retrieves the properties of the person named "marko".

▶ Copy the following query, paste it in the query input, and click on the run button.

g.V().has("name", "marko").valueMap()

The output is plain text like the following:

Rows: 1
age              29
name             marko

Now let's also leverage the visualization. The next query gets all the software created by people known to "marko".

▶ Copy the following query, paste it in the query input, and click on the run button.

g.V().has("name", "marko")
  .out("knows").out("created").path()

The output is as follows. There are two paths in the result, as "marko" knows "josh", who created "lop" and "ripple".

Interactive Query with Results

Alternative: Querying the graph via CLI

Alternatively, we can query the graph via CLI.

▶ Execute the following command to access the PuppyGraph Gremlin Console:

docker exec -it puppygraph ./bin/console

The welcome screen appears as follows:

  ____                                     ____                          _
 |  _ \   _   _   _ __    _ __    _   _   / ___|  _ __    __ _   _ __   | |__
 | |_) | | | | | | '_ \  | '_ \  | | | | | |  _  | '__|  / _` | | '_ \  | '_ \
 |  __/  | |_| | | |_) | | |_) | | |_| | | |_| | | |    | (_| | | |_) | | | | |
 |_|      \__,_| | .__/  | .__/   \__, |  \____| |_|     \__,_| | .__/  |_| |_|
                 |_|     |_|      |___/                         |_|
Welcome to PuppyGraph!
version: 0.10

puppy-gremlin> 

▶ Run the following queries in the console to query the Graph.

g.V().has("name", "marko").valueMap()

Properties of the person named "marko":

puppy-gremlin> g.V().has("name", "marko").valueMap()
Done! Elapsed time: 0.059s, rows: 1
==>map[age:29 name:marko]

g.V().has("name", "marko").out("knows").out("created").valueMap()

All the software created by the people known to "marko":

puppy-gremlin> g.V().has("name", "marko").out("knows").out("created").valueMap()
Done! Elapsed time: 0.042s, rows: 2
==>map[lang:java name:lop]
==>map[lang:java name:ripple]

▶ To exit PuppyGraph Gremlin Console, enter the command:

:exit

Cleaning up

▶ Run the following command to shut down and remove the services:

docker compose down
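
docker compose down stops and removes the containers but keeps persisted state. If you also want a completely clean slate (assuming the volume and path names used in the compose files above), you can additionally run:

docker compose down --volumes            # also removes named volumes such as pgdata (Delta Lake setup)
sudo rm -rf /home/ubuntu/iceberg-data    # removes the host folder used by the Iceberg REST catalog (Iceberg setup)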

See Also

Please refer to Connecting to Apache Iceberg and Connecting to Delta Lake for connecting to other Iceberg and Delta Lake deployments.

Azure OneLake allows users to convert between the Iceberg and Delta Lake table formats; see Use Iceberg tables with OneLake.