Skip to content

Querying Google Spanner Data as a Graph

Summary

In this tutorial, you will:

  • Create a Google Spanner database, load example data, and define a property graph (FinGraph) on top.
  • Start a PuppyGraph container and connect it to Spanner over JDBC.
  • Run Cypher and Gremlin queries against the Spanner data as a graph.

Requires a real Spanner instance

This tutorial points at a Spanner database in your own GCP project. PuppyGraph connects via the official Spanner JDBC driver; a service-account key file is mounted into the container so the driver can authenticate.

Prerequisites

  • docker is available on the host where you'll run PuppyGraph.
  • A GCP project with the Cloud Spanner API enabled, plus permission to create instances and databases.
  • A service account with roles/spanner.databaseReader (or higher) on the database, and a downloaded JSON key file.

Setup

Create a service-account key file

▶ In the GCP console, create or pick a service account, grant it roles/spanner.databaseReader on the Spanner instance, then generate a new JSON key and save it locally as key.json. This is the file the PuppyGraph container will read.

Data Preparation

▶ In the GCP console, create or pick a Cloud Spanner instance, then create a database with the Google Standard SQL dialect.

▶ Open Spanner Studio, paste the following SQL into the editor, and run it to create tables, define a property graph, and insert sample data.

fingraph.sql
CREATE TABLE Person (
  id        INT64 NOT NULL,
  name      STRING(MAX),
  birthday  TIMESTAMP,
  country   STRING(MAX),
  city      STRING(MAX),
) PRIMARY KEY (id);

CREATE TABLE Account (
  id           INT64 NOT NULL,
  create_time  TIMESTAMP,
  is_blocked   BOOL,
  nick_name    STRING(MAX),
) PRIMARY KEY (id);

CREATE TABLE PersonOwnAccount (
  id          INT64 NOT NULL,
  account_id  INT64 NOT NULL,
  create_time TIMESTAMP,
  FOREIGN KEY (account_id) REFERENCES Account (id)
) PRIMARY KEY (id, account_id),
  INTERLEAVE IN PARENT Person ON DELETE CASCADE;

CREATE TABLE AccountTransferAccount (
  id           INT64 NOT NULL,
  to_id        INT64 NOT NULL,
  amount       FLOAT64,
  create_time  TIMESTAMP NOT NULL,
  order_number STRING(MAX),
  FOREIGN KEY (to_id) REFERENCES Account (id)
) PRIMARY KEY (id, to_id, create_time),
  INTERLEAVE IN PARENT Account ON DELETE CASCADE;

CREATE OR REPLACE PROPERTY GRAPH FinGraph
  NODE TABLES (Account, Person)
  EDGE TABLES (
    PersonOwnAccount
      SOURCE KEY (id) REFERENCES Person (id)
      DESTINATION KEY (account_id) REFERENCES Account (id)
      LABEL Owns,
    AccountTransferAccount
      SOURCE KEY (id) REFERENCES Account (id)
      DESTINATION KEY (to_id) REFERENCES Account (id)
      LABEL Transfers
  );

INSERT INTO Account (id, create_time, is_blocked, nick_name) VALUES
  (7,  '2020-01-10 06:22:20.222', false, 'Vacation Fund'),
  (16, '2020-01-27 17:55:09.206', true,  'Vacation Fund'),
  (20, '2020-02-18 05:44:20.655', false, 'Rainy Day Fund');

INSERT INTO Person (id, name, birthday, country, city) VALUES
  (1, 'Alex', '1991-12-21 00:00:00', 'Australia',      'Adelaide'),
  (2, 'Dana', '1980-10-31 00:00:00', 'Czech_Republic', 'Moravia'),
  (3, 'Lee',  '1986-12-07 00:00:00', 'India',          'Kollam');

INSERT INTO PersonOwnAccount (id, account_id, create_time) VALUES
  (1, 7,  '2020-01-10 06:22:20.222'),
  (2, 20, '2020-01-27 17:55:09.206'),
  (3, 16, '2020-02-18 05:44:20.655');

INSERT INTO AccountTransferAccount (id, to_id, amount, create_time, order_number) VALUES
  (7,  16, 300, '2020-08-29 15:28:58.647', '304330008004315'),
  (7,  16, 100, '2020-10-04 16:55:05.342', '304120005529714'),
  (16, 20, 300, '2020-09-25 02:36:14.926', '103650009791820'),
  (20, 7,  500, '2020-10-04 16:55:05.342', '304120005529714'),
  (20, 16, 200, '2020-10-17 03:59:40.247', '302290001255747');

Start PuppyGraph

▶ Start the PuppyGraph container. Mount key.json into the container so the Spanner JDBC driver can read it:

docker run -d --name puppygraph \
  -p 8081:8081 -p 8182:8182 -p 7687:7687 \
  -e PUPPYGRAPH_USERNAME=puppygraph \
  -e PUPPYGRAPH_PASSWORD=puppygraph123 \
  -e GOOGLE_APPLICATION_CREDENTIALS=/keys/key.json \
  -v "$(pwd)/key.json:/keys/key.json:ro" \
  --pull=always puppygraph/puppygraph:latest

Default password

Change PUPPYGRAPH_PASSWORD before running on a publicly accessible machine.

Modeling a Graph

We model the data as two node labels (Person, Account) and two edge labels (Owns, Transfers).

▶ Log into the PuppyGraph Web UI at http://localhost:8081 with puppygraph / puppygraph123.

Build the graph in the Schema Builder

▶ Click Create Catalog, then expand Cloud Data Warehouses and pick Cloud Spanner.

▶ Fill in the connection form (replace placeholders with your Spanner project/instance/database):

Field Value
Catalog name spanner_data
JDBC Connection String jdbc:cloudspanner:/projects/<project-id>/instances/<instance>/databases/<database>
Spanner catalog form
Spanner catalog form

The built-in Spanner driver picks up the credential path from the GOOGLE_APPLICATION_CREDENTIALS env variable you set on docker run.

▶ Click Create Catalog.

▶ Add the Person and Account nodes, then the Owns and Transfers edges. The Spanner schema sits under the database's default schema ("").

Upload a schema file

▶ Create a file schema.json with the following content. Replace the JDBC URI with your Spanner database:

schema.json
{
  "catalog": [
    {
      "name": "spanner_data",
      "type": "spanner",
      "jdbc": {
        "username": "",
        "password": "",
        "jdbcUri": "jdbc:cloudspanner:/projects/<project-id>/instances/<instance>/databases/<database>",
        "driverClass": "com.google.cloud.spanner.jdbc.JdbcDriver"
      }
    }
  ],
  "node": [
    {
      "label": "Person",
      "dataSourceGroup": {
        "externalDataSource": {
          "enabled": true,
          "catalog": "spanner_data",
          "schema": "",
          "table": "Person",
          "mappedField": [
            { "sourceFieldName": "id",      "targetFieldName": "id"      },
            { "sourceFieldName": "name",    "targetFieldName": "name"    },
            { "sourceFieldName": "country", "targetFieldName": "country" },
            { "sourceFieldName": "city",    "targetFieldName": "city"    }
          ]
        }
      },
      "id":        [{ "name": "id", "type": "LONG" }],
      "attribute": [
        { "name": "name",    "type": "STRING" },
        { "name": "country", "type": "STRING" },
        { "name": "city",    "type": "STRING" }
      ]
    },
    {
      "label": "Account",
      "dataSourceGroup": {
        "externalDataSource": {
          "enabled": true,
          "catalog": "spanner_data",
          "schema": "",
          "table": "Account",
          "mappedField": [
            { "sourceFieldName": "id",         "targetFieldName": "id"         },
            { "sourceFieldName": "nick_name",  "targetFieldName": "nick_name"  },
            { "sourceFieldName": "is_blocked", "targetFieldName": "is_blocked" }
          ]
        }
      },
      "id":        [{ "name": "id", "type": "LONG" }],
      "attribute": [
        { "name": "nick_name",  "type": "STRING"  },
        { "name": "is_blocked", "type": "BOOLEAN" }
      ]
    }
  ],
  "edge": [
    {
      "label":         "Owns",
      "fromNodeLabel": "Person",
      "toNodeLabel":   "Account",
      "dataSourceGroup": {
        "externalDataSource": {
          "enabled": true,
          "catalog": "spanner_data",
          "schema": "",
          "table": "PersonOwnAccount",
          "mappedField": [
            { "sourceFieldName": "id",         "targetFieldName": "id"         },
            { "sourceFieldName": "account_id", "targetFieldName": "account_id" }
          ]
        }
      },
      "id":        [{ "name": "id",         "type": "LONG" }],
      "fromKey":   [{ "name": "id",         "type": "LONG" }],
      "toKey":     [{ "name": "account_id", "type": "LONG" }],
      "attribute": []
    },
    {
      "label":         "Transfers",
      "fromNodeLabel": "Account",
      "toNodeLabel":   "Account",
      "dataSourceGroup": {
        "externalDataSource": {
          "enabled": true,
          "catalog": "spanner_data",
          "schema": "",
          "table": "AccountTransferAccount",
          "mappedField": [
            { "sourceFieldName": "id",          "targetFieldName": "id"          },
            { "sourceFieldName": "to_id",       "targetFieldName": "to_id"       },
            { "sourceFieldName": "create_time", "targetFieldName": "create_time" },
            { "sourceFieldName": "amount",      "targetFieldName": "amount"      }
          ]
        }
      },
      "id": [
        { "name": "id",          "type": "LONG"      },
        { "name": "to_id",       "type": "LONG"      },
        { "name": "create_time", "type": "TIMESTAMP" }
      ],
      "fromKey":   [{ "name": "id",     "type": "LONG" }],
      "toKey":     [{ "name": "to_id",  "type": "LONG" }],
      "attribute": [
        { "name": "create_time", "type": "TIMESTAMP" },
        { "name": "amount",      "type": "DOUBLE"    }
      ]
    }
  ]
}

The Transfers edge uses Spanner's full composite primary key (id, to_id, create_time) as its edge ID. The sample data has two transfers with (id=7, to_id=16) and different create_time values, so keying on id alone would collapse them into one edge.

▶ In the Web UI, click Graph in the sidebar, then Upload Schema, and select schema.json.

Upload via CLI

curl -X POST -H "content-type: application/json" \
  --data-binary @./schema.json \
  --user "puppygraph:puppygraph123" \
  http://localhost:8081/schema

Querying the Graph

In the PuppyGraph Web UI, click Query in the sidebar. You can run graph queries in either Cypher or Gremlin.

The following query traces Person-to-Person transfer paths through owned Accounts:

MATCH path = (p:Person)-[:Owns]->(:Account)-[:Transfers]->(:Account)<-[:Owns]-(q:Person)
RETURN path;
g.V().hasLabel('Person').as('p')
  .out('Owns').out('Transfers').in('Owns').path()

Cleanup

▶ Stop the PuppyGraph container:

docker stop puppygraph && docker rm puppygraph

▶ Drop the Spanner database (or the whole instance) from the GCP console when you're done.