Connecting to BigQuery

It is also possible to query existing data on BigQuery directly without loading it to PuppyGraph. Here is an demo of the feature.

In the demo, the BigQuery data source stores people and referral information. To query the data as a graph, we model people as vertices and the referral relationship between people as edges.

Prerequisites

The demo assumes that PuppyGraph has been deployed at localhost according to the instruction in Launching PuppyGraph from AWS Marketplace or Launching PuppyGraph in Docker.

In this demo, we use the username puppygraph and password puppygraph123.

Data Preparation (Optional)

IDagename

v1

29

marko

v2

27

vadas

The demo uses people and referral information as shown above.

The following steps will create tables and insert data to BigQuery in the GCP web console.

Firstly, create dataset with multiple-region support in .

Then, create tables using the web console.

Finally, open query tab and execute follow SQL.

insert into `demo.person` values ('v1', 29, 'marko'), ('v2', 27, 'vadas');
insert into `demo.referral` values ('e1', 'v1', 'v2', 0.5);

Prepare BigQuery Authentication

Currently, we can use a Google Service Account for PuppyGraph to read data from BigQuery. Firstly, you must create a service account and generate its key. Then, download this key and put it to PuppyGraph's environment. Assume key file named key.json and PuppyGraph container named puppy

docker cp key.json puppy:/home/key.json

Upload the schema

Now the data are ready in BigQuery. We need a PuppyGraph schema before querying it. Let's create a schema file bigquery.json:

bigquery.json
{
  "catalogs": [
    {
      "name": "jdbc_bigquery",
      "type": "bigquery",
      "jdbc": {
        "jdbcUri": "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=PJID;OAuthType=0;OAuthServiceAcctEmail=bigquery-sa@PJID.iam.gserviceaccount.com;OAuthPvtKeyPath=/home/key.json;EnableSession=1;",
        "driverClass": "com.simba.googlebigquery.jdbc.Driver"
      }
    }
  ],
  "vertices": [
    {
      "label": "person",
      "mappedTableSource": {
        "catalog": "jdbc_bigquery",
        "schema": "demo",
        "table": "person",
        "metaFields": {
          "id": "ID"
        }
      },
      "attributes": [
        {
          "type": "Long",
          "name": "age"
        },
        {
          "type": "String",
          "name": "name"
        }
      ]
    }
  ],
  "edges": [
    {
      "label": "knows",
      "mappedTableSource": {
        "catalog": "jdbc_bigquery",
        "schema": "demo",
        "table": "referral",
        "metaFields": {
          "id": "refId",
          "from": "source",
          "to": "referred"
        }
      },
      "from": "person",
      "to": "person",
      "attributes": [
        {
          "type": "Double",
          "name": "weight"
        }
      ]
    }
  ]
}

Here are some notes on this schema:

  1. A catalog jdbc_bigquery is added to specify the remote data source in BigQuery.

    • Set type to bigquery.

    • Set driverClass to com.simba.googlebigquery.jdbc.Driver.

  2. jdbcUri need to set according to your service account config

    • ProjectId=PJID. PJID need to set your service account project id.

    • OAuthServiceAcctEmail= set your service account id.

    • OAuthPvtKeyPath= set the key file path in docker container. Here is /home/key.json

  3. The label of the vertices and edges do not have to be the same as the names of corresponding tables in BigQuery. There is a mappedTableSource field in each of the vertex and edge types specifying the actual schema (demo) and table (referral).

  4. Additionally, the mappedTableSource marks meta columns in the tables. For example, the fieldsfrom and to describe which columns in the table form the endpoints of edges.

Now we can upload the schema file bigquery.json to PuppyGraph with the following shell command, assuming that the PuppyGraph is running on localhost:

curl -XPOST -H "content-type: application/json" --data-binary @./bigquery.json --user "puppygraph:puppygraph123" localhost:8081/schema

Query the data

Connecting to PuppyGraph at http://localhost:8081 and start gremlin console from the "Query" section:

[PuppyGraph]> console
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph

Now we have connected to the Gremlin Console. We can query the graph:

gremlin> g.V().hasLabel("person").out("knows").values("name")
==>vadas

\

Last updated