Querying MongoDB Data with BI Connector as a Graph

Summary

In this tutorial, you will:

Create a Local MongoDB database with BI connector and load it with example data;
Start a PuppyGraph Docker container and query the MongoDB data as a graph.

Prerequisites

Please ensure that docker compose is available. The installation can be verified by running:

docker compose version

See https://docs.docker.com/compose/install/ for Docker Compose installation instructions and https://www.docker.com/get-started/ for more details on Docker.

Accessing the PuppyGraph Web UI requires a browser. However, the tutorial offers alternative instructions for those who wish to exclusively use the CLI.

Deployment

Create a file docker-compose.yaml with the following content:

docker-compose.yaml

networks:
  puppy_net:
    name: puppy-mongo
services:
  hive-hdfs:
    image: mongodb/mongodb-community-server:latest 
    container_name: mongodb 
    networks:
      puppy_net:
    ports:
      - "3307:3307"
  puppygraph:
    image: puppygraph/puppygraph:stable
    container_name: puppygraph
    environment:
      - PUPPYGRAPH_USERNAME=puppygraph
      - PUPPYGRAPH_PASSWORD=puppygraph123
    networks:
      puppy_net:
    ports:
      - "7687:7687"
      - "8081:8081"
      - "8182:8182"

Warning: Ensure to modify your password environment variables, particularly when your machine is publicly accessible.

Then run the following command to start MongoDB and PuppyGraph:

docker compose up -d

[+] Running 3/3
 ✔ Network puppy-mongo     Created                             0.1s 
 ✔ Container mongodb       Started                             3.7s 
 ✔ Container puppygraph    Started                             3.7s

Data Preparation

This tutorial is designed to be comprehensive and standalone, so it includes steps to populate data in MongoDB. In practical scenarios, PuppyGraph can query data directly from existing tables in your MongoDB.

Run the following command to start a MongDB shell to access database:

docker exec -uroot -it mongodb mongosh modern

This command will default access the modern database like follow:

Current Mongosh Log ID: 679363652b2703e9db544ca6
Connecting to:      mongodb://127.0.0.1:27017/modern?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.3.8
Using MongoDB:      7.0.16
Using Mongosh:      2.3.8

For mongosh info see: https://www.mongodb.com/docs/mongodb-shell/


To help improve our products, anonymous usage data is collected and sent to MongoDB periodically (https://www.mongodb.com/legal/privacy-policy).
You can opt-out by running the disableTelemetry() command.

------
   The server generated these startup warnings when booting
   2025-01-24T09:54:37.638+00:00: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem
   2025-01-24T09:54:38.355+00:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted
   2025-01-24T09:54:38.355+00:00: vm.max_map_count is too low
------

modern>

Create the demo graph. We strongly recommend user to specify JSON schema validation to avoid errors.

db.createCollection("person", {
   validator: {
      $jsonSchema: {
         bsonType: "object",
         required: [ "id", "name", "age" ],
         properties: {
            id: { bsonType: "string" },
            name: { bsonType: "string" },
            age: { bsonType: "int"}
         }
      }
   }
})
db.person.insertMany([
  {id: 'v1', name: 'marko', age: 29},
  {id: 'v2', name: 'vadas', age: 27},
  {id: 'v4', name: 'josh', age: 32},
  {id: 'v6', name: 'peter', age: 35}
])
db.createCollection("software", {
   validator: {
      $jsonSchema: {
         bsonType: "object",
         required: [ "id", "name", "lang" ],
         properties: {
            id: { bsonType: "string" },
            name: { bsonType: "string" },
            lang: { bsonType: "string" }
         }
      }
   }
})
db.software.insertMany([
  {id: 'v3', name: 'lop', lang: 'java'},
  {id: 'v5', name: 'ripple', lang: 'java'}
])
db.createCollection("created", {
   validator: {
      $jsonSchema: {
         bsonType: "object",
         required: [ "id", "from_id", "to_id", "weight" ],
         properties: {
            id: { bsonType: "string" },
            from_id: { bsonType: "string" },
            to_id: { bsonType: "string" },
            weight: { bsonType: "double" }
         }
      }
   }
})
db.created.insertMany([
  {id: 'e9', from_id: 'v1', to_id: 'v3', weight: 0.4},
  {id: 'e10', from_id: 'v4', to_id: 'v5', weight: Double(1.1)},
  {id: 'e11', from_id: 'v4', to_id: 'v3', weight: 0.4},
  {id: 'e12', from_id: 'v6', to_id: 'v3', weight: 0.2}
])
db.createCollection("knows", {
   validator: {
      $jsonSchema: {
         bsonType: "object",
         required: [ "id", "from_id", "to_id", "weight" ],
         properties: {
            id: { bsonType: "string" },
            from_id: { bsonType: "string" },
            to_id: { bsonType: "string" },
            weight: { bsonType: "double" }
         }
      }
   }
})
db.knows.insertMany([
  {id: 'e7', from_id: 'v1', to_id: 'v2', weight: 0.5},
  {id: 'e8', from_id: 'v1', to_id: 'v4', weight: Double(1.1)}
])

The above SQL creates the following tables under the modern schema:

personsoftwareknowscreated

id	name	age
v1	marko	29
v2	vadas	27
v4	josh	32
v6	peter	35

id	name	lang
v3	lop	java
v5	ripple	java

id	from_id	to_id	weight
e7	v1	v2	0.5
e8	v1	v4	1.1

id	from_id	to_id	weight
e9	v1	v3	0.4
e10	v4	v5	1.1
e11	v4	v3	0.4
e12	v6	v3	0.2

Install and Start BI Connector inside MongoDB container

BI Connector provides a relational schema and translates SQL queries between your BI tool and MongoDB. BI Connector is not default installed in the docker container. We need to dowload from website and install it inside the MongoDB container.

Enter into the mongodb container:

docker exec -uroot -it mongodb bash
cd /home/

Confirm the OS system of the container:

cat /etc/lsb-release

It may show like follow:

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.5 LTS"

Download BI Connector according the OS system and your machine platform: In x86_64 platform

wget https://info-mongodb-com.s3.amazonaws.com/mongodb-bi/v2/mongodb-bi-linux-x86_64-ubuntu2204-v2.14.21.tgz
wget https://info-mongodb-com.s3.amazonaws.com/mongodb-bi/v2/mongodb-bi-linux-x86_64-ubuntu2204-v2.14.21.tgz.md5
md5sum -c mongodb-bi-linux-x86_64-ubuntu2204-v2.14.21.tgz.md5
tar -zxvf mongodb-bi-linux-x86_64-ubuntu2204-v2.14.21.tgz
cd mongodb-bi-linux-x86_64-ubuntu2204-v2.14.21

In arm64 platform

wget https://info-mongodb-com.s3.amazonaws.com/mongodb-bi/v2/mongodb-bi-linux-arm64-ubuntu2204-v2.14.21.tgz
wget https://info-mongodb-com.s3.amazonaws.com/mongodb-bi/v2/mongodb-bi-linux-arm64-ubuntu2204-v2.14.21.tgz.md5
md5sum -c mongodb-bi-linux-arm64-ubuntu2204-v2.14.21.tgz.md5
tar -zxvf mongodb-bi-linux-arm64-ubuntu2204-v2.14.21.tgz
cd mongodb-bi-linux-arm64-ubuntu2204-v2.14.21

Start BI Connector.

nohup ./bin/mongosqld --addr=0.0.0.0 &

Modeling a Graph

We then define a graph on top of the data tables we just created. Actually, this is the "Modern" graph defined by Apache Tinkerpop.

A schema instructs PuppyGraph on mapping data from the Hive into a graph. PuppyGraph offers various methods for schema creation. For this tutorial, we've already prepared a schema to help save time.

Create a PuppyGraph schema file schema.json with the following content:

schema.json

{
  "catalogs": [
    {
      "name": "mongodb_data",
      "type": "mongodbBiConnector",
      "jdbc": {
        "username": "",
        "password": "",
        "jdbcUri": "jdbc:mysql://mongodb:3307",
        "driverClass": "org.mariadb.jdbc.Driver",
          "driverUrl": "https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/2.7.12/mariadb-java-client-2.7.12.jar"
      }
    }
  ],
  "graph": {
    "vertices": [
      {
        "label": "person",
        "oneToOne": {
          "tableSource": {
            "catalog": "mongodb_data",
            "schema": "modern",
            "table": "person"
          },
          "id": {
            "fields": [
              {
                "type": "String",
                "field": "id",
                "alias": "id"
              }
            ]
          },
          "attributes": [
            {
              "type": "Long",
              "field": "age",
              "alias": "age"
            },
            {
              "type": "String",
              "field": "name",
              "alias": "name"
            }
          ]
        }
      },
      {
        "label": "software",
        "oneToOne": {
          "tableSource": {
            "catalog": "mongodb_data",
            "schema": "modern",
            "table": "software"
          },
          "id": {
            "fields": [
              {
                "type": "String",
                "field": "id",
                "alias": "id"
              }
            ]
          },
          "attributes": [
            {
              "type": "String",
              "field": "lang",
              "alias": "lang"
            },
            {
              "type": "String",
              "field": "name",
              "alias": "name"
            }
          ]
        }
      }
    ],
    "edges": [
      {
        "label": "knows",
        "fromVertex": "person",
        "toVertex": "person",
        "tableSource": {
          "catalog": "mongodb_data",
          "schema": "modern",
          "table": "knows"
        },
        "id": {
          "fields": [
            {
              "type": "String",
              "field": "id",
              "alias": "id"
            }
          ]
        },
        "fromId": {
          "fields": [
            {
              "type": "String",
              "field": "from_id",
              "alias": "from_id"
            }
          ]
        },
        "toId": {
          "fields": [
            {
              "type": "String",
              "field": "to_id",
              "alias": "to_id"
            }
          ]
        },
        "attributes": [
          {
            "type": "Double",
            "field": "weight",
            "alias": "weight"
          }
        ]
      },
      {
        "label": "created",
        "fromVertex": "person",
        "toVertex": "software",
        "tableSource": {
          "catalog": "mongodb_data",
          "schema": "modern",
          "table": "created"
        },
        "id": {
          "fields": [
            {
              "type": "String",
              "field": "id",
              "alias": "id"
            }
          ]
        },
        "fromId": {
          "fields": [
            {
              "type": "String",
              "field": "from_id",
              "alias": "from_id"
            }
          ]
        },
        "toId": {
          "fields": [
            {
              "type": "String",
              "field": "to_id",
              "alias": "to_id"
            }
          ]
        },
        "attributes": [
          {
            "type": "Double",
            "field": "weight",
            "alias": "weight"
          }
        ]
      }
    ]
  }
}

Warning: Please use MySQL 5.7.X Community Edition (Or MariaDB 2.7.x )and the latest version of MongoDB Connector for BI. The plugin is not compatible with MySQL Server or Connector/ODBC driver version 8 and later. Here we use MariaDB 2.7.12 Client to connect.

Log into PuppyGraph Web UI at http://localhost:8081 with username puppygraph and password puppygraph123.

Upload the schema by selecting the file schema.json in the Upload Graph Schema JSON block and clicking on Upload.

Once the schema is uploaded, the schema page shows the visualized graph schema as follows.

Alternative: Schema Uploading via CLI

Alternatively, run the following command to upload the schema file:

curl -XPOST -H "content-type: application/json" --data-binary @./schema.json --user "puppygraph:puppygraph123" localhost:8081/schema

The response shows that graph schema has been uploaded successfully:

{"Status":"OK","Message":"Schema uploaded and gremlin server restarted"}

Querying the Graph

In this tutorial we will use the Gremlin query language to query the Graph. Gremlin is a graph query language developed by Apache TinkerPop. Prior knowledge of Gremlin is not necessary to follow the tutorial. To learn more about it, visit https://tinkerpop.apache.org/gremlin.html.

Click on the Query panel the left side. The Gremlin Query tab offers an interactive environment for querying the graph using Gremlin.

Queries are entered on the left side, and the right side displays the graph visualization.

The first query retrieves the property of the person named "marko".

Copy the following query, paste it in the query input, and click on the run button.

g.V().has("name", "marko").valueMap()

The output is plain text like the following:

Rows: 1
age              29
name             marko

Now let's also leverage the visualization. The next query gets all the software created by people known to "marko".

Copy the following query, paste it in the query input, and click on the run button.

g.V().has("name", "marko")
  .out("knows").out("created").path()

The output is as follows. There are two paths in the result as "marko" knows "josh" who created "lop" and "ripple".

Alternative: Querying the graph via CLI

Alternatively, we can query the graph via CLI.

Execute the following command to access the PuppyGraph Gremlin CLI

docker exec -it puppy ./bin/console

The welcome screen appears as follows:

  ____                                     ____                          _
 |  _ \   _   _   _ __    _ __    _   _   / ___|  _ __    __ _   _ __   | |__
 | |_) | | | | | | '_ \  | '_ \  | | | | | |  _  | '__|  / _` | | '_ \  | '_ \
 |  __/  | |_| | | |_) | | |_) | | |_| | | |_| | | |    | (_| | | |_) | | | | |
 |_|      \__,_| | .__/  | .__/   \__, |  \____| |_|     \__,_| | .__/  |_| |_|
                 |_|     |_|      |___/                         |_|
Welcome to PuppyGraph!
version: 0.10

puppy-gremlin>

Run the following queries in the console to query the Graph.

Query 1Query 2

g.V().has("name", "marko").valueMap()

Properties of the person named "marko":

puppy-gremlin> g.V().has("name", "marko").valueMap()
Done! Elapsed time: 0.059s, rows: 1
==>map[age:29 name:marko]

g.V().has("name", "marko").out("knows").out("created").valueMap()

All the software created by the people known to "marko":

puppy-gremlin> g.V().has("name", "marko").out("knows").out("created").valueMap()
Done! Elapsed time: 0.042s, rows: 2
==>map[lang:java name:lop]
==>map[lang:java name:ripple]

To exit PuppyGraph Gremlin Console, enter the command:

:exit

Cleaning up

Run the following command to shut down and remove the services:

docker compose down

[+] Running 3/3
 ✔ Container mongdb        Removed                                  3.6s 
 ✔ Container puppygraph    Removed                                  10.5s 
 ✔ Network puppy-mongo     Removed                                  0.2s