Querying MongoDB Data with BI Connector as a Graph
Summary
In this tutorial, you will:
- Start a PuppyGraph container alongside a MongoDB container, populate collections with example data, and run the BI Connector (
mongosqld) inside the MongoDB container. - Connect PuppyGraph to MongoDB through the BI Connector's MySQL-compatible interface.
- Run Cypher and Gremlin queries against the MongoDB data as a graph.
Self-contained MongoDB Data
PuppyGraph reads MongoDB through the BI Connector, which exposes MongoDB collections as SQL tables over the MySQL wire protocol.
In real deployments, point the BI Connector at your existing MongoDB cluster. See Connecting to MongoDB for the connection reference.
Prerequisites
Please ensure that docker compose is available. The installation can be verified by running:
See https://docs.docker.com/compose/install/ for Docker Compose installation instructions and https://www.docker.com/get-started/ for more details on Docker.
Accessing the PuppyGraph Web UI requires a browser. The schema upload and query steps also have CLI alternatives via curl and the bundled Gremlin console.
Setup
Deployment
Create a file
docker-compose.yaml with the following content:
docker-compose.yaml
version: "3"
services:
puppygraph:
image: puppygraph/puppygraph:latest
pull_policy: always
container_name: puppygraph
environment:
- PUPPYGRAPH_USERNAME=puppygraph
- PUPPYGRAPH_PASSWORD=puppygraph123
networks:
- mongo_net
ports:
- "8081:8081"
- "8182:8182"
- "7687:7687"
mongodb:
image: mongodb/mongodb-community-server:latest
container_name: mongodb
networks:
- mongo_net
ports:
- "3307:3307"
networks:
mongo_net:
name: puppy-mongo
Start the stack:
[+] Running 3/3
✔ Network puppy-mongo Created 0.1s
✔ Container mongodb Started 3.7s
✔ Container puppygraph Started 3.7s
Data Preparation
Open a
mongosh shell in the modern database:
Paste the following commands into the prompt. They create four collections with JSON Schema validators (so the rows match the modern graph types) and seed each one:
modern.js
db.createCollection("software", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["id", "name", "lang"],
properties: {
id: { bsonType: "string" },
name: { bsonType: "string" },
lang: { bsonType: "string" }
}
}
}
});
db.software.insertMany([
{ id: 'v3', name: 'lop', lang: 'java' },
{ id: 'v5', name: 'ripple', lang: 'java' }
]);
db.createCollection("person", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["id", "name", "age"],
properties: {
id: { bsonType: "string" },
name: { bsonType: "string" },
age: { bsonType: "int" }
}
}
}
});
db.person.insertMany([
{ id: 'v1', name: 'marko', age: 29 },
{ id: 'v2', name: 'vadas', age: 27 },
{ id: 'v4', name: 'josh', age: 32 },
{ id: 'v6', name: 'peter', age: 35 }
]);
db.createCollection("created", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["id", "from_id", "to_id", "weight"],
properties: {
id: { bsonType: "string" },
from_id: { bsonType: "string" },
to_id: { bsonType: "string" },
weight: { bsonType: "double" }
}
}
}
});
db.created.insertMany([
{ id: 'e9', from_id: 'v1', to_id: 'v3', weight: Double(0.4) },
{ id: 'e10', from_id: 'v4', to_id: 'v5', weight: Double(1.0) },
{ id: 'e11', from_id: 'v4', to_id: 'v3', weight: Double(0.4) },
{ id: 'e12', from_id: 'v6', to_id: 'v3', weight: Double(0.2) }
]);
db.createCollection("knows", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["id", "from_id", "to_id", "weight"],
properties: {
id: { bsonType: "string" },
from_id: { bsonType: "string" },
to_id: { bsonType: "string" },
weight: { bsonType: "double" }
}
}
}
});
db.knows.insertMany([
{ id: 'e7', from_id: 'v1', to_id: 'v2', weight: Double(0.5) },
{ id: 'e8', from_id: 'v1', to_id: 'v4', weight: Double(1.0) }
]);
The above creates four collections in the modern database. PuppyGraph reads them through the BI Connector as SQL tables.
| id | name | age |
|---|---|---|
| v1 | marko | 29 |
| v2 | vadas | 27 |
| v4 | josh | 32 |
| v6 | peter | 35 |
| id | name | lang |
|---|---|---|
| v3 | lop | java |
| v5 | ripple | java |
| id | from_id | to_id | weight |
|---|---|---|---|
| e9 | v1 | v3 | 0.4 |
| e10 | v4 | v5 | 1.0 |
| e11 | v4 | v3 | 0.4 |
| e12 | v6 | v3 | 0.2 |
| id | from_id | to_id | weight |
|---|---|---|---|
| e7 | v1 | v2 | 0.5 |
| e8 | v1 | v4 | 1.0 |
Type
exit to leave mongosh.
Install and start the BI Connector
The MongoDB BI Connector translates SQL queries into MongoDB queries. It is not bundled in the MongoDB container, so you need to download and start it inside the container.
Open a shell in the MongoDB container:
Confirm the OS so you pick the right BI Connector binary:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.5 LTS"
Download the BI Connector for your CPU architecture and unpack it:
Start
mongosqld listening on port 3307 of the container (which is published to the host):
Type
exit to leave the MongoDB container shell.
Modeling a Graph
We model the data as the TinkerPop modern graph: two node types (person, software) and two edge types (knows, created).

First, log into the PuppyGraph Web UI at http://localhost:8081 with the credentials configured above:
| Field | Value |
|---|---|
| Username | puppygraph |
| Password | puppygraph123 |
There are two ways to define the schema in PuppyGraph: build it interactively in the Schema Builder, or upload a JSON file directly. Pick whichever you prefer; both produce the same graph.
Build the graph in the Schema Builder
The Schema Builder is the visual editor in the PuppyGraph Web UI for adding catalogs, nodes, and edges step by step. It's the recommended path when you're modeling a graph for the first time or want to inspect what each click produces. For a deeper visual walkthrough of every dialog and field, see Modeling a Graph through the Schema Builder. The summary below covers what's needed to build the modern graph against this tutorial's MongoDB data.
Connecting to MongoDB BI Connector
Click Create Catalog, then expand NoSQL & Search and pick MongoDB BI.
Fill in the connection form (BI Connector accepts unauthenticated connections by default in this setup, so leave Username and Password empty):
| Field | Value |
|---|---|
| Catalog name | mongodb_data |
| Username | (empty) |
| Password | (empty) |
| JDBC Connection String | jdbc:mysql://mongodb:3307 |
Check User Specified Driver and fill in (the BI Connector requires the older MariaDB Connector/J 2.x):
| Field | Value |
|---|---|
| JDBC Driver Class | org.mariadb.jdbc.Driver |
| JDBC Driver URL | https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/2.7.12/mariadb-java-client-2.7.12.jar |
Click Create Catalog.
Adding nodes
Click Add Node in the toolbar. The Select Table for Node dialog opens. Expand
mongodb_data then modern, pick software, then click Next.
In the Add Node wizard, click Add to ID and select
id from the dropdown. The wizard moves id into ID Columns, leaving name and lang as attributes. Click Next, leave Enable Local Replication off, then click Add Node.
Repeat for
person. The flow is the same: click Add Node, pick the table, click Next, assign id to ID Columns, leave replication off, click Add Node.
Adding edges
Click Add Edge in the toolbar, pick
created from the catalog tree, then click Next.
In the Add Edge wizard, set:
| Field | Value |
|---|---|
| From Node | person |
| To Node | software |
FROM Select Column |
from_id |
TO Select Column |
to_id |
Click Add to ID and select
id to set the edge identifier. Click Next, leave Enable Local Replication off, then click Add Edge.
Repeat for
knows with both From Node and To Node set to person. The other settings are identical to created.
Upload a schema file
If you've already built the graph in the Schema Builder above, you can skip this section. The resulting schema is the same.
This method writes the full schema to a JSON file and uploads it directly. It's useful when you already have a schema for an environment and want to recreate it elsewhere (e.g. for CI, scripted setup, or copy-pasting between PuppyGraph instances).
Create a file
schema.json with the following content:
schema.json
{
"catalog": [
{
"name": "mongodb_data",
"type": "mongodbBiConnector",
"jdbc": {
"username": "",
"password": "",
"jdbcUri": "jdbc:mysql://mongodb:3307",
"driverClass": "org.mariadb.jdbc.Driver",
"driverUrl": "https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/2.7.12/mariadb-java-client-2.7.12.jar"
}
}
],
"node": [
{
"label": "software",
"dataSourceGroup": {
"externalDataSource": {
"enabled": true,
"catalog": "mongodb_data",
"schema": "modern",
"table": "software",
"mappedField": [
{ "sourceFieldName": "id", "targetFieldName": "id" },
{ "sourceFieldName": "name", "targetFieldName": "name" },
{ "sourceFieldName": "lang", "targetFieldName": "lang" }
]
}
},
"id": [{ "name": "id", "type": "STRING" }],
"attribute": [
{ "name": "name", "type": "STRING" },
{ "name": "lang", "type": "STRING" }
]
},
{
"label": "person",
"dataSourceGroup": {
"externalDataSource": {
"enabled": true,
"catalog": "mongodb_data",
"schema": "modern",
"table": "person",
"mappedField": [
{ "sourceFieldName": "id", "targetFieldName": "id" },
{ "sourceFieldName": "name", "targetFieldName": "name" },
{ "sourceFieldName": "age", "targetFieldName": "age" }
]
}
},
"id": [{ "name": "id", "type": "STRING" }],
"attribute": [
{ "name": "name", "type": "STRING" },
{ "name": "age", "type": "LONG" }
]
}
],
"edge": [
{
"label": "created",
"fromNodeLabel": "person",
"toNodeLabel": "software",
"dataSourceGroup": {
"externalDataSource": {
"enabled": true,
"catalog": "mongodb_data",
"schema": "modern",
"table": "created",
"mappedField": [
{ "sourceFieldName": "id", "targetFieldName": "id" },
{ "sourceFieldName": "from_id", "targetFieldName": "from_id" },
{ "sourceFieldName": "to_id", "targetFieldName": "to_id" },
{ "sourceFieldName": "weight", "targetFieldName": "weight" }
]
}
},
"id": [{ "name": "id", "type": "STRING" }],
"fromKey": [{ "name": "from_id", "type": "STRING" }],
"toKey": [{ "name": "to_id", "type": "STRING" }],
"attribute": [
{ "name": "from_id", "type": "STRING" },
{ "name": "to_id", "type": "STRING" },
{ "name": "weight", "type": "DOUBLE" }
]
},
{
"label": "knows",
"fromNodeLabel": "person",
"toNodeLabel": "person",
"dataSourceGroup": {
"externalDataSource": {
"enabled": true,
"catalog": "mongodb_data",
"schema": "modern",
"table": "knows",
"mappedField": [
{ "sourceFieldName": "id", "targetFieldName": "id" },
{ "sourceFieldName": "from_id", "targetFieldName": "from_id" },
{ "sourceFieldName": "to_id", "targetFieldName": "to_id" },
{ "sourceFieldName": "weight", "targetFieldName": "weight" }
]
}
},
"id": [{ "name": "id", "type": "STRING" }],
"fromKey": [{ "name": "from_id", "type": "STRING" }],
"toKey": [{ "name": "to_id", "type": "STRING" }],
"attribute": [
{ "name": "from_id", "type": "STRING" },
{ "name": "to_id", "type": "STRING" },
{ "name": "weight", "type": "DOUBLE" }
]
}
]
}
In the Web UI, click Graph in the sidebar, then Upload Schema, and select
schema.json.
Upload via CLI
You can also POST the schema directly:
Querying the Graph
In the PuppyGraph Web UI, click Query in the sidebar. You can run graph queries in either Cypher or Gremlin.
The following query answers "What software was created by people that marko knows?"
There are two paths in the result: marko knows josh, who created lop and ripple.
Cleanup
Shut down and remove the containers: