Skip to content

Querying ClickHouse Data as a Graph

Available since

ClickHouse connection is available since version 0.73

Summary

In this tutorial, you will:

  • Create a ClickHouse database and load it with example data;
  • Start a PuppyGraph Docker container and query the ClickHouse data as a graph.

Prerequisites

Please ensure that docker compose is available. The installation can be verified by running:

docker compose version

See https://docs.docker.com/compose/install/ for Docker Compose installation instructions and https://www.docker.com/get-started/ for more details on Docker.

Accessing the PuppyGraph Web UI requires a browser. However, the tutorial offers alternative instructions for those who wish to exclusively use the CLI.

Deployment

▶ Create a file docker-compose.yaml with the following content:

docker-compose.yaml
version: "3"

services:
  puppygraph:
    image: puppygraph/puppygraph:0.73
    pull_policy: always
    container_name: puppygraph
    environment:
      - PUPPYGRAPH_USERNAME=puppygraph
      - PUPPYGRAPH_PASSWORD=puppygraph123
    networks:
      ch_net:
    ports:
      - "8081:8081"
      - "8182:8182"
      - "7687:7687"
  clickhouse-server:
    image: clickhouse/clickhouse-server:latest
    container_name: clickhouse-server
    environment:
      - CLICKHOUSE_USER=ch_user
      - CLICKHOUSE_PASSWORD=ch_pwd
    networks:
      - ch_net
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9004:9004"
networks:
  ch_net:
    name: puppy-ch

Update password

Ensure to modify your password environment variables, particularly when your machine is publicly accessible.

▶ Then run the following command to start ClickHouse and PuppyGraph:

docker compose up -d
[+] Running 3/3
 ✔ Network puppy-ch             Created
 ✔ Container clickhouse-server  Started
 ✔ Container puppygraph         Started

Data Preparation

This tutorial is designed to be comprehensive and standalone, so it includes steps to populate data in ClickHouse. In practical scenarios, PuppyGraph can query data directly from your existing ClickHouse databases.

▶ Run the following command to start a ClickHouse shell to access the database:

docker exec -it clickhouse-server clickhouse-client \
  --user ch_user \
  --password ch_pwd

The shell will be as follows:

ClickHouse client version 25.4.2.31 (official build).
Connecting to localhost:9000 as user ch_user.
Connected to ClickHouse server version 25.4.2.

Warnings:
 * Delay accounting is not enabled, OSIOWaitMicroseconds will not be gathered. You can enable it using `echo 1 > /proc/sys/kernel/task_delayacct` or by using sysctl.

45beff86ac12 :)

▶ Then execute the following SQL statements in the shell to create tables and insert data.

CREATE DATABASE IF NOT EXISTS modern;
USE modern;

CREATE TABLE modern.person
(
    id   String,
    name String,
    age  Int32
) ENGINE = MergeTree()
ORDER BY id;

INSERT INTO modern.person (id, name, age) VALUES
    ('v1','marko',29),
    ('v2','vadas',27),
    ('v4','josh',32),
    ('v6','peter',35);

CREATE TABLE modern.software
(
    id   String,
    name String,
    lang String
) ENGINE = MergeTree()
ORDER BY id;

INSERT INTO modern.software (id, name, lang) VALUES
    ('v3','lop','java'),
    ('v5','ripple','java');

CREATE TABLE modern.created
(
    id      String,
    from_id String,
    to_id   String,
    weight  Float64
) ENGINE = MergeTree()
ORDER BY id;

INSERT INTO modern.created (id, from_id, to_id, weight) VALUES
    ('e9','v1','v3',0.4),
    ('e10','v4','v5',1.0),
    ('e11','v4','v3',0.4),
    ('e12','v6','v3',0.2);

CREATE TABLE modern.knows
(
    id      String,
    from_id String,
    to_id   String,
    weight  Float64
) ENGINE = MergeTree()
ORDER BY id;

INSERT INTO modern.knows (id, from_id, to_id, weight) VALUES
    ('e7','v1','v2',0.5),
    ('e8','v1','v4',1.0);

The above SQL creates the following tables:

id name age
v1 marko 29
v2 vadas 27
v4 josh 32
v6 peter 35
id name lang
v3 lop java
v5 ripple java
id from_id to_id weight
e7 v1 v2 0.5
e8 v1 v4 1.0
id from_id to_id weight
e9 v1 v3 0.4
e10 v4 v5 1.0
e11 v4 v3 0.4
e12 v6 v3 0.2

We then define a graph on top of the data tables we just created. Actually, this is the "Modern" graph defined by Apache Tinkerpop.

Modern Graph

Modern Graph

Modeling a Graph

Step 1: Connecting to ClickHouse Server

▶ Log in to PuppyGraph with puppygraph as the username and puppygraph123 as the password.

PuppyGraph Login

PuppyGraph Login

▶ Click on Create graph schema to create a new graph schema.

Fill in the fields as follows.

Create ClickHouse Catalog

Parameter Value
Catalog type Clickhouse
Catalog name Some name for the catalog as you like.
Username Same as CLICKHOUSE_USER in docker-compose.yaml
Password Same as CLICKHOUSE_PASSWORD in docker-compose.yaml
JDBC Connection String jdbc:ch://<host>:<port>/<database>
Driver Class The class name of the JDBC Driver. The default value is com.clickhouse.jdbc.ClickHouseDriver, and typically you don’t need to change it.

▶ Click on Save, then Click on Submit to connect to ClickHouse Server.

Step 2: Building the Graph Schema

▶ In the Schema Builder, add the first node (vertex) to the graph from the table person.

▶ After that use the Auto Suggestion to create other nodes and edges. Select person as the start node (vertex) and add the auto suggested nodes and edges.

The graph schema should look like this: ▶ Submit the schema to create the graph.

Step 3: Querying the Graph

PuppyGraph provides a Dashboard that gives the summary of the graph.

Use the Interactive Query UI to further explore the graph by sending queries.

Querying the Graph

In this tutorial, we will use the Gremlin and Cypher query languages to interact with the Graph. Gremlin, developed by Apache TinkerPop, and Cypher, designed by Neo4j, are both powerful graph query languages. Prior knowledge of either Gremlin or Cypher is not necessary to follow this tutorial. To learn more about Gremlin, visit https://tinkerpop.apache.org/gremlin.html. For information on Cypher, you can check out https://neo4j.com/product/cypher-graph-query-language/.

▶ Click on the Query panel the left side. The Gremlin Query tab offers an interactive environment for querying the graph using Gremlin.

Interactive Gremlin Query Page

Interactive Gremlin Query Page

Queries are entered on the left side, and the right side displays the graph visualization.

The first query retrieves the property of the person named "marko".

▶ Copy the following query, paste it in the query input, and click on the run button.

g.V().has("name", "marko").valueMap()

The output is plain text like the following:

Rows: 1
age              29
name             marko

Now let's also leverage the visualization. The next query gets all the software created by people known to "marko".

▶ Copy the following query, paste it in the query input, and click on the run button.

g.V().has("name", "marko")
  .out("knows").out("created").path()

The output is as follows. There are two paths in the result as "marko" knows "josh" who created "lop" and "ripple".

Interactive Query with Results

Interactive Query with Results

Alternative: Querying the graph via CLI

Alternatively, we can query the graph via CLI.

▶ Execute the following command to access the PuppyGraph CLI.

docker exec -it puppygraph ./bin/puppygraph

The welcome screen appears as follows:

  ____                                     ____                          _
 |  _ \   _   _   _ __    _ __    _   _   / ___|  _ __    __ _   _ __   | |__
 | |_) | | | | | | '_ \  | '_ \  | | | | | |  _  | '__|  / _` | | '_ \  | '_ \
 |  __/  | |_| | | |_) | | |_) | | |_| | | |_| | | |    | (_| | | |_) | | | | |
 |_|      \__,_| | .__/  | .__/   \__, |  \____| |_|     \__,_| | .__/  |_| |_|
                 |_|     |_|      |___/                         |_|
Welcome to PuppyGraph, type help to see the command list
[PuppyGraph]> 
               console         access PuppyGraph Gremlin Console             
               cypher-console  access PuppyGraph Cypher Console              
               groovy          access console to run complex groovy scripts  
               exit            exit PuppyGraph                               
               help            show the command list                         

▶ Access the PuppyGraph Gremlin Console through the console command.

[PuppyGraph]> console
  ____                                     ____                          _
 |  _ \   _   _   _ __    _ __    _   _   / ___|  _ __    __ _   _ __   | |__
 | |_) | | | | | | '_ \  | '_ \  | | | | | |  _  | '__|  / _` | | '_ \  | '_ \
 |  __/  | |_| | | |_) | | |_) | | |_| | | |_| | | |    | (_| | | |_) | | | | |
 |_|      \__,_| | .__/  | .__/   \__, |  \____| |_|     \__,_| | .__/  |_| |_|
                 |_|     |_|      |___/                         |_|
Welcome to PuppyGraph!
version: 0.11

To Learn more about the graph schema:
- Use graph.show() to list all the node (vertex) and edge labels.
- Use graph.show('$FOO') to list all the node (vertex) and edge labels related to $FOO.
- Use graph.describe('$BAR') to list all the attributes of the label $BAR.

See https://tinkerpop.apache.org/gremlin.html to learn more about the Gremlin query language.
Here are some example queries for exploring the graph:
- Use g.V() to list all the nodes (vertices).
- Use g.E() to list all the edges.
- Use g.V().count() to get the total number of nodes (vertices).
- Use g.E().count() to get the total number of edges.
- Use g.V('$ID').out() to find out nodes (vertices) that are reachable in 1-hop from the node (vertex) $ID. For example, g.V('person[v1]').out() will find out 1-hop reachable nodes (vertices) from 'person[v1]'.
- Use g.V('$ID').out().out() similarly to find out 2-hop reachable nodes (vertices) from the node (vertex) $ID.

puppy-gremlin> 

▶ Run the following queries in the console to query the Graph.

g.V().has("name", "marko").valueMap()

Properties of the person named "marko":

puppy-gremlin> g.V().has("name", "marko").valueMap()
Done! Elapsed time: 0.048s, rows: 1
==>map[age:29 name:marko]
g.V().has("name", "marko").out("knows").out("created").valueMap()

All the software created by the people known to "marko":

puppy-gremlin> g.V().has("name", "marko").out("knows").out("created").valueMap()
Done! Elapsed time: 0.051s, rows: 2
==>map[lang:java name:lop]
==>map[lang:java name:ripple]

▶ Exit Gremlin console by :x and then access Cypher console by cypher-console.

puppy-gremlin> :x
Bye!               
[PuppyGraph]> cypher-console

▶ Make sure to initiate Cypher queries using the prefix :>.

puppy-cypher> :> MATCH (v) RETURN count(*)
==>[count(*):6]
puppy-cypher> :> MATCH (v) RETURN v
==>[v:[_type:node,name:peter,_id:person[v6],_label:person,age:35]]
==>[v:[_type:node,name:vadas,_id:person[v2],_label:person,age:27]]
==>[v:[_type:node,name:josh,_id:person[v4],_label:person,age:32]]
==>[v:[_type:node,name:marko,_id:person[v1],_label:person,age:29]]
==>[v:[_type:node,name:ripple,_id:software[v5],lang:java,_label:software]]
==>[v:[_type:node,name:lop,_id:software[v3],lang:java,_label:software]]

▶ Exit Cypher console and the CLI by :x and then exit command.

puppy-cypher> :x
[PuppyGraph]> exit

Cleaning up

▶ Run the following command to shut down and remove the services:

docker compose down
[+] Running 3/3
 ✔ Container clickhouse-server  Removed
 ✔ Container puppygraph         Removed
 ✔ Network puppy-ch             Removed