
Querying Nessie and MinIO Data as a Graph with TLS

Summary

In this tutorial, you will:

  • Create a Nessie-backed Apache Iceberg data lake and load it with example data.
  • Deploy TLS-enabled MinIO, Nessie, Spark-Iceberg, and PuppyGraph services using Docker Compose.
  • Configure each service to communicate securely via HTTPS.
  • Use PuppyGraph to query and visualize Nessie data as a graph.

Note: For the non-TLS version, see Querying Nessie Data as a Graph.

Prerequisites

Docker

Docker is required to run the PuppyGraph server and the other services in this tutorial. You can download it from the official Docker website.

Please ensure that docker compose is available. The installation can be verified by running:

docker compose version

OpenSSL

OpenSSL is needed to generate self-signed TLS certificates for secure communications.

For Ubuntu, you can install it using the following commands:

sudo apt update
sudo apt install openssl

Accessing the PuppyGraph Web UI requires a browser. However, the tutorial offers alternative instructions for those who wish to exclusively use the CLI.
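
The steps in this tutorial call docker, openssl, and keytool from the host. As a quick preflight, the following sketch reports any of them that are missing from PATH. The function name missing_tools is only an illustrative helper and is not part of any of these tools.

```shell
# Print the name of every given tool that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools used on the host by the commands in this tutorial.
missing=$(missing_tools docker openssl keytool)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

If keytool is reported as missing, installing a JDK normally provides it.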

TLS Certificate Setup

To enable secure communications, we generate self-signed TLS certificates for MinIO and Nessie.

1. Create an OpenSSL Configuration File for MinIO

▶ Create a file named openssl.cnf with the following content.

openssl.cnf
[ req ]
default_bits       = 2048
distinguished_name = req_distinguished_name
req_extensions     = req_ext
x509_extensions    = v3_ca
prompt             = no

[ req_distinguished_name ]
C  = US
ST = State
L  = City
O  = Organization
OU = Organizational Unit
CN = minio

[ req_ext ]
subjectAltName = @alt_names

[ v3_ca ]
subjectAltName = @alt_names

[ alt_names ]
DNS.1   = minio
DNS.2   = localhost
IP.1    = 127.0.0.1

2. Generate the Certificate Files for MinIO

▶ Run the following commands to generate a self-signed certificate and key, and store them under ./certs/minio. MinIO expects the files in its certificate directory to be named private.key and public.crt.

mkdir -p ./certs/minio

openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout ./certs/minio/private.key \
  -out ./certs/minio/public.crt \
  -config ./openssl.cnf

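3. Generate the Certificate Files for Nessie

▶ Run the following commands to create a PKCS12 keystore for Nessie under ./certs/nessie and export its public certificate. The keytool utility ships with the JDK, so a Java installation is needed on the host.

mkdir -p ./certs/nessie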

keytool -genkeypair -alias nessie -keyalg RSA -keysize 2048 -storetype pkcs12 -keystore ./certs/nessie/nessie.jks -validity 365 \
-storepass your_nessie_password -keypass your_nessie_password \
-dname "CN=nessie, OU=YourOrgUnit, O=YourOrg, L=YourCity, ST=YourState, C=YourCountry"

keytool -exportcert -alias nessie -keystore ./certs/nessie/nessie.jks -rfc -file ./certs/nessie/nessie.crt -storepass your_nessie_password
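
Before deploying, it can help to confirm what each certificate actually contains. The sketch below prints the subject, validity window, and Subject Alternative Names of a PEM certificate using openssl x509. The helper name cert_summary is an assumption made for illustration, and the loop simply skips files that do not exist yet.

```shell
# Print subject, validity dates, and SANs of a PEM-encoded certificate.
cert_summary() {
  openssl x509 -in "$1" -noout -subject -dates
  # grep fails when the extension is absent, so fall back to a short note.
  openssl x509 -in "$1" -noout -text |
    grep -A1 'Subject Alternative Name' ||
    echo "(no Subject Alternative Name extension)"
}

# Inspect the two certificates generated above, if they are present.
for crt in ./certs/minio/public.crt ./certs/nessie/nessie.crt; do
  if [ -f "$crt" ]; then
    echo "== $crt"
    cert_summary "$crt"
  fi
done
```

With the openssl.cnf shown earlier, the MinIO certificate should list minio, localhost, and 127.0.0.1 as alternative names. The keytool command above does not request a SAN extension, so the Nessie certificate identifies the host only through CN=nessie.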

Deployment

1. Create the Docker Compose File

▶ Create a file named docker-compose.yaml with the content below. Notice that TLS parameters are integrated into the MinIO and Nessie service configurations.

docker-compose.yaml
services:
  spark-iceberg:
    image: tabulario/spark-iceberg
    container_name: spark-iceberg
    networks:
      iceberg_net:
    depends_on:
      - minio
      - nessie
    volumes:
      - ./warehouse:/home/iceberg/warehouse
      - ./notebooks:/home/iceberg/notebooks/notebooks
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
    ports:
      - 8888:8888
      - 8080:8080
      - 10000:10000
      - 10001:10001

  minio:
    image: quay.io/minio/minio
    container_name: minio
    networks:
      iceberg_net:
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - ./certs/minio:/certs/minio
    environment:
      - MINIO_ROOT_USER=admin
      - MINIO_ROOT_PASSWORD=password
      - MINIO_REGION=us-east-1
    entrypoint: >
      /bin/sh -c "
      minio server /data --certs-dir /certs/minio --console-address ':9001' &
      sleep 5;
      mc alias set myminio https://localhost:9000 admin password --insecure;
      mc mb myminio/my-bucket --ignore-existing --insecure;
      tail -f /dev/null"

  nessie:
    image: ghcr.io/projectnessie/nessie
    container_name: nessie
    networks:
      iceberg_net:
    ports:
      - 19121:19121
    volumes:
      - ./certs/nessie/nessie.jks:/config/nessie.jks
    environment:
      - nessie.version.store.type=IN_MEMORY
      - nessie.catalog.default-warehouse=warehouse
      - nessie.catalog.warehouses.warehouse.location=s3a://my-bucket/
      - nessie.catalog.service.s3.default-options.region=us-east-1
      - nessie.catalog.service.s3.default-options.endpoint=https://minio:9000
      - nessie.catalog.service.s3.default-options.path-style-access=true
      - nessie.catalog.service.s3.default-options.access-key=urn:nessie-secret:quarkus:nessie.catalog.secrets.access-key
      - nessie.catalog.secrets.access-key.name=admin
      - nessie.catalog.secrets.access-key.secret=password
      - nessie.server.authentication.enabled=false
      - quarkus.http.ssl-port=19121
      - quarkus.http.ssl.certificate.key-store-file=/config/nessie.jks
      - quarkus.http.ssl.certificate.key-store-password=your_nessie_password
      - quarkus.http.ssl.certificate.key-store-type=PKCS12

  puppygraph:
    image: puppygraph/puppygraph:stable
    container_name: puppygraph
    networks:
      iceberg_net:
    environment:
      - PUPPYGRAPH_USERNAME=puppygraph
      - PUPPYGRAPH_PASSWORD=puppygraph123
    ports:
      - "8081:8081"
      - "8182:8182"
      - "7687:7687"
    depends_on:
      - spark-iceberg

networks:
  iceberg_net:
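
The compose file bind-mounts ./certs/minio and ./certs/nessie/nessie.jks from the host. When the source of a bind mount written in this short syntax does not exist, Docker typically creates an empty directory in its place, which then fails in confusing ways inside the container. A minimal sketch of a check to run before starting the services follows; require_files is a hypothetical helper name.

```shell
# Return non-zero and name each path that is not a regular file.
require_files() {
  status=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Files that docker-compose.yaml mounts into the MinIO and Nessie containers.
if require_files ./certs/minio/private.key ./certs/minio/public.crt \
     ./certs/nessie/nessie.jks; then
  echo "Certificate files are in place."
fi
```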

2. Start the Services

▶ Run the following command to start the Nessie-backed Iceberg services and PuppyGraph:

docker compose up -d

The output should look similar to this:

[+] Running 5/5
 ✔ Network   iceberg_net       Created
 ✔ Container nessie            Started
 ✔ Container minio             Started
 ✔ Container spark-iceberg     Started
 ✔ Container puppygraph        Started
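
A container reported as Started may still need a few seconds before it accepts HTTPS connections. The following sketch retries a command until it succeeds; wait_for is an illustrative helper, and the commented example assumes curl is installed on the host and uses MinIO's /minio/health/live liveness endpoint.

```shell
# Retry a command once per second until it succeeds or attempts run out.
# Usage: wait_for <attempts> <command> [args...]
wait_for() {
  attempts="$1"
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example (run from the directory that contains ./certs):
#   wait_for 30 curl -fsS --cacert ./certs/minio/public.crt \
#     https://localhost:9000/minio/health/live && echo "MinIO is up"
```

Passing the self-signed certificate with --cacert lets curl verify the connection against localhost, which is one of the alternative names in openssl.cnf.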

3. Import TLS Certificates into Containers

To ensure that each component correctly validates TLS connections, copy the MinIO and Nessie public certificates into the containers that connect to those services.

Nessie

▶ Copy the certificate.

docker cp ./certs/minio/public.crt nessie:/tmp/minio.crt

▶ Enter the Nessie container as root and import the certificate.

docker exec -it -u root nessie bash
keytool -importcert -file /tmp/minio.crt -alias minio -cacerts -storepass changeit -noprompt
exit

▶ Restart Nessie.

docker compose restart nessie
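
To confirm that the import took effect, the trust store can be queried for the minio alias without opening an interactive shell. This is a sketch under the assumption that keytool is on the PATH of the Nessie container, as the import step above implies. The function name nessie_trusts_minio and the DOCKER override variable are illustrative.

```shell
# DOCKER can be overridden (for example DOCKER=echo) to preview the command.
DOCKER="${DOCKER:-docker}"

# List the "minio" entry in the JVM default trust store of the Nessie
# container. keytool exits non-zero if the alias does not exist.
nessie_trusts_minio() {
  "$DOCKER" exec nessie keytool -list -alias minio -cacerts -storepass changeit
}

# Example:
#   nessie_trusts_minio && echo "MinIO certificate is trusted by Nessie"
```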

Spark-Iceberg

▶ Copy the certificates.

docker cp ./certs/minio/public.crt spark-iceberg:/tmp/minio.crt
docker cp ./certs/nessie/nessie.crt spark-iceberg:/tmp/nessie.crt

▶ Enter the container and update CA certificates.

docker exec -it -u root spark-iceberg bash
cp /tmp/minio.crt /usr/local/share/ca-certificates/
cp /tmp/nessie.crt /usr/local/share/ca-certificates/
update-ca-certificates
exit

PuppyGraph

▶ Copy the certificates.

docker cp ./certs/minio/public.crt puppygraph:/tmp/minio.crt
docker cp ./certs/nessie/nessie.crt puppygraph:/tmp/nessie.crt

▶ Enter the container and update CA certificates.

docker exec -it -u root puppygraph bash
cp /tmp/minio.crt /usr/local/share/ca-certificates/
cp /tmp/nessie.crt /usr/local/share/ca-certificates/
update-ca-certificates
exit

▶ Restart PuppyGraph.

docker compose restart puppygraph
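
The Spark-Iceberg and PuppyGraph steps run the same commands against different containers. As a non-interactive alternative, they could be wrapped in a small helper such as the one sketched here. The name trust_certs and the DOCKER override variable are assumptions for illustration. The helper does not restart anything, so the restart above is still needed for PuppyGraph.

```shell
# DOCKER can be overridden (for example DOCKER=echo) to preview the commands.
DOCKER="${DOCKER:-docker}"

# Copy both public certificates into a container and register them with
# the container's system CA store.
trust_certs() {
  container="$1"
  "$DOCKER" cp ./certs/minio/public.crt "$container:/tmp/minio.crt" &&
  "$DOCKER" cp ./certs/nessie/nessie.crt "$container:/tmp/nessie.crt" &&
  "$DOCKER" exec -u root "$container" sh -c \
    'cp /tmp/minio.crt /tmp/nessie.crt /usr/local/share/ca-certificates/ && update-ca-certificates'
}

# Example:
#   trust_certs spark-iceberg
#   trust_certs puppygraph
```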

Data Preparation

This tutorial is designed to be comprehensive and standalone, so it includes steps to populate data in Nessie. In practical scenarios, PuppyGraph can query data directly from your existing Nessie tables.

▶ Run the following command to start a Spark-SQL shell connected to Nessie.

docker exec -it spark-iceberg spark-sql \
  --conf spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.demo.uri=https://nessie:19121/iceberg/ \
  --conf spark.sql.catalog.demo.warehouse=s3a://my-bucket/ \
  --conf spark.sql.catalog.demo.type=rest

The shell prompt looks like this:

spark-sql ()>

▶ Then execute the following SQL statements in the shell to create tables and insert data:

CREATE DATABASE demo.modern;

CREATE EXTERNAL TABLE demo.modern.person (
  id string,
  name string,
  age int
) USING iceberg;

INSERT INTO demo.modern.person VALUES
  ('v1', 'marko', 29),
  ('v2', 'vadas', 27),
  ('v4', 'josh', 32),
  ('v6', 'peter', 35);

CREATE EXTERNAL TABLE demo.modern.software (
  id string,
  name string,
  lang string
) USING iceberg;

INSERT INTO demo.modern.software VALUES
  ('v3', 'lop', 'java'),
  ('v5', 'ripple', 'java');

CREATE EXTERNAL TABLE demo.modern.created (
  id string,
  from_id string,
  to_id string,
  weight double
) USING iceberg;

INSERT INTO demo.modern.created VALUES
  ('e9', 'v1', 'v3', 0.4),
  ('e10', 'v4', 'v5', 1.0),
  ('e11', 'v4', 'v3', 0.4),
  ('e12', 'v6', 'v3', 0.2);

CREATE EXTERNAL TABLE demo.modern.knows (
  id string,
  from_id string,
  to_id string,
  weight double
) USING iceberg;

INSERT INTO demo.modern.knows VALUES
  ('e7', 'v1', 'v2', 0.5),
  ('e8', 'v1', 'v4', 1.0);

The above SQL creates the following tables:

person

id  name   age
v1  marko  29
v2  vadas  27
v4  josh   32
v6  peter  35

software

id  name    lang
v3  lop     java
v5  ripple  java

knows

id  from_id  to_id  weight
e7  v1       v2     0.5
e8  v1       v4     1.0

created

id   from_id  to_id  weight
e9   v1       v3     0.4
e10  v4       v5     1.0
e11  v4       v3     0.4
e12  v6       v3     0.2

▶ When finished, exit the Spark-SQL shell by entering:

quit;
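
To double-check the load without reopening the interactive shell, a single statement can be passed to spark-sql with its -e option. The wrapper below reuses the catalog settings from the command above. The name nessie_sql and the DOCKER override variable are illustrative assumptions.

```shell
# DOCKER can be overridden (for example DOCKER=echo) to preview the command.
DOCKER="${DOCKER:-docker}"

# Run one SQL statement against the Nessie-backed "demo" catalog.
nessie_sql() {
  "$DOCKER" exec spark-iceberg spark-sql \
    --conf spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.demo.uri=https://nessie:19121/iceberg/ \
    --conf spark.sql.catalog.demo.warehouse=s3a://my-bucket/ \
    --conf spark.sql.catalog.demo.type=rest \
    -e "$1"
}

# Examples:
#   nessie_sql "SHOW TABLES IN demo.modern"
#   nessie_sql "SELECT count(*) FROM demo.modern.created"
```

Given the INSERT statements above, the row counts should be 4 for person, 2 for software, 4 for created, and 2 for knows.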

Modeling the Graph

Step 1: Connecting to Nessie

▶ Log in to the PuppyGraph Web UI at http://localhost:8081 with puppygraph as the username and puppygraph123 as the password.

▶ Click on Create graph schema to create a new graph schema.

Fill in the fields as follows.

Create Nessie Catalog

Parameter                 Value
Catalog type              Apache Iceberg
Catalog name              Any name you like
Metastore Type            Iceberg-Rest
RestUri                   https://nessie:19121/iceberg (note the HTTPS)
Warehouse                 s3a://my-bucket/ (same as nessie.catalog.warehouses.warehouse.location in docker-compose.yaml)
Storage type              S3 Compatible
Endpoint                  https://minio:9000 (note the HTTPS; same as nessie.catalog.service.s3.default-options.endpoint in docker-compose.yaml)
Access key                admin (same as AWS_ACCESS_KEY_ID in docker-compose.yaml)
Secret key                password (same as AWS_SECRET_ACCESS_KEY in docker-compose.yaml)
Enable SSL                true
Enable path style access  true

▶ Click on Save, then click on Submit to connect to Nessie.

Step 2: Building the Graph Schema

▶ In the Schema Builder, add the first vertex to the graph from the table person.

▶ After that, use Auto Suggestion to create the other vertices and edges. Select person as the start vertex and add the suggested vertices and edges.

The resulting graph schema should contain vertices for person and software, and edges for knows and created.

▶ Submit the schema to create the graph.

Step 3: Querying the Graph

PuppyGraph provides a Dashboard that summarizes the graph.

Use the Interactive Query UI to further explore the graph by sending queries.

Cleaning up

▶ Run the following command to shut down and remove the services:

docker compose down --volumes --remove-orphans