Connecting to Iceberg

Prerequisites

  • Both the Iceberg Catalog and Storage are accessible over the network from the PuppyGraph instance.
  • PuppyGraph supports REST, Hive Metastore, and AWS Glue Data Catalog as Iceberg Catalog implementations.
  • PuppyGraph supports Amazon S3, S3-compatible storage (e.g. MinIO), Google Cloud Storage (GCS), Azure, and HDFS as Iceberg storage.
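
The first prerequisite can be checked before configuring anything. The following is a minimal sketch, not part of PuppyGraph, that tests plain TCP reachability when run on the PuppyGraph host. The hosts and ports are placeholders borrowed from the example configurations on this page; replace them with your own endpoints.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder endpoints: replace with your own catalog and storage hosts.
    endpoints = {
        "REST catalog": ("127.0.0.1", 8181),
        "Hive Metastore": ("127.0.0.1", 9083),
        "MinIO": ("127.0.0.1", 9000),
    }
    for label, (host, port) in endpoints.items():
        status = "reachable" if is_reachable(host, port) else "NOT reachable"
        print(f"{label} at {host}:{port}: {status}")
```

A successful TCP connection only shows that the port is open; it does not verify credentials or that the service behind it is the expected one.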

Configuration

The configuration consists of two parts: Metastore (Catalog) and Data Storage. Configure both according to your Iceberg setup.

Metastore Configuration

Iceberg REST Catalog

Configuration | Explanation
RestUri | The server endpoint URI of the REST Catalog.
Warehouse | The name of the Tabular warehouse. Required by Tabular metastore.
Security | The security scheme of the REST Catalog. Set it to oauth2 when using Tabular metastore.
Session | Set it to user when using Tabular metastore.
Credential | The Tabular authentication credential. Required by Tabular metastore.

AWS Glue

Configuration | Explanation
Region | The region of the AWS Glue Data Catalog. Example: us-east-1. See AWS Glue endpoints and quotas for more details.
Use instance profile | Whether to use role-based authentication (explicit IAM roles or an attached instance profile).
IAM Role ARN | The ARN of the IAM role for accessing the AWS Glue Data Catalog. Required for authentication with IAM roles.
Access key | The access key of the IAM user for accessing the AWS Glue Data Catalog. Required for authentication with IAM user access keys.
Secret key | The secret key of the IAM user for accessing the AWS Glue Data Catalog. Required for authentication with IAM user access keys.

Hive Metastore

Configuration | Explanation
Hive metastore URI | The URI of your Hive metastore. Format: thrift://<metastore_IP_address>:<metastore_port>.
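
A malformed metastore URI is easier to catch before it reaches the configuration. The sketch below checks a string against the thrift://<metastore_IP_address>:<metastore_port> format using only the Python standard library. The function name `parse_metastore_uri` is hypothetical and not part of PuppyGraph.

```python
from urllib.parse import urlparse


def parse_metastore_uri(uri: str) -> tuple[str, int]:
    """Split a thrift://host:port URI into (host, port), or raise ValueError."""
    parsed = urlparse(uri)
    if parsed.scheme != "thrift":
        raise ValueError(f"expected a thrift:// URI, got {uri!r}")
    if not parsed.hostname:
        raise ValueError(f"missing host in {uri!r}")
    # Accessing .port raises ValueError if the port is not a valid number.
    port = parsed.port
    if port is None:
        raise ValueError(f"missing port in {uri!r}")
    return parsed.hostname, port


if __name__ == "__main__":
    host, port = parse_metastore_uri("thrift://127.0.0.1:9083")
    print(f"Hive metastore host={host} port={port}")
```

The check is purely syntactic: it does not contact the metastore.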

Data Storage Configuration

Amazon S3 (Simple Storage Service)

PuppyGraph supports Amazon S3 (Simple Storage Service) for Iceberg.

Configuration | Explanation
Region | The region of the Amazon S3 bucket. Example: us-east-1. See Amazon Simple Storage Service endpoints and quotas for more details.
Use instance profile | Whether to use role-based authentication (explicit IAM roles or an attached instance profile).
IAM Role ARN | The ARN of the IAM role for accessing Amazon S3. Required for authentication with IAM roles.
Access key | The access key of the IAM user for accessing Amazon S3. Required for authentication with IAM user access keys.
Secret key | The secret key of the IAM user for accessing Amazon S3. Required for authentication with IAM user access keys.
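
The two authentication modes in this table are alternatives, so a configuration should use one or the other. The sketch below builds a "storage" block using only the JSON keys that appear in the example configurations on this page (`useInstanceProfile`, `region`, `accessKey`, `secretKey`). The helper name is hypothetical. It is an assumption that the role-based mode needs no keys beyond the flag and the region; the JSON key for the IAM Role ARN is not shown on this page, so it is left out.

```python
import json
from typing import Optional


def s3_storage_config(
    region: str,
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
) -> dict:
    """Build a "storage" block for Amazon S3 from the keys used in the examples."""
    if (access_key is None) != (secret_key is None):
        raise ValueError("access_key and secret_key must be provided together")
    if access_key is None:
        # Assumption: role-based authentication needs only the flag and region.
        return {"useInstanceProfile": "true", "region": region}
    return {
        "useInstanceProfile": "false",
        "region": region,
        "accessKey": access_key,
        "secretKey": secret_key,
    }


if __name__ == "__main__":
    print(json.dumps(s3_storage_config("us-west-2"), indent=2))
```

Boolean options are written as the strings "true" and "false" to match the example configurations.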

Amazon S3 Compatible Storage

PuppyGraph supports S3 Compatible Storage (e.g. MinIO) for Iceberg.

Configuration | Explanation
Endpoint | The endpoint of the S3-compatible storage.
Access key | The access key for accessing the S3-compatible storage.
Secret key | The secret key for accessing the S3-compatible storage.
Enable SSL | Whether to enable SSL when connecting to the S3-compatible storage.
Enable path style access | Whether to use path-style access when accessing the S3-compatible storage.
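
As an illustration of how these options fit together, the sketch below assembles a "storage" block for an S3-compatible service such as MinIO, using the JSON keys from the example configurations on this page. The helper name is hypothetical, and deriving `enableSsl` from the endpoint scheme is an assumption made for convenience, not documented PuppyGraph behavior.

```python
import json
from urllib.parse import urlparse


def s3_compatible_storage_config(
    endpoint: str,
    access_key: str,
    secret_key: str,
    path_style: bool = True,
) -> dict:
    """Build a "storage" block for an S3-compatible endpoint."""
    scheme = urlparse(endpoint).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"endpoint must start with http:// or https://: {endpoint!r}")
    return {
        "useInstanceProfile": "false",
        "accessKey": access_key,
        "secretKey": secret_key,
        # Assumption: SSL is enabled exactly when the endpoint uses https.
        "enableSsl": "true" if scheme == "https" else "false",
        "endpoint": endpoint,
        "enablePathStyleAccess": "true" if path_style else "false",
    }


if __name__ == "__main__":
    config = s3_compatible_storage_config("http://127.0.0.1:9000", "admin", "password")
    print(json.dumps(config, indent=2))
```

With the MinIO values shown above, the result matches the "storage" block of the MinIO examples later on this page.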

Get from metastore

There is no need to specify a storage configuration for the following Iceberg implementations:

Select Get from metastore in the Web UI for these implementations.

Demo

See Querying Iceberg Data as a Graph for a complete demo.

Example Configurations

Please refer to Data Lake Catalog for detailed parameters for each type of catalog and storage.

Catalog Type | Storage Type | Example Configuration
REST Catalog | Amazon S3 | #rest-catalog--amazon-s3
REST Catalog | MinIO | #rest-catalog--minio
AWS Glue | Amazon S3 | #aws-glue--amazon-s3
Hive Metastore | HDFS | #hive-metastore--hdfs
Hive Metastore | Amazon S3 | #hive-metastore--amazon-s3
Hive Metastore | MinIO | #hive-metastore--minio
Hive Metastore | Google GCS | #hive-metastore--google-gcs
Hive Metastore | Azure Blob | #hive-metastore--azure-blob
Hive Metastore | Azure Data Lake Gen2 | #hive-metastore--azure-data-lake-gen2

REST Catalog + Amazon S3

"catalogs": [
  {
    "name": "iceberg_rest_s3",
    "type": "iceberg",
    "metastore": {
      "type": "rest",
      "uri": "http://127.0.0.1:8181"
    },
    "storage": {
      "useInstanceProfile": "false",
      "region": "us-west-2",
      "accessKey": "AKIAIOSFODNN7EXAMPLE",
      "secretKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "enableSsl": "false"
    }
  }
]
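
The example configurations on this page are fragments rather than complete JSON documents, so a JSON parser rejects them as written. The sketch below wraps a fragment in braces to check its syntax and confirms that each catalog entry carries the `name`, `type`, and `metastore` keys. Treating those three keys as required is an assumption based on the examples here, not a documented rule, and `load_catalogs` is a hypothetical helper.

```python
import json

# The REST Catalog + Amazon S3 fragment from this page, copied verbatim.
FRAGMENT = """
"catalogs": [
  {
    "name": "iceberg_rest_s3",
    "type": "iceberg",
    "metastore": {
      "type": "rest",
      "uri": "http://127.0.0.1:8181"
    },
    "storage": {
      "useInstanceProfile": "false",
      "region": "us-west-2",
      "accessKey": "AKIAIOSFODNN7EXAMPLE",
      "secretKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "enableSsl": "false"
    }
  }
]
"""

# Assumption: every example on this page has these keys, so treat them as required.
REQUIRED_KEYS = {"name", "type", "metastore"}


def load_catalogs(fragment: str) -> list:
    """Wrap a "catalogs": [...] fragment in braces and parse it as JSON."""
    document = json.loads("{" + fragment + "}")
    catalogs = document["catalogs"]
    for catalog in catalogs:
        missing = REQUIRED_KEYS - set(catalog)
        if missing:
            raise ValueError(f"catalog entry is missing keys: {sorted(missing)}")
    return catalogs


if __name__ == "__main__":
    for catalog in load_catalogs(FRAGMENT):
        print(catalog["name"], "uses metastore type", catalog["metastore"]["type"])
```

A stray trailing comma or an unquoted value raises `json.JSONDecodeError`, which points at the offending line and column.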

REST Catalog + MinIO

"catalogs": [
  {
    "name": "iceberg_rest_minio",
    "type": "iceberg",
    "metastore": {
      "type": "rest",
      "uri": "http://127.0.0.1:8181"
    },
    "storage": {
      "useInstanceProfile": "false",
      "accessKey": "admin",
      "secretKey": "password",
      "enableSsl": "false",
      "endpoint": "http://127.0.0.1:9000",
      "enablePathStyleAccess": "true"
    }
  }
]

AWS Glue + Amazon S3

"catalogs": [
  {
    "name": "iceberg_glue_s3",
    "type": "iceberg",
    "metastore": {
      "type": "glue",
      "useInstanceProfile": "false",
      "region": "us-west-2",
      "accessKey": "AKIAIOSFODNN7EXAMPLE",
      "secretKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    },
    "storage": {
      "useInstanceProfile": "false",
      "region": "us-west-2",
      "accessKey": "AKIAIOSFODNN7EXAMPLE",
      "secretKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "enableSsl": "false"
    }
  }
]

Hive Metastore + HDFS

"catalogs": [
  {
    "name": "iceberg_hms_hdfs",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    }
  }
]

Hive Metastore + MinIO

"catalogs": [
  {
    "name": "iceberg_hms_minio",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    },
    "storage": {
      "useInstanceProfile": "false",
      "accessKey": "admin",
      "secretKey": "password",
      "enableSsl": "false",
      "endpoint": "http://127.0.0.1:9000",
      "enablePathStyleAccess": "true"
    }
  }
]

Hive Metastore + Amazon S3

"catalogs": [
  {
    "name": "iceberg_hms_s3",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    },
    "storage": {
      "useInstanceProfile": "false",
      "region": "us-west-2",
      "accessKey": "AKIAIOSFODNN7EXAMPLE",
      "secretKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "enableSsl": "false"
    }
  }
]

Hive Metastore + Google GCS

"catalogs": [
  {
    "name": "iceberg_hms_gcs",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    },
    "storage": {
      "type": "GCS",
      "serviceAccountEmail": "acc_name@project.iam.gserviceaccount.com",
      "serviceAccountPrivateKeyId": "AKIAIOSFODNN7EXAMPLE",
      "serviceAccountPrivateKey": "-----BEGIN PRIVATE KEY-----\nabcded\n-----END PRIVATE KEY-----\n"
    }
  }
]

Hive Metastore + Azure Blob

"catalogs": [
  {
    "name": "iceberg_hms_azblob",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    },
    "storage": {
      "type": "AzureBlob",
      "storageAccount": "account_name",
      "storageContainer": "container_name",
      "sasToken": "sp=rl&st=2020-12-15T03:19:48Z&se=2024-12-12T11:19:48Z&sv=2022-11-02&sr=c&sig=1"
    }
  }
]

Hive Metastore + Azure Data Lake Gen2

"catalogs": [
  {
    "name": "iceberg_hms_azgen2",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    },
    "storage": {
      "type": "AzureDLS2",
      "clientId": "000000-avaf-aaaa-bbbb-aba988azfa",
      "clientSecret": "EXAMPLEvonefPJabcde",
      "clientEndpoint": "https://login.microsoftonline.com/000000-avaf-aaaa-bbbb-aba988azfa/oauth2/token"
    }
  }
]
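
Because "catalogs" is an array, several of the entries above can presumably sit side by side in one configuration. The sketch below combines catalog entries into a single "catalogs" list and rejects repeated names. That names must be unique is an assumption (this page does not state it), and `merge_catalogs` is a hypothetical helper rather than a PuppyGraph API.

```python
import json
from collections import Counter


def merge_catalogs(*catalogs: dict) -> str:
    """Combine catalog entries into one JSON document with a "catalogs" list."""
    counts = Counter(catalog["name"] for catalog in catalogs)
    duplicates = sorted(name for name, count in counts.items() if count > 1)
    if duplicates:
        # Assumption: catalog names are expected to be unique.
        raise ValueError(f"duplicate catalog names: {duplicates}")
    return json.dumps({"catalogs": list(catalogs)}, indent=2)


if __name__ == "__main__":
    hms = {"type": "HMS", "hiveMetastoreUrl": "thrift://127.0.0.1:9083"}
    hdfs_catalog = {"name": "iceberg_hms_hdfs", "type": "iceberg", "metastore": hms}
    rest_catalog = {
        "name": "iceberg_rest_minio",
        "type": "iceberg",
        "metastore": {"type": "rest", "uri": "http://127.0.0.1:8181"},
    }
    print(merge_catalogs(hdfs_catalog, rest_catalog))
```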