Connecting to Iceberg

Prerequisites

  • Both the Iceberg Catalog and Storage are accessible over the network from the PuppyGraph instance.

  • PuppyGraph supports REST, Hive Metastore, and AWS Glue Data Catalog as Iceberg Catalog implementations.

  • PuppyGraph supports Amazon S3, S3 Compatible Storage (e.g. MinIO), Google Cloud Storage (GCS), Azure, and HDFS as Iceberg data storage.
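
The first prerequisite can be checked before any configuration is written. The following is a minimal sketch, not part of PuppyGraph, that tries to open a TCP connection to a catalog or storage endpoint from the machine running PuppyGraph. The function names and the default ports are assumptions made for this illustration, and the endpoints are the placeholder addresses used in the example configurations.

```python
import socket
from urllib.parse import urlparse

# Ports assumed when a URI omits one (illustrative defaults, not PuppyGraph settings).
DEFAULT_PORTS = {"http": 80, "https": 443, "thrift": 9083}


def host_and_port(uri):
    """Extract (host, port) from a URI such as thrift://127.0.0.1:9083."""
    parsed = urlparse(uri)
    if not parsed.hostname:
        raise ValueError(f"no host in URI: {uri!r}")
    port = parsed.port or DEFAULT_PORTS.get(parsed.scheme)
    if port is None:
        raise ValueError(f"no port in URI and no default for its scheme: {uri!r}")
    return parsed.hostname, port


def is_reachable(uri, timeout=3.0):
    """Return True if a TCP connection to the URI's host and port succeeds."""
    host, port = host_and_port(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder endpoints: a REST catalog and a MinIO server on localhost.
    for uri in ("http://127.0.0.1:8181", "http://127.0.0.1:9000"):
        print(uri, "reachable" if is_reachable(uri) else "NOT reachable")
```

A successful TCP connection only shows that the port is open. It does not confirm that credentials or catalog settings are correct.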

Configuration

The configuration consists of two parts: Metastore (Catalog) and Data Storage. Configure both according to your Iceberg setup.
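
To illustrate the two-part structure, here is a small sketch that assembles a catalog entry from a metastore block and an optional storage block. The key names are copied from the example configurations on this page. The helper `make_catalog` and the enclosing `{"catalogs": [...]}` object are assumptions made for this illustration.

```python
import json


def make_catalog(name, metastore, storage=None):
    """Combine a metastore block and an optional storage block into one catalog entry."""
    catalog = {"name": name, "type": "iceberg", "metastore": metastore}
    if storage is not None:
        # The storage block is left out when no storage settings are needed,
        # as in the Hive Metastore + HDFS example.
        catalog["storage"] = storage
    return catalog


# Part 1: metastore (catalog) settings.
metastore = {"type": "HMS", "hiveMetastoreUrl": "thrift://127.0.0.1:9083"}

# Part 2: data storage settings (MinIO placeholder values).
storage = {
    "useInstanceProfile": "false",
    "accessKey": "admin",
    "secretKey": "password",
    "enableSsl": "false",
    "endpoint": "http://127.0.0.1:9000",
    "enablePathStyleAccess": "true",
}

config = {"catalogs": [make_catalog("iceberg_hms_minio", metastore, storage)]}
print(json.dumps(config, indent=2))
```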

Metastore Configuration

Iceberg REST Catalog

| Configuration | Explanation |
|---|---|
| RestUri | The server endpoint URI of the REST Catalog. |
| Warehouse | The name of the Tabular warehouse. Required by Tabular metastore. |
| Security | Security Schema of the REST catalog. Set it to oauth2 when using Tabular metastore. |
| Session | Set it to user when using Tabular metastore. |
| Credential | The Tabular authentication credential. Required by Tabular metastore. |

AWS Glue

| Configuration | Explanation |
|---|---|
| Region | The region of the AWS Glue Data Catalog. Example: us-east-1. See AWS Glue endpoints and quotas for more details. |
| Use instance profile | Whether to use role-based authentication (Explicit IAM roles or instance-profile attached). |
| IAM Role ARN | The ARN of the IAM role for accessing the AWS Glue Data Catalog. Required by authentication with IAM roles. |
| Access key | The access key of the IAM user for accessing the AWS Glue Data Catalog. Required by authentication with IAM User Access keys. |
| Secret key | The secret key of the IAM user for accessing the AWS Glue Data Catalog. Required by authentication with IAM User Access keys. |
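
The AWS fields describe mutually dependent choices, so a quick consistency check can help before saving a configuration. The sketch below encodes one reading of the field descriptions: role-based authentication uses an explicit IAM role when an ARN is given and the attached instance profile otherwise, while key-based authentication needs both keys. The function and parameter names are hypothetical, and PuppyGraph's own validation may differ.

```python
def aws_auth_mode(use_instance_profile, iam_role_arn=None,
                  access_key=None, secret_key=None):
    """Return the authentication mode a set of fields describes.

    Raises ValueError when the fields are inconsistent under the
    interpretation described above (an assumption, not PuppyGraph's logic).
    """
    if use_instance_profile:
        # Role-based: an explicit role if an ARN is supplied,
        # otherwise the instance profile attached to the machine.
        return "iam_role" if iam_role_arn else "instance_profile"
    if access_key and secret_key:
        return "access_keys"
    raise ValueError(
        "access key and secret key are both required "
        "when role-based authentication is off"
    )


print(aws_auth_mode(True))
print(aws_auth_mode(True, iam_role_arn="arn:aws:iam::123456789012:role/example"))
print(aws_auth_mode(False, access_key="AKIAIOSFODNN7EXAMPLE",
                    secret_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"))
```

The same check applies to the Amazon S3 storage settings, which use the same set of fields.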

Hive Metastore

| Configuration | Explanation |
|---|---|
| Hive metastore URI | The URI of your Hive metastore. Format: thrift://<metastore_IP_address>:<metastore_port>. |
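
The URI format above can be checked with the Python standard library before it is entered. This is a minimal sketch under the assumption that both host and port must be present, as the format string suggests. The function name is hypothetical.

```python
from urllib.parse import urlparse


def check_hive_metastore_uri(uri):
    """Validate thrift://<host>:<port> and return (host, port)."""
    parsed = urlparse(uri)
    if parsed.scheme != "thrift":
        raise ValueError(f"expected scheme 'thrift', got {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("missing metastore host")
    # Accessing .port raises ValueError if the port is not a valid number.
    if parsed.port is None:
        raise ValueError("missing metastore port")
    return parsed.hostname, parsed.port


print(check_hive_metastore_uri("thrift://127.0.0.1:9083"))
```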

Data Storage Configuration

Amazon S3 (Simple Storage Service)

PuppyGraph supports Amazon S3 (Simple Storage Service) for Iceberg.

| Configuration | Explanation |
|---|---|
| Region | The region of Amazon S3. Example: us-east-1. See Amazon Simple Storage Service endpoints and quotas for more details. |
| Use instance profile | Whether to use role-based authentication (Explicit IAM roles or instance-profile attached). |
| IAM Role ARN | The ARN of the IAM role for accessing Amazon S3. Required by authentication with IAM roles. |
| Access key | The access key of the IAM user for accessing Amazon S3. Required by authentication with IAM User Access keys. |
| Secret key | The secret key of the IAM user for accessing Amazon S3. Required by authentication with IAM User Access keys. |

Amazon S3 Compatible Storage

PuppyGraph supports S3 Compatible Storage (e.g. MinIO) for Iceberg.

| Configuration | Explanation |
|---|---|
| Endpoint | The S3 compatible storage endpoint. |
| Access key | The access key of an IAM user for accessing the S3 compatible storage. |
| Secret key | The secret key of an IAM user for accessing the S3 compatible storage. |
| Enable SSL | Whether to enable SSL connection for accessing the S3 compatible storage. |
| Enable path style access | Whether to use path-style access method when accessing the S3 compatible storage. |
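
The path-style setting changes how object URLs are formed. With path-style access the bucket name is the first path segment; with virtual-hosted-style access it is prefixed to the endpoint host. The sketch below shows the difference for a given endpoint. The helper name, bucket, and object key are hypothetical values chosen for illustration.

```python
from urllib.parse import urlparse


def object_url(endpoint, bucket, key, path_style):
    """Build the URL of an object under either S3 addressing style."""
    parsed = urlparse(endpoint)
    if path_style:
        # Path style: <endpoint>/<bucket>/<key>
        return f"{parsed.scheme}://{parsed.netloc}/{bucket}/{key}"
    # Virtual-hosted style: <bucket>.<endpoint host>/<key>
    return f"{parsed.scheme}://{bucket}.{parsed.netloc}/{key}"


endpoint = "http://127.0.0.1:9000"
print(object_url(endpoint, "warehouse", "db/t/metadata.json", path_style=True))
# http://127.0.0.1:9000/warehouse/db/t/metadata.json
print(object_url(endpoint, "warehouse", "db/t/metadata.json", path_style=False))
# http://warehouse.127.0.0.1:9000/db/t/metadata.json
```

The second URL does not resolve when the endpoint is an IP address, which is one reason path-style access is commonly enabled for self-hosted S3 compatible storage such as MinIO.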

Get from metastore

There is no need to specify Storage configuration with the following implementation of Iceberg:

Select Get from metastore in the Web UI for these implementations.

Demo

See Querying Iceberg Data as a Graph for a complete demo.

Example Configurations

Please refer to Data Lake Catalog for detailed parameters for each type of catalog and storage.

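| Catalog Type | Storage Type | Example Configuration |
|---|---|---|
| REST Catalog | Amazon S3 | REST Catalog + Amazon S3 |
| REST Catalog | MinIO | REST Catalog + MinIO |
| AWS Glue | Amazon S3 | AWS Glue + Amazon S3 |
| Hive Metastore | HDFS | Hive Metastore + HDFS |
| Hive Metastore | Amazon S3 | Hive Metastore + Amazon S3 |
| Hive Metastore | MinIO | Hive Metastore + MinIO |
| Hive Metastore | Google GCS | Hive Metastore + Google GCS |
| Hive Metastore | Azure Blob | Hive Metastore + Azure Blob |
| Hive Metastore | Azure Data Lake Gen2 | Hive Metastore + Azure Data Lake Gen2 |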

REST Catalog + Amazon S3

"catalogs": [
  {
    "name": "iceberg_rest_s3",
    "type": "iceberg",
    "metastore": {
      "type": "rest",
      "uri": "http://127.0.0.1:8181"
    },
    "storage": {
      "useInstanceProfile": "false",
      "region": "us-west-2",
      "accessKey": "AKIAIOSFODNN7EXAMPLE",
      "secretKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "enableSsl": "false"
    }
  }
]

REST Catalog + MinIO

"catalogs": [
  {
    "name": "iceberg_rest_minio",
    "type": "iceberg",
    "metastore": {
      "type": "rest",
      "uri": "http://127.0.0.1:8181"
    },
    "storage": {
      "useInstanceProfile": "false",
      "accessKey": "admin",
      "secretKey": "password",
      "enableSsl": "false",
      "endpoint": "http://127.0.0.1:9000",
      "enablePathStyleAccess": "true"
    }
  }
]

AWS Glue + Amazon S3

"catalogs": [
  {
    "name": "iceberg_glue_s3",
    "type": "iceberg",
    "metastore": {
      "type": "glue",
      "useInstanceProfile": "false",
      "region": "us-west-2",
      "accessKey": "AKIAIOSFODNN7EXAMPLE",
      "secretKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    },
    "storage": {
      "useInstanceProfile": "false",
      "region": "us-west-2",
      "accessKey": "AKIAIOSFODNN7EXAMPLE",
      "secretKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "enableSsl": "false"
    }
  }
]

Hive Metastore + HDFS

"catalogs": [
  {
    "name": "iceberg_hms_hdfs",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    }
  }
]

Hive Metastore + MinIO

"catalogs": [
  {
    "name": "iceberg_hms_minio",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    },
    "storage": {
      "useInstanceProfile": "false",
      "accessKey": "admin",
      "secretKey": "password",
      "enableSsl": "false",
      "endpoint": "http://127.0.0.1:9000",
      "enablePathStyleAccess": "true"
    }
  }
]

Hive Metastore + Amazon S3

"catalogs": [
  {
    "name": "iceberg_hms_s3",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    },
    "storage": {
      "useInstanceProfile": "false",
      "region": "us-west-2",
      "accessKey": "AKIAIOSFODNN7EXAMPLE",
      "secretKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "enableSsl": "false"
    }
  }
]

Hive Metastore + Google GCS

"catalogs": [
  {
    "name": "iceberg_hms_gcs",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    },
    "storage": {
      "type": "GCS",
      "serviceAccountEmail": "acc_name@project.iam.gserviceaccount.com",
      "serviceAccountPrivateKeyId": "AKIAIOSFODNN7EXAMPLE",
      "serviceAccountPrivateKey": "-----BEGIN PRIVATE KEY-----\nabcded\n-----END PRIVATE KEY-----\n"
    }
  }
]

Hive Metastore + Azure Blob

"catalogs": [
  {
    "name": "iceberg_hms_azblob",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    },
    "storage": {
      "type": "AzureBlob",
      "storageAccount": "account_name",
      "storageContainer": "container_name",
      "sasToken": "sp=rl&st=2020-12-15T03:19:48Z&se=2024-12-12T11:19:48Z&sv=2022-11-02&sr=c&sig=1"
    }
  }
]

Hive Metastore + Azure Data Lake Gen2

"catalogs": [
  {
    "name": "iceberg_hms_azgen2",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    },
    "storage": {
      "type": "AzureDLS2",
      "clientId": "000000-avaf-aaaa-bbbb-aba988azfa",
      "clientSecret": "EXAMPLEvonefPJabcde",
      "clientEndpoint": "https://login.microsoftonline.com/000000-avaf-aaaa-bbbb-aba988azfa/oauth2/token"
    }
  }
]
