Connecting to Iceberg
Prerequisites
- Both the Iceberg Catalog and Storage are accessible over the network from the PuppyGraph instance.
- PuppyGraph supports REST, Hive Metastore, and AWS Glue Data Catalog as Iceberg Catalog implementations.
- PuppyGraph supports Amazon S3, S3 Compatible Storage (e.g. MinIO), Google Cloud Storage (GCS), Azure, and HDFS as Iceberg data storage.
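The network prerequisite can be checked before any configuration is entered. The following is a minimal sketch, assuming the catalog and storage endpoints accept plain TCP connections; the function name `is_reachable` and the addresses in the comments are illustrative and not part of PuppyGraph. It only confirms that a TCP connection can be opened from the machine it runs on (ideally the PuppyGraph host), not that credentials or the Iceberg setup are valid.

```python
import socket
from urllib.parse import urlparse


def is_reachable(uri: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the host and port in `uri` succeeds."""
    parsed = urlparse(uri)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"URI must include a host and an explicit port: {uri!r}")
    try:
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example calls with hypothetical endpoints; replace them with your own:
#   is_reachable("thrift://127.0.0.1:9083")   # Hive Metastore
#   is_reachable("http://127.0.0.1:9000")     # S3 compatible storage
```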
Configuration
The configuration consists of two parts: Metastore (Catalog) and Data Storage. Configure them according to your Iceberg setup.
Metastore Configuration
Iceberg REST Catalog
Configuration | Explanation |
---|---|
RestUri | The server endpoint URI of the REST Catalog. |
Warehouse | The name of the Tabular warehouse. Required by Tabular metastore. |
Security | The security scheme of the REST catalog. Set it to oauth2 when using Tabular metastore. |
Session | Set it to user when using Tabular metastore. |
Credential | The Tabular authentication credential. Required by Tabular metastore. |
AWS Glue
Configuration | Explanation |
---|---|
Region | The region of the AWS Glue Data Catalog. Example: us-east-1. See AWS Glue endpoints and quotas for more details. |
Use instance profile | Whether to use role-based authentication (an explicit IAM role or an attached instance profile). |
IAM Role ARN | The ARN of the IAM role for accessing the AWS Glue Data Catalog. Required by authentication with IAM roles. |
Access key | The access key of the IAM user for accessing the AWS Glue Data Catalog. Required by authentication with IAM User Access keys. |
Secret key | The secret key of the IAM user for accessing the AWS Glue Data Catalog. Required by authentication with IAM User Access keys. |
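The table above describes three ways to authenticate: an attached instance profile, an explicit IAM role, or IAM user access keys. The sketch below shows one possible reading of how the fields combine. The function `resolve_aws_auth_mode`, its parameter names, and the returned labels are hypothetical, and the precedence it applies is an assumption rather than documented PuppyGraph behavior.

```python
from typing import Optional


def resolve_aws_auth_mode(
    use_instance_profile: bool,
    iam_role_arn: Optional[str] = None,
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
) -> str:
    """Return which authentication mode a set of field values describes."""
    if use_instance_profile:
        # Role-based authentication: an explicit role if an ARN is given,
        # otherwise the instance profile attached to the host (assumed precedence).
        return "iam_role" if iam_role_arn else "instance_profile"
    # Key-based authentication needs both halves of the key pair.
    if access_key and secret_key:
        return "access_keys"
    raise ValueError(
        "Provide both an access key and a secret key, or enable role-based authentication."
    )
```

Running a check like this before saving a configuration catches the common mistake of supplying an access key without its secret key.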
Hive Metastore
Configuration | Explanation |
---|---|
Hive metastore URI | The URI of your Hive metastore. Format: thrift://<metastore_IP_address>:<metastore_port>. |
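The URI format above can be checked before it is entered. This is a minimal sketch using Python's standard `urllib.parse`; the function name `parse_hive_metastore_uri` is illustrative and not part of PuppyGraph.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_hive_metastore_uri(uri: str) -> Tuple[str, int]:
    """Split a thrift://host:port URI into (host, port), or raise ValueError."""
    parsed = urlparse(uri)
    if parsed.scheme != "thrift":
        raise ValueError(f"Expected the thrift:// scheme, got: {uri!r}")
    if not parsed.hostname:
        raise ValueError(f"Missing metastore host in: {uri!r}")
    port = parsed.port  # raises ValueError itself if the port is not a valid number
    if port is None:
        raise ValueError(f"Missing metastore port in: {uri!r}")
    return parsed.hostname, port
```

For example, `parse_hive_metastore_uri("thrift://127.0.0.1:9083")` returns the host `127.0.0.1` and the port `9083`, while a URI with an `http://` scheme or without a port is rejected.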
Data Storage Configuration
Amazon S3 (Simple Storage Service)
PuppyGraph supports Amazon S3 (Simple Storage Service) for Iceberg.
Configuration | Explanation |
---|---|
Region | The region of the Amazon S3 storage. Example: us-east-1. See Amazon Simple Storage Service endpoints and quotas for more details. |
Use instance profile | Whether to use role-based authentication (an explicit IAM role or an attached instance profile). |
IAM Role ARN | The ARN of the IAM role for accessing Amazon S3. Required by authentication with IAM roles. |
Access key | The access key of the IAM user for accessing Amazon S3. Required by authentication with IAM User Access keys. |
Secret key | The secret key of the IAM user for accessing Amazon S3. Required by authentication with IAM User Access keys. |
Amazon S3 Compatible Storage
PuppyGraph supports S3 Compatible Storage (e.g. MinIO) for Iceberg.
Configuration | Explanation |
---|---|
Endpoint | The S3 compatible storage endpoint. |
Access key | The access key of an IAM user for accessing the S3 compatible storage. |
Secret key | The secret key of an IAM user for accessing the S3 compatible storage. |
Enable SSL | Whether to enable SSL connection for accessing the S3 compatible storage. |
Enable path style access | Whether to use path-style access method when accessing the S3 compatible storage. |
Get from metastore
There is no need to specify a storage configuration for the following Iceberg implementations:
- HDFS with Hive Metastore.
- Tabular (credential vending) with Iceberg REST Catalog.
Select Get from metastore in the Web UI for these implementations.
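In JSON form, the Hive Metastore + HDFS example on this page contains no "storage" block. Assuming that omitting the block is how "Get from metastore" is expressed in a JSON configuration, a catalog entry could be assembled as in the sketch below. The helper `build_catalog_entry` is hypothetical; the key names are copied from the example configurations on this page.

```python
import json
from typing import Any, Dict, Optional


def build_catalog_entry(
    name: str,
    metastore: Dict[str, Any],
    storage: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build one entry of the "catalogs" list; omit "storage" when none is given."""
    entry: Dict[str, Any] = {"name": name, "type": "iceberg", "metastore": metastore}
    if storage is not None:
        entry["storage"] = storage
    return entry


# Hive Metastore + HDFS: no storage block (assumed to mean "Get from metastore").
hdfs_entry = build_catalog_entry(
    "iceberg_hms_hdfs",
    {"type": "HMS", "hiveMetastoreUrl": "thrift://127.0.0.1:9083"},
)
print(json.dumps({"catalogs": [hdfs_entry]}, indent=2))
```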
Demo
See Querying Iceberg Data as a Graph for a complete demo.
Example Configurations
Please refer to Data Lake Catalog for detailed parameters for each type of catalog and storage.
Catalog Type | Storage Type | Example Configuration |
---|---|---|
REST Catalog | Amazon S3 | REST Catalog + Amazon S3 |
REST Catalog | MinIO | REST Catalog + MinIO |
AWS Glue | Amazon S3 | AWS Glue + Amazon S3 |
Hive Metastore | HDFS | Hive Metastore + HDFS |
Hive Metastore | Amazon S3 | Hive Metastore + Amazon S3 |
Hive Metastore | MinIO | Hive Metastore + MinIO |
Hive Metastore | Google GCS | Hive Metastore + Google GCS |
Hive Metastore | Azure Blob | Hive Metastore + Azure Blob |
Hive Metastore | Azure Data Lake Gen2 | Hive Metastore + Azure Data Lake Gen2 |
REST Catalog + Amazon S3
"catalogs": [
  {
    "name": "iceberg_rest_s3",
    "type": "iceberg",
    "metastore": {
      "type": "rest",
      "uri": "http://127.0.0.1:8181"
    },
    "storage": {
      "useInstanceProfile": "false",
      "region": "us-west-2",
      "accessKey": "AKIAIOSFODNN7EXAMPLE",
      "secretKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "enableSsl": "false"
    }
  }
]
REST Catalog + MinIO
"catalogs": [
  {
    "name": "iceberg_rest_minio",
    "type": "iceberg",
    "metastore": {
      "type": "rest",
      "uri": "http://127.0.0.1:8181"
    },
    "storage": {
      "useInstanceProfile": "false",
      "accessKey": "admin",
      "secretKey": "password",
      "enableSsl": "false",
      "endpoint": "http://127.0.0.1:9000",
      "enablePathStyleAccess": "true"
    }
  }
]
AWS Glue + Amazon S3
"catalogs": [
  {
    "name": "iceberg_glue_s3",
    "type": "iceberg",
    "metastore": {
      "type": "glue",
      "useInstanceProfile": "false",
      "region": "us-west-2",
      "accessKey": "AKIAIOSFODNN7EXAMPLE",
      "secretKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    },
    "storage": {
      "useInstanceProfile": "false",
      "region": "us-west-2",
      "accessKey": "AKIAIOSFODNN7EXAMPLE",
      "secretKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "enableSsl": "false"
    }
  }
]
Hive Metastore + HDFS
"catalogs": [
  {
    "name": "iceberg_hms_hdfs",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    }
  }
]
Hive Metastore + MinIO
"catalogs": [
  {
    "name": "iceberg_hms_minio",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    },
    "storage": {
      "useInstanceProfile": "false",
      "accessKey": "admin",
      "secretKey": "password",
      "enableSsl": "false",
      "endpoint": "http://127.0.0.1:9000",
      "enablePathStyleAccess": "true"
    }
  }
]
Hive Metastore + Amazon S3
"catalogs": [
  {
    "name": "iceberg_hms_s3",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    },
    "storage": {
      "useInstanceProfile": "false",
      "region": "us-west-2",
      "accessKey": "AKIAIOSFODNN7EXAMPLE",
      "secretKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "enableSsl": "false"
    }
  }
]
Hive Metastore + Google GCS
"catalogs": [
  {
    "name": "iceberg_hms_gcs",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    },
    "storage": {
      "type": "GCS",
      "serviceAccountEmail": "acc_name@project.iam.gserviceaccount.com",
      "serviceAccountPrivateKeyId": "AKIAIOSFODNN7EXAMPLE",
      "serviceAccountPrivateKey": "-----BEGIN PRIVATE KEY-----\nabcded\n-----END PRIVATE KEY-----\n"
    }
  }
]
Hive Metastore + Azure Blob
"catalogs": [
  {
    "name": "iceberg_hms_azblob",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    },
    "storage": {
      "type": "AzureBlob",
      "storageAccount": "account_name",
      "storageContainer": "container_name",
      "sasToken": "sp=rl&st=2020-12-15T03:19:48Z&se=2024-12-12T11:19:48Z&sv=2022-11-02&sr=c&sig=1"
    }
  }
]
Hive Metastore + Azure Data Lake Gen2
"catalogs": [
  {
    "name": "iceberg_hms_azgen2",
    "type": "iceberg",
    "metastore": {
      "type": "HMS",
      "hiveMetastoreUrl": "thrift://127.0.0.1:9083"
    },
    "storage": {
      "type": "AzureDLS2",
      "clientId": "000000-avaf-aaaa-bbbb-aba988azfa",
      "clientSecret": "EXAMPLEvonefPJabcde",
      "clientEndpoint": "https://login.microsoftonline.com/000000-avaf-aaaa-bbbb-aba988azfa/oauth2/token"
    }
  }
]
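When several of the examples above are combined into one "catalogs" list, a quick sanity check can catch copy-paste slips. The sketch below assumes that every entry needs "name", "type", and "metastore" (all examples on this page have them) and that catalog names should be unique; both rules are assumptions rather than documented PuppyGraph requirements, and `check_catalogs` is a hypothetical helper. Because the examples are fragments that start at "catalogs", the usage lines wrap the fragment in braces to form a complete JSON document.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def check_catalogs(schema_text: str) -> List[str]:
    """Return a list of problems found in the "catalogs" list of a JSON document."""
    catalogs: List[Dict[str, Any]] = json.loads(schema_text).get("catalogs", [])
    problems: List[str] = []
    for index, catalog in enumerate(catalogs):
        # Keys present in every example on this page (assumed to be required).
        for key in ("name", "type", "metastore"):
            if key not in catalog:
                problems.append(f"catalog #{index} is missing {key!r}")
    # Assumed rule: each catalog name appears only once.
    counts = Counter(c["name"] for c in catalogs if "name" in c)
    for name, count in counts.items():
        if count > 1:
            problems.append(f"catalog name {name!r} is used {count} times")
    return problems


fragment = (
    '"catalogs": [{"name": "iceberg_hms_hdfs", "type": "iceberg", '
    '"metastore": {"type": "HMS", "hiveMetastoreUrl": "thrift://127.0.0.1:9083"}}]'
)
print(check_catalogs("{" + fragment + "}"))  # prints []
```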