Data Lake Catalog
Detailed explanation of parameters in PuppyGraph schemas for accessing Data Lakes.
Catalog Parameters Overview
Parameter | Required | Description |
---|---|---|
name | Yes | The name of the catalog. |
type | Yes | The type of your data source. Set the value to |
metastore | Yes | The metastore parameters for your data source. See Metastore Parameters. |
storage | Yes | The storage parameters for your data source. See Storage Parameters. |
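The four top-level parameters nest into one catalog entry. The Python sketch below assembles such an entry as a dict and prints it as JSON. `build_catalog` is a hypothetical helper, not part of PuppyGraph, and every value in angle brackets is a placeholder because the accepted `type` values are not reproduced on this page.

```python
import json

# Top-level keys from the Catalog Parameters table, all marked as required.
REQUIRED_CATALOG_KEYS = ("name", "type", "metastore", "storage")


def build_catalog(name: str, catalog_type: str, metastore: dict, storage: dict) -> dict:
    """Assemble one catalog entry (hypothetical helper, not a PuppyGraph API)."""
    catalog = {
        "name": name,
        "type": catalog_type,
        "metastore": metastore,
        "storage": storage,
    }
    # Reject entries whose required values are None or empty strings.
    missing = [key for key in REQUIRED_CATALOG_KEYS if catalog[key] in (None, "")]
    if missing:
        raise ValueError(f"missing required catalog parameters: {missing}")
    return catalog


if __name__ == "__main__":
    # Placeholder values only; substitute the values for your data source.
    entry = build_catalog(
        name="my_data_lake",
        catalog_type="<data source type>",
        metastore={"type": "<metastore type>"},
        storage={"region": "<aws region>"},
    )
    print(json.dumps(entry, indent=2))
```

The `metastore` and `storage` dicts are filled in according to the sections that follow.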
Metastore Parameters
These parameters specify how PuppyGraph connects to the metastore of your data source.
Hive Metastore (HMS)
If you choose Hive Metastore as the metastore of your data source, configure `metastore` with the following parameters:
Parameter | Required | Description |
---|---|---|
type | Yes | The type of metastore that you use for your data source. Set the value to |
hiveMetastoreUrl | Yes | The URI of your Hive metastore. Format: |
AWS Glue
AWS Glue is supported as a metastore only when AWS S3 is the storage. If you choose AWS Glue as the metastore of your data source, configure `metastore` for one of the following authentication methods; `type` and `region` are required in every case:
- To use the instance profile-based authentication method, set `useInstanceProfile` to true.
- To use the assumed role-based authentication method, set `useInstanceProfile` to true and specify `iamRoleArn`.
- To use the IAM user-based authentication method, set `useInstanceProfile` to false and specify `accessKey` and `secretKey`.
The following table describes the parameters you need to configure in `metastore`.
Parameter | Required | Description |
---|---|---|
type | Yes | The type of metastore that you use for your data source. Set the value to |
useInstanceProfile | Yes | Specifies whether to enable the instance profile-based authentication method and the assumed role-based authentication method. Valid values: |
region | Yes | The region in which your AWS Glue Data Catalog resides. Example: |
accessKey | No | The access key of your AWS IAM user. If you use the IAM user-based authentication method to access AWS Glue, you must specify this parameter. |
secretKey | No | The secret key of your AWS IAM user. If you use the IAM user-based authentication method to access AWS Glue, you must specify this parameter. |
iamRoleArn | No | The ARN of the IAM role that has privileges on your AWS Glue Data Catalog. If you use the assumed role-based authentication method to access AWS Glue, you must specify this parameter. |
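The three Glue authentication methods differ only in which optional parameters accompany `useInstanceProfile`. As a sketch of that mapping, the hypothetical Python function below inspects a `metastore` dict and reports the method it selects. It assumes the flag is written as true/false (either a JSON boolean or a string) and that `type` and `region` are always present, as the table requires.

```python
def glue_auth_method(metastore: dict) -> str:
    """Infer which AWS Glue authentication method a metastore config selects.

    Mapping taken from the parameter table: useInstanceProfile enables both the
    instance profile and assumed role methods, iamRoleArn selects assumed role,
    and accessKey/secretKey are needed for the IAM user method.
    """
    for key in ("type", "useInstanceProfile", "region"):
        if key not in metastore:
            raise ValueError(f"missing required parameter: {key}")

    # Assumption: the flag is a boolean or the string "true"/"false".
    use_profile = str(metastore["useInstanceProfile"]).lower() == "true"

    if use_profile:
        return "assumed_role" if metastore.get("iamRoleArn") else "instance_profile"

    if not (metastore.get("accessKey") and metastore.get("secretKey")):
        raise ValueError("IAM user method requires accessKey and secretKey")
    return "iam_user"


if __name__ == "__main__":
    example = {
        "type": "<glue metastore type>",  # placeholder
        "useInstanceProfile": "false",
        "region": "<aws region>",          # placeholder
        "accessKey": "<access key>",
        "secretKey": "<secret key>",
    }
    print(glue_auth_method(example))  # iam_user
```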
Iceberg REST/Tabular
If you choose Iceberg REST or Tabular as the metastore of your data source, take one of the following actions:
- To use an Iceberg REST catalog, configure `type` and `uri` in `metastore`.
- To use Tabular, configure `type` and `uri` in `metastore`, together with the Tabular-specific parameters `warehouse`, `credential`, `security`, and `session`. When Tabular is the metastore, you do not need to set storage parameters.
The following table describes the parameters you need to configure in `metastore`.
Parameter | Required | Description |
---|---|---|
type | Yes | The type of metastore that you use for your data source. Set the value to |
uri | Yes | The URI of the REST catalog server. |
warehouse | No | The warehouse name. Used with Tabular. |
credential | No | The authentication secret for the Tabular service. Used with Tabular. |
security | No | Used with Tabular. Fixed value: |
session | No | Used with Tabular. Fixed value: |
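The REST and Tabular variants share `type` and `uri`; Tabular adds four parameters and drops the need for storage settings. The Python sketch below captures that rule. `rest_metastore` and `storage_required` are hypothetical helpers, and the metastore `type` is passed in by the caller because its accepted values are not listed here.

```python
from typing import Optional

# Parameters that the table marks as "Used with Tabular".
TABULAR_KEYS = ("warehouse", "credential", "security", "session")


def rest_metastore(metastore_type: str, uri: str,
                   tabular: Optional[dict] = None) -> dict:
    """Build the metastore section for an Iceberg REST or Tabular catalog."""
    metastore = {"type": metastore_type, "uri": uri}
    if tabular is not None:
        # Only the four Tabular-specific keys are accepted; anything else is rejected.
        unknown = set(tabular) - set(TABULAR_KEYS)
        if unknown:
            raise ValueError(f"unexpected Tabular parameters: {sorted(unknown)}")
        metastore.update(tabular)
    return metastore


def storage_required(tabular: Optional[dict]) -> bool:
    """Tabular catalogs do not need storage parameters; plain REST catalogs do."""
    return tabular is None
```

Passing `tabular=None` yields a plain REST configuration, so the caller still has to supply a `storage` section in that case.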
Storage Parameters
These parameters specify how PuppyGraph connects to your storage system.
HDFS
If you use HDFS as storage, you do not need to configure `storage`.
AWS S3
If you choose AWS S3 as storage for your data source, take one of the following actions:
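- To use the instance profile-based authentication method, set `useInstanceProfile` to true and specify `region` in `storage`.
- To use the IAM user-based authentication method, set `useInstanceProfile` to false and specify `region`, `accessKey`, and `secretKey` in `storage`.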
The following table describes the parameters you need to configure in `storage`.
Parameter | Required | Description |
---|---|---|
useInstanceProfile | Yes | Specifies whether to enable the instance profile-based authentication method and the assumed role-based authentication method. Valid values: |
region | Yes | The region in which your AWS S3 bucket resides. Example: |
accessKey | No | The access key of your IAM user. Required when the IAM user-based authentication method is used. |
secretKey | No | The secret key of your IAM user. Required when the IAM user-based authentication method is used. |
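One way to choose between the two S3 methods is to look at which credentials are available where the schema is generated. The sketch below assumes the credentials, if any, are in the standard AWS environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, and falls back to the instance profile method otherwise. `s3_storage_from_env` is a hypothetical helper, and writing the flag as the strings "true"/"false" is an assumption to check against your schema.

```python
import os
from typing import Mapping, Optional


def s3_storage_from_env(region: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Build an S3 storage section from the credentials found in `env`.

    Assumption: credentials live in the standard AWS environment variables;
    if either is missing, the instance profile method is used instead.
    """
    env = os.environ if env is None else env
    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")

    if access_key and secret_key:
        # IAM user-based method: disable the instance profile and pass the keys.
        return {
            "useInstanceProfile": "false",
            "region": region,
            "accessKey": access_key,
            "secretKey": secret_key,
        }
    # Instance profile-based method: only the flag and the region are needed.
    return {"useInstanceProfile": "true", "region": region}
```

Passing an explicit `env` mapping instead of relying on `os.environ` keeps the function easy to test.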
MinIO
If you choose MinIO as storage for your data source, configure `storage` with the parameters described in the following table.
Parameter | Required | Description |
---|---|---|
useInstanceProfile | Yes | Set the value to false. |
accessKey | Yes | The access key of your IAM user. |
secretKey | Yes | The secret key of your IAM user. |
enableSsl | Yes | Specifies whether to enable an SSL connection. Valid values: |
endpoint | Yes | The endpoint that is used to connect to your MinIO storage system instead of AWS S3. |
enablePathStyleAccess | Yes | Specifies whether to enable path-style access. Valid values: |
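All six MinIO parameters are required, and two of them can be derived rather than typed by hand. In the sketch below, the hypothetical helper `minio_storage` derives `enableSsl` from the endpoint's URL scheme and turns on path-style access, on the assumption that your MinIO deployment addresses buckets as `<endpoint>/<bucket>` (common for MinIO) rather than as virtual-hosted subdomains. The "true"/"false" string values are also an assumption.

```python
from urllib.parse import urlparse


def minio_storage(endpoint: str, access_key: str, secret_key: str) -> dict:
    """Build a MinIO storage section (hypothetical helper).

    Assumptions: enableSsl follows the endpoint scheme, and path-style access
    is enabled because buckets are addressed as <endpoint>/<bucket>.
    """
    scheme = urlparse(endpoint).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"endpoint must start with http:// or https://, got {endpoint!r}")
    return {
        "useInstanceProfile": "false",  # the table requires false for MinIO
        "accessKey": access_key,
        "secretKey": secret_key,
        "enableSsl": "true" if scheme == "https" else "false",
        "endpoint": endpoint,
        "enablePathStyleAccess": "true",
    }
```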
Google GCS
If you choose Google GCS as storage for your data source, take one of the following actions:
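- To use the instance VM-based authentication method, specify `type` and set `useComputeEngineServiceAccount` to true in `storage`.
- To use the service account-based authentication method, specify `type`, `serviceAccountEmail`, `serviceAccountPrivateKeyId`, and `serviceAccountPrivateKey` in `storage`.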
The following table describes the parameters you need to configure in `storage`.
Parameter | Required | Description |
---|---|---|
type | Yes | The storage type. Fixed value: |
useComputeEngineServiceAccount | No | Specifies whether to enable the instance VM-based authentication method. Valid values: |
serviceAccountEmail | No | The service account email address. |
serviceAccountPrivateKeyId | No | The service account private key ID. |
serviceAccountPrivateKey | No | The service account private key. |
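The three service account parameters correspond to fields in the JSON key file that Google Cloud issues for a service account (`client_email`, `private_key_id`, and `private_key`). The sketch below copies them across; with no key it selects the instance VM-based method instead. `gcs_storage` is a hypothetical helper, and the storage `type` is passed in because its fixed value is not reproduced on this page.

```python
import json
from typing import Optional


def gcs_storage(storage_type: str, key_json: Optional[str] = None) -> dict:
    """Build a GCS storage section (hypothetical helper).

    Without a key, use the instance VM-based method; otherwise copy the three
    service account fields out of a service account JSON key.
    """
    if key_json is None:
        return {"type": storage_type, "useComputeEngineServiceAccount": "true"}

    key = json.loads(key_json)
    # Field names as they appear in Google service account JSON key files.
    return {
        "type": storage_type,
        "serviceAccountEmail": key["client_email"],
        "serviceAccountPrivateKeyId": key["private_key_id"],
        "serviceAccountPrivateKey": key["private_key"],
    }
```

Reading the key file at schema-generation time avoids pasting the private key by hand, which is easy to get wrong because of its embedded newlines.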
Azure Blob Storage
If you choose Azure Blob Storage as storage for your data source, take one of the following actions:
- To use the shared key authentication method, specify `type`, `storageAccount`, and `sharedKey` in `storage`.
- To use the SAS token authentication method, specify `type`, `storageAccount`, and `sasToken` in `storage`.
The following table describes the parameters you need to configure in `storage`.
Parameter | Required | Description |
---|---|---|
type | Yes | The storage type. Fixed value: |
storageAccount | Yes | The name of your Blob Storage account. |
sharedKey | No | The shared key of your Blob Storage account. |
storageContainer | No | The name of the container that stores your data. |
sasToken | No | The account or container SAS token used to access your data. |
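Because the two Blob Storage methods are alternatives, a quick check can catch a schema that sets both credentials or neither. The hypothetical Python function below performs that check. Treating "exactly one of `sharedKey` or `sasToken`" as the rule is an assumption drawn from the "take one of the following actions" wording, not a documented validation.

```python
def azure_blob_auth_method(storage: dict) -> str:
    """Return which Azure Blob Storage authentication method a config selects.

    Assumption: exactly one of sharedKey or sasToken should be set.
    """
    for key in ("type", "storageAccount"):
        if not storage.get(key):
            raise ValueError(f"missing required parameter: {key}")

    has_key = bool(storage.get("sharedKey"))
    has_sas = bool(storage.get("sasToken"))
    if has_key == has_sas:
        # Both set or neither set: the intended method is ambiguous.
        raise ValueError("set exactly one of sharedKey or sasToken")
    return "shared_key" if has_key else "sas_token"
```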
Azure Data Lake Storage Gen2
If you choose Azure Data Lake Storage Gen2 as storage for your data source, take one of the following actions:
- To use the shared key authentication method, specify `type`, `storageAccount`, and `sharedKey` in `storage`.
- To use the service principal authentication method, specify `type`, `clientId`, `clientSecret`, and `clientEndpoint` in `storage`.
- To use the Managed Identity authentication method, specify `type` and set `useManagedIdentity` to true in `storage`.
The following table describes the parameters you need to configure in `storage`.
Parameter | Required | Description |
---|---|---|
type | Yes | The storage type. Fixed value: |
storageAccount | No | The name of your storage account. |
sharedKey | No | The shared key of your storage account. |
clientId | No | The client ID of the service principal, or the client ID of the managed identity. |
clientSecret | No | The client secret of the service principal. |
clientEndpoint | No | The client endpoint of the service principal. |
useManagedIdentity | No | Specifies whether to enable the Managed Identity authentication method. Valid values: |
tenantId | No | The ID of the tenant whose data you want to access. |
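Since every ADLS Gen2 parameter except `type` is optional, the method in effect is implied by which parameters are present. The hypothetical Python function below infers it. The precedence it uses (managed identity flag first, then a service principal secret, then a shared key) is an assumption for illustration, as is the requirement that the shared key method also name a `storageAccount`.

```python
def adls_auth_method(storage: dict) -> str:
    """Infer which ADLS Gen2 authentication method a storage config selects.

    Assumed precedence: managed identity flag, then service principal
    (clientSecret present), then shared key.
    """
    if str(storage.get("useManagedIdentity", "false")).lower() == "true":
        return "managed_identity"

    if storage.get("clientSecret"):
        missing = [k for k in ("clientId", "clientEndpoint") if not storage.get(k)]
        if missing:
            raise ValueError(f"service principal method also needs {missing}")
        return "service_principal"

    if storage.get("sharedKey"):
        # Assumption: a shared key is only meaningful together with its account.
        if not storage.get("storageAccount"):
            raise ValueError("shared key method also needs storageAccount")
        return "shared_key"

    raise ValueError("no ADLS Gen2 authentication method configured")
```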
Examples
Hudi + Hive metastore + S3
Hudi + Hive metastore + MinIO
DeltaLake + Hive metastore + HDFS
DeltaLake + AWS Glue + S3