Data Lake Catalog
Detailed explanation of parameters in PuppyGraph schemas for accessing Data Lakes.
Catalog Parameters Overview
Parameter | Required | Description |
---|---|---|
name | Yes | The name of the catalog |
type | Yes | The type of the catalog |
metastore | Yes | Metastore parameters |
storage | Yes | Data storage parameters |
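
As a rough illustration of the table above, the sketch below assembles a catalog entry as a Python dictionary and reports which of the four required keys are missing. The helper name `missing_catalog_keys` and every value shown are hypothetical placeholders, not values confirmed by this page. The check also ignores the HDFS case described later, where the storage section is omitted.

```python
import json

# Required top-level catalog keys, taken from the overview table.
REQUIRED_CATALOG_KEYS = ("name", "type", "metastore", "storage")


def missing_catalog_keys(catalog):
    """Return the required keys that are absent from a catalog entry."""
    return [key for key in REQUIRED_CATALOG_KEYS if key not in catalog]


# Hypothetical catalog entry; every value here is a placeholder.
catalog = {
    "name": "example_catalog",
    "type": "<catalog-type>",
    "metastore": {},  # filled in from the Metastore Parameters section
    "storage": {},    # filled in from the Data Storage Parameters section
}

# Show the entry in the JSON form a schema file would use.
print(json.dumps(catalog, indent=2))
print(missing_catalog_keys(catalog))
```

A check like this is only a convenience before submitting a schema; it says nothing about whether the nested sections are valid.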
Metastore Parameters
Hive Metastore
PuppyGraph supports Hive Metastore (HMS) as a catalog metastore.
The table below outlines the Hive Metastore parameters in the metastore section.
Parameter | Required | Description |
---|---|---|
type | Yes | The type of the metastore. Set the value to |
hiveMetastoreUrl | Yes | The URI of the Hive metastore. Format: |
AWS Glue
PuppyGraph supports AWS Glue as a catalog metastore with the following authentication methods:
Authentication with Instance profile
Authentication with IAM Roles
Authentication with IAM User Access keys
The table below outlines the AWS Glue parameters in the metastore section.
Parameter | Required | Description |
---|---|---|
type | Yes | The type of the metastore. Set the value to |
useInstanceProfile | Yes | Whether to use role-based authentication (explicit IAM roles or an attached instance profile). Set the value to |
region | Yes | The region of the AWS Glue Data Catalog. Example: |
accessKey | No | The access key of the IAM user for accessing the AWS Glue Data Catalog. Required by authentication with IAM User Access keys. |
secretKey | No | The secret key of the IAM user for accessing the AWS Glue Data Catalog. Required by authentication with IAM User Access keys. |
iamRoleArn | No | The ARN of the IAM role for accessing the AWS Glue Data Catalog. Required by authentication with IAM roles. |
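
The conditional requirements in the table above can be sketched as a small check. This is a minimal illustration, not PuppyGraph's own validation: the function name `glue_metastore_problems` is hypothetical, and the rule that `accessKey` and `secretKey` appear together is inferred from both being required for IAM User Access key authentication. All values in the example are placeholders.

```python
def glue_metastore_problems(metastore):
    """List missing or inconsistent parameters in an AWS Glue metastore section."""
    problems = []
    # Parameters the table marks as always required.
    for key in ("type", "useInstanceProfile", "region"):
        if key not in metastore:
            problems.append(f"missing required parameter: {key}")
    # Assumption: IAM user access keys are only useful as a pair.
    if ("accessKey" in metastore) != ("secretKey" in metastore):
        problems.append("accessKey and secretKey must be set together")
    return problems


# Hypothetical section using IAM User Access keys; values are placeholders.
glue_metastore = {
    "type": "<glue-metastore-type>",
    "useInstanceProfile": "<see table>",
    "region": "<aws-region>",
    "accessKey": "<access-key>",
    "secretKey": "<secret-key>",
}

print(glue_metastore_problems(glue_metastore))
```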
Iceberg REST Catalog
PuppyGraph supports Iceberg REST Catalog (including Tabular) as a catalog metastore. See the REST Catalog API for details.
Iceberg REST
The minimal configuration of an Iceberg REST metastore is as follows:
Tabular
Tabular (tabular.io) is a managed Iceberg platform. An example of the Tabular metastore configuration is as follows:
The table below outlines the Iceberg REST parameters in the metastore section.
Parameter | Required | Description |
---|---|---|
type | Yes | The type of metastore. Set the value to |
uri | Yes | The server endpoint URI of the REST Catalog |
warehouse | No | The name of the Tabular warehouse. Required by Tabular metastore. |
credential | No | The Tabular authentication credential. Required by Tabular metastore. |
security | No | Security Schema of the REST catalog. Set it to |
session | No | Set it to |
Data Storage Parameters
HDFS
PuppyGraph supports HDFS as data storage with Hive Metastore. No storage section is needed with this combination.
Amazon S3
PuppyGraph supports Amazon S3 (Simple Storage Service) as data storage with the following authentication methods:
Authentication with Instance profile
Authentication with IAM Roles
Authentication with IAM User Access keys
The table below outlines the Amazon S3 parameters in the storage section.
Parameter | Required | Description |
---|---|---|
useInstanceProfile | Yes | Whether to use role-based authentication (explicit IAM roles or an attached instance profile). Set the value to |
region | Yes | The AWS region of the Amazon S3 storage. Example: |
accessKey | No | The access key of the IAM user for accessing Amazon S3. Required by authentication with IAM User Access keys. |
secretKey | No | The secret key of the IAM user for accessing Amazon S3. Required by authentication with IAM User Access keys. |
iamRoleArn | No | The ARN of the IAM role for accessing Amazon S3. Required by authentication with IAM roles. |
S3 Compatible Storage
PuppyGraph supports S3 Compatible Storage (e.g. MinIO) as data storage.
The table below outlines the S3 Compatible parameters in the storage section.
Parameter | Required | Description |
---|---|---|
useInstanceProfile | Yes | Set the value to |
accessKey | Yes | The access key of an IAM user for accessing the S3 compatible storage. |
secretKey | Yes | The secret key of an IAM user for accessing the S3 compatible storage. |
enableSsl | Yes | Whether to enable SSL connection for accessing the S3 compatible storage. Set the value to |
endpoint | Yes | The S3 compatible storage endpoint. |
enablePathStyleAccess | Yes | Whether to use path-style access method when accessing the S3 compatible storage. Set the value to |
Google Cloud Storage
PuppyGraph supports Google Cloud Storage (GCS) as data storage with the following authentication methods:
Authentication with Instance-associated Service Account
Authentication with Service Account Key
The table below outlines the GCS parameters in the storage section.
Parameter | Required | Description |
---|---|---|
type | Yes | The type of the data storage. Set the value to |
useComputeEngineServiceAccount | No | Whether to use the service account associated with the Compute Engine instance for accessing GCS. Set the value to |
serviceAccountEmail | No | The email address of the service account for accessing GCS. Required by authentication with Service Account Key. |
serviceAccountPrivateKeyId | No | The private key ID of the service account for accessing GCS. Required by authentication with Service Account Key. |
serviceAccountPrivateKey | No | The private key of the service account for accessing GCS. Required by authentication with Service Account Key. |
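
Since the three service account parameters above are all required for Service Account Key authentication, a partially filled storage section is likely a mistake. The sketch below flags that case. The function name `missing_service_account_fields` is hypothetical, and treating the three fields as an all-or-nothing group is an assumption drawn from the table, not documented behavior.

```python
# The three parameters the table ties to Service Account Key authentication.
SERVICE_ACCOUNT_KEY_FIELDS = (
    "serviceAccountEmail",
    "serviceAccountPrivateKeyId",
    "serviceAccountPrivateKey",
)


def missing_service_account_fields(storage):
    """Return the Service Account Key fields missing from a GCS storage section.

    Returns an empty list when none of the fields are present (for example,
    when the instance-associated service account is used) or when all are set.
    """
    present = [name for name in SERVICE_ACCOUNT_KEY_FIELDS if name in storage]
    if not present:
        return []
    return [name for name in SERVICE_ACCOUNT_KEY_FIELDS if name not in storage]


# Hypothetical, incomplete storage section; values are placeholders.
gcs_storage = {
    "type": "<gcs-storage-type>",
    "serviceAccountEmail": "<service-account-email>",
}

print(missing_service_account_fields(gcs_storage))
```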
Azure Blob Storage
PuppyGraph supports Azure Blob Storage as data storage with the following authentication methods:
Authentication with Shared Key
Authentication with SAS (Shared Access Signatures) Token
The following table describes the parameters you need to configure in the storage section.
Parameter | Required | Description |
---|---|---|
type | Yes | The type of the data storage. Set the value to |
storageAccount | Yes | The name of the Azure Storage account. |
sharedKey | No | The Shared Key of the Azure Storage account. |
storageContainer | No | The name of the Storage Container. |
sasToken | No | The account or container SAS Token. Required by Authentication with SAS (Shared Access Signatures) Token. |
Azure Data Lake Storage Gen2
PuppyGraph supports Azure Data Lake Storage Gen2 as data storage with the following authentication methods:
Authentication with Shared Key
Authentication with Client Secret of Service Principal
Authentication with Managed Identities
The following table describes the parameters you need to configure in the storage section.
Parameter | Required | Description |
---|---|---|
type | Yes | The type of the data storage. Set the value to |
storageAccount | No | The name of the Azure Storage account. |
sharedKey | No | The Shared Key of the Azure Storage account. |
clientId | No | The Client ID of the service principal. |
tenantId | No | The Tenant ID of the managed identity. |
clientSecret | No | The Client Secret of the service principal. |
clientEndpoint | No | The Client Endpoint of the service principal. |
useManagedIdentity | No | Whether to authenticate with Managed Identities. Set the value to |