# Exporting Query Results
PuppyGraph supports exporting query results to cloud storage services.
## Supported storage types and file formats
### Storage Types
- Amazon S3
- Azure Data Lake Storage (Gen2)
- Google Cloud Storage
- MinIO
### File Formats
- Iceberg tables (stored as Parquet)
- Parquet files
- CSV files
## Syntax
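Export options are supplied as `with()` configuration steps on the traversal source, following the pattern of the Azure examples later on this page. A minimal sketch (the trailing traversal is arbitrary; any query's results are exported):

```groovy
g.with("exportTo", "<target_path>")   // destination URI or table identifier; required
 .with("fileType", "csv")             // optional; csv (default), parquet, or table
 .with("<key>", "<value>")            // further parameters from the table below
 .V().values("name")                  // any traversal; its results are exported
```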
### Parameters
| Key | Description | Default Value |
|---|---|---|
| `fileType` | Target file format (`csv`, `parquet`, or `table`) | `csv` |
| `storageType` | Cloud storage type (inferred from the export path if not provided) | - |
| `catalog` | Catalog name defined in the schema for storage configuration | - |
| `identifier` | Cloud storage access identifier | - |
| `credential` | Cloud storage access credential | - |
| `endpoint` | S3-compatible storage endpoint | - |
| `region` | S3/S3-compatible storage region | - |
| `useComputeEngineService` | Use the GCS Compute Engine service account | `false` |
| `serviceAccountEmail` | GCS service account email | - |
| `useManagedIdentity` | Use Azure VM managed identity | `false` |
| `tenantId` | Azure tenant ID | - |
| `clientId` | Azure client ID | - |
| `clientSecret` | Azure client secret | - |
## Export using schema catalog configuration
When the export destination matches the schema's catalog configuration, you can reuse the existing storage settings. Before exporting to this location, make sure you have write access to it.
### Export to cloud storage
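A minimal sketch reusing the catalog's storage settings (parameter names are taken from the table above; the traversal itself is arbitrary):

```groovy
g.with("exportTo", "<target_path>")    // must match the catalog's storage scheme
 .with("catalog", "<catalog name>")    // reuse the schema catalog's storage settings
 .V().values("name")
```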
- `<target_path>`: Storage path URI matching the catalog's scheme (e.g., `s3://my_bucket/subfolder`)
- `<catalog name>`: Name of the catalog defined in the graph schema
### Export as Iceberg table
It is also possible to store query results as an Iceberg table. This feature is currently experimental.
To store results as an Iceberg table:
- Configure the catalog type in the graph schema as Iceberg.
- Ensure the Iceberg catalog service aligns with the schema configuration.
- Confirm you have `CREATE TABLE` privileges on the target Iceberg database (schema).
- Check that you have write permissions for the designated storage path.
Query results will be stored as Parquet files in a new Iceberg table.
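A minimal sketch, assuming the table identifier is passed via `exportTo` just as paths are in the other export modes:

```groovy
g.with("exportTo", "<target_database>.<target_table>")  // assumed: table identifier via exportTo
 .with("catalog", "<catalog name>")                     // must be an Iceberg-type catalog
 .with("fileType", "table")                             // required for Iceberg table export
 .V().values("name")
```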
- `<target_database>.<target_table>`: The database and table to save to; the database name must be provided (e.g., `mydatabase.result_table`)
- `<catalog name>`: Name of the catalog defined in the graph schema. It must be of Iceberg type.
- `fileType` parameter: Must be set to `table`
## Export using a separate configuration
When the export destination storage does not match the schema's catalog configuration, you can provide separate storage settings.
### Amazon S3
Before exporting to Amazon S3, ensure that:
- The provided credentials have been granted write permissions to the target S3 bucket
- The IAM policy includes proper authorization for `s3:PutObject` actions
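A minimal sketch, assuming the access key pair maps onto the generic `identifier`/`credential` parameters from the table above (the traversal is arbitrary):

```groovy
g.with("exportTo", "s3://bucket/path/")           // or s3a://bucket/path/
 .with("region", "<region>")
 .with("identifier", "<aws_access_key_id>")       // assumed mapping: access key ID
 .with("credential", "<aws_secret_access_key>")   // assumed mapping: secret access key
 .V().values("name")
```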
- `<target_path>`: Destination directory path URI (format: `s3://bucket/path/` or `s3a://bucket/path/`). The path must use either the `s3` or `s3a` scheme and is always treated as a directory.
- `<region>`: AWS region where the target S3 bucket is located
- `<aws_access_key_id>`: AWS access key ID for S3 authentication
- `<aws_secret_access_key>`: Corresponding secret access key for AWS authentication
### MinIO
Before saving data to this location, make sure you have write access to it.
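A minimal sketch, assuming the MinIO username and password map onto the generic `identifier`/`credential` parameters from the table above:

```groovy
g.with("exportTo", "s3://bucket/path/")   // scheme must be s3
 .with("storageType", "minio")            // must be set explicitly for MinIO
 .with("endpoint", "<minio_endpoint>")
 .with("identifier", "<user>")            // assumed mapping: MinIO username
 .with("credential", "<password>")        // assumed mapping: MinIO password
 .V().values("name")
```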
- `<target_path>`: Destination URI (scheme must be `s3`) specifying the directory path for the exported data
- `<minio_endpoint>`: Endpoint URL of the MinIO service
- `<user>`: Authentication username for MinIO
- `<password>`: Authentication password for MinIO
- The `storageType` parameter must be explicitly set to `minio`
### Google Cloud Storage
Before saving results to GCS, ensure the provided credentials have been granted write permissions.
#### Using a Key JSON File
To authenticate with a JSON key file:
- Mount the JSON key file during container creation
- Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to point to the mounted file path within the container
See *Authentication for PuppyGraph to access Google Cloud resources* for more information on how to configure this.
After completing these steps, you can export results to GCS using the following queries:
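A minimal sketch (the bucket path is a placeholder; credentials are picked up from the mounted key file, so no extra parameters are needed):

```groovy
g.with("exportTo", "gs://bucket/path/")   // credentials come from GOOGLE_APPLICATION_CREDENTIALS
 .V().values("name")
```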
#### Using an attached Service Account on Compute Engine
Before using a Service Account on Compute Engine, ensure the following prerequisites are met:
- The VM instance running PuppyGraph is associated with a service account
- The instance's access scope includes appropriate permissions for storage operations
Reference: service accounts
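A minimal sketch using the VM's attached service account, based on the `useComputeEngineService` parameter from the table above (the traversal is arbitrary):

```groovy
g.with("exportTo", "gs://bucket/path/")
 .with("useComputeEngineService", true)   // authenticate via the VM's attached service account
 .V().values("name")
```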
#### Using service account identifier and credential pair
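A minimal sketch, assuming the email, key ID, and secret map onto the `serviceAccountEmail`, `identifier`, and `credential` parameters from the table above:

```groovy
g.with("exportTo", "gs://bucket/path/")
 .with("serviceAccountEmail", "<email>")   // assumed mapping: service account email
 .with("identifier", "<key_id>")           // assumed mapping: private key ID
 .with("credential", "<secret_key>")       // assumed mapping: private key
 .V().values("name")
```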
- `<target_path>`: Destination path URI for exported files (must use the `gs://` scheme for Google Cloud Storage). The path is treated as a directory.
- `<email>`: Service account email address associated with your Google Cloud project
- `<key_id>`: Unique identifier for the service account's private key
- `<secret_key>`: Service account's private key for authentication
### Azure Data Lake Storage Gen2
Before saving data to Azure, ensure the provided credentials have write permissions to the target storage location.
#### Using Managed Identity
g.with("exportTo", "<target_path>")
.with("useManagedIdentity", true)
.with("tenantId", "<tenant-id-of-identity>")
.with("clientId": "<client-id-of-identity>")
.V().as("s")
.in("created").as("p")
.group().by(select("s")).by(values('age').sum())
.unfold()
.project('name', 'totalAge')
.by(select(keys).values('name'))
.by(select(values))
#### Using Storage Account Access Key
g.with("exportTo", "<target_path>")
.with("identifier", "<storage-account-name>")
.with("credential": "<storage-account-access-key>")
.V().as("s")
.in("created").as("p")
.group().by(select("s")).by(values('age').sum())
.unfold()
.project('name', 'totalAge')
.by(select(keys).values('name'))
.by(select(values))
- `<target_path>`: Target directory path URI (scheme must be `abfs://` or `abfss://`). The path is always treated as a directory.
- `<storage-account-name>`: Name of your Azure Storage Account
- `<storage-account-access-key>`: Access key for the storage account
- `<tenant-id-of-identity>`: Tenant ID of the managed identity
- `<client-id-of-identity>`: Client ID (application ID) of the managed identity