
Exporting Query Results

PuppyGraph supports exporting query results to cloud storage services.

Supported storage types and file formats

Storage Types

  • Amazon S3
  • Azure Data Lake Storage (Gen2)
  • Google Cloud Storage
  • MinIO

File Formats

  • Iceberg tables (stored as Parquet files)
  • Parquet files
  • CSV files

Syntax

For Cypher queries:

EXPORT TO <path>
PROPERTIES {
  key: value,
  ...
}
[cypher query]

For Gremlin traversal queries:

g.with("exportTo", <path>)
.with("key", value)
...
[gremlin steps]

For graph algorithm programs:

graph.compute()
.program([program_def])
.submit().get()
.save([
  "exportTo": "<target_path>",
  "key": value,
  ...
])
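
As a concrete illustration of the algorithm form, the sketch below combines the save() call with the export parameters listed in the table below to write an algorithm's output as Parquet files to S3. This assumes save() accepts the same parameter keys as the other export forms; the program definition is kept as the [program_def] placeholder from the syntax above, and the bucket path and region are hypothetical values to replace with your own.

graph.compute()
  .program([program_def])                        // algorithm program definition (see syntax above)
  .submit().get()
  .save([
    "exportTo": "s3://my_bucket/algo_results/",  // hypothetical destination bucket and folder
    "fileType": "parquet",                       // write Parquet files instead of the default CSV
    "region": "us-east-1",                       // hypothetical AWS region of the bucket
    "identifier": "<aws_access_key_id>",
    "credential": "<aws_secret_access_key>"
  ])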

Parameters

| Key                     | Description                                                         | Default Value |
| ----------------------- | ------------------------------------------------------------------- | ------------- |
| fileType                | Target file format (csv, parquet, or table)                         | csv           |
| storageType             | Cloud storage type (inferred from the export path if not provided)  | -             |
| catalog                 | Catalog name defined in the schema for storage configuration        | -             |
| identifier              | Cloud storage access identifier                                     | -             |
| credential              | Cloud storage access credential                                     | -             |
| endpoint                | S3-compatible storage endpoint                                      | -             |
| region                  | S3 / S3-compatible storage region                                   | -             |
| useComputeEngineService | Use the GCS Compute Engine service account                          | false         |
| serviceAccountEmail     | GCS service account email                                           | -             |
| useManagedIdentity      | Use the Azure VM managed identity                                   | false         |
| tenantId                | Azure tenant ID                                                     | -             |
| clientId                | Azure client ID                                                     | -             |
| clientSecret            | Azure client secret                                                 | -             |

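For example, to export results as Parquet files rather than the default CSV, set fileType to parquet and provide the storage parameters directly. The sketch below uses S3-style placeholders that must be replaced with your own values, together with the sample query used in the sections that follow:

Cypher:

EXPORT TO '<target_path>'
PROPERTIES {
  fileType: 'parquet',
  identifier: '<aws_access_key_id>',
  credential: '<aws_secret_access_key>',
  region: '<region>'
}
MATCH (p)-[:created]->(s:software)
RETURN s.name as name

The equivalent Gremlin form adds .with("fileType", "parquet") alongside the other .with() parameters.
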
Export using schema catalog configuration

When the export destination storage matches the schema's catalog configuration, you can reuse the existing storage settings. Before saving data to this location, ensure you have the appropriate write access privileges.

Export to cloud storage

Cypher:

EXPORT TO '<target_path>'
PROPERTIES {
  catalog: '<catalog name>'
}
MATCH (p)-[:created]->(s:software) 
RETURN s.name as name

Gremlin:

g.with("exportTo", "<target_path>")
  .with("catalog", "<catalog name>")
  .V()
  .out("created")
  .project('name')
    .by(values('name'))
  • <target_path>: Storage path URI matching catalog scheme (e.g., s3://my_bucket/subfolder)
  • <catalog name>: Name of the catalog defined in the graph schema

Export as Iceberg table

It is also possible to store query results as an Iceberg table. This feature is currently experimental.

To store results as an Iceberg table:

  • Configure the catalog type in the graph schema as Iceberg.
  • Ensure the Iceberg catalog service aligns with the schema configuration.
  • Confirm you have CREATE TABLE privileges on the target Iceberg database (schema).
  • Check that you have write permissions for the designated storage path.

Query results will be stored as Parquet files in a new Iceberg table.

Cypher:

EXPORT TO '<target_database>.<target_table>'
PROPERTIES {
  catalog: '<catalog name>',
  fileType: 'table'
}
MATCH (p)-[:created]->(s:software) 
RETURN s.name as name

Gremlin:

g.with("exportTo", "<target_database>.<target_table>")
  .with("catalog", "<catalog name>")
  .with("fileType", "table")
  .V()
  .out("created")
  .project('name')
    .by(values('name'))
  • <target_database>.<target_table>: The database and table to save to; the database name must be provided (e.g., mydatabase.result_table)
  • <catalog name>: Name of the catalog defined in the graph schema. It must be of Iceberg type.
  • fileType parameter: Must be set to table

Export using a separate configuration

When the export destination storage does not match the schema's catalog configuration, you can provide separate storage settings.

Amazon S3

Before exporting to Amazon S3, ensure that:

  • The provided credentials have been granted write permissions to the target S3 bucket
  • The IAM policy includes proper authorization for s3:PutObject actions

Cypher:

EXPORT TO '<target_path>'
PROPERTIES {
  identifier: '<aws_access_key_id>',
  credential: '<aws_secret_access_key>',
  region: '<region>'
}
MATCH (p)-[:created]->(s:software) 
RETURN s.name as name

Gremlin:

g.with("exportTo", "<target_path>")
  .with("region", "<region>")
  .with("identifier", "<aws_access_key_id>")
  .with("credential", "<aws_secret_access_key>")
  .V()
  .out("created")
  .project('name')
    .by(values('name'))
  • <target_path>: Specifies the destination directory path URI (format: s3://bucket/path/ or s3a://bucket/path/). The path must use either s3 or s3a scheme and will always be treated as a directory.
  • <region>: AWS region where the target S3 bucket is located
  • <aws_access_key_id>: AWS access key ID for S3 authentication
  • <aws_secret_access_key>: Corresponding secret access key for AWS authentication

MinIO

Before saving data to this location, ensure you have the appropriate write access privileges.

Cypher:

EXPORT TO '<target_path>'
PROPERTIES {
  storageType: 'minio',
  endpoint: '<minio_endpoint>',
  identifier: '<user>',
  credential: '<password>'
}
MATCH (p)-[:created]->(s:software) 
RETURN s.name as name

Gremlin:

g.with("exportTo", "<target_path>")
  .with("storageType", "minio")
  .with("endpoint", "<minio_endpoint>")
  .with("identifier", "<user>")
  .with("credential", "<password>")
  .V()
  .out("created")
  .project('name')
    .by(values('name'))
  • <target_path>: Destination URI (scheme must be s3) specifying the directory path for saving exported data
  • <minio_endpoint>: Endpoint URL of the MinIO service
  • <user>: Authentication username for MinIO
  • <password>: Authentication password for MinIO
  • The storageType parameter must be explicitly set to minio

Google Cloud Storage

Before saving results to GCS, ensure the provided credentials have been granted write permissions.

Using a Key JSON File

To authenticate with a JSON key file:

  1. Mount the JSON key file during container creation
  2. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to the mounted file path within the container

See Authentication for PuppyGraph to access Google Cloud resources for more information on how to configure this.

After completing these steps, you can export results to GCS using the following queries:

Cypher:

EXPORT TO '<target_path>'
MATCH (p)-[:created]->(s:software) 
RETURN s.name as name

Gremlin:

g.with("exportTo", "<target_path>")
  .V()
  .out("created")
  .project('name')
    .by(values('name'))

Using an attached Service Account on Compute Engine

Before using a Service Account on Compute Engine, ensure the following prerequisites are met:

  1. The VM instance running PuppyGraph is associated with a service account
  2. The instance's access scope includes appropriate permissions for storage operations

Reference: service accounts

Cypher:

EXPORT TO '<target_path>'
PROPERTIES {
  useComputeEngineService: true
}
MATCH (p)-[:created]->(s:software) 
RETURN s.name as name

Gremlin:

g.with("exportTo", "<target_path>")
  .with("useComputeEngineService", true)
  .V()
  .out("created")
  .project('name')
    .by(values('name'))

Using service account identifier and credential pair

Cypher:

EXPORT TO '<target_path>'
PROPERTIES {
  serviceAccountEmail: '<email>',
  identifier: '<key_id>',
  credential: '<secret_key>'
}
MATCH (p)-[:created]->(s:software) 
RETURN s.name as name

Gremlin:

g.with("exportTo", "<target_path>")
  .with("storageType", "GCS")
  .with("serviceAccountEmail", "<email>")
  .with("identifier", "<key_id>")
  .with("credential", "<secret_key>")
  .V()
  .out("created")
  .project('name')
    .by(values('name'))
  • <target_path>: Destination path URI for exported files (must use gs:// scheme for Google Cloud Storage). The path will be treated as a directory
  • <email>: Service account email address associated with your Google Cloud project
  • <key_id>: Unique identifier for the service account's private key
  • <secret_key>: Service account's private key for authentication

Azure Data Lake Storage Gen2

Before saving data to Azure, ensure the provided credentials have write permissions to the target storage location.

Using Managed Identity

Cypher:

EXPORT TO '<target_path>'
PROPERTIES {
  useManagedIdentity: true,
  tenantId: '<tenant-id-of-identity>',
  clientId: '<client-id-of-identity>'
}
MATCH (p)-[:created]->(s:software) 
RETURN s.name as name, sum(p.age) as totalAge

Gremlin:

g.with("exportTo", "<target_path>")
  .with("useManagedIdentity", true)
  .with("tenantId", "<tenant-id-of-identity>")
  .with("clientId", "<client-id-of-identity>")
  .V().as("s")
  .in("created").as("p")
  .group().by(select("s")).by(values('age').sum())
  .unfold()
  .project('name', 'totalAge')
    .by(select(keys).values('name'))
    .by(select(values))

Using Storage Account Access Key

Cypher:

EXPORT TO '<target_path>'
PROPERTIES {
  identifier: '<storage-account-name>',
  credential: '<storage-account-access-key>'
}
MATCH (p)-[:created]->(s:software) 
RETURN s.name as name, sum(p.age) as totalAge

Gremlin:

g.with("exportTo", "<target_path>")
  .with("identifier", "<storage-account-name>")
  .with("credential", "<storage-account-access-key>")
  .V().as("s")
  .in("created").as("p")
  .group().by(select("s")).by(values('age').sum())
  .unfold()
  .project('name', 'totalAge')
    .by(select(keys).values('name'))
    .by(select(values))
  • <target_path>: Target directory path URI (scheme must be abfs:// or abfss://). The path is always treated as a directory
  • <storage-account-name>: Name of your Azure Storage Account
  • <storage-account-access-key>: Access key for the storage account
  • <tenant-id-of-identity>: Tenant ID of the managed identity
  • <client-id-of-identity>: Client ID (application ID) of the managed identity