Managing Locally Cached Data

PuppyGraph enables local caching of tables to boost query performance and reduce load on original data sources.

Configuring Cache Mode for Graphs

Configuring Local Cache for the Entire Server

Local cache is enabled by default. You can control it via the DATAACCESS_DATA_CACHE_STRATEGY environment variable (valid values: FULL, ADAPTIVE, NEVER).

| Option | Description |
| --- | --- |
| FULL | Cache all data locally, regardless of access patterns. Default for rapid development. |
| ADAPTIVE | Cache data based on access patterns and heuristics. Suitable for data lakes (Iceberg, Delta Lake, etc.). |
| NEVER | Disable local caching; always access the original data source. |
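
A typo in the strategy value is easy to miss, so a launcher script can check it before starting the server. The following is a minimal sketch, not part of PuppyGraph: the helper name `resolve_cache_strategy` is hypothetical, and the exact-uppercase match is an assumption of this sketch (the server may be more lenient).

```python
import os

# Values listed among the options above.
VALID_STRATEGIES = {"FULL", "ADAPTIVE", "NEVER"}


def resolve_cache_strategy(env=None):
    """Return the configured strategy, or None when the variable is unset.

    Raises ValueError for values outside the documented set.
    Assumption: values are compared case-sensitively.
    """
    env = os.environ if env is None else env
    value = env.get("DATAACCESS_DATA_CACHE_STRATEGY")
    if value is None:
        return None  # unset: whatever default the server applies is used
    if value not in VALID_STRATEGIES:
        raise ValueError(
            f"DATAACCESS_DATA_CACHE_STRATEGY={value!r} is not one of "
            f"{sorted(VALID_STRATEGIES)}"
        )
    return value
```

Calling `resolve_cache_strategy()` with no argument reads the current process environment; passing a dict makes the check easy to exercise in isolation.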

Configuring Local Cache for Specific Nodes and Edges

With ADAPTIVE mode, you can fine-tune caching for individual node and edge types in your graph schema. Add cacheConfig and partitionConfig to the relevant node or edge definitions in your schema.json.

  • cacheConfig: Sets the caching strategy for each node or edge type, overriding global settings.
  • partitionConfig: Defines how cached data is partitioned and loaded, improving performance for large datasets via partition pruning and parallel loading.

Cache config example

Specify a caching strategy for a node or edge type using the cacheConfig property:

"cacheConfig": {
  "cacheStrategy": "FULL"
}

| Cache Strategy | Description |
| --- | --- |
| FULL | Always cache this node or edge type locally. |
| DEFAULT | Use the global cache strategy defined at the system level. |

Partition config example

"partitionConfig": {
  "partitionColumns": [
    {
      "partitionKey": "ts",
      "partitionTimeUnit": "DAY"
    }
  ]
}

  • partitionKey: Column used for partitioning cached data.
  • partitionTimeUnit: Time unit for partitioning.

| Partition Time Unit | Description |
| --- | --- |
| YEAR | Partition by year |
| MONTH | Partition by month |
| DAY | Partition by day |
| HOUR | Partition by hour |
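
When many node and edge types need the same settings, the two fragments can be generated rather than hand-edited. The sketch below is illustrative only: `with_cache_settings` is a hypothetical helper, and the `"label"` field in the usage line is a placeholder standing in for whatever your existing definition in schema.json contains.

```python
import json

# Values taken from the descriptions above.
VALID_CACHE_STRATEGIES = {"FULL", "DEFAULT"}
VALID_TIME_UNITS = {"YEAR", "MONTH", "DAY", "HOUR"}


def with_cache_settings(definition, cache_strategy, partition_key=None, time_unit=None):
    """Return a copy of a node or edge definition with cache settings added."""
    if cache_strategy not in VALID_CACHE_STRATEGIES:
        raise ValueError(f"unknown cacheStrategy: {cache_strategy!r}")
    updated = dict(definition)  # shallow copy; the input dict is left untouched
    updated["cacheConfig"] = {"cacheStrategy": cache_strategy}
    if partition_key is not None:
        if time_unit not in VALID_TIME_UNITS:
            raise ValueError(f"unknown partitionTimeUnit: {time_unit!r}")
        updated["partitionConfig"] = {
            "partitionColumns": [
                {"partitionKey": partition_key, "partitionTimeUnit": time_unit}
            ]
        }
    return updated


# "label" is a placeholder field, not a documented part of the schema format.
node = with_cache_settings({"label": "node_label"}, "FULL", "ts", "DAY")
print(json.dumps(node, indent=2))
```

Omitting `partition_key` adds only the cacheConfig fragment, which suits tables that have no time column to partition on.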

Managing Locally Cached Graph Data

Monitoring and Managing Local Cache via Web UI and HTTP APIs

The Web UI provides a dashboard for monitoring locally cached data, including cache build progress, partition status, and error details. It does not currently support fine-grained cache management operations, but this capability is planned for future releases. If you are interested in this feature, please contact us.

For cache management tasks such as loading, refreshing, or removing cache, use the HTTP APIs listed below.

Loading Data into Local Cache

Two configurations affect data loading:

| Configuration | Default | Description |
| --- | --- | --- |
| DATAACCESS_DATA_CACHE_LOADONSCHEMAUPDATE | true | Automatically load data into local cache when the schema is uploaded. If partitions are defined, all partitions are loaded. |
| DATAACCESS_DATA_CACHE_FALLBACKTODIRECTLOAD | true | Fall back to direct load if data is not found in local cache. If false, only cached data is used for queries. |

To load data manually, use the following HTTP APIs.

The refreshLocalCache request payload supports the following fields:

| Field | Required | Description |
| --- | --- | --- |
| viewIds | Yes | List of node or edge label IDs to load into the local cache. |
| partitionStartValue | Yes | Start of the partition range. Must be provided; use an empty string "" for non-partitioned data. |
| partitionEndValue | Yes | End of the partition range. Must be provided; use an empty string "" for non-partitioned data. |
| skipCollectAnalytics | No | When true, skips collecting analytics (e.g. table statistics) during the cache load. This can speed up cache loading when analytics are not needed. Defaults to false. |
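
The field rules above can be captured in a small payload builder so that the two required partition fields are never forgotten. This is a sketch under stated assumptions: `build_refresh_payload` is a hypothetical helper name, and omitting skipCollectAnalytics when it is not set simply relies on the documented default of false.

```python
import json


def build_refresh_payload(view_ids, partition_start="", partition_end="",
                          skip_collect_analytics=None):
    """Build the JSON body for a refreshLocalCache request.

    Both partition values default to "" as described for non-partitioned data.
    """
    if not view_ids:
        raise ValueError("viewIds must contain at least one label")
    payload = {
        "viewIds": list(view_ids),
        "partitionStartValue": partition_start,
        "partitionEndValue": partition_end,
    }
    # Only send the optional field when the caller sets it explicitly.
    if skip_collect_analytics is not None:
        payload["skipCollectAnalytics"] = bool(skip_collect_analytics)
    return json.dumps(payload)


print(build_refresh_payload(["node_label", "edge_label"]))
print(build_refresh_payload(["node_label"], "2025-01-01 00:00:00",
                            "2025-01-02 00:00:00", skip_collect_analytics=True))
```

The returned string can be passed directly as the `-d` argument of the curl commands in this section.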

Manually Loading Non-Partitioned Data

For nodes or edges without partitioning:

curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
  -d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "", "partitionEndValue": ""}'
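
For scripted use, the same request can be built with Python's standard library. The sketch below mirrors the curl call above as closely as possible; `build_request` is a hypothetical helper, and the host, credentials, and labels are the same placeholders used in the curl example. It only constructs the request, so it can be inspected without a running server.

```python
import base64
import json
import urllib.request


def build_request(base_url, path, username, password, payload=None):
    """Build a request equivalent to `curl --user user:pass [-X POST -d ...]`."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    # No Content-Type is set here because the curl example does not set one;
    # urllib then falls back to the same form-encoded default that curl -d uses.
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers={"Authorization": f"Basic {token}"},
        method="GET" if data is None else "POST",
    )


request = build_request(
    "http://localhost:8081",
    "/ui-api/refreshLocalCache",
    "username",
    "password",
    {"viewIds": ["node_label", "edge_label"],
     "partitionStartValue": "", "partitionEndValue": ""},
)
print(request.get_method(), request.full_url)

# To send it against a running server:
#     with urllib.request.urlopen(request, timeout=30) as response:
#         print(response.status, response.read().decode("utf-8"))
```

Calling `build_request` without a payload produces a GET request, which matches the read-only endpoints such as getLocalCacheDetail.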

Manually Loading Partitioned Data

For partitioned nodes or edges, specify the partition range:

curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
  -d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "2025-01-01 00:00:00", "partitionEndValue": "2025-01-02 00:00:00"}'

To load multiple date ranges, submit separate requests for each range:

curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
  -d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "2025-01-01 00:00:00", "partitionEndValue": "2025-01-02 00:00:00"}'
curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
  -d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "2025-02-01 00:00:00", "partitionEndValue": "2025-02-02 00:00:00"}'
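
With more than a handful of ranges, generating the request bodies is less error-prone than editing timestamps by hand. The following sketch produces one body per calendar day, following the pattern in the examples above (start at midnight, end at midnight of the next day). The helper name `payloads_for_days` is hypothetical, and treating each day as its own range is a choice of this sketch, suited to DAY partitions.

```python
import json
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # same format as the curl examples


def payloads_for_days(view_ids, days):
    """Yield one refreshLocalCache body per day given as 'YYYY-MM-DD'."""
    for day_text in days:
        start = datetime.strptime(day_text, "%Y-%m-%d")
        end = start + timedelta(days=1)  # midnight of the following day
        yield json.dumps({
            "viewIds": list(view_ids),
            "partitionStartValue": start.strftime(TIMESTAMP_FORMAT),
            "partitionEndValue": end.strftime(TIMESTAMP_FORMAT),
        })


for body in payloads_for_days(["node_label", "edge_label"],
                              ["2025-01-01", "2025-02-01"]):
    print(body)
```

Each printed body corresponds to one of the two curl requests above and would be submitted as a separate request.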

Handling Partial Failures When Loading Multiple Elements

Monitor cache data loading progress with:

curl --user username:password http://localhost:8081/ui-api/getLocalCacheDetail

When submitting multiple viewIds in a single request, PuppyGraph processes each independently. If some fail while others succeed, successful views may temporarily show OTHERS_UNAVAILABLE status. This means at least one sibling view failed, not that the successful caches are invalid.

Key points:

  1. Retry only the failed views. Do not re-issue requests for views that already succeeded.
  2. Once failed views succeed in a retry, the OTHERS_UNAVAILABLE status disappears on the next UI refresh.

Example Scenario

Initial request (attempts to load both node_label and edge_label):

curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
  -d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "2025-01-01 00:00:00", "partitionEndValue": "2025-01-02 00:00:00"}'

Result: node_label succeeds, edge_label fails. Use the API below to check per-view cache build state:

curl --user username:password \
  http://localhost:8081/ui-api/getLocalCacheDetail

Example response:

{
  "items": [
    {
      "viewId": "node_label",
      "name": "node_label",
      "state": "OTHERS_UNAVAILABLE",
      "progress": "",
      "errorCode": "",
      "viewType": "VERTEX"
    },
    {
      "viewId": "edge_label",
      "name": "edge_label",
      "state": "FAILED",
      "progress": "",
      "errorCode": "",
      "errorMessage": "",
      "viewType": "EDGE"
    }
  ],
  "status": "",
  "type": ""
}

  • node_label shows OTHERS_UNAVAILABLE because edge_label failed.
  • Only views with state = FAILED need to be retried.

Retry only the failed edge_label:

curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
  -d '{"viewIds": ["edge_label"], "partitionStartValue": "2025-01-01 00:00:00", "partitionEndValue": "2025-01-02 00:00:00"}'

After a successful retry, node_label will display SUCCESS status.
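
The retry rule in this scenario (resubmit only views whose state is FAILED, and leave OTHERS_UNAVAILABLE views alone) can be automated. The sketch below works on the response shape shown in the example above; the helper names are hypothetical, and the embedded response is a trimmed copy of that example rather than real server output.

```python
import json


def failed_view_ids(detail):
    """Return the viewIds whose state is FAILED in a getLocalCacheDetail response."""
    return [
        item["viewId"]
        for item in detail.get("items", [])
        if item.get("state") == "FAILED"
    ]


def build_retry_payload(detail, partition_start, partition_end):
    """Build a refreshLocalCache body for the failed views only, or None if none failed."""
    failed = failed_view_ids(detail)
    if not failed:
        return None
    return json.dumps({
        "viewIds": failed,
        "partitionStartValue": partition_start,
        "partitionEndValue": partition_end,
    })


# Trimmed copy of the example response above.
detail = json.loads("""
{
  "items": [
    {"viewId": "node_label", "state": "OTHERS_UNAVAILABLE", "viewType": "VERTEX"},
    {"viewId": "edge_label", "state": "FAILED", "viewType": "EDGE"}
  ]
}
""")

print(build_retry_payload(detail, "2025-01-01 00:00:00", "2025-01-02 00:00:00"))
```

Views in OTHERS_UNAVAILABLE are deliberately excluded, since their own load succeeded and only a sibling view failed.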

Partition Management

To view available partitions in the local cache:

curl --user username:password "http://localhost:8081/ui-api/getLocalCachePartitionDisplayInfo?viewId=node_label"

For each partition, the response includes:

  • recentState: The current state of the most recent cache data loading operation for this partition.

| Value | Description |
| --- | --- |
| PENDING | Cache data load task submitted but not started |
| RUNNING | Cache data load task in progress |
| SUCCESS | Cache data load task completed successfully |
| FAILED | Cache data load task failed (check errorCode and errorMessage for details) |

  • recentProgress: Indicates the progress of the most recent data loading operation (e.g., 0%, 10%, 100%).

To remove a specific partition from the local cache, pass the partition name returned by the previous call:

curl --user username:password -X POST http://localhost:8081/ui-api/dropLocalCachePartition \
  -d '{"viewId": "node_label", "partitionName": "p20250101"}'