Managing Locally Cached Data

PuppyGraph enables local caching of tables to boost query performance and reduce load on original data sources.

Configuring Cache Mode for Graphs

Configuring Local Cache for the Entire Server

Local cache is enabled by default. You can control it via the DATAACCESS_DATA_CACHE_STRATEGY environment variable (valid values: FULL, ADAPTIVE, NEVER).

| Option | Description |
| --- | --- |
| FULL | Cache all data locally, regardless of access patterns. Default for rapid development. |
| ADAPTIVE | Cache data based on access patterns and heuristics. Suitable for data lakes (Iceberg, Delta Lake, etc.). |
| NEVER | Disable local caching; always access the original data source. |

Configuring Local Cache for Specific Nodes and Edges

With ADAPTIVE mode, you can fine-tune caching for individual node and edge types in your graph schema. Add cacheConfig and partitionConfig to the relevant node or edge definitions in your schema.json.

  • cacheConfig: Sets the caching strategy for each node or edge type, overriding global settings.
  • partitionConfig: Defines how cached data is partitioned and loaded, improving performance for large datasets via partition pruning and parallel loading.

Cache config example

Specify a caching strategy for a node or edge type using the cacheConfig property:

"cacheConfig": {
  "cacheStrategy": "FULL"
}

| Cache Strategy | Description |
| --- | --- |
| FULL | Always cache this node or edge type locally. |
| DEFAULT | Use the global cache strategy defined at the system level. |

Partition config example

"partitionConfig": {
  "partitionColumns": [
    {
      "partitionKey": "ts",
      "partitionTimeUnit": "DAY"
    }
  ]
}

  • partitionKey: Column used for partitioning cached data.
  • partitionTimeUnit: Time unit for partitioning.

| Partition Time Unit | Description |
| --- | --- |
| YEAR | Partition by year |
| MONTH | Partition by month |
| DAY | Partition by day |
| HOUR | Partition by hour |
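To show how the two properties fit together, the following minimal Python sketch attaches both to a node or edge definition and prints the resulting JSON. The helper name `with_cache_settings`, the `label` key, and the shape of the surrounding definition are placeholders for illustration, not the actual PuppyGraph schema layout; only the `cacheConfig` and `partitionConfig` property names and values come from the examples above.

```python
import json

VALID_STRATEGIES = {"FULL", "DEFAULT"}
VALID_TIME_UNITS = {"YEAR", "MONTH", "DAY", "HOUR"}


def with_cache_settings(definition, strategy, partition_key=None, time_unit=None):
    """Return a copy of a node/edge definition with cache settings attached.

    This sketch only covers the time-based partition form shown above.
    """
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown cacheStrategy: {strategy!r}")
    result = dict(definition)
    result["cacheConfig"] = {"cacheStrategy": strategy}
    if partition_key is not None:
        if time_unit not in VALID_TIME_UNITS:
            raise ValueError(f"unknown partitionTimeUnit: {time_unit!r}")
        result["partitionConfig"] = {
            "partitionColumns": [
                {"partitionKey": partition_key, "partitionTimeUnit": time_unit}
            ]
        }
    return result


# Hypothetical node definition; a real definition contains more fields.
node = with_cache_settings({"label": "node_label"}, "FULL", "ts", "DAY")
print(json.dumps(node, indent=2))
```

Validating the strategy and time unit before writing schema.json catches typos early, since both properties accept only the values listed in the tables above.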

Managing Locally Cached Graph Data

Monitoring and Managing Local Cache via Web UI and HTTP APIs

The Settings page in the PuppyGraph Web UI provides a dashboard for monitoring locally cached data and lets you refresh the cache for individual nodes or edges, or refresh the entire cache. For advanced partition management operations such as loading a specific partition range or dropping individual partitions, use the HTTP APIs listed below.

Loading Data into Local Cache

Two configurations affect data loading:

| Configuration | Default | Description |
| --- | --- | --- |
| DATAACCESS_DATA_CACHE_LOADONSCHEMAUPDATE | true | Automatically load data into local cache when the schema is uploaded. If partitions are defined, all partitions are loaded. |
| DATAACCESS_DATA_CACHE_FALLBACKTODIRECTLOAD | true | Fall back to direct load if data is not found in local cache. If false, only cached data is used for queries. |

To load or refresh specific partition ranges, use the following HTTP APIs.

The refreshLocalCache request payload supports the following fields:

| Field | Required | Description |
| --- | --- | --- |
| viewIds | Yes | List of node or edge label IDs to load into the local cache. |
| partitionStartValue | Yes | Start of the partition range. Must be provided; use an empty string "" for non-partitioned data. |
| partitionEndValue | Yes | End of the partition range. Must be provided; use an empty string "" for non-partitioned data. |
| skipCollectAnalytics | No | When true, skips collecting analytics (e.g. table statistics) during the cache load. This can speed up cache loading when analytics are not needed. Defaults to false. |
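As a rough illustration of these fields, the Python sketch below assembles a request body. The function name `build_refresh_payload` and its argument names are made up for this example; the JSON keys are the documented ones. The sketch only builds the body and does not send it.

```python
import json


def build_refresh_payload(view_ids, start="", end="", skip_collect_analytics=False):
    """Build a JSON body for refreshLocalCache from the documented fields."""
    if not view_ids:
        raise ValueError("viewIds must contain at least one label")
    payload = {
        "viewIds": list(view_ids),
        # Both bounds are always sent; "" is used for non-partitioned data.
        "partitionStartValue": start,
        "partitionEndValue": end,
    }
    if skip_collect_analytics:
        # Optional field; omitted when left at its default of false.
        payload["skipCollectAnalytics"] = True
    return json.dumps(payload)


print(build_refresh_payload(["node_label", "edge_label"]))
```

Defaulting both bounds to the empty string keeps the two required partition fields present even when the caller is loading non-partitioned data.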

Manually Loading Non-Partitioned Data

For nodes or edges without partitioning:

curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
  -d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "", "partitionEndValue": ""}'

Manually Loading Partitioned Data

For partitioned nodes or edges, specify the partition range:

curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
  -d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "2025-01-01 00:00:00", "partitionEndValue": "2025-01-02 00:00:00"}'

To load multiple date ranges, submit separate requests for each range:

curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
  -d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "2025-01-01 00:00:00", "partitionEndValue": "2025-01-02 00:00:00"}'
curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
  -d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "2025-02-01 00:00:00", "partitionEndValue": "2025-02-02 00:00:00"}'
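
When many ranges are needed, the request bodies can be generated rather than written by hand. The Python sketch below yields one body per one-day range, following the start/end pattern of the examples above. The helper name `daily_payloads` is hypothetical, and whether the end value is treated as inclusive or exclusive is not stated here, so the sketch simply mirrors the documented examples.

```python
import json
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # timestamp format used in the examples above


def daily_payloads(view_ids, first_day, days):
    """Yield one refreshLocalCache body per one-day range, starting at first_day."""
    start = datetime.strptime(first_day, FMT)
    for offset in range(days):
        lo = start + timedelta(days=offset)
        hi = lo + timedelta(days=1)
        yield json.dumps({
            "viewIds": list(view_ids),
            "partitionStartValue": lo.strftime(FMT),
            "partitionEndValue": hi.strftime(FMT),
        })


for body in daily_payloads(["node_label", "edge_label"], "2025-01-01 00:00:00", 3):
    print(body)
```

Each printed body can be passed to the `-d` option of the curl command shown above, one request per range.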

Handling Partial Failures When Loading Multiple Elements

Monitor cache data loading progress with:

curl --user username:password http://localhost:8081/ui-api/getLocalCacheDetail

When submitting multiple viewIds in a single request, PuppyGraph processes each independently. If some fail while others succeed, successful views may temporarily show OTHERS_UNAVAILABLE status. This means at least one sibling view failed, not that the successful caches are invalid.

Key points:

  1. Retry only the failed views. Do not re-issue requests for views that already succeeded.
  2. Once failed views succeed in a retry, the OTHERS_UNAVAILABLE status disappears on the next UI refresh.

Example Scenario

Initial request (attempts to load both node_label and edge_label):

curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
  -d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "2025-01-01 00:00:00", "partitionEndValue": "2025-01-02 00:00:00"}'

Result: node_label succeeds, edge_label fails. Use the API below to check per-view cache build state:

curl --user username:password \
  http://localhost:8081/ui-api/getLocalCacheDetail

Example response:

{
  "items": [
    {
      "viewId": "node_label",
      "name": "node_label",
      "state": "OTHERS_UNAVAILABLE",
      "progress": "",
      "errorCode": "",
      "viewType": "VERTEX"
    },
    {
      "viewId": "edge_label",
      "name": "edge_label",
      "state": "FAILED",
      "progress": "",
      "errorCode": "",
      "errorMessage": "",
      "viewType": "EDGE"
    }
  ],
  "status": "",
  "type": ""
}

  • node_label shows OTHERS_UNAVAILABLE because edge_label failed.
  • Only views with state = FAILED need to be retried.

Retry only the failed edge_label:

curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
  -d '{"viewIds": ["edge_label"], "partitionStartValue": "2025-01-01 00:00:00", "partitionEndValue": "2025-01-02 00:00:00"}'

After a successful retry, node_label will display SUCCESS status.
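
The retry rule can be sketched in Python as follows: pick out the views whose state is FAILED from a getLocalCacheDetail response and build a retry body for those views only. The function names are made up for this example, and the response is a trimmed copy of the example response above. The sketch does not send any request.

```python
import json


def views_to_retry(detail):
    """Return viewIds whose state is FAILED; OTHERS_UNAVAILABLE views are skipped."""
    return [
        item["viewId"]
        for item in detail.get("items", [])
        if item.get("state") == "FAILED"
    ]


def retry_payload(detail, start, end):
    """Build a refreshLocalCache body for the failed views, or None if there are none."""
    failed = views_to_retry(detail)
    if not failed:
        return None
    return json.dumps({
        "viewIds": failed,
        "partitionStartValue": start,
        "partitionEndValue": end,
    })


# Trimmed copy of the example response above.
detail = {
    "items": [
        {"viewId": "node_label", "state": "OTHERS_UNAVAILABLE", "viewType": "VERTEX"},
        {"viewId": "edge_label", "state": "FAILED", "viewType": "EDGE"},
    ]
}
print(retry_payload(detail, "2025-01-01 00:00:00", "2025-01-02 00:00:00"))
```

Filtering on FAILED alone keeps views that already succeeded out of the retry, as recommended in the key points above.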

Partition Management

To view available partitions in the local cache:

curl --user username:password "http://localhost:8081/ui-api/getLocalCachePartitionDisplayInfo?viewId=node_label"

The URL is quoted so that the shell does not interpret the ? character. The response reports the following fields for each partition:

  • recentState: The current state of the most recent cache data loading operation for this partition.

| Value | Description |
| --- | --- |
| PENDING | Cache data load task submitted but not started |
| RUNNING | Cache data load task in progress |
| SUCCESS | Cache data load task completed successfully |
| FAILED | Cache data load task failed (check errorCode and errorMessage for details) |

  • recentProgress: The progress of the most recent data loading operation (e.g., 0%, 10%, 100%).

To remove a specific partition from local cache storage, pass the partition name returned by the previous request:

curl --user username:password -X POST http://localhost:8081/ui-api/dropLocalCachePartition -d '{"viewId": "node_label", "partitionName": "p20250101"}'
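
The example above names a single partition per request, so dropping several partitions can be sketched as one request body per name. In the Python sketch below, the helper name `drop_partition_payloads` is hypothetical and "p20250102" is a made-up second partition name; real names should be taken from the getLocalCachePartitionDisplayInfo result rather than derived by hand.

```python
import json


def drop_partition_payloads(view_id, partition_names):
    """Return one dropLocalCachePartition body per partition name."""
    return [
        json.dumps({"viewId": view_id, "partitionName": name})
        for name in partition_names
    ]


# "p20250102" is a made-up name used only for illustration.
for body in drop_partition_payloads("node_label", ["p20250101", "p20250102"]):
    print(body)
```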

Local Cache Status Reference

The "Current local cache status" value is displayed on the Settings page in the PuppyGraph Web UI. It reflects the overall state of the local cache system.

Normal States

| Status | Description |
| --- | --- |
| NOOP | Local cache is not enabled. PuppyGraph uses direct data source access for all queries. |
| INIT | The local cache system is initializing at startup, before the persisted state has been loaded. |
| IN_PROGRESS | An intermediate transitional state encountered during status updates. |
| PENDING_LOAD | The local cache has been set up but data loading has not yet started. This occurs when a schema is uploaded with DATAACCESS_DATA_CACHE_LOADONSCHEMAUPDATE=false. Trigger loading manually from the Settings page or via the refreshLocalCache API. |
| LOADING_DATA | Data is actively being loaded into the local cache for the first time. This is triggered automatically after schema upload when DATAACCESS_DATA_CACHE_LOADONSCHEMAUPDATE=true. Subsequent manual refreshes produce REFRESHING_DATA instead. |
| REFRESHING_DATA | Data is actively being refreshed (reloaded). Analytics collection will run automatically after the refresh completes. |
| REFRESHING_DATA_NO_ANALYTICS | Data is actively being refreshed without a subsequent analytics collection step. |
| START_COLLECT_ANALYTICS | Data loading or refresh has completed and analytics collection (statistics) is about to begin. |
| COLLECTING_ANALYTICS | Statistics are being collected to improve query planning performance. This is a normal part of the load/refresh workflow. |
| READY | The local cache is fully loaded and ready to serve queries. This is the normal operational state. |

Error States

The following statuses indicate an error condition. UNAVAILABLE and DATA_LOADING_ERROR require manual intervention to recover. STATUS_RETRIEVAL_ERROR is potentially transient and may resolve on its own.

| Status | Description | How to Diagnose / Recover |
| --- | --- | --- |
| UNAVAILABLE | Local cache creation failed, typically due to a connectivity or configuration problem. | Check PuppyGraph logs for the root cause. Fix the underlying issue and re-upload the schema to recreate the cache. |
| STATUS_RETRIEVAL_ERROR | PuppyGraph could not read its internal cache state. This is often transient. | Check PuppyGraph logs for the root cause. The Settings page polls automatically and will update on the next cycle. |
| DATA_LOADING_ERROR | Data loading failed after exhausting all retries. | View the cache detail table to inspect error codes and messages. Fix the underlying data source or connectivity issue, then retry from the Settings page or via the refreshLocalCache API. |

Note: When the PuppyGraph server cannot be reached, the Settings page will display Unknown. This is a connectivity issue, not a cache state.
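
For a monitoring script, the status reference above can be condensed into a small lookup. The Python sketch below maps each documented status to a coarse next step. The action labels and the function name `next_action` are invented for this example, and how a script obtains the status string is outside the scope of the sketch.

```python
# Statuses that call for something other than waiting.
ACTIONS = {
    "NOOP": "cache disabled",
    "PENDING_LOAD": "trigger load",
    "READY": "ready",
    "UNAVAILABLE": "manual fix",
    "DATA_LOADING_ERROR": "manual fix",
    "STATUS_RETRIEVAL_ERROR": "recheck",  # potentially transient
}

# Normal states in which the cache is still starting, loading, or collecting analytics.
IN_FLIGHT = {
    "INIT",
    "IN_PROGRESS",
    "LOADING_DATA",
    "REFRESHING_DATA",
    "REFRESHING_DATA_NO_ANALYTICS",
    "START_COLLECT_ANALYTICS",
    "COLLECTING_ANALYTICS",
}


def next_action(status):
    """Map a local cache status string to a coarse next step."""
    if status in ACTIONS:
        return ACTIONS[status]
    if status in IN_FLIGHT:
        return "wait"
    # Anything else, including the UI's "Unknown", is not a documented cache state.
    return "unrecognized"


for status in ("READY", "COLLECTING_ANALYTICS", "DATA_LOADING_ERROR"):
    print(status, "->", next_action(status))
```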