Managing Locally Cached Data
PuppyGraph enables local caching of tables to boost query performance and reduce load on original data sources.
Configuring Cache Mode for Graphs
Configuring Local Cache for the Entire Server
Local cache is enabled by default. You can control it via the DATAACCESS_DATA_CACHE_STRATEGY environment variable (valid values: FULL, ADAPTIVE, NEVER).
| Option | Description |
|---|---|
| FULL | Cache all data locally, regardless of access patterns. Default for rapid development. |
| ADAPTIVE | Cache data based on access patterns and heuristics. Suitable for data lakes (Iceberg, Delta Lake, etc). |
| NEVER | Disable local caching; always access the original data source. |
Configuring Local Cache for Specific Nodes and Edges
With ADAPTIVE mode, you can fine-tune caching for individual node and edge types in your graph schema. Add cacheConfig and partitionConfig to the relevant node or edge definitions in your schema.json.
- cacheConfig: Sets the caching strategy for each node or edge type, overriding global settings.
- partitionConfig: Defines how cached data is partitioned and loaded, improving performance for large datasets via partition pruning and parallel loading.
Cache config example
Specify a caching strategy for a node or edge type using the cacheConfig property:
| Cache Strategy | Description |
|---|---|
| FULL | Always cache this node or edge type locally. |
| DEFAULT | Use the global cache strategy defined at the system level. |
Partition config example
- partitionKey: Column used for partitioning cached data.
- partitionTimeUnit: Time unit for partitioning.

| Partition Time Unit | Description |
|---|---|
| YEAR | Partition by year |
| MONTH | Partition by month |
| DAY | Partition by day |
| HOUR | Partition by hour |
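As a rough illustration of the two partition fields, the Python sketch below builds a partitionConfig object and rejects time units that are not in the table. It assumes partitionConfig is a JSON object holding exactly these two fields; the helper name `build_partition_config` and the column name `event_time` are hypothetical and are not part of PuppyGraph.

```python
import json

# Time units listed in the partition time unit table.
VALID_TIME_UNITS = {"YEAR", "MONTH", "DAY", "HOUR"}


def build_partition_config(partition_key: str, time_unit: str) -> dict:
    """Return a partitionConfig object, rejecting unknown time units.

    Assumption: partitionConfig is an object with the two documented
    fields, partitionKey and partitionTimeUnit.
    """
    unit = time_unit.upper()
    if unit not in VALID_TIME_UNITS:
        raise ValueError(f"unsupported partitionTimeUnit: {time_unit!r}")
    if not partition_key:
        raise ValueError("partitionKey must name a column")
    return {"partitionKey": partition_key, "partitionTimeUnit": unit}


# "event_time" is a hypothetical column name used only for illustration.
config = build_partition_config("event_time", "day")
print(json.dumps({"partitionConfig": config}, indent=2))
```

Validating the unit before editing schema.json can catch a typo early, before the schema is uploaded.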
Managing Locally Cached Graph Data
Monitoring and Managing Local Cache via Web UI and HTTP APIs
The Web UI provides a dashboard for monitoring locally cached data, including cache build progress, partition status, and error details. It does not currently support fine-grained cache management operations, but this capability is planned for future releases. If you are interested in this feature, please contact us.
For cache management tasks such as loading, refreshing, or removing cache, use the HTTP APIs listed below.
Loading Data into Local Cache
Two configurations affect data loading:
| Configuration | Default | Description |
|---|---|---|
| DATAACCESS_DATA_CACHE_LOADONSCHEMAUPDATE | true | Automatically load data into local cache when the schema is uploaded. If partitions are defined, all partitions are loaded. |
| DATAACCESS_DATA_CACHE_FALLBACKTODIRECTLOAD | true | Fall back to direct load if data is not found in local cache. If false, only cached data is used for queries. |
To load data manually, use the following HTTP APIs.
The refreshLocalCache request payload supports the following fields:
| Field | Required | Description |
|---|---|---|
| viewIds | Yes | List of node or edge label IDs to load into the local cache. |
| partitionStartValue | Yes | Start of the partition range. Must be provided; use an empty string "" for non-partitioned data. |
| partitionEndValue | Yes | End of the partition range. Must be provided; use an empty string "" for non-partitioned data. |
| skipCollectAnalytics | No | When true, skips collecting analytics (e.g. table statistics) during the cache load. This can speed up cache loading when analytics are not needed. Defaults to false. |
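The field rules above can be captured in a small helper. The following is a minimal Python sketch, not an official client: the function name `build_refresh_payload` is hypothetical, and only the four documented fields are used. It always emits both partition values, defaulting to empty strings for non-partitioned data.

```python
import json
from typing import List


def build_refresh_payload(
    view_ids: List[str],
    partition_start: str = "",
    partition_end: str = "",
    skip_collect_analytics: bool = False,
) -> dict:
    """Build a refreshLocalCache payload from the documented fields."""
    if not view_ids:
        raise ValueError("viewIds must contain at least one label")
    payload = {
        "viewIds": list(view_ids),
        # Both range fields are required; "" means non-partitioned data.
        "partitionStartValue": partition_start,
        "partitionEndValue": partition_end,
    }
    # The optional flag defaults to false, so it is sent only when set.
    if skip_collect_analytics:
        payload["skipCollectAnalytics"] = True
    return payload


print(json.dumps(build_refresh_payload(["node_label", "edge_label"])))
```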
Manually Loading Non-Partitioned Data
For nodes or edges without partitioning:
curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
-d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "", "partitionEndValue": ""}'
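For scripted use, the same call can be prepared from Python's standard library. This is a sketch under assumptions: the helper name `build_refresh_request` is hypothetical, and the credentials and host are the placeholders from the curl example. It only constructs the request; sending it is left to the caller.

```python
import base64
import json
import urllib.request


def build_refresh_request(base_url: str, username: str, password: str,
                          payload: dict) -> urllib.request.Request:
    """Prepare a POST to /ui-api/refreshLocalCache with HTTP basic auth."""
    # Equivalent of curl's --user username:password.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    # No Content-Type is set here; like curl -d, urllib falls back to
    # application/x-www-form-urlencoded when the request is sent.
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/ui-api/refreshLocalCache",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


request = build_refresh_request(
    "http://localhost:8081",
    "username",
    "password",
    {"viewIds": ["node_label", "edge_label"],
     "partitionStartValue": "", "partitionEndValue": ""},
)
print(request.get_method(), request.full_url)
# To send it against a running server: urllib.request.urlopen(request)
```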
Manually Loading Partitioned Data
For partitioned nodes or edges, specify the partition range:
curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
-d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "2025-01-01 00:00:00", "partitionEndValue": "2025-01-02 00:00:00"}'
To load multiple date ranges, submit separate requests for each range:
curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
-d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "2025-01-01 00:00:00", "partitionEndValue": "2025-01-02 00:00:00"}'
curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
-d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "2025-02-01 00:00:00", "partitionEndValue": "2025-02-02 00:00:00"}'
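When many ranges are involved, the per-range payloads can be generated rather than written by hand. The Python sketch below assumes the timestamp format used in the curl examples (`YYYY-MM-DD HH:MM:SS`); the helper name `payloads_for_ranges` is hypothetical. It produces one payload per range, to be posted as separate requests.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

# Timestamp layout taken from the curl examples.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def payloads_for_ranges(
    view_ids: List[str],
    ranges: Iterable[Tuple[datetime, datetime]],
) -> List[dict]:
    """Return one refreshLocalCache payload per (start, end) range."""
    payloads = []
    for start, end in ranges:
        if start >= end:
            raise ValueError(f"range start {start} is not before end {end}")
        payloads.append({
            "viewIds": list(view_ids),
            "partitionStartValue": start.strftime(TIMESTAMP_FORMAT),
            "partitionEndValue": end.strftime(TIMESTAMP_FORMAT),
        })
    return payloads


# The two ranges from the curl examples.
example_ranges = [
    (datetime(2025, 1, 1), datetime(2025, 1, 2)),
    (datetime(2025, 2, 1), datetime(2025, 2, 2)),
]
for payload in payloads_for_ranges(["node_label", "edge_label"], example_ranges):
    print(payload)
```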
Handling Partial Failures When Loading Multiple Elements
Monitor cache data loading progress with:
When submitting multiple viewIds in a single request, PuppyGraph processes each independently. If some fail while others succeed, successful views may temporarily show OTHERS_UNAVAILABLE status. This means at least one sibling view failed, not that the successful caches are invalid.
Key points:
- Retry only the failed views. Do not re-issue requests for views that already succeeded.
- Once failed views succeed in a retry, the OTHERS_UNAVAILABLE status disappears on the next UI refresh.
Example Scenario
Initial request (attempts to load both node_label and edge_label):
curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
-d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "2025-01-01 00:00:00", "partitionEndValue": "2025-01-02 00:00:00"}'
Result: node_label succeeds, edge_label fails. Use the API below to check per-view cache build state:
Example response:
{
  "items": [
    {
      "viewId": "node_label",
      "name": "node_label",
      "state": "OTHERS_UNAVAILABLE",
      "progress": "",
      "errorCode": "",
      "viewType": "VERTEX"
    },
    {
      "viewId": "edge_label",
      "name": "edge_label",
      "state": "FAILED",
      "progress": "",
      "errorCode": "",
      "errorMessage": "",
      "viewType": "EDGE"
    }
  ],
  "status": "",
  "type": ""
}
- node_label shows OTHERS_UNAVAILABLE because edge_label failed.
- Only views with state=FAILED need to be retried.
Retry only the failed edge_label:
curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
-d '{"viewIds": ["edge_label"], "partitionStartValue": "2025-01-01 00:00:00", "partitionEndValue": "2025-01-02 00:00:00"}'
After a successful retry, node_label will display SUCCESS status.
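The retry rule in this scenario can be automated by filtering the status response. The Python sketch below assumes the response shape shown in the example (an `items` list with `viewId` and `state` fields); the helper name `failed_view_ids` is hypothetical. It selects only FAILED views and deliberately skips OTHERS_UNAVAILABLE ones.

```python
import json

# Trimmed copy of the example status response.
EXAMPLE_RESPONSE = json.loads("""
{
  "items": [
    {"viewId": "node_label", "state": "OTHERS_UNAVAILABLE", "viewType": "VERTEX"},
    {"viewId": "edge_label", "state": "FAILED", "viewType": "EDGE"}
  ],
  "status": "",
  "type": ""
}
""")


def failed_view_ids(status_response: dict) -> list:
    """Return the viewIds whose state is FAILED.

    Views in OTHERS_UNAVAILABLE are skipped on purpose: their own load
    succeeded and they should not be re-submitted.
    """
    return [
        item["viewId"]
        for item in status_response.get("items", [])
        if item.get("state") == "FAILED"
    ]


retry_payload = {
    "viewIds": failed_view_ids(EXAMPLE_RESPONSE),
    "partitionStartValue": "2025-01-01 00:00:00",
    "partitionEndValue": "2025-01-02 00:00:00",
}
print(json.dumps(retry_payload))
```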
Partition Management
To view available partitions in the local cache:
curl --user username:password "http://localhost:8081/ui-api/getLocalCachePartitionDisplayInfo?viewId=node_label"
recentState: The current state of the most recent cache data loading operation for this partition.
| Value | Description |
|---|---|
| PENDING | Cache data load task submitted but not started |
| RUNNING | Cache data load task in progress |
| SUCCESS | Cache data load task completed successfully |
| FAILED | Cache data load task failed (check errorCode and errorMessage for details) |
recentProgress: Indicates the progress of the most recent data loading operation (e.g., 0%, 10%, 100%).
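Because PENDING and RUNNING are transient while SUCCESS and FAILED are final, a script can poll recentState until a load finishes. The Python sketch below is illustrative only: the response shape of getLocalCachePartitionDisplayInfo is not shown here, so `fetch_state` is a hypothetical caller-supplied function that returns the current recentState string, and `wait_for_partition` is likewise a made-up name.

```python
import time
from typing import Callable

# States from the recentState table.
IN_FLIGHT_STATES = {"PENDING", "RUNNING"}
TERMINAL_STATES = {"SUCCESS", "FAILED"}


def wait_for_partition(fetch_state: Callable[[], str],
                       poll_seconds: float = 5.0,
                       max_polls: int = 120) -> str:
    """Poll until recentState is SUCCESS or FAILED, then return it."""
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if state not in IN_FLIGHT_STATES:
            raise ValueError(f"unexpected recentState: {state!r}")
        time.sleep(poll_seconds)
    raise TimeoutError("cache load did not finish within the polling budget")


# Simulated sequence of states standing in for real API responses.
simulated = iter(["PENDING", "RUNNING", "SUCCESS"])
print(wait_for_partition(lambda: next(simulated), poll_seconds=0))
```

In real use, `fetch_state` would call the partition display API and extract recentState for the partition of interest.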
To remove a specific partition from local cache storage, get the partition name from the previous result: