Managing Locally Cached Data
PuppyGraph enables local caching of tables to boost query performance and reduce load on original data sources.
Configuring Cache Mode for Graphs
Configuring Local Cache for the Entire Server
Local cache is enabled by default. You can control it via the DATAACCESS_DATA_CACHE_STRATEGY environment variable (valid values: FULL, ADAPTIVE, NEVER).
| Option | Description |
|---|---|
| FULL | Cache all data locally, regardless of access patterns. Default for rapid development. |
| ADAPTIVE | Cache data based on access patterns and heuristics. Suitable for data lakes (Iceberg, Delta Lake, etc). |
| NEVER | Disable local caching; always access the original data source. |
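As a minimal sketch of how the variable might be set from a launch script, the helper below validates a strategy against the three documented values and renders it as an environment entry. The function names, and the assumption that the server is started through a container runtime that accepts `-e NAME=value` arguments, are illustrative only and not part of PuppyGraph's documented tooling.

```python
VALID_STRATEGIES = ("FULL", "ADAPTIVE", "NEVER")  # values listed in the table above


def cache_strategy_env(strategy: str) -> dict:
    """Return the environment entry for a cache strategy (hypothetical helper)."""
    normalized = strategy.strip().upper()
    if normalized not in VALID_STRATEGIES:
        raise ValueError(f"unknown cache strategy: {strategy!r}")
    return {"DATAACCESS_DATA_CACHE_STRATEGY": normalized}


def as_container_env_args(env: dict) -> list:
    """Render entries as `-e NAME=value` arguments (assumes a container launch)."""
    args = []
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]
    return args
```

Calling `as_container_env_args(cache_strategy_env("adaptive"))` yields the argument pair to append to the launch command; an unsupported value raises `ValueError` before the server is started.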
Configuring Local Cache for Specific Nodes and Edges
With ADAPTIVE mode, you can fine-tune caching for individual node and edge types in your graph schema. Add cacheConfig and partitionConfig to the relevant node or edge definitions in your schema.json.
- cacheConfig: Sets the caching strategy for each node or edge type, overriding global settings.
- partitionConfig: Defines how cached data is partitioned and loaded, improving performance for large datasets via partition pruning and parallel loading.
Cache config example
Specify a caching strategy for a node or edge type using the cacheConfig property:
| Cache Strategy | Description |
|---|---|
| FULL | Always cache this node or edge type locally. |
| DEFAULT | Use the global cache strategy defined at the system level. |
Partition config example
- partitionKey: Column used for partitioning cached data.
- partitionTimeUnit: Time unit for partitioning.
| Partition Time Unit | Description |
|---|---|
| YEAR | Partition by year |
| MONTH | Partition by month |
| DAY | Partition by day |
| HOUR | Partition by hour |
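To illustrate what partitioning by a time unit means, the sketch below truncates a timestamp to the start of its YEAR, MONTH, DAY, or HOUR bucket. It only demonstrates the concept: how PuppyGraph computes and names partitions internally is not described here, and the function name is hypothetical.

```python
from datetime import datetime


def partition_start(ts: datetime, unit: str) -> datetime:
    """Truncate ts to the start of the bucket for the given time unit."""
    if unit == "YEAR":
        return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if unit == "MONTH":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if unit == "DAY":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "HOUR":
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported partition time unit: {unit!r}")
```

Two rows whose partition-key values truncate to the same start fall into the same bucket, which is what makes pruning by time range possible: a query over one day only needs the buckets that overlap that day.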
Managing Locally Cached Graph Data
Monitoring and Managing Local Cache via Web UI and HTTP APIs
The Settings page in the PuppyGraph Web UI provides a dashboard for monitoring locally cached data and lets you refresh the cache for individual nodes or edges, or refresh the entire cache. For advanced partition management operations such as loading a specific partition range or dropping individual partitions, use the HTTP APIs listed below.
Loading Data into Local Cache
Two configurations affect data loading:
| Configuration | Default | Description |
|---|---|---|
| DATAACCESS_DATA_CACHE_LOADONSCHEMAUPDATE | true | Automatically load data into local cache when the schema is uploaded. If partitions are defined, all partitions are loaded. |
| DATAACCESS_DATA_CACHE_FALLBACKTODIRECTLOAD | true | Fall back to direct load if data is not found in local cache. If false, only cached data is used for queries. |
To load or refresh specific partition ranges, use the following HTTP APIs.
The refreshLocalCache request payload supports the following fields:
| Field | Required | Description |
|---|---|---|
| viewIds | Yes | List of node or edge label IDs to load into the local cache. |
| partitionStartValue | Yes | Start of the partition range. Must be provided; use an empty string "" for non-partitioned data. |
| partitionEndValue | Yes | End of the partition range. Must be provided; use an empty string "" for non-partitioned data. |
| skipCollectAnalytics | No | When true, skips collecting analytics (e.g. table statistics) during the cache load. This can speed up cache loading when analytics are not needed. Defaults to false. |
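The field table above can be sketched as a small payload builder. The function name and its keyword arguments are hypothetical; only the four JSON field names come from the table, and the optional field is omitted unless the caller sets it so that the server default applies.

```python
import json


def build_refresh_payload(view_ids, start="", end="", skip_collect_analytics=None):
    """Build a refreshLocalCache JSON body from the documented fields."""
    if not view_ids:
        raise ValueError("viewIds must contain at least one node or edge label ID")
    payload = {
        "viewIds": list(view_ids),
        "partitionStartValue": start,  # "" for non-partitioned data
        "partitionEndValue": end,      # "" for non-partitioned data
    }
    # Only send the optional field when the caller chose a value explicitly.
    if skip_collect_analytics is not None:
        payload["skipCollectAnalytics"] = bool(skip_collect_analytics)
    return json.dumps(payload)
```

For example, `build_refresh_payload(["node_label"])` produces a body with both partition bounds set to empty strings, which is the form required for non-partitioned data.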
Manually Loading Non-Partitioned Data
For nodes or edges without partitioning:
curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
-d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "", "partitionEndValue": ""}'
Manually Loading Partitioned Data
For partitioned nodes or edges, specify the partition range:
curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
-d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "2025-01-01 00:00:00", "partitionEndValue": "2025-01-02 00:00:00"}'
To load multiple date ranges, submit separate requests for each range:
curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
-d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "2025-01-01 00:00:00", "partitionEndValue": "2025-01-02 00:00:00"}'
curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
-d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "2025-02-01 00:00:00", "partitionEndValue": "2025-02-02 00:00:00"}'
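When many ranges need loading, the one-request-per-range pattern above could be scripted. The following is a sketch under assumptions: the helper names are hypothetical, and the host, port, and credentials are the placeholders used in the curl examples. It builds one body per range and prepares a POST request with HTTP basic authentication using only the Python standard library.

```python
import base64
import json
import urllib.request


def build_range_bodies(view_ids, ranges):
    """Return one refreshLocalCache JSON body per (start, end) range."""
    return [
        json.dumps({
            "viewIds": list(view_ids),
            "partitionStartValue": start,
            "partitionEndValue": end,
        })
        for start, end in ranges
    ]


def make_refresh_request(body, base_url="http://localhost:8081",
                         user="username", password="password"):
    """Prepare (but do not send) a POST to /ui-api/refreshLocalCache."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url}/ui-api/refreshLocalCache",
        data=body.encode(),
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


def send(request):
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode()
```

A caller would loop over `build_range_bodies(["node_label", "edge_label"], ranges)` and pass each body through `make_refresh_request` and `send`. Keeping request construction separate from sending makes it possible to inspect or log each request before it reaches the server.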
Handling Partial Failures When Loading Multiple Elements
Monitor cache data loading progress with:
When submitting multiple viewIds in a single request, PuppyGraph processes each independently. If some fail while others succeed, successful views may temporarily show OTHERS_UNAVAILABLE status. This means at least one sibling view failed, not that the successful caches are invalid.
Key points:
- Retry only the failed views. Do not re-issue requests for views that already succeeded.
- Once failed views succeed in a retry, the OTHERS_UNAVAILABLE status disappears on the next UI refresh.
Example Scenario
Initial request (attempts to load both node_label and edge_label):
curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
-d '{"viewIds": ["node_label", "edge_label"], "partitionStartValue": "2025-01-01 00:00:00", "partitionEndValue": "2025-01-02 00:00:00"}'
Result: node_label succeeds, edge_label fails. Use the API below to check per-view cache build state:
Example response:
{
  "items": [
    {
      "viewId": "node_label",
      "name": "node_label",
      "state": "OTHERS_UNAVAILABLE",
      "progress": "",
      "errorCode": "",
      "viewType": "VERTEX"
    },
    {
      "viewId": "edge_label",
      "name": "edge_label",
      "state": "FAILED",
      "progress": "",
      "errorCode": "",
      "errorMessage": "",
      "viewType": "EDGE"
    }
  ],
  "status": "",
  "type": ""
}
- node_label shows OTHERS_UNAVAILABLE because edge_label failed.
- Only views with state=FAILED need to be retried.
Retry only the failed edge_label:
curl --user username:password -X POST http://localhost:8081/ui-api/refreshLocalCache \
-d '{"viewIds": ["edge_label"], "partitionStartValue": "2025-01-01 00:00:00", "partitionEndValue": "2025-01-02 00:00:00"}'
After a successful retry, node_label will display SUCCESS status.
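The retry rule in this scenario can be sketched as a filter over the status response. The helper names are hypothetical, and the input is assumed to have the shape of the example response shown earlier (an `items` list whose entries carry `viewId` and `state`).

```python
import json


def views_to_retry(status_response: dict) -> list:
    """Return viewIds whose state is FAILED; OTHERS_UNAVAILABLE views are skipped."""
    return [
        item["viewId"]
        for item in status_response.get("items", [])
        if item.get("state") == "FAILED"
    ]


def build_retry_payload(status_response: dict, start: str, end: str):
    """Build a refreshLocalCache body covering only the failed views, or None."""
    failed = views_to_retry(status_response)
    if not failed:
        return None  # nothing failed, so there is nothing to retry
    return json.dumps({
        "viewIds": failed,
        "partitionStartValue": start,
        "partitionEndValue": end,
    })
```

Applied to the example response, this selects only `edge_label`, so the view that already succeeded is not reloaded.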
Partition Management
To view available partitions in the local cache:
curl --user username:password "http://localhost:8081/ui-api/getLocalCachePartitionDisplayInfo?viewId=node_label"
recentState: The current state of the most recent cache data loading operation for this partition.
| Value | Description |
|---|---|
| PENDING | Cache data load task submitted but not started |
| RUNNING | Cache data load task in progress |
| SUCCESS | Cache data load task completed successfully |
| FAILED | Cache data load task failed (check errorCode and errorMessage for details) |
recentProgress: Indicates the progress of the most recent data loading operation (e.g., 0%, 10%, 100%).
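As a rough sketch of how the state values might be consumed by a script, the helpers below group partitions by recentState and report whether every load has reached a terminal state. The response layout is not shown in this section, so the input shape (a list of records, each with `partitionName` and `recentState` keys) is an assumption, as are the function names.

```python
TERMINAL_STATES = {"SUCCESS", "FAILED"}  # load tasks that are no longer running


def group_by_recent_state(partitions):
    """Map each recentState value to the partition names currently in it."""
    groups = {}
    for partition in partitions:
        groups.setdefault(partition["recentState"], []).append(partition["partitionName"])
    return groups


def all_loads_finished(partitions):
    """True when no partition is still PENDING or RUNNING."""
    return all(p["recentState"] in TERMINAL_STATES for p in partitions)
```

A polling loop could call `all_loads_finished` after each status request and then inspect the `FAILED` group to decide which ranges to reload.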
To remove a specific partition from local cache storage, use a partition name returned by getLocalCachePartitionDisplayInfo:
curl --user username:password -X POST http://localhost:8081/ui-api/dropLocalCachePartition -d '{"viewId": "node_label", "partitionName": "p20250101"}'
Local Cache Status Reference
The "Current local cache status" value is displayed on the Settings page in the PuppyGraph Web UI. It reflects the overall state of the local cache system.
Normal States
| Status | Description |
|---|---|
| NOOP | Local cache is not enabled. PuppyGraph uses direct data source access for all queries. |
| INIT | The local cache system is initializing at startup, before the persisted state has been loaded. |
| IN_PROGRESS | An intermediate transitional state encountered during status updates. |
| PENDING_LOAD | The local cache has been set up but data loading has not yet started. This occurs when a schema is uploaded with DATAACCESS_DATA_CACHE_LOADONSCHEMAUPDATE=false. Trigger loading manually from the Settings page or via the refreshLocalCache API. |
| LOADING_DATA | Data is actively being loaded into the local cache for the first time. This is triggered automatically after schema upload when DATAACCESS_DATA_CACHE_LOADONSCHEMAUPDATE=true. Subsequent manual refreshes produce REFRESHING_DATA instead. |
| REFRESHING_DATA | Data is actively being refreshed (reloaded). Analytics collection will run automatically after the refresh completes. |
| REFRESHING_DATA_NO_ANALYTICS | Data is actively being refreshed without a subsequent analytics collection step. |
| START_COLLECT_ANALYTICS | Data loading or refresh has completed and analytics collection (statistics) is about to begin. |
| COLLECTING_ANALYTICS | Statistics are being collected to improve query planning performance. This is a normal part of the load/refresh workflow. |
| READY | The local cache is fully loaded and ready to serve queries. This is the normal operational state. |
Error States
The following statuses indicate an error condition. UNAVAILABLE and DATA_LOADING_ERROR require manual intervention to recover. STATUS_RETRIEVAL_ERROR is potentially transient and may resolve on its own.
| Status | Description | How to Diagnose / Recover |
|---|---|---|
| UNAVAILABLE | Local cache creation failed, typically due to a connectivity or configuration problem. | Check PuppyGraph logs for the root cause. Fix the underlying issue and re-upload the schema to recreate the cache. |
| STATUS_RETRIEVAL_ERROR | PuppyGraph could not read its internal cache state. This is often transient. | Check PuppyGraph logs for the root cause. The Settings page polls automatically and will update on the next cycle. |
| DATA_LOADING_ERROR | Data loading failed after exhausting all retries. | View the cache detail table to inspect error codes and messages. Fix the underlying data source or connectivity issue, then retry from the Settings page or via the refreshLocalCache API. |
Note: When the PuppyGraph server cannot be reached, the Settings page will display Unknown. This is a connectivity issue, not a cache state.
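For monitoring scripts, the status reference above could be condensed into a small classifier. This is a sketch only: the category labels and the function name are invented for illustration, while the status names and their grouping follow the two tables and the note.

```python
NORMAL_STATES = {
    "NOOP", "INIT", "IN_PROGRESS", "PENDING_LOAD", "LOADING_DATA",
    "REFRESHING_DATA", "REFRESHING_DATA_NO_ANALYTICS",
    "START_COLLECT_ANALYTICS", "COLLECTING_ANALYTICS", "READY",
}
MANUAL_RECOVERY_STATES = {"UNAVAILABLE", "DATA_LOADING_ERROR"}
TRANSIENT_STATES = {"STATUS_RETRIEVAL_ERROR"}


def classify_status(status: str) -> str:
    """Map a displayed cache status to a coarse category (labels are hypothetical)."""
    if status in NORMAL_STATES:
        return "normal"
    if status in MANUAL_RECOVERY_STATES:
        return "needs_intervention"
    if status in TRANSIENT_STATES:
        return "possibly_transient"
    if status == "Unknown":
        return "server_unreachable"  # connectivity issue, not a cache state
    return "unrecognized"
```

An alerting rule might page only on `needs_intervention`, and wait a few polling cycles before escalating `possibly_transient` or `server_unreachable`.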