Working with SCD Type 2 Tables
If your data source is an SCD (Slowly Changing Dimensions) Type 2 table, you can add special settings in PuppyGraph's schema to make it easier to query data at specific points in time using Cypher.
Setting Up SCD2 Information in Schema
SCD2 Configuration
Before adding SCD2 settings, make sure the relevant columns are already mapped as attributes in the schema.
In your PuppyGraph vertex or edge definition, include an scd2Config section like this:
"scd2Config": {
"effectiveTime": "start_time",
"expirationTime": "end_time",
"currentExpirationTimeValue": "9999-12-31",
"currentFlag": "is_current",
"currentFlagValue": "Y"
}
Here's what each setting means:
- effectiveTime: The attribute alias for the column that stores when the record becomes active
- expirationTime: The attribute alias for the column that stores when the record expires
- currentExpirationTimeValue: The value that indicates a record is currently active in the expiration time column (omit this field if the value is
NULL) - currentFlag: The attribute alias for the column that marks whether a record is current
- currentFlagValue: The value that indicates a record is currently active
Note: If your expiration time column uses the string
'null'to represent current records, set"currentExpirationTimeValue": "null"
You can use these configuration combinations:
- Time range only: Set
effectiveTimeandexpirationTime(pluscurrentExpirationTimeValueif needed). Both time fields must be the same data type. - Current flag only: Set
currentFlag(pluscurrentFlagValueif needed). - Both time range and current flag: Use all relevant settings.
Complete Schema Example
Here's a full example showing how SCD2 configuration fits into vertex and edge definitions:
Suppose the data source table definitions are:
CREATE TABLE modern.person (
id String,
name String,
age Int,
_start TimeStamp,
_end TimeStamp,
is_current Boolean
);
CREATE TABLE modern.knows (
id String,
from_id String,
to_id String,
weight Double,
_start TimeStamp,
_end TimeStamp,
is_current_str String
)
We can create PuppyGraph schema with SCD2 config:
{
"catalogs": [
{
"name": "my_catalog",
...
}
],
"graph": {
"vertices": [
{
"label": "person",
"oneToOne": {
"tableSource": {
"catalog": "my_catalog",
"schema": "modern",
"table": "person"
},
"id": {
"fields": [
{
"type": "String",
"field": "id",
"alias": "person_id"
}
]
},
"attributes": [
{
"type": "String",
"field": "name",
"alias": "person_name"
},
{
"type": "Int",
"field": "age",
"alias": "person_age"
},
{
"type": "DateTime",
"field": "_start",
"alias": "start_time"
},
{
"type": "DateTime",
"field": "_end",
"alias": "end_time"
},
{
"type": "Boolean",
"field": "is_current",
"alias": "current_flag"
}
]
},
"scd2Config": {
"effectiveTime": "start_time",
"expirationTime": "end_time",
"currentExpirationTimeValue": "null",
"currentFlag": "current_flag"
}
}
],
"edges": [
{
"label": "knows",
"fromVertex": "person",
"toVertex": "person",
"tableSource": {
"catalog": "my_catalog",
"schema": "modern",
"table": "knows"
},
"id": {
"fields": [
{
"type": "String",
"field": "id",
"alias": "knows_id"
}
]
},
"fromId": {
"fields": [
{
"type": "String",
"field": "from_id",
"alias": "from_id"
}
]
},
"toId": {
"fields": [
{
"type": "String",
"field": "to_id",
"alias": "to_id"
}
]
},
"attributes": [
{
"type": "Double",
"field": "weight",
"alias": "knows_weight"
},
{
"type": "DateTime",
"field": "_start",
"alias": "start_time"
},
{
"type": "DateTime",
"field": "_end",
"alias": "end_time"
},
{
"type": "String",
"field": "is_current_str",
"alias": "current_flag_str"
}
],
"scd2Config": {
"effectiveTime": "start_time",
"expirationTime": "end_time",
"currentExpirationTimeValue": "null",
"currentFlag": "current_flag_str",
"currentFlagValue": "Y"
}
}
]
}
}
Querying SCD2 Tables with Cypher
To query data at a specific point in time, use the SnapShotTime option in your query (not available if configured as current flag only):
For current data, use the keyword 'current':
Format guidelines:
- If your time columns use
DATEorDATETIMEtypes, provide dates asyyyy-MM-ddoryyyy-MM-dd HH:mm:ss - If your time columns use numeric types (like timestamps), provide the numeric value directly
- Type conversion is not automatic, and type casting is not allowed here either, so match the format to your column's data type