Skip to content

Working with SCD Type 2 Tables

If your data source is an SCD (Slowly Changing Dimensions) Type 2 table, you can add special settings in PuppyGraph's schema to make it easier to query data at specific points in time using Cypher.

Setting Up SCD2 Information in Schema

SCD2 Configuration

Before adding SCD2 settings, make sure the relevant columns are already mapped as attributes in the schema.

In your PuppyGraph vertex or edge definition, include an scd2Config section like this:

"scd2Config": {
    "effectiveTime": "start_time",
    "expirationTime": "end_time",
    "currentExpirationTimeValue": "9999-12-31",
    "currentFlag": "is_current",
    "currentFlagValue": "Y"
}

Here's what each setting means:

  • effectiveTime: The attribute alias for the column that stores when the record becomes active
  • expirationTime: The attribute alias for the column that stores when the record expires
  • currentExpirationTimeValue: The value that indicates a record is currently active in the expiration time column (omit this field if the value is NULL)
  • currentFlag: The attribute alias for the column that marks whether a record is current
  • currentFlagValue: The value that indicates a record is currently active

Note: If your expiration time column uses the string 'null' to represent current records, set "currentExpirationTimeValue": "null"

You can use these configuration combinations:

  1. Time range only: Set effectiveTime and expirationTime (plus currentExpirationTimeValue if needed). Both time fields must be the same data type.
  2. Current flag only: Set currentFlag (plus currentFlagValue if needed).
  3. Both time range and current flag: Use all relevant settings.

Complete Schema Example

Here's a full example showing how SCD2 configuration fits into vertex and edge definitions:

Suppose the data source table definitions are:

CREATE TABLE modern.person (
  id          String,
  name        String,
  age         Int,
  _start      TimeStamp,
  _end        TimeStamp,
  is_current  Boolean
);

CREATE TABLE modern.knows (
  id              String,
  from_id         String,
  to_id           String,
  weight          Double,
  _start          TimeStamp,
  _end            TimeStamp,
  is_current_str  String
)

We can create PuppyGraph schema with SCD2 config:

{
  "catalogs": [
    {
      "name": "my_catalog",
      ...
    }
  ],
  "graph": {
    "vertices": [
      {
        "label": "person",
        "oneToOne": {
          "tableSource": {
            "catalog": "my_catalog",
            "schema": "modern",
            "table": "person"
          },
          "id": {
            "fields": [
              {
                "type": "String",
                "field": "id",
                "alias": "person_id"
              }
            ]
          },
          "attributes": [
            {
              "type": "String",
              "field": "name",
              "alias": "person_name"
            },
            {
              "type": "Int",
              "field": "age",
              "alias": "person_age"
            },
            {
              "type": "DateTime",
              "field": "_start",
              "alias": "start_time"
            },
            {
              "type": "DateTime",
              "field": "_end",
              "alias": "end_time"
            },
            {
              "type": "Boolean",
              "field": "is_current",
              "alias": "current_flag"
            }
          ]
        },
        "scd2Config": {
          "effectiveTime": "start_time",
          "expirationTime": "end_time",
          "currentExpirationTimeValue": "null",
          "currentFlag": "current_flag"
        }
      }
    ],
    "edges": [
      {
        "label": "knows",
        "fromVertex": "person",
        "toVertex": "person",
        "tableSource": {
          "catalog": "my_catalog",
          "schema": "modern",
          "table": "knows"
        },
        "id": {
          "fields": [
            {
              "type": "String",
              "field": "id",
              "alias": "knows_id"
            }
          ]
        },
        "fromId": {
          "fields": [
            {
              "type": "String",
              "field": "from_id",
              "alias": "from_id"
            }
          ]
        },
        "toId": {
          "fields": [
            {
              "type": "String",
              "field": "to_id",
              "alias": "to_id"
            }
          ]
        },
        "attributes": [
          {
            "type": "Double",
            "field": "weight",
            "alias": "knows_weight"
          },
          {
            "type": "DateTime",
            "field": "_start",
            "alias": "start_time"
          },
          {
            "type": "DateTime",
            "field": "_end",
            "alias": "end_time"
          },
          {
            "type": "String",
            "field": "is_current_str",
            "alias": "current_flag_str"
          }
        ],
        "scd2Config": {
          "effectiveTime": "start_time",
          "expirationTime": "end_time",
          "currentExpirationTimeValue": "null",
          "currentFlag": "current_flag_str",
          "currentFlagValue": "Y"
        }
      }
    ]
  }
}

Querying SCD2 Tables with Cypher

To query data at a specific point in time, use the SnapShotTime option in your query (not available if configured as current flag only):

USING {SnapShotTime: '2025-01-01'} MATCH (n) RETURN n 

For current data, use the keyword 'current':

USING {SnapShotTime: 'current'} MATCH (n) RETURN n

Format guidelines:

  • If your time columns use DATE or DATETIME types, provide dates as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss
  • If your time columns use numeric types (like timestamps), provide the numeric value directly
  • Type conversion is not automatic, and type casting is not allowed here either, so match the format to your column's data type