Building a Graph

Building a graph in PuppyGraph means deciding how your tables map onto a graph: which tables become nodes, which tables become edges, how identifiers link them together, and which columns travel along as properties.

This page walks through that mapping using the TinkerPop modern graph as a running example. Each step shows both the Schema Builder UI flow and the JSON it produces.

For where the rows behind a node or edge come from in the first place (external catalogs, locally cached tables, or unions of multiple tables), see Data Sources and Local Tables.

Launching the Schema Builder

When you first open PuppyGraph, the Schema page shows a welcome screen. Create a catalog connection to point at your data source, then use the toolbar's Add Node and Add Edge buttons (or Highlight Catalog Panel on the empty canvas) to start building. The screenshots below come from the Schema Builder tutorial; the dialogs are the same regardless of catalog or table names.

From a table to a node

A node is a typed entity in your graph. Building one means picking a source table, naming the type, and deciding which role each column plays.

Start from a table

Suppose your postgres_data PostgreSQL catalog has a person table from the modern graph:

id	name	age
v1	marko	29
v2	vadas	27
v4	josh	32
v6	peter	35

Schema BuilderJSON

Click Add Node in the toolbar. The Select Table for Node dialog opens. Expand the catalog tree, pick person, and click Next.

The source is referenced through the node's dataSourceGroup:

"dataSourceGroup": {
  "externalDataSource": {
    "enabled": true,
    "catalog": "postgres_data",
    "schema": "modern",
    "table": "person"
  }
}

Name the type

Pick a label and optionally a few Cypher type aliases:

Setting	Value	Notes
Label	`Person`	Used in queries (`MATCH (p:Person)`) and as the target of edge `fromNodeLabel` and `toNodeLabel`.
Type aliases	`person`, `human`	Optional. Cypher queries can also match the node by these names.

Schema BuilderJSON

The wizard's first step has a Node Label field at the top, defaulting to the table name, with an optional Type Aliases field below.

Add Node wizard, freshly opened. Node Label and Type Aliases sit at the top, columns under Attribute Columns.

"label": "Person",
"type":  ["person", "human"]

Map columns to node roles

For a node, each column plays one of these roles:

Role	Description
Identifier	Uniquely identifies a node within the type. The combination of identifier columns must be unique. PuppyGraph addresses each node as `Type[id1, id2, ...]`. Add the same column under attributes if users need to query it as a property.
Attribute	Exposed only as a node property in queries (`p.name`, `p.age`).
Unused	Not represented in the graph.

For the person table, a typical mapping is:

Column	Role	Graph type
`id`	Identifier	`STRING`
`name`	Attribute	`STRING`
`age`	Attribute	`INT`

Schema BuilderJSON

Click Add to ID to move a column into ID Columns and pick from the dropdown. You can also drag a row by its handle between the ID Columns, Attribute Columns, and Unused Columns sections, or use the per-row trash icon to push a column out of its current section. Rename or retype any column inline through the row's three-dot menu.

Column configuration with `id` assigned to ID Columns

"id":        [{ "name": "id",   "type": "STRING" }],
"attribute": [
  { "name": "name", "type": "STRING" },
  { "name": "age",  "type": "INT" }
]

Optionally cache the data locally

For external catalog tables, the Schema Builder offers a second wizard step asking whether to mirror the data into a local table for query-time performance.

Schema BuilderJSON

Toggle Enable Local Replication to mirror the source table into a PuppyGraph-managed local table. Leave it off to read directly from the external source at query time.

Caching is configured through the localDataSource block on the same dataSourceGroup. See Data Sources and Local Tables.

From a table to an edge

An edge is a typed, directional relationship between two nodes. The mapping extends the node walkthrough with two more decisions: which node types it connects, and which columns hold the foreign keys to those nodes.

Start from a table

Suppose your postgres_data catalog also has a knows table from the modern graph:

id	from_id	to_id	weight
e7	v1	v2	0.5
e8	v1	v4	1.0

Schema BuilderJSON

Click Add Edge in the toolbar, pick knows from the catalog tree, and click Next.

Select Table for Edge dialog with `knows` highlighted

"dataSourceGroup": {
  "externalDataSource": {
    "enabled": true,
    "catalog": "postgres_data",
    "schema": "modern",
    "table": "knows"
  }
}

Connect the node types

Specify which node types this edge connects, and what to call the edge type:

Setting	Value	Notes
Label	`KNOWS`	Edge type name.
From node label	`Person`	Label of the source node type.
To node label	`Person`	Label of the target node type.
Type aliases	(optional)	Same as for nodes.

Schema BuilderJSON

The edge wizard adds From Node and To Node dropdowns alongside the Edge Label field. Picking values reveals matching FROM: and TO: sections lower down, each showing the chosen node's identifier (e.g. id (STRING)) with a Select Column dropdown.

Edge wizard after From Node and To Node are picked. FROM: and TO: sections appear with empty Select Column dropdowns.

"label":         "KNOWS",
"fromNodeLabel": "Person",
"toNodeLabel":   "Person"

Map columns to edge roles

Edges add two roles to the ones nodes use:

Role	Description
Identifier	Uniquely identifies an edge. Optional. If omitted, PuppyGraph generates one automatically. Add the same column under attributes if users need to query it as a property.
From key	Foreign key whose values match the source node's identifier. The column is also added under `attribute[]` below so it stays queryable as an edge property.
To key	Foreign key whose values match the target node's identifier. Like From key, also added under `attribute[]` so it stays queryable.
Attribute	Exposed as an edge property in queries.
Unused	Not represented in the graph.

For the knows table:

Column	Role	Graph type
`id`	Identifier	`STRING`
`from_id`	From key (also added as Attribute)	`STRING`
`to_id`	To key (also added as Attribute)	`STRING`
`weight`	Attribute	`DOUBLE`

The number, order, and types of from-key columns must match the source node's identifier columns. Same for to-key and the target.

Schema BuilderJSON

Use the FROM: and TO: sections to pick the source columns that match each side's identifier, one Select Column dropdown per identifier column. Move the edge identifier into Edge ID Columns with Add to ID (or toggle Enable ID off to let PuppyGraph generate one). Remaining columns stay under Attribute Columns as edge properties.

Edge configuration with From / To Node selected, FROM / TO Select Columns mapped, and an explicit edge ID

"id":        [{ "name": "id",      "type": "STRING" }],
"fromKey":   [{ "name": "from_id", "type": "STRING" }],
"toKey":     [{ "name": "to_id",   "type": "STRING" }],
"attribute": [
  { "name": "from_id", "type": "STRING" },
  { "name": "to_id",   "type": "STRING" },
  { "name": "weight",  "type": "DOUBLE" }
]

Foreign-key columns appear in both fromKey / toKey and attribute so they remain queryable as edge properties.

Combining multiple tables for one type

You can map multiple tables onto a single node or edge type. This is useful when:

A single conceptual node type appears as foreign keys across several tables but has no dedicated table of its own.
Data for one node type is partitioned across multiple sources you want to query as one.

In the Schema Builder, click Add Source Table (Create Union Node) from the column-configuration step to add another source table to the same type. In JSON, this is configured through unionDataSource on the dataSourceGroup, with an optional dedupKey[] to deduplicate rows that appear in more than one input. See Data Sources and Local Tables for the details.

For example, suppose a Stack Overflow-style dataset has user references in both posts_flat and badges_flat:

posts_flatbadges_flat

post_id	post_type	owner_user_id	owner_display_name
q1	question	u1	Ada
a1	answer	u2	Byron

user_id	display_name	badge_name
u2	Byron	Teacher
u3	Cora	Editor

Model both sources as one User node type:

{
  "label": "User",
  "id": [{ "name": "user_id", "type": "STRING" }],
  "attribute": [{ "name": "display_name", "type": "STRING" }],
  "dataSourceGroup": {
    "unionDataSource": {
      "enabled": true,
      "input": [
        {
          "catalog": "stackoverflow_data",
          "schema": "stackoverflow",
          "table": "posts_flat",
          "mappedField": [
            { "sourceFieldName": "owner_user_id", "targetFieldName": "user_id" },
            {
              "sourceFieldName": "owner_display_name",
              "targetFieldName": "display_name",
              "aggregation": "AGG_ANY_VALUE"
            }
          ]
        },
        {
          "catalog": "stackoverflow_data",
          "schema": "stackoverflow",
          "table": "badges_flat",
          "mappedField": [
            { "sourceFieldName": "user_id", "targetFieldName": "user_id" },
            {
              "sourceFieldName": "display_name",
              "targetFieldName": "display_name",
              "aggregation": "AGG_ANY_VALUE"
            }
          ]
        }
      ],
      "dedupKey": ["user_id"]
    }
  }
}

This schema produces three User nodes:

User["u1"] {display_name: "Ada"}.
User["u2"] {display_name: "Byron"}.
User["u3"] {display_name: "Cora"}.

The repeated u2 value is deduplicated because user_id is the dedupKey.

Modeling denormalized tables

Denormalized source tables often store several graph concepts in one physical table. A flattened parent row can contain parent properties, copied dimension fields, repeated values, or nested child records. This moves some relationship work upstream into the source table so graph queries can avoid repeatedly reconstructing the same joins at query time.

Schema JSON required

The Web UI for modeling denormalized tables is not ready yet. Author these patterns by editing the graph schema JSON and uploading or importing the updated schema.

Schema fields for denormalized tables

For denormalized tables, the main tools are:

Field	Use it when
`mappedField`	The source column name differs from the graph field name, or a graph field needs a simple derived value.
`whereClause`	Only some rows in the source table should become a node or edge type.
`unnest`	One source row contains an array or repeated field and should produce one graph row per element.
`ordinalityAlias`	A generated 1-based position from an `unnest` block is needed, usually to build a stable child ID.
`unionDataSource` with `dedupKey[]`	A node type is derived from distinct values in one or more source tables instead of a dedicated dimension table.
`enableNullFilter`	A node ID or edge endpoint key can be null and should be filtered automatically.

mappedField[] maps source fields to graph fields:

"mappedField": [
  { "sourceFieldName": "user_id", "targetFieldName": "id" },
  { "sourceFieldName": "user_country", "targetFieldName": "country" }
]

Identifier fields define graph identity. If users also need to return or filter by an identifier as a graph property, add the same field to attribute[] and map it in mappedField[].

A mappedField[] entry can use sourceExpression instead of sourceFieldName for a simple scalar derived value, such as a generated ID. Prefer sourceFieldName when a source column can be mapped directly.

For unionDataSource, dedupKey[] uses target field names. Non-key fields can specify an aggregation such as AGG_MAX, AGG_MIN, AGG_ANY_VALUE, AGG_SUM, AGG_COUNT, AGG_COUNT_DISTINCT, AGG_ARG_MAX, or AGG_ARG_MIN. AGG_ARG_MAX and AGG_ARG_MIN require argOrderBy.

Use whereClause as a filter expression. Start with comparisons against literal values and combine them with boolean operators only when the graph model needs it.

Recommended starter syntax:

Column identifiers such as entity_type, created, or ranked.
Literal values: strings in single quotes, numbers, TRUE, and FALSE.
Comparisons between a column and a literal: =, !=, <>, <, <=, >, and >=.
Boolean operators: AND, OR, and NOT.
Parentheses for grouping boolean expressions.

Examples:

post_type = 'question'
score >= 10
post_type = 'question' OR post_type = 'answer'
(post_type = 'question' AND score >= 10) OR owner_user_id = 'u1'

Do not use statement separators, comments, subqueries, joins, data-definition or data-modification statements, set operations, function calls, or arithmetic in whereClause. If a filter needs more complex logic, create a source-side view that exposes the needed filtered rows as a table.

unnest turns an array or repeated field into one graph row per element:

"unnest": [
  {
    "arrayColumn": [
      { "sourceColumn": "tags", "alias": "tag" }
    ]
  }
]

For aligned arrays, put all columns in one unnest block:

"unnest": [
  {
    "arrayColumn": [
      { "sourceColumn": "answer_ids", "alias": "answer_id" },
      { "sourceColumn": "answer_scores", "alias": "answer_score" }
    ]
  }
]

Only one unnest block is supported per data source. Put multiple aligned arrays inside that one block.

When a child record does not have a natural ID in the source data, add ordinalityAlias to expose the element position from the unnest block. The generated position starts at 1 and can be combined with a parent ID to build a stable synthetic child ID.

The patterns below use one Stack Overflow-style graph with User, Post, Question, Answer, and Tag nodes. The first pattern models posts as a general row-level Post node; later patterns split post rows into Question and Answer nodes. The source tables are simplified samples based on common public Stack Overflow tables such as posts, users, badges, and post links.

Wide fact table

Use this pattern when one flattened table contains an entity, copied dimension fields, and the keys needed to create relationships. For example, posts_flat stores post fields and copied owner fields:

post_id	post_type	title	owner_user_id	owner_display_name
q1	question	How to model graphs?	u1	Ada
a1	answer		u2	Byron

Model row-level posts with externalDataSource. Each source row becomes one graph element:

{
  "label": "Post",
  "id": [{ "name": "post_id", "type": "STRING" }],
  "attribute": [
    { "name": "post_type", "type": "STRING" },
    { "name": "title", "type": "STRING" }
  ],
  "dataSourceGroup": {
    "externalDataSource": {
      "enabled": true,
      "catalog": "stackoverflow_data",
      "schema": "stackoverflow",
      "table": "posts_flat",
      "mappedField": [
        { "sourceFieldName": "post_id", "targetFieldName": "post_id" },
        { "sourceFieldName": "post_type", "targetFieldName": "post_type" },
        { "sourceFieldName": "title", "targetFieldName": "title" }
      ]
    }
  }
}

The copied owner fields can also derive User nodes from distinct source values:

{
  "label": "User",
  "id": [{ "name": "user_id", "type": "STRING" }],
  "attribute": [{ "name": "display_name", "type": "STRING" }],
  "dataSourceGroup": {
    "unionDataSource": {
      "enabled": true,
      "input": [
        {
          "catalog": "stackoverflow_data",
          "schema": "stackoverflow",
          "table": "posts_flat",
          "mappedField": [
            { "sourceFieldName": "owner_user_id", "targetFieldName": "user_id" },
            {
              "sourceFieldName": "owner_display_name",
              "targetFieldName": "display_name",
              "aggregation": "AGG_ANY_VALUE"
            }
          ]
        }
      ],
      "dedupKey": ["user_id"]
    }
  }
}

Edges can use the same physical source table and map endpoint keys:

{
  "label": "POSTED",
  "fromNodeLabel": "User",
  "toNodeLabel": "Post",
  "fromKey": [{ "name": "user_id", "type": "STRING" }],
  "toKey": [{ "name": "post_id", "type": "STRING" }],
  "attribute": [
    { "name": "user_id", "type": "STRING" },
    { "name": "post_id", "type": "STRING" },
    { "name": "post_type", "type": "STRING" }
  ],
  "dataSourceGroup": {
    "externalDataSource": {
      "enabled": true,
      "catalog": "stackoverflow_data",
      "schema": "stackoverflow",
      "table": "posts_flat",
      "mappedField": [
        { "sourceFieldName": "owner_user_id", "targetFieldName": "user_id" },
        { "sourceFieldName": "post_id", "targetFieldName": "post_id" },
        { "sourceFieldName": "post_type", "targetFieldName": "post_type" }
      ]
    }
  }
}

With the sample rows and snippets above, the graph contains:

Nodes created:

Node	Properties
`User["u1"]`	`display_name: "Ada"`
`User["u2"]`	`display_name: "Byron"`
`Post["q1"]`	`post_type: "question"`, `title: "How to model graphs?"`
`Post["a1"]`	`post_type: "answer"`, `title: ""`

Edges created:

Edge	Properties
`User["u1"] -[:POSTED]-> Post["q1"]`	`user_id: "u1"`, `post_id: "q1"`, `post_type: "question"`
`User["u2"] -[:POSTED]-> Post["a1"]`	`user_id: "u2"`, `post_id: "a1"`, `post_type: "answer"`

Array many-to-many

Use this pattern when a repeated field represents multiple target nodes or edges. In Stack Overflow data, a question can have multiple tags:

post_id	post_type	title	tags
q1	question	How to model graphs?	`[graph, schema]`
q2	question	How to tune queries?	`[query, graph]`

This pattern works best for bounded repeated fields. If one parent row can contain thousands of related entries, consider whether the relationship should stay in a separate source table instead of being modeled from an array on the parent row.

The question row is modeled as a Question node:

{
  "label": "Question",
  "id": [{ "name": "post_id", "type": "STRING" }],
  "attribute": [{ "name": "title", "type": "STRING" }],
  "dataSourceGroup": {
    "externalDataSource": {
      "enabled": true,
      "catalog": "stackoverflow_data",
      "schema": "stackoverflow",
      "table": "posts_flat",
      "whereClause": "post_type = 'question'"
    }
  }
}

If tags are only represented by array values, derive them with unionDataSource, unnest, and dedupKey[]:

{
  "label": "Tag",
  "id": [{ "name": "tag", "type": "STRING" }],
  "dataSourceGroup": {
    "unionDataSource": {
      "enabled": true,
      "input": [
        {
          "catalog": "stackoverflow_data",
          "schema": "stackoverflow",
          "table": "posts_flat",
          "whereClause": "post_type = 'question'",
          "unnest": [
            {
              "arrayColumn": [
                { "sourceColumn": "tags", "alias": "tag" }
              ]
            }
          ],
          "mappedField": [
            { "sourceFieldName": "tag", "targetFieldName": "tag" }
          ]
        }
      ],
      "dedupKey": ["tag"]
    }
  }
}

Use the same unnest block on the edge:

{
  "label": "TAGGED_WITH",
  "fromNodeLabel": "Question",
  "toNodeLabel": "Tag",
  "fromKey": [{ "name": "post_id", "type": "STRING" }],
  "toKey": [{ "name": "tag", "type": "STRING" }],
  "attribute": [
    { "name": "post_id", "type": "STRING" },
    { "name": "tag", "type": "STRING" }
  ],
  "dataSourceGroup": {
    "externalDataSource": {
      "enabled": true,
      "catalog": "stackoverflow_data",
      "schema": "stackoverflow",
      "table": "posts_flat",
      "whereClause": "post_type = 'question'",
      "unnest": [
        {
          "arrayColumn": [
            { "sourceColumn": "tags", "alias": "tag" }
          ]
        }
      ],
      "mappedField": [
        { "sourceFieldName": "post_id", "targetFieldName": "post_id" },
        { "sourceFieldName": "tag", "targetFieldName": "tag" }
      ]
    }
  }
}

With the sample rows and snippets above, the graph contains:

Nodes created:

Node	Properties
`Question["q1"]`	`title: "How to model graphs?"`
`Question["q2"]`	`title: "How to tune queries?"`
`Tag["graph"]`	none
`Tag["schema"]`	none
`Tag["query"]`	none

Tag graph appears in both source rows but becomes one graph node because tag is the dedupKey.

Edges created:

Edge	Properties
`Question["q1"] -[:TAGGED_WITH]-> Tag["graph"]`	`post_id: "q1"`, `tag: "graph"`
`Question["q1"] -[:TAGGED_WITH]-> Tag["schema"]`	`post_id: "q1"`, `tag: "schema"`
`Question["q2"] -[:TAGGED_WITH]-> Tag["query"]`	`post_id: "q2"`, `tag: "query"`
`Question["q2"] -[:TAGGED_WITH]-> Tag["graph"]`	`post_id: "q2"`, `tag: "graph"`

Nested child records

Use this pattern when a parent row contains aligned arrays or nested repeated fields that represent child entities. For example, a question rollup table can store answer IDs, answer owners, and answer scores on the question parent row:

question_id	title	answer_ids	answer_owner_user_ids	answer_scores
q1	How to model graphs?	`[a1, a2]`	`[u2, u3]`	`[5, 2]`
q2	How to tune queries?	`[a3]`	`[u1]`	`[7]`

Model the parent row as a normal node. For child records stored as aligned arrays, unnest the arrays together so element i from each array produces one child row:

{
  "label": "Answer",
  "id": [{ "name": "answer_id", "type": "STRING" }],
  "attribute": [
    { "name": "owner_user_id", "type": "STRING" },
    { "name": "score", "type": "INT" }
  ],
  "dataSourceGroup": {
    "externalDataSource": {
      "enabled": true,
      "catalog": "stackoverflow_data",
      "schema": "stackoverflow",
      "table": "question_answer_rollup",
      "unnest": [
        {
          "arrayColumn": [
            { "sourceColumn": "answer_ids", "alias": "answer_id" },
            { "sourceColumn": "answer_owner_user_ids", "alias": "answer_owner_user_id" },
            { "sourceColumn": "answer_scores", "alias": "answer_score" }
          ]
        }
      ],
      "mappedField": [
        { "sourceFieldName": "answer_id", "targetFieldName": "answer_id" },
        { "sourceFieldName": "answer_owner_user_id", "targetFieldName": "owner_user_id" },
        { "sourceFieldName": "answer_score", "targetFieldName": "score" }
      ]
    }
  }
}

Edges that point to Answer must use the same unnest aliases:

{
  "label": "HAS_ANSWER",
  "fromNodeLabel": "Question",
  "toNodeLabel": "Answer",
  "fromKey": [{ "name": "question_id", "type": "STRING" }],
  "toKey": [{ "name": "answer_id", "type": "STRING" }],
  "attribute": [
    { "name": "question_id", "type": "STRING" },
    { "name": "answer_id", "type": "STRING" },
    { "name": "score", "type": "INT" }
  ],
  "dataSourceGroup": {
    "externalDataSource": {
      "enabled": true,
      "catalog": "stackoverflow_data",
      "schema": "stackoverflow",
      "table": "question_answer_rollup",
      "unnest": [
        {
          "arrayColumn": [
            { "sourceColumn": "answer_ids", "alias": "answer_id" },
            { "sourceColumn": "answer_scores", "alias": "answer_score" }
          ]
        }
      ],
      "mappedField": [
        { "sourceFieldName": "question_id", "targetFieldName": "question_id" },
        { "sourceFieldName": "answer_id", "targetFieldName": "answer_id" },
        { "sourceFieldName": "answer_score", "targetFieldName": "score" }
      ]
    }
  }
}

With the sample rows and schemas above, the graph contains:

Nodes created:

Node	Properties
`Answer["a1"]`	`owner_user_id: "u2"`, `score: 5`
`Answer["a2"]`	`owner_user_id: "u3"`, `score: 2`
`Answer["a3"]`	`owner_user_id: "u1"`, `score: 7`

Edges created:

Edge	Properties
`Question["q1"] -[:HAS_ANSWER]-> Answer["a1"]`	`question_id: "q1"`, `answer_id: "a1"`, `score: 5`
`Question["q1"] -[:HAS_ANSWER]-> Answer["a2"]`	`question_id: "q1"`, `answer_id: "a2"`, `score: 2`
`Question["q2"] -[:HAS_ANSWER]-> Answer["a3"]`	`question_id: "q2"`, `answer_id: "a3"`, `score: 7`

If the nested child records do not contain answer IDs, use ordinalityAlias to generate a stable child ID from the parent ID and array position:

question_id	title	answer_owner_user_ids	answer_scores
q1	How to model graphs?	`[u2, u3]`	`[5, 2]`
q2	How to tune queries?	`[u1]`	`[7]`

{
  "label": "Answer",
  "id": [{ "name": "answer_id", "type": "STRING" }],
  "attribute": [
    { "name": "owner_user_id", "type": "STRING" },
    { "name": "score", "type": "INT" }
  ],
  "dataSourceGroup": {
    "externalDataSource": {
      "enabled": true,
      "catalog": "stackoverflow_data",
      "schema": "stackoverflow",
      "table": "question_answer_rollup",
      "unnest": [
        {
          "arrayColumn": [
            { "sourceColumn": "answer_owner_user_ids", "alias": "answer_owner_user_id" },
            { "sourceColumn": "answer_scores", "alias": "answer_score" }
          ],
          "ordinalityAlias": "answer_idx"
        }
      ],
      "mappedField": [
        {
          "sourceExpression": "concat(question_id, '-', cast(answer_idx as varchar))",
          "targetFieldName": "answer_id"
        },
        { "sourceFieldName": "answer_owner_user_id", "targetFieldName": "owner_user_id" },
        { "sourceFieldName": "answer_score", "targetFieldName": "score" }
      ]
    }
  }
}

Edges that point to these generated Answer IDs must use the same unnest block and the same ID expression:

{
  "label": "HAS_ANSWER",
  "fromNodeLabel": "Question",
  "toNodeLabel": "Answer",
  "fromKey": [{ "name": "question_id", "type": "STRING" }],
  "toKey": [{ "name": "answer_id", "type": "STRING" }],
  "attribute": [
    { "name": "question_id", "type": "STRING" },
    { "name": "answer_id", "type": "STRING" },
    { "name": "score", "type": "INT" }
  ],
  "dataSourceGroup": {
    "externalDataSource": {
      "enabled": true,
      "catalog": "stackoverflow_data",
      "schema": "stackoverflow",
      "table": "question_answer_rollup",
      "unnest": [
        {
          "arrayColumn": [
            { "sourceColumn": "answer_owner_user_ids", "alias": "answer_owner_user_id" },
            { "sourceColumn": "answer_scores", "alias": "answer_score" }
          ],
          "ordinalityAlias": "answer_idx"
        }
      ],
      "mappedField": [
        { "sourceFieldName": "question_id", "targetFieldName": "question_id" },
        {
          "sourceExpression": "concat(question_id, '-', cast(answer_idx as varchar))",
          "targetFieldName": "answer_id"
        },
        { "sourceFieldName": "answer_score", "targetFieldName": "score" }
      ]
    }
  }
}

With the sample rows above, the graph contains:

Nodes created:

Node	Properties
`Answer["q1-1"]`	`owner_user_id: "u2"`, `score: 5`
`Answer["q1-2"]`	`owner_user_id: "u3"`, `score: 2`
`Answer["q2-1"]`	`owner_user_id: "u1"`, `score: 7`

Edges created:

Edge	Properties
`Question["q1"] -[:HAS_ANSWER]-> Answer["q1-1"]`	`question_id: "q1"`, `answer_id: "q1-1"`, `score: 5`
`Question["q1"] -[:HAS_ANSWER]-> Answer["q1-2"]`	`question_id: "q1"`, `answer_id: "q1-2"`, `score: 2`
`Question["q2"] -[:HAS_ANSWER]-> Answer["q2-1"]`	`question_id: "q2"`, `answer_id: "q2-1"`, `score: 7`

One table with several node labels

Use this pattern when one table stores several entity types and a discriminator column tells you which rows belong to each graph label.

In the same Stack Overflow-style graph, posts_flat can store both questions and answers:

post_id	post_type	title	parent_question_id	owner_user_id
q1	question	How to model graphs?		u1
a1	answer		q1	u2
a2	answer		q1	u3

Create one node type for each graph label you want to expose:

{
  "label": "Question",
  "dataSourceGroup": {
    "externalDataSource": {
      "enabled": true,
      "catalog": "stackoverflow_data",
      "schema": "stackoverflow",
      "table": "posts_flat",
      "whereClause": "post_type = 'question'"
    }
  },
  "id": [{ "name": "post_id", "type": "STRING" }],
  "attribute": [
    { "name": "post_id", "type": "STRING" },
    { "name": "title", "type": "STRING" },
    { "name": "owner_user_id", "type": "STRING" }
  ]
}

Create a sibling Answer node with whereClause: "post_type = 'answer'", the same post_id identifier, and answer-specific properties such as parent_question_id and owner_user_id.

Graph identity is scoped by node label and identifier. For example, Question["q1"] and Answer["q1"] are different graph nodes even if they read from the same table and use the same source identifier value. Use disjoint whereClause filters when each source row should appear as only one graph label.

Users can then query each category by graph label:

MATCH (q:Question)
RETURN q.post_id, q.title
LIMIT 25

With the sample rows and sibling node entries described above, the graph contains:

Nodes created:

Node	Properties
`Question["q1"]`	`post_id: "q1"`, `title: "How to model graphs?"`, `owner_user_id: "u1"`
`Answer["a1"]`	`parent_question_id: "q1"`, `owner_user_id: "u2"`
`Answer["a2"]`	`parent_question_id: "q1"`, `owner_user_id: "u3"`

Each node reads from the same physical table, but each label only sees rows selected by its own whereClause.

Use one node type with the discriminator as a property when all rows have the same graph behavior and users only need to filter by category in queries.

Column types

Identifier and property columns share a small JSON shape: a name, a type, and an optional auto-increment flag.

Type	Description
`STRING`	UTF-8 string.
`INT`	32-bit signed integer.
`BIGINT`	64-bit signed integer.
`DOUBLE`	64-bit IEEE 754 floating point.
`BOOLEAN`	True or false.
`DATE`	Calendar date with no time component.
`DATETIME`	Date and time.

The auto_increment flag is used on local-table columns that should be auto-generated, typically as default edge IDs.