Schema

PuppyGraph uses schemas to model graphs on top of data sources.

A graph schema in PuppyGraph defines how data sources are connected and structured. It specifies the types of nodes (vertices) and edges in the graph, along with their associated properties.

Data Sources

A schema contains one or more data source connections. Each data source connection is defined as a catalog.

A catalog specifies the connection parameters and authentication details required to access the data source. It also includes an internal name used for referencing in graph modeling.

Graph Modeling

PuppyGraph follows a labeled property graph model, built around two core concepts: nodes (vertices) and edges.

During the graph modeling process, PuppyGraph maps tables from data sources into these two components, defining their structure and relationships.

Modeling Nodes

Nodes (also known as vertices) are the fundamental units of which graphs are formed. Think of them as points in a space where each point can represent an object, a person, a place, or any abstract concept.

PuppyGraph supports two kinds of node (vertex) mapping: standard (one-to-one) and flexible (many-to-one).

Standard Node (One-to-one) Mapping

  • A standard node (vertex) type is derived from a single table.
  • Each node (vertex) type must have a unique identifier, similar to a primary key in relational tables.
  • A standard node (vertex) can include properties. For instance, a node (vertex) representing a person may have properties such as name and age.

Id  Age  Name
v1  29   marko
v2  27   vadas

Two standard nodes (vertices), v1 and v2, are derived from the table above. Each node (vertex) has a unique identifier Id and includes the properties name and age.
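As a rough illustration of the one-to-one mapping, the sketch below turns each row of the table into exactly one node. It is a conceptual model in Python, not PuppyGraph's implementation; the function name rows_to_standard_nodes and the dictionary layout are assumptions made for this example.

```python
# Conceptual sketch only: illustrates one-to-one mapping, not PuppyGraph internals.
rows = [
    {"Id": "v1", "Age": 29, "Name": "marko"},
    {"Id": "v2", "Age": 27, "Name": "vadas"},
]


def rows_to_standard_nodes(rows, id_column):
    """Map each row to exactly one node, keyed by its identifier."""
    nodes = {}
    for row in rows:
        node_id = row[id_column]
        if node_id in nodes:
            # A standard node type needs a unique identifier per row.
            raise ValueError(f"duplicate identifier: {node_id}")
        # Every non-identifier column becomes a property of the node.
        nodes[node_id] = {k: v for k, v in row.items() if k != id_column}
    return nodes


person_nodes = rows_to_standard_nodes(rows, "Id")
print(person_nodes)
```

A duplicated Id value raises an error here because, as with a primary key, two rows cannot describe the same standard node.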

Flexible Node (Many-to-one) Mapping

In some cases, a node (vertex) type does not have a dedicated table and appears only as a foreign key in other tables. Such a node (vertex) can still be modeled as a flexible node (vertex).

  • A flexible node (vertex) type is derived from one or more tables.
  • The identifier of a single flexible node (vertex) may appear multiple times across the table(s), and it will be treated as a single node (vertex) in the graph.

Consider the table below, which represents friendships between individuals.

id  from_id  to_id  weight
e7  v1       v2     0.5
e8  v2       v3     1.0

When a dedicated table for a node (vertex) exists, as shown in the previous section, the node (vertex) is modeled as a standard node (vertex). However, when no dedicated table is available, the node (vertex) can be derived from the from_id and to_id columns in the table.

In this case, any distinct values from these columns are considered the identifiers for the nodes (vertices). Consequently, the flexible node (vertex) type will include v1, v2, and v3, as modeled from the table above.
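The many-to-one derivation can be sketched as collecting the distinct values of the referencing columns. This is a conceptual Python illustration, not PuppyGraph's implementation; the helper name flexible_node_ids is an assumption.

```python
# Conceptual sketch only: derives flexible node identifiers from foreign-key columns.
friendship_rows = [
    {"id": "e7", "from_id": "v1", "to_id": "v2", "weight": 0.5},
    {"id": "e8", "from_id": "v2", "to_id": "v3", "weight": 1.0},
]


def flexible_node_ids(rows, id_columns):
    """Return the distinct identifier values found in the given columns.

    A value that appears several times (such as v2) yields a single node.
    First-seen order is kept so the result is deterministic.
    """
    seen = []
    for row in rows:
        for column in id_columns:
            value = row[column]
            if value not in seen:
                seen.append(value)
    return seen


flexible_ids = flexible_node_ids(friendship_rows, ["from_id", "to_id"])
print(flexible_ids)
```

Although v2 occurs in both rows, it shows up once in the result, matching the rule that repeated identifiers are treated as a single node (vertex).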

Modeling Edges

Edges, also known as relationships, represent the connections between nodes (vertices), symbolizing interactions or relationships between them. In a simple graph, an edge is often just a line connecting two nodes (vertices).

Like nodes (vertices), edges can also have properties, which describe the nature or attributes of the relationship. For example, an edge representing a friendship might include a property like since: 2010, indicating when the friendship started.

PuppyGraph follows a directional edge model, where the relationship is defined from one node (the source) to another (the target).

  • An edge type connects two node (vertex) types.
  • An edge type is mapped from a single table.
  • Each edge type must have a unique identifier, similar to a primary key in relational tables.
  • Each edge type must have referencing identifiers, akin to foreign keys in relational tables, which link to the identifiers of the nodes (vertices) it connects.
  • An edge type can have properties. For instance, an edge representing a friendship may include properties like since and friendshipLevel.

id  from_id  to_id  weight
e7  v1       v2     0.5
e8  v2       v3     1.0

This table can be modeled as two edges, e7 and e8, which connect nodes (vertices) v1 to v2 and v2 to v3, respectively. Each edge has an identifier id, two referencing identifiers from_id and to_id, and a property weight.

When modeling the edges, we also specify that the edge type KNOWS connects the node (vertex) types Person to Person. As a result, PuppyGraph recognizes that the values in from_id and to_id are referencing the identifiers of the Person node (vertex) type.
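The edge mapping described above can be sketched in Python as follows. This is a conceptual illustration only, not PuppyGraph's implementation; the Edge dataclass and the rows_to_edges helper are assumptions. The point it shows is that declaring the edge type as Person to Person is what lets the raw from_id and to_id values be read as Person identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    label: str
    edge_id: str
    source: tuple  # (node type label, node identifier)
    target: tuple  # (node type label, node identifier)
    properties: dict = field(default_factory=dict)


knows_rows = [
    {"id": "e7", "from_id": "v1", "to_id": "v2", "weight": 0.5},
    {"id": "e8", "from_id": "v2", "to_id": "v3", "weight": 1.0},
]


def rows_to_edges(rows, label, from_label, to_label):
    """Map each row to one directed edge from the source node to the target node."""
    reserved = {"id", "from_id", "to_id"}
    edges = []
    for row in rows:
        edges.append(
            Edge(
                label=label,
                edge_id=row["id"],
                # The declared node types give the referencing values their meaning.
                source=(from_label, row["from_id"]),
                target=(to_label, row["to_id"]),
                # Remaining columns become edge properties.
                properties={k: v for k, v in row.items() if k not in reserved},
            )
        )
    return edges


knows_edges = rows_to_edges(knows_rows, "KNOWS", "Person", "Person")
for edge in knows_edges:
    print(edge.source, f"-[{edge.label} {edge.edge_id}]->", edge.target, edge.properties)
```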

Identifiers

Similar to a primary key in a relational database, an identifier is a unique value that distinguishes one node (vertex) or edge from another. In PuppyGraph, the identifier is mapped from one or more columns of the source table. When multiple columns are used, the combination of their values must be unique.

PuppyGraph uses $Type[$id1, $id2, ...] to represent the identifier of a node (vertex) or edge. For instance, Person[v1] or KNOWS[e7] represents the node (vertex) v1 of type Person and the edge e7 of type KNOWS, respectively.
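To make the notation and the composite-key rule concrete, here is a small Python sketch. The helpers format_identifier and composite_ids_are_unique are hypothetical names for this example and are not part of any PuppyGraph API; the Enrollment table is invented purely to show a two-column identifier.

```python
def format_identifier(type_label, *id_values):
    """Render an identifier in the $Type[$id1, $id2, ...] notation."""
    if not id_values:
        raise ValueError("an identifier needs at least one value")
    return f"{type_label}[{', '.join(str(v) for v in id_values)}]"


def composite_ids_are_unique(rows, id_columns):
    """Check that the combination of identifier column values is unique."""
    keys = [tuple(row[c] for c in id_columns) for row in rows]
    return len(keys) == len(set(keys))


person_ref = format_identifier("Person", "v1")
knows_ref = format_identifier("KNOWS", "e7")
print(person_ref, knows_ref)

# Hypothetical table whose identifier spans two columns (not from the source).
enrollment_rows = [
    {"student": "s1", "course": "c1"},
    {"student": "s1", "course": "c2"},
]
unique_together = composite_ids_are_unique(enrollment_rows, ["student", "course"])
unique_alone = composite_ids_are_unique(enrollment_rows, ["student"])
enrollment_ref = format_identifier("Enrollment", "s1", "c2")
print(unique_together, unique_alone, enrollment_ref)
```

In the hypothetical Enrollment rows, student alone repeats, but the pair (student, course) is unique, which is the condition a multi-column identifier has to satisfy.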

Properties

Properties provide extra context and details about nodes (vertices) and edges in a graph. These properties are expressed as key-value pairs, offering a way to store additional information relevant to the graph’s structure and relationships.

Flexible nodes (many-to-one) do not carry properties of their own. Properties can still be defined on the edges that connect to flexible nodes (vertices), so attributes that describe a connection are stored as key-value pairs on the edge rather than on the node (vertex).

Building a Schema

Graph Schema Builder

The Graph Schema Builder provided in the PuppyGraph web UI is the recommended way to build a graph schema.

JSON Representation

A graph schema can also be serialized as JSON and uploaded to PuppyGraph. The tables below describe the components of the JSON.

Node (vertex)

Field      Type       Description
label      string     A user-defined name used for referencing the node (vertex) in edges and queries.
oneToOne   OneToOne   Defines a standard node (vertex). Only one of oneToOne or manyToOne can be present.
manyToOne  ManyToOne  Defines a flexible node (vertex). Only one of oneToOne or manyToOne can be present.

OneToOne

Field        Type           Description
tableSource  TableSource    Specifies the table information, including the catalog, database, and table name.
id           MappedId       Defines the unique identifier for the node (vertex) within the mapped table.
attributes   []MappedField  Represents the properties of the node (vertex), mapped from the data source.

ManyToOne

Field    Type              Description
sources  []MappedIdSource  Defines multiple sources for composing the flexible node (vertex).

MappedIdSource

Field   Type         Description
source  TableSource  Specifies the table information, including the catalog, database, and table name.
id      MappedId     Defines the unique identifier for the node (vertex) within the mapped table.

Edge

Field        Type           Description
label        string         A user-defined name used for referencing the edge in queries.
fromVertex   string         The label of the source (starting) node (vertex).
toVertex     string         The label of the destination (ending) node (vertex).
tableSource  TableSource    Specifies the table information, including the catalog, database, and table name.
id           MappedId       Defines the unique identifier for the edge within the mapped table.
fromId       MappedId       References the unique identifier of the source node (vertex).
toId         MappedId       References the unique identifier of the destination node (vertex).
attributes   []MappedField  Represents the properties associated with the edge, mapped from the data source.

MappedId

Field   Type           Description
fields  []MappedField  Specifies the one or more fields that make up the identifier.

MappedField

Field  Type    Description
name   string  The original name of the field in the data source.
type   string  The data type of the field.
alias  string  The name used for referencing the field in queries.

TableSource

Field    Type    Description
catalog  string  The name of the PuppyGraph catalog that contains the specified schema / database.
schema   string  The name of the schema / database within the catalog.
table    string  The name of the table within the specified schema / database.
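As a hedged sketch of how these components fit together, the Python snippet below assembles a standard node, a flexible node, and an edge using only the field names listed above, then prints them as JSON. The catalog, schema, and table names (my_catalog, my_db, person, knows) and the type strings ("String", "Int", "Double") are placeholders assumed for illustration; check them against your own data source and the PuppyGraph reference. How these objects are nested inside the full schema document is not covered here, so each is printed on its own.

```python
import json


def table_source(catalog, schema, table):
    """Build a TableSource object."""
    return {"catalog": catalog, "schema": schema, "table": table}


def mapped_field(name, type_, alias):
    """Build a MappedField object (type strings here are assumed placeholders)."""
    return {"name": name, "type": type_, "alias": alias}


def mapped_id(*fields):
    """Build a MappedId from one or more MappedField objects."""
    return {"fields": list(fields)}


# Standard node (oneToOne), mapped from a dedicated table.
person_standard = {
    "label": "Person",
    "oneToOne": {
        "tableSource": table_source("my_catalog", "my_db", "person"),
        "id": mapped_id(mapped_field("Id", "String", "id")),
        "attributes": [
            mapped_field("Age", "Int", "age"),
            mapped_field("Name", "String", "name"),
        ],
    },
}

# Flexible node (manyToOne), composed from the two referencing columns.
# This is an alternative to person_standard, not an addition to it.
knows_table = table_source("my_catalog", "my_db", "knows")
person_flexible = {
    "label": "Person",
    "manyToOne": {
        "sources": [
            {"source": knows_table, "id": mapped_id(mapped_field("from_id", "String", "id"))},
            {"source": knows_table, "id": mapped_id(mapped_field("to_id", "String", "id"))},
        ]
    },
}

# Edge connecting Person to Person.
knows_edge = {
    "label": "KNOWS",
    "fromVertex": "Person",
    "toVertex": "Person",
    "tableSource": knows_table,
    "id": mapped_id(mapped_field("id", "String", "id")),
    "fromId": mapped_id(mapped_field("from_id", "String", "from_id")),
    "toId": mapped_id(mapped_field("to_id", "String", "to_id")),
    "attributes": [mapped_field("weight", "Double", "weight")],
}

for component in (person_standard, person_flexible, knows_edge):
    print(json.dumps(component, indent=2))
```

Building the objects from small helper functions keeps every TableSource, MappedId, and MappedField in the same shape, which makes a hand-written schema easier to review before uploading it.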