Schema
PuppyGraph utilizes schemas to model graphs on top of data sources.
A graph schema in PuppyGraph defines how data sources are connected and structured. It specifies the types of nodes (vertices) and edges in the graph, along with their associated properties.
Data Sources
A schema contains one or more data source connections. Each data source connection is defined as a catalog.
A catalog specifies the connection parameters and authentication details required to access the data source. It also includes an internal name used for referencing in graph modeling.
Graph Modeling
PuppyGraph follows a labeled property graph model, built around two core concepts: nodes (vertices) and edges.
During the graph modeling process, PuppyGraph maps tables from data sources into these two components, defining their structure and relationships.
Modeling Nodes
Nodes (also known as vertices) are the fundamental units of which graphs are formed. Think of them as points in a space where each point can represent an object, a person, a place, or any abstract concept.
PuppyGraph supports two kinds of node (vertex) mapping: standard (one-to-one) and flexible (many-to-one).
Standard Node (One-to-one) Mapping
- A standard node (vertex) type is derived from a single table.
- Each node (vertex) type must have a unique identifier, similar to a primary key in relational tables.
- A standard node (vertex) can include properties. For instance, a node (vertex) representing a person may have properties such as name and age.
Id | Age | Name |
---|---|---|
v1 | 29 | marko |
v2 | 27 | vadas |
Two standard nodes (vertices), v1 and v2, are derived from the table above. Each node (vertex) has a unique identifier Id and includes the properties name and age.
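As a rough illustration of one-to-one mapping, the Python sketch below turns each row of the example table into one node. The Node class and the map_standard_nodes helper are hypothetical names used only for this example; they are not part of PuppyGraph, which performs the mapping itself based on the schema. Lower-casing the column names to obtain property keys is also an assumption.

```python
from dataclasses import dataclass, field

# Rows of the example table, as plain dictionaries.
ROWS = [
    {"Id": "v1", "Age": 29, "Name": "marko"},
    {"Id": "v2", "Age": 27, "Name": "vadas"},
]


@dataclass
class Node:
    """Hypothetical in-memory node; not a PuppyGraph type."""
    label: str
    id: str
    properties: dict = field(default_factory=dict)


def map_standard_nodes(rows, label, id_column, property_columns):
    """Build one node per row; id_column plays the role of a primary key."""
    nodes = {}
    for row in rows:
        node_id = row[id_column]
        if node_id in nodes:
            # A standard node type requires unique identifiers.
            raise ValueError(f"duplicate identifier: {node_id}")
        # Assumption: property keys are the lower-cased column names.
        props = {col.lower(): row[col] for col in property_columns}
        nodes[node_id] = Node(label, node_id, props)
    return nodes


people = map_standard_nodes(ROWS, "Person", "Id", ["Name", "Age"])
print(people["v1"])
```

Each row yields exactly one node, and a repeated Id is rejected, which mirrors the primary-key requirement described above.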
Flexible Node (Many-to-one) Mapping
In some cases, a node (vertex) type has no dedicated table and appears only as a foreign key in other tables. Such a node (vertex) can still be modeled as a flexible node (vertex).
- A flexible node (vertex) type is derived from one or more tables.
- The identifier of a single flexible node (vertex) may appear multiple times across the table(s), and it will be treated as a single node (vertex) in the graph.
Consider the table below, which represents the friendships between individuals.
id | from_id | to_id | weight |
---|---|---|---|
e7 | v1 | v2 | 0.5 |
e8 | v2 | v3 | 1.0 |
When a dedicated table for a node (vertex) exists, as shown in the previous section, the node (vertex) is modeled as a standard node (vertex). However, when no dedicated table is available, the node (vertex) can be derived from the from_id and to_id columns in the table.
In this case, every distinct value in these columns is treated as a node (vertex) identifier. Consequently, the flexible node (vertex) type modeled from the table above includes v1, v2, and v3.
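The derivation of flexible node identifiers can be sketched in a few lines of Python. This is only an illustration of the idea of collecting distinct values; the derive_flexible_node_ids helper is a hypothetical name, and PuppyGraph's own implementation is not shown in this document.

```python
# Rows of the example friendship table.
EDGE_ROWS = [
    {"id": "e7", "from_id": "v1", "to_id": "v2", "weight": 0.5},
    {"id": "e8", "from_id": "v2", "to_id": "v3", "weight": 1.0},
]


def derive_flexible_node_ids(rows, id_columns):
    """Collect the distinct values of the given columns, in first-seen order."""
    seen = {}
    for row in rows:
        for column in id_columns:
            # A dict preserves insertion order and ignores repeated keys,
            # so v2 is recorded once even though it appears in two rows.
            seen.setdefault(row[column], None)
    return list(seen)


node_ids = derive_flexible_node_ids(EDGE_ROWS, ["from_id", "to_id"])
print(node_ids)
```

The identifier v2 appears in both rows but contributes a single node, matching the many-to-one behavior described above.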
Modeling Edges
Edges, also known as relationships, represent the connections between nodes (vertices), symbolizing interactions or relationships between them. In a simple graph, an edge is often just a line connecting two nodes (vertices).
Like nodes (vertices), edges can also have properties, which describe the nature or attributes of the relationship. For example, an edge representing a friendship might include a property like since: 2010, indicating when the friendship started.
PuppyGraph follows a directional edge model, where the relationship is defined from one node (the source) to another (the target).
- An edge type connects two node (vertex) types.
- An edge type is mapped from a single table.
- Each edge type must have a unique identifier, similar to a primary key in relational tables.
- Each edge type must have referencing identifiers, akin to foreign keys in relational tables, which link to the identifiers of the nodes (vertices) it connects.
- An edge type can have properties. For instance, an edge representing a friendship may include properties like since and friendshipLevel.
id | from_id | to_id | weight |
---|---|---|---|
e7 | v1 | v2 | 0.5 |
e8 | v2 | v3 | 1.0 |
This table can be modeled as two edges, e7 and e8, which connect nodes (vertices) v1 to v2 and v2 to v3, respectively. Each edge has an identifier id, two referencing identifiers from_id and to_id, and a property weight.
When modeling the edges, we also specify that the edge type KNOWS connects the node (vertex) type Person to Person. As a result, PuppyGraph recognizes that the values in from_id and to_id reference the identifiers of the Person node (vertex) type.
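To make the directional edge model concrete, the following Python sketch maps the example table to two KNOWS edges and follows them from source to target. The Edge class and the map_edges and targets_of helpers are hypothetical names for this illustration and are not PuppyGraph APIs.

```python
from dataclasses import dataclass, field

# Rows of the example edge table.
EDGE_ROWS = [
    {"id": "e7", "from_id": "v1", "to_id": "v2", "weight": 0.5},
    {"id": "e8", "from_id": "v2", "to_id": "v3", "weight": 1.0},
]


@dataclass
class Edge:
    """Hypothetical in-memory edge; not a PuppyGraph type."""
    label: str
    id: str
    from_id: str  # identifier of the source node
    to_id: str    # identifier of the target node
    properties: dict = field(default_factory=dict)


def map_edges(rows, label, property_columns):
    """Build one directed edge per row of the edge table."""
    return [
        Edge(label, row["id"], row["from_id"], row["to_id"],
             {col: row[col] for col in property_columns})
        for row in rows
    ]


def targets_of(edges, node_id):
    """Follow edges in their defined direction, from source to target."""
    return [edge.to_id for edge in edges if edge.from_id == node_id]


knows = map_edges(EDGE_ROWS, "KNOWS", ["weight"])
print(targets_of(knows, "v2"))  # only e8 starts at v2
print(targets_of(knows, "v3"))  # no edge starts at v3
```

Because edges are directed, v3 is reachable from v2 but has no outgoing KNOWS edge of its own in this table.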
Identifiers
Similar to primary keys in relational databases, the identifier is a unique value that distinguishes one node (vertex) or edge from another. In PuppyGraph, the identifier is mapped from one or more columns from the source table. When multiple columns are used, the combination of their values must be unique.
PuppyGraph uses $Type[$id1, $id2, ...] to represent the identifier of a node (vertex) or edge. For instance, Person[v1] and KNOWS[e7] represent the node (vertex) v1 of type Person and the edge e7 of type KNOWS, respectively.
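The identifier notation and the uniqueness rule for multi-column identifiers can be sketched as follows. Both helpers are hypothetical and exist only for this example, as is the ORDER_LINES data with its order_id and line_no columns; the sketch assumes values are joined with a comma and a space, as in the notation above.

```python
def format_identifier(type_label, *id_values):
    """Render an identifier in the Type[id1, id2, ...] notation."""
    if not id_values:
        raise ValueError("an identifier needs at least one value")
    return f"{type_label}[{', '.join(str(v) for v in id_values)}]"


def has_unique_ids(rows, id_columns):
    """Check that the combination of id column values is unique across rows."""
    keys = [tuple(row[col] for col in id_columns) for row in rows]
    return len(keys) == len(set(keys))


print(format_identifier("Person", "v1"))
print(format_identifier("KNOWS", "e7"))

# Hypothetical composite key: neither column is unique alone, the pair is.
ORDER_LINES = [
    {"order_id": 1, "line_no": 1},
    {"order_id": 1, "line_no": 2},
    {"order_id": 2, "line_no": 1},
]
print(has_unique_ids(ORDER_LINES, ["order_id"]))
print(has_unique_ids(ORDER_LINES, ["order_id", "line_no"]))
```

The last two calls show why a multi-column identifier is judged on the combination of its values rather than on each column separately.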
Properties
Properties provide extra context and details about nodes (vertices) and edges in a graph. These properties are expressed as key-value pairs, offering a way to store additional information relevant to the graph’s structure and relationships.
Flexible nodes (many-to-one) do not directly have properties. Properties can still be defined on the edges that connect to flexible nodes (vertices), so attributes of a connection are stored as key-value pairs on the edge rather than on the nodes (vertices) themselves.
Building a Schema
Graph Schema Builder
The Graph Schema Builder provided in the PuppyGraph web UI is the recommended way to build a graph schema.
JSON Representation
A graph schema can also be serialized as JSON and uploaded to PuppyGraph. The tables below break down the components of the JSON.
Fields of a node (vertex) definition:
Field | Type | Description |
---|---|---|
label | string | A user-defined name used for referencing the node (vertex) in edges and queries. |
oneToOne | OneToOne | Defines a standard node (vertex). Only one of oneToOne or manyToOne can be present. |
manyToOne | ManyToOne | Defines a flexible node (vertex). Only one of oneToOne or manyToOne can be present. |
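The rule that only one of oneToOne or manyToOne can be present is easy to check before uploading a schema. The Python sketch below is a hypothetical pre-upload check, not a PuppyGraph function. It also rejects a definition that contains neither field, which is an assumption; the table above only rules out having both.

```python
def vertex_mapping_kind(vertex):
    """Return 'oneToOne' or 'manyToOne' for a node definition dict."""
    present = [key for key in ("oneToOne", "manyToOne") if key in vertex]
    if len(present) != 1:
        raise ValueError(
            f"node {vertex.get('label')!r} must define exactly one of "
            f"oneToOne or manyToOne, found: {present}"
        )
    return present[0]


# Mapping bodies are left empty here; only the presence of the keys matters.
standard = {"label": "Person", "oneToOne": {}}
invalid = {"label": "City", "oneToOne": {}, "manyToOne": {}}

print(vertex_mapping_kind(standard))
try:
    vertex_mapping_kind(invalid)
except ValueError as error:
    print(error)
```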
Fields of the OneToOne type:
Field | Type | Description |
---|---|---|
tableSource | TableSource | Specifies the table information, including the catalog, database, and table name. |
id | MappedId | Defines the unique identifier for the node (vertex) within the mapped table. |
attributes | []MappedField | Represents the properties of the node (vertex), mapped from the data source. |
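Putting the fields above together, a standard Person node for the earlier example table might be described as in the Python sketch below, which builds the definition as a dictionary and prints it as JSON. The nesting follows the Type column of the tables in this section. The catalog, schema, and table names and the type strings "String" and "Int" are placeholders assumed for illustration, not values confirmed by this document, and the sketch does not show how a node definition is placed inside the complete schema file.

```python
import json

# Assumed placeholder names; replace with a real catalog, database, and table.
person_table = {
    "catalog": "example_catalog",
    "schema": "example_db",
    "table": "person",
}

person_vertex = {
    "label": "Person",
    "oneToOne": {
        "tableSource": person_table,
        # MappedId: the list of fields that make up the identifier.
        "id": {
            "fields": [{"name": "Id", "type": "String", "alias": "id"}],
        },
        # MappedField entries: source column name, data type, query alias.
        "attributes": [
            {"name": "Name", "type": "String", "alias": "name"},
            {"name": "Age", "type": "Int", "alias": "age"},
        ],
    },
}

print(json.dumps(person_vertex, indent=2))
```

The alias values name and age correspond to the property names used for the Person node earlier on this page.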
Fields of the ManyToOne type:
Field | Type | Description |
---|---|---|
sources | []MappedIdSource | Defines multiple sources for composing the flexible node (vertex). |
Fields of the MappedIdSource type:
Field | Type | Description |
---|---|---|
source | TableSource | Specifies the table information, including the catalog, database, and table name. |
id | MappedId | Defines the unique identifier for the node (vertex) within the mapped table. |
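A flexible node that draws its identifiers from the from_id and to_id columns of the friendship table could be described along the following lines. This is a sketch under assumptions: the id_source helper is hypothetical, the catalog, schema, and table names and the "String" type are placeholders, and reusing the column name as the alias is a choice made for this example only.

```python
import json

# Assumed placeholder names; replace with a real catalog, database, and table.
knows_table = {
    "catalog": "example_catalog",
    "schema": "example_db",
    "table": "knows",
}


def id_source(table_source, column, type_name="String"):
    """Build one MappedIdSource dict that reads identifiers from one column."""
    return {
        "source": table_source,
        "id": {"fields": [{"name": column, "type": type_name, "alias": column}]},
    }


person_flexible = {
    "label": "Person",
    "manyToOne": {
        # One entry per column that contributes node identifiers.
        "sources": [
            id_source(knows_table, "from_id"),
            id_source(knows_table, "to_id"),
        ],
    },
}

print(json.dumps(person_flexible, indent=2))
```

No attributes appear in this definition, consistent with the note that flexible nodes do not directly have properties.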
Fields of an edge definition:
Field | Type | Description |
---|---|---|
label | string | A user-defined name used for referencing the edge in queries. |
fromVertex | string | The label of the source (starting) node (vertex). |
toVertex | string | The label of the destination (ending) node (vertex). |
tableSource | TableSource | Specifies the table information, including the catalog, database, and table name. |
id | MappedId | Defines the unique identifier for the edge within the mapped table. |
fromId | MappedId | References the unique identifier of the source node (vertex). |
toId | MappedId | References the unique identifier of the destination node (vertex). |
attributes | []MappedField | Represents the properties associated with the edge, mapped from the data source. |
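The KNOWS edge from the modeling example could be expressed with these fields roughly as shown below. The mapped_field, mapped_id, and unknown_vertex_labels helpers are hypothetical conveniences for this sketch. The catalog, schema, and table names and the type strings "String" and "Double" are assumed placeholders, and the label check at the end is an illustrative sanity check, not documented PuppyGraph behavior.

```python
import json


def mapped_field(name, type_name, alias=None):
    """Build a MappedField dict; the alias defaults to the column name."""
    return {"name": name, "type": type_name, "alias": alias or name}


def mapped_id(*fields):
    """Build a MappedId dict from one or more MappedField dicts."""
    return {"fields": list(fields)}


knows_edge = {
    "label": "KNOWS",
    "fromVertex": "Person",
    "toVertex": "Person",
    # Assumed placeholder names for the table holding the edge rows.
    "tableSource": {
        "catalog": "example_catalog",
        "schema": "example_db",
        "table": "knows",
    },
    "id": mapped_id(mapped_field("id", "String")),
    "fromId": mapped_id(mapped_field("from_id", "String")),
    "toId": mapped_id(mapped_field("to_id", "String")),
    "attributes": [mapped_field("weight", "Double")],
}


def unknown_vertex_labels(edge, vertex_labels):
    """Return the endpoint labels that match no declared node label."""
    return sorted({edge["fromVertex"], edge["toVertex"]} - set(vertex_labels))


print(json.dumps(knows_edge, indent=2))
print(unknown_vertex_labels(knows_edge, ["Person"]))
```

Checking fromVertex and toVertex against the declared node labels catches a mistyped label before the schema is uploaded.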
Fields of the MappedId type:
Field | Type | Description |
---|---|---|
fields | []MappedField | Specifies the one or more fields that make up the identifier. |
Fields of the MappedField type:
Field | Type | Description |
---|---|---|
name | string | The original name of the field in the data source. |
type | string | The data type of the field. |
alias | string | The name used for referencing the field in queries. |
Fields of the TableSource type:
Field | Type | Description |
---|---|---|
catalog | string | The name of the PuppyGraph catalog that contains the specified schema / database. |
schema | string | The name of the schema / database within the catalog. |
table | string | The name of the table within the specified schema / database. |