Schema
PuppyGraph utilizes schemas to model graphs on top of data sources.
A graph schema in PuppyGraph defines how data sources are connected and structured. It specifies the types of vertices and edges in the graph, along with their associated properties.
Data Sources
A schema contains one or more data source connections. Each data source connection is defined as a catalog.
A catalog specifies the connection parameters and authentication details required to access the data source. It also includes an internal name that is used to reference the data source during graph modeling.
Graph Modeling
PuppyGraph follows a labeled property graph model, built around two core concepts: vertices and edges.
During the graph modeling process, PuppyGraph maps tables from data sources into these two components, defining their structure and relationships.
Modeling Vertices
Vertices (also known as nodes) are the fundamental units of which graphs are formed. Think of them as points in a space where each point can represent an object, a person, a place, or any abstract concept.
PuppyGraph supports two ways of mapping vertices: standard (one-to-one) and flexible (many-to-one).
Standard Vertex (One-to-one) Mapping
- A standard vertex type is derived from a single table.
- Each vertex type must have a unique identifier, similar to a primary key in relational tables.
- A standard vertex can include properties. For instance, a vertex representing a person may have properties such as name and age.
Id | Age | Name |
---|---|---|
v1 | 29 | marko |
v2 | 27 | vadas |
Two standard vertices, v1 and v2, are derived from the table above. Each vertex has a unique identifier Id and includes the properties name and age.
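To make the one-to-one rule concrete, here is a minimal Python sketch that turns each row of the table above into one vertex keyed by its Id. It illustrates the mapping only and is not PuppyGraph's implementation. The function name map_standard_vertices is hypothetical, and for brevity it treats every non-identifier column as a property, whereas a real schema lets you choose which columns become properties.

```python
from typing import Any


def map_standard_vertices(
    id_column: str, rows: list[dict[str, Any]]
) -> dict[Any, dict[str, Any]]:
    """Turn each row of a vertex table into one vertex (one-to-one mapping).

    Illustrative sketch only (hypothetical helper, not PuppyGraph code).
    """
    vertices: dict[Any, dict[str, Any]] = {}
    for row in rows:
        vertex_id = row[id_column]
        # The identifier acts like a primary key, so it must be unique.
        if vertex_id in vertices:
            raise ValueError(f"duplicate vertex identifier: {vertex_id!r}")
        # Simplification: every other column becomes a property of the vertex.
        vertices[vertex_id] = {k: v for k, v in row.items() if k != id_column}
    return vertices


person_rows = [
    {"Id": "v1", "Age": 29, "Name": "marko"},
    {"Id": "v2", "Age": 27, "Name": "vadas"},
]
people = map_standard_vertices("Id", person_rows)
print(people)
# {'v1': {'Age': 29, 'Name': 'marko'}, 'v2': {'Age': 27, 'Name': 'vadas'}}
```

A repeated Id raises a ValueError, mirroring the requirement that each vertex identifier be unique.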
Flexible Vertex (Many-to-one) Mapping
In some cases, a vertex type does not have a dedicated table. Instead, the vertex appears only as a foreign key in other tables. Such a vertex can still be modeled as a flexible vertex.
- A flexible vertex type is derived from one or more tables.
- The identifier of a single flexible vertex may appear multiple times across the table(s), and it will be treated as a single vertex in the graph.
Consider the table below, which represents friendships between individuals:
id | from_id | to_id | weight |
---|---|---|---|
e7 | v1 | v2 | 0.5 |
e8 | v2 | v3 | 1.0 |
When a dedicated table for a vertex exists, as shown in the previous section, the vertex is modeled as a standard vertex. However, when no dedicated table is available, the vertex can be derived from the from_id and to_id columns in the table.
In this case, all distinct values from these columns are treated as vertex identifiers. Consequently, the flexible vertex type modeled from the table above includes v1, v2, and v3.
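The following minimal Python sketch, which is not PuppyGraph code, shows how the values in from_id and to_id collapse into one set of flexible vertex identifiers. The function name derive_flexible_vertex_ids is hypothetical.

```python
from typing import Any


def derive_flexible_vertex_ids(
    rows: list[dict[str, Any]], columns: tuple[str, ...] = ("from_id", "to_id")
) -> set[Any]:
    """Collect the distinct values of the given columns (many-to-one mapping).

    A value that appears several times still yields a single vertex,
    because the result is a set.
    """
    ids: set[Any] = set()
    for row in rows:
        for column in columns:
            ids.add(row[column])
    return ids


friendship_rows = [
    {"id": "e7", "from_id": "v1", "to_id": "v2", "weight": 0.5},
    {"id": "e8", "from_id": "v2", "to_id": "v3", "weight": 1.0},
]
print(sorted(derive_flexible_vertex_ids(friendship_rows)))  # ['v1', 'v2', 'v3']
```

Although v2 appears in both rows, it is counted once, which is exactly the "treated as a single vertex" rule.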
Modeling Edges
Edges, also known as relationships, represent the connections between vertices, symbolizing interactions or relationships between them. In its simplest form, an edge is just a line connecting two vertices.
Like vertices, edges can also have properties, which describe the nature or attributes of the relationship. For example, an edge representing a friendship might include a property like since: 2010, indicating when the friendship started.
PuppyGraph follows a directional edge model, where the relationship is defined from one vertex (the source) to another (the target).
- An edge type connects two vertex types.
- An edge type is mapped from a single table.
- Each edge type must have a unique identifier, similar to a primary key in relational tables.
- Each edge type must have referencing identifiers, akin to foreign keys in relational tables, which link to the identifiers of the vertices it connects.
- An edge type can have properties. For instance, an edge representing a friendship may include properties like since and friendshipLevel.
id | from_id | to_id | weight |
---|---|---|---|
e7 | v1 | v2 | 0.5 |
e8 | v2 | v3 | 1.0 |
This table can be modeled into two edges, e7 and e8, which connect vertex v1 to v2 and vertex v2 to v3, respectively. Each edge has an identifier id, two referencing identifiers from_id and to_id, and a property weight.
When modeling the edges, we also specify that the edge type KNOWS connects the vertex type Person to Person. As a result, PuppyGraph recognizes that the values in from_id and to_id reference the identifiers of the Person vertex type.
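The edge mapping described above can be sketched in Python as follows. This is an illustrative model under assumptions, not PuppyGraph's implementation: the Edge class and map_edges function are hypothetical, and properties are taken to be every column other than the identifier and the two referencing identifiers.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Edge:
    label: str        # edge type, e.g. KNOWS
    edge_id: Any      # unique identifier (like a primary key)
    from_vertex: str  # label of the source vertex type
    from_id: Any      # references a source vertex identifier
    to_vertex: str    # label of the target vertex type
    to_id: Any        # references a target vertex identifier
    properties: dict[str, Any] = field(default_factory=dict)


def map_edges(
    label: str,
    from_vertex: str,
    to_vertex: str,
    rows: list[dict[str, Any]],
    id_col: str = "id",
    from_col: str = "from_id",
    to_col: str = "to_id",
) -> list[Edge]:
    """Map each row of an edge table to one directed edge (source -> target)."""
    key_columns = {id_col, from_col, to_col}
    seen: set[Any] = set()
    edges: list[Edge] = []
    for row in rows:
        edge_id = row[id_col]
        if edge_id in seen:
            raise ValueError(f"duplicate edge identifier: {edge_id!r}")
        seen.add(edge_id)
        # Columns other than the identifier and the two references become properties.
        props = {k: v for k, v in row.items() if k not in key_columns}
        edges.append(
            Edge(label, edge_id, from_vertex, row[from_col], to_vertex, row[to_col], props)
        )
    return edges


friendship_rows = [
    {"id": "e7", "from_id": "v1", "to_id": "v2", "weight": 0.5},
    {"id": "e8", "from_id": "v2", "to_id": "v3", "weight": 1.0},
]
edges = map_edges("KNOWS", "Person", "Person", friendship_rows)
for edge in edges:
    print(
        f"{edge.from_vertex}:{edge.from_id} -[{edge.label}:{edge.edge_id}]-> "
        f"{edge.to_vertex}:{edge.to_id} {edge.properties}"
    )
```

Running it prints one line per edge, always from source to target: Person:v1 -[KNOWS:e7]-> Person:v2 {'weight': 0.5} and Person:v2 -[KNOWS:e8]-> Person:v3 {'weight': 1.0}.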
Identifiers
Similar to primary keys in relational databases, the identifier is a unique value that distinguishes one vertex or edge from another. In PuppyGraph, the identifier is mapped from one or more columns from the source table. When multiple columns are used, the combination of their values must be unique.
PuppyGraph uses $Type[$id1, $id2, ...] to represent the identifier of a vertex or edge. For instance, Person[v1] and KNOWS[e7] represent the vertex v1 of type Person and the edge e7 of type KNOWS, respectively.
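A short Python sketch of the identifier notation and of the rule that a multi-column identifier must be unique as a combination. The helper names are hypothetical, the Enrollment type with its student_id and course_id columns is an invented example, and joining composite values with ", " simply follows the $Type[$id1, $id2, ...] notation rather than any documented PuppyGraph output.

```python
from collections.abc import Iterable, Sequence
from typing import Any


def format_identifier(type_label: str, *id_values: Any) -> str:
    """Render an identifier in the $Type[$id1, $id2, ...] notation."""
    if not id_values:
        raise ValueError("an identifier needs at least one value")
    return f"{type_label}[{', '.join(str(v) for v in id_values)}]"


def check_unique(rows: Iterable[dict[str, Any]], id_columns: Sequence[str]) -> None:
    """Raise if two rows share the same combination of identifier values."""
    seen: set[tuple[Any, ...]] = set()
    for row in rows:
        key = tuple(row[column] for column in id_columns)
        if key in seen:
            raise ValueError(f"duplicate identifier: {key!r}")
        seen.add(key)


print(format_identifier("Person", "v1"))  # Person[v1]
print(format_identifier("KNOWS", "e7"))   # KNOWS[e7]

# Hypothetical composite key: neither column is unique on its own,
# but every (student_id, course_id) pair is.
enrollments = [
    {"student_id": "s1", "course_id": "c1"},
    {"student_id": "s1", "course_id": "c2"},
    {"student_id": "s2", "course_id": "c1"},
]
check_unique(enrollments, ["student_id", "course_id"])  # passes
print(format_identifier("Enrollment", "s1", "c2"))      # Enrollment[s1, c2]
```

Calling check_unique(enrollments, ["student_id"]) would raise, because s1 occurs twice; only the combined columns form a valid identifier here.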
Properties
Properties provide extra context and details about vertices and edges in a graph. These properties are expressed as key-value pairs, offering a way to store additional information relevant to the graph’s structure and relationships.
Flexible (many-to-one) vertices do not have properties of their own. Properties can still be defined on the edges that connect to them: while the flexible vertices carry only their identifiers, the relationships between them can hold key-value pairs that describe each connection. This keeps relationship data in the model without requiring a dedicated vertex table.
Building a Schema
Graph Schema Builder
The Graph Schema Builder provided in the PuppyGraph web UI is the recommended way to build a graph schema.
JSON Representation
A graph schema can also be serialized as JSON and uploaded to PuppyGraph. The tables below describe the components of the JSON representation.
Vertex
Field | Type | Description |
---|---|---|
label | string | A user-defined name used for referencing the vertex in edges and queries. |
oneToOne | OneToOne | Defines a standard vertex. Only one of oneToOne or manyToOne can be present. |
manyToOne | ManyToOne | Defines a flexible vertex. Only one of oneToOne or manyToOne can be present. |
OneToOne
Field | Type | Description |
---|---|---|
tableSource | TableSource | Specifies the table information, including the catalog, database, and table name. |
id | MappedId | Defines the unique identifier for the vertex within the mapped table. |
attributes | []MappedField | Represents the properties of the vertex, mapped from the data source. |
ManyToOne
Field | Type | Description |
---|---|---|
sources | []MappedIdSource | Defines multiple sources for composing the flexible vertex. |
MappedIdSource
Field | Type | Description |
---|---|---|
source | TableSource | Specifies the table information, including the catalog, database, and table name. |
id | MappedId | Defines the unique identifier for the vertex within the mapped table. |
Edge
Field | Type | Description |
---|---|---|
label | string | A user-defined name used for referencing the edge in queries. |
fromVertex | string | The label of the source (starting) vertex. |
toVertex | string | The label of the destination (ending) vertex. |
tableSource | TableSource | Specifies the table information, including the catalog, database, and table name. |
id | MappedId | Defines the unique identifier for the edge within the mapped table. |
fromId | MappedId | References the unique identifier of the source vertex. |
toId | MappedId | References the unique identifier of the destination vertex. |
attributes | []MappedField | Represents the properties associated with the edge, mapped from the data source. |
MappedId
Field | Type | Description |
---|---|---|
fields | []MappedField | Specifies one or more fields that make up the identifier. |
MappedField
Field | Type | Description |
---|---|---|
name | string | The original name of the field in the data source. |
type | string | The data type of the field. |
alias | string | The name used for referencing the field in queries. |
TableSource
Field | Type | Description |
---|---|---|
catalog | string | The name of the PuppyGraph catalog that contains the specified schema / database. |
schema | string | The name of the schema / database within the catalog. |
table | string | The name of the table within the specified schema / database. |
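As a sketch, the Python snippet below assembles the Person and KNOWS example from the earlier sections using the field names listed in these tables and serializes it with json.dumps. Several details are assumptions rather than documented facts: the top-level layout (a graph object holding vertices and edges lists), the catalog, schema, and table names (my_catalog, public, person, knows), the type strings (String, Int, Double), and the requirement that a vertex define exactly one of oneToOne or manyToOne (the table only says both cannot be present). Check these against the PuppyGraph documentation before uploading a schema built this way.

```python
import json
from typing import Any, Optional


def mapped_field(name: str, type_: str, alias: Optional[str] = None) -> dict[str, Any]:
    """MappedField: source column name, data type, and query-facing alias."""
    return {"name": name, "type": type_, "alias": alias if alias is not None else name}


def mapped_id(*fields: dict[str, Any]) -> dict[str, Any]:
    """MappedId: one or more fields that together form the identifier."""
    return {"fields": list(fields)}


def table_source(catalog: str, schema: str, table: str) -> dict[str, str]:
    """TableSource: catalog, schema / database, and table name."""
    return {"catalog": catalog, "schema": schema, "table": table}


def check_vertex(vertex: dict[str, Any]) -> None:
    """Reject vertices with both (or, by assumption, neither) oneToOne and manyToOne."""
    if ("oneToOne" in vertex) == ("manyToOne" in vertex):
        raise ValueError(f"vertex {vertex.get('label')!r}: set exactly one of oneToOne/manyToOne")


# Hypothetical catalog, schema, and table names; type strings are also assumed.
person_table = table_source("my_catalog", "public", "person")
knows_table = table_source("my_catalog", "public", "knows")

# Standard (one-to-one) vertex backed by its own table.
person_vertex = {
    "label": "Person",
    "oneToOne": {
        "tableSource": person_table,
        "id": mapped_id(mapped_field("Id", "String")),
        "attributes": [
            mapped_field("Age", "Int", "age"),
            mapped_field("Name", "String", "name"),
        ],
    },
}

# Alternative: flexible (many-to-one) vertex derived from the edge table's columns.
flexible_person_vertex = {
    "label": "Person",
    "manyToOne": {
        "sources": [
            {"source": knows_table, "id": mapped_id(mapped_field("from_id", "String"))},
            {"source": knows_table, "id": mapped_id(mapped_field("to_id", "String"))},
        ]
    },
}

knows_edge = {
    "label": "KNOWS",
    "fromVertex": "Person",
    "toVertex": "Person",
    "tableSource": knows_table,
    "id": mapped_id(mapped_field("id", "String")),
    "fromId": mapped_id(mapped_field("from_id", "String")),
    "toId": mapped_id(mapped_field("to_id", "String")),
    "attributes": [mapped_field("weight", "Double")],
}

for vertex in (person_vertex, flexible_person_vertex):
    check_vertex(vertex)

# Assumed top-level layout: a "graph" object holding "vertices" and "edges" lists.
schema_json = json.dumps(
    {"graph": {"vertices": [person_vertex], "edges": [knows_edge]}}, indent=2
)
print(schema_json)
```

Only one of person_vertex or flexible_person_vertex would go into a real schema; both are shown to contrast the oneToOne and manyToOne shapes.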