Schema

PuppyGraph utilizes schemas to model graphs on top of data sources.

A graph schema in PuppyGraph defines how data sources are connected and structured. It specifies the types of vertices and edges in the graph, along with their associated properties.

Data Sources

A schema contains one or more data source connections. Each data source connection is defined as a catalog.

A catalog specifies the connection parameters and authentication details required to access the data source. It also includes an internal name used for referencing in graph modeling.

Graph Modeling

PuppyGraph follows a labeled property graph model, built around two core concepts: vertices and edges.

During the graph modeling process, PuppyGraph maps tables from data sources into these two components, defining their structure and relationships.

Modeling Vertices

Vertices (also known as nodes) are the fundamental units of which graphs are formed. Think of them as points in a space where each point can represent an object, a person, a place, or any abstract concept.

PuppyGraph supports two kinds of vertex mapping: standard (one-to-one) and flexible (many-to-one).

Standard Vertex (One-to-one) Mapping

  • A standard vertex type is derived from a single table.
  • Each vertex type must have a unique identifier, similar to a primary key in relational tables.
  • A standard vertex can include properties. For instance, a vertex representing a person may have properties such as name and age.

Id   Age  Name
v1   29   marko
v2   27   vadas

Two standard vertices, v1 and v2, are derived from the table above. Each vertex has a unique identifier Id and includes the properties name and age.
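The one-to-one mapping can be illustrated with a short Python sketch. This only models the idea and is not PuppyGraph's implementation: the `Vertex` class and the `map_standard_vertices` helper are hypothetical names, and the lowercase property aliases are an assumption based on the sentence above.

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    label: str        # vertex type, e.g. "Person"
    id: str           # unique identifier within the vertex type
    properties: dict  # property name -> value


# Rows of the example table above.
rows = [
    {"Id": "v1", "Age": 29, "Name": "marko"},
    {"Id": "v2", "Age": 27, "Name": "vadas"},
]


def map_standard_vertices(rows, label, id_column, property_columns):
    """Turn each row into exactly one vertex (one-to-one mapping)."""
    vertices = {}
    for row in rows:
        vertex_id = row[id_column]
        if vertex_id in vertices:
            # A standard vertex identifier must be unique, like a primary key.
            raise ValueError(f"duplicate identifier: {vertex_id!r}")
        properties = {
            alias: row[column] for column, alias in property_columns.items()
        }
        vertices[vertex_id] = Vertex(label, vertex_id, properties)
    return vertices


# Hypothetical mapping: column "Name" -> property "name", "Age" -> "age".
people = map_standard_vertices(rows, "Person", "Id", {"Name": "name", "Age": "age"})
for vertex in people.values():
    print(vertex)
```

Each row yields exactly one vertex, and a repeated identifier is rejected, which mirrors the primary-key requirement listed above.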

Flexible Vertex (Many-to-one) Mapping

In some cases, a vertex type has no dedicated table and appears only as a foreign key in other tables. Such a vertex can still be modeled as a flexible vertex.

  • A flexible vertex type is derived from one or more tables.
  • The identifier of a single flexible vertex may appear multiple times across the table(s), and it will be treated as a single vertex in the graph.

Consider the table below, which represents friendships between individuals.

id   from_id  to_id  weight
e7   v1       v2     0.5
e8   v2       v3     1.0

When a dedicated table for a vertex exists, as shown in the previous section, the vertex is modeled as a standard vertex. However, when no dedicated table is available, the vertex can be derived from the from_id and to_id columns in the table.

In this case, any distinct values from these columns are considered the identifiers for the vertices. Consequently, the flexible vertex type will include v1, v2, and v3, as modeled from the table above.
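As a rough illustration of the many-to-one idea (not PuppyGraph's internal logic), the following Python sketch collects the distinct values of the referencing columns. The helper name `flexible_vertex_ids` is hypothetical.

```python
# Rows of the friendship table above.
friendship_rows = [
    {"id": "e7", "from_id": "v1", "to_id": "v2", "weight": 0.5},
    {"id": "e8", "from_id": "v2", "to_id": "v3", "weight": 1.0},
]


def flexible_vertex_ids(rows, id_columns):
    """Collect distinct identifier values, keeping first-seen order."""
    seen = {}  # dicts preserve insertion order, so this acts as an ordered set
    for row in rows:
        for column in id_columns:
            seen.setdefault(row[column], None)
    return list(seen)


ids = flexible_vertex_ids(friendship_rows, ["from_id", "to_id"])
print(ids)  # ['v1', 'v2', 'v3']
```

Although v2 appears twice (once in from_id and once in to_id), it is counted as a single vertex.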

Modeling Edges

Edges, also known as relationships, represent the connections between vertices, symbolizing interactions or relationships between them. In a simple graph, an edge is often just a line connecting two vertices.

Like vertices, edges can also have properties, which describe the nature or attributes of the relationship. For example, an edge representing a friendship might include a property like since: 2010, indicating when the friendship started.

PuppyGraph follows a directional edge model, where the relationship is defined from one vertex (the source) to another (the target).

  • An edge type connects two vertex types.
  • An edge type is mapped from a single table.
  • Each edge type must have a unique identifier, similar to a primary key in relational tables.
  • Each edge type must have referencing identifiers, akin to foreign keys in relational tables, which link to the identifiers of the vertices it connects.
  • An edge type can have properties. For instance, an edge representing a friendship may include properties like since and friendshipLevel.

id   from_id  to_id  weight
e7   v1       v2     0.5
e8   v2       v3     1.0

This table can be modeled as two edges, e7 and e8, which connect v1 to v2 and v2 to v3, respectively. Each edge has an identifier id, two referencing identifiers from_id and to_id, and a property weight.

When modeling the edges, we also specify that the edge type KNOWS connects the vertex types Person to Person. As a result, PuppyGraph recognizes that the values in from_id and to_id are referencing the identifiers of the Person vertex type.
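The edge mapping described above can be sketched in Python as follows. This is an illustration under assumptions, not PuppyGraph code: the `Edge` class and the `map_edges` helper are hypothetical, and the column names id, from_id, and to_id are taken from the example table.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    label: str          # edge type, e.g. "KNOWS"
    id: str             # unique identifier of the edge
    from_vertex: tuple  # (source vertex type, source vertex id)
    to_vertex: tuple    # (target vertex type, target vertex id)
    properties: dict    # property name -> value


# Rows of the example table above.
knows_rows = [
    {"id": "e7", "from_id": "v1", "to_id": "v2", "weight": 0.5},
    {"id": "e8", "from_id": "v2", "to_id": "v3", "weight": 1.0},
]


def map_edges(rows, label, from_label, to_label, property_columns):
    """Map each row to one directed edge between two typed vertices."""
    edges = []
    for row in rows:
        edges.append(
            Edge(
                label=label,
                id=row["id"],
                # The declared vertex types tell us how to interpret the
                # referencing identifiers in from_id and to_id.
                from_vertex=(from_label, row["from_id"]),
                to_vertex=(to_label, row["to_id"]),
                properties={column: row[column] for column in property_columns},
            )
        )
    return edges


knows = map_edges(knows_rows, "KNOWS", "Person", "Person", ["weight"])
for edge in knows:
    print(edge)
```

Because the edge type declares Person as both its source and target vertex type, the raw values v1, v2, and v3 are resolved as Person identifiers.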

Identifiers

Similar to primary keys in relational databases, the identifier is a unique value that distinguishes one vertex or edge from another. In PuppyGraph, the identifier is mapped from one or more columns from the source table. When multiple columns are used, the combination of their values must be unique.

PuppyGraph uses $Type[$id1, $id2, ...] to represent the identifier of a vertex or edge. For instance, Person[v1] or KNOWS[e7] represents the vertex v1 of type Person and the edge e7 of type KNOWS, respectively.
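The notation and the uniqueness rule can be sketched in Python. Both helpers are hypothetical and only mimic the `$Type[$id1, $id2, ...]` display form described above. The `Flight` type and its columns are made-up examples of a composite identifier.

```python
def format_identifier(type_label, *id_values):
    """Render an identifier in the $Type[$id1, $id2, ...] notation."""
    if not id_values:
        raise ValueError("an identifier needs at least one value")
    joined = ", ".join(str(value) for value in id_values)
    return f"{type_label}[{joined}]"


def has_unique_ids(rows, id_columns):
    """Check that the combination of identifier columns is unique per row."""
    keys = [tuple(row[column] for column in id_columns) for row in rows]
    return len(keys) == len(set(keys))


print(format_identifier("Person", "v1"))  # Person[v1]
print(format_identifier("KNOWS", "e7"))   # KNOWS[e7]

# Hypothetical composite identifier built from two columns.
flights = [
    {"carrier": "AA", "number": 100},
    {"carrier": "AA", "number": 200},
    {"carrier": "UA", "number": 100},
]
print(has_unique_ids(flights, ["carrier", "number"]))  # True
print(has_unique_ids(flights, ["carrier"]))            # False
print(format_identifier("Flight", "AA", 100))          # Flight[AA, 100]
```

Neither column is unique on its own in the hypothetical flights data, but the combination is, which is what a multi-column identifier requires.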

Properties

Properties provide extra context and details about vertices and edges in a graph. These properties are expressed as key-value pairs, offering a way to store additional information relevant to the graph’s structure and relationships.

Flexible (many-to-one) vertices do not carry properties of their own. Properties can still be defined on the edges that connect to flexible vertices, so attributes of a connection are stored as key-value pairs on the edge rather than on the vertex.

Building a Schema

Graph Schema Builder

The Graph Schema Builder provided in the PuppyGraph web UI is the recommended way to build a graph schema.

JSON Representation

A graph schema can also be serialized as JSON and uploaded to PuppyGraph. Here is a breakdown of the different components in the JSON.

Vertex

Field      Type       Description
label      string     A user-defined name used for referencing the vertex in edges and queries.
oneToOne   OneToOne   Defines a standard vertex. Only one of oneToOne or manyToOne can be present.
manyToOne  ManyToOne  Defines a flexible vertex. Only one of oneToOne or manyToOne can be present.

OneToOne

Field        Type           Description
tableSource  TableSource    Specifies the table information, including the catalog, database, and table name.
id           MappedId       Defines the unique identifier for the vertex within the mapped table.
attributes   []MappedField  Represents the properties of the vertex, mapped from the data source.

ManyToOne

Field    Type              Description
sources  []MappedIdSource  Defines multiple sources for composing the flexible vertex.

MappedIdSource

Field   Type         Description
source  TableSource  Specifies the table information, including the catalog, database, and table name.
id      MappedId     Defines the unique identifier for the vertex within the mapped table.

Edge

Field        Type           Description
label        string         A user-defined name used for referencing the edge in queries.
fromVertex   string         The label of the source (starting) vertex.
toVertex     string         The label of the destination (ending) vertex.
tableSource  TableSource    Specifies the table information, including the catalog, database, and table name.
id           MappedId       Defines the unique identifier for the edge within the mapped table.
fromId       MappedId       References the unique identifier of the source vertex.
toId         MappedId       References the unique identifier of the destination vertex.
attributes   []MappedField  Represents the properties associated with the edge, mapped from the data source.

MappedId

Field   Type           Description
fields  []MappedField  Specifies the one or more fields that make up the identifier.

MappedField

Field  Type    Description
name   string  The original name of the field in the data source.
type   string  The data type of the field.
alias  string  The name used for referencing the field in queries.

TableSource

Field    Type    Description
catalog  string  The name of the PuppyGraph catalog that contains the specified schema / database.
schema   string  The name of the schema / database within the catalog.
table    string  The name of the table within the specified schema / database.
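To show how the fields listed above nest, the following Python sketch assembles JSON fragments for a standard vertex, a flexible vertex, and an edge, using the Person and KNOWS examples from earlier sections. Treat it as a sketch under assumptions rather than a complete schema: the field names come from the tables above, but the catalog, schema, and table names (`demo_catalog`, `demo_db`, `person`, `knows`) and the type strings (`String`, `Int`, `Double`) are placeholders, the helper functions are hypothetical, and the top-level wrapper that holds catalogs, vertices, and edges is not described in this section and is therefore omitted.

```python
import json


def table_source(table):
    # Placeholder catalog and schema names; replace with your own.
    return {"catalog": "demo_catalog", "schema": "demo_db", "table": table}


def mapped_field(name, type_, alias=None):
    # "type" values here are assumed placeholders, not a verified list.
    return {"name": name, "type": type_, "alias": alias or name}


def mapped_id(*fields):
    return {"fields": list(fields)}


# Standard vertex: exactly one of oneToOne / manyToOne is present.
person_vertex = {
    "label": "Person",
    "oneToOne": {
        "tableSource": table_source("person"),
        "id": mapped_id(mapped_field("Id", "String", "id")),
        "attributes": [
            mapped_field("Name", "String", "name"),
            mapped_field("Age", "Int", "age"),
        ],
    },
}

# Alternative: the same vertex type modeled as a flexible vertex, derived
# from the two referencing columns of the friendship table.
flexible_person_vertex = {
    "label": "Person",
    "manyToOne": {
        "sources": [
            {"source": table_source("knows"),
             "id": mapped_id(mapped_field("from_id", "String"))},
            {"source": table_source("knows"),
             "id": mapped_id(mapped_field("to_id", "String"))},
        ]
    },
}

knows_edge = {
    "label": "KNOWS",
    "fromVertex": "Person",
    "toVertex": "Person",
    "tableSource": table_source("knows"),
    "id": mapped_id(mapped_field("id", "String")),
    "fromId": mapped_id(mapped_field("from_id", "String")),
    "toId": mapped_id(mapped_field("to_id", "String")),
    "attributes": [mapped_field("weight", "Double")],
}

print(json.dumps({"vertex": person_vertex, "edge": knows_edge}, indent=2))
```

A schema would use either `person_vertex` or `flexible_person_vertex` for the Person label, not both. The `vertex` and `edge` keys in the final print call exist only to display the fragments together and are not part of the schema format.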