Gremlin Query Language

Introduction

PuppyGraph supports the Gremlin query language for retrieving data from your data sources.

Gremlin is a graph traversal language developed as part of Apache TinkerPop. It is a functional language: queries are built by chaining traversal steps, which allows expressive and intricate exploration of graph structures. Gremlin has been adopted by numerous graph database solutions.

PuppyGraph is a data analytics engine that runs fast analytical queries against external data sources. It does not support modifying the data directly through Gremlin.

Gremlin console

Apache TinkerPop provides Gremlin client libraries for a variety of programming languages. This document uses the official Gremlin console to introduce the basics of the Gremlin language.

PuppyGraph provides a quick way to access the Gremlin console. If you have followed the guide and deployed a local PuppyGraph server, the Gremlin console can be opened from the PuppyGraph web UI. You can also connect with your own Gremlin console; instructions can be found in the official Gremlin guide.

Example graph

If you followed the guide, you have already set up the demo "modern" graph in PuppyGraph. The rest of this document uses it as the example. You may also use your own graph data.

Hello world

  • g.V(): all vertices.

  • g.E(): all edges.

  • g.V().count(): total number of vertices.

  • g.E().count(): total number of edges.

gremlin> g.V()
==>v[software:::v5]
==>v[software:::v3]
==>v[person:::v6]
==>v[person:::v2]
==>v[person:::v4]
==>v[person:::v1]
gremlin> g.E()
==>e[created:::e11][person:::v4-created->software:::v3]
==>e[created:::e10][person:::v4-created->software:::v5]
==>e[created:::e12][person:::v6-created->software:::v3]
==>e[created:::e9][person:::v1-created->software:::v3]
==>e[knows:::e7][person:::v1-knows->person:::v2]
==>e[knows:::e8][person:::v1-knows->person:::v4]
gremlin> g.V().count()
==>6
gremlin> g.E().count()
==>6

Basic traversal

A graph traversal usually starts from a single vertex or a set of vertices. The has() step can be used to filter the vertices and select where the traversal starts.

gremlin> marko = g.V().has('person','name','marko').next()
==>v[person:::v1]
gremlin> peopleMarkoKnows = g.V().has('person','name','marko').out('knows').toList()
==>v[person:::v2]
==>v[person:::v4]

The basic traversal steps are called "vertex steps" in Gremlin.

Vertex steps (flatMap) are fundamental to the Gremlin language. With these steps, it is possible to "move" through the graph, that is, to traverse it.

  • out(string…​): Move to the outgoing adjacent vertices given the edge labels.

  • in(string…​): Move to the incoming adjacent vertices given the edge labels.

  • both(string…​): Move to both the incoming and outgoing adjacent vertices given the edge labels.

  • outE(string…​): Move to the outgoing incident edges given the edge labels.

  • inE(string…​): Move to the incoming incident edges given the edge labels.

  • bothE(string…​): Move to both the incoming and outgoing incident edges given the edge labels.

  • outV(): Move to the outgoing vertex.

  • inV(): Move to the incoming vertex.

  • bothV(): Move to both vertices.

  • otherV(): Move to the vertex that was not the vertex that was moved from.

The vertex steps can be chained together to form more complex traversals.

gremlin> g.V().has('person','name','marko').out('knows').outE('created').otherV()
==>v[software:::v3]
==>v[software:::v5]
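
To make the vertex steps more concrete, the following Python sketch models the edges of the demo graph as plain tuples and re-implements out(), outE() and otherV() as ordinary functions. It only illustrates the semantics of the steps; it is not how PuppyGraph or TinkerPop execute traversals, and the names EDGES, out_e, out and other_v are made up for this sketch.

```python
# Edges of the demo graph as (edge_id, label, out_vertex, in_vertex),
# copied from the g.E() output shown earlier.
EDGES = [
    ("knows:::e7", "knows", "person:::v1", "person:::v2"),
    ("knows:::e8", "knows", "person:::v1", "person:::v4"),
    ("created:::e9", "created", "person:::v1", "software:::v3"),
    ("created:::e10", "created", "person:::v4", "software:::v5"),
    ("created:::e11", "created", "person:::v4", "software:::v3"),
    ("created:::e12", "created", "person:::v6", "software:::v3"),
]


def out_e(vertices, label):
    """Like outE(label): yield (source_vertex, edge) for each outgoing edge."""
    for v in vertices:
        for edge in EDGES:
            if edge[1] == label and edge[2] == v:
                yield v, edge


def out(vertices, label):
    """Like out(label): move to the adjacent vertex across outgoing edges."""
    return [edge[3] for _, edge in out_e(vertices, label)]


def other_v(pairs):
    """Like otherV(): move to the end of the edge we did not come from."""
    return [edge[3] if edge[2] == came_from else edge[2]
            for came_from, edge in pairs]


marko = ["person:::v1"]
knows = out(marko, "knows")
result = other_v(out_e(knows, "created"))
print(knows)
print(sorted(result))
```

The sketch mirrors the chained query above: out('knows') moves from person:::v1 to the two people it knows, and outE('created').otherV() then moves across their created edges to the software vertices.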

Common steps

Filters

Filters can be applied at any point along the traversal. The has() step filters elements by their properties.

gremlin> g.V().has('person','name','marko').out('knows').outE('created').valueMap()
==>{weight=0.4}
==>{weight=1.0}
gremlin> g.V().has('person','name','marko').out('knows').outE('created').has('weight', gt(0.5))
==>e[created:::e10][person:::v4-created->software:::v5]
gremlin> g.V().has('person','name','marko').out('knows').outE('created').has('weight', gt(0.5)).otherV()
==>v[software:::v5]
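
As a rough illustration of what has('weight', gt(0.5)) does, the Python sketch below applies a predicate to the two created edges of person:::v4. The weights are taken from the valueMap() output above; the helper names gt and has_weight are hypothetical stand-ins written for this sketch, not part of any Gremlin client library.

```python
# (edge_id, out_vertex, in_vertex, weight) for the two "created" edges
# of person:::v4, using the weights shown in the valueMap() output.
CREATED_BY_V4 = [
    ("created:::e11", "person:::v4", "software:::v3", 0.4),
    ("created:::e10", "person:::v4", "software:::v5", 1.0),
]


def gt(threshold):
    """Build a predicate similar in spirit to Gremlin's gt()."""
    return lambda value: value > threshold


def has_weight(edges, predicate):
    """Keep only edges whose weight satisfies the predicate."""
    return [e for e in edges if predicate(e[3])]


heavy = has_weight(CREATED_BY_V4, gt(0.5))
# otherV() from person:::v4 lands on the in-vertex of each remaining edge.
targets = [e[2] for e in heavy]
print([e[0] for e in heavy])
print(targets)
```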

The hasLabel() and hasId() steps are special variants of the has() step that filter on an element's label and id.

gremlin> g.V().hasLabel('person')
==>v[person:::v4]
==>v[person:::v1]
==>v[person:::v6]
==>v[person:::v2]
gremlin> g.V().hasId('software:::v3')
==>v[software:::v3]

PuppyGraph ids have a special format: label:::id. This format lets PuppyGraph uniquely identify an element.

The where() step covers another common use case: filtering by a traversal. An element is kept only if the inner traversal returns at least one result for it.

gremlin> personMarkoKnowsHasCreated = g.V().has('person','name','marko').out('knows').where(out('created'))
==>v[person:::v4]

The and() and or() steps can be used to combine multiple traversal filters.

gremlin> g.V().and(out('created'), out('knows'))
==>v[person:::v1]
gremlin> g.V().or(__.in('created'), out('knows'))
==>v[software:::v3]
==>v[software:::v5]
==>v[person:::v1]
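
Filtering by traversal can be read as "keep the vertex if the inner traversal finds anything". The Python sketch below expresses the and() and or() queries above in that form, using the demo graph's edges. The helper names has_out and has_in are assumptions made for this illustration.

```python
# Demo graph edges as (label, out_vertex, in_vertex), from the g.E() output.
EDGES = [
    ("knows", "person:::v1", "person:::v2"),
    ("knows", "person:::v1", "person:::v4"),
    ("created", "person:::v1", "software:::v3"),
    ("created", "person:::v4", "software:::v5"),
    ("created", "person:::v4", "software:::v3"),
    ("created", "person:::v6", "software:::v3"),
]
VERTICES = [
    "person:::v1", "person:::v2", "software:::v3",
    "person:::v4", "software:::v5", "person:::v6",
]


def has_out(vertex, label):
    """True if out(label) would return at least one vertex."""
    return any(l == label and src == vertex for l, src, _ in EDGES)


def has_in(vertex, label):
    """True if in(label) would return at least one vertex."""
    return any(l == label and dst == vertex for l, _, dst in EDGES)


# g.V().and(out('created'), out('knows'))
and_result = [v for v in VERTICES
              if has_out(v, "created") and has_out(v, "knows")]
# g.V().or(__.in('created'), out('knows'))
or_result = [v for v in VERTICES
             if has_in(v, "created") or has_out(v, "knows")]
print(and_result)
print(or_result)
```

The order of the results in this sketch follows the VERTICES list and may differ from the order the console prints.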

Projections and references

Gremlin has many ways to branch a traversal. The project() step is a convenient way to branch the traversal while keeping a reference to every branch: each key passed to project() is filled in by the by() modulator in the same position.

gremlin> g.V().hasLabel('person').project('person', 'knowsCount', 'createdCount').by(identity()).by(out('knows').count()).by(out('created').count()).order().by('knowsCount', desc)
==>{person=v[person:::v1], knowsCount=2, createdCount=1}
==>{person=v[person:::v4], knowsCount=0, createdCount=2}
==>{person=v[person:::v6], knowsCount=0, createdCount=1}
==>{person=v[person:::v2], knowsCount=0, createdCount=0}
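
One way to think about project() is as building one small record per input element, with one field per by() modulator. The Python sketch below reproduces the query above under that reading; count_out is a hypothetical helper and the edge list is copied from the g.E() output.

```python
# Demo graph edges as (label, out_vertex, in_vertex), from the g.E() output.
EDGES = [
    ("knows", "person:::v1", "person:::v2"),
    ("knows", "person:::v1", "person:::v4"),
    ("created", "person:::v1", "software:::v3"),
    ("created", "person:::v4", "software:::v5"),
    ("created", "person:::v4", "software:::v3"),
    ("created", "person:::v6", "software:::v3"),
]
PEOPLE = ["person:::v1", "person:::v2", "person:::v4", "person:::v6"]


def count_out(vertex, label):
    """Like out(label).count() evaluated for a single vertex."""
    return sum(1 for l, src, _ in EDGES if l == label and src == vertex)


# One record per person; each field corresponds to one by() modulator.
rows = [
    {
        "person": p,                              # by(identity())
        "knowsCount": count_out(p, "knows"),      # by(out('knows').count())
        "createdCount": count_out(p, "created"),  # by(out('created').count())
    }
    for p in PEOPLE
]
# order().by('knowsCount', desc)
rows.sort(key=lambda r: r["knowsCount"], reverse=True)
for row in rows:
    print(row)
```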

The as() and select() steps can be used to label a point in the traversal and return to it later.

gremlin> personWhoCreatedSoftware = g.V().as('creator').out('created').select('creator').dedup()
==>v[person:::v4]
==>v[person:::v6]
==>v[person:::v1]

Utilities

The path() step returns the whole traversal path instead of just the end result.

gremlin> g.V().has('person','name','marko').out('knows').outE('created').has('weight', gt(0.5)).otherV().path()
==>path[v[person:::v1], v[person:::v4], e[created:::e10][person:::v4-created->software:::v5], v[software:::v5]]
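
A path can be pictured as the history each traverser carries with it. The Python sketch below rebuilds the path from the query above by appending every visited vertex or edge to a tuple. The dictionaries KNOWS and CREATED are a simplified, assumed representation holding only the data this query touches.

```python
# Only the parts of the demo graph that this query touches.
KNOWS = {"person:::v1": ["person:::v2", "person:::v4"]}
# out_vertex -> list of (edge_id, in_vertex, weight)
CREATED = {
    "person:::v4": [
        ("created:::e11", "software:::v3", 0.4),
        ("created:::e10", "software:::v5", 1.0),
    ],
}

# Start at marko; each path is a tuple of everything visited so far.
paths = [("person:::v1",)]
# out('knows'): extend each path with every known person.
paths = [p + (friend,) for p in paths for friend in KNOWS.get(p[-1], [])]
# outE('created').has('weight', gt(0.5)): extend with qualifying edges only.
paths = [p + (edge,) for p in paths
         for edge in CREATED.get(p[-1], []) if edge[2] > 0.5]
# otherV(): extend with the vertex at the far end of the edge.
paths = [p + (p[-1][1],) for p in paths]
print(paths)
```

Paths that cannot be extended at some step, such as the one ending at person:::v2, simply drop out, which is why only one path remains.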

The dedup() step removes duplicates so that only distinct results are returned.

gremlin> g.V().in()
==>v[person:::v1]
==>v[person:::v4]
==>v[person:::v4]
==>v[person:::v6]
==>v[person:::v1]
==>v[person:::v1]
gremlin> g.V().in().dedup()
==>v[person:::v6]
==>v[person:::v1]
==>v[person:::v4]
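
The duplicates in g.V().in() arise because every edge contributes its source vertex once. The Python sketch below shows this with the demo graph's edges and removes the duplicates in the spirit of dedup(); it is an illustration only, and the variable names are made up.

```python
from collections import Counter

# Demo graph edges as (out_vertex, in_vertex), from the g.E() output.
EDGES = [
    ("person:::v1", "person:::v2"),
    ("person:::v1", "person:::v4"),
    ("person:::v1", "software:::v3"),
    ("person:::v4", "software:::v5"),
    ("person:::v4", "software:::v3"),
    ("person:::v6", "software:::v3"),
]

# g.V().in(): for every vertex, move to the source of each incoming edge.
# Across the whole graph this yields one result per edge.
incoming = [src for src, _ in EDGES]
# dedup(): keep the first occurrence of each result.
distinct = list(dict.fromkeys(incoming))
print(Counter(incoming))
print(distinct)
```

As in the console output above, the result order is not significant; only the set of distinct vertices is.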

The count() step returns the total number of results.

gremlin> g.V().has('person','name','marko').out('knows').outE('created').otherV().count()
==>2

The order() step can be used to sort the results.

gremlin> g.V().has('person','name','marko').out('knows').order().by('age')
==>v[person:::v2]
==>v[person:::v4]

The limit() step can be used to limit the number of results returned.

gremlin> g.V().has('person','name','marko').out('knows').outE('created').otherV().limit(1)
==>v[software:::v3]

The profile() step can be used to print out the query profile.

gremlin> g.V().has('person','name','marko').out('knows').outE('created').otherV().limit(1).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
PuppyGraphStep(vertex,[])                                              1           1          46.862    99.65
EdgeOtherVertexStep                                                    1           1           0.162     0.35
                                            >TOTAL                     -           -          47.024        -

Official specification

For the detailed Gremlin language specification, refer to the official Gremlin website.
