Gremlin Query Language
Introduction
PuppyGraph supports the Gremlin query language for retrieving data from its data sources.
Gremlin is a graph query language developed as part of Apache TinkerPop and designed for traversing graphs. It is a functional language: queries are built by chaining traversal steps, which allows expressive and intricate exploration of graph structures. Gremlin has been widely adopted by graph database products.
PuppyGraph is a data analytics engine that runs fast analytical queries over external data sources. It does not support modifying the data through Gremlin.
Gremlin console
Apache TinkerPop provides Gremlin client libraries for a variety of programming languages. This document uses the official Gremlin Console to introduce the basics of the Gremlin language.
PuppyGraph provides a quick way to access the Gremlin Console. If you have followed the Getting Started guide and deployed a local PuppyGraph server, you can open the console from the PuppyGraph web UI. You can also use your own Gremlin Console; instructions can be found in the official Gremlin guide.
Example graph
If you followed the Getting Started guide, you have already set up the demo modern graph in PuppyGraph. The rest of this document uses it as the example graph. You may also use your own graph data.
Hello world
g.V(): all vertices.
g.E(): all edges.
g.V().count(): total number of vertices.
g.E().count(): total number of edges.
gremlin> g.V()
==>v[software[v5]]
==>v[software[v3]]
==>v[person[v6]]
==>v[person[v2]]
==>v[person[v4]]
==>v[person[v1]]
gremlin> g.E()
==>e[created[e11]][person[v4]-created->software[v3]]
==>e[created[e10]][person[v4]-created->software[v5]]
==>e[created[e12]][person[v6]-created->software[v3]]
==>e[created[e9]][person[v1]-created->software[v3]]
==>e[knows[e7]][person[v1]-knows->person[v2]]
==>e[knows[e8]][person[v1]-knows->person[v4]]
gremlin> g.V().count()
==>6
gremlin> g.E().count()
==>6
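As a rough mental model of what these four queries return, the demo graph can be written out as two small tables. The sketch below is plain Python, not Gremlin, so it runs without a PuppyGraph server. The IDs and labels are copied from the output above; the dictionary layout and the names VERTICES and EDGES are assumptions made for this illustration and say nothing about how PuppyGraph stores data.

```python
# Vertex id -> label, copied from the g.V() output above.
VERTICES = {
    "v1": "person", "v2": "person", "v4": "person", "v6": "person",
    "v3": "software", "v5": "software",
}

# Edge id -> (label, out vertex, in vertex), copied from the g.E() output above.
EDGES = {
    "e7": ("knows", "v1", "v2"),
    "e8": ("knows", "v1", "v4"),
    "e9": ("created", "v1", "v3"),
    "e10": ("created", "v4", "v5"),
    "e11": ("created", "v4", "v3"),
    "e12": ("created", "v6", "v3"),
}

vertex_count = len(VERTICES)  # plays the role of g.V().count()
edge_count = len(EDGES)       # plays the role of g.E().count()
print(vertex_count, edge_count)
```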
Basic traversal
A graph traversal usually starts from a single vertex or a set of vertices. The has() step can be used to select the starting vertices.
gremlin> marko = g.V().has('person','name','marko').next()
==>v[person[v1]]
gremlin> peopleMarkoKnows = g.V().has('person','name','marko').out('knows').toList()
==>v[person[v2]]
==>v[person[v4]]
The basic traversal steps are called "vertex steps" in Gremlin.
The vertex steps (flatMap) are fundamental to the Gremlin language. With these steps, it is possible to "move" on the graph, that is, to traverse it.
out(string…): Move to the outgoing adjacent vertices given the edge labels.
in(string…): Move to the incoming adjacent vertices given the edge labels.
both(string…): Move to both the incoming and outgoing adjacent vertices given the edge labels.
outE(string…): Move to the outgoing incident edges given the edge labels.
inE(string…): Move to the incoming incident edges given the edge labels.
bothE(string…): Move to both the incoming and outgoing incident edges given the edge labels.
outV(): Move to the outgoing vertex.
inV(): Move to the incoming vertex.
bothV(): Move to both vertices.
otherV(): Move to the vertex that was not the vertex that was moved from.
The vertex steps can be chained together to form more complex traversals.
gremlin> g.V().has('person','name','marko').out('knows').outE('created').otherV()
==>v[software[v3]]
==>v[software[v5]]
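To make the semantics of the vertex steps concrete, here is a minimal sketch in plain Python (not Gremlin) that models a few of them over the demo edges. The edge list is copied from the g.E() output earlier in this document; the function names out_e, in_e, out_v, in_v, out and in_ are hypothetical helpers for this illustration, not part of any Gremlin client library.

```python
# Edges of the demo graph as (id, label, out vertex, in vertex).
EDGES = [
    ("e7", "knows", "v1", "v2"),
    ("e8", "knows", "v1", "v4"),
    ("e9", "created", "v1", "v3"),
    ("e10", "created", "v4", "v5"),
    ("e11", "created", "v4", "v3"),
    ("e12", "created", "v6", "v3"),
]

def _match(label, labels):
    # No labels given means "any edge label", as in Gremlin.
    return not labels or label in labels

def out_e(vertices, *labels):
    """Model of outE(): outgoing incident edges of each vertex."""
    return [e for v in vertices for e in EDGES if e[2] == v and _match(e[1], labels)]

def in_e(vertices, *labels):
    """Model of inE(): incoming incident edges of each vertex."""
    return [e for v in vertices for e in EDGES if e[3] == v and _match(e[1], labels)]

def out_v(edges):
    """Model of outV(): the vertex each edge leaves from."""
    return [e[2] for e in edges]

def in_v(edges):
    """Model of inV(): the vertex each edge points to."""
    return [e[3] for e in edges]

def out(vertices, *labels):
    """Model of out(): outE() followed by inV()."""
    return in_v(out_e(vertices, *labels))

def in_(vertices, *labels):
    """Model of in(): inE() followed by outV()."""
    return out_v(in_e(vertices, *labels))

# Same shape as the chained traversal above, starting from marko's vertex v1.
# After outE(), otherV() lands on the same vertex as inV().
known = out(["v1"], "knows")
created_edges = out_e(known, "created")
software = in_v(created_edges)
print(known, [e[0] for e in created_edges], software)
```

The order of results in this sketch follows the order of the edge list; a real Gremlin traversal does not guarantee any order unless order() is used.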
Common steps
Filters
Filters can be applied anywhere along a traversal. The has() step filters elements by their attributes.
gremlin> g.V().has('person','name','marko').out('knows').outE('created').valueMap()
==>{weight=0.4}
==>{weight=1.0}
gremlin> g.V().has('person','name','marko').out('knows').outE('created').has('weight', gt(0.5))
==>e[created[e10]][person[v4]-created->software[v5]]
gremlin> g.V().has('person','name','marko').out('knows').outE('created').has('weight', gt(0.5)).otherV()
==>v[software[v5]]
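The filter above can be pictured as a predicate applied to each element's property. The following plain-Python sketch (not Gremlin) models has(key, predicate) and gt(). The two edges and their weights are taken from the output above; the dictionary layout and the helper names are assumptions for this illustration.

```python
# The two 'created' edges reached in the example above, with the weights
# implied by the valueMap() and has() output.
created_edges = [
    {"id": "e11", "label": "created", "weight": 0.4},
    {"id": "e10", "label": "created", "weight": 1.0},
]

def gt(threshold):
    """Model of the gt() predicate: true for values above the threshold."""
    return lambda value: value > threshold

def has(elements, key, predicate):
    """Model of has(key, predicate): keep elements whose property passes.

    Elements that do not carry the property at all are dropped.
    """
    return [el for el in elements if key in el and predicate(el[key])]

heavy = has(created_edges, "weight", gt(0.5))
print([el["id"] for el in heavy])
```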
The hasLabel() and hasId() steps are special variants of the has() step.
gremlin> g.V().hasLabel('person')
==>v[person[v4]]
==>v[person[v1]]
==>v[person[v6]]
==>v[person[v2]]
gremlin> g.V().hasId('software[v3]')
==>v[software[v3]]
Info
PuppyGraph IDs have a special format: label[id]. This format helps PuppyGraph uniquely identify an element.
The where() step covers another common use case, filtering by traversal: an element is kept only if the inner traversal returns at least one result for it.
gremlin> personMarkoKnowsHasCreated = g.V().has('person','name','marko').out('knows').where(out('created'))
==>v[person[v4]]
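The and() and or() steps combine multiple filter-by-traversal conditions. In the Gremlin Console, in() must be written as __.in() because in is a reserved word in Groovy.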
gremlin> g.V().and(out('created'), out('knows'))
==>v[person[v1]]
gremlin> g.V().or(__.in('created'), out('knows'))
==>v[software[v3]]
==>v[software[v5]]
==>v[person[v1]]
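Filtering by traversal means "keep a vertex only if the inner traversal finds something from it". A minimal plain-Python sketch (not Gremlin) of where(), and() and or() under that reading is shown below. The edge list is copied from the g.E() output earlier in this document; all function names are hypothetical helpers for this illustration.

```python
# Demo edges as (label, out vertex, in vertex).
EDGES = [
    ("knows", "v1", "v2"),
    ("knows", "v1", "v4"),
    ("created", "v1", "v3"),
    ("created", "v4", "v5"),
    ("created", "v4", "v3"),
    ("created", "v6", "v3"),
]
VERTICES = ["v1", "v2", "v3", "v4", "v5", "v6"]

def out(vertex, label):
    """Vertices reached over outgoing edges with the given label."""
    return [dst for (lbl, src, dst) in EDGES if src == vertex and lbl == label]

def in_(vertex, label):
    """Vertices reached over incoming edges with the given label."""
    return [src for (lbl, src, dst) in EDGES if dst == vertex and lbl == label]

def where(vertices, traversal):
    """Keep a vertex if the inner traversal yields at least one result."""
    return [v for v in vertices if traversal(v)]

def and_(vertices, *traversals):
    """Keep a vertex if every inner traversal yields a result."""
    return [v for v in vertices if all(t(v) for t in traversals)]

def or_(vertices, *traversals):
    """Keep a vertex if any inner traversal yields a result."""
    return [v for v in vertices if any(t(v) for t in traversals)]

def created(v):
    return out(v, "created")

def knows(v):
    return out(v, "knows")

def created_by(v):
    return in_(v, "created")

both_conditions = and_(VERTICES, created, knows)      # and(out('created'), out('knows'))
either_condition = or_(VERTICES, created_by, knows)   # or(__.in('created'), out('knows'))
known_creators = where(out("v1", "knows"), created)   # out('knows').where(out('created'))
print(both_conditions, either_condition, known_creators)
```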
Projections and references
Gremlin has many ways to branch a traversal. The project() step is a convenient one: it branches the traversal while keeping a named reference to the result of every branch.
gremlin> g.V().hasLabel('person').project('person', 'knowsCount', 'createdCount').by(identity()).by(out('knows').count()).by(out('created').count()).order().by('knowsCount', desc)
==>{person=v[person[v1]], knowsCount=2, createdCount=1}
==>{person=v[person[v4]], knowsCount=0, createdCount=2}
==>{person=v[person[v6]], knowsCount=0, createdCount=1}
==>{person=v[person[v2]], knowsCount=0, createdCount=0}
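One way to read project() is "build one record per incoming element, with one field per by() modulator". The plain-Python sketch below (not Gremlin) reproduces that idea for the query above. The edge list is copied from the g.E() output earlier in this document; out_count is a hypothetical helper for this illustration.

```python
# Demo edges as (label, out vertex, in vertex).
EDGES = [
    ("knows", "v1", "v2"),
    ("knows", "v1", "v4"),
    ("created", "v1", "v3"),
    ("created", "v4", "v5"),
    ("created", "v4", "v3"),
    ("created", "v6", "v3"),
]
PEOPLE = ["v1", "v2", "v4", "v6"]  # the vertices with label 'person'

def out_count(vertex, label):
    """Models out(label).count() evaluated from a single vertex."""
    return sum(1 for (lbl, src, _) in EDGES if src == vertex and lbl == label)

# project('person', 'knowsCount', 'createdCount') with one by() per key.
rows = [
    {
        "person": v,                              # by(identity())
        "knowsCount": out_count(v, "knows"),      # by(out('knows').count())
        "createdCount": out_count(v, "created"),  # by(out('created').count())
    }
    for v in PEOPLE
]

# order().by('knowsCount', desc)
rows.sort(key=lambda row: row["knowsCount"], reverse=True)
for row in rows:
    print(row)
```

Rows with the same knowsCount may appear in a different order than in the console output above, since the query only sorts by that one key.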
The as() step labels a point in the traversal, and the select() step returns to that labeled point so the traversal can continue from there.
gremlin> personWhoCreatedSoftware = g.V().as('creator').out('created').select('creator').dedup()
==>v[person[v4]]
==>v[person[v6]]
==>v[person[v1]]
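A simple way to picture as() and select() is that every traverser carries a small dictionary of labeled positions along with its current position. The plain-Python sketch below (not Gremlin) follows the query above step by step under that assumption. The edge list is copied from the g.E() output earlier in this document; the tuple layout of a traverser is made up for this illustration.

```python
# Demo edges as (label, out vertex, in vertex).
EDGES = [
    ("knows", "v1", "v2"),
    ("knows", "v1", "v4"),
    ("created", "v1", "v3"),
    ("created", "v4", "v5"),
    ("created", "v4", "v3"),
    ("created", "v6", "v3"),
]
VERTICES = ["v1", "v2", "v3", "v4", "v5", "v6"]

# g.V().as('creator'): each traverser remembers where it started.
traversers = [(v, {"creator": v}) for v in VERTICES]

# .out('created'): move along 'created' edges, carrying the labels along.
traversers = [
    (dst, labels)
    for (current, labels) in traversers
    for (lbl, src, dst) in EDGES
    if src == current and lbl == "created"
]

# .select('creator'): jump back to the labeled position.
selected = [labels["creator"] for (_, labels) in traversers]

# .dedup(): keep the first occurrence of each value.
creators = list(dict.fromkeys(selected))
print(selected, creators)
```

Vertices without an outgoing created edge produce no traverser after out('created'), which is why they never reach select().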
Utilities
The path() step returns the whole traversal path instead of just the end result.
gremlin> g.V().has('person','name','marko').out('knows').outE('created').has('weight', gt(0.5)).otherV().path()
==>path[v[person[v1]], v[person[v4]], e[created[e10]][person[v4]-created->software[v5]], v[software[v5]]]
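The path can be thought of as the history each traverser accumulates: steps that move add an entry, while filters only drop traversers. The plain-Python sketch below (not Gremlin) replays the query above under that assumption. Edges are copied from the g.E() output earlier in this document, and the two weights are the ones shown in the Filters section; weights of other edges are not shown in this document and are not needed here.

```python
# Demo edges as (id, label, out vertex, in vertex).
EDGES = [
    ("e7", "knows", "v1", "v2"),
    ("e8", "knows", "v1", "v4"),
    ("e9", "created", "v1", "v3"),
    ("e10", "created", "v4", "v5"),
    ("e11", "created", "v4", "v3"),
    ("e12", "created", "v6", "v3"),
]
IN_VERTEX = {eid: dst for (eid, _, _, dst) in EDGES}
WEIGHT = {"e10": 1.0, "e11": 0.4}  # only the weights shown in this document

# Each traverser is represented by its path; the current position is path[-1].
paths = [("v1",)]  # g.V().has('person','name','marko')

# .out('knows'): extend each path with the adjacent vertex.
paths = [p + (dst,) for p in paths
         for (_, lbl, src, dst) in EDGES if src == p[-1] and lbl == "knows"]

# .outE('created'): extend each path with the outgoing edge.
paths = [p + (eid,) for p in paths
         for (eid, lbl, src, _) in EDGES if src == p[-1] and lbl == "created"]

# .has('weight', gt(0.5)): a filter drops traversers but adds nothing to the path.
paths = [p for p in paths if WEIGHT[p[-1]] > 0.5]

# .otherV(): extend each path with the vertex at the far end of the edge.
paths = [p + (IN_VERTEX[p[-1]],) for p in paths]

print(paths)
```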
The dedup() step returns only distinct results.
gremlin> g.V().in()
==>v[person[v1]]
==>v[person[v4]]
==>v[person[v4]]
==>v[person[v6]]
==>v[person[v1]]
==>v[person[v1]]
gremlin> g.V().in().dedup()
==>v[person[v6]]
==>v[person[v1]]
==>v[person[v4]]
The count() step returns the total number of results.
The order() step sorts the results.
gremlin> g.V().has('person','name','marko').out('knows').order().by('age')
==>v[person[v2]]
==>v[person[v4]]
The limit() step limits the number of results returned.
gremlin> g.V().has('person','name','marko').out('knows').outE('created').otherV().limit(1)
==>v[software[v3]]
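As a rough sketch of how order() and limit() combine, the plain-Python snippet below (not Gremlin) sorts the two people marko knows by age and then keeps the first result. The ages are an assumption: they are the values used in the standard TinkerPop modern toy graph and are not shown anywhere in this document, so your demo data may differ.

```python
# ASSUMPTION: ages from the standard TinkerPop "modern" toy graph.
AGE = {"v2": 27, "v4": 32}

known = ["v4", "v2"]  # the vertices reached by out('knows') from marko, in arbitrary order

ordered = sorted(known, key=AGE.get)  # models order().by('age'), ascending by default
youngest = ordered[:1]                # models limit(1)
print(ordered, youngest)
```

Without order(), the element that limit(1) keeps depends on whatever order the results happen to arrive in.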
The profile() step prints the query profile.
gremlin> g.V().has('person','name','marko').out('knows').outE('created').otherV().limit(1).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
PuppyGraphStep(vertex,[])                                              1           1          46.862    99.65
EdgeOtherVertexStep                                                    1           1           0.162     0.35
                                            >TOTAL                     -           -          47.024        -
Official specification
For the detailed Gremlin language specification, please refer to the official Gremlin website.