Connecting to Redshift
It is also possible to query existing data on Redshift directly without loading it into PuppyGraph. Here is a demo of the feature.
In the demo, the Redshift data source stores people and referral information. To query the data as a graph, we model people as vertices and the referral relationship between people as edges.
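As a rough illustration of this modelling choice (plain Python, not PuppyGraph code), the sketch below holds the demo's sample records as vertices and edges and follows one referral. The field names `from_id`, `to_id`, and `value` are hypothetical labels chosen for the sketch, and the edge direction from `v1` to `v2` is an assumption.

```python
# People become vertices, keyed by their ID.
vertices = {
    "v1": {"age": 29, "name": "marko"},
    "v2": {"age": 27, "name": "vadas"},
}

# Each referral becomes an edge between two people.
# from_id / to_id / value are hypothetical field names for this sketch.
edges = [
    {"id": "e1", "from_id": "v1", "to_id": "v2", "value": 0.5},
]


def referred_by(person_id):
    """Return the names of people reached over one outgoing referral edge."""
    return [
        vertices[edge["to_id"]]["name"]
        for edge in edges
        if edge["from_id"] == person_id
    ]


print(referred_by("v1"))
```

A graph query engine answers the same kind of question without copying the rows out of the tables first, which is the point of the demo.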
The demo assumes that PuppyGraph has been deployed at `localhost` according to the instructions in Launching PuppyGraph from AWS Marketplace or Launching PuppyGraph in Docker. It also assumes that the firewall rules of the machine and of Redshift are set up to allow PuppyGraph to access Redshift.
In this demo, we use the username `puppygraph` and password `puppygraph123`.
ID | Age | Name |
---|---|---|
The demo uses people and referral information as shown above.
The following steps create tables and insert data into Redshift using the Amazon Redshift query editor v2. We assume that the Redshift environment has been set up and that you connect with a user name and password.
First, create a database with the query editor.
Then, create tables in the database.
Finally, execute the following SQL in the query editor to insert data into the tables.
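As a minimal sketch of these three steps, the Python snippet below prints statements that could be pasted into the query editor. Only the `referral` table name, the `public` schema, and the sample values come from this page. The database name `demo`, the table name `person`, the column types, and the `referral` column names (`ref_id`, `source`, `referred`, `weight`) are assumptions made for illustration. The tables should be created after connecting to the new database.

```python
# Hypothetical names: database "demo", table "person", and the referral
# column names are assumptions; adjust them to your environment.
DATABASE = "demo"

PEOPLE = [("v1", 29, "marko"), ("v2", 27, "vadas")]
REFERRALS = [("e1", "v1", "v2", 0.5)]


def sql_literal(value):
    """Render a Python value as a SQL literal (strings quoted and escaped)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def insert_statement(table, rows):
    """Build one multi-row INSERT statement for the given table."""
    values = ",\n  ".join(
        "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} VALUES\n  {values};"


statements = [
    f"CREATE DATABASE {DATABASE};",
    "CREATE TABLE public.person (id VARCHAR(16), age INT, name VARCHAR(64));",
    "CREATE TABLE public.referral (ref_id VARCHAR(16), source VARCHAR(16), "
    "referred VARCHAR(16), weight DOUBLE PRECISION);",
    insert_statement("public.person", PEOPLE),
    insert_statement("public.referral", REFERRALS),
]

for statement in statements:
    print(statement)
```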
Now the data is ready in Redshift. We need a PuppyGraph schema before querying it. Let's create a schema file `redshift.json`:
Here are some notes on this schema:
- A catalog `jdbc_redshift` is added to specify the remote data source in Redshift.
  - Set `type` to `redshift`.
  - Set `driverClass` to `com.amazon.redshift.Driver`.
  - Replace `username` and `password` with your actual username and password.
  - Replace `jdbcUri` with your actual JDBC URL.
- The labels of the vertices and edges do not have to be the same as the names of the corresponding tables in Redshift. There is a `mappedTableSource` field in each of the vertex and edge types specifying the actual schema (`public`) and table (`referral`).
- Additionally, the `mappedTableSource` marks meta columns in the tables. For example, the fields `from` and `to` describe which columns in the table form the endpoints of edges.
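The notes above can be sketched as a small Python script that assembles such a schema and prints it as JSON. The names `jdbc_redshift`, `type`, `driverClass`, `username`, `password`, `jdbcUri`, `mappedTableSource`, `from`, `to`, `public`, and `referral` come from the notes. Everything else is an assumption for illustration: the overall nesting, the keys `catalogs`, `jdbc`, `vertices`, `edges`, `label`, and `metaFields`, the labels `person` and `knows`, the `person` table, and the column names. Attribute declarations are left out. Check the PuppyGraph schema reference for the exact layout before relying on it.

```python
import json

CATALOG = "jdbc_redshift"


def table_source(table, meta_fields):
    """Describe which Redshift table backs a vertex or edge type."""
    return {
        "catalog": CATALOG,
        "schema": "public",
        "table": table,
        # "metaFields" is an assumed key name for the meta-column mapping.
        "metaFields": meta_fields,
    }


def build_schema(username, password, jdbc_uri):
    """Assemble the schema as a dict; layout and several keys are assumptions."""
    return {
        "catalogs": [{
            "name": CATALOG,
            "type": "redshift",
            "jdbc": {  # assumed wrapper key
                "username": username,
                "password": password,
                "jdbcUri": jdbc_uri,
                "driverClass": "com.amazon.redshift.Driver",
            },
        }],
        "vertices": [{
            "label": "person",
            "mappedTableSource": table_source("person", {"id": "id"}),
        }],
        "edges": [{
            # The label may differ from the table name ("referral").
            "label": "knows",
            "mappedTableSource": table_source(
                "referral", {"id": "ref_id", "from": "source", "to": "referred"}
            ),
        }],
    }


schema = build_schema(
    "your_username",
    "your_password",
    "jdbc:redshift://<endpoint>:5439/<database>",
)
print(json.dumps(schema, indent=2))
```

Redirecting the printed output to `redshift.json` gives a file to start editing from.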
Now we can upload the schema file `redshift.json` to PuppyGraph with the following shell command, assuming that PuppyGraph is running on `localhost`:
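A Python equivalent of such an upload can be sketched with the standard library alone. The `/schema` path, the use of HTTP basic authentication, and the JSON content type are assumptions here; port 8081 and the demo credentials are taken from this page. The snippet only builds the request, so nothing is sent until `urlopen` is called.

```python
import base64
import urllib.request


def build_upload_request(schema_bytes, username, password,
                         url="http://localhost:8081/schema"):
    """Build (but do not send) a POST request carrying the schema JSON."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=schema_bytes,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Placeholder body; in practice, read the bytes of redshift.json instead.
request = build_upload_request(b"{}", "puppygraph", "puppygraph123")
print(request.get_method(), request.full_url)
```

To actually send it, read the file with `open("redshift.json", "rb").read()`, pass the bytes as `schema_bytes`, and call `urllib.request.urlopen(request)`.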
Connect to PuppyGraph at http://localhost:8081 and start the Gremlin Console from the "Query" section:
Now we have connected to the Gremlin Console. We can query the graph:
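Sample data for reference:

ID | Age | Name |
---|---|---|
v1 | 29 | marko |
v2 | 27 | vadas |

Referral `e1` links `v1` and `v2` with the value 0.5.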