Connecting to Apache Hive
Prerequisites
In this guide, PuppyGraph connects to the tables in an existing Apache Hive server. The guide assumes that the Hive Server is at localhost:10000 and its metastore service is at localhost:9083. See https://hive.apache.org/ for more information if you need to start a Hive server with a metastore. The Hive data must also be stored on HDFS, and the HDFS ports must be accessible.
The guide also assumes that PuppyGraph has been deployed at http://localhost:8081 by following one of the guides in Launching PuppyGraph from AWS Marketplace or Launching PuppyGraph in Docker. This demo uses the username puppygraph and the password puppygraph123.
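Before continuing, it can help to confirm that these services accept TCP connections. The Python sketch below assumes the default hosts and ports listed above. It only checks that a port is open, not that the service behind it is healthy, and it leaves out the HDFS ports because the guide does not name them.

```python
import socket

# Hosts and ports assumed by this guide; adjust them for your deployment.
# HDFS ports are omitted because the guide does not list them.
ENDPOINTS = {
    "Hive Server (HiveServer2)": ("localhost", 10000),
    "Hive Metastore": ("localhost", 9083),
    "PuppyGraph Web GUI": ("localhost", 8081),
}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoints(endpoints: dict = ENDPOINTS) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_reachable(host, port) for name, (host, port) in endpoints.items()}


if __name__ == "__main__":
    for name, ok in check_endpoints().items():
        print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```

A closed port only tells you that nothing is listening there; a service bound to a different interface or port will also show up as not reachable.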
Prepare Data (Optional)
The guide creates the following two tables in the Hive database hive_onhdfs. Skip this step if you want to query existing tables instead.
ID | Age | Name |
---|---|---|
v1 | 29 | marko |
v2 | 27 | vadas |
Use the Hive beeline client to connect to the Hive Server. The command assumes that the Hive home path is /opt/hive. If the Hive Server is not at localhost, change the URL accordingly.
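As a sketch, the connection command can be assembled in Python. It assumes the standard layout where beeline lives at <Hive home>/bin/beeline and that HiveServer2 requires no credentials; if it does, beeline's -n and -p options supply a user name and password.

```python
import shlex


def beeline_command(hive_home: str = "/opt/hive",
                    host: str = "localhost",
                    port: int = 10000) -> list:
    """Build the argv that connects beeline to HiveServer2 over JDBC."""
    jdbc_url = f"jdbc:hive2://{host}:{port}"
    return [f"{hive_home}/bin/beeline", "-u", jdbc_url]


if __name__ == "__main__":
    # Print the equivalent shell command line.
    print(shlex.join(beeline_command()))
    # prints: /opt/hive/bin/beeline -u jdbc:hive2://localhost:10000
```

Passing the list to subprocess.run opens an interactive beeline session; changing host and port covers a Hive Server that is not at localhost.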
Create the tables by typing the following statements in the Hive beeline console.
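The statements for the people table can be sketched as follows. The table name person and the column types (STRING, INT, STRING) are assumptions; only the rows come from the table above. The second table is not generated because its contents are not listed here. INSERT ... VALUES requires Hive 0.14 or later.

```python
# Rows copied from the table above: (ID, Age, Name).
PEOPLE = [("v1", 29, "marko"), ("v2", 27, "vadas")]


def quote(value: str) -> str:
    """Quote a HiveQL string literal, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def person_table_ddl(database: str = "hive_onhdfs",
                     table: str = "person",  # hypothetical table name
                     rows: list = PEOPLE) -> str:
    """Return HiveQL that creates the database and table, then inserts the rows."""
    values = ",\n  ".join(f"({quote(i)}, {age}, {quote(name)})" for i, age, name in rows)
    return "\n".join([
        f"CREATE DATABASE IF NOT EXISTS {database};",
        f"CREATE TABLE IF NOT EXISTS {database}.{table} (id STRING, age INT, name STRING);",
        f"INSERT INTO TABLE {database}.{table} VALUES\n  {values};",
    ])


if __name__ == "__main__":
    print(person_table_ddl())
```

Saving the output to a file, for example person.sql (a hypothetical name), lets you run it non-interactively with beeline -u jdbc:hive2://localhost:10000 -f person.sql.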
Define the Graph
We then define a graph on top of the Hive tables we just created. Create a PuppyGraph schema file named hive_hdfs.json with the following content:
The schema defines a Hive catalog:
- The name hive_hdfs defines a reference within the JSON schema. The vertex and edge definitions use it to refer to this catalog.
- The catalog type must be hive, and its metastore type must be HMS.
- The metastore.hiveMetastoreUrl field specifies the URL of the Hive Metastore Service. Change the hostname if the metastore is not deployed at localhost.
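The catalog portion of the schema can be sketched as a Python dictionary serialized to JSON. It assumes the catalog sits in a top-level "catalogs" list of the schema and that the metastore URL uses the thrift:// scheme common to Hive metastore URIs. The vertex and edge definitions are omitted, so the output is not a complete schema file.

```python
import json


def hive_catalog(name: str = "hive_hdfs",
                 metastore_host: str = "localhost",
                 metastore_port: int = 9083) -> dict:
    """Build a Hive catalog entry with the fields described above."""
    return {
        # Referenced by name from the vertex and edge definitions.
        "name": name,
        # The catalog type must be "hive".
        "type": "hive",
        "metastore": {
            # The metastore type must be "HMS" (Hive Metastore Service).
            "type": "HMS",
            # Assumed thrift:// scheme, the usual form of a Hive metastore URI.
            "hiveMetastoreUrl": f"thrift://{metastore_host}:{metastore_port}",
        },
    }


if __name__ == "__main__":
    # Print the catalog object as JSON for inclusion in hive_hdfs.json.
    print(json.dumps(hive_catalog(), indent=2))
```

Passing metastore_host is the scripted equivalent of changing the hostname when the metastore is not at localhost.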
Once the schema file hive_hdfs.json is created, upload it through the PuppyGraph Web GUI at http://localhost:8081 or with the following shell command:
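The upload can also be scripted. This sketch assumes the schema is accepted as a JSON POST to a /schema path on the Web GUI port, authenticated with HTTP Basic auth using the demo credentials. The path is an assumption, so check it against the PuppyGraph documentation for your version before relying on it.

```python
import base64
import os
import urllib.request

# Assumed endpoint; verify against the PuppyGraph documentation.
SCHEMA_URL = "http://localhost:8081/schema"


def schema_upload_request(body: bytes, url: str = SCHEMA_URL,
                          user: str = "puppygraph",
                          password: str = "puppygraph123") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the schema JSON with Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def main(path: str = "hive_hdfs.json") -> None:
    if not os.path.exists(path):
        print(f"{path} not found; create the schema file first")
        return
    with open(path, "rb") as f:
        request = schema_upload_request(f.read())
    # Sends the request; urlopen raises urllib.error.HTTPError on an error status.
    with urllib.request.urlopen(request, timeout=30) as response:
        print(response.status, response.read().decode("utf-8"))


if __name__ == "__main__":
    main()
```

Keeping request construction separate from sending makes it easy to inspect the headers before anything reaches the server.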
Query the Graph
Connect to the PuppyGraph Web GUI at http://localhost:8081 and start a Gremlin console by clicking the "Start query" button:
Now we are connected to the Gremlin console. To query the graph defined on top of the Hive tables, run the following query, which finds the names of people known by someone:
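To see what such a query computes in relational terms, here is a Python sketch. The Gremlin text assumes a vertex label person and an edge label knows, and the KNOWS edge rows are invented purely for illustration; only the ID and name pairs come from the table in Prepare Data.

```python
# Hypothetical query text: the labels "person" and "knows" are assumptions.
GREMLIN_QUERY = "g.V().hasLabel('person').out('knows').values('name')"

# ID -> Name, copied from the table in "Prepare Data".
PEOPLE = {"v1": "marko", "v2": "vadas"}

# Hypothetical edge rows (from_id, to_id), for illustration only.
KNOWS = [("v1", "v2")]


def names_known_by_someone(people: dict, knows: list) -> list:
    """Relational reading of the traversal: start at person vertices, follow each
    outgoing 'knows' edge, and emit the target's name. Like Gremlin's out(), this
    yields one result per edge, so a person known by several people repeats."""
    return [people[dst] for src, dst in knows if src in people and dst in people]


if __name__ == "__main__":
    print(GREMLIN_QUERY)
    print(names_known_by_someone(PEOPLE, KNOWS))
```

With the single hypothetical edge v1 to v2, the function returns ['vadas']. Returning a list rather than a set mirrors Gremlin, which emits one traverser per edge.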
The result looks like this: