Posted in Graph Database

Titan: more about examples and confs

I have been quite busy with visas these days, so this update is very late.

My deadline is almost here, so I think I will need to work over Christmas. My team and I will try to test some scalable graph algorithms on top of Titan. Besides the algorithm I have been working on, I also need to focus on configuration and deployment.

Titan works with these storage backends:
* Cassandra (the one I will use)
* HBase
* BerkeleyDB

And with these index backends, which enable complex queries:
* Elasticsearch
* Lucene

Applications can interact with Titan through:
* Titan's Java API (Blueprints API).
* TinkerPop stack utilities built atop Blueprints (Gremlin query language, Rexster graph server).

About Transactions
Methods on a TitanGraph instance perform a ThreadLocal lookup to retrieve or create a transaction associated with the calling thread.

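Vertices are automatically transitioned into the new transaction after a commit, but edges are not; an edge has to be refreshed explicitly before it is used again.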

TitanGraph g = TitanFactory.open("berkeleyje:/tmp/titan");
Vertex juno = g.addVertex(null); //Automatically opens a new transaction
g.commit(); //Ends transaction
juno.setProperty("name", "juno"); //Vertex is automatically transitioned
Edge e = juno.addEdge("knows",g.addVertex(null));
g.commit(); //Ends transaction
e = g.getEdge(e); //Need to refresh edge
e.setProperty("time", 99);
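The ThreadLocal lookup described above can be pictured with a small plain-Java sketch. It does not use Titan at all: `Tx` and `Graph` are hypothetical stand-ins I made up for illustration, and the real implementation surely does much more. It only shows the idea that each calling thread gets (or creates) its own transaction, and that a commit ends it.

```java
class ThreadLocalTxSketch {

    // Hypothetical stand-in for a transaction; not a Titan class.
    static final class Tx {
        boolean open = true;
    }

    // Hypothetical stand-in for a graph whose methods look up a per-thread transaction.
    static final class Graph {
        private final ThreadLocal<Tx> current = new ThreadLocal<>();

        // Retrieve the calling thread's transaction, creating one if none is open.
        Tx currentTx() {
            Tx tx = current.get();
            if (tx == null) {
                tx = new Tx();
                current.set(tx);
            }
            return tx;
        }

        // End the calling thread's transaction; the next lookup opens a new one.
        void commit() {
            Tx tx = current.get();
            if (tx != null) {
                tx.open = false;
                current.remove();
            }
        }
    }

    // Ask for the current transaction from a second thread and return what it saw.
    static Tx txSeenByOtherThread(Graph g) {
        Tx[] seen = new Tx[1];
        Thread t = new Thread(() -> seen[0] = g.currentTx());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        Graph g = new Graph();
        Tx first = g.currentTx();
        System.out.println(first == g.currentTx());          // true: same thread, same transaction
        System.out.println(first == txSeenByOtherThread(g)); // false: another thread gets its own
        g.commit();
        System.out.println(first == g.currentTx());          // false: commit ended the old one
    }
}
```

In this sketch an object created before the commit still points at the old, closed transaction, which is a rough picture of why something has to be refreshed or transitioned after a commit.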

Multi-threaded transactions
Titan supports multi-threaded transactions through the Blueprints ThreadedTransactionalGraph interface.

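TransactionalGraph tx = g.newTransaction();
Thread[] threads = new Thread[10];
for (int i=0;i<threads.length;i++) {
    threads[i]=new Thread(new DoSomething(tx));
    threads[i].start(); //Start each worker
}
for (int i=0;i<threads.length;i++) threads[i].join(); //Wait for all workers
tx.commit(); //Commit the shared transaction once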
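The snippet above does not say what DoSomething looks like, so here is a rough plain-Java sketch of the pattern, assuming the worker simply receives the shared transaction and writes through it. `SharedTx` is a hypothetical stand-in, not a Titan or Blueprints class, and the thread-safe queue only imitates a transaction that is not bound to any one thread.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

class SharedTxSketch {

    // Hypothetical stand-in for a thread-independent transaction; not a Titan class.
    static final class SharedTx {
        private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();

        // Record a pending write; safe to call from several threads at once.
        void addVertex(String name) {
            pending.add(name);
        }

        // "Commit" by reporting how many writes were collected.
        int commit() {
            return pending.size();
        }
    }

    // One possible shape for a DoSomething worker: it only uses the transaction it was given.
    static final class DoSomething implements Runnable {
        private final SharedTx tx;

        DoSomething(SharedTx tx) {
            this.tx = tx;
        }

        @Override
        public void run() {
            tx.addVertex(Thread.currentThread().getName());
        }
    }

    // Start n workers on one shared transaction, wait for all of them, then commit once.
    static int runWorkers(int n) {
        SharedTx tx = new SharedTx();
        Thread[] threads = new Thread[n];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new DoSomething(tx));
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join(); // every worker must finish before the commit
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return tx.commit();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(10)); // prints 10
    }
}
```

The point to notice is the order: all threads are started, all are joined, and only then is the single shared transaction committed.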

Embedded Mode of Titan and Cassandra (I will use this mode in the future)

Titan and Cassandra run in the same JVM, which improves performance. Titan internally starts a Cassandra daemon and connects to its own cluster.
To use this mode, set embeddedcassandra as the storage backend.
When running Titan in embedded mode, the Cassandra yaml file is configured using the additional configuration option storage.conf-file, which specifies the yaml file as a full URL, e.g. storage.conf-file = file:///home/cassandra.yaml.


    (From the Titan documentation.)
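Put together, the embedded-mode settings would look roughly like the fragment below. This is only a sketch of a Titan properties file: storage.backend is the option name I understand Titan uses for choosing the backend, and the yaml path is just the example from the documentation, so adjust both to your own setup.

```properties
# Run Cassandra inside the same JVM as Titan
storage.backend=embeddedcassandra
# Full URL of the Cassandra yaml file (example path from the documentation)
storage.conf-file=file:///home/cassandra.yaml
```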

I then looked through the .yaml file; it defines the hosts and port.

How to start and stop properly?
If you are using titan-hadoop-1.0.0, you can start everything (Titan, Elasticsearch and Cassandra) with a single command:

>> bin/ start

Forking Cassandra...
Running `nodetool statusthrift`. OK (returned exit status 0 and printed string "running").
Forking Elasticsearch...
Connecting to Elasticsearch ( OK (connected to
Forking Gremlin-Server...
Connecting to Gremlin-Server ( OK (connected to
Run to connect.

Then start the Gremlin console to connect.

In my tests, if I start Elasticsearch and Cassandra separately and then run the start command, it shows the same output.

Use the following to check status:

>> bin/ status
Gremlin-Server (org.apache.tinkerpop.gremlin.server.GremlinServer) is running with pid 10669
Cassandra (org.apache.cassandra.service.CassandraDaemon) does not appear in the java process table
Elasticsearch (org.elasticsearch.bootstrap.Elasticsearch) is running with pid 10597

Use >> bin/ stop to terminate the services. A few days ago I stopped everything only with Ctrl+C; when I then started Cassandra again, the error said that a PID was already in use. I think this was because I had not stopped Cassandra properly, so the old process was still running.


Special thanks to Daniel and Jason, who helped me with my questions on the Gremlin-users Google Group. We fixed the g.V() problem, which turned out to be a version problem. (Do check the version before you start working.)



Keep calm and update blog.
