I have been busy with visa applications lately, so this update is quite late.
My deadline is approaching, so I will probably need to work over Christmas. My team and I will test some scalable graph algorithms on top of Titan. Besides the algorithm I was working on, I also need to focus on configuration and deployment.
Titan supports these storage backends:
* Cassandra (the one I will use)
* HBase
* BerkeleyDB
Index backends (these enable more complex queries):
* Elasticsearch
* Lucene
Applications can interact with Titan through:
* Titan’s Java API (Blueprints API).
* TinkerPop stack utilities built on top of Blueprints (the Gremlin query language, the Rexster graph server).
About Transactions
Methods on a TitanGraph instance perform a ThreadLocal lookup to retrieve or create a transaction associated with the calling thread.
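After a commit, vertices are automatically transitioned into the new transaction, but edges are not: an edge from a closed transaction must be refreshed before it can be used again.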
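```java
import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Vertex;

TitanGraph g = TitanFactory.open("berkeleyje:/tmp/titan");
Vertex juno = g.addVertex(null); // Automatically opens a new transaction
g.commit(); // Ends transaction
juno.setProperty("name", "juno"); // Vertex is automatically transitioned to the new transaction
Edge e = juno.addEdge("knows", g.addVertex(null));
g.commit(); // Ends transaction
e = g.getEdge(e); // Edges are not transitioned, so the edge must be refreshed
e.setProperty("time", 99);
g.commit(); // Persist the edge property
```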
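The ThreadLocal lookup described above can be illustrated with a small stand-alone sketch. It does not use Titan: `ThreadBoundGraph` and `Tx` are hypothetical stand-ins that only show the pattern, where each calling thread gets its own transaction on first use and `commit()` closes the current thread's transaction so that the next call opens a fresh one.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadBoundGraphDemo {

    // Hypothetical transaction object: only records an id and whether it is open.
    static final class Tx {
        final int id;
        boolean open = true;
        Tx(int id) { this.id = id; }
    }

    // Hypothetical graph that binds one transaction to each calling thread,
    // mimicking the ThreadLocal lookup described for TitanGraph.
    static final class ThreadBoundGraph {
        private final AtomicInteger nextId = new AtomicInteger();
        private final ThreadLocal<Tx> current = new ThreadLocal<>();

        // Retrieve the calling thread's transaction, creating one if needed.
        Tx tx() {
            Tx t = current.get();
            if (t == null || !t.open) {
                t = new Tx(nextId.incrementAndGet());
                current.set(t);
            }
            return t;
        }

        // End the calling thread's transaction; the next tx() call opens a new one.
        void commit() {
            Tx t = current.get();
            if (t != null) {
                t.open = false;
                current.remove();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadBoundGraph g = new ThreadBoundGraph();
        Tx first = g.tx();                          // opens a transaction for the main thread
        System.out.println(first.id == g.tx().id);  // same thread reuses the same transaction
        g.commit();
        System.out.println(first.id != g.tx().id);  // after commit, a new transaction is opened

        // Another thread gets its own, separate transaction.
        int[] otherId = new int[1];
        Thread other = new Thread(() -> otherId[0] = g.tx().id);
        other.start();
        other.join();
        System.out.println(otherId[0] != g.tx().id);
    }
}
```

Running `main` prints `true` three times. The real TitanGraph does much more (it also tracks the transaction's reads and writes), but the per-thread binding follows this shape.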
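Multi-threaded transactions
Titan supports multi-threaded transactions through the Blueprints ThreadedTransactionalGraph interface: g.newTransaction() returns a transaction that several threads can work in concurrently, and it is committed once all of them have finished.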
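```java
import com.tinkerpop.blueprints.TransactionalGraph;

// DoSomething is a user-defined Runnable (not shown) that works on the shared transaction
TransactionalGraph tx = g.newTransaction();
Thread[] threads = new Thread[10];
for (int i = 0; i < threads.length; i++) {
    threads[i] = new Thread(new DoSomething(tx));
    threads[i].start();
}
for (int i = 0; i < threads.length; i++) threads[i].join(); // Wait for every thread
tx.commit(); // Commit once, after all threads have finished
```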
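The snippet above depends on a `DoSomething` class that is not shown. Here is a self-contained sketch of the same shape (one transaction handed to ten threads, join all, commit once) without Titan. `SharedTx` and this `DoSomething` are hypothetical: the transaction simply buffers writes in a thread-safe queue and applies them on `commit()`, and `DoSomething` takes an extra worker index that the original does not have.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class SharedTxDemo {

    // Hypothetical transaction that many threads may write into concurrently.
    // Writes are buffered in a thread-safe queue and applied only on commit().
    static final class SharedTx {
        private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
        private final List<String> committed = new ArrayList<>();

        void addVertex(String name) { pending.add(name); }

        // Called once, from a single thread, after all workers have finished.
        List<String> commit() {
            String v;
            while ((v = pending.poll()) != null) committed.add(v);
            return committed;
        }
    }

    // Hypothetical stand-in for the DoSomething Runnable in the Titan example.
    static final class DoSomething implements Runnable {
        private final SharedTx tx;
        private final int worker;
        DoSomething(SharedTx tx, int worker) { this.tx = tx; this.worker = worker; }
        @Override public void run() { tx.addVertex("v" + worker); }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedTx tx = new SharedTx();
        Thread[] threads = new Thread[10];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new DoSomething(tx, i));
            threads[i].start();
        }
        for (Thread t : threads) t.join();      // wait for every worker before committing
        System.out.println(tx.commit().size()); // prints 10
    }
}
```

The key design point is the same as in the Titan version: workers only write into the shared transaction, and the single commit happens on the coordinating thread after `join()`.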
Embedded mode of Titan and Cassandra (I plan to use this mode later)
In embedded mode, Titan and Cassandra run in the same JVM. Because they communicate through in-process calls instead of over the network, this can improve performance. Titan internally starts a Cassandra daemon, which forms its own cluster rather than connecting to an existing one.
To use this mode, set the storage backend to embeddedcassandra.
When running Titan in embedded mode, the Cassandra yaml file is configured using the additional configuration option storage.conf-file, which specifies the yaml file as a full URL, e.g. storage.conf-file = file:///home/cassandra.yaml.
(Source: Titan documentation.)
I then looked at the .yaml file; it defines the hosts and ports.
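As a sketch, the two embedded-mode settings can be collected into a properties file; Titan's TitanFactory.open(...) also accepts the path of such a file. The helper below uses only java.util.Properties and java.net.URI. The class and method names are my own, and the check for a URL scheme is just a guard for the documented "full URL" requirement, not something Titan itself provides.

```java
import java.net.URI;
import java.util.Properties;
import java.util.TreeSet;

public class EmbeddedConfigDemo {

    // Build the two options needed for embedded Cassandra. The yaml location must be
    // a full URL (e.g. file:///home/cassandra.yaml), so reject values without a scheme.
    static Properties embeddedCassandraConfig(String yamlUrl) {
        URI uri = URI.create(yamlUrl);
        if (uri.getScheme() == null) {
            throw new IllegalArgumentException("storage.conf-file must be a full URL: " + yamlUrl);
        }
        Properties p = new Properties();
        p.setProperty("storage.backend", "embeddedcassandra");
        p.setProperty("storage.conf-file", uri.toString());
        return p;
    }

    public static void main(String[] args) {
        Properties p = embeddedCassandraConfig("file:///home/cassandra.yaml");
        // Print in key=value form, sorted, as it would appear in a properties file.
        for (String key : new TreeSet<>(p.stringPropertyNames())) {
            System.out.println(key + "=" + p.getProperty(key));
        }
    }
}
```

Running it prints:

    storage.backend=embeddedcassandra
    storage.conf-file=file:///home/cassandra.yaml

A plain path such as /home/cassandra.yaml is rejected, which catches the most likely mistake early.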
How to start and stop properly?
If you are using titan-hadoop-1.0.0, you can start everything (Titan, Elasticsearch and Cassandra) with a single command:
```
>> bin/titan.sh start
Forking Cassandra...
Running `nodetool statusthrift`. OK (returned exit status 0 and printed string "running").
Forking Elasticsearch...
Connecting to Elasticsearch (127.0.0.1:9300)......... OK (connected to 127.0.0.1:9300).
Forking Gremlin-Server...
Connecting to Gremlin-Server (127.0.0.1:8182)...... OK (connected to 127.0.0.1:8182).
Run gremlin.sh to connect.
```
Then run gremlin.sh to start the Gremlin console.
In my tests, starting Elasticsearch and Cassandra separately and then running gremlin.sh gives the same result.
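The start script reports each service as OK once it can connect to its port (127.0.0.1:9300 for Elasticsearch, 127.0.0.1:8182 for Gremlin-Server). When starting services separately, a similar check can be done by hand. This is a minimal sketch with plain java.net.Socket; the method name and the 500 ms timeout are my own choices, and it only tells you that something is listening on the port, not that it is the right service.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheckDemo {

    // Return true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // connection refused, unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        // Ports taken from the bin/titan.sh start output.
        System.out.println("Elasticsearch 9300: " + isListening("127.0.0.1", 9300, 500));
        System.out.println("Gremlin-Server 8182: " + isListening("127.0.0.1", 8182, 500));
    }
}
```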
Use the following to check status:
```
>> bin/titan.sh status
Gremlin-Server (org.apache.tinkerpop.gremlin.server.GremlinServer) is running with pid 10669
Cassandra (org.apache.cassandra.service.CassandraDaemon) does not appear in the java process table
Elasticsearch (org.elasticsearch.bootstrap.Elasticsearch) is running with pid 10597
```
Use bin/titan.sh stop to terminate the services. A few days ago I stopped everything with Ctrl+C only; when I then started Cassandra, it reported that a PID was already in use. I believe this was because Cassandra had not been stopped properly, so the old process was still running.
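To make sure an old process really is gone before restarting, one option is to check whether a recorded PID still belongs to a running process. This is a minimal sketch using Java 9's ProcessHandle. The pid-file path /tmp/cassandra.pid is a hypothetical placeholder, since where (and whether) a PID is recorded depends on how Cassandra was started; the PIDs printed by bin/titan.sh status can also be passed to isAlive directly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StalePidDemo {

    // True if a process with this PID currently exists and is running.
    static boolean isAlive(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    // Read a PID from a pid file (a single number, possibly followed by a newline).
    static long readPid(Path pidFile) throws IOException {
        return Long.parseLong(new String(Files.readAllBytes(pidFile)).trim());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default location; pass the real pid file path as the first argument.
        Path pidFile = Paths.get(args.length > 0 ? args[0] : "/tmp/cassandra.pid");
        if (!Files.exists(pidFile)) {
            System.out.println("No pid file at " + pidFile);
            return;
        }
        long pid = readPid(pidFile);
        if (isAlive(pid)) {
            System.out.println("PID " + pid + " is still running; stop it before starting again.");
        } else {
            System.out.println("PID " + pid + " is stale; the old process has exited.");
        }
    }
}
```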
Special thanks to Daniel and Jason, who answered my questions on the gremlin-users Google Group. Together we fixed the problem with g.V(), which turned out to be a version problem. (Do check your versions before you start working.)
References:
C2012-Titan-MatthiasBroecheler.pdf
https://groups.google.com/forum/#!topic/gremlin-users/keD42JsAam8
http://tinkerpop.incubator.apache.org/docs/3.0.2-incubating/#_the_graph_process
EC2:
http://s3.thinkaurelius.com/docs/titan/0.5.4/cassandra.html