twitter/cassovary

Name: cassovary

Owner: Twitter, Inc.

Description: Cassovary is a simple big graph processing library for the JVM

Created: 2012-02-21 23:22:30.0

Updated: 2018-01-09 21:14:17.0

Pushed: 2017-09-07 19:10:16.0

Homepage: http://twitter.com/cassovary

Size: 53535

Language: Scala


README

Cassovary

Maven Central Build Status

Cassovary is a simple “big graph” processing library for the JVM. Most JVM-hosted graph libraries are flexible but not space efficient. Cassovary is designed from the ground up to efficiently handle graphs with billions of nodes and edges. A typical use is large-scale graph mining and analysis of a big network. Cassovary is written in Scala and can be used with any JVM-hosted language. It comes with some common data structures and algorithms.
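To illustrate why array-based storage matters at this scale, here is a minimal sketch of a compressed-sparse-row (CSR) adjacency layout, in which every out-edge costs one 4-byte `Int` in a shared array rather than a separate heap object. This is not Cassovary's API: the names `CsrGraph`, `fromEdges`, `outDegree`, and `outNeighbors` are hypothetical and chosen only for illustration.

```scala
// Hypothetical illustration of a compressed-sparse-row (CSR) directed graph.
// NOT Cassovary's API; the names below are made up for this sketch.
final class CsrGraph private (
    offsets: Array[Int], // node i's out-edges live in targets(offsets(i) until offsets(i + 1))
    targets: Array[Int]  // all out-edge targets, grouped by source node
) {
  def nodeCount: Int = offsets.length - 1
  def edgeCount: Int = targets.length

  def outDegree(node: Int): Int = offsets(node + 1) - offsets(node)

  // Copies out the neighbor slice; a production library would avoid the copy.
  def outNeighbors(node: Int): Array[Int] =
    targets.slice(offsets(node), offsets(node + 1))
}

object CsrGraph {
  // Builds a graph over node ids 0 until n from (source, target) pairs.
  def fromEdges(n: Int, edges: Seq[(Int, Int)]): CsrGraph = {
    // Pass 1: count out-degrees.
    val degree = new Array[Int](n)
    edges.foreach { case (src, _) => degree(src) += 1 }
    // Prefix sums turn degrees into start offsets; offsets(n) == edge count.
    val offsets = new Array[Int](n + 1)
    for (i <- 0 until n) offsets(i + 1) = offsets(i) + degree(i)
    // Pass 2: place each target at the next free slot of its source node.
    val targets = new Array[Int](edges.length)
    val cursor = offsets.clone()
    edges.foreach { case (src, dst) =>
      targets(cursor(src)) = dst
      cursor(src) += 1
    }
    new CsrGraph(offsets, targets)
  }
}
```

For example, `CsrGraph.fromEdges(4, Seq(0 -> 1, 0 -> 2, 2 -> 3, 3 -> 0))` gives `outDegree(0) == 2` and `outNeighbors(0).toList == List(1, 2)`. Storing a graph this way costs roughly one `Int` per edge plus one per node, instead of one object per node and per edge.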

Please follow the Cassovary project on Twitter at @cassovary for updates.

Quick Start and Examples

After cloning the repository, type `./sbt`, which will download the sbt launch jar and launch the sbt console. Then type the following in the console:

There is an included subproject called cassovary-examples containing simple Java and Scala examples of using the library. See the README in that subproject to get started with these examples.

Other subprojects worth checking out are cassovary-benchmarks, which helps benchmark some graph algorithms, and cassovary-server, which exposes Cassovary on a web server.

Building

Cassovary is built using sbt and was last tested with sbt version 0.13.9.

Alternatives for using Cassovary in local projects

Using the Maven-published version of the library

Starting with Cassovary version 7.0.0, Cassovary is published to Maven Central built against Scala 2.11.8. The latest published Cassovary version that also works with Scala 2.10 is 6.4.0. Please see the latest version number (such as 7.1.0) shown on the Maven Central image at the top of this README.

To use with sbt, substitute the latest version number and use:


The last Cassovary version to support Scala 2.9 is 3.4.0; support for Scala 2.9.x has since been discontinued. The last Cassovary version to support Scala 2.10 is 6.4.0; support for Scala 2.10.x has since been discontinued. Cassovary also requires Java 7+; the last Cassovary version to support Java 6 was 3.4.0.

The only dependency that Cassovary uses which is not bundled with it (because of its size) is `it.unimi.dsi.fastutil`. You can add that dependency in your sbt project as follows:

Comparison to Other Graph Libraries

There are many excellent graph mining libraries already in existence. Most of them have one or more of the following characteristics:

1. Written in C/C++. Examples include [SNAP](http://snap.stanford.edu/) from Stanford and [GraphLab](http://graphlab.org/) from CMU. The typical way to use these from the JVM is through JNI bridges.
2. Sacrifice storage efficiency for flexibility. Examples include [JUNG](http://jung.sourceforge.net/), which is written in Java but stores nodes and edges as big objects.
3. Are meant to do much more, typically serving as a full graph database. Examples include [Neo4J](http://neo4j.org).

On the other hand, Cassovary is intended to be easy to use in a JVM-hosted environment and yet efficient enough to scale to billions of edges. It is deliberately not designed to provide any persistence or database functionality. It also currently skips any concerns of partitioning the graph and hence is not directly comparable to distributed graph processing systems like [Apache Giraph](http://incubator.apache.org/giraph/). This allows complex algorithms to be run on the graph efficiently, which is otherwise a recurring issue with distributed graph processing systems because good graph partitions are known to be difficult to achieve. On the flip side, the size of the graph it works with is bounded by the memory available in a single machine, though the use of space-efficient data structures does not seem to make this a limitation for most practical graphs.

For example, a `SharedArrayBasedDirectedGraph` instance of a unidirectional graph with 10M nodes and 1B edges consumes less than 6GB of memory, and memory usage scales linearly beyond that. Other memory usage data points can be checked by running `bash cassovary-examples/src/main/bash/load-graph-examples.sh`. As the script shows, a randomly generated unidirectional directed graph with 0.5M nodes and 10M edges can be built with 60MB of memory, and one with 5M nodes and 100M edges with 500MB. Loading both directions of those graphs takes 120MB and 1.1GB of memory, respectively.
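The figures above are consistent with a simple back-of-the-envelope estimate. The sketch below computes a lower bound for storing one edge direction, assuming a CSR-like layout with one 4-byte `Int` per edge plus one 4-byte offset per node. This layout is an assumption for illustration, not a description of Cassovary's internals, and the `MemoryEstimate` object and `lowerBoundBytes` function are hypothetical names.

```scala
// Back-of-the-envelope LOWER BOUND for storing one edge direction, under the
// ASSUMPTION of a CSR-like layout: one 4-byte Int per edge plus one 4-byte
// Int offset per node (n + 1 offsets). Ignores JVM headers and other overhead.
object MemoryEstimate {
  val BytesPerInt: Long = 4L

  def lowerBoundBytes(nodes: Long, edges: Long): Long =
    edges * BytesPerInt + (nodes + 1) * BytesPerInt

  def main(args: Array[String]): Unit = {
    val graphs = Seq(
      (10000000L, 1000000000L), // 10M nodes, 1B edges
      (500000L, 10000000L),     // 0.5M nodes, 10M edges
      (5000000L, 100000000L)    // 5M nodes, 100M edges
    )
    for ((n, e) <- graphs) {
      val mb = lowerBoundBytes(n, e) / 1e6
      println(f"$n%d nodes, $e%d edges: at least $mb%.1f MB per direction")
    }
  }
}
```

For 10M nodes and 1B edges this gives about 4.04GB, under the quoted 6GB. For the 10M-edge and 100M-edge graphs it gives about 42MB and 420MB, below the measured 60MB and 500MB; doubling for both directions (about 84MB and 840MB) likewise stays below the measured 120MB and 1.1GB. The remaining gap is plausibly JVM overhead and other bookkeeping that this sketch does not model.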


Mailing list

http://groups.google.com/group/twitter-cassovary

Please follow the Cassovary project on Twitter at [@cassovary](https://twitter.com/cassovary) for updates.

Bugs

Please report any bugs to: <https://github.com/twitter/cassovary/issues>

Acknowledgments

Thanks to all the [contributors](https://github.com/twitter/cassovary/graphs/contributors) of Cassovary.

We use the [Yourkit](http://yourkit.com) Java Profiler for profiling and tuning Cassovary. [![Yourkit logo](http://projects.collide.info/attachments/download/1289/yklogo.png)](http://yourkit.com)

License

Copyright 2016 Twitter, Inc.

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.