Tuesday, February 25, 2014

Cascading extensions for Accumulo

I recently had the opportunity to extend Cascading to read from and write to Accumulo.
Versions used: Cascading 2.5.2 and Accumulo 1.5.0.

The source code is at:
https://github.com/airawat/cascading.accumulo

Examples of using the AccumuloTap are at:
https://github.com/airawat/cascading.accumulo.examples

The examples cover the following functionality:
1.  Querying Accumulo from Cascading.
2.  Performing Accumulo table operations from Cascading: create table, create table with splits, check if table exists, delete table, and flush.
3.  Dumping data in Accumulo to HDFS from Cascading.
4.  Exporting data in Accumulo to HDFS, after transposing it to a flat, delimited format with column headers.
5.  Importing flat, delimited data in HDFS into Accumulo.
6.  Reading data in Accumulo and writing it back to Accumulo.
7.  Exporting data in Accumulo to MySQL.
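
To make the "transposing to a flat, delimited format with column headers" step in item 4 concrete, here is a minimal plain-Java sketch of the idea. It is not the code from the repositories above and does not use the AccumuloTap or any Cascading or Accumulo API. It assumes each Accumulo entry has been reduced to a (row ID, column qualifier, value) triple; the class TransposeSketch, the Cell type, and the "rowId" header name are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class TransposeSketch {

    /** Hypothetical simplification of one Accumulo entry: row ID, column qualifier, value. */
    public static final class Cell {
        final String rowId;
        final String qualifier;
        final String value;

        public Cell(String rowId, String qualifier, String value) {
            this.rowId = rowId;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Pivots key/value cells into one delimited line per row, preceded by a header line. */
    public static List<String> toDelimited(List<Cell> cells, String delimiter) {
        // Group values by row ID; TreeMap keeps the rows sorted by row ID.
        Map<String, Map<String, String>> rows = new TreeMap<>();
        // Column headers, in order of first appearance.
        Set<String> columns = new LinkedHashSet<>();
        for (Cell c : cells) {
            rows.computeIfAbsent(c.rowId, k -> new HashMap<>()).put(c.qualifier, c.value);
            columns.add(c.qualifier);
        }

        List<String> lines = new ArrayList<>();

        // Header line: the row ID column followed by every qualifier seen.
        StringBuilder header = new StringBuilder("rowId");
        for (String col : columns) {
            header.append(delimiter).append(col);
        }
        lines.add(header.toString());

        // One line per row; a missing column becomes an empty field.
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            StringBuilder line = new StringBuilder(row.getKey());
            for (String col : columns) {
                line.append(delimiter).append(row.getValue().getOrDefault(col, ""));
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
                new Cell("emp1", "name", "Ann"),
                new Cell("emp1", "dept", "Eng"),
                new Cell("emp2", "name", "Bob"));
        for (String line : toDelimited(cells, ",")) {
            System.out.println(line);
        }
        // Prints:
        // rowId,name,dept
        // emp1,Ann,Eng
        // emp2,Bob,
    }
}
```

This sketch holds everything in memory and does not escape delimiters that occur inside values, so it only illustrates the shape of the transformation. For how the export is actually done with Cascading, see the examples repository linked above.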