Name: mongo-hadoop
Owner: racker
Description: MongoDB Connector for Hadoop
Forked from: YaroslavLitvinov/mongo-hadoop
Created: 2015-10-19 20:44:51.0
Updated: 2015-10-19 20:44:55.0
Pushed: 2015-12-02 02:24:51.0
Size: 86916
Language: Java
The MongoDB Connector for Hadoop is a library that allows MongoDB (or backup files in its data format, BSON) to be used as an input source or output destination for Hadoop MapReduce tasks. It is designed to allow greater flexibility and performance, and to make it easy to integrate data in MongoDB with other parts of the Hadoop ecosystem.
Check out the releases page for the latest stable release.
Run ./gradlew jar to build the jars. The jars are placed in build/libs for each module; for example, the jar for the core module is generated in the core/build/libs directory.
After a successful build, you must copy the jars to the lib directory on each node in your Hadoop cluster. This is usually one of the following locations, depending on which Hadoop release you are using:
$HADOOP_HOME/lib/
$HADOOP_HOME/share/hadoop/mapreduce/
$HADOOP_HOME/share/hadoop/lib/
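The copy step can be sketched as a small shell helper. This is a minimal sketch, not part of mongo-hadoop: the function names list_hadoop_lib_dirs and install_connector_jars are hypothetical, and the helper only reports which of the three locations above exist on a node. More than one may exist, so the correct target still depends on your Hadoop release.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with mongo-hadoop) for the copy step.

# Print every candidate lib directory that exists under the given Hadoop home.
# Returns non-zero when none of the three known locations is present.
list_hadoop_lib_dirs() {
    hadoop_home=$1
    found=1
    for candidate in lib share/hadoop/mapreduce share/hadoop/lib; do
        if [ -d "$hadoop_home/$candidate" ]; then
            printf '%s\n' "$hadoop_home/$candidate"
            found=0
        fi
    done
    return "$found"
}

# Copy all jars from one module's build/libs directory into the chosen lib directory.
install_connector_jars() {
    build_libs=$1
    lib_dir=$2
    if [ ! -d "$lib_dir" ]; then
        echo "not a directory: $lib_dir" >&2
        return 1
    fi
    cp "$build_libs"/*.jar "$lib_dir"/
}
```

On a node, one might run list_hadoop_lib_dirs "$HADOOP_HOME" to see which locations exist, then install_connector_jars core/build/libs followed by the directory that matches the installed release. The step has to be repeated on every node in the cluster.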
mongo-hadoop should work on any distribution of Hadoop. Should you run into an issue, please file a Jira ticket.
For full documentation, please check out the Hadoop Connector Wiki. The documentation includes installation instructions, configuration options, and specific instructions and examples for each Hadoop application the connector supports.
Amazon Elastic MapReduce is a managed Hadoop framework that allows you to submit jobs to a cluster of customizable size and configuration, without needing to deal with provisioning nodes and installing software.
Using EMR with the MongoDB Connector for Hadoop allows you to run MapReduce jobs against MongoDB backup files stored in S3.
Submitting jobs that use the MongoDB Connector for Hadoop to EMR simply requires that the bootstrap actions fetch the dependencies (MongoDB Java driver, mongo-hadoop-core libs, etc.) and place them into the Hadoop distribution's lib folders.
For a full example (running the Enron example on Elastic MapReduce), please see here.
If your code introduces new features, add tests that cover them if possible and make sure that ./gradlew check still passes. For instructions on how to run the tests, see the Running the Tests section in the wiki.
If you're not sure how to write a test for a feature or have trouble with a test failure, please post details to the Google group and we will try to help. Note: Until findbugs updates its dependencies, running ./gradlew check on Java 8 will fail.
Luke Lovett (luke.lovett@mongodb.com)
Issue tracking: https://jira.mongodb.org/browse/HADOOP/
Discussion: http://groups.google.com/group/mongodb-user/