Name: mongo_to_hive_mapping
Owner: racker
Description: Gets a MongoDB collection's structure in JSON format and creates HiveQL CREATE TABLE statements based on that JSON input.
Forked from: YaroslavLitvinov/mongo_to_hive_mapping
Created: 2015-10-19 20:44:13.0
Updated: 2016-03-09 18:25:24.0
Pushed: 2016-03-09 18:27:08.0
Size: 33
Language: Python
1. To get a MongoDB collection's structure in JSON format, use get_mongo_schema_as_json.py:
example: python get_mongo_schema_as_json.py --host localhost:27017 -cn db.collection -of schema.txt
The resulting file schema.txt may look like the following:
{
    "_id": "INT",
    "some_field": "BOOLEAN",
    "data": [
        {
            "_type": "STR",
            "messages": [
                {
                    "date": "TIMESTAMP",
                    "message": {
                        "type": "TINYINT",
                        "text": "STR"
                    }
                }
            ]
        }
    ]
}
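To illustrate how a file like schema.txt can be derived from documents, here is a minimal sketch that infers such a schema from plain Python dicts (for example, documents already fetched with pymongo's `collection.find()`). It is not the script's actual implementation: the function names and the value-to-label mapping (`int` to "INT", `float` to "DOUBLE", and so on) are assumptions, and it never produces narrower types such as "TINYINT". Nested documents stay objects, arrays become a one-element list holding the merged schema of their items, and the schemas of all sampled documents are merged into one.

```python
from datetime import datetime
import json


def scalar_type(value):
    """Map one scalar value to a schema.txt-style type label (assumed mapping)."""
    # bool is tested before int because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, datetime):
        return "TIMESTAMP"
    return "STR"  # strings and anything else (e.g. ObjectId) fall back to STR


def merge(left, right):
    """Combine two schemas observed for the same field; None means 'not seen yet'."""
    if left is None:
        return right
    if right is None:
        return left
    if isinstance(left, dict) and isinstance(right, dict):
        result = dict(left)
        for key, value in right.items():
            result[key] = merge(result.get(key), value)
        return result
    if isinstance(left, list) and isinstance(right, list):
        return [merge(left[0], right[0])]
    return left  # conflicting types: keep the first one seen (simplification)


def infer_schema(value):
    """Describe one value: dicts map field names, lists hold one merged element schema."""
    if isinstance(value, dict):
        return {key: infer_schema(item) for key, item in value.items()}
    if isinstance(value, list):
        element = None  # stays None for an empty array
        for item in value:
            element = merge(element, infer_schema(item))
        return [element]
    return scalar_type(value)


def infer_collection_schema(documents):
    """Merge the schemas of all sampled documents into a single schema."""
    schema = None
    for doc in documents:
        schema = merge(schema, infer_schema(doc))
    return schema


if __name__ == "__main__":
    docs = [
        {"_id": 1, "some_field": True,
         "data": [{"_type": "a", "messages": [
             {"date": datetime(2015, 10, 19), "message": {"type": 1, "text": "hi"}}]}]},
        {"_id": 2, "data": []},
    ]
    print(json.dumps(infer_collection_schema(docs), indent=4))
```

Merging across documents matters because a single document may omit fields (here the second one has no "some_field" and an empty "data" array) that other documents contain.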
2. To get HiveQL scripts that create nested Hive tables and native flat tables, use get_hiveql_create_tables_by_schema.py as in the examples below. When generating the external Hive table, the script uses the contents of template.txt as a template. LATERAL VIEW is used to create the plain (flat) tables from the external table. Unneeded table fields can be filtered out with the '-fexclude' option: provide a file whose lines list the fields to exclude. To write all schema branches to a file, use the '-output-branches' option.
Contents of exclude_list.txt (one field path per line):
some_field
data.messages.message.type
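The exclude list uses dotted paths in which array levels add no path component, so "data.messages.message.type" points into the arrays of schema.txt. The following is a hypothetical sketch of how such paths could be applied to a schema loaded from schema.txt; the function names are illustrative and not part of the tool.

```python
def load_exclude_list(lines):
    """Parse exclude-list lines: one dotted field path per line, blank lines ignored."""
    return {line.strip() for line in lines if line.strip()}


def exclude_paths(schema, excluded, prefix=""):
    """Return a copy of schema without the fields whose dotted path is in excluded.

    Array levels do not add a path component, so "data.messages.message.type"
    addresses schema["data"][0]["messages"][0]["message"]["type"].
    """
    if isinstance(schema, list):
        return [exclude_paths(schema[0], excluded, prefix)]
    if not isinstance(schema, dict):
        return schema  # scalar type label such as "STR"
    result = {}
    for key, value in schema.items():
        path = f"{prefix}.{key}" if prefix else key
        if path in excluded:
            continue  # drops the field and everything nested under it
        result[key] = exclude_paths(value, excluded, path)
    return result
```

In practice the set would come from the file itself, e.g. `with open("exclude_list.txt") as f: excluded = load_exclude_list(f)`, and the filtered schema would then drive table generation.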
example: python get_hiveql_create_tables_by_schema.py -ifs schema.txt -tn records -od hiveql_autogenerated -fexclude exclude_list.txt -output-branches all_branches.txt --mongouri mongodb://localhost:27017/db.collection
example: python get_mongo_schema_as_json.py --host localhost -cn db.collection | python get_hiveql_create_tables_by_schema.py -tn records -od hiveql_autogenerated -fexclude exclude_list.txt -output-branches all_branches.txt --mongouri mongodb://localhost:27017/db.collection
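The idea behind the generated scripts can be sketched as follows: the nested external table keeps MongoDB's structure as Hive STRUCT and ARRAY columns, and a flat table is produced by exploding one chain of nested arrays with LATERAL VIEW. This is an illustrative sketch, not the script's output format. The helper names, the "e1", "t1", ... aliases, the column-naming scheme, and the representation of a branch as a list of array field names are all assumptions; the storage clauses that the tool takes from template.txt are omitted; "STR" is assumed to map to Hive STRING while other labels pass through unchanged; and the backtick quoting has not been checked against a live Hive instance.

```python
HIVE_TYPES = {"STR": "STRING"}  # other labels (INT, BOOLEAN, TIMESTAMP, ...) pass through


def hive_type(schema):
    """Translate a schema node into a nested Hive type string."""
    if isinstance(schema, dict):
        fields = ",".join(f"`{name}`:{hive_type(sub)}" for name, sub in schema.items())
        return f"STRUCT<{fields}>"
    if isinstance(schema, list):
        return f"ARRAY<{hive_type(schema[0])}>"
    return HIVE_TYPES.get(schema, schema)


def create_external_table(table, schema):
    """Column list of the nested external table (template.txt storage clauses omitted)."""
    cols = ",\n".join(f"  `{name}` {hive_type(sub)}" for name, sub in schema.items())
    return f"CREATE EXTERNAL TABLE {table} (\n{cols}\n)"


def scalar_columns(node, expr, name_prefix):
    """Yield (expression, alias) pairs for scalar leaves, following structs but not arrays."""
    for name, sub in node.items():
        if isinstance(sub, dict):
            yield from scalar_columns(sub, f"{expr}.`{name}`", f"{name_prefix}{name}_")
        elif not isinstance(sub, list):
            yield f"{expr}.`{name}`", f"`{name_prefix}{name}`"


def flat_select(table, schema, branch):
    """SELECT that flattens one chain of nested arrays with LATERAL VIEW explode."""
    node, alias = schema, "r"
    select = list(scalar_columns(node, alias, ""))
    views = []
    for depth, field in enumerate(branch, start=1):
        element = f"e{depth}"
        # Each exploded array element becomes a struct column named e1, e2, ...
        views.append(f"LATERAL VIEW explode({alias}.`{field}`) t{depth} AS {element}")
        node, alias = node[field][0], element
        select += scalar_columns(node, alias, f"{field}_")
    cols = ",\n  ".join(f"{expr} AS {col}" for expr, col in select)
    return f"SELECT\n  {cols}\nFROM {table} r\n" + "\n".join(views)
```

For the schema.txt shown earlier, `flat_select("records", schema, ["data", "messages"])` explodes `r.data` and then `e1.messages`, producing one row per message alongside the parent `_id` and `_type` values. Prefixing column names with the array field name reduces, but does not eliminate, the naming conflicts described in the known issues.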
3. Known issues: generated tables may contain duplicate fields because of naming conflicts. In that case, resolve the conflict manually by renaming the field in the generated file.
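Before editing the generated file by hand, it can help to list the conflicting names. The following is a hypothetical helper (not part of the repository) that reports duplicate column names and proposes renames with numeric suffixes. The suffix scheme is an assumption; names are compared case-insensitively because Hive treats column names as case-insensitive.

```python
from collections import Counter


def find_duplicates(names):
    """Return column names that occur more than once, ignoring case as Hive does."""
    counts = Counter(name.lower() for name in names)
    return sorted(name for name, count in counts.items() if count > 1)


def dedupe_columns(names):
    """Keep the first occurrence of each name and suffix later ones with _2, _3, ..."""
    used = set()
    result = []
    for name in names:
        candidate, n = name, 1
        # Keep bumping the suffix until the name no longer collides (case-insensitive)
        while candidate.lower() in used:
            n += 1
            candidate = f"{name}_{n}"
        used.add(candidate.lower())
        result.append(candidate)
    return result
```

Checking each candidate against every name already used, rather than only against the original name, avoids creating a new collision when a column such as `type_2` already exists.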