C3BI-pasteur-fr/taxodb_ncbi

Name: taxodb_ncbi

Owner: C3BI-pasteur-fr

Description: NCBI Taxonomy Database formatter

Created: 2015-10-12 14:27:29.0

Updated: 2016-05-26 15:29:09.0

Pushed: 2016-11-23 09:10:58.0

Size: 34

Language: Python

README

taxodb_ncbi.py is a simple Python script that formats the NCBI Taxonomy database. It requires the bsddb3 Python library and the Berkeley DB library.

INSTALL

  1. Install Berkeley DB

     - Mac OSX

       brew install berkeley-db4

     - Ubuntu/Debian

       apt-get install libdb-dev

     - CentOS

       yum install libdb-devel

  2. Install bsddb3

     pip install bsddb3

  3. Install taxodb_ncbi.py

     python setup.py install

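
Before running the script, it can help to confirm that the bsddb3 module is importable from the Python interpreter you intend to use. The following is a minimal sketch, not part of taxodb_ncbi.py; the helper name `missing_modules` is hypothetical and only top-level module names are supported.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found.

    Hypothetical helper for illustration; it only looks the modules up
    and does not import them.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example: report whether the bsddb3 dependency is available.
missing = missing_modules(["bsddb3"])
if missing:
    print("Missing Python modules: " + ", ".join(missing))
else:
    print("bsddb3 is available")
```

An empty result means the listed modules were found; it does not prove that the underlying Berkeley DB shared library loads correctly, which only an actual import would show.
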
GETTING DATA

taxodb_ncbi.py only requires the 'Taxonomy nodes' (nodes.dmp) and 'Taxonomy names' (names.dmp) files. Both are distributed with the NCBI Taxonomy database.

Download the required files from the NCBI Taxonomy database:

wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
tar zxf taxdump.tar.gz
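
To illustrate what the two input files contain, here is a minimal parsing sketch. It assumes the standard NCBI taxdump layout, in which fields are separated by a tab, a pipe, and a tab, and each row ends with a tab and a pipe. The function names are hypothetical and this is not how taxodb_ncbi.py itself is necessarily implemented.

```python
import io


def parse_dmp_line(line):
    """Split one taxdump row into its stripped fields."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the row terminator
    return [field.strip() for field in line.split("\t|\t")]


def read_nodes(handle):
    """Map tax_id -> (parent_tax_id, rank) from a nodes.dmp-style stream."""
    nodes = {}
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        fields = parse_dmp_line(line)
        nodes[fields[0]] = (fields[1], fields[2])
    return nodes


def read_scientific_names(handle):
    """Map tax_id -> scientific name from a names.dmp-style stream."""
    names = {}
    for line in handle:
        if not line.strip():
            continue
        tax_id, name_txt, _unique_name, name_class = parse_dmp_line(line)[:4]
        if name_class == "scientific name":
            names[tax_id] = name_txt
    return names


# Tiny in-memory samples standing in for the real files.
sample_nodes = io.StringIO("9606\t|\t9605\t|\tspecies\t|\tHS\t|\n")
sample_names = io.StringIO("9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n")
nodes = read_nodes(sample_nodes)
names = read_scientific_names(sample_names)
```

With the real files, the same functions could be called on `open("nodes.dmp")` and `open("names.dmp")`.
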
USAGE

python taxodb_ncbi.py -h
usage: taxodb_ncbi.py [-h] -n file -d file [-k File] [-t File] [-b File]
                      [-f string]

Program uses to format the NCBI Taxonomy Database

optional arguments:
  -h, --help            show this help message and exit

Options:
  -n file, --names file
                        names.dmp from NCBI taxonomy databank (default: None)
  -d file, --nodes file
                        nodes.dmp from NCBI taxonomy databank (default: None)
  -k File, --flatdb File
                        Output file: flat databank format. (default: None)
  -t File, --tab File   Output file: tabulated format. Organism with
                        classification. (default: None)
  -b File, --bdb File   Output file: Berleley db format (default: None)
  -f string, --format string
                        By default, reports only full taxonomy ie taxonomies
                        that have 'species', 'subspecies' or 'no rank' at the
                        final position. Otherwise, reports all taxonomies even
                        if they are partial (default: full)

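
The distinction made by the format option can be sketched as follows. This is an illustration of the rule described in the help text, not the script's actual code: the node table is toy data, the function names are hypothetical, and treating any value other than "full" as "report everything" is an assumption.

```python
# Ranks that mark a taxonomy as "full" when found at the final position.
FINAL_RANKS = {"species", "subspecies", "no rank"}

# Toy node table: tax_id -> (parent_tax_id, rank). Not real NCBI data.
TOY_NODES = {
    "1": ("1", "no rank"),      # the root is its own parent
    "9605": ("1", "genus"),
    "9606": ("9605", "species"),
}


def lineage(tax_id, nodes):
    """Return [(tax_id, rank), ...] ordered from the root down to tax_id."""
    path = []
    seen = set()
    current = tax_id
    while current in nodes and current not in seen:
        seen.add(current)  # guards against cycles in malformed input
        parent, rank = nodes[current]
        path.append((current, rank))
        if parent == current:
            break  # reached the root
        current = parent
    path.reverse()
    return path


def is_full(path):
    """True when the last entry of the lineage has a 'final' rank."""
    return bool(path) and path[-1][1] in FINAL_RANKS


def select(tax_ids, nodes, fmt="full"):
    """Keep only full taxonomies when fmt is 'full', otherwise keep all."""
    if fmt != "full":
        return list(tax_ids)
    return [t for t in tax_ids if is_full(lineage(t, nodes))]
```

In this toy table, the lineage ending at the genus node is partial, so it is dropped under the default setting and kept otherwise.
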
RUNNING

Create the Berkeley DB database together with the associated flat databank and tabulated files:

python taxodb_ncbi.py -n names.dmp -d nodes.dmp -k taxodb_ncbi.out -t taxodb_ncbi.out -b taxodb_ncbi.bdb

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.