NaturalHistoryMuseum/DEAN-PHD-TEXT-COLLECTOR

Name: DEAN-PHD-TEXT-COLLECTOR

Owner: Natural History Museum

Description: null

Created: 2017-10-02 07:27:24.0

Updated: 2017-10-02 07:57:02.0

Pushed: 2017-10-02 08:03:43.0

Homepage: null

Size: 141

Language: Python

README

This is a Python script for collecting text from Science Direct and EoL. From Science Direct it collects the titles and abstracts matching a search query set in the config. From EoL it collects the taxonomy identifications and any textual description information. All text is saved as either XML files (Science Direct) or text files (EoL). This project is still at an early stage of development.

Note: To perform a Science Direct search you must get an API key from Science Direct and insert it into the elapsy/elapsy/config.json file. All other configuration settings for a search or download should go in the text_collector_config.py file. I intend to update the tool soon so that all settings are stored in this file. I also intend to upload instructions once the tool is deemed usable by anyone other than me.
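To illustrate the API key step, here is a minimal sketch of reading a key from a JSON config file before running a search. The helper name load_api_key and the field name "apikey" are assumptions made for this example and are not taken from the project; check the actual config.json for the field it expects.

```python
import json
import tempfile
from pathlib import Path


def load_api_key(config_path, key_name="apikey"):
    """Return the API key stored in a JSON config file.

    key_name is a hypothetical field name used for illustration only.
    """
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)

    value = config.get(key_name)
    # Fail early with a clear message instead of sending an empty key.
    if not isinstance(value, str) or not value.strip():
        raise ValueError(f"No usable value for {key_name!r} in {config_path}")
    return value.strip()


if __name__ == "__main__":
    # Demonstrate with a throwaway file rather than the real config.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = Path(tmp) / "config.json"
        demo_path.write_text(
            json.dumps({"apikey": "YOUR-KEY-HERE"}), encoding="utf-8"
        )
        print(load_api_key(demo_path))
```

Checking the key up front means a missing or empty entry is reported before any request is attempted, which may be easier to diagnose than a failed search.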


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.