codeforjapan/scrape_c4j_tumblr

Name: scrape_c4j_tumblr

Owner: Code for Japan

Description: Scrapes the Code for Japan Tumblr site and outputs a CSV file for importing its content into WordPress.

Created: 2017-12-11 03:58:13.0

Updated: 2017-12-11 04:17:25.0

Pushed: 2017-12-11 04:21:12.0

Homepage: null

Size: 11

Language: Python


README

Scrapes the old Code for Japan website (http://archive.code4japan.org/) and converts its content into a CSV file for import into WordPress. Images referenced by the src attribute of img tags are handled as well.

Output files

out/export.csv: the exported CSV file, intended for import into WordPress.

out/images/full: the downloaded images. File names are derived from the SHA1 hash of each image URL.
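
The file naming under out/images/full can be illustrated with a short sketch. It assumes the project relies on Scrapy's stock ImagesPipeline, whose default naming is the SHA1 hex digest of the image URL followed by a .jpg extension. The function name expected_image_path and the example URL are hypothetical and do not come from the repository.

```python
import hashlib
import os


def expected_image_path(image_url, images_dir="out/images"):
    """Return the path where the image for image_url is expected to be stored.

    Assumption: Scrapy's default ImagesPipeline naming, full/<sha1(url)>.jpg.
    """
    digest = hashlib.sha1(image_url.encode("utf-8")).hexdigest()
    return os.path.join(images_dir, "full", digest + ".jpg")


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    print(expected_image_path("http://example.com/photo.png"))
```

This kind of lookup may help when matching a row in the CSV to its downloaded file, provided the pipeline settings were not customized.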

HOW TO USE

Tested with Python 2.7.12.

Running the scraper requires scrapy. If scrapy is not installed, install it first:

pip install scrapy

Clone this repository:

git clone https://github.com/codeforjapan/scrape_c4j_tumblr.git

Change into the cloned directory and run the crawler:

cd scrape_c4j_tumblr
scrapy crawl c4j

If the crawl finishes without errors, an out directory is created. It contains export.csv, the exported CSV file, and an images directory holding the downloaded images.

ls out
export.csv  images

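
Before importing into WordPress, the generated CSV can be inspected with Python's standard csv module. The sketch below is not part of the repository. It is written for Python 3, independently of the Python 2.7 environment used for the crawl, and assumes that export.csv is UTF-8 encoded with a header in its first row. It makes no assumption about the column names. The helper name summarize_export is hypothetical.

```python
import csv
import os


def summarize_export(csv_path="out/export.csv"):
    """Return (header, row_count) for the exported CSV.

    Assumptions: UTF-8 encoding, and the first row is a header row.
    """
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(csv_path, "r", encoding="utf-8", newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


if __name__ == "__main__":
    if os.path.exists("out/export.csv"):
        header, row_count = summarize_export()
        print("columns:", header)
        print("rows:", row_count)
    else:
        print("out/export.csv not found; run the crawl first")
```

Counting rows through csv.reader rather than counting lines matters here, since post bodies are likely to contain line breaks inside quoted fields.
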
LICENSE

This software is released under the MIT License, see LICENSE.txt.

