g0v/pdf-to-markdown

Name: pdf-to-markdown

Owner: g0v

Description: Convert PDF files into markdown files

Created: 2015-05-23 16:43:23.0

Updated: 2015-05-23 16:43:24.0

Pushed: 2015-05-24 14:20:38.0

Homepage: null

Size: 6558

Language: Python

GitHub Committers

UserMost Recent Commit# Commits

Other Committers

UserEmailMost Recent Commit# Commits

README

PDF To Markdown

This is NOT a general-purpose converter. Currently only for urban planning document in Taiwan.

Demo

From this PDF file, we generate:

System Requirement

You should install pdfminer first.

If your PDF file doesn't contain Chinese Characters
sudo apt-get install python-pdfminer
Else
git clone git@github.com:euske/pdfminer.git
cd pdfminer
make cmap
sudo python setup.py install

The make cmap is necessary for documents containing Chinese characters.

Usage

Just type

python main.py <pdf>

For example, you can use our example PDF file:

python main.py examples/neihu.pdf

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.