Name: mlscrape
Owner: Sunlight Labs
Description: mlscrape is a library for site-specific automated website scraping based on human-annotated examples
Created: 2015-05-08 22:14:18.0
Updated: 2015-08-21 09:55:12.0
Pushed: 2015-05-19 17:36:37.0
Size: 124
Language: Python
mlscrape is a library for site-specific automated website scraping based on human-annotated examples. It contains two types of models: one that learns to distinguish pages of interest from uninteresting pages, and one that identifies elements of interest within target pages. It works best on websites whose DOM offers good clues for telling interesting pages apart from uninteresting ones, and interesting DOM nodes apart from uninteresting ones.
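The two-model design described above can be illustrated with a small, dependency-free sketch. It is not mlscrape's actual API or algorithm. The names `DomFeatureParser`, `NaiveBayes`, `train_page_model`, `train_node_model` and `extract` are hypothetical. The feature set (tag, parent tag, id, classes) is an assumption about what "clues in the DOM" might look like. Multinomial naive Bayes stands in for whatever learner the library really uses. Human annotations are simplified to a label per page plus the text of each target node.

```python
import math
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DomFeatureParser(HTMLParser):
    """Records each element's tag, parent tag, id, classes, and direct text."""
    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, node index) of currently open elements
        self.nodes = []  # one {"features": [...], "text": str} per element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self.stack[-1][0] if self.stack else "ROOT"
        features = [f"tag={tag}", f"parent={parent}"]
        if attrs.get("id"):
            features.append(f"id={attrs['id']}")
        features += [f"class={c}" for c in (attrs.get("class") or "").split()]
        self.nodes.append({"features": features, "text": ""})
        if tag not in VOID_TAGS:
            self.stack.append((tag, len(self.nodes) - 1))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack:  # text belongs to the innermost open element
            self.nodes[self.stack[-1][1]]["text"] += data


def parse(html):
    parser = DomFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.nodes


def page_features(html):
    # A page is represented by the pooled features of all its elements.
    return [f for node in parse(html) for f in node["features"]]


class NaiveBayes:
    """Multinomial naive Bayes over feature tokens with add-one smoothing."""
    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        for features, label in examples:
            self.label_counts[label] += 1
            self.feature_counts[label].update(features)
            self.vocab.update(features)

    def predict(self, features):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)
            for f in features:
                if f in self.vocab:  # features unseen in training are ignored
                    score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_page_model(labeled_pages):
    """labeled_pages: (html, "interesting" or "uninteresting") pairs."""
    model = NaiveBayes()
    model.train((page_features(html), label) for html, label in labeled_pages)
    return model


def train_node_model(annotated_pages):
    """annotated_pages: (html, set of texts a human marked as targets) pairs."""
    model = NaiveBayes()
    model.train((node["features"], "target" if node["text"].strip() in targets else "other")
                for html, targets in annotated_pages for node in parse(html))
    return model


def extract(page_model, node_model, html):
    # Stage 1 filters out pages; stage 2 keeps the text of nodes labelled "target".
    if page_model.predict(page_features(html)) != "interesting":
        return []
    return [n["text"].strip() for n in parse(html)
            if node_model.predict(n["features"]) == "target"]
```

Both models in this sketch share one classifier over DOM feature tokens. The page model sees the pooled features of every element on a page, and the node model sees one element at a time. As a result, the sketch only works as well as the markup is consistent and descriptive, which matches the kind of site the library is said to suit.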