Name: py_classifurlr
Owner: Berkman Klein Center for Internet & Society
Description: null
Created: 2017-01-11 23:40:02.0
Updated: 2017-10-05 15:27:03.0
Pushed: 2017-05-22 22:32:55.0
Homepage: null
Size: 79
Language: Python
Classifurlr is a tool to automatically determine if a given web page or set of web pages is likely inaccessible.
This tool does not actually fetch any content from the Internet - previously fetched content is fed into it. The given content is moved through a pipeline of classifiers, each of which looks for different signatures of inaccessibility. Each classifier returns how confident it is that the given content is inaccessible. These are then pooled to create a final accessibility verdict.
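The pipeline idea described above can be sketched roughly as follows. This is only an illustration, not Classifurlr's actual code: the classifier bodies, the `error_count` classifier, the `run_pipeline` helper, the shape of the `page` dict, and the averaging rule used for pooling are all assumptions made for the example. Only the `page_length` name and the result field names come from the sample output in this README.

```python
from typing import Callable, Dict, List

# Hypothetical classifier signature: takes previously fetched page data and
# returns a confidence in [0, 1] that the page is inaccessible.
Classifier = Callable[[dict], float]


def page_length(page: dict) -> float:
    """Toy classifier: very short bodies look like block or error pages."""
    return 0.4 if len(page.get("body", "")) < 100 else 0.0


def error_count(page: dict) -> float:
    """Toy classifier (hypothetical): any recorded error raises confidence."""
    return 0.6 if page.get("errors") else 0.0


def run_pipeline(page: dict, classifiers: List[Classifier]) -> Dict:
    """Run every classifier on the page and pool their confidences."""
    constituents = [
        {
            "status": "down",
            "statusConfidence": clf(page),
            "classifier": clf.__name__,
        }
        for clf in classifiers
    ]
    # Pooling rule is an assumption: a plain average of the confidences.
    if constituents:
        total = sum(c["statusConfidence"] for c in constituents)
        pooled = total / len(constituents)
    else:
        pooled = 0.0
    return {
        "status": "down",
        "statusConfidence": pooled,
        "classifier": "classification_pipeline",
        "constituents": constituents,
    }


# No network access happens here: the page content is supplied by the caller.
verdict = run_pipeline(
    {"body": "Blocked", "errors": ["timeout"]},
    [page_length, error_count],
)
```

Keeping each classifier as a small independent function means a new signature of inaccessibility can be added without touching the others. The real pooling step may weight classifiers differently.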
Right now, the following classifiers are implemented:
After making sure you have the requirements, install the rest of the dependencies with:
pip install -r requirements.txt
The tool can be used in three ways: as a Python module, as a command line program, and as a web service.
To run Classifurlr as a command line tool, simply run:
python classifurlr.py <name of data file>
You can see more options by adding the -h flag to the above command.
The data file should be a JSON file with the following structure:
{
  url: 'http://example.com',
  baseline: false, // or 'page_1'
  pageDetail: {
    'page_0': {
      asn: 0,
      screenshot: 'data:image/png;base64,',
      errors: [''],
    },
    ...
  },
  har: {...}
}
More details and field definitions for this structure are in the wiki.
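As a rough illustration, a minimal data file could be produced from Python as shown below. The key names (`url`, `baseline`, `pageDetail`) are read off the structure above as best it can be recovered and should be checked against the wiki. The remaining top-level key is left out, and the file name is arbitrary. Note that strict JSON needs double-quoted keys and allows neither comments nor trailing commas, so `json.dump` is a convenient way to get a valid file.

```python
import json
import os
import tempfile

# Assumed field names; verify against the wiki before relying on them.
data = {
    "url": "http://example.com",
    "baseline": False,
    "pageDetail": {
        "page_0": {
            "asn": 0,
            "screenshot": "data:image/png;base64,",
            "errors": [""],
        },
    },
}

# Write to a temporary directory so the example does not clutter the repo.
path = os.path.join(tempfile.mkdtemp(), "example_data.json")
with open(path, "w", encoding="utf-8") as fh:
    json.dump(data, fh, indent=2)

# Read it back to confirm the file is valid JSON.
with open(path, encoding="utf-8") as fh:
    loaded = json.load(fh)
```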
The tool will return a JSON document that looks like this:
{
  "status": "down",
  "statusConfidence": 0.52,
  "classifier": "classification_pipeline",
  "constituents": [
    {
      "status": "down",
      "statusConfidence": 0.4,
      "classifier": "page_length"
    },
    ...
  ]
}
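A caller might consume this result along the following lines. This is a sketch only: the `THRESHOLD` cutoff and the variable names are hypothetical, and the sample document is the one shown above with the elided constituents dropped.

```python
import json

# Sample result copied from the README, minus the elided constituents.
raw = """
{
  "status": "down",
  "statusConfidence": 0.52,
  "classifier": "classification_pipeline",
  "constituents": [
    {"status": "down", "statusConfidence": 0.4, "classifier": "page_length"}
  ]
}
"""

result = json.loads(raw)

# Hypothetical cutoff: treat the page as likely inaccessible at or above it.
THRESHOLD = 0.5
likely_down = (
    result["status"] == "down" and result["statusConfidence"] >= THRESHOLD
)

# Map each constituent classifier to the confidence it reported.
by_classifier = {
    c["classifier"]: c["statusConfidence"] for c in result["constituents"]
}
```

Looking at the per-classifier confidences is useful when the pooled verdict sits close to whatever cutoff the caller chooses.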
Here are the field definitions:
status: up or down. Right now, this will always return down.
Classifurlr also minimally complies with the WSGI spec through the provided app function. To run the tool as a web service, run something like the following:
gunicorn classifurlr:app
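For readers unfamiliar with WSGI, the sketch below shows what a WSGI-compliant callable looks like and how a server invokes it. `demo_app` is a hypothetical stand-in, not Classifurlr's own `app`. The request and response formats of the real service are not described here, so the JSON body is made up for illustration.

```python
import json
from wsgiref.util import setup_testing_defaults


def demo_app(environ, start_response):
    """Hypothetical WSGI callable: takes the request environ and a
    start_response callback, and returns an iterable of byte strings."""
    body = json.dumps({"status": "down", "statusConfidence": 0.52}).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Call the app directly, roughly the way a WSGI server would.
environ = {}
setup_testing_defaults(environ)  # fills in a minimal, valid WSGI environ

captured = {}


def start_response(status, headers, exc_info=None):
    # Record what the app reports instead of writing to a socket.
    captured["status"] = status
    captured["headers"] = headers


chunks = demo_app(environ, start_response)
response = json.loads(b"".join(chunks))
```

Any WSGI server can host a callable with this shape, which is why a `module:callable` argument is all the server needs.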
Code is hosted on GitHub at https://github.com/berkmancenter/classifurlr
This has been tested with Python 3.6.0 running on Ubuntu 16.04.
Classifurlr comes with a really minimal test suite. To run it, just run:
python classifurlr_test.py
TODO
jdcc
Copyright © 2017 President and Fellows of Harvard College
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.