newsdev/nyt-entity-uploader

Name: nyt-entity-uploader

Owner: NYT Newsroom Developers

Description: A Python wrapper for making requests to the NYT Entity Service API

Created: 2017-06-09 17:37:00.0

Updated: 2017-06-13 07:39:26.0

Pushed: 2017-06-09 20:02:31.0

Homepage: null

Size: 4

Language: Python

README

NYT Entity Uploader

A Python wrapper for making requests to the NYT Entity Service API.

Usage

First: You should be running an instance of the NYT Entity Service API.

Second: Export ENTITYSVC_BASE_URL before running the uploader so that it points to your own running Entity Service API endpoint.
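
How the uploader reads this variable internally is not shown in this README. As a minimal sketch, assuming the value is simply taken from the process environment, a script could check it up front and fail early with a clear message. The helper name `get_base_url` is hypothetical and is not part of entity_uploader.

```python
import os


def get_base_url(environ=None):
    """Return the Entity Service base URL taken from the environment.

    Hypothetical helper for illustration; not part of entity_uploader.
    """
    env = os.environ if environ is None else environ
    url = env.get("ENTITYSVC_BASE_URL", "").strip()
    if not url:
        raise RuntimeError(
            "ENTITYSVC_BASE_URL is not set; export it before running the uploader"
        )
    # Drop a trailing slash so request paths can be appended consistently.
    return url.rstrip("/")
```

Calling `get_base_url()` at the top of a script makes a missing export fail before any request is attempted.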

Example 1: As a Python module

You can import the uploader as a Python module and pass the entity name as a keyword argument.

$ export ENTITYSVC_BASE_URL='http://localhost.newsdev.net:8000'
$ python
Python 3.6.1 (default, Apr  4 2017, 09:40:21)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from entity_uploader import UploadEntity
>>> e = UploadEntity(name="Bank of America")
>>> e.to_dict()
{'name': 'Bank of America', 'uuid': 'f514b0e1-eea5-4676-aed2-2f9ee501cd5e', 'score': 0, 'created': True}

Example 2: Running example.py

example.py is a sample implementation that reads a list of entity names from example_entities.txt.
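
The contents of example.py are not reproduced here. As a rough sketch of a script of that shape, assuming example_entities.txt holds one entity name per line, the loop could look like the following. The function name `upload_names` and its `make_entity` parameter are hypothetical; only `UploadEntity(name=...)` and `to_dict()` come from the usage shown in Example 1.

```python
def upload_names(lines, make_entity):
    """Upload each non-blank name and collect the resulting dicts.

    `make_entity` is any callable accepting `name=` and returning an object
    with a `to_dict()` method, such as entity_uploader.UploadEntity.
    """
    results = []
    for line in lines:
        name = line.strip()
        if not name:
            continue  # skip blank lines in the input file
        results.append(make_entity(name=name).to_dict())
    return results


# Possible usage against a running Entity Service (not executed here):
#
#     from entity_uploader import UploadEntity
#     with open("example_entities.txt") as f:
#         for result in upload_names(f, UploadEntity):
#             print(result)
```

Passing the class in as an argument keeps the loop independent of the network, so it can be exercised with a stand-in object.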

$ export ENTITYSVC_BASE_URL='http://localhost.newsdev.net:8000'
$ python example.py

{'name': 'Bank of America', 'uuid': 'f514b0e1-eea5-4676-aed2-2f9ee501cd5e', 'score': 0, 'created': True}
{'name': 'Bank of America', 'uuid': 'f514b0e1-eea5-4676-aed2-2f9ee501cd5e', 'score': 95, 'created': False}
{'name': 'Bank of America', 'uuid': 'f514b0e1-eea5-4676-aed2-2f9ee501cd5e', 'score': 100, 'created': False}
{'name': "banque d'amerique", 'uuid': 'cb626971-1989-4d78-870d-e6835017c936', 'score': 62, 'created': True}
{'name': 'Bank of America', 'uuid': 'f514b0e1-eea5-4676-aed2-2f9ee501cd5e', 'score': 95, 'created': False}
{'name': 'Bank of America', 'uuid': 'f514b0e1-eea5-4676-aed2-2f9ee501cd5e', 'score': 90, 'created': False}
{'name': 'Bank of America', 'uuid': 'f514b0e1-eea5-4676-aed2-2f9ee501cd5e', 'score': 86, 'created': False}
{'name': 'Bank of America', 'uuid': 'f514b0e1-eea5-4676-aed2-2f9ee501cd5e', 'score': 86, 'created': False}
{'name': 'Bank of America', 'uuid': 'f514b0e1-eea5-4676-aed2-2f9ee501cd5e', 'score': 86, 'created': False}

In this example, the default create_if_below score is 80. The first entity, "Bank of America", is created with a score of 0. The next entity, "Bank of America, N.A.", is not created because its similarity score of 95 is above that threshold; instead, the UUID of the matching entity, "Bank of America", is returned. The same is true for the next entity, "BANK OF AMERICA", which has an even higher score of 100. The next entity, "banque d'amerique", is created as a new entity because its similarity score of 62 is lower than the default create_if_below score of 80. The remaining entities in example_entities.txt match the first entity with varying degrees of closeness (scores of 95, 90, and 86).
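
The create-or-match rule described above can be sketched locally. This is an illustration only: in practice the matching, scoring, and UUID assignment are done by the Entity Service, and the function name `resolve_entity`, the `best_match` tuple, and the treatment of a score exactly equal to the threshold (assumed here to count as a match) are assumptions rather than documented behavior.

```python
import uuid


def resolve_entity(name, best_match, create_if_below=80):
    """Mimic the create-or-match decision for a single entity name.

    `best_match` is None when nothing exists yet, or a tuple of
    (matched_name, matched_uuid, score). Hypothetical helper, not the
    service's actual implementation.
    """
    if best_match is None:
        # Nothing to compare against: create the entity with a score of 0.
        return {"name": name, "uuid": str(uuid.uuid4()), "score": 0, "created": True}

    matched_name, matched_uuid, score = best_match
    if score < create_if_below:
        # Too dissimilar to the closest entity: create a new one.
        return {"name": name, "uuid": str(uuid.uuid4()), "score": score, "created": True}

    # Close enough: return the existing entity instead of creating a duplicate.
    return {"name": matched_name, "uuid": matched_uuid, "score": score, "created": False}
```

With the existing "Bank of America" entity as the best match, a score of 95 returns that entity's UUID with `created` set to False, while a score of 62 yields a new UUID with `created` set to True, as in the output above.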


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.