datamade/stringcmp

Name: stringcmp

Owner: datamade

Description: String comparison functions from FEBRL

Created: 2015-09-26 15:37:41.0

Updated: 2018-01-03 15:35:30.0

Pushed: 2017-07-06 14:32:41.0

Homepage: null

Size: 184

Language: Python


README

strcmp

String comparison functions from FEBRL

exact         Exact comparison
jaro          Jaro
winkler       Winkler (based on Jaro) (for backwards compatibility)
qgram         q-gram based
bigram        2-gram based (for backwards compatibility)
posqgram      Positional q-gram based
sgram         Skip-gram based
editdist      Edit-distance (or Levenshtein distance)
mod_editdist  Modified edit-distance (with transposition cost 1, not 2)
bagdist       Bag distance (cheap distance-based method)
swdist        Smith-Waterman distance
syllaligndist Syllable alignment distance
seqmatch      Uses Python's standard library 'difflib'
compression   Based on the zlib compression algorithm
lcs           (Repeated) longest common substring, improves results for
              swapped words
ontolcs       Ontology alignment string comparison based on longest common
              substring, Hamacher product and Winkler heuristics
permwinkler   Winkler combined with permutations of words, improves results
              for swapped words
sortwinkler   Winkler with sorted words (if more than one), improves results
              for swapped words
editex        Phonetic-aware edit-distance (Zobel et al. 1996)
twoleveljaro  Applies the Jaro comparator at word level, with words compared
              using a selectable approximate string comparison function
charhistogram Computes a character histogram for each string and the cosine
              similarity between the two histogram vectors
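
To illustrate what the jaro and winkler comparators measure, here is a minimal sketch of the textbook Jaro and Jaro-Winkler definitions. It is not the package's own code: the function names jaro_sketch and winkler_sketch are hypothetical, and the defaults prefix_scale=0.1 and max_prefix=4 are assumptions taken from the commonly cited definition.

```python
# Illustrative Jaro / Jaro-Winkler sketch (hypothetical names, not FEBRL's code).
# Assumed defaults: prefix_scale=0.1, max_prefix=4 (common textbook values).


def jaro_sketch(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Equal characters only count as "common" if they lie within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    common = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                common += 1
                break
    if common == 0:
        return 0.0
    # Walk both sequences of common characters in order; every position where
    # they disagree counts as half a transposition.
    mismatches = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                mismatches += 1
            k += 1
    transpositions = mismatches / 2.0
    return (common / len1 + common / len2
            + (common - transpositions) / common) / 3.0


def winkler_sketch(s1: str, s2: str,
                   prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro score boosted by the length of the shared prefix (capped at max_prefix)."""
    j = jaro_sketch(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1.0 - j)


print(round(jaro_sketch("MARTHA", "MARHTA"), 4))     # 0.9444
print(round(winkler_sketch("MARTHA", "MARHTA"), 4))  # 0.9611
```

The prefix boost rewards strings that agree at the start, which is why Winkler tends to suit names. The package's own jaro and winkler functions may differ in details such as edge cases and scaling, so treat this only as an illustration of the idea.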

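As a rough picture of the distance-based entries (editdist, qgram), the sketch below computes the Levenshtein distance with the standard two-row dynamic program, turns it into a similarity, and computes a Dice-style q-gram overlap. It is an assumption-laden illustration, not the package's implementation: the names levenshtein, editdist_similarity and qgram_similarity are hypothetical, and so are the normalisations (dividing by the longer string's length, and 2 * common / total q-grams).

```python
from collections import Counter

# Illustrative sketches (hypothetical names; not FEBRL's editdist/qgram code).


def levenshtein(s1: str, s2: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # keep the DP row as short as possible
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(previous[j] + 1,          # delete c1
                               current[j - 1] + 1,       # insert c2
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def editdist_similarity(s1: str, s2: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    if s1 == s2:
        return 1.0
    return 1.0 - levenshtein(s1, s2) / max(len(s1), len(s2))


def qgram_similarity(s1: str, s2: str, q: int = 2) -> float:
    """Dice-style overlap of q-gram multisets: 2 * common / total q-grams."""
    grams1 = Counter(s1[i:i + q] for i in range(len(s1) - q + 1))
    grams2 = Counter(s2[i:i + q] for i in range(len(s2) - q + 1))
    total = sum(grams1.values()) + sum(grams2.values())
    if total == 0:  # both strings shorter than q
        return 1.0 if s1 == s2 else 0.0
    common = sum((grams1 & grams2).values())  # multiset intersection
    return 2.0 * common / total


print(levenshtein("kitten", "sitting"))  # 3
```

For example, "peter" and "pete" share the bigrams pe, et and te out of seven bigrams in total, giving 6/7. Other divisors (minimum or maximum count instead of the average) are equally plausible choices.
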
This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.