Name: stringcmp
Owner: datamade
Description: String comparison functions from FEBRL
Created: 2015-09-26 15:37:41.0
Updated: 2018-01-03 15:35:30.0
Pushed: 2017-07-06 14:32:41.0
Homepage: null
Size: 184
Language: Python
String comparison functions from FEBRL
exact          Exact comparison
jaro           Jaro
winkler        Winkler (based on Jaro) (for backwards compatibility)
qgram          q-gram based
bigram         2-gram based (for backwards compatibility)
posqgram       Positional q-gram based
sgram          Skip-gram based
editdist       Edit-distance (or Levenshtein distance)
mod_editdist   Modified edit-distance (with transposition cost 1, not 2)
bagdist        Bag distance (cheap distance based method)
swdist         Smith-Waterman distance
syllaligndist  Syllable alignment distance
seqmatch       Uses Python's standard library 'difflib'
compression    Based on Zlib compression algorithm
lcs            (Repeated) longest common substring, improves results for
               swapped words
ontolcs        Ontology alignment string comparison based on longest common
               substring, Hamacher product and Winkler heuristics
permwinkler    Winkler combined with permutations of words, improves results
               for swapped words
sortwinkler    Winkler with sorted words (if more than one), improves results
               for swapped words
editex         Phonetic aware edit-distance (Zobel et al. 1996)
twoleveljaro   Apply Jaro comparator at word level, with words being compared
               using a selectable approximate string comparison function
charhistogram  Get histogram of characters for both strings and calculate the
               cosine similarity between the two histogram vectors
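
To make three of the simpler measures in the list above concrete, here is a minimal, self-contained sketch of a bag distance, a character-histogram cosine similarity, and a difflib-based ratio. The function names, signatures, and the normalisation to a 0..1 similarity are assumptions made for illustration; they are not taken from this package and may differ from how stringcmp implements or scales these comparisons.

```python
import difflib
import math
from collections import Counter


def bag_distance_similarity(s1, s2):
    """Bag distance, rescaled to a similarity in [0, 1] (illustrative only)."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    bag1, bag2 = Counter(s1), Counter(s2)
    # Count the characters left in each bag once shared ones are removed.
    only_in_1 = sum((bag1 - bag2).values())
    only_in_2 = sum((bag2 - bag1).values())
    distance = max(only_in_1, only_in_2)
    # Assumed normalisation: divide by the length of the longer string.
    return 1.0 - distance / max(len(s1), len(s2))


def char_histogram_similarity(s1, s2):
    """Cosine similarity between the character-count vectors of two strings."""
    if not s1 or not s2:
        return 0.0
    hist1, hist2 = Counter(s1), Counter(s2)
    # Counter returns 0 for missing keys, so unshared characters add nothing.
    dot = sum(count * hist2[char] for char, count in hist1.items())
    norm1 = math.sqrt(sum(count * count for count in hist1.values()))
    norm2 = math.sqrt(sum(count * count for count in hist2.values()))
    return dot / (norm1 * norm2)


def sequence_match_similarity(s1, s2):
    """Similarity ratio from the standard library's difflib.SequenceMatcher."""
    return difflib.SequenceMatcher(None, s1, s2).ratio()


if __name__ == "__main__":
    for a, b in [("peter", "pedro"), ("abc", "cab"), ("abc", "xyz")]:
        print(a, b,
              round(bag_distance_similarity(a, b), 3),
              round(char_histogram_similarity(a, b), 3),
              round(sequence_match_similarity(a, b), 3))
```

Both the bag distance and the histogram measure ignore character order, so anagrams such as "abc" and "cab" score as a perfect match. That is why the bag distance is described as cheap: it never exceeds the edit distance, so it can serve as a quick filter before a more expensive order-aware comparison.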