Yoctol/word2vec-api

Name: word2vec-api

Owner: YOCTOL INFO INC.

Description: Simple web service providing a word embedding model

Forked from: 3Top/word2vec-api

Created: 2017-01-15 08:08:10

Updated: 2017-01-15 08:08:13

Pushed: 2016-12-22 19:43:20

Homepage: http://www.3top.com

Size: 22 KB

Language: Python

GitHub Committers

User | Most Recent Commit | # Commits

Other Committers

User | Email | Most Recent Commit | # Commits

README

word2vec-api

Simple web service providing a word embedding API. The methods are based on the Gensim Word2Vec implementation. Models are passed as parameters and must be in Word2Vec text or binary format.
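For context, here is a minimal sketch of what serving such a model involves on the Gensim side. The file path and query word are placeholders, and this is an illustration rather than this repository's actual code:

```python
from gensim.models import KeyedVectors

# Load a model in Word2Vec binary format; pass binary=False for the text format.
# The path is a placeholder for whichever model file the service is started with.
vectors = KeyedVectors.load_word2vec_format("/path/to/model.bin", binary=True)

# The HTTP methods map onto lookups like these.
vector = vectors["restaurant"]                      # raw embedding for one word
neighbours = vectors.most_similar("restaurant", topn=5)
print(vector.shape, neighbours)
```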

Note: The “model” method returns a base64 encoding of the vector. “model_word_set” returns a base64 encoded pickle of the model's vocabulary.
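The exact routes and response framing are not spelled out here, so the client-side sketch below is only illustrative: it assumes the endpoints live under `/word2vec/`, that the response body is the bare base64 string, and that vectors are float32.

```python
import base64
import pickle

import numpy as np
import requests

BASE = "http://localhost:5000/word2vec"  # assumed host, port, and route prefix

# "model": assume the body is the base64 encoding of the raw vector bytes.
resp = requests.get(f"{BASE}/model", params={"word": "restaurant"})
vector = np.frombuffer(base64.b64decode(resp.text), dtype=np.float32)  # dtype assumed

# "model_word_set": assume a base64-encoded pickle of the vocabulary set.
resp = requests.get(f"{BASE}/model_word_set")
vocab = pickle.loads(base64.b64decode(resp.text))

print(vector.shape, len(vocab))
```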

Where to get a pretrained model

If you do not have domain-specific data to train on, it can be convenient to use a pretrained model. Please feel free to submit additions to this list through a pull request.

| Model file | Number of dimensions | Corpus (size) | Vocabulary size | Author | Architecture | Training Algorithm | Context window - size | Web page |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Google News | 300 | Google News (100B) | 3M | Google | word2vec | negative sampling | BoW - ~5 | link |
| Freebase IDs | 1000 | Google News (100B) | 1.4M | Google | word2vec, skip-gram | ? | BoW - ~10 | link |
| Freebase names | 1000 | Google News (100B) | 1.4M | Google | word2vec, skip-gram | ? | BoW - ~10 | link |
| Wikipedia+Gigaword 5 | 50 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link |
| Wikipedia+Gigaword 5 | 100 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link |
| Wikipedia+Gigaword 5 | 200 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link |
| Wikipedia+Gigaword 5 | 300 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link |
| Common Crawl 42B | 300 | Common Crawl (42B) | 1.9M | GloVe | GloVe | AdaGrad | ? | link |
| Common Crawl 840B | 300 | Common Crawl (840B) | 2.2M | GloVe | GloVe | AdaGrad | ? | link |
| Twitter (2B Tweets) | 25 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link |
| Twitter (2B Tweets) | 50 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link |
| Twitter (2B Tweets) | 100 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link |
| Twitter (2B Tweets) | 200 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link |
| Wikipedia dependency | 300 | Wikipedia (?) | 174,015 | Levy & Goldberg | word2vec modified | word2vec | syntactic dependencies | link |
| DBPedia vectors (wiki2vec) | 1000 | Wikipedia (?) | ? | Idio | word2vec | word2vec, skip-gram | BoW, 10 | link |
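Note that the GloVe downloads above are not in the Word2Vec text format the service expects (they lack the header line giving the vocabulary size and dimensionality). A minimal conversion sketch, assuming a gensim release that still ships the `glove2word2vec` helper (newer versions can instead pass `no_header=True` to `load_word2vec_format`) and using the Stanford file names as placeholders:

```python
from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec

# GloVe text files lack the "<vocab_size> <dimensions>" header line used by the
# Word2Vec text format; this helper prepends it. File names are placeholders.
glove2word2vec("glove.6B.300d.txt", "glove.6B.300d.w2v.txt")

# The converted file can then be served like any other Word2Vec text model.
vectors = KeyedVectors.load_word2vec_format("glove.6B.300d.w2v.txt", binary=False)
print(vectors.most_similar("computer", topn=3))
```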

