Name: word2vec-api
Owner: YOCTOL INFO INC.
Description: Simple web service providing a word embedding model
Forked from: 3Top/word2vec-api
Created: 2017-01-15 08:08:10.0
Updated: 2017-01-15 08:08:13.0
Pushed: 2016-12-22 19:43:20.0
Homepage: http://www.3top.com
Size: 22
Language: Python
Simple web service providing a word embedding API. The methods are based on the Gensim Word2Vec implementation. Models are passed as parameters and must be in the Word2Vec text or binary format.
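As background for the methods listed further down: in Gensim, `similarity` is the cosine similarity of two word vectors, and `n_similarity` is the cosine similarity of the mean vectors of two word sets. The sketch below illustrates that arithmetic in plain Python. The 3-dimensional `toy` vectors are invented for illustration and do not come from any real model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Toy embeddings, invented for illustration only.
toy = {
    "sushi": [0.9, 0.1, 0.0],
    "shop": [0.1, 0.8, 0.1],
    "japanese": [0.8, 0.2, 0.1],
    "restaurant": [0.2, 0.9, 0.0],
}

# similarity: cosine between two word vectors
pair_score = cosine(toy["sushi"], toy["japanese"])

# n_similarity: cosine between the mean vectors of two word sets
set_score = cosine(
    mean_vector([toy["sushi"], toy["shop"]]),
    mean_vector([toy["japanese"], toy["restaurant"]]),
)
```

With a real model the vectors would have the dimensionality of that model (for example 300), but the computation is the same.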
Install Dependencies
pip install -r requirements.txt
Launching the service
python word2vec-api --model path/to/the/model [--host host --port 1234]
or
python word2vec-api.py --model /path/to/GoogleNews-vectors-negative300.bin --binary BINARY --path /word2vec --host 0.0.0.0 --port 5000
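To make the launch flags concrete, here is a hypothetical sketch of how a script could parse them with `argparse`. It is not the project's actual code. The flag names come from the commands above, but the defaults (taken from the example URLs) and the treatment of `--binary` as "any non-empty value means binary format" are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in the launch commands."""
    parser = argparse.ArgumentParser(
        description="Launch a word embedding web service (illustrative sketch)"
    )
    parser.add_argument("--model", required=True, help="path to the trained model")
    # Assumption: any non-empty value selects the binary Word2Vec format.
    parser.add_argument("--binary", help="set when the model is in binary format")
    # Assumed defaults, chosen to match the example URLs in this README.
    parser.add_argument("--host", default="127.0.0.1", help="interface to bind")
    parser.add_argument("--port", type=int, default=5000, help="TCP port")
    parser.add_argument("--path", default="/word2vec", help="URL prefix of the API")
    return parser


args = build_parser().parse_args(
    ["--model", "/path/to/model.bin", "--binary", "BINARY", "--port", "1234"]
)
is_binary = bool(args.binary)
```

Binding to `0.0.0.0`, as in the second command, exposes the service on all network interfaces, while `127.0.0.1` keeps it local.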
Example calls
http://127.0.0.1:5000/word2vec/n_similarity?ws1=Sushi&ws1=Shop&ws2=Japanese&ws2=Restaurant
http://127.0.0.1:5000/word2vec/similarity?w1=Sushi&w2=Japanese
http://127.0.0.1:5000/word2vec/most_similar?positive=indian&positive=food[&negative=][&topn=]
http://127.0.0.1:5000/word2vec/model?word=restaurant
http://127.0.0.1:5000/word2vec/model_word_set
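Several of the calls above repeat a query parameter (`ws1`, `ws2`, `positive`) to pass a list of words. A minimal sketch of building such URLs with the standard library follows. `BASE_URL` and `build_url` are illustrative names, and the host, port, and path are the ones used in the examples.

```python
from urllib.parse import urlencode

# Host, port, and path taken from the example calls above.
BASE_URL = "http://127.0.0.1:5000/word2vec"


def build_url(method, params=None):
    """Return the URL for an API method, repeating keys for list values."""
    if not params:
        return f"{BASE_URL}/{method}"
    # doseq=True expands {"ws1": ["Sushi", "Shop"]} into ws1=Sushi&ws1=Shop
    return f"{BASE_URL}/{method}?{urlencode(params, doseq=True)}"


n_similarity_url = build_url(
    "n_similarity",
    {"ws1": ["Sushi", "Shop"], "ws2": ["Japanese", "Restaurant"]},
)
most_similar_url = build_url(
    "most_similar",
    {"positive": ["indian", "food"], "topn": 5},
)
word_set_url = build_url("model_word_set")
```

The resulting strings can be passed to any HTTP client, for example `urllib.request.urlopen`, once the service is running.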
Note: The “model” method returns a base64 encoding of the vector. “model_word_set” returns a base64-encoded pickle of the model's vocabulary.
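A client has to reverse the base64 step to get numbers back. The sketch below assumes the decoded bytes are packed little-endian 32-bit floats, which is not stated in this README, so check it against your server before relying on it. It also assumes any JSON quoting around the response body has already been stripped. `decode_vector` is an illustrative name.

```python
import base64
import struct


def decode_vector(payload):
    """Decode a base64 string into floats, assuming packed little-endian float32."""
    raw = base64.b64decode(payload)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round trip on a made-up 3-dimensional vector, standing in for a server response.
sample = base64.b64encode(struct.pack("<3f", 0.5, -1.25, 2.0)).decode("ascii")
vector = decode_vector(sample)
```

The `model_word_set` response would need `pickle.loads` after base64 decoding. Unpickling can execute arbitrary code, so only do that with a service you trust.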
If you do not have domain-specific data to train on, it can be convenient to use a pretrained model. Please feel free to submit additions to this list through a pull request.
| Model file | Number of dimensions | Corpus (size) | Vocabulary size | Author | Architecture | Training Algorithm | Context window - size | Web page |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Google News | 300 | Google News (100B) | 3M | Google | word2vec | negative sampling | BoW - ~5 | link |
| Freebase IDs | 1000 | Google News (100B) | 1.4M | Google | word2vec, skip-gram | ? | BoW - ~10 | link |
| Freebase names | 1000 | Google News (100B) | 1.4M | Google | word2vec, skip-gram | ? | BoW - ~10 | link |
| Wikipedia+Gigaword 5 | 50 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link |
| Wikipedia+Gigaword 5 | 100 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link |
| Wikipedia+Gigaword 5 | 200 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link |
| Wikipedia+Gigaword 5 | 300 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link |
| Common Crawl 42B | 300 | Common Crawl (42B) | 1.9M | GloVe | GloVe | AdaGrad | ? | link |
| Common Crawl 840B | 300 | Common Crawl (840B) | 2.2M | GloVe | GloVe | AdaGrad | ? | link |
| Twitter (2B Tweets) | 25 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link |
| Twitter (2B Tweets) | 50 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link |
| Twitter (2B Tweets) | 100 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link |
| Twitter (2B Tweets) | 200 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link |
| Wikipedia dependency | 300 | Wikipedia (?) | 174,015 | Levy & Goldberg | word2vec modified | word2vec | syntactic dependencies | link |
| DBPedia vectors (wiki2vec) | 1000 | Wikipedia (?) | ? | Idio | word2vec | word2vec, skip-gram | BoW, 10 | link |
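Because the service expects the Word2Vec text or binary format, the GloVe models listed above may need converting first. GloVe text files have the same one-word-per-line layout but lack the leading header line holding the vocabulary size and the number of dimensions. A minimal sketch of adding that header is below. `convert_glove_file` is an illustrative name, the toy file contents are invented, and the code assumes space-separated UTF-8 input. Gensim has also shipped a `glove2word2vec` script for the same purpose.

```python
import os
import tempfile


def convert_glove_file(glove_path, word2vec_path):
    """Copy a GloVe text file, prepending the 'count dims' Word2Vec header."""
    # First pass: count the vectors and read the dimensionality from the first one.
    count = 0
    dims = 0
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue  # ignore blank lines
            if count == 0:
                dims = len(line.rstrip().split(" ")) - 1
            count += 1
    # Second pass: write the header, then stream the vectors unchanged.
    with open(glove_path, encoding="utf-8") as src, \
            open(word2vec_path, "w", encoding="utf-8") as dst:
        dst.write(f"{count} {dims}\n")
        for line in src:
            if line.strip():
                dst.write(line)
    return count, dims


# Demonstration on a tiny invented file with two 3-dimensional vectors.
with tempfile.TemporaryDirectory() as tmp:
    glove_path = os.path.join(tmp, "toy_glove.txt")
    w2v_path = os.path.join(tmp, "toy_word2vec.txt")
    with open(glove_path, "w", encoding="utf-8") as f:
        f.write("sushi 0.1 0.2 0.3\nramen 0.4 0.5 0.6\n")
    count, dims = convert_glove_file(glove_path, w2v_path)
    with open(w2v_path, encoding="utf-8") as f:
        converted = f.read()
```

Reading the file twice avoids holding millions of vectors in memory, which matters for the larger Common Crawl files.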