adobe/NLP-Cube

Name: NLP-Cube

Owner: Adobe Systems Incorporated

Description: Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing

Created: 2018-03-13 18:50:44.0

Updated: 2018-05-21 18:25:01.0

Pushed: 2018-05-21 13:59:50.0

Homepage:

Size: 6606

Language: Python

README

NLP-Cube

Setup:

Before running the server, you need the model's weights, and you can follow two approaches to get them:

Installing dyNET:
  1. Make sure Mercurial, Python, pip, and CMake are installed (you can also check steps documented here)

  2. Install Intel's MKL library

  3. Install dyNET by using the installation steps from the manual installation page. More specifically, you should use:

    pip install cython
    mkdir dynet-base
    cd dynet-base
    
    git clone https://github.com/clab/dynet.git
    hg clone https://bitbucket.org/eigen/eigen -r 2355b22  # -r NUM specifies a known working revision
    
    cd dynet
    mkdir build
    cd build
    cmake .. -DEIGEN3_INCLUDE_DIR=/path/to/eigen -DMKL_ROOT=/opt/intel/mkl -DPYTHON=`which python2`
    
    make -j 2 # replace 2 with the number of available cores
    make install
    
    cd python
    python2 ../../setup.py build --build-dir=.. --skip-build install
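
Once the build finishes, a quick import check can confirm that the bindings landed in the interpreter you intend to use. The snippet below is a minimal sketch, not part of NLP-Cube: `can_import` is a hypothetical helper name, and it only assumes that the Python bindings are exposed as the module `dynet`. It avoids Python 3-only syntax so it can run under the same `python2` used in the steps above.

```python
import importlib


def can_import(name):
    """Return True if the module `name` can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Assumption: the DyNet Python bindings are installed as the module "dynet".
    if can_import("dynet"):
        print("dynet is importable from this interpreter")
    else:
        print("dynet was not found; re-check the build and install steps")
```

Run it with the interpreter that was passed to `-DPYTHON`, since a different interpreter will not see the installed bindings.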
    
Training the lemmatizer:

Use the following command to train your lemmatizer:

Running the server:

Use the following command to run the server locally:

Current status

TODO

Parser architecture

-----------------                    -------------------------- 
|word embeddings|----          ------|morphological embeddings|
-----------------    |        |      --------------------------
                     |        |
                   --------------
                   |concatenate |
                   --------------
                          |
                  ----------------
                  |bdlstm_1_layer|
                  ----------------
                          |
                  ----------------                  
                  |bdlstm_2_layer| 
                  ----------------                    
                          |-----------------------------------------------------------------                          
                  ----------------                                                         |
                  |bdlstm_3_layer|                                                         |
                  ----------------                                                         |
                          |                                                                |
     ---------------------------------------------                    ---------------------------------------------              
     |           |                |              |                    |           |                |              |
     |           |                |              |                    |           |                |              |
 ---------  -----------       ----------    ------------          ---------  -----------       ----------    ------------
 |to_link|  |from_link|       |to_label|    |from_label|          |to_link|  |from_link|       |to_label|    |from_label|
 ---------  -----------       ----------    ------------          ---------  -----------       ----------    ------------
      |        |                      |       |                       |           |                  |            |
    --------------                 ---------------                  ------------------            -------------------
    |softmax link|                 |softmax label|                  |aux softmax link|            |aux softmax label|
    --------------                 ---------------                  ------------------            -------------------
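
To make the data flow in the diagram easier to follow, the sketch below traces the output shape of each layer for a single sentence. It is an illustration under stated assumptions rather than the project's code: `parser_shapes`, every dimension argument, and the `aux_` prefix on the auxiliary heads are hypothetical, and each bidirectional LSTM is assumed to concatenate its forward and backward states. The trace stops at the projection heads because the diagram does not give the softmax output sizes.

```python
def parser_shapes(n_tokens, word_dim, morph_dim, lstm_dim, proj_dim):
    """Trace (rows, cols) output shapes per layer for one sentence.

    All sizes are hypothetical; the diagram does not specify them.
    """
    shapes = {}
    shapes["word_embeddings"] = (n_tokens, word_dim)
    shapes["morphological_embeddings"] = (n_tokens, morph_dim)
    # The two embedding vectors of each token are joined end to end.
    shapes["concatenate"] = (n_tokens, word_dim + morph_dim)
    # Assumption: forward and backward LSTM states are concatenated.
    for i in (1, 2, 3):
        shapes["bdlstm_%d_layer" % i] = (n_tokens, 2 * lstm_dim)
    # As drawn: main heads read layer 3, auxiliary heads read layer 2.
    for prefix, source in (("", "bdlstm_3_layer"), ("aux_", "bdlstm_2_layer")):
        for head in ("to_link", "from_link", "to_label", "from_label"):
            shapes[prefix + head] = (shapes[source][0], proj_dim)
    return shapes


shapes = parser_shapes(n_tokens=5, word_dim=100, morph_dim=50,
                       lstm_dim=200, proj_dim=64)
print(shapes["concatenate"])     # (5, 150)
print(shapes["aux_from_label"])  # (5, 64)
```

In the diagram, the `to_link` and `from_link` projections feed the link softmax, and the `to_label` and `from_label` projections feed the label softmax. The auxiliary branch repeats the same four heads on top of the second layer.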


Tagger architecture

-----------------                    ---------------------- 
|word embeddings|----          ------|character embeddings|
-----------------    |        |      ----------------------
                     |        |
                   --------------
                   |tanh_1_layer|
                   --------------
                          |
                  ----------------
                  |bdlstm_1_layer|
                  ----------------
                          |
                   --------------                  
                   |tanh_2_layer|-------------------
                   --------------                   |
                          |                         |
                  ----------------         -------------------
                  |bdlstm_2_layer|         |aux_softmax_layer|
                  ----------------         -------------------
                          |
                   ---------------
                   |softmax_layer|
                   ---------------
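
The diagram can also be read as a small directed graph, which makes it easy to see what each output depends on. The sketch below is only an illustration: `TAGGER_EDGES` and `upstream_layers` are hypothetical names, and the layer names are taken from the diagram with spaces replaced by underscores.

```python
# Edges of the tagger diagram: each layer maps to the layers it feeds.
TAGGER_EDGES = {
    "word_embeddings": ["tanh_1_layer"],
    "character_embeddings": ["tanh_1_layer"],
    "tanh_1_layer": ["bdlstm_1_layer"],
    "bdlstm_1_layer": ["tanh_2_layer"],
    "tanh_2_layer": ["bdlstm_2_layer", "aux_softmax_layer"],
    "bdlstm_2_layer": ["softmax_layer"],
    "softmax_layer": [],
    "aux_softmax_layer": [],
}


def upstream_layers(edges, target):
    """Return every layer that `target` depends on, directly or indirectly."""
    # Invert the edge map so each layer points to its inputs.
    parents = {}
    for src, dsts in edges.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)
    # Walk backwards from the target, collecting each input once.
    seen = set()
    stack = list(parents.get(target, []))
    while stack:
        layer = stack.pop()
        if layer not in seen:
            seen.add(layer)
            stack.extend(parents.get(layer, []))
    return seen


main = upstream_layers(TAGGER_EDGES, "softmax_layer")
aux = upstream_layers(TAGGER_EDGES, "aux_softmax_layer")
print(sorted(main - aux))  # ['bdlstm_2_layer']
```

The only layer that the main softmax uses and the auxiliary softmax does not is `bdlstm_2_layer`, so the auxiliary output is predicted from the shallower part of the network.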


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.