CD2H gitForager

CornellDataScience/NLP_Research-FA17

Name: NLP_Research-FA17

Owner: Cornell Data Science

Description: Cornell Data Science: Machine learning research project

Created: 2017-09-17 20:31:34.0

Updated: 2018-03-01 23:19:49.0

Pushed: 2018-03-01 23:19:45.0

Homepage:

Size: 637427

Language: Python

README

CDS: NLP Research Team

Team Lead: Kenta Takatsu (CS '19)
Advisor: Prof. Thorsten Joachims

About Us

We are a student-led research team from Cornell Data Science (CDS), working on Natural Language Processing projects under Prof. Thorsten Joachims. This semester, we are participating in the Yelp Dataset Challenge to extract analytic insights from raw review text. Our final products are research papers that make use of machine learning algorithms and statistical validation. You can visit the subteam sections to see our individual work.

Achievements

This past semester, we covered a wide range of research topics, from recommendation systems to deep style transfer. In general, every project took a Natural Language Processing approach, which sits at the intersection of machine learning and text analysis.

All projects demonstrated remarkable results: an implementation of a recommendation system that beats an industry-standard algorithm, an accurate analytic tool for assessing business trends, a classifier that identifies locally popular users, and writing style transfer using deep learning.

Subteams

Extracting Rating Dimensions with Text Reviews

Members: Xuwen Shen (STAT '18), Xinzhe Yang (CS '20)
To explain overall ratings and build a new personalized recommendation system that accounts for each user's preferences, we aimed to extract hidden information from reviews, including an individual user's preferences and a business's properties (a score for each feature of the business). Finally, we built a model that combines the extracted topics with overall ratings to predict personalized ratings for a specific user.

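The README does not give the form of the combined model. As a minimal sketch, assuming a common latent-factor formulation, a personalized rating can be written as a global mean plus user and business biases plus the dot product of the user's per-topic preference weights with the business's per-topic scores. The function `predict_rating` and all of its parameters are hypothetical illustrations, not the team's code.

```python
from typing import Sequence


def predict_rating(
    global_mean: float,
    user_bias: float,
    business_bias: float,
    user_prefs: Sequence[float],
    business_topics: Sequence[float],
    low: float = 1.0,
    high: float = 5.0,
) -> float:
    """Hypothetical latent-factor rating predictor (a sketch, not the project's model).

    user_prefs[k] is how much the user cares about topic k; business_topics[k]
    is the business's score on topic k, e.g. inferred from its review text.
    """
    if len(user_prefs) != len(business_topics):
        raise ValueError("user_prefs and business_topics must have the same length")
    # Personalized term: agreement between the user's preferences and the business's topic scores.
    interaction = sum(p * t for p, t in zip(user_prefs, business_topics))
    raw = global_mean + user_bias + business_bias + interaction
    # Clip the prediction to the valid star range.
    return min(high, max(low, raw))
```

For example, `predict_rating(3.7, 0.2, -0.1, [1.0, -0.5, 0.0], [0.6, 0.3, 0.1])` returns approximately 4.25 (3.7 + 0.2 - 0.1 + 0.45). The per-topic products inside `interaction` are also what would show which business features drive a given user's rating.
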
Topic Modeling as a Trend-Aware Performance Metric

Members: Kenta Takatsu (CS '19), Caroline Chang (CS '20)
We are developing a streamlined star-rating prediction system that uses different types of classifiers to better assess business performance. It accounts for temporal trends in the topics of user reviews and for the strengths and weaknesses of business characteristics in a latent space.

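The README does not describe how temporal trends are computed. As one minimal sketch, assuming each review already has a topic-proportion vector (for example, from an LDA topic model) and an integer month index, a business's topic trends can be summarized as the least-squares slope of each topic's monthly average proportion; those slopes could then be fed to a classifier as extra features. `monthly_topic_means` and `topic_slopes` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def monthly_topic_means(
    reviews: Iterable[Tuple[int, Sequence[float]]],
) -> Dict[int, List[float]]:
    """Average topic proportions per month. Each review is (month_index, proportions)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for month, props in reviews:
        if month not in sums:
            sums[month] = [0.0] * len(props)
        for k, p in enumerate(props):
            sums[month][k] += p
        counts[month] += 1
    # Sort by month so downstream code sees a time-ordered series.
    return {m: [s / counts[m] for s in sums[m]] for m in sorted(sums)}


def topic_slopes(monthly: Dict[int, List[float]]) -> List[float]:
    """Ordinary least-squares slope of each topic's monthly mean against the month index."""
    months = list(monthly)
    n = len(months)
    if n < 2:
        raise ValueError("need at least two distinct months to estimate a trend")
    x_bar = sum(months) / n
    s_xx = sum((m - x_bar) ** 2 for m in months)
    num_topics = len(monthly[months[0]])
    slopes = []
    for k in range(num_topics):
        ys = [monthly[m][k] for m in months]
        y_bar = sum(ys) / n
        s_xy = sum((m - x_bar) * (y - y_bar) for m, y in zip(months, ys))
        # A positive slope means topic k is gaining share of the business's reviews.
        slopes.append(s_xy / s_xx)
    return slopes
```
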
Local Experts in Yelp

Members: Brandon Kates (BTRY '19), Brian Cheang (CS '20)
The objective of this project is to build and combine two models (a Local Expert Identifier and a Topical Expert Identifier) to identify 'experts' among Yelp users.

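The two identifiers are not specified in this README. The sketch below only illustrates one simple way to combine two model scores, using deliberately naive stand-ins: the share of a user's reviews in a given city (in place of the Local Expert Identifier) and the share in a given category (in place of the Topical Expert Identifier), blended with a weighted average and then thresholded. The helper names, the weight `w_local`, and the threshold are all hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def share(values: Sequence[str], target: str) -> float:
    """Fraction of entries equal to target; 0.0 for an empty sequence."""
    return Counter(values)[target] / len(values) if values else 0.0


def is_expert(
    review_cities: Sequence[str],
    review_categories: Sequence[str],
    city: str,
    category: str,
    w_local: float = 0.5,
    threshold: float = 0.5,
) -> Tuple[bool, float]:
    """Blend a local score and a topical score; return (decision, combined score)."""
    if not 0.0 <= w_local <= 1.0:
        raise ValueError("w_local must be in [0, 1]")
    local = share(review_cities, city)  # naive stand-in for the Local Expert Identifier
    topical = share(review_categories, category)  # naive stand-in for the Topical Expert Identifier
    combined = w_local * local + (1.0 - w_local) * topical
    return combined >= threshold, combined
```

With the default weights, a user who wrote 3 of 4 reviews in Ithaca and 2 of 4 in the Ramen category gets a combined score of 0.625 and is flagged as an expert. In practice each stand-in would be replaced by the corresponding trained model's probability.
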
Neural Style Transfer For Text

Members: Luca Leeser (INFO '18), Yuji Akimoto (ORIE '19), Ryan Butler (CS '19), Cameron Ibrahim (ORIE '20)
We are seeking to modify the neural style transfer algorithm proposed by Gatys et al. so that it applies to text. Our goal is to devise an algorithm that can transfer the writing style of one review onto the content of another.
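In Gatys et al.'s image algorithm, style is captured by Gram matrices of feature maps (correlations between feature channels), and the style loss compares the Gram matrix of the generated output with that of the style reference. Below is a minimal NumPy sketch of that loss, assuming the text features are a `(channels, positions)` array, such as transposed hidden states from some sentence encoder; it is not the team's modified algorithm. Because two reviews usually differ in length, each Gram matrix here is divided by its own number of positions (an assumption), which reduces to Gatys et al.'s 1/(4 N^2 M^2) normalization when both inputs have the same length M.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Channel-by-channel correlations, divided by the number of positions.

    features has shape (channels, positions), e.g. encoder hidden states transposed.
    """
    _, positions = features.shape
    return features @ features.T / positions


def style_loss(generated: np.ndarray, style: np.ndarray) -> float:
    """Squared Gram-matrix difference for one feature layer, in the spirit of Gatys et al."""
    if generated.shape[0] != style.shape[0]:
        raise ValueError("both inputs need the same number of channels")
    channels = generated.shape[0]
    diff = gram_matrix(generated) - gram_matrix(style)
    return float(np.sum(diff ** 2) / (4.0 * channels ** 2))
```

Normalizing by length means a review and a longer review with the same channel correlations have zero style loss between them, so the loss reflects style rather than length.
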

Final Submissions

Our final papers are available at the following links:

How to get the code

The code uses git submodules, so you need the --recurse-submodules option to initialize them properly. Additionally, using --depth 1 creates a shallow clone that skips the commit history, which makes cloning faster.

git clone --recurse-submodules --depth 1 https://github.com/CornellDataScience/Yelp-FA17.git

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.
