CornellDataScience/DuQI

Name: DuQI

Owner: Cornell Data Science

Description: ?-ee: Duplicate Question Identification

Created: 2018-03-03 18:51:21.0

Updated: 2018-03-22 16:40:05.0

Pushed: 2018-03-22 16:40:04.0

Homepage:

Size: 6647

Language: Python


README

DuQI: Duplicate Question Identification

Members: Brandon Kates, Zhao Shen, Arnav Ghosh

Objective: To create a system capable of detecting duplicate questions on Q&A platforms.

We expect this approach to help centralize the knowledge available on a given question or issue, and to direct users whose questions have already been answered to the appropriate resource.
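To make the idea of redirecting a user to an already-answered question more concrete, here is a minimal retrieval sketch. It is not the project's method. It assumes a plain bag-of-words cosine similarity, and the function names (`tokenize`, `cosine_similarity`, `most_similar`) are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words count vectors of two questions."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    # An empty question has a zero vector; treat it as dissimilar to everything.
    return dot / norm if norm else 0.0


def most_similar(new_question: str, answered: list[str]) -> tuple[str, float] | None:
    """Return the already-answered question closest to new_question, with its score."""
    if not answered:
        return None
    best = max(answered, key=lambda q: cosine_similarity(new_question, q))
    return best, cosine_similarity(new_question, best)
```

For example, `most_similar("How can I learn Python fast?", ["How do I learn Python quickly?", "What is the capital of France?"])` picks the first question, since the two share four of their six tokens. A real system would likely replace the raw counts with learned representations, but the retrieval step (score every candidate, return the best) stays the same.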

We will test a variety of duplicate question identification methods on the Quora question pairs dataset, and hope to eventually apply our findings to the classroom Q&A platform Piazza to improve the Cornell student experience.
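Comparing methods on the Quora question pairs requires loading the labelled pairs and scoring each method the same way. The sketch below assumes the CSV layout of the public Kaggle release (columns `id`, `qid1`, `qid2`, `question1`, `question2`, `is_duplicate`); `load_pairs`, `accuracy`, and the `Predictor` signature are illustrative names, not part of this repository.

```python
from __future__ import annotations

import csv
from typing import Callable, Iterable

# Hypothetical predictor signature: takes two questions, returns True if duplicates.
Predictor = Callable[[str, str], bool]


def load_pairs(lines: Iterable[str]) -> list[tuple[str, str, int]]:
    """Parse CSV rows into (question1, question2, is_duplicate) tuples.

    Assumes the header row of the public Quora Question Pairs CSV.
    Accepts any iterable of text lines, e.g. an open file object.
    """
    reader = csv.DictReader(lines)
    return [
        (row["question1"], row["question2"], int(row["is_duplicate"]))
        for row in reader
    ]


def accuracy(pairs: list[tuple[str, str, int]], predict: Predictor) -> float:
    """Fraction of pairs where the predictor's verdict matches the is_duplicate label."""
    if not pairs:
        return 0.0
    correct = sum(int(predict(q1, q2)) == label for q1, q2, label in pairs)
    return correct / len(pairs)
```

Usage, assuming the training CSV was saved under a hypothetical path such as `data/train.csv`: open it with `open("data/train.csv", newline="", encoding="utf-8")`, pass the file object to `load_pairs`, then call `accuracy(pairs, predict)` once per candidate method. Because every method is reduced to the same two-argument predictor, their scores are directly comparable.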

Data Requirements

Below is the data required to train and run all of the models.

In the current directory (“DuQI”), create a folder named “data” and populate it with:

The final directory should look like this:


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.