Jimmy Lin

Login: lintool

Company: University of Waterloo

Location: Nearby data lake

Bio: I profess to know very little at the University of Waterloo. I used to write code for Twitter and slides for Cloudera.

Blog: https://cs.uwaterloo.ca/~jimmylin/

Member of

  1. Afterburner
  2. Archives Unleashed
  3. Castorini
  4. Data Systems Group - University of Waterloo
  5. LIARR Workshop at SIGIR 2017
  6. recommenders
  7. TREC Real-Time Summarization Track

Repositories

AnseriniMaven
Maven repo for some Anserini dependencies.
aut
The Archives Unleashed Toolkit is an open-source platform for analyzing web archives.
bespin
Reference implementations of data-intensive algorithms in MapReduce and Spark
bespin-data
Datasets for Bespin
bfscan
Document retrieval using brute force scans
bigdata-2016w
CS 489/698 Big Data Infrastructure (Winter 2016) at the University of Waterloo
bigdata-2017w
CS 489/698 Big Data Infrastructure (Winter 2017) at the University of Waterloo
bigdata-2018w
CS 451/651 431/631 Data-Intensive Distributed Computing (Winter 2018) at the University of Waterloo
BuboQA
Question answering over knowledge graphs
cassovary
Cassovary is a simple big graph processing library for the JVM
Cassovary-vs-GraphJet
Performance comparison between Cassovary and GraphJet
c-bfscan
Implementations of brute force scans for document retrieval in C
chrome-archive-this-page
Internet Archive "Save a Page" Plug-In for Chrome
chrome-archive-this-page-crx
Packaged CRX distribution for Internet Archive "Save a Page" Plug-In
chrome-scholar-search-extension
Google Scholar Search Extension for Chrome
chrome-scholar-search-extension-crx
Chrome CRX packages for the Google Scholar Search Extension
Cloud9
Cloud9 is a Hadoop toolkit for working with big data
clueweb
Hadoop tools for manipulating ClueWeb collections
clueweb09en01-webgraph
Webgraph for ClueWeb09 Category B
ClueWeb09-TREC-LTR
Learning-to-rank dataset extracted from ClueWeb09 using TREC judgments
Congress108-metadata
Metadata for 108th United States Congress
Enron2mbox
Converting the Enron email collection to mbox format
GiraphTutorial
Giraph Tutorial
GrimmerSenatePressReleases
Grimmer's Senate Press Releases
guide
The Student's Guide to @lintool
hadoop1
A MapReduce execution engine for multi-core, shared-memory architectures
hadoop1-data
(No description provided.)
IR-Reproducibility
Open-Source Information Retrieval Reproducibility Challenge
IR-Reproducibility-exp
Experimental runs from the Open-Source Information Retrieval Reproducibility Challenge.
Ivory
A Hadoop toolkit for web-scale information retrieval research
Ivory-data
(No description provided.)
JASS
Anytime Ranking for Impact-Ordered Indexes
JScene
A proof-of-concept in-browser JavaScript-based search engine
MapReduceAlgorithms
Data-Intensive Text Processing with MapReduce
Mr.LDA
Scalable Topic Modeling using Variational Inference in MapReduce
Mr.LDA-data
Sample Data for Mr.LDA
my-data-is-bigger-than-your-data
My data is bigger than your data!
NSF-projects
NSF project homepages
OptTrees
Source code for: Nima Asadi, Jimmy Lin, and Arjen P. de Vries. Runtime Optimizations for Tree-Based Machine Learning Models. IEEE Transactions on Knowledge and Data Engineering, 26(9):2281-2292, 2014.
scholar-scraper
Scrapes citation statistics from Google Scholar
SparkTutorial
Spark Tutorial at the University of Maryland
tools
Lintools: tools by @lintool
trec-mb-vis
Visualization of TREC Microblog Track relevance judgments
TweetAnalysisWithSpark
Tweet Analysis with Spark
Tweets2013-stats
(No description provided.)
TweetTap
Simple program to tap the Twitter sample stream
twitter-tools
Twitter Tools
UMD-courses
Course homepages for courses that I've taught at the University of Maryland
UROC-projects
Undergraduate Research Opportunities Conference sponsored by the University of Waterloo
warcbase
Warcbase is an open-source platform for managing and analyzing web archives
wikiclean
A Java Wikipedia markup to plain text converter
wiki-tools
Collection of tools for working with Wikipedia
Zambezi
Real-time indexer and search engine


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.