Szilard Pafka

Login: szilard

Location: Santa Monica, California

Bio: physics PhD, chief (data) scientist, meetup organizer, datascience.la, (visiting) professor

Blog: https://www.linkedin.com/in/szilard

Member of

  1. DataScience.LA
  2. useR! 2014

Repositories

2018.erum.io
Homepage of the 2018 event
app-consumer-loan
null
benchm-databases
A minimal benchmark of various tools (statistical software, databases etc.) for working with tabular data of moderately large sizes (interactive data analysis).
benchm-dl
Playing with various deep learning tools and network architectures
benchm-dplyr-dt
null
benchm-ml
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
benchm-R-mysql
null
BigDataDayLA2015-DataScience
List of talks from the Data Science Track of Big Data Day LA 2015 (annual free conference)
datascience-1slide
Data Science in 1 Slide
datascience-course-historical
Inspired by David Donoho's "50 Years of Data Science" (2015) paper, I'm releasing here a course proposal draft I wrote in 2009 for a possible "data science" course.
datascience-latency
Latency numbers every data scientist should know (aka the pyramid of analytical tasks) - the order of magnitude of computational time for the most common analytical tasks (SQL-like data munging, linear and non-linear supervised learning etc.) with the typically available tools on commodity hardware.
dataset-sizes-kdnuggets
Size of datasets used for analytics based on 10 years of surveys by KDnuggets.
dplyr
Plyr specialised for data frames: faster & with remote datastores
event-BigDataCampLA2014
null
GBM-meltdown
The Effect of the Linux Kernel Page-Table Isolation (KPTI) Patch (Meltdown Vulnerability) on GBMs
GBM-multicore
GBM multicore scaling: h2o, xgboost and lightgbm on multicore and multi-socket systems
GBM-perf
Performance of various open source GBM implementations
GBM-tune
Tuning GBMs (hyperparameter tuning) and impact on out-of-sample predictions
GBM-workshop
Code (and other materials) for an introductory talk/workshop on GBMs (developed originally for an R-Ladies Meetup)
h2o-experiments
null
h2o-scoring--OLD
Various options for deploying h2o.ai models to production (scoring new data)
kaggle-scripts-R-pydata
Kaggle scripts: R vs pydata + most popular R and Python packages for Machine Learning
LA-data-meetups
null
LightGBM
A fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks. It is under the umbrella of the DMTK (http://github.com/microsoft/dmtk) project of Microsoft.
meetup-presentations_budapest
R-Ladies Budapest - This is the collection of code, presentations and additional materials created by the Budapest R-Ladies community
ml-algos-perf
Performance of Machine Learning Algorithms - playground for experimentation in order to understand their performance characteristics as a function of the attributes of the datasets used for training
ml-hacks
null
ml-prod
Some thoughts on how to use machine learning in production
MLprod-1slide
Machine Learning in Production in 1 Slide
ML-scoring
Compare the scoring speed of several open source machine learning libraries.
ml-x1
Machine learning tools on a monster EC2 X1 instance (128 cores, 2 TB RAM)
mxnet_shiny
Image Classification using MXNetR
RMySQL
An R interface for MySQL
shinyvalidinp
null
shinyvalidinp-demo
null
student-data-science-project-1-kaggle
Sample student project for the Data Science course I was teaching at CEU's MSc in Business Analytics https://github.com/szilard/teach-data-science-msc-analytics-ceu
student-data-science-project-2
Sample student project for the Data Science course I was teaching at CEU's MSc in Business Analytics https://github.com/szilard/teach-data-science-msc-analytics-ceu
student-data-science-project-3
Sample student project for the Data Science course I was teaching at CEU's MSc in Business Analytics https://github.com/szilard/teach-data-science-msc-analytics-ceu
survey-ml-tools
Quick informal survey at the Los Angeles Machine learning meetup about tools used for machine learning.
talk-DataVisLA-intro
null
talk-GALA-DScourse
null
talk-LARUG-munging
null
talks
A list of recent talks by Szilard at various meetups, conferences etc. (link to slides/code/video etc.).
teach-data-science-msc-analytics-ceu
Materials for a short introductory/intermediate Data Science course taught in the MSc in Business Analytics program at the Central European University
teach-data-science-UCLA-master-appl-stats
Materials for STATS 418 - Tools in Data Science course taught in the Master of Applied Statistics at UCLA
teach-ML-CEU-master-bizanalytics
Machine Learning #1 and #2 courses at CEU Master of Science in Business Analytics
useR2016-subm
null
xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Flink and DataFlow
xgboost-adv-workshop-LA
Advanced workshop on XGBoost with Tianqi Chen in Santa Monica, June 2, 2016

Commits To

Repository    Most Recent Commit    # Commits


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.