Szilard Pafka
Login:
szilard
Location:
Santa Monica, California
Bio:
physics PhD, chief (data) scientist, meetup organizer, datascience.la, (visiting) professor
Blog:
https://www.linkedin.com/in/szilard
Member of
- DataScience.LA
- useR! 2014
Repositories
- 2018.erum.io: Homepage of the 2018 event
- app-consumer-loan
- benchm-databases: A minimal benchmark of various tools (statistical software, databases etc.) for working with tabular data of moderately large sizes (interactive data analysis).
- benchm-dl: Playing with various deep learning tools and network architectures
- benchm-dplyr-dt
- benchm-ml: A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
- benchm-R-mysql
- BigDataDayLA2015-DataScience: List of talks from the Data Science Track of Big Data Day LA 2015 (annual free conference)
- datascience-1slide: Data Science in 1 Slide
- datascience-course-historical: Inspired by David Donoho's "50 Years of Data Science" (2015) paper, I'm releasing here a course proposal draft I wrote in 2009 for a possible course on "data science".
- datascience-latency: Latency numbers every data scientist should know (aka the pyramid of analytical tasks) - the order of magnitude of computational time for the most common analytical tasks (SQL-like data munging, linear and non-linear supervised learning etc.) with the typically available tools on commodity hardware.
- dataset-sizes-kdnuggets: Size of datasets used for analytics based on 10 years of surveys by KDnuggets.
- dplyr: Plyr specialised for data frames: faster & with remote datastores
- event-BigDataCampLA2014
- GBM-meltdown: The Effect of the Linux Kernel Page-Table Isolation (KPTI) Patch (Meltdown Vulnerability) on GBMs
- GBM-multicore: GBM multicore scaling: h2o, xgboost and lightgbm on multicore and multi-socket systems
- GBM-perf: Performance of various open source GBM implementations
- GBM-tune: Tuning GBMs (hyperparameter tuning) and impact on out-of-sample predictions
- GBM-workshop: Code (and other materials) for an introductory talk/workshop on GBMs (developed originally for an R-Ladies Meetup)
- h2o-experiments
- h2o-scoring--OLD: Various options for deploying h2o.ai models to production (scoring new data)
- kaggle-scripts-R-pydata: Kaggle scripts: R vs pydata + most popular R and Python packages for Machine Learning
- LA-data-meetups
- LightGBM: A fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks. It is under the umbrella of the DMTK (http://github.com/microsoft/dmtk) project of Microsoft.
- meetup-presentations_budapest: R-Ladies Budapest - This is the collection of code, presentations and additional materials created by the Budapest R-Ladies community
- ml-algos-perf: Performance of Machine Learning Algorithms - playground for experimentation in order to understand their performance characteristics as a function of the attributes of the datasets used for training
- ml-hacks
- ml-prod: Some thoughts on how to use machine learning in production
- MLprod-1slide: Machine Learning in Production in 1 Slide
- ML-scoring: Compare the scoring speed of several open source machine learning libraries.
- ml-x1: Machine learning tools on a monster EC2 X1 instance (128 cores, 2 TB RAM)
- mxnet_shiny: Image Classification using MXNetR
- RMySQL: An R interface for MySQL
- shinyvalidinp
- shinyvalidinp-demo
- student-data-science-project-1-kaggle: Sample student project for the Data Science course I was teaching at CEU's MSc in Business Analytics https://github.com/szilard/teach-data-science-msc-analytics-ceu
- student-data-science-project-2: Sample student project for the Data Science course I was teaching at CEU's MSc in Business Analytics https://github.com/szilard/teach-data-science-msc-analytics-ceu
- student-data-science-project-3: Sample student project for the Data Science course I was teaching at CEU's MSc in Business Analytics https://github.com/szilard/teach-data-science-msc-analytics-ceu
- survey-ml-tools: Quick informal survey at the Los Angeles Machine Learning meetup about tools used for machine learning.
- talk-DataVisLA-intro
- talk-GALA-DScourse
- talk-LARUG-munging
- talks: A list of recent talks by Szilard at various meetups, conferences etc. (link to slides/code/video etc.).
- teach-data-science-msc-analytics-ceu: Materials for a short introductory/intermediate Data Science course taught in the MSc in Business Analytics program at the Central European University
- teach-data-science-UCLA-master-appl-stats: Materials for STATS 418 - Tools in Data Science course taught in the Master of Applied Statistics at UCLA
- teach-ML-CEU-master-bizanalytics: Machine Learning #1 and #2 courses at CEU Master of Science in Business Analytics
- useR2016-subm
- xgboost: Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Flink and DataFlow
- xgboost-adv-workshop-LA: Advanced workshop on XGBoost with Tianqi Chen in Santa Monica, June 2, 2016