TMLS Data Science Workshop
This is an introductory workshop on data science using Python. We’ll be building a binary classification model to predict hospital readmission in patients with diabetes. Much of the focus will be on data pre-processing, which is a key part of the machine learning pipeline.
Key topics include:
- Exploratory data analysis
- Data cleaning
- Feature selection
- Supervised learning
- Binary classification
- Hyperparameter tuning
We’ll be using these packages to do our analysis:
Our dataset comes from the UCI Machine Learning Repository. It includes patient and hospital outcome data from 130 U.S. hospitals, collected from 1999 to 2008.
Environment Setup
- Option 1: Running Jupyter notebook locally
- Option 2: Running Jupyter notebook via Google Colab (recommended)
For more details on how to get started, check out the ‘Getting Started’ section. All code is stored on GitHub (see the diabetes-ml-workshop repo). There’s a fill-in-the-blank notebook that you can use to follow along in Google Colab. You can save a copy of the notebook to your own account and modify it there.
Glossary of Terms: