TMLS Data Science Workshop

This is an introductory workshop to data science using Python. We’ll be building a binary classification model to predict hospital readmission in patients with diabetes. A large focus will be on data pre-processing, which is a key part of the machine learning pipeline.
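A binary classifier needs its target encoded as a 0/1 label. Below is a minimal pandas sketch. It assumes the raw outcome is stored in a column named `readmitted` with the values `"NO"`, `">30"` and `"<30"`, and that "readmission" means readmission within 30 days. The rows are made up for illustration, and the workshop notebook may define the target differently.

```python
import pandas as pd

# Made-up rows for illustration; the "readmitted" column name and its
# values ("NO", ">30", "<30") are assumptions about the raw data.
df = pd.DataFrame({
    "patient_nbr": [1, 2, 3, 4, 5],
    "readmitted": ["NO", "<30", ">30", "<30", "NO"],
})

# Assumed target definition: 1 if readmitted within 30 days, otherwise 0
df["target"] = (df["readmitted"] == "<30").astype(int)

print(df["target"].tolist())  # [0, 1, 0, 1, 0]
```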

Key topics include:

Exploratory data analysis
Data cleaning
Feature selection
Supervised learning
Binary classification
Hyperparameter tuning
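One way to combine these steps is a scikit-learn `Pipeline`. The sketch below is an assumption and does not reproduce the workshop's notebook. It uses synthetic data with hypothetical column names and runs the following steps:

1. Impute and scale the numeric columns.
2. One-hot encode a categorical column.
3. Keep the top `k` features with `SelectKBest`.
4. Fit a `LogisticRegression` classifier.
5. Tune `k` and the regularization strength `C` with `GridSearchCV`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (NOT the workshop dataset); column names are hypothetical
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "time_in_hospital": rng.integers(1, 15, n).astype(float),
    "num_medications": rng.integers(1, 40, n).astype(float),
    "age_group": rng.choice(["[40-50)", "[50-60)", "[60-70)"], n),
})
X.loc[rng.choice(n, 10, replace=False), "num_medications"] = np.nan  # inject missing values
y = (X["time_in_hospital"] + rng.normal(0, 3, n) > 8).astype(int)    # binary label

numeric = ["time_in_hospital", "num_medications"]
categorical = ["age_group"]

# Data cleaning: median-impute and scale numeric columns, one-hot encode categoricals
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Feature selection followed by a binary classifier
model = Pipeline([
    ("preprocess", preprocess),
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hyperparameter tuning over the number of kept features and the inverse regularization C
param_grid = {"select__k": [2, 4], "clf__C": [0.1, 1.0, 10.0]}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
search = GridSearchCV(model, param_grid, cv=5, scoring="roc_auc")
search.fit(X_train, y_train)

print(search.best_params_)
print("test ROC AUC:", round(search.score(X_test, y_test), 3))
```

Because preprocessing sits inside the pipeline, imputation and scaling are re-fit on each cross-validation training fold. Statistics from the held-out fold therefore do not leak into training.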

We’ll be using these packages to do our analysis:

Our dataset comes from the UCI Machine Learning Repository. It contains patient and hospital outcome data from 130 U.S. hospitals, collected from 1999 to 2008.
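A common first step in exploratory data analysis on tabular data like this is to measure how much of each column is missing. The sketch below uses a few made-up rows and makes two illustrative assumptions:

- Missing values are encoded as the string `"?"`.
- Columns that are more than 50% empty get dropped. The 50% threshold is arbitrary.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the real data; "?" is assumed to mark missing values
df = pd.DataFrame({
    "race": ["Caucasian", "?", "AfricanAmerican", "Caucasian"],
    "weight": ["?", "?", "?", "[75-100)"],
    "time_in_hospital": [3, 1, 7, 2],
})

# Convert the "?" placeholder to a real missing value so pandas can count it
df = df.replace("?", np.nan)

# Fraction of missing values per column, highest first
missing = df.isna().mean().sort_values(ascending=False)
print(missing)

# Drop columns that are mostly empty (50% threshold chosen for illustration)
df = df.drop(columns=missing[missing > 0.5].index)
print(df.columns.tolist())  # ['race', 'time_in_hospital']
```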

Environment Setup

Option 1: Running Jupyter notebook locally
Option 2: Running Jupyter notebook via Google Colab (recommended)

For more details on how to get started, see the ‘Getting Started’ section. All code is stored on GitHub (see the repo diabetes-ml-workshop). A fill-in-the-blank notebook is provided so you can follow along in Google Colab. You can make a copy of the notebook and edit it in your own account.

