TMLS Data Science Workshop

This is an introductory workshop on data science using Python. We’ll build a binary classification model to predict hospital readmission in patients with diabetes. Much of the workshop focuses on data pre-processing, which is a key part of the machine learning pipeline.

Key topics include:

  • Exploratory data analysis
  • Data cleaning
  • Feature selection
  • Supervised learning
  • Binary classification
  • Hyperparameter tuning
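To give a feel for how these topics fit together, here is a minimal sketch of a binary classification workflow. It is not the workshop's actual notebook: it assumes scikit-learn is installed, uses synthetic data from make_classification as a stand-in for the diabetes dataset, and the choice of logistic regression, SelectKBest, and the parameter grid are illustrative assumptions only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the workshop data (assumption): 500 rows,
# 10 numeric features, and a binary label.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, random_state=0
)

# Hold out a test set, keeping the class balance the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Chain pre-processing, feature selection, and the classifier so that every
# step is fitted on the training folds only during cross-validation.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Hyperparameter tuning: try each combination with 5-fold cross-validation.
param_grid = {
    "select__k": [3, 5, 10],
    "model__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="roc_auc")
search.fit(X_train, y_train)

best_params = search.best_params_
# Because scoring="roc_auc", score() reports ROC AUC on the held-out set.
test_auc = search.score(X_test, y_test)
print(best_params, round(test_auc, 3))
```

Wrapping the steps in a single Pipeline is one common way to keep the scaler and feature selector from seeing the validation folds, which would otherwise make the cross-validated scores look better than they are.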

We’ll be using these packages to do our analysis:

Our dataset comes from the UCI Machine Learning Repository. It includes patient and hospital outcome data collected from 130 U.S. hospitals between 1999 and 2008.

Environment Setup

  • Option 1: Running Jupyter notebook locally
  • Option 2: Running Jupyter notebook via Google Colab (recommended)

For more details on how to get started, see the ‘Getting Started’ section. All code is stored on GitHub (see the diabetes-ml-workshop repo). There’s a fill-in-the-blank notebook that you can use to follow along in Google Colab. You can make a copy of the notebook and modify it in your own account.
