.. Intro to Data Science documentation master file, created by
   sphinx-quickstart on Thu Oct 31 11:07:37 2019.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

TMLS Data Science Workshop
===========================
This is an introductory data science workshop using Python. We'll build
a binary classification model to predict hospital readmission in patients
with diabetes. Much of the workshop focuses on data preprocessing, which
is a key part of the machine learning pipeline.
Key topics include:
- Exploratory data analysis
- Data cleaning
- Feature selection
- Supervised learning
- Binary classification
- Hyperparameter tuning
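
As a rough sketch of how these topics fit together, the example below chains
scaling, feature selection, and a binary classifier, then tunes two
hyperparameters with cross-validation. It is not the workshop's code: the data
is synthetic (generated with ``make_classification``), and the choice of
``SelectKBest``, ``LogisticRegression``, and the parameter grid are assumptions
made only for illustration.

.. code-block:: python

   from sklearn.datasets import make_classification
   from sklearn.feature_selection import SelectKBest, f_classif
   from sklearn.linear_model import LogisticRegression
   from sklearn.model_selection import GridSearchCV, train_test_split
   from sklearn.pipeline import Pipeline
   from sklearn.preprocessing import StandardScaler

   # Synthetic stand-in data (assumption); the workshop uses the UCI dataset.
   X, y = make_classification(
       n_samples=500, n_features=20, n_informative=5, random_state=0
   )
   X_train, X_test, y_train, y_test = train_test_split(
       X, y, test_size=0.25, stratify=y, random_state=0
   )

   # Preprocessing, feature selection, and the classifier in one pipeline,
   # so every step is refit inside each cross-validation fold.
   pipeline = Pipeline([
       ("scale", StandardScaler()),
       ("select", SelectKBest(score_func=f_classif)),
       ("model", LogisticRegression(max_iter=1000)),
   ])

   # Hyperparameter tuning: number of kept features and regularization strength.
   param_grid = {"select__k": [5, 10, 20], "model__C": [0.1, 1.0, 10.0]}
   search = GridSearchCV(pipeline, param_grid, cv=5, scoring="roc_auc")
   search.fit(X_train, y_train)

   # Score the best pipeline (ROC AUC) on the held-out test split.
   test_score = search.score(X_test, y_test)
   print(search.best_params_, round(test_score, 3))

Keeping the selection step inside the pipeline means the held-out folds never
influence which features are chosen.
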
We'll be using these packages to do our analysis:
- `pandas `_
- `numpy `_
- `matplotlib `_
- `seaborn `_
- `scikit-learn `_
Our `dataset `_
comes from the UCI Machine Learning Repository and includes patient and hospital
outcome data collected from 130 U.S. hospitals between 1999 and 2008.
Environment Setup
-----------------
- Option 1: Running Jupyter notebook locally
- Option 2: Running Jupyter notebook via `Google Colab `_ (recommended)
For more details on how to get started, see the "Getting Started" section. All code is stored
on GitHub (see repo `diabetes-ml-workshop `_).
There's a `fill-in-the-blank notebook `_
that you can use to follow along in Google Colab. You can duplicate the notebook and modify it on your own account.
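
If you run the notebook locally, a quick check along these lines can confirm
that the packages listed above are importable before you start. This is a
minimal sketch, not part of the workshop materials: the ``check_packages``
helper and the ``REQUIRED`` mapping are hypothetical names introduced here.

.. code-block:: python

   import importlib.util

   # Display name -> import name (scikit-learn is imported as "sklearn").
   REQUIRED = {
       "pandas": "pandas",
       "numpy": "numpy",
       "matplotlib": "matplotlib",
       "seaborn": "seaborn",
       "scikit-learn": "sklearn",
   }


   def check_packages(required=REQUIRED):
       """Return {display name: True/False} for whether each module is found."""
       return {
           name: importlib.util.find_spec(module) is not None
           for name, module in required.items()
       }


   status = check_packages()
   for name, ok in status.items():
       print(f"{name}: {'installed' if ok else 'MISSING'}")

Any package reported as missing needs to be installed in the environment that
Jupyter is running in.
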

.. toctree::
   :maxdepth: 1
   :caption: Glossary of Terms:

   markdown/machine_learning.md

.. toctree::
   :maxdepth: 2
   :caption: Workshop Walkthrough:

   markdown/getting_started.md
   notebooks/walkthrough.ipynb

Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`