Data Science

for

Administrative datasets

April 23rd - 25th, 2018

Data Science

Data visualization

Machine learning

Software

Reproducibility

Why is programming important?

Flexibility

Automation

Reproducibility

Administrative datasets

Repurposed: collected to run programs, not for research

Often messy

Data about people

Significant ethical and practical implications
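Administrative records are often messy in small, repetitive ways, and a few lines of code can fix such problems consistently every time the data are refreshed. Below is a minimal sketch, assuming a hypothetical extract in which the same county is spelled inconsistently and missing ages are coded as the text "unknown". The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical administrative extract: inconsistent spelling and
# a text code ("unknown") standing in for missing values
raw = pd.DataFrame({
    "county": ["King", " king", "KING ", "Pierce"],
    "age": ["34", "unknown", "51", "27"],
})

clean = raw.copy()
# Strip stray whitespace and normalize capitalization so the
# same county is recognized as one value
clean["county"] = clean["county"].str.strip().str.title()
# Mark the placeholder as missing, then convert the column to numbers
clean["age"] = pd.to_numeric(clean["age"].replace("unknown", np.nan))

print(clean)
print(clean["county"].value_counts())
```

Because the cleaning steps are written down as code rather than done by hand, they can be rerun on next month's extract and reviewed by someone else.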

Tabular data

Tidy?

Each observation in a row, each variable in a column, each type of observational unit in its own table
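To make the tidy principle concrete, here is a minimal sketch with a made-up "wide" table of yearly counts. Each year is stored as its own column, so the variable "year" is hidden in the column names. `DataFrame.melt` gathers those columns so that each county-year observation gets its own row. The data and the column names ("county", "cases") are hypothetical.

```python
import pandas as pd

# Hypothetical wide table: one column per year, so year values
# live in the header instead of in a column
wide = pd.DataFrame({
    "county": ["King", "Pierce"],
    "2016": [120, 80],
    "2017": [135, 78],
})

# Gather the year columns: one row per (county, year) observation,
# with "year" and "cases" as ordinary variables
tidy = wide.melt(id_vars="county", var_name="year", value_name="cases")
# Column names were strings, so convert year to an integer
tidy["year"] = tidy["year"].astype(int)

print(tidy)
```

Once the table is tidy, summaries become one-liners, for example `tidy.groupby("year")["cases"].sum()` for yearly totals.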

Pandas

Data maturity

University of Chicago data maturity framework

Problem Definition

Data and Technology Readiness

Organizational Readiness

Structure of this course

Modular

Hands-on

Schedule

Day 1

Time          Topic
9 - 9:50      Introduction
9:50 - 10     Break
10 - noon     Programming in Python
noon - 1      Lunch
1 - 1:45      Version control with Git
1:45 - 2      Break
2 - 4         Git (continued)

Schedule

Day 2

Time          Topic
9 - 10:30     Tabular data in Python
10:30 - 10:45 Break
11 - noon     Tabular data in Python (continued)
noon - 1      Lunch
1 - 1:45      Data manipulation in Pandas
1:45 - 2      Break
2 - 4         Data manipulation in Pandas (continued)

Schedule

Day 3

Time          Topic
9 - 10:30     Statistics in Pandas
10:30 - 10:45 Break
11 - noon     Statistics in Pandas (continued)
noon - 1      Lunch
1 - 2:15      Data visualization with Matplotlib
2:15 - 2:30   Break
2:30 - 4      Discussion: next steps

Technical infrastructure

https://notebooks.azure.com/

What questions do you have?