## Overview

Data science is emerging as both a profession and an academic discipline. Harvard Business Review has called data scientist "the sexiest job of the 21st century." Demand for data scientists is outpacing supply, largely because the discipline is so new and people with the necessary skills are scarce. This two-day course introduces the discipline and includes plenty of hands-on exercises on real-world data.

## Topics to be covered

- Setting up a Data Analysis Environment in Python
- Working with Numbers using NumPy
- Accessing, preparing and exploring data with pandas & SciPy
- Data Exploration and visualization & Basic Statistical Analysis
- Machine Learning using scikit-learn
- Making Predictions using Regression and classification algorithms
- Clustering
- Text Analytics
- Building and validating a model

## Pre-requisites

- A basic understanding of data and programming is required.
- Programming experience using Python is essential.

## Detailed Outline

### Introduction to Data Science

- Introduction to Data Science
- Setting up Python Environment for Data Analysis
- Overview of the Data Analysis Software Stack - NumPy, pandas, Matplotlib, SciPy and scikit-learn
- Hands On Exercise

### Working with Numbers

- Introduction to NumPy arrays
- Overview of arrays and array operations
- N-Dimensional array and manipulations
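
As a rough illustration of the array topics above, the sketch below builds a small two-dimensional NumPy array and applies element-wise arithmetic, an axis-wise aggregate, and two reshaping operations. The temperature values and variable names are made up for the example and are not course material.

```python
import numpy as np

# Hypothetical sample data: temperatures (Celsius) for two cities (rows) over three days (columns).
temps_c = np.array([[20.0, 22.5, 19.0],
                    [30.0, 31.5, 29.0]])

# Element-wise arithmetic applies to every entry without an explicit loop.
temps_f = temps_c * 9 / 5 + 32

# Aggregating along axis=1 collapses the columns, giving one mean per row (per city).
row_means = temps_c.mean(axis=1)

# N-dimensional manipulation: flatten to one dimension and swap the two axes.
flat = temps_c.reshape(-1)
transposed = temps_c.T

print(temps_f)
print(row_means)
print(flat.shape, transposed.shape)
```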

### Accessing and preparing data with Pandas

- Loading data from a variety of sources: CSV, Databases
- Data manipulation - Filtering, Grouping, Ordering of data
- Dealing with missing Data
- Dealing with Continuous and categorical variables
- Normalizing and transforming data
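
The data preparation steps listed above might look roughly like the following in pandas. This is a minimal sketch on a hand-built table (the column names and values are invented), using an in-memory `DataFrame` instead of a CSV file or database so that it runs on its own. Median imputation and min-max scaling are only one possible choice for each step.

```python
import numpy as np
import pandas as pd

# Hypothetical sales table with one missing value.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi", "Mumbai"],
    "segment": ["retail", "retail", "online", "online", "retail"],
    "sales": [120.0, np.nan, 80.0, 200.0, 150.0],
})

# Missing data: fill the gap with the column median (one of several options).
df["sales"] = df["sales"].fillna(df["sales"].median())

# Filtering and ordering: keep rows above a threshold, largest first.
high = df[df["sales"] > 100].sort_values("sales", ascending=False)

# Grouping: mean sales per city.
by_city = df.groupby("city")["sales"].mean()

# Categorical variables: one-hot encode the segment column.
encoded = pd.get_dummies(df, columns=["segment"])

# Normalizing: min-max scale the continuous column into the range [0, 1].
sales_range = df["sales"].max() - df["sales"].min()
df["sales_scaled"] = (df["sales"] - df["sales"].min()) / sales_range

print(high)
print(by_city)
print(encoded.columns.tolist())
```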

### Data Exploration, Visualizations & Statistical Analysis

- Basic Statistical analysis using scipy.stats
- Drawing Histograms, Bar charts, Box Plots
- Drawing Density plots and understanding data distributions
- Univariate Analysis - Statistics Summary, Hypothesis tests
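
For the statistics summary and hypothesis test items above, a minimal sketch with `scipy.stats` could look like this. The sample is synthetic (drawn from a normal distribution with an assumed mean of 5.2), and the null-hypothesis mean of 5.0 is an arbitrary value chosen for illustration. Plotting is left out to keep the example self-contained.

```python
import numpy as np
from scipy import stats

# Synthetic univariate sample; the seed, mean, and size are assumptions for this example.
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.2, scale=1.0, size=200)

# Statistics summary: count, min/max, mean, variance, skewness, kurtosis.
summary = stats.describe(sample)

# One-sample t-test of the null hypothesis that the population mean equals 5.0.
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

print(summary)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

A small p-value would suggest the sample mean differs from 5.0. The threshold for "small" is a choice the analyst makes before running the test.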

### Regression and Classification Algorithms

- Understand Regression Techniques
- Simple Linear Regression & Multiple Linear Regression
- Measuring accuracy of the models
- Regression Diagnostics - Validating Models
- Making Predictions using Classification algorithms - Logistic Regression
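
To make the regression and classification items above concrete, here is a small sketch using scikit-learn. The data is synthetic, generated from a known linear relationship so the fitted coefficients can be compared against it. Scoring on the training data, as done here, is only for brevity and overstates real-world accuracy.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score

# Synthetic data: y = 3*x0 - 2*x1 + 1 plus noise (coefficients chosen for illustration).
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + rng.normal(scale=0.5, size=200)

# Multiple linear regression: fit, then measure fit quality with R^2.
reg = LinearRegression().fit(X, y)
r2 = r2_score(y, reg.predict(X))

# Classification: turn the target into a binary label and fit logistic regression.
labels = (y > np.median(y)).astype(int)
clf = LogisticRegression(max_iter=1000).fit(X, labels)
acc = accuracy_score(labels, clf.predict(X))

print("coefficients:", reg.coef_, "intercept:", reg.intercept_)
print("R^2:", r2, "accuracy:", acc)
```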

### Clustering

- Understanding k-means clustering and creating Segments
- Creating clustering plots and Dendrograms
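
The clustering topics above can be sketched as follows, assuming two well-separated synthetic groups of points. `KMeans` assigns each point to a segment, and SciPy's `linkage` builds the hierarchy that a dendrogram displays. Passing `no_plot=True` returns the dendrogram layout without drawing it, so the sketch does not depend on a plotting backend.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import KMeans

# Two synthetic groups of 2-D points centered at (0, 0) and (5, 5).
rng = np.random.default_rng(1)
group_a = rng.normal(loc=[0, 0], scale=0.5, size=(30, 2))
group_b = rng.normal(loc=[5, 5], scale=0.5, size=(30, 2))
points = np.vstack([group_a, group_b])

# k-means with k=2: each point receives a segment label (0 or 1).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
segments = kmeans.labels_

# Hierarchical clustering with Ward linkage; the result encodes the merge tree.
tree = linkage(points, method="ward")

# Compute the dendrogram layout only; drop no_plot to draw it with Matplotlib.
dendro = dendrogram(tree, no_plot=True)

print(segments)
print(len(dendro["leaves"]))
```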

### Text Analytics

- Handling Text and unstructured Data
- Accessing Social Data - Integrating with Twitter
- Trend Analysis
- Web scraping using Python
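
As one possible starting point for handling text, the sketch below turns a few short documents into a bag-of-words count matrix with scikit-learn's `CountVectorizer`. The documents are invented for the example. Fetching tweets or scraping web pages is omitted because it needs network access and credentials.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical mini-corpus standing in for text collected from the web.
docs = [
    "data science is fun",
    "python makes data analysis easy",
    "science needs data",
]

# Tokenize the documents and count how often each word appears in each one.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# vocabulary_ maps each word to its column index in the count matrix.
vocab = vectorizer.vocabulary_

# Total occurrences of each word across all documents.
term_totals = np.asarray(counts.sum(axis=0)).ravel()

for word, index in sorted(vocab.items()):
    print(word, term_totals[index])
```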

### Steps to build and validate models

- Creating Training, validation and Test Data Sets
- Cross-validation
- Understanding Accuracy measures
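
The model-building steps above can be sketched as below, assuming a synthetic classification dataset from `make_classification`. The 60/20/20 split, the choice of logistic regression, and five folds are illustrative assumptions, not prescribed settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic dataset: 300 samples, 5 features, binary target.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Hold out 20% as the test set, then split the rest 75/25 into training and validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0
)

# 5-fold cross-validation on the non-test data gives a spread of accuracy estimates.
model = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(model, X_trainval, y_trainval, cv=5, scoring="accuracy")

# Fit on the training set, check on validation, and report on test only once at the end.
model.fit(X_train, y_train)
val_accuracy = model.score(X_val, y_val)
test_accuracy = model.score(X_test, y_test)

print("CV accuracy per fold:", cv_scores)
print("validation:", val_accuracy, "test:", test_accuracy)
```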
