Ever-increasing computational capacity has enabled us to acquire, process and analyse larger data sets. We increasingly want to take a data-driven lens to solving business problems. But business problems are inherently ‘wicked’ in nature: they involve multiple stakeholders, differing problem definitions, interdependent solutions, constraints, amplifying loops and so on. There is no single trick that solves them. What is required is a structured approach to problem solving that can be applied to a large set of these problems. One possible starting point is a hypothesis-driven approach: problem definition, scoping, issue identification and hypothesis generation. In this workshop, you will learn how to apply this hypothesis-driven approach to any business problem through seven pragmatic steps: Frame, Acquire, Refine, Transform, Explore, Model and Insight. The focus will be on learning the principles through an applied case study and by actually coding in R, Python or JavaScript to solve it.

Objective

Imbibe the underlying principles of data analytics and learn how to use the data science pipeline
Develop proficiency in using the R or Python data stack and libraries like ggplot2, dplyr, stats (for R) and pandas, scikit-learn (for Python)
Learn how to employ statistical and machine learning algorithms to solve real life problems

Approach

Taught by real-life practitioners
Tested and practical curriculum with real data sets
Interactive and live coding sessions

Target Audience

Professionals interested in learning data science
Programmers interested in building data driven products
Journalists, scientists and researchers interested in telling data stories
Business Intelligence analysts and consultants

The workshop is ideal for anyone who wants to learn how to use the open source R or Python stack for statistical analysis and visualization. If you are not already using R or Python for statistical analysis, existing familiarity with data analysis in some other tool would help. Familiarity with the R or Python libraries mentioned above is not a prerequisite.

Software Requirements

For the exercises during the workshop, we will be using R with the RStudio IDE, or the Anaconda Distribution for Python. Please install these on your machine prior to the workshop session. For the more curious attendees: we will be using R Markdown or Jupyter Notebook as our working environment. Some of the main libraries we will be using in the session are:

For R: dplyr, tidyr, ggplot2, ggmap, plotly, rmarkdown, purrr, prophet and forcats
For Python: numpy, plotnine, seaborn, matplotlib, prophet and scikit-learn.

The working repo for this workshop is at https://github.com/amitkaps/art-data-science

Curriculum

1. INTRO

“I think, therefore I am”

What is data science?
What type of questions can be answered?
Frame/Acquire/Refine/Explore/Model/Insight/Build/Deploy framework

2. FRAME

“Framing the problem is often far more essential than its solution”

How to frame a data science problem?
Learn the hypothesis-driven approach
How do you start - question driven, dataset driven or both?

3. ACQUIRE

“Data is the new oil”

Sources of Data
- Downloaded from an internal system
- Obtained from a client or other 3rd party
- Extracted from a web-based API
- Scraped from a website / PDFs
- Gathered manually and recorded
Acquire data from a csv file or a database
Acquire data from a website (scraping) or a 3rd party client (e.g. Twitter)
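
As a minimal sketch of the first of these (acquiring from a csv file or a database), the example below uses only the Python standard library. The sample data, the column names and the `prices` table are hypothetical, chosen purely for illustration; in the workshop stack a library such as pandas would typically do the same job.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for the contents of a CSV file.
CSV_TEXT = """city,year,price
Bengaluru,2016,120
Mumbai,2016,150
Bengaluru,2017,130
"""


def read_csv_rows(text):
    """Parse CSV text into typed dicts, one per row."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"city": r["city"], "year": int(r["year"]), "price": float(r["price"])}
        for r in reader
    ]


def load_into_db(rows):
    """Load rows into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (city TEXT, year INTEGER, price REAL)")
    conn.executemany("INSERT INTO prices VALUES (:city, :year, :price)", rows)
    conn.commit()
    return conn


rows = read_csv_rows(CSV_TEXT)
conn = load_into_db(rows)

# Query the database with a parameterised SQL statement.
bengaluru = conn.execute(
    "SELECT year, price FROM prices WHERE city = ? ORDER BY year",
    ("Bengaluru",),
).fetchall()
print(bengaluru)
```

Converting the text fields to numbers at read time keeps type problems out of the later steps.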

4. REFINE

“Data is messy”

Concept of Tidy Data - Why is it important?
- Missing e.g. Check for missing or incomplete data
- Quality e.g. Check for duplicates, accuracy, unusual data
- Remove e.g. remove redundant data
- Parse e.g. extract year from date
- Derive e.g. gender from title
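
The refine checks above can be sketched in a few lines of plain Python. The records, the field names and the title-to-gender mapping are hypothetical, and deriving gender from a title is only as reliable as the titles themselves.

```python
from datetime import datetime

# Hypothetical raw records; None marks a missing value.
records = [
    {"name": "Mr. Arun Rao", "joined": "2015-03-14", "age": 34},
    {"name": "Ms. Leela Nair", "joined": "2016-11-02", "age": None},
    {"name": "Mr. Arun Rao", "joined": "2015-03-14", "age": 34},  # duplicate
    {"name": "Mrs. Tara Iyer", "joined": "2014-07-30", "age": 41},
]

# Assumed mapping, for illustration only.
TITLE_TO_GENDER = {"Mr.": "male", "Ms.": "female", "Mrs.": "female"}


def count_missing(rows):
    """Missing: count None values per field."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            if value is None:
                counts[key] = counts.get(key, 0) + 1
    return counts


def drop_duplicates(rows):
    """Quality / Remove: keep the first occurrence of each identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def add_derived_fields(rows):
    """Parse the year from the date and derive gender from the title."""
    enriched = []
    for row in rows:
        new_row = dict(row)
        new_row["year"] = datetime.strptime(row["joined"], "%Y-%m-%d").year
        title = row["name"].split()[0]
        new_row["gender"] = TITLE_TO_GENDER.get(title, "unknown")
        enriched.append(new_row)
    return enriched


missing = count_missing(records)
clean = add_derived_fields(drop_duplicates(records))
print(missing)
print(clean)
```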

5. TRANSFORM

“What is measured may not help answer what is needed”

Convert e.g. free text to coded value
Merge e.g. first and surname for full name
Calculate e.g. percentages, proportion
Aggregate e.g. rollup by year, cluster by area
Filter e.g. exclude based on location
Sample e.g. extract a representative sample
Summary e.g. show summary stats like mean
Basic statistics: variance, standard deviation, covariance, correlation
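
A few of the transforms above (aggregate, calculate percentages, basic statistics) can be sketched with the standard library alone. All figures are made up for illustration. Covariance and correlation are written out by hand here so the formulas are visible; both use the sample (n - 1) denominator.

```python
import statistics
from collections import defaultdict

# Hypothetical (year, amount) sales records.
sales = [(2015, 100.0), (2015, 150.0), (2016, 250.0), (2017, 500.0)]


def aggregate_by_year(rows):
    """Aggregate: roll amounts up to one total per year."""
    totals = defaultdict(float)
    for year, amount in rows:
        totals[year] += amount
    return dict(totals)


def percentage_share(totals):
    """Calculate: each group's share of the grand total, in percent."""
    grand_total = sum(totals.values())
    return {key: 100.0 * value / grand_total for key, value in totals.items()}


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    products = ((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


totals = aggregate_by_year(sales)
shares = percentage_share(totals)

# Hypothetical paired measurements.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [2.0, 4.0, 6.0, 8.0, 10.0]

var_spend = statistics.variance(spend)
sd_spend = statistics.stdev(spend)
cov = covariance(spend, revenue)
corr = correlation(spend, revenue)
print(totals, shares, var_spend, sd_spend, cov, corr)
```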

6. EXPLORE

“I don’t know, what I don’t know”

Why do visual exploration?
Understand Data Structure & Types
Grammar of Graphics and Basics of visualisation
Explore single & dual variable graphs
Explore multi-dimensional variable graphs
Creating new features from the data

7. MODEL

“All models are wrong, some of them are useful”

Introduction to Machine Learning
The power and limits of models
Tradeoff between Prediction Accuracy and Model Interpretability
Bias-Variance tradeoff & Overfitting
Prediction Problem - Regression & Classification
Model Family - Linear & Trees
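
To make the two model families concrete, here is a rough sketch of each on a regression problem: a straight line fitted by least squares, and a one-split regression tree (a "stump"). The data is invented and the implementations are hand-written for illustration; in practice a library such as scikit-learn would be used instead.

```python
# Hypothetical data, roughly following y = 2x + 1 with some noise.
xs = [1, 2, 3, 4, 5, 6]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_stump(xs, ys):
    """One-split regression tree: returns (threshold, left_mean, right_mean).

    Tries every split between consecutive sorted points and keeps the one
    with the smallest sum of squared errors.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


def predict_stump(model, x):
    threshold, left_mean, right_mean = model
    return left_mean if x < threshold else right_mean


line = fit_line(xs, ys)
stump = fit_stump(xs, ys)
print(line, stump)
print(predict_line(line, 2), predict_stump(stump, 2))
```

The line captures the trend with two numbers, while the stump can only predict one of two constant values. Deeper trees add more splits, which increases flexibility and with it the risk of overfitting.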

8. INSIGHT

“The goal is to turn data into insight”

Assessing Model Accuracy
- For Regression problems: RMSE
- For classification problems: Accuracy, AUC/ROC
Selecting a Model: Cross Validation
Why do we need to communicate insight?
Types of communication - Exploration vs. Explanation
Explanation: Telling a story with data
Exploration: Building an interface for people to find stories
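
The model assessment ideas listed at the start of this section (RMSE, accuracy and cross validation) can be sketched as follows. The numbers are invented, and the "model" being cross-validated is a deliberately trivial baseline that always predicts the mean of its training fold. Folds are contiguous and unshuffled to keep the sketch short; real data would normally be shuffled first.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error for regression problems."""
    squared = ((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sum(squared) / len(actual))


def accuracy(actual, predicted):
    """Fraction of labels predicted correctly, for classification problems."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def k_fold_splits(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds over n rows."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def cross_validated_rmse(y, k):
    """Average RMSE of a baseline that predicts the training-fold mean."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(y), k):
        train_mean = sum(y[i] for i in train_idx) / len(train_idx)
        actual = [y[i] for i in test_idx]
        scores.append(rmse(actual, [train_mean] * len(actual)))
    return sum(scores) / len(scores)


reg_error = rmse([3, 5, 7], [2, 5, 9])
clf_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
splits = list(k_fold_splits(5, 2))
cv_score = cross_validated_rmse([1, 2, 3, 4, 5, 6], 3)
print(reg_error, clf_accuracy, splits, cv_score)
```

Comparing the cross-validated score of a candidate model against a baseline like this one is a simple way to check whether the model has learnt anything useful.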

Instructors

Amit Kapoor

Crafting Visual Stories with Data

Bargava Subramanian

Senior Data Scientist

Venue

IKP EDEN, 16, Bhuvanappa Layout, Tavarekere Main Rd, Kaveri Layout, Suddagunte Palya, Bengaluru, Karnataka 560029.

Workshops: The Art of Data Science

Learn how to employ statistical and machine learning algorithms to solve real life problems

3rd Nov: Workshop in R, 4th Nov: Workshop in Python | IKP Eden, Bangalore
