My Data Science Research Canvas

Dec 10, 2020

Don’t Repeat Myself when starting a data science project.

I am writing this post to remind myself about the list of items that I need to step through when starting a data science project.

Define the problem

You can’t solve a problem until you have defined it.

Far better an approximate answer to the right question, which is often vague, than the exact answer to the wrong question, which can always be made precise. — John Tukey

Knowing what is possible will also save you time down the track. It is practically impossible to predict churn when customers all cancel their subscription as soon as they finish signing up.
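A quick feasibility check along these lines can be sketched as follows. This is only a minimal sketch: the (signup_date, cancel_date) schema, the function name and the one-day threshold are all assumptions made for illustration, not something taken from a particular project.

```python
from datetime import date

def share_cancelled_at_signup(customers, threshold_days=1):
    """Fraction of cancellations that happen within `threshold_days` of signup.

    customers: iterable of (signup_date, cancel_date) pairs, where cancel_date
    is None for customers who are still subscribed (hypothetical schema).
    """
    # Tenure in days at the moment of cancellation, for churned customers only.
    tenures = [
        (cancel - signup).days
        for signup, cancel in customers
        if cancel is not None
    ]
    if not tenures:
        return 0.0
    immediate = sum(1 for t in tenures if t <= threshold_days)
    return immediate / len(tenures)

customers = [
    (date(2020, 1, 1), date(2020, 1, 1)),  # cancelled on the signup day
    (date(2020, 1, 1), date(2020, 1, 2)),  # cancelled the day after
    (date(2020, 1, 1), date(2020, 3, 1)),  # cancelled after two months
    (date(2020, 1, 1), None),              # still subscribed
]
print(share_cancelled_at_signup(customers))
```

If most cancellations land in the first day or two, there is very little behaviour to learn from before the customer leaves, and the problem probably needs to be reframed.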

The following are the steps I take to formulate and refine the problem before I dive into building a model. I will use churn as an example and provide some of the resources I found useful during the research.

Identify existing business applications, practices and resources

I often start by identifying existing business applications and practices; the first step does not need to be an ML model.

This phase often helps me to identify:

How people think about churn. Churn is not a single problem: it can result from natural usage patterns, poor product-market fit, a poor experience, etc.
What is the industry standard for churn? Are we doing well?
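To compare against an industry figure, it helps to pin down how the churn rate is calculated. The sketch below uses one common definition (customers lost during a period divided by customers active at the start of it). Published benchmarks may define it differently, so treat the function names and the definition itself as assumptions to check against whichever source you compare with.

```python
def churn_rate(customers_at_start, customers_lost):
    """Share of customers active at the start of a period who left during it."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start

def annualised_churn(monthly_rate):
    """Annual churn implied by a constant monthly churn rate.

    Assumes the same fraction of the remaining customers leaves every month.
    """
    return 1 - (1 - monthly_rate) ** 12

monthly = churn_rate(customers_at_start=200, customers_lost=10)
print(monthly)                    # 0.05
print(annualised_churn(monthly))  # compounds the monthly rate over 12 months
```

A monthly figure and an annual figure are not interchangeable, so converting to a common period is worth doing before asking "are we doing well?".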

Here are some of the resources I found while researching churn.

CHURN.FM - It's your churn!

My name is Andrew Michael and I started CHURN.FM, as I was tired of hearing stories about some magical silver bullet…

www.churn.fm

What is good retention — Issue 29

Segment session on retention

Identify ML practices in this area

The next research step focuses on ML practice: how are other data scientists tackling the problem?

How people formulate the problem. Take the churn prediction task: you can formulate it as binary classification, or as detecting anomalous behaviour in product usage prior to churn.
Identify a starting point and benchmark metric.
Common problems faced by others. (Data leakage is one of the most common difficulties in curating a dataset for churn prediction.)
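The binary-classification framing, the leakage concern and the benchmark metric can be illustrated together. The following is a minimal sketch, assuming a hypothetical schema in which each customer has a list of usage-event dates and an optional cancellation date. The function names, the 30-day feature window and the 30-day prediction horizon are illustrative choices, not a prescribed method.

```python
from datetime import date, timedelta

def build_example(events, cancel_date, cutoff, horizon_days=30):
    """Return (features, label) for one customer at a cutoff date, or None.

    events: list of dates, one per product-usage event (hypothetical schema).
    cancel_date: date the customer cancelled, or None if still active.
    """
    # Customers who cancelled on or before the cutoff are out of scope.
    if cancel_date is not None and cancel_date <= cutoff:
        return None
    # Features may only use events on or before the cutoff; anything later
    # would leak information from the period we are trying to predict.
    window_start = cutoff - timedelta(days=30)
    recent = [d for d in events if window_start < d <= cutoff]
    features = {"events_last_30d": len(recent)}
    # The label looks strictly after the cutoff, inside the horizon.
    horizon_end = cutoff + timedelta(days=horizon_days)
    label = int(cancel_date is not None and cutoff < cancel_date <= horizon_end)
    return features, label

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common class (a naive benchmark)."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)

cutoff = date(2020, 6, 30)
events = [date(2020, 6, 1), date(2020, 6, 29), date(2020, 7, 5)]
print(build_example(events, date(2020, 7, 10), cutoff))
print(majority_baseline_accuracy([0, 0, 0, 1]))
```

Fixing the cutoff first and deriving both features and label relative to it keeps the two from overlapping. The majority-class baseline is a reminder that, with imbalanced churn labels, accuracy alone can look good for a model that predicts nothing useful.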

Customer Churn Prediction Using Machine Learning: Main Approaches and Models

Customer Churn Prediction for Subscription Businesses Using Machine Learning: Main Approaches and Models

Games and Big Data: A Scalable Multi-Dimensional Churn Prediction Model

Why you should stop predicting customer churn and start using uplift models

Public datasets and analyses

Another thing I found quite helpful is to look at public datasets and analyses on a platform like Kaggle. They give ideas on how the data can be curated and on the issues associated with doing so.

Telecom Customer Churn Prediction

Bank Customer Churn Prediction

Perform an Exploratory Data Analysis

This step ties together the research from the steps above and shows how it fits the specific problem you are dealing with.

Aftermath

Although I did end up building a churn model as requested, the most valuable outcome of this particular project was the insight into the different pathways by which a customer can churn.

Giving stakeholders a binary flag or a probability that a customer will churn is of very limited value: it does not tell them why the customer churns, so they are limited to generic approaches such as price discounts.

On the other hand, segmenting customers by their usage and then identifying the point of intervention gives marketers and stakeholders more power to design solutions for each scenario.

The retention strategy for a streaming platform user who returns every day at 8 pm will be very different from a customer who binges 10 series in 3 days and then goes dormant.
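A rule-based version of this kind of usage segmentation might look like the sketch below. It is an illustration only: the segment names, the 14-day dormancy threshold, the 7-day binge window and the 80% activity ratio are all assumed values, and a real project would tune or learn them from data.

```python
from datetime import date

def usage_segment(session_dates, as_of, dormant_after_days=14):
    """Assign a coarse usage segment from a customer's session dates.

    session_dates: list of dates on which the customer had a session
    (hypothetical schema); as_of: the date the segmentation is run.
    """
    if not session_dates:
        return "never_active"
    days = sorted(set(session_dates))
    days_since_last = (as_of - days[-1]).days
    span_days = (days[-1] - days[0]).days + 1
    if days_since_last > dormant_after_days:
        # A short, intense burst followed by silence versus a slower fade.
        return "binge_then_dormant" if span_days <= 7 else "lapsed"
    # Active on most days across at least two weeks of history.
    if span_days >= 14 and len(days) / span_days >= 0.8:
        return "daily_habit"
    return "occasional"

as_of = date(2020, 7, 1)
daily_user = [date(2020, 6, d) for d in range(1, 31)]
binge_user = [date(2020, 6, 1), date(2020, 6, 2), date(2020, 6, 3)]
print(usage_segment(daily_user, as_of))
print(usage_segment(binge_user, as_of))
```

Each segment then maps to a different point of intervention, which is easier for marketers to act on than a single churn probability.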

Written by Michael C. J. kao
