User: ML_Beginner_India Subject: How much data needed to train ML model?: ML and AI

Last updated on 3 months ago

ML and AI

KevinVeteran Member

Posted 3 months ago

I am trying to build a machine learning model to predict customer churn at my company. I currently have data for 5000 customers covering 2 years. Is this enough data, or do I need more? Also, which algorithm should I use: logistic regression or random forest?

KevinVeteran Member

Posted 3 months ago

Reply by: DataScientist_10yrs
5000 records is a decent dataset for a binary classification problem like churn prediction, so you can definitely start with this. For the algorithm choice, I suggest trying both and comparing the results. Start with logistic regression as a baseline because it's simple and interpretable. Then try random forest or XGBoost, which often give better accuracy but are more complex. Use cross-validation so that each model is scored on data it was not trained on.
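
To make the cross-validation step concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the helper names (`stratified_folds`, `cross_validate`), the made-up 20% churn rate, and the majority-class baseline standing in for a real model. In practice you would probably use a library's built-in cross-validation and plug in logistic regression or random forest instead.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Split row indices into k folds with a similar class ratio in each fold."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    return folds

def cross_validate(fit, predict, X, y, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat for every fold."""
    scores = []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in held_out])
        correct = sum(p == y[i] for p, i in zip(preds, held_out))
        scores.append(correct / len(held_out))
    return scores

# Hypothetical baseline: always predict the most common training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, X_test):
    return [model] * len(X_test)

# Made-up data: 5000 customers, 1000 of them churned (label 1).
y = [1] * 1000 + [0] * 4000
X = [[0.0]] * len(y)  # placeholder features, ignored by the baseline
scores = cross_validate(fit_majority, predict_majority, X, y, k=5)
print(scores)
```

Any model you try should beat the per-fold accuracy of this trivial baseline; if it does not, the extra complexity is not buying you anything.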

KevinVeteran Member

Posted 3 months ago

Reply by: Analytics_Consultant
Also check whether your data is balanced, meaning you have a similar number of churned and non-churned customers. If 95% of customers didn't churn and only 5% churned, you have a class imbalance problem: a model can reach 95% accuracy just by always predicting "no churn". In that case you need techniques like SMOTE or adjusted class weights. Data quality and feature engineering are often more important than the quantity of data.
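
As a rough illustration of adjusting class weights, the sketch below computes "balanced" weights with the common heuristic n_samples / (n_classes * class_count). The function name `balanced_class_weights` and the 95/5 label split are assumptions made up for this example, not something from your data.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency:
    weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Made-up labels: 5000 customers, 5% churned (label 1).
labels = [0] * 4750 + [1] * 250
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, a misclassified churner costs the model far more than a misclassified non-churner, which pushes it away from always predicting the majority class.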
