User: ML_Beginner_India Subject: How much data is needed to train an ML model?

Last updated 3 days ago
Kevin, Senior Member
Posted 3 days ago
I am trying to build a machine learning model to predict customer churn at my company. I currently have data for 5000 customers covering 2 years. Is this enough data, or do I need more? Also, which algorithm should I use: logistic regression or random forest?
Kevin, Senior Member
Posted 3 days ago
Reply by: DataScientist_10yrs
5000 records is a decent dataset for a binary classification problem like churn prediction. You can definitely start with this. For the algorithm choice, I suggest trying both and comparing the results. Start with logistic regression as a baseline because it's simple and interpretable. Then try random forest or XGBoost, which often give better accuracy but are more complex. Use cross-validation so that every model is evaluated on data it was not trained on.
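To make the cross-validation advice concrete, here is a rough sketch in plain Python of what k-fold evaluation does. All the names here (`stratified_folds`, `cross_validate`, the majority-class stand-in model, the toy data) are made up for illustration and are not from the thread. In practice you would more likely use scikit-learn's `cross_val_score` with `LogisticRegression` and `RandomForestClassifier` instead of writing this by hand.

```python
import random
from collections import Counter

def stratified_folds(labels, k=5, seed=0):
    """Split row indices into k folds with similar class proportions in each."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)  # deal each class out round-robin
    return folds

def cross_validate(fit, predict, X, y, k=5, seed=0):
    """Return per-fold accuracy for a model supplied as fit/predict functions."""
    scores = []
    for held_out in stratified_folds(y, k, seed):
        test = set(held_out)
        train = [i for i in range(len(y)) if i not in test]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in held_out])
        correct = sum(p == y[i] for p, i in zip(preds, held_out))
        scores.append(correct / len(held_out))
    return scores

# Stand-in "model" (hypothetical): always predict the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def predict_majority(model, X_test):
    return [model] * len(X_test)

# Toy data (made up): 100 customers, 20 of whom churned.
X = [[i] for i in range(100)]
y = [1 if i < 20 else 0 for i in range(100)]

scores = cross_validate(fit_majority, predict_majority, X, y, k=5)
mean_score = sum(scores) / len(scores)
print(scores, mean_score)
```

To compare two algorithms, run `cross_validate` once per model with the same `k` and `seed` so both see identical folds, then compare the mean scores. The majority-class stand-in is also a useful floor: a real model should beat it.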
Kevin, Senior Member
Posted 3 days ago
Reply by: Analytics_Consultant
Also check how balanced your data is, meaning how the number of churned customers compares with the number of non-churned ones. If 95% of customers didn't churn and only 5% churned, you have a class imbalance problem: a model can reach 95% accuracy just by always predicting "no churn". In that case, use techniques like SMOTE or adjusted class weights, and look at precision and recall rather than accuracy alone. Data quality and feature engineering are often more important than the quantity of data.
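A small sketch of the imbalance point, in plain Python. The 5000-customer, 5%-churn split below is a made-up combination of the numbers mentioned in this thread, not real data, and the helper names are hypothetical. The weight formula `n_samples / (n_classes * class_count)` is the same heuristic scikit-learn applies for `class_weight="balanced"`.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the predictions caught."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual = sum(t == positive for t in y_true)
    return tp / actual if actual else 0.0

# Assumed split: 5000 customers, 5% churned (label 1), 95% stayed (label 0).
y = [1] * 250 + [0] * 4750

# A "model" that always predicts "no churn".
always_no = [0] * len(y)
accuracy = sum(t == p for t, p in zip(y, always_no)) / len(y)
churn_recall = recall(y, always_no)

weights = balanced_class_weights(y)
print(accuracy, churn_recall, weights)
```

The always-"no churn" predictor looks good on accuracy but catches no churners at all, which is why accuracy alone is misleading here. The computed weights make each churned customer count roughly as much as nineteen non-churned ones during training, so the minority class is not ignored.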