A Guide to Debug your ML Model

Your model trains flawlessly: no SyntaxError, no crash. You run model.predict() on new data, hold your breath, and the results are garbage. In machine learning, a bug is silent: a model that runs perfectly but learns nothing. So, if you want to learn how to debug ML models, this article is for you. In this article, I’ll take you through a complete, practical guide on how to debug your ML model.

A Guide to Debug Your ML Model

Debugging ML is part art, part science, and 100% detective work. Over the years, I’ve seen thousands of broken models, and they almost always break in the same few ways.

Here’s my practical guide to finding the bugs and debugging your ML model.

It’s (Almost) Always the Data

Before you touch a single hyperparameter, look at your data. I’m going to say this again because it will save you hundreds of hours in your career: It is almost always the data.

We love to tweak model architecture because it feels smart, but nine times out of ten, the problem lies in the X_train.

The Silent Killer: Data Leakage

This is the sneakiest bug in all of machine learning. Data leakage occurs when information from the future (or the answer itself) is accidentally included in your training features. Your model looks like a genius during training because it’s cheating; it’s like it found the answer key stuck to the test paper.


To fix data leakage, be ruthless. Look at every single feature and ask:

“Would I have this exact piece of information at the moment of prediction?”

If the answer is no, drop the feature. And to avoid a subtler form of leakage, always split your data into train, validation, and test sets first, before you do any preprocessing (like scaling or imputation); otherwise, statistics from the test set leak into whatever your scaler or imputer learns.
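Here’s a minimal sketch of that order of operations with scikit-learn; the DataFrame df, the target column name, and the scaler are placeholders for whatever you actually use:

# Split FIRST, then fit preprocessing on the training data only.
# 'df' and the 'target' column are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = df.drop(columns=["target"])
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training set only...
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# ...then apply the training statistics to the test set
X_test_scaled = scaler.transform(X_test)

This way, nothing about the test set (not even its mean or standard deviation) ever touches the training process.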

Garbage In, Garbage Out (GIGO)

Your model is a sophisticated learning machine, but it can’t make sense of nonsense. So, always:

  1. Check for NaNs (Missing Values): Sounds basic, but it’s the #1 culprit for models that output NaN.
  2. Check for Outliers: A single data point with a user_age of 999 or a purchase_amount of $10,000,000 can throw off your entire model’s understanding of normal.
  3. Check for Imbalance: Is your model predicting customer churn, and only 0.5% of customers actually churn? If so, your model can achieve 99.5% accuracy by just predicting “no churn” every single time. It looks great, but it’s completely useless.
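Here’s a quick sketch of those three checks with pandas, assuming your features live in a DataFrame called df and the label column is called churned (both are placeholders for your own data):

import pandas as pd

# 'df' and the 'churned' column are hypothetical placeholders

# 1. Missing values: count the NaNs in every column
print(df.isna().sum().sort_values(ascending=False))

# 2. Outliers: eyeball the extremes of each numeric column
print(df.describe())
print(df["purchase_amount"].nlargest(10))

# 3. Imbalance: what fraction of each class do you actually have?
print(df["churned"].value_counts(normalize=True))

Ten minutes of print statements like these will save you days of staring at loss curves.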

When Your Model Isn’t Learning

You hit “train,” and you watch the loss. And it doesn’t move. It’s stuck at 0.693 (that’s ln(2), exactly what a binary classifier scores by guessing at random) or some other stubbornly high number. It means your model isn’t learning.

Here’s what you need to do!

The Overfit a Single Batch Trick

This is the single most powerful debugging technique I know. If your model can’t even learn to solve a tiny piece of the problem, it will never solve the whole thing.

Take one small batch of data (e.g., 32 samples) and train your model on only that batch, over and over, for 100 epochs.

Here’s how to do that:

# (Assume 'model' is your compiled Keras model)

# Grab one batch of data
(X_batch, y_batch) = next(iter(train_generator)) 

# Train on ONLY this batch, repeatedly
history = model.fit(
    X_batch,
    y_batch,
    epochs=100,
    verbose=1
)

# Now check the history...
# Is the loss dropping to near-zero? Is accuracy hitting 100%?

If it works, your model architecture and learning rate are capable of learning. The problem is with your broader dataset (it’s too noisy, too hard, or you need more data).

If it fails, you have a fundamental problem. Your model cannot even memorize 32 samples. The learning rate is the most likely culprit in such scenarios. If it’s too high, the loss will explode (NaN). If it’s too low, it won’t move at all. Try values like 1e-5, 1e-4, 1e-3.
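A quick way to test that is a small learning-rate sweep on the same single batch. This is only a sketch: it assumes a hypothetical build_model() helper that returns a fresh, uncompiled copy of your architecture, and uses a binary-classification loss purely as an example:

from tensorflow import keras

# build_model() is a hypothetical helper returning a fresh, uncompiled model
for lr in [1e-5, 1e-4, 1e-3]:
    model = build_model()
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss="binary_crossentropy",  # example loss; use your own
        metrics=["accuracy"],
    )
    history = model.fit(X_batch, y_batch, epochs=100, verbose=0)
    print(f"lr={lr}: final loss = {history.history['loss'][-1]:.4f}")

Whichever learning rate drives the single-batch loss closest to zero is the one to start with on the full dataset.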

When Your Model is Overfitting

When your training accuracy is 99%, you feel like a genius. Then you run it on the test set, and your accuracy plummets to 60%.

It means your model didn’t learn; it memorized. It’s like a student who memorized the textbook’s practice questions but can’t answer a new question because they never learned the concepts.


To fix this, you need to make it harder for the model to memorize. We do this with regularization: techniques like L2 weight penalties, dropout, early stopping, or simply collecting more training data.
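As a rough sketch of what a couple of those knobs look like in Keras (the layer sizes, input shape, and rates here are placeholder values, not recommendations):

from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Placeholder architecture: adjust sizes and input shape to your data
model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 weight penalty
    layers.Dropout(0.5),  # randomly silence half the units each step
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Early stopping: quit when the validation loss stops improving
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)
# model.fit(X_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])

Each of these makes blind memorization more expensive for the model, which forces it to learn patterns that actually generalize.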

When Your Model is Underfitting

This is the opposite of overfitting, and it’s honestly easier to fix. Your training accuracy and your test accuracy are both low.

It means the model just isn’t smart enough. It’s trying to fit a complex, curvy line with a simple straight line.


Here’s how to fix underfitting:

  1. Choose a Stronger Model: Make the model “deeper” (more layers) or “wider” (more neurons per layer).
  2. Train longer: Maybe it just didn’t have enough time.
  3. Feature Engineering: This is the most powerful one. Is your model getting a timestamp? It can’t use that. But if you give it day_of_week or hour_of_day, it can suddenly find patterns. You are giving it better tools.
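For example, here’s a rough sketch of that timestamp idea with pandas (the timestamp column name is a placeholder):

import pandas as pd

# 'timestamp' is a hypothetical column of date/time values
df["timestamp"] = pd.to_datetime(df["timestamp"])

# A raw timestamp is almost meaningless to the model, but these aren't:
df["day_of_week"] = df["timestamp"].dt.dayofweek   # 0 = Monday
df["hour_of_day"] = df["timestamp"].dt.hour
df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)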

Final Words

Your best tool isn’t TensorFlow or PyTorch. It’s a simple baseline model (like a LogisticRegression or a simple average) to compare against. If your complex, 50-layer neural network can’t beat a simple average, you know you have a problem.
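Here’s a minimal sketch of that sanity check with scikit-learn, assuming you already have X_train, X_test, y_train, and y_test:

from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Baseline 1: always predict the most common class
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(X_train, y_train)
print("Dummy accuracy: ", accuracy_score(y_test, dummy.predict(X_test)))

# Baseline 2: a plain logistic regression
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)
print("LogReg accuracy:", accuracy_score(y_test, logreg.predict(X_test)))

If your deep model can’t clearly beat both of these numbers, the problem isn’t that you need more layers.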

Debugging is the most frustrating part of this job. It’s also the most rewarding. Every bug you fix teaches you something fundamental.