How I Use AI to Automate My Machine Learning Workflow
We are often told that Data Science is about math and insights. But in reality, it’s mostly plumbing. It’s connecting pipes, cleaning leaks, and ensuring data flows from A to B without crashing your RAM. Generative AI has changed that equation. It hasn’t replaced the data scientist; it has given us an exoskeleton. In this article, I’ll walk you through exactly how I use AI to automate the heavy lifting of my Machine Learning workflow, so I can focus on the art, not just the code.
How I Use AI to Automate My Entire Machine Learning Workflow
Before we dive into the tactics, we need to set the stage. When I say automate, I don’t mean I tell ChatGPT to build me a fraud detection system and then go take a nap. That is a recipe for hallucinations and bad architecture.
Think of Large Language Models as an incredibly fast, well-read, but occasionally overconfident junior developer sitting next to you. They know the syntax for every library in existence, but they don’t know your business context.
My formula for success is simple: Value = (Your Strategic Intent) x (AI’s Coding Speed).
Here is the step-by-step workflow I use to maximise that equation.
Step 1: The Smart Start (EDA & Ideation)
The hardest part of any project is the blank page. I used to spend hours just setting up the environment and writing initial inspection scripts. Now, I treat the AI as a brainstorming partner.
Instead of manually typing out df.head(), df.describe(), and df.isnull().sum(), I provide the AI with the schema (not the data itself, for privacy) and a goal.
Here’s the kind of prompt I use:
I have a dataset with the following columns: [List Columns]. I want to predict [Target Variable]. Act as a senior Data Scientist. Suggest 5 specific hypotheses I should test during Exploratory Data Analysis (EDA) and write the Python/Pandas code to visualise these relationships using Seaborn.
The AI often suggests relationships I hadn’t considered (like interaction effects between two seemingly unrelated features).
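To make that concrete, here is a minimal sketch of the kind of EDA code that comes back. The file name customers.csv and the columns tenure, monthly_charges, and churned are hypothetical stand-ins for whatever schema you share.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical dataset and columns standing in for your own schema.
df = pd.read_csv("customers.csv")

# Hypothesis 1: customers who churn tend to have shorter tenure.
sns.boxplot(data=df, x="churned", y="tenure")
plt.show()

# Hypothesis 2: an interaction effect - high monthly charges hurt more
# when tenure is short.
sns.scatterplot(data=df, x="tenure", y="monthly_charges", hue="churned", alpha=0.5)
plt.show()
```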
Step 2: The Grunt Work (Preprocessing & Cleaning)
This is where 80% of our time dies. Data cleaning is essential, but it is rarely fun. This is where LLMs are at their best.
I haven’t written a raw Regular Expression in two years. Here’s the kind of prompt I use:
Write a Python regex to extract email domains from a text string, but exclude any domains ending in .net. Handle edge cases where the email is None.
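For reference, here is a rough sketch of what that prompt typically produces. The function name extract_domain and the exact pattern are illustrative, not a fixed recipe.

```python
import re
from typing import Optional

# Matches the domain part of an email address, e.g. "alice@example.com" -> "example.com".
DOMAIN_PATTERN = re.compile(r"@([\w.-]+\.\w+)")

def extract_domain(text: Optional[str]) -> Optional[str]:
    """Return the email domain found in text, or None if missing or a .net domain."""
    if text is None:          # edge case: email is None
        return None
    match = DOMAIN_PATTERN.search(text)
    if not match:
        return None
    domain = match.group(1).lower()
    return None if domain.endswith(".net") else domain

print(extract_domain("contact: alice@example.com"))  # example.com
print(extract_domain("bob@oldhost.net"))              # None (excluded)
print(extract_domain(None))                           # None (edge case)
```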
Instead of blindly filling missing values with the mean, I ask the AI for a strategy. Here’s the kind of prompt I use for that:
I have missing values in the 'Age' column. The dataset also has 'Job_Title' and 'Education'. Write a function to impute 'Age' based on the median age of people with the same Job and Education level.
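A minimal sketch of the kind of function that comes back, assuming a Pandas DataFrame with Age, Job_Title, and Education columns:

```python
import pandas as pd

def impute_age_by_group(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing 'Age' with the median age of rows sharing the same
    'Job_Title' and 'Education'; fall back to the overall median."""
    df = df.copy()
    group_median = df.groupby(["Job_Title", "Education"])["Age"].transform("median")
    df["Age"] = df["Age"].fillna(group_median)
    df["Age"] = df["Age"].fillna(df["Age"].median())  # groups that were entirely missing
    return df
```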
AI is great at writing the logic for complex transformations that would take you 20 minutes to structure in your head.
Step 3: Modelling & The Explain Like I’m 5 Trick
When building models, we often get stuck on complex documentation or obscure error messages.
When my PyTorch tensor dimensions don’t match (a classic nightmare), I don’t just stare at the screen. I paste the error trace and the model class into the AI. This is the prompt I use for that:
Here is my model architecture, and here is the Runtime Error regarding size mismatch. Trace the tensor shapes layer by layer and tell me exactly where the mismatch happens.
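To show what that layer-by-layer shape trace looks like in practice, here is a small, deliberately broken model. TinyNet is a hypothetical example, not my real architecture; the forward hooks print each layer’s output shape right up to the crash.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)  # 32x32 input -> 30x30 output
        self.fc = nn.Linear(8 * 32 * 32, 10)        # bug: assumes 32x32, actual is 30x30

    def forward(self, x):
        x = self.conv(x)
        x = x.flatten(start_dim=1)
        return self.fc(x)

model = TinyNet()
# Print every layer's output shape so the mismatch is easy to spot.
for name, module in model.named_modules():
    if name:
        module.register_forward_hook(
            lambda m, i, o, name=name: print(f"{name}: {tuple(o.shape)}")
        )

try:
    model(torch.randn(1, 3, 32, 32))
except RuntimeError as err:
    print("Mismatch:", err)  # the trace shows conv output is 8*30*30, not 8*32*32
```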
Sometimes, I find a research paper with a complex new loss function. Here’s how I translate it into code:
Convert this mathematical formula from the paper into a clean, vectorised Python function using NumPy. Explain the variables using comments.
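As an illustration, here is what that translation might look like for the focal loss (Lin et al., 2017); I’m using it purely as a stand-in for whatever formula your paper defines.

```python
import numpy as np

def focal_loss(y_true: np.ndarray, y_prob: np.ndarray,
               gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Mean binary focal loss.

    y_true: array of 0/1 labels
    y_prob: predicted probabilities for the positive class
    gamma:  focusing parameter that down-weights easy examples
    """
    y_prob = np.clip(y_prob, eps, 1 - eps)
    # p_t is the probability the model assigned to the true class
    p_t = np.where(y_true == 1, y_prob, 1 - y_prob)
    return float(np.mean(-((1 - p_t) ** gamma) * np.log(p_t)))
```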
Step 4: Refactoring and Future-Proofing
We are all guilty of writing spaghetti code in Jupyter Notebooks. It works, but it’s a mess. Before I move a model to production (or even share it with a colleague), I let the AI clean up my mess.
Here are the three prompts you can use for that:
Modularisation: Take this long cell of training code and refactor it into a class called ModelTrainer with methods for train, evaluate, and save_checkpoint. (A rough skeleton of what this produces is sketched after this list.)
Docstrings: Add Google-style docstrings to every function in this script, explaining the arguments and return types.
Optimisation: Review this Pandas loop. Is there a way to vectorise this operation to make it run faster?
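Here is a rough skeleton of what the Modularisation prompt tends to produce; the optimiser, loss, and loop details are placeholders, not my production pipeline.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

class ModelTrainer:
    def __init__(self, model: nn.Module, lr: float = 1e-3):
        self.model = model
        self.criterion = nn.CrossEntropyLoss()          # placeholder loss
        self.optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    def train(self, loader: DataLoader, epochs: int = 10) -> None:
        self.model.train()
        for _ in range(epochs):
            for features, labels in loader:
                self.optimizer.zero_grad()
                loss = self.criterion(self.model(features), labels)
                loss.backward()
                self.optimizer.step()

    def evaluate(self, loader: DataLoader) -> float:
        self.model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for features, labels in loader:
                preds = self.model(features).argmax(dim=1)
                correct += (preds == labels).sum().item()
                total += labels.size(0)
        return correct / total

    def save_checkpoint(self, path: str) -> None:
        torch.save(self.model.state_dict(), path)
```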
This step alone has made me a better programmer. Seeing how the AI refactors my code teaches me design patterns I unintentionally skipped.
Final Thoughts
When I started, I thought using tools like this was cheating. I worried that if I didn’t write every line of code myself, I wasn’t learning.
I was wrong. By automating the syntax and the boilerplate, I am forced to spend more time on the concepts. I spend less time worrying about how to centre a plot in Matplotlib, and more time asking, “Is this plot actually telling me the truth about the data?”
This technology doesn’t make the learning curve flatter; it makes the climb faster. It invites you to be more curious, more ambitious, and more creative.
I hope you liked this article on how I use AI to automate the heavy lifting of my Machine Learning workflow. Follow me on Instagram for many more resources.