
How AI Is Trained: From Data to Decision-Making Models

Rino K. · 11 min read · Oct 16, 2025 · Technology
Contents:
What is AI model training?
How does AI training work? 
Methods used to train AI in decision-making 
AI algorithms: How AI learns from data
How does AI learn by itself?
Supporting procedures in AI training
How is AI trained? Final thoughts

How is AI trained to effectively start and complete the decision-making process? The oversimplified answer is that AI models are trained through trial and error—but on a large scale, turbocharged by extremely complex math and graphics processing units (GPUs). In this article, we'll walk through the six essential steps through which most AI systems are trained, from data collection to optimization.

What is AI model training?

AI training is the process of teaching an AI model how to make decisions or predictions. When you train an AI model, your end goal is for it to learn how to execute that particular task—or groups of tasks—as well as, or even better than, a real human being would. 

How does AI training work? 

For the purpose of this article, we’ll describe how AI is trained through supervised learning, as it’s the most common and widely applicable approach to AI training today. Not all types of AI training follow this pipeline, but with that caveat in mind, here's a summary of how AI is trained.

  1. Data collection: AI training data sets are gathered. 

  2. Data preprocessing: Before training AI technologies, the raw data to be used must be cleaned, tokenized, shuffled, and formatted. 

  3. Model architecture definition: This step involves designing the “brain” of the AI model—deciding how many layers it will have, what types of layers they will be, what each layer will do, and how big each layer will be.

  4. Select loss function: There is usually a correct answer—the decision or prediction the AI is expected to make. This answer is called a label. The model's prediction after the forward pass is compared to the label, and the loss—the extent to which the model's answer is off—is calculated, usually as a single number. The loss function is the mathematical formula used to perform this calculation.

  5. Forward pass & backward pass

    1. Forward pass: The data that's been collected and pre-processed is “passed forward,” or is made to flow through the AI model. The model makes a decision or prediction based on the data. 

    2. Backward pass: This is like a stage of reflection for the AI model. It looks at the loss value—or how far off it was from the correct answer—and what internal settings or “weights” caused that extent of loss. Then, it calculates “gradients,” which are like little maps that guide it in the right direction.

  6. Optimization: The “weights” are adjusted to improve the AI model's future performance. 

Step 1: Data collection 

The first thing to do after deciding to train AI is to gather a relevant dataset. For AI models intended to be used by many people, the dataset may be hundreds of gigabytes to several terabytes in size, often containing hundreds of millions to billions of examples.

On the other hand, a model that’ll be used for a small business chatbot that answers customer questions from a limited knowledge base will require a dataset as small as a few thousand examples or just a few megabytes, particularly when it's been pre-trained and is just being fine-tuned for a specific purpose.

Where do you get these vast amounts of data? Sources of data used to train AI include:

  • Public web data

  • Open datasets

  • Company internal data

  • User-generated content 

  • Sensor and device data 

  • Synthetic data

For the purpose of this article, we'll work with this small raw dataset:

ID | Age | Salary | Pets | Defaulted
1  | 25  | 50000  | 1    | No
2  | 45  | 80000  | 3    | Yes
3  |     | 100000 | 0    | No
4  | 35  | -90000 | 2    | No
5  | 28  | 72000  | 1    | yes
6  | 28  | 72000  | 1    | Yes
7  | 52  | abc    | 2    | No
8  | 30  | 0      | 1    | No
9  | 60  | 110000 | NaN  | Yes
10 | 35  | 68000  | 1    | no

Step 2: Data preprocessing 

Data preprocessing is necessary to maintain high data quality. It could take different forms, such as cleaning, normalization, categorical encoding, splitting, and so on. 

Cleaning

Any problems with the relevant data that can affect how well the AI model is trained have to be fixed through a process called “cleaning.” Some common issues are missing values, duplication, and erroneous entries. 

Here are some things that can be “cleaned” in the dataset above:

  • Third row: Missing age

  • Fourth row: Negative salary

  • Fifth row: “yes” is written in the wrong case

  • Sixth row: Duplicates the fifth row

To “clean” this dataset, you fix these mistakes. For instance, you can either “impute” or guess an age for the third row or delete the whole row. 
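As a rough sketch of what that cleaning could look like in code—assuming pandas is available, and using column names that mirror the toy dataset—here's one way to handle the issues listed above:

```python
import pandas as pd

# A slice of the toy dataset above, with the problems left in on purpose
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Age": [25, 45, None, 35, 28, 28],
    "Salary": [50000, 80000, 100000, -90000, 72000, 72000],
    "Pets": [1, 3, 0, 2, 1, 1],
    "Defaulted": ["No", "Yes", "No", "No", "yes", "Yes"],
})

df["Age"] = df["Age"].fillna(df["Age"].median())        # impute the missing age
df = df[df["Salary"] >= 0]                              # drop the negative salary
df["Defaulted"] = df["Defaulted"].str.capitalize()      # fix "yes" -> "Yes"
df = df.drop_duplicates(subset=["Age", "Salary", "Pets", "Defaulted"])  # remove the duplicate row

print(df)
```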

Normalization

The data you collect is rarely all of one kind. In our dataset above, there are different types of values: ID, age, salary, and number of pets. But the AI model doesn't understand that these values mean different things. All it sees are figures—some much larger than others.

To prevent the model from giving more weight to a type of data (e.g., prioritizing salary over everything else), you normalize the data—that is, adjust all of them so that they're on the same scale. There are different normalization formulas for preprocessing data for different AI systems. 

A common one is: 

Normalized value = (Original value – Mean)/Standard deviation 

Where: 

  • Original value: This is the actual value of the data piece we're normalizing, e.g., 25, 50000, 45.

  • Mean: The average of all data values of a specific type, e.g., the average of all ages. 

  • Standard deviation: A measure of how spread out the values are. In plain-text terms, it's the square root of [the sum of squared differences between each value and the mean, divided by (n – 1)], where n is the number of values.
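Here's a minimal sketch of that formula applied to a handful of ages from the dataset (NumPy is assumed; the exact values are illustrative):

```python
import numpy as np

ages = np.array([25, 45, 35, 28, 52, 30, 60, 35], dtype=float)

mean = ages.mean()
std = ages.std(ddof=1)              # ddof=1 matches the (n - 1) version above

normalized = (ages - mean) / std
print(normalized)                   # every age now sits on the same unitless scale
```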

Categorical encoding

The aim of categorical encoding is to express textual data in terms of figures. There are multiple ways to do this, such as label encoding, one-hot encoding, tokenization, etc. 
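For instance, the Defaulted column could be label-encoded with a simple mapping, or one-hot encoded with pandas; this is just a sketch of the two approaches:

```python
import pandas as pd

df = pd.DataFrame({"Defaulted": ["No", "Yes", "No", "Yes"]})

# Label encoding: map each category to an integer
df["Defaulted_label"] = df["Defaulted"].map({"No": 0, "Yes": 1})

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(df["Defaulted"], prefix="Defaulted")

print(df.join(one_hot))
```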

Splitting

Not all available data is used for training AI; a portion must be reserved for testing. Without a separate test set, we’d have to evaluate the model on the same data it was trained on, which would lead to biased and unreliable results.
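A common way to do this split is scikit-learn's train_test_split; the 80/20 ratio and the tiny feature lists below are just illustrative choices:

```python
from sklearn.model_selection import train_test_split

# Features (age, salary, pets) and labels (1 = defaulted, 0 = did not)
X = [[25, 50000, 1], [45, 80000, 3], [35, 68000, 1], [28, 72000, 1],
     [60, 110000, 0], [30, 40000, 1], [52, 90000, 2], [35, 75000, 2]]
y = [0, 1, 0, 1, 1, 0, 0, 0]

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))    # 6 training rows, 2 test rows
```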

Step 3: Define the model architecture 

What “brain” will the model use for the decision-making process? How will the large amount of data that's been pre-processed make it through the input-output chain? These questions are answered by defining the architecture of the neural network. Different architecture types can be set up, depending on how the model will analyze data and the type of data it'll analyze. 

Examples of architectures include:

  • Transformers (such as GPT and BERT) 

  • Convolutional neural networks, or CNNs (e.g., ResNet, EfficientNet)

In practice, you build your chosen architecture with the aid of a machine learning framework such as PyTorch or TensorFlow (a small PyTorch sketch follows below).

The architecture defines:

  • The type of layers used (e.g., fully connected, convolutional, recurrent)

  • The number of layers

  • The size of each layer (number of neurons/units)

  • The activation functions between layers

  • Optional additions like dropout, batch normalization, or residual connections

A model that’s too simple may underfit, failing to capture the complexity of the data, while an overly complex model can overfit, memorizing the training data and performing poorly on unseen inputs. The optimal architecture depends on the task. Different problems require different network designs to achieve the best results.
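As a concrete (and deliberately small) example, here's what defining such an architecture might look like in PyTorch for our three-feature default-prediction task; the layer sizes are arbitrary choices, not recommendations:

```python
import torch.nn as nn

class DefaultPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 16),   # input layer: 3 features (age, salary, pets)
            nn.ReLU(),          # activation function between layers
            nn.Linear(16, 8),   # hidden layer
            nn.ReLU(),
            nn.Linear(8, 1),    # output layer: one value for "defaulted"
            nn.Sigmoid(),       # squashes that value into a 0-1 probability
        )

    def forward(self, x):
        return self.net(x)

model = DefaultPredictor()
print(model)
```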

Step 4: Select the loss function

A loss function is a mathematical equation. When you apply your loss function to the outputs or predictions of your AI model, it returns a number that tells you how far off from the correct answer the model is. That number is your loss value, and it guides how the model adjusts itself in future iterations of the training cycle.

As you may have anticipated, not all AI models use the same loss function. Types of loss functions include:

  • Cross-entropy loss: The cross-entropy loss function is used when the model's decision requires choosing one option out of several (multi-class classification).

  • Binary cross-entropy: When the prediction or decision is either “yes” or “no,” binary cross-entropy is used. 

  • Mean squared error: This is employed when the AI model is supposed to predict a number. The “error”—the difference between the number predicted by the model and the correct answer—is squared, and the average of these squared errors across the data sample is the loss value.

  • Mean absolute error: This returns the loss value by averaging the absolute values of all the “errors” made across the data sample.

Looking at the dataset we introduced earlier, the goal is to predict whether someone defaulted ('yes') or didn’t ('no'). That makes binary cross-entropy the right loss function for this model.

If you'd love to try your hand at the math, here's the formula for binary cross-entropy loss:

Loss = – (1/N) · Σ [ y · ln(p) + (1 – y) · ln(1 – p) ]

Where:

N is the number of data points

y is the actual label (1 for “Yes”, 0 for “No”)

p is the predicted probability that the label is 1

Please recall that the “label” is the correct prediction or answer. 
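Here's that formula evaluated directly in plain Python for two made-up predictions, so you can see how a confidently wrong answer inflates the loss:

```python
import math

def binary_cross_entropy(labels, probs):
    """Average binary cross-entropy over all data points."""
    total = 0.0
    for y, p in zip(labels, probs):
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)

# Label 1 predicted at 0.82 (good) and label 0 predicted at 0.9 (confidently wrong)
print(binary_cross_entropy([1, 0], [0.82, 0.9]))    # roughly (0.198 + 2.303) / 2, i.e. about 1.25
```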

Step 5: Forward pass and backward pass

The previous steps were preparation; this is the step where the AI model is actually trained. That training happens in iterative cycles, and every cycle includes two phases:

  • The forward pass

  • The backward pass

Forward pass

The forward pass is where your model makes a prediction. It takes in the training data (for example, someone’s age, salary, number of pets) and sends it through every layer of the model's architecture. These layers apply mathematical operations to the input until the model produces an output. 

For our example dataset, that output for a particular row might be a number like 0.82, meaning the model is 82% confident that the person represented by the ID of that row will default.

Once the model has produced an output, you run your loss function. That's where you compare the model's prediction (0.82) to the correct answer (e.g., “Yes,” or 1). If the person in question actually defaulted, the loss value will be low (–ln(0.82) ≈ 0.2). But if they didn't default, the loss value will be much higher.

Backward pass 

After the forward pass comes the backward pass, the final part of the cycle that every trained AI system has passed through. This is where learning actually happens. The backward pass uses an algorithm called backpropagation. What it does is trace the loss backward through the model, layer by layer, calculating how much each weight (remember that a weight is an internal setting) in the model contributed to the final error. This lets the model figure out which weights need to be adjusted—and in what direction—so it can make a better prediction next time.

The math behind backpropagation is complex, but the good news is that modern AI frameworks (like PyTorch and TensorFlow) do it for you. You simply call a command like loss.backward(), and the framework computes all the necessary gradients—that is, the directions and amounts by which each model parameter should be tweaked.
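Here's a compressed PyTorch sketch of one forward pass and one backward pass for a single-neuron model on one (already normalized) row; the feature values and the label are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)                  # a single neuron: 3 inputs -> 1 output
loss_fn = nn.BCELoss()                   # binary cross-entropy

x = torch.tensor([[0.5, -0.2, 1.0]])     # one normalized row: age, salary, pets
y = torch.tensor([[1.0]])                # label: this person defaulted

# Forward pass: the data flows through the model and becomes a probability
prob = torch.sigmoid(model(x))
loss = loss_fn(prob, y)

# Backward pass: backpropagation computes one gradient per weight
loss.backward()
print(model.weight.grad)                 # how much each weight contributed to the loss
```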

Step 6: Optimize the parameters of the trained AI

After the gradients have been computed, the optimizer applies a formula like this to each parameter or weight in the AI model, one at a time:

new weight = old weight – (learning rate × gradient)

Here's what happens when you apply this formula:

  • The gradient shows the direction and steepness of change.

  • The learning rate controls how big the change should be.

  • The weight gets nudged slightly in a better direction.

This process is called a parameter update. After the parameters have been updated, the old gradients need to be reset or deleted. If they aren't reset, they'll be added during the next backward pass. This is a problem because they were originally computed for the old weights, which have now been adjusted. Not resetting them will lead to inaccurate learning. 
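As a tiny numeric illustration of that update rule (the weight, learning rate, and gradient values here are invented):

```python
old_weight = 0.50
learning_rate = 0.1
gradient = 0.6            # "increase this weight and the loss goes up by ~0.6"

new_weight = old_weight - learning_rate * gradient
print(new_weight)         # ≈ 0.44 -- nudged slightly in the direction that lowers the loss
```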

Let's look at an example based on that dataset we provided at the beginning of this article. Suppose we’re training a simple model on two data points from the dataset:

Age | Salary | Pets | Defaulted
25  | 50000  | 1    | No (0)
45  | 80000  | 3    | Yes (1)

Let’s say the model is a single neuron (logistic regression), and we use binary cross-entropy as our loss function.

Here’s what should happen at each training step (a code sketch of this loop follows the list):

  • Step 1: Clear old gradients

  • Step 2: Forward pass

  • Step 3: Compute gradients

  • Step 4: Update weights
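In PyTorch terms, those four steps map onto a loop roughly like this (the single-neuron model, optimizer choice, learning rate, and the pre-normalized feature values are all illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)                              # single neuron (logistic regression)
loss_fn = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Two already-normalized rows: [age, salary, pets] and their labels
X = torch.tensor([[-1.0, -0.8, -0.5], [1.0, 0.8, 1.5]])
y = torch.tensor([[0.0], [1.0]])

for x_row, y_row in zip(X, y):
    optimizer.zero_grad()                            # Step 1: clear old gradients
    prob = torch.sigmoid(model(x_row))               # Step 2: forward pass
    loss = loss_fn(prob, y_row)
    loss.backward()                                  # Step 3: compute gradients
    optimizer.step()                                 # Step 4: update weights
```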

What happens if we don't reset the gradients? Let’s look at what could go wrong over two steps:

Step 1: First data point

  • Model outputs 0.2 (predicts 20% chance of default)

  • True label is 0 (no default)

  • Loss = ~0.22

  • During backpropagation, the computed gradient is +0.6, indicating that if the weight is increased by 1 unit, the loss will rise by approximately 0.6.

Step 2: Second data point

But you forget to clear the gradient.

  • The model outputs 0.8 (80% chance of default).

  • True label is 1 (default). This means that the output of 0.8 is actually a good prediction.

  • Loss = ~0.22

  • Backpropagation computes a gradient of +0.1, indicating that if the weight is increased by 1 unit, the loss will rise by approximately 0.1. 

  • But, because the previous gradient wasn't reset, the new gradient becomes 0.6 + 0.1 = 0.7. 

  • The optimizer then believes that a 1 unit weight increase will result in a 0.7 increase in the loss value. 

  • Because of this, the optimizer lowers the weight aggressively in an attempt to achieve a lower loss value. The weight is pushed too far off in the wrong direction, and the next prediction’s accuracy is negatively impacted. 
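You can see this accumulation directly in PyTorch: the second backward() call below deliberately skips the gradient reset, so its gradient is added on top of the first (the tensors are arbitrary):

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

loss1 = (w * 2).sum()
loss1.backward()
print(w.grad)          # tensor([2.]) -- gradient from the first pass

loss2 = (w * 3).sum()
loss2.backward()       # no reset (e.g., optimizer.zero_grad()) in between...
print(w.grad)          # tensor([5.]) -- 2 + 3: the stale gradient was never cleared
```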

Methods used to train AI in decision-making 

When training AI models, the process doesn't always follow the same method. Three of the most popular methods used to train AI models are supervised learning, unsupervised learning, and semi-supervised learning; a short code sketch contrasting the first two follows below.

  • Supervised learning: This is a method where you train the AI model using labeled data or data containing the correct answers. Over time, it gets better at predicting labels for new, unseen data.

  • Unsupervised learning: AI decision-making isn't guided by labels or known answers. Essentially, you feed the model unlabeled data. It has to find patterns in the data on its own. 

  • Semi-supervised learning: The vast data sets available are split into two batches. One group of data is unlabeled, while the smaller batch is labeled. The model learns from labeled examples. Then, it applies what it has learned to guess labels for the unlabeled data. Those guesses help improve the model further (in a loop).

In addition to these three, artificial intelligence models may be trained via other methods such as self-learning, reinforcement learning, transfer learning, federated learning, and online learning.
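To make the supervised/unsupervised contrast concrete, here's a small scikit-learn sketch; the feature values (age, and salary in tens of thousands) and labels are placeholders:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = [[25, 5.0], [45, 8.0], [28, 7.2], [60, 11.0]]   # [age, salary in $10k]
y = [0, 1, 1, 1]                                    # labels: did this person default?

# Supervised: the model is shown the correct answers while it learns
clf = LogisticRegression().fit(X, y)
print(clf.predict([[35, 6.8]]))                     # prediction for a new, unseen person

# Unsupervised: no labels -- the model groups similar rows together on its own
clusters = KMeans(n_clusters=2, random_state=0).fit_predict(X)
print(clusters)                                     # e.g., cluster assignments like [0 1 0 1]
```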

AI algorithms: How AI learns from data

AI tools (and, by extension, machine learning models) follow sets of step-by-step rules or instructions to learn patterns in data. This is true of both complex AI models and simpler ones. These rules for pattern learning are called algorithms. During training, the algorithm adjusts the model’s internal settings (weights) by learning from examples (input → correct answer). After AI has been trained, an algorithm helps the model make decisions or predictions based on new data it hasn’t seen before.

The role of these algorithms is most visible during optimization—that is, after real-world data has been gathered and pre-processed, and the forward and backward passes have been completed.

Examples of AI algorithms include:

  • Linear regression 

  • Logistic regression 

  • Decision tree

  • Random forest

  • Autoencoders, etc. 

How does AI learn by itself?

The idea that AI models are built to learn by themselves isn't quite true. While one of the advantages of AI solutions is their ability to improve over time, that doesn't mean they learn entirely on their own. What actually happens is that the rules for learning (gradient computation, weight updates) are automated inside the training framework (e.g., PyTorch or TensorFlow). Over many cycles, the AI system systematically reduces its error, improving performance without manual intervention.

Supporting procedures in AI training

The six steps we covered form the basic pipeline for training AI. Different types of models may follow different training methods, depending on their purpose. But even after training, the model isn’t ready to ship. Additional steps—like setting up the training loop, checkpointing, evaluation, fine-tuning, and deployment—are needed to make it production-ready.

  • Training loop: Multiple iterations of the process are completed, with each iteration bringing the model closer to the trainer’s goal. 

  • Checkpointing: After each iteration, the model's progress is automatically saved at a “checkpoint.” To see the model's performance at any point in time, you can go back to the relevant checkpoint (a short code sketch follows this list).

  • Evaluation: The person, team, or teams in charge of training the AI model review its performance and assess it using specific benchmarks. 

  • Fine-tuning: If you want to, after you train AI, you can further customize it to perform a specific action. 

  • Deployment: The model is exported into a usable format or hosted on a server, ready to be tested. 
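As an example of the checkpointing step referenced above, here's how a PyTorch model's weights are typically saved and restored; the architecture and file name are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)

# Save a checkpoint of the current weights
torch.save(model.state_dict(), "checkpoint_epoch_10.pt")

# Later: rebuild the same architecture and load the saved weights back in
restored = nn.Linear(3, 1)
restored.load_state_dict(torch.load("checkpoint_epoch_10.pt"))
```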

How is AI trained? Final thoughts

AI systems can't simply be deployed out of the box. During training, the model is fed data—whether that’s new data, historical data, or synthetic data when real-world data is limited. Whatever the case, the data is pre-processed so the model can work with it. Then come the forward pass, the backward pass, and the optimization phase. These steps repeat in loops, again and again. If someone asks, “How is AI trained?”—this is how.
