Not the science fiction version. Not the marketing version. The actual thing: what it is, how it works, and why a machine that can't think is still changing everything.
Last month I wrote a few thousand words on how large language models work, trying to convey, from my own experience building these systems, how a computer can go from "understanding" text in the crudest statistical sense all the way to something like ChatGPT. I was pleased with it. Then I started getting the same question from people who'd read it, in different forms: yes, but what actually is AI? And I realised I'd skipped a step. I'd explained one branch in detail without ever drawing the tree.
Because here is the thing: the basic concept of AI is not as simple as the headlines make it sound, and it is also not as complicated as the jargon makes it feel. The word has been stretched to cover everything from the autocomplete on your phone to imagined robot overlords. The marketing makes it sound magical. The science fiction makes it sound conscious. The reality is stranger, and more ordinary, than either.
So let's look into it properly. I won't be exhaustive (the field is vast and moving fast), but I'll try to capture as many of the core ideas as I can, in a way that actually sticks. By the end you should be able to place ChatGPT, a spam filter, a self-driving car, and that study-hours line you half-remember from school all on the same map, and understand why they belong together.
The short version, which we'll spend the rest of this piece unpacking: modern AI is pattern recognition at enormous scale. It is maths that learns from examples. It does not think, understand, or know things the way you do. And yet it can do things that, until recently, we assumed required thinking, understanding, and knowing. Both of those statements are true at once. Holding them together is the whole game.
Let's build it from the ground up, with things you can poke at.
Before neurons, before anything that sounds like AI, let's start with the most honest possible example of what a "model" actually is. You have probably already met it in school and not realised it was the seed of everything.
Suppose we collect data from students: how many hours each one studied, and what score they got on a test. We plot each student as a dot. Hours on the bottom, score up the side. Now we draw the straight line that best fits through those dots. That line is a model. It is, quite literally, machine learning. It's called linear regression, and it was doing "AI" roughly a century and a half before the term existed.
Once you have the line, you can make predictions. A new student tells you they studied six hours. You read the line at six hours and predict their score. That is the entire shape of what every AI does: fit something to known examples, then use it to predict on new ones. The line below is real. Drag the dots, add your own, and watch the best-fit line recompute instantly.
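If you would rather poke at this in code than drag dots, here is a minimal sketch of the same fit-then-predict idea. The six students are made up for illustration, and `fit_line` and `predict` are names I have chosen here; the formula is the standard ordinary least-squares fit for one input.

```python
# Hypothetical data: hours studied and test score for six students.
hours = [1, 2, 3, 4, 5, 8]
scores = [52, 55, 61, 70, 72, 88]

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, divided by how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(hours, scores)

def predict(h):
    """Read the fitted line at h hours of study."""
    return intercept + slope * h

print(f"score = {intercept:.1f} + {slope:.2f} * hours")
print(f"predicted score for 6 hours: {predict(6):.1f}")
```

Nothing in `predict` knows anything about students. It only knows two numbers that were fitted to the examples, which is the point.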
Now look closely at that scatter. The dots do not sit perfectly on the line, and they never will. Why? Because test scores are not only about hours studied. They depend on sleep, prior knowledge, anxiety, whether the student had breakfast, luck on which questions appeared. Our simple model captures one slice of reality and ignores the rest. The vertical gap between each dot and the line is everything our model does not know. Statisticians call it the residual. Honest people call it humility.
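To make the residual concrete, here is a small sketch that fits a line to some made-up students and prints the gap for each one. The data and variable names are assumptions for illustration. The last number, often written R², is one common way to summarise how much of the variation in scores the line accounts for.

```python
# Hypothetical data: hours studied and test score for six students.
hours = [1, 2, 3, 4, 5, 8]
scores = [52, 55, 61, 70, 72, 88]

n = len(hours)
mean_h = sum(hours) / n
mean_s = sum(scores) / n

# Ordinary least-squares line through the dots.
slope = (sum((h - mean_h) * (s - mean_s) for h, s in zip(hours, scores))
         / sum((h - mean_h) ** 2 for h in hours))
intercept = mean_s - slope * mean_h

# Residual = actual score minus the score the line predicts.
residuals = [s - (intercept + slope * h) for h, s in zip(hours, scores)]
for h, s, r in zip(hours, scores, residuals):
    print(f"hours={h}  actual={s}  predicted={s - r:.1f}  residual={r:+.1f}")

# Share of the variation in scores that the line captures (R squared).
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((s - mean_s) ** 2 for s in scores)
r_squared = 1 - ss_res / ss_tot
print(f"R squared: {r_squared:.3f}")
```

Whatever is left over after R² is the part the model does not know: sleep, breakfast, luck, and everything else we never measured.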
This is the first deep lesson, and it never goes away. Even a trillion-parameter model is doing this: capturing some of the pattern, missing the rest, and the part it misses does not disappear just because the model is impressive. It just gets harder to see.
Here is where it gets uncomfortable, and where most of the real-world failures of AI actually come from. The model does not learn "the truth." It learns your data. If your data is skewed, the model is skewed, confidently and invisibly.
Watch what happens in the demo below. We have the full picture: a sample of all students, across the whole range of ability. The fitted line tells one story. But now imagine you only collected data from the top of the class, the students who already do well. Flip the toggle and see how the same relationship produces a completely different line. Same method. Same maths. Different data. Different model. Different predictions for everyone.
Notice the model never complains. It does not say "I was only shown high achievers, so take my predictions with caution." It produces a clean, confident line either way. The uncertainty and the bias are real, but they live in the gap between the data and reality, which the model is structurally blind to. The model is only ever as good as the assumptions and the data behind it. When you hear that an AI is "biased," this is the mechanism, dressed up with a billion more parameters.
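The same effect can be reproduced in a few lines. This is a sketch on invented data, not the data behind the demo: ten hypothetical students whose scores level off near the top, fitted once on everyone and once on only those who scored 75 or more. The cut-off and the names are my own choices for illustration.

```python
# Hypothetical class of ten students: hours studied and test score.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scores = [35, 45, 54, 62, 69, 75, 80, 84, 87, 89]

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Fit on the whole class.
full_slope, full_intercept = fit_line(hours, scores)

# Biased sample: keep only the students who already do well.
top = [(h, s) for h, s in zip(hours, scores) if s >= 75]
top_slope, top_intercept = fit_line([h for h, _ in top], [s for _, s in top])

for name, slope, intercept in [("everyone", full_slope, full_intercept),
                               ("top of class only", top_slope, top_intercept)]:
    print(f"{name}: score = {intercept:.1f} + {slope:.2f} * hours, "
          f"prediction for 2 hours = {intercept + slope * 2:.1f}")
```

On this made-up data the full sample gives about 35 + 6.0 per hour, and the top-of-class sample gives about 55 + 3.5 per hour. For a student who studied two hours, the first model predicts 47 and the second predicts 62; the actual two-hour student in the data scored 45. Neither fit raised an error or a warning.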
This is not an abstract concern. I've written separately about what happens to data trust in low-resource field contexts, where the gap between the data and reality can be enormous: communities that rationally underreport, languages the model has barely seen, survey categories that don't map onto how people actually live. The same simple mechanism you just watched bend a line is, at scale, why an AI can be fluent and confident and still systematically wrong about the people it was never properly shown.
"The maths is flawless. The data is the problem. And the model cannot tell the difference. It produces a clean, confident answer either way."
So far our model uses one input: hours studied. But we said test scores depend on more than that. So let's add a second input: hours of sleep the night before. Now each student has two inputs (hours studied, hours slept) and one output (score).
With one input, the model was a line in 2D. With two inputs, it becomes a flat sheet tilted in 3D space, a plane hovering over the studied-vs-slept floor, with height representing the predicted score. The principle is identical: find the surface that best fits the dots. It is just one dimension higher. Drag to rotate it.
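For the curious, here is a sketch of fitting that plane without any libraries. The students are invented, `fit_plane` and `solve` are hypothetical helper names, and solving the so-called normal equations by elimination is just one textbook way to find the least-squares plane; real tools use more numerically careful methods.

```python
# Hypothetical students: (hours studied, hours slept, score).
students = [(2, 8, 66), (4, 6, 70), (6, 7, 81), (8, 5, 83), (3, 4, 57), (7, 8, 88)]

def solve(a, b):
    """Solve the linear system a.x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_plane(rows):
    """Least-squares fit of score = intercept + w_study*studied + w_sleep*slept."""
    X = [[1.0, studied, slept] for studied, slept, _ in rows]  # leading 1 = intercept
    y = [score for _, _, score in rows]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(3)]
    return tuple(solve(xtx, xty))

intercept, w_study, w_sleep = fit_plane(students)
print(f"score = {intercept:.1f} + {w_study:.2f} * studied + {w_sleep:.2f} * slept")
print(f"6 h study, 7 h sleep: {intercept + w_study * 6 + w_sleep * 7:.1f}")
```

Adding a third or a thousandth input changes the size of the tables being solved, not the idea.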
One input gives a line. Two inputs give a plane in 3D. Three inputs would need 4D, which you already cannot really visualise. And modern AI models do not stop at three inputs. They take in thousands of numbers at once and adjust billions of internal settings, called parameters; GPT-4 is widely reported, though OpenAI has never confirmed it, to have over a trillion. So here is my sincere advice for visualising a surface in that many dimensions: do not. Your brain evolved to throw spears in three dimensions. It is not equipped for this, and that is completely fine.
The reassuring secret is that the maths does not care that you cannot picture it. Every one of those dimensions works exactly like the line you already understand. The model is still just fitting a surface to data, measuring how wrong it is, and adjusting. It is the same idea as the study-hours line, run in a space so large that the surface can bend itself into almost any shape: recognising faces, translating languages, writing essays. The leap from "impressive" to "incomprehensible" is not a leap in kind. It is just a staggering leap in number.
That is the honest bridge from a school statistics lesson to ChatGPT. A line fit to dots, with bias from your sample, uncertainty in the scatter, and assumptions baked into what you measured. Take that, stack billions of them, feed it most of the text humans have written, and you get something that feels like magic but is built from the most ordinary idea in statistics.
But before we go inside the machine, we need a map. Because "AI" is not one thing, and the words people throw around (machine learning, deep learning, neural networks, generative AI) are not synonyms. They fit together in a specific way, and once you see how, the whole landscape clicks into place.
Here is the single most useful thing to understand about the vocabulary: these terms are nested, like Russian dolls. One contains the next. Most confusion in public conversation comes from people using them as if they were interchangeable. They are not.
Artificial Intelligence is the big outer box. It is the whole ambition: machines doing things that seem to require intelligence. It includes the line you just fitted. It includes old-fashioned rule-based systems, the chess engines of the 1990s, the logic systems that never quite worked. AI is the goal, not a specific technique.
Machine Learning is the box inside it: the approach that actually worked. Instead of writing the rules by hand, you learn them from data. Your study-hours line was machine learning. So is the spam filter in your inbox, the recommendation engine on Netflix, the fraud detection on your credit card. Most of these use no neural networks at all. They use simpler, often more reliable methods with names like decision trees, random forests, and support vector machines. This matters: a huge amount of the AI quietly running the world is not the flashy kind.
Deep Learning is the box inside that: machine learning done with large neural networks, many layers deep (hence "deep"). This is the part that exploded in the 2010s and gave us image recognition, voice assistants, and eventually ChatGPT. It is powerful and data-hungry and is what most people now picture when they hear "AI." But it is one technique among many, not the whole field.
Inside deep learning, neural networks are not all the same. Different problems need different shapes of network, and over the years a few big families emerged. You don't need the engineering details, but knowing the families gives you the whole landscape in one glance.
Computer vision is teaching machines to see: recognising faces, reading handwriting, spotting tumours in scans, letting a self-driving car tell a pedestrian from a lamppost. The workhorse here for years was the CNN (convolutional neural network), a design that scans an image in small patches, building up from edges to shapes to objects. Every time your phone groups photos by face, a network of this kind is probably doing the work.
Natural language processing is teaching machines to handle text and speech: translation, sentiment analysis, voice assistants, and now chatbots. For years this used RNN and LSTM networks that read text word by word. Then in 2017 the Transformer arrived and changed everything; it is the architecture underneath nearly every large language model today.
There are more families (networks that handle audio, networks that play games, networks that generate molecules) but vision and language are the two that reshaped daily life. And the striking thing about the last few years is that the boundaries between them are dissolving. The same transformer architecture now does vision and language and audio. The families are converging on a single, general design.
For most of its history, AI was about recognising: is this email spam, is this a cat, is this tumour malignant. The model took something in and produced a label or a number. Useful, but not the thing that captured the public imagination in 2023.
Generative AI flips the direction. Instead of taking an image and producing a label, it takes a label (a prompt) and produces an image. Instead of reading text and classifying it, it reads text and writes more. The model has learned the patterns of its data so thoroughly that it can produce new examples that fit those patterns: a paragraph, a photo, a melody, a snippet of code. ChatGPT generates text. Midjourney generates images. The mechanics differ (a language model predicts the next piece of text, while most image generators refine random noise step by step), but the underlying trick is the same: produce something plausible under the learned patterns, a little at a time.
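Here is about the smallest text generator I can write, as a sketch of the generative direction for language. It is a toy: the three-sentence corpus is invented, `generate` is a name chosen here, and it picks the next word from a lookup table of what followed each word in the corpus, where a real language model uses a large neural network over a vast corpus.

```python
import random
from collections import defaultdict

# Hypothetical "training data".
corpus = "the cat sat on the mat . the dog sat on the rug . the cat saw the dog ."
tokens = corpus.split()

# "Training": record every word that was seen following each word.
followers = defaultdict(list)
for current, nxt in zip(tokens, tokens[1:]):
    followers[current].append(nxt)

def generate(start, length, seed=0):
    """Produce up to `length` words, each sampled from what followed the previous one."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    out = [start]
    for _ in range(length - 1):
        options = followers.get(out[-1])
        if not options:  # nothing ever followed this word in the corpus
            break
        out.append(rng.choice(options))
    return " ".join(out)

print(generate("the", 8))
print(generate("the", 8, seed=1))
```

Every pair of neighbouring words in the output appeared somewhere in the corpus, but the sentence as a whole may not have. That is generation in miniature: new examples that fit old patterns.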
This is why it feels like magic in a way that older AI did not. A spam filter sorting your email is invisible. A machine that writes a poem, draws a picture, or holds a conversation feels like it crossed a line. It hasn't, not really. It is the same pattern-fitting you have watched throughout this piece, pointed in the generative direction and scaled to an extraordinary degree. This is the branch I started with: if you want to see exactly how the language version works under the hood, from simple word-counting up to the transformer, the companion piece on LLMs goes all the way down.
With the map in hand (AI contains machine learning, which contains deep learning, which powers the vision and language families, whose generative offshoots are reshaping everything) we can finally go inside and look at the engine itself: the neuron.
Forget the jargon for a moment. The core idea that unifies everything on that map is something you can state in one sentence: instead of writing rules, you show examples, and the machine figures out the rules itself.
Imagine teaching a computer to recognise a cat. The old way (the rule-based AI in the outer ring) was to write rules: a cat has pointed ears, whiskers, four legs, fur. This never worked well, because for every rule there's an exception, a weird angle, a cat in a costume. The new way is to show the computer a hundred thousand pictures labelled "cat" or "not cat" and let it work out for itself what makes a cat. It never learns a rule you could write down. It learns a pattern, distributed across millions of tiny numerical adjustments.
The demo below shows this happening with a different shape of problem than the line: instead of fitting a line through dots, we're drawing a boundary between two groups. You're going to separate blue from orange, not by specifying the line, but by letting an algorithm find it from the examples. Click to add points, then watch the machine learn the division.
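One classic algorithm for finding such a boundary is the perceptron, sketched below. I am not claiming it is the algorithm behind the demo; it is simply one of the oldest and shortest. The six points, the labels (+1 for blue, -1 for orange), and the function names are assumptions for illustration.

```python
# Hypothetical 2D points: label +1 for "blue", -1 for "orange".
points = [((1.0, 1.0), -1), ((2.0, 1.5), -1), ((1.5, 2.5), -1),
          ((4.0, 4.5), 1), ((5.0, 3.5), 1), ((4.5, 5.0), 1)]

def classify(point, w1, w2, b):
    """Which side of the line w1*x1 + w2*x2 + b = 0 the point falls on."""
    x1, x2 = point
    return 1 if w1 * x1 + w2 * x2 + b > 0 else -1

def train_perceptron(samples, epochs=1000, lr=0.1):
    """Nudge the line towards every misclassified point until none are left."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), label in samples:
            if classify((x1, x2), w1, w2, b) != label:
                # Move the boundary so this point lands nearer its correct side.
                w1 += lr * label * x1
                w2 += lr * label * x2
                b += lr * label
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: done
            break
    return w1, w2, b

w1, w2, b = train_perceptron(points)
print(f"boundary: {w1:.2f} * x1 + {w2:.2f} * x2 + {b:.2f} = 0")
print("new point (4, 4) is", "blue" if classify((4.0, 4.0), w1, w2, b) == 1 else "orange")
```

Nobody told the program where the line goes. It started with no line at all and corrected itself each time an example proved it wrong. This only settles when the two groups can actually be separated by a straight line, which these made-up points can.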
The term "neural network" sounds biological, like a digital brain. This is one of the most misleading names in technology. An artificial neuron is not a brain cell. It is a very simple piece of maths: it takes some numbers in, multiplies each by a "weight," adds them up, and if the total crosses a threshold, it fires a number out. That's the whole thing. A neuron is a tiny adjustable calculator.
The magic is not in any single neuron. It's in connecting thousands or millions of them in layers, where each neuron's output becomes the next layer's input. Early layers detect simple things (an edge, a curve). Later layers combine those into complex things (an eye, a face). Nobody programs what each neuron detects. It emerges from the training.
Watch a single neuron work below. Adjust the weights and see how it changes what the neuron "fires" on. This is the atom of all modern AI.
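In code, that atom is two lines. This sketch uses the simple threshold version described above, with weights and thresholds picked by hand for illustration; modern networks usually replace the hard threshold with a smoother function and learn the weights rather than having them set.

```python
def neuron(inputs, weights, threshold):
    """Multiply each input by its weight, add up, fire 1 if the total exceeds the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

# Hypothetical settings: same weights, different thresholds, different behaviour.
AND_WEIGHTS, AND_THRESHOLD = (1.0, 1.0), 1.5   # fires only if both inputs are on
OR_WEIGHTS, OR_THRESHOLD = (1.0, 1.0), 0.5     # fires if either input is on

for a in (0, 1):
    for b in (0, 1):
        print(a, b,
              "AND:", neuron((a, b), AND_WEIGHTS, AND_THRESHOLD),
              "OR:", neuron((a, b), OR_WEIGHTS, OR_THRESHOLD))
```

Changing one number turns the same neuron from an "and" detector into an "or" detector. Training is the business of finding those numbers automatically.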
This is the part that usually stays hidden behind jargon like "gradient descent" and "backpropagation." But the idea is intuitive. The network makes a guess. We measure how wrong it was. We nudge every weight a tiny bit in the direction that would have made the guess less wrong. Repeat millions of times. That's training.
Think of it like tuning a guitar by ear. You pluck the string (make a guess), hear that it's flat (measure the error), turn the peg slightly (adjust the weights), and repeat until it sounds right. The network does this with millions of pegs at once, guided by maths instead of ears.
Below is a real, live neural network learning to recognise a pattern. The line shows its current guess. The "loss" is how wrong it is. Press Train and watch the error fall as the network learns. This is not a recording. It's computing in your browser, right now.
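The guess, measure, nudge loop is short enough to write out. To keep it readable this sketch trains the smallest possible model, a line with one weight and one bias, rather than a full network, so there is no backpropagation through layers here; the update rule is plain gradient descent on the average squared error. The data, step count, and learning rate are assumptions chosen for illustration.

```python
# Hypothetical examples that follow y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
n = len(xs)

def loss(w, b):
    """Average squared error: how wrong the current guess is."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

def train(steps=2000, lr=0.05):
    w, b = 0.0, 0.0  # start knowing nothing
    history = []
    for _ in range(steps):
        # Gradient: which way, and how steeply, the loss rises for each dial.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Nudge each dial a little in the opposite direction.
        w -= lr * grad_w
        b -= lr * grad_b
        history.append(loss(w, b))
    return w, b, history

w, b, history = train()
print(f"learned w={w:.3f}, b={b:.3f}")
print(f"loss after step 1: {history[0]:.3f}, after last step: {history[-1]:.2e}")
```

Two pegs instead of millions, but the loop is the same one: a large network repeats it with far more dials and a cleverer way of working out each gradient.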
Here's where it gets philosophically tricky, and where most popular explanations either oversell or undersell. When ChatGPT writes a coherent paragraph, is it understanding what it writes? The honest answer is: not in the way you mean by "understand," but the question is more subtle than a simple no.
A language model has learned the statistical structure of language so thoroughly that it can produce text that is coherent, relevant, and often correct. It does this by predicting likely next words, one at a time, based on patterns in billions of examples. There is no inner experience, no comprehension, no model of truth. It does not know that Paris is in France the way you know it. It knows that the token "France" is statistically likely to follow "Paris is in."
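To see what "statistically likely to follow" means in the crudest possible form, here is a sketch that counts what came after each three-word context in a tiny invented corpus. A real language model does not keep a count table like this; it uses a neural network over sub-word tokens, and these five sentences and the name `next_token_probs` are purely illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical training sentences.
corpus = [
    "paris is in france",
    "paris is in france",
    "paris is in europe",
    "berlin is in germany",
    "rome is in italy",
]

# Count which word followed each three-word context.
counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(3, len(words)):
        context = tuple(words[i - 3:i])
        counts[context][words[i]] += 1

def next_token_probs(context):
    """Turn the raw counts for a context into probabilities (empty if never seen)."""
    seen = counts.get(tuple(context.split()), Counter())
    total = sum(seen.values())
    return {token: c / total for token, c in seen.items()}

probs = next_token_probs("paris is in")
print(probs)
print("most likely next word:", max(probs, key=probs.get))
print("never-seen context:", next_token_probs("tokyo is in"))
```

The table has no idea what a city or a country is. It ranks "france" first because that is what the examples contained, and it has nothing to say about a context it was never shown.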
"It does not know that Paris is in France the way you know it. It knows that the token 'France' is statistically likely to follow 'Paris is in.' And yet, astonishingly often, that's enough."
But here's the uncomfortable twist: if a system can answer questions, write working code, explain concepts, and pass professional exams, at what point does "it's just predicting patterns" stop being a satisfying dismissal? The pattern-matching is so sophisticated that the distinction between "really understanding" and "perfectly simulating understanding" becomes genuinely hard to pin down. This is an open question that serious researchers disagree about. Anyone who tells you it's obviously settled, in either direction, is overselling their certainty.
A lot of what people believe about AI is wrong, outdated, or half-true. Here are some common statements. Guess whether each is true, false, or "it depends," then tap to see the answer.
If AI is pattern recognition at scale, then its strengths and weaknesses follow directly. It is extraordinary at tasks where the pattern is in the data and the data exists: recognising images, translating high-resource languages, summarising documents, generating plausible text and code. It is unreliable wherever the pattern is sparse, the data is biased, or the task requires genuine reasoning about novel situations rather than recombining seen patterns.
This is why the same model can write an elegant essay and then confidently state that 7 times 8 is 54. It's not reasoning about arithmetic; it's recalling patterns about how arithmetic usually looks. It's why AI image generators struggled with hands for years (hands are visually variable and the pattern is hard) while nailing faces (faces are consistent and abundant in training data). The capabilities and the failures come from the same source.
Understanding this gives you a practical lens. When someone proposes using AI for something, the useful question is not "is AI good or bad?" It's "is the pattern for this task well-represented in data the model has seen, and what's the cost of the model being confidently wrong?" That question will serve you better than any amount of hype or fear.
AI is not magic and it is not a mind. It is a genuinely new kind of tool: one that learns patterns from examples rather than following rules we write. That sounds modest. It is not. The ability to learn arbitrary patterns from data turns out to be shockingly powerful, enough to reshape how we work, create, and access knowledge. But it is bounded. It inherits the biases of its data. It fails in ways that are different from how humans fail, which makes its failures hard to anticipate.
The people who will navigate the next decade well are not the ones who believe AI is conscious, nor the ones who dismiss it as autocomplete. They're the ones who understand it for what it is: a powerful, limited, strange new tool that we are still learning to use. You now understand it better than most. That was the whole point.