People Don’t Understand Military AI. Here’s an Explainer
“Oh… The Army Just Spent $20 Billion on AI.” Here’s What That Actually Means

“Hey Wes, did you hear? The Army just dropped $20B on Anduril AI!”
Yep, I heard. And no, they didn’t.
What you’re about to read is mostly an explainer on military AI, so you don’t fall for the buzzwords. But it’s also a little bit of a rant against current “buzzword journalism.”
When the Pentagon dropped the Anduril contract announcement on a Friday evening (because that’s when the Army releases things it wants you to half-read before the weekend swallows your attention), the headlines practically wrote themselves:
“$20 Billion AI Deal.” “Army Bets on Artificial Intelligence.” “Anduril Wins HUGE Historic AI Contract.”
Every single one of those headlines used the word “AI” as if it meant one thing.
Folks, this is, if not bad journalism, then at least “lazy” journalism. Or maybe these writers just assume their readers won’t know the difference?
AI, especially applied to the military, most certainly doesn’t mean one thing.
I’m going to explain what “AI” really means when the Army throws around numbers like $20 billion and why “AI” is at least three completely different technologies being collapsed into one noun.
Actually, the Anduril deal is the perfect vehicle for this conversation because it accidentally illustrates the entire military AI stack in one procurement action.
By the way, last summer I wrote a primer about AI in modern warfare. I’d like to think it is approachable for the layperson, but it also has enough detail to satisfy my readers who are already fluent in technobabble.
I highly recommend taking a look if you’re so inclined, as most of the information is still current. Warning: it’s a looooong read.
AI in Warfare: What You Need to Know in 2025
But if you’re the type who doesn’t like homework, I got you covered. That piece is not required to understand what you’re about to read.
I personally think AI is the buzzword of the century.
It’s my humble opinion that most journalists, especially those working for non-specialized publications (think Reuters instead of TechCrunch), don’t know the difference between ChatGPT and a narrow computer vision model that’s been trained to hunt Russian T-72s.
The reality is that by the time you’re done reading this, you’ll know more about AI in war than most members of the US House of Representatives.
As a writer with a highly intelligent audience (I know because I see it in the comments), my challenge is to thread the needle: write an approachable technology article for the interested nonexpert without making it so rudimentary that it insults the intelligence of my PhD readers. I know, this is a “me” problem, but I thought I’d vocalize it, if for no other reason than to pre-explain why my writing is so often filled with analogies. I think that’s the best way to make a tough concept approachable.
First, What the Deal Actually Is
Let me save you from the most common misreading, because Anduril’s own president had to correct it within 48 hours of the announcement.
“We got a lot of messages over the weekend, like, ‘Oh, you made $20 billion.’ There’s no money attached to it, this is just a contract vehicle, but it reduces a lot of friction in things that just simply shouldn’t have it,” said Anduril president Matt Steckman.
A contract vehicle is not a purchase order. It’s a pre-negotiated framework that allows any buyer within the federal government to purchase Anduril’s commercially available products without restarting the acquisition process from scratch each time.
The $20 billion is a ceiling over ten years, not an appropriation.
Think of it less like a shopping cart and more like a store credit card with a high limit that nobody has to use if they don’t want to.
Do I ever use my Best Buy credit card? Hell no. I just like looking at the zero balance. It makes me smile.
(That’s not true. I did use it to buy a Sony camera for my YouTube channel. But I can write it off at tax time.)
The Army previously managed more than 120 separate procurement actions for Anduril’s solutions alone. This enterprise contract consolidates all of that into one vehicle, and the service says the consolidation effort across 14 such deals has resulted in an 88 percent reduction in the total number of contracts over the last eight months.
That’s the real headline. Not “Army buys AI brain.” It’s “Army finally stops drowning in its own paperwork long enough to buy things efficiently.” Bout time.
Which is unglamorous but is also genuinely significant for a procurement bureaucracy that has historically moved at the speed of a Greenland glacier wearing ankle weights.
The first actual task order under the vehicle, the first money actually spent, is $87 million, awarded through the Army-led Joint Interagency Task Force 401, for a command-and-control backbone to link distributed sensing and engagement for counter-drone operations.
Eighty-seven million dollars. That’s the opening bet. And it’s not on some autonomous robot general. It’s on software that helps sensors talk to shooters faster.
Now that we’ve deflated the hype balloon, let’s talk about what Anduril’s Lattice is and why the Army wants it badly enough to build a decade-long procurement framework around it.
The Three-Layer Stack Nobody Explains
Here’s the thing that frustrates me about defense AI coverage. Reporters use “AI” the same way my uncle uses “the web,” as a single omnibus concept that covers everything from a Netflix algorithm to Club Penguin to eBay.
In reality, when the Army buys “AI” for a counter-drone mission, it’s buying at least three fundamentally different capabilities that do fundamentally different jobs.
Call them the eyes, the nervous system, and the voice.
The Eyes: Vision AI
Vision AI is the part that sees. It takes raw inputs (pixels from a camera, returns from a radar, infrared signatures from a sensor, motion data from a track) and asks one question: what am I looking at?
You’ll see the term “computer vision.” That's what this is.
Is that a quadcopter? A bird? A commercial delivery drone that wandered off course? A one-way attack drone with a mortar round duct-taped to the frame? Is it stationary or moving? What direction? What speed? What altitude? Is the behavior consistent with a threat or consistent with someone’s lost DJI?
Vision AI is the descendant of a long lineage of machine learning work in computer vision, and it was the original mission of Project Maven when the Pentagon launched it in 2017.
Maven’s mandate was specific: use computer vision models to automatically detect, identify, characterize, and track objects in full-motion video from ISR platforms.
Not to think.
Not to plan.
Not to communicate.
To see. Faster than a human analyst staring at a feed could see, and across more feeds simultaneously than any human team could cover.
That’s classic narrow AI. A trained statistical model that learned to recognize patterns from labeled examples rather than from a human-authored rulebook. Ahem… a machine that’s learning, if you will.
Old-timey, traditional software is usually rule-based. A human writes explicit logic instructions.
If X happens, do Y.
If the pixel is brighter than this threshold, mark it white. If the target moves faster than this speed, classify it as suspicious.
It’s deterministic and handcrafted. The programmer is basically telling the machine exactly how to think.
Machine learning flips that arrangement. Instead of handwriting all the rules, you build a model architecture that can learn statistical patterns from examples.
You don’t say, “A tank has exactly this shape, this shadow, and these edges.” You feed it a mountain of labeled images and let training adjust the internal parameters, so the system gets better at guessing “tank” versus “not tank.”
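If you like seeing the distinction in code, here’s a toy sketch in Python. Every threshold, speed, and label below is invented for illustration; a real model adjusts millions of parameters, not one cutoff.

```python
# Toy contrast between a hand-written rule and a learned one.
# All numbers and labels are invented for illustration.

def rule_based_classify(speed_mps: float) -> str:
    # Traditional software: a human picked this threshold by hand.
    return "suspicious" if speed_mps > 30.0 else "benign"

def train_threshold(examples: list[tuple[float, str]]) -> float:
    """'Learn' a speed threshold by picking the cutoff that misclassifies
    the fewest labeled examples. The decision boundary comes from data,
    not from a programmer."""
    candidates = sorted(speed for speed, _ in examples)
    def errors(cut: float) -> int:
        return sum((label == "suspicious") != (speed > cut)
                   for speed, label in examples)
    return min(candidates, key=errors)

# Labeled training data: (speed in m/s, analyst's label).
labeled = [(5.0, "benign"), (12.0, "benign"), (45.0, "suspicious"),
           (28.0, "benign"), (50.0, "suspicious"), (60.0, "suspicious")]

learned_cut = train_threshold(labeled)  # 28.0 with this data

def learned_classify(speed_mps: float) -> str:
    return "suspicious" if speed_mps > learned_cut else "benign"
```

Notice that the hand-written rule and the learned rule look identical at runtime. The difference is where the threshold came from: one from a programmer, one from labeled examples.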
You’ll also see the word “neural network,” which is ripe for bad journalism.
The “brain-like” comparison is partly useful and mostly a trap.
Useful, because artificial neural networks were loosely inspired by biological neurons.
You have units, weighted connections, activation, signal flow, layers, and adaptation. That gave the field its language. Neurons, weights, training, learning.
Fair enough.
But it’s mostly a trap, because modern machine learning systems do not work like a human brain in the way most people imagine. They don’t understand the world the way humans do. They don’t have common sense, embodied experience, or durable conceptual models unless those are approximated statistically.
They’re better thought of as giant “function approximators.” You put in inputs, like pixels, radar returns, or metadata, and the model maps them to outputs, like “truck,” “person,” “building,” or “possible launcher.”
Humans still design the framework, choose the training data, choose the loss function, decide what success means, and tune the process. The machine learns the decision boundary from data rather than from a long list of explicit instructions.
That is the core difference.
Traditional logic says: Here are the rules.
Machine learning says: Here are many examples, now infer the rules your damn self dawg.
Nobody told Maven, “a T-72 has exactly these edge characteristics and this shadow geometry.”
Analysts fed it a mountain of labeled imagery and let the training process adjust the model’s internal parameters until it got good at recognizing tanks.
That imagery includes video and would contain thousands of visual representations of a T-72 in every condition: at dawn, at dusk, in snow, at night, from all angles, from the air, from the back, and on and on…
This is why we call it “narrow AI”: the model is trained on a small target set, sometimes just a single target.
The difference between that and traditional software is worth pausing on, because it’s the source of most of the public confusion about what “AI” even is.
This is my favorite analogy:
Traditional software is a cookbook.
A human writes every rule in advance. The machine executes it. The meal gets made and it’s delicious!
Machine learning is more like a chef.
With a trained chef, you don’t specify every movement. You give experience, examples, corrections, and repetition.
Over time, the chef learns what “medium-well” looks like and can handle variation better than a rigid recipe. The chef is still working in the same kitchen, with the same stove and pans. The difference is where the decision-making pattern came from.
This is why more raw compute alone doesn’t create AI.
You can throw a mountain of Nvidia GPUs at a bad hand-coded algorithm and all you’ve done is create a faster idiot. What matters is the learning structure.
And the payoff is generalization.
A hand-coded, traditional vision system works fine… until the enemy drapes Saab’s MCS camo net over their tank and changes the silhouette. A well-trained model has seen enough variation in training data that it can handle novel presentations of familiar threat signatures more gracefully.
The Nervous System: Lattice and C2 Fusion
Anduril’s Lattice suite collects and fuses data from various sensors and platforms, providing a coherent operational picture while enabling faster, AI-assisted decision-making.
Its modular architecture allows integration with existing Army networks and tactical systems, supporting command, control, communications, computers, intelligence, surveillance, and reconnaissance operations.
If you’re like me and you like organization, a clean office desk, spreadsheets, a de-cluttered home, then this section is for you. Lattice takes many disparate pieces of information and organizes them into something genuinely useful for humans.
This is a different and harder problem than detection. Detection asks, “what is that?” Fusion asks, “given everything we’re seeing from all these different sources simultaneously, what is actually happening, and what is likely to happen next?”
Here’s why that matters in a counter-drone fight.
The acoustic sensor hears something and says something small is inbound.
The radar has a track that might be the same thing or might be a bird close by.
The camera operator sees something but isn’t sure of the range.
The jammer crew is waiting on an identification call.
The interceptor crew is waiting on a cueing order.
And someone is trying to reconcile all of this on a phone while a one-way attack drone closes the distance.
Lattice sits in what you might call the command-and-control layer of the AI stack. It consumes the outputs of vision systems, cross-references them against other data streams, tracks objects over time, and presents operators with a synthesized picture rather than raw sensor feeds.
It also manages the relationships between effectors like who has the shot, what weapon is available, what the rules of engagement are, what the deconfliction geometry looks like when multiple systems are all trying to engage the same target.
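Here’s a deliberately tiny sketch of the fusion problem in Python. This is not Lattice’s actual logic; the sensors, bearings, and confidence math are all invented to show the shape of the task: several uncertain reports in, one track out.

```python
# Hypothetical sketch of sensor fusion: correlate reports that probably
# describe the same object, and emit one synthesized track.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Report:
    sensor: str          # e.g. "radar", "acoustic", "camera"
    bearing_deg: float   # direction to the contact
    confidence: float    # the sensor's own 0-to-1 confidence

def fuse(reports: list[Report], max_spread_deg: float = 5.0) -> Optional[dict]:
    """Fuse reports into one track if their bearings agree within a
    tolerance. Real fusion also correlates on range, altitude, time,
    and kinematics; this keeps only the shape of the problem."""
    bearings = [r.bearing_deg for r in reports]
    if max(bearings) - min(bearings) > max_spread_deg:
        return None  # the reports disagree: probably separate objects
    # Confidence-weighted average bearing.
    total = sum(r.confidence for r in reports)
    bearing = sum(r.bearing_deg * r.confidence for r in reports) / total
    # Crude combined confidence: 1 minus the chance every sensor is wrong.
    miss = 1.0
    for r in reports:
        miss *= 1.0 - r.confidence
    return {
        "bearing_deg": round(bearing, 1),
        "confidence": round(1.0 - miss, 2),
        "sources": [r.sensor for r in reports],
    }

track = fuse([
    Report("acoustic", 44.0, 0.5),
    Report("radar", 42.0, 0.8),
    Report("camera", 43.0, 0.6),
])
```

Three shaky single-sensor guesses become one high-confidence track, which is exactly the product the jammer crew and the interceptor crew were waiting on in the scenario above.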
This is closer to “nervous system” than “brain.” It’s not making strategic decisions. It’s making the physical and informational connections that allow human decision-makers to act on what the sensors are seeing at a speed the threat actually demands.
The Voice: Large Language Models
This is the layer that gets the most public attention and is probably the least operationally important of the three in the near term.
Large language models like Claude, GPT, and their defense-integrated cousins are not detection systems. They’re not tracking systems. They’re just interface and synthesis systems.
They simply help humans interact with the underlying architecture in natural language, query complex datasets without knowing the technical command structure, summarize multi-source reporting, generate draft outputs, compare options, and make machine-generated information legible to GenX commanders and Millennial staff officers who didn’t necessarily grow up speaking machine language.
Think about what that means practically for a battalion staff officer at 2 am who needs to understand the air threat picture for the next six hours.
The old way: navigate multiple systems, pull data from separate feeds, manually correlate tracks, try to summarize it into a product that the commander can actually use.
The new way: ask the system in plain English, “What are the top five air threat indicators in my sector from the last four hours, and what’s the confidence level on each?” and receive an answer.
The LLM didn’t detect anything. The LLM didn’t track anything. The LLM translated a human question into a database query, synthesized the results across sources, and presented the answer in a form the human can actually act on.
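Here’s a hypothetical sketch of that workflow, with a stub standing in for the model itself. The field names, records, and query format are all invented; the point is the division of labor: the language model translates and summarizes, while the underlying database does the detecting and tracking.

```python
# Illustration of the LLM as interface layer. There is no model here;
# llm_parse() is a stub for the step where a language model would turn
# plain English into a structured query. All data is invented.
from datetime import datetime, timedelta

# What the underlying C2 track store might hold (invented records).
tracks = [
    {"id": "T-101", "type": "quadcopter", "sector": "A",
     "confidence": 0.91, "seen": datetime(2026, 2, 3, 1, 40)},
    {"id": "T-102", "type": "bird", "sector": "A",
     "confidence": 0.55, "seen": datetime(2026, 2, 3, 0, 15)},
    {"id": "T-103", "type": "one-way attack", "sector": "A",
     "confidence": 0.87, "seen": datetime(2026, 2, 2, 23, 50)},
]

def llm_parse(question: str) -> dict:
    # Stub: a real LLM would map the officer's question to this query.
    return {"sector": "A", "hours": 4, "top": 5,
            "as_of": datetime(2026, 2, 3, 2, 0)}

def run_query(q: dict) -> list[dict]:
    # The database, not the LLM, does the actual filtering and ranking.
    cutoff = q["as_of"] - timedelta(hours=q["hours"])
    hits = [t for t in tracks
            if t["sector"] == q["sector"] and t["seen"] >= cutoff]
    hits.sort(key=lambda t: t["confidence"], reverse=True)
    return hits[: q["top"]]

def summarize(hits: list[dict]) -> str:
    # The other half of the LLM's job: render results as prose.
    return "; ".join(f'{t["id"]}: {t["type"]} '
                     f'(confidence {t["confidence"]:.0%})' for t in hits)

answer = summarize(run_query(llm_parse(
    "What are the top five air threat indicators in my sector "
    "from the last four hours, and what's the confidence level on each?")))
```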
That sounds modest. It’s actually significant, because one of the consistent friction points in military operations is not a lack of data. It’s too much data, and the difficulty of accessing and interpreting the data that already exists at the speed decisions require.
LLMs are, at their best, friction reducers between humans and complex systems.
They’re also the layer most vulnerable to the specific failure mode that keeps AI safety people awake: hallucination, false confidence, and the tendency to generate fluent-sounding answers that are subtly or dramatically wrong.
Why Military AI Didn’t Come From Chatbots
To me, this is the most infuriating part of the AI conversation, because military AI existed long before the modern large language model bubble.
So, let’s start with why the current military AI stack looks the way it does.
Project Maven launched in 2017. Its initial mission was computer vision for object detection in full-motion video. That’s before ChatGPT. That’s before the public had any concept of a conversational AI.
The military perception stack was already being built on the foundations of convolutional neural networks, GPU-accelerated training, and large labeled datasets, all of which predated the large language model explosion.
The technical lineage runs through transformers, not chatbots.
When Google published “Attention Is All You Need” in 2017, introducing the transformer architecture, it first proved itself in natural language processing.
But by 2020, researchers had applied the same architecture to computer vision with the Vision Transformer, demonstrating that the same fundamental approach that made language models powerful also worked for image recognition.
Both fields were drinking from the same technological fire hose: better architecture, better compute, better data, scaling-law confidence, at roughly the same time.
The pivot point that genuinely bridges language and vision is CLIP, published by OpenAI in 2021. CLIP trained on 400 million image-text pairs and showed that natural language supervision could dramatically improve visual recognition, allowing systems to identify objects and concepts in images based on language descriptions rather than fixed label sets.
That’s the actual technical bridge between “language model world” and “computer vision world.” A shared training approach producing more flexible recognition systems.
But the military perception stack was already well underway before anyone was chatting with an AI.
What LLMs did contribute is arguably more sociological than technical: they made AI legible to generals, procurement officials, and members of Congress.
Once a machine could answer questions in plain English, “AI-enabled” stopped sounding like a DARPA conference call and started sounding like a budget line.
That changed money flows and executive attention in ways that affected the defense AI ecosystem pretty broadly.
In other words, the big money started flowing into AI… and military AI was an unintended beneficiary.
What This Means on an Actual Battlefield
Alrighty, let’s get out of the architecture and into the trenches, because the test of any system is what it does when someone is shooting at you.
Against a modern drone threat, the kind Ukraine has been living inside for four years, and the US military is now encountering seriously in the Middle East, the speed of the detection-to-engagement cycle is everything.
A cheap one-way attack drone moving at 150 kilometers per hour covers roughly 40 meters per second. At that speed, the difference between a detection-to-engagement cycle of 30 seconds and 10 seconds is 800 meters of airspace, which is sometimes the difference between an intercept over open ground and a crater where the command post used to be.
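For anyone who wants to check the back-of-the-envelope arithmetic:

```python
# The math from the paragraph above, nothing more.

speed_kmh = 150
speed_mps = speed_kmh * 1000 / 3600   # 41.7 m/s, call it roughly 40

slow_cycle_s = 30   # stovepiped detection-to-engagement cycle
fast_cycle_s = 10   # fused detection-to-engagement cycle

# Airspace the drone closes in the extra 20 seconds, at the round figure.
airspace_ceded_m = (slow_cycle_s - fast_cycle_s) * 40   # 800 meters
```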
The current problem isn’t that the sensors can’t see the drones. It’s that the information from the sensors moves through too many stovepipes, too many manual handoffs, and too many interfaces that weren’t designed to talk to each other before a human can authorize an engagement.
Lattice is specifically designed to reduce that friction by creating a common data layer where detection, tracking, identification, and cueing all happen in one integrated environment rather than being passed phone-to-phone through a chain of operators.
The first $87 million task order is for exactly this: a C2 backbone that connects sensors and weapons so operators can detect, track, classify, and engage drone threats, with the explicit goal of achieving “common air domain awareness” across military and federal users.
That means the humans get inside the threat’s decision timeline more consistently.
Brig. Gen. Matt Ross said the agreement will “strengthen interoperability for counter-unmanned aerial systems operations” and that it “directly addresses the critical interoperability challenge that has hampered joint and interagency counter-UAS operations.”
Interoperability. That’s the word that matters.
The Army’s most immediate AI problem is that its sensors, its software, and its effectors have historically been too poorly connected to let the humans move fast enough against the threats they’re actually facing.
The Anduril deal is the Army’s attempt to fix that problem with a common software foundation before the next war makes the lesson more expensive than the contract.
I think it was Winston Churchill who said that “the generals are always fighting the last war.”
Welp, this is the Army’s way to prove the great orator wrong.
The buzzword is AI. The real story is infrastructure.
Vision AI sees. C2 software fuses. Language models explain. None of those layers are magic or self-aware. All of them are necessary.
Ukraine has been running that experiment live for four years. The Army just opened a $20 billion line of credit to buy the notes.
Слава Україні!



