What AI is actually doing in crisis PR, and what it still isn't

I get asked about AI and crisis PR most weeks now, usually by clients, sometimes by journalists, occasionally by people at dinners who have read something unsettling. The short version of what I tell them is that AI has already changed the field substantially, just not in the places most of the coverage assumes. The conversation that dominates online is mostly about chatbots writing statements and AI-generated apologies, which is the least interesting layer of what's happening, and also the part of the work I trust AI with least. The actual shifts are further down the stack, in the parts of reputation management that most clients never see, and that most people writing about this don't understand well enough to cover accurately.

So this is an honest read on where AI is really moving the needle, where it isn't, and why the difference matters a lot more than it looks from the outside.

Agentic monitoring is the real shift

The genuine change of the last eighteen months is agentic monitoring, and it isn't a rebrand of sentiment dashboards. It's a structurally different way of doing the work.

Traditional media monitoring is keyword-based. You give a system a list of terms and it surfaces every mention, and a human then reads, classifies, and triages the pile. The volume problem with that approach has been worsening for a decade: roughly two million news articles a month, hundreds of millions of social posts, fragmented across six or seven major platforms plus Reddit, Discord, Telegram, Substack, niche forums, and a long tail of creator-led news sources that younger audiences treat as primary. Keyword monitoring at that scale is effectively broken, and quietly has been for a while.

Agentic systems replace this with goal-directed autonomous agents. You give an agent an objective, say, track reputational risk for this brand, these individuals, these product lines, these issue categories. You give it context, meaning your messaging, priorities, known vulnerabilities, a persistent brief. The agent scans continuously, uses large language models to actually read and understand what's being said rather than pattern-match on strings, and decides which signals are worth escalating. It writes its own reports, flags anomalies, tracks stories as they develop, and surfaces issues you didn't think to tell it to look for.
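To make that concrete, here is a minimal sketch of the loop in Python. Everything in it is illustrative rather than any vendor's actual architecture: the brief's fields, the helper names fetch_new_items, classify, and escalate, and the severity threshold are all assumptions standing in for real plumbing.

```python
import time

# Persistent brief: the context the agent carries between scans.
# All names and fields here are illustrative, not any vendor's schema.
BRIEF = {
    "entities": ["Example Brand", "Jane Doe (CEO)"],
    "issue_categories": ["product safety", "executive conduct", "litigation"],
    "known_vulnerabilities": ["2023 recall", "ongoing employment dispute"],
}

def fetch_new_items():
    """Pull new articles and posts from whatever sources are wired in.

    Stub: a real system fans out across news APIs, social platforms,
    forums, and newsletters.
    """
    return []

def classify(item, brief):
    """LLM call that reads the item against the brief and returns
    something like {"relevant": bool, "severity": 0-10, "reason": str}.

    Stub here; the classification step is sketched further down.
    """
    return {"relevant": False, "severity": 0, "reason": ""}

def escalate(item, verdict):
    """Notify a human: severity, why it matters, link to the source."""
    print(f"[severity {verdict['severity']}] {verdict['reason']}")

def run_agent(poll_seconds=300):
    while True:
        for item in fetch_new_items():
            verdict = classify(item, BRIEF)
            # The cut-off of 6 is arbitrary; in practice it is tuned
            # per client and per issue category.
            if verdict["relevant"] and verdict["severity"] >= 6:
                escalate(item, verdict)
        time.sleep(poll_seconds)
```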

The practical difference is semantic understanding replacing keyword matching. An older system would miss a story about an executive because the specific search term wasn't in the headline. A modern agent reads the article, understands the entity, recognises the reputational implication, and escalates. The detection figures the better vendors are claiming, ninety-nine percent plus on important events, false positives below two percent, roughly line up with what I see in client use, with the obvious caveat that "important" is defined in-house and should be interrogated.
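A small sketch of that difference, assuming the OpenAI Python client as one way to make the model call. The model name, the entity, and the prompt are illustrative; any capable LLM slots in the same way.

```python
from openai import OpenAI  # pip install openai; any LLM API works similarly

article = ("The carmaker's former finance chief was charged with fraud "
           "on Tuesday, according to court filings.")
search_terms = ["Example Motors", "ExMo"]

# Keyword matching: misses the story because no term appears verbatim.
keyword_hit = any(t.lower() in article.lower() for t in search_terms)
print(keyword_hit)  # False

# Semantic reading: ask a model whether the text implicates the entity.
client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice of model
    messages=[{
        "role": "user",
        "content": (
            "Entity: Example Motors (carmaker; former CFO departed 2024).\n"
            f"Text: {article}\n"
            "Does this text carry reputational implications for the entity? "
            "Answer RELEVANT or NOT_RELEVANT, then one line of reasoning."
        ),
    }],
)
print(resp.choices[0].message.content)
```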

Search is where most of the real game is now

The application that actually matters, and the one almost nobody is talking about at conferences, is what AI has done to search.

Google's generative surfaces (AI Overviews, SGE, and whatever follows them) have quietly rewritten how reputational information gets presented. When someone searches a person's name now, what increasingly appears first isn't the top-ranking article. It's an AI-generated summary pulled from whatever the model has decided are the authoritative sources. That summary is the new first impression. It can contain outdated information, context-stripped quotes, misrepresentations, and wrong associations, rendered with the confident typography of a definitive answer.

This has changed the work. Traditional SEO-led reputation management was about moving results up and down the first page of blue links. The first page of blue links is becoming less central. The fight is increasingly over what Google's model treats as the canonical version of you, which is built from entity recognition, structured data, knowledge graph connections, and the signals the model weights from what it considers high-authority sources. Influencing that layer is technical work. It involves schema markup, Wikidata and Wikipedia entries where the subject is eligible, coordinated source placement in publications the model respects, and in some cases active disputation of incorrect AI-generated summaries through the reporting channels the platforms now provide.
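The schema markup piece is the most mechanical of those. As an illustration, here is a minimal script that emits schema.org Person JSON-LD, the structured data layer that feeds entity recognition; every value in it, including the Wikidata ID, is a placeholder.

```python
import json

# Minimal schema.org Person markup. The point is to give crawlers an
# unambiguous, machine-readable statement of who this person is and
# which profiles are canonically theirs. All values are illustrative.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Doe",
    "jobTitle": "Chief Executive Officer",
    "worksFor": {"@type": "Organization", "name": "Example Motors"},
    # sameAs links tie the entity to its knowledge-graph identities.
    "sameAs": [
        "https://www.wikidata.org/wiki/Q00000000",   # placeholder ID
        "https://www.linkedin.com/in/janedoe-example",
    ],
}

# Emit the script tag you would embed in the page's <head>.
print('<script type="application/ld+json">')
print(json.dumps(person, indent=2))
print("</script>")
```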

None of that is speculative. It's where a meaningful chunk of my actual hours go.

The adjacent shift is generative engine optimisation, usually called GEO, which is the adjustment of content so that LLMs treat it as reliable when they're generating answers about a person or brand. It's a different discipline to classical SEO and will probably split off into its own specialism inside the next two years. The work isn't about ranking. It's about being the source the model chooses to paraphrase.
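One concrete GEO exercise, sketched below under obvious assumptions: ask a model what it currently says about the entity and flag anything that drifts from an approved fact sheet. The model, prompt, and keyword check are all illustrative; a real audit would run across several models on a schedule.

```python
from openai import OpenAI

# Keywords drawn from the approved fact sheet; anything the model
# asserts that touches none of them gets a human read. Illustrative.
APPROVED_KEYWORDS = {"Example Motors", "chief executive", "2021", "Acme Corp"}

client = OpenAI()  # assumes OPENAI_API_KEY is set
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; run the same audit across models
    messages=[{"role": "user",
               "content": "Who is Jane Doe of Example Motors? "
                          "Answer as short bullet points, one fact each."}],
)
answer = resp.choices[0].message.content
print(answer)

# Crude drift check: bullets mentioning nothing from the fact sheet
# are candidates for correction at source, or for platform reporting.
for line in answer.splitlines():
    bullet = line.strip()
    if bullet.startswith(("-", "*")) and not any(
            kw.lower() in bullet.lower() for kw in APPROVED_KEYWORDS):
        print("REVIEW:", bullet)
```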

Synthetic media and coordinated inauthenticity

Two other technical areas that matter more than they get discussed.

Deepfake audio and video are cheap enough now to be a routine reputational threat rather than a novelty. I've had cases this year where a synthetic clip of a client was already circulating before we had the chance to flag it. AI detection tools for synthetic media are improving, but the honest state of play is that detection lags generation by six to twelve months at any given point. The workable approach is a combination of provenance standards (C2PA content credentials where they exist), forensic analysis of the specific file, and fast coordination with the trust and safety teams at the relevant platforms, who are running their own AI classifiers at scale and tend to move faster if you arrive with a proper case file than if you send a panicked email.
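For the provenance and file-hashing steps, the first pass can be largely scripted. A rough sketch: hash the exact file for the case record, and ask the Content Authenticity Initiative's c2patool CLI for any C2PA credentials. The tool is real but its output format varies by version, so the handling here is an assumption, and the absence of a manifest proves nothing on its own.

```python
import hashlib
import subprocess
import sys

def sha256(path):
    """Hash the exact file for the case record, so everyone downstream
    (platform, counsel, forensic analyst) is discussing one artefact."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def c2pa_manifest(path):
    """Ask c2patool for Content Credentials. Absence of a manifest proves
    nothing (most authentic media has none); a valid manifest is useful
    positive evidence. Output and exit codes vary by tool version."""
    try:
        result = subprocess.run(["c2patool", path],
                                capture_output=True, text=True, timeout=60)
    except FileNotFoundError:
        return "c2patool not installed"
    return result.stdout or result.stderr

if __name__ == "__main__":
    clip = sys.argv[1]
    print("sha256:", sha256(clip))
    print(c2pa_manifest(clip))
```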

Coordinated inauthentic behaviour, the term of art for networks of accounts pushing a narrative in concert, is now almost always AI-enabled on both sides. The bad actors use LLMs to generate plausible posts at scale with enough stylometric variation to evade basic detection. The defensive side, meaning platforms, monitoring vendors, and practitioners like me, use AI to find coordination signals through posting patterns, graph anomalies, and stylometric clustering. A reputational attack in 2026 is often less "someone said a bad thing about you" and more "a coordinated network of semi-synthetic accounts is amplifying an old clip with artificially generated variation", and working out which one you're actually dealing with is a technical diagnostic question. The answer decides the whole response strategy. Getting that wrong is how people end up making a statement to placate an outrage that was never organic in the first place.
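The diagnostic is more tractable than it sounds. Below is a minimal sketch of two of the coordination signals mentioned above, near-duplicate wording and synchronised timing, using scikit-learn on toy data; the thresholds are arbitrary, and a real pipeline layers graph analysis and stylometric clustering on top.

```python
from datetime import datetime, timedelta
from itertools import combinations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# (account, timestamp, text) — toy rows standing in for a platform pull.
posts = [
    ("acct_01", datetime(2026, 1, 5, 9, 0, 12),
     "Old clip of the CEO resurfaces, shocking stuff"),
    ("acct_02", datetime(2026, 1, 5, 9, 0, 41),
     "Shocking old clip of the CEO has resurfaced"),
    ("acct_03", datetime(2026, 1, 5, 9, 1, 3),
     "This resurfaced CEO clip is genuinely shocking"),
    ("acct_04", datetime(2026, 1, 7, 14, 22, 0),
     "Their new product is overpriced, honestly"),
]

texts = [p[2] for p in posts]
sim = cosine_similarity(TfidfVectorizer().fit_transform(texts))

# Signal 1: near-duplicate wording. Signal 2: posts landing inside a
# tight window. Both at once, across distinct accounts, is the pattern
# organic outrage rarely produces.
WINDOW = timedelta(minutes=5)
SIM_THRESHOLD = 0.5  # arbitrary for the toy data

for i, j in combinations(range(len(posts)), 2):
    close_in_time = abs(posts[i][1] - posts[j][1]) <= WINDOW
    similar_text = sim[i, j] >= SIM_THRESHOLD
    if close_in_time and similar_text and posts[i][0] != posts[j][0]:
        print(f"coordination signal: {posts[i][0]} <-> {posts[j][0]} "
              f"(similarity {sim[i, j]:.2f})")
```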

What AI still doesn't do well

Before I get to what's left for people like me, a thing worth saying plainly, because the question always comes up.

I don't use AI to write client-facing statements. Not the first draft, not as a starting point, not for ideas. I write them by hand. A lot of my work runs alongside other firms, which means I see plenty of what's being produced elsewhere, and it has become noticeably easier to spot when something has been generated rather than written by a specific person for a specific moment. The tell is rarely in one sentence. It's in the rhythm, the over-even cadence, the absence of the small awkwardnesses that mark writing as coming from a human under pressure. Journalists are picking up on it. Audiences will catch up quickly once someone names the pattern. In a crisis, the thing people need to feel is that a human is actually standing behind the words. A well-prompted model will get the words most of the way there, but the last stretch is what makes the difference between a statement that is tolerated and one that is believed, and that stretch is why, at least for now, I write the whole thing by hand.

With that said, here are the specific things AI is not yet good at in this work.

It doesn't do second-order judgment well. It can tell you a story is surging. It can't reliably tell you whether the surge will compound or decay, and the correct response differs completely in each case. That's a pattern-recognition problem requiring a long memory of how particular kinds of stories play out in particular media ecosystems, and it's hard to capture in training data because the base rate of any given crisis type is low. This is the main reason experienced human judgment still outperforms the tools on the one call that matters most, which is whether to respond at all.

It doesn't hold goal state well. Most reputational work is trade-offs between objectives clients haven't yet fully articulated to themselves. Career continuation, family protection, litigation posture, future publishing deals, industry standing, legal exposure, relationships with specific individuals. A strategy that optimises for one dimension usually damages another. Most of the first hour with a new client is just mapping which of those matter and by how much. A model left to its own devices defaults to the most obvious goal, which is almost always the wrong one.

It doesn't do real-time platform negotiation. Takedowns, de-indexing requests, trust and safety escalations, right-to-be-forgotten submissions under UK and EU data protection law, DMCA counter-notices. All of these still move through human relationships, case law knowledge, and procedural fluency that isn't in any model's training set. The tools help assemble the packets. Humans still move them.

The thing that matters practically

Better detection raises the stakes on judgment. It doesn't lower them. This is the single most important point in this whole piece, and the one I'd want anyone reading it to hold onto. If you are being alerted to more things, faster, by systems that are quietly very persuasive about the importance of what they're showing you, the reflex to respond to everything gets worse, not better. And responding to everything is already the most common error in this field by a wide margin.

The working mental model is that AI has massively expanded the input layer, meaning what you can see, how quickly, in how much detail. It has modestly expanded the execution layer, meaning what you can actually do about it, faster and cheaper than before. The decision layer in between, which is where reputations are actually won and lost, is still a pattern-matching problem that benefits from someone who has watched several hundred of these cycles run all the way to the end. That gap is narrowing, and I expect it to keep narrowing. I don't think it closes this decade. When it does, I'll probably be using whatever replaces me to run this website.

I run confidential one-hour advisory sessions over Zoom for £350, including pre-call research and a written follow-up. Initial calls are free. No sales pitch. I once tried a sales job and was fired inside a fortnight, so the risk is low.
