Using LLMs in SecOps Without Handing AI the Keys -- Redmondmag.com

TechMentor speaker Heather Wilde Renze says LLMs can help security teams move faster, but data boundaries, review loops and access controls need to come first.

By Chris Paoli
05/06/2026

INSIDE THE SESSION

What: LLMs Are Your New Junior Engineer: Using AI Safely in Security and IT Ops

When: Aug. 4, 1:15-2:30 p.m.

Who: Heather Wilde Renze, CTO, Angel Investor, Author

Why: "With agents, it's no longer an information leak. An injected instruction doesn't just return bad output; it does something."

Save $400 when you register for TechMentor & CyberSecurity Live! by June 5!

Large language models are already finding their way into security operations, from summarizing Defender alerts to drafting incident response updates. But for IT teams, the harder question is not whether AI can help. It is where the line should be drawn.

That is the focus of an upcoming TechMentor session, "LLMs Are Your New Junior Engineer: Using AI Safely in Security and IT Ops," at this year's TechMentor & CyberSecurity Live!, taking place at Microsoft HQ Aug. 3-7. The intermediate-level session will be led by Heather Wilde Renze, a fractional CTO, angel investor and author whose background includes work on the founding team at Evernote.

In this Q&A, Renze argues that LLMs can deliver value in areas such as log review, policy documentation and first-pass incident communications, but should not be treated as substitutes for human judgment. She also warns that agentic systems raise the stakes, especially when AI is connected to internal tools, email, identity systems or Microsoft 365 workflows.

You can get more of Renze's perspective (and a host of other thought leaders and experts) by joining us for this year's TechMentor & CyberSecurity Live! Register by June 5 to save $400!

Redmondmag: Where do you see LLMs delivering real value in SecOps today, and where do you think teams are overestimating what they can safely delegate?
Renze: Log summarization is the obvious one. No one should be reading 400,000 log lines at 2 a.m. That's what the machine is for.

Drafting first-pass incident comms is underrated. We actually did something similar by hand during the Evernote breach, and it was chaos. If I'd had something that could draft a "here's what we know so far" template while the rest of us were in a war room, that would've bought real time.

Where people get into trouble is confusing speed with judgment. The model doesn't know what normal looks like for your org. It doesn't know that the spike in auth failures every Tuesday at 3 p.m. is the batch job from 2019 that nobody's touched. A seasoned analyst knows that. The LLM is going to flag it every single time, and eventually people stop looking. That's the actual risk: not that the AI is wrong, but that it trains your team to stop paying attention.
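
Her Tuesday example points at one concrete practice. A hedged sketch, assuming alerts arrive as simple records: analysts keep an explicit list of known, explained recurring patterns, and an alert that matches one is annotated with that explanation rather than silently dropped, so the institutional knowledge is written down and a human still sees the alert. The `KnownPattern` and `Alert` types, their fields and the sample entry are hypothetical illustrations, not part of Defender or any other product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class KnownPattern:
    """A recurring, already-explained event pattern (hypothetical schema)."""
    event_type: str
    weekday: int      # Monday is 0, Sunday is 6, matching datetime.weekday()
    hour: int         # hour of day, 0-23
    explanation: str


@dataclass
class Alert:
    event_type: str
    timestamp: datetime
    note: str = ""


# Hypothetical entry mirroring the example in the interview.
KNOWN_PATTERNS: List[KnownPattern] = [
    KnownPattern("auth_failure_spike", weekday=1, hour=15,
                 explanation="Legacy 2019 batch job; expected every Tuesday"),
]


def annotate(alert: Alert, patterns: List[KnownPattern]) -> Alert:
    """Attach analyst-written context to a matching alert; never discards it."""
    for p in patterns:
        if (alert.event_type == p.event_type
                and alert.timestamp.weekday() == p.weekday
                and alert.timestamp.hour == p.hour):
            alert.note = f"Known pattern: {p.explanation}"
            break
    return alert
```

For example, an `auth_failure_spike` alert stamped `datetime(2026, 5, 5, 15, 2)` (a Tuesday) comes back with the batch-job note attached, while the same event on a Wednesday gets no note. Annotating instead of suppressing is deliberate: the list only holds what someone has already explained, and it goes stale if nobody reviews it.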

And honestly, this calculus changes completely when you move into agentic systems. We're already running them in my company -- and you probably are too. An LLM that summarizes logs wrong costs you an hour. An agent that's wrong and has tool access costs you your environment. The overestimation risk doesn't scale linearly; it compounds.

For orgs just beginning to use AI in IT operations, what should they define first: use cases, guardrails, data boundaries, or approval processes?
Data boundaries. And I say that knowing most people are going to nod and then go define the use cases anyway.

Here's why it matters: most orgs genuinely don't know what data they have. Where it lives, who has access, what's classified as what. That's not an AI problem; that predates AI by a decade. But the moment you start feeding an LLM your internal systems, that gap becomes very visible, very fast.

As for everything else -- what you're going to use it for, what guardrails you need -- those conversations are actually pretty short once you know your data. The approval process is last because you can't write a sensible one until you've done the other work.

This also matters more than people realize once you're in agentic territory. An agent doesn't just read your data; it acts on it. If you haven't defined your data boundaries before you get there, you're finding out what they are the hard way.

Threat triage often requires context, judgment, and skepticism. What role should an LLM play in that process, and where should a human analyst always remain involved?
Junior engineer who's read everything and understood none of it. Fast, thorough, no actual judgment.

Skepticism is the thing that can't be faked. Knowing that this alert at this time from this IP is suspicious for reasons that aren't in any rule, that's pattern recognition built on years of institutional knowledge. The LLM doesn't have your institution. It has the Internet.

Human in the loop, always, for anything irreversible. AI can tell you what it sees. It doesn't get to decide what you do about it.

With agentic systems, this gets thornier because the "analyst" is now a chain of models passing information back and forth. The skepticism problem doesn't go away; it just gets harder to see. You need humans at the handoff points that actually matter, not just at the end.

Prompt injection is a major concern for AI-enabled workflows. What should IT and security teams understand about this risk before connecting LLMs to internal systems or data?
It's social engineering. Exactly the same concept, different target.

Instead of tricking a person into handing over credentials, you're hiding instructions inside content the model will process (a document, an email, a support ticket), and the model follows those instructions because it cannot tell your system prompt from a payload. It doesn't know the difference. It's doing what it's told.

If your LLM touches email or uploaded files, you have this surface. The mitigation is the same thing you'd tell a new hire: treat everything incoming as untrusted, and don't give the model access to things it doesn't need for the task it's doing. Permission creep is how this goes wrong.
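
A minimal sketch of the least-privilege half of that advice, assuming a hypothetical agent setup in which every tool call the model requests passes through a dispatcher: each task gets an explicit allowlist, and anything outside it is refused no matter what the model asks for. The task names, tool names and `dispatch_tool_call` function are invented for illustration. This limits what an injected instruction can do; it does not stop the model from being fooled.

```python
from typing import Any, Callable, Dict, FrozenSet

# Hypothetical per-task allowlists. A summarization task gets read-only tools;
# nothing listed here can send mail, change identities or delete data.
TASK_TOOL_ALLOWLIST: Dict[str, FrozenSet[str]] = {
    "summarize_ticket": frozenset({"read_ticket"}),
    "draft_incident_update": frozenset({"read_ticket", "read_alerts"}),
}


class ToolNotPermitted(Exception):
    """Raised when the model requests a tool outside the current task's allowlist."""


def dispatch_tool_call(task: str, tool: str,
                       registry: Dict[str, Callable[..., Any]], **kwargs: Any) -> Any:
    """Execute a model-requested tool only if the current task may use it.

    The check runs in ordinary code outside the model, so instructions hidden
    in an email, file or ticket cannot change what it allows.
    """
    allowed = TASK_TOOL_ALLOWLIST.get(task, frozenset())  # unknown task: no tools
    if tool not in allowed:
        raise ToolNotPermitted(f"{tool!r} is not permitted for task {task!r}")
    return registry[tool](**kwargs)
```

If a support ticket carries hidden text telling the model to email its contents elsewhere, a `summarize_ticket` run that requests `send_email` fails at the dispatcher instead of acting. Labeling or delimiting untrusted content inside the prompt can help, but it is not a reliable defense on its own, which is why the sketch enforces the boundary outside the model.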

With agents, it's no longer an information leak. An injected instruction doesn't just return bad output; it does something. Deletes, exfiltrates, pivots. That's a different threat model entirely, and most teams aren't treating it that way yet.

What does a healthy human review loop look like when AI is assisting with incident response preparation or operational decision-making?
Honestly? Most orgs don't have one. They have "someone is supposed to check it," and that's about it.

What actually works: the AI drafts, a named person reviews, and a (preferably) different named person approves before anything is executed. Not a team. A person. With their name on it. That accountability changes how carefully people read things.

For high-stakes, irreversible calls, I want two people: one who saw the AI recommendation and one who didn't. Otherwise, you're just getting two people who read the same output and agreed with it, which isn't a second opinion.
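
A hedged sketch of that loop as a pre-execution gate, assuming each AI-drafted action carries its sign-offs as data. The `Signoff` and `ProposedAction` types and `may_execute` are hypothetical; they encode the rules described above: a named reviewer and a named approver for everything, and for irreversible calls two different people, at least one of whom did not see the AI recommendation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Signoff:
    person: str          # intended to be a named individual, not a team alias
    saw_ai_output: bool  # did this person read the AI recommendation?


@dataclass(frozen=True)
class ProposedAction:
    description: str
    irreversible: bool
    reviewer: Optional[Signoff] = None
    approver: Optional[Signoff] = None


def may_execute(action: ProposedAction) -> Tuple[bool, str]:
    """Gate an AI-drafted action on named human sign-offs."""
    r, a = action.reviewer, action.approver
    if r is None or a is None or not r.person.strip() or not a.person.strip():
        return False, "needs a named reviewer and a named approver"
    if action.irreversible:
        # Two different people, one of whom formed a view without the AI draft,
        # so the second opinion is independent rather than a re-read.
        if r.person == a.person:
            return False, "irreversible: reviewer and approver must differ"
        if r.saw_ai_output and a.saw_ai_output:
            return False, "irreversible: one sign-off must not have seen the AI output"
    return True, "ok"
```

Reversible actions allow one person in both roles because the interview calls a separate approver preferable rather than mandatory; a stricter team could apply the different-person check everywhere. The code can check that a name is present, not that it belongs to one accountable individual; that part remains policy.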

The thing nobody builds until they're in a postmortem: a log of every AI recommendation and whether it got overridden. Start keeping that on day one. You'll learn more about where your model fails from that log than from any benchmark.
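
A minimal sketch of that day-one log, assuming an append-only JSON Lines file. Each record pairs what the AI recommended with what a named person decided, and `override_rates` shows, per category, how often people overruled the model. The function names, field names and categories are hypothetical.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Iterable


def make_record(category: str, recommendation: str, overridden: bool,
                decided_by: str, reason: str = "") -> str:
    """Serialize one AI recommendation and the human decision as a JSON line."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "category": category,          # e.g. alert type or workflow name
        "recommendation": recommendation,
        "overridden": overridden,      # True if the human did something else
        "decided_by": decided_by,      # the named person who made the call
        "reason": reason,              # why it was overridden, if it was
    })


def append_record(path: Path, line: str) -> None:
    """Open in append mode, so earlier records are never rewritten here."""
    with path.open("a", encoding="utf-8") as f:
        f.write(line + "\n")


def override_rates(lines: Iterable[str]) -> Dict[str, float]:
    """Fraction of recommendations overridden, per category."""
    totals: Counter = Counter()
    overrides: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        totals[rec["category"]] += 1
        overrides[rec["category"]] += int(rec["overridden"])
    return {c: overrides[c] / totals[c] for c in totals}
```

Reading the file back is one call, for example `override_rates(Path("ai_recommendations.jsonl").read_text(encoding="utf-8").splitlines())` with a file name of your choosing. A category with a high override rate shows where the model struggles in your environment; the `reason` field is what tells you why.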

The same principle holds for agentic pipelines. Transparent audit logging is what lets agents check each other's work. If something fails midway down the chain, you need to trace exactly where, revert to that point, and continue. The review loop isn't just a human safety net; it's structural to how the system runs.
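
A hedged sketch of that trace-and-resume idea for a simple linear chain, assuming each agent is a function from text to text. Every step is logged with its input and output, so a failure can be located and the chain restarted from that step's recorded input. The `Pipeline` class, `StepRecord` type and step names are hypothetical, and a real system would also have to undo any side effects the failed step had already caused.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class StepRecord:
    index: int
    agent: str
    step_input: str
    step_output: Optional[str]
    error: Optional[str]


@dataclass
class Pipeline:
    steps: List[Tuple[str, Callable[[str], str]]]
    log: List[StepRecord] = field(default_factory=list)

    def run(self, data: str, start: int = 0) -> Optional[str]:
        """Run steps from `start`, logging each one; stop at the first failure."""
        for i in range(start, len(self.steps)):
            name, fn = self.steps[i]
            try:
                out = fn(data)
            except Exception as exc:
                self.log.append(StepRecord(i, name, data, None, repr(exc)))
                return None
            self.log.append(StepRecord(i, name, data, out, None))
            data = out
        return data

    def resume_point(self) -> Tuple[int, str]:
        """Index and recorded input of the most recent failed step."""
        for rec in reversed(self.log):
            if rec.error is not None:
                return rec.index, rec.step_input
        raise ValueError("no failed step in the log")
```

After fixing whatever broke, `index, data = pipeline.resume_point()` followed by `pipeline.run(data, start=index)` continues from the failed step instead of rerunning the whole chain, and the log keeps both the failure and the retry for the postmortem.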

In a Microsoft-centric environment, where do you see the most practical opportunities to introduce AI without exposing the org's "crown jewels"?
Edges first. Don't start with identity, don't start with email security rules, don't start anywhere near Active Directory. Those are the crown jewels, and the blast radius if something goes wrong is enormous.

Start with the stuff that's high effort and low drama -- compliance reporting, policy documentation, summarizing Defender alerts for tier one. Copilot for Security is scoped to the user's existing permissions, so the architectural guardrail is already in place. That's a reasonable sandbox to learn in.

Get good at the boring stuff first. Build the review habits -- then expand. People always want to skip to the exciting part, and that's exactly how you end up on the front page of a breach report.

Copilot agents and the Power Automate/AI Builder stack are already being deployed in M365 environments. That's where agentic risk is actually landing for most orgs, whether they've named it that or not. The "crown jewels" conversation needs to include those surfaces.

About the Author

Chris Paoli (@ChrisPaoli5) is the associate editor for Converge360.
