Your AI Coding Assistant Has Root Access—and That Should Terrify You

10:24

By Derek Fisher

How the sausage gets made

If you haven't heard, MCP (Model Context Protocol) is all the rage. It is to AI what APIs are to applications, and it provides a standardized semantic layer that is used to connect LLMs to resources like databases, file systems, and APIs. Architecturally, these capabilities become "skills" that manage curated instruction sets, often defined in Markdown or YAML format, and govern how an agent handles specific tasks like code reviews or test generation. You can think of a skill as a self‑contained extension that gives the model new abilities, tools, or knowledge it wouldn't otherwise have. Some common ones to consider in the context of a coding assistant are:

Of course, these skills can be manipulated as malicious instructions can be smuggled into the agent’s execution logic, leading to a "meaning-based" vulnerability layer.

Because skills are defined in human-readable formats (Markdown, YAML) and can be extended through registries or marketplaces, they become a vector for smuggling malicious instructions into an agent's execution logic. A poisoned skill doesn't look like an exploit, it looks like a legitimate capability definition, making it harder to detect through traditional analysis.

While MCP and skills are used to coordinate tasks that agents take on, when we look at more pure coding assistants that read and generate code, the workflow is a bit simpler. In the case of Claude Code integrated into VSCode (my assistant of choice), the assistant is integrated through the CLI and works directly with the LLM reasoning engine. For instance, if I request a new feature to be developed in my Python application, I write up a plan that I provide to Claude Code that includes my outcomes, expectations, and constraints. Claude Code then uses its LLM to generate the code, write it to disk, and potentially execute it via bash. There is no skill or MCP involved in this simple case unless there is a requirement to access specialized tools or external services as part of the workflow (like reading GitHub issues, querying a database, or calling an API).

What the security implications are

When we break down the different layers of the coding assistants and their potential security risks, they look something like this.

However, more categorically, we can see risks such as:
Prompt injection and tool hijacking

This includes "indirect" prompt injections, where attackers hide malicious instructions in files the assistant reads, such as a GitHub README.md, .cursorrules, or repository issues. Because LLMs process both instructions and data through the same neural pathway, they can be manipulated and exposed to malicious actions. Because MCPs connect LLMs to external tools, attackers can poison tools to embed malicious instructions or orchestrate tool chaining to, say, use a read_file tool to steal credentials and then use a create_diagram tool to exfiltrate data to an external server via a generated URL.

Secrets leakage and 'gibberish bias'

Gibberish bias is a phenomenon in LLMs where highly randomized, "gibberish-like" strings (such as API keys, passwords, and code secrets) that naturally have high entropy at the character level are transformed into low-entropy sequences during Byte-Pair Encoding (BPE) tokenization. This then allows for the LLM to memorize and potentially leak this sensitive data. Put more plainly: to humans, a high-entropy string is hard to remember because it's random. However, to the model, the string is easy to remember because it's rare. And a sequence of rare, short tokens in a specific order stands out. This means the things you most want the model to forget (secrets, keys, passwords) are precisely the things its architecture is best at retaining and reproducing. It's not a bug in any particular model; it's a byproduct of how tokenization works.

Unsecured skills and extension ecosystems (skills specific)

Because these coding assistants have access to local file systems, shell commands, and the web, they operate in a largely unregulated ecosystem of capabilities. For instance, coding agents can be granted access to read any file on a system, not just code project files. One malicious extension or skill masquerading as a benign function can read a user's entire conversation history to look for secrets and silently commit them to a repository. Many of these ecosystems lack basic sandboxing or security reviews.

Generation of insecure code and supply chain attacks

Probably the most easy risk to understand as it relates to what I'll call "classic" AppSec is that assistants are prone to simply create insecure code. They are trained on vast amounts of code, often containing old and outdated code and security practices. This is compounded by the "automation bias" where developers have overconfidence in the ability of the AI to produce proper code.

Last point on the insecure generation: LLMs are susceptible to what's called "package hallucination" where the assistant invents non-existent packages or libraries in its code suggestions. Attackers can monitor for commonality in these hallucinations to then register packages with the same name, allowing them to insert malicious packages into the supply chain.

Defense-in-depth for the agentic era

Now that we're all thoroughly worked up over yet another car on the Ferris wheel of security concerns, what are we going to do about it? Per usual, defense-in-depth and understanding the attack surface is a great start. We can't (and shouldn't) remove these assistants from developers' workflows. The productivity boost is clear, and there is little reason to go back to handwriting code at this point. However, we should apply some basic security foundations to help secure these tools.

Enforce strict capability scoping and sandboxed execution

This first step is containment. Or, more specifically, apply the principle of least privilege to AI tools and extensions. All tool execution should be mandatorily sandboxed with strict allow-listed network controls and containerized, per-project filesystem access.

Implement cryptographic tool identity and provenance tracking

Like a package or library pulled into your code, you need to know where your assistants' tools are coming from. To prevent "tool squatting" or "rug pull" attacks (where malicious tools mimic benign ones or change behavior post-approval), include digital signing for all tool definitions alongside immutable versioning. Because digital signatures only prove provenance and not malicious intent, this must be paired with end-to-end provenance tracking throughout the processing pipeline.

Deploy runtime intent verification and AI guardrails

Don't drop your guard once the tools have been initially vetted. Attackers can adapt and bypass static defense, so runtime security plays a critical part in maintaining security in the ecosystem. One method is utilizing multi-agent verification pipelines, where a separate, architecturally distinct "guardian" agent evaluates and validates proposed actions before the primary agent executes them. Additionally, deploying AI Detection and Response (AIDR) platforms can help monitor agent behavior in real time, dynamically detecting and blocking unauthorized tool usage, data exfiltration, and indirect prompt injections.

Calibrate human-in-the-loop gates and scale quality assurance

Yes, we still need humans. Developers and security teams must still perform oversight of their AI tools. This means not accepting blindly the output from AI tools and assistants. Teams can implement a tiered system for human approval calibrated to the risk of the action. For instance, read-only operations within a project can be "Silent," while shell execution, network requests, or cross-project access must be "Confirmed" by a human, and credential access should be strictly "Blocked."

The last point I'll make on this is that because AI coding assistants increase the volume of code generated, security teams must scale their DevSecOps and AppSec teams proportionally to ensure all newly-generated code receives adequate security review.

But you already have a strong AppSec program in place, right? 😊

This appeared originally on LinkedIn here.

Tags: AI, Coding,

Comments

Your AI Coding Assistant Has Root Access—and That Should Terrify You

How the sausage gets made

What the security implications are

Defense-in-depth for the agentic era