2025-07-03

A TTP for Prompts: HiddenLayer Proposes New Framework for LLM Threats

Level: 
Strategic
  |  Source: 
The Record
Global
Share:

A TTP for Prompts: HiddenLayer Proposes New Framework for LLM Threats

A new framework proposed by HiddenLayer aims to bring structure and clarity to the evolving threat of adversarial prompt engineering against large language models (LLMs). Prompt injection attacks, often broadly categorized under a single label, vary widely in execution and intent, making detection and response inconsistent. HiddenLayer’s taxonomy addresses this by establishing a shared language and hierarchy to better understand how malicious prompts function. Their model introduces a four-layer structure that builds on the established Tactics, Techniques, and Procedures (TTPs) framework, while adding a distinct layer, Objectives, to capture attacker intent. This distinction between motive and method reflects a key design choice. As HiddenLayer explains, “Objectives describe intent: data theft, reputation harm, task redirection, resource exhaustion, and so on, which is rarely visible in prompts. Separating the why from the how avoids forcing brittle one-to-one mappings between motive and method.”

The taxonomy’s most granular element is the adversarial prompt itself—a sequence of inputs crafted to subvert or exploit LLM behavior. These prompts are difficult to generalize, as “prompts are highly contextual and nuanced, making them difficult to reuse or correlate across systems, campaigns, red team engagements, and experiments,” according to HiddenLayer. To manage this variability, the taxonomy categorizes prompts into techniques, which abstract common adversarial strategies such as refusal suppression. In refusal suppression, attackers attempt to bypass model safeguards by embedding language that explicitly prevents the model from rejecting tasks. Grouping prompts by technique allows for broader pattern recognition, making it possible to assess which methods an LLM is most vulnerable to, and to develop mitigation strategies that address entire classes of behavior rather than individual inputs.

Building on this, HiddenLayer defines tactics as higher-level groupings that cluster similar techniques. For instance, both "Tool Call Spoofing,"" and "Conversation Spoofing" are categorized under the tactic of "Context Manipulation," reflecting their shared goal of altering the model’s understanding or state. Tactics offer a strategic view of threat activity, which can guide prioritization efforts and security planning. Techniques and tactics together enable defenders and researchers to assess threat patterns and identify systemic weaknesses, while allowing flexibility in analysis without relying solely on the specific language of the prompt. The taxonomy also recognizes that prompts may use multiple techniques at once or none at all, underscoring the need for adaptable classification.

The proposed system is intended to be collaborative and iterative. HiddenLayer notes that it is not a definitive or final model but rather a practical tool shaped by operational experience and real-world use cases. As generative AI applications expand and new architectures emerge, the taxonomy is expected to evolve. HiddenLayer invites community feedback and participation to improve the system, calling on practitioners, red teamers, and researchers to contribute. The project’s long-term effectiveness will rely on collective input and adoption. As the report concludes, the goal is to move the AI security community away from reactive responses and general labels like “jailbreak,” toward clear, actionable descriptions of adversarial behavior—enabling more precise analysis and mitigation efforts moving forward.

Get trending threats published weekly by the Anvilogic team.

Sign Up Now