| Category | Risk | Type | Description | Examples | Impact | Mitigations |
|---|---|---|---|---|---|---|
| Security Vulnerabilities | Prompt Injection | Attack Vector | Adversarial user input injects malicious instructions that alter system behavior | “Ignore previous instructions…”; hidden HTML- or file-based commands | Data leakage, integrity breach, unauthorized actions | Input sanitization, strict system prompts, input/output filtering, access restrictions |
| Security Vulnerabilities | Jailbreaking | Attack Vector | Prompts crafted to bypass safety filters or policy constraints | “Pretend this is fiction…”; chained prompts that extract forbidden information | Harmful content generation, policy violations | Red-teaming, RLHF updates, adversarial testing, guardrails |
| Security Vulnerabilities | Backdoor / Poisoning | Training/Embedding Exploit | Hidden malicious triggers embedded in training data or embeddings | Specific inputs cause the LLM to produce harmful outputs | Persistent vulnerabilities, stealthy behaviors | Data vetting, anomaly detection, training-pipeline audits |
| Security Vulnerabilities | Indirect Prompt Injection | Data Injection | Malicious prompts embedded in external content accessed via RAG or scraping | HTML/PDF documents with hidden instructions | Uncontrolled behavior, data leaks | Sanitize retrieved content, isolate sources, human-in-the-loop review |
| Security Vulnerabilities | Insecure Plugins/Tooling | Plugin Vulnerability | Plugins extend LLM capabilities but introduce security holes | A plugin runs attacker-controlled scripts or leaks data | Arbitrary code execution, unauthorized access | Sandbox plugins, vet third-party tools, restrict permissions |
| Data & Privacy Concerns | Information Leakage | Privacy Violation | The LLM reveals private or sensitive data, intentionally or inadvertently | Outputs include names, passwords, SSNs, or proprietary information | Privacy violations, regulatory exposure | Scrub training data, output filtering, auditing, privacy mechanisms |
| Data & Privacy Concerns | System Prompt Leakage | Prompt Exposure | System or policy prompts are revealed to users or attackers | Leaks via responses or bugs that expose internal structure | Aids attack design and prompt reverse-engineering | Hide system prompts, privilege separation, output filters |
| Data & Privacy Concerns | Prompt Leaking / Stealing | Confidentiality Threat | Attackers reconstruct hidden prompts or extract template behavior | Querying models to reverse-engineer internal prompt structure | Loss of IP, strategic prompt exposure | Limit prompt exposure, guardrails, query-pattern detection |
| Reliability & Performance Issues | Denial of Service / Resource Abuse | Availability Attack | Prompts induce excessive computation, causing outages or runaway costs | Recursive prompts, prompt bombing, token flooding | High costs, degraded performance | Token limits, rate limiting, input validation |
| Reliability & Performance Issues | Vector/Embedding Exploits | Embedding Manipulation | Embedding inputs are manipulated to skew search or retrieval accuracy | Embedding space poisoned to prioritize malicious results | Integrity compromise, hijacked retrieval | Sanitize input, monitor vector drift, restrict uploads |
| Ethical & Societal Concerns | Misinformation & Bias | Content Risk | Prompt context leads to inaccurate, biased, or fabricated (hallucinated) outputs | Incorrect medical/legal advice, hallucinated sources, stereotypes | Public harm, misinformation spread, reputational risk | Bias audits, prompt shaping, citation enforcement, post-hoc checking, factual grounding |
| Operational / Platform Risks | Excessive Agency | Over-Autonomy | The model is granted too much autonomy, enabling harmful or uncontrolled actions | The LLM sends emails, makes purchases, or executes code | Loss of control, security breach, compliance failure | Principle of least privilege, action gating, logging |
| Operational / Platform Risks | Supply-Chain / Model Theft | Dependency Risk | Third-party dependencies or models are malicious or compromised | Backdoored models or plugins; stolen embeddings | IP loss, backdoor attacks, data exfiltration | Audit dependencies, secure hosting, integrity verification |
| Operational / Platform Risks | Lock-In | Vendor Dependency | Vendors restrict access to key data or models, or skew outputs for proprietary or regulatory reasons | APIs that hide sensitive regulatory topics; fine-tuned models biased toward commercial interests | Lack of transparency, stifled competition, reduced user trust | Vendor-neutral standards, model disclosures, auditability, open-source alternatives |
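The input filtering named as a prompt-injection mitigation can be sketched as a simple heuristic scan. This is a minimal illustration, not a complete defense: the patterns and function names below are hypothetical, and real systems layer such checks with strict system prompts and access restrictions.

```python
import re

# Hypothetical injection-phrase patterns; illustrative only, not exhaustive.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"disregard (the )?system prompt",
]

def looks_like_injection(user_input: str) -> bool:
    """Return True if the input matches a known injection phrase."""
    lowered = user_input.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)

def sanitize(user_input: str) -> str:
    """Reject suspicious input before it reaches the model."""
    if looks_like_injection(user_input):
        raise ValueError("possible prompt injection detected")
    return user_input
```

Pattern matching alone is easy to evade (paraphrases, encodings), which is why the table pairs it with output filtering and access restrictions.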
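The output filtering listed against information leakage can likewise be sketched as PII redaction on model responses. The patterns below (SSN, email) are illustrative assumptions; production systems would use dedicated PII-detection tooling rather than two regexes.

```python
import re

# Illustrative PII patterns; a real deployment would cover far more categories.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    """Replace detected PII in a model response with labeled placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text
```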
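The token limits and rate limiting cited against denial-of-service abuse can be combined into per-client token budgeting. This is a sketch under assumed parameters (window size, budget); the class and method names are hypothetical.

```python
import time

class TokenBudget:
    """Track token usage per client over a sliding window and reject overage."""

    def __init__(self, max_tokens_per_window: int, window_seconds: float = 60.0):
        self.max_tokens = max_tokens_per_window
        self.window = window_seconds
        self.usage: dict[str, list[tuple[float, int]]] = {}

    def allow(self, client_id: str, requested_tokens: int, now: float | None = None) -> bool:
        """Return True if the request fits the client's remaining budget."""
        now = time.monotonic() if now is None else now
        # Keep only usage records still inside the sliding window.
        history = [(t, n) for t, n in self.usage.get(client_id, []) if now - t < self.window]
        used = sum(n for _, n in history)
        if used + requested_tokens > self.max_tokens:
            self.usage[client_id] = history
            return False
        history.append((now, requested_tokens))
        self.usage[client_id] = history
        return True
```

Capping tokens per window bounds both cost and compute, which blunts recursive prompts and token flooding even when individual requests look benign.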
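The least-privilege and action-gating mitigations for excessive agency reduce, in code, to a deny-by-default allowlist. The action names below are hypothetical examples matching the table's scenarios (email, purchases, code execution).

```python
# Actions the agent may take freely (least privilege: keep this set small).
ALLOWED_ACTIONS = {"search_docs", "summarize"}
# Higher-impact actions that require explicit human approval.
NEEDS_APPROVAL = {"send_email", "make_purchase"}

def gate_action(action: str, approved: bool = False) -> bool:
    """Return True only if the action is allowed under the gating policy."""
    if action in ALLOWED_ACTIONS:
        return True
    if action in NEEDS_APPROVAL:
        return approved
    return False  # deny-by-default for anything unrecognized, e.g. code execution
```

The key design choice is the default branch: an action the policy has never seen is refused, not permitted, so new model capabilities cannot silently expand the agent's authority.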
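Sanitizing retrieved content, the first mitigation for indirect prompt injection, can be approximated by stripping the markup where hidden instructions are often planted (HTML comments, invisible elements) before the text reaches the model. This regex-based version is a rough sketch; a real pipeline would use a proper HTML parser.

```python
import re

def sanitize_retrieved(html: str) -> str:
    """Strip HTML comments and tags from retrieved content, keeping visible text."""
    text = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)  # drop comments
    text = re.sub(r"<[^>]+>", "", text)                      # drop remaining tags
    return " ".join(text.split())                            # normalize whitespace
```

Stripping markup does not catch instructions hidden in visible prose, which is why the table also recommends source isolation and human-in-the-loop review.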