Skip to main content

Risks

CategoryRiskTypeDefinition / MechanismExamplesImpactMitigation Strategies
Security VulnerabilitiesPrompt InjectionAttack VectorAdversarial user input modifies system behavior by injecting malicious instructions“Ignore previous instructions…”; hidden HTML or file-based commandsData leak, integrity breach, unauthorized actionsInput sanitization, strict system prompts, input/output filtering, access restrictions
JailbreakingAttack VectorCrafting prompts to bypass safety filters or policy constraints“Pretend this is fiction…”; chaining prompts to extract forbidden infoHarmful content generation, policy violationsRed-teaming, RLHF updates, adversarial testing, guardrails
Backdoor / PoisoningTraining/Embedding ExploitTraining data or embeddings embed hidden malicious triggersSpecific inputs cause LLM to produce harmful outputsPersistent vulnerabilities, stealth behaviorsData vetting, anomaly detection, training pipeline audits
Indirect Prompt InjectionData InjectionMalicious prompts embedded in external content accessed via RAG or scrapingHTML/PDF with hidden instructionsUncontrolled behavior, data leaksSanitize retrieved content, isolate sources, human-in-loop review
Insecure Plugins/ToolingPlugin VulnerabilityPlugins extend LLM capability but introduce security holesPlugin runs attacker-controlled scripts or leaks dataArbitrary code execution, unauthorized accessSandbox plugins, vet third-party tools, restrict permissions
Data & Privacy ConcernsInformation LeakagePrivacy ViolationLLM reveals private or sensitive data, intentionally or inadvertentlyOutputs include names, passwords, SSNs, proprietary infoPrivacy violations, regulatory issuesScrub data, apply output filtering, auditing, privacy mechanisms
System Prompt LeakagePrompt ExposureSystem/policy prompts revealed to users or attackersLeaks via responses or bugs exposing internal structureAids attack design and prompt reverse-engineeringHide system prompts, privilege separation, output filters
Prompt Leaking / StealingConfidentiality ThreatAttackers reconstruct hidden prompts or extract template behaviorQuery models to reverse-engineer internal prompt structureLoss of IP, strategic prompt exposureLimit prompt exposure, guardrails, query pattern detection
Reliability & Performance IssuesDenial of Service / Resource AbuseAvailability AttackPrompts that induce excessive computation, causing outages or costsRecursive prompts, prompt bombing, token floodingHigh costs, degraded performanceToken limits, rate limiting, input validation
Vector/Embedding ExploitsEmbedding ManipulationEmbedding input is manipulated to skew search or retrieval accuracyEmbedding space poisoned to prioritize malicious resultsIntegrity compromise, hijacked retrievalSanitize input, monitor vector drift, restrict uploads
Ethical & Societal ConcernsMisinformation & BiasContent RiskPrompt context leads to inaccurate, biased, or fabricated (hallucinated) outputsIncorrect medical/legal advice, hallucinated sources, stereotypesPublic harm, misinformation spread, reputation riskBias audits, prompt shaping, citation enforcement, post-checking, factual grounding
Operational / Platform RisksExcessive AgencyOver-AutonomyModel is allowed too much autonomy, causing harmful or uncontrolled actionsLLM sends emails, makes purchases, executes codeLoss of control, security breach, compliance failurePrinciple of least privilege, action gating, logging
Supply-Chain / Model TheftDependency RiskThird-party dependencies or models are malicious or compromisedBackdoored models or plugins; stolen embeddingsIP loss, backdoor attacks, data exfiltrationAudit dependencies, secure hosting, integrity verification
Lock-InVendor DependencyVendors restrict access to key data or models, or skew it due to proprietary or regulatory reasonsAPIs hide sensitive regulatory topics; fine-tuned models biased to protect commercial interestsLack of transparency, stifled competition, reduced user trustVendor-neutral standards, model disclosures, auditability, open-source alternatives