Pentesting Research References
This page collects representative papers for each design theme reflected in pentesting.
The mapping is reconstructed by inference from topic overlap; it is not a verbatim personal reading log.
Mapping
- Offensive security agent papers inform the autonomous pentest workflow.
- Planner-executor and heterogeneous collaboration papers inform task decomposition and coordination.
- Multi-agent orchestration papers inform role separation, delegation, and control topology.
- Benchmark and evaluation papers inform capability framing and validation strategy.
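To make the planner-executor and role-separation themes above concrete, here is a minimal sketch in Python. It is not taken from pentesting or from any of the listed papers: the `Task` type, the `plan` and `run` functions, and the role names `recon`, `exploit`, and `report` are all hypothetical, and the executors are stubs that only echo their instruction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    role: str          # which executor role should handle this step (hypothetical names)
    instruction: str   # what the executor is asked to do


def plan(goal: str) -> List[Task]:
    """Hypothetical planner: decompose a goal into role-tagged tasks."""
    return [
        Task("recon", f"enumerate services for {goal}"),
        Task("exploit", f"test candidate findings for {goal}"),
        Task("report", f"summarize results for {goal}"),
    ]


def run(goal: str, executors: Dict[str, Callable[[str], str]]) -> List[str]:
    """Delegate each planned task to the executor registered for its role."""
    results = []
    for task in plan(goal):
        handler = executors.get(task.role)
        if handler is None:
            # Control stays with the orchestrator: unknown roles are skipped, not guessed.
            results.append(f"[{task.role}] skipped: no executor registered")
            continue
        results.append(f"[{task.role}] {handler(task.instruction)}")
    return results


# Stub executors for illustration only; a real system would wrap an LLM or a tool here.
executors = {
    "recon": lambda instruction: f"done: {instruction}",
    "report": lambda instruction: f"done: {instruction}",
}
```

Calling `run("lab-host", executors)` returns one result string per planned task. Keeping the planner unaware of how executors work is one way to express the role separation and delegation that the multi-agent papers discuss; other control topologies are equally plausible.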
Offensive Security Agents
- PentestGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing (USENIX Security 2024)
- D-CIPHER: Dynamic Collaborative Intelligent Agents with Planning and Heterogeneous Execution for Enhanced Reasoning in Offensive Security (arXiv 2025)
- Towards Automated Software Security Testing: Augmenting Penetration Testing through LLMs (ESEC/FSE 2023)
- LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks (arXiv 2023)
- Can LLMs Hack Enterprise Networks? Autonomous Assumed Breach Penetration-Testing Active Directory Networks (arXiv 2025)
- LLM Agents can Autonomously Hack Websites (arXiv 2024)
- LLM Agents can Autonomously Exploit One-day Vulnerabilities (arXiv 2024)
- Teams of LLM Agents can Exploit Zero-Day Vulnerabilities (arXiv 2024)
- AutoPentester: An LLM Agent-based Framework for Automated Pentesting (arXiv 2025)
Benchmarks and Cyber Evaluation
- AutoPenBench: A Vulnerability Testing Benchmark for Generative Agents (EMNLP Industry 2025)
- Training Language Model Agents to Find Vulnerabilities with CTF-Dojo (arXiv 2025)
- Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models (arXiv 2024)
- CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale (arXiv 2025)
- CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models (arXiv 2024)
- When LLMs Meet Cybersecurity: A Systematic Literature Review (arXiv 2024)
- Large Language Models in Cybersecurity: State-of-the-Art (arXiv 2024)
Multi-Agent Collaboration and Orchestration
- A Survey on Large Language Model based Autonomous Agents (arXiv 2023)
- Large Language Model based Multi-Agents: A Survey of Progress and Challenges (arXiv 2024)
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation (arXiv 2023)
- MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (arXiv 2023)
- ChatDev: Communicative Agents for Software Development (ACL 2024)
- CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society (arXiv 2023)
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors (arXiv 2023)
- Scaling Large-Language-Model-based Multi-Agent Collaboration (arXiv 2024)
- Multi-Agent Collaboration via Evolving Orchestration (arXiv 2025)