Pentesting Research References

This page collects representative papers, grouped by the design themes they inform in pentesting.

The list is a reconstruction inferred from topic overlap, not a verbatim personal reading log.

Mapping

  • Offensive security agent papers inform the autonomous pentest workflow.
  • Planner-executor and heterogeneous collaboration papers inform task decomposition and coordination.
  • Multi-agent orchestration papers inform role separation, delegation, and control topology.
  • Benchmark and evaluation papers inform capability framing and validation strategy.

Offensive Security Agents

  1. PentestGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing
    USENIX Security 2024
  2. D-CIPHER: Dynamic Collaborative Intelligent Multi-Agent System with Planner and Heterogeneous Executors for Offensive Security
    arXiv 2025
  3. Towards Automated Software Security Testing: Augmenting Penetration Testing through LLMs
    ESEC/FSE 2023
  4. LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks
    arXiv 2023
  5. Can LLMs Hack Enterprise Networks? Autonomous Assumed Breach Penetration-Testing Active Directory Networks
    arXiv 2025
  6. LLM Agents can Autonomously Hack Websites
    arXiv 2024
  7. LLM Agents can Autonomously Exploit One-day Vulnerabilities
    arXiv 2024
  8. Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
    arXiv 2024
  9. AutoPentester: An LLM Agent-based Framework for Automated Pentesting
    arXiv 2025

Benchmarks and Cyber Evaluation

  1. AutoPenBench: A Vulnerability Testing Benchmark for Generative Agents
    EMNLP Industry 2025
  2. Training Language Model Agents to Find Vulnerabilities with CTF-Dojo
    arXiv 2025
  3. Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models
    arXiv 2024
  4. CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale
    arXiv 2025
  5. CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models
    arXiv 2024
  6. When LLMs Meet Cybersecurity: A Systematic Literature Review
    arXiv 2024
  7. Large Language Models in Cybersecurity: State-of-the-Art
    arXiv 2024

Multi-Agent Collaboration and Orchestration

  1. A Survey on Large Language Model based Autonomous Agents
    arXiv 2023
  2. Large Language Model based Multi-Agents: A Survey of Progress and Challenges
    arXiv 2024
  3. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
    arXiv 2023
  4. MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
    arXiv 2023
  5. ChatDev: Communicative Agents for Software Development
    ACL 2024
  6. CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
    arXiv 2023
  7. AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
    arXiv 2023
  8. Scaling Large-Language-Model-based Multi-Agent Collaboration
    arXiv 2024
  9. Multi-Agent Collaboration via Evolving Orchestration
    arXiv 2025