Code Injection

Code injection is the umbrella name for a family of attacks that share one root cause: an application takes input it should treat as inert data and instead lets an interpreter treat it as code or commands. The CWE database captures the general pattern as CWE-74, “Improper Neutralization of Special Elements in Output Used by a Downstream Component (‘Injection’),” which describes constructing a command or data structure from external input without neutralizing the special elements that would otherwise change how a downstream component parses it (cwe.mitre.org/data/definitions/74.html). SQL injection (CWE-89), OS command injection (CWE-78), and cross-site scripting (CWE-79) all sit beneath it in the CWE hierarchy as specific manifestations of the same flaw.

The narrower CWE-94, “Improper Control of Generation of Code (‘Code Injection’),” covers the most direct version: a program builds a code segment, such as a string later handed to an interpreter’s eval facility, out of input it does not fully control (cwe.mitre.org/data/definitions/94.html). When the attacker’s input lands inside that generated code, the interpreter runs it, giving the attacker the program’s own privileges.
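The CWE-94 pattern can be illustrated with a minimal Python sketch. The function names and the payload string here are hypothetical, chosen only for illustration; the payload is a harmless stand-in for whatever an attacker would actually send.

```python
import ast


def evaluate_unsafe(expr: str):
    """Vulnerable: the caller's string is executed as Python code."""
    return eval(expr)


def evaluate_literal(expr: str):
    """Accepts only Python literals (numbers, strings, lists, dicts, ...)."""
    return ast.literal_eval(expr)


benign = "[1, 2, 3]"
# Harmless stand-in for an attacker payload: it calls into the os module.
hostile = "__import__('os').getcwd()"

# Both functions handle the benign literal the same way.
print(evaluate_literal(benign))

# eval runs the injected call with the program's own privileges.
print(type(evaluate_unsafe(hostile)).__name__)

# literal_eval refuses anything that is not a plain literal.
try:
    evaluate_literal(hostile)
except ValueError as exc:
    print("rejected:", exc)
```

The literal-only parser is one possible substitute when the input is expected to be plain data. It does not make arbitrary input safe in every respect, so it is better read as an illustration of refusing to execute input than as a complete defense.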

What unites every variant is a confusion between the control plane and the data plane. In a safe design, the structure of a query, command, page, or program is fixed in advance, and untrusted values are slotted into clearly bounded data positions. Injection occurs when a value can break out of its data position and add to or alter the surrounding syntax. In SQL injection the value escapes into SQL syntax, in command injection into shell syntax, and in cross-site scripting into the HTML and JavaScript that a browser executes.
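A small sketch using Python's standard sqlite3 module shows a value breaking out of its data position. The table, the function names, and the payload are assumptions made up for this example.

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def find_concatenated(name: str):
    # Vulnerable: the value is pasted into the SQL text, so a quote
    # character in it can end the string literal and add new syntax.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_parameterized(name: str):
    # The query structure is fixed; the value travels separately and
    # can only ever be compared as data.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "nobody' OR '1'='1"

# The payload rewrites the WHERE clause, so every row matches.
print(len(find_concatenated(payload)))

# Bound as a parameter, the same payload is just an unknown name.
print(find_parameterized(payload))
```

In the concatenated version the single quote in the payload closes the string literal, and the rest of the payload becomes part of the query's logic. In the parameterized version the same characters never reach the SQL parser as syntax.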

Because the cause is shared, so are the defenses. First, keep code and data separate by construction wherever possible, for example by using parameterized queries instead of string concatenation. Second, where input must be embedded, neutralize it through validation against an allowlist or through escaping appropriate to the output context. Third, avoid passing user input to eval-style facilities or a raw shell at all. Injection has appeared in successive editions of the OWASP Top Ten precisely because the same mistake resurfaces in every new context where data and code are allowed to mix.
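The remaining defenses can be sketched with Python's standard library. The username pattern, the function names, and the child command are hypothetical choices for illustration, not a prescribed policy. The sketch assumes the value is rendered in HTML element content and that a child process is genuinely needed.

```python
import html
import re
import subprocess
import sys

# Hypothetical allowlist: letters, digits, and underscore, 1 to 32 characters.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,32}")


def validate_username(value: str) -> str:
    # Allowlist validation: accept only what is known to be good.
    if not USERNAME_PATTERN.fullmatch(value):
        raise ValueError("username contains characters outside the allowlist")
    return value


def render_greeting(name: str) -> str:
    # Context-appropriate escaping for HTML element content.
    return "<p>Hello, " + html.escape(name) + "</p>"


def echo_without_shell(value: str) -> str:
    # Arguments are passed as a list, so no shell parses the value;
    # it reaches the child process as a single argument.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


print(validate_username("alice_01"))
print(render_greeting("<script>alert(1)</script>"))
print(echo_without_shell("hello; echo injected"))
```

Escaping depends on context: html.escape suits HTML text and quoted attribute values, but a value placed inside a script block or a URL would need different handling.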