Code Injection

Code injection is the umbrella name for a family of attacks that all share one root cause: an application takes input it should treat as inert data and instead lets some interpreter treat it as code or commands. The CWE database captures the general pattern as CWE-74, “Improper Neutralization of Special Elements in Output Used by a Downstream Component (‘Injection’),” which describes constructing a command or data structure from external input without neutralizing the special elements that would otherwise change how a downstream component parses or interprets it (cwe.mitre.org/data/definitions/74.html). SQL injection, OS command injection, and cross-site scripting are all listed there as specific manifestations of the same flaw.

The narrower CWE-94, “Improper Control of Generation of Code (‘Code Injection’),” covers the most direct version: a program builds a code segment, such as a string later handed to an interpreter’s eval facility, out of input it does not fully control (cwe.mitre.org/data/definitions/94.html). When the attacker’s input lands inside that generated code, the interpreter runs it, giving the attacker the program’s own privileges.
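A minimal Python sketch of the CWE-94 pattern follows. The function names parse_unsafe and parse_safe and the payload string are hypothetical, chosen only for illustration; the standard-library ast.literal_eval is one possible replacement when the input is expected to be a plain literal, not a general fix for every use of eval.

```python
import ast


def parse_unsafe(text: str):
    # Vulnerable: the input string becomes code that the interpreter runs
    # with the program's own privileges.
    return eval(text)


def parse_safe(text: str):
    # ast.literal_eval accepts only Python literals (numbers, strings,
    # lists, dicts, and so on) and raises ValueError for anything else,
    # such as a function call.
    return ast.literal_eval(text)


# Hypothetical attacker input: valid Python, but a call rather than data.
# It is harmless here (it only reads the working directory), yet it shows
# that eval will run whatever expression it is given.
payload = "__import__('os').getcwd()"
```

With these definitions, parse_unsafe(payload) executes the call and returns the working directory, while parse_safe(payload) raises ValueError and parse_safe("[1, 2, 3]") returns the list unchanged.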

What unites every variant is a confusion between the control plane and the data plane. In a safe design, the structure of a query, command, page, or program is fixed in advance, and untrusted values are slotted into clearly bounded data positions. Injection occurs when a value can break out of its data position and add or alter the surrounding syntax. SQL injection breaks out into SQL, command injection into a shell, and cross-site scripting into the HTML and JavaScript a browser executes.
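The breakout from a data position can be sketched with Python's standard sqlite3 module. The table, the rows, and the function names find_unsafe and find_safe are assumptions made up for this example; only the sqlite3 calls themselves are real API.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def find_unsafe(name: str):
    # Vulnerable: the value is spliced into the SQL text, so a quote
    # character lets it leave the string literal and alter the syntax.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_safe(name: str):
    # The query structure is fixed; the value is bound to the ? placeholder
    # and can only ever be compared as data.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)
    ).fetchall()


# Closes the string literal early and appends an always-true condition.
payload = "nobody' OR '1'='1"
```

Here find_unsafe(payload) returns the secrets of both users, because the payload rewrites the WHERE clause, while find_safe(payload) returns an empty list, because no user has that literal name.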

Because the cause is shared, so are the defenses. Keep code and data separate by construction wherever possible, for example with parameterized queries instead of string concatenation. Where input must be embedded, neutralize it by validation against an allowlist or by context-appropriate escaping. Avoid handing user input to interpreters such as eval or a raw shell at all. Injection keeps reappearing in successive editions of the OWASP Top Ten precisely because the same mistake resurfaces in every new context where data and code are allowed to mix.
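The remaining defenses can be sketched in Python using only the standard library. The helper names, the username pattern, and the child command are hypothetical choices for illustration; real allowlists and escaping rules depend on the context the value is embedded in.

```python
import html
import re
import subprocess
import sys

# Allowlist: accept only what is known to be safe (an assumed username rule).
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")


def validate_username(value: str) -> str:
    if not USERNAME_RE.fullmatch(value):
        raise ValueError("username contains characters outside the allowlist")
    return value


def render_greeting(name: str) -> str:
    # Context-appropriate escaping for HTML text: markup characters in the
    # value are turned into entities before it is embedded in the page.
    return "<p>Hello, " + html.escape(name) + "</p>"


def echo_argument(value: str) -> str:
    # No shell: the command is an argument list, so the value arrives in the
    # child process as one argv entry and metacharacters such as ';' are
    # never interpreted. The child simply prints the argument it received.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")
```

For instance, render_greeting("<script>alert(1)</script>") yields "<p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;</p>", and echo_argument("report.txt; echo INJECTED") returns the string unchanged instead of running a second command.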
