Static analysis for the impatient
You do not have to run the code to know something is wrong. grep-driven research is underrated. Pattern recognition in source is a skill, and it scales.
The most powerful static analysis tool I use regularly is grep. This is not a confession of backwardness. It is an observation about leverage: a well-constructed search across a large codebase, combined with a working mental model of what you are looking for, produces results that more sophisticated tools sometimes do not. The sophisticated tools are also useful. They are not always faster.
Static analysis — reasoning about code without executing it — gets characterised as a specialist discipline requiring specialist tools. Some of it is. But a significant fraction of security-relevant patterns in source code are recognisable to any researcher who has read enough code to know what those patterns look like. The question is not whether you need exotic tooling. It is whether you have trained your eye well enough to use the tools you already have.
What grep finds
The patterns worth searching for fall into a few categories. First: functions with bad reputations. strcpy, sprintf, gets, the family of functions that operate on C strings without explicit length bounds. A grep across a codebase for these functions, filtered to exclude comment lines and test code, produces a list worth reviewing. Not every call to strcpy is a vulnerability — context matters — but every call is worth understanding.
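As a minimal sketch, such a search might look like the following. The demo/ tree, file names, and exclusion list are illustrative assumptions, not taken from any real project:

```shell
# Stand-in for a real source tree (illustrative).
mkdir -p demo/src demo/tests
cat > demo/src/copy.c <<'EOF'
#include <string.h>
void f(char *dst, const char *src) {
    strcpy(dst, src);   /* no length bound */
}
EOF

# Word-boundary match on the classic unbounded string functions,
# restricted to C sources, skipping test code.
grep -rnE '\b(strcpy|sprintf|gets)\s*\(' demo/src \
    --include='*.c' --include='*.h' --exclude-dir=tests
```

Filtering out comment lines can be approximated by piping through a second `grep -v`, but the resulting list still needs human review: the search produces candidates, not conclusions.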
Second: privilege-changing operations. setuid, setgid, seteuid and their variants; capability-setting calls; privilege transitions whose return values go unchecked at the call site. These transitions are the places where mistakes matter most. Searching for them identifies the boundaries of the trust model.
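A hedged sketch of that search; the variant list in the pattern and the demo file are assumptions for illustration:

```shell
# Stand-in source file with an unchecked privilege drop (illustrative).
mkdir -p privdemo
cat > privdemo/drop.c <<'EOF'
#include <unistd.h>
void drop_privs(void) {
    setuid(1000);   /* return value ignored: the drop can silently fail */
}
EOF

# Find the privilege transitions; -n gives line numbers for review.
grep -rnE '\bset(uid|gid|euid|egid|reuid|regid|groups)\s*\(' privdemo
```

Each hit marks a trust boundary; whether the return value is checked at that site is exactly the kind of question the grep output queues up for manual reading.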
Third: input validation patterns — or the absence of them. Length checks before copy operations. Return value checks after allocation. Bounds checks before array indexing. The absence of these patterns at interface boundaries is not conclusive evidence of a vulnerability, but it is a reliable indicator of where to look more carefully.
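grep cannot prove a check is absent, but printing a line of trailing context makes unchecked allocations quick to eyeball. A sketch, with an illustrative file:

```shell
# Stand-in file where the allocation result is used unchecked (illustrative).
mkdir -p allocdemo
cat > allocdemo/alloc.c <<'EOF'
#include <stdlib.h>
#include <string.h>
char *dup2k(const char *s) {
    char *p = malloc(2048);
    memcpy(p, s, 2048);   /* p is never compared against NULL */
    return p;
}
EOF

# -A1 shows the line after each allocation, so a missing
# NULL check is visible directly in the grep output.
grep -rn -A1 'malloc(' allocdemo
```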
Building a reading vocabulary
Reading OpenBSD source for extended periods — the kernel, libc, the base utilities — built a vocabulary for what careful, security-conscious code looks like. The pledge and unveil implementations. The memory safety conventions in malloc.c. The way error paths are handled: consistently, completely, without shortcuts. This vocabulary makes anomalies visible. When a piece of code does not look like the code around it, that is worth examining. The difference is not always a vulnerability. But it is always a question.
The same principle applies to any codebase you spend enough time reading. The conventions become automatic. The deviations become visible. This is why reading a codebase for the first time is slower than reading it after a month of engagement: you are building the vocabulary that makes grep results interpretable.
When to reach for something more
Semgrep, CodeQL, and their equivalents have genuine advantages: they understand syntax, not just text, which means they can find patterns that span multiple lines and account for language structure. For data flow analysis — tracking a value from an untrusted source through a series of transformations to a sensitive sink — they are considerably more capable than grep.
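As one example of the step up, Semgrep's ephemeral-pattern mode matches language structure rather than text. The sketch below is an assumption-laden illustration (the target tree is invented, and real taint tracking would need a rule file in Semgrep's YAML rule format, not shown here):

```shell
# Structural match: semgrep's pattern 'strcpy(...)' finds any call
# to strcpy regardless of argument spelling, whitespace, or line
# breaks, where a literal text search for the full call would miss
# formatting variants. (Requires semgrep to be installed.)
mkdir -p sgdemo
cat > sgdemo/wrap.c <<'EOF'
#include <string.h>
void wrap(char *d, const char *s) {
    strcpy(d,
           s);   /* call split across lines */
}
EOF

semgrep -e 'strcpy(...)' -l c sgdemo
```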
The practical workflow is to use grep for quick orientation and hypothesis generation, then reach for structural tools when you have a specific hypothesis to test. The grep tells you where to look. The structural analysis tells you whether what you are looking at is connected to what you think it is connected to. They are complements, not alternatives.
A codebase is a message from its authors about what they were thinking when they wrote it. Static analysis is reading that message carefully enough to notice what was not said. The tools help. The reading is the work.