Spectral Complexity Screening for Binary Security Analysis

1. Introduction

Security researchers face an adversarial triage problem: given a large corpus of compiled binaries, identify the small fraction containing exploitable vulnerabilities and, within those, the precise functions warranting costly symbolic execution or manual reverse engineering. The state of practice relies on human expertise developed over years of experience — expertise that does not scale. macOS 26 ships with over 3,000 Mach-O binaries in framework and private framework directories, each containing several hundred functions on average.

TriageForge is built around the observation that call graph spectral properties discriminate between typical and atypical code structure in a statistically principled way. The approach draws on three bodies of theory: random matrix theory [1,2], SAT phase-transition analysis [6,10], and symbolic execution [13]. Together, these provide a fast, theory-grounded screening mechanism that reduces the candidate space before expensive analysis begins.

The paper is organised as follows. Section 2 surveys the relevant mathematical background. Section 3 describes the TriageForge pipeline. Section 4 provides theoretical analysis. Section 5 presents an empirical evaluation on 335 macOS PrivateFrameworks binaries. Sections 6 and 7 discuss limitations and conclude.

2. Background

2.1 Random Matrix Theory

Random matrix theory, initiated by Wigner in the study of nuclear energy levels [1,2], characterises the statistical properties of eigenvalue spectra of large random matrices. The central result is the Wigner semicircle law: for an n×n symmetric matrix with i.i.d. entries of mean zero and variance σ², the empirical spectral distribution converges as n→∞ to:

ρ(x) = (2 / πR²) · √(R² − x²), |x| ≤ R, R = 2σ√n

The Tracy–Widom distribution [3] characterises the limiting distribution of the largest eigenvalue: the normalised quantity n^2/3(λ_max / (σ√n) − 2) converges in distribution to the Tracy–Widom law of order 1 (GOE).

Graph energy, defined by Gutman [9] as E(G) = ∑|λ_i|, captures total spectral mass. Eigenvalue entropy H = −∑ p_i log p_i, where p_i = |λ_i| / ∑|λ_j|, measures dispersion of spectral mass and is sensitive to structural regularities absent from random models.

Figure 1. Empirical eigenvalue histogram for a 400×400 random symmetric matrix compared to the theoretical Wigner semicircle density. Convergence improves with n.

2.2 Configuration Model Null

For call graphs with non-uniform degree sequences, the appropriate null distribution is the configuration model [4]: the maximum-entropy random graph ensemble consistent with the observed degree sequence. TriageForge z-scores each spectral statistic against this null, using an analytical approximation for λ_max and graph energy calibrated by Monte Carlo degree-sequence-preserving rewiring [16].

2.3 SAT Phase Transitions

Random 3-SAT instances exhibit a sharp satisfiability phase transition [6]. As the clause-to-variable ratio α = m/n increases through a critical threshold α_c, satisfiability probability drops from near-1 to near-0. Extensive numerical studies place this at α_c ≈ 4.267 [6,10], with theoretical support from the cavity method [7]. Near the transition, backbone variables — those taking the same value in all satisfying assignments — constitute a significant fraction, correlating with computational hardness [10].

Figure 2. Illustrative 3-SAT phase transition. Probability of satisfiability vs. clause-to-variable ratio α, with critical threshold α_c ≈ 4.267 marked.

2.4 Cyclomatic Complexity

McCabe [5] defined cyclomatic complexity V(G) = E − N + 2P for a program control flow graph G = (N, E). V(G) > 10 is the classical threshold for high-complexity functions, widely used as a proxy for defect likelihood [17].

3. The TriageForge Pipeline

3.1 Architecture

TriageForge processes binaries through four sequential stages with aggressive early termination. C2 provides a binary-level triage signal in approximately 28 seconds; full C1–C6 analysis of a flagged binary averages 8–12 minutes.

Table 1. TriageForge pipeline stages.
Stage	Name	Input	Output
C1	SAT Backbone	Control flow graph	Function rank
C2	RMT Screen	Call graph	Binary flag (NORMAL / ANOMALOUS)
C3	Template Scan	C1 top-N functions	Dataflow hits
C6	Symbolic Taint	C3 hits	PoC candidate

3.2 C1: SAT Backbone Prioritisation

The function's control flow graph is encoded as 3-SAT-like reachability constraints. The clause-to-variable ratio α(f) is computed and a chi-squared proximity score χ²(f) = (α(f) − α_c)² / α_c is derived. Functions with small χ² — near the phase transition — are ranked highest, as proximity to α_c correlates empirically with tightly-coupled computational structure.

3.3 C2: RMT Spectral Screen

The call graph is extracted via lief [11] for Mach-O parsing and capstone [12] for control-flow reconstruction. For ARM64e binaries, authenticated pointer chains are resolved through the DYLD chained fixup map. Three spectral statistics are z-scored against the configuration-model null. A binary is flagged ANOMALOUS if |z| > 2.0 for any statistic.

Engineering note. Swift and Objective-C binaries produce degenerate z = 0 results because BLR (branch-to-register) indirect calls are unresolvable statically. This is an expected limitation of static call graph analysis.

3.4 C3: Template Dataflow Analysis

Five structural dataflow templates are evaluated on C1-ranked functions: MACH_OOB, XPC_TYPE, INT_OVF, PORT_UAF, and IOKIT_OOB. Each specifies a source (attacker-controlled input), a sink (memory operation), and barrier conditions (bounds checks, type validations). A V(G) > 10 cyclomatic complexity gate is enforced.

3.5 C6: Symbolic Taint Analysis

The angr framework [8] places taint marks at source operations and propagates them symbolically. Z3 synthesises a concrete input on finding a taint-to-sink path without intervening bounds checks. On Apple ARM64e, Pointer Authentication Codes (PAC) block function-pointer hijacking even where write primitives exist; C6 adjusts impact classification accordingly.

4. Theoretical Analysis

4.1 Why Spectral Statistics Discriminate

RMT universality results [2,3] establish that spectral statistics of many random matrix ensembles follow the same limiting distributions regardless of entry distribution. Two mechanisms drive discrimination:

Negative deviations. Cryptographic S-boxes and sorting routines implement near-regular call subgraphs (uniform out-degree in S-boxes; near-binary-tree structure in merge sort). For a call graph with heterogeneous degree distribution, such regularity produces large negative z-scores on energy and entropy — spectral mass is more concentrated than the null predicts.

Positive deviations. Complex XPC dispatch tables generate hub nodes with high in-degree, shifting λ_max upward and producing large positive z-scores. Both anomaly directions are diagnostically useful but require different interpretations.

4.2 Backbone Proximity as Complexity Proxy

The mapping from function control flow graphs to SAT-like constraints follows naturally from symbolic execution [13]: the control flow graph induces reachability constraints between basic blocks, and encoding these in conjunctive normal form yields a clause-to-variable ratio α characterising the function's logical density.

The C1 score is a heuristic prioritisation signal rather than a rigorous phase-transition argument: software functions are far from random 3-SAT instances. It performs well in practice because high-scoring functions consistently implement complex protocol parsing or message dispatch — precisely the functions of security interest.

4.3 Completeness of Symbolic Execution

Symbolic execution is theoretically sound for bounded programs but suffers from path explosion in practice. TriageForge mitigates this by applying C6 only to C1–C3 pre-screened functions and by using angr's loop bounding and state merging. The C6 stage is a candidate PoC generator, not a sound vulnerability prover.

5. Empirical Validation

5.1 Dataset

The evaluation corpus comprises 335 Mach-O binaries from /System/Library/PrivateFrameworks/ on macOS 26.4.1 (build 25E253, Apple Silicon ARM64e). Binary sizes ranged from 47 KB to 84 MB. All analyses used the ARM64e slice.

5.2 Screening Results

C2 flagged 12 of 335 binaries (3.6%) as ANOMALOUS. The remaining 323 (96.4%) were classified NORMAL and required no further analysis. All 12 candidates were fully resolved through C3 evaluation and manual reverse engineering. Zero exploitable buffer overflows were confirmed.

Figure 3. C2 z-score distribution (z(λ_max) vs. z(energy)) for 335 PrivateFrameworks binaries. Red diamonds: ANOMALOUS (n=12). Grey circles: NORMAL (n=323). Dashed circle: |z| = 2.0 threshold.

5.3 Selected Findings

Table 2. Selected anomalous binary verdicts.
Binary	Anomaly	Verdict
FindMyMacd	E, H	CLOSED — 37/37 memcpy callers have explicit bounds checks
mediasharingd	H (−8.44)	CLOSED — DMAP/DAAP TLV source is constant global
AppleCredentialManagerDaemon	E, H	CLOSED — entitlement gate; all callers checked
ecosystemd	λ_max (+2.56)	CLOSED — Swift runtime sorting artifact
slapadd / slapacl / slapauth / slapcat	E, λ_max	ARTIFACT — OpenLDAP S-box; do not file

5.4 False Positive Taxonomy

Category 1: Cryptographic lookup tables. OpenLDAP slap* S-box tables produce near-regular subgraphs with extreme negative z-scores on energy and entropy (z(energy) ∈ [−0.61, −1.55]). All four are third-party code bundled with macOS. Detection heuristic: identical binary family with uniform anomaly signature.

Category 2: Standard library sorting. Swift runtime tagged-pointer merge sort produces a near-regular subgraph indistinguishable spectrally from cryptographic regularity. Heuristic: presence of swift_retain/swift_release import symbols.

Category 3: No network attack surface. catutil and VoiceBankingDiagnostics have anomalous spectral properties from complex parsing routines but receive no data from unprivileged network sources. Automated attack surface reachability assessment remains an open problem.

5.5 Performance

On a 64 GB AMD Ryzen 9 Linux workstation, C2 processes a 3 MB ARM64e binary in approximately 28 seconds (FastC2 path: lief+capstone, no symbolic execution). Full C1–C6 analysis averages 8–12 minutes. The 335-binary sweep ran as a multi-day distributed campaign.

6. Discussion

6.1 Theoretical Limitations

The configuration model null conditions on degree sequence only. A more accurate null would also condition on clustering coefficient [19], capturing the modular structure inherent in software call graphs. This extension is computationally feasible and is a direction for future work.

The C1 SAT backbone mapping is heuristic. Software control flow graphs are far from random 3-SAT instances; proximity to α_c is a useful empirical signal but carries no formal guarantee of backbone fraction or computational hardness.

6.2 Security Research Context

TriageForge is a prioritisation tool, not a sound vulnerability prover. The empirical false negative rate is unknown without a corpus of ground truth vulnerabilities. All closed findings were verified by full manual analysis. The pipeline was not deployed on any binary subsequently found to contain an exploitable vulnerability, leaving detection coverage an open question.

6.3 Future Directions

(1) Binary provenance heuristics to suppress third-party artifact signatures. (2) Dynamic call graph enrichment via DTrace to resolve BLR indirect calls. (3) Null distributions conditioned on clustering coefficient. (4) Extension to /System/Library/Frameworks for cross-corpus comparison.

7. Conclusion

We have presented TriageForge, a binary vulnerability triage pipeline grounded in random matrix theory, SAT phase transition analysis, cyclomatic complexity theory, and symbolic execution. The C2 spectral screen provides a statistically principled binary-level signal in approximately 28 seconds per binary, reducing a 335-binary corpus to 12 candidates (3.6%) for deeper analysis.

All 12 candidates were fully resolved; zero exploitable vulnerabilities were confirmed. The false positive taxonomy — cryptographic lookup tables, standard library sorting, and no-attack-surface binaries — provides actionable guidance for pipeline refinement. Spectral complexity screening represents a promising, mathematically principled direction for large-scale binary security analysis.

Spectral Complexity Screening for Binary Security Analysis:
A Random Matrix Theory Approach

1. Introduction

2. Background

2.1 Random Matrix Theory

2.2 Configuration Model Null

2.3 SAT Phase Transitions

2.4 Cyclomatic Complexity

3. The TriageForge Pipeline

3.1 Architecture

3.2 C1: SAT Backbone Prioritisation

3.3 C2: RMT Spectral Screen

3.4 C3: Template Dataflow Analysis

3.5 C6: Symbolic Taint Analysis

4. Theoretical Analysis

4.1 Why Spectral Statistics Discriminate

4.2 Backbone Proximity as Complexity Proxy

4.3 Completeness of Symbolic Execution

5. Empirical Validation

5.1 Dataset

5.2 Screening Results

5.3 Selected Findings

5.4 False Positive Taxonomy

5.5 Performance

6. Discussion

6.1 Theoretical Limitations

6.2 Security Research Context

6.3 Future Directions

7. Conclusion

Acknowledgements

References

Spectral Complexity Screening for Binary Security Analysis:A Random Matrix Theory Approach

1. Introduction

2. Background

2.1 Random Matrix Theory

2.2 Configuration Model Null

2.3 SAT Phase Transitions

2.4 Cyclomatic Complexity

3. The TriageForge Pipeline

3.1 Architecture

3.2 C1: SAT Backbone Prioritisation

3.3 C2: RMT Spectral Screen

3.4 C3: Template Dataflow Analysis

3.5 C6: Symbolic Taint Analysis

4. Theoretical Analysis

4.1 Why Spectral Statistics Discriminate

4.2 Backbone Proximity as Complexity Proxy

4.3 Completeness of Symbolic Execution

5. Empirical Validation

5.1 Dataset

5.2 Screening Results

5.3 Selected Findings

5.4 False Positive Taxonomy

5.5 Performance

6. Discussion

6.1 Theoretical Limitations

6.2 Security Research Context

6.3 Future Directions

7. Conclusion

Acknowledgements

References

Spectral Complexity Screening for Binary Security Analysis:
A Random Matrix Theory Approach