Hybrid Analysis Based on Binary Rewriting for Control Flow Graph Construction

Hybrid Analysis Based on Binary Rewriting for Control Flow Graph Construction

Control Flow Graphs (CFGs) serve as fundamental structures for binary program analysis, playing a crucial role in various applications such as vulnerability detection, malware analysis, and code similarity assessment. Traditional approaches for constructing CFGs can be broadly categorized into static and dynamic analysis methods, each with inherent limitations. Static analysis techniques, while efficient and capable of achieving high code coverage, struggle with resolving indirect jumps—a common feature in modern software due to dynamic linking, function pointers, and polymorphism. On the other hand, dynamic analysis methods can accurately capture indirect jumps by monitoring program execution but suffer from low code coverage and significant performance overhead.

To address these challenges, this paper introduces a hybrid analysis approach that combines the strengths of static and dynamic analysis while mitigating their weaknesses. The proposed method leverages binary rewriting to instrument target programs, enabling efficient collection of indirect jump targets during execution. By integrating static analysis results with dynamically acquired control flow information, the hybrid approach constructs more complete and accurate CFGs.

Background and Motivation

A CFG is a directed graph representation of a program’s execution paths, where nodes represent basic blocks—linear sequences of instructions with single entry and exit points—and edges denote control flow transitions between these blocks. Static analysis constructs CFGs by disassembling binary code and analyzing instruction sequences without executing the program. While this method is fast and covers most code paths, it fails to resolve indirect jumps, where the target address is computed at runtime (e.g., via register-based calls or dynamic library loading).

Dynamic analysis, in contrast, executes the program with specific inputs and monitors runtime behavior to capture control flow transitions, including indirect jumps. However, its effectiveness depends heavily on input quality; without diverse test cases, many execution paths remain unexplored. Additionally, dynamic instrumentation techniques, such as dynamic binary instrumentation (DBI), introduce substantial runtime overhead by modifying and monitoring instructions during execution.

The hybrid approach proposed in this paper overcomes these limitations by first performing static analysis to generate an initial CFG and then using fuzz testing to guide dynamic analysis. The key innovation lies in employing binary rewriting—a technique that modifies the target binary before execution—to efficiently log indirect jump targets without the runtime penalties associated with DBI.

Methodology

The hybrid analysis framework, named HBRCFG (Hybrid Binary Rewriting-based Control Flow Graph), consists of three main phases: static analysis, dynamic analysis, and CFG fusion.

Static Analysis Phase

The static analysis phase begins with recursive disassembly, which follows control flow transitions to distinguish code from data more accurately than linear disassembly. Basic blocks are identified based on entry and exit points, where an entry point can be the first instruction of a function, the target of a jump, or the instruction following a branch. Exit points include jumps, returns, and calls. Indirect jumps, such as calls via registers (e.g., call rcx), are marked as unresolved in the initial CFG.

To prevent infinite loops during static analysis—particularly in recursive functions—an algorithm ensures that branches targeting the current function are not further analyzed. This optimization maintains efficiency while preserving correctness.

Dynamic Analysis Phase

The dynamic phase enhances the static CFG by resolving indirect jumps. First, a fuzzing engine generates diverse test inputs to maximize code coverage. Unlike symbolic execution, which relies on constraint solving and often faces path explosion, fuzz testing uses genetic algorithms to mutate inputs, prioritizing those that uncover new execution paths.

The target binary is then rewritten to include instrumentation code that logs indirect jump addresses during execution. Unlike DBI, which modifies instructions at runtime, binary rewriting embeds monitoring logic directly into the binary, reducing overhead. When executed with fuzz-generated inputs, the instrumented program records indirect jump targets, which are later integrated into the static CFG.

CFG Fusion

The final phase merges static and dynamic analysis results. For each unresolved indirect jump in the static CFG, the fusion algorithm identifies corresponding dynamic records and updates the graph with new edges. If multiple targets are observed for the same indirect jump (e.g., due to conditional branches), all are incorporated, ensuring completeness.

Experimental Evaluation

The effectiveness of HBRCFG was validated using both synthetic programs and real-world binaries from the Cyber Grand Challenge (CGC). Key findings include:

  1. Effectiveness in Resolving Indirect Jumps: In a manually constructed program with function pointer calls, HBRCFG successfully identified all indirect jump targets, whereas static analysis alone missed these transitions.
  2. Improved Coverage: Compared to static tools like IDA Pro, HBRCFG discovered additional functions and edges, particularly in programs with dynamic library calls or complex control flow.
  3. Performance Advantages: Against dynamic instrumentation-based approaches (e.g., CFGConstructor), HBRCFG achieved significant speedups—up to 38× faster—while recovering more indirect jumps and their subsequent call chains.

For large binaries like CableGrind (11.6 MB), HBRCFG uncovered 19.5% more indirect jumps and 128% more edges, demonstrating scalability. The fuzzing phase efficiently generated high-quality inputs, with execution times ranging from milliseconds for small programs to under an hour for large ones.

Discussion and Future Work

The hybrid approach strikes a balance between static and dynamic analysis, offering high coverage with minimal overhead. Binary rewriting proves particularly effective for indirect jump resolution, avoiding the pitfalls of DBI and symbolic execution. However, input generation remains a bottleneck; future work could optimize fuzzing by prioritizing bytes that influence control flow, further reducing analysis time.

Another direction involves extending HBRCFG to handle obfuscated binaries, where static disassembly is challenging. Combining the hybrid approach with advanced deobfuscation techniques could enhance robustness against adversarial code.

Conclusion

This paper presents a practical and efficient method for constructing complete CFGs from binaries. By synergizing static analysis with binary rewriting-guided dynamic execution, HBRCFG resolves indirect jumps and recovers hidden control flow paths that traditional tools miss. Experimental results confirm its superiority in both accuracy and performance, making it a valuable tool for security analysis, reverse engineering, and vulnerability research.

doi.org/10.19734/j.issn.1001-3695.2024.06.0281

Was this helpful?

0 / 0