Malware analysts spend considerable time examining the conditions under which malicious programs will take certain actions. For example, consider a malware program that contains a check for the presence of a debugger, a common technique intended to hinder analysis. The analyst may want to know if there is a viable execution path that circumvents this check and, if there is a path, what inputs and environmental conditions are needed to traverse it. We call this type of reasoning path finding. The paths to find can be based on a number of criteria, such as user-specified start and end addresses or passing through specific program points. CERT’s reverse engineers and malware analysts have found path finding useful when analyzing malware.
In a previous post, we discussed Kaiju, the CERT/CC’s extension framework for Ghidra, the National Security Agency’s software reverse engineering (SRE) tool suite. Kaiju includes many tools to support malware analysis and reverse engineering. One of the more complex plugins included in Kaiju is a Satisfiability Modulo Theories (SMT)-based path analysis tool named GhiHorn. In this post I delve deeper into GhiHorn to discuss how it works and how it can be used to solve path analysis problems.
Path Finding Without Source Code
As noted above with the debugger detection example, we believe that many common challenges in malware analysis and reverse engineering can be framed in terms of finding a path to a specific point in a program. At the binary level we do not have the additional information that you typically have in source code. For example, instead of programmer-named variables we must reason about CPU registers or memory values. Furthermore, optimizations performed during compilation may result in unusual or convoluted code arrangements that make it hard to recognize common program features.
Recall the example from our previous blog post on binary path finding, which is shown in Figure 1. Finding a path from line 1 to the target at line 20 (marked as “Target line!”) is relatively simple by visual inspection: we know that the local integer variable x must be set to 42 because of the condition at line 19, which can only happen at line 12 when variable y is equal to 2, which in turn depends on the three integer input parameters, i, j, and k, being set to 6, 7, and 8, respectively.

Figure 1: Example program to demonstrate path finding
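The reasoning above can be made concrete with a small Python sketch. The function `example` is a hypothetical stand-in shaped after the description of Figure 1 (x becomes 42 only when y reaches 2, which requires i, j, and k to be 6, 7, and 8); it is not the Figure 1 code. The search is plain brute-force enumeration over a small input range, which only works for toy programs; GhiHorn instead poses the question to an SMT solver.

```python
from itertools import product


def example(i: int, j: int, k: int) -> bool:
    """Hypothetical stand-in for Figure 1; returns True if the target line is reached."""
    y = 0
    if i == 6:
        y += 1
    if j == 7 and y == 1:
        y += 1
    if k != 8:
        y = 0          # any other k destroys the progress made so far
    x = 0
    if y == 2:
        x = 42         # the only assignment that can satisfy the target condition
    return x == 42     # "Target line!" is reached only in this case


def find_inputs(lo: int = 0, hi: int = 16):
    """Enumerate every (i, j, k) in a small box and keep those that reach the target."""
    return [args for args in product(range(lo, hi), repeat=3) if example(*args)]


print(find_inputs())  # [(6, 7, 8)]
```

Enumeration answers both path-finding questions at once here (a path exists, and these inputs drive it), but its cost grows exponentially with input width, which is why a solver-based approach is needed for real binaries.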
However, when this program is compiled, a number of complexities emerge:
- Variable names and types are lost.
- Optimizations may result in convoluted code structures.
- Many assembly instructions are combined to form higher-level operations.
- Each CPU provides its own instruction set architecture (ISA) with specific calling conventions, registers, and operations.
From Pharos to Ghidra
We have been developing tools to assess path viability in our Pharos Binary Analysis Framework for some time. Pharos’ path-finding tools encode program control flow graphs as SMT assertions that are solved by the Z3 theorem prover. The original encoding scheme used in Pharos is based on mutually exclusive Z3 assertions that are generated using Pharos’ analysis. Representing a program using this scheme turned out to be cumbersome for several reasons:
- The notation was not designed for human legibility; for example, data and control flow structures and the transitions between them were expressed using standard Z3 assertions, which made encoding control flow unnatural and hard for analysts to understand.
- Common program phenomena, such as function calls, recursion, and complex data types, were difficult to model.
- State variables became derived symbolic values generated by Pharos that could be hard to map back to meaningful program structures.
Our other Pharos-based path analysis tool is ApiAnalyzer. ApiAnalyzer traverses program control flow graphs looking for sequences of Application Programming Interface (API) function calls that match prescribed behavioral signatures. The problem of identifying a path in a program that satisfies specific constraints, such as traversing a sequence of API function calls, lends itself well to path finding. Our new work therefore seeks to reframe this API analysis in terms of path finding.
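As a rough illustration of framing API analysis as path finding, the sketch below searches a control flow graph for a path on which a given sequence of API calls occurs in order. The graph, block names, and `find_api_sequence` are hypothetical, and this is not ApiAnalyzer’s algorithm: it tracks only control flow, not data flow (for example, whether the handle passed to CloseHandle is the one CreateFileA returned). The search is breadth-first over (block, number of signature calls matched so far) pairs, so loops in the graph cannot make it run forever.

```python
from collections import deque


def find_api_sequence(cfg, calls, entry, signature):
    """Hypothetical sketch: return blocks along which `signature` calls occur in order, or None."""

    def advance(block, matched):
        # Consume the next expected API call if this block makes it.
        if matched < len(signature) and calls.get(block) == signature[matched]:
            return matched + 1
        return matched

    start = (entry, advance(entry, 0))
    parent = {start: None}          # doubles as the visited set
    queue = deque([start])
    while queue:
        state = queue.popleft()
        block, matched = state
        if matched == len(signature):
            path = []
            while state is not None:   # walk parent links back to the entry
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for succ in cfg.get(block, []):
            nxt = (succ, advance(succ, matched))
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


cfg = {"entry": ["open", "skip"], "open": ["write"], "skip": ["exit"],
       "write": ["close"], "close": ["exit"]}
calls = {"open": "CreateFileA", "write": "WriteFile", "close": "CloseHandle"}
print(find_api_sequence(cfg, calls, "entry", ["CreateFileA", "WriteFile", "CloseHandle"]))
# ['entry', 'open', 'write', 'close']
```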
The Pharos approach to program analysis was originally designed to be simple, fast, and useful for malware analysts. Many of these goals were achieved at the expense of deeper analyses. It turns out that advanced path analysis requires a level of fidelity that Pharos had difficulty providing.
Pharos can answer certain questions quickly, for example: Is the value used at X the same as the value used at Y? Because Pharos focuses on speed, however, accuracy is no longer guaranteed once paths become sufficiently complex. As a result, reasoning about complex program paths in Pharos is difficult. As noted above, we were already exploring more rigorous path analysis using SMT solvers in Pharos when Ghidra was released. The Ghidra decompiler, with its rich programming API, opened new and exciting ways to approach path analysis problems.
GhiHorn
We have created a new Ghidra-based tool named GhiHorn (see Figure 2 below) to take advantage of our recent advances and to provide an extensible framework for binary path analysis problems. GhiHorn helps reverse engineers and malware analysts answer interesting questions such as
- Does a path exist to a specified point in a program (i.e., feasibility)?
- If there is a path, what values should be assigned to program variables to reach it?
- If there is no feasible path, why not?
- Does the path indicate an interesting or indicative behavior?

Figure 2: GhiHorn User Interface
We named this new Kaiju tool “GhiHorn” (GHI-dra HORN-ifier), in keeping with the tradition of similar source-code analysis tools that use Horn clauses, including SeaHorn (a C language hornifier) and JayHorn (a Java hornifier). GhiHorn was created in the spirit of these other verification tools, but it operates on Ghidra-generated data structures, specifically p-code. Ghidra’s decompiler and p-code language provide robust information about program semantics for different architectures. According to Ghidra’s p-code documentation:
A p-code operation is the analog of a machine instruction. All p-code operations have the same basic format internally. They all take one or more varnodes as input and optionally produce a single output varnode.
Varnodes in Ghidra are data elements, represented as triples (address space, offset, and size), on which p-code operates. P-code and varnodes are essential to Ghidra’s decompilation process. Ghidra generates both p-code and varnodes for every instruction in a program during initial program analysis and disassembly. The p-code and varnodes initially generated are raw in the sense that they are only meant to represent instruction semantics, with little or no high-level information gleaned from higher-order analysis.
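The following Python sketch models the two structures just described: a varnode as an (address space, offset, size) triple, and a p-code operation that takes one or more input varnodes and optionally produces one output. The class and field names are ours, for illustration only; Ghidra’s own Java classes (`Varnode` and `PcodeOp` in `ghidra.program.model.pcode`) expose a much richer interface. Space names such as "register", "unique", and "const" follow Ghidra’s conventions, but the offsets below are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Varnode:
    """Hypothetical model of a varnode: (address space, offset, size in bytes)."""
    space: str    # e.g. "register", "unique" (temporaries), "const" (offset holds the value)
    offset: int
    size: int


@dataclass(frozen=True)
class PcodeOp:
    """Hypothetical model of a p-code operation."""
    opcode: str                        # e.g. "INT_ADD"
    inputs: Tuple[Varnode, ...]        # one or more input varnodes
    output: Optional[Varnode] = None   # at most one output varnode

    def __post_init__(self):
        if not self.inputs:
            raise ValueError("a p-code operation takes at least one input varnode")


# Raw p-code for "add 1 to a register, result in a temporary" (arbitrary offsets).
op = PcodeOp("INT_ADD",
             (Varnode("register", 0x38, 8), Varnode("const", 1, 8)),
             Varnode("unique", 0x1000, 8))
print(op.opcode, len(op.inputs), op.output)
```

Freezing the dataclasses makes varnodes hashable, so they can serve as dictionary keys when later mapping them to decompiler-level variables.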
During decompilation, p-code and varnodes are refined and associated with abstract local variables and source-code-level data structures. We term this “high p-code” because it is bound to data structures in Ghidra that include decompilation information, such as HighVariables and HighFunctions. Fortunately, the structure of high p-code lends itself to SMT-based encoding.
With Ghidra providing the data and control flow structures necessary to represent a path, we need a way to encode the program structures for an SMT solver. Enter Horn clauses, which are a specific encoding for verification conditions that can be constructed automatically from program control flow structures. Researchers have continued to make advances in Horn clause solvers, and many SMT solvers, including Z3, now include Horn solvers, which makes Horn clause encoding a viable solution for path analysis problems. We delve into some of the details of how GhiHorn encodes p-code below.
GhiHorn Encoding
GhiHorn encodes Ghidra p-code as SMT-LIB Horn clauses suitable for the Z3 solver. Horn clauses are rule-like constraints expressed as implications. Horn clauses can represent transitions through a control flow graph with the logical form:
$$\text{Input Location} \land \text{Constraints} \Rightarrow \text{Output Location}$$
where an input location conjoined with constraints on state variables transitions to an output location. In program terms, the input location is the originating basic block, the constraints are conditions over program variables, and the output location is a succeeding basic block. An example of a conditional expression and the associated Horn rule generated for it are shown in Table 1 below. When control flow arrives at Line 1 and the variable x is 42, control flow may progress to Line 2. Because this is a decision point in the code, a second rule is needed to transition from Line 1 to Line 3. Taken together, these rules model the conditional structure shown in the source code in Table 1.

Table 1: Example of a conditional expression and the associated Horn rule generated for it
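The encoding in Table 1 can be sketched as text generation. The helpers `bv`, `horn_rule`, and `encode_conditional` are hypothetical, not GhiHorn’s API; they emit SMT-LIB Horn clauses of the form Input Location ∧ Constraints ⇒ Output Location, modeling blocks L1, L2, and L3 as uninterpreted predicates over a single 64-bit state variable x. A complete encoding would also need an entry fact and a query; the output is intended to be standard SMT-LIB syntax that Z3 should accept under the HORN logic.

```python
def bv(value: int, width: int = 64) -> str:
    """SMT-LIB bit-vector literal, e.g. (_ bv42 64)."""
    return f"(_ bv{value} {width})"


def horn_rule(src, dst, state_vars, constraint, width=64):
    """Hypothetical helper: forall state. src(state) AND constraint => dst(state)."""
    binders = " ".join(f"({v} (_ BitVec {width}))" for v in state_vars)
    args = " ".join(state_vars)
    return (f"(assert (forall ({binders}) "
            f"(=> (and ({src} {args}) {constraint}) ({dst} {args}))))")


def encode_conditional() -> str:
    """Block L1 tests x == 42; the true edge goes to L2, the false edge to L3."""
    decls = [f"(declare-fun {b} ((_ BitVec 64)) Bool)" for b in ("L1", "L2", "L3")]
    rules = [
        horn_rule("L1", "L2", ["x"], f"(= x {bv(42)})"),
        horn_rule("L1", "L3", ["x"], f"(not (= x {bv(42)}))"),
    ]
    return "\n".join(["(set-logic HORN)", *decls, *rules])


print(encode_conditional())
```

The two rules share a source predicate and have complementary constraints, which is exactly how a two-way branch becomes two Horn clauses.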
Ghidra provides a control flow graph from which the input and output parts of the rules can be derived. Given that p-code represents machine instructions, it can be used to represent state transitions within blocks. Translating p-code statements into Z3 expressions then turns out to be relatively straightforward. For example, consider the following complex source code statement (shown in Table 2), which was generated by Ghidra’s decompiler: ((param_1 >> 2) + 1U & 0xff) == 0x55. We use this example because it decomposes into several p-code operations, each of which can be mapped to a Z3 expression. Note that source code operations such as >> have analogs in both p-code (INT_SRIGHT) and Z3 SMT (bvashr). For the most part, variables are represented as 64-bit bitvectors.
Note that the varnodes present in p-code operations are replaced with variables in the final Z3 expression (e.g., param_1). This is possible because, during decompilation, Ghidra associates varnodes with higher-level decompiler elements, such as variables. Operating on meaningful data elements such as variables yields much more interesting results, which are discussed below.

Table 2: Decompilation, p-code, and Z3 expressions
In the table above, the decompilation was generated by Ghidra. The p-code operations include varnodes represented as triples: (address space, offset, size). The Z3 expressions are all equalities of the form output = operation(inputs).
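The translation step can be mimicked with a small Python translator. The op list below is a hypothetical decomposition of ((param_1 >> 2) + 1U & 0xff) == 0x55 (the actual p-code in Table 2 may contain more operations, such as size conversions), with varnodes already replaced by decompiler-level names and with temporaries t1 through t4 that we introduced. Each op becomes one SMT-LIB equality of the form output = operation(inputs). A concrete 64-bit evaluator for the same opcodes is included to sanity-check a candidate input; in a real SMT encoding, t4 would be declared Bool because INT_EQUAL yields a boolean.

```python
MASK = (1 << 64) - 1  # values are modeled as 64-bit bit vectors

# p-code opcode -> SMT-LIB operator (only the opcodes this example needs)
SMT_OP = {"INT_SRIGHT": "bvashr", "INT_ADD": "bvadd", "INT_AND": "bvand", "INT_EQUAL": "="}


def to_signed(v: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement integer."""
    return v - (1 << 64) if v & (1 << 63) else v


# Concrete 64-bit semantics for the same opcodes, used to check a candidate input.
CONCRETE = {
    "INT_SRIGHT": lambda a, b: (to_signed(a) >> b) & MASK,  # arithmetic shift
    "INT_ADD": lambda a, b: (a + b) & MASK,                 # wraps modulo 2**64
    "INT_AND": lambda a, b: a & b,
    "INT_EQUAL": lambda a, b: a == b,                       # boolean result
}

# Hypothetical decomposition of ((param_1 >> 2) + 1U & 0xff) == 0x55.
# Strings name variables/temporaries; ints are constants.
OPS = [
    ("t1", "INT_SRIGHT", ["param_1", 2]),
    ("t2", "INT_ADD", ["t1", 1]),
    ("t3", "INT_AND", ["t2", 0xFF]),
    ("t4", "INT_EQUAL", ["t3", 0x55]),
]


def operand(x) -> str:
    """Render a constant as a 64-bit SMT-LIB literal; names pass through."""
    return f"(_ bv{x} 64)" if isinstance(x, int) else x


def to_smtlib(ops) -> list:
    """One equality per op, of the form output = operation(inputs)."""
    return [f"(= {out} ({SMT_OP[opc]} {' '.join(operand(a) for a in args)}))"
            for out, opc, args in ops]


def evaluate(ops, env: dict) -> dict:
    """Execute the ops concretely, starting from the input values in env."""
    env = dict(env)
    for out, opc, args in ops:
        vals = [a if isinstance(a, int) else env[a] for a in args]
        env[out] = CONCRETE[opc](*vals)
    return env


for line in to_smtlib(OPS):
    print(line)
print(evaluate(OPS, {"param_1": 0x150})["t4"])  # True: 0x150 >> 2 is 0x54, plus 1 is 0x55
```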
After each basic block is hornified, it is organized into a set of Z3 rules. Each rule is an implication in which the antecedent is the source basic block and the consequent is the sink block. A complete example based on the Ghidra decompilation shown in Table 2 is shown in Figure 3. The rule captures the transition from the basic block at address 0x100003f60 to the basic block at address 0x100003fa2. The constraints capture conditions on param_1, taken from the p-code operations, that must be true to enable the transition.

Figure 3: Horn rule generated by GhiHorn
A collection of these rules is generated to represent the control flow graph of a program. Note that variables are passed as state (i.e., arguments) to both the input and output blocks. In this way, program state is updated and maintained. Ultimately the encoded program is passed to Z3, and GhiHorn queries for a state (i.e., the address present in the encoding) to determine whether a viable path exists.
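To show how chained rules answer a reachability query, here is a naive bottom-up evaluation of Horn-style rules in Python. Each hypothetical `Rule` carries the state tuple from its input block to its output block, as described above, and `query` derives block(state) facts until it finds the target block or reaches a fixed point, returning the initial state that leads to the target. This enumerates concrete states over a small finite domain, which is not how Z3’s Horn engine works (Z3 reasons symbolically), but the shape of the rules and the query is the same.

```python
from collections import deque
from typing import Callable, NamedTuple, Tuple

State = Tuple[int, ...]


class Rule(NamedTuple):
    """Hypothetical rule: src(state) AND guard(state) => dst(update(state))."""
    src: str
    guard: Callable[[State], bool]
    update: Callable[[State], State]
    dst: str


def query(rules, entry, initial_states, target):
    """Derive facts until target is found or a fixed point is reached; return the initial state or None."""
    facts = {(entry, s) for s in initial_states}
    origin = {fact: fact[1] for fact in facts}  # initial state each fact descends from
    work = deque(facts)
    while work:
        block, state = work.popleft()
        if block == target:
            return origin[(block, state)]
        for rule in rules:
            if rule.src == block and rule.guard(state):
                fact = (rule.dst, rule.update(state))
                if fact not in facts:
                    facts.add(fact)
                    origin[fact] = origin[(block, state)]
                    work.append(fact)
    return None  # fixed point reached without deriving the target


# Hypothetical program: B0 doubles x if x > 10; B1 reaches TARGET only if x == 42.
rules = [
    Rule("B0", lambda s: s[0] > 10, lambda s: (s[0] * 2,), "B1"),
    Rule("B0", lambda s: s[0] <= 10, lambda s: s, "EXIT"),
    Rule("B1", lambda s: s[0] == 42, lambda s: s, "TARGET"),
]
print(query(rules, "B0", [(x,) for x in range(64)], "TARGET"))  # (21,)
```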
Although this approach to reasoning about program paths is intuitive, it requires additional structures to model real programs. For simplicity’s sake, all memory operations are performed on a single large memory array (named Memory in Figure 3) that is managed by Z3. This approach keeps things simple but can be limiting, and it is certainly not performant. In future versions of GhiHorn we plan to improve memory modeling through better handling of pointers and better representation of different memory regions.
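A single flat memory array can be sketched as follows. The `Memory` class is hypothetical and concrete: `store` returns a new array instead of mutating the old one, mirroring SMT array store/select semantics, and multi-byte little-endian accesses are built from single-byte operations. Unlike in an SMT encoding, unwritten cells here read as 0 rather than being unconstrained.

```python
class Memory:
    """Hypothetical flat byte-addressed memory; store() returns a new Memory, like an SMT array store."""

    def __init__(self, cells=None):
        self._cells = dict(cells or {})  # address -> byte value

    def select(self, addr: int) -> int:
        # Unwritten cells read as 0 here; in an SMT model they would be unconstrained.
        return self._cells.get(addr, 0)

    def store(self, addr: int, byte: int) -> "Memory":
        cells = dict(self._cells)        # copy, so earlier versions stay unchanged
        cells[addr] = byte & 0xFF
        return Memory(cells)

    def load(self, addr: int, size: int) -> int:
        """Little-endian load of `size` bytes."""
        return sum(self.select(addr + i) << (8 * i) for i in range(size))

    def write(self, addr: int, value: int, size: int) -> "Memory":
        """Little-endian write of `size` bytes, one store per byte."""
        mem = self
        for i in range(size):
            mem = mem.store(addr + i, (value >> (8 * i)) & 0xFF)
        return mem


m0 = Memory()
m1 = m0.write(0x1000, 0x11223344, 4)
print(hex(m1.load(0x1000, 4)), hex(m1.select(0x1000)), m0.load(0x1000, 4))  # 0x11223344 0x44 0
```

Keeping every version immutable is what lets each program state carry its own memory snapshot, but copying the whole array on every store also illustrates why one big array is simple rather than fast.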
Another problem is how to handle external dependencies, such as imported API functions for which the code may not be available. GhiHorn provides a capability to build a simulated API ecosystem by compiling library files that contain simple implementations of common API functions. For example, Figure 4 shows the source code for the simulated API functions CreateFile() and CloseHandle(). In this example, the implementation simply maintains an array intended to model a file handle table, which makes it easy to track which handles are open and closed, and nothing more.

Figure 4: Simulated API functions
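The idea behind Figure 4 can be rendered in Python for consistency with the other sketches. This is a hypothetical model, not GhiHorn’s library code: handles are simply indices into a fixed-size table, `create_file` ignores the file name and returns `INVALID_HANDLE_VALUE` (-1, matching the Windows failure sentinel) when the table is full, and `close_handle` fails on handles that are invalid or already closed.

```python
INVALID_HANDLE_VALUE = -1  # Windows signals CreateFile failure with (HANDLE)-1
MAX_HANDLES = 8            # hypothetical table size


class SimulatedFileApi:
    """Hypothetical CreateFile/CloseHandle model that only tracks which handles are open."""

    def __init__(self):
        self.open = [False] * MAX_HANDLES  # the simulated file handle table

    def create_file(self, name: str) -> int:
        # The name is ignored: the model cares only about handle state.
        for handle, in_use in enumerate(self.open):
            if not in_use:
                self.open[handle] = True
                return handle
        return INVALID_HANDLE_VALUE  # table full

    def close_handle(self, handle: int) -> bool:
        if 0 <= handle < MAX_HANDLES and self.open[handle]:
            self.open[handle] = False
            return True
        return False  # invalid or already-closed handle


api = SimulatedFileApi()
h = api.create_file("payload.bin")
print(h, api.close_handle(h), api.close_handle(h))  # 0 True False
```

Because the model tracks only open/closed state, a path analysis built on it can ask questions such as "is every handle closed before exit?" without modeling the file system at all.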
Looking Ahead
In a future post I will present two path analysis tools that we have implemented on top of the GhiHorn platform: Path Analyzer and API Analyzer.















