Targeted fuzzing using static lookahead analysis: how to guide fuzzers using online static analysis

By Valentin Wustholz | Tuesday, March 31st, 2020

In previous posts, we introduced Harvey, a fuzzer for Ethereum smart contracts, and presented two techniques to boost its effectiveness: input prediction and multi-transaction fuzzing.

Harvey is being developed by MythX in collaboration with Maria Christakis from MPI-SWS. It is one of the tools that powers our smart contract analysis service. Sign up for our free plan to give it a try!

In this post, we summarize our upcoming ICSE 2020 paper and provide a high-level overview of how we use online static analysis to guide Harvey. If you are not familiar with fuzzing we suggest to check out the above post since we will assume some basic background below.

Targeted fuzzing

Let’s assume that there are specific target locations in a program you want to fuzz. For example, these could be locations that were recently modified or contain assertions that should be checked. Or maybe a static analyzer has reported warnings for these locations.

The goal of targeted fuzzing is to generate inputs that explore these locations as quickly and frequently as possible. This is challenging since the fuzzer cannot easily predict if a given input explores locations that are “close” to a target location. Unfortunately, the fuzzer only knows which locations are actually explored by the input.

To help the fuzzer with this task, we perform a static lookahead analysis for each input that is added to its queue of “interesting” inputs. The fuzzer uses this (cyclic) queue to select the next input that will be fuzzed and typically adds inputs that increase coverage.

Lookahead Analysis

The diagram below illustrates the idea behind this novel analysis. Given an input that explores a program path π (bold in the diagram), we consider several so-called split points P1 to Pn along that path. For each of these points Pi, starting from P1, we perform a light-weight static analysis that involves two steps.

Lookahead analysis for a given input

First, we consider the prefix πpre of the entire program path π that ends at split point Pi. We compute a condition ϕ that over-approximates all the program states at the split point when executing the prefix. We then check if any target locations are reachable when starting from Pi in a program state that satisfies the condition ϕ. In other words, we check if there are any program paths (with prefix πpre) that reach a target location. In the diagram, the triangles represent sets of paths.

As soon as no targets are found in the second step, we stop the lookahead analysis and we consider the corresponding prefix πpre to be a so-called no-target-ahead prefix. The fuzzer records the split points along this prefix and computes a lookahead identifier by hashing the prefix. For instance, in the diagram above πpre is a no-target-ahead prefix since none of the targets Tx and Ty are reachable from split point Pi.

The lookahead identifier is similar to the path identifier that a greybox fuzzer computes based on which control-flow edges are traversed when executing a given input (see our post on greybox fuzzing for more details). Observe however, that two inputs with different path IDs may share the same lookahead ID if both exercise the same no-target-ahead prefix.

In our implementation, we use Bran, an abstract interpreter for EVM bytecode, to perform the static analysis. Since this analysis runs frequently it needs to be very efficient and light-weight.

Adapting the greybox fuzzing algorithm

Let’s see how we can use this additional information—the split points and the lookahead identifier—to guide the fuzzer. Before that, we will quickly recap the standard greybox fuzzing algorithm below:

// We set up the initial queue by executing the program P for the seed inputs S. 
Q := run_seed_inputs(P, S)
while (running) {
  i := select_input(Q)
  e := assign_energy(i) // Adaptation: We assign energy based on lookahead information.
  while (0 < e) {
    f := fuzz_input(i)
    pid := run_input(P, f)
    if pid not in Q {
      Q[pid] := f
      // Adaptation: We will run the lookahead analysis for input i.
    }
    e := e - 1
  }
}

The fuzzer initially runs the program P with the seed inputs S to initialize the queue Q. While the fuzzer is running, it keeps selecting inputs that will be fuzzed from the queue. It uses a function assign_energy to determine how often a selected input is fuzzed.

When a fuzzed input discovers a new path ID, it is added to the queue (here, a simple map from path ID to input). This is where we run the lookahead analysis (see third comment) to gather additional information about this newly discovered input.

We use this information to adjust how much energy we assign to it in the future (see second comment). We will assign more energy if (1) the input’s no-target-ahead prefix was rarely executed, or (2) any of its split points were rarely executed.

Intuitively, the lookahead ID groups (or partitions) the inputs in the queue since multiple inputs may share the same lookahead ID. The above energy schedule aims to prevent the fuzzer from spending disproportionate amounts of energy on a few partitions that contain many inputs. The split points partition the queue in a similar way and both factors turn out to be important in practice. The key insight is that the fuzzer gains more from discovering inputs that explore new lookahead IDs or new split points than ones that explore existing ones.

Evaluation

We evaluated the effectiveness of this approach on 27 real-world smart contracts and found that the fuzzer was able to reach 83% of the challenging target locations significantly faster (with a speedup of up to 14x). We also found that the overhead of the static analysis was very low (~3s per fuzzing campaign).


In this post, we provided a high-level overview of how we use online static analysis to guide Harvey. For more details please check out our upcoming ICSE 2020 paper.


Thanks to Maria Christakis and Joran Honig for feedback on drafts of this article.

Share this post:
Category:

Easy multi-contract security analysis using Mythril

By Joran Honig | Monday, March 9th, 2020


by Anastasia Dulgier

The MythX platform leverages several internal components to provide the best possible analysis results. One of these components is available open-source; the symbolic executor Mythril. In this article, I’ll demonstrate how you can use Mythril to analyze a set-up of multiple smart contracts.


By default, Mythril will analyze a contract in isolation. Interactions with external contracts are generalized so that we capture all possible vulnerabilities. Sometimes, this means we find a weakness in your smart contract that might not affect your particular setup. That’s because the specific deployment values you use can have a considerable effect on how the system behaves as a whole.

Luckily you can also use Mythril to execute multi-contract analysis and analyze a specific configuration of multiple smart contracts. As a result, you might capture fewer warnings, but the results will be tailor-made to your deployment.

To do multi-contract analysis, we’ll use Mythril’s ability to analyze contracts deployed on an Ethereum network (hint: we’ll use Ganache to launch our private test network). We will deploy our contract on a network, and in doing so, create a possible target for Mythril to analyze.

How To

To perform the multi-contract analysis, we will execute the following steps:

  • Deploy a smart contract system to Ganache
  • Set up Mythril to load code and world state from Ganache
  • ⚡ Fire lasers at the smart contract ⚡

But first, make sure you have everything you need installed:

# The dependencies you will install are: truffle, ganache and mythrilnpm install -g trufflenpm install -g ganache-clipip3 install -U mythril

You will also need to get an example project to try multi-contract analysis on, I’ll be using the simple Metacoin contract.

# These commands create a metacoin directory,# with the metacoin truffle boxmkdir metacoincd metacointruffle unbox metacoin

Deploying your smart contract system to Ganache

We’ll start by getting a test network up and running. This network is where we deploy our contracts and where Mythril will find the smart contract state.

# Execute this command to create a new ganache networkganache-cli -p 7545 --i 5777

With the network set up, we can deploy the contracts; and because we’re using Truffle, this will be quite easy.

truffle migrate

This command will execute all the necessary steps to get your contract set up on the test network we provided. (check out the Truffle documentation for more information on this command)

Now search the ouput of the truffle migrate command for the address where the Metacoin contract (our target) was deployed. 
Look for contract address in the Deploying 'MetaCoin' section.

You can see an example of the output in the image below, where I’ve highlighted the address of the fresh Metacoin Contract: 0x8712c227680bc0c8a4c7a65317C7e7700e5D566f

output of “truffle migrate”

Analysis Time

With the deployment completed, all that is left is to execute the following command (and possibly enjoy a ☕) :

# This command initiates the multi-contract analysismyth a -a <target_address> -l --rpc localhost:7545

As you can see, there are a few options required, here’s a brief explanation of all the elements in this command:

myth a – This is the analyze command that enables the analysis capabilities of Mythril.
-a <target_address> – This option specifies the analysis target, in our case the Metacoin contract account on the test network.
-l – This second option tells Mythril to not just use the code of the target Account, but to use world state data from the Ethereum network whenever possible.
--rpc localhost:7545 – The rpc option tells Mythril where to find an Ethereum node to get all the necessary data. In our case the node is part of our very own test network, which lives on localhost listening on port 7545.

All Good!

$ myth a -a 0x8712c227680bc0c8a4c7a65317C7e7700e5D566f --rpc localhost:7545 -lThe analysis was completed successfully. No issues were detected.

Luckily there were no bugs in the MetaCoin contract. Unfortunately this is not always the case. Try running multi-contract analysis on your own smart contracts and see what you can find!
Also, let me know if you found a nice vulnerability using multi-contract Mythril! (Nothing beats hearing about how Mythril helped catch a nasty bug 🐛)

Share this post:
Category: