Towards Automated Security Analysis of Smart Contracts based on Execution Property Graph - Clue

2023 | Kaihua Qin* · Zhe Ye* · Zhun Wang · Weilin Li · Liyi Zhou · Chao Zhang · Dawn Song · Arthur Gervais | https://arxiv.org/pdf/2305.14046.pdf

Identifying and mitigating vulnerabilities in smart contracts is crucial, especially considering the rapid growth and increasing complexity of DeFi platforms. To address the challenges associated with securing these contracts, we introduce a versatile dynamic analysis framework specifically designed for the EVM. This comprehensive framework focuses on tracking contract executions, capturing valuable runtime information, while introducing and employing the EPG to propose a unique graph traversal technique that swiftly detects potential smart contract attacks. Our approach showcases its efficacy with rapid average graph traversal time per transaction and high true positive rates. The successful identification of a zero-day vulnerability affecting Uniswap highlights the framework's potential to effectively uncover smart contract vulnerabilities in complex DeFi systems.

Read Paper

These figures show the solidity source code and three corresponding basic execution representation graphs: (i) Call Trace Graph (CTG), (ii) Dynamic Control Flow Graph (DCFG), and (iii) Dynamic Dependence Graph (DDG). The CTG represents the calling relationships between multiple smart contracts in a transaction. The DCFG represents the dynamically executed smart contract code as a graph, where each vertex denotes a basic block. The DDG represents data dependencies and control dependencies in a smart contract built from concrete executions.

EPG is a comprehensive graph representation of smart contract executions, which is constructed by merging the three basic property graphs, CTG, DCFG, and DDG. It combines the dynamic execution information on EVM bytecode level, representing and formalizing contract executions as a graph. The left figure shows the partial EPG of a reentrancy attack transaction.

This flow chart shows the high-level architecture of Clue. Clue offers support for both online and offline modes. The online mode enables real-time analysis of unconfirmed transactions, while the offline mode facilitates postmortem analysis. Central to Clue is an EVM emulator, which emulates and tracks the execution of the EVM. Clue allows extracting the data, control, asset flows from a transaction and further constructing EPG efficiently. Based on the EPG, we devise a graph traversal approach for identifying smart contract attacks automatically.

The EPG provides extensive information about the contract executions involved in a transaction. The Graph Traversal refers to the process of visiting vertices of a graph in a specific manner. This process can involve simply visiting every vertex in a predetermined order, or using more sophisticated rules to navigate through the graph. It is a prevalent method for mining information in property graphs, automates the identification of contract attacks.

Clue's Performance in Detecting Typical Attacks

To evaluate Clue's performance in detecting reentrancy, access control, and price manipulation attacks, we construct three separate datasets: (i) an Attack, (ii) a High-Gas, and (iii) a Regular dataset. The Attack dataset comprises all attack transactions that were reported in SoK: Decentralized Finance (DeFi) Attacks and correspond to the evaluated vulnerability type. Its main objective is to assess the true positive rate (FPR) and the false negative rate (FNR). Sampling from the non-attack transactions that have interacted with the related victim contracts in the Attack dataset, the High-Gas and Regular datasets comprise about 1k transactions with the highest gas, and about 20k randomly selected transactions, respectively. The primary objective is to assess the true negative rate (TNR) and false positive rate (FPR).

Reentrancy Attack

The Attack dataset showcases a high true positive rate (91.95%) and a relatively low false negative rate (8.05%) with all false negatives resulting from "No Asset Flow" cases. In non-attack datasets, the true negative rates are remarkably high (99.81%) for High-Gas and (99.99%) for Regular. The few false positives are caused by Flash Loan and Rebase Token cases.

Dataset	Attack	Non-Attack
Dataset	Attack	High-Gas	Regular
Size	87	1,077	19,985
Gas Cost	3.33 ± 3.41M	2.13 ± 1.38M	0.24 ± 0.29M
Generic Rule
Traversal Time	108 ± 136ms	20 ± 21ms	7 ± 3ms
TP (%)	80 (91.95%)	-	-
FN (%)	7 (8.05%)	-	-
TN (%)	-	1075 (99.81%)	19,984 (99.99%)
FP (%)	-	2 (0.19%)	1 (0.01%)
Refined Rule
Traversal Time	0.32 ± 0.93s	52 ± 109ms	16 ± 347ms
TP (%)	87 (100%)	-	-
FN (%)	0 (0%)	-	-
TN (%)	-	1,069 (99.26%)	19,812 (99.08%)
FP (%)	-	8 (0.74%)	173 (0.87%)

Access Control Attack

The Attack dataset demonstrates a high true positive rate (75.41%) and a relatively low false negative rate (24.59%) with all false negatives resulting from "Multi-tx" cases. In non-attack datasets, the true negative rates are remarkably high (98.44%) for High-Gas and (94.43%) for Regular. The few false positives stem from complex DeFi transactions and insufficient authorization checks.

Dataset	Attack	Non-Attack
Dataset	Attack	High-Gas	Regular
Size	61	1,091	19,992
Gas Cost	0.22 ± 0.65M	2.21 ± 1.53M	0.24 ± 0.28M
Generic Rule
Traversal Time	9 ± 18ms	0.7 ± 8.6s	13 ± 244ms
TP (%)	38 (62.30%)	-	-
FN (%)	23 (37.70%)	-	-
TN (%)	-	942 (86.34%)	14,850 (74.28%)
FP (%)	-	149 (13.66%)	5,142 (25.72%)
Refined Rule
Traversal Time	48 ± 121ms	8 ± 47s	0.08 ± 3.07s
TP (%)	46 (75.41%)	-	-
FN (%)	15 (24.59%)	-	-
TN (%)	-	1,074 (98.44%)	18,879 (94.43%)
FP (%)	-	17 (1.56%)	1,113 (5.57%)

Price Manipulation Attack

The refined rule yields a high true positive rate (94.44%) and a relatively low false negative rate (5.56%) in the Attack dataset, with false negatives resulting from low profit margin and multi-transaction cases. In non-attack datasets, the true negative rates are significantly improved (98.51%) for High-Gas and (99.48%) for Regular after incorporating several refinements. The remaining false positives are primarily caused by arbitrage, complex transactions, and add/remove liquidity actions.

Dataset	Attack	Non-Attack
Dataset	Attack	High-Gas	Regular
Size	54	1,075	19,989
Gas Cost	6.89 ± 3.37M	2.14 ± 1.38M	0.24 ± 0.26M
Generic Rule
Traversal Time	32 ± 17ms	7 ± 27ms	2.2 ± 1.1ms
TP (%)	53 (98.15%)	-	-
FN (%)	1 (1.85%)	-	-
TN (%)	-	322 (29.95%)	18,410 (92.10%)
FP (%)	-	753 (70.05%)	1,579 (7.90%)
Refined Rule
Traversal Time	47 ± 23ms	10 ± 24ms	2.4 ± 1.5ms
TP (%)	51 (94.44%)	-	-
FN (%)	3 (5.56%)	-	-
TN (%)	-	1,059 (98.51%)	19,886 (99.48%)
FP (%)	-	16 (1.49%)	103 (0.52%)

Towards Automated Security Analysis of Smart Contracts based on Execution Property Graph - Clue

Contract Execution Representation Graphs

Execution Property Graph (EPG)

Architecture of Clue

Traversal Based Security Analysis

Clue's Performance in Detecting Typical Attacks

Reentrancy Attack

Access Control Attack

Price Manipulation Attack