2023 | Yu Gai* · Liyi Zhou* · Kaihua Qin · Dawn Song · Arthur Gervais | https://arxiv.org/pdf/2304.12749.pdf
This paper presents a dynamic, real-time approach to detecting anomalous blockchain transactions. The proposed tool, BlockGPT, generates tracing representations of blockchain activity and trains a large language model from scratch to act as a real-time Intrusion Detection System. Unlike traditional methods, BlockGPT offers an unrestricted search space and does not rely on predefined rules or patterns, enabling it to detect a broader range of anomalies. We demonstrate the effectiveness of BlockGPT as an anomaly detection tool for Ethereum transactions: in our experiments, it identifies abnormal transactions in a dataset of 68M transactions with an average batched throughput of 2,284 transactions per second. Our results show that BlockGPT ranks 49 out of 124 attacks among the top-3 most abnormal transactions interacting with their victim contracts. This work contributes to blockchain transaction analysis by introducing a custom data encoding compatible with the transformer architecture, a domain-specific tokenization technique, and a tree encoding method specifically crafted for the Ethereum Virtual Machine (EVM) trace representation.
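The paper's custom encoding turns nested EVM execution traces into token sequences a transformer can consume. The following is a minimal illustrative sketch, not BlockGPT's actual implementation: the token vocabulary (`DEPTH_*`, opcode names, truncated address tokens) and the trace dictionary layout are assumptions made for the example.

```python
def tokenize_trace(trace):
    """Flatten a nested EVM call trace into depth-annotated tokens.

    `trace` is a hypothetical dict with keys: "op" (call opcode),
    "to" (callee address), "ops" (intra-frame opcodes), "calls"
    (nested sub-calls). Depth markers preserve the tree structure
    in the flat sequence, in the spirit of the paper's tree encoding.
    """
    tokens = []

    def walk(frame, depth):
        tokens.append(f"DEPTH_{depth}")            # encode tree depth
        tokens.append(frame["op"])                 # e.g. CALL, DELEGATECALL
        tokens.append(f"ADDR_{frame['to'][:8]}")   # truncate address to bound vocab
        tokens.extend(frame.get("ops", []))        # opcodes executed in this frame
        for child in frame.get("calls", []):       # recurse into sub-calls
            walk(child, depth + 1)

    walk(trace, 0)
    return tokens


# Toy trace: an external CALL that performs storage ops, then delegates.
trace = {
    "op": "CALL", "to": "0xdeadbeefcafe",
    "ops": ["SLOAD", "SSTORE"],
    "calls": [{"op": "DELEGATECALL", "to": "0xabcdef012345", "ops": ["SLOAD"]}],
}
print(tokenize_trace(trace))
```

The resulting token sequence can then be fed to a standard autoregressive language model for training on historical transactions.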
High-level overview of the BlockGPT defense mechanism, which consists of the following four major steps. ➊ BlockGPT is bootstrapped by training the model on a dataset of historical transactions using unsupervised learning. ➋ Depending on the system and threat model, BlockGPT monitors new block states, including both already confirmed and pending transactions. ➌ BlockGPT ranks transactions by how abnormal their execution traces are. ➍ If an abnormal transaction is detected, BlockGPT triggers a defense mechanism such as a front-running emergency pause.
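Step ➌ can be sketched as scoring each transaction's trace with the trained language model and sorting by the resulting anomaly score. In this sketch, `model_nll` is a hypothetical stand-in for the per-token negative log-likelihood a trained model would assign; the toy frequency table below only exists to make the example runnable.

```python
import math

def rank_by_abnormality(transactions, model_nll):
    """Return transaction hashes sorted from most to least abnormal.

    A trace the model finds unlikely (high perplexity) ranks first,
    flagging it for the downstream defense mechanism.
    """
    scored = []
    for tx in transactions:
        nll = model_nll(tx["trace_tokens"])         # avg negative log-likelihood
        scored.append((math.exp(nll), tx["hash"]))  # perplexity as anomaly score
    scored.sort(reverse=True)                       # highest perplexity first
    return [tx_hash for _, tx_hash in scored]


# Toy scoring function: pretend rare opcodes are less likely under the model.
freq = {"CALL": 0.5, "SLOAD": 0.3, "SELFDESTRUCT": 0.01}
toy_nll = lambda toks: -sum(math.log(freq.get(t, 0.05)) for t in toks) / len(toks)

txs = [
    {"hash": "0xaaa", "trace_tokens": ["CALL", "SLOAD"]},
    {"hash": "0xbbb", "trace_tokens": ["CALL", "SELFDESTRUCT"]},
]
print(rank_by_abnormality(txs, toy_nll))  # "0xbbb" (rare opcode) ranks first
```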
This table provides a comparison of intrusion detection and prevention techniques, highlighting the unique aspects of each method. Unlike reward-based approaches, our technique employs an unrestricted search space, enabling it to identify unexpected execution patterns instead of focusing solely on profitable vulnerabilities. In contrast to pattern-based techniques (dynamic analysis, fuzzing, symbolic execution, and static analysis), our method does not rely on predefined rules or patterns, which allows it to detect a broader range of anomalies. Furthermore, our technique is capable of real-time analysis, a feature not present in pattern-based symbolic execution or static analysis methods.
Technique | Assumed Prior Knowledge | Search Space | Real-Time Capable | Application Agnostic |
---|---|---|---|---|
Rank based -- the goal is to find all unexpected execution patterns, implicitly capturing vulnerabilities | ||||
BlockGPT (this paper) | All historical transactions | Unrestricted | Yes (0.16s) | Yes |
Reward based -- the goal is to extract financial revenue, implicitly capturing vulnerabilities | ||||
APE | N/A | Only profitable patterns | Yes (0.07s) | Yes |
Naive Imitation | N/A | Only profitable patterns | Yes (0.01s) | Yes |
DeFiPoser | DApp models | Only profitable patterns + Limited by the DApp models | Yes (5.93s) | No |
Pattern based -- the goal is to match / classify predefined known vulnerability patterns with rules (including machine learning methods) | ||||
Pattern based dynamic analysis | Rule | Limited by the rule | Yes | Partially |
Pattern based fuzzing | Rule + ABI / DApp models | Limited by the rule | Partially | Partially |
Pattern based symbolic execution | Rule + Source code / Bytecode | Limited by the rule | N/A | Partially |
Pattern based static analysis | Rule + Source code / Bytecode | Limited by the rule | N/A | Partially |
Proof based -- the goal is to prove that a set of smart contracts meet specific security properties | ||||
Formal verification | Formal security properties + Source code / DApp models | Limited by the security properties | N/A | Partially |
The table below presents the performance of the intrusion detection method under various alarm threshold configurations, organized by the number of transactions interacting with the vulnerable smart contracts. The results indicate that using a lower alarm threshold enables the detection of a higher percentage of attacks, albeit at the cost of an increased false positive rate. Notably, the efficacy of the alarm threshold varies across different dataset sizes, emphasizing the need to select a suitable threshold based on the specific attributes of the smart contract under investigation.
Dataset Size | Percentage Ranking Threshold ≤0.01% | ≤0.1% | ≤0.5% | ≤1% | ≤10% | Absolute Ranking top-1 | top-2 | top-3
---|---|---|---|---|---|---|---|---
0 - 99 txs (32 attacks, 28% of dataset) | - | - | - | - | 5 (16%) | 7 (22%) | 20 (63%) | 23 (72%) |
Average false positive rate | - | - | - | - | 8.18% | 0% | 14.8% | 28.3% |
Average number of false positives | - | - | - | - | 5.1 | 0 | 1 | 2 |
100 - 999 txs (38 attacks, 33% of dataset) | - | - | 8 (21%) | 12 (32%) | 28 (74%) | 7 (18%) | 12 (32%) | 15 (39%) |
Average false positive rate | - | - | 0.24% | 0.71% | 9.65% | 0% | 0.46% | 0.81% |
Average number of false positives | - | - | 1.5 | 3.5 | 39.4 | 0 | 1 | 2 |
1000 - 9999 txs (17 attacks, 15% of dataset) | - | 6 (35%) | 9 (53%) | 11 (65%) | 13 (76%) | 4 (24%) | 7 (41%) | 7 (41%) |
Average false positive rate | - | 0.054% | 0.45% | 0.95% | 9.96% | 0% | 0.049% | 0.098% |
Average number of false positives | - | 1.4 | 11.5 | 23.7 | 324.5 | 0 | 1 | 2 |
10000 + txs (29 attacks, 25% of dataset) | 2 (7%) | 7 (24%) | 16 (55%) | 18 (62%) | 21 (72%) | 2 (7%) | 3 (10%) | 4 (14%) |
Average false positive rate | 0.007% | 0.097% | 0.50% | 1% | 10% | 0% | 0.004% | 0.008% |
Average number of false positives | 2.5 | 120.1 | 429.9 | 819.6 | 7302.1 | 0 | 1 | 2 |
Overall | 2 (2%) | 13 (11%) | 33 (28%) | 41 (35%) | 67 (58%) | 20 (17%) | 42 (36%) | 49 (42%) |
Average false positive rate | 0.007% | 0.077% | 0.42% | 0.90% | 9.71% | 0% | 7.19% | 13.5% |
Average number of false positives | 2.5 | 65.3 | 211.9 | 367.2 | 2368.5 | 0 | 1 | 2 |
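The two threshold policies in the table above can be sketched as simple predicates over a transaction's anomaly rank among the historical transactions of a contract. The function names and the 0.5% / top-3 defaults are illustrative choices, not the paper's API.

```python
def percentage_alarm(rank, dataset_size, threshold_pct=0.5):
    """Alarm if the tx ranks within the top `threshold_pct`% most abnormal
    of the `dataset_size` transactions seen for this contract."""
    return 100.0 * rank / dataset_size <= threshold_pct

def absolute_alarm(rank, top_k=3):
    """Alarm if the tx is among the top-k most abnormal transactions."""
    return rank <= top_k

# A tx ranked 4th most abnormal among 10,000 historical transactions:
print(percentage_alarm(4, 10_000))  # 0.04% <= 0.5%, so the alarm fires
print(absolute_alarm(4))            # rank 4 is outside the top-3, no alarm
```

This mirrors the trade-off shown in the table: for large contracts a percentage threshold fires on many transactions (high absolute false-positive count), while an absolute top-k threshold keeps false positives fixed at k−1 or fewer but misses attacks ranked just below k.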
Copyright ©2022 UC Regents | Email us at rdi@berkeley.edu.