Blockchain Large Language Models - BlockGPT

2023  |  Yu Gai* · Liyi Zhou* · Kaihua Qin · Dawn Song · Arthur Gervais  |  https://arxiv.org/pdf/2304.12749.pdf

This paper presents a dynamic, real-time approach to detecting anomalous blockchain transactions. The proposed tool, BlockGPT, generates tracing representations of blockchain activity and trains a large language model from scratch to act as a real-time Intrusion Detection System. Unlike traditional methods, BlockGPT is designed to offer an unrestricted search space and does not rely on predefined rules or patterns, enabling it to detect a broader range of anomalies. We demonstrate the effectiveness of BlockGPT through its use as an anomaly detection tool for Ethereum transactions. In our experiments, it effectively identifies abnormal transactions among a dataset of 68M transactions and has a batched throughput of 2284 transactions per second on average. Our results show that BlockGPT identifies abnormal transactions by ranking 49 out of 124 attacks among the top-3 most abnormal transactions interacting with their victim contracts. This work contributes to the field of blockchain transaction analysis by introducing a custom data encoding compatible with the transformer architecture, a domain-specific tokenization technique, and a tree encoding method specifically crafted for the Ethereum Virtual Machine (EVM) trace representation.

How it works:

High Level Overview Diagram

High-level overview of the BlockGPT defense mechanism, which consists of four major steps. (1) BlockGPT is bootstrapped by feeding in a dataset of historical transactions to train the model using unsupervised learning. (2) Depending on the system and threat model, BlockGPT detects new block states, including already confirmed transactions and pending transactions. (3) BlockGPT ranks transactions based on how abnormal their execution traces are. (4) If an abnormal transaction is detected, BlockGPT triggers a defense mechanism such as a front-running emergency pause.
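The ranking and alarm steps of this overview can be illustrated with a minimal sketch. It assumes that some abnormality score is already available for each transaction (higher meaning more abnormal). The `Transaction` type, `check_transaction`, `toy_score`, and the callback are hypothetical names introduced for illustration, and the toy score (trace length) is only a stand-in for a trained model's output, not BlockGPT's actual scoring.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Transaction:
    tx_hash: str            # hypothetical identifier
    trace: Tuple[str, ...]  # hypothetical flattened execution trace


def rank_by_abnormality(
    txs: Sequence[Transaction],
    score: Callable[[Transaction], float],
) -> List[Transaction]:
    """Sort transactions from most to least abnormal (stable on ties)."""
    return sorted(txs, key=score, reverse=True)


def check_transaction(
    new_tx: Transaction,
    history: Sequence[Transaction],
    score: Callable[[Transaction], float],
    top_k: int,
    on_alarm: Callable[[Transaction], None],
) -> int:
    """Rank new_tx among the historical transactions of the same contract.

    Calls on_alarm (e.g. a hook that would request an emergency pause)
    when new_tx lands within the top_k most abnormal transactions.
    Returns the 1-based rank of new_tx.
    """
    # new_tx is placed first so that, on a score tie, it ranks higher.
    ranked = rank_by_abnormality([new_tx, *history], score)
    rank = ranked.index(new_tx) + 1
    if rank <= top_k:
        on_alarm(new_tx)
    return rank


def toy_score(tx: Transaction) -> float:
    """Placeholder score: longer traces count as more abnormal (assumption)."""
    return float(len(tx.trace))


history = [
    Transaction("0xaaa", ("CALL", "SLOAD")),
    Transaction("0xbbb", ("CALL", "SLOAD", "SSTORE")),
]
suspicious = Transaction(
    "0xccc",
    ("CALL", "DELEGATECALL", "CALL", "SSTORE", "CALL", "SELFDESTRUCT"),
)
alarms: List[Transaction] = []
rank = check_transaction(suspicious, history, toy_score, top_k=1, on_alarm=alarms.append)
```

In this toy run the suspicious transaction has the longest trace, so it ranks first and the alarm callback fires. In a real deployment the score would come from the trained model and the callback would implement whichever defense the threat model allows.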

Comparison with Other Intrusion Detection Techniques

This table provides a comparison of intrusion detection and prevention techniques, highlighting the unique aspects of each method. Unlike reward-based approaches, our technique employs an unrestricted search space, enabling it to identify unexpected execution patterns instead of focusing solely on profitable vulnerabilities. In contrast to pattern-based techniques (dynamic analysis, fuzzing, symbolic execution, and static analysis), our method does not rely on predefined rules or patterns, which allows it to detect a broader range of anomalies. Furthermore, our technique is capable of real-time analysis, a feature not present in pattern-based symbolic execution or static analysis methods.

Technique | Assumed Prior Knowledge | Search space Unrestricted From Vulnerability Patterns | Real-Time Capable | Application Agnostic
Rank based: the goal is to find all unexpected execution patterns, implicitly capturing vulnerabilities
BlockGPT (this paper) | All historical transactions | Unrestricted | Yes (0.16s) | Yes
Reward based: the goal is to extract financial revenue, implicitly capturing vulnerabilities
APE | N/A | Only profitable patterns | Yes (0.07s) | Yes
Naive Imitation | N/A | Only profitable patterns | Yes (0.01s) | Yes
DeFiPoser | DApp models | Only profitable patterns + Limited by the DApp models | Yes (5.93s) | No
Pattern based: the goal is to match / classify predefined known vulnerability patterns with rules (including machine learning methods)
Pattern based dynamic analysis | Rule | Limited by the rule | Yes | Partially
Pattern based fuzzing | Rule + ABI / DApp models | Limited by the rule | Partially | Partially
Pattern based symbolic execution | Rule + Source code / Bytecode | Limited by the rule | N/A | Partially
Pattern based static analysis | Rule + Source code / Bytecode | Limited by the rule | N/A | Partially
Proof based: the goal is to prove that a set of smart contracts meet specific security properties
Formal verification | Formal security properties + Source code / DApp models | Limited by the security properties | N/A | Partially

Performance Under Various Alarm Threshold Configurations

The table below presents the performance of the intrusion detection method under various alarm threshold configurations, organized by the number of transactions interacting with the vulnerable smart contracts. The results indicate that using a lower alarm threshold enables the detection of a higher percentage of attacks, albeit at the cost of an increased false positive rate. Notably, the efficacy of the alarm threshold varies across different dataset sizes, emphasizing the need to select a suitable threshold based on the specific attributes of the smart contract under investigation.
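The two kinds of alarm threshold can be sketched as follows. This is an illustrative reading, not the paper's evaluation code: it assumes each transaction interacting with a contract has an abnormality score (higher is more abnormal), that a percentage threshold flags the top `floor(n * p / 100)` of `n` transactions (so a small dataset can yield no flagged transactions at a tight percentage), and that the false positive rate is the number of flagged benign transactions divided by the number of benign transactions. All function names are hypothetical.

```python
import math
from typing import Dict, List, Set


def rank_transactions(scores: Dict[str, float]) -> List[str]:
    """Return transaction ids ordered from most to least abnormal."""
    return sorted(scores, key=lambda tx: scores[tx], reverse=True)


def flag_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Absolute ranking threshold: flag the k most abnormal transactions."""
    return rank_transactions(scores)[:k]


def flag_top_percent(scores: Dict[str, float], percent: float) -> List[str]:
    """Percentage ranking threshold: flag the top `percent` % by rank.

    Assumption: the cutoff is rounded down, so nothing is flagged when
    the dataset is too small for the chosen percentage.
    """
    ranked = rank_transactions(scores)
    cutoff = math.floor(len(ranked) * percent / 100)
    return ranked[:cutoff]


def false_positive_rate(flagged: List[str], attacks: Set[str], total: int) -> float:
    """Flagged benign transactions divided by all benign transactions."""
    false_positives = [tx for tx in flagged if tx not in attacks]
    benign = total - len(attacks)
    return len(false_positives) / benign if benign else 0.0


# Toy dataset: 200 transactions, the attack is the second most abnormal.
scores = {f"tx{i}": float(i) for i in range(200)}
attacks = {"tx198"}

top1 = flag_top_k(scores, 1)            # misses the attack
top2 = flag_top_k(scores, 2)            # catches it, with one false positive
one_percent = flag_top_percent(scores, 1)
tight = flag_top_percent(scores, 0.1)   # 0.1% of 200 rounds down to zero
fpr_top2 = false_positive_rate(top2, attacks, total=len(scores))
```

The toy run shows the trade-off described above: loosening the threshold from top-1 to top-2 catches the attack but adds one false positive, and a 0.1% threshold flags nothing on a 200-transaction dataset.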

Columns ≤0.01% through ≤10% are Percentage Ranking Alarm Thresholds (%); columns top-1 through top-3 are Absolute Ranking Alarm Thresholds.

Dataset Size | ≤0.01% | ≤0.1% | ≤0.5% | ≤1% | ≤10% | top-1 | top-2 | top-3
0 - 99 txs (32 attacks, 28% of dataset) | - | - | - | - | 5 (16%) | 7 (22%) | 20 (63%) | 23 (72%)
Average false positive rate | - | - | - | - | 8.18% | 0% | 14.8% | 28.3%
Average number of false positives | - | - | - | - | 5.1 | 0 | 1 | 2
100 - 999 txs (38 attacks, 33% of dataset) | - | - | 8 (21%) | 12 (32%) | 28 (74%) | 7 (18%) | 12 (32%) | 15 (39%)
Average false positive rate | - | - | 0.24% | 0.71% | 9.65% | 0% | 0.46% | 0.81%
Average number of false positives | - | - | 1.5 | 3.5 | 39.4 | 0 | 1 | 2
1000 - 9999 txs (17 attacks, 15% of dataset) | - | 6 (35%) | 9 (53%) | 11 (65%) | 13 (76%) | 4 (24%) | 7 (41%) | 7 (41%)
Average false positive rate | - | 0.054% | 0.45% | 0.95% | 9.96% | 0% | 0.049% | 0.098%
Average number of false positives | - | 1.4 | 11.5 | 23.7 | 324.5 | 0 | 1 | 2
10000 + txs (29 attacks, 25% of dataset) | 2 (7%) | 7 (24%) | 16 (55%) | 18 (62%) | 21 (72%) | 2 (7%) | 3 (10%) | 4 (14%)
Average false positive rate | 0.007% | 0.097% | 0.50% | 1% | 10% | 0% | 0.004% | 0.008%
Average number of false positives | 2.5 | 120.1 | 429.9 | 819.6 | 7302.1 | 0 | 1 | 2
Overall | 2 (2%) | 13 (11%) | 33 (28%) | 41 (35%) | 67 (58%) | 20 (17%) | 42 (36%) | 49 (42%)
Average false positive rate | 0.007% | 0.077% | 0.42% | 0.90% | 9.71% | 0% | 7.19% | 13.5%
Average number of false positives | 2.5 | 65.3 | 211.9 | 367.2 | 2368.5 | 0 | 1 | 2
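
The Overall row for the absolute ranking thresholds can be cross-checked from the per-bucket rows. The short sketch below uses only the attack counts and detection counts listed in the table; the dictionary layout and variable names are illustrative. The bucket sizes in the table sum to 116 attacks, and the Overall percentages are consistent with that denominator.

```python
# Per-bucket attack counts and detections at top-1 / top-2 / top-3,
# copied from the table above.
buckets = {
    "0 - 99 txs": {"attacks": 32, "top-1": 7, "top-2": 20, "top-3": 23},
    "100 - 999 txs": {"attacks": 38, "top-1": 7, "top-2": 12, "top-3": 15},
    "1000 - 9999 txs": {"attacks": 17, "top-1": 4, "top-2": 7, "top-3": 7},
    "10000 + txs": {"attacks": 29, "top-1": 2, "top-2": 3, "top-3": 4},
}

thresholds = ("top-1", "top-2", "top-3")

# Total number of attacks covered by the four buckets.
total_attacks = sum(b["attacks"] for b in buckets.values())

# Detections per threshold, summed over buckets.
overall = {t: sum(b[t] for b in buckets.values()) for t in thresholds}

# Detection percentage per threshold, rounded to whole percent.
overall_pct = {t: round(100 * overall[t] / total_attacks) for t in thresholds}

# Share of the dataset held by each bucket, rounded to whole percent.
bucket_share = {
    name: round(100 * b["attacks"] / total_attacks) for name, b in buckets.items()
}

for t in thresholds:
    print(f"{t}: {overall[t]} of {total_attacks} attacks ({overall_pct[t]}%)")
```

The summed counts (20, 42, 49) and rounded percentages (17%, 36%, 42%) agree with the Overall row, and the bucket shares agree with the percentages given next to each dataset size.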
