2023 | Yu Gai* · Liyi Zhou* · Kaihua Qin · Dawn Song · Arthur Gervais | https://arxiv.org/pdf/2304.12749.pdf
This paper presents a dynamic, real-time approach to detecting anomalous blockchain transactions. The proposed tool, BlockGPT, generates tracing representations of blockchain activity and trains a large language model from scratch to act as a real-time Intrusion Detection System. Unlike traditional methods, BlockGPT offers an unrestricted search space and does not rely on predefined rules or patterns, enabling it to detect a broader range of anomalies. We demonstrate the effectiveness of BlockGPT as an anomaly detection tool for Ethereum transactions: in our experiments, it identifies abnormal transactions in a dataset of 68M transactions with an average batched throughput of 2,284 transactions per second. Our results show that BlockGPT ranks 49 out of 124 attacks among the top-3 most abnormal transactions interacting with their victim contracts. This work contributes to blockchain transaction analysis by introducing a custom data encoding compatible with the transformer architecture, a domain-specific tokenization technique, and a tree encoding method specifically crafted for the Ethereum Virtual Machine (EVM) trace representation.
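The paper's custom encoding turns nested EVM execution traces into token sequences a transformer can consume. The following is a minimal illustrative sketch, not BlockGPT's actual implementation: the token vocabulary (`DEPTH_*`, opcode names, truncated address tokens) and the trace dictionary layout are assumptions made for the example.

```python
def tokenize_trace(trace):
    """Flatten a nested EVM call trace into depth-annotated tokens.

    `trace` is a hypothetical dict with keys: "op" (call opcode),
    "to" (callee address), "ops" (intra-frame opcodes), "calls"
    (nested sub-calls). Depth markers preserve the tree structure
    in the flat sequence, in the spirit of the paper's tree encoding.
    """
    tokens = []

    def walk(frame, depth):
        tokens.append(f"DEPTH_{depth}")            # encode tree depth
        tokens.append(frame["op"])                 # e.g. CALL, DELEGATECALL
        tokens.append(f"ADDR_{frame['to'][:8]}")   # truncate address to bound vocab
        tokens.extend(frame.get("ops", []))        # opcodes executed in this frame
        for child in frame.get("calls", []):       # recurse into sub-calls
            walk(child, depth + 1)

    walk(trace, 0)
    return tokens


# Toy trace: an external CALL that performs storage ops, then delegates.
trace = {
    "op": "CALL", "to": "0xdeadbeefcafe",
    "ops": ["SLOAD", "SSTORE"],
    "calls": [{"op": "DELEGATECALL", "to": "0xabcdef012345", "ops": ["SLOAD"]}],
}
print(tokenize_trace(trace))
```

The resulting token sequence can then be fed to a standard autoregressive language model for training on historical transactions.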
High-level overview of the BlockGPT defense mechanism, which consists of the following four major steps. ➊ BlockGPT is bootstrapped by training the model on a dataset of historical transactions using unsupervised learning. ➋ Depending on the system and threat model, BlockGPT monitors new block states, including both already confirmed and pending transactions. ➌ BlockGPT ranks transactions by how abnormal their execution traces are. ➍ If an abnormal transaction is detected, BlockGPT triggers a defense mechanism such as a front-running emergency pause.
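Step ➌ can be sketched as scoring each transaction's trace with the trained language model and sorting by the resulting anomaly score. In this sketch, `model_nll` is a hypothetical stand-in for the per-token negative log-likelihood a trained model would assign; the toy frequency table below only exists to make the example runnable.

```python
import math

def rank_by_abnormality(transactions, model_nll):
    """Return transaction hashes sorted from most to least abnormal.

    A trace the model finds unlikely (high perplexity) ranks first,
    flagging it for the downstream defense mechanism.
    """
    scored = []
    for tx in transactions:
        nll = model_nll(tx["trace_tokens"])         # avg negative log-likelihood
        scored.append((math.exp(nll), tx["hash"]))  # perplexity as anomaly score
    scored.sort(reverse=True)                       # highest perplexity first
    return [tx_hash for _, tx_hash in scored]


# Toy scoring function: pretend rare opcodes are less likely under the model.
freq = {"CALL": 0.5, "SLOAD": 0.3, "SELFDESTRUCT": 0.01}
toy_nll = lambda toks: -sum(math.log(freq.get(t, 0.05)) for t in toks) / len(toks)

txs = [
    {"hash": "0xaaa", "trace_tokens": ["CALL", "SLOAD"]},
    {"hash": "0xbbb", "trace_tokens": ["CALL", "SELFDESTRUCT"]},
]
print(rank_by_abnormality(txs, toy_nll))  # "0xbbb" (rare opcode) ranks first
```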
This table provides a comparison of intrusion detection and prevention techniques, highlighting the unique aspects of each method. Unlike reward-based approaches, our technique employs an unrestricted search space, enabling it to identify unexpected execution patterns instead of focusing solely on profitable vulnerabilities. In contrast to pattern-based techniques (dynamic analysis, fuzzing, symbolic execution, and static analysis), our method does not rely on predefined rules or patterns, which allows it to detect a broader range of anomalies. Furthermore, our technique is capable of real-time analysis, a feature not present in pattern-based symbolic execution or static analysis methods.
Technique | Assumed Prior Knowledge | Search Space | Real-Time Capable | Application Agnostic |
---|---|---|---|---|
Rank based -- the goal is to find all unexpected execution patterns, implicitly capturing vulnerabilities | ||||
BlockGPT (this paper) | All historical transactions | Unrestricted | Yes (0.16s) | Yes |
Reward based -- the goal is to extract financial revenue, implicitly capturing vulnerabilities | ||||
APE | N/A | Only profitable patterns | Yes (0.07s) | Yes |
Naive Imitation | N/A | Only profitable patterns | Yes (0.01s) | Yes |
DeFiPoser | DApp models | Only profitable patterns + Limited by the DApp models | Yes (5.93s) | No |
Pattern based -- the goal is to match / classify predefined known vulnerability patterns with rules (including machine learning methods) | ||||
Pattern based dynamic analysis | Rule | Limited by the rule | Yes | Partially |
Pattern based fuzzing | Rule + ABI / DApp models | Limited by the rule | Partially | Partially |
Pattern based symbolic execution | Rule + Source code / Bytecode | Limited by the rule | N/A | Partially |
Pattern based static analysis | Rule + Source code / Bytecode | Limited by the rule | N/A | Partially |
Proof based -- the goal is to prove that a set of smart contracts meet specific security properties | ||||
Formal verification | Formal security properties + Source code / DApp models | Limited by the security properties | N/A | Partially |
The table below presents the performance of the intrusion detection method under various alarm threshold configurations, organized by the number of transactions interacting with the vulnerable smart contracts. The results indicate that using a lower alarm threshold enables the detection of a higher percentage of attacks, albeit at the cost of an increased false positive rate. Notably, the efficacy of the alarm threshold varies across different dataset sizes, emphasizing the need to select a suitable threshold based on the specific attributes of the smart contract under investigation.
Dataset Size | Percentage Ranking Threshold ≤0.01% | ≤0.1% | ≤0.5% | ≤1% | ≤10% | Absolute Ranking top-1 | top-2 | top-3
---|---|---|---|---|---|---|---|---
0 - 99 txs (32 attacks, 28% of dataset) | - | - | - | - | 5 (16%) | 7 (22%) | 20 (63%) | 23 (72%) |
Average false positive rate | - | - | - | - | 8.18% | 0% | 14.8% | 28.3% |
Average number of false positives | - | - | - | - | 5.1 | 0 | 1 | 2 |
100 - 999 txs (38 attacks, 33% of dataset) | - | - | 8 (21%) | 12 (32%) | 28 (74%) | 7 (18%) | 12 (32%) | 15 (39%) |
Average false positive rate | - | - | 0.24% | 0.71% | 9.65% | 0% | 0.46% | 0.81% |
Average number of false positives | - | - | 1.5 | 3.5 | 39.4 | 0 | 1 | 2 |
1000 - 9999 txs (17 attacks, 15% of dataset) | - | 6 (35%) | 9 (53%) | 11 (65%) | 13 (76%) | 4 (24%) | 7 (41%) | 7 (41%) |
Average false positive rate | - | 0.054% | 0.45% | 0.95% | 9.96% | 0% | 0.049% | 0.098% |
Average number of false positives | - | 1.4 | 11.5 | 23.7 | 324.5 | 0 | 1 | 2 |
10000 + txs (29 attacks, 25% of dataset) | 2 (7%) | 7 (24%) | 16 (55%) | 18 (62%) | 21 (72%) | 2 (7%) | 3 (10%) | 4 (14%) |
Average false positive rate | 0.007% | 0.097% | 0.50% | 1% | 10% | 0% | 0.004% | 0.008% |
Average number of false positives | 2.5 | 120.1 | 429.9 | 819.6 | 7302.1 | 0 | 1 | 2 |
Overall | 2 (2%) | 13 (11%) | 33 (28%) | 41 (35%) | 67 (58%) | 20 (17%) | 42 (36%) | 49 (42%) |
Average false positive rate | 0.007% | 0.077% | 0.42% | 0.90% | 9.71% | 0% | 7.19% | 13.5% |
Average number of false positives | 2.5 | 65.3 | 211.9 | 367.2 | 2368.5 | 0 | 1 | 2 |
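The two threshold policies in the table above can be sketched as simple predicates over a transaction's anomaly rank among the historical transactions of a contract. The function names and the 0.5% / top-3 defaults are illustrative choices, not the paper's API.

```python
def percentage_alarm(rank, dataset_size, threshold_pct=0.5):
    """Alarm if the tx ranks within the top `threshold_pct`% most abnormal
    of the `dataset_size` transactions seen for this contract."""
    return 100.0 * rank / dataset_size <= threshold_pct

def absolute_alarm(rank, top_k=3):
    """Alarm if the tx is among the top-k most abnormal transactions."""
    return rank <= top_k

# A tx ranked 4th most abnormal among 10,000 historical transactions:
print(percentage_alarm(4, 10_000))  # 0.04% <= 0.5%, so the alarm fires
print(absolute_alarm(4))            # rank 4 is outside the top-3, no alarm
```

This mirrors the trade-off shown in the table: for large contracts a percentage threshold fires on many transactions (high absolute false-positive count), while an absolute top-k threshold keeps false positives fixed at k−1 or fewer but misses attacks ranked just below k.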
Copyright ©2022 UC Regents | Email us at rdi@berkeley.edu.