Unique Identification of 50,000+ Virtual Reality Users from Head & Hand Motion Data

DATA: 3.96 TB

USERS: 55,541

COUNTRIES: 40+

VR DEVICES: 20+

REPLAYS: 2,669,886

SESSIONS: 713,013

2023 | Vivek Nair · Wenbo Guo · Justus Mattern · Rui Wang · James F. O’Brien · Louis Rosenberg · Dawn Song | https://doi.org/10.48550/arXiv.2302.08927

With the recent explosive growth of interest and investment in virtual reality (VR) and the so-called "metaverse," public attention has rightly shifted toward the unique security and privacy threats that these platforms may pose. While it has long been known that people reveal information about themselves via their motion, the extent to which this makes an individual globally identifiable within virtual reality has not yet been widely understood. In this study, we show that a large number of real VR users (N=55,541) can be uniquely and reliably identified across multiple sessions using just their head and hand motion relative to virtual objects. After training a classification model on 5 minutes of data per person, a user can be uniquely identified amongst the entire pool of 50,000+ with 94.33% accuracy from 100 seconds of motion, and with 73.20% accuracy from just 10 seconds of motion. This work is the first to truly demonstrate the extent to which biomechanics may serve as a unique identifier in VR, on par with widely used biometrics such as facial or fingerprint recognition.

Motion Features

Motion data (telemetry) is the primary source of data for user identification and inference in VR. Each frame of telemetry encodes the 3D position and orientation of each of the three tracked objects (the head and both hands). Replacing the three Euler angles with four quaternion elements yields seven values per object, or 21 data streams in total. Summarizing each stream with five statistics (minimum, maximum, mean, median, and standard deviation) results in a 105-dimensional motion feature vector.
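The arithmetic above (3 objects × 7 values × 5 statistics = 105) can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function names, the input layout (one `(x, y, z, roll, pitch, yaw)` tuple per tracked object per frame), the Z-Y-X Euler convention, and the use of the population standard deviation are all assumptions made for the example.

```python
import math
import statistics


def euler_to_quaternion(roll, pitch, yaw):
    """Convert Euler angles in radians to a unit quaternion (w, x, y, z).

    Assumes the common yaw-pitch-roll (Z-Y-X) convention; the convention
    used in the original data is not stated here.
    """
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (
        cr * cp * cy + sr * sp * sy,
        sr * cp * cy - cr * sp * sy,
        cr * sp * cy + sr * cp * sy,
        cr * cp * sy - sr * sp * cy,
    )


def motion_features(frames):
    """Summarize a window of telemetry frames as a 105-value list.

    `frames` is a list of frames; each frame holds three tracked objects,
    each given as a hypothetical (x, y, z, roll, pitch, yaw) tuple.
    """
    streams = [[] for _ in range(21)]  # 3 objects x (3 position + 4 quaternion)
    for frame in frames:
        values = []
        for x, y, z, roll, pitch, yaw in frame:
            values.extend((x, y, z))
            values.extend(euler_to_quaternion(roll, pitch, yaw))
        for stream, value in zip(streams, values):
            stream.append(value)

    features = []
    for stream in streams:
        features.extend((
            min(stream),
            max(stream),
            statistics.fmean(stream),
            statistics.median(stream),
            statistics.pstdev(stream),  # population std. dev. (an assumption)
        ))
    return features


# Two frames, three identical objects per frame, no rotation.
frames = [
    [(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)] * 3,
    [(1.0, 2.0, 3.0, 0.0, 0.0, 0.0)] * 3,
]
print(len(motion_features(frames)))  # 105
```

Each group of five consecutive values in the output corresponds to one of the 21 streams, in the order the streams were built.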

Context Features

We found 22 features that most accurately characterize movement relative to a single event. These include, for example, the position, orientation, type, and color of the object; the angle, speed, location, and accuracy of the motion; and the relative distance in both space and time. Combining these 22 context features with the 105 motion features from each of the one-second intervals before and after the event gives 22 + 2 × 105 = 232 dimensions, which can be used to identify users with a high degree of accuracy.
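As a rough illustration of how the 232 dimensions add up, the sketch below joins one event's context features with the motion features from the one-second windows before and after it. The function name, the constants, and the ordering of the three parts are assumptions made for the example; only the counts (22, 105, 232) come from the text.

```python
N_CONTEXT = 22   # context features per event
N_MOTION = 105   # motion features per one-second window


def event_sample(context, motion_before, motion_after):
    """Join one event's features into a single 232-value sample.

    The ordering (context, then before, then after) is an assumption;
    any fixed ordering would serve, as long as it is used consistently.
    """
    if len(context) != N_CONTEXT:
        raise ValueError(f"expected {N_CONTEXT} context features")
    if len(motion_before) != N_MOTION or len(motion_after) != N_MOTION:
        raise ValueError(f"expected {N_MOTION} motion features per window")
    return list(context) + list(motion_before) + list(motion_after)


# Placeholder values, used only to check the resulting dimensionality.
sample = event_sample([0.0] * 22, [0.0] * 105, [0.0] * 105)
print(len(sample))  # 232
```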

Results

After training a classification model on 5 minutes of data per person, a user can be uniquely identified amongst the entire pool of 50,000+ with 94.33% accuracy from 100 seconds of motion, and with 73.20% accuracy from just 10 seconds of motion. Even with a single sample generated from just 2 seconds of telemetry data, the correct user out of 50,000+ is identified 48.45% of the time. Users with 5 or fewer total replays submitted were harder to identify, while users with 100 or more replays could be identified with over 99.5% accuracy. While static measurements comprise many of the most important features, they account for only 22.9% of the overall performance of the model, as measured by entropy gain. Motion features constitute 73.9% of all entropy gain, while context features make up the remaining 3.2%.