Driver identification in autonomous driving environments is a core technology for blocking malicious control takeovers, proactively detecting risk factors in both AI systems and human drivers, and providing driving assistance tailored to the situation. Prior research on driver identification has relied primarily on vision-based approaches that leverage easily obtainable front-camera images. However, the reliability of this information can be compromised under hostile driving conditions such as inclement weather, low light, and image sensor noise. To improve robustness in these scenarios, this paper proposes a multimodal time-series classification technique that fuses front-camera images with vehicle trajectory data. The method is designed so that trajectory data plays a complementary role when visual information is compromised, with the goal of maintaining stable classification performance under information loss by effectively fusing the two data streams.
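As an illustration of this fusion design, the following is a minimal late-fusion sketch in PyTorch, assuming a CNN image encoder, a GRU trajectory encoder over (position, velocity, steering angle, acceleration), and concatenation before a linear classification head; the layer sizes and the specific fusion scheme are illustrative assumptions, not the exact architecture used in this work.

```python
import torch
import torch.nn as nn

class MultimodalDriverClassifier(nn.Module):
    """Late-fusion sketch: a CNN encodes the front-camera frame, a GRU encodes
    the trajectory sequence, and the concatenated features are classified.
    All dimensions are illustrative assumptions."""

    def __init__(self, traj_dim=4, num_classes=3, hidden=128):
        super().__init__()
        # Image branch: small CNN over a front-camera frame (B, 3, H, W)
        self.img_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, hidden),
        )
        # Trajectory branch: GRU over (position, velocity, steering, acceleration)
        self.traj_encoder = nn.GRU(traj_dim, hidden, batch_first=True)
        # Fusion head: concatenate both embeddings and predict the driver class
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, image, trajectory):
        img_feat = self.img_encoder(image)             # (B, hidden)
        _, h_n = self.traj_encoder(trajectory)         # h_n: (1, B, hidden)
        fused = torch.cat([img_feat, h_n[-1]], dim=1)  # (B, 2*hidden)
        return self.head(fused)                        # driver-class logits
```

A late-fusion layout of this kind lets the trajectory embedding continue to carry discriminative information even when the image branch receives degraded input, which is the complementary behavior the proposed method aims for.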
To validate the effectiveness of our method, we designed a two-stage experiment. In the first stage, to confirm the feasibility of identifying drivers using only trajectory data, we conducted a baseline experiment in the simplified CarRacing-v2 environment. The classification targets were trajectory data (position, velocity, steering angle, acceleration) from reinforcement learning agents (PPO, DQN) and a human driver. In the second stage, to assess scalability to real-world road conditions, we used the CARLA simulator to evaluate the full multimodal approach combining front-camera images and driving records. To handle CARLA's high-dimensional image inputs, we collected driving data from AI agents using VAE-PPO and VAE-DQN models, which incorporate a Variational Autoencoder (VAE), and compared it against human driving records to analyze how well AI and human drivers can be distinguished. In both experimental stages, we applied and compared RNN-based and Transformer-based classifiers.
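To make the trajectory-only classification setting concrete, the sketch below shows a Transformer-encoder classifier over fixed-length windows of the four trajectory features; the window length, model dimensions, and mean-pooling readout are assumptions for illustration rather than the configuration evaluated in the experiments.

```python
import torch
import torch.nn as nn

class TrajectoryTransformerClassifier(nn.Module):
    """Transformer-encoder sketch for trajectory-only driver classification
    (e.g., PPO vs. DQN vs. human). Hyperparameters are assumptions."""

    def __init__(self, traj_dim=4, d_model=64, num_classes=3, max_len=256):
        super().__init__()
        self.input_proj = nn.Linear(traj_dim, d_model)
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, d_model))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, traj):                          # traj: (B, T, 4)
        x = self.input_proj(traj) + self.pos_embed[:, : traj.size(1)]
        x = self.encoder(x)                           # (B, T, d_model)
        return self.head(x.mean(dim=1))               # pool over time -> logits
```

An RNN variant of the same pipeline can be obtained by replacing the encoder with a GRU or LSTM over the same windows, which is the comparison carried out in both experimental stages.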
Initial results in the CarRacing-v2 environment showed that PPO, DQN, and human drivers could be distinguished with 94% accuracy using trajectory information alone, demonstrating that driving records can serve as a reliable alternative when visual information is limited. In the CARLA environment, by systematically introducing hostile conditions such as adverse weather, low light, and uniform noise, we confirmed that the proposed technique remains stable even when visual information is severely degraded. This study is expected to contribute to safer and more intelligent autonomous driving systems by overcoming the information uncertainty that arises in real driving scenarios and enabling stable driver identification.
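For reference, the following sketch illustrates the kind of image corruption that can emulate such hostile conditions, here a hypothetical combination of brightness scaling (low light) and additive uniform noise; the actual corruption parameters and weather effects applied in the CARLA experiments are not reproduced here.

```python
import numpy as np

def degrade_frame(frame, brightness=0.3, noise_level=0.2, rng=None):
    """Illustrative degradation of a front-camera frame: darken the image to
    simulate low light, then add uniform sensor noise. Parameter values are
    assumptions, not the settings used in the experiments."""
    if rng is None:
        rng = np.random.default_rng()
    img = frame.astype(np.float32) / 255.0
    img = img * brightness                                        # low light
    img = img + rng.uniform(-noise_level, noise_level, img.shape) # uniform noise
    return (np.clip(img, 0.0, 1.0) * 255).astype(np.uint8)
```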