A Unified Dual-Target Learning Paradigm for Vehicle Acoustic Forensics Using Self-Supervised Transformer Representations

D. Ramesh; B. Poojitha; K. Chiranjeevi; N. Siva Nagamani

doi:10.64751/ijaene.2026.v2.n2(1).417

Keywords:

Agentic AI, Electric Motor Monitoring, Predictive Maintenance, Anomaly Detection, Time-Series Analysis

Abstract

The rapid growth of intelligent transportation systems and the rising complexity of modern vehicles have intensified the need for precise acoustic-based vehicle state identification. Global road traffic is expected to triple acoustic data generation by 2030, and nearly 85% of vehicle diagnostics still rely on subjective auditory assessment. Acoustic signatures captured from vehicles provide valuable, nonintrusive insights that support accident reconstruction, predictive maintenance, workshop diagnostics, smart city monitoring, and autonomous navigation. Traditional manual sound inspection, however, suffers from human bias, limited hearing range, environmental interference, and inconsistent expertise. To address these limitations, this work introduces a comprehensive vehicle-sound analysis pipeline. It begins with a labeled dataset covering four states (braking, combined, idle, and startup), followed by preprocessing steps of segmentation, denoising, normalization, and embedding generation. The system uses the Waveform Language Model (WavLM) to extract robust, noise-resistant acoustic representations, which are then evaluated with several existing classifiers: Logistic Regression, Categorical Boosting (CatBoost), Support Vector Classifier (SVC), K-Nearest Neighbors (KNN), Decision Tree, Linear and Quadratic Discriminant Analysis, Histogram-based Gradient Boosting (HGB), and Extra Trees Classifier (ETC). To enhance nonlinear learning capability and improve decision boundaries, a proposed Tree-based Generation Additive Model (TGAM) is integrated into the classification pipeline. The framework assigns each input to one of the four target classes (braking, combined, idle, or startup), providing a scalable, accurate, and data-driven solution for real-world vehicle acoustic forensics and operational state monitoring.
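
The flow described in the abstract (segment, normalize, embed, classify) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation. Several things here are assumptions made purely for illustration: the 16 kHz sampling rate, the window and hop sizes, the synthetic sine tones standing in for recordings, and above all `toy_embedding`, a hypothetical two-number feature (RMS energy and zero-crossing rate) used in place of real WavLM embeddings. Denoising is omitted, and of the listed classifiers only a simple KNN is shown.

```python
import math
from collections import Counter

SAMPLE_RATE = 16000  # assumed sampling rate; the abstract does not state one


def segment(signal, win, hop):
    """Split a waveform into fixed-length windows, dropping a trailing partial window."""
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, hop)]


def peak_normalize(seg):
    """Scale a segment so its largest absolute sample is 1.0 (silence is left unchanged)."""
    peak = max(abs(x) for x in seg)
    return [x / peak for x in seg] if peak > 0 else list(seg)


def toy_embedding(seg):
    """Hypothetical stand-in for a WavLM embedding: (RMS energy, zero-crossing rate)."""
    rms = math.sqrt(sum(x * x for x in seg) / len(seg))
    crossings = sum(1 for a, b in zip(seg, seg[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (len(seg) - 1))


def embed_clip(signal, win=400, hop=400):
    """Segment a clip, normalize each segment, and embed it."""
    return [toy_embedding(peak_normalize(s)) for s in segment(signal, win, hop)]


def knn_predict(train, query, k=3):
    """Majority label among the k training vectors closest to the query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def classify_clip(train, signal, k=3):
    """Classify every segment of a clip, then take a majority vote over segments."""
    votes = Counter(knn_predict(train, e, k) for e in embed_clip(signal))
    return votes.most_common(1)[0][0]


def tone(freq_hz, n=1600, phase=0.3):
    """Synthetic sine tone used only as placeholder audio."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE + phase) for t in range(n)]


# Placeholder "recordings": one synthetic tone per class, frequencies chosen arbitrarily.
PLACEHOLDER_TONES = {"idle": 100, "startup": 400, "combined": 1000, "braking": 2000}
train = [(e, label) for label, f in PLACEHOLDER_TONES.items() for e in embed_clip(tone(f))]

# A 420 Hz query tone sits closest to the 400 Hz placeholder class.
prediction = classify_clip(train, tone(420))
print(prediction)
```

In a realistic setting, `toy_embedding` would be replaced by vectors pooled from a pretrained WavLM model, and the KNN by whichever classifier is under evaluation. The segment-level majority vote is one possible way to get a clip-level label, not something the abstract specifies.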