Journal of Scientific Innovation and Advanced Research (JSIAR)

Peer-reviewed | Open Access | Multidisciplinary

Published: November 2025 | Volume: 1, Issue: 8 | Pages: 456-466

SecureVision: A Multimodal Deepfake and Spoofing Detection Framework Integrating MobileNet and ResNeXt for Intelligent Intersection Surveillance

Review Article
Prem¹
¹Department of Computer Science and Engineering, Noida International University, Greater Noida, India
Angad Kumar²
²Department of Computer Science and Engineering, Noida International University, Greater Noida, India
Sahil Kumar³
³Department of Computer Science and Engineering, Noida International University, Greater Noida, India
Nishant Gaur⁴
⁴Department of Computer Science and Engineering, Noida International University, Greater Noida, India
*Author for correspondence: Prem
Department of Computer Science and Engineering, Noida International University, Greater Noida, India
E-mail ID: prem042004@gmail.com

ABSTRACT

The rapid deployment of smart intersections and Vehicle-to-Everything (V2X) communication has significantly improved traffic safety and situational awareness, yet these systems remain highly vulnerable to visual and sensor-based spoofing attacks. Attackers can use deepfake technologies or inject falsified sensor data to mislead intelligent surveillance networks, compromising decision-making and creating road safety hazards. To address this challenge, this paper introduces SecureVision, a multimodal anti-spoofing and deepfake detection framework that integrates the MobileNet and ResNeXt architectures. The proposed system fuses spatial, temporal, and contextual features from camera feeds and V2X signals to authenticate inputs in real time. By combining MobileNet’s efficient, lightweight visual processing with ResNeXt’s capacity for rich feature aggregation, SecureVision achieves both computational scalability and high detection precision. Experiments on benchmark deepfake datasets and simulated V2X spoofing datasets show that SecureVision attains an overall detection accuracy of 98.3% with an average inference latency of 42 ms per frame, making it suitable for edge-based deployment in intelligent traffic environments. The results confirm that multimodal fusion is substantially more robust to adversarial manipulation than unimodal systems. Overall, this research establishes a secure, adaptive, real-time framework for protecting smart intersection infrastructure against deepfake and sensor spoofing threats, supporting trustworthy AI-driven surveillance in next-generation urban mobility ecosystems.

Keywords: Deepfake Detection, Anti-Spoofing, Multimodal Fusion, MobileNet, ResNeXt, V2X Security, Intelligent Transportation Systems, Smart Intersections
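
To make the idea of multimodal fusion in the abstract concrete, the following is a minimal sketch of score-level (late) fusion. It is not the SecureVision implementation: the abstract describes fusing features rather than scores, and gives no fusion rule. The names `ModalityScores`, `fuse_scores`, and `is_spoofed`, the weights, and the 0.5 threshold are all hypothetical and chosen only for illustration. Each input is assumed to be a spoof probability in [0, 1] produced by a separate branch.

```python
from dataclasses import dataclass


@dataclass
class ModalityScores:
    """Per-branch spoof probabilities in [0, 1] (hypothetical inputs)."""
    mobilenet: float  # assumed output of a lightweight visual branch
    resnext: float    # assumed output of a deeper visual branch
    v2x: float        # assumed output of a V2X message-plausibility check


def fuse_scores(scores, weights=(0.3, 0.4, 0.3)):
    """Weighted average of the three branch scores (late fusion).

    The weights are illustrative assumptions, not values from the paper.
    """
    values = (scores.mobilenet, scores.resnext, scores.v2x)
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("each score must be a probability in [0, 1]")
    if len(weights) != len(values) or any(w < 0 for w in weights):
        raise ValueError("expected three non-negative weights")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must not all be zero")
    # Normalise by the weight sum so the result stays in [0, 1].
    return sum(w * v for w, v in zip(weights, values)) / total


def is_spoofed(scores, threshold=0.5, weights=(0.3, 0.4, 0.3)):
    """Flag the input as spoofed when the fused score reaches the threshold."""
    return fuse_scores(scores, weights) >= threshold


if __name__ == "__main__":
    frame = ModalityScores(mobilenet=0.9, resnext=0.9, v2x=0.1)
    print(round(fuse_scores(frame), 2), is_spoofed(frame))
```

In this sketch a single confident branch cannot decide the outcome on its own; for example, a high V2X score alone (0.9) with low visual scores (0.1 each) fuses to 0.34 and falls below the assumed threshold. A feature-level fusion network, as the abstract suggests, would instead combine intermediate embeddings before classification.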