Automated Machine Learning (AutoML) for Multi-Model Data Fusion
Abstract
The development of autonomous vehicle (AV) technology depends heavily on sensor systems that provide accurate and reliable perception of the environment. This paper examines the integration of multi-model data fusion, here of visual and audio data, to improve AV perception performance. The research examines the limitations of single-sensor modalities, especially in edge scenarios such as occlusions, bad weather, or poor visibility, and proposes a multi-model fusion strategy to overcome these challenges. Using Automated Machine Learning (AutoML) methods, the system tunes the fusion model to improve accuracy, reduce false negatives, and increase precision for infrequent events. Experimental findings show that the fused model outperforms vision-only and audio-only systems, with a strong decrease in false negatives and a 12% boost in precision for identifying rare objects, including emergency vehicle sirens. The fusion system also meets real-time processing requirements, with a total latency of 32 ms. Robustness testing further shows that the fusion model performs consistently even in noisy environments. This research highlights the advantages of multi-sensor fusion and AutoML for autonomous vehicle systems and presents a path toward more resilient and flexible AV perception capabilities.

Keywords:
Autonomous Vehicles, Multi-model Fusion, Audio-Visual Perception, AutoML, Real-Time Object Detection.

1. Introduction
The emergence of Autonomous Vehicles (AVs) has revolutionized the transportation industry, aiming to enhance the safety, efficiency, and accessibility of driving. One of the integral aspects of AV technology is perceiving the external world through multiple sensors. Historically, AVs have depended on cameras, LiDAR, radar, and other sensors to capture the information needed for navigation and obstacle detection [1]. Yet each of these sensors represents the world differently, and the difficulty is how to fuse this information meaningfully to improve the AV's decision-making capabilities [12].
Human drivers intuitively use a fusion of senses (vision, hearing, and tactile sensation) to drive through intricate environments. This biological inspiration motivates multi-model data fusion in autonomous vehicles. Multi-model fusion is the integration of information from different sensors, including visual, audio, LiDAR, and radar, to produce a richer perception of the environment [8]. This fusion ensures the vehicle can operate in challenging conditions where depending on a single kind of sensor, such as vision, is not enough (e.g., low visibility, occlusions, or glare) [4].
One of the newer methods for enhancing the efficiency and effectiveness of multi-model fusion is the application of Automated Machine Learning (AutoML). AutoML automates model development processes such as hyperparameter tuning, feature selection, and model compression, all of which are essential for developing fusion systems that function well in real time [2]. AutoML algorithms simplify the intricate process of combining data from different modalities, enabling AVs to improve decision-making in dynamic environments where speed and precision are crucial [11].
2. The Role of AutoML in Multi-model Data Fusion
The multi-model data fusion process involves a number of challenges, especially in aligning and fusing heterogeneous data sources. AutoML plays a central role in automating the most important tasks that would otherwise need manual tuning and adjustment, thereby speeding up development and improving model accuracy [10].

2.1 Hyperparameter Optimization for Cross-model Alignment
In multi-model fusion, synchronizing information from various types of sensors is vital for sound decision-making. For instance, visual information from cameras, LiDAR point cloud data, and radar signals all capture the same environment but in unique ways [12]. In order to combine these data sources into one homogeneous output, the models must be properly tuned.
AutoML assists by automating the hyperparameter optimization process. Hyperparameters govern the structure and learning procedure of fusion models, e.g., the number of layers in a neural network, the choice of suitable feature extraction techniques, or the learning rate [14]. AutoML frameworks search for the best configuration automatically, thereby minimizing the need for manual experimentation and enabling better alignment of the various sensor modalities [1].

2.2 Feature Selection to Reduce Dimensionality
Multi-model data can be very complex and high-dimensional. For example:
• LiDAR sensors generate 3D point clouds
• Radar sensors provide distance and velocity measurements
• Visual data consists of high-resolution images
Merging all this data creates a vast amount of information that is not only difficult to handle but can also result in inefficiencies and computational overhead [11].
AutoML is central to feature selection, which serves to decrease the dimensionality of the data. By automatically selecting the most informative features from every sensor modality, AutoML ensures that only the most significant information is used in the fusion process [4]. This decreases computational expense, accelerates processing, and increases the accuracy of the fusion model by concentrating on the most informative features.

2.3 Model Compression for Real-Time Deployment
In autonomous cars, it is critical that the fusion models handle data in real time. The complexity of multi-model models can result in high computational requirements, which might be challenging to satisfy with the processing power of embedded systems in cars [2].
AutoML addresses this through model compression methods (a short pruning and quantization sketch follows this section):
• Pruning: eliminating redundant components of the model
• Quantization: reducing the precision of model parameters
• Knowledge distillation: transferring knowledge from large models to small ones
By automating model compression, AutoML enables the fusion system to run efficiently on resource-constrained devices without compromising performance [14].
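As an illustration of the hyperparameter search described in Section 2.1, the sketch below runs a minimal AutoML-style random search over a fusion model's learning rate, depth, and modality weighting. The search space, the train_and_evaluate stub, and its toy scoring rule are assumptions made for this example, not the configuration used in this study.

```python
import math
import random

# Illustrative search space for a vision+audio fusion model; the ranges are
# assumptions for this sketch, not the values used in the paper.
SEARCH_SPACE = {
    "learning_rate": (1e-4, 1e-2),   # sampled log-uniformly
    "num_layers": [2, 3, 4, 5],      # depth of the fusion head
    "vision_weight": (0.3, 0.9),     # audio weight is 1 - vision_weight
}

def sample_config():
    """Draw one random hyperparameter configuration from the search space."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "learning_rate": 10 ** random.uniform(math.log10(lo), math.log10(hi)),
        "num_layers": random.choice(SEARCH_SPACE["num_layers"]),
        "vision_weight": random.uniform(*SEARCH_SPACE["vision_weight"]),
    }

def train_and_evaluate(config):
    """Placeholder for training the fusion model and returning a validation
    score; a real implementation would fit the model on the fused data."""
    # Toy surrogate: rewards mid-range learning rates and balanced weights.
    return (1.0
            - abs(math.log10(config["learning_rate"]) + 3.0) * 0.1
            - abs(config["vision_weight"] - 0.6) * 0.2)

def random_search(n_trials=20):
    """Basic AutoML-style loop: try configurations, keep the best one."""
    best_score, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = sample_config()
        score = train_and_evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score

if __name__ == "__main__":
    config, score = random_search()
    print("best config:", config, "score:", round(score, 3))
```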
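The pruning and quantization steps listed in Section 2.3 can be sketched with PyTorch's built-in utilities. The two-layer stand-in fusion head and the 30% sparsity target below are illustrative assumptions; they are not the compression settings used in this work.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in fusion head: a small MLP over concatenated vision+audio features.
# The layer sizes are illustrative only.
model = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 4),   # e.g., car / pedestrian / siren / background
)

# Pruning: zero out the 30% smallest-magnitude weights in each Linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruned weights permanent

# Quantization: convert Linear layers to int8 for faster CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Smoke test on a dummy fused feature vector.
features = torch.randn(1, 256)
print(quantized(features).shape)  # torch.Size([1, 4])
```

Dynamic quantization of the linear layers is typically the cheapest compression step to try on CPU-bound embedded targets; knowledge distillation would require a separate teacher-student training loop and is not shown here.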
3. Challenges in Multi-model Fusion
While multi-model data fusion presents impressive promise, several challenges must be overcome for successful integration of different data sources [1]:

3.1 Sensor Heterogeneity
Sensors output data in fundamentally different forms:
• Cameras: 2D images
• LiDAR: 3D point clouds
• Radar: velocity measurements
Integrating these diverse data types requires advanced techniques such as manifold learning, which maps data from each modality into a common latent space for fusion [8]. AutoML expedites this by automatically selecting suitable models for learning unified representations [10].

3.2 Temporal Synchronization
Sensors operate at different sampling rates:
• Cameras: typically 30 FPS
• LiDAR/Radar: often 10-20 Hz
This temporal misalignment can cause fusion errors. AutoML automates time-warping techniques to align sensor timestamps, ensuring all data corresponds to the same time intervals [12].

3.3 Confidence Calibration
Sensor reliability varies with environmental conditions:
• Cameras: less reliable in low light
• Radar: more robust in adverse weather
AutoML handles confidence calibration by dynamically adjusting sensor weights based on real-time performance monitoring [11]. This ensures the fusion system prioritizes the most trustworthy data sources at any given moment [8].

4. Bayesian Fusion Framework
In multi-model fusion, perhaps the best approach for fusing information from disparate sensors is a Bayesian framework [4]. This probabilistic method enables the system to compensate for the uncertainty in sensor information and to make decisions based on the probability of different outcomes [1].
The Bayesian fusion model is expressed as:

P(Decision | x, y) = P(x | Decision) P(y | Decision) P(Decision) / P(x, y)

Where:
• P(Decision | x, y) is the posterior probability of a decision given the sensor evidence
• P(x | Decision) and P(y | Decision) are the sensor likelihood functions [12]
• P(x, y) is the joint probability of the multi-sensor data
• P(Decision) is the prior probability [11]
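A worked instance of the Bayesian rule in Section 4, with invented numbers purely for illustration: suppose the prior probability of an emergency vehicle being present is 0.05, the audio model's likelihood of the observed signal given that event is 0.9, and the vision model's likelihood of the observed image given the event is 0.4.

```python
# Illustrative numbers only; not measurements from the paper.
prior = {"emergency": 0.05, "no_emergency": 0.95}

# Likelihood of the observed audio (x) and image (y) under each hypothesis.
likelihood_audio = {"emergency": 0.90, "no_emergency": 0.05}
likelihood_vision = {"emergency": 0.40, "no_emergency": 0.10}

# Unnormalized posterior: P(x|D) * P(y|D) * P(D) for each decision D.
unnormalized = {
    d: likelihood_audio[d] * likelihood_vision[d] * prior[d]
    for d in prior
}
evidence = sum(unnormalized.values())  # P(x, y), the normalizing constant

posterior = {d: v / evidence for d, v in unnormalized.items()}
print(posterior)  # {'emergency': ~0.79, 'no_emergency': ~0.21}
```

Even though the vision likelihood alone is weak, combining it with the audio evidence pushes the posterior well above the prior, which is the behaviour the fusion system relies on for occluded emergency vehicles.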
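To illustrate the timestamp alignment problem from Section 3.2, the sketch below matches each camera frame to the nearest LiDAR sweep by timestamp and drops pairs whose offset exceeds a tolerance. The 30 FPS / 10 Hz streams and the 25 ms tolerance are assumptions for the example; the paper does not specify its alignment parameters.

```python
import numpy as np

# Simulated timestamps (seconds): camera at ~30 FPS, LiDAR at ~10 Hz.
camera_ts = np.arange(0.0, 2.0, 1 / 30)
lidar_ts = np.arange(0.0, 2.0, 1 / 10)

TOLERANCE = 0.025  # maximum allowed offset between paired samples (25 ms)

def align_streams(ts_a, ts_b, tolerance):
    """Pair each timestamp in ts_a with the nearest timestamp in ts_b,
    keeping only pairs that fall within the tolerance."""
    pairs = []
    for i, t in enumerate(ts_a):
        j = int(np.argmin(np.abs(ts_b - t)))   # nearest-neighbour match
        if abs(ts_b[j] - t) <= tolerance:
            pairs.append((i, j, float(ts_b[j] - t)))
    return pairs

matched = align_streams(camera_ts, lidar_ts, TOLERANCE)
print(f"{len(matched)} of {len(camera_ts)} camera frames matched to LiDAR sweeps")
```

An AutoML layer could tune the tolerance, or a learned time-warping function, rather than fixing it by hand.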
5. AutoML Optimization for Multi-model Fusion
The optimization objective for real-time fusion can be formulated as:

min over w, θ of:  Σ_j L(D(x_j, y_j; w, θ), Label_j) + λ∥θ∥²

Where:
• w represents the modality weighting factors [8]
• θ denotes the fusion model hyperparameters [14]
• L is the loss function measuring prediction accuracy
• λ controls the regularization strength [2]

6. Methodology
This section describes the methodology employed in this research for integrating multi-model data into autonomous vehicle (AV) perception systems. The methodology consists of several key stages, including data acquisition, multi-model fusion, and optimization procedures, all of which are crucial for enhancing the system's detection and classification performance in diverse conditions.

6.1 Data Acquisition
For this study, both visual and auditory datasets were utilized. The visual data was synthetically generated, while the auditory data was created with varying noise levels to simulate real-world environments. This data served as input to the respective modality-specific models, which were subsequently fused for enhanced detection capabilities.

6.1.1 Visual Data Generation
Visual data was synthesized with the Blender 3D rendering tool. This allowed for the creation of realistic scenes that represent typical autonomous driving environments, such as urban streets, highways, and intersections. Objects of interest in these scenes included vehicles, pedestrians, traffic signals, and emergency vehicles. Each image was rendered at a resolution of 1920x1080 pixels, providing high-quality input for the vision model.
In total, 10,000 images were generated, covering a broad spectrum of possible driving scenarios, including varying traffic densities, weather conditions (rain, fog), and lighting variations (daytime, nighttime). These images were used for training and testing the visual model designed for object detection and classification.

6.1.2 Audio Data Generation
The auditory data, specifically siren sounds, was created using the PyAudio library. The audio samples were generated at a sampling rate of 16 kHz, typical for real-time audio processing. Each audio clip lasted 5 seconds, mimicking emergency vehicle sirens encountered in an urban setting.
The generated audio samples were subjected to various noise levels, with signal-to-noise ratios (SNRs) ranging from 0 dB to 20 dB, simulating real-world conditions where background noise might interfere with the audio signal. This diversity of noise levels ensured that the system could handle a variety of auditory inputs under different environmental conditions.

6.2 Multi-model Fusion
The goal of this work is to combine visual and auditory data to improve decision-making accuracy, particularly in challenging scenarios where one modality may fail. The fusion approach used here relies on a weighted ensemble model, where the final output combines the individual contributions from the visual and auditory sensors in a shared latent space. In this formulation:
• ϕ_vision(x) is a transformation that projects visual data x into a shared latent space
• ϕ_audio(y) is a transformation that projects auditory data y into the same latent space
• R^d denotes the shared latent space in which both visual and auditory data are represented
By transforming both types of data into a common latent space, it becomes possible to combine them effectively for more accurate predictions.
Each modality i is also assigned a reliability score r_i derived from its detection performance, where:
• Precision_i is the precision of modality i, indicating the proportion of true positive predictions made by the modality
• Recall_i is the recall of modality i, representing the proportion of actual positives correctly detected by the modality
The reliability score r_i is used to adjust the weight w_i assigned to each modality during the fusion process. This ensures that more reliable modalities contribute more to the final decision.
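The SNR-controlled noise described in Section 6.1.2 can be reproduced with a few lines of NumPy. The synthetic siren below is a plain 700 Hz tone used only as a stand-in signal; the paper's actual siren waveforms and noise recordings are not reproduced here.

```python
import numpy as np

SAMPLE_RATE = 16_000          # 16 kHz, as in Section 6.1.2
DURATION = 5.0                # seconds per clip

def add_noise_at_snr(signal, snr_db):
    """Add white Gaussian noise scaled so the result has the requested SNR."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Stand-in "siren": a 700 Hz tone (illustrative only).
t = np.linspace(0.0, DURATION, int(SAMPLE_RATE * DURATION), endpoint=False)
siren = 0.5 * np.sin(2 * np.pi * 700 * t)

# Generate clips across the 0-20 dB SNR range used in the experiments.
clips = {snr: add_noise_at_snr(siren, snr) for snr in range(0, 21, 5)}
print({snr: round(float(np.std(clip)), 3) for snr, clip in clips.items()})
```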
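One simple instantiation of the weighted ensemble described in Section 6.2 is a convex combination of the two latent projections followed by a shared classifier. The projection dimensions, the softmax decision head, and the example weights below are assumptions for this sketch; the paper does not spell out its exact fusion equation.

```python
import numpy as np

D_LATENT = 64  # assumed dimension of the shared latent space R^d

rng = np.random.default_rng(0)
# Stand-ins for the learned projections phi_vision and phi_audio:
# here, fixed random linear maps from each modality's feature space to R^d.
W_vision = rng.normal(size=(D_LATENT, 256))   # 256-dim visual features (assumed)
W_audio = rng.normal(size=(D_LATENT, 128))    # 128-dim audio features (assumed)
W_head = rng.normal(size=(3, D_LATENT))       # classifier over 3 classes (assumed)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fuse(x_vision, y_audio, w_vision=0.6, w_audio=0.4):
    """Weighted ensemble in the shared latent space:
    z = w_v * phi_vision(x) + w_a * phi_audio(y), then a shared decision head."""
    z = w_vision * (W_vision @ x_vision) + w_audio * (W_audio @ y_audio)
    return softmax(W_head @ z)

probs = fuse(rng.normal(size=256), rng.normal(size=128))
print(probs, probs.sum())  # class probabilities summing to 1
```

In the full system the weights w_i would be set from the reliability scores r_i and tuned by AutoML rather than fixed by hand.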
6.3 Optimization with AutoML
The optimization of the fusion model is performed using AutoML techniques. The goal is to automatically find the optimal weights w_i and hyperparameters θ for the fusion model by minimizing the following objective function:

min over w, θ of:  Σ_j L(D(x_j, y_j; w, θ), Label_j) + λ∥θ∥²

Where:
• L(·,·) is the loss function used to quantify the error between the predicted output and the true label
• D(x_j, y_j; w, θ) is the decision function for the j-th sample, incorporating both visual and auditory data
• λ is the regularization parameter that controls the complexity of the model and prevents overfitting
• θ represents the hyperparameters of the model, such as learning rates and filter sizes
Through AutoML, the optimal combination of weights and hyperparameters is determined automatically, allowing for efficient training of the fusion model.
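A minimal sketch of the objective in Section 6.3, assuming a cross-entropy loss and a grid search over the modality weight; the synthetic dataset, the candidate weights, and the fixed regularization strength are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy validation set: per-modality class probabilities and true labels.
# Purely synthetic; stands in for the outputs of the vision and audio models.
N_CLASSES = 3
vision_probs = rng.dirichlet(np.ones(N_CLASSES), size=200)
audio_probs = rng.dirichlet(np.ones(N_CLASSES), size=200)
labels = rng.integers(0, N_CLASSES, size=200)

LAMBDA = 1e-3  # regularization strength (assumed)

def objective(w_vision, theta):
    """Sum_j L(D(x_j, y_j; w, theta), Label_j) + lambda * ||theta||^2,
    with D as a simple probability blend and L as cross-entropy."""
    fused = w_vision * vision_probs + (1.0 - w_vision) * audio_probs
    nll = -np.log(fused[np.arange(len(labels)), labels] + 1e-12).sum()
    return nll + LAMBDA * float(np.dot(theta, theta))

# AutoML stand-in: exhaustive search over a small grid of weights and a
# single illustrative hyperparameter vector theta.
theta = np.array([0.1, 0.5])
best = min(((objective(w, theta), w) for w in np.linspace(0.0, 1.0, 11)),
           key=lambda t: t[0])
print(f"best vision weight: {best[1]:.1f}, objective: {best[0]:.2f}")
```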
7. Experiments and Results
The experiments evaluate the proposed fusion system under challenging conditions for autonomous vehicles (AVs). They include both synthetic data generation and evaluations of sensor performance, fusion accuracy, real-time processing, and robustness under noise.

7.1 Synthetic Data Generation
We generated synthetic datasets for both visual and auditory inputs to test the fusion system under controlled, replicable conditions.

7.1.1 Visual Data Generation
The visual data was synthesized using the Blender 3D rendering platform. The generated scenes included a range of objects typically encountered by AVs, such as cars, pedestrians, and emergency vehicles like ambulances. These objects were embedded in different types of environments, with various weather conditions such as rain and fog to simulate low-visibility scenarios.
The dataset included 10,000 images, each with a resolution of 1920x1080 pixels. These images were annotated to identify the presence and location of key objects. The diversity of the scenes was intentionally varied to include complex backgrounds, occlusions, and changes in lighting conditions to mirror real-world driving situations.

7.2 Modality-Specific Models

7.2.1 Vision Model (YOLOv8)
We applied the YOLOv8 object detection algorithm to the synthetic visual dataset. YOLOv8 was chosen for its ability to perform real-time object detection with high accuracy. The evaluation metric used was mean Average Precision (mAP) at an Intersection over Union (IoU) threshold of 0.5:
mAP@0.5 = 0.85
From the confusion matrix, we observed the following precision and recall values for the various classes (see Fig. 1):
Class         Precision   Recall
Car           92%         88%
Pedestrian    81%         79%
Siren         78%         72%

Fig. 1: Combined precision and recall metrics by object class. The outer ring shows precision values (Car: 92%, Pedestrian: 81%, Siren: 78%); the inner ring shows recall values (88%, 79%, and 72%, respectively).

These results highlight the strengths of the model in detecting cars and pedestrians but also point to room for improvement in detecting sirens, where auditory input could provide a valuable complement.

7.2.2 Audio Model (CNN)
The audio model was a convolutional neural network (CNN) designed to classify siren sounds. The model was trained on spectrograms of the audio clips, and the network consisted of five convolutional layers. Several performance metrics were calculated for the audio model:
Accuracy = 82% (F1-score = 0.80)
ROC-AUC = 0.89
The CNN performed relatively well at detecting siren sounds but faced challenges in distinguishing them from other types of background noise, especially in scenarios with low SNR.

7.3 Fusion Performance
The fused model was evaluated against the individual modalities, where TP_vision and TP_audio are the true positives from each individual model, TP_both represents the true positives detected by both models, and N is the total number of samples in the test set.
The fusion model resulted in the following improvements:
• A 15% reduction in false negatives compared to the vision-only model
• A 12% increase in precision for rare classes, such as sirens
This highlights how combining complementary sensor data can improve detection accuracy, especially for rare or challenging events.

7.4 Real-Time Processing
Real-time processing is a key requirement for autonomous vehicle systems, where timely decision-making is essential for safe navigation. The total processing time of the fusion system was measured and compared to the real-time requirements of an AV.
The total latency was computed as the sum of the individual latencies for the vision model, the audio model, and the fusion process:
Total Latency = t_vision + t_audio + t_fusion = 15 ms + 10 ms + 7 ms = 32 ms
The breakdown of latencies is as follows:
• YOLOv8 (vision model): 15 ms (optimized using TensorRT)
• Audio CNN: 10 ms (optimized using ONNX Runtime)
• Fusion: 7 ms (performed using matrix operations on the GPU)
With a total processing time of 32 ms, the fusion system meets the latency requirements for real-time AV systems, which typically need to operate under 100 ms.
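The latency budget in Section 7.4 can be checked with a simple timing harness like the one below. The sleep-based stand-ins for the three stages only mimic the reported 15 ms / 10 ms / 7 ms figures; in a real deployment they would be replaced by the TensorRT, ONNX Runtime, and GPU fusion calls.

```python
import time

def vision_stage():
    time.sleep(0.015)   # stand-in for the TensorRT-optimized YOLOv8 pass

def audio_stage():
    time.sleep(0.010)   # stand-in for the ONNX Runtime audio CNN pass

def fusion_stage():
    time.sleep(0.007)   # stand-in for the GPU fusion step

def timed(stage):
    """Run one stage and return its wall-clock latency in milliseconds."""
    start = time.perf_counter()
    stage()
    return (time.perf_counter() - start) * 1000.0

latencies = {s.__name__: timed(s) for s in (vision_stage, audio_stage, fusion_stage)}
total = sum(latencies.values())
print(latencies, f"total = {total:.1f} ms", "OK" if total < 100 else "over budget")
```

The paper reports the sequential sum of the three stages; running the vision and audio stages in parallel would only tighten this budget further.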
7.5 Robustness Analysis
We tested the robustness of the fusion system by adding Gaussian noise to the input data, simulating noisy environmental conditions. The noise was modeled as:
x_noisy = x + N(0, σ²), σ ∈ [0, 20]
where σ is the standard deviation of the Gaussian noise. The fusion system's performance was evaluated at different noise levels, and the fusion model demonstrated superior resilience to noise compared to the individual modalities. This indicates that combining the vision and audio data helps mitigate the impact of noise and provides a more reliable output.

Figure: Accuracy comparison by noise level.

7.7 Failure Modes
We identified two primary failure modes during testing:
• High noise levels: when noise levels exceeded 20 dB, the fusion system's performance deteriorated to that of the vision-only model. This suggests that, under extreme noise conditions, the audio modality no longer provides a significant benefit.
• Temporal misalignment: significant delays (greater than 50 ms) between the visual and audio data led to an 8% decrease in accuracy. This demonstrates the importance of precise temporal synchronization for optimal fusion performance.

7.8 Computational Cost
Finally, we assessed the computational cost of the fusion system by calculating the number of floating-point operations (FLOPs) required for each component. The breakdown is as follows:
• Vision: 45 GFLOPs/frame
• Audio: 3 GFLOPs/clip
• Fusion: 0.5 GFLOPs
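Returning to the robustness protocol of Section 7.5, the sketch below shows an evaluation loop in which Gaussian noise of increasing standard deviation is added to the inputs before scoring a model. The predict stub and the synthetic test batch are placeholders; only the noise model x_noisy = x + N(0, σ²) follows the paper.

```python
import numpy as np

rng = np.random.default_rng(7)
images = rng.uniform(0.0, 255.0, size=(50, 32, 32))   # toy stand-in test batch
labels = rng.integers(0, 2, size=50)

def predict_stub(batch):
    """Placeholder classifier: thresholds the mean pixel value. A real run
    would call the vision-only, audio-only, or fused model here."""
    return (batch.mean(axis=(1, 2)) > 127.5).astype(int)

def accuracy_under_noise(sigma):
    noisy = images + rng.normal(0.0, sigma, size=images.shape)  # x + N(0, sigma^2)
    return float((predict_stub(noisy) == labels).mean())

for sigma in (0, 5, 10, 15, 20):   # sigma in [0, 20] as in Section 7.5
    print(f"sigma={sigma:2d}  accuracy={accuracy_under_noise(sigma):.2f}")
```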
8. Future Work

8.1 Integration of additional sensor modalities
Beyond the vision and audio data studied here, other sensor modalities could strengthen the fusion system. LiDAR, for example, provides precise depth data that helps distinguish between objects in poor conditions such as fog, rain, or night driving. Similarly, tactile sensors might provide feedback for vehicles moving through constricted areas or responding to shifts in the road surface. Subsequent research can explore the integration of these other modalities with vision and audio sensors by leveraging state-of-the-art machine learning algorithms, e.g., deep reinforcement learning or attention mechanisms, to assess the relative importance of each sensor depending on the environment.

8.2 Improved sensor integration methods
This research used a fusion model that averaged visual and audio data through a straightforward weighted ensemble method. More advanced fusion methods, however, may yield better results, especially with difficult datasets coming from different sources. Methods such as attention-based mechanisms, which focus on certain sensor inputs based on context, and multi-task learning methodologies that share knowledge across different types of sensors may provide more intelligent ways of merging the data streams.
In addition, innovative methods for the alignment and synchronization of temporal data received from different sensors are needed to support data that arrives at different rates. Investigating more sophisticated time-warping techniques, as well as eliminating issues associated with sensor drift and delay, will be essential for real-time use in dynamic, complicated environments.

8.3 Handling extreme environmental conditions
The present system was tested in a controlled environment, where it was subjected to varying noise levels and to conditions such as occlusions and glare. However, real-world environments present a much broader spectrum of challenges than the laboratory. Autonomous vehicles will need to cope with various adverse conditions, such as driving on rainy or snowy days, in direct sunlight, and through densely populated cityscapes with many moving objects. Future work should center on assessing the performance of multi-model fusion systems under extreme weather conditions. This may include the development of simulation environments that are closer to actual conditions, or obtaining data from self-driving cars driven in different weather and traffic scenarios. It is also important to improve the robustness of fusion models to sudden changes in illumination, noise, and object motion, since this will be critical for real-world deployment.

8.4 Real-time adaptation and learning
One promising field of future research is the development of real-time adaptive systems capable of learning from the environment as the vehicle is driven. With machine learning algorithms such as online learning and meta-learning, the fusion system could dynamically adapt and improve its performance as it accumulates more data in real time. For instance, the fusion system may adjust the weighting of modalities according to sensor reliability, which might vary with road conditions or traffic situations [17]. This capability can also be used for sensor configuration optimization, allowing the vehicle to turn specific sensors on or off (e.g., reducing the use of audio in quiet conditions) depending on the situation. This would not only improve performance but also conserve computational resources [13].
While multi-model fusion has brought progress in accuracy and resilience, computational efficiency remains an issue, especially in real-time systems. The current fusion system is computationally intensive, especially in its vision and audio components. Reducing the floating-point operations (FLOPs) required for computation while maintaining performance remains a major challenge [16]. Future work may focus on developing more efficient fusion algorithms or leveraging breakthroughs in hardware, including edge computing, domain-specific AI chips, or low-power sensors. Pruning or quantization can also be used to reduce the size and computational needs of deep learning models, making them more deployable on embedded AV systems [15].
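As a rough sketch of the reliability-driven weight adaptation mentioned in Section 8.4, the snippet below keeps an exponential moving average of each modality's recent agreement with the final decision and renormalizes the fusion weights from it. The update rule and its smoothing factor are illustrative assumptions, not a mechanism evaluated in this paper.

```python
ALPHA = 0.1  # smoothing factor for the reliability estimate (assumed)

class AdaptiveWeights:
    """Track per-modality reliability online and derive fusion weights."""

    def __init__(self, modalities):
        self.reliability = {m: 0.5 for m in modalities}  # neutral prior

    def update(self, agreed):
        """`agreed` maps modality -> True/False: did it match the fused decision?"""
        for m, ok in agreed.items():
            self.reliability[m] = (1 - ALPHA) * self.reliability[m] + ALPHA * float(ok)

    def weights(self):
        total = sum(self.reliability.values())
        return {m: r / total for m, r in self.reliability.items()}

w = AdaptiveWeights(["vision", "audio"])
# Simulated feedback: vision keeps agreeing with the fused output, audio does
# not (e.g., a stretch of very noisy audio).
for _ in range(10):
    w.update({"vision": True, "audio": False})
print(w.weights())  # vision weight drifts upward, audio weight downward
```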
8.5 Improved certainty estimation
The need for dynamic confidence calibration across modalities brings to the forefront the necessity of sustaining trustworthy fusion in ambiguous or uncertain situations. The experiments clearly indicate that the performance of the system depends on the stability of the vision and audio sensors, which may be influenced by various environmental factors. More sophisticated methods of confidence calibration, including Bayesian inference and uncertainty modeling, can be explored in future research to dynamically adjust the fusion process [9]. In addition, confidence scores could be used more effectively to inform decision-making, allowing the AV to make better decisions when presented with incongruent sensor information, for example, where the audio model perceives a siren but the vision model fails to perceive the source owing to occlusions [6].

8.6 Safety and ethical considerations
As autonomous vehicles are deployed in practice, their safety and ethical implications become increasingly important. Multi-model sensor fusion can contribute to safety by providing additional layers of information, but it also raises new issues with data privacy, transparency of decision-making, and accountability. The incorporation of additional sensors and data sources necessitates the development of strong ethical frameworks to guarantee that these systems function fairly, transparently, and in accordance with legal and regulatory requirements [7]. Future studies should address these issues by developing systems that not only excel at fusing various sensory inputs but also enable users and stakeholders to comprehend the system's decision-making processes. This can be done through the development of explainable AI (XAI) methods tailored for multi-model fusion systems in AVs [5].

8.7 Integration with urban mobility systems
The long-term goal of AV technology is to create a smooth transportation system that maximizes efficiency and safety in cities. This paper focuses mainly on the sensory aspect of AV systems, but further research might consider how multi-model fusion systems fit into the broader idea of smart cities. This includes connecting autonomous vehicles with other transportation systems, including public transit and traffic management systems, to enable cooperative decision-making. AV collaboration may involve sharing sensory data or coordinating actions in real time, most notably in complex scenarios such as intersection control, emergency response, or avoiding pedestrian collisions. Future studies may explore how multi-model fusion systems can be extended to allow vehicles to interact with other vehicles or infrastructure in real time [3].
9. Conclusion
This study focused on the integration of multiple sensor modalities, particularly visual and auditory data, to enhance the perception systems of autonomous vehicles (AVs). The primary aim was to explore how combining different types of sensory data could improve the vehicle's ability to understand its environment, especially in complex scenarios where a single sensor modality might fall short. The results indicate that multi-model fusion offers a viable solution to several challenges faced by AVs, including situations involving visual occlusions, glare, or difficult weather conditions.
The experimental results demonstrated significant performance improvements when combining vision and audio. The visual model, YOLOv8, achieved a mean average precision (mAP) of 0.85, while the auditory model, a convolutional neural network (CNN), yielded an accuracy of 82%. When fused, these models resulted in a 15% reduction in false negatives compared to the vision-only model and a 12% increase in precision, particularly for rare events such as emergency sirens. Additionally, the fusion system was able to meet the real-time processing requirements with a total latency of just 32 ms, showing its practical feasibility for autonomous driving. Despite the introduction of noise, the fusion system demonstrated robust performance, maintaining its accuracy even as the signal-to-noise ratio decreased.
The findings of this research underline the importance of multi-sensor integration in autonomous vehicle systems. By combining data from both visual and auditory sources, the system gains a richer understanding of the environment, which improves its decision-making capabilities in more challenging conditions. AutoML techniques were used to optimize the fusion models, which ensures that the system is adaptable to a variety of sensor configurations and dynamic environmental conditions. Audio sensors, being less affected by environmental factors like fog or poor lighting, provide a complementary strength to the visual sensors, making the fusion system more reliable.
While the results are promising, there are still several avenues for future work. For example, expanding the fusion framework to incorporate other sensor types, such as radar or thermal imaging, could further improve robustness. These additional sensors would be particularly useful in scenarios where visual and auditory sensors may not provide sufficient data, such as in extreme weather conditions. Additionally, the experiments conducted in this study relied on synthetic data, and future research should focus on testing the system with real-world sensor data to ensure its practical viability in real autonomous vehicles operating in live traffic.
Further advancements in AutoML could also play a crucial role in the continuous adaptation of autonomous systems. By integrating mechanisms like online learning, the fusion model could adjust dynamically as new data is acquired, optimizing the system in real time. Lastly, there is room to improve the computational efficiency of the fusion system. While the system demonstrated satisfactory latency and accuracy, optimizing the model to reduce computational overhead will be crucial for deployment on embedded platforms with limited resources. Exploring model compression techniques, such as pruning or knowledge distillation, could help address this challenge and make the system more feasible for real-world applications.
In summary, multi-model data fusion for autonomous vehicle perception has shown great potential in enhancing both the accuracy and resilience of the system. By integrating vision and auditory data, the AV system can overcome the limitations of individual sensors and perform better in challenging environments. The use of AutoML optimization further ensures the system's ability to adapt to varying sensor configurations, making it a promising candidate for real-world autonomous vehicles. As research continues, further testing with real-world data, the inclusion of additional sensor modalities, and computational optimizations will be critical steps in bringing robust, multi-sensor autonomous driving systems closer to deployment.

References
[1] F. Butt, J. Chattha, J. Ahmad, M. Zia, M. Rizwan, and I. Naqvi. On the integration of enabling wireless technologies and sensor fusion for next-generation connected and autonomous vehicles. IEEE Access, 10:14643–14668, 2022.
[2] J. Gu, A. Lind, T. Chhetri, M. Bellone, and R. Sell. End-to-end multimodal sensor dataset collection framework for autonomous vehicles. In 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), pages 2792–2797, Bilbao, Spain, 2023. IEEE.
[3] Zhiren Huang, Ximan Ling, Pu Wang, Fan Zhang, Yingping Mao, Tao Lin, and Fei-Yue Wang. Modeling real-time human mobility based on mobile phone and transportation data fusion. Transportation Research Part C: Emerging Technologies, 96:251–269, 2018.
[4] Y. Li, Z. Zhao, Y. Chen, and R. Tian. A practical large-scale roadside multi-view multi-sensor spatial synchronization framework for intelligent transportation systems. TechRxiv, 2023.
[5] Yanfang Ling, Jiyong Li, Lingbo Li, and Shangsong Liang. Bayesian domain adaptation with Gaussian mixture domain-indexing. Advances in Neural Information Processing Systems, 37:87226–87254, 2024.
[6] Tyron L. Louw, Natasha Merat, and Andrew Hamish Jamson. Engaging with highly automated driving: To be or not to be in the loop? In 8th International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Leeds, 2015.
[7] Tauheed Khan Mohd, Nicole Nguyen, and Ahmad Y. Javaid. Multi-modal data fusion in enhancing human-machine interaction for robotic applications: a survey. arXiv preprint arXiv:2202.07732, 2022.
[8] R. Nabati, L. Harris, and H. Qi. CFTrack: Center-based radar and camera fusion for 3D multi-object tracking. arXiv preprint arXiv:2107.05150, 2021.
[9] R. Nabati and H. Qi. CenterFusion: Center-based radar and camera fusion for 3D object detection. In IEEE WACV, pages 1527–1536, 2021.
[10] N. Piperigkos, A. Lalos, and K. Berberidis. Graph Laplacian extended Kalman filter for connected and automated vehicles localization. In IEEE ICPS, pages 328–333, 2021.
[11] D. Qiao and F. Zulkernine. Adaptive feature fusion for cooperative perception using LiDAR point clouds. In IEEE WACV, 2023.
[12] C. Wang, S. Liu, X. Wang, and X. Lan. Time synchronization and space registration of roadside LiDAR and camera. Electronics, 12(3):537, 2023.
[13] Haojie Wang, Jidong Zhai, Mingyu Gao, Feng Zhang, Tuowei Wang, Zixuan Ma, Shizhi Tang, Liyan Zheng, Wen Wang, Kaiyuan Rong, et al. Optimizing DNNs with partially equivalent transformations and automated corrections. IEEE Transactions on Computers, 72(12):3546–3560, 2023.
[14] A. Yusupov, S. Park, and J. Kim. Synchronized delay measurement of multi-stream analysis over data concentrator units. Electronics, 14(1):81, 2024.
[15] Zhaoyun Zhang and Jingpeng Li. A review of
artificial intelligence in embedded systems.
Micromachines, 14(5):897, 2023.
[16] Fei Zhao, Chengcui Zhang, and Baocheng Geng. Deep multimodal data fusion. ACM Computing Surveys, 56(9):1–36, 2024.
[17] Hao Zhao, Yuejiang Liu, Alexandre Alahi, and
Tao Lin. On pitfalls of test-time adaptation. arXiv
preprint arXiv:2306.03536, 2023.