-
LLMs Can Check Their Own Results to Mitigate Hallucinations in Traffic Understanding Tasks
Authors:
Malsha Ashani Mahawatta Dona,
Beatriz Cabrero-Daniel,
Yinan Yu,
Christian Berger
Abstract:
Today's Large Language Models (LLMs) have showcased exemplary capabilities, ranging from simple text generation to advanced image processing. Such models are currently being explored for in-vehicle services such as supporting perception tasks in Advanced Driver Assistance Systems (ADAS) or Autonomous Driving (AD) systems, given the LLMs' capabilities to process multi-modal data. However, LLMs often generate nonsensical or unfaithful information, known as "hallucinations": a notable issue that needs to be mitigated. In this paper, we systematically explore the adoption of SelfCheckGPT to spot hallucinations by three state-of-the-art LLMs (GPT-4o, LLaVA, and Llama3) when analysing visual automotive data from two sources: Waymo Open Dataset, from the US, and PREPER CITY dataset, from Sweden. Our results show that GPT-4o is better at generating faithful image captions than LLaVA, whereas the former demonstrated leniency in mislabeling non-hallucinated content as hallucinations compared to the latter. Furthermore, the analysis of the performance metrics revealed that the dataset type (Waymo or PREPER CITY) did not significantly affect the quality of the captions or the effectiveness of hallucination detection. However, the models showed better performance rates over images captured during daytime, compared to during dawn, dusk or night. Overall, the results show that SelfCheckGPT and its adaptation can be used to filter hallucinations in generated traffic-related image captions for state-of-the-art LLMs.
Submitted 19 September, 2024;
originally announced September 2024.
-
Tapping in a Remote Vehicle's onboard LLM to Complement the Ego Vehicle's Field-of-View
Authors:
Malsha Ashani Mahawatta Dona,
Beatriz Cabrero-Daniel,
Yinan Yu,
Christian Berger
Abstract:
Today's advanced automotive systems are turning into intelligent Cyber-Physical Systems (CPS), bringing computational intelligence to their cyber-physical context. Such systems power advanced driver assistance systems (ADAS) that observe a vehicle's surroundings for their functionality. However, such ADAS have clear limitations in scenarios when the direct line-of-sight to surrounding objects is occluded, like in urban areas. Imagine now automated driving (AD) systems that ideally could benefit from other vehicles' field-of-view in such occluded situations to increase traffic safety if, for example, locations about pedestrians can be shared across vehicles. Current literature suggests vehicle-to-infrastructure (V2I) via roadside units (RSUs) or vehicle-to-vehicle (V2V) communication to address such issues that stream sensor or object data between vehicles. When considering the ongoing revolution in vehicle system architectures towards powerful, centralized processing units with hardware accelerators, foreseeing the onboard presence of large language models (LLMs) to improve the passengers' comfort when using voice assistants becomes a reality. We are suggesting and evaluating a concept to complement the ego vehicle's field-of-view (FOV) with another vehicle's FOV by tapping into their onboard LLM to let the machines have a dialogue about what the other vehicle "sees". Our results show that very recent versions of LLMs, such as GPT-4V and GPT-4o, understand a traffic situation to an impressive level of detail, and hence, they can be used even to spot traffic participants. However, better prompts are needed to improve the detection quality and future work is needed towards a standardised message interchange format between vehicles.
Submitted 20 August, 2024;
originally announced August 2024.
-
Evaluating and Enhancing Trustworthiness of LLMs in Perception Tasks
Authors:
Malsha Ashani Mahawatta Dona,
Beatriz Cabrero-Daniel,
Yinan Yu,
Christian Berger
Abstract:
Today's advanced driver assistance systems (ADAS), like adaptive cruise control or rear collision warning, are finding broader adoption across vehicle classes. Integrating such advanced, multimodal Large Language Models (LLMs) on board a vehicle, which are capable of processing text, images, audio, and other data types, may have the potential to greatly enhance passenger comfort. Yet, an LLM's hallucinations are still a major challenge to be addressed. In this paper, we systematically assessed potential hallucination detection strategies for such LLMs in the context of object detection in vision-based data on the example of pedestrian detection and localization. We evaluate three hallucination detection strategies applied to two state-of-the-art LLMs, the proprietary GPT-4V and the open LLaVA, on two datasets (Waymo/US and PREPER CITY/Sweden). Our results show that these LLMs can describe a traffic situation to an impressive level of detail but are still challenged for further analysis activities such as object localization. We evaluate and extend hallucination detection approaches when applying these LLMs to video sequences in the example of pedestrian detection. Our experiments show that, at the moment, the state-of-the-art proprietary LLM performs much better than the open LLM. Furthermore, consistency enhancement techniques based on voting, such as the Best-of-Three (BO3) method, do not effectively reduce hallucinations in LLMs that tend to exhibit high false negatives in detecting pedestrians. However, extending the hallucination detection by including information from the past helps to improve results.
Submitted 18 July, 2024;
originally announced August 2024.
-
AirDnD -- Asynchronous In-Range Dynamic and Distributed Network Orchestration Framework
Authors:
Malsha Ashani Mahawatta Dona,
Christian Berger,
Yinan Yu
Abstract:
The increasing usage of IoT devices has generated an extensive volume of data which resulted in the establishment of data centers with well-structured computing infrastructure. Reducing underutilized resources of such data centers can be achieved by monitoring the tasks and offloading them across various compute units. This approach can also be used in mini mobile data ponds generated by edge devices and smart vehicles. This research aims to improve and utilize the usage of computing resources in distributed edge devices by forming a dynamic mesh network. The nodes in the mesh network shall share their computing tasks with another node that possesses unused computing resources. This proposed method ensures the minimization of data transfer between entities. The proposed AirDnD vision will be applied to a practical scenario relevant to an autonomous vehicle that approaches an intersection commonly known as "looking around the corner" in related literature, collecting essential computational results from nearby vehicles to enhance its perception. The proposed solution consists of three models that transform growing amounts of geographically distributed edge devices into a living organism.
Submitted 15 July, 2024;
originally announced July 2024.