-
Evaluating the Generation of Spatial Relations in Text and Image Generative Models
Authors:
Shang Hong Sim,
Clarence Lee,
Alvin Tan,
Cheston Tan
Abstract:
Understanding spatial relations is a crucial cognitive ability for both humans and AI. While current research has predominantly focused on benchmarking text-to-image (T2I) models, we propose a more comprehensive evaluation that includes both T2I models and Large Language Models (LLMs). As spatial relations are naturally understood in a visuo-spatial manner, we develop an approach to convert LLM outputs into an image, thereby allowing us to evaluate both T2I models and LLMs visually. We examine the spatial relation understanding of 8 prominent generative models (3 T2I models and 5 LLMs) on a set of 10 common prepositions, and assess the feasibility of automatic evaluation methods. Surprisingly, we find that T2I models achieve only subpar performance despite their impressive general image-generation abilities. Even more surprisingly, our results show that LLMs are significantly more accurate than T2I models in generating spatial relations, despite being primarily trained on textual data. We examine reasons for model failures and highlight gaps that can be filled to enable more spatially faithful generations.
Submitted 12 November, 2024;
originally announced November 2024.
-
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
Authors:
Maojia Song,
Shang Hong Sim,
Rishabh Bhardwaj,
Hai Leong Chieu,
Navonil Majumder,
Soujanya Poria
Abstract:
LLMs are an integral component of retrieval-augmented generation (RAG) systems. While many studies focus on evaluating the overall quality of end-to-end RAG systems, there is a gap in understanding the appropriateness of LLMs for the RAG task. To address this, we introduce Trust-Score, a holistic metric that evaluates the trustworthiness of LLMs within the RAG framework. Our results show that various prompting methods, such as in-context learning, fail to effectively adapt LLMs to the RAG task as measured by Trust-Score. Consequently, we propose Trust-Align, a method to align LLMs for improved Trust-Score performance. 26 out of 27 models aligned using Trust-Align substantially outperform competitive baselines on ASQA, QAMPARI, and ELI5. Specifically, in LLaMA-3-8b, Trust-Align outperforms FRONT on ASQA (up 12.56), QAMPARI (up 36.04), and ELI5 (up 17.69). Trust-Align also significantly enhances models' ability to correctly refuse and provide quality citations. We also demonstrate the effectiveness of Trust-Align across different open-weight models, including the LLaMA series (1b to 8b), Qwen-2.5 series (0.5b to 7b), and Phi3.5 (3.8b). We release our code at https://github.com/declare-lab/trust-align.
Submitted 24 April, 2025; v1 submitted 17 September, 2024;
originally announced September 2024.
-
Wafer-scale synthesis and transfer of graphene films
Authors:
Youngbin Lee,
Sukang Bae,
Houk Jang,
Sukjae Jang,
Shou-En Zhu,
Sung Hyun Sim,
Young Il Song,
Byung Hee Hong,
Jong-Hyun Ahn
Abstract:
We developed a means to produce wafer-scale, high-quality graphene films as large as a 3-inch wafer on Ni and Cu films under ambient pressure, and to transfer them onto arbitrary substrates through instantaneous etching of the metal layers. We also demonstrated applications of the large-area graphene films in the batch fabrication of field-effect transistor (FET) arrays and stretchable strain gauges showing extraordinary performance. The transistors showed hole and electron mobilities of 1,100 cm²/Vs and 550 cm²/Vs, respectively, at a drain bias of -0.75 V. The piezo-resistance gauge factor of the strain sensor was ~6.1. These methods represent a significant step toward the realization of wafer-scale graphene devices as well as applications in optoelectronics and flexible and stretchable electronics.
Submitted 26 October, 2009;
originally announced October 2009.