-
TranslationCorrect: A Unified Framework for Machine Translation Post-Editing with Predictive Error Assistance
Authors:
Syed Mekael Wasti,
Shou-Yi Hung,
Christopher Collins,
En-Shiun Annie Lee
Abstract:
Machine translation (MT) post-editing and research data collection often rely on inefficient, disconnected workflows. We introduce TranslationCorrect, an integrated framework designed to streamline these tasks. TranslationCorrect combines MT generation using models like NLLB, automated error prediction using models like XCOMET or LLM APIs (providing detailed reasoning), and an intuitive post-editing interface within a single environment. It is built with human-computer interaction (HCI) principles in mind to minimize cognitive load, as confirmed by a user study. For translators, it enables efficient error correction and batch translation. For researchers, TranslationCorrect exports high-quality span-based annotations in the Error Span Annotation (ESA) format, using an error taxonomy inspired by Multidimensional Quality Metrics (MQM). These outputs are compatible with state-of-the-art error detection models and suitable for training MT or post-editing systems. Our user study confirms that TranslationCorrect significantly improves translation efficiency and user satisfaction over traditional annotation methods.
Submitted 23 June, 2025;
originally announced June 2025.
-
LegiGPT: Party Politics and Transport Policy with Large Language Model
Authors:
Hyunsoo Yun,
Eun Hak Lee
Abstract:
Given the significant influence of lawmakers' political ideologies on legislative decision-making, analyzing their impact on transportation-related policymaking is of critical importance. This study introduces a novel framework that integrates a large language model (LLM) with explainable artificial intelligence (XAI) to analyze transportation-related legislative proposals. Legislative bill data from South Korea's 21st National Assembly were used to identify key factors shaping transportation policymaking. These include political affiliations and sponsor characteristics. The LLM was employed to classify transportation-related bill proposals through a stepwise filtering process based on keywords, sentences, and contextual relevance. XAI techniques were then applied to examine the relationships between political party affiliation and associated attributes. The results revealed that the number and proportion of conservative and progressive sponsors, along with district size and electoral population, were critical determinants shaping legislative outcomes. These findings suggest that both parties contributed to bipartisan legislation through different forms of engagement, such as initiating or supporting proposals. This integrated approach offers a valuable tool for understanding legislative dynamics and guiding future policy development, with broader implications for infrastructure planning and governance.
Submitted 27 June, 2025; v1 submitted 19 June, 2025;
originally announced June 2025.
-
Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability
Authors:
Genta Indra Winata,
David Anugraha,
Emmy Liu,
Alham Fikri Aji,
Shou-Yi Hung,
Aditya Parashar,
Patrick Amadeus Irawan,
Ruochen Zhang,
Zheng-Xin Yong,
Jan Christian Blaise Cruz,
Niklas Muennighoff,
Seungone Kim,
Hanyang Zhao,
Sudipta Kar,
Kezia Erina Suryoraharjo,
M. Farid Adilazuarda,
En-Shiun Annie Lee,
Ayu Purwarianti,
Derry Tanti Wijaya,
Monojit Choudhury
Abstract:
High-quality datasets are fundamental to training and evaluating machine learning models, yet their creation, especially with accurate human annotations, remains a significant challenge. Many dataset paper submissions lack originality, diversity, or rigorous quality control, and these shortcomings are often overlooked during peer review. Submissions also frequently omit essential details about dataset construction and properties. While existing tools such as datasheets aim to promote transparency, they are largely descriptive and do not provide standardized, measurable methods for evaluating data quality. Similarly, metadata requirements at conferences promote accountability but are inconsistently enforced. To address these limitations, this position paper advocates for the integration of systematic, rubric-based evaluation metrics into the dataset review process, particularly as submission volumes continue to grow. We also explore scalable, cost-effective methods for synthetic data generation, including dedicated tools and LLM-as-a-judge approaches, to support more efficient evaluation. As a call to action, we introduce DataRubrics, a structured framework for assessing the quality of both human- and model-generated datasets. Leveraging recent advances in LLM-based evaluation, DataRubrics offers a reproducible, scalable, and actionable solution for dataset quality assessment, enabling both authors and reviewers to uphold higher standards in data-centric research. We also release code to support reproducibility of LLM-based evaluations at https://github.com/datarubrics/datarubrics.
Submitted 3 June, 2025; v1 submitted 2 June, 2025;
originally announced June 2025.
-
Uncertainty-Aware Genomic Classification of Alzheimer's Disease: A Transformer-Based Ensemble Approach with Monte Carlo Dropout
Authors:
Taeho Jo,
Eun Hye Lee,
Alzheimer's Disease Sequencing Project
Abstract:
INTRODUCTION: Alzheimer's disease (AD) is genetically complex, complicating robust classification from genomic data. METHODS: We developed a transformer-based ensemble model (TrUE-Net) using Monte Carlo Dropout for uncertainty estimation in AD classification from whole-genome sequencing (WGS). We combined a transformer that preserves single-nucleotide polymorphism (SNP) sequence structure with a concurrent random forest using flattened genotypes. An uncertainty threshold separated samples into an uncertain (high-variance) group and a more certain (low-variance) group. RESULTS: We analyzed 1050 individuals, holding out half for testing. Overall accuracy and area under the receiver operating characteristic (ROC) curve (AUC) were 0.6514 and 0.6636, respectively. Excluding the uncertain group improved accuracy from 0.6263 to 0.7287 (a 10.24 percentage-point increase) and F1 from 0.5843 to 0.8205 (a 23.62 percentage-point increase). DISCUSSION: Monte Carlo Dropout-driven uncertainty helps identify ambiguous cases that may require further clinical evaluation, thus improving reliability in AD genomic classification.
Submitted 31 May, 2025;
originally announced June 2025.
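
The uncertainty split described in the TrUE-Net abstract above (repeated stochastic forward passes, then a variance threshold) might be sketched as follows. This is a minimal illustration only: the single-layer logistic model, the function names, the dropout rate, and the threshold value are all assumptions made for the example and are not taken from the paper.

```python
import math
import random
from statistics import mean, pvariance


def mc_dropout_predict(features, weights, n_passes=50, drop_prob=0.5, seed=0):
    """Mean and variance of the predicted probability over stochastic passes.

    Hypothetical toy model: a logistic unit with dropout on its inputs.
    Dropout stays active at inference, so every pass samples a fresh mask.
    """
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - drop_prob)  # inverted-dropout rescaling
    probs = []
    for _ in range(n_passes):
        logit = sum(
            w * x * keep_scale
            for w, x in zip(weights, features)
            if rng.random() >= drop_prob  # keep this input with prob 1 - drop_prob
        )
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return mean(probs), pvariance(probs)


def split_by_uncertainty(samples, weights, threshold):
    """Indices of the more certain (low-variance) and uncertain (high-variance) samples."""
    certain, uncertain = [], []
    for idx, features in enumerate(samples):
        _, variance = mc_dropout_predict(features, weights)
        (uncertain if variance > threshold else certain).append(idx)
    return certain, uncertain


samples = [[0.0, 0.0, 0.0], [3.0, -3.0, 3.0]]
certain, uncertain = split_by_uncertainty(samples, [1.0, 1.0, 1.0], threshold=0.01)
```

Metrics such as accuracy or F1 can then be reported separately on the `certain` subset, which is the kind of comparison the abstract describes.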
-
PVP: An Image Dataset for Personalized Visual Persuasion with Persuasion Strategies, Viewer Characteristics, and Persuasiveness Ratings
Authors:
Junseo Kim,
Jongwook Han,
Dongmin Choi,
Jongwook Yoon,
Eun-Ju Lee,
Yohan Jo
Abstract:
Visual persuasion, which uses visual elements to influence cognition and behaviors, is crucial in fields such as advertising and political communication. With recent advancements in artificial intelligence, there is growing potential to develop persuasive systems that automatically generate persuasive images tailored to individuals. However, a significant bottleneck in this area is the lack of comprehensive datasets that connect the persuasiveness of images with the personal information about those who evaluated the images. To address this gap and facilitate technological advancements in personalized visual persuasion, we release the Personalized Visual Persuasion (PVP) dataset, comprising 28,454 persuasive images across 596 messages and 9 persuasion strategies. Importantly, the PVP dataset provides persuasiveness scores of images evaluated by 2,521 human annotators, along with their demographic and psychological characteristics (personality traits and values). We demonstrate the utility of our dataset by developing a persuasive image generator and an automated evaluator, and establish benchmark baselines. Our experiments reveal that incorporating psychological characteristics enhances the generation and evaluation of persuasive images, providing valuable insights for personalized visual persuasion.
Submitted 31 May, 2025;
originally announced June 2025.
-
Using Cross-Domain Detection Loss to Infer Multi-Scale Information for Improved Tiny Head Tracking
Authors:
Jisu Kim,
Alex Mattingly,
Eung-Joo Lee,
Benjamin S. Riggan
Abstract:
Head detection and tracking are essential for downstream tasks, but current methods often require large computational budgets, which increases latency and ties up resources (e.g., processors, memory, and bandwidth). To address this, we propose a framework to enhance tiny head detection and tracking by optimizing the balance between performance and efficiency. Our framework integrates (1) a cross-domain detection loss, (2) a multi-scale module, and (3) a small receptive field detection mechanism. These innovations enhance detection by bridging the gap between large and small detectors, capturing high-frequency details at multiple scales during training, and using filters with small receptive fields to detect tiny heads. Evaluations on the CroHD and CrowdHuman datasets show improved Multiple Object Tracking Accuracy (MOTA) and mean Average Precision (mAP), demonstrating the effectiveness of our approach in crowded scenes.
Submitted 13 May, 2025;
originally announced May 2025.
-
Improved Approximation Algorithms for Chromatic and Pseudometric-Weighted Correlation Clustering
Authors:
Dahoon Lee,
Chenglin Fan,
Euiwoong Lee
Abstract:
Correlation Clustering (CC) is a foundational problem in unsupervised learning that models binary similarity relations using labeled graphs. While classical CC has been widely studied, many real-world applications involve more nuanced relationships, either multi-class categorical interactions or varying confidence levels in edge labels. To address these, two natural generalizations have been proposed: Chromatic Correlation Clustering (CCC), which assigns semantic colors to edge labels, and pseudometric-weighted CC, which allows edge weights satisfying the triangle inequality. In this paper, we develop improved approximation algorithms for both settings. Our approach leverages LP-based pivoting techniques combined with problem-specific rounding functions. For the pseudometric-weighted correlation clustering problem, we present a tight $10/3$-approximation algorithm, matching the best possible bound achievable within the framework of standard LP relaxation combined with specialized rounding. For the Chromatic Correlation Clustering (CCC) problem, we improve the approximation ratio from the previous best of $2.5$ to $2.15$, and we establish a lower bound of $2.11$ within the same analytical framework, highlighting the near-optimality of our result.
Submitted 27 May, 2025;
originally announced May 2025.
-
Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference
Authors:
Yue Zhu,
Hao Yu,
Chen Wang,
Zhuoran Liu,
Eun Kyung Lee
Abstract:
The increasing adoption of large language models (LLMs) with extended context windows necessitates efficient Key-Value Cache (KVC) management to optimize inference performance. Inference workloads like Retrieval-Augmented Generation (RAG) and agents exhibit high cache reusability, making efficient caching critical to reducing redundancy and improving speed. We analyze real-world KVC access patterns using publicly available traces and evaluate commercial key-value stores like Redis and state-of-the-art RDMA-based systems (CHIME [1] and Sherman [2]) for KVC metadata management. Our work demonstrates the lack of a tailored storage solution for KVC prefilling, underscores the need for an efficient distributed caching system with optimized metadata management for LLM workloads, and provides insights into designing improved KVC management systems for scalable, low-latency inference.
Submitted 27 May, 2025;
originally announced May 2025.
-
Generative AI for Autonomous Driving: Frontiers and Opportunities
Authors:
Yuping Wang,
Shuo Xing,
Cui Can,
Renjie Li,
Hongyuan Hua,
Kexin Tian,
Zhaobin Mo,
Xiangbo Gao,
Keshu Wu,
Sulong Zhou,
Hengxu You,
Juntong Peng,
Junge Zhang,
Zehao Wang,
Rui Song,
Mingxuan Yan,
Walter Zimmer,
Xingcheng Zhou,
Peiran Li,
Zhaohan Lu,
Chia-Ju Chen,
Yue Huang,
Ryan A. Rossi,
Lichao Sun,
Hongkai Yu
, et al. (22 additional authors not shown)
Abstract:
Generative Artificial Intelligence (GenAI) constitutes a transformative technological wave that reconfigures industries through its unparalleled capabilities for content creation, reasoning, planning, and multimodal understanding. This revolutionary force offers the most promising path yet toward solving one of engineering's grandest challenges: achieving reliable, fully autonomous driving, particularly the pursuit of Level 5 autonomy. This survey delivers a comprehensive and critical synthesis of the emerging role of GenAI across the autonomous driving stack. We begin by distilling the principles and trade-offs of modern generative modeling, encompassing VAEs, GANs, Diffusion Models, and Large Language Models (LLMs). We then map their frontier applications in image, LiDAR, trajectory, occupancy, and video generation, as well as LLM-guided reasoning and decision making. We categorize practical applications, such as synthetic data workflows, end-to-end driving strategies, high-fidelity digital twin systems, smart transportation networks, and cross-domain transfer to embodied AI. We identify key obstacles and possibilities such as comprehensive generalization across rare cases, evaluation and safety checks, budget-limited implementation, regulatory compliance, ethical concerns, and environmental effects, while proposing research plans across theoretical assurances, trust metrics, transport integration, and socio-technical influence. By unifying these threads, the survey provides a forward-looking reference for researchers, engineers, and policymakers navigating the convergence of generative AI and advanced autonomous mobility. An actively maintained repository of cited works is available at https://github.com/taco-group/GenAI4AD.
Submitted 13 May, 2025;
originally announced May 2025.
-
MolMole: Molecule Mining from Scientific Literature
Authors:
LG AI Research,
Sehyun Chun,
Jiye Kim,
Ahra Jo,
Yeonsik Jo,
Seungyul Oh,
Seungjun Lee,
Kwangrok Ryoo,
Jongmin Lee,
Seung Hwan Kim,
Byung Jun Kang,
Soonyoung Lee,
Jun Ha Park,
Chanwoo Moon,
Jiwon Ham,
Haein Lee,
Heejae Han,
Jaeseung Byun,
Soojong Do,
Minju Ha,
Dongyun Kim,
Kyunghoon Bae,
Woohyung Lim,
Edward Hwayoung Lee,
Yongmin Park
, et al. (9 additional authors not shown)
Abstract:
The extraction of molecular structures and reaction data from scientific documents is challenging due to their varied, unstructured chemical formats and complex document layouts. To address this, we introduce MolMole, a vision-based deep learning framework that unifies molecule detection, reaction diagram parsing, and optical chemical structure recognition (OCSR) into a single pipeline for automating the extraction of chemical data directly from page-level documents. Recognizing the lack of a standard page-level benchmark and evaluation metric, we also present a testset of 550 pages annotated with molecule bounding boxes, reaction labels, and MOLfiles, along with a novel evaluation metric. Experimental results demonstrate that MolMole outperforms existing toolkits on both our benchmark and public datasets. The benchmark testset will be publicly available, and the MolMole toolkit will be accessible soon through an interactive demo on the LG AI Research website. For commercial inquiries, please contact us at \href{mailto:[email protected]}{contact\[email protected]}.
Submitted 7 May, 2025; v1 submitted 30 April, 2025;
originally announced May 2025.
-
Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind
Authors:
Mouad Abrini,
Omri Abend,
Dina Acklin,
Henny Admoni,
Gregor Aichinger,
Nitay Alon,
Zahra Ashktorab,
Ashish Atreja,
Moises Auron,
Alexander Aufreiter,
Raghav Awasthi,
Soumya Banerjee,
Joe M. Barnby,
Rhea Basappa,
Severin Bergsmann,
Djallel Bouneffouf,
Patrick Callaghan,
Marc Cavazza,
Thierry Chaminade,
Sonia Chernova,
Mohamed Chetouan,
Moumita Choudhury,
Axel Cleeremans,
Jacek B. Cywinski,
Fabio Cuzzolin
, et al. (83 additional authors not shown)
Abstract:
This volume includes a selection of papers presented at the Workshop on Advancing Artificial Intelligence through Theory of Mind, held at AAAI 2025 in Philadelphia, US, on 3 March 2025. The purpose of this volume is to provide an open-access, curated anthology for the ToM and AI research community.
Submitted 28 April, 2025;
originally announced May 2025.
-
Testing SSD Firmware with State Data-Aware Fuzzing: Accelerating Coverage in Nondeterministic I/O Environments
Authors:
Gangho Yoon,
Eunseok Lee
Abstract:
Solid-State Drive (SSD) firmware manages complex internal states, including flash memory maintenance. Due to nondeterministic I/O operations, traditional testing methods struggle to rapidly achieve coverage of firmware code areas that require extensive I/O accumulation. To address this challenge, we propose a state data-aware fuzzing approach that leverages SSD firmware's internal state to guide input generation under nondeterministic I/O conditions and accelerate coverage discovery. Our experiments with an open-source SSD firmware emulator show that the proposed method achieves the same firmware test coverage as a state-of-the-art coverage-based fuzzer (AFL++) while requiring approximately 67% fewer commands, without reducing the number of crashes or hangs detected. Moreover, we extend our experiments by incorporating various I/O commands beyond basic write/read operations to reflect real user scenarios, and we confirm that our strategy remains effective even for multiple types of I/O tests. We further validate the effectiveness of state data-aware fuzzing for firmware testing under I/O environments and suggest that this approach can be extended to other storage firmware or threshold-based embedded systems in the future.
Submitted 5 May, 2025;
originally announced May 2025.
-
Monitoring morphometric drift in lifelong learning segmentation of the spinal cord
Authors:
Enamundram Naga Karthik,
Sandrine Bédard,
Jan Valošek,
Christoph S. Aigner,
Elise Bannier,
Josef Bednařík,
Virginie Callot,
Anna Combes,
Armin Curt,
Gergely David,
Falk Eippert,
Lynn Farner,
Michael G Fehlings,
Patrick Freund,
Tobias Granberg,
Cristina Granziera,
RHSCIR Network Imaging Group,
Ulrike Horn,
Tomáš Horák,
Suzanne Humphreys,
Markus Hupp,
Anne Kerbrat,
Nawal Kinany,
Shannon Kolind,
Petr Kudlička
, et al. (31 additional authors not shown)
Abstract:
Morphometric measures derived from spinal cord segmentations can serve as diagnostic and prognostic biomarkers in neurological diseases and injuries affecting the spinal cord. While automatic segmentation methods that are robust to a wide variety of contrasts and pathologies have been developed over the past few years, whether their predictions are stable as the model is updated using new datasets has not been assessed. This is particularly important for deriving normative values from healthy participants. In this study, we present a spinal cord segmentation model trained on a multisite $(n=75)$ dataset, including 9 different MRI contrasts and several spinal cord pathologies. We also introduce a lifelong learning framework to automatically monitor the morphometric drift as the model is updated using additional datasets. The framework is triggered by an automatic GitHub Actions workflow every time a new model is created, recording the morphometric values derived from the model's predictions over time. As a real-world application of the proposed framework, we employed the spinal cord segmentation model to update a recently introduced normative database of healthy participants containing commonly used measures of spinal cord morphometry. Results showed that: (i) our model outperforms previous versions and pathology-specific models on challenging lumbar spinal cord cases, achieving an average Dice score of $0.95 \pm 0.03$; (ii) the automatic workflow for monitoring morphometric drift provides a quick feedback loop for developing future segmentation models; and (iii) the scaling factor required to update the database of morphometric measures is nearly constant among slices across the given vertebral levels, showing minimal drift between the current and previous versions of the model monitored by the framework. The model is freely available in Spinal Cord Toolbox v7.0.
Submitted 2 May, 2025;
originally announced May 2025.
-
Value Portrait: Assessing Language Models' Values through Psychometrically and Ecologically Valid Items
Authors:
Jongwook Han,
Dongmin Choi,
Woojung Song,
Eun-Ju Lee,
Yohan Jo
Abstract:
The importance of benchmarks for assessing the values of language models has become more pronounced due to the growing need for more authentic, human-aligned responses. However, existing benchmarks rely on human or machine annotations that are vulnerable to value-related biases. Furthermore, the tested scenarios often diverge from real-world contexts in which models are commonly used to generate text and express values. To address these issues, we propose the Value Portrait benchmark, a reliable framework for evaluating LLMs' value orientations with two key characteristics. First, the benchmark consists of items that capture real-life user-LLM interactions, enhancing the relevance of assessment results to real-world LLM usage. Second, each item is rated by human subjects based on its similarity to their own thoughts, and correlations between these ratings and the subjects' actual value scores are derived. This psychometrically validated approach ensures that items strongly correlated with specific values serve as reliable items for assessing those values. Through evaluating 44 LLMs with our benchmark, we find that these models prioritize Benevolence, Security, and Self-Direction values while placing less emphasis on Tradition, Power, and Achievement values. Also, our analysis reveals biases in how LLMs perceive various demographic groups, deviating from real human data.
Submitted 11 June, 2025; v1 submitted 2 May, 2025;
originally announced May 2025.
-
All-Subsets Important Separators with Applications to Sample Sets, Balanced Separators and Vertex Sparsifiers in Directed Graphs
Authors:
Aditya Anand,
Euiwoong Lee,
Jason Li,
Thatchaphol Saranurak
Abstract:
Given a directed graph $G$ with $n$ vertices and $m$ edges, a parameter $k$ and two disjoint subsets $S,T \subseteq V(G)$, we show that the number of all-subsets important separators, which is the number of $A$-$B$ important vertex separators of size at most $k$ over all $A \subseteq S$ and $B \subseteq T$, is at most $\beta(|S|, |T|, k) = 4^k {|S| \choose \leq k} {|T| \choose \leq 2k}$, where ${x \choose \leq c} = \sum_{i = 1}^c {x \choose i}$, and that they can be enumerated in time $O(\beta(|S|,|T|,k)k^2(m+n))$. This is a generalization of the folklore result stating that the number of $A$-$B$ important separators for two fixed sets $A$ and $B$ is at most $4^k$ (first implicitly shown by Chen, Liu and Lu, Algorithmica '09). From this result, we obtain the following applications. We give a construction for detection sets and sample sets in directed graphs, generalizing the results of Kleinberg (Internet Mathematics '03) and Feige and Mahdian (STOC '06) to directed graphs. Via our new sample sets, we give the first FPT algorithm for finding balanced separators in directed graphs parameterized by $k$, the size of the separator. Our algorithm runs in time $2^{O(k)} (m + n)$. We also give an $O(\sqrt{\log k})$-approximation algorithm for the same problem. Finally, we present new results on vertex sparsifiers for preserving small cuts.
Submitted 28 April, 2025;
originally announced April 2025.
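
To get a feel for how the counting bound in the abstract above grows, the formula can be evaluated directly. The sketch below only transcribes the stated expression, including the sum starting at $i = 1$ as written in the abstract; the function names are hypothetical.

```python
from math import comb


def choose_at_most(x, c):
    """Sum of C(x, i) for i = 1..c, following the abstract's definition."""
    return sum(comb(x, i) for i in range(1, c + 1))


def all_subsets_separator_bound(s_size, t_size, k):
    """beta(|S|, |T|, k) = 4^k * C(|S|, <=k) * C(|T|, <=2k)."""
    return 4**k * choose_at_most(s_size, k) * choose_at_most(t_size, 2 * k)


# Example: |S| = 3, |T| = 4, k = 1 gives 4 * 3 * (4 + 6) = 120.
bound = all_subsets_separator_bound(3, 4, 1)
```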
-
Min-CSPs on Complete Instances II: Polylogarithmic Approximation for Min-NAE-3-SAT
Authors:
Aditya Anand,
Euiwoong Lee,
Davide Mazzali,
Amatya Sharma
Abstract:
This paper studies complete $k$-Constraint Satisfaction Problems (CSPs), where an $n$-variable instance has exactly one nontrivial constraint for each subset of $k$ variables, i.e., it has $\binom{n}{k}$ constraints. A recent work started a systematic study of complete $k$-CSPs [Anand, Lee, Sharma, SODA'25], and showed a quasi-polynomial time algorithm that decides if there is an assignment satisfying all the constraints of any complete Boolean-alphabet $k$-CSP, algorithmically separating complete instances from dense instances.
The tractability of this decision problem is necessary for any nontrivial (multiplicative) approximation for the minimization version, whose goal is to minimize the number of violated constraints. The same paper raised the question of whether it is possible to obtain nontrivial approximation algorithms for complete Min-$k$-CSPs with $k \geq 3$.
In this work, we make progress in this direction and show a quasi-polynomial time $\text{polylog}(n)$-approximation to Min-NAE-3-SAT on complete instances, which asks to minimize the number of $3$-clauses in which all three literals equal the same bit. To the best of our knowledge, this is the first known example of a CSP whose decision version is NP-hard in general (and dense) instances while admitting a $\text{polylog}(n)$-approximation in complete instances. Our algorithm presents a new iterative framework for rounding a solution from the Sherali-Adams hierarchy, where each iteration interleaves two well-known rounding tools: the conditioning procedure, in order to almost fix many variables, and the thresholding procedure, in order to completely fix them.
Finally, we improve the running time of the decision algorithms of Anand, Lee, and Sharma and show a simple algorithm that decides any complete Boolean-alphabet $k$-CSP in polynomial time.
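The Min-NAE-3-SAT objective described above can be made concrete with a small sketch. This only evaluates the objective on a complete instance (one clause per triple of variables); it is not the Sherali-Adams rounding algorithm from the paper. The helper names `complete_instance` and `nae_violations`, and the encoding of a clause as a negation pattern per triple, are assumptions made for illustration.

```python
from itertools import combinations

def complete_instance(n, negations):
    """Build a complete 3-CSP: exactly one clause per 3-subset of variables.

    `negations` is a hypothetical callback mapping a triple (i, j, k) to a
    tuple of three bits; bit 1 means the corresponding literal is negated.
    """
    return {t: negations(t) for t in combinations(range(n), 3)}

def nae_violations(assignment, clauses):
    """Count clauses whose three literals all evaluate to the same bit."""
    violated = 0
    for (i, j, k), (ni, nj, nk) in clauses.items():
        lits = (assignment[i] ^ ni, assignment[j] ^ nj, assignment[k] ^ nk)
        if lits[0] == lits[1] == lits[2]:
            violated += 1
    return violated

# Toy instance: 4 variables, no negations, so C(4, 3) = 4 clauses.
clauses = complete_instance(4, lambda t: (0, 0, 0))
all_zero_cost = nae_violations([0, 0, 0, 0], clauses)
balanced_cost = nae_violations([0, 0, 1, 1], clauses)
```

On this toy instance the all-zero assignment violates every clause, while the balanced assignment `[0, 0, 1, 1]` violates none, since every triple then contains both bits.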
Submitted 26 April, 2025;
originally announced April 2025.
-
Static to Dynamic Correlation Clustering
Authors:
Nairen Cao,
Vincent Cohen-Addad,
Euiwoong Lee,
Shi Li,
David Rasmussen Lolck,
Alantha Newman,
Mikkel Thorup,
Lukas Vogl,
Shuyi Yan,
Hanwen Zhang
Abstract:
Correlation clustering is a well-studied problem, first proposed by Bansal, Blum, and Chawla [BBC04]. The input is an unweighted, undirected graph. The problem is to cluster the vertices so as to minimize the number of edges between vertices in different clusters and missing edges between vertices inside the same cluster. This problem has a wide application in data mining and machine learning. We introduce a general framework that transforms existing static correlation clustering algorithms into fully-dynamic ones that work against an adaptive adversary.
We show how to apply our framework to known efficient correlation clustering algorithms, starting from the classic $3$-approximate Pivot algorithm from [ACN08]. Applied to the most recent near-linear $1.437$-approximation algorithm from [CCL+25], we get a $1.437$-approximation fully-dynamic algorithm that works with worst-case constant update time. The original static algorithm gets its approximation factor with constant probability, and we get the same against an adaptive adversary in the sense that for any given update step not known to our algorithm, our solution is a $1.437$-approximation with constant probability when we reach this update.
Previous dynamic algorithms had approximation factors around $3$ in expectation, and they could only handle an oblivious adversary.
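For readers unfamiliar with the static starting point, the classic Pivot algorithm and the correlation clustering objective can be sketched as follows. This is only the static algorithm in its textbook form, not the dynamic framework introduced in the paper. The function names, the adjacency-dictionary input format, and the explicit `rng` parameter are assumptions made for illustration.

```python
import random

def pivot_clustering(vertices, adj, rng):
    """Static Pivot: visit vertices in random order; each still-unclustered
    vertex becomes a pivot and absorbs its unclustered neighbours."""
    order = list(vertices)
    rng.shuffle(order)
    cluster_of = {}
    for v in order:
        if v in cluster_of:
            continue  # already absorbed by an earlier pivot
        cluster_of[v] = v
        for u in adj.get(v, ()):
            if u not in cluster_of:
                cluster_of[u] = v
    return cluster_of

def disagreement_cost(vertices, adj, cluster_of):
    """Edges across clusters plus missing edges inside clusters."""
    vs = sorted(vertices)
    cost = 0
    for idx, a in enumerate(vs):
        for b in vs[idx + 1:]:
            same_cluster = cluster_of[a] == cluster_of[b]
            has_edge = b in adj.get(a, ())
            if same_cluster != has_edge:
                cost += 1
    return cost

# Toy input: two disjoint triangles, given as symmetric adjacency sets.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1},
    3: {4, 5}, 4: {3, 5}, 5: {3, 4},
}
clusters = pivot_clustering(range(6), adj, random.Random(0))
cost = disagreement_cost(range(6), adj, clusters)
```

On a disjoint union of cliques, every pivot absorbs exactly its own clique, so the cost is zero regardless of the random order. On general graphs the output depends on the order, and the $3$-approximation guarantee holds in expectation.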
Submitted 22 April, 2025; v1 submitted 16 April, 2025;
originally announced April 2025.
-
Quantum Annealing for Combinatorial Optimization: A Benchmarking Study
Authors:
Seongmin Kim,
Sang-Woo Ahn,
In-Saeng Suh,
Alexander W. Dowling,
Eungkyu Lee,
Tengfei Luo
Abstract:
Quantum annealing (QA) has the potential to significantly improve solution quality and reduce time complexity in solving combinatorial optimization problems compared to classical optimization methods. However, due to the limited number of qubits and their connectivity, the QA hardware did not show such an advantage over classical methods in past benchmarking studies. Recent advancements in QA with more than 5,000 qubits, enhanced qubit connectivity, and the hybrid architecture promise to realize the quantum advantage. Here, we use a quantum annealer with state-of-the-art techniques and benchmark its performance against classical solvers. To compare their performance, we solve over 50 optimization problem instances represented by large and dense Hamiltonian matrices using quantum and classical solvers. The results demonstrate that a state-of-the-art quantum solver has higher accuracy (~0.013%) and a significantly faster problem-solving time (~6,561x) than the best classical solver. Our results highlight the advantages of leveraging QA over classical counterparts, particularly in hybrid configurations, for achieving high accuracy and substantially reduced problem solving time in large-scale real-world optimization problems.
Submitted 8 April, 2025;
originally announced April 2025.
-
Chain of Understanding: Supporting Code Understanding with Large Language Models
Authors:
Jie Gao,
Yue Xue,
Xiaofei Xie,
SoeMin Thant,
Erika Lee
Abstract:
Code auditing demands a robust understanding of codebases - an especially challenging task for end-user developers with limited expertise. To address this, we conducted formative interviews with experienced auditors and identified a Chain-of-Understanding approach, in which Large Language Models (LLMs) guide developers through hierarchical code comprehension - from high-level overviews to specific functions and variables. Building on this, we incorporated the Chain-of-Understanding concept into CodeMap, a system offering interactive visualizations, stepwise guided analysis, and context-aware chatbot support. Through within-subject user studies with 10 participants of diverse backgrounds and 5 expert and 2 novice interviews, CodeMap proved effective in reducing the manual effort of prompt engineering while enhancing engagement with visualization, outperforming both standalone LLMs and traditional static visualization tools.
Submitted 6 April, 2025;
originally announced April 2025.
-
Exploration of Approaches for Robustness and Safety in a Low Code Open Environment for Factory Automation
Authors:
Gustavo Quiros A.,
Yi Peng Zhu,
Tao Cui,
Shaokai Lin,
Marten Lohstroh,
Edward A. Lee
Abstract:
This report is a compilation of technical knowledge and concepts that were produced by the authors and additional contributors in the context of the collaboration projects "Abstraction Requirements for Language of Choice in Industrial Automation" (FY21-22) and "Approaches for Robust and Safe Low-Code" (FY23-24) from Siemens Technology and the University of California, Berkeley. The primary objective of these projects was to assess Siemens Open Industrial Edge (OIE) engineering capabilities by defining a concept that ensures the satisfaction of coordination and safety requirements when using disparate OIE modules. The objective was to use the Lingua Franca (LF) coordination language to demonstrate how to address challenges in: 1. engineering modular, distributed, and flexible automation solutions that ensure, by design, robust and safe operation; 2. using IEC 61499, the event-driven execution model for specifying the execution order of OIE modules (defined as function blocks); 3. supporting large-scale distributed OIE automation solutions; and eventually 4. defining optimal solutions with synchronization and time-optimal mechanisms.
Submitted 5 April, 2025;
originally announced April 2025.
-
Investigating Affective Use and Emotional Well-being on ChatGPT
Authors:
Jason Phang,
Michael Lampe,
Lama Ahmad,
Sandhini Agarwal,
Cathy Mengying Fang,
Auren R. Liu,
Valdemar Danry,
Eunhae Lee,
Samantha W. T. Chan,
Pat Pataranutaporn,
Pattie Maes
Abstract:
As AI chatbots see increased adoption and integration into everyday life, questions have been raised about the potential impact of human-like or anthropomorphic AI on users. In this work, we investigate the extent to which interactions with ChatGPT (with a focus on Advanced Voice Mode) may impact users' emotional well-being, behaviors and experiences through two parallel studies. To study the affective use of AI chatbots, we perform large-scale automated analysis of ChatGPT platform usage in a privacy-preserving manner, analyzing over 3 million conversations for affective cues and surveying over 4,000 users on their perceptions of ChatGPT. To investigate whether there is a relationship between model usage and emotional well-being, we conduct an Institutional Review Board (IRB)-approved randomized controlled trial (RCT) on close to 1,000 participants over 28 days, examining changes in their emotional well-being as they interact with ChatGPT under different experimental settings. In both on-platform data analysis and the RCT, we observe that very high usage correlates with increased self-reported indicators of dependence. From our RCT, we find the impact of voice-based interactions on emotional well-being to be highly nuanced and influenced by factors such as the user's initial emotional state and total usage duration. Overall, our analysis reveals that a small number of users are responsible for a disproportionate share of the most affective cues.
Submitted 4 April, 2025;
originally announced April 2025.
-
Improved visual-information-driven model for crowd simulation and its modular application
Authors:
Xuanwen Liang,
Jiayu Chen,
Eric Wai Ming Lee,
Wei Xie
Abstract:
Data-driven crowd simulation models offer advantages in enhancing the accuracy and realism of simulations, and improving their generalizability is essential for promoting application. Current data-driven approaches are primarily designed for a single scenario, with very few models validated across more than two scenarios. It is still an open question to develop data-driven crowd simulation models with strong generalizability. We notice that the key to addressing this challenge lies in effectively and accurately capturing the core common influential features that govern pedestrians' navigation across diverse scenarios. Particularly, we believe that visual information is one of the most dominant influencing features. In light of this, this paper proposes a data-driven model incorporating a refined visual information extraction method and exit cues to enhance generalizability. The proposed model is examined on four common fundamental modules: bottleneck, corridor, corner and T-junction. The evaluation results demonstrate that our model performs excellently across these scenarios, aligning with pedestrian movement in real-world experiments, and significantly outperforms the classical knowledge-driven model. Furthermore, we introduce a modular approach to apply our proposed model in composite scenarios, and the results regarding trajectories and fundamental diagrams indicate that our simulations closely match real-world patterns in the composite scenario. The research outcomes can provide inspiration for the development of data-driven crowd simulation models with high generalizability and advance the application of data-driven approaches. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
Submitted 11 April, 2025; v1 submitted 2 April, 2025;
originally announced April 2025.
-
Catch Me if You Search: When Contextual Web Search Results Affect the Detection of Hallucinations
Authors:
Mahjabin Nahar,
Eun-Ju Lee,
Jin Won Park,
Dongwon Lee
Abstract:
While we increasingly rely on large language models (LLMs) for various tasks, these models are known to produce inaccurate content or 'hallucinations' with potentially disastrous consequences. The recent integration of web search results into LLMs prompts the question of whether people utilize them to verify the generated content, thereby accurately detecting hallucinations. An online experiment (N = 560) investigated how the provision of search results, either static (i.e., fixed search results provided by LLM) or dynamic (i.e., participant-led searches), affects participants' perceived accuracy of LLM-generated content (i.e., genuine, minor hallucination, major hallucination), self-confidence in accuracy ratings, as well as their overall evaluation of the LLM, as compared to the control condition (i.e., no search results). Results showed that participants in both static and dynamic conditions (vs. control) rated hallucinated content to be less accurate and perceived the LLM more negatively. However, those in the dynamic condition rated genuine content as more accurate and demonstrated greater overall self-confidence in their assessments than those in the static search or control conditions. We highlighted practical implications of incorporating web search functionality into LLMs in real-world contexts.
Submitted 6 May, 2025; v1 submitted 1 April, 2025;
originally announced April 2025.
-
A Preliminary Model of Coordination-free Consistency
Authors:
Shulu Li,
Edward A. Lee
Abstract:
Building consistent distributed systems has largely depended on complex coordination strategies that are not only tricky to implement, but also take a toll on performance as they require nodes to wait for coordination messages. In this paper, we explore the conditions under which no coordination is required to guarantee consistency. We present a simple and succinct theoretical model for distributed computation that separates coordination from computation. The main contribution of this work is mathematically defining concepts in distributed computing such as strong eventual consistency, consistency, consistent under partition, confluence, coordination-free, and monotonicity. Based on these definitions, we prove necessary and sufficient conditions for strong eventual consistency and give a proof of the CALM theorem from a distributed computation perspective.
Submitted 1 April, 2025;
originally announced April 2025.
-
Beyond Vanilla Fine-Tuning: Leveraging Multistage, Multilingual, and Domain-Specific Methods for Low-Resource Machine Translation
Authors:
Sarubi Thillainathan,
Songchen Yuan,
En-Shiun Annie Lee,
Sanath Jayasena,
Surangika Ranathunga
Abstract:
Fine-tuning multilingual sequence-to-sequence large language models (msLLMs) has shown promise in developing neural machine translation (NMT) systems for low-resource languages (LRLs). However, conventional single-stage fine-tuning methods struggle in extremely low-resource NMT settings, where training data is very limited. This paper contributes to artificial intelligence by proposing two approaches for adapting msLLMs in these challenging scenarios: (1) continual pre-training (CPT), where the msLLM is further trained with domain-specific monolingual data to compensate for the under-representation of LRLs, and (2) intermediate task transfer learning (ITTL), a method that fine-tunes the msLLM with both in-domain and out-of-domain parallel data to enhance its translation capabilities across various domains and tasks. As an application in engineering, these methods are implemented in NMT systems for Sinhala, Tamil, and English (six language pairs) in domain-specific, extremely low-resource settings (datasets containing fewer than 100,000 samples). Our experiments reveal that these approaches enhance translation performance by an average of +1.47 bilingual evaluation understudy (BLEU) score compared to the standard single-stage fine-tuning baseline across all translation directions. Additionally, a multi-model ensemble further improves performance by an additional BLEU score.
Submitted 28 March, 2025;
originally announced March 2025.
-
Solving the Correlation Cluster LP in Sublinear Time
Authors:
Nairen Cao,
Vincent Cohen-Addad,
Shi Li,
Euiwoong Lee,
David Rasmussen Lolck,
Alantha Newman,
Mikkel Thorup,
Lukas Vogl,
Shuyi Yan,
Hanwen Zhang
Abstract:
Correlation Clustering is a fundamental and widely-studied problem in unsupervised learning and data mining. The input is a graph and the goal is to construct a clustering minimizing the number of inter-cluster edges plus the number of missing intra-cluster edges.
CCL+24 introduced the cluster LP for Correlation Clustering, which they argued captures the problem much more succinctly than previous linear programming formulations. However, the cluster LP has exponential size, with a variable for every possible set of vertices in the input graph. Nevertheless, CCL+24 showed how to find a feasible solution for the cluster LP in time $O(n^{\text{poly}(1/\epsilon)})$ with objective value at most $(1+\epsilon)$ times the value of an optimal solution for the respective Correlation Clustering instance. Furthermore, they showed how to round a solution to the cluster LP, yielding a $(1.437+\epsilon)$-approximation algorithm for the Correlation Clustering problem.
The main technical result of this paper is a new approach to find a feasible solution for the cluster LP with objective value at most $(1+\epsilon)$ of the optimum in time $\widetilde O(2^{\text{poly}(1/\epsilon)} n)$, where $n$ is the number of vertices in the graph. We also show how to implement the rounding within the same time bounds, thus achieving a fast $(1.437+\epsilon)$-approximation algorithm for the Correlation Clustering problem. This bridges the gap between state-of-the-art methods for approximating Correlation Clustering and the recent focus on fast algorithms.
Submitted 31 March, 2025; v1 submitted 26 March, 2025;
originally announced March 2025.
-
Gemini Robotics: Bringing AI into the Physical World
Authors:
Gemini Robotics Team,
Saminda Abeyruwan,
Joshua Ainslie,
Jean-Baptiste Alayrac,
Montserrat Gonzalez Arenas,
Travis Armstrong,
Ashwin Balakrishna,
Robert Baruch,
Maria Bauza,
Michiel Blokzijl,
Steven Bohez,
Konstantinos Bousmalis,
Anthony Brohan,
Thomas Buschmann,
Arunkumar Byravan,
Serkan Cabi,
Ken Caluwaerts,
Federico Casarini,
Oscar Chang,
Jose Enrique Chen,
Xi Chen,
Hao-Tien Lewis Chiang,
Krzysztof Choromanski,
David D'Ambrosio,
Sudeep Dasari
, et al. (93 additional authors not shown)
Abstract:
Recent advancements in large multimodal models have led to the emergence of remarkable generalist capabilities in digital domains, yet their translation to physical agents such as robots remains a significant challenge. This report introduces a new family of AI models purposefully designed for robotics and built upon the foundation of Gemini 2.0. We present Gemini Robotics, an advanced Vision-Language-Action (VLA) generalist model capable of directly controlling robots. Gemini Robotics executes smooth and reactive movements to tackle a wide range of complex manipulation tasks while also being robust to variations in object types and positions, handling unseen environments as well as following diverse, open vocabulary instructions. We show that with additional fine-tuning, Gemini Robotics can be specialized to new capabilities including solving long-horizon, highly dexterous tasks, learning new short-horizon tasks from as few as 100 demonstrations and adapting to completely novel robot embodiments. This is made possible because Gemini Robotics builds on top of the Gemini Robotics-ER model, the second model we introduce in this work. Gemini Robotics-ER (Embodied Reasoning) extends Gemini's multimodal reasoning capabilities into the physical world, with enhanced spatial and temporal understanding. This enables capabilities relevant to robotics including object detection, pointing, trajectory and grasp prediction, as well as multi-view correspondence and 3D bounding box predictions. We show how this novel combination can support a variety of robotics applications. We also discuss and address important safety considerations related to this new class of robotics foundation models. The Gemini Robotics family marks a substantial step towards developing general-purpose robots that realize AI's potential in the physical world.
Submitted 25 March, 2025;
originally announced March 2025.
-
How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Randomized Controlled Study
Authors:
Cathy Mengying Fang,
Auren R. Liu,
Valdemar Danry,
Eunhae Lee,
Samantha W. T. Chan,
Pat Pataranutaporn,
Pattie Maes,
Jason Phang,
Michael Lampe,
Lama Ahmad,
Sandhini Agarwal
Abstract:
AI chatbots, especially those with voice capabilities, have become increasingly human-like, with more users seeking emotional support and companionship from them. Concerns are rising about how such interactions might impact users' loneliness and socialization with real people. We conducted a four-week randomized, controlled, IRB-approved experiment (n=981, >300K messages) to investigate how AI chatbot interaction modes (text, neutral voice, and engaging voice) and conversation types (open-ended, non-personal, and personal) influence psychosocial outcomes such as loneliness, social interaction with real people, emotional dependence on AI and problematic AI usage. Results showed that while voice-based chatbots initially appeared beneficial in mitigating loneliness and dependence compared with text-based chatbots, these advantages diminished at high usage levels, especially with a neutral-voice chatbot. Conversation type also shaped outcomes: personal topics slightly increased loneliness but tended to lower emotional dependence compared with open-ended conversations, whereas non-personal topics were associated with greater dependence among heavy users. Overall, higher daily usage - across all modalities and conversation types - correlated with higher loneliness, dependence, and problematic use, and lower socialization. Exploratory analyses revealed that those with stronger emotional attachment tendencies and higher trust in the AI chatbot tended to experience greater loneliness and emotional dependence, respectively. These findings underscore the complex interplay between chatbot design choices (e.g., voice expressiveness) and user behaviors (e.g., conversation content, usage frequency). We highlight the need for further research on whether chatbots' ability to manage emotional content without fostering dependence or replacing human relationships benefits overall well-being.
Submitted 21 March, 2025;
originally announced March 2025.
-
Rank-O-ToM: Unlocking Emotional Nuance Ranking to Enhance Affective Theory-of-Mind
Authors:
JiHyun Kim,
JuneHyoung Kwon,
MiHyeon Kim,
Eunju Lee,
YoungBin Kim
Abstract:
Facial Expression Recognition (FER) plays a foundational role in enabling AI systems to interpret emotional nuances, a critical aspect of affective Theory of Mind (ToM). However, existing models often struggle with poor calibration and a limited capacity to capture emotional intensity and complexity. To address this, we propose Ranking the Emotional Nuance for Theory of Mind (Rank-O-ToM), a framework that leverages ordinal ranking to align confidence levels with the emotional spectrum. By incorporating synthetic samples reflecting diverse affective complexities, Rank-O-ToM enhances the nuanced understanding of emotions, advancing AI's ability to reason about affective states.
Submitted 24 February, 2025;
originally announced March 2025.
-
Learning 3D Scene Analogies with Neural Contextual Scene Maps
Authors:
Junho Kim,
Gwangtak Bae,
Eun Sun Lee,
Young Min Kim
Abstract:
Understanding scene contexts is crucial for machines to perform tasks and adapt prior knowledge in unseen or noisy 3D environments. As data-driven learning is intractable to comprehensively encapsulate diverse ranges of layouts and open spaces, we propose teaching machines to identify relational commonalities in 3D spaces. Instead of focusing on point-wise or object-wise representations, we introduce 3D scene analogies, which are smooth maps between 3D scene regions that align spatial relationships. Unlike well-studied single instance-level maps, these scene-level maps smoothly link large scene regions, potentially enabling unique applications in trajectory transfer in AR/VR, long demonstration transfer for imitation learning, and context-aware object rearrangement. To find 3D scene analogies, we propose neural contextual scene maps, which extract descriptor fields summarizing semantic and geometric contexts, and holistically align them in a coarse-to-fine manner for map estimation. This approach reduces reliance on individual feature points, making it robust to input noise or shape variations. Experiments demonstrate the effectiveness of our approach in identifying scene analogies and transferring trajectories or object placements in diverse indoor scenes, indicating its potential for robotics and AR/VR applications.
Submitted 20 March, 2025;
originally announced March 2025.
-
Genomic data processing with GenomeFlow
Authors:
Junseok Park,
Eduardo A. Maury,
Changhoon Oh,
Donghoon Shin,
Danielle Denisko,
Eunjung Alice Lee
Abstract:
Advances in genome sequencing technologies generate massive amounts of sequence data that are increasingly analyzed and shared through public repositories. On-demand infrastructure services on cloud computing platforms enable the processing of such large-scale genomic sequence data in distributed processing environments with a significant reduction in analysis time. However, parallel processing on cloud computing platforms presents many challenges to researchers, even skillful bioinformaticians. In particular, it is difficult to design a computing architecture optimized to reduce the cost of computing and disk storage as genomic data analysis pipelines often employ many heterogeneous tools with different resource requirements. To address these issues, we developed GenomeFlow, a tool for automated development of computing architecture and resource optimization on Google Cloud Platform, which allows users to process a large number of samples at minimal cost. We outline multiple use cases of GenomeFlow demonstrating its utility to significantly reduce computing time and cost associated with analyzing genomic and transcriptomic data from hundreds to tens of thousands of samples from several consortia. Here, we describe a step-by-step protocol on how to use GenomeFlow for a common genomic data processing task. We introduce this example protocol geared toward a bioinformatician with little experience in cloud computing.
Submitted 19 March, 2025;
originally announced March 2025.
-
See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias
Authors:
JuneHyoung Kwon,
MiHyeon Kim,
Eunju Lee,
Juhwan Choi,
YoungBin Kim
Abstract:
Vision-language (VL) models have demonstrated strong performance across various tasks. However, these models often rely on a specific modality for predictions, leading to "dominant modality bias." This bias significantly hurts performance, especially when one modality is impaired. In this study, we analyze model behavior under dominant modality bias and theoretically show that unaligned gradients or differences in gradient magnitudes prevent balanced convergence of the loss. Based on these findings, we propose a novel framework, BalGrad, to mitigate dominant modality bias. Our approach includes inter-modality gradient reweighting, adjusting the gradient of KL divergence based on each modality's contribution, and inter-task gradient projection to align task directions in a non-conflicting manner. Experiments on UPMC Food-101, Hateful Memes, and MM-IMDb datasets confirm that BalGrad effectively alleviates over-reliance on specific modalities when making predictions.
Submitted 17 March, 2025;
originally announced March 2025.
-
EXAONE Deep: Reasoning Enhanced Language Models
Authors:
LG AI Research,
Kyunghoon Bae,
Eunbi Choi,
Kibong Choi,
Stanley Jungkyu Choi,
Yemuk Choi,
Seokhee Hong,
Junwon Hwang,
Hyojin Jeon,
Kijeong Jeon,
Gerrard Jeongwon Jo,
Hyunjik Jo,
Jiyeon Jung,
Hyosang Kim,
Joonkee Kim,
Seonghwan Kim,
Soyeon Kim,
Sunkyoung Kim,
Yireun Kim,
Yongil Kim,
Youchul Kim,
Edward Hwayoung Lee,
Haeju Lee,
Honglak Lee,
Jinsik Lee,
et al. (7 additional authors not shown)
Abstract:
We present the EXAONE Deep series, which exhibits superior capabilities in various reasoning tasks, including math and coding benchmarks. We train our models mainly on a reasoning-specialized dataset that incorporates long streams of thought processes. Evaluation results show that our smaller models, EXAONE Deep 2.4B and 7.8B, outperform other models of comparable size, while the largest model, EXAONE Deep 32B, demonstrates competitive performance against leading open-weight models. All EXAONE Deep models are openly available for research purposes and can be downloaded from https://huggingface.co/LGAI-EXAONE
Submitted 19 March, 2025; v1 submitted 16 March, 2025;
originally announced March 2025.
-
Direct-Write Printed Contacts to Layered and 2D Materials
Authors:
Sharadh Jois,
Erica Lee,
Philip Li,
Tsegereda Esatu,
Jason Fleischer,
Edwin Quinn,
Genda Gu,
Vadym Kulichenko,
Luis Balicas,
Son T. Le,
Samuel W. LaGasse,
Aubrey T. Hanbicki,
Adam L. Friedman
Abstract:
Advancements in fabrication methods have shaped new computing device technologies. Among these methods, depositing electrical contacts to the channel material is fundamental to device characterization. Novel layered and two-dimensional (2D) materials are promising for next-generation computing electronic channel materials. Direct-write printing of conductive inks is introduced as a surprisingly effective, significantly faster, and cleaner method to contact different classes of layered materials, including graphene (semi-metal), MoS2 (semiconductor), Bi-2212 (superconductor), and Fe5GeTe2 (metallic ferromagnet). Based on the electrical response, the quality of the printed contacts is comparable to what is achievable with resist-based lithography techniques. These devices are tested by sweeping gate voltage, temperature, and magnetic field to show that the materials remain pristine post-processing. This work demonstrates that direct-write printing is an agile method for prototyping and characterizing the electrical properties of novel layered materials.
Submitted 10 April, 2025; v1 submitted 6 March, 2025;
originally announced March 2025.
-
A $(2+\varepsilon)$-Approximation Algorithm for Metric $k$-Median
Authors:
Vincent Cohen-Addad,
Fabrizio Grandoni,
Euiwoong Lee,
Chris Schwiegelshohn,
Ola Svensson
Abstract:
In the classical NP-hard metric $k$-median problem, we are given a set of $n$ clients and centers with metric distances between them, along with an integer parameter $k\geq 1$. The objective is to select a subset of $k$ open centers that minimizes the total distance from each client to its closest open center.
In their seminal work, Jain, Mahdian, Markakis, Saberi, and Vazirani presented the Greedy algorithm for facility location, which implies a $2$-approximation algorithm for $k$-median that opens $k$ centers in expectation. Since then, substantial research has aimed at narrowing the gap between their algorithm and the best achievable approximation by an algorithm guaranteed to open exactly $k$ centers. During the last decade, all improvements have been achieved by leveraging their algorithm or a small improvement thereof, followed by a second step called bi-point rounding, which inherently increases the approximation guarantee.
Our main result closes this gap: for any $\varepsilon>0$, we present a $(2+\varepsilon)$-approximation algorithm for $k$-median, improving the previous best-known approximation factor of $2.613$. Our approach builds on a combination of two algorithms. First, we present a non-trivial modification of the Greedy algorithm that operates with $O(\log n/\varepsilon^2)$ adaptive phases. Through a novel walk-between-solutions approach, this enables us to construct a $(2+\varepsilon)$-approximation algorithm for $k$-median that consistently opens at most $k + O(\log n/\varepsilon^2)$ centers. Second, we develop a novel $(2+\varepsilon)$-approximation algorithm tailored for stable instances, where removing any center from an optimal solution increases the cost by at least an $\Omega(\varepsilon^3/\log n)$ fraction. Achieving this involves a sampling approach inspired by the $k$-means++ algorithm and a reduction to submodular optimization subject to a partition matroid.
Submitted 13 March, 2025;
originally announced March 2025.
-
Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference
Authors:
Pol G. Recasens,
Ferran Agullo,
Yue Zhu,
Chen Wang,
Eun Kyung Lee,
Olivier Tardieu,
Jordi Torres,
Josep Ll. Berral
Abstract:
Large language models have been widely adopted across different tasks, but their auto-regressive generation nature often leads to inefficient resource utilization during inference. While batching is commonly used to increase throughput, performance gains plateau beyond a certain batch size, especially with smaller models, a phenomenon that existing literature typically explains as a shift to the compute-bound regime. In this paper, through an in-depth GPU-level analysis, we reveal that large-batch inference remains memory-bound, with most GPU compute capabilities underutilized due to DRAM bandwidth saturation as the primary bottleneck. To address this, we propose a Batching Configuration Advisor (BCA) that optimizes memory allocation, reducing GPU memory requirements with minimal impact on throughput. The freed memory and underutilized GPU compute capabilities can then be leveraged by concurrent workloads. Specifically, we use model replication to improve serving throughput and GPU utilization. Our findings challenge conventional assumptions about LLM inference, offering new insights and practical strategies for improving resource utilization, particularly for smaller language models.
Submitted 11 March, 2025;
originally announced March 2025.
-
PolyVer: A Compositional Approach for Polyglot System Modeling and Verification
Authors:
Pei-Wei Chen,
Shaokai Lin,
Adwait Godbole,
Ramneet Singh,
Elizabeth Polgreen,
Edward A. Lee,
Sanjit A. Seshia
Abstract:
Several software systems are polyglot; that is, they comprise programs implemented in a combination of programming languages. Verifiers that directly run on mainstream programming languages are currently customized for single languages. Thus, to verify polyglot systems, one usually translates them into a common verification language or formalism on which the verifier runs. In this paper, we present an alternative approach, PolyVer, which employs abstraction, compositional reasoning, and synthesis to directly perform polyglot verification. PolyVer constructs a formal model of the original polyglot system as a transition system where the update functions associated with transitions are implemented in target languages such as C or Rust. To perform verification, PolyVer then connects a model checker for transition systems with language-specific verifiers (e.g., for C or Rust) using pre/post-condition contracts for the update functions. These contracts are automatically generated by synthesis oracles based on syntax-guided synthesis or large language models (LLMs), and checked by the language-specific verifiers. The contracts form abstractions of the update functions, which the model checker uses to verify the overall system-level property on the polyglot system model. PolyVer iterates between counterexample-guided abstraction-refinement (CEGAR) and counterexample-guided inductive synthesis (CEGIS) until the property is verified or a true system-level counterexample is found. We demonstrate the utility of PolyVer for verifying programs in the Lingua Franca polyglot language using the UCLID5 model checker connected with the CBMC and Kani verifiers for C and Rust, respectively.
Submitted 12 March, 2025; v1 submitted 5 March, 2025;
originally announced March 2025.
-
AlignFreeze: Navigating the Impact of Realignment on the Layers of Multilingual Models Across Diverse Languages
Authors:
Steve Bakos,
Félix Gaschi,
David Guzmán,
Riddhi More,
Kelly Chutong Li,
En-Shiun Annie Lee
Abstract:
Realignment techniques are often employed to enhance cross-lingual transfer in multilingual language models; still, they can sometimes degrade performance in languages that differ significantly from the fine-tuned source language. This paper introduces AlignFreeze, a method that freezes either the lower half or the upper half of the layers during realignment. Through controlled experiments on 4 tasks, 3 models, and 35 languages, we find that realignment affects all the layers but can be the most detrimental to the lower ones. Freezing the lower layers can prevent performance degradation. In particular, AlignFreeze improves Part-of-Speech (PoS) tagging performance in languages where full realignment fails: with XLM-R, it provides improvements of more than one standard deviation in accuracy in seven more languages than full realignment.
Submitted 18 February, 2025;
originally announced February 2025.
-
SenDaL: An Effective and Efficient Calibration Framework of Low-Cost Sensors for Daily Life
Authors:
Seokho Ahn,
Hyungjin Kim,
Euijong Lee,
Young-Duk Seo
Abstract:
The collection of accurate and noise-free data is a crucial part of Internet of Things (IoT)-controlled environments. However, the data collected from various sensors in daily life often suffer from inaccuracies. Additionally, IoT-controlled devices with low-cost sensors lack sufficient hardware resources to employ conventional deep-learning models. To overcome this limitation, we propose sensors for daily life (SenDaL), the first framework that utilizes neural networks for calibrating low-cost sensors. SenDaL introduces novel training and inference processes that enable it to achieve accuracy comparable to deep learning models while simultaneously preserving latency and energy consumption similar to linear models. SenDaL is first trained in a bottom-up manner, making decisions based on calibration results from both linear and deep learning models. Once both models are trained, SenDaL makes independent decisions through a top-down inference process, ensuring accuracy and inference speed. Furthermore, SenDaL can select the optimal deep learning model according to the resources of the IoT devices because it is compatible with various deep learning models, such as long short-term memory-based and Transformer-based models. We have verified that SenDaL outperforms existing deep learning models in terms of accuracy, latency, and energy efficiency through experiments conducted in different IoT environments and real-life scenarios.
Submitted 12 February, 2025;
originally announced February 2025.
-
INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages
Authors:
Hao Yu,
Jesujoba O. Alabi,
Andiswa Bukula,
Jian Yun Zhuang,
En-Shiun Annie Lee,
Tadesse Kebede Guge,
Israel Abebe Azime,
Happy Buzaaba,
Blessing Kudzaishe Sibanda,
Godson K. Kalipe,
Jonathan Mukiibi,
Salomon Kabongo Kabenamualu,
Mmasibidi Setaka,
Lolwethu Ndolela,
Nkiruka Odu,
Rooweither Mabuya,
Shamsuddeen Hassan Muhammad,
Salomey Osei,
Sokhar Samb,
Juliet W. Murage,
Dietrich Klakow,
David Ifeoluwa Adelani
Abstract:
Slot-filling and intent detection are well-established tasks in Conversational AI. However, current large-scale benchmarks for these tasks often exclude evaluations of low-resource languages and rely on translations from English benchmarks, thereby predominantly reflecting Western-centric concepts. In this paper, we introduce Injongo -- a multicultural, open-source benchmark dataset for 16 African languages with utterances generated by native speakers across diverse domains, including banking, travel, home, and dining. Through extensive experiments, we benchmark fine-tuning of multilingual transformer models and prompting of large language models (LLMs), and show the advantage of leveraging African-cultural utterances over Western-centric utterances for improving cross-lingual transfer from the English language. Experimental results reveal that current LLMs struggle with the slot-filling task, with GPT-4o achieving an average F1 score of 26. In contrast, intent detection performance is notably better, with an average accuracy of 70.6%, though it still falls behind the fine-tuning baselines. On English, by comparison, GPT-4o and the fine-tuning baselines perform similarly on intent detection, achieving an accuracy of approximately 81%. Our findings suggest that the performance of LLMs still lags for many low-resource African languages, and more work is needed to further improve their downstream performance.
Submitted 13 February, 2025;
originally announced February 2025.
-
HyGEN: Regularizing Negative Hyperedge Generation for Accurate Hyperedge Prediction
Authors:
Song Kyung Yu,
Da Eun Lee,
Yunyong Ko,
Sang-Wook Kim
Abstract:
Hyperedge prediction is a fundamental task to predict future high-order relations based on the observed network structure. Existing hyperedge prediction methods, however, suffer from the data sparsity problem. To alleviate this problem, negative sampling methods can be used, which leverage non-existing hyperedges as contrastive information for model training. However, the following important challenges have rarely been studied: (C1) lack of guidance for generating negatives and (C2) possibility of producing false negatives. To address them, we propose a novel hyperedge prediction method, HyGEN, which employs (1) a negative hyperedge generator that uses positive hyperedges as guidance to generate more realistic negatives and (2) a regularization term that prevents the generated hyperedges from being false negatives. Extensive experiments on six real-world hypergraphs reveal that HyGEN consistently outperforms four state-of-the-art hyperedge prediction methods.
Submitted 18 February, 2025; v1 submitted 9 February, 2025;
originally announced February 2025.
-
Facility Location on High-dimensional Euclidean Spaces
Authors:
Euiwoong Lee,
Kijun Shin
Abstract:
Recent years have seen great progress in the approximability of fundamental clustering and facility location problems on high-dimensional Euclidean spaces, including $k$-Means and $k$-Median. While they admit strictly better approximation ratios than their general metric versions, their approximation ratios are still higher than the hardness ratios for general metrics, leaving the possibility that the ultimate optimal approximation ratios will be the same between Euclidean and general metrics. Moreover, such an improved algorithm for Euclidean spaces is not known for Uncapacitated Facility Location (UFL), another fundamental problem in the area.
In this paper, we prove that for any $γ\geq 1.6774$ there exists $\varepsilon > 0$ such that Euclidean UFL admits a $(γ, 1 + 2e^{-γ} - \varepsilon)$-bifactor approximation algorithm, improving the result of Byrka and Aardal. Together with the $(γ, 1 + 2e^{-γ})$ NP-hardness in general metrics, it shows the first separation between general and Euclidean metrics for the aforementioned basic problems. We also present an $(α_{Li} - \varepsilon)$-(unifactor) approximation algorithm for UFL for some $\varepsilon > 0$ in Euclidean spaces, where $α_{Li} \approx 1.488$ is the best-known approximation ratio for UFL by Li.
Submitted 29 January, 2025;
originally announced January 2025.
-
ViT-2SPN: Vision Transformer-based Dual-Stream Self-Supervised Pretraining Networks for Retinal OCT Classification
Authors:
Mohammadreza Saraei,
Igor Kozak,
Eung-Joo Lee
Abstract:
Optical Coherence Tomography (OCT) is a non-invasive imaging modality essential for diagnosing various eye diseases. Despite its clinical significance, developing OCT-based diagnostic tools faces challenges, such as limited public datasets, sparse annotations, and privacy concerns. Although deep learning has made progress in automating OCT analysis, these challenges remain unresolved. To address these limitations, we introduce the Vision Transformer-based Dual-Stream Self-Supervised Pretraining Network (ViT-2SPN), a novel framework designed to enhance feature extraction and improve diagnostic accuracy. ViT-2SPN employs a three-stage workflow: Supervised Pretraining, Self-Supervised Pretraining (SSP), and Supervised Fine-Tuning. The pretraining phase leverages the OCTMNIST dataset (97,477 unlabeled images across four disease classes) with data augmentation to create dual-augmented views. A Vision Transformer (ViT-Base) backbone extracts features, while a negative cosine similarity loss aligns feature representations. Pretraining is conducted over 50 epochs with a learning rate of 0.0001 and momentum of 0.999. Fine-tuning is performed on a stratified 5.129% subset of OCTMNIST using 10-fold cross-validation. ViT-2SPN achieves a mean AUC of 0.93, accuracy of 0.77, precision of 0.81, recall of 0.75, and an F1 score of 0.76, outperforming existing SSP-based methods.
Submitted 28 January, 2025;
originally announced January 2025.
-
B-RIGHT: Benchmark Re-evaluation for Integrity in Generalized Human-Object Interaction Testing
Authors:
Yoojin Jang,
Junsu Kim,
Hayeon Kim,
Eun-ki Lee,
Eun-sol Kim,
Seungryul Baek,
Jaejun Yoo
Abstract:
Human-object interaction (HOI) is an essential problem in artificial intelligence (AI) which aims to understand the visual world that involves complex relationships between humans and objects. However, current benchmarks such as HICO-DET face the following limitations: (1) severe class imbalance and (2) varying number of train and test sets for certain classes. These issues can potentially lead to either inflation or deflation of model performance during evaluation, ultimately undermining the reliability of evaluation scores. In this paper, we propose a systematic approach to develop a new class-balanced dataset, Benchmark Re-evaluation for Integrity in Generalized Human-object Interaction Testing (B-RIGHT), that addresses these imbalance problems. B-RIGHT achieves class balance by leveraging a balancing algorithm and automated generation-and-filtering processes, ensuring an equal number of instances for each HOI class. Furthermore, we design a balanced zero-shot test set to systematically evaluate models on unseen scenarios. Re-evaluating existing models using B-RIGHT reveals a substantial reduction in score variance and changes in performance rankings compared to conventional HICO-DET. Our experiments demonstrate that evaluation under balanced conditions ensures more reliable and fair model comparisons.
Submitted 28 January, 2025;
originally announced January 2025.
-
Self-supervised Graph Transformer with Contrastive Learning for Brain Connectivity Analysis towards Improving Autism Detection
Authors:
Yicheng Leng,
Syed Muhammad Anwar,
Islem Rekik,
Sen He,
Eung-Joo Lee
Abstract:
Functional Magnetic Resonance Imaging (fMRI) provides useful insights into brain function during both task and rest. Representing fMRI data using correlation matrices is found to be a reliable method of analyzing the inherent connectivity of the brain in the resting and active states. Graph Neural Networks (GNNs) have been widely used for brain network analysis due to their inherent explainability. In this work, we introduce a novel framework using contrastive self-supervised learning graph transformers, incorporating a brain network transformer encoder with random graph alterations. The proposed network leverages both contrastive learning and graph alterations to effectively train the graph transformer for autism detection. Our approach, tested on Autism Brain Imaging Data Exchange (ABIDE) data, demonstrates superior autism detection, achieving an AUROC of 82.6 and an accuracy of 74%, surpassing current state-of-the-art methods.
Submitted 18 January, 2025;
originally announced January 2025.
-
A Framework for Mining Collectively-Behaving Bots in MMORPGs
Authors:
Hyunsoo Kim,
Jun Hee Kim,
Jaeman Son,
Jihoon Song,
Eunjo Lee
Abstract:
In MMORPGs (Massively Multiplayer Online Role-Playing Games), abnormal players (bots) using unauthorized automated programs to carry out pre-defined behaviors systematically and repeatedly are commonly observed. Bots usually engage in these activities to gain in-game money, which they eventually trade for real money outside the game. Such abusive activities negatively impact the in-game experiences of legitimate users since bots monopolize specific hunting areas and obtain valuable items. Thus, detecting abnormal players is a significant task for game companies. Motivated by the fact that bots tend to behave collectively with similar in-game trajectories due to the auto-programs, we developed BotTRep, a framework that comprises trajectory representation learning followed by clustering using a completely unlabeled in-game trajectory dataset. Our model aims to learn representations for in-game trajectory sequences so that players with contextually similar trajectories have closer embeddings. Then, by applying DBSCAN to these representations and visualizing the corresponding moving patterns, our framework ultimately assists game masters in identifying and banning bots.
Submitted 1 July, 2025; v1 submitted 15 January, 2025;
originally announced January 2025.
-
W3ID: A Quantum Computing-Secure Digital Identity System Redefining Standards for Web3 and Digital Twins
Authors:
Joseph Yun,
Eli Lifton,
Eunseo Lee,
Yohan Yun,
Abigail Song,
Joshua Lee,
Cristian Jimenez-Bert,
Benedict Song,
Yejun Lee,
Alex Seo,
Sijung Yun
Abstract:
The rapid advancements in quantum computing present significant threats to existing encryption standards and internet security. Simultaneously, the advent of Web 3.0 marks a transformative era in internet history, emphasizing enhanced data security, decentralization, and user ownership. This white paper introduces the W3ID, an abbreviation of Web3 standard meeting universal digital ID, which is a Universal Digital Identity (UDI) model designed to meet Web3 standards while addressing vulnerabilities posed by quantum computing. W3ID innovatively generates secure Digital Object Identifiers (DOIs) tailored for the decentralized Web 3.0 ecosystem. Additionally, W3ID employs a dual-key system for secure authentication, enhancing both public and private verification mechanisms. To further enhance encryption strength and authentication integrity in the quantum computing era, W3ID incorporates an advanced security mechanism. By requiring quadruple application of SHA-256, with consecutive matches for validation, the system expands the number of possibilities to 256^4, which is approximately 4.3 billion times the current SHA-256 capacity. This dramatic increase in computational complexity ensures that even advanced quantum computing systems would face significant challenges in executing brute-force attacks. W3ID redefines digital identity standards for Web 3.0 and the quantum computing era, setting a new benchmark for security, scalability, and decentralization in the global digital twin ecosystem.
Submitted 16 January, 2025;
originally announced January 2025.
-
Exploiting Domain-Specific Parallel Data on Multilingual Language Models for Low-resource Language Translation
Authors:
Surangika Ranathungaa,
Shravan Nayak,
Shih-Ting Cindy Huang,
Yanke Mao,
Tong Su,
Yun-Hsiang Ray Chan,
Songchen Yuan,
Anthony Rinaldi,
Annie En-Shiun Lee
Abstract:
Neural Machine Translation (NMT) systems built on multilingual sequence-to-sequence Language Models (msLMs) fail to deliver expected results when the amount of parallel data for a language, as well as the language's representation in the model, is limited. This restricts the capabilities of domain-specific NMT systems for low-resource languages (LRLs). As a solution, parallel data from auxiliary domains can be used either to fine-tune or to further pre-train the msLM. We present an evaluation of the effectiveness of these two techniques in the context of domain-specific LRL-NMT. We also explore the impact of domain divergence on NMT model performance. We recommend several strategies for utilizing auxiliary parallel data in building domain-specific NMT models for LRLs.
Submitted 27 December, 2024;
originally announced December 2024.
-
Read Like a Radiologist: Efficient Vision-Language Model for 3D Medical Imaging Interpretation
Authors:
Changsun Lee,
Sangjoon Park,
Cheong-Il Shin,
Woo Hee Choi,
Hyun Jeong Park,
Jeong Eun Lee,
Jong Chul Ye
Abstract:
Recent medical vision-language models (VLMs) have shown promise in 2D medical image interpretation. However, extending them to 3D medical imaging has been challenging due to computational complexities and data scarcity. Although a few recent VLMs specialized for 3D medical imaging have emerged, all are limited to learning the volumetric representation of a 3D medical image as a set of sub-volumetric features. Such a process introduces overly correlated representations along the z-axis that neglect slice-specific clinical details, particularly for 3D medical images where adjacent slices have low redundancy. To address this limitation, we introduce MS-VLM, which mimics radiologists' workflow in 3D medical image interpretation. Specifically, radiologists analyze 3D medical images by examining individual slices sequentially and synthesizing information across slices and views. Likewise, MS-VLM leverages self-supervised 2D transformer encoders to learn a volumetric representation that captures inter-slice dependencies from a sequence of slice-specific features. Unbound by sub-volumetric patchification, MS-VLM is capable of obtaining useful volumetric representations from 3D medical images with any slice length and from multiple images acquired from different planes and phases. We evaluate MS-VLM on the publicly available chest CT dataset CT-RATE and an in-house rectal MRI dataset. In both scenarios, MS-VLM surpasses existing methods in radiology report generation, producing more coherent and clinically relevant reports. These findings highlight the potential of MS-VLM to advance 3D medical image interpretation and improve the robustness of medical VLMs.
Submitted 18 December, 2024;
originally announced December 2024.
-
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases
Authors:
LG AI Research,
Soyoung An,
Kyunghoon Bae,
Eunbi Choi,
Kibong Choi,
Stanley Jungkyu Choi,
Seokhee Hong,
Junwon Hwang,
Hyojin Jeon,
Gerrard Jeongwon Jo,
Hyunjik Jo,
Jiyeon Jung,
Yountae Jung,
Hyosang Kim,
Joonkee Kim,
Seonghwan Kim,
Soyeon Kim,
Sunkyoung Kim,
Yireun Kim,
Yongil Kim,
Youchul Kim,
Edward Hwayoung Lee,
Haeju Lee,
Honglak Lee,
Jinsik Lee,
et al. (8 additional authors not shown)
Abstract:
This technical report introduces the EXAONE 3.5 instruction-tuned language models, developed and released by LG AI Research. The EXAONE 3.5 language models are offered in three configurations: 32B, 7.8B, and 2.4B. These models feature several standout capabilities: 1) exceptional instruction following capabilities in real-world scenarios, achieving the highest scores across seven benchmarks, 2) outstanding long-context comprehension, attaining the top performance in four benchmarks, and 3) competitive results compared to state-of-the-art open models of similar sizes across nine general benchmarks. The EXAONE 3.5 language models are open to anyone for research purposes and can be downloaded from https://huggingface.co/LGAI-EXAONE. For commercial use, please reach out to the official contact point of LG AI Research: [email protected].
Submitted 9 December, 2024; v1 submitted 6 December, 2024;
originally announced December 2024.