Showing 1–50 of 2,431 results for author: Lee, H

Searching in archive cs.
  1. arXiv:2505.09978  [pdf, ps, other]

    cs.IT eess.SP

    Low-Complexity Decoding for Low-Rate Block Codes of Short Length Based on Concatenated Coding Structure

    Authors: Mao-Chao Lin, Shih-Kai Lee, Pin Lin, Ching-Chang Lin, Chia-Chun Chen, Teng-Yuan Syu, Huang-Chang Lee

    Abstract: To decode a short linear block code, ordered statistics decoding (OSD) and/or the $A^*$ decoding are usually considered. Either OSD or the $A^*$ decoding utilizes the magnitudes of the received symbols to establish the most reliable and independent positions (MRIP) frame. A restricted searched space can be employed to achieve near-optimum decoding with reduced decoding complexity. For a low-rate code…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.09662  [pdf]

    cs.CL

    Large Language Models Are More Persuasive Than Incentivized Human Persuaders

    Authors: Philipp Schoenegger, Francesco Salvi, Jiacheng Liu, Xiaoli Nan, Ramit Debnath, Barbara Fasolo, Evelina Leivada, Gabriel Recchia, Fritz Günther, Ali Zarifhonarvar, Joe Kwon, Zahoor Ul Islam, Marco Dehnert, Daryl Y. H. Lee, Madeline G. Reinecke, David G. Kamper, Mert Kobaş, Adam Sandford, Jonas Kgomo, Luke Hewitt, Shreya Kapoor, Kerem Oktar, Eyup Engin Kucuk, Bo Feng, Cameron R. Jones, et al. (15 additional authors not shown)

    Abstract: We directly compare the persuasion capabilities of a frontier large language model (LLM; Claude Sonnet 3.5) against incentivized human persuaders in an interactive, real-time conversational quiz setting. In this preregistered, large-scale incentivized experiment, participants (quiz takers) completed an online quiz where persuaders (either humans or LLMs) attempted to persuade quiz takers toward co…

    Submitted 14 May, 2025; originally announced May 2025.

    ACM Class: I.2.7; H.1.2; K.4.1; H.5.2

  3. arXiv:2505.09539  [pdf, ps, other]

    cs.IR

    GlobalMood: A cross-cultural benchmark for music emotion recognition

    Authors: Harin Lee, Elif Çelen, Peter Harrison, Manuel Anglada-Tort, Pol van Rijn, Minsu Park, Marc Schönwiesner, Nori Jacoby

    Abstract: Human annotations of mood in music are essential for music generation and recommender systems. However, existing datasets predominantly focus on Western songs with mood terms derived from English, which may limit generalizability across diverse linguistic and cultural backgrounds. To address this, we introduce `GlobalMood', a novel cross-cultural benchmark dataset comprising 1,180 songs sampled fr…

    Submitted 14 May, 2025; originally announced May 2025.

  4. arXiv:2505.09489  [pdf, ps, other]

    cs.CE

    Radon Exposure Dataset

    Authors: Dakotah Maguire, Jeremy Logan, Heechan Lee, Heidi Hanson

    Abstract: Exposure to elevated radon levels in the home is one of the leading causes of lung cancer in the world. The following study describes the creation of a comprehensive, state-level dataset designed to enable the modeling and prediction of household radon concentrations at Zip Code Tabulation Area (ZCTA) and sub-kilometer scales. Details include the data collection and processing involved in compilin…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 7 pages, 2 tables

    ACM Class: J.2; J.3

  5. arXiv:2505.08854  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    Generative AI for Autonomous Driving: Frontiers and Opportunities

    Authors: Yuping Wang, Shuo Xing, Cui Can, Renjie Li, Hongyuan Hua, Kexin Tian, Zhaobin Mo, Xiangbo Gao, Keshu Wu, Sulong Zhou, Hengxu You, Juntong Peng, Junge Zhang, Zehao Wang, Rui Song, Mingxuan Yan, Walter Zimmer, Xingcheng Zhou, Peiran Li, Zhaohan Lu, Chia-Ju Chen, Yue Huang, Ryan A. Rossi, Lichao Sun, Hongkai Yu, et al. (22 additional authors not shown)

    Abstract: Generative Artificial Intelligence (GenAI) constitutes a transformative technological wave that reconfigures industries through its unparalleled capabilities for content creation, reasoning, planning, and multimodal understanding. This revolutionary force offers the most promising path yet toward solving one of engineering's grandest challenges: achieving reliable, fully autonomous driving, partic…

    Submitted 13 May, 2025; originally announced May 2025.

  6. arXiv:2505.08025  [pdf, other]

    cs.RO cs.AI

    PRISM: Complete Online Decentralized Multi-Agent Pathfinding with Rapid Information Sharing using Motion Constraints

    Authors: Hannah Lee, Zachary Serlin, James Motes, Brendan Long, Marco Morales, Nancy M. Amato

    Abstract: We introduce PRISM (Pathfinding with Rapid Information Sharing using Motion Constraints), a decentralized algorithm designed to address the multi-task multi-agent pathfinding (MT-MAPF) problem. PRISM enables large teams of agents to concurrently plan safe and efficient paths for multiple tasks while avoiding collisions. It employs a rapid communication strategy that uses information packets to exc…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 38 pages, 8 figures

  7. arXiv:2505.05502  [pdf, ps, other]

    math.OC cs.RO eess.SY

    Constraint Selection in Optimization-Based Controllers

    Authors: Haejoon Lee, Panagiotis Rousseas, Dimitra Panagou

    Abstract: Human-machine collaboration often involves constrained optimization problems for decision-making processes. However, when the machine is a dynamical system with a continuously evolving state, infeasibility due to multiple conflicting constraints can lead to dangerous outcomes. In this work, we propose a heuristic-based method that resolves infeasibility at every time step by selectively disregardi…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Submitted to IEEE Control Systems Letters (L-CSS)

  8. arXiv:2505.03896  [pdf, other]

    cs.CV cs.AI

    Novel Extraction of Discriminative Fine-Grained Feature to Improve Retinal Vessel Segmentation

    Authors: Shuang Zeng, Chee Hong Lee, Micky C Nnamdi, Wenqi Shi, J Ben Tamo, Lei Zhu, Hangzhou He, Xinliang Zhang, Qian Chen, May D. Wang, Yanye Lu, Qiushi Ren

    Abstract: Retinal vessel segmentation is a vital early detection method for several severe ocular diseases. Despite significant progress in retinal vessel segmentation with the advancement of Neural Networks, there are still challenges to overcome. Specifically, retinal vessel segmentation aims to predict the class label for every pixel within a fundus image, with a primary focus on intra-image discriminati…

    Submitted 6 May, 2025; originally announced May 2025.

  9. arXiv:2505.03799  [pdf, other]

    cs.LG cs.AI cs.CL

    Scalability Matters: Overcoming Challenges in InstructGLM with Similarity-Degree-Based Sampling

    Authors: Hyun Lee, Chris Yi, Maminur Islam, B. D. S. Aritra

    Abstract: Large Language Models (LLMs) have demonstrated strong capabilities in various natural language processing tasks; however, their application to graph-related problems remains limited, primarily due to scalability constraints and the absence of dedicated mechanisms for processing graph structures. Existing approaches predominantly integrate LLMs with Graph Neural Networks (GNNs), using GNNs as featu…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: To be published in International Joint Conference on Neural Networks (IJCNN), 2025

  10. arXiv:2505.03777  [pdf, other]

    cs.LG

    MolMole: Molecule Mining from Scientific Literature

    Authors: LG AI Research, Sehyun Chun, Jiye Kim, Ahra Jo, Yeonsik Jo, Seungyul Oh, Seungjun Lee, Kwangrok Ryoo, Jongmin Lee, Seung Hwan Kim, Byung Jun Kang, Soonyoung Lee, Jun Ha Park, Chanwoo Moon, Jiwon Ham, Haein Lee, Heejae Han, Jaeseung Byun, Soojong Do, Minju Ha, Dongyun Kim, Kyunghoon Bae, Woohyung Lim, Edward Hwayoung Lee, Yongmin Park, et al. (9 additional authors not shown)

    Abstract: The extraction of molecular structures and reaction data from scientific documents is challenging due to their varied, unstructured chemical formats and complex document layouts. To address this, we introduce MolMole, a vision-based deep learning framework that unifies molecule detection, reaction diagram parsing, and optical chemical structure recognition (OCSR) into a single pipeline for automat…

    Submitted 7 May, 2025; v1 submitted 30 April, 2025; originally announced May 2025.

    Comments: 15 pages, 12 figures

  11. Compensating Spatiotemporally Inconsistent Observations for Online Dynamic 3D Gaussian Splatting

    Authors: Youngsik Yun, Jeongmin Bae, Hyunseung Son, Seoha Kim, Hahyun Lee, Gun Bang, Youngjung Uh

    Abstract: Online reconstruction of dynamic scenes is significant as it enables learning scenes from live-streaming video inputs, while existing offline dynamic reconstruction methods rely on recorded video inputs. However, previous online reconstruction approaches have primarily focused on efficiency and rendering quality, overlooking the temporal consistency of their results, which often contain noticeable…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: SIGGRAPH 2025, Project page: https://bbangsik13.github.io/OR2

  12. arXiv:2505.00855  [pdf, ps, other]

    cs.HC

    Beyond the Mirror: Personal Analytics through Visual Juxtaposition with Other People's Data

    Authors: Sungbok Shin, Sunghyo Chung, Hyeon Jeon, Hyunwook Lee, Minje Choi, Taehun Kim, Jaehoon Choi, Sungahn Ko, Jaegul Choo

    Abstract: An individual's data can reveal facets of behavior and identity, but its interpretation is context dependent. We can easily identify various self-tracking applications that help people reflect on their lives. However, self-tracking confined to one person's data source may fall short in terms of objectiveness, and insights coming from various perspectives. To address this, we examine how those inte…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: Submitted to IEEE VIS2025 Short Paper

  13. arXiv:2505.00684  [pdf, other]

    cs.CV cs.AI cs.LG

    Visual Test-time Scaling for GUI Agent Grounding

    Authors: Tiange Luo, Lajanugen Logeswaran, Justin Johnson, Honglak Lee

    Abstract: We introduce RegionFocus, a visual test-time scaling approach for Vision Language Model Agents. Understanding webpages is challenging due to the visual complexity of GUI images and the large number of interface elements, making accurate action selection difficult. Our approach dynamically zooms in on relevant regions, reducing background clutter and improving grounding accuracy. To support this pr…

    Submitted 1 May, 2025; originally announced May 2025.

  14. arXiv:2505.00468  [pdf, other]

    cs.CE

    Evaluation of Thermal Control Based on Spatial Thermal Comfort with Reconstructed Environmental Data

    Authors: Youngkyu Kim, Byounghyun Yoo, Ji Young Yun, Hyeokmin Lee, Sehyeon Park, Jin Woo Moon, Eun Ji Choi

    Abstract: Achieving thermal comfort while maintaining energy efficiency is a critical objective in building system control. Conventional thermal comfort models, such as the Predicted Mean Vote (PMV), rely on both environmental and personal variables. However, the use of fixed-location sensors limits the ability to capture spatial variability, which reduces the accuracy of occupant-specific comfort estimatio…

    Submitted 4 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

  15. arXiv:2505.00133  [pdf, other]

    eess.IV cs.CV

    Efficient and robust 3D blind harmonization for large domain gaps

    Authors: Hwihun Jeong, Hayeon Lee, Se Young Chun, Jongho Lee

    Abstract: Blind harmonization has emerged as a promising technique for MR image harmonization to achieve scale-invariant representations, requiring only target domain data (i.e., no source domain data necessary). However, existing methods face limitations such as inter-slice heterogeneity in 3D, moderate image quality, and limited performance for a large domain gap. To address these challenges, we introduce…

    Submitted 30 April, 2025; originally announced May 2025.

  16. arXiv:2505.00023  [pdf, other]

    cs.CL cs.AI

    CORG: Generating Answers from Complex, Interrelated Contexts

    Authors: Hyunji Lee, Franck Dernoncourt, Trung Bui, Seunghyun Yoon

    Abstract: In a real-world corpus, knowledge frequently recurs across documents but often contains inconsistencies due to ambiguous naming, outdated information, or errors, leading to complex interrelationships between contexts. Previous research has shown that language models struggle with these complexities, typically focusing on single factors in isolation. We classify these relationships into four types:…

    Submitted 24 April, 2025; originally announced May 2025.

    Comments: published at Findings of NAACL 2025

  17. arXiv:2504.17219  [pdf, other]

    cs.LG cs.AI cs.CR

    Enhancing Variational Autoencoders with Smooth Robust Latent Encoding

    Authors: Hyomin Lee, Minseon Kim, Sangwon Jang, Jongheon Jeong, Sung Ju Hwang

    Abstract: Variational Autoencoders (VAEs) have played a key role in scaling up diffusion-based generative models, as in Stable Diffusion, yet questions regarding their robustness remain largely underexplored. Although adversarial training has been an established technique for enhancing robustness in predictive models, it has been overlooked for generative models due to concerns about potential fidelity degr…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: Under review

  18. arXiv:2504.16828  [pdf, other]

    cs.LG cs.AI cs.CL

    Process Reward Models That Think

    Authors: Muhammad Khalifa, Rishabh Agarwal, Lajanugen Logeswaran, Jaekyeom Kim, Hao Peng, Moontae Lee, Honglak Lee, Lu Wang

    Abstract: Step-by-step verifiers -- also known as process reward models (PRMs) -- are a key ingredient for test-time scaling. PRMs require step-level supervision, making them expensive to train. This work aims to build data-efficient PRMs as verbalized step-wise reward models that verify every step in the solution by generating a verification chain-of-thought (CoT). We propose ThinkPRM, a long CoT verifier…

    Submitted 23 April, 2025; originally announced April 2025.

  19. arXiv:2504.15723  [pdf, other]

    cs.CV

    Structure-Preserving Zero-Shot Image Editing via Stage-Wise Latent Injection in Diffusion Models

    Authors: Dasol Jeong, Donggoo Kang, Jiwon Park, Hyebean Lee, Joonki Paik

    Abstract: We propose a diffusion-based framework for zero-shot image editing that unifies text-guided and reference-guided approaches without requiring fine-tuning. Our method leverages diffusion inversion and timestep-specific null-text embeddings to preserve the structural integrity of the source image. By introducing a stage-wise latent injection strategy-shape injection in early steps and attribute inje…

    Submitted 22 April, 2025; originally announced April 2025.

  20. arXiv:2504.15251  [pdf, other]

    cs.LG cs.DS math.ST stat.ML

    On Learning Parallel Pancakes with Mostly Uniform Weights

    Authors: Ilias Diakonikolas, Daniel M. Kane, Sushrut Karmalkar, Jasper C. H. Lee, Thanasis Pittas

    Abstract: We study the complexity of learning $k$-mixtures of Gaussians ($k$-GMMs) on $\mathbb{R}^d$. This task is known to have complexity $d^{\Omega(k)}$ in full generality. To circumvent this exponential lower bound on the number of components, research has focused on learning families of GMMs satisfying additional structural properties. A natural assumption posits that the component weights are not exponenti…

    Submitted 21 April, 2025; originally announced April 2025.

  21. arXiv:2504.15147  [pdf, other]

    cs.NE

    The Iterative Chainlet Partitioning Algorithm for the Traveling Salesman Problem with Drone and Neural Acceleration

    Authors: Jae Hyeok Lee, Minjun Kim, Jinkyoo Park, Changhyun Kwon

    Abstract: This study introduces the Iterative Chainlet Partitioning (ICP) algorithm and its neural acceleration for solving the Traveling Salesman Problem with Drone (TSP-D). The proposed ICP algorithm decomposes a TSP-D solution into smaller segments called chainlets, each optimized individually by a dynamic programming subroutine. The chainlet with the highest improvement is updated and the procedure is r…

    Submitted 21 April, 2025; originally announced April 2025.

  22. arXiv:2504.14582  [pdf, other]

    cs.CV

    NTIRE 2025 Challenge on Image Super-Resolution ($\times$4): Methods and Results

    Authors: Zheng Chen, Kai Liu, Jue Gong, Jingkai Wang, Lei Sun, Zongwei Wu, Radu Timofte, Yulun Zhang, Xiangyu Kong, Xiaoxuan Yu, Hyunhee Park, Suejin Han, Hakjae Jeon, Dafeng Zhang, Hyung-Ju Chun, Donghun Ryou, Inju Ha, Bohyung Han, Lu Zhao, Yuyi Zhang, Pengyu Yan, Jiawei Hu, Pengwei Liu, Fengjun Guo, Hongyuan Yu, et al. (86 additional authors not shown)

    Abstract: This paper presents the NTIRE 2025 image super-resolution ($\times$4) challenge, one of the associated competitions of the 10th NTIRE Workshop at CVPR 2025. The challenge aims to recover high-resolution (HR) images from low-resolution (LR) counterparts generated through bicubic downsampling with a $\times$4 scaling factor. The objective is to develop effective network designs or solutions that ach…

    Submitted 28 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: NTIRE 2025 webpage: https://www.cvlai.net/ntire/2025. Code: https://github.com/zhengchen1999/NTIRE2025_ImageSR_x4

  23. arXiv:2504.13560  [pdf, other]

    cs.CV cs.AI

    Zero-Shot Industrial Anomaly Segmentation with Image-Aware Prompt Generation

    Authors: SoYoung Park, Hyewon Lee, Mingyu Choi, Seunghoon Han, Jong-Ryul Lee, Sungsu Lim, Tae-Ho Kim

    Abstract: Anomaly segmentation is essential for industrial quality, maintenance, and stability. Existing text-guided zero-shot anomaly segmentation models are effective but rely on fixed prompts, limiting adaptability in diverse industrial scenarios. This highlights the need for flexible, context-aware prompting strategies. We propose Image-Aware Prompt Anomaly Segmentation (IAP-AS), which enhances anomaly…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: Accepted to PAKDD 2025, 12 pages

  24. arXiv:2504.13169  [pdf, other]

    cs.CV

    Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

    Authors: Tsung-Han Wu, Heekyung Lee, Jiaxin Ge, Joseph E. Gonzalez, Trevor Darrell, David M. Chan

    Abstract: Vision-Language Models (VLMs) excel at visual understanding but often suffer from visual hallucinations, where they generate descriptions of nonexistent objects, actions, or concepts, posing significant risks in safety-critical applications. Existing hallucination mitigation methods typically follow one of two paradigms: generation adjustment, which modifies decoding behavior to align text with vi…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Preprint. Project Page: https://reverse-vlm.github.io

  25. arXiv:2504.12082  [pdf, other]

    cs.CL cs.AI

    Selective Demonstration Retrieval for Improved Implicit Hate Speech Detection

    Authors: Yumin Kim, Hwanhee Lee

    Abstract: Hate speech detection is a crucial area of research in natural language processing, essential for ensuring online community safety. However, detecting implicit hate speech, where harmful intent is conveyed in subtle or indirect ways, remains a major challenge. Unlike explicit hate speech, implicit expressions often depend on context, cultural subtleties, and hidden biases, making them more challen…

    Submitted 16 April, 2025; originally announced April 2025.

  26. arXiv:2504.11780  [pdf, other]

    cs.SE cs.AI

    Agile Retrospectives: What went well? What didn't go well? What should we do?

    Authors: Maria Spichkova, Hina Lee, Kevin Iwan, Madeleine Zwart, Yuwon Yoon, Xiaohan Qin

    Abstract: In Agile/Scrum software development, the idea of retrospective meetings (retros) is one of the core elements of the project process. In this paper, we present our work in progress focusing on two aspects: analysis of potential usage of generative AI for information interaction within retrospective meetings, and visualisation of retros' information to software development teams. We also present our…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: Preprint. Accepted to the 20th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2025). Final version to be published by SCITEPRESS, http://www.scitepress.org

  27. arXiv:2504.11765  [pdf, other]

    cs.AI

    Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs

    Authors: Hyungwoo Lee, Kihyun Kim, Jinwoo Kim, Jungmin So, Myung-Hoon Cha, Hong-Yeon Kim, James J. Kim, Youngjae Kim

    Abstract: Recent large language models (LLMs) face increasing inference latency as input context length and model size continue to grow. In particular, the retrieval-augmented generation (RAG) technique, which enhances LLM responses by incorporating external knowledge, exacerbates this issue by significantly increasing the number of input tokens. This expansion in token length leads to a substantial rise in…

    Submitted 16 April, 2025; originally announced April 2025.

  28. arXiv:2504.11673  [pdf, other]

    cs.CL

    Higher-Order Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions

    Authors: Minwoo Kang, Suhong Moon, Seung Hyeong Lee, Ayush Raj, Joseph Suh, David M. Chan

    Abstract: Large language models (LLMs) are increasingly capable of simulating human behavior, offering cost-effective ways to estimate user responses during the early phases of survey design. While previous studies have examined whether models can reflect individual opinions or attitudes, we argue that a \emph{higher-order} binding of virtual personas requires successfully approximating not only the opinion…

    Submitted 15 April, 2025; originally announced April 2025.

  29. arXiv:2504.11019  [pdf, other]

    cs.CV

    DRIFT open dataset: A drone-derived intelligence for traffic analysis in urban environment

    Authors: Hyejin Lee, Seokjun Hong, Jeonghoon Song, Haechan Cho, Zhixiong Jin, Byeonghun Kim, Joobin Jin, Jaegyun Im, Byeongjoon Noh, Hwasoo Yeo

    Abstract: Reliable traffic data are essential for understanding urban mobility and developing effective traffic management strategies. This study introduces the DRone-derived Intelligence For Traffic analysis (DRIFT) dataset, a large-scale urban traffic dataset collected systematically from synchronized drone videos at approximately 250 meters altitude, covering nine interconnected intersections in Daejeon,…

    Submitted 25 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: 30 pages, 15 figures

    ACM Class: I.2.10; I.4.8; H.2.8; J.7

  30. arXiv:2504.10714  [pdf, other]

    cs.HC

    Playing to Pay: Interplay of Monetization and Retention Strategies in Korean Mobile Gaming

    Authors: HwiJoon Lee, Kashif Imteyaz, Saiph Savage

    Abstract: Mobile gaming's global growth has introduced evolving monetization strategies, such as in-app purchases and ads, designed to boost revenue while maintaining player engagement. However, there is limited understanding of the scope and frequency of these strategies, particularly in mature markets like South Korea. To address this research gap, this study examines the monetization strategies used in t…

    Submitted 14 April, 2025; originally announced April 2025.

  31. arXiv:2504.10428  [pdf, other]

    stat.ML cs.DS cs.LG math.ST

    Learning with Positive and Imperfect Unlabeled Data

    Authors: Jane H. Lee, Anay Mehrotra, Manolis Zampetakis

    Abstract: We study the problem of learning binary classifiers from positive and unlabeled data when the unlabeled data distribution is shifted, which we call Positive and Imperfect Unlabeled (PIU) Learning. In the absence of covariate shifts, i.e., with perfect unlabeled data, Denis (1998) reduced this problem to learning under Massart noise; however, that reduction fails under even slight shifts. Our mai…

    Submitted 14 April, 2025; originally announced April 2025.

  32. arXiv:2504.09702  [pdf, other]

    cs.AI

    MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?

    Authors: Yunxiang Zhang, Muhammad Khalifa, Shitanshu Bhushan, Grant D Murphy, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang

    Abstract: Existing evaluation of large language model (LLM) agents on scientific discovery lacks objective baselines and metrics to assess the viability of their proposed methods. To address this issue, we introduce MLRC-Bench, a benchmark designed to quantify how effectively language agents can tackle challenging Machine Learning (ML) Research Competitions. Our benchmark highlights open research problems t…

    Submitted 13 April, 2025; originally announced April 2025.

  33. arXiv:2504.09435  [pdf, other]

    cs.HC

    Design Probes for AI-Driven AAC: Addressing Complex Communication Needs in Aphasia

    Authors: Lei Mao, Jong Ho Lee, Yasmeen Faroqi Shah, Stephanie Valencia

    Abstract: AI offers key advantages such as instant generation, multi-modal support, and personalized adaptability - potential that can address the highly heterogeneous communication barriers faced by people with aphasia (PWAs). We designed AI-enhanced communication tools and used them as design probes to explore how AI's real-time processing and generation capabilities - across text, image, and audio - can…

    Submitted 13 April, 2025; originally announced April 2025.

  34. A Champion-level Vision-based Reinforcement Learning Agent for Competitive Racing in Gran Turismo 7

    Authors: Hojoon Lee, Takuma Seno, Jun Jet Tai, Kaushik Subramanian, Kenta Kawamoto, Peter Stone, Peter R. Wurman

    Abstract: Deep reinforcement learning has achieved superhuman racing performance in high-fidelity simulators like Gran Turismo 7 (GT7). It typically utilizes global features that require instrumentation external to a car, such as precise localization of agents and opponents, limiting real-world applicability. To address this limitation, we introduce a vision-based autonomous racing agent that relies solely…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: Accepted for Publication at the IEEE Robotics and Automation Letters (RA-L) 2025

  35. arXiv:2504.08601  [pdf, other]

    cs.RO eess.SY

    Enabling Safety for Aerial Robots: Planning and Control Architectures

    Authors: Kaleb Ben Naveed, Devansh R. Agrawal, Daniel M. Cherenson, Haejoon Lee, Alia Gilbert, Hardik Parwana, Vishnu S. Chipade, William Bentz, Dimitra Panagou

    Abstract: Ensuring safe autonomy is crucial for deploying aerial robots in real-world applications. However, safety is a multifaceted challenge that must be addressed from multiple perspectives, including navigation in dynamic environments, operation under resource constraints, and robustness against adversarial attacks and uncertainties. In this paper, we present the authors' recent work that tackles some…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: 2025 ICRA Workshop on 25 years of Aerial Robotics: Challenges and Opportunities

  36. arXiv:2504.08528  [pdf, other]

    cs.CL cs.SD eess.AS

    On The Landscape of Spoken Language Models: A Comprehensive Survey

    Authors: Siddhant Arora, Kai-Wei Chang, Chung-Ming Chien, Yifan Peng, Haibin Wu, Yossi Adi, Emmanuel Dupoux, Hung-Yi Lee, Karen Livescu, Shinji Watanabe

    Abstract: The field of spoken language processing is undergoing a shift from training custom-built, task-specific models toward using and optimizing spoken language models (SLMs) which act as universal speech processing systems. This trend is similar to the progression toward universal language models that has taken place in the field of (text) natural language processing. SLMs include both "pure" language…

    Submitted 11 April, 2025; originally announced April 2025.

  37. arXiv:2504.07053  [pdf, other]

    cs.CL cs.SD eess.AS

    TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling

    Authors: Liang-Hsuan Tseng, Yi-Chang Chen, Kuan-Yi Lee, Da-Shan Shiu, Hung-yi Lee

    Abstract: Large Language Models (LLMs) excel in text-based natural language processing tasks but remain constrained by their reliance on textual inputs and outputs. To enable more natural human-LLM interaction, recent progress has focused on deriving a spoken language model (SLM) that can not only listen but also generate speech. To achieve this, a promising direction is to conduct speech-text joint modeli…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: Preprint. Work in progress

  38. arXiv:2504.06827  [pdf, other]

    cs.CV

    IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments

    Authors: Can Zhang, Gim Hee Lee

    Abstract: This work presents IAAO, a novel framework that builds an explicit 3D model for intelligent agents to gain understanding of articulated objects in their environment through interaction. Unlike prior methods that rely on task-specific networks and assumptions about movable parts, our IAAO leverages large foundation models to estimate interactive affordances and part articulations in three stages. W…

    Submitted 9 April, 2025; originally announced April 2025.

  39. arXiv:2504.06398  [pdf, other]

    cs.LG

    Sharpness-Aware Parameter Selection for Machine Unlearning

    Authors: Saber Malekmohammadi, Hong kyu Lee, Li Xiong

    Abstract: It often happens that some sensitive personal information, such as credit card numbers or passwords, is mistakenly incorporated in the training of machine learning models and needs to be removed afterwards. The removal of such information from a trained model is a complex task that needs to partially reverse the training process. There have been various machine unlearning techniques proposed in th…

    Submitted 24 April, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

  40. arXiv:2504.06003  [pdf, other]

    cs.CV

    econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians

    Authors: Can Zhang, Gim Hee Lee

    Abstract: The primary focus of most recent works on open-vocabulary neural fields is extracting precise semantic features from the VLMs and then consolidating them efficiently into a multi-view consistent 3D neural fields representation. However, most existing works over-trusted SAM to regularize image-level CLIP without any further refinement. Moreover, several existing works improved efficiency by dimensi…

    Submitted 8 April, 2025; originally announced April 2025.

  41. arXiv:2504.03120  [pdf, other]

    eess.SY cs.RO

    Distributed Resilience-Aware Control in Multi-Robot Networks

    Authors: Haejoon Lee, Dimitra Panagou

    Abstract: Ensuring resilient consensus in multi-robot systems with misbehaving agents remains a challenge, as many existing network resilience properties are inherently combinatorial and globally defined. While previous works have proposed control laws to enhance or preserve resilience in multi-robot networks, they often assume a fixed topology with known resilience properties, or require global state knowl…

    Submitted 10 April, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

    Comments: Submitted to 2025 IEEE Conference on Decision and Control (CDC)

  42. arXiv:2504.02214  [pdf, other]

    cs.CV eess.IV

    Geospatial Artificial Intelligence for Satellite-based Flood Extent Mapping: Concepts, Advances, and Future Perspectives

    Authors: Hyunho Lee, Wenwen Li

    Abstract: Geospatial Artificial Intelligence (GeoAI) for satellite-based flood extent mapping systematically integrates artificial intelligence techniques with satellite data to identify flood events and assess their impacts, for disaster management and spatial decision-making. The primary output often includes flood extent maps, which delineate the affected areas, along with additional analytical outputs s…

    Submitted 8 April, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

    Comments: 10 pages, 5 figures

  43. arXiv:2504.02008  [pdf, other]

    q-bio.QM cs.AI

    Test-time Adaptation for Foundation Medical Segmentation Model without Parametric Updates

    Authors: Kecheng Chen, Xinyu Luo, Tiexin Qin, Jie Liu, Hui Liu, Victor Ho Fun Lee, Hong Yan, Haoliang Li

    Abstract: Foundation medical segmentation models, with MedSAM being the most popular, have achieved promising performance across organs and lesions. However, MedSAM still suffers from compromised performance on specific lesions with intricate structures and appearance, as well as bounding box prompt-induced perturbations. Although current test-time adaptation (TTA) methods for medical image segmentation may…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: Under review

  44. arXiv:2504.01690  [pdf, other]

    cs.SD cs.AI eess.AS

    Token Pruning in Audio Transformers: Optimizing Performance and Decoding Patch Importance

    Authors: Taehan Lee, Hyukjun Lee

    Abstract: Vision Transformers (ViTs) have achieved state-of-the-art performance across various computer vision tasks, but their high computational cost remains a challenge. Token pruning has been proposed to reduce this cost by selectively removing less important tokens. While effective in vision tasks by discarding non-object regions, applying this technique to audio tasks presents unique challenges, as di…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: This work has been submitted to the IEEE for possible publication. Source code is available at https://github.com/andylee-24/token-pruning-audio-transformer

  45. Robust Transmission Design for Active RIS-Aided Systems

    Authors: Jinho Yang, Hyeongtaek Lee, Junil Choi

    Abstract: Different from conventional passive reconfigurable intelligent surfaces (RISs), incident signals and thermal noise can be amplified at active RISs. By exploiting the amplifying capability of active RISs, noticeable performance improvement can be expected when precise channel state information (CSI) is available. Since obtaining perfect CSI related to an RIS is difficult in practice, a robust trans…

    Submitted 31 March, 2025; originally announced April 2025.

    Comments: 6 pages, 4 figures, accepted to IEEE Transactions on Vehicular Technology

  46. arXiv:2503.24306  [pdf, other]

    cs.CV

    Point Tracking in Surgery--The 2024 Surgical Tattoos in Infrared (STIR) Challenge

    Authors: Adam Schmidt, Mert Asim Karaoglu, Soham Sinha, Mingang Jang, Ho-Gun Ha, Kyungmin Jung, Kyeongmo Gu, Ihsan Ullah, Hyunki Lee, Jonáš Šerých, Michal Neoral, Jiří Matas, Rulin Zhou, Wenlong He, An Wang, Hongliang Ren, Bruno Silva, Sandro Queirós, Estêvão Lima, João L. Vilaça, Shunsuke Kikuchi, Atsushi Kouno, Hiroki Matsuzaki, Tongtong Li, Yulu Chen , et al. (15 additional authors not shown)

    Abstract: Understanding tissue motion in surgery is crucial to enable applications in downstream tasks such as segmentation, 3D reconstruction, virtual tissue landmarking, autonomous probe-based scanning, and subtask autonomy. Labeled data are essential to enabling algorithms in these downstream tasks since they allow us to quantify and train algorithms. This paper introduces a point tracking challenge to a…

    Submitted 31 March, 2025; originally announced March 2025.

  47. arXiv:2503.24210  [pdf, other]

    cs.CV cs.AI cs.MM

    DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting

    Authors: Seungjun Lee, Gim Hee Lee

    Abstract: Reconstructing sharp 3D representations from blurry multi-view images is a long-standing problem in computer vision. Recent works attempt to enhance high-quality novel view synthesis from the motion blur by leveraging event-based cameras, benefiting from high dynamic range and microsecond temporal resolution. However, they often reach sub-optimal visual quality in either restoring inaccurate color…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: CVPR 2025. Project Page: https://diet-gs.github.io

  48. arXiv:2503.23764  [pdf, other]

    cs.CV cs.AI

    WaveFormer: A 3D Transformer with Wavelet-Driven Feature Representation for Efficient Medical Image Segmentation

    Authors: Md Mahfuz Al Hasan, Mahdi Zaman, Abdul Jawad, Alberto Santamaria-Pang, Ho Hin Lee, Ivan Tarapov, Kyle See, Md Shah Imran, Antika Roy, Yaser Pourmohammadi Fallah, Navid Asadizanjani, Reza Forghani

    Abstract: Transformer-based architectures have advanced medical image analysis by effectively modeling long-range dependencies, yet they often struggle in 3D settings due to substantial memory overhead and insufficient capture of fine-grained local features. We address these limitations with WaveFormer, a novel 3D-transformer that: i) leverages the fundamental frequency-domain properties of features for con…

    Submitted 31 March, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

  49. arXiv:2503.23430  [pdf, other]

    stat.ML cs.LG math.OC stat.AP

    DGSAM: Domain Generalization via Individual Sharpness-Aware Minimization

    Authors: Youngjun Song, Youngsik Hwang, Jonghun Lee, Heechang Lee, Dong-Young Lim

    Abstract: Domain generalization (DG) aims to learn models that can generalize well to unseen domains by training only on a set of source domains. Sharpness-Aware Minimization (SAM) has been a popular approach for this, aiming to find flat minima in the total loss landscape. However, we show that minimizing the total loss sharpness does not guarantee sharpness across individual domains. In particular, SAM ca…

    Submitted 30 March, 2025; originally announced March 2025.

  50. arXiv:2503.23228  [pdf, other]

    eess.SY cs.RO

    Energy-Aware Lane Planning for Connected Electric Vehicles in Urban Traffic: Design and Vehicle-in-the-Loop Validation

    Authors: Hansung Kim, Eric Yongkeun Choi, Eunhyek Joa, Hotae Lee, Linda Lim, Scott Moura, Francesco Borrelli

    Abstract: Urban driving with connected and automated vehicles (CAVs) offers potential for energy savings, yet most eco-driving strategies focus solely on longitudinal speed control within a single lane. This neglects the significant impact of lateral decisions, such as lane changes, on overall energy efficiency, especially in environments with traffic signals and heterogeneous traffic flow. To address this…

    Submitted 29 March, 2025; originally announced March 2025.

    Comments: Submitted to an Invited Session at 2025 IEEE Conference on Decision and Control