Showing 1–50 of 653 results for author: Yoon, J

  1. arXiv:2506.13472  [pdf, ps, other]

    cs.CL cs.AI

    ROSAQ: Rotation-based Saliency-Aware Weight Quantization for Efficiently Compressing Large Language Models

    Authors: Junho Yoon, Geom Lee, Donghyeon Jeon, Inho Kang, Seung-Hoon Na

    Abstract: Quantization has been widely studied as an effective technique for reducing the memory requirement of large language models (LLMs), potentially improving the latency time as well. Utilizing the characteristic of rotational invariance of transformers, we propose the rotation-based saliency-aware weight quantization (ROSAQ), which identifies salient channels in the projection feature space, not in th…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 10 pages, 2 figures

  2. arXiv:2506.12802  [pdf, ps, other]

    cs.CR

    Bidirectional Biometric Authentication Using Transciphering and (T)FHE

    Authors: Joon Soo Yoo, Tae Min Ahn, Ji Won Yoon

    Abstract: Biometric authentication systems pose privacy risks, as leaked templates such as iris or fingerprints can lead to security breaches. Fully Homomorphic Encryption (FHE) enables secure encrypted evaluation, but its deployment is hindered by large ciphertexts, high key overhead, and limited trust models. We propose the Bidirectional Transciphering Framework (BTF), combining FHE, transciphering, and a…

    Submitted 15 June, 2025; originally announced June 2025.

  3. arXiv:2506.12761  [pdf, ps, other]

    cs.CR cs.IR

    Versatile and Fast Location-Based Private Information Retrieval with Fully Homomorphic Encryption over the Torus

    Authors: Joon Soo Yoo, Taeho Kim, Ji Won Yoon

    Abstract: Location-based services often require users to share sensitive locational data, raising privacy concerns due to potential misuse or exploitation by untrusted servers. In response, we present VeLoPIR, a versatile location-based private information retrieval (PIR) system designed to preserve user privacy while enabling efficient and scalable query processing. VeLoPIR introduces three operational mod…

    Submitted 15 June, 2025; originally announced June 2025.

  4. arXiv:2506.11098  [pdf, ps, other]

    cs.LG cs.AI

    Debiasing Online Preference Learning via Preference Feature Preservation

    Authors: Dongyoung Kim, Jinsung Yoon, Jinwoo Shin, Jaehyung Kim

    Abstract: Recent preference learning frameworks for large language models (LLMs) simplify human preferences with binary pairwise comparisons and scalar rewards. This simplification could make LLMs' responses biased to mostly preferred features, and would be exacerbated during the iterations of online preference learning steps. To address these challenges, we propose a novel framework coined PFP (Preference…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: 20 pages, 20 figures

  5. arXiv:2506.09498  [pdf, ps, other]

    cs.AI

    Fast Monte Carlo Tree Diffusion: 100x Speedup via Parallel Sparse Planning

    Authors: Jaesik Yoon, Hyeonseo Cho, Yoshua Bengio, Sungjin Ahn

    Abstract: Diffusion models have recently emerged as a powerful approach for trajectory planning. However, their inherently non-sequential nature limits their effectiveness in long-horizon reasoning tasks at test time. The recently proposed Monte Carlo Tree Diffusion (MCTD) offers a promising solution by combining diffusion with tree-based search, achieving state-of-the-art performance on complex planning pr…

    Submitted 11 June, 2025; originally announced June 2025.

  6. arXiv:2506.08445  [pdf, ps, other]

    cs.CR

    GPS Spoofing Attacks on AI-based Navigation Systems with Obstacle Avoidance in UAV

    Authors: Ji Hyuk Jung, Mi Yeon Hong, Ji Won Yoon

    Abstract: Recently, approaches using Deep Reinforcement Learning (DRL) have been proposed to solve UAV navigation systems in complex and unknown environments. However, despite extensive research and attention, systematic studies on various security aspects have not yet been conducted. Therefore, in this paper, we conduct research on security vulnerabilities in DRL-based navigation systems, particularly focu…

    Submitted 10 June, 2025; originally announced June 2025.

  7. arXiv:2506.07177  [pdf, ps, other]

    cs.CV cs.AI

    Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models

    Authors: Sangwon Jang, Taekyung Ki, Jaehyeong Jo, Jaehong Yoon, Soo Ye Kim, Zhe Lin, Sung Ju Hwang

    Abstract: Advancements in diffusion models have significantly improved video quality, directing attention to fine-grained controllability. However, many existing methods depend on fine-tuning large-scale video models for specific tasks, which becomes increasingly impractical as model sizes continue to grow. In this work, we present Frame Guidance, a training-free guidance for controllable video generation b…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: Project page: https://frame-guidance-video.github.io/

  8. arXiv:2506.06630  [pdf, ps, other]

    cs.RO cs.AI

    Active Test-time Vision-Language Navigation

    Authors: Heeju Ko, Sungjune Kim, Gyeongrok Oh, Jeongyoon Yoon, Honglak Lee, Sujin Jang, Seungryong Kim, Sangpil Kim

    Abstract: Vision-Language Navigation (VLN) policies trained on offline datasets often exhibit degraded task performance when deployed in unfamiliar navigation environments at test time, where agents are typically evaluated without access to external interaction or feedback. Entropy minimization has emerged as a practical solution for reducing prediction uncertainty at test time; however, it can suffer from…

    Submitted 6 June, 2025; originally announced June 2025.

  9. arXiv:2506.06275  [pdf, ps, other]

    cs.CV cs.CL cs.LG

    Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding

    Authors: Emmanouil Zaranis, António Farinhas, Saul Santos, Beatriz Canaverde, Miguel Moura Ramos, Aditya K Surikuchi, André Viveiros, Baohao Liao, Elena Bueno-Benito, Nithin Sivakumaran, Pavlo Vasylenko, Shoubin Yu, Sonal Sannigrahi, Wafaa Mohammed, Ben Peters, Danae Sánchez Villegas, Elias Stengel-Eskin, Giuseppe Attanasio, Jaehong Yoon, Stella Frank, Alessandro Suglia, Chrysoula Zerva, Desmond Elliott, Mariella Dimiccoli, Mohit Bansal, et al. (6 additional authors not shown)

    Abstract: Despite recent progress in vision-language models (VLMs), holistic understanding of long-form video content remains a significant challenge, partly due to limitations in current benchmarks. Many focus on peripheral, ``needle-in-a-haystack'' details, encouraging context-insensitive retrieval over deep comprehension. Others rely on large-scale, semi-automatically generated questions (often produced…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: Under Review

  10. arXiv:2506.06131  [pdf, ps, other]

    math.DS math.CA

    Adaptive Cucker-Smale Networks: Limiting Laplacian Time-Varying Dynamics

    Authors: Christian Kuehn, Jaeyoung Yoon

    Abstract: Differences in opinion can be seen as distances between individuals, and such differences do not always vanish over time. In this paper, we propose a modeling framework that captures the formation of opinion clusters, based on extensions of the Cucker-Smale and Hegselmann-Krause models to a combined adaptive (or co-evolutionary) network. Reducing our model to a singular limit of fast adaptation, w…

    Submitted 6 June, 2025; originally announced June 2025.

  11. arXiv:2506.03525  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning

    Authors: Daeun Lee, Jaehong Yoon, Jaemin Cho, Mohit Bansal

    Abstract: Recent advances in Chain-of-Thought (CoT) reasoning have improved complex video understanding, but existing methods often struggle to adapt to domain-specific skills (e.g., event detection, spatial relation understanding, emotion understanding) over various video content. To address this, we propose Video-Skill-CoT (a.k.a. Video-SKoT), a framework that automatically constructs and leverages skill-…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Project website: https://video-skill-cot.github.io/

  12. arXiv:2506.00481  [pdf, other]

    cs.CL cs.AI

    PVP: An Image Dataset for Personalized Visual Persuasion with Persuasion Strategies, Viewer Characteristics, and Persuasiveness Ratings

    Authors: Junseo Kim, Jongwook Han, Dongmin Choi, Jongwook Yoon, Eun-Ju Lee, Yohan Jo

    Abstract: Visual persuasion, which uses visual elements to influence cognition and behaviors, is crucial in fields such as advertising and political communication. With recent advancements in artificial intelligence, there is growing potential to develop persuasive systems that automatically generate persuasive images tailored to individuals. However, a significant bottleneck in this area is the lack of com…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: ACL 2025 Main. Code and dataset are released at: https://github.com/holi-lab/PVP_Personalized_Visual_Persuasion

  13. arXiv:2505.24211  [pdf, other]

    cs.CL

    Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?

    Authors: Jiwan Chung, Janghan Yoon, Junhyeong Park, Sangeyl Lee, Joowon Yang, Sooyeon Park, Youngjae Yu

    Abstract: Any-to-any generative models aim to enable seamless interpretation and generation across multiple modalities within a unified framework, yet their ability to preserve relationships across modalities remains uncertain. Do unified models truly achieve cross-modal coherence, or is this coherence merely perceived? To explore this, we introduce ACON, a dataset of 1,000 images (500 newly contributed) pa…

    Submitted 30 May, 2025; originally announced May 2025.

    Journal ref: ACL 2025

  14. arXiv:2505.21876  [pdf, ps, other]

    cs.CV cs.AI

    EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance

    Authors: Zun Wang, Jaemin Cho, Jialu Li, Han Lin, Jaehong Yoon, Yue Zhang, Mohit Bansal

    Abstract: Recent approaches on 3D camera control in video diffusion models (VDMs) often create anchor videos to guide diffusion models as a structured prior by rendering from estimated point clouds following annotated camera trajectories. However, errors inherent in point cloud estimation often lead to inaccurate anchor videos. Moreover, the requirement for extensive camera trajectory annotations further in…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Project website: https://zunwang1.github.io/Epic

  15. arXiv:2505.20598  [pdf]

    cond-mat.supr-con

    Proximity engineering and interferometric quantification of a non-volatile anomalous phase-shift in zero-field polarity-reversible Josephson diodes

    Authors: Kun-Rok Jeon, Jae-Keun Kim, Jiho Yoon, Jae-Chun Jeon, Hyeon Han, Audrey Cottet, Takis Kontos, Stuart S. P. Parkin

    Abstract: The recent realization of zero-field polarity-reversible supercurrent rectification in proximity-magnetized Rashba(-type) Pt Josephson junctions (JJs) promises practical applications in superconducting logic circuits and cryogenic memories. Here, by substituting the Pt Josephson barrier for either 5d or 4d element proximity layer with different (para-)magnetic susceptibility, spin-orbit coup…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: 24 pages, 4 figures, 6 extended data figures

  16. arXiv:2505.18454  [pdf, other]

    cs.CL

    Hybrid Latent Reasoning via Reinforcement Learning

    Authors: Zhenrui Yue, Bowen Jin, Huimin Zeng, Honglei Zhuang, Zhen Qin, Jinsung Yoon, Lanyu Shang, Jiawei Han, Dong Wang

    Abstract: Recent advances in large language models (LLMs) have introduced latent reasoning as a promising alternative to autoregressive reasoning. By performing internal computation with hidden states from previous steps, latent reasoning benefits from more informative features rather than sampling a discrete chain-of-thought (CoT) path. Yet latent reasoning approaches are often incompatible with LLMs, as th…

    Submitted 23 May, 2025; originally announced May 2025.

  17. arXiv:2505.15117  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents

    Authors: Bowen Jin, Jinsung Yoon, Priyanka Kargupta, Sercan O. Arik, Jiawei Han

    Abstract: Reinforcement learning (RL) has demonstrated strong potential in training large language models (LLMs) capable of complex reasoning for real-world problem solving. More recently, RL has been leveraged to create sophisticated LLM-based search agents that adeptly combine reasoning with search engine use. While the use of RL for training search agents is promising, the optimal design of such agents r…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 22 pages

  18. arXiv:2505.14036  [pdf, ps, other]

    cs.LG cs.AI

    Adaptive Cyclic Diffusion for Inference Scaling

    Authors: Gyubin Lee, Truong Nhat Nguyen Bao, Jaesik Yoon, Dongwoo Lee, Minsu Kim, Yoshua Bengio, Sungjin Ahn

    Abstract: Diffusion models have demonstrated strong generative capabilities across domains ranging from image synthesis to complex reasoning tasks. However, most inference-time scaling methods rely on fixed denoising schedules, limiting their ability to allocate computation based on instance difficulty or task-specific demands adaptively. We introduce the challenge of adaptive inference-time scaling-dynamic…

    Submitted 25 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  19. arXiv:2505.13577  [pdf, other]

    cs.SD cs.AI eess.AS

    VocalAgent: Large Language Models for Vocal Health Diagnostics with Safety-Aware Evaluation

    Authors: Yubin Kim, Taehan Kim, Wonjune Kang, Eugene Park, Joonsik Yoon, Dongjae Lee, Xin Liu, Daniel McDuff, Hyeonhoon Lee, Cynthia Breazeal, Hae Won Park

    Abstract: Vocal health plays a crucial role in people's lives, significantly impacting their communicative abilities and interactions. However, despite the global prevalence of voice disorders, many lack access to convenient diagnosis and treatment. This paper introduces VocalAgent, an audio large language model (LLM) to address these challenges through vocal health diagnosis. We leverage Qwen-Audio-Chat fi…

    Submitted 26 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  20. arXiv:2505.11709  [pdf, ps, other]

    cs.CV cs.LG cs.RO

    EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video

    Authors: Ryan Hoque, Peide Huang, David J. Yoon, Mouli Sivapurapu, Jian Zhang

    Abstract: Imitation learning for manipulation has a well-known data scarcity problem. Unlike natural language and 2D computer vision, there is no Internet-scale corpus of data for dexterous manipulation. One appealing option is egocentric human video, a passively scalable data source. However, existing large-scale datasets such as Ego4D do not have native hand pose annotations and do not focus on object man…

    Submitted 16 May, 2025; originally announced May 2025.

  21. 3D-Fixup: Advancing Photo Editing with 3D Priors

    Authors: Yen-Chi Cheng, Krishna Kumar Singh, Jae Shin Yoon, Alex Schwing, Liangyan Gui, Matheus Gadelha, Paul Guerrero, Nanxuan Zhao

    Abstract: Despite significant advances in modeling image priors via diffusion models, 3D-aware image editing remains challenging, in part because the object is only specified via a single image. To tackle this challenge, we propose 3D-Fixup, a new framework for editing 2D images guided by learned 3D priors. The framework supports difficult editing situations such as object translation and 3D rotation. To ac…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: SIGGRAPH 2025. Project page: https://3dfixup.github.io/

  22. arXiv:2505.08889  [pdf, other]

    cs.GR cs.CV

    IntrinsicEdit: Precise generative image manipulation in intrinsic space

    Authors: Linjie Lyu, Valentin Deschaintre, Yannick Hold-Geoffroy, Miloš Hašan, Jae Shin Yoon, Thomas Leimkühler, Christian Theobalt, Iliyan Georgiev

    Abstract: Generative diffusion models have advanced image editing with high-quality results and intuitive interfaces such as prompts and semantic drawing. However, these interfaces lack precise control, and the associated methods typically specialize on a single editing task. We introduce a versatile, generative workflow that operates in an intrinsic-image latent space, enabling semantic, local manipulation…

    Submitted 15 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

    Comments: SIGGRAPH 2025 Journal track

  23. arXiv:2505.05316  [pdf]

    physics.app-ph cond-mat.mes-hall

    Size dependence of the properties of synthetic-antiferromagnet-based stochastic magnetic tunnel junctions for probabilistic computing

    Authors: Takuma Kinoshita, Ju-Young Yoon, Nuno Caçoilo, Ryota Mochizuki, Haruna Kaneko, Shun Kanai, Hideo Ohno, Shunsuke Fukami

    Abstract: Stochastic magnetic tunnel junctions (s-MTJs) are core components for spintronics-based probabilistic computing (p-computing), a promising candidate for energy-efficient unconventional computing. To achieve reliable performance under practical conditions, the use of a synthetic antiferromagnetic (SAF) free-layer configuration was proposed due to its enhanced tolerance to magnetic field perturbatio…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 18 pages, 4 figures

  24. arXiv:2505.05026  [pdf, other]

    cs.CL cs.LG

    G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness

    Authors: Jaehyun Jeon, Jang Han Yoon, Min Soo Kim, Sumin Shim, Yejin Choi, Hanbin Kim, Youngjae Yu

    Abstract: Evaluating user interface (UI) design effectiveness extends beyond aesthetics to influencing user behavior, a principle central to Design Persuasiveness. A/B testing is the predominant method for determining which UI variations drive higher user engagement, but it is costly and time-consuming. While recent Vision-Language Models (VLMs) can process automated UI analysis, current approaches focus on…

    Submitted 9 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: 31 pages, 17 figures

  25. arXiv:2505.04163  [pdf, other]

    cs.LG cs.IR

    Retrieval Augmented Time Series Forecasting

    Authors: Sungwon Han, Seungeon Lee, Meeyoung Cha, Sercan O Arik, Jinsung Yoon

    Abstract: Time series forecasting uses historical data to predict future trends, leveraging the relationships between past observations and available features. In this paper, we propose RAFT, a retrieval-augmented time series forecasting method to provide sufficient inductive biases and complement the model's learning capacity. When forecasting the subsequent time frames, we directly retrieve historical dat…

    Submitted 7 May, 2025; originally announced May 2025.

  26. FADEL: Uncertainty-aware Fake Audio Detection with Evidential Deep Learning

    Authors: Ju Yeon Kang, Ji Won Yoon, Semin Kim, Min Hyun Han, Nam Soo Kim

    Abstract: Recently, fake audio detection has gained significant attention, as advancements in speech synthesis and voice conversion have increased the vulnerability of automatic speaker verification (ASV) systems to spoofing attacks. A key challenge in this task is generalizing models to detect unseen, out-of-distribution (OOD) attacks. Although existing approaches have shown promising results, they inheren…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: Accepted at ICASSP 2025

  27. arXiv:2504.12134  [pdf, other]

    quant-ph physics.app-ph

    Quantum sensing with arbitrary frequency resolution via correlation measurements

    Authors: Jungbae Yoon, Keyuan Zhong, Guoqing Wang, Boning Li, Donghun Lee, Paola Cappellaro

    Abstract: Achieving high-frequency spectral resolution with quantum sensors, while crucial in fields ranging from physical to biological sciences, is challenging due to their finite coherence time. Here, we introduce a novel protocol that achieves this goal by measuring phase correlations of AC magnetic fields using ensembles of NV centers. Our method extends the sensing dynamic range to frequencies higher…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: 20 pages, 3 main figures, 8 additional figures

  28. arXiv:2504.08641  [pdf, other]

    cs.CV cs.AI cs.CL

    Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization

    Authors: Jialu Li, Shoubin Yu, Han Lin, Jaemin Cho, Jaehong Yoon, Mohit Bansal

    Abstract: Recent advancements in text-to-video (T2V) diffusion models have significantly enhanced the visual quality of the generated videos. However, even recent T2V models find it challenging to follow text descriptions accurately, especially when the prompt requires accurate control of spatial layouts or object trajectories. A recent line of research uses layout guidance for T2V models that require fine-…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: Website: https://video-msg.github.io; The first three authors contributed equally

  29. arXiv:2504.07377  [pdf, other]

    physics.flu-dyn physics.comp-ph

    Euler-Lagrange study of Microbubble-Laden Turbulent Flow over Superhydrophobic surfaces

    Authors: Byeong-Cheon Kim, Kyoungsik Chang, Sang-Wook Lee, Jaiyoung Ryu, Minjae Kim, Jaemoon Yoon

    Abstract: For slow-speed ships, underwater vehicles, and pipe transportation systems, viscous resistance accounts for a large proportion of the total energy losses. As such, various technologies have been developed to reduce viscous resistance and enhance energy efficiency in these applications. Air injection and surface treatment are two representative drag reduction techniques. Additionally, efforts to co…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: 28 pages, 9 figures

    MSC Class: 76F65 (primary); 76T10 (Secondary)

  30. arXiv:2504.03467  [pdf, ps, other]

    math.NT

    Sums of squares of integers except for a fixed one

    Authors: Wonjun Chae, Yun-seong Ji, Kisuk Kim, Kyoungmin Kim, Byeong-Kweon Oh, Jongheun Yoon

    Abstract: In this article, we study a sum of squares of integers except for a fixed one. For any nonnegative integer $n$, we find the minimum number of squares of integers except for $n$ whose sums represent all positive integers that are represented by a sum of squares except for it. This problem could be considered as a generalization of Dubouis's result for the case when $n=0$.

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: 12 pages

    MSC Class: 11E20; 11E25

  31. arXiv:2504.03011  [pdf, other]

    cs.CV

    Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization

    Authors: Junying Wang, Jingyuan Liu, Xin Sun, Krishna Kumar Singh, Zhixin Shu, He Zhang, Jimei Yang, Nanxuan Zhao, Tuanfeng Y. Wang, Simon S. Chen, Ulrich Neumann, Jae Shin Yoon

    Abstract: This paper introduces Comprehensive Relighting, the first all-in-one approach that can both control and harmonize the lighting from an image or video of humans with arbitrary body parts from any scene. Building such a generalizable model is extremely challenging due to the lack of dataset, restricting existing image-based relighting models to a specific scenario (e.g., face or static human). To ad…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: Project page: https://junyingw.github.io/paper/relighting. Accepted by CVPR 2025

  32. arXiv:2504.02775  [pdf, other]

    cs.CV cs.LG

    TailedCore: Few-Shot Sampling for Unsupervised Long-Tail Noisy Anomaly Detection

    Authors: Yoon Gyo Jung, Jaewoo Park, Jaeho Yoon, Kuan-Chuan Peng, Wonchul Kim, Andrew Beng Jin Teoh, Octavia Camps

    Abstract: We aim to solve unsupervised anomaly detection in a practical challenging environment where the normal dataset is both contaminated with defective regions and its product class distribution is tailed but unknown. We observe that existing models suffer from tail-versus-noise trade-off where if a model is robust against pixel noise, then its performance deteriorates on tail class samples, and vice v…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: Accepted to CVPR2025

  33. arXiv:2503.16518  [pdf, other]

    cs.HC cs.AI cs.LG

    Advancing Human-Machine Teaming: Concepts, Challenges, and Applications

    Authors: Dian Chen, Han Jun Yoon, Zelin Wan, Nithin Alluru, Sang Won Lee, Richard He, Terrence J. Moore, Frederica F. Nelson, Sunghyun Yoon, Hyuk Lim, Dan Dongseong Kim, Jin-Hee Cho

    Abstract: Human-Machine Teaming (HMT) is revolutionizing collaboration across domains such as defense, healthcare, and autonomous systems by integrating AI-driven decision-making, trust calibration, and adaptive teaming. This survey presents a comprehensive taxonomy of HMT, analyzing theoretical models, including reinforcement learning, instance-based learning, and interdependence theory, alongside interdis…

    Submitted 6 May, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

  34. arXiv:2503.15475  [pdf, other]

    cs.CV

    Cube: A Roblox View of 3D Intelligence

    Authors: Foundation AI Team, Kiran Bhat, Nishchaie Khanna, Karun Channa, Tinghui Zhou, Yiheng Zhu, Xiaoxia Sun, Charles Shang, Anirudh Sudarshan, Maurice Chu, Daiqing Li, Kangle Deng, Jean-Philippe Fauconnier, Tijmen Verhulsdonck, Maneesh Agrawala, Kayvon Fatahalian, Alexander Weiss, Christian Reiser, Ravi Kiran Chirravuri, Ravali Kandur, Alejandro Pelaez, Akash Garg, Michael Palleschi, Jessica Wang, Skylar Litz, et al. (22 additional authors not shown)

    Abstract: Foundation models trained on vast amounts of data have demonstrated remarkable reasoning and generation capabilities in the domains of text, images, audio and video. Our goal at Roblox is to build such a foundation model for 3D intelligence, a model that can support developers in producing all aspects of a Roblox experience, from generating 3D objects and scenes to rigging characters for animation…

    Submitted 14 April, 2025; v1 submitted 19 March, 2025; originally announced March 2025.

    Comments: Our code and model weights can be found at: https://github.com/Roblox/cube

  35. arXiv:2503.13441  [pdf, other]

    cs.RO cs.AI cs.CV

    Humanoid Policy ~ Human Policy

    Authors: Ri-Zhao Qiu, Shiqi Yang, Xuxin Cheng, Chaitanya Chawla, Jialong Li, Tairan He, Ge Yan, David J. Yoon, Ryan Hoque, Lars Paulsen, Ge Yang, Jian Zhang, Sha Yi, Guanya Shi, Xiaolong Wang

    Abstract: Training manipulation policies for humanoid robots with diverse data enhances their robustness and generalization across tasks and platforms. However, learning solely from robot demonstrations is labor-intensive, requiring expensive tele-operated data collection which is difficult to scale. This paper investigates a more scalable data source, egocentric human demonstrations, to serve as cross-embo…

    Submitted 24 March, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

    Comments: Code and data: https://human-as-robot.github.io/

  36. arXiv:2503.11487  [pdf, ps, other]

    astro-ph.SR astro-ph.GA

    Elemental abundances of 44 very metal-poor stars determined from Subaru/IRD near-infrared spectra

    Authors: Wako Aoki, Timothy C. Beers, Satoshi Honda, Tadafumi Matsuno, Vinicius M. Placco, Jinmi Yoon, Masayuki Kuzuhara, Hiroki Harakawa, Teruyuki Hirano, Takayuki Kotani, Takashi Kurokawa, Jun Nishikawa, Masashi Omiya, Motohide Tamura, Sebastien Vievard

    Abstract: Abundances of five elements, Na, Mg, Al, Si, and Sr, are investigated for 44 very metal-poor stars (-4.0 < [Fe/H] < -1.5) in the Galactic halo system based on a Local Thermodynamic Equilibrium (LTE) analysis of high-resolution near-infrared spectra obtained with the Infrared Doppler instrument (IRD) on the Subaru Telescope. Mg and Si abundances are determined for all 44 stars. The Si abundances a…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 20 pages, 7 figures, 10 tables, to appear in PASJ

  37. arXiv:2503.09956  [pdf, other]

    cs.LG cs.AI cs.CV cs.ET

    DeepSeek-Inspired Exploration of RL-based LLMs and Synergy with Wireless Networks: A Survey

    Authors: Yu Qiao, Phuong-Nam Tran, Ji Su Yoon, Loc X. Nguyen, Eui-Nam Huh, Dusit Niyato, Choong Seon Hong

    Abstract: Reinforcement learning (RL)-based large language models (LLMs), such as ChatGPT, DeepSeek, and Grok-3, have gained significant attention for their exceptional capabilities in natural language processing and multimodal data understanding. Meanwhile, the rapid expansion of information services has driven the growing need for intelligent, efficient, and adaptable wireless networks. Wireless networks…

    Submitted 16 April, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

    Comments: 45 pages, 12 figures

  38. arXiv:2503.09516  [pdf, other]

    cs.CL cs.AI cs.IR

    Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

    Authors: Bowen Jin, Hansi Zeng, Zhenrui Yue, Jinsung Yoon, Sercan Arik, Dong Wang, Hamed Zamani, Jiawei Han

    Abstract: Efficiently acquiring external knowledge and up-to-date information is essential for effective reasoning and text generation in large language models (LLMs). Prompting advanced LLMs with reasoning capabilities to use search engines during inference is often suboptimal, as the LLM might not fully possess the capability on how to interact optimally with the search engine. This paper introduces Searc…

    Submitted 8 April, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

    Comments: 31 pages

  39. arXiv:2503.05777  [pdf, other]

    cs.CL cs.AI cs.CY

    Medical Hallucinations in Foundation Models and Their Impact on Healthcare

    Authors: Yubin Kim, Hyewon Jeong, Shan Chen, Shuyue Stella Li, Mingyu Lu, Kumail Alhamoud, Jimin Mun, Cristina Grau, Minseok Jung, Rodrigo Gameiro, Lizhou Fan, Eugene Park, Tristan Lin, Joonsik Yoon, Wonjin Yoon, Maarten Sap, Yulia Tsvetkov, Paul Liang, Xuhai Xu, Xin Liu, Daniel McDuff, Hyeonhoon Lee, Hae Won Park, Samir Tulebaev, Cynthia Breazeal

    Abstract: Foundation Models that are capable of processing and generating multi-modal data have transformed AI's role in medicine. However, a key limitation of their reliability is hallucination, where inaccurate or fabricated information can impact clinical decisions and patient safety. We define medical hallucination as any instance in which a model generates misleading medical content. This paper examine…

    Submitted 25 February, 2025; originally announced March 2025.

  40. arXiv:2503.02107  [pdf, other]

    cs.RO

    Balancing Act: Trading Off Doppler Odometry and Map Registration for Efficient Lidar Localization

    Authors: Katya M. Papais, Daniil Lisus, David J. Yoon, Andrew Lambert, Keith Y. K. Leung, Timothy D. Barfoot

    Abstract: Most autonomous vehicles rely on accurate and efficient localization, which is achieved by comparing live sensor data to a preexisting map, to navigate their environment. Balancing the accuracy of localization with computational efficiency remains a significant challenge, as high-accuracy methods often come with higher computational costs. In this paper, we present two ways of improving lidar loca…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: 8 pages, 3 figures, 2 tables, submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025

  41. arXiv:2503.01820  [pdf, other]

    cs.LG cs.AI cs.CL

    RSQ: Learning from Important Tokens Leads to Better Quantized LLMs

    Authors: Yi-Lin Sung, Prateek Yadav, Jialu Li, Jaehong Yoon, Mohit Bansal

    Abstract: Layer-wise quantization is a key technique for efficiently compressing large models without expensive retraining. Previous methods typically quantize the weights of each layer by "uniformly" optimizing the layer reconstruction loss across all output tokens. However, in this paper, we demonstrate that better-quantized models can be obtained by prioritizing learning from important tokens (e.g. which…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Our code is available at https://github.com/ylsung/rsq

  42. arXiv:2503.01216  [pdf, other]

    cs.RO

    A Single Scale Doesn't Fit All: Adaptive Motion Scaling for Efficient and Precise Teleoperation

    Authors: Jeonghyeon Yoon, Sanghyeok Park, Hyojae Park, Cholin Kim, Sihyeoung Park, Minho Hwang

    Abstract: Teleoperation is increasingly employed in environments where direct human access is difficult, such as hazardous exploration or surgical fields. However, if the motion scale factor (MSF) intended to compensate for workspace-size differences is set inappropriately, repeated clutching operations and reduced precision can significantly raise cognitive load. This paper presents a shared controller that…

    Submitted 3 March, 2025; originally announced March 2025.

  43. arXiv:2503.00481  [pdf, ps, other]

    cs.SE cs.AI

    Challenges in Testing Large Language Model Based Software: A Faceted Taxonomy

    Authors: Felix Dobslaw, Robert Feldt, Juyeon Yoon, Shin Yoo

    Abstract: Large Language Models (LLMs) and Multi-Agent LLMs (MALLMs) introduce non-determinism unlike traditional or machine learning software, requiring new approaches to verifying correctness beyond simple output comparisons or statistical accuracy over test datasets. This paper presents a taxonomy for LLM test case design, informed by the research literature, our experience, and open-source tools…

    Submitted 1 March, 2025; originally announced March 2025.

  44. arXiv:2502.17799  [pdf]

    cond-mat.mtrl-sci cond-mat.mes-hall

    Rapid low-temperature synthesis of graphene-coated SiC substrates for remote and van der Waals epitaxy

    Authors: Se H. Kim, Hanjoo Lee, Dong Gwan Kim, Donghan Kim, Seugki Kim, Hyunho Yang, Yunsu Jang, Jangho Yoon, Hyunsoo Kim, Seoyong Ha, ByoungTak Lee, Jung-Hee Lee, Roy Byung Kyu Chung, Hongsik Park, Sungkyu Kim, Tae Hoon Lee, Hyun S. Kum

    Abstract: Non-conventional epitaxial techniques, such as van der Waals epitaxy (vdWE) and remote epitaxy, have attracted substantial attention in the semiconductor research community for their capability to repeatedly produce high-quality free-standing films from a single mother wafer. Successful implementation of these epitaxial techniques depends on creating a robust, uniform two-dimensional (2D) material…

    Submitted 20 May, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

  45. arXiv:2502.14296  [pdf, other]

    cs.CY

    On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective

    Authors: Yue Huang, Chujie Gao, Siyuan Wu, Haoran Wang, Xiangqi Wang, Yujun Zhou, Yanbo Wang, Jiayi Ye, Jiawen Shi, Qihui Zhang, Yuan Li, Han Bao, Zhaoyi Liu, Tianrui Guan, Dongping Chen, Ruoxi Chen, Kehan Guo, Andy Zou, Bryan Hooi Kuen-Yew, Caiming Xiong, Elias Stengel-Eskin, Hongyang Zhang, Hongzhi Yin, Huan Zhang, Huaxiu Yao, et al. (41 additional authors not shown)

    Abstract: Generative Foundation Models (GenFMs) have emerged as transformative tools. However, their widespread adoption raises critical concerns regarding trustworthiness across dimensions. This paper presents a comprehensive framework to address these challenges through three key contributions. First, we systematically review global AI governance laws and policies from governments and regulatory bodies, a…

    Submitted 11 May, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

  46. arXiv:2502.07202  [pdf, ps, other]

    cs.AI cs.LG

    Monte Carlo Tree Diffusion for System 2 Planning

    Authors: Jaesik Yoon, Hyeonseo Cho, Doojin Baek, Yoshua Bengio, Sungjin Ahn

    Abstract: Diffusion models have recently emerged as a powerful tool for planning. However, unlike Monte Carlo Tree Search (MCTS), whose performance naturally improves with inference-time computation scaling, standard diffusion-based planners offer only limited avenues for scalability. In this paper, we introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength o…

    Submitted 10 June, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: 23 pages, 7 figures, ICML 2025 Main Track Spotlight

  47. "It's Great Because It's Ran By Us": Empowering Teen Volunteer Discord Moderators to Design Healthy and Engaging Youth-Led Online Communities

    Authors: Jina Yoon, Amy X. Zhang, Joseph Seering

    Abstract: Online communities can offer many benefits for youth including peer learning, cultural expression, and skill development. However, most HCI research on youth-focused online communities has centered communities developed by adults for youth rather than by the youth themselves. In this work, we interviewed 11 teenagers (ages 13-17) who moderate online Discord communities created by youth, for youth.…

    Submitted 10 February, 2025; originally announced February 2025.

  48. arXiv:2502.03699  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    LLM Alignment as Retriever Optimization: An Information Retrieval Perspective

    Authors: Bowen Jin, Jinsung Yoon, Zhen Qin, Ziqi Wang, Wei Xiong, Yu Meng, Jiawei Han, Sercan O. Arik

    Abstract: Large Language Models (LLMs) have revolutionized artificial intelligence with capabilities in reasoning, coding, and communication, driving innovation across industries. Their true potential depends on effective alignment to ensure correct, trustworthy and ethical behavior, addressing challenges like misinformation, hallucinations, bias and misuse. While existing Reinforcement Learning (RL)-based…

    Submitted 9 June, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

    Comments: 26 pages

  49. arXiv:2501.19306  [pdf, other]

    cs.AI cs.CL

    SETS: Leveraging Self-Verification and Self-Correction for Improved Test-Time Scaling

    Authors: Jiefeng Chen, Jie Ren, Xinyun Chen, Chengrun Yang, Ruoxi Sun, Jinsung Yoon, Sercan Ö Arık

    Abstract: Recent advancements in Large Language Models (LLMs) have created new opportunities to enhance performance on complex reasoning tasks by leveraging test-time computation. However, existing parallel scaling methods, such as repeated sampling or reward model scoring, often suffer from premature convergence and high costs due to task-specific reward model training, while sequential methods like SELF-R…

    Submitted 23 May, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

  50. arXiv:2501.16463  [pdf, ps, other]

    hep-th math-ph

    Higher-order chiral scalar from boundary reduction of 3d higher-spin gravity

    Authors: Calvin Yi-Ren Chen, Euihun Joung, Karapet Mkrtchyan, Junggi Yoon

    Abstract: We use a recently proposed covariant procedure to reduce the Chern-Simons action of three-dimensional higher-spin gravity to the boundary, resulting in a Lorentz covariant action for higher-order chiral scalars. After gauge-fixing, we obtain a higher-derivative action generalizing the $s=1$ Floreanini-Jackiw and $s=2$ Alekseev-Shatashvili actions to arbitrary spin $s$. For simplicity, we treat the…

    Submitted 14 April, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

    Comments: 25 pages (incl. appendix and bibliography); v2: added references, made clarifications

    Report number: Imperial-TP-KM-2025-01