Showing 1–50 of 583 results for author: Lee, T

Searching in archive cs.
  1. arXiv:2504.18454  [pdf, other]

    cs.LG

    Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training

    Authors: Hiroki Naganuma, Xinzhi Zhang, Man-Chung Yue, Ioannis Mitliagkas, Philipp A. Witte, Russell J. Hewett, Yin Tat Lee

    Abstract: Following AI scaling trends, frontier models continue to grow in size and continue to be trained on larger datasets. Training these models requires huge investments in exascale computational resources, which has in turn driven development of distributed deep learning methods. Data parallelism is an essential approach to speed up training, but it requires frequent global communication between worke…

    Submitted 25 April, 2025; originally announced April 2025.

  2. Bridging Bond Beyond Life: Designing VR Memorial Space with Stakeholder Collaboration via Research through Design

    Authors: Heejae Bae, Nayeong Kim, Sehee Lee, Tak Yeon Lee

    Abstract: The integration of digital technologies into memorialization practices offers opportunities to transcend physical and temporal limitations. However, designing personalized memorial spaces that address the diverse needs of the dying and the bereaved remains underexplored. Using a Research through Design (RtD) approach, we conducted a three-phase study: participatory design, VR memorial space develo…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: 6 pages excluding reference and appendix. Accepted at ACM CHI EA'25

  3. arXiv:2504.06866  [pdf, other]

    cs.RO cs.AI cs.CV

    GraspClutter6D: A Large-scale Real-world Dataset for Robust Perception and Grasping in Cluttered Scenes

    Authors: Seunghyeok Back, Joosoon Lee, Kangmin Kim, Heeseon Rho, Geonhyup Lee, Raeyoung Kang, Sangbeom Lee, Sangjun Noh, Youngjin Lee, Taeyeop Lee, Kyoobin Lee

    Abstract: Robust grasping in cluttered environments remains an open challenge in robotics. While benchmark datasets have significantly advanced deep learning methods, they mainly focus on simplistic scenes with light occlusion and insufficient diversity, limiting their applicability to practical scenarios. We present GraspClutter6D, a large-scale real-world grasping dataset featuring: (1) 1,000 highly clutt…

    Submitted 9 April, 2025; originally announced April 2025.

  4. arXiv:2504.02812  [pdf, other]

    cs.CV

    BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation

    Authors: Van Nguyen Nguyen, Stephen Tyree, Andrew Guo, Mederic Fourmy, Anas Gouda, Taeyeop Lee, Sungphill Moon, Hyeontae Son, Lukas Ranftl, Jonathan Tremblay, Eric Brachmann, Bertram Drost, Vincent Lepetit, Carsten Rother, Stan Birchfield, Jiri Matas, Yann Labbe, Martin Sundermeyer, Tomas Hodan

    Abstract: We present the evaluation methodology, datasets and results of the BOP Challenge 2024, the 6th in a series of public competitions organized to capture the state of the art in 6D object pose estimation and related tasks. In 2024, our goal was to transition BOP from lab-like setups to real-world scenarios. First, we introduced new model-free tasks, where no 3D object models are available and methods…

    Submitted 23 April, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: text overlap with arXiv:2403.09799

  5. arXiv:2504.02199  [pdf, other]

    cs.CV cs.AI

    ESC: Erasing Space Concept for Knowledge Deletion

    Authors: Tae-Young Lee, Sundong Park, Minwoo Jeon, Hyoseok Hwang, Gyeong-Moon Park

    Abstract: As concerns regarding privacy in deep learning continue to grow, individuals are increasingly apprehensive about the potential exploitation of their personal knowledge in trained models. Despite several research efforts to address this, they often fail to consider the real-world demand from users for complete knowledge erasure. Furthermore, our investigation reveals that existing methods have a ri…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: 22 pages, 14 figures, 18 tables, CVPR 2025

  6. arXiv:2504.01690  [pdf, other]

    cs.SD cs.AI eess.AS

    Token Pruning in Audio Transformers: Optimizing Performance and Decoding Patch Importance

    Authors: Taehan Lee, Hyukjun Lee

    Abstract: Vision Transformers (ViTs) have achieved state-of-the-art performance across various computer vision tasks, but their high computational cost remains a challenge. Token pruning has been proposed to reduce this cost by selectively removing less important tokens. While effective in vision tasks by discarding non-object regions, applying this technique to audio tasks presents unique challenges, as di…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: This work has been submitted to the IEEE for possible publication. Source code is available at https://github.com/andylee-24/token-pruning-audio-transformer

  7. arXiv:2503.20020  [pdf, other]

    cs.RO

    Gemini Robotics: Bringing AI into the Physical World

    Authors: Gemini Robotics Team, Saminda Abeyruwan, Joshua Ainslie, Jean-Baptiste Alayrac, Montserrat Gonzalez Arenas, Travis Armstrong, Ashwin Balakrishna, Robert Baruch, Maria Bauza, Michiel Blokzijl, Steven Bohez, Konstantinos Bousmalis, Anthony Brohan, Thomas Buschmann, Arunkumar Byravan, Serkan Cabi, Ken Caluwaerts, Federico Casarini, Oscar Chang, Jose Enrique Chen, Xi Chen, Hao-Tien Lewis Chiang, Krzysztof Choromanski, David D'Ambrosio, Sudeep Dasari, et al. (93 additional authors not shown)

    Abstract: Recent advancements in large multimodal models have led to the emergence of remarkable generalist capabilities in digital domains, yet their translation to physical agents such as robots remains a significant challenge. This report introduces a new family of AI models purposefully designed for robotics and built upon the foundation of Gemini 2.0. We present Gemini Robotics, an advanced Vision-Lang…

    Submitted 25 March, 2025; originally announced March 2025.

  8. arXiv:2503.18673  [pdf, other]

    cs.CV cs.AI cs.RO

    Any6D: Model-free 6D Pose Estimation of Novel Objects

    Authors: Taeyeop Lee, Bowen Wen, Minjun Kang, Gyuree Kang, In So Kweon, Kuk-Jin Yoon

    Abstract: We introduce Any6D, a model-free framework for 6D object pose estimation that requires only a single RGB-D anchor image to estimate both the 6D pose and size of unknown objects in novel scenes. Unlike existing methods that rely on textured 3D models or multiple viewpoints, Any6D leverages a joint object alignment process to enhance 2D-3D alignment and metric scale estimation for improved pose accu…

    Submitted 25 March, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: CVPR 2025, Project Page: https://taeyeop.com/any6d

  9. arXiv:2503.17089  [pdf, other]

    eess.IV cs.AI cs.CV

    Does a Rising Tide Lift All Boats? Bias Mitigation for AI-based CMR Segmentation

    Authors: Tiarna Lee, Esther Puyol-Antón, Bram Ruijsink, Miaojing Shi, Andrew P. King

    Abstract: Artificial intelligence (AI) is increasingly being used for medical imaging tasks. However, there can be biases in the resulting models, particularly when they were trained using imbalanced training datasets. One such example has been the strong race bias effect in cardiac magnetic resonance (CMR) image segmentation models. Although this phenomenon has been reported in a number of publications, li…

    Submitted 21 March, 2025; originally announced March 2025.

  10. arXiv:2503.16700  [pdf, other]

    cs.LG eess.SY

    Deep Q-Learning with Gradient Target Tracking

    Authors: Donghwan Lee, Bum Geun Park, Taeho Lee

    Abstract: This paper introduces Q-learning with gradient target tracking, a novel reinforcement learning framework that provides a learned continuous target update mechanism as an alternative to the conventional hard update paradigm. In the standard deep Q-network (DQN), the target network is a copy of the online network's weights, held fixed for a number of iterations before being periodically replaced via…

    Submitted 20 March, 2025; originally announced March 2025.

  11. arXiv:2503.11078  [pdf, other]

    cs.CV cs.LG

    Understanding Flatness in Generative Models: Its Role and Benefits

    Authors: Taehwan Lee, Kyeongkook Seo, Jaejun Yoo, Sung Whan Yoon

    Abstract: Flat minima, known to enhance generalization and robustness in supervised learning, remain largely unexplored in generative models. In this work, we systematically investigate the role of loss surface flatness in generative models, both theoretically and empirically, with a particular focus on diffusion models. We establish a theoretical claim that flatter minima improve robustness against perturb…

    Submitted 14 March, 2025; originally announced March 2025.

  12. arXiv:2503.06743  [pdf, other]

    eess.IV cs.CV

    X-GAN: A Generative AI-Powered Unsupervised Model for High-Precision Segmentation of Retinal Main Vessels toward Early Detection of Glaucoma

    Authors: Cheng Huang, Weizheng Xie, Tsengdar J. Lee, Jui-Kai Wang, Karanjit Kooner, Jia Zhang

    Abstract: Structural changes in main retinal blood vessels serve as critical biomarkers for the onset and progression of glaucoma. Identifying these vessels is vital for vascular modeling yet highly challenging. This paper proposes X-GAN, a generative AI-powered unsupervised segmentation model designed for extracting main blood vessels from Optical Coherence Tomography Angiography (OCTA) images. The process…

    Submitted 12 March, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

    Comments: 11 pages, 8 figures

  13. arXiv:2503.03753  [pdf, other]

    cs.IT cs.AI eess.SP

    Generative Diffusion Model-based Compression of MIMO CSI

    Authors: Heasung Kim, Taekyun Lee, Hyeji Kim, Gustavo De Veciana, Mohamed Amine Arfaoui, Asil Koc, Phil Pietraski, Guodong Zhang, John Kaewell

    Abstract: While neural lossy compression techniques have markedly advanced the efficiency of Channel State Information (CSI) compression and reconstruction for feedback in MIMO communications, efficient algorithms for more challenging and practical tasks, such as CSI compression for future channel prediction and reconstruction with relevant side information, remain underexplored, often resulting in suboptimal…

    Submitted 6 February, 2025; originally announced March 2025.

    Comments: 6 pages

    MSC Class: 68P30 ACM Class: I.2.0

  14. arXiv:2503.00455  [pdf, other]

    cs.SD cs.AI cs.MA cs.MM eess.AS

    PodAgent: A Comprehensive Framework for Podcast Generation

    Authors: Yujia Xiao, Lei He, Haohan Guo, Fenglong Xie, Tan Lee

    Abstract: Existing automatic audio generation methods struggle to generate podcast-like audio programs effectively. The key challenges lie in in-depth content generation, appropriate and expressive voice production. This paper proposes PodAgent, a comprehensive framework for creating audio programs. PodAgent 1) generates informative topic-discussion content by designing a Host-Guest-Writer multi-ag…

    Submitted 1 March, 2025; originally announced March 2025.

  15. arXiv:2502.21057  [pdf, other]

    cs.RO cs.AI

    Robust Deterministic Policy Gradient for Disturbance Attenuation and Its Application to Quadrotor Control

    Authors: Taeho Lee, Donghwan Lee

    Abstract: Practical control systems pose significant challenges in identifying optimal control policies due to uncertainties in the system model and external disturbances. While $H_\infty$ control techniques are commonly used to design robust controllers that mitigate the effects of disturbances, these methods often require complex and computationally intensive calculations. To address this issue, this pape…

    Submitted 12 March, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

    Comments: 8 pages

  16. arXiv:2502.20685  [pdf, other]

    cs.CV

    EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching

    Authors: Dongki Jung, Jaehoon Choi, Yonghan Lee, Somi Jeong, Taejae Lee, Dinesh Manocha, Suyong Yeon

    Abstract: We introduce the first learning-based dense matching algorithm, termed Equirectangular Projection-Oriented Dense Kernelized Feature Matching (EDM), specifically designed for omnidirectional images. Equirectangular projection (ERP) images, with their large fields of view, are particularly suited for dense matching techniques that aim to establish comprehensive correspondences across images. However…

    Submitted 27 February, 2025; originally announced February 2025.

  17. arXiv:2502.20500  [pdf, other]

    cs.RO math.OC

    Equivariant Reinforcement Learning Frameworks for Quadrotor Low-Level Control

    Authors: Beomyeol Yu, Taeyoung Lee

    Abstract: Improving sampling efficiency and generalization capability is critical for the successful data-driven control of quadrotor unmanned aerial vehicles (UAVs) that are inherently unstable. While various reinforcement learning (RL) approaches have been applied to autonomous quadrotor flight, they often require extensive training data, posing multiple challenges and safety risks in practice. To address…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: 14 pages, 8 figures

  18. arXiv:2502.17494  [pdf, other]

    cs.IR cs.AI cs.LG

    External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation

    Authors: Mingfu Liang, Xi Liu, Rong Jin, Boyang Liu, Qiuling Suo, Qinghai Zhou, Song Zhou, Laming Chen, Hua Zheng, Zhiyuan Li, Shali Jiang, Jiyan Yang, Xiaozhen Xia, Fan Yang, Yasmine Badr, Ellie Wen, Shuyu Xu, Hansey Chen, Zhengyu Zhang, Jade Nie, Chunzhi Yang, Zhichen Zeng, Weilin Zhang, Xingliang Huang, Qianru Li, et al. (80 additional authors not shown)

    Abstract: Ads recommendation is a prominent service of online advertising systems and has been actively studied. Recent studies indicate that scaling-up and advanced design of the recommendation model can bring significant performance improvement. However, with a larger model scale, such prior studies have a significantly increasing gap from industry as they often neglect two fundamental challenges in indus…

    Submitted 23 April, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: Accepted by the ACM Web Conference (WWW) 2025 Industrial Track as Oral Presentation

  19. arXiv:2502.13595  [pdf, other]

    cs.CL cs.AI cs.IR

    MMTEB: Massive Multilingual Text Embedding Benchmark

    Authors: Kenneth Enevoldsen, Isaac Chung, Imene Kerboua, Márton Kardos, Ashwin Mathur, David Stap, Jay Gala, Wissam Siblini, Dominik Krzemiński, Genta Indra Winata, Saba Sturua, Saiteja Utpala, Mathieu Ciancone, Marion Schaeffer, Gabriel Sequeira, Diganta Misra, Shreeya Dhakal, Jonathan Rystrøm, Roman Solomatin, Ömer Çağatan, Akash Kundu, Martin Bernstorff, Shitao Xiao, Akshita Sukhlecha, Bhavish Pahwa, et al. (61 additional authors not shown)

    Abstract: Text embeddings are typically evaluated on a limited set of tasks, which are constrained by language, domain, and task diversity. To address these limitations and provide a more comprehensive evaluation, we introduce the Massive Multilingual Text Embedding Benchmark (MMTEB) - a large-scale, community-driven expansion of MTEB, covering over 500 quality-controlled evaluation tasks across 250+ langua…

    Submitted 8 April, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: Accepted for ICLR: https://openreview.net/forum?id=zl3pfz4VCV

  20. arXiv:2502.12944  [pdf, other]

    cs.LG

    Performance of Zero-Shot Time Series Foundation Models on Cloud Data

    Authors: William Toner, Thomas L. Lee, Artjom Joosen, Rajkarn Singh, Martin Asenov

    Abstract: Time series foundation models (FMs) have emerged as a popular paradigm for zero-shot multi-domain forecasting. FMs are trained on numerous diverse datasets and claim to be effective forecasters across multiple different time series domains, including cloud data. In this work we investigate this claim, exploring the effectiveness of FMs on cloud data. We demonstrate that many well-known FMs fail to…

    Submitted 4 March, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

    Comments: 5 pages, Preprint

  21. arXiv:2502.12920  [pdf, other]

    cs.LG stat.ML

    Lightweight Online Adaption for Time Series Foundation Model Forecasts

    Authors: Thomas L. Lee, William Toner, Rajkarn Singh, Artjom Joosen, Martin Asenov

    Abstract: Foundation models (FMs) have emerged as a promising approach for time series forecasting. While effective, FMs typically remain fixed during deployment due to the high computational costs of learning them online. Consequently, deployed FMs fail to adapt their forecasts to current data characteristics, despite the availability of online feedback from newly arriving data. This raises the question of…

    Submitted 26 March, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

    Comments: 8 pages, Preprint

  22. arXiv:2502.12605  [pdf, other]

    cs.MA cs.LG

    Hypernetwork-based approach for optimal composition design in partially controlled multi-agent systems

    Authors: Kyeonghyeon Park, David Molina Concha, Hyun-Rok Lee, Chi-Guhn Lee, Taesik Lee

    Abstract: Partially Controlled Multi-Agent Systems (PCMAS) are comprised of controllable agents, managed by a system designer, and uncontrollable agents, operating autonomously. This study addresses an optimal composition design problem in PCMAS, which involves the system designer's problem, determining the optimal number and policies of controllable agents, and the uncontrollable agents' problem, identifyi…

    Submitted 18 February, 2025; originally announced February 2025.

  23. arXiv:2502.08949  [pdf, other]

    cs.LG

    Self-Supervised Graph Contrastive Pretraining for Device-level Integrated Circuits

    Authors: Sungyoung Lee, Ziyi Wang, Seunggeun Kim, Taekyun Lee, David Z. Pan

    Abstract: Self-supervised graph representation learning has driven significant advancements in domains such as social network analysis, molecular design, and electronics design automation (EDA). However, prior works in EDA have mainly focused on the representation of gate-level digital circuits, failing to capture analog and mixed-signal circuits. To address this gap, we introduce DICE: Device-level Integra…

    Submitted 12 February, 2025; originally announced February 2025.

  24. arXiv:2502.07663  [pdf, other]

    cs.AI cs.CL cs.CY cs.HC

    Human Decision-making is Susceptible to AI-driven Manipulation

    Authors: Sahand Sabour, June M. Liu, Siyang Liu, Chris Z. Yao, Shiyao Cui, Xuanming Zhang, Wen Zhang, Yaru Cao, Advait Bhat, Jian Guan, Wei Wu, Rada Mihalcea, Hongning Wang, Tim Althoff, Tatia M. C. Lee, Minlie Huang

    Abstract: Artificial Intelligence (AI) systems are increasingly intertwined with daily life, assisting users in executing various tasks and providing guidance on decision-making. This integration introduces risks of AI-driven manipulation, where such systems may exploit users' cognitive biases and emotional vulnerabilities to steer them toward harmful outcomes. Through a randomized controlled trial with 233…

    Submitted 24 February, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

    Comments: Work in progress

  25. arXiv:2502.07192  [pdf, other]

    cs.CV

    OscNet: Machine Learning on CMOS Oscillator Networks

    Authors: Wenxiao Cai, Thomas H. Lee

    Abstract: Machine learning and AI have achieved remarkable advancements but at the cost of significant computational resources and energy consumption. This has created an urgent need for a novel, energy-efficient computational fabric to replace the current computing pipeline. Recently, a promising approach has emerged by mimicking spiking neurons in the brain and leveraging oscillators on CMOS for direct co…

    Submitted 10 February, 2025; originally announced February 2025.

  26. arXiv:2502.06086  [pdf, other]

    cs.CL

    Is a Peeled Apple Still Red? Evaluating LLMs' Ability for Conceptual Combination with Property Type

    Authors: Seokwon Song, Taehyun Lee, Jaewoo Ahn, Jae Hyuk Sung, Gunhee Kim

    Abstract: Conceptual combination is a cognitive process that merges basic concepts, enabling the creation of complex expressions. During this process, the properties of combination (e.g., the whiteness of a peeled apple) can be inherited from basic concepts, newly emerge, or be canceled. However, previous studies have evaluated a limited set of properties and have not examined the generative process. To add…

    Submitted 9 February, 2025; originally announced February 2025.

    Comments: NAACL 2025; the dataset and experimental code are available at https://github.com/seokwon99/CCPT.git

  27. arXiv:2502.05409  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO eess.SY

    Vision-in-the-loop Simulation for Deep Monocular Pose Estimation of UAV in Ocean Environment

    Authors: Maneesha Wickramasuriya, Beomyeol Yu, Taeyoung Lee, Murray Snyder

    Abstract: This paper proposes a vision-in-the-loop simulation environment for deep monocular pose estimation of a UAV operating in an ocean environment. Recently, a deep neural network with a transformer architecture has been successfully trained to estimate the pose of a UAV relative to the flight deck of a research vessel, overcoming several limitations of GPS-based approaches. However, validating the dee…

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: 8 pages, 15 figures, conference

  28. arXiv:2502.04646  [pdf, other]

    cs.LG cs.AI

    Importance Sampling via Score-based Generative Models

    Authors: Heasung Kim, Taekyun Lee, Hyeji Kim, Gustavo de Veciana

    Abstract: Importance sampling, which involves sampling from a probability density function (PDF) proportional to the product of an importance weight function and a base PDF, is a powerful technique with applications in variance reduction, biased or customized sampling, data augmentation, and beyond. Inspired by the growing availability of score-based generative models (SGMs), we propose an entirely training…

    Submitted 6 February, 2025; originally announced February 2025.

    Comments: 18 pages

    MSC Class: 68T01 ACM Class: I.2.0

  29. arXiv:2502.01377  [pdf, other]

    cs.CE cs.AI

    Data-Efficient Model for Psychological Resilience Prediction based on Neurological Data

    Authors: Zhi Zhang, Yan Liu, Mengxia Gao, Yu Yang, Jiannong Cao, Wai Kai Hou, Shirley Li, Sonata Yau, Yun Kwok Wing, Tatia M. C. Lee

    Abstract: Psychological resilience, defined as the ability to rebound from adversity, is crucial for mental health. Compared with traditional resilience assessments through self-reported questionnaires, resilience assessments based on neurological data offer more objective results with biological markers, hence significantly enhancing credibility. This paper proposes a novel data-efficient model to address…

    Submitted 3 February, 2025; originally announced February 2025.

  30. arXiv:2501.18837  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

    Authors: Mrinank Sharma, Meg Tong, Jesse Mu, Jerry Wei, Jorrit Kruthoff, Scott Goodfriend, Euan Ong, Alwin Peng, Raj Agarwal, Cem Anil, Amanda Askell, Nathan Bailey, Joe Benton, Emma Bluemke, Samuel R. Bowman, Eric Christiansen, Hoagy Cunningham, Andy Dau, Anjali Gopal, Rob Gilson, Logan Graham, Logan Howard, Nimit Kalra, Taesung Lee, Kevin Lin, et al. (18 additional authors not shown)

    Abstract: Large language models (LLMs) are vulnerable to universal jailbreaks: prompting strategies that systematically bypass model safeguards and enable users to carry out harmful processes that require many model interactions, like manufacturing illegal substances at scale. To defend against these attacks, we introduce Constitutional Classifiers: safeguards trained on synthetic data, generated by promptin…

    Submitted 30 January, 2025; originally announced January 2025.

  31. arXiv:2501.15641  [pdf, other]

    cs.CV

    Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting

    Authors: Yuxin Zhang, Minyan Luo, Weiming Dong, Xiao Yang, Haibin Huang, Chongyang Ma, Oliver Deussen, Tong-Yee Lee, Changsheng Xu

    Abstract: The stories and characters that captivate us as we grow up shape unique fantasy worlds, with images serving as the primary medium for visually experiencing these realms. Personalizing generative models through fine-tuning with theme-specific data has become a prevalent approach in text-to-image generation. However, unlike object customization, which focuses on learning specific objects, theme-spec…

    Submitted 26 January, 2025; originally announced January 2025.

  32. arXiv:2501.15332  [pdf, other]

    cs.HC q-bio.NC

    Perception of an AI Teammate in an Embodied Control Task Affects Team Performance, Reflected in Human Teammates' Behaviors and Physiological Responses

    Authors: Yinuo Qin, Richard T. Lee, Paul Sajda

    Abstract: The integration of artificial intelligence (AI) into human teams is widely expected to enhance performance and collaboration. However, our study reveals a striking and counterintuitive result: human-AI teams performed worse than human-only teams, especially when task difficulty increased. Using a virtual reality-based sensorimotor task, we observed that the inclusion of an active human-like AI tea…

    Submitted 25 January, 2025; originally announced January 2025.

  33. arXiv:2501.15328  [pdf, other]

    q-bio.NC cs.LG

    Physiologically-Informed Predictability of a Teammate's Future Actions Forecasts Team Performance

    Authors: Yinuo Qin, Richard T. Lee, Weijia Zhang, Xiaoxiao Sun, Paul Sajda

    Abstract: In collaborative environments, a deep understanding of multi-human teaming dynamics is essential for optimizing performance. However, the relationship between individuals' behavioral and physiological markers and their combined influence on overall team performance remains poorly understood. To explore this, we designed a triadic human collaborative sensorimotor task in virtual reality (VR) and in…

    Submitted 25 January, 2025; originally announced January 2025.

  34. arXiv:2501.14918  [pdf, other]

    cs.CV

    3D/2D Registration of Angiograms using Silhouette-based Differentiable Rendering

    Authors: Taewoong Lee, Sarah Frisken, Nazim Haouchine

    Abstract: We present a method for 3D/2D registration of Digital Subtraction Angiography (DSA) images to provide valuable insight into brain hemodynamics and angioarchitecture. Our approach formulates the registration as a pose estimation problem, leveraging both anteroposterior and lateral DSA views and employing differentiable rendering. Preliminary experiments on real and synthetic datasets demonstrate th…

    Submitted 24 January, 2025; originally announced January 2025.

  35. arXiv:2501.13341  [pdf, other]

    cs.CV

    Multi-aspect Knowledge Distillation with Large Language Model

    Authors: Taegyeong Lee, Jinsik Bang, Soyeong Kwon, Taehwan Kim

    Abstract: Recent advancements in deep learning have significantly improved performance on computer vision tasks. Previous image classification methods primarily modify model architectures or add features, and they optimize models using cross-entropy loss on class logits. Since they focus on classifying images by considering class labels, these methods may struggle to learn various aspects of classe…

    Submitted 12 April, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

    Comments: Accepted to CVPRW 2025 (FGVC12)

  36. arXiv:2501.08769  [pdf]

    cs.CL

    Enhanced Large Language Models for Effective Screening of Depression and Anxiety

    Authors: June M. Liu, Mengxia Gao, Sahand Sabour, Zhuang Chen, Minlie Huang, Tatia M. C. Lee

    Abstract: Depressive and anxiety disorders are widespread, necessitating timely identification and management. Recent advances in Large Language Models (LLMs) offer potential solutions, yet high costs and ethical concerns about training data remain challenges. This paper introduces a pipeline for synthesizing clinical interviews, resulting in 1,157 interactive dialogues (PsyInterview), and presents EmoScan,…

    Submitted 25 January, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

  37. Dynamic Portfolio Optimization via Augmented DDPG with Quantum Price Levels-Based Trading Strategy

    Authors: Runsheng Lin, Zihan Xing, Mingze Ma, Raymond S. T. Lee

    Abstract: With the development of deep learning, the Dynamic Portfolio Optimization (DPO) problem has received a lot of attention in recent years, not only in the field of finance but also in the field of deep learning. Some advanced research in recent years has proposed the application of Deep Reinforcement Learning (DRL) to the DPO problem, which has been demonstrated to be more advantageous than supervised learning i…

    Submitted 14 January, 2025; originally announced January 2025.

    Comments: 8 pages

    Journal ref: Proceedings of the 2023 International Joint Conference on Neural Networks (IJCNN), pp. 1-8, 2023

  38. arXiv:2501.05310  [pdf, other]

    eess.AS cs.SD

    Probing Speaker-specific Features in Speaker Representations

    Authors: Aemon Yat Fei Chiu, Paco Kei Ching Fung, Roger Tsz Yeung Li, Jingyu Li, Tan Lee

    Abstract: This study explores speaker-specific features encoded in speaker embeddings and intermediate layers of speech self-supervised learning (SSL) models. By utilising a probing method, we analyse features such as pitch, tempo, and energy across prominent speaker embedding models and speech SSL models, including HuBERT, WavLM, and Wav2vec 2.0. The results reveal that speaker embeddings like CAM++ excel…

    Submitted 9 January, 2025; originally announced January 2025.

  39. arXiv:2501.02064  [pdf, other]

    cs.CV cs.AI

    ArtCrafter: Text-Image Aligning Style Transfer via Embedding Reframing

    Authors: Nisha Huang, Kaer Huang, Yifan Pu, Jiangshan Wang, Jie Guo, Yiqiang Yan, Xiu Li, Tong-Yee Lee

    Abstract: Recent years have witnessed significant advancements in text-guided style transfer, primarily attributed to innovations in diffusion models. These models excel in conditional guidance, utilizing text or images to direct the sampling process. However, despite their capabilities, direct conditional guidance approaches often face challenges in balancing the expressiveness of textual semantics with th…

    Submitted 17 April, 2025; v1 submitted 3 January, 2025; originally announced January 2025.

    Comments: 13 pages, 17 figures, submitted to a journal

  40. arXiv:2412.10436  [pdf, other]

    cs.CV cs.LG

    Benchmarking Federated Learning for Semantic Datasets: Federated Scene Graph Generation

    Authors: SeungBum Ha, Taehwan Lee, Jiyoun Lim, Sung Whan Yoon

    Abstract: Federated learning (FL) has recently garnered attention as a data-decentralized training framework that enables the learning of deep models from locally distributed samples while keeping data privacy. Built upon the framework, immense efforts have been made to establish FL benchmarks, which provide rigorous evaluation settings that control data heterogeneity across clients. Prior efforts have main…

    Submitted 11 December, 2024; originally announced December 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  41. arXiv:2412.08905  [pdf, other]

    cs.CL cs.AI

    Phi-4 Technical Report

    Authors: Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harrison, Russell J. Hewett, Mojan Javaheripi, Piero Kauffmann, James R. Lee, Yin Tat Lee, Yuanzhi Li, Weishung Liu, Caio C. T. Mendes, Anh Nguyen, Eric Price, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Xin Wang, Rachel Ward, Yue Wu, Dingli Yu, et al. (2 additional authors not shown)

    Abstract: We present phi-4, a 14-billion parameter language model developed with a training recipe that is centrally focused on data quality. Unlike most language models, where pre-training is based primarily on organic data sources such as web content or code, phi-4 strategically incorporates synthetic data throughout the training process. While previous models in the Phi family largely distill the capabil…

    Submitted 11 December, 2024; originally announced December 2024.

  42. arXiv:2412.04637  [pdf]

    cs.IR cs.AI cs.LG

    Semantic Retrieval at Walmart

    Authors: Alessandro Magnani, Feng Liu, Suthee Chaidaroon, Sachin Yadav, Praveen Reddy Suram, Ajit Puthenputhussery, Sijie Chen, Min Xie, Anirudh Kashi, Tony Lee, Ciya Liao

    Abstract: In product search, the retrieval of candidate products before re-ranking is more critical and challenging than other search like web search, especially for tail queries, which have a complex and specific search intent. In this paper, we present a hybrid system for e-commerce search deployed at Walmart that combines traditional inverted index and embedding-based neural retrieval to better answer us…

    Submitted 5 December, 2024; originally announced December 2024.

    Comments: 9 pages, 2 figures, 10 tables, KDD 2022

  43. arXiv:2412.04137  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Text Change Detection in Multilingual Documents Using Image Comparison

    Authors: Doyoung Park, Naresh Reddy Yarram, Sunjin Kim, Minkyu Kim, Seongho Cho, Taehee Lee

    Abstract: Document comparison typically relies on optical character recognition (OCR) as its core technology. However, OCR requires the selection of appropriate language models for each document and the performance of multilingual or hybrid models remains limited. To overcome these challenges, we propose text change detection (TCD) using an image comparison model tailored for multilingual documents. Unlike…

    Submitted 5 December, 2024; originally announced December 2024.

    Comments: 15 pages, 11 figures, 6 tables; accepted at WACV 2025

  44. arXiv:2411.19517  [pdf, other]

    cs.LG cs.AI

    RL-MILP Solver: A Reinforcement Learning Approach for Solving Mixed-Integer Linear Programs with Graph Neural Networks

    Authors: Tae-Hoon Lee, Min-Soo Kim

    Abstract: Mixed-integer linear programming (MILP) is a widely used optimization technique across various fields. Existing $\textit{end-to-end learning}$ methods for MILP generate values for a subset of decision variables and delegate the remaining problem to traditional MILP solvers. However, this approach often fails to guarantee solution feasibility (i.e., satisfying all constraints) due to inaccurate pre…

    Submitted 11 March, 2025; v1 submitted 29 November, 2024; originally announced November 2024.

    Comments: Extended version (17 pages, 8 figures). Accepted at the 2025 AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE)

  45. arXiv:2411.18049  [pdf, other]

    cs.HC

    Understanding the Impact of Spatial Immersion in Web Data Stories

    Authors: Seon Gyeom Kim, Juhyeong Park, Yutaek Song, Donggun Lee, Yubin Lee, Ryan Rossi, Jane Hoffswell, Eunyee Koh, Tak Yeon Lee

    Abstract: An increasing number of web articles engage the reader with the feeling of being immersed in the data space. However, the exact characteristics of spatial immersion in the context of visual storytelling remain vague. For example, what are the common design patterns of data stories with spatial immersion? How do they affect the reader's experience? To gain a deeper understanding of the subject, we…

    Submitted 29 March, 2025; v1 submitted 26 November, 2024; originally announced November 2024.

  46. arXiv:2411.17971  [pdf, other]

    eess.IV cs.AI cs.CE cs.LG

    Graph Neural Network for Cerebral Blood Flow Prediction With Clinical Datasets

    Authors: Seungyeon Kim, Wheesung Lee, Sung-Ho Ahn, Do-Eun Lee, Tae-Rin Lee

    Abstract: Accurate prediction of cerebral blood flow is essential for the diagnosis and treatment of cerebrovascular diseases. Traditional computational methods, however, often incur significant computational costs, limiting their practicality in real-time clinical applications. This paper proposes a graph neural network (GNN) to predict blood flow and pressure in previously unseen cerebral vascular network…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: 4 pages, 3 figures

  47. arXiv:2411.17785  [pdf, other]

    eess.SP cs.LG

    New Test-Time Scenario for Biosignal: Concept and Its Approach

    Authors: Yong-Yeon Jo, Byeong Tak Lee, Beom Joon Kim, Jeong-Ho Hong, Hak Seung Lee, Joon-myoung Kwon

    Abstract: Online Test-Time Adaptation (OTTA) enhances model robustness by updating pre-trained models with unlabeled data during testing. In healthcare, OTTA is vital for real-time tasks like predicting blood pressure from biosignals, which demand continuous adaptation. We introduce a new test-time scenario with streams of unlabeled samples and occasional labeled samples. Our framework combines supervised a…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: Findings paper presented at Machine Learning for Health (ML4H) symposium 2024, December 15-16, 2024, Vancouver, Canada, 6 pages

  48. arXiv:2411.15509  [pdf, other]

    cs.CV cs.AI

    Interactive Visual Assessment for Text-to-Image Generation Models

    Authors: Xiaoyue Mi, Fan Tang, Juan Cao, Qiang Sheng, Ziyao Huang, Peng Li, Yang Liu, Tong-Yee Lee

    Abstract: Visual generation models have achieved remarkable progress in computer graphics applications but still face significant challenges in real-world deployment. Current assessment approaches for visual generation tasks typically follow an isolated three-phase framework: test input collection, model output generation, and user assessment. These fashions suffer from fixed coverage, evolving difficulty,…

    Submitted 23 November, 2024; originally announced November 2024.

    Comments: Under Review

  49. arXiv:2411.15034  [pdf, other]

    cs.CV cs.LG

    HeadRouter: A Training-free Image Editing Framework for MM-DiTs by Adaptively Routing Attention Heads

    Authors: Yu Xu, Fan Tang, Juan Cao, Yuxin Zhang, Xiaoyu Kong, Jintao Li, Oliver Deussen, Tong-Yee Lee

    Abstract: Diffusion Transformers (DiTs) have exhibited robust capabilities in image generation tasks. However, accurate text-guided image editing for multimodal DiTs (MM-DiTs) still poses a significant challenge. Unlike UNet-based structures that could utilize self/cross-attention maps for semantic editing, MM-DiTs inherently lack support for explicit and consistent incorporated text guidance, resulting in…

    Submitted 22 November, 2024; originally announced November 2024.

  50. An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems

    Authors: Jingyu Li, Aemon Yat Fei Chiu, Tan Lee

    Abstract: Language mismatch is among the most common and challenging domain mismatches in deploying speaker verification (SV) systems. Adversarial reprogramming has shown promising results in cross-language adaptation for SV. The reprogramming is implemented by padding learnable parameters on the two sides of input speech signals. In this paper, we investigate the relationship between the number of padded p…

    Submitted 18 November, 2024; originally announced November 2024.

    Comments: Accepted by ISCSLP 2024

    Journal ref: 2024 IEEE 14th International Symposium on Chinese Spoken Language Processing (ISCSLP), Beijing, China, 2024, pp. 388-392