Showing 1–50 of 193 results for author: Moon, S

  1. arXiv:2505.04223  [pdf, other]

    cs.LG cs.AI cs.DC

    FRAIN to Train: A Fast-and-Reliable Solution for Decentralized Federated Learning

    Authors: Sanghyeon Park, Soo-Mook Moon

    Abstract: Federated learning (FL) enables collaborative model training across distributed clients while preserving data locality. Although FedAvg pioneered synchronous rounds for global model averaging, slower devices can delay collective progress. Asynchronous FL (e.g., FedAsync) addresses stragglers by continuously integrating client updates, yet naive implementations risk client drift due to non-IID data…

    Submitted 7 May, 2025; originally announced May 2025.

  2. arXiv:2505.02722  [pdf, other]

    cs.AI cs.LG

    Enhancing LLMs' Clinical Reasoning with Real-World Data from a Nationwide Sepsis Registry

    Authors: Junu Kim, Chaeeun Shim, Sungjin Park, Su Yeon Lee, Gee Young Suh, Chae-Man Lim, Seong Jin Choi, Song Mi Moon, Kyoung-Ho Song, Eu Suk Kim, Hong Bin Kim, Sejoong Kim, Chami Im, Dong-Wan Kang, Yong Soo Kim, Hee-Joon Bae, Sung Yoon Lim, Han-Gil Jeong, Edward Choi

    Abstract: Although large language models (LLMs) have demonstrated impressive reasoning capabilities across general domains, their effectiveness in real-world clinical practice remains limited. This is likely due to their insufficient exposure to real-world clinical data during training, as such data is typically not included due to privacy concerns. To address this, we propose enhancing the clinical reasoni…

    Submitted 5 May, 2025; originally announced May 2025.

  3. arXiv:2505.01530  [pdf]

    cs.CV cs.AI

    Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer

    Authors: Muhammad Tayyab Khan, Zane Yong, Lequn Chen, Jun Ming Tan, Wenhe Feng, Seung Ki Moon

    Abstract: Accurate extraction of key information from 2D engineering drawings is crucial for high-precision manufacturing. Manual extraction is time-consuming and error-prone, while traditional Optical Character Recognition (OCR) techniques often struggle with complex layouts and overlapping symbols, resulting in unstructured outputs. To address these challenges, this paper proposes a novel hybrid deep lear…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: This paper has been submitted to the IEEE International Conference on Industrial Engineering and Engineering Management (IEEM 2025)

  4. arXiv:2504.14875  [pdf, other]

    cs.CV cs.AI cs.LG

    ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams

    Authors: Chris Dongjoo Kim, Jihwan Moon, Sangwoo Moon, Heeseung Yun, Sihaeng Lee, Aniruddha Kembhavi, Soonyoung Lee, Gunhee Kim, Sangho Lee, Christopher Clark

    Abstract: The rapid growth of video-text data presents challenges in storage and computation during training. Online learning, which processes streaming data in real-time, offers a promising solution to these issues while also allowing swift adaptations in scenarios demanding real-time responsiveness. One strategy to enhance the efficiency and effectiveness of learning involves identifying and prioritizing…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: CVPR 2025 (main conference)

  5. arXiv:2504.14802  [pdf, other]

    cs.DC

    ReCraft: Self-Contained Split, Merge, and Membership Change of Raft Protocol

    Authors: Kezhi Xiong, Soonwon Moon, Joshua Kang, Bryant Curto, Jieung Kim, Ji-Yong Shin

    Abstract: Designing reconfiguration schemes for consensus protocols is challenging because subtle corner cases during reconfiguration could invalidate the correctness of the protocol. Thus, most systems that embed consensus protocols conservatively implement the reconfiguration and refrain from developing an efficient scheme. Existing implementations often stop the entire system during reconfiguration and r…

    Submitted 27 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Journal ref: The 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (2025)

  6. arXiv:2504.13180  [pdf, other]

    cs.CV cs.AI cs.LG

    PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

    Authors: Jang Hyun Cho, Andrea Madotto, Effrosyni Mavroudi, Triantafyllos Afouras, Tushar Nagarajan, Muhammad Maaz, Yale Song, Tengyu Ma, Shuming Hu, Suyog Jain, Miguel Martin, Huiyu Wang, Hanoona Rasheed, Peize Sun, Po-Yao Huang, Daniel Bolya, Nikhila Ravi, Shashank Jain, Tammy Stark, Shane Moon, Babak Damavandi, Vivian Lee, Andrew Westbury, Salman Khan, Philipp Krähenbühl, et al. (4 additional authors not shown)

    Abstract: Vision-language models are integral to computer vision research, yet many high-performing models remain closed-source, obscuring their data, design and training recipe. The research community has responded by using distillation from black-box models to label training data, achieving strong benchmark results, at the cost of measurable scientific progress. However, without knowing the details of the…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Technical report

  7. arXiv:2504.11673  [pdf, other]

    cs.CL

    Higher-Order Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions

    Authors: Minwoo Kang, Suhong Moon, Seung Hyeong Lee, Ayush Raj, Joseph Suh, David M. Chan

    Abstract: Large language models (LLMs) are increasingly capable of simulating human behavior, offering cost-effective ways to estimate user responses during the early phases of survey design. While previous studies have examined whether models can reflect individual opinions or attitudes, we argue that a higher-order binding of virtual personas requires successfully approximating not only the opinion…

    Submitted 15 April, 2025; originally announced April 2025.

  8. arXiv:2504.03746  [pdf, other]

    cs.LG cs.AI

    Enhancing Biologically Inspired Hierarchical Temporal Memory with Hardware-Accelerated Reflex Memory

    Authors: Pavia Bera, Sabrina Hassan Moon, Jennifer Adorno, Dayane Alfenas Reis, Sanjukta Bhanja

    Abstract: The rapid expansion of the Internet of Things (IoT) generates zettabytes of data that demand efficient unsupervised learning systems. Hierarchical Temporal Memory (HTM), a third-generation unsupervised AI algorithm, models the neocortex of the human brain by simulating columns of neurons to process and predict sequences. These neuron columns can memorize and infer sequences across multiple orders.…

    Submitted 1 April, 2025; originally announced April 2025.

  9. arXiv:2504.02812  [pdf, other]

    cs.CV

    BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation

    Authors: Van Nguyen Nguyen, Stephen Tyree, Andrew Guo, Mederic Fourmy, Anas Gouda, Taeyeop Lee, Sungphill Moon, Hyeontae Son, Lukas Ranftl, Jonathan Tremblay, Eric Brachmann, Bertram Drost, Vincent Lepetit, Carsten Rother, Stan Birchfield, Jiri Matas, Yann Labbe, Martin Sundermeyer, Tomas Hodan

    Abstract: We present the evaluation methodology, datasets and results of the BOP Challenge 2024, the 6th in a series of public competitions organized to capture the state of the art in 6D object pose estimation and related tasks. In 2024, our goal was to transition BOP from lab-like setups to real-world scenarios. First, we introduced new model-free tasks, where no 3D object models are available and methods…

    Submitted 23 April, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: text overlap with arXiv:2403.09799

  10. arXiv:2503.23566  [pdf, other]

    cs.CL

    When LLM Therapists Become Salespeople: Evaluating Large Language Models for Ethical Motivational Interviewing

    Authors: Haein Kong, Seonghyeon Moon

    Abstract: Large language models (LLMs) have been actively applied in the mental health field. Recent research shows the promise of LLMs in applying psychotherapy, especially motivational interviewing (MI). However, there is a lack of studies investigating how language models understand MI ethics. Given the risks that malicious actors can use language models to apply MI for unethical purposes, it is importan…

    Submitted 30 March, 2025; originally announced March 2025.

  11. arXiv:2503.22087  [pdf, other]

    cs.CV

    Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction

    Authors: Seokha Moon, Janghyun Baek, Giseop Kim, Jinkyu Kim, Sunwook Choi

    Abstract: 3D occupancy prediction has emerged as a key perception task for autonomous driving, as it reconstructs 3D environments to provide a comprehensive scene understanding. Recent studies focus on integrating spatiotemporal information obtained from past observations to improve prediction accuracy, using a multi-frame fusion approach that processes multiple past frames together. However, these methods…

    Submitted 27 March, 2025; originally announced March 2025.

  12. arXiv:2503.17731  [pdf, other]

    cs.CV

    Co-op: Correspondence-based Novel Object Pose Estimation

    Authors: Sungphill Moon, Hyeontae Son, Dongcheol Hur, Sangwook Kim

    Abstract: We propose Co-op, a novel method for accurately and robustly estimating the 6DoF pose of objects unseen during training from a single RGB image. Our method requires only the CAD model of the target object and can precisely estimate its pose without any additional fine-tuning. While existing model-based methods suffer from inefficiency due to using a large number of templates, our method enables fa…

    Submitted 22 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025

  13. arXiv:2503.10349  [pdf, other]

    cs.RO eess.SP

    Autonomous Robotic Radio Source Localization via a Novel Gaussian Mixture Filtering Approach

    Authors: Sukkeun Kim, Sangwoo Moon, Ivan Petrunin, Hyo-Sang Shin, Shehryar Khattak

    Abstract: This study proposes a new Gaussian Mixture Filter (GMF) to improve the estimation performance for the autonomous robotic radio signal source search and localization problem in unknown environments. The proposed filter is first tested with a benchmark numerical problem to validate the performance with other state-of-practice approaches such as Particle Gaussian Mixture (PGM) filters and Particle Fi…

    Submitted 13 March, 2025; originally announced March 2025.

  14. arXiv:2503.09572  [pdf, other]

    cs.CL

    Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks

    Authors: Lutfi Eren Erdogan, Nicholas Lee, Sehoon Kim, Suhong Moon, Hiroki Furuta, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami

    Abstract: Large language models (LLMs) have shown remarkable advancements in enabling language agents to tackle simple tasks. However, applying them for complex, multi-step, long-horizon tasks remains a challenge. Recent work have found success by separating high-level planning from low-level execution, which enables the model to effectively balance high-level planning objectives and low-level execution det…

    Submitted 22 April, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

  15. arXiv:2503.03139  [pdf, other]

    cs.LG cs.AI cs.DC

    Convergence Analysis of Federated Learning Methods Using Backward Error Analysis

    Authors: Jinwoo Lim, Suhyun Kim, Soo-Mook Moon

    Abstract: Backward error analysis allows finding a modified loss function, which the parameter updates really follow under the influence of an optimization method. The additional loss terms included in this modified function is called implicit regularizer. In this paper, we attempt to find the implicit regularizer for various federated learning algorithms on non-IID data distribution, and explain why each m…

    Submitted 4 March, 2025; originally announced March 2025.

    Journal ref: AAAI 2025

  16. arXiv:2503.00322  [pdf]

    cs.AR cs.AI

    T-REX: A 68-567 μs/token, 0.41-3.95 μJ/token Transformer Accelerator with Reduced External Memory Access and Enhanced Hardware Utilization in 16nm FinFET

    Authors: Seunghyun Moon, Mao Li, Gregory Chen, Phil Knag, Ram Krishnamurthy, Mingoo Seok

    Abstract: This work introduces novel training and post-training compression schemes to reduce external memory access during transformer model inference. Additionally, a new control flow mechanism, called dynamic batching, and a novel buffer architecture, termed a two-direction accessible register file, further reduce external memory access while improving hardware utilization.

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: Accepted to IEEE ISSCC 2025

  17. arXiv:2502.18689  [pdf, ps, other]

    cs.HC

    Emerging Practices in Participatory AI Design in Public Sector Innovation

    Authors: Devansh Saxena, Zoe Kahn, Erina Seh-Young Moon, Lauren M. Chambers, Corey Jackson, Min Kyung Lee, Motahhare Eslami, Shion Guha, Sheena Erete, Lilly Irani, Deirdre Mulligan, John Zimmerman

    Abstract: Local and federal agencies are rapidly adopting AI systems to augment or automate critical decisions, efficiently use resources, and improve public service delivery. AI systems are being used to support tasks associated with urban planning, security, surveillance, energy and critical infrastructure, and support decisions that directly affect citizens and their ability to access essential services.…

    Submitted 25 February, 2025; originally announced February 2025.

    Comments: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '25), April 26-May 1, 2025, Yokohama, Japan

  18. arXiv:2502.17771  [pdf, other]

    cs.LG cs.AI cs.CV

    Sample Selection via Contrastive Fragmentation for Noisy Label Regression

    Authors: Chris Dongjoo Kim, Sangwoo Moon, Jihwan Moon, Dongyeon Woo, Gunhee Kim

    Abstract: As with many other problems, real-world regression is plagued by the presence of noisy labels, an inevitable issue that demands our attention. Fortunately, much real-world data often exhibits an intrinsic property of continuously ordered correlations between labels and features, where data points with similar labels are also represented with closely related features. In response, we propose a nove…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: NeurIPS 2024

  19. arXiv:2502.16761  [pdf, other]

    cs.CL

    Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions

    Authors: Joseph Suh, Erfan Jahanparast, Suhong Moon, Minwoo Kang, Serina Chang

    Abstract: Large language models (LLMs) present novel opportunities in public opinion research by predicting survey responses in advance during the early stages of survey design. Prior methods steer LLMs via descriptions of subpopulations as LLMs' input prompt, yet such prompt engineering approaches have struggled to faithfully predict the distribution of survey responses from human subjects. In this work, w…

    Submitted 23 February, 2025; originally announced February 2025.

  20. arXiv:2502.13575  [pdf, other]

    cs.LG

    ETS: Efficient Tree Search for Inference-Time Scaling

    Authors: Coleman Hooper, Sehoon Kim, Suhong Moon, Kerem Dilmen, Monishwaran Maheswaran, Nicholas Lee, Michael W. Mahoney, Sophia Shao, Kurt Keutzer, Amir Gholami

    Abstract: Test-time compute scaling has emerged as a new axis along which to improve model accuracy, where additional computation is used at inference time to allow the model to think longer for more challenging problems. One promising approach for test-time compute scaling is search against a process reward model, where a model generates multiple potential candidates at each step of the search, and these p…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: 11 pages

  21. The Datafication of Care in Public Homelessness Services

    Authors: Erina Seh-Young Moon, Devansh Saxena, Dipto Das, Shion Guha

    Abstract: Homelessness systems in North America adopt coordinated data-driven approaches to efficiently match support services to clients based on their assessed needs and available resources. AI tools are increasingly being implemented to allocate resources, reduce costs and predict risks in this space. In this study, we conducted an ethnographic case study on the City of Toronto's homelessness system's da…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: CHI Conference on Human Factors in Computing Systems (CHI '25), April 26-May 1, 2025, Yokohama, Japan. ACM, New York, NY, USA, 16 pages

  22. DEMOTIC: A Differentiable Sampler for Multi-Level Digital Circuits

    Authors: Arash Ardakani, Minwoo Kang, Kevin He, Qijing Huang, Vighnesh Iyer, Suhong Moon, John Wawrzynek

    Abstract: Efficient sampling of satisfying formulas for circuit satisfiability (CircuitSAT), a well-known NP-complete problem, is essential in modern front-end applications for thorough testing and verification of digital circuits. Generating such samples is a hard computational problem due to the inherent complexity of digital circuits, size of the search space, and resource constraints involved in the pro…

    Submitted 11 February, 2025; originally announced February 2025.

    Comments: 7 pages

  23. arXiv:2502.00315  [pdf, other]

    cs.CV

    MonoDINO-DETR: Depth-Enhanced Monocular 3D Object Detection Using a Vision Foundation Model

    Authors: Jihyeok Kim, Seongwoo Moon, Sungwon Nah, David Hyunchul Shim

    Abstract: This paper proposes novel methods to enhance the performance of monocular 3D object detection models by leveraging the generalized feature extraction capabilities of a vision foundation model. Unlike traditional CNN-based approaches, which often suffer from inaccurate depth estimation and rely on multi-stage object detection pipelines, this study employs a Vision Transformer (ViT)-based foundation…

    Submitted 31 January, 2025; originally announced February 2025.

    Comments: 8 pages, 8 figures

  24. arXiv:2501.12332  [pdf, other]

    cs.CL cs.AI cs.LG

    Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration

    Authors: Thomas Walshe, Sae Young Moon, Chunyang Xiao, Yawwani Gunawardana, Fran Silavong

    Abstract: Acquiring labelled training data remains a costly task in real world machine learning projects to meet quantity and quality requirements. Recently Large Language Models (LLMs), notably GPT-4, have shown great promises in labelling data with high accuracy. However, privacy and cost concerns prevent the ubiquitous use of GPT-4. In this work, we explore effectively leveraging open-source models for a…

    Submitted 21 January, 2025; originally announced January 2025.

    Comments: 11 pages, 1 figure

  25. arXiv:2501.10869  [pdf, other]

    cs.LG cs.RO

    Diffusion-Based Imitation Learning for Social Pose Generation

    Authors: Antonio Lech Martin-Ozimek, Isuru Jayarathne, Su Larb Mon, Jouh Yeong Chew

    Abstract: Intelligent agents, such as robots and virtual agents, must understand the dynamics of complex social interactions to interact with humans. Effectively representing social dynamics is challenging because we require multi-modal, synchronized observations to understand a scene. We explore how using a single modality, the pose behavior, of multiple individuals in a social interaction can be used to g…

    Submitted 18 January, 2025; originally announced January 2025.

    Comments: This paper was submitted as an LBR to HRI2025

  26. arXiv:2501.10857  [pdf, other]

    cs.RO cs.LG

    Learning Nonverbal Cues in Multiparty Social Interactions for Robotic Facilitators

    Authors: Antonio Lech Martin-Ozimek, Isuru Jayarathne, Su Larb Mon, Jouhyeong Chew

    Abstract: Conventional behavior cloning (BC) models often struggle to replicate the subtleties of human actions. Previous studies have attempted to address this issue through the development of a new BC technique: Implicit Behavior Cloning (IBC). This new technique consistently outperformed the conventional Mean Squared Error (MSE) BC models in a variety of tasks. Our goal is to replicate the performance of…

    Submitted 18 January, 2025; originally announced January 2025.

    Comments: Submitted as a short contribution to HRI2025

  27. arXiv:2501.04211  [pdf, other]

    cs.LG cs.AI

    CURing Large Models: Compression via CUR Decomposition

    Authors: Sanghyeon Park, Soo-Mook Moon

    Abstract: Large deep learning models have achieved remarkable success but are resource-intensive, posing challenges such as memory usage. We introduce CURing, a novel model compression method based on CUR matrix decomposition, which approximates weight matrices as the product of selected columns (C) and rows (R), and a small linking matrix (U). We apply this decomposition to weights chosen based on the comb…

    Submitted 10 January, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

  28. arXiv:2411.16034  [pdf, other]

    cs.CV

    VisualLens: Personalization through Visual History

    Authors: Wang Bill Zhu, Deqing Fu, Kai Sun, Yi Lu, Zhaojiang Lin, Seungwhan Moon, Kanika Narang, Mustafa Canim, Yue Liu, Anuj Kumar, Xin Luna Dong

    Abstract: We hypothesize that a user's visual history with images reflecting their daily life, offers valuable insights into their interests and preferences, and can be leveraged for personalization. Among the many challenges to achieve this goal, the foremost is the diversity and noises in the visual history, containing images not necessarily related to a recommendation task, not necessarily reflecting the…

    Submitted 24 November, 2024; originally announced November 2024.

  29. arXiv:2411.11917  [pdf, other]

    cs.CV

    FCC: Fully Connected Correlation for Few-Shot Segmentation

    Authors: Seonghyeon Moon, Haein Kong, Muhammad Haris Khan, Yuewei Lin

    Abstract: Few-shot segmentation (FSS) aims to segment the target object in a query image using only a small set of support images and masks. Therefore, having strong prior information for the target object using the support set is essential for guiding the initial training of FSS, which leads to the success of few-shot segmentation in challenging cases, such as when the target object shows considerable vari…

    Submitted 17 November, 2024; originally announced November 2024.

  30. arXiv:2411.03707  [pdf]

    cs.CV cs.AI

    Fine-Tuning Vision-Language Model for Automated Engineering Drawing Information Extraction

    Authors: Muhammad Tayyab Khan, Lequn Chen, Ye Han Ng, Wenhe Feng, Nicholas Yew Jin Tan, Seung Ki Moon

    Abstract: Geometric Dimensioning and Tolerancing (GD&T) plays a critical role in manufacturing by defining acceptable variations in part features to ensure component quality and functionality. However, extracting GD&T information from 2D engineering drawings is a time-consuming and labor-intensive task, often relying on manual efforts or semi-automated tools. To address these challenges, this study proposes…

    Submitted 6 November, 2024; originally announced November 2024.

    Comments: Paper has been submitted to the 9th International Conference on Innovation in Artificial Intelligence (ICIAI 2025)

  31. arXiv:2411.02824  [pdf, other]

    cs.LG eess.SY

    Layer-Adaptive State Pruning for Deep State Space Models

    Authors: Minseon Gwak, Seongrok Moon, Joohwan Ko, PooGyeon Park

    Abstract: Due to the lack of state dimension optimization methods, deep state space models (SSMs) have sacrificed model capacity, training search space, or stability to alleviate computational costs caused by high state dimensions. In this work, we provide a structured pruning method for SSMs, Layer-Adaptive STate pruning (LAST), which reduces the state dimension of each layer in minimizing model-level outp…

    Submitted 31 January, 2025; v1 submitted 5 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024, Added missing arXiv information for one reference

  32. arXiv:2411.02810  [pdf]

    cs.CE cs.IR

    Leveraging Vision-Language Models for Manufacturing Feature Recognition in CAD Designs

    Authors: Muhammad Tayyab Khan, Lequn Chen, Ye Han Ng, Wenhe Feng, Nicholas Yew Jin Tan, Seung Ki Moon

    Abstract: Automatic feature recognition (AFR) is essential for transforming design knowledge into actionable manufacturing information. Traditional AFR methods, which rely on predefined geometric rules and large datasets, are often time-consuming and lack generalizability across various manufacturing features. To address these challenges, this study investigates vision-language models (VLMs) for automating…

    Submitted 4 November, 2024; originally announced November 2024.

    Comments: Paper has been submitted to The ASME Journal of Computing and Information Science in Engineering (JCISE)

  33. arXiv:2410.22801  [pdf, other]

    physics.chem-ph cs.LG

    Machine Learning Nonadiabatic Dynamics: Eliminating Phase Freedom of Nonadiabatic Couplings with the State-Interaction State-Averaged Spin-Restricted Ensemble-Referenced Kohn-Sham Approach

    Authors: Sung Wook Moon, Soohaeng Yoo Willow, Tae Hyeon Park, Seung Kyu Min, Chang Woo Myung

    Abstract: Excited-state molecular dynamics (ESMD) simulations near conical intersections (CIs) pose significant challenges when using machine learning potentials (MLPs). Although MLPs have gained recognition for their integration into mixed quantum-classical (MQC) methods, such as trajectory surface hopping (TSH), and their capacity to model correlated electron-nuclear dynamics efficiently, difficulties per…

    Submitted 16 January, 2025; v1 submitted 30 October, 2024; originally announced October 2024.

  34. arXiv:2410.10577  [pdf, other]

    cs.RO

    Words to Wheels: Vision-Based Autonomous Driving Understanding Human Language Instructions Using Foundation Models

    Authors: Chanhoe Ryu, Hyunki Seong, Daegyu Lee, Seongwoo Moon, Sungjae Min, D. Hyunchul Shim

    Abstract: This paper introduces an innovative application of foundation models, enabling Unmanned Ground Vehicles (UGVs) equipped with an RGB-D camera to navigate to designated destinations based on human language instructions. Unlike learning-based methods, this approach does not require prior training but instead leverages existing foundation models, thus facilitating generalization to novel environments.…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: 7 pages, 7 figures

  35. arXiv:2410.06472  [pdf, other]

    cs.RO cs.AI cs.HC

    Enabling Novel Mission Operations and Interactions with ROSA: The Robot Operating System Agent

    Authors: Rob Royce, Marcel Kaufmann, Jonathan Becktor, Sangwoo Moon, Kalind Carpenter, Kai Pak, Amanda Towler, Rohan Thakker, Shehryar Khattak

    Abstract: The advancement of robotic systems has revolutionized numerous industries, yet their operation often demands specialized technical knowledge, limiting accessibility for non-expert users. This paper introduces ROSA (Robot Operating System Agent), an AI-powered agent that bridges the gap between the Robot Operating System (ROS) and natural language interfaces. By leveraging state-of-the-art language…

    Submitted 12 February, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

    Comments: Preprint. Accepted at IEEE Aerospace Conference 2025, 16 pages, 12 figures

  36. arXiv:2410.05664  [pdf, other]

    cs.CV cs.LG

    Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning

    Authors: Saemi Moon, Minjong Lee, Sangdon Park, Dongwoo Kim

    Abstract: As text-to-image diffusion models gain widespread commercial applications, there are increasing concerns about unethical or harmful use, including the unauthorized generation of copyrighted or sensitive content. Concept unlearning has emerged as a promising solution to these challenges by removing undesired and harmful information from the pre-trained model. However, the previous evaluations prima…

    Submitted 9 March, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

  37. arXiv:2410.02992  [pdf, other]

    cs.AI cs.CL

    Guided Stream of Search: Learning to Better Search with Language Models via Optimal Path Guidance

    Authors: Seungyong Moon, Bumsoo Park, Hyun Oh Song

    Abstract: While language models have demonstrated impressive capabilities across a range of tasks, they still struggle with tasks that require complex planning and reasoning. Recent studies have proposed training language models on search processes rather than optimal solutions, resulting in better generalization performance even though search processes are noisy and even suboptimal. However, these studies…

    Submitted 3 October, 2024; originally announced October 2024.

  38. arXiv:2410.01500  [pdf, other]

    cs.LG cs.AI

    Discrete Diffusion Schrödinger Bridge Matching for Graph Transformation

    Authors: Jun Hyeong Kim, Seonghwan Kim, Seokhyun Moon, Hyeongwoo Kim, Jeheon Woo, Woo Youn Kim

    Abstract: Transporting between arbitrary distributions is a fundamental goal in generative modeling. Recently proposed diffusion bridge models provide a potential solution, but they rely on a joint distribution that is difficult to obtain in practice. Furthermore, formulations based on continuous domains limit their applicability to discrete domains such as graphs. To overcome these limitations, we propose…

    Submitted 28 February, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: Accepted to ICLR 2025

  39. arXiv:2410.01371  [pdf]

    cs.CE physics.app-ph

    A method to estimate well flowing gas-oil ratio and composition using pressure and temperature measurements across a production choke, a seed composition of oil and gas, and a thermodynamic simulator

    Authors: Seok Ki Moon, Milan Stanko

    Abstract: In this work we propose and demonstrate a method to estimate the flowing gas-oil ratio and composition of a hydrocarbon well stream using measurements of pressure and temperature across a production choke. The method consists of using a numerical solver on a thermodynamic simulator to recombine a seed oil and gas until the simulated temperature drop across the choke is equal to the measured value.…

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: 21 pages, 11 figures

  40. arXiv:2409.19715  [pdf, other]

    cs.CL

    Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code

    Authors: Hyungjoo Chae, Taeyoon Kwon, Seungjun Moon, Yongho Song, Dongjin Kang, Kai Tzu-iunn Ong, Beong-woo Kwak, Seonghyeon Bae, Seung-won Hwang, Jinyoung Yeo

    Abstract: This paper presents Coffee-Gym, a comprehensive RL environment for training models that provide feedback on code editing. Coffee-Gym includes two major components: (1) Coffee, a dataset containing humans' code edit traces for coding questions and machine-written feedback for editing erroneous code; (2) CoffeeEval, a reward function that faithfully reflects the helpfulness of feedback by assessing…

    Submitted 4 October, 2024; v1 submitted 29 September, 2024; originally announced September 2024.

    Comments: EMNLP2024

  41. arXiv:2409.14985  [pdf, other]

    cs.CV cs.AI

    Sparse-to-Dense LiDAR Point Generation by LiDAR-Camera Fusion for 3D Object Detection

    Authors: Minseung Lee, Seokha Moon, Seung Joon Lee, Jinkyu Kim

    Abstract: Accurately detecting objects at long distances remains a critical challenge in 3D object detection when relying solely on LiDAR sensors due to the inherent limitations of data sparsity. To address this issue, we propose the LiDAR-Camera Augmentation Network (LCANet), a novel framework that reconstructs LiDAR point cloud data by fusing 2D image features, which contain rich semantic information, gen…

    Submitted 24 September, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

    Comments: 7 pages

  42. arXiv:2409.10587  [pdf, other]

    cs.CV

    SoccerNet 2024 Challenges Results

    Authors: Anthony Cioppa, Silvio Giancola, Vladimir Somers, Victor Joos, Floriane Magera, Jan Held, Seyed Abolfazl Ghasemzadeh, Xin Zhou, Karolina Seweryn, Mateusz Kowalczyk, Zuzanna Mróz, Szymon Łukasik, Michał Hałoń, Hassan Mkhallati, Adrien Deliège, Carlos Hinojosa, Karen Sanchez, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Adam Gorski, et al. (59 additional authors not shown)

    Abstract: The SoccerNet 2024 challenges represent the fourth annual video understanding challenges organized by the SoccerNet team. These challenges aim to advance research across multiple themes in football, including broadcast video understanding, field understanding, and player understanding. This year, the challenges encompass four vision-based tasks. (1) Ball Action Spotting, focusing on precisely loca…

    Submitted 16 September, 2024; originally announced September 2024.

    Comments: 7 pages, 1 figure

  43. arXiv:2409.09905  [pdf, other]

    cs.CL

    Rediscovering the Latent Dimensions of Personality with Large Language Models as Trait Descriptors

    Authors: Joseph Suh, Suhong Moon, Minwoo Kang, David M. Chan

    Abstract: Assessing personality traits using large language models (LLMs) has emerged as an interesting and challenging area of research. While previous methods employ explicit questionnaires, often derived from the Big Five model of personality, we hypothesize that LLMs implicitly encode notions of personality when modeling next-token responses. To demonstrate this, we introduce a novel approach that uncov…

    Submitted 15 September, 2024; originally announced September 2024.

  44. arXiv:2409.06126  [pdf, other]

    eess.AS cs.SD

    VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion

    Authors: Kyungguen Byun, Jason Filos, Erik Visser, Sunkuk Moon

    Abstract: Noise suppression (NS) algorithms are effective in improving speech quality in many cases. However, aggressive noise suppression can damage the target speech, reducing both speech intelligibility and quality despite removing the noise. This study proposes an explicit speech restoration method using a voice conversion (VC) technique for restoration after noise suppression. We observed that high-qua…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: 5 pages, 3 figures, submitted to ICASSP 2025

  45. arXiv:2409.06107  [pdf, other]

    cs.CL cs.AI

    Doppelgänger's Watch: A Split Objective Approach to Large Language Models

    Authors: Shervin Ghasemlou, Ashish Katiyar, Aparajita Saraf, Seungwhan Moon, Mangesh Pujari, Pinar Donmez, Babak Damavandi, Anuj Kumar

    Abstract: In this paper, we investigate the problem of "generation supervision" in large language models, and present a novel bicameral architecture to separate supervision signals from their core capability, helpfulness. Doppelgänger, a new module parallel to the underlying language model, supervises the generation of each token, and learns to concurrently predict the supervision score(s) of the sequences…

    Submitted 9 September, 2024; originally announced September 2024.

  46. arXiv:2409.02141  [pdf, other]

    cs.LG cs.AI cs.CL

    Efficient and Scalable Estimation of Tool Representations in Vector Space

    Authors: Suhong Moon, Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Woosang Lim, Kurt Keutzer, Amir Gholami

    Abstract: Recent advancements in function calling and tool use have significantly enhanced the capabilities of large language models (LLMs) by enabling them to interact with external information sources and execute complex tasks. However, the limited context window of LLMs presents challenges when a large number of tools are available, necessitating efficient methods to manage prompt length and maintain acc…

    Submitted 2 September, 2024; originally announced September 2024.

  47. arXiv:2409.00608  [pdf, other]

    cs.CL cs.LG

    TinyAgent: Function Calling at the Edge

    Authors: Lutfi Eren Erdogan, Nicholas Lee, Siddharth Jha, Sehoon Kim, Ryan Tabrizi, Suhong Moon, Coleman Hooper, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami

    Abstract: Recent large language models (LLMs) have enabled the development of advanced agentic systems that can integrate various tools and APIs to fulfill user queries through function calling. However, the deployment of these LLMs on the edge has not been explored since they typically require cloud-based infrastructure due to their substantial model size and computational demands. To this end, we present…

    Submitted 24 October, 2024; v1 submitted 1 September, 2024; originally announced September 2024.

    Comments: EMNLP 2024 Demo

  48. arXiv:2408.09358  [pdf, other]

    cs.CV cs.AI

    Panorama Tomosynthesis from Head CBCT with Simulated Projection Geometry

    Authors: Anusree P. S., Bikram Keshari Parida, Seong Yong Moon, Wonsang You

    Abstract: Cone Beam Computed Tomography (CBCT) and Panoramic X-rays are the most commonly used imaging modalities in dental health care. CBCT can produce three-dimensional views of a patient's head, providing clinicians with better diagnostic capability, whereas Panoramic X-ray can capture the entire maxillofacial region in a single image. If the CBCT is already available, it can be beneficial to synthesize…

    Submitted 20 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: 12 pages, 6 figures, 1 table, Journal submission planned

  49. arXiv:2408.07576  [pdf, other]

    cs.CV cs.AI

    MetaSeg: MetaFormer-based Global Contexts-aware Network for Efficient Semantic Segmentation

    Authors: Beoungwoo Kang, Seunghun Moon, Yubin Cho, Hyunwoo Yu, Suk-Ju Kang

    Abstract: Beyond the Transformer, it is important to explore how to exploit the capacity of the MetaFormer, an architecture that is fundamental to the performance improvements of the Transformer. Previous studies have exploited it only for the backbone network. Unlike previous studies, we explore the capacity of the MetaFormer architecture more extensively in the semantic segmentation task. We propose a pow…

    Submitted 14 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted by WACV 2024

  50. arXiv:2408.07326  [pdf, other]

    cs.AR

    LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference

    Authors: Seungjae Moon, Jung-Hoon Kim, Junsoo Kim, Seongmin Hong, Junseo Cha, Minsu Kim, Sukbin Lim, Gyubin Choi, Dongjin Seo, Jongho Kim, Hunjong Lee, Hyunjun Park, Ryeowook Ko, Soongyu Choi, Jongse Park, Jinwon Lee, Joo-Young Kim

    Abstract: The explosive arrival of OpenAI's ChatGPT has fueled the globalization of large language model (LLM), which consists of billions of pretrained parameters that embodies the aspects of syntax and semantics. HyperAccel introduces latency processing unit (LPU), a latency-optimized and highly scalable processor architecture for the acceleration of LLM inference. LPU perfectly balances the memory bandwi…

    Submitted 14 August, 2024; originally announced August 2024.