Showing 1–50 of 269 results for author: Kang, M

Searching in archive cs.
  1. arXiv:2507.06560  [pdf, ps, other]

    cs.CV cs.LG

    Divergence-Based Similarity Function for Multi-View Contrastive Learning

    Authors: Jae Hyoung Jeon, Cheolsu Lim, Myungjoo Kang

    Abstract: Recent success in contrastive learning has sparked growing interest in more effectively leveraging multiple augmented views of an instance. While prior methods incorporate multiple views at the loss or feature level, they primarily capture pairwise relationships and fail to model the joint structure across all views. In this work, we propose a divergence-based similarity function (DSF) that explic…

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: 9 pages, 5 figures

    MSC Class: 68T07; 62H12 ACM Class: I.2.6; I.4.8; I.5.1

  2. arXiv:2507.04748  [pdf, ps, other]

    cs.AI

    LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction

    Authors: Sungmin Lee, Minju Kang, Joonhee Lee, Seungyong Lee, Dongju Kim, Jingi Hong, Jun Shin, Pei Zhang, JeongGil Ko

    Abstract: Question-answering (QA) interfaces powered by large language models (LLMs) present a promising direction for improving interactivity with HVAC system insights, particularly for non-expert users. However, enabling accurate, real-time, and context-aware interactions with HVAC systems introduces unique challenges, including the integration of frequently updated sensor data, domain-specific knowledge…

    Submitted 7 July, 2025; originally announced July 2025.

  3. arXiv:2507.01282  [pdf]

    cs.AI cs.HC

    Beyond Black-Box AI: Interpretable Hybrid Systems for Dementia Care

    Authors: Matthew JY Kang, Wenli Yang, Monica R Roberts, Byeong Ho Kang, Charles B Malpas

    Abstract: The recent boom of large language models (LLMs) has re-ignited the hope that artificial intelligence (AI) systems could aid medical diagnosis. Yet despite dazzling benchmark scores, LLM assistants have yet to deliver measurable improvements at the bedside. This scoping review aims to highlight the areas where AI is limited to make practical contributions in the clinical setting, specifically in de…

    Submitted 1 July, 2025; originally announced July 2025.

  4. arXiv:2506.19054  [pdf, ps, other]

    cs.CR

    GuardSet-X: Massive Multi-Domain Safety Policy-Grounded Guardrail Dataset

    Authors: Mintong Kang, Zhaorun Chen, Chejian Xu, Jiawei Zhang, Chengquan Guo, Minzhou Pan, Ivan Revilla, Yu Sun, Bo Li

    Abstract: As LLMs become widespread across diverse applications, concerns about the security and safety of LLM interactions have intensified. Numerous guardrail models and benchmarks have been developed to ensure LLM content safety. However, existing guardrail benchmarks are often built upon ad hoc risk taxonomies that lack a principled grounding in standardized safety policies, limiting their alignment wit…

    Submitted 25 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

  5. arXiv:2506.17756  [pdf, ps, other]

    cond-mat.mtrl-sci cs.AI

    Residual Connection-Enhanced ConvLSTM for Lithium Dendrite Growth Prediction

    Authors: Hosung Lee, Byeongoh Hwang, Dasan Kim, Myungjoo Kang

    Abstract: The growth of lithium dendrites significantly impacts the performance and safety of rechargeable batteries, leading to short circuits and capacity degradation. This study proposes a Residual Connection-Enhanced ConvLSTM model to predict dendrite growth patterns with improved accuracy and computational efficiency. By integrating residual connections into ConvLSTM, the model mitigates the vanishing…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: 14 pages, 6 figures, accepted to Journal of The Electrochemical Society

  6. arXiv:2506.12109  [pdf, ps, other]

    cs.CL cs.AI

    Personalized LLM Decoding via Contrasting Personal Preference

    Authors: Hyungjune Bu, Chanjoo Jung, Minjae Kang, Jaehyung Kim

    Abstract: As large language models (LLMs) are progressively deployed in various real-world applications, personalization of LLMs has become increasingly important. While various approaches to LLM personalization such as prompt-based and training-based methods have been actively explored, the development of effective decoding-time algorithms remains largely overlooked, despite their demonstrated potential. I…

    Submitted 13 June, 2025; originally announced June 2025.

  7. arXiv:2506.09406  [pdf, ps, other]

    cs.RO cs.LG

    Scoop-and-Toss: Dynamic Object Collection for Quadrupedal Systems

    Authors: Minji Kang, Chanwoo Baek, Yoonsang Lee

    Abstract: Quadruped robots have made significant advances in locomotion, extending their capabilities from controlled environments to real-world applications. Beyond movement, recent work has explored loco-manipulation using the legs to perform tasks such as pressing buttons or opening doors. While these efforts demonstrate the feasibility of leg-based manipulation, most have focused on relatively static ta…

    Submitted 11 June, 2025; originally announced June 2025.

  8. arXiv:2506.06311  [pdf, ps, other]

    eess.SP cs.LG

    A Novel Shape-Aware Topological Representation for GPR Data with DNN Integration

    Authors: Meiyan Kang, Shizuo Kaji, Sang-Yun Lee, Taegon Kim, Hee-Hwan Ryu, Suyoung Choi

    Abstract: Ground Penetrating Radar (GPR) is a widely used Non-Destructive Testing (NDT) technique for subsurface exploration, particularly in infrastructure inspection and maintenance. However, conventional interpretation methods are often limited by noise sensitivity and a lack of structural awareness. This study presents a novel framework that enhances the detection of underground utilities, especially pi…

    Submitted 26 May, 2025; originally announced June 2025.

    Comments: 15 pages, 6 figures

  9. arXiv:2506.01370  [pdf, ps, other]

    cs.CV

    PointT2I: LLM-based text-to-image generation via keypoints

    Authors: Taekyung Lee, Donggyu Lee, Myungjoo Kang

    Abstract: Text-to-image (T2I) generation model has made significant advancements, resulting in high-quality images aligned with an input prompt. However, despite T2I generation's ability to generate fine-grained images, it still faces challenges in accurately generating images when the input prompt contains complex concepts, especially human pose. In this paper, we propose PointT2I, a framework that effecti…

    Submitted 2 June, 2025; originally announced June 2025.

  10. Hybrid SLC-MLC RRAM Mixed-Signal Processing-in-Memory Architecture for Transformer Acceleration via Gradient Redistribution

    Authors: Chang Eun Song, Priyansh Bhatnagar, Zihan Xia, Nam Sung Kim, Tajana Rosing, Mingu Kang

    Abstract: Transformers, while revolutionary, face challenges due to their demanding computational cost and large data movement. To address this, we propose HyFlexPIM, a novel mixed-signal processing-in-memory (PIM) accelerator for inference that flexibly utilizes both single-level cell (SLC) and multi-level cell (MLC) RRAM technologies to trade-off accuracy and efficiency. HyFlexPIM achieves efficient dual-…

    Submitted 20 May, 2025; originally announced June 2025.

    Comments: Accepted by ISCA'25

  11. arXiv:2505.24336  [pdf, ps, other]

    eess.AS cs.AI cs.LG cs.SD eess.SP

    When Humans Growl and Birds Speak: High-Fidelity Voice Conversion from Human to Animal and Designed Sounds

    Authors: Minsu Kang, Seolhee Lee, Choonghyeon Lee, Namhyun Cho

    Abstract: Human to non-human voice conversion (H2NH-VC) transforms human speech into animal or designed vocalizations. Unlike prior studies focused on dog-sounds and 16 or 22.05kHz audio transformation, this work addresses a broader range of non-speech sounds, including natural sounds (lion-roars, birdsongs) and designed voice (synthetic growls). To accommodate generation of diverse non-speech sounds and 44.…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: INTERSPEECH 2025 accepted

  12. arXiv:2505.23759  [pdf, ps, other]

    cs.CL cs.AI cs.CV cs.LG

    Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint

    Authors: Heekyung Lee, Jiaxin Ge, Tsung-Han Wu, Minwoo Kang, Trevor Darrell, David M. Chan

    Abstract: Rebus puzzles, visual riddles that encode language through imagery, spatial arrangement, and symbolic substitution, pose a unique challenge to current vision-language models (VLMs). Unlike traditional image captioning or question answering tasks, rebus solving requires multi-modal abstraction, symbolic reasoning, and a grasp of cultural, phonetic and linguistic puns. In this paper, we investigate…

    Submitted 29 May, 2025; originally announced May 2025.

  13. arXiv:2505.17612  [pdf, other]

    cs.CL cs.AI

    Distilling LLM Agent into Small Models with Retrieval and Code Tools

    Authors: Minki Kang, Jongwon Jeong, Seanie Lee, Jaewoong Cho, Sung Ju Hwang

    Abstract: Large language models (LLMs) excel at complex reasoning tasks but remain computationally expensive, limiting their practical deployment. To address this, recent works have focused on distilling reasoning capabilities into smaller language models (sLMs) using chain-of-thought (CoT) traces from teacher LLMs. However, this approach struggles in scenarios requiring rare factual knowledge or precise co…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: preprint, v1

  14. arXiv:2505.15277  [pdf, other]

    cs.CL

    Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

    Authors: Hyungjoo Chae, Sunghwan Kim, Junhee Cho, Seungone Kim, Seungjun Moon, Gyeom Hwangbo, Dongha Lim, Minjin Kim, Yeonjun Hwang, Minju Gwak, Dongwook Choi, Minseok Kang, Gwanhoon Im, ByeongUng Cho, Hyojun Kim, Jun Hee Han, Taeyoon Kwon, Minju Kim, Beong-woo Kwak, Dongjin Kang, Jinyoung Yeo

    Abstract: Web navigation is a unique domain that can automate many repetitive real-life tasks and is challenging as it requires long-horizon sequential decision making beyond typical multimodal large language model (MLLM) tasks. Yet, specialized reward models for web navigation that can be utilized during both training and test-time have been absent until now. Despite the importance of speed and cost-effect…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Work in progress

  15. arXiv:2505.03826  [pdf]

    cs.CV cs.AI

    In-situ and Non-contact Etch Depth Prediction in Plasma Etching via Machine Learning (ANN & BNN) and Digital Image Colorimetry

    Authors: Minji Kang, Seongho Kim, Eunseo Go, Donghyeon Paek, Geon Lim, Muyoung Kim, Soyeun Kim, Sung Kyu Jang, Min Sup Choi, Woo Seok Kang, Jaehyun Kim, Jaekwang Kim, Hyeong-U Kim

    Abstract: Precise monitoring of etch depth and the thickness of insulating materials, such as Silicon dioxide and silicon nitride, is critical to ensuring device performance and yield in semiconductor manufacturing. While conventional ex-situ analysis methods are accurate, they are constrained by time delays and contamination risks. To address these limitations, this study proposes a non-contact, in-situ et…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: 20 pages

  16. arXiv:2505.02011  [pdf, ps, other]

    cs.LG cs.AI

    CASA: CNN Autoencoder-based Score Attention for Efficient Multivariate Long-term Time-series Forecasting

    Authors: Minhyuk Lee, HyeKyung Yoon, MyungJoo Kang

    Abstract: Multivariate long-term time series forecasting is critical for applications such as weather prediction, and traffic analysis. In addition, the implementation of Transformer variants has improved prediction accuracy. Following these variants, different input data process approaches also enhanced the field, such as tokenization techniques including point-wise, channel-wise, and patch-wise tokenizati…

    Submitted 4 May, 2025; originally announced May 2025.

  17. arXiv:2504.18283  [pdf, other]

    cs.CV cs.AI cs.MM cs.SD eess.AS

    Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual Separator

    Authors: Minjae Kang, Martim Brandão

    Abstract: Recent audio-visual generative models have made substantial progress in generating images from audio. However, existing approaches focus on generating images from single-class audio and fail to generate images from mixed audio. To address this, we propose an Audio-Visual Generation and Separation model (AV-GAS) for generating images from soundscapes (mixed audio containing multiple classes). Our c…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: Originally submitted to CVPR 2025 on 2024-11-15 with paper ID 15808

  18. arXiv:2504.14919  [pdf, other]

    cs.CV

    GenCLIP: Generalizing CLIP Prompts for Zero-shot Anomaly Detection

    Authors: Donghyeong Kim, Chaewon Park, Suhwan Cho, Hyeonjeong Lim, Minseok Kang, Jungho Lee, Sangyoun Lee

    Abstract: Zero-shot anomaly detection (ZSAD) aims to identify anomalies in unseen categories by leveraging CLIP's zero-shot capabilities to match text prompts with visual features. A key challenge in ZSAD is learning general prompts stably and utilizing them effectively, while maintaining both generalizability and category specificity. Although general prompts have been explored in prior works, achieving th…

    Submitted 21 April, 2025; originally announced April 2025.

  19. arXiv:2504.11673  [pdf, ps, other]

    cs.CL

    Deep Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions

    Authors: Minwoo Kang, Suhong Moon, Seung Hyeong Lee, Ayush Raj, Joseph Suh, David M. Chan

    Abstract: Large language models (LLMs) are increasingly capable of simulating human behavior, offering cost-effective ways to estimate user responses to various surveys and polls. However, the questions in these surveys usually reflect socially understood attitudes: the patterns of attitudes of old/young, liberal/conservative, as understood by both members and non-members of those groups. It is not clear wh…

    Submitted 20 June, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

  20. arXiv:2504.04718  [pdf, other]

    cs.CL cs.AI

    T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models

    Authors: Minki Kang, Jongwon Jeong, Jaewoong Cho

    Abstract: Recent studies have demonstrated that test-time compute scaling effectively improves the performance of small language models (sLMs). However, prior research has mainly examined test-time compute scaling with an additional larger model as a verifier, leaving self-verification by sLMs underexplored. In this work, we investigate whether sLMs can reliably self-verify their outputs under test-time sca…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: Preprint

  21. arXiv:2503.22738  [pdf, other]

    cs.LG cs.CR

    ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning

    Authors: Zhaorun Chen, Mintong Kang, Bo Li

    Abstract: Autonomous agents powered by foundation models have seen widespread adoption across various real-world applications. However, they remain highly vulnerable to malicious instructions and attacks, which can result in severe consequences such as privacy breaches and financial losses. More critically, existing guardrails for LLMs are not applicable due to the complex and dynamic nature of agents. To t…

    Submitted 26 March, 2025; originally announced March 2025.

  22. arXiv:2503.18673  [pdf, other]

    cs.CV cs.AI cs.RO

    Any6D: Model-free 6D Pose Estimation of Novel Objects

    Authors: Taeyeop Lee, Bowen Wen, Minjun Kang, Gyuree Kang, In So Kweon, Kuk-Jin Yoon

    Abstract: We introduce Any6D, a model-free framework for 6D object pose estimation that requires only a single RGB-D anchor image to estimate both the 6D pose and size of unknown objects in novel scenes. Unlike existing methods that rely on textured 3D models or multiple viewpoints, Any6D leverages a joint object alignment process to enhance 2D-3D alignment and metric scale estimation for improved pose accu…

    Submitted 25 March, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: CVPR 2025, Project Page: https://taeyeop.com/any6d

  23. arXiv:2503.14827  [pdf, other]

    cs.CL cs.AI cs.CR

    MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

    Authors: Chejian Xu, Jiawei Zhang, Zhaorun Chen, Chulin Xie, Mintong Kang, Yujin Potter, Zhun Wang, Zhuowen Yuan, Alexander Xiong, Zidi Xiong, Chenhui Zhang, Lingzhi Yuan, Yi Zeng, Peiyang Xu, Chengquan Guo, Andy Zhou, Jeffrey Ziwei Tan, Xuandong Zhao, Francesco Pinto, Zhen Xiang, Yu Gai, Zinan Lin, Dan Hendrycks, Bo Li, Dawn Song

    Abstract: Multimodal foundation models (MMFMs) play a crucial role in various applications, including autonomous driving, healthcare, and virtual assistants. However, several studies have revealed vulnerabilities in these models, such as generating unsafe content by text-to-image models. Existing benchmarks on multimodal models either predominantly assess the helpfulness of these models, or only focus on li…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: ICLR 2025

  24. arXiv:2503.10040  [pdf]

    cond-mat.supr-con cs.AI cs.LG

    Rapid analysis of point-contact Andreev reflection spectra via machine learning with adaptive data augmentation

    Authors: Dongik Lee, Valentin Stanev, Xiaohang Zhang, Mijeong Kang, Ichiro Takeuchi, Seunghun Lee

    Abstract: Delineating the superconducting order parameters is a pivotal task in investigating superconductivity for probing pairing mechanisms, as well as their symmetry and topology. Point-contact Andreev reflection (PCAR) measurement is a simple yet powerful tool for identifying the order parameters. The PCAR spectra exhibit significant variations depending on the type of the order parameter in a supercon…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: 18 pages, 3 figures

  25. arXiv:2503.01872  [pdf, other]

    cs.LG cs.AI cs.CV

    FairGen: Controlling Sensitive Attributes for Fair Generations in Diffusion Models via Adaptive Latent Guidance

    Authors: Mintong Kang, Vinayshekhar Bannihatti Kumar, Shamik Roy, Abhishek Kumar, Sopan Khosla, Balakrishnan Murali Narayanaswamy, Rashmi Gangadharaiah

    Abstract: Text-to-image diffusion models often exhibit biases toward specific demographic groups, such as generating more males than females when prompted to generate images of engineers, raising ethical concerns and limiting their adoption. In this paper, we tackle the challenge of mitigating generation bias towards any target attribute value (e.g., "male" for "gender") in diffusion models while preserving…

    Submitted 25 February, 2025; originally announced March 2025.

    Comments: Under submission

  26. arXiv:2502.17235  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Tidiness Score-Guided Monte Carlo Tree Search for Visual Tabletop Rearrangement

    Authors: Hogun Kee, Wooseok Oh, Minjae Kang, Hyemin Ahn, Songhwai Oh

    Abstract: In this paper, we present the tidiness score-guided Monte Carlo tree search (TSMCTS), a novel framework designed to address the tabletop tidying up problem using only an RGB-D camera. We address two major problems for tabletop tidying up problem: (1) the lack of public datasets and benchmarks, and (2) the difficulty of specifying the goal configuration of unseen objects. We address the former by p…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: 9 pages, 8 figures

  27. arXiv:2502.16761  [pdf, other]

    cs.CL

    Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions

    Authors: Joseph Suh, Erfan Jahanparast, Suhong Moon, Minwoo Kang, Serina Chang

    Abstract: Large language models (LLMs) present novel opportunities in public opinion research by predicting survey responses in advance during the early stages of survey design. Prior methods steer LLMs via descriptions of subpopulations as LLMs' input prompt, yet such prompt engineering approaches have struggled to faithfully predict the distribution of survey responses from human subjects. In this work, w…

    Submitted 23 February, 2025; originally announced February 2025.

  28. arXiv:2502.12464  [pdf, other]

    cs.CL

    SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models

    Authors: Seanie Lee, Dong Bok Lee, Dominik Wagner, Minki Kang, Haebin Seong, Tobias Bocklet, Juho Lee, Sung Ju Hwang

    Abstract: Deploying large language models (LLMs) in real-world applications requires robust safety guard models to detect and block harmful user prompts. While large safety guard models achieve strong performance, their computational cost is substantial. To mitigate this, smaller distilled models are used, but they often underperform on "hard" examples where the larger model provides accurate predictions. W…

    Submitted 21 May, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: ACL 2025 findings

  29. arXiv:2502.08673  [pdf, ps, other]

    cs.AI cs.LG

    High-Throughput SAT Sampling

    Authors: Arash Ardakani, Minwoo Kang, Kevin He, Qijing Huang, John Wawrzynek

    Abstract: In this work, we present a novel technique for GPU-accelerated Boolean satisfiability (SAT) sampling. Unlike conventional sampling algorithms that directly operate on conjunctive normal form (CNF), our method transforms the logical constraints of SAT problems by factoring their CNF representations into simplified multi-level, multi-output Boolean functions. It then leverages gradient-based optimiz…

    Submitted 11 February, 2025; originally announced February 2025.

    Comments: 7 pages

  30. DEMOTIC: A Differentiable Sampler for Multi-Level Digital Circuits

    Authors: Arash Ardakani, Minwoo Kang, Kevin He, Qijing Huang, Vighnesh Iyer, Suhong Moon, John Wawrzynek

    Abstract: Efficient sampling of satisfying formulas for circuit satisfiability (CircuitSAT), a well-known NP-complete problem, is essential in modern front-end applications for thorough testing and verification of digital circuits. Generating such samples is a hard computational problem due to the inherent complexity of digital circuits, size of the search space, and resource constraints involved in the pro…

    Submitted 11 February, 2025; originally announced February 2025.

    Comments: 7 pages

  31. arXiv:2502.06047  [pdf, other]

    cs.LG

    Neural Shortest Path for Surface Reconstruction from Point Clouds

    Authors: Yesom Park, Imseong Park, Jooyoung Hahn, Myungjoo Kang

    Abstract: In this paper, we propose the neural shortest path (NSP), a vector-valued implicit neural representation (INR) that approximates a distance function and its gradient. The key feature of NSP is to learn the exact shortest path (ESP), which directs an arbitrary point to its nearest point on the target surface. The NSP is decomposed into its magnitude and direction, and a variable splitting method is…

    Submitted 9 February, 2025; originally announced February 2025.

  32. arXiv:2502.04029  [pdf, other]

    cs.HC

    Echo-Teddy: Preliminary Design and Development of Large Language Model-based Social Robot for Autistic Students

    Authors: Unggi Lee, Hansung Kim, Juhong Eom, Hyeonseo Jeong, Seungyeon Lee, Gyuri Byun, Yunseo Lee, Minji Kang, Gospel Kim, Jihoi Na, Jewoong Moon, Hyeoncheol Kim

    Abstract: Autistic students often face challenges in social interaction, which can hinder their educational and personal development. This study introduces Echo-Teddy, a Large Language Model (LLM)-based social robot designed to support autistic students in developing social and communication skills. Unlike previous chatbot-based solutions, Echo-Teddy leverages advanced LLM capabilities to provide more natur…

    Submitted 6 February, 2025; originally announced February 2025.

  33. arXiv:2501.17657  [pdf, other]

    math.CO cs.DM

    Belief Propagation Guided Decimation on Random k-XORSAT

    Authors: Arnab Chatterjee, Amin Coja-Oghlan, Mihyun Kang, Lena Krieg, Maurice Rolvien, Gregory B. Sorkin

    Abstract: We analyse the performance of Belief Propagation Guided Decimation, a physics-inspired message passing algorithm, on the random $k$-XORSAT problem. Specifically, we derive an explicit threshold up to which the algorithm succeeds with a strictly positive probability $\Omega(1)$ that we compute explicitly, but beyond which the algorithm with high probability fails to find a satisfying assignment. In addi…

    Submitted 29 January, 2025; originally announced January 2025.

    MSC Class: 60B20; 68W20

  34. arXiv:2501.02409  [pdf, other]

    cs.LG cs.AI cs.CE q-bio.MN stat.ME

    Interpretable Neural ODEs for Gene Regulatory Network Discovery under Perturbations

    Authors: Zaikang Lin, Sei Chang, Aaron Zweig, Minseo Kang, Elham Azizi, David A. Knowles

    Abstract: Modern high-throughput biological datasets with thousands of perturbations provide the opportunity for large-scale discovery of causal graphs that represent the regulatory interactions between genes. Differentiable causal graphical models have been proposed to infer a gene regulatory network (GRN) from large scale interventional datasets, capturing the causal gene regulatory relationships from gen…

    Submitted 1 February, 2025; v1 submitted 4 January, 2025; originally announced January 2025.

  35. arXiv:2501.01094  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    MMVA: Multimodal Matching Based on Valence and Arousal across Images, Music, and Musical Captions

    Authors: Suhwan Choi, Kyu Won Kim, Myungjoo Kang

    Abstract: We introduce Multimodal Matching based on Valence and Arousal (MMVA), a tri-modal encoder framework designed to capture emotional content across images, music, and musical captions. To support this framework, we expand the Image-Music-Emotion-Matching-Net (IMEMNet) dataset, creating IMEMNet-C which includes 24,756 images and 25,944 music clips with corresponding musical captions. We employ multimo…

    Submitted 2 January, 2025; originally announced January 2025.

    Comments: Paper accepted in Artificial Intelligence for Music workshop at AAAI 2025

  36. arXiv:2412.20155  [pdf, other]

    cs.SD cs.AI eess.AS

    Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting

    Authors: Wooseok Han, Minki Kang, Changhun Kim, Eunho Yang

    Abstract: Speaker-adaptive Text-to-Speech (TTS) synthesis has attracted considerable attention due to its broad range of applications, such as personalized voice assistant services. While several approaches have been proposed, they often exhibit high sensitivity to either the quantity or the quality of target speech samples. To address these limitations, we introduce Stable-TTS, a novel speaker-adaptive TTS…

    Submitted 28 December, 2024; originally announced December 2024.

    Comments: Accepted by ICASSP 2025

  37. arXiv:2412.15798  [pdf, other]

    cs.CV

    Diffusion-Based Conditional Image Editing through Optimized Inference with Guidance

    Authors: Hyunsoo Lee, Minsoo Kang, Bohyung Han

    Abstract: We present a simple but effective training-free approach for text-driven image-to-image translation based on a pretrained text-to-image diffusion model. Our goal is to generate an image that aligns with the target task while preserving the structure and background of a source image. To this end, we derive the representation guidance with a combination of two objectives: maximizing the similarity t…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: WACV 2025

  38. arXiv:2412.08608  [pdf, other]

    cs.SD cs.AI cs.CR eess.AS

    AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models

    Authors: Mintong Kang, Chejian Xu, Bo Li

    Abstract: Recent advancements in large audio-language models (LALMs) have enabled speech-based user interactions, significantly enhancing user experience and accelerating the deployment of LALMs in real-world applications. However, ensuring the safety of LALMs is crucial to prevent risky outputs that may raise societal concerns or violate AI regulations. Despite the importance of this issue, research on jai…

    Submitted 11 December, 2024; originally announced December 2024.

  39. arXiv:2411.19527  [pdf, other]

    cs.CV cs.AI cs.LG

    DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding

    Authors: Jungbin Cho, Junwan Kim, Jisoo Kim, Minseo Kim, Mingu Kang, Sungeun Hong, Tae-Hyun Oh, Youngjae Yu

    Abstract: Human motion is inherently continuous and dynamic, posing significant challenges for generative models. While discrete generation methods are widely used, they suffer from limited expressiveness and frame-wise noise artifacts. In contrast, continuous approaches produce smoother, more natural motion but often struggle to adhere to conditioning signals due to high-dimensional complexity and limited…

    Submitted 18 April, 2025; v1 submitted 29 November, 2024; originally announced November 2024.

    Comments: 11 pages

  40. arXiv:2411.10543  [pdf, other]

    cs.LG cs.CL

    SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism

    Authors: Priyansh Bhatnagar, Linfeng Wen, Mingu Kang

    Abstract: Extensive efforts have been made to boost the performance in the domain of language models by introducing various attention-based transformers. However, the inclusion of linear layers with large dimensions contributes to significant computational and memory overheads. The escalating computational demands of these models necessitate the development of various compression techniques to ensure their…

    Submitted 15 November, 2024; originally announced November 2024.

  41. arXiv:2411.09760  [pdf, other]

    cs.AR cs.ET eess.SP

    SpecPCM: A Low-power PCM-based In-Memory Computing Accelerator for Full-stack Mass Spectrometry Analysis

    Authors: Keming Fan, Ashkan Moradifirouzabadi, Xiangjin Wu, Zheyu Li, Flavio Ponzina, Anton Persson, Eric Pop, Tajana Rosing, Mingu Kang

    Abstract: Mass spectrometry (MS) is essential for proteomics and metabolomics but faces impending challenges in efficiently processing the vast volumes of data. This paper introduces SpecPCM, an in-memory computing (IMC) accelerator designed to achieve substantial improvements in energy and delay efficiency for both MS spectral clustering and database (DB) search. SpecPCM employs analog processing with low-…

    Submitted 14 November, 2024; originally announced November 2024.

  42. arXiv:2411.00686  [pdf, other]

    cs.CL cs.AI

    Latent Paraphrasing: Perturbation on Layers Improves Knowledge Injection in Language Models

    Authors: Minki Kang, Sung Ju Hwang, Gibbeum Lee, Jaewoong Cho

    Abstract: As Large Language Models (LLMs) are increasingly deployed in specialized domains with continuously evolving knowledge, the need for timely and precise knowledge injection has become essential. Fine-tuning with paraphrased data is a common approach to enhance knowledge injection, yet it faces two significant challenges: high computational costs due to repetitive external model usage and limited sam…

    Submitted 1 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024

  43. arXiv:2411.00404  [pdf, other]

    cs.LG

    Fast Adaptation with Kernel and Gradient based Meta Leaning

    Authors: JuneYoung Park, MinJae Kang

    Abstract: Model Agnostic Meta Learning or MAML has become the standard for few-shot learning as a meta-learning problem. MAML is simple and can be applied to any model, as its name suggests. However, it often suffers from instability and computational inefficiency during both training and inference times. In this paper, we propose two algorithms to improve both the inner and outer loops of MAML, then pose a…

    Submitted 1 November, 2024; originally announced November 2024.

    Comments: 12 pages (with reference), 2 figures, 4 tables

  44. arXiv:2410.23299  [pdf, other]

    cs.AR cs.AI

    FVEval: Understanding Language Model Capabilities in Formal Verification of Digital Hardware

    Authors: Minwoo Kang, Mingjie Liu, Ghaith Bany Hamad, Syed Suhaib, Haoxing Ren

    Abstract: The remarkable reasoning and code generation capabilities of large language models (LLMs) have spurred significant interest in applying LLMs to enable task automation in digital chip design. In particular, recent work has investigated early ideas of applying these models to formal verification (FV), an approach to verifying hardware implementations that can provide strong guarantees of confidence…

    Submitted 15 October, 2024; originally announced October 2024.

  45. arXiv:2410.21822  [pdf]

    cs.CV eess.IV eess.SP stat.AP

    PK-YOLO: Pretrained Knowledge Guided YOLO for Brain Tumor Detection in Multiplanar MRI Slices

    Authors: Ming Kang, Fung Fung Ting, Raphaël C. -W. Phan, Chee-Ming Ting

    Abstract: Brain tumor detection in multiplane Magnetic Resonance Imaging (MRI) slices is a challenging task due to the various appearances and relationships in the structure of the multiplane images. In this paper, we propose a new You Only Look Once (YOLO)-based detection model that incorporates Pretrained Knowledge (PK), called PK-YOLO, to improve the performance for brain tumor detection in multiplane MR…

    Submitted 21 April, 2025; v1 submitted 29 October, 2024; originally announced October 2024.

    Comments: References updated; for example, papers in NeurIPS 2024 proceedings appeared on 6 Feb 2025 and AAAI 2025 one on 11 Apr 2025

    MSC Class: 68U10 (Primary) 68T10; 68T07; 62P10 (Secondary) ACM Class: I.4.6; I.5.1; J.3

    Journal ref: In WACV (2025) 3732--3741

  46. arXiv:2410.17401  [pdf, ps, other]

    cs.CR cs.CL

    AdvAgent: Controllable Blackbox Red-teaming on Web Agents

    Authors: Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, Bo Li

    Abstract: Foundation model-based agents are increasingly used to automate complex tasks, enhancing efficiency and productivity. However, their access to sensitive resources and autonomous decision-making also introduce significant security risks, where successful attacks could lead to severe consequences. To systematically uncover these vulnerabilities, we propose AdvAgent, a black-box red-teaming framework…

    Submitted 31 May, 2025; v1 submitted 22 October, 2024; originally announced October 2024.

    Comments: ICML 2025

  47. arXiv:2410.11259  [pdf, other]

    cs.CV

    Rethinking the Role of Infrastructure in Collaborative Perception

    Authors: Hyunchul Bae, Minhee Kang, Minwoo Song, Heejin Ahn

    Abstract: Collaborative Perception (CP) is a process in which an ego agent receives and fuses sensor information from surrounding vehicles and infrastructure to enhance its perception capability. To evaluate the need for infrastructure equipped with sensors, extensive and quantitative analysis of the role of infrastructure data in CP is crucial, yet remains underexplored. To address this gap, we first quant…

    Submitted 29 April, 2025; v1 submitted 15 October, 2024; originally announced October 2024.

    Comments: Accepted by ECCV 2024 Workshop MAAS, 14 pages

  48. arXiv:2410.09032  [pdf, other]

    cs.CV cs.LG

    Alberta Wells Dataset: Pinpointing Oil and Gas Wells from Satellite Imagery

    Authors: Pratinav Seth, Michelle Lin, Brefo Dwamena Yaw, Jade Boutot, Mary Kang, David Rolnick

    Abstract: Millions of abandoned oil and gas wells are scattered across the world, leaching methane into the atmosphere and toxic compounds into the groundwater. Many of these locations are unknown, preventing the wells from being plugged and their polluting effects averted. Remote sensing is a relatively unexplored tool for pinpointing abandoned wells at scale. We introduce the first large-scale benchmark d…

    Submitted 25 May, 2025; v1 submitted 11 October, 2024; originally announced October 2024.

    Comments: Accepted to ICML 2025

  49. arXiv:2410.06442  [pdf, other]

    cs.LG cs.AI

    MaD-Scientist: AI-based Scientist solving Convection-Diffusion-Reaction Equations Using Massive PINN-Based Prior Data

    Authors: Mingu Kang, Dongseok Lee, Woojin Cho, Jaehyeon Park, Kookjin Lee, Anthony Gruber, Youngjoon Hong, Noseong Park

    Abstract: Large language models (LLMs), like ChatGPT, have shown that even trained with noisy prior data, they can generalize effectively to new tasks through in-context learning (ICL) and pre-training techniques. Motivated by this, we explore whether a similar approach can be applied to scientific foundation models (SFMs). Our methodology is structured as follows: (i) we collect low-cost physics-informed n…

    Submitted 8 October, 2024; originally announced October 2024.

  50. arXiv:2410.05829  [pdf, other]

    cs.RO

    A GPT-based Decision Transformer for Multi-Vehicle Coordination at Unsignalized Intersections

    Authors: Eunjae Lee, Minhee Kang, Yoojin Choi, Heejin Ahn

    Abstract: In this paper, we explore the application of the Decision Transformer, a decision-making algorithm based on the Generative Pre-trained Transformer (GPT) architecture, to multi-vehicle coordination at unsignalized intersections. We formulate the coordination problem so as to find the optimal trajectories for multiple vehicles at intersections, modeling it as a sequence prediction task to fully leve…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: 7 pages