Search | arXiv e-print repository

Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness

Authors: Yongjin Yang, Euiin Yi, Jongwoo Ko, Kimin Lee, Zhijing Jin, Se-Young Yun

Abstract: The remarkable growth in large language model (LLM) capabilities has spurred exploration into multi-agent systems, with debate frameworks emerging as a promising avenue for enhanced problem-solving. These multi-agent debate (MAD) approaches, where agents collaboratively present, critique, and refine arguments, potentially offer improved reasoning, robustness, and diverse perspectives over monolithic models. Despite prior studies leveraging MAD, a systematic understanding of its effectiveness compared to self-agent methods, particularly under varying conditions, remains elusive. This paper seeks to fill this gap by conceptualizing MAD as a test-time computational scaling technique, distinguished by collaborative refinement and diverse exploration capabilities. We conduct a comprehensive empirical investigation comparing MAD with strong self-agent test-time scaling baselines on mathematical reasoning and safety-related tasks. Our study systematically examines the influence of task difficulty, model scale, and agent diversity on MAD's performance. Key findings reveal that, for mathematical reasoning, MAD offers limited advantages over self-agent scaling but becomes more effective with increased problem difficulty and decreased model capability, while agent diversity shows little benefit. Conversely, for safety tasks, MAD's collaborative refinement can increase vulnerability, but incorporating diverse agent configurations facilitates a gradual reduction in attack success through the collaborative refinement process. We believe our findings provide critical guidance for the future development of more effective and strategically deployed MAD systems.

Submitted 28 May, 2025; originally announced May 2025.

Comments: Preprint, under review

arXiv:2505.20770 [pdf, ps, other]

Can Large Language Models Predict Audio Effects Parameters from Natural Language?

Authors: Seungheon Doh, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Juhan Nam, Yuki Mitsufuji

Abstract: In music production, manipulating audio effects (Fx) parameters through natural language has the potential to reduce technical barriers for non-experts. We present LLM2Fx, a framework leveraging Large Language Models (LLMs) to predict Fx parameters directly from textual descriptions without requiring task-specific training or fine-tuning. Our approach addresses the text-to-effect parameter prediction (Text2Fx) task by mapping natural language descriptions to the corresponding Fx parameters for equalization and reverberation. We demonstrate that LLMs can generate Fx parameters in a zero-shot manner that elucidates the relationship between timbre semantics and audio effects in music production. To enhance performance, we introduce three types of in-context examples: audio Digital Signal Processing (DSP) features, DSP function code, and few-shot examples. Our results demonstrate that LLM-based Fx parameter generation outperforms previous optimization approaches, offering competitive performance in translating natural language descriptions to appropriate Fx settings. Furthermore, LLMs can serve as text-driven interfaces for audio production, paving the way for more intuitive and accessible music production tools.

Submitted 27 May, 2025; originally announced May 2025.

Comments: Submitted to WASPAA 2025

arXiv:2505.19427 [pdf, ps, other]

WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference

Authors: Sihan Chen, Dan Zhao, Jongwoo Ko, Colby Banbury, Huiping Zhuang, Luming Liang, Tianyi Chen

Abstract: The growing computational demands of large language models (LLMs) make efficient inference and activation strategies increasingly critical. While recent approaches, such as Mixture-of-Experts (MoE), leverage selective activation but require specialized training, training-free sparse activation methods offer broader applicability and superior resource efficiency through their plug-and-play design. However, many existing methods rely solely on hidden state magnitudes to determine activation, resulting in high approximation errors and suboptimal inference accuracy. To address these limitations, we propose WINA (Weight Informed Neuron Activation), a novel, simple, and training-free sparse activation framework that jointly considers hidden state magnitudes and the column-wise $\ell_2$-norms of weight matrices. We show that this leads to a sparsification strategy that obtains optimal approximation error bounds with theoretical guarantees tighter than existing techniques. Empirically, WINA also outperforms state-of-the-art methods (e.g., TEAL) by up to $2.94\%$ in average performance at the same sparsity levels, across a diverse set of LLM architectures and datasets. These results position WINA as a new performance frontier for training-free sparse activation in LLM inference, advancing training-free sparse activation methods and setting a robust baseline for efficient inference. The source code is available at https://github.com/microsoft/wina.

Submitted 25 May, 2025; originally announced May 2025.
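
The selection rule described in the WINA abstract above can be illustrated with a minimal sketch. It assumes that the two signals are combined as a product, scoring input `i` by `|x[i]| * ||W[:, i]||_2`, and that the top `k` scores are kept; the abstract only says the two quantities are considered jointly, so the product rule, the function names, and the toy numbers are all assumptions for illustration and are not taken from the released code.

```python
import math

def column_l2_norms(W):
    """Return the l2 norm of each column of W, given as a list of rows."""
    n_cols = len(W[0])
    return [math.sqrt(sum(row[j] ** 2 for row in W)) for j in range(n_cols)]

def top_k_mask(scores, k):
    """Return a 0/1 mask that keeps the k largest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(scores))]

def sparse_matvec(W, x, k, weight_informed=True):
    """Compute W @ x after zeroing all but k entries of x.

    weight_informed=True  -> score_i = |x_i| * ||W[:, i]||_2 (assumed rule)
    weight_informed=False -> score_i = |x_i| (magnitude-only baseline)
    """
    norms = column_l2_norms(W) if weight_informed else [1.0] * len(x)
    scores = [abs(xi) * ni for xi, ni in zip(x, norms)]
    mask = top_k_mask(scores, k)
    x_sparse = [xi * mi for xi, mi in zip(x, mask)]
    return [sum(w * v for w, v in zip(row, x_sparse)) for row in W]

# Toy case: the larger activation feeds a column with tiny weights.
W = [[0.1, 5.0],
     [0.1, 5.0]]
x = [1.0, 0.5]
dense = [sum(w * v for w, v in zip(row, x)) for row in W]
magnitude_only = sparse_matvec(W, x, k=1, weight_informed=False)
weight_aware = sparse_matvec(W, x, k=1, weight_informed=True)
```

In this toy case the magnitude-only rule keeps `x[0]`, whose column contributes almost nothing to the output, while the weight-aware score keeps `x[1]` and stays much closer to the dense product.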

arXiv:2505.19401 [pdf, ps, other]

Stack Less, Repeat More: A Block Reusing Approach for Progressive Speech Enhancement

Authors: Jangyeon Kim, Ui-Hyeop Shin, Jaehyun Ko, Hyung-Min Park

Abstract: This paper presents an efficient speech enhancement (SE) approach that reuses a processing block repeatedly instead of conventional stacking. Rather than increasing the number of blocks for learning deep latent representations, repeating a single block leads to progressive refinement while reducing parameter redundancy. We also minimize domain transformation by keeping an encoder and decoder shallow and reusing a single sequence modeling block. Experimental results show that the number of processing stages is more critical to performance than the number of blocks with different weights. Also, we observed that the proposed method gradually refines a noisy input within a single block. Furthermore, with the block reuse method, we demonstrate that deepening the encoder and decoder can be redundant for learning deep complex representation. Therefore, the experimental results confirm that the proposed block reusing enables progressive learning and provides an efficient alternative for SE.

Submitted 25 May, 2025; originally announced May 2025.

Comments: Accepted to Interspeech 2025

arXiv:2505.18601 [pdf, ps, other]

Flex-Judge: Think Once, Judge Anywhere

Authors: Jongwoo Ko, Sungnyun Kim, Sungwoo Cho, Se-Young Yun

Abstract: Human-generated reward signals are critical for aligning generative models with human preferences, guiding both training and inference-time evaluations. While large language models (LLMs) employed as proxy evaluators, i.e., LLM-as-a-Judge, significantly reduce the costs associated with manual annotations, they typically require extensive modality-specific training data and fail to generalize well across diverse multimodal tasks. In this paper, we propose Flex-Judge, a reasoning-guided multimodal judge model that leverages minimal textual reasoning data to robustly generalize across multiple modalities and evaluation formats. Our core intuition is that structured textual reasoning explanations inherently encode generalizable decision-making patterns, enabling an effective transfer to multimodal judgments, e.g., with images or videos. Empirical results demonstrate that Flex-Judge, despite being trained on significantly less text data, achieves competitive or superior performance compared to state-of-the-art commercial APIs and extensively trained multimodal evaluators. Notably, Flex-Judge presents broad impact in modalities like molecule, where comprehensive evaluation benchmarks are scarce, underscoring its practical value in resource-constrained domains. Our framework highlights reasoning-based text supervision as a powerful, cost-effective alternative to traditional annotation-intensive approaches, substantially advancing scalable multimodal model-as-a-judge.

Submitted 24 May, 2025; originally announced May 2025.

Comments: The code is available at https://github.com/jongwooko/flex-judge

arXiv:2505.15598 [pdf, ps, other]

Limits of $(\infty, 1)$-categories with structure and their lax morphisms

Authors: Joanna Ko

Abstract: Riehl and Verity have established that for a quasi-category $A$ that admits limits, and a homotopy coherent monad on $A$ which does not preserve limits, the Eilenberg-Moore object still admits limits; this can be interpreted as a completeness result involving lax morphisms. We generalise their result to different models for $(\infty, 1)$-categories, with an abundant variety of structures. For instance, $(\infty, 1)$-categories with limits, Cartesian fibrations between $(\infty, 1)$-categories, and adjunctions between $(\infty, 1)$-categories. In addition, we show that these $(\infty, 1)$-categories with structure in fact possess an important class of limits of lax morphisms, including $\infty$-categorical versions of inserters and equifiers, when only one morphism in the diagram is required to be structure-preserving. Our approach provides a minimal requirement and a transparent explanation for several kinds of limits of $(\infty, 1)$-categories and their lax morphisms to exist.

Submitted 21 May, 2025; originally announced May 2025.

Comments: 56 pages

MSC Class: 18N60

arXiv:2505.11315 [pdf, other]

Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior

Authors: Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Wei-Hsiang Liao, Yuki Mitsufuji, György Fazekas

Abstract: Style Transfer with Inference-Time Optimisation (ST-ITO) is a recent approach for transferring the applied effects of a reference audio to a raw audio track. It optimises the effect parameters to minimise the distance between the style embeddings of the processed audio and the reference. However, this method treats all possible configurations equally and relies solely on the embedding space, which can lead to unrealistic or biased results. We address this pitfall by introducing a Gaussian prior derived from a vocal preset dataset, DiffVox, over the parameter space. The resulting optimisation is equivalent to maximum-a-posteriori estimation. Evaluations on vocal effects transfer on the MedleyDB dataset show significant improvements across metrics compared to baselines, including a blind audio effects estimator, nearest-neighbour approaches, and uncalibrated ST-ITO. The proposed calibration reduces parameter mean squared error by up to 33% and matches the reference style better. Subjective evaluations with 16 participants confirm our method's superiority, especially in limited data regimes. This work demonstrates how incorporating prior knowledge in inference time enhances audio effects transfer, paving the way for more effective and realistic audio processing systems.

Submitted 16 May, 2025; originally announced May 2025.

Comments: Submitted to WASPAA 2025
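
The abstract above states that adding a Gaussian prior makes the optimisation equivalent to maximum-a-posteriori estimation. A short derivation shows one way to read that statement. The notation is hypothetical and not taken from the paper: $x$ is the raw track, $y$ the reference, $g(x;\theta)$ the effects chain with parameters $\theta$, $f$ the style encoder, $d$ the embedding distance, $\mathcal{N}(\mu, \Sigma)$ the prior fitted to presets, and $\alpha > 0$ an assumed scale that turns the distance into a log-likelihood.

```latex
% Sketch under assumed notation; \alpha and the exponential likelihood are assumptions.
\begin{align*}
p(y \mid \theta) &\propto \exp\!\bigl(-\alpha\, d\bigl(f(g(x;\theta)),\, f(y)\bigr)\bigr),
\qquad \theta \sim \mathcal{N}(\mu, \Sigma), \\
\hat{\theta}_{\mathrm{MAP}}
&= \arg\max_{\theta}\; \log p(y \mid \theta) + \log p(\theta) \\
&= \arg\min_{\theta}\; \alpha\, d\bigl(f(g(x;\theta)),\, f(y)\bigr)
 + \tfrac{1}{2} (\theta - \mu)^{\top} \Sigma^{-1} (\theta - \mu).
\end{align*}
```

Under these assumptions, dropping the quadratic term (a flat prior) leaves only the embedding distance, which matches the uncalibrated objective the abstract describes.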

arXiv:2505.10871 [pdf, other]

Optimal Allocation of Privacy Budget on Hierarchical Data Release

Authors: Joonhyuk Ko, Juba Ziani, Ferdinando Fioretto

Abstract: Releasing useful information from datasets with hierarchical structures while preserving individual privacy presents a significant challenge. Standard privacy-preserving mechanisms, and in particular Differential Privacy, often require careful allocation of a finite privacy budget across different levels and components of the hierarchy. Sub-optimal allocation can lead to either excessive noise, rendering the data useless, or to insufficient protections for sensitive information. This paper addresses the critical problem of optimal privacy budget allocation for hierarchical data release. It formulates this challenge as a constrained optimization problem, aiming to maximize data utility subject to a total privacy budget while considering the inherent trade-offs between data granularity and privacy loss. The proposed approach is supported by theoretical analysis and validated through comprehensive experiments on real hierarchical datasets. These experiments demonstrate that optimal privacy budget allocation significantly enhances the utility of the released data and improves the performance of downstream tasks.

Submitted 16 May, 2025; originally announced May 2025.

arXiv:2505.09163 [pdf, ps, other]

Inverse limits of CM points on certain Shimura varieties

Authors: Ho Yun Jung, Ja Kyung Koo, Dong Hwa Shin

Abstract: Let $N$ be a positive integer, and let $D\equiv0$ or $1\pmod{4}$ be a negative integer. We define the sets $\mathcal{CM}(D,\,Y_1(N)^\pm)$ and $\mathcal{CM}(D,\,Y(N)^\pm)$ as subsets of the Shimura varieties $Y_1(N)^\pm$ and $Y(N)^\pm$, respectively, consisting of CM points of discriminant $D$ that are primitive modulo $N$. By using the theory of definite form class groups, we show that the inverse limits \begin{equation*} \varprojlim_N\,\mathcal{CM}(D,\,Y_1(N)^\pm)\quad\textrm{and}\quad \varprojlim_N\,\mathcal{CM}(D,\,Y(N)^\pm) \end{equation*} naturally inherit group structures isomorphic to $\mathrm{Gal}(K^\mathrm{ab}/\mathbb{Q})$ and $\mathrm{Gal}(K^\mathrm{ab}(t^{1/\infty})/\mathbb{Q}(t))$, respectively, where $K=\mathbb{Q}(\sqrt{D})$ and $t$ is a transcendental number. These results provide an explicit and geometric interpretation of class field theory in terms of inverse limits of CM points on the associated Shimura varieties.

Submitted 14 May, 2025; originally announced May 2025.

MSC Class: 11R37; 11E57; 11G18

arXiv:2505.06544 [pdf, ps, other]

Event-based Neural Spike Detection Using Spiking Neural Networks for Neuromorphic iBMI Systems

Authors: Chanwook Hwang, Biyan Zhou, Ye Ke, Vivek Mohan, Jong Hwan Ko, Arindam Basu

Abstract: Implantable brain-machine interfaces (iBMIs) are evolving to record from thousands of neurons wirelessly but face challenges in data bandwidth, power consumption, and implant size. We propose a novel Spiking Neural Network Spike Detector (SNN-SPD) that processes event-based neural data generated via delta modulation and pulse count modulation, converting signals into sparse events. By leveraging the temporal dynamics and inherent sparsity of spiking neural networks, our method improves spike detection performance while maintaining low computational overhead suitable for implantable devices. Our experimental results demonstrate that the proposed SNN-SPD achieves an accuracy of 95.72% at high noise levels (standard deviation 0.2), which is about 2% higher than the existing Artificial Neural Network Spike Detector (ANN-SPD). Moreover, SNN-SPD requires only 0.41% of the computation and about 26.62% of the weight parameters compared to ANN-SPD, with zero multiplications. This approach balances efficiency and performance, enabling effective data compression and power savings for next-generation iBMIs.

Submitted 10 May, 2025; originally announced May 2025.

Comments: 4 pages, 2 figures, to be published in 2025 IEEE International Symposium on Circuits and Systems (ISCAS) proceedings

arXiv:2505.03867 [pdf, other]

doi 10.1145/3713043.3727051

Scratch Copilot: Supporting Youth Creative Coding with AI

Authors: Stefania Druga, Amy J. Ko

Abstract: Creative coding platforms like Scratch have democratized programming for children, yet translating imaginative ideas into functional code remains a significant hurdle for many young learners. While AI copilots assist adult programmers, few tools target children in block-based environments. Building on prior research \cite{druga_how_2021,druga2023ai, druga2023scratch}, we present Cognimates Scratch Copilot: an AI-powered assistant integrated into a Scratch-like environment, providing real-time support for ideation, code generation, debugging, and asset creation. This paper details the system architecture and findings from an exploratory qualitative evaluation with 18 international children (ages 7–12). Our analysis reveals how the AI Copilot supported key creative coding processes, particularly aiding ideation and debugging. Crucially, it also highlights how children actively negotiated the use of AI, demonstrating strong agency by adapting or rejecting suggestions to maintain creative control. Interactions surfaced design tensions between providing helpful scaffolding and fostering independent problem-solving, as well as learning opportunities arising from navigating AI limitations and errors. Findings indicate Cognimates Scratch Copilot's potential to enhance creative self-efficacy and engagement. Based on these insights, we propose initial design guidelines for AI coding assistants that prioritize youth agency and critical interaction alongside supportive scaffolding.

Submitted 6 May, 2025; originally announced May 2025.

Comments: 5 figures, 14 pages

arXiv:2504.15558 [pdf, other]

Dynamical mean-field analysis of adaptive Langevin diffusions: Replica-symmetric fixed point and empirical Bayes

Authors: Zhou Fan, Justin Ko, Bruno Loureiro, Yue M. Lu, Yandi Shen

Abstract: In many applications of statistical estimation via sampling, one may wish to sample from a high-dimensional target distribution that is adaptively evolving to the samples already seen. We study an example of such dynamics, given by a Langevin diffusion for posterior sampling in a Bayesian linear regression model with i.i.d. regression design, whose prior continuously adapts to the Langevin trajectory via a maximum marginal-likelihood scheme. Results of dynamical mean-field theory (DMFT) developed in our companion paper establish a precise high-dimensional asymptotic limit for the joint evolution of the prior parameter and law of the Langevin sample. In this work, we carry out an analysis of the equations that describe this DMFT limit, under conditions of approximate time-translation-invariance which include, in particular, settings where the posterior law satisfies a log-Sobolev inequality. In such settings, we show that this adaptive Langevin trajectory converges on a dimension-independent time horizon to an equilibrium state that is characterized by a system of scalar fixed-point equations, and the associated prior parameter converges to a critical point of a replica-symmetric limit for the model free energy. As a by-product of our analyses, we obtain a new dynamical proof that this replica-symmetric limit for the free energy is exact, in models having a possibly misspecified prior and where a log-Sobolev inequality holds for the posterior law.

Submitted 21 April, 2025; originally announced April 2025.

arXiv:2504.15556 [pdf, ps, other]

Dynamical mean-field analysis of adaptive Langevin diffusions: Propagation-of-chaos and convergence of the linear response

Authors: Zhou Fan, Justin Ko, Bruno Loureiro, Yue M. Lu, Yandi Shen

Abstract: Motivated by an application to empirical Bayes learning in high-dimensional regression, we study a class of Langevin diffusions in a system with random disorder, where the drift coefficient is driven by a parameter that continuously adapts to the empirical distribution of the realized process up to the current time. The resulting dynamics take the form of a stochastic interacting particle system having both a McKean-Vlasov type interaction and a pairwise interaction defined by the random disorder. We prove a propagation-of-chaos result, showing that in the large system limit over dimension-independent time horizons, the empirical distribution of sample paths of the Langevin process converges to a deterministic limit law that is described by dynamical mean-field theory. This law is characterized by a system of dynamical fixed-point equations for the limit of the drift parameter and for the correlation and response kernels of the limiting dynamics. Using a dynamical cavity argument, we verify that these correlation and response kernels arise as the asymptotic limits of the averaged correlation and linear response functions of single coordinates of the system. These results enable an asymptotic analysis of an empirical Bayes Langevin dynamics procedure for learning an unknown prior parameter in a linear regression model, which we develop in a companion paper.

Submitted 21 April, 2025; originally announced April 2025.

arXiv:2504.14914 [pdf, other]

K-DRIFT Preparation: Experimental Verification of an Observation Strategy for Accurate Dark-Sky Flats

Authors: Woowon Byun, Kwang-Il Seon, Jongwan Ko

Abstract: Despite its scientific importance, the low-surface-brightness universe has yet to be fully explored due to various systematic uncertainties that affect the achievable surface-brightness limit. Reducing these uncertainties requires very accurate data processing. The dark-sky flat is a widely used calibration frame for accurate flat-field correction, generated by combining the sky background from science images. However, the night sky is likely to contain complex local fluctuations, which may still lead to photometric errors in data calibrated with dark-sky flats. To address this concern, we conduct mock observations with semi-realistic sky simulation data and evaluate observation strategies to mitigate the impact of the fluctuating sky background. Our experiments consider two representative sky conditions (clear and dirty) and perform intensive comparative analysis on two observation methods (offset and rolling). Our findings suggest that the rolling dithering method, which incorporates the operation of camera rotation into conventional dithering, can provide more accurate dark-sky flats. Finally, we discuss the broader implications of this method through additional experiments examining several factors that may affect the imaging quality of observational data.

Submitted 21 April, 2025; originally announced April 2025.

Comments: 22 pages, 15 figures, Accepted for publication in PASP

arXiv:2504.14735 [pdf, other]

DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions

Authors: Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Ben Hayes, Wei-Hsiang Liao, György Fazekas, Yuki Mitsufuji

Abstract: This study introduces a novel and interpretable model, DiffVox, for matching vocal effects in music production. DiffVox, short for "Differentiable Vocal Fx", integrates parametric equalisation, dynamic range control, delay, and reverb with efficient differentiable implementations to enable gradient-based optimisation for parameter estimation. Vocal presets are retrieved from two datasets, comprising 70 tracks from MedleyDB and 365 tracks from a private collection. Analysis of parameter correlations highlights strong relationships between effects and parameters, such as the high-pass and low-shelf filters often behaving together to shape the low end, and the delay time correlates with the intensity of the delayed signals. Principal component analysis reveals connections to McAdams' timbre dimensions, where the most crucial component modulates the perceived spaciousness while the secondary components influence spectral brightness. Statistical testing confirms the non-Gaussian nature of the parameter distribution, highlighting the complexity of the vocal effects space. These initial findings on the parameter distributions set the foundation for future research in vocal effects modelling and automatic mixing. Our source code and datasets are accessible at https://github.com/SonyResearch/diffvox.

Submitted 20 April, 2025; originally announced April 2025.

Comments: Submitted to DAFx 2025

arXiv:2504.14123 [pdf, other]

Bayesian Principles Improve Prompt Learning In Vision-Language Models

Authors: Mingyu Kim, Jongwoo Ko, Mijung Park

Abstract: Prompt learning is a popular fine-tuning method for vision-language models due to its efficiency. It requires a small number of additional learnable parameters while significantly enhancing performance on target tasks. However, most existing methods suffer from overfitting to fine-tuning data, yielding poor generalizability. To address this, we propose a new training objective function based on a Bayesian learning principle to balance adaptability and generalizability. We derive a prior over the logits, where the mean function is parameterized by the pre-trained model, while the posterior corresponds to the fine-tuned model. This objective establishes a balance by allowing the fine-tuned model to adapt to downstream tasks while remaining close to the pre-trained model.

Submitted 18 April, 2025; originally announced April 2025.

Comments: AISTATS2025

arXiv:2504.02480 [pdf, other]

Graph Attention-Driven Bayesian Deep Unrolling for Dual-Peak Single-Photon Lidar Imaging

Authors: Kyungmin Choi, JaKeoung Koo, Stephen McLaughlin, Abderrahim Halimi

Abstract: Single-photon Lidar imaging offers a significant advantage in 3D imaging due to its high resolution and long-range capabilities; however, it is challenging to apply in noisy environments with multiple targets per pixel. To tackle these challenges, several methods have been proposed. Statistical methods demonstrate interpretability on the inferred parameters, but they are often limited in their ability to handle complex scenes. Deep learning-based methods have shown superior performance in terms of accuracy and robustness, but they lack interpretability or they are limited to a single peak per pixel. In this paper, we propose a deep unrolling algorithm for dual-peak single-photon Lidar imaging. We introduce a hierarchical Bayesian model for multiple targets and propose a neural network that unrolls the underlying statistical method. To support multiple targets, we adopt a dual depth maps representation and exploit geometric deep learning to extract features from the point cloud. The proposed method takes advantage of statistical methods and learning-based methods in terms of accuracy and quantifying uncertainty. The experimental results on synthetic and real data demonstrate the competitive performance when compared to existing methods, while also providing uncertainty information.

Submitted 3 April, 2025; originally announced April 2025.

arXiv:2503.23371 [pdf, other]

FeRG-LLM : Feature Engineering by Reason Generation Large Language Models

Authors: Jeonghyun Ko, Gyeongyun Park, Donghoon Lee, Kyunam Lee

Abstract: One of the key tasks in machine learning for tabular data is feature engineering. Although it is vital for improving the performance of models, it demands considerable human expertise and deep domain knowledge, making it a labor-intensive endeavor. To address this issue, we propose a novel framework, FeRG-LLM (Feature engineering by Reason Generation Large Language Models), a large language model designed to automatically perform feature engineering at an 8-billion-parameter scale. We have constructed two-stage conversational dialogues that enable language models to analyze machine learning tasks and discover new features, exhibiting their Chain-of-Thought (CoT) capabilities. We use these dialogues to fine-tune the Llama 3.1 8B model and integrate Direct Preference Optimization (DPO) to receive feedback that improves the quality of new features and the model's performance. Our experiments show that FeRG-LLM performs comparably to or better than Llama 3.1 70B on most datasets, while using fewer resources and achieving reduced inference time. It outperforms other studies in classification tasks and performs well in regression tasks. Moreover, since it does not rely on cloud-hosted LLMs like GPT-4 with extra API costs when generating features, it can be deployed locally, addressing security concerns.

Submitted 30 March, 2025; originally announced March 2025.

Comments: Accepted to NAACL 2025 Findings

arXiv:2503.21721 [pdf, other]

Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance

Authors: Jaywon Koo, Jefferson Hernandez, Moayed Haji-Ali, Ziyan Yang, Vicente Ordonez

Abstract: Evaluating text-to-image synthesis is challenging due to misalignment between established metrics and human preferences. We propose cFreD, a metric based on the notion of Conditional Fréchet Distance that explicitly accounts for both visual fidelity and text-prompt alignment. Existing metrics such as Inception Score (IS), Fréchet Inception Distance (FID) and CLIPScore assess either image quality or image-text alignment but not both, which limits their correlation with human preferences. Scoring models explicitly trained to replicate human preferences require constant updates and may not generalize to novel generation techniques or out-of-domain inputs. Through extensive experiments across multiple recently proposed text-to-image models and diverse prompt datasets, we demonstrate that cFreD exhibits a higher correlation with human judgments compared to statistical metrics, including metrics trained with human preferences. Our findings validate cFreD as a robust, future-proof metric for the systematic evaluation of text-to-image models, standardizing benchmarking in this rapidly evolving field. We release our evaluation toolkit and benchmark in the appendix.

Submitted 27 March, 2025; originally announced March 2025.

arXiv:2503.19559 [pdf, other]

Combined Annual Modulation Dark Matter Search with COSINE-100 and ANAIS-112

Authors: N. Carlin, J. Y. Cho, J. J. Choi, S. Choi, A. C. Ezeribe, L. E. França, C. Ha, I. S. Hahn, S. J. Hollick, S. B. Hong, E. J. Jeon, H. W. Joo, W. G. Kang, M. Kauer, B. H. Kim, H. J. Kim, J. Kim, K. W. Kim, S. H. Kim, S. K. Kim, W. K. Kim, Y. D. Kim, Y. H. Kim, Y. J. Ko, D. H. Lee, et al. (49 additional authors not shown)

Abstract: The annual modulation signal, claimed to be consistent with dark matter as observed by DAMA/LIBRA in a sodium-iodide based detector, has persisted for over two decades. COSINE-100 and ANAIS-112 were designed to test the claim directly using the same target material. COSINE-100, located at Yangyang Underground Laboratory in South Korea, and ANAIS-112, located at Canfranc Underground Laboratory in Spain, have been taking data since 2016 and 2017, respectively. Each experiment published its respective results independently. In this paper, we present the results of an annual modulation search as a test of the signal observed by DAMA/LIBRA with the first three respective years of data from COSINE-100 and ANAIS-112. Using a Markov Chain Monte Carlo method, we find best fit values for modulation amplitude of $-0.0002 \pm 0.0026$ cpd/kg/keV in the 1-6 keV and $0.0021 \pm 0.0028$ cpd/kg/keV in the 2-6 keV energy regions. These results are not compatible with DAMA/LIBRA's assertion for their observation of annual modulation at $3.7\sigma$ and $2.6\sigma$, respectively. A simple combination of the newly released 6-year datasets from both experiments finds values consistent with no modulation at $0.0005 \pm 0.0019$ cpd/kg/keV in the 1-6 keV and $0.0027 \pm 0.0021$ cpd/kg/keV in the 2-6 keV energy regions, with $4.68\sigma$ and $3.53\sigma$ respective exclusions of the DAMA/LIBRA signal.

Submitted 25 March, 2025; originally announced March 2025.

Comments: 6 pages, 4 figures, 3 tables

arXiv:2503.16924 [pdf, other]

Optimized Minimal 3D Gaussian Splatting

Authors: Joo Chan Lee, Jong Hwan Ko, Eunbyung Park

Abstract: 3D Gaussian Splatting (3DGS) has emerged as a powerful representation for real-time, high-performance rendering, enabling a wide range of applications. However, representing 3D scenes with numerous explicit Gaussian primitives imposes significant storage and memory overhead. Recent studies have shown that high-quality rendering can be achieved with a substantially reduced number of Gaussians when represented with high-precision attributes. Nevertheless, existing 3DGS compression methods still rely on a relatively large number of Gaussians, focusing primarily on attribute compression. This is because a smaller set of Gaussians becomes increasingly sensitive to lossy attribute compression, leading to severe quality degradation. Since the number of Gaussians is directly tied to computational costs, it is essential to reduce the number of Gaussians effectively rather than only optimizing storage. In this paper, we propose Optimized Minimal Gaussians representation (OMG), which significantly reduces storage while using a minimal number of primitives. First, we determine the distinct Gaussian from the near ones, minimizing redundancy without sacrificing quality. Second, we propose a compact and precise attribute representation that efficiently captures both continuity and irregularity among primitives. Additionally, we propose a sub-vector quantization technique for improved irregularity representation, maintaining fast training with a negligible codebook size. Extensive experiments demonstrate that OMG reduces storage requirements by nearly 50% compared to the previous state-of-the-art and enables 600+ FPS rendering while maintaining high rendering quality. Our source code is available at https://maincold2.github.io/omg/.

Submitted 21 March, 2025; originally announced March 2025.

Comments: Project page: https://maincold2.github.io/omg/

arXiv:2503.16814 [pdf, ps, other]

Understanding Bias Reinforcement in LLM Agents Debate

Authors: Jihwan Oh, Minchan Jeong, Jongwoo Ko, Se-Young Yun

Abstract: Large Language Models (LLMs) solve complex problems using training-free methods like prompt engineering and in-context learning, yet ensuring reasoning correctness remains challenging. While self-correction methods such as self-consistency and self-refinement aim to improve reliability, they often reinforce biases due to the lack of effective feedback mechanisms. Multi-Agent Debate (MAD) has emerged as an alternative, but we identify two key limitations: bias reinforcement, where debate amplifies model biases instead of correcting them, and lack of perspective diversity, as all agents share the same model and reasoning patterns, limiting true debate effectiveness. To systematically evaluate these issues, we introduce MetaNIM Arena, a benchmark designed to assess LLMs in adversarial strategic decision-making, where dynamic interactions influence optimal decisions. To overcome MAD's limitations, we propose DReaMAD (Diverse Reasoning via Multi-Agent Debate with Refined Prompt), a novel framework that (1) refines LLM's strategic prior knowledge to improve reasoning quality and (2) promotes diverse viewpoints within a single model by systematically modifying prompts, reducing bias. Empirical results show that DReaMAD significantly improves decision accuracy, reasoning diversity, and bias mitigation across multiple strategic tasks, establishing it as a more effective approach for LLM-based decision-making.

Submitted 28 May, 2025; v1 submitted 20 March, 2025; originally announced March 2025.

Comments: ICML 2025

arXiv:2503.07067 [pdf, other]

DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs

Authors: Jongwoo Ko, Tianyi Chen, Sungnyun Kim, Tianyu Ding, Luming Liang, Ilya Zharkov, Se-Young Yun

Abstract: Despite the success of distillation in large language models (LLMs), most prior work applies identical loss functions to both teacher- and student-generated data. These strategies overlook the synergy between loss formulations and data types, leading to a suboptimal performance boost in student models. To address this, we propose DistiLLM-2, a contrastive approach that simultaneously increases the likelihood of teacher responses and decreases that of student responses by harnessing this synergy. Our extensive experiments show that DistiLLM-2 not only builds high-performing student models across a wide range of tasks, including instruction-following and code generation, but also supports diverse applications, such as preference alignment and vision-language extensions. These findings highlight the potential of a contrastive approach to enhance the efficacy of LLM distillation by effectively aligning teacher and student models across varied data types.

Submitted 10 March, 2025; originally announced March 2025.

Comments: The code will be available soon at https://github.com/jongwooko/distillm-2

arXiv:2503.01708 [pdf, other]

Pseudo-Maximum Likelihood Theory for High-Dimensional Rank One Inference

Authors: Curtis Grant, Aukosh Jagannath, Justin Ko

Abstract: We develop a pseudo-likelihood theory for rank one matrix estimation problems in the high dimensional limit. We prove a variational principle for the limiting pseudo-maximum likelihood which also characterizes the performance of the corresponding pseudo-maximum likelihood estimator. We show that this variational principle is universal and depends only on four parameters determined by the corresponding null model. Through this universality, we introduce a notion of equivalence for estimation problems of this type and, in particular, show that a broad class of estimation tasks, including community detection, sparse submatrix detection, and non-linear spiked matrix models, are equivalent to spiked matrix models. As an application, we obtain a complete description of the performance of the least-squares (or "best rank one") estimator for any rank one matrix estimation problem.

Submitted 3 March, 2025; originally announced March 2025.

Comments: 52 pages, 2 figures

arXiv:2503.01107 [pdf, other]

VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors

Authors: Juil Koo, Paul Guerrero, Chun-Hao Paul Huang, Duygu Ceylan, Minhyuk Sung

Abstract: Generative methods for image and video editing use generative models as priors to perform edits despite incomplete information, such as changing the composition of 3D objects shown in a single image. Recent methods have shown promising composition editing results in the image setting, but in the video setting, editing methods have focused on editing an object's appearance and motion, or camera motion, and as a result, methods to edit object composition in videos are still missing. We propose VideoHandles as a method for editing 3D object compositions in videos of static scenes with camera motion. Our approach allows editing the 3D position of a 3D object across all frames of a video in a temporally consistent manner. This is achieved by lifting intermediate features of a generative model to a 3D reconstruction that is shared between all frames, editing the reconstruction, and projecting the features on the edited reconstruction back to each frame. To the best of our knowledge, this is the first generative approach to edit object compositions in videos. Our approach is simple and training-free, while outperforming state-of-the-art image editing baselines.

Submitted 26 March, 2025; v1 submitted 2 March, 2025; originally announced March 2025.

Comments: Project page: https://videohandles.github.io

arXiv:2502.17105 [pdf, other]

SFLD: Reducing the content bias for AI-generated Image Detection

Authors: Seoyeon Gye, Junwon Ko, Hyounguk Shon, Minchan Kwon, Junmo Kim

Abstract: Identifying AI-generated content is critical for the safe and ethical use of generative AI. Recent research has focused on developing detectors that generalize to unknown generators, with popular methods relying either on high-level features or low-level fingerprints. However, these methods have clear limitations: biased towards unseen content, or vulnerable to common image degradations, such as JPEG compression. To address these issues, we propose a novel approach, SFLD, which incorporates PatchShuffle to integrate high-level semantic and low-level textural information. SFLD applies PatchShuffle at multiple levels, improving robustness and generalization across various generative models. Additionally, current benchmarks face challenges such as low image quality, insufficient content preservation, and limited class diversity. In response, we introduce TwinSynths, a new benchmark generation methodology that constructs visually near-identical pairs of real and synthetic images to ensure high quality and content preservation. Our extensive experiments and analysis show that SFLD outperforms existing methods on detecting a wide variety of fake images sourced from GANs, diffusion models, and TwinSynths, demonstrating state-of-the-art performance and generalization capabilities to novel generative models.

Submitted 24 February, 2025; originally announced February 2025.

Comments: IEEE/CVF WACV 2025, Oral

arXiv:2502.08939 [pdf, other]

TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument

Authors: Kyungsu Kim, Junghyun Koo, Sungho Lee, Haesun Joung, Kyogu Lee

Abstract: Recent advancements in neural audio codecs have enabled the use of tokenized audio representations in various audio generation tasks, such as text-to-speech, text-to-audio, and text-to-music generation. Leveraging this approach, we propose TokenSynth, a novel neural synthesizer that utilizes a decoder-only transformer to generate desired audio tokens from MIDI tokens and CLAP (Contrastive Language-Audio Pretraining) embedding, which has timbre-related information. Our model is capable of performing instrument cloning, text-to-instrument synthesis, and text-guided timbre manipulation without any fine-tuning. This flexibility enables diverse sound design and intuitive timbre control. We evaluated the quality of the synthesized audio, the timbral similarity between synthesized and target audio/text, and synthesis accuracy (i.e., how accurately it follows the input MIDI) using objective measures. TokenSynth demonstrates the potential of leveraging advanced neural audio codecs and transformers to create powerful and versatile neural synthesizers. The source code, model weights, and audio demos are available at: https://github.com/KyungsuKim42/tokensynth

Submitted 12 February, 2025; originally announced February 2025.

Comments: 5 pages, 1 figure, to be published in ICASSP 2025

arXiv:2502.07842 [pdf, other]

Column-wise Quantization of Weights and Partial Sums for Accurate and Efficient Compute-In-Memory Accelerators

Authors: Jiyoon Kim, Kang Eun Jeon, Yulhwa Kim, Jong Hwan Ko

Abstract: Compute-in-memory (CIM) is an efficient method for implementing deep neural networks (DNNs) but suffers from substantial overhead from analog-to-digital converters (ADCs), especially as ADC precision increases. Low-precision ADCs can reduce this overhead but introduce partial-sum quantization errors that degrade accuracy. Additionally, low-bit weight constraints, imposed by cell limitations and the need for multiple cells for higher-bit weights, present further challenges. While fine-grained partial-sum quantization has been studied to lower ADC resolution effectively, weight granularity, which limits overall partial-sum quantized accuracy, remains underexplored. This work addresses these challenges by aligning weight and partial-sum quantization granularities at the column-wise level. Our method improves accuracy while maintaining dequantization overhead, simplifies training by removing two-stage processes, and ensures robustness to memory cell variations via independent column-wise scale factors. We also propose an open-source CIM-oriented convolution framework to handle fine-grained weights and partial-sums efficiently, incorporating a novel tiling method and group convolution. Experimental results on ResNet-20 (CIFAR-10, CIFAR-100) and ResNet-18 (ImageNet) show accuracy improvements of 0.99%, 2.69%, and 1.01%, respectively, compared to the best-performing related works. Additionally, variation analysis reveals the robustness of our method against memory cell variations. These findings highlight the effectiveness of our quantization scheme in enhancing accuracy and robustness while maintaining hardware efficiency in CIM-based DNN implementations. Our code is available at https://github.com/jiyoonkm/ColumnQuant.

Submitted 13 March, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

arXiv:2502.07834 [pdf, other]

MEMHD: Memory-Efficient Multi-Centroid Hyperdimensional Computing for Fully-Utilized In-Memory Computing Architectures

Authors: Do Yeong Kang, Yeong Hwan Oh, Chanwook Hwang, Jinhee Kim, Kang Eun Jeon, Jong Hwan Ko

Abstract: The implementation of Hyperdimensional Computing (HDC) on In-Memory Computing (IMC) architectures faces significant challenges due to the mismatch between high-dimensional vectors and IMC array sizes, leading to inefficient memory utilization and increased computation cycles. This paper presents MEMHD, a Memory-Efficient Multi-centroid HDC framework designed to address these challenges. MEMHD introduces a clustering-based initialization method and quantization-aware iterative learning for multi-centroid associative memory. Through these approaches and its overall architecture, MEMHD achieves a significant reduction in memory requirements while maintaining or improving classification accuracy. Our approach achieves full utilization of IMC arrays and enables one-shot (or few-shot) associative search. Experimental results demonstrate that MEMHD outperforms state-of-the-art binary HDC models, achieving up to 13.69% higher accuracy with the same memory usage, or 13.25x more memory efficiency at the same accuracy level. Moreover, MEMHD reduces computation cycles by up to 80x and array usage by up to 71x compared to baseline IMC mapping methods when mapped to 128x128 IMC arrays, while significantly improving energy and computation cycle efficiency.

Submitted 10 February, 2025; originally announced February 2025.

Comments: Accepted to appear at DATE 2025

arXiv:2502.07820 [pdf, other]

Low-Rank Compression for IMC Arrays

Authors: Kang Eun Jeon, Johnny Rhe, Jong Hwan Ko

Abstract: In this study, we address the challenge of low-rank model compression in the context of in-memory computing (IMC) architectures. Traditional pruning approaches, while effective in model size reduction, necessitate additional peripheral circuitry to manage complex dataflows and mitigate dislocation issues, leading to increased area and energy overheads. To circumvent these drawbacks, we propose leveraging low-rank compression techniques, which, unlike pruning, streamline the dataflow and seamlessly integrate with IMC architectures. However, low-rank compression presents its own set of challenges, namely i) suboptimal IMC array utilization and ii) compromised accuracy. To address these issues, we introduce a novel approach employing i) a shift and duplicate kernel (SDK) mapping technique, which exploits idle IMC columns for parallel processing, and ii) group low-rank convolution, which mitigates the information imbalance in the decomposed matrices. Our experimental results demonstrate that our proposed method achieves up to 2.5x speedup or +20.9% accuracy boost over existing pruning techniques.

Submitted 10 February, 2025; originally announced February 2025.

Comments: Accepted to appear at DATE'25 (Lyon, France)

arXiv:2502.04362 [pdf, other]

LLMs can be easily Confused by Instructional Distractions

Authors: Yerin Hwang, Yongil Kim, Jahyun Koo, Taegwan Kang, Hyunkyung Bae, Kyomin Jung

Abstract: Despite the fact that large language models (LLMs) show exceptional skill in instruction-following tasks, this strength can turn into a vulnerability when the models are required to disregard certain instructions. Instruction-following tasks typically involve a clear task description and input text containing the target data to be processed. However, when the input itself resembles an instruction, confusion may arise, even if there is explicit prompting to distinguish between the task instruction and the input. We refer to this phenomenon as instructional distraction. In this paper, we introduce a novel benchmark, named DIM-Bench, specifically designed to assess LLMs' performance under instructional distraction. The benchmark categorizes real-world instances of instructional distraction and evaluates LLMs across four instruction tasks (rewriting, proofreading, translation, and style transfer) alongside five input tasks (reasoning, code generation, mathematical reasoning, bias detection, and question answering). Our experimental results reveal that even the most advanced LLMs are susceptible to instructional distraction, often failing to accurately follow user intent in such cases.

Submitted 4 February, 2025; originally announced February 2025.

Comments: 8 pages

arXiv:2502.01031 [pdf, other]

DiffIM: Differentiable Influence Minimization with Surrogate Modeling and Continuous Relaxation

Authors: Junghun Lee, Hyunju Kim, Fanchen Bu, Jihoon Ko, Kijung Shin

Abstract: In social networks, people influence each other through social links, which can be represented as propagation among nodes in graphs. Influence minimization (IMIN) is the problem of manipulating the structures of an input graph (e.g., removing edges) to reduce the propagation among nodes. IMIN can represent time-critical real-world applications, such as rumor blocking, but IMIN is theoretically difficult and computationally expensive. Moreover, the discrete nature of IMIN hinders the usage of powerful machine learning techniques, which require differentiable computation. In this work, we propose DiffIM, a novel method for IMIN with two differentiable schemes for acceleration: (1) surrogate modeling for efficient influence estimation, which avoids time-consuming simulations (e.g., Monte Carlo), and (2) the continuous relaxation of decisions, which avoids the evaluation of individual discrete decisions (e.g., removing an edge). We further propose a third accelerating scheme, gradient-driven selection, that chooses edges instantly based on gradients without optimization (specifically, gradient descent iterations) on each test instance. Through extensive experiments on real-world graphs, we show that each proposed scheme significantly improves speed with little (or even no) IMIN performance degradation. Our method is Pareto-optimal (i.e., no baseline is faster and more effective than it) and typically several orders of magnitude (specifically, up to 15,160X) faster than the most effective baseline while being more effective.

Submitted 2 February, 2025; originally announced February 2025.

Comments: Accepted to AAAI'25

arXiv:2501.13665 [pdf, other]

Limits on WIMP dark matter with NaI(Tl) crystals in three years of COSINE-100 data

Authors: G. H. Yu, N. Carlin, J. Y. Cho, J. J. Choi, S. Choi, A. C. Ezeribe, L. E. Franca, C. Ha, I. S. Hahn, S. J. Hollick, E. J. Jeon, H. W. Joo, W. G. Kang, M. Kauer, B. H. Kim, H. J. Kim, J. Kim, K. W. Kim, S. H. Kim, S. K. Kim, W. K. Kim, Y. D. Kim, Y. H. Kim, Y. J. Ko, D. H. Lee, et al. (34 additional authors not shown)

Abstract: We report limits on WIMP dark matter derived from three years of data collected by the COSINE-100 experiment with NaI(Tl) crystals, achieving an improved energy threshold of 0.7 keV. This lowered threshold enhances sensitivity in the sub-GeV mass range, extending the reach for direct detection of low-mass dark matter. Although no excess of WIMP-like events was observed, the increased sensitivity enabled a model-independent comparison between the expected WIMP signal rate, based on mass limits from our data, and DAMA's reported modulation amplitude. Our findings strongly disfavor the DAMA signal as originating from WIMP interactions, fully excluding DAMA/LIBRA $3\sigma$ allowed regions and providing enhanced WIMP mass limits by an order of magnitude in the spin-independent model compared to previous results. In the spin-dependent model, cross-section upper limits were obtained in the mass range [0.1-5.0] GeV/c$^2$, with additional sensitivity to sub-GeV WIMPs through the inclusion of the Migdal effect. These results represent substantial progress in low-mass dark matter exploration and reinforce constraints on the longstanding DAMA claim.

Submitted 23 January, 2025; originally announced January 2025.

arXiv:2501.07824 [pdf, other]

Real-time Verification and Refinement of Language Model Text Generation

Authors: Joonho Ko, Jinheon Baek, Sung Ju Hwang

Abstract: Large language models (LLMs) have shown remarkable performance across a wide range of natural language tasks. However, a critical challenge remains in that they sometimes generate factually incorrect answers. To address this, while many previous works have focused on identifying errors in their generation and further refining them, they are slow in deployment since they are designed to verify the response from LLMs only after their entire generation (from the first to last tokens) is done. Further, we observe that once LLMs generate incorrect tokens early on, there is a higher likelihood that subsequent tokens will also be factually incorrect. To this end, in this work, we propose Streaming-VR (Streaming Verification and Refinement), a novel approach designed to enhance the efficiency of verification and refinement of LLM outputs. Specifically, the proposed Streaming-VR enables on-the-fly verification and correction of tokens as they are being generated, similar to a streaming process, ensuring that each subset of tokens is checked and refined in real-time by another LLM as the LLM constructs its response. Through comprehensive evaluations on multiple datasets, we demonstrate that our approach not only enhances the factual accuracy of LLMs, but also offers a more efficient solution compared to prior refinement methods.

Submitted 13 April, 2025; v1 submitted 13 January, 2025; originally announced January 2025.

arXiv:2501.00645 [pdf, other]

SoundBrush: Sound as a Brush for Visual Scene Editing

Authors: Kim Sung-Bin, Kim Jun-Seong, Junseok Ko, Yewon Kim, Tae-Hyun Oh

Abstract: We propose SoundBrush, a model that uses sound as a brush to edit and manipulate visual scenes. We extend the generative capabilities of the Latent Diffusion Model (LDM) to incorporate audio information for editing visual scenes. Inspired by existing image-editing works, we frame this task as a supervised learning problem and leverage various off-the-shelf models to construct a sound-paired visual scene dataset for training. This richly generated dataset enables SoundBrush to learn to map audio features into the textual space of the LDM, allowing for visual scene editing guided by diverse in-the-wild sound. Unlike existing methods, SoundBrush can accurately manipulate the overall scenery or even insert sounding objects to best match the audio inputs while preserving the original content. Furthermore, by integrating with novel view synthesis techniques, our framework can be extended to edit 3D scenes, facilitating sound-driven 3D scene manipulation. Demos are available at https://soundbrush.github.io/.

Submitted 31 December, 2024; originally announced January 2025.

Comments: AAAI 2025

arXiv:2412.07475 [pdf, other]

Enhanced 2-categorical structures, two-dimensional limit sketches and the symmetry of internalisation

Authors: Nathanael Arkor, John Bourke, Joanna Ko

Abstract: Many structures of interest in two-dimensional category theory have aspects that are inherently strict. This strictness is not a limitation, but rather plays a fundamental role in the theory of such structures. For instance, a monoidal fibration is - crucially - a strict monoidal functor, rather than a pseudo or lax monoidal functor. Other examples include monoidal double categories, double fibrations, and intercategories. We provide an explanation for this phenomenon from the perspective of enhanced 2-categories, which are 2-categories having a distinguished subclass of 1-cells representing the strict morphisms. As part of our development, we introduce enhanced 2-categorical limit sketches and explain how this setting addresses shortcomings in the theory of 2-categorical limit sketches. In particular, we establish the symmetry of internalisation for such structures, entailing, for instance, that a monoidal double category is equivalently a pseudomonoid in an enhanced 2-category of double categories, or a pseudocategory in an enhanced 2-category of monoidal categories.

Submitted 10 December, 2024; originally announced December 2024.

Comments: 49 pages

MSC Class: 18C10; 18C30; 18C40; 18D20; 18M65; 18N10

arXiv:2412.07454 [pdf, other]

Tazza: Shuffling Neural Network Parameters for Secure and Private Federated Learning

Authors: Kichang Lee, Jaeho Jin, JaeYeon Park, Songkuk Kim, JeongGil Ko

Abstract: Federated learning enables decentralized model training without sharing raw data, preserving data privacy. However, its vulnerability towards critical security threats, such as gradient inversion and model poisoning by malicious clients, remains unresolved. Existing solutions often address these issues separately, sacrificing either system robustness or model accuracy. This work introduces Tazza, a secure and efficient federated learning framework that simultaneously addresses both challenges. By leveraging the permutation equivariance and invariance properties of neural networks via weight shuffling and shuffled model validation, Tazza enhances resilience against diverse poisoning attacks, while ensuring data confidentiality and high model accuracy. Comprehensive evaluations on various datasets and embedded platforms show that Tazza achieves robust defense with up to 6.7x improved computational efficiency compared to alternative schemes, without compromising performance.

Submitted 3 February, 2025; v1 submitted 10 December, 2024; originally announced December 2024.

Comments: 27 pages, 18 figures

MSC Class: 68T07; ACM Class: I.2.11

arXiv:2412.02280 [pdf, other]

AH-OCDA: Amplitude-based Curriculum Learning and Hopfield Segmentation Model for Open Compound Domain Adaptation

Authors: Jaehyun Choi, Junwon Ko, Dong-Jae Lee, Junmo Kim

Abstract: Open compound domain adaptation (OCDA) is a practical domain adaptation problem that consists of a source domain, target compound domain, and unseen open domain. In this problem, the absence of domain labels and pixel-level segmentation labels for both compound and open domains poses challenges to the direct application of existing domain adaptation and generalization methods. To address this issue, we propose Amplitude-based curriculum learning and a Hopfield segmentation model for Open Compound Domain Adaptation (AH-OCDA). Our method comprises two complementary components: 1) amplitude-based curriculum learning and 2) Hopfield segmentation model. Without prior knowledge of target domains within the compound domains, amplitude-based curriculum learning gradually induces the semantic segmentation model to adapt from the near-source compound domain to the far-source compound domain by ranking unlabeled compound domain images through Fast Fourier Transform (FFT). Additionally, the Hopfield segmentation model maps segmentation feature distributions from arbitrary domains to the feature distributions of the source domain. AH-OCDA achieves state-of-the-art performance on two OCDA benchmarks and extended open domains, demonstrating its adaptability to continuously changing compound domains and unseen open domains.

Submitted 3 December, 2024; originally announced December 2024.

Comments: WACV 2025

arXiv:2412.02237 [pdf, other]

Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models

Authors: Jungwon Park, Jungmin Ko, Dongnam Byun, Jangwon Suh, Wonjong Rhee

Abstract: Recent text-to-image diffusion models leverage cross-attention layers, which have been effectively utilized to enhance a range of visual generative tasks. However, our understanding of cross-attention layers remains somewhat limited. In this study, we introduce a mechanistic interpretability approach for diffusion models by constructing Head Relevance Vectors (HRVs) that align with human-specified visual concepts. An HRV for a given visual concept has a length equal to the total number of cross-attention heads, with each element representing the importance of the corresponding head for the given visual concept. To validate HRVs as interpretable features, we develop an ordered weakening analysis that demonstrates their effectiveness. Furthermore, we propose concept strengthening and concept adjusting methods and apply them to enhance three visual generative tasks. Our results show that HRVs can reduce misinterpretations of polysemous words in image generation, successfully modify five challenging attributes in image editing, and mitigate catastrophic neglect in multi-concept generation. Overall, our work provides an advancement in understanding cross-attention layers and introduces new approaches for fine-controlling these layers at the head level.

Submitted 24 February, 2025; v1 submitted 3 December, 2024; originally announced December 2024.

Comments: Accepted by ICLR 2025

arXiv:2411.16312 [pdf, other]

EPS: Efficient Patch Sampling for Video Overfitting in Deep Super-Resolution Model Training

Authors: Yiying Wei, Hadi Amirpour, Jong Hwan Ko, Christian Timmerer

Abstract: Leveraging the overfitting property of deep neural networks (DNNs) is trending in video delivery systems to enhance quality within bandwidth limits. Existing approaches transmit overfitted super-resolution (SR) model streams for low-resolution (LR) bitstreams, which are used to reconstruct high-resolution (HR) videos at the decoder. Although these approaches show promising results, the huge computational costs of training a large number of video frames limit their practical applications. To overcome this challenge, we propose an efficient patch sampling method named EPS for video SR network overfitting, which identifies the most valuable training patches from video frames. To this end, we first present two low-complexity Discrete Cosine Transform (DCT)-based spatial-temporal features to measure the complexity score of each patch directly. By analyzing the histogram distribution of these features, we then categorize all possible patches into different clusters and select training patches from the cluster with the highest spatial-temporal information. The number of sampled patches is adaptive based on the video content, addressing the trade-off between training complexity and efficiency. Our method reduces the number of patches for the training to 4% to 25%, depending on the resolution and number of clusters, while maintaining high video quality and significantly enhancing training efficiency. Compared to the state-of-the-art patch sampling method, EMT, our approach achieves an 83% decrease in overall run time.

Submitted 25 November, 2024; originally announced November 2024.

arXiv:2411.12220 [pdf, other]

DeTrigger: A Gradient-Centric Approach to Backdoor Attack Mitigation in Federated Learning

Authors: Kichang Lee, Yujin Shin, Jonghyuk Yun, Songkuk Kim, Jun Han, JeongGil Ko

Abstract: Federated Learning (FL) enables collaborative model training across distributed devices while preserving local data privacy, making it ideal for mobile and embedded systems. However, the decentralized nature of FL also opens vulnerabilities to model poisoning attacks, particularly backdoor attacks, where adversaries implant trigger patterns to manipulate model predictions. In this paper, we propose DeTrigger, a scalable and efficient backdoor-robust federated learning framework that leverages insights from adversarial attack methodologies. By employing gradient analysis with temperature scaling, DeTrigger detects and isolates backdoor triggers, allowing for precise model weight pruning of backdoor activations without sacrificing benign model knowledge. Extensive evaluations across four widely used datasets demonstrate that DeTrigger achieves up to 251x faster detection than traditional methods and mitigates backdoor attacks by up to 98.9%, with minimal impact on global model accuracy. Our findings establish DeTrigger as a robust and scalable solution to protect federated learning environments against sophisticated backdoor threats.

Submitted 3 February, 2025; v1 submitted 18 November, 2024; originally announced November 2024.

Comments: 21 pages

MSC Class: 68T07; ACM Class: I.2.11

arXiv:2411.08117 [pdf, other]

Tracing the Formation History of Intrahalo Light with Horizon Run 5

Authors: Hyungjin Joo, M. James Jee, Juhan Kim, Jaehyun Lee, Jongwan Ko, Changbom Park, Jihye Shin, Owain Snaith, Christophe Pichon, Brad Gibson, Yonghwi Kim

Abstract: We investigate the formation history of intrahalo light (IHL) using the high-resolution (~1 kpc), large-scale (~Gpc) cosmological hydrodynamical simulation, Horizon Run 5 (HR5). IHL particles are identified by carefully considering both their binding energies and positions with respect to the tidal radii of individual galaxies. By analyzing more than 1,200 galaxy groups and clusters with $\geq 10^{13} M_{\odot}$ and tracing their individual IHL particles back in time, we classify the origin of each IHL particle at each epoch based on the status of the originating galaxy into three categories: brightest halo galaxy (BHG) formation/merger, satellite galaxy stripping, and pre-processing. Our study reveals that the IHL production through BHG formation/merger is the predominant production channel, contributing over 60\% of the total IHL mass across all redshifts. The second most significant IHL production channel is pre-processing, providing more than 20\% in the final HR5 snapshot. Stripping is negligible at $z>4$ but becomes gradually more important as halos mature at $z<4$. Finally, we verify that IHL production through the disruption of dwarf galaxies and in-situ formation is negligible, contributing less than ~3\% and ~0.5\% to the total IHL production, respectively.

Submitted 12 November, 2024; originally announced November 2024.

Comments: Submitted to ApJ, 14 pages, 11 figures

arXiv:2411.05256 [pdf, other]

Radiopurity measurements of liquid scintillator for the COSINE-100 Upgrade

Authors: J. Kim, C. Ha, S. H. Kim, W. K. Kim, Y. D. Kim, Y. J. Ko, E. K. Lee, H. Lee, H. S. Lee, I. S. Lee, J. Lee, S. H. Lee, S. M. Lee, Y. J. Lee, G. H. Yu

Abstract: A new 2,400 L liquid scintillator has been produced for the COSINE-100 Upgrade, which is under construction at Yemilab for the next COSINE dark matter experiment phase. The linear-alkyl-benzene-based scintillator is designed to serve as a veto for NaI(Tl) crystal targets and a separate platform for rare event searches. We measured using a sample consisting of a custom-made 445 mL cylindrical Teflon container equipped with two 3-inch photomultiplier tubes. Analyses show activity levels of $0.091 \pm 0.042$ mBq/kg for $^{238}$U and $0.012 \pm 0.007$ mBq/kg for $^{232}$Th.

Submitted 7 November, 2024; originally announced November 2024.

arXiv:2411.02824 [pdf, other]

Layer-Adaptive State Pruning for Deep State Space Models

Authors: Minseon Gwak, Seongrok Moon, Joohwan Ko, PooGyeon Park

Abstract: Due to the lack of state dimension optimization methods, deep state space models (SSMs) have sacrificed model capacity, training search space, or stability to alleviate computational costs caused by high state dimensions. In this work, we provide a structured pruning method for SSMs, Layer-Adaptive STate pruning (LAST), which reduces the state dimension of each layer in minimizing model-level output energy loss by extending modal truncation for a single system. LAST scores are evaluated using the $\mathcal{H}_{\infty}$ norms of subsystems and layer-wise energy normalization. The scores serve as global pruning criteria, enabling cross-layer comparison of states and layer-adaptive pruning. Across various sequence benchmarks, LAST optimizes previous SSMs, revealing the redundancy and compressibility of their state spaces. Notably, we demonstrate that, on average, pruning 33% of states still maintains performance with 0.52% accuracy loss in multi-input multi-output SSMs without retraining. Code is available at https://github.com/msgwak/LAST.

Submitted 31 January, 2025; v1 submitted 5 November, 2024; originally announced November 2024.

Comments: NeurIPS 2024, Added missing arXiv information for one reference

arXiv:2411.01974 [pdf, other]

On the phase diagram of extensive-rank symmetric matrix denoising beyond rotational invariance

Authors: Jean Barbier, Francesco Camilli, Justin Ko, Koki Okajima

Abstract: Matrix denoising is central to signal processing and machine learning. Its statistical analysis when the matrix to infer has a factorised structure with a rank growing proportionally to its dimension remains a challenge, except when it is rotationally invariant. In this case the information theoretic limits and an efficient Bayes-optimal denoising algorithm, called rotational invariant estimator [1,2], are known. Beyond this setting few results can be found. The reason is that the model is not a usual spin system because of the growing rank dimension, nor a matrix model (as appearing in high-energy physics) due to the lack of rotation symmetry, but rather a hybrid between the two. Here we make progress towards the understanding of Bayesian matrix denoising when the signal is a factored matrix $XX^\intercal$ that is not rotationally invariant. Monte Carlo simulations suggest the existence of a \emph{denoising-factorisation transition} separating a phase where denoising using the rotational invariant estimator remains Bayes-optimal due to universality properties of the same nature as in random matrix theory, from one where universality breaks down and better denoising is possible, though algorithmically hard. We argue that it is only beyond the transition that factorisation, i.e., estimating $X$ itself, becomes possible up to irresolvable ambiguities. On the theory side, we combine mean-field techniques in an interpretable multiscale fashion in order to access the minimum mean-square error and mutual information. Interestingly, our alternative method yields equations reproducible by the replica approach of [3]. Using numerical insights, we delimit the portion of phase diagram where we conjecture the mean-field theory to be exact, and correct it using universality when it is not. Our complete ansatz matches well the numerics in the whole phase diagram when considering finite size effects.

Submitted 14 March, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

arXiv:2411.00521 [pdf, other]

Initial Mass Functions of Young Stellar Clusters from the Gemini Spectroscopic Survey of Nearby Galaxies I. Young Massive Clusters in the Antennae galaxies

Authors: Jae-Rim Koo, Hyun-Jeong Kim, Beomdu Lim

Abstract: The stellar initial mass function (IMF) is a key parameter to understand the star formation process and the integrated properties of stellar populations in remote galaxies. We present a spectroscopic study of young massive clusters (YMCs) in the starburst galaxies NGC 4038/39. The integrated spectra of seven YMCs obtained with GMOS-S attached to the 8.2-m Gemini South telescope reveal the spectral features associated with stellar ages and the underlying IMFs. We constrain the ages of the YMCs using the absorption lines and strong emission bands from Wolf-Rayet stars. The internal reddening is also estimated from the strength of the Na I D absorption lines. Based on these constraints, the observed spectra are matched with the synthetic spectra generated from a simple stellar population model. Several parameters of the clusters including age, reddening, cluster mass, and the underlying IMF are derived from the spectral matching. The ages of the YMCs range from 2.5 to 6.5 Myr, and these clusters contain stellar masses ranging from $1.6 \times 10^{5}\,M_{\odot}$ to $7.9 \times 10^{7}\,M_{\odot}$. The underlying IMFs appear to differ from the universal form of the Salpeter/Kroupa IMF. Interestingly, massive clusters tend to have the bottom-heavy IMFs, although the masses of some clusters are overestimated due to the crowding effect. Based on this, our results suggest that the universal form of the IMF is not always valid when analyzing integrated light from unresolved stellar systems. However, further study with a larger sample size is required to reach a definite conclusion.

Submitted 1 November, 2024; originally announced November 2024.

Comments: 18 pages, 9 figures, accepted for publication in AJ

arXiv:2410.22815 [pdf, other]

Towards Robust and Efficient Federated Low-Rank Adaptation with Heterogeneous Clients

Authors: Jabin Koo, Minwoo Jang, Jungseul Ok

Abstract: Federated fine-tuning for Large Language Models (LLMs) has recently gained attention due to the heavy communication overhead of transmitting large model updates. Low Rank Adaptation (LoRA) has been proposed as a solution, yet its application in federated learning is complicated by discordance in aggregation. Existing methods addressing this discordance often suffer from performance degradation at low ranks in heterogeneous data settings. In response, we introduce LoRA-A2 (Low Rank Adaptation with Alternating freeze and Adaptive rank selection), which demonstrates robustness in challenging settings with low ranks and high data heterogeneity. Our experimental findings reveal that LoRA-A2 maintains performance even under extreme heterogeneity and low rank conditions, achieving up to a 99.8% reduction in uploaded parameters compared to full fine-tuning without compromising performance. This adaptive mechanism boosts robustness and communication efficiency in federated fine-tuning, enabling the practical deployment of LLMs in resource-constrained environments.

Submitted 30 October, 2024; originally announced October 2024.

arXiv:2410.19503 [pdf, other]

SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models

Authors: Jahyun Koo, Yerin Hwang, Yongil Kim, Taegwan Kang, Hyunkyung Bae, Kyomin Jung

Abstract: Despite the success of Large Language Models (LLMs), they still face challenges related to high inference costs and memory requirements. To address these issues, Knowledge Distillation (KD) has emerged as a popular method for model compression, with student-generated outputs (SGOs) as training data being particularly notable for reducing the mismatch between training and inference. However, SGOs often produce noisy and biased sequences, which can lead to misguidance from the teacher model, especially in long sequences. To mitigate these challenges, we propose SWITCH (Studying WIth TeaCHer for Knowledge Distillation), a novel approach that strategically incorporates the teacher model during the student's sequence generation. SWITCH identifies discrepancies between the token probabilities of the teacher and student models, allowing the teacher to intervene selectively, particularly in long sequences that are more prone to teacher misguidance. Extensive experimental results across three model families and five instruction-following datasets show that SWITCH surpasses traditional KD methods, particularly excelling in the generation of long sequential data.

Submitted 22 April, 2025; v1 submitted 25 October, 2024; originally announced October 2024.

Comments: NAACL 2025 Findings

arXiv:2410.17862 [pdf, other]

doi 10.3847/1538-4357/ad7816

The Most Massive Early-type Galaxies Exhibit Tidal Features More Frequently in Lower-density Environments

Authors: Yongmin Yoon, Jae-Woo Kim, Jongwan Ko

Abstract: The most massive early-type galaxies (ETGs) are known to form through numerous galaxy mergers. Thus, it is intriguing to study whether their formation in low-density environments, where nearby companions are almost absent, is associated with mergers, which are directly traced by tidal features. Using the 436 most massive ETGs with $M_\mathrm{star}>10^{11.2}\,M_{\odot}$ at $z<0.04$, we determine the variation in the fraction of massive ETGs with tidal features ($f_T$) across different environments and verify whether the most massive ETGs commonly have tidal features in very low density environments. Our main discovery is that the most massive ETGs exhibit tidal features more frequently in lower-density environments. In the highest-density environments, like galaxy clusters, $f_T$ is $0.21\pm0.06$, while in the lowest-density environments it triples to $0.62\pm0.06$. This trend is stronger for more extremely massive ETGs, with $f_T$ reaching $0.92\pm0.08$ in the lowest-density environments. One explanation for our finding is that the most massive ETGs in lower-density environments have genuinely experienced recent mergers more frequently than their counterparts in higher-density environments, suggesting that they possess extended formation histories that continue into the present. Another possibility is that tidal features last shorter in denser environments owing to external factors inherent in these environments. Our additional findings that massive ETGs with bluer $u-r$ colors are a more dominant driver of our main discovery and that dust lanes are more commonly observed in massive ETGs in low-density environments imply that gas-abundant mergers primarily contribute to the increased rate of recent mergers in low-density environments.

Submitted 23 October, 2024; originally announced October 2024.

Comments: 16 pages, 10 figures, published on October 18 in ApJ

Journal ref: The Astrophysical Journal, Volume 974, Issue 2, id. 299, 13 pp. (2024)

arXiv:2410.09362 [pdf, other]

SeRA: Self-Reviewing and Alignment of Large Language Models using Implicit Reward Margins

Authors: Jongwoo Ko, Saket Dingliwal, Bhavana Ganesh, Sailik Sengupta, Sravan Bodapati, Aram Galstyan

Abstract: Direct alignment algorithms (DAAs), such as direct preference optimization (DPO), have become popular alternatives for Reinforcement Learning from Human Feedback (RLHF) due to their simplicity, efficiency, and stability. However, the preferences used in DAAs are usually collected before the alignment training begins and remain unchanged (off-policy). This can lead to two problems, where the policy model (1) picks up on spurious correlations in the dataset (as opposed to learning the intended alignment expressed in the human preference labels), and (2) overfits to feedback on off-policy trajectories that are less likely to be generated by an updated policy model. To address these issues, we introduce Self-Reviewing and Alignment (SeRA), a cost-efficient and effective method that can be readily combined with existing DAAs. SeRA comprises two components: (1) sample selection using implicit reward margins, which helps alleviate overfitting to some undesired features, and (2) preference bootstrapping using implicit rewards to augment preference data with updated policy models in a cost-efficient manner. Extensive experimentation, including some on instruction-following tasks, demonstrates the effectiveness and generality of SeRA in training LLMs on offline preference datasets with DAAs.

Submitted 12 October, 2024; originally announced October 2024.

Showing 1–50 of 515 results for author: Koo, J