
Showing 1–50 of 987 results for author: Hwang, S

  1. arXiv:2506.09229  [pdf, ps, other]

    cs.CV

    Cross-Frame Representation Alignment for Fine-Tuning Video Diffusion Models

    Authors: Sungwon Hwang, Hyojin Jang, Kinam Kim, Minho Park, Jaegul Choo

    Abstract: Fine-tuning Video Diffusion Models (VDMs) at the user level to generate videos that reflect specific attributes of training data presents notable challenges, yet remains underexplored despite its practical importance. Meanwhile, recent work such as Representation Alignment (REPA) has shown promise in improving the convergence and quality of DiT-based image diffusion models by aligning, or assimila…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 24 pages, 25 figures

  2. arXiv:2506.07177  [pdf, ps, other]

    cs.CV cs.AI

    Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models

    Authors: Sangwon Jang, Taekyung Ki, Jaehyeong Jo, Jaehong Yoon, Soo Ye Kim, Zhe Lin, Sung Ju Hwang

    Abstract: Advancements in diffusion models have significantly improved video quality, directing attention to fine-grained controllability. However, many existing methods depend on fine-tuning large-scale video models for specific tasks, which becomes increasingly impractical as model sizes continue to grow. In this work, we present Frame Guidance, a training-free guidance for controllable video generation b…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: Project page: https://frame-guidance-video.github.io/

  3. arXiv:2506.05522  [pdf, other]

    cs.SI cs.HC

    Understanding Community-Level Blocklists in Decentralized Social Media

    Authors: Owen Xingjian Zhang, Sohyeon Hwang, Yuhan Liu, Manoel Horta Ribeiro, Andrés Monroy-Hernández

    Abstract: Community-level blocklists are key to content moderation practices in decentralized social media. These blocklists enable moderators to prevent other communities, such as those acting in bad faith, from interacting with their own -- and, if shared publicly, warn others about communities worth blocking. Prior work has examined blocklists in centralized social media, noting their potential for colle…

    Submitted 5 June, 2025; originally announced June 2025.

  4. arXiv:2506.05167  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    ECoRAG: Evidentiality-guided Compression for Long Context RAG

    Authors: Yeonseok Jeong, Jinsu Kim, Dohyeon Lee, Seung-won Hwang

    Abstract: Large Language Models (LLMs) have shown remarkable performance in Open-Domain Question Answering (ODQA) by leveraging external documents through Retrieval-Augmented Generation (RAG). To reduce the RAG overhead that comes from longer contexts, context compression is necessary. However, prior compression methods do not focus on filtering out non-evidential information, which limits the performance of LLM-based RAG.…

    Submitted 6 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  5. arXiv:2506.04704  [pdf, ps, other]

    cs.CV cs.AI

    HoliSafe: Holistic Safety Benchmarking and Modeling with Safety Meta Token for Vision-Language Model

    Authors: Youngwan Lee, Kangsan Kim, Kwanyong Park, Ilchae Jung, Soojin Jang, Seanie Lee, Yong-Ju Lee, Sung Ju Hwang

    Abstract: Despite emerging efforts to enhance the safety of Vision-Language Models (VLMs), current approaches face two main shortcomings. 1) Existing safety-tuning datasets and benchmarks only partially consider how image-text interactions can yield harmful content, often overlooking contextually unsafe outcomes from seemingly benign pairs. This narrow coverage leaves VLMs vulnerable to jailbreak attacks in…

    Submitted 11 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

    Comments: Project page: https://youngwanlee.github.io/holisafe

  6. arXiv:2506.04288  [pdf, ps, other]

    cs.LG

    Backbone Augmented Training for Adaptations

    Authors: Jae Wan Park, Junhyeok Kim, Youngjun Jun, Hyunah Ko, Seong Jae Hwang

    Abstract: Adaptations facilitate efficient training of large backbone models, including diffusion models for image generation and transformer-based language models. While various adaptation techniques enhance performance with minimal computational resources, limited adaptation data often leads to challenges in training. To address this, we focus on the enormous amount of backbone data used to pre-train the…

    Submitted 4 June, 2025; originally announced June 2025.

  7. arXiv:2506.03678  [pdf, ps, other]

    astro-ph.GA

    Searching for Dark Galaxies with HI detection from the Arecibo Legacy Fast ALFA (ALFALFA) survey

    Authors: Minseong Kwon, Ho Seong Hwang, Brian R. Kent, Ilsang Yoon, Gain Lee, Hyein Yoon

    Abstract: We present a catalog of 142 dark galaxy candidates in a region covered by the Arecibo Legacy Fast ALFA (ALFALFA) survey. We start with 344 ALFALFA HI sources without optical counterparts and remove those that do not seem to have dark galaxy origin. To do that, we first eliminate 83 sources that are known HI clouds probably formed from tidal interactions between galaxies and 13 sources that have op…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 15 pages, 8 figures, 2 tables, Accepted for publication in ApJS

  8. arXiv:2506.00910  [pdf, other]

    cs.LG cs.AI

    PCoreSet: Effective Active Learning through Knowledge Distillation from Vision-Language Models

    Authors: Seongjae Kang, Dong Bok Lee, Hyungjoon Jang, Dongseop Kim, Sung Ju Hwang

    Abstract: Knowledge distillation (KD) is a widely used framework for training compact, task-specific models by leveraging the knowledge of teacher models. However, its application to active learning (AL), which aims to minimize annotation costs through iterative sample selection, remains underexplored. This gap stems from the fact that KD typically assumes access to sufficient labeled data, whereas AL opera…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: 35 pages, 30 figures

  9. arXiv:2506.00607  [pdf, ps, other]

    cs.CV cs.AI

    Parallel Rescaling: Rebalancing Consistency Guidance for Personalized Diffusion Models

    Authors: JungWoo Chae, Jiyoon Kim, Sangheum Hwang

    Abstract: Personalizing diffusion models to specific users or concepts remains challenging, particularly when only a few reference images are available. Existing methods such as DreamBooth and Textual Inversion often overfit to limited data, causing misalignment between generated images and text prompts when attempting to balance identity fidelity with prompt adherence. While Direct Consistency Optimization…

    Submitted 31 May, 2025; originally announced June 2025.

  10. arXiv:2506.00232  [pdf, ps, other]

    cs.CL

    ComposeRAG: A Modular and Composable RAG for Corpus-Grounded Multi-Hop Question Answering

    Authors: Ruofan Wu, Youngwon Lee, Fan Shu, Danmei Xu, Seung-won Hwang, Zhewei Yao, Yuxiong He, Feng Yan

    Abstract: Retrieval-Augmented Generation (RAG) systems are increasingly diverse, yet many suffer from monolithic designs that tightly couple core functions like query reformulation, retrieval, reasoning, and verification. This limits their interpretability, systematic evaluation, and targeted improvement, especially for complex multi-hop question answering. We introduce ComposeRAG, a novel modular abstracti…

    Submitted 30 May, 2025; originally announced June 2025.

  11. arXiv:2505.24553  [pdf, ps, other]

    cs.CL cs.AI

    CREFT: Sequential Multi-Agent LLM for Character Relation Extraction

    Authors: Ye Eun Chun, Taeyoon Hwang, Seung-won Hwang, Byung-Hak Kim

    Abstract: Understanding complex character relations is crucial for narrative analysis and efficient script evaluation, yet existing extraction methods often fail to handle long-form narratives with nuanced interactions. To address this challenge, we present CREFT, a novel sequential framework leveraging specialized Large Language Model (LLM) agents. First, CREFT builds a base character graph through knowled…

    Submitted 30 May, 2025; originally announced May 2025.

  12. arXiv:2505.23059  [pdf, ps, other]

    cs.IR cs.AI

    From Token to Action: State Machine Reasoning to Mitigate Overthinking in Information Retrieval

    Authors: Dohyeon Lee, Yeonseok Jeong, Seung-won Hwang

    Abstract: Chain-of-Thought (CoT) prompting enables complex reasoning in large language models (LLMs), including applications in information retrieval (IR). However, it often leads to overthinking, where models produce excessively long and semantically redundant traces with little or no benefit. We identify two key challenges in IR: redundant trajectories that revisit similar states and misguided reasoning t…

    Submitted 29 May, 2025; originally announced May 2025.

  13. arXiv:2505.23032  [pdf, ps, other]

    cs.LG cs.AI

    Bayesian Neural Scaling Law Extrapolation with Prior-Fitted Networks

    Authors: Dongwoo Lee, Dong Bok Lee, Steven Adriaensen, Juho Lee, Sung Ju Hwang, Frank Hutter, Seon Joo Kim, Hae Beom Lee

    Abstract: Scaling has been a major driver of recent advancements in deep learning. Numerous empirical studies have found that scaling laws often follow the power-law and proposed several variants of power-law functions to predict the scaling behavior at larger scales. However, existing methods mostly rely on point estimation and do not quantify uncertainty, which is crucial for real-world applications invol…

    Submitted 11 June, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted to ICML 2025

  14. arXiv:2505.22962  [pdf, ps, other]

    cs.SI cs.CY cs.HC

    Seeing the Politics of Decentralized Social Media Protocols

    Authors: Tolulope Oshinowo, Sohyeon Hwang, Amy X. Zhang, Andrés Monroy-Hernández

    Abstract: Calls to decentralize feed-based social media have been driven by concerns about the concentrated power of centralized platforms and their societal impact. In response, numerous decentralized social media protocols have emerged, each interpreting "decentralization" in different ways. We analyze four such protocols -- ActivityPub, AT Protocol, Nostr, and Farcaster -- to develop a novel conceptual f…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 22 pages, 6 figures, 3 tables

  15. arXiv:2505.19764  [pdf, ps, other]

    cs.LG cs.AI

    Agentic Predictor: Performance Prediction for Agentic Workflows via Multi-View Encoding

    Authors: Patara Trirat, Wonyong Jeong, Sung Ju Hwang

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, but optimizing LLM-based agentic systems remains challenging due to the vast search space of agent configurations, prompting strategies, and communication patterns. Existing approaches often rely on heuristic-based tuning or exhaustive evaluation, which can be computationally expensive and suboptimal. This…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Code will be available at https://github.com/DeepAuto-AI/agentic-predictor

  16. arXiv:2505.18512  [pdf, ps, other]

    cs.IR cs.AI cs.CL cs.LG

    AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking

    Authors: Soyoung Yoon, Gyuwan Kim, Gyu-Hwung Cho, Seung-won Hwang

    Abstract: Listwise reranking with large language models (LLMs) enhances top-ranked results in retrieval-based applications. Due to the limit in context size and high inference cost of long context, reranking is typically performed over a fixed size of small subsets, with the final ranking aggregated from these partial results. This fixed computation disregards query difficulty and document distribution, lea…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: 22 pages, 3 figures. The first two authors contributed equally. Author order is randomly determined via coin toss

  17. arXiv:2505.18318  [pdf, ps, other]

    cs.HC

    The Relational Origins of Rules in Online Communities

    Authors: Charles Kiene, Sohyeon Hwang, Nathan TeBlunthuis, Carl Colglazier, Aaron Shaw, Benjamin Mako Hill

    Abstract: Where do rules come from in online communities? While prior studies of online community governance in social computing have sought to characterize rules by their functions within communities and documented practices of rule enforcement, they have largely overlooked rule adoption and change. This study investigates how and why online communities adopt and change their rules. We conducted a grounded…

    Submitted 23 May, 2025; originally announced May 2025.

  18. arXiv:2505.17612  [pdf, other]

    cs.CL cs.AI

    Distilling LLM Agent into Small Models with Retrieval and Code Tools

    Authors: Minki Kang, Jongwon Jeong, Seanie Lee, Jaewoong Cho, Sung Ju Hwang

    Abstract: Large language models (LLMs) excel at complex reasoning tasks but remain computationally expensive, limiting their practical deployment. To address this, recent works have focused on distilling reasoning capabilities into smaller language models (sLMs) using chain-of-thought (CoT) traces from teacher LLMs. However, this approach struggles in scenarios requiring rare factual knowledge or precise co…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: preprint, v1

  19. arXiv:2505.16631  [pdf, other]

    cs.IR cs.CL

    MiLQ: Benchmarking IR Models for Bilingual Web Search with Mixed Language Queries

    Authors: Jonghwi Kim, Deokhyung Kang, Seonjeong Hwang, Yunsu Kim, Jungseul Ok, Gary Lee

    Abstract: Despite bilingual speakers frequently using mixed-language queries in web searches, Information Retrieval (IR) research on them remains scarce. To address this, we introduce MiLQ (Mixed-Language Query test set), the first public benchmark of mixed-language queries, confirmed as realistic and highly preferred. Experiments show that multilingual IR models perform moderately on MiLQ and inconsistently…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 16 pages, 9 figures

  20. arXiv:2505.16223  [pdf, ps, other]

    cs.AI cs.LG

    MADCluster: Model-agnostic Anomaly Detection with Self-supervised Clustering Network

    Authors: Sangyong Lee, Subo Hwang, Dohoon Kim

    Abstract: In this paper, we propose MADCluster, a novel model-agnostic anomaly detection framework utilizing self-supervised clustering. MADCluster is applicable to various deep learning architectures and addresses the 'hypersphere collapse' problem inherent in existing deep learning-based anomaly detection methods. The core idea is to cluster normal pattern data into a 'single cluster' while simultaneously…

    Submitted 11 June, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: 24 pages, 9 figures

  21. arXiv:2505.15137  [pdf, ps, other]

    cs.CV

    Multispectral Detection Transformer with Infrared-Centric Sensor Fusion

    Authors: Seongmin Hwang, Daeyoung Han, Moongu Jeon

    Abstract: Multispectral object detection aims to leverage complementary information from visible (RGB) and infrared (IR) modalities to enable robust performance under diverse environmental conditions. In this letter, we propose IC-Fusion, a multispectral object detector that effectively fuses visible and infrared features through a lightweight and modality-aware design. Motivated by wavelet analysis and empi…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Under Review

  22. arXiv:2505.12805  [pdf, other]

    cs.LG cs.AI

    FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA

    Authors: Seanie Lee, Sangwoo Park, Dong Bok Lee, Dominik Wagner, Haebin Seong, Tobias Bocklet, Juho Lee, Sung Ju Hwang

    Abstract: Low-Rank Adaptation (LoRA), which introduces a product of two trainable low-rank matrices into frozen pre-trained weights, is widely used for efficient fine-tuning of language models in federated learning (FL). However, when combined with differentially private stochastic gradient descent (DP-SGD), LoRA faces substantial noise amplification: DP-SGD perturbs per-sample gradients, and the matrix mul…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: preprint

  23. arXiv:2505.12780  [pdf, other]

    cs.HC

    Beyond Individual UX: Defining Group Experience (GX) as a New Paradigm for Group-centered AI

    Authors: Soohwan Lee, Seoyeong Hwang, Kyungho Lee

    Abstract: Recent advancements in HCI and AI have predominantly centered on individual user experiences, often neglecting the emergent dynamics of group interactions. This provocation introduces Group Experience (GX) to capture the collective perceptual, emotional, and cognitive dimensions that arise when individuals interact in cohesive groups. We challenge the conventional Human-centered AI paradigm and pro…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted at DIS'25 Companion (Provocations)

  24. arXiv:2505.12233  [pdf, ps, other]

    eess.IV cs.CV

    PRETI: Patient-Aware Retinal Foundation Model via Metadata-Guided Representation Learning

    Authors: Yeonkyung Lee, Woojung Han, Youngjun Jun, Hyeonmin Kim, Jungkyung Cho, Seong Jae Hwang

    Abstract: Retinal foundation models have significantly advanced retinal image analysis by leveraging self-supervised learning to reduce dependence on labeled data while achieving strong generalization. Many recent approaches enhance retinal image understanding using report supervision, but obtaining clinical reports is often costly and challenging. In contrast, metadata (e.g., age, gender) is widely availab…

    Submitted 18 May, 2025; originally announced May 2025.

    Comments: MICCAI 2025 early accept

  25. arXiv:2505.11254  [pdf, ps, other]

    cs.LG

    Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction

    Authors: Jeffrey Willette, Heejun Lee, Sung Ju Hwang

    Abstract: The attention mechanism of a transformer has a quadratic complexity, leading to high inference costs and latency for long sequences. However, attention matrices are mostly sparse, which implies that many entries may be omitted from computation for efficient inference. Sparse attention inference methods aim to reduce this computational burden; however, they also come with a troublesome performance…

    Submitted 16 May, 2025; originally announced May 2025.

  26. arXiv:2505.10840  [pdf, other]

    astro-ph.CO

    Redshift Evolution of the Intrinsic Alignments of Galaxies and Subhalos in the Horizon Run 5 Simulation

    Authors: Sanghyeon Han, Motonari Tonegawa, Ho Seong Hwang, Yohan Dubois, Juhan Kim, Yonghwi Kim, Oh-Kyoung Kwon, Jaehyun Lee, Owain N. Snaith, Brad K. Gibson, Changbom Park

    Abstract: We investigate the redshift evolution of intrinsic alignments of the shapes of galaxies and subhalos with the large-scale structures of the universe using the cosmological hydrodynamic simulation Horizon Run 5. To this end, early-type galaxies are selected from the simulated galaxy catalogs based on stellar mass and kinematic morphology. The shapes of galaxies and subhalos are computed…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: 21 pages, 26 figures, submitted to ApJ

  27. arXiv:2505.09666  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    System Prompt Optimization with Meta-Learning

    Authors: Yumin Choi, Jinheon Baek, Sung Ju Hwang

    Abstract: Large Language Models (LLMs) have shown remarkable capabilities, with optimizing their input prompts playing a pivotal role in maximizing their performance. However, while LLM prompts consist of both the task-agnostic system prompts and task-specific user prompts, existing work on prompt optimization has focused on user prompts specific to individual queries or tasks, and largely overlooked the sy…

    Submitted 14 May, 2025; originally announced May 2025.

  28. arXiv:2505.08528  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    GradMix: Gradient-based Selective Mixup for Robust Data Augmentation in Class-Incremental Learning

    Authors: Minsu Kim, Seong-Hyeon Hwang, Steven Euijong Whang

    Abstract: In the context of continual learning, acquiring new knowledge while maintaining previous knowledge presents a significant challenge. Existing methods often use experience replay techniques that store a small portion of previous task data for training. In experience replay approaches, data augmentation has emerged as a promising strategy to further improve the model performance by mixing limited pr…

    Submitted 13 May, 2025; originally announced May 2025.

  29. arXiv:2505.07675  [pdf, other]

    cs.LG cs.AI cs.CV

    Simple Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization

    Authors: Seongjae Kang, Dong Bok Lee, Hyungjoon Jang, Sung Ju Hwang

    Abstract: Vision-language models (VLMs) have achieved remarkable success across diverse tasks by leveraging rich textual information with minimal labeled data. However, deploying such large models remains challenging, particularly in resource-constrained environments. Knowledge distillation (KD) offers a well-established solution to this problem; however, recent KD approaches from VLMs often involve multi-s…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 41 pages, 19 figures, preprint

  30. arXiv:2505.06951  [pdf, ps, other]

    cs.CV cs.RO

    Boosting Cross-spectral Unsupervised Domain Adaptation for Thermal Semantic Segmentation

    Authors: Seokjun Kwon, Jeongmin Shin, Namil Kim, Soonmin Hwang, Yukyung Choi

    Abstract: In autonomous driving, thermal image semantic segmentation has emerged as a critical research area, owing to its ability to provide robust scene understanding under adverse visual conditions. In particular, unsupervised domain adaptation (UDA) for thermal image segmentation can be an efficient solution to address the lack of labeled thermal datasets. Nevertheless, since these methods do not effect…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 7 pages, 4 figures, International Conference on Robotics and Automation (ICRA) 2025

  31. arXiv:2505.03905  [pdf, other]

    cond-mat.mtrl-sci physics.app-ph

    Directional Thermal Emission Across Both Polarizations in Planar Photonic Architectures

    Authors: David E. Abraham, Daniel Cui, Baolai Liang, Jae S. Hwang, Parthiban Santhanam, Linus Kim, Rayen Lin, Aaswath P. Raman

    Abstract: Directional and spectral control of thermal emission is essential for applications in energy conversion, imaging, and sensing. Existing planar, lithography-free epsilon-near-zero (ENZ) films only support transverse-magnetic (TM) control of thermal emission via the Berreman mode and cannot address transverse-electric (TE) waves due to the absence of natural optical magnetism over optical and infrar…

    Submitted 6 May, 2025; originally announced May 2025.

  32. arXiv:2505.01710  [pdf, other]

    astro-ph.CO astro-ph.IM

    RVSNUpy: A Python Package for Spectroscopic Redshift Measurement Based on Cross-Correlation

    Authors: Taewan Kim, Jubee Sohn, Ho Seong Hwang

    Abstract: We introduce RVSNUpy, a new Python package designed to measure spectroscopic redshifts. Based on inverse-variance weighted cross-correlation, RVSNUpy determines the redshifts by comparing observed spectra with various rest-frame template spectra. We test the performance of RVSNUpy based on ~6000 objects in the HectoMAP redshift survey observed with both SDSS and MMT/Hectospec. We demonstrate that…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: 12 figures

  33. arXiv:2504.21850  [pdf, other]

    cs.CV

    COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning

    Authors: Xindi Wu, Hee Seung Hwang, Polina Kirichenko, Olga Russakovsky

    Abstract: Multimodal Large Language Models (MLLMs) excel at simple vision-language tasks but struggle when faced with complex tasks that require multiple capabilities, such as simultaneously recognizing objects, counting them, and understanding their spatial relationships. This might be partially the result of the fact that Visual Instruction Tuning (VIT), a critical training step for MLLMs, has traditional…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: 17 pages, 13 figures

  34. arXiv:2504.20734  [pdf, other]

    cs.CL cs.AI cs.CV cs.IR cs.LG

    UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities

    Authors: Woongyeong Yeo, Kangsan Kim, Soyeong Jeong, Jinheon Baek, Sung Ju Hwang

    Abstract: Retrieval-Augmented Generation (RAG) has shown substantial promise in improving factual accuracy by grounding model responses with external knowledge relevant to queries. However, most existing RAG approaches are limited to a text-only corpus, and while recent efforts have extended RAG to other modalities such as images and videos, they typically operate over a single modality-specific corpus. In…

    Submitted 19 May, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

    Comments: Project page: https://universalrag.github.io

  35. arXiv:2504.19574  [pdf, other]

    cs.CV

    DG-DETR: Toward Domain Generalized Detection Transformer

    Authors: Seongmin Hwang, Daeyoung Han, Moongu Jeon

    Abstract: End-to-end Transformer-based detectors (DETRs) have demonstrated strong detection performance. However, domain generalization (DG) research has primarily focused on convolutional neural network (CNN)-based detectors, while paying little attention to enhancing the robustness of DETRs. In this letter, we introduce a Domain Generalized DEtection TRansformer (DG-DETR), a simple, effective, and plug-an…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: Under Review

  36. arXiv:2504.18157  [pdf, other]

    eess.AS cs.SD

    DOSE: Drum One-Shot Extraction from Music Mixture

    Authors: Suntae Hwang, Seonghyeon Kang, Kyungsu Kim, Semin Ahn, Kyogu Lee

    Abstract: Drum one-shot samples are crucial for music production, particularly in sound design and electronic music. This paper introduces Drum One-Shot Extraction, a task in which the goal is to extract drum one-shots that are present in the music mixture. To facilitate this, we propose the Random Mixture One-shot Dataset (RMOD), comprising large-scale, randomly arranged music mixtures paired with correspo…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: Published in IEEE ICASSP 2025

  37. arXiv:2504.17219  [pdf, other]

    cs.LG cs.AI cs.CR

    Enhancing Variational Autoencoders with Smooth Robust Latent Encoding

    Authors: Hyomin Lee, Minseon Kim, Sangwon Jang, Jongheon Jeong, Sung Ju Hwang

    Abstract: Variational Autoencoders (VAEs) have played a key role in scaling up diffusion-based generative models, as in Stable Diffusion, yet questions regarding their robustness remain largely underexplored. Although adversarial training has been an established technique for enhancing robustness in predictive models, it has been overlooked for generative models due to concerns about potential fidelity degr…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: Under review

  38. arXiv:2504.17192  [pdf, other]

    cs.CL

    Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

    Authors: Minju Seo, Jinheon Baek, Seongyun Lee, Sung Ju Hwang

    Abstract: Despite the rapid growth of machine learning research, corresponding code implementations are often unavailable, making it slow and labor-intensive for researchers to reproduce results and build upon prior work. In the meantime, recent Large Language Models (LLMs) excel at understanding scientific documents and generating high-quality code. Inspired by this, we introduce PaperCoder, a multi-agent…

    Submitted 18 May, 2025; v1 submitted 23 April, 2025; originally announced April 2025.

  39. arXiv:2504.15192  [pdf]

    cs.CV cs.AI

    Breast density in MRI: an AI-based quantification and relationship to assessment in mammography

    Authors: Yaqian Chen, Lin Li, Hanxue Gu, Haoyu Dong, Derek L. Nguyen, Allan D. Kirk, Maciej A. Mazurowski, E. Shelley Hwang

    Abstract: Mammographic breast density is a well-established risk factor for breast cancer. Recently there has been interest in breast MRI as an adjunct to mammography, as this modality provides an orthogonal and highly quantitative assessment of breast tissue. However, its 3D nature poses analytic challenges related to delineating and aggregating complex structures across slices. Here, we applied an in-hous…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 13 pages, 5 figures

  40. arXiv:2504.14893  [pdf, other]

    cs.AR

    Hardware-based Heterogeneous Memory Management for Large Language Model Inference

    Authors: Soojin Hwang, Jungwoo Kim, Sanghyeon Lee, Hongbeen Kim, Jaehyuk Huh

    Abstract: A large language model (LLM) is one of the most important emerging machine learning applications nowadays. However, due to its huge model size and runtime increase of the memory footprint, LLM inferences suffer from the lack of memory capacity in conventional systems consisting of multiple GPUs with a modest amount of high bandwidth memory. Moreover, since LLM contains many bandwidth-intensive kern…

    Submitted 21 April, 2025; originally announced April 2025.

  41. arXiv:2504.14396  [pdf, other]

    cs.CV

    SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation

    Authors: Minho Park, Taewoong Kang, Jooyeol Yun, Sungwon Hwang, Jaegul Choo

    Abstract: The increasing demand for AR/VR applications has highlighted the need for high-quality 360-degree panoramic content. However, generating high-quality 360-degree panoramic images and videos remains a challenging task due to the severe distortions introduced by equirectangular projection (ERP). Existing approaches either fine-tune pretrained diffusion models on limited ERP datasets or attempt tuning…

    Submitted 19 April, 2025; originally announced April 2025.

  42. arXiv:2504.09097  [pdf, other]

    cs.CV

    BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting

    Authors: Jeongwan On, Kyeonghwan Gwak, Gunyoung Kang, Junuk Cha, Soohyun Hwang, Hyein Hwang, Seungryul Baek

    Abstract: Reconstructing 3D hand-object interaction (HOI) is a fundamental problem that can find numerous applications. Despite recent advances, there is no comprehensive pipeline yet for bimanual class-agnostic interaction reconstruction from a monocular RGB video, where two hands and an unknown object are interacting with each other. Previous works tackled the limited hand-object interaction case, whe…

    Submitted 12 April, 2025; originally announced April 2025.

    Comments: Accepted to CVPR 2025

  43. arXiv:2504.06028  [pdf, other]

    q-fin.CP q-fin.ST

    A Mean-Reverting Model of Exchange Rate Risk Premium Using Ornstein-Uhlenbeck Dynamics

    Authors: SeungJae Hwang

    Abstract: This paper examines the empirical failure of uncovered interest parity (UIP) and proposes a structural explanation based on a mean-reverting risk premium. We define a realized premium as the deviation between observed exchange rate returns and the interest rate differential, and demonstrate its strong mean-reverting behavior across multiple horizons. Motivated by this pattern, we model the risk pr…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: 7 pages, 5 figures. Includes empirical backtesting of a continuous-time stochastic model. Independent undergraduate research

    MSC Class: 91G80; 60J60

  44. arXiv:2504.05616  [pdf, other]

    astro-ph.GA astro-ph.CO

    A Redshift Survey of the Coma Cluster (A1656): Understanding the Nature of Subhalos in the Weak-lensing Map

    Authors: Wooseok Kang, Ho Seong Hwang, Nobuhiro Okabe, Changbom Park

    Abstract: We study the physical properties of weak-lensing subhalos in the Coma cluster of galaxies using data from galaxy redshift surveys. The data include 12989 galaxies with measured spectroscopic redshifts (2184 from our MMT/Hectospec observation and 10807 from the literature). The $r$-band magnitude limit at which the differential spectroscopic completeness drops below 50% is 20.2 mag, which is spatia…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: 23 pages excluding appendix, 14 figures, accepted for publication in ApJS. 2 figure sets to be published as online-only materials are included in the appendix

  45. arXiv:2504.02012  [pdf, other]

    cs.LG

    Instruction-Guided Autoregressive Neural Network Parameter Generation

    Authors: Soro Bedionita, Bruno Andreis, Song Chong, Sung Ju Hwang

    Abstract: Learning to generate neural network parameters conditioned on task descriptions and architecture specifications is pivotal for advancing model adaptability and transfer learning. Existing methods, especially those based on diffusion models, suffer from limited scalability to large architectures, rigidity in handling varying network depths, and disjointed parameter generation that undermines inter-la…

    Submitted 2 April, 2025; originally announced April 2025.

  46. arXiv:2503.22168  [pdf, other]

    cs.CV

    Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis

    Authors: Woojung Han, Yeonkyung Lee, Chanyoung Kim, Kwanghyun Park, Seong Jae Hwang

    Abstract: Diffusion-based text-to-image (T2I) models have recently excelled in high-quality image generation, particularly in a training-free manner, enabling cost-effective adaptability and generalization across diverse tasks. However, while existing methods have continuously focused on several challenges, such as "missing objects" and "mismatched attributes," another critical issue of "mislocate…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  47. arXiv:2503.22163  [pdf, other]

    cs.LG

    T-CIL: Temperature Scaling using Adversarial Perturbation for Calibration in Class-Incremental Learning

    Authors: Seong-Hyeon Hwang, Minsu Kim, Steven Euijong Whang

    Abstract: We study model confidence calibration in class-incremental learning, where models learn from sequential tasks with different class sets. While existing works primarily focus on accuracy, maintaining calibrated confidence has been largely overlooked. Unfortunately, most post-hoc calibration techniques are not designed to work with the limited memories of old-task data typical in class-incremental l…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025

  48. Design of Seamless Multi-modal Interaction Framework for Intelligent Virtual Agents in Wearable Mixed Reality Environment

    Authors: Ghazanfar Ali, Hong-Quan Le, Junho Kim, Seoung-won Hwang, Jae-In Hwang

    Abstract: In this paper, we present the design of a multimodal interaction framework for intelligent virtual agents in wearable mixed reality environments, especially for interactive applications at museums, botanical gardens, and similar places. These places need engaging and non-repetitive digital content delivery to maximize user involvement. An intelligent virtual agent is a promising mode for both purpo…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: 6 pages, 14 Figures, Computer Animation and Social Agents (CASA 2019)

    Journal ref: CASA 2019: Proceedings of the 32nd International Conference on Computer Animation and Social Agents - Year 2019 - Pages 47 - 52

  49. arXiv:2503.18817  [pdf, other]

    cs.CV cs.AI

    Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations

    Authors: Jeonghyeon Kim, Sangheum Hwang

    Abstract: Prior research on out-of-distribution detection (OoDD) has primarily focused on single-modality models. Recently, with the advent of large-scale pretrained vision-language models such as CLIP, OoDD methods utilizing such multi-modal representations through zero-shot and prompt learning strategies have emerged. However, these methods typically involve either freezing the pretrained weights or only…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  50. arXiv:2503.18642  [pdf, other]

    eess.IV cs.CV

    Rethinking Glaucoma Calibration: Voting-Based Binocular and Metadata Integration

    Authors: Taejin Jeong, Joohyeok Kim, Jaehoon Joo, Yeonwoo Jung, Hyeonmin Kim, Seong Jae Hwang

    Abstract: Glaucoma is an incurable ophthalmic disease that damages the optic nerve, leads to vision loss, and ranks among the leading causes of blindness worldwide. Diagnosing glaucoma typically involves fundus photography, optical coherence tomography (OCT), and visual field testing. However, the high cost of OCT often leads to reliance on fundus photography and visual field testing, both of which exhibit…

    Submitted 24 March, 2025; originally announced March 2025.