Showing 1–50 of 114 results for author: Im, S

  1. arXiv:2507.04482

    cs.CV

    A Training-Free Style-Personalization via Scale-wise Autoregressive Model

    Authors: Kyoungmin Lee, Jihun Park, Jongmin Gim, Wonhyeok Choi, Kyumin Hwang, Jaeyeul Kim, Sunghoon Im

    Abstract: We present a training-free framework for style-personalized image generation that controls content and style information during inference using a scale-wise autoregressive model. Our method employs a three-path design--content, style, and generation--each guided by a corresponding text prompt, enabling flexible and efficient control over image semantics without any additional training. A central c…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: 13 pages, 10 figures

  2. arXiv:2507.02145

    cs.CL cs.AI

    Reasoning or Not? A Comprehensive Evaluation of Reasoning LLMs for Dialogue Summarization

    Authors: Keyan Jin, Yapeng Wang, Leonel Santos, Tao Fang, Xu Yang, Sio Kei Im, Hugo Gonçalo Oliveira

    Abstract: Dialogue summarization is a challenging task with significant practical value in customer service, meeting analysis, and conversational AI. Although large language models (LLMs) have achieved substantial progress in summarization tasks, the performance of step-by-step reasoning architectures, specifically Long Chain-of-Thought (CoT) implementations such as OpenAI-o1 and DeepSeek-R1, remains unexplor…

    Submitted 2 July, 2025; originally announced July 2025.

  3. arXiv:2507.00474

    cs.CV

    ADAptation: Reconstruction-based Unsupervised Active Learning for Breast Ultrasound Diagnosis

    Authors: Yaofei Duan, Yuhao Huang, Xin Yang, Luyi Han, Xinyu Xie, Zhiyuan Zhu, Ping He, Ka-Hou Chan, Ligang Cui, Sio-Kei Im, Dong Ni, Tao Tan

    Abstract: Deep learning-based diagnostic models often suffer performance drops due to distribution shifts between training (source) and test (target) domains. Collecting and labeling sufficient target domain data for model retraining represents an optimal solution, yet is limited by time and scarce resources. Active learning (AL) offers an efficient approach to reduce annotation costs while maintaining perf…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 11 pages, 4 figures, 4 tables. Accepted by conference MICCAI2025

  4. arXiv:2506.07510

    cs.CL

    DeRAGEC: Denoising Named Entity Candidates with Synthetic Rationale for ASR Error Correction

    Authors: Solee Im, Wonjun Lee, Jinmyeong An, Yunsu Kim, Jungseul Ok, Gary Geunbae Lee

    Abstract: We present DeRAGEC, a method for improving Named Entity (NE) correction in Automatic Speech Recognition (ASR) systems. By extending the Retrieval-Augmented Generative Error Correction (RAGEC) framework, DeRAGEC employs synthetic denoising rationales to filter out noisy NE candidates before correction. By leveraging phonetic similarity and augmented definitions, it refines noisy retrieved NEs using…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: ACL2025 Findings

  5. arXiv:2505.23400

    cs.CV

    Bridging Geometric and Semantic Foundation Models for Generalized Monocular Depth Estimation

    Authors: Sanggyun Ma, Wonjoon Choi, Jihun Park, Jaeyeul Kim, Seunghun Lee, Jiwan Seo, Sunghoon Im

    Abstract: We present Bridging Geometric and Semantic (BriGeS), an effective method that fuses geometric and semantic information within foundation models to enhance Monocular Depth Estimation (MDE). Central to BriGeS is the Bridging Gate, which integrates the complementary strengths of depth and segmentation foundation models. This integration is further refined by our Attention Temperature Scaling techniqu…

    Submitted 29 May, 2025; originally announced May 2025.

  6. arXiv:2505.21364

    cs.LG cs.AI

    Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders

    Authors: James Oldfield, Shawn Im, Yixuan Li, Mihalis A. Nicolaou, Ioannis Patras, Grigorios G Chrysos

    Abstract: Multilayer perceptrons (MLPs) are an integral part of large language models, yet their dense representations render them difficult to understand, edit, and steer. Recent methods learn interpretable approximations via neuron-level sparsity, yet fail to faithfully reconstruct the original mapping, significantly increasing the model's next-token cross-entropy loss. In this paper, we advocate for moving t…

    Submitted 27 May, 2025; originally announced May 2025.

  7. arXiv:2505.13946

    cs.AI

    Visual Instruction Bottleneck Tuning

    Authors: Changdae Oh, Jiatong Li, Shawn Im, Yixuan Li

    Abstract: Despite widespread adoption, multimodal large language models (MLLMs) suffer performance degradation when encountering unfamiliar queries under distribution shifts. Existing methods to improve MLLM generalization typically require either more instruction data or larger advanced model architectures, both of which incur non-trivial human labor or computational costs. In this work, we take an alterna…

    Submitted 20 May, 2025; originally announced May 2025.

  8. arXiv:2504.18503

    cs.DS

    Online Distributed Queue Length Estimation

    Authors: Aditya Bhaskara, Sreenivas Gollapudi, Sungjin Im, Kostas Kollias, Kamesh Munagala

    Abstract: Queue length monitoring is a commonly arising problem in numerous applications such as queue management systems, scheduling, and traffic monitoring. Motivated by such applications, we formulate a queue monitoring problem, where there is a FIFO queue with arbitrary arrivals and departures, and a server needs to monitor the length of a queue by using decentralized pings from packets in the queue. Pa…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: To appear in SIAM Conference on Applied and Computational Discrete Algorithms (ACDA) 2025

  9. arXiv:2504.06144

    cs.CV

    A Training-Free Style-aligned Image Generation with Scale-wise Autoregressive Model

    Authors: Jihun Park, Jongmin Gim, Kyoungmin Lee, Minseok Oh, Minwoo Choi, Jaeyeul Kim, Woo Chool Park, Sunghoon Im

    Abstract: We present a training-free style-aligned image generation method that leverages a scale-wise autoregressive model. While large-scale text-to-image (T2I) models, particularly diffusion-based methods, have demonstrated impressive generation quality, they often suffer from style misalignment across generated image sets and slow inference speeds, limiting their practical usability. To address these is…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: 17 pages, 15 figures

  10. arXiv:2504.02770

    cs.DB cs.DS

    Efficient Algorithms for Cardinality Estimation and Conjunctive Query Evaluation With Simple Degree Constraints

    Authors: Sungjin Im, Benjamin Moseley, Hung Q. Ngo, Kirk Pruhs

    Abstract: Cardinality estimation and conjunctive query evaluation are two of the most fundamental problems in database query processing. Recent work proposed, studied, and implemented a robust and practical information-theoretic cardinality estimation framework. In this framework, the estimator is the cardinality upper bound of a conjunctive query subject to "degree-constraints", which model a rich set of…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: text overlap with arXiv:2211.08381

  11. arXiv:2503.22209

    cs.CV cs.LG

    Intrinsic Image Decomposition for Robust Self-supervised Monocular Depth Estimation on Reflective Surfaces

    Authors: Wonhyeok Choi, Kyumin Hwang, Minwoo Choi, Kiljoon Han, Wonjoon Choi, Mingyu Shin, Sunghoon Im

    Abstract: Self-supervised monocular depth estimation (SSMDE) has gained attention in the field of deep learning as it estimates depth without requiring ground truth depth maps. This approach typically uses a photometric consistency loss between a synthesized image, generated from the estimated depth, and the original image, thereby reducing the need for extensive dataset acquisition. However, the convention…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: Accepted at AAAI 2025

  12. arXiv:2502.21001

    cs.CV

    Towards Lossless Implicit Neural Representation via Bit Plane Decomposition

    Authors: Woo Kyoung Han, Byeonghun Lee, Hyunmin Cho, Sunghoon Im, Kyong Hwan Jin

    Abstract: We quantify the upper bound on the size of the implicit neural representation (INR) model from a digital perspective. The upper bound of the model size increases exponentially as the required bit-precision increases. To this end, we present a bit-plane decomposition method that makes INR predict bit-planes, producing the same effect as reducing the upper bound of the model size. We validate our hy…

    Submitted 20 March, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

  13. arXiv:2502.18006

    quant-ph cs.ET cs.MM

    Adaptive Quantum Scaling Model for Histogram Distribution-based Quantum Watermarking

    Authors: Zheng Xing, Chan-Tong Lam, Xiaochen Yuan, Sio-Kei Im, Penousal Machado

    Abstract: The development of quantum image representation and quantum measurement techniques has made quantum image processing research a hot topic. In this paper, a novel Adaptive Quantum Scaling Model (AQSM) is first proposed for scrambling watermark images. Then, on the basis of the proposed AQSM, a novel quantum watermarking scheme is presented. Unlike existing quantum watermarking schemes with fixed em…

    Submitted 31 March, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

  14. arXiv:2502.14573

    cs.CV cs.LG

    Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining

    Authors: Wonhyeok Choi, Kyumin Hwang, Wei Peng, Minwoo Choi, Sunghoon Im

    Abstract: Self-supervised monocular depth estimation (SSMDE) aims to predict the dense depth map of a monocular image, by learning depth from RGB image sequences, eliminating the need for ground-truth depth labels. Although this approach simplifies data acquisition compared to supervised methods, it struggles with reflective surfaces, as they violate the assumptions of Lambertian reflectance, leading to ina…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: Accepted at ICLR 2025

  15. arXiv:2502.14259

    cs.LG

    LabTOP: A Unified Model for Lab Test Outcome Prediction on Electronic Health Records

    Authors: Sujeong Im, Jungwoo Oh, Edward Choi

    Abstract: Lab tests are fundamental for diagnosing diseases and monitoring patient conditions. However, frequent testing can be burdensome for patients, and test results may not always be immediately available. To address these challenges, we propose LabTOP, a unified model that predicts lab test outcomes by leveraging a language modeling approach on EHR data. Unlike conventional methods that estimate only…

    Submitted 5 July, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: 11 pages for main text, 13 pages for appendix

  16. arXiv:2502.02716

    cs.LG cs.CL

    A Unified Understanding and Evaluation of Steering Methods

    Authors: Shawn Im, Yixuan Li

    Abstract: Steering methods provide a practical approach to controlling large language models by applying steering vectors to intermediate activations, guiding outputs toward desired behaviors while avoiding retraining. Despite their growing importance, the field lacks a unified understanding and consistent evaluation across tasks and datasets, hindering progress. This paper introduces a unified framework fo…

    Submitted 4 February, 2025; originally announced February 2025.

  17. arXiv:2502.01983

    math-ph cs.IT

    Diagrammatics of information

    Authors: Mee Seong Im, Clement Kam, Caden Pici

    Abstract: We introduce a diagrammatic perspective for Shannon entropy created by the first author and Mikhail Khovanov and connect it to information theory and mutual information. We also give two complete proofs that the $5$-term dilogarithm deforms to the $4$-term infinitesimal dilogarithm.

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: 33 pages, many figures

    MSC Class: Primary: 57K16; 18M30; 28D20; Secondary: 37A35; 68P30; 94A15; 94A40

  18. arXiv:2502.00577

    cs.AI cs.CL cs.LG

    Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach

    Authors: Changdae Oh, Zhen Fang, Shawn Im, Xuefeng Du, Yixuan Li

    Abstract: Multimodal large language models (MLLMs) have shown promising capabilities but struggle under distribution shifts, where evaluation data differ from instruction tuning distributions. Although previous works have provided empirical evaluations, we argue that establishing a formal framework that can characterize and quantify the risk of MLLMs is necessary to ensure the safe and reliable application…

    Submitted 24 May, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

    Comments: ICML 2025 camera-ready

  19. arXiv:2501.19010

    cs.CL cs.SD eess.AS

    DyPCL: Dynamic Phoneme-level Contrastive Learning for Dysarthric Speech Recognition

    Authors: Wonjun Lee, Solee Im, Heejin Do, Yunsu Kim, Jungseul Ok, Gary Geunbae Lee

    Abstract: Dysarthric speech recognition often suffers from performance degradation due to the intrinsic diversity of dysarthric severity and extrinsic disparity from normal speech. To bridge these gaps, we propose a Dynamic Phoneme-level Contrastive Learning (DyPCL) method, which leads to obtaining invariant representations across diverse speakers. We decompose the speech utterance into phoneme segments for…

    Submitted 3 February, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

    Comments: NAACL 2025 main conference, 9 pages, 1 page appendix

  20. arXiv:2501.02739

    cs.CL cs.AI cs.LG

    TARDiS: Text Augmentation for Refining Diversity and Separability

    Authors: Kyungmin Kim, SangHun Im, GiBaeg Kim, Heung-Seon Oh

    Abstract: Text augmentation (TA) is a critical technique for text classification, especially in few-shot settings. This paper introduces a novel LLM-based TA method, TARDiS, to address challenges inherent in the generation and alignment stages of two-stage TA methods. For the generation stage, we propose two generation processes, SEG and CEG, incorporating multiple class-specific prompts to enhance diversit…

    Submitted 5 January, 2025; originally announced January 2025.

    Comments: 10 pages

  21. arXiv:2411.16030

    cs.LG cs.DS

    Binary Search with Distributional Predictions

    Authors: Michael Dinitz, Sungjin Im, Thomas Lavastida, Benjamin Moseley, Aidin Niaparast, Sergei Vassilvitskii

    Abstract: Algorithms with (machine-learned) predictions is a powerful framework for combining traditional worst-case algorithms with modern machine learning. However, the vast majority of work in this space assumes that the prediction itself is non-probabilistic, even if it is generated by some stochastic process (such as a machine learning system). This is a poor fit for modern ML, particularly modern neur…

    Submitted 24 November, 2024; originally announced November 2024.

  22. arXiv:2410.07497

    cs.GT cs.DS

    Strategic Facility Location via Predictions

    Authors: Qingyun Chen, Nick Gravin, Sungjin Im

    Abstract: The facility location with strategic agents is a canonical problem in the literature on mechanism design without money. Recently, Agrawal et al. considered this problem in the context of machine learning augmented algorithms, where the mechanism designer is also given a prediction of the optimal facility location. An ideal mechanism in this framework produces an outcome that is close to the socia…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: To appear in WINE 2024

  23. arXiv:2410.01957

    cs.CL

    Challenges and Future Directions of Data-Centric AI Alignment

    Authors: Min-Hsuan Yeh, Jeffrey Wang, Xuefeng Du, Seongheon Park, Leitian Tao, Shawn Im, Yixuan Li

    Abstract: As AI systems become increasingly capable and influential, ensuring their alignment with human values, preferences, and goals has become a critical research focus. Current alignment methods primarily focus on designing algorithms and loss functions but often underestimate the crucial role of data. This paper advocates for a shift towards data-centric AI alignment, emphasizing the need to enhance t…

    Submitted 1 May, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: ICML 2025

  24. arXiv:2409.15384

    eess.IV cs.CV cs.LG

    BurstM: Deep Burst Multi-scale SR using Fourier Space with Optical Flow

    Authors: EungGu Kang, Byeonghun Lee, Sunghoon Im, Kyong Hwan Jin

    Abstract: Multi-frame super-resolution (MFSR) achieves higher performance than single image super-resolution (SISR), because MFSR leverages abundant information from multiple frames. Recent MFSR approaches adapt the deformable convolution network (DCN) to align the frames. However, the existing MFSR suffers from misalignments between the reference and source frames due to the limitations of DCN, such as smal…

    Submitted 21 September, 2024; originally announced September 2024.

    Comments: 12 pages

  25. arXiv:2409.08462

    math.KT cs.IT math-ph math.CT

    Entropy, cocycles, and their diagrammatics

    Authors: Mee Seong Im, Mikhail Khovanov

    Abstract: The first part of the paper explains how to encode a one-cocycle and a two-cocycle on a group $G$ with values in its representation by networks of planar trivalent graphs with edges labelled by elements of $G$, elements of the representation floating in the regions, and suitable rules for manipulation of these diagrams. When the group is a semidirect product, there is a similar presentation via ov…

    Submitted 8 October, 2024; v1 submitted 12 September, 2024; originally announced September 2024.

    Comments: In v2, expanded Remarks 4.6, 4.8 and added more references. 82 pages, many figures

    MSC Class: Primary: 94A17; 20J06; 18M10; 18M30; Secondary: 18B40; 37A20; 18G45

  26. arXiv:2409.06263

    cs.CL cs.AI

    Keyword-Aware ASR Error Augmentation for Robust Dialogue State Tracking

    Authors: Jihyun Lee, Solee Im, Wonjun Lee, Gary Geunbae Lee

    Abstract: Dialogue State Tracking (DST) is a key part of task-oriented dialogue systems, identifying important information in conversations. However, its accuracy drops significantly in spoken dialogue environments due to named entity errors from Automatic Speech Recognition (ASR) systems. We introduce a simple yet effective data augmentation method that targets those entities to improve the robustness of D…

    Submitted 10 September, 2024; originally announced September 2024.

  27. arXiv:2409.03020

    cs.DS

    Online Scheduling via Gradient Descent for Weighted Flow Time Minimization

    Authors: Qingyun Chen, Sungjin Im, Aditya Petety

    Abstract: In this paper, we explore how a natural generalization of Shortest Remaining Processing Time (SRPT) can be a powerful meta-algorithm for online scheduling. The meta-algorithm processes jobs to maximally reduce the objective of the corresponding offline scheduling problem of the remaining jobs: minimizing the total weighted completion time of them (the residual optimum). We show that it achi…

    Submitted 4 September, 2024; originally announced September 2024.

  28. arXiv:2408.08461

    cs.CV

    Style-Editor: Text-driven object-centric style editing

    Authors: Jihun Park, Jongmin Gim, Kyoungmin Lee, Seunghun Lee, Sunghoon Im

    Abstract: We present a text-driven object-centric style editing model named Style-Editor, a novel method that guides style editing at an object-centric level using textual inputs. The core of Style-Editor is our Patch-wise Co-Directional (PCD) loss, meticulously designed for precise object-centric editing that is closely aligned with the input text. This loss combines a patch directional loss for text-guided…

    Submitted 8 April, 2025; v1 submitted 15 August, 2024; originally announced August 2024.

    Comments: 22 pages, 19 figures, CVPR 2025

  29. arXiv:2408.03459

    cs.LG

    On the Generalization of Preference Learning with DPO

    Authors: Shawn Im, Yixuan Li

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities but often struggle to align with human preferences, leading to harmful or undesirable outputs. Preference learning, which trains models to distinguish between preferred and non-preferred responses based on human feedback, has become a crucial component for ensuring that LLMs align with human values. Despite the widespread adopt…

    Submitted 6 December, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  30. arXiv:2407.07995

    cs.CV

    Flow4D: Leveraging 4D Voxel Network for LiDAR Scene Flow Estimation

    Authors: Jaeyeul Kim, Jungwan Woo, Ukcheol Shin, Jean Oh, Sunghoon Im

    Abstract: Understanding the motion states of the surrounding environment is critical for safe autonomous driving. These motion states can be accurately derived from scene flow, which captures the three-dimensional motion field of points. Existing LiDAR scene flow methods extract spatial features from each point cloud and then fuse them channel-wise, resulting in the implicit extraction of spatio-temporal fe…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures

  31. arXiv:2407.03010

    cs.CV

    Context-Aware Video Instance Segmentation

    Authors: Seunghun Lee, Jiwan Seo, Kiljoon Han, Minwoo Choi, Sunghoon Im

    Abstract: In this paper, we introduce the Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating contextual information adjacent to each object. To efficiently extract and leverage this information, we propose the Context-Aware Instance Tracker (CAIT), which merges contextual data surrounding the instances with the core instance features…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Project page: https://seung-hun-lee.github.io/projects/CAVIS/

  32. Online Load and Graph Balancing for Random Order Inputs

    Authors: Sungjin Im, Ravi Kumar, Shi Li, Aditya Petety, Manish Purohit

    Abstract: Online load balancing for heterogeneous machines aims to minimize the makespan (maximum machine workload) by scheduling arriving jobs with varying sizes on different machines. In the adversarial setting, where an adversary chooses not only the collection of job sizes but also their arrival order, the problem is well-understood and the optimal competitive ratio is known to be $\Theta(\log m)$ where $m$…

    Submitted 20 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  33. arXiv:2404.05558

    eess.IV cs.CV

    JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients

    Authors: Woo Kyoung Han, Sunghoon Im, Jaedeok Kim, Kyong Hwan Jin

    Abstract: We propose a practical approach to JPEG image decoding, utilizing a local implicit neural representation with continuous cosine formulation. The JPEG algorithm significantly quantizes discrete cosine transform (DCT) spectra to achieve a high compression rate, inevitably resulting in quality degradation while encoding an image. We have designed a continuous cosine spectrum estimator to address the…

    Submitted 2 April, 2024; originally announced April 2024.

  34. arXiv:2404.01984

    cs.CV

    Fashion Style Editing with Generative Human Prior

    Authors: Chaerin Kong, Seungyong Lee, Soohyeok Im, Wonsuk Yang

    Abstract: Image editing has been a long-standing challenge in the research community with its far-reaching impact on numerous applications. Recently, text-driven methods started to deliver promising results in domains like human faces, but their applications to more complex domains have been relatively limited. In this work, we explore the task of fashion style editing, where we aim to manipulate the fashio…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 5 pages

  35. arXiv:2404.01954

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  36. arXiv:2403.18742

    cs.LG cs.AI

    Understanding the Learning Dynamics of Alignment with Human Feedback

    Authors: Shawn Im, Yixuan Li

    Abstract: Aligning large language models (LLMs) with human intentions has become a critical task for safely deploying models in real-world systems. While existing alignment approaches have seen empirical success, theoretically understanding how these methods affect model behavior remains an open question. Our work provides an initial attempt to theoretically analyze the learning dynamics of human preference…

    Submitted 6 August, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  37. arXiv:2403.03468

    cs.CV

    Multi-task Learning for Real-time Autonomous Driving Leveraging Task-adaptive Attention Generator

    Authors: Wonhyeok Choi, Mingyu Shin, Hyukzae Lee, Jaehoon Cho, Jaehyeon Park, Sunghoon Im

    Abstract: Real-time processing is crucial in autonomous driving systems due to the imperative of instantaneous decision-making and rapid response. In real-world scenarios, autonomous vehicles are continuously tasked with interpreting their surroundings, analyzing intricate sensor data, and making decisions within split seconds to ensure safety through numerous computer vision tasks. In this paper, we presen…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted at ICRA 2024

  38. Analysis of the Two-Step Heterogeneous Transfer Learning for Laryngeal Blood Vessel Classification: Issue and Improvement

    Authors: Xinyi Fang, Xu Yang, Chak Fong Chong, Kei Long Wong, Yapeng Wang, Tiankui Zhang, Sio-Kei Im

    Abstract: Accurate classification of laryngeal vascular as benign or malignant is crucial for early detection of laryngeal cancer. However, organizations with limited access to laryngeal vascular images face challenges due to the lack of large and homogeneous public datasets for effective learning. Distinguished from the most familiar works, which directly transfer the ImageNet pre-trained models to the tar…

    Submitted 14 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  39. Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels

    Authors: Chak Fong Chong, Xinyi Fang, Jielong Guo, Yapeng Wang, Wei Ke, Chan-Tong Lam, Sio-Kei Im

    Abstract: Large-scale image datasets are often partially labeled, where only a few categories' labels are known for each image. Assigning pseudo-labels to unknown labels to gain additional training signals has become prevalent for training deep classification models. However, some pseudo-labels are inevitably incorrect, leading to a notable decline in the model classification performance. In this paper, we…

    Submitted 30 January, 2024; originally announced January 2024.

    Journal ref: Neurocomputing 2025

  40. arXiv:2401.13053

    cs.GT cs.DS

    Data Exchange Markets via Utility Balancing

    Authors: Aditya Bhaskara, Sreenivas Gollapudi, Sungjin Im, Kostas Kollias, Kamesh Munagala, Govind S. Sankar

    Abstract: This paper explores the design of a balanced data-sharing marketplace for entities with heterogeneous datasets and machine learning models that they seek to refine using data from other agents. The goal of the marketplace is to encourage participation for data sharing in the presence of such heterogeneity. Our market design approach for data sharing focuses on interim utility balance, where partic…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: To appear in WWW 2024

  41. arXiv:2401.01075

    cs.CV

    Depth-discriminative Metric Learning for Monocular 3D Object Detection

    Authors: Wonhyeok Choi, Mingyu Shin, Sunghoon Im

    Abstract: Monocular 3D object detection poses a significant challenge due to the lack of depth information in RGB images. Many existing methods strive to enhance the object depth estimation performance by allocating additional parameters for object depth estimation, utilizing extra modules or data. In contrast, we introduce a novel metric learning scheme that encourages the model to extract depth-discrimina…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Accepted at NeurIPS 2023

  42. arXiv:2312.17033

    math.CT cs.FL math-ph math.DS math.QA

    Boolean TQFTs with accumulating defects, sofic systems, and automata for infinite words

    Authors: Paul Gustafson, Mee Seong Im, Mikhail Khovanov

    Abstract: Any finite state automaton gives rise to a Boolean one-dimensional TQFT with defects and inner endpoints of cobordisms. This paper extends the correspondence to Boolean TQFTs where defects accumulate toward inner endpoints, relating such TQFTs and topological theories to sofic systems and $\omega$-automata.

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: 31 pages, many figures

    MSC Class: Primary: 57K16; 68Q45; 18M05; 37B10; Secondary: 06A12; 68Q70; 18B20

  43. arXiv:2312.14063

    cs.DB cs.DS

    Polynomial Time Convergence of the Iterative Evaluation of Datalog° Programs

    Authors: Sungjin Im, Benjamin Moseley, Hung Q. Ngo, Kirk Pruhs

    Abstract: Datalog° is an extension of Datalog that allows for aggregation and recursion over an arbitrary commutative semiring. Like Datalog, Datalog° programs can be evaluated via the natural iterative algorithm until a fixed point is reached. However, unlike Datalog, the natural iterative evaluation of some Datalog° programs over some semirings may not converge. It is known that the commutative semirings f…

    Submitted 21 February, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

  44. arXiv:2312.12098  [pdf, other]

    cs.CV

    Rethinking LiDAR Domain Generalization: Single Source as Multiple Density Domains

    Authors: Jaeyeul Kim, Jungwan Woo, Jeonghoon Kim, Sunghoon Im

    Abstract: In the realm of LiDAR-based perception, significant strides have been made, yet domain generalization remains a substantial challenge. The performance often deteriorates when models are applied to unfamiliar datasets with different LiDAR sensors or deployed in new environments, primarily due to variations in point cloud density distributions. To tackle this challenge, we propose a Density Discrimi…

    Submitted 16 July, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted by ECCV 2024

  45. arXiv:2312.06032  [pdf, other]

    cs.AI cs.HC

    Evaluating the Utility of Model Explanations for Model Development

    Authors: Shawn Im, Jacob Andreas, Yilun Zhou

    Abstract: One of the motivations for explainable AI is to allow humans to make better and more informed decisions regarding the use and deployment of AI models. But careful evaluations are needed to assess whether this expectation has been fulfilled. Current evaluations mainly focus on algorithmic properties of explanations, and those that involve human subjects often employ subjective questions to test hum…

    Submitted 10 December, 2023; originally announced December 2023.

  46. arXiv:2311.17664  [pdf, ps, other]

    cs.DB

    On the Convergence Rate of Linear Datalog$^\circ$ over Stable Semirings

    Authors: Sungjin Im, Benjamin Moseley, Hung Ngo, Kirk Pruhs

    Abstract: Datalog$^\circ$ is an extension of Datalog, where instead of a program being a collection of unions of conjunctive queries over the standard Boolean semiring, a program may now be a collection of sum-product queries over an arbitrary commutative partially ordered pre-semiring. Datalog$^\circ$ is more powerful than Datalog in that its additional algebraic structure allows for supporting recursion with aggregation.…

    Submitted 5 July, 2025; v1 submitted 29 November, 2023; originally announced November 2023.

  47. arXiv:2310.05366  [pdf, other]

    cs.CV

    Rotation Matters: Generalized Monocular 3D Object Detection for Various Camera Systems

    Authors: SungHo Moon, JinWoo Bae, SungHoon Im

    Abstract: Research on monocular 3D object detection is being actively studied, and as a result, performance has been steadily improving. However, 3D object detection performance is significantly reduced when applied to a camera system different from the system used to capture the training datasets. For example, a 3D detector trained on datasets from a passenger car mostly fails to regress accurate 3D boundi…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted to CVPRw 2023

  48. arXiv:2309.01409  [pdf, other]

    cs.CV

    Implicit Neural Image Stitching

    Authors: Minsu Kim, Jaewon Lee, Byeonghun Lee, Sunghoon Im, Kyong Hwan Jin

    Abstract: Existing frameworks for image stitching often provide visually reasonable stitchings. However, they suffer from blurry artifacts and disparities in illumination, depth level, etc. Although the recent learning-based stitchings relax such disparities, the required methods impose sacrifice of image qualities failing to capture high-frequency details for stitched images. To address the problem, we pro…

    Submitted 21 January, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

  49. arXiv:2309.00708  [pdf, ps, other]

    math.QA cs.FL math-ph math.CT math.RT

    From finite state automata to tangle cobordisms: a TQFT journey from one to four dimensions

    Authors: Mee Seong Im, Mikhail Khovanov

    Abstract: This is a brief introduction to link homology theories that categorify Reshetikhin--Turaev $\mathsf{SL}(N)$-quantum link invariants. A recently discovered surprising connection between finite state automata and Boolean TQFTs in dimension one is explained as a warm-up.

    Submitted 17 September, 2023; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: 40 pages, many figures, corrected minor misprints. To appear in Contemporary Mathematics, conference proceedings "From Representation Theory to Mathematical Physics and Back"

    MSC Class: Primary: 57K16; 57K18; 57K45; 68Q45; 18M30; Secondary: 68Q70; 18B20

  50. arXiv:2309.00237  [pdf, other]

    cs.CL cs.AI

    Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes

    Authors: Sunjun Kweon, Junu Kim, Jiyoun Kim, Sujeong Im, Eunbyeol Cho, Seongsu Bae, Jungwoo Oh, Gyubok Lee, Jong Hak Moon, Seng Chan You, Seungjin Baek, Chang Hoon Han, Yoon Bin Jung, Yohan Jo, Edward Choi

    Abstract: The development of large language models tailored for handling patients' clinical notes is often hindered by the limited accessibility and usability of these notes due to strict privacy regulations. To address these challenges, we first create synthetic large-scale clinical notes using publicly available case reports extracted from biomedical literature. We then use these synthetic notes to train…

    Submitted 29 July, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: ACL 2024 (Findings)