Showing 1–50 of 257 results for author: Jung, S

Searching in archive cs.
  1. arXiv:2507.07768  [pdf, ps, other]

    cs.LG cs.CV

    TRIX- Trading Adversarial Fairness via Mixed Adversarial Training

    Authors: Tejaswini Medi, Steffen Jung, Margret Keuper

    Abstract: Adversarial Training (AT) is a widely adopted defense against adversarial examples. However, existing approaches typically apply a uniform training objective across all classes, overlooking disparities in class-wise vulnerability. This results in adversarial unfairness: classes with well distinguishable features (strong classes) tend to become more robust, while classes with overlapping or shared…

    Submitted 10 July, 2025; originally announced July 2025.

  2. arXiv:2507.04006  [pdf, ps, other]

    cs.CV

    Group-wise Scaling and Orthogonal Decomposition for Domain-Invariant Feature Extraction in Face Anti-Spoofing

    Authors: Seungjin Jung, Kanghee Lee, Yonghyun Jeong, Haeun Noh, Jungmin Lee, Jongwon Choi

    Abstract: Domain Generalizable Face Anti-Spoofing (DGFAS) methods effectively capture domain-invariant features by aligning the directions (weights) of local decision boundaries across domains. However, the bias terms associated with these boundaries remain misaligned, leading to inconsistent classification thresholds and degraded performance on unseen target domains. To address this issue, we propose a nov…

    Submitted 5 July, 2025; originally announced July 2025.

    Comments: Published at ICCV 2025. Code will be available at https://github.com/SeungjinJung/GD-FAS

  3. arXiv:2506.19325  [pdf, ps, other]

    cs.AI

    FEAT: A Preference Feedback Dataset through a Cost-Effective Auto-Generation and Labeling Framework for English AI Tutoring

    Authors: Hyein Seo, Taewook Hwang, Yohan Lee, Sangkeun Jung

    Abstract: In English education tutoring, teacher feedback is essential for guiding students. Recently, AI-based tutoring systems have emerged to assist teachers; however, these systems require high-quality and large-scale teacher feedback data, which is both time-consuming and costly to generate manually. In this study, we propose FEAT, a cost-effective framework for generating teacher feedback, and have co…

    Submitted 26 June, 2025; v1 submitted 24 June, 2025; originally announced June 2025.

    Comments: ACL 2025 (Short)

  4. arXiv:2506.15138  [pdf, ps, other]

    cs.CL cs.AI

    Thunder-Tok: Minimizing Tokens per Word in Tokenizing Korean Texts for Generative Language Models

    Authors: Gyeongje Cho, Yeonkyoun So, Chanwoo Park, Sangmin Lee, Sungmok Jung, Jaejin Lee

    Abstract: This paper introduces Thunder-Tok, a new Korean tokenizer designed to reduce token fertility without compromising model performance. Our approach uses a rule-based pre-tokenization method that aligns with the linguistic structure of the Korean language. We also create a seed vocabulary containing tokens that resemble linguistic units and employ a branching entropy-based selection algorithm. These…

    Submitted 18 June, 2025; originally announced June 2025.

  5. arXiv:2506.14397  [pdf, ps, other]

    cs.CL

    Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding

    Authors: Yeonkyoung So, Gyuseong Lee, Sungmok Jung, Joonhak Lee, JiA Kang, Sangho Kim, Jaejin Lee

    Abstract: Negation is a fundamental linguistic phenomenon that poses persistent challenges for Large Language Models (LLMs), particularly in tasks requiring deep semantic understanding. Existing benchmarks often treat negation as a side case within broader tasks like natural language inference, resulting in a lack of benchmarks that exclusively target negation understanding. In this work, we introduce Thund…

    Submitted 17 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

  6. arXiv:2506.08475  [pdf, ps, other]

    cs.LG cs.CE math.NA

    Thermodynamically Consistent Latent Dynamics Identification for Parametric Systems

    Authors: Xiaolong He, Yeonjong Shin, Anthony Gruber, Sohyeon Jung, Kookjin Lee, Youngsoo Choi

    Abstract: We propose an efficient thermodynamics-informed latent space dynamics identification (tLaSDI) framework for the reduced-order modeling of parametric nonlinear dynamical systems. This framework integrates autoencoders for dimensionality reduction with newly developed parametric GENERIC formalism-informed neural networks (pGFINNs), which enable efficient learning of parametric latent dynamics while…

    Submitted 10 June, 2025; originally announced June 2025.

  7. arXiv:2505.24023  [pdf, ps, other]

    cs.CV cs.AI

    Multi-Group Proportional Representation for Text-to-Image Models

    Authors: Sangwon Jung, Alex Oesterling, Claudio Mayrink Verdun, Sajani Vithana, Taesup Moon, Flavio P. Calmon

    Abstract: Text-to-image (T2I) generative models can create vivid, realistic images from textual descriptions. As these models proliferate, they expose new concerns about their ability to represent diverse demographic groups, propagate stereotypes, and efface minority populations. Despite growing attention to the "safe" and "responsible" design of artificial intelligence (AI), there is no established methodo…

    Submitted 29 May, 2025; originally announced May 2025.

  8. arXiv:2505.16297  [pdf, other]

    cs.CL

    ToDi: Token-wise Distillation via Fine-Grained Divergence Control

    Authors: Seongryong Jung, Suwan Yoon, DongGeon Kim, Hwanhee Lee

    Abstract: Large language models (LLMs) offer impressive performance but are impractical for resource-constrained deployment due to high latency and energy consumption. Knowledge distillation (KD) addresses this by transferring knowledge from a large teacher to a smaller student model. However, conventional KD, notably approaches like Forward KL (FKL) and Reverse KL (RKL), apply uniform divergence loss acros…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 13 pages, 7 figures

  9. arXiv:2505.12486  [pdf, ps, other]

    cs.CV

    Guiding Diffusion with Deep Geometric Moments: Balancing Fidelity and Variation

    Authors: Sangmin Jung, Utkarsh Nath, Yezhou Yang, Giulia Pedrielli, Joydeep Biswas, Amy Zhang, Hassan Ghasemzadeh, Pavan Turaga

    Abstract: Text-to-image generation models have achieved remarkable capabilities in synthesizing images, but often struggle to provide fine-grained control over the output. Existing guidance approaches, such as segmentation maps and depth maps, introduce spatial rigidity that restricts the inherent diversity of diffusion models. In this work, we introduce Deep Geometric Moments (DGM) as a novel form of guida…

    Submitted 18 May, 2025; originally announced May 2025.

    Comments: Accepted in CVPR Workshop GMCV 2025

  10. arXiv:2505.11152  [pdf, ps, other]

    cs.CV

    Learning Dense Hand Contact Estimation from Imbalanced Data

    Authors: Daniel Sungho Jung, Kyoung Mu Lee

    Abstract: Hands are essential to human interaction, and understanding contact between hands and the world can promote comprehensive understanding of their function. Recently, there has been a growing number of hand interaction datasets that cover interaction with object, other hand, scene, and body. Despite the significance of the task and increasing high-quality data, how to effectively learn dense hand con…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: Project page: http://haco-release.github.io

  11. arXiv:2505.03569  [pdf, other]

    cs.CV

    Corner Cases: How Size and Position of Objects Challenge ImageNet-Trained Models

    Authors: Mishal Fatima, Steffen Jung, Margret Keuper

    Abstract: Backgrounds in images play a major role in contributing to spurious correlations among different data points. Owing to aesthetic preferences of humans capturing the images, datasets can exhibit positional (location of the object within a given frame) and size (region-of-interest to image ratio) biases for different classes. In this paper, we show that these biases can impact how much a model relie…

    Submitted 6 May, 2025; originally announced May 2025.

  12. arXiv:2505.03470  [pdf, other]

    cs.CV cs.AI cs.CG cs.LG

    Blending 3D Geometry and Machine Learning for Multi-View Stereopsis

    Authors: Vibhas Vats, Md. Alimoor Reza, David Crandall, Soon-heung Jung

    Abstract: Traditional multi-view stereo (MVS) methods primarily depend on photometric and geometric consistency constraints. In contrast, modern learning-based algorithms often rely on the plane sweep algorithm to infer 3D geometry, applying explicit geometric consistency (GC) checks only as a post-processing step, with no impact on the learning process itself. In this work, we introduce GC MVSNet plus plus…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: A pre-print -- paper under-review. arXiv admin note: substantial text overlap with arXiv:2310.19583

  13. arXiv:2504.15743  [pdf, other]

    cs.HC cs.AI cs.LG

    iMedic: Towards Smartphone-based Self-Auscultation Tool for AI-Powered Pediatric Respiratory Assessment

    Authors: Seung Gyu Jeong, Sung Woo Nam, Seong Kwan Jung, Seong-Eun Kim

    Abstract: Respiratory auscultation is crucial for early detection of pediatric pneumonia, a condition that can quickly worsen without timely intervention. In areas with limited physician access, effective auscultation is challenging. We present a smartphone-based system that leverages built-in microphones and advanced deep learning algorithms to detect abnormal respiratory sounds indicative of pneumonia ris…

    Submitted 22 April, 2025; originally announced April 2025.

  14. arXiv:2504.10831  [pdf, other]

    cs.AI cs.RO

    Hallucination-Aware Generative Pretrained Transformer for Cooperative Aerial Mobility Control

    Authors: Hyojun Ahn, Seungcheol Oh, Gyu Seon Kim, Soyi Jung, Soohyun Park, Joongheon Kim

    Abstract: This paper proposes SafeGPT, a two-tiered framework that integrates generative pretrained transformers (GPTs) with reinforcement learning (RL) for efficient and reliable unmanned aerial vehicle (UAV) last-mile deliveries. In the proposed design, a Global GPT module assigns high-level tasks such as sector allocation, while an On-Device GPT manages real-time local route planning. An RL-based safety…

    Submitted 14 April, 2025; originally announced April 2025.

    MSC Class: 68T05

  15. arXiv:2504.09859  [pdf, other]

    cs.HC

    Can VLMs Assess Similarity Between Graph Visualizations?

    Authors: Seokweon Jung, Hyeon Jeon, Jeongmin Rhee, Jinwook Seo

    Abstract: Graph visualizations have been studied for tasks such as clustering and temporal analysis, but how these visual similarities relate to established graph similarity measures remains unclear. In this paper, we explore the potential of Vision Language Models (VLMs) to approximate human-like perception of graph similarity. We generate graph datasets of various sizes and densities and compare VLM-deriv…

    Submitted 14 April, 2025; originally announced April 2025.

  16. arXiv:2504.06634  [pdf, other]

    cs.CV

    Crafting Query-Aware Selective Attention for Single Image Super-Resolution

    Authors: Junyoung Kim, Youngrok Kim, Siyeol Jung, Donghyun Min

    Abstract: Single Image Super-Resolution (SISR) reconstructs high-resolution images from low-resolution inputs, enhancing image details. While Vision Transformer (ViT)-based models improve SISR by capturing long-range dependencies, they suffer from quadratic computational costs or employ selective attention mechanisms that do not explicitly focus on query-relevant regions. Despite these advancements, prior w…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: 10 pages, 5 figures, 4 tables

  17. arXiv:2504.02882  [pdf, other]

    cs.CL cs.LG

    DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models

    Authors: Sunghee Jung, Donghun Lee, Shinbok Lee, Gaeun Seo, Daniel Lee, Byeongil Ko, Junrae Cho, Kihyun Kim, Eunggyun Kim, Myeongcheol Shin

    Abstract: Tool-Augmented Large Language Models (TA-LLMs) have shown promise in real-world applications, but face challenges in handling incomplete queries and out-of-scope requests. While existing approaches rely mainly on Supervised Fine-Tuning with expert trajectories, we propose DiaTool-DPO, a novel method that enhances TA-LLM's dialogue capabilities through Direct Preference Optimization. We model TA-L…

    Submitted 2 April, 2025; originally announced April 2025.

  18. arXiv:2503.18603  [pdf, other]

    cs.CL

    LANGALIGN: Enhancing Non-English Language Models via Cross-Lingual Embedding Alignment

    Authors: Jong Myoung Kim, Young-Jun Lee, Ho-Jin Choi, Sangkeun Jung

    Abstract: While Large Language Models have gained attention, many service developers still rely on embedding-based models due to practical constraints. In such cases, the quality of fine-tuning data directly impacts performance, and English datasets are often used as seed data for training non-English models. In this study, we propose LANGALIGN, which enhances target language processing by aligning English…

    Submitted 25 March, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: now preparing

  19. arXiv:2503.18250  [pdf, other]

    cs.CL

    PAD: Towards Efficient Data Generation for Transfer Learning Using Phrase Alignment

    Authors: Jong Myoung Kim, Young-Jun Lee, Ho-Jin Choi, Sangkeun Jung

    Abstract: Transfer learning leverages the abundance of English data to address the scarcity of resources in modeling non-English languages, such as Korean. In this study, we explore the potential of Phrase Aligned Data (PAD) from standardized Statistical Machine Translation (SMT) to enhance the efficiency of transfer learning. Through extensive experiments, we demonstrate that PAD synergizes effectively wit…

    Submitted 25 March, 2025; v1 submitted 23 March, 2025; originally announced March 2025.

    Comments: Preparing for conference

  20. arXiv:2503.15769  [pdf, other]

    cs.DC cs.LG eess.SY

    Prediction of Permissioned Blockchain Performance for Resource Scaling Configurations

    Authors: Seungwoo Jung, Yeonho Yoo, Gyeongsik Yang, Chuck Yoo

    Abstract: Blockchain is increasingly offered as blockchain-as-a-service (BaaS) by cloud service providers. However, configuring BaaS appropriately for optimal performance and reliability relies on trial and error. A key challenge is that BaaS is often perceived as a "black box," leading to uncertainties in performance and resource provisioning. Previous studies attempted to address this challenge; however,…

    Submitted 19 March, 2025; originally announced March 2025.

    Journal ref: ICT Express, Volume 10, Issue 6, December 2024, Pages 1253-1258

  21. arXiv:2503.09993  [pdf, other]

    cs.CV

    Channel-wise Noise Scheduled Diffusion for Inverse Rendering in Indoor Scenes

    Authors: JunYong Choi, Min-Cheol Sagong, SeokYeong Lee, Seung-Won Jung, Ig-Jae Kim, Junghyun Cho

    Abstract: We propose a diffusion-based inverse rendering framework that decomposes a single RGB image into geometry, material, and lighting. Inverse rendering is inherently ill-posed, making it difficult to predict a single accurate solution. To address this challenge, recent generative model-based methods aim to present a range of possible solutions. However, finding a single accurate solution and generati…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR 2025

  22. arXiv:2503.09906  [pdf, other]

    eess.AS cs.SD

    ValSub: Subsampling Validation Data to Mitigate Forgetting during ASR Personalization

    Authors: Haaris Mehmood, Karthikeyan Saravanan, Pablo Peso Parada, David Tuckey, Mete Ozay, Gil Ho Lee, Jungin Lee, Seokyeong Jung

    Abstract: Automatic Speech Recognition (ASR) is widely used within consumer devices such as mobile phones. Recently, personalization or on-device model fine-tuning has shown that adaptation of ASR models towards target user speech improves their performance over rare words or accented speech. Despite these gains, fine-tuning on user data (target domain) risks the personalized model forgetting knowledge about…

    Submitted 7 April, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

    Comments: Accepted at ICASSP 2025

  23. arXiv:2503.09361  [pdf, other]

    cs.CV cs.SI

    Deep Learning for Climate Action: Computer Vision Analysis of Visual Narratives on X

    Authors: Katharina Prasse, Marcel Kleinmann, Inken Adam, Kerstin Beckersjuergen, Andreas Edte, Jona Frroku, Timotheus Gumpp, Steffen Jung, Isaac Bravo, Stefanie Walter, Margret Keuper

    Abstract: Climate change is one of the most pressing challenges of the 21st century, sparking widespread discourse across social media platforms. Activists, policymakers, and researchers seek to understand public sentiment and narratives while access to social media data has become increasingly restricted in the post-API era. In this study, we analyze a dataset of climate change-related tweets from X (forme…

    Submitted 12 March, 2025; originally announced March 2025.

  24. arXiv:2503.05488  [pdf, ps, other]

    cs.CL

    KIEval: Evaluation Metric for Document Key Information Extraction

    Authors: Minsoo Khang, Sang Chul Jung, Sungrae Park, Teakgyu Hong

    Abstract: Document Key Information Extraction (KIE) is a technology that transforms valuable information in document images into structured data, and it has become an essential function in industrial settings. However, current evaluation metrics of this technology do not accurately reflect the critical attributes of its industrial applications. In this paper, we present KIEval, a novel application-centric e…

    Submitted 26 March, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

  25. arXiv:2502.20427  [pdf, other]

    cs.CR cs.AI cs.SD eess.AS

    DeePen: Penetration Testing for Audio Deepfake Detection

    Authors: Nicolas Müller, Piotr Kawa, Adriana Stan, Thien-Phuc Doan, Souhwan Jung, Wei Herng Choong, Philip Sperl, Konstantin Böttinger

    Abstract: Deepfakes - manipulated or forged audio and video media - pose significant security risks to individuals, organizations, and society at large. To address these challenges, machine learning-based classifiers are commonly employed to detect deepfake content. In this paper, we assess the robustness of such classifiers through a systematic penetration testing methodology, which we introduce as DeePen.…

    Submitted 5 March, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

  26. arXiv:2502.18934  [pdf, other]

    cs.CL cs.LG

    Kanana: Compute-efficient Bilingual Language Models

    Authors: Kanana LLM Team, Yunju Bak, Hojin Lee, Minho Ryu, Jiyeon Ham, Seungjae Jung, Daniel Wontae Nam, Taegyeong Eo, Donghun Lee, Doohae Jung, Boseop Kim, Nayeon Kim, Jaesun Park, Hyunho Kim, Hyunwoong Ko, Changmin Lee, Kyoung-Woon On, Seulye Baeg, Junrae Cho, Sunghee Jung, Jieun Kang, EungGyun Kim, Eunhwa Kim, Byeongil Ko, Daniel Lee, et al. (4 additional authors not shown)

    Abstract: We introduce Kanana, a series of bilingual language models that demonstrate exceeding performance in Korean and competitive performance in English. The computational cost of Kanana is significantly lower than that of state-of-the-art models of similar size. The report details the techniques employed during pre-training to achieve compute-efficient yet competitive models, including high quality dat…

    Submitted 28 February, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

    Comments: 40 pages, 15 figures

  27. arXiv:2502.18744  [pdf, ps, other]

    cs.AI cs.CL

    ZEBRA: Leveraging Model-Behavioral Knowledge for Zero-Annotation Preference Dataset Construction

    Authors: Jeesu Jung, Chanjun Park, Sangkeun Jung

    Abstract: Recent efforts in LLM alignment have focused on constructing large-scale preference datasets via human or Artificial Intelligence (AI) annotators. However, such approaches rely on instance-wise supervision, incurring substantial annotation cost and limited interpretability. In this paper, we propose ZEBRA - a model behavior-wise zero-annotation framework that constructs preference data by leveragi…

    Submitted 2 June, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

    Comments: 16 pages, 7 figures, 5 tables, 4 graphs

  28. arXiv:2502.16391  [pdf, other]

    stat.ML cs.LG stat.AP stat.ME

    Subspace Recovery in Winsorized PCA: Insights into Accuracy and Robustness

    Authors: Sangil Han, Kyoowon Kim, Sungkyu Jung

    Abstract: In this paper, we explore the theoretical properties of subspace recovery using Winsorized Principal Component Analysis (WPCA), utilizing a common data transformation technique that caps extreme values to mitigate the impact of outliers. Despite the widespread use of winsorization in various tasks of multivariate analysis, its theoretical properties, particularly for subspace recovery, have receiv…

    Submitted 22 February, 2025; originally announced February 2025.

  29. arXiv:2502.11969  [pdf, other]

    cs.AI cs.CV cs.LG

    Learning Generalizable Prompt for CLIP with Class Similarity Knowledge

    Authors: Sehun Jung, Hyang-won Lee

    Abstract: In vision-language models (VLMs), prompt tuning has shown its effectiveness in adapting models to downstream tasks. However, learned prompts struggle to generalize to unseen classes, as they tend to overfit to the classes that are targeted during prompt tuning. Examining failure cases, we observed that learned prompts disrupt the semantics of unseen classes, generating text embeddings with incorre…

    Submitted 17 February, 2025; originally announced February 2025.

  30. arXiv:2502.07703  [pdf, other]

    cs.RO

    GaRLIO: Gravity enhanced Radar-LiDAR-Inertial Odometry

    Authors: Chiyun Noh, Wooseong Yang, Minwoo Jung, Sangwoo Jung, Ayoung Kim

    Abstract: Recently, gravity has been highlighted as a crucial constraint for state estimation to alleviate potential vertical drift. Existing online gravity estimation methods rely on pose estimation combined with IMU measurements, which is considered best practice when direct velocity measurements are unavailable. However, with radar sensors providing direct velocity data, a measurement not yet utilized for…

    Submitted 21 February, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

  31. arXiv:2502.07380  [pdf, other]

    cs.RO

    Demonstrating Wheeled Lab: Modern Sim2Real for Low-cost, Open-source Wheeled Robotics

    Authors: Tyler Han, Preet Shah, Sidharth Rajagopal, Yanda Bao, Sanghun Jung, Sidharth Talia, Gabriel Guo, Bryan Xu, Bhaumik Mehta, Emma Romig, Rosario Scalise, Byron Boots

    Abstract: Simulation has been pivotal in recent robotics milestones and is poised to play a prominent role in the field's future. However, recent robotic advances often rely on expensive and high-maintenance platforms, limiting access to broader robotics audiences. This work introduces Wheeled Lab, a framework for the low-cost, open-source wheeled platforms that are already widely established in education a…

    Submitted 11 February, 2025; originally announced February 2025.

    Comments: Under Review

  32. arXiv:2502.07131  [pdf, other]

    cs.CL q-fin.CP

    TWICE: What Advantages Can Low-Resource Domain-Specific Embedding Model Bring? -- A Case Study on Korea Financial Texts

    Authors: Yewon Hwang, Sungbum Jung, Hanwool Lee, Sara Yu

    Abstract: Domain specificity of embedding models is critical for effective performance. However, existing benchmarks, such as FinMTEB, are primarily designed for high-resource languages, leaving low-resource settings, such as Korean, under-explored. Directly translating established English benchmarks often fails to capture the linguistic and cultural nuances present in low-resource domains. In this paper, t…

    Submitted 1 April, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: Accepted at FinancialAI@ICLR 2025

  33. arXiv:2502.06822  [pdf, other]

    cs.LG cs.CL cs.GR

    DiffListener: Discrete Diffusion Model for Listener Generation

    Authors: Siyeol Jung, Taehwan Kim

    Abstract: The listener head generation (LHG) task aims to generate natural nonverbal listener responses based on the speaker's multimodal cues. Prior work either relies on limited modalities (e.g., audio and facial information) or employs autoregressive approaches, which have limitations such as accumulating prediction errors. To address these limitations, we propose DiffListener, a discrete diffusion base…

    Submitted 5 February, 2025; originally announced February 2025.

    Comments: Accepted at ICASSP 2025

  34. arXiv:2502.03132  [pdf, other]

    cs.RO eess.SY

    SPARK: A Modular Benchmark for Humanoid Robot Safety

    Authors: Yifan Sun, Rui Chen, Kai S. Yun, Yikuan Fang, Sebin Jung, Feihan Li, Bowei Li, Weiye Zhao, Changliu Liu

    Abstract: This paper introduces the Safe Protective and Assistive Robot Kit (SPARK), a comprehensive benchmark designed to ensure safety in humanoid autonomy and teleoperation. Humanoid robots pose significant safety risks due to their physical capabilities of interacting with complex environments. The physical structures of humanoid robots further add complexity to the design of general safety solutions. T…

    Submitted 5 February, 2025; originally announced February 2025.

  35. arXiv:2502.01946  [pdf, other]

    cs.RO cs.CV

    HeRCULES: Heterogeneous Radar Dataset in Complex Urban Environment for Multi-session Radar SLAM

    Authors: Hanjun Kim, Minwoo Jung, Chiyun Noh, Sangwoo Jung, Hyunho Song, Wooseong Yang, Hyesu Jang, Ayoung Kim

    Abstract: Recently, radars have been widely featured in robotics for their robustness in challenging weather conditions. Two commonly used radar types are spinning radars and phased-array radars, each offering distinct sensor characteristics. Existing datasets typically feature only a single type of radar, leading to the development of algorithms limited to that specific kind. In this work, we highlight tha…

    Submitted 21 February, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: 2025 IEEE International Conference on Robotics and Automation (ICRA 2025)

  36. arXiv:2501.18943  [pdf, other]

    cs.RO

    HeLiOS: Heterogeneous LiDAR Place Recognition via Overlap-based Learning and Local Spherical Transformer

    Authors: Minwoo Jung, Sangwoo Jung, Hyeonjae Gil, Ayoung Kim

    Abstract: LiDAR place recognition is a crucial module in localization that matches the current location with previously observed environments. Most existing approaches in LiDAR place recognition dominantly focus on the spinning type LiDAR to exploit its large FOV for matching. However, with the recent emergence of various LiDAR types, the importance of matching data across different LiDAR types has grown si…

    Submitted 6 February, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

    Comments: 8 pages, 7 figures, 5 tables

  37. arXiv:2501.17171  [pdf]

    cs.CV cs.AI cs.LG eess.IV

    Separated Inter/Intra-Modal Fusion Prompts for Compositional Zero-Shot Learning

    Authors: Sua Jung

    Abstract: Compositional Zero-Shot Learning (CZSL) aims to recognize subtle differences in meaning or the combination of states and objects through the use of known and unknown concepts during training. Existing methods either focused on prompt configuration or on using prompts to tune the pre-trained Vision-Language model. However, these methods faced challenges in accurately identifying subtle differences…

    Submitted 21 January, 2025; originally announced January 2025.

    Comments: AIAP 2025

    Journal ref: Published at AIAP 2025

  38. arXiv:2501.13944  [pdf, other]

    cs.CL cs.AI

    Fanar: An Arabic-Centric Multimodal Generative AI Platform

    Authors: Fanar Team, Ummar Abbas, Mohammad Shahmeer Ahmad, Firoj Alam, Enes Altinisik, Ehsannedin Asgari, Yazan Boshmaf, Sabri Boughorbel, Sanjay Chawla, Shammur Chowdhury, Fahim Dalvi, Kareem Darwish, Nadir Durrani, Mohamed Elfeky, Ahmed Elmagarmid, Mohamed Eltabakh, Masoomali Fatehkia, Anastasios Fragkopoulos, Maram Hasanain, Majd Hawasly, Mus'ab Husaini, Soon-Gyo Jung, Ji Kim Lucas, Walid Magdy, Safa Messaoud, et al. (17 additional authors not shown)

    Abstract: We present Fanar, a platform for Arabic-centric multimodal generative AI systems, that supports language, speech and image generation tasks. At the heart of Fanar are Fanar Star and Fanar Prime, two highly capable Arabic Large Language Models (LLMs) that are best in the class on well established benchmarks for similar sized models. Fanar Star is a 7B (billion) parameter model that was trained from…

    Submitted 18 January, 2025; originally announced January 2025.

    ACM Class: I.2.0; D.2.0

  39. arXiv:2501.09113  [pdf, other]

    eess.AS cs.SD

    persoDA: Personalized Data Augmentation for Personalized ASR

    Authors: Pablo Peso Parada, Spyros Fontalis, Md Asif Jalal, Karthikeyan Saravanan, Anastasios Drosou, Mete Ozay, Gil Ho Lee, Jungin Lee, Seokyeong Jung

    Abstract: Data augmentation (DA) is ubiquitously used in training of Automatic Speech Recognition (ASR) models. DA offers increased data variability, robustness and generalization against different acoustic distortions. Recently, personalization of ASR models on mobile devices has been shown to improve Word Error Rate (WER). This paper evaluates data augmentation in this context and proposes persoDA; a DA m…

    Submitted 17 January, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

    Comments: ICASSP'25-Copyright 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  40. arXiv:2412.06936  [pdf, other]

    cs.CY cs.AI cs.LG

    Creating a Cooperative AI Policymaking Platform through Open Source Collaboration

    Authors: Aiden Lewington, Alekhya Vittalam, Anshumaan Singh, Anuja Uppuluri, Arjun Ashok, Ashrith Mandayam Athmaram, Austin Milt, Benjamin Smith, Charlie Weinberger, Chatanya Sarin, Christoph Bergmeir, Cliff Chang, Daivik Patel, Daniel Li, David Bell, Defu Cao, Donghwa Shin, Edward Kang, Edwin Zhang, Enhui Li, Felix Chen, Gabe Smithline, Haipeng Chen, Henry Gasztowtt, Hoon Shin, et al. (26 additional authors not shown)

    Abstract: Advances in artificial intelligence (AI) present significant risks and opportunities, requiring improved governance to mitigate societal harms and promote equitable benefits. Current incentive structures and regulatory delays may hinder responsible AI development and deployment, particularly in light of the transformative potential of large language models (LLMs). To address these challenges, we p…

    Submitted 9 December, 2024; originally announced December 2024.

  41. Deformation-Aware Segmentation Network Robust to Motion Artifacts for Brain Tissue Segmentation using Disentanglement Learning

    Authors: Sunyoung Jung, Yoonseok Choi, Mohammed A. Al-masni, Minyoung Jung, Dong-Hyun Kim

    Abstract: Motion artifacts caused by prolonged acquisition time are a significant challenge in Magnetic Resonance Imaging (MRI), hindering accurate tissue segmentation. These artifacts appear as blurred images that mimic tissue-like appearances, making segmentation difficult. This study proposes a novel deep learning framework that demonstrates superior performance in both motion correction and robust brain…

    Submitted 5 December, 2024; originally announced December 2024.

    Comments: Medical Image Computing and Computer Assisted Intervention, MICCAI 2024

    Journal ref: Medical Image Computing and Computer Assisted Intervention MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15009. Springer, Cham

  42. arXiv:2412.03887  [pdf, other]

    cs.RO cs.CV

    MOANA: Multi-Radar Dataset for Maritime Odometry and Autonomous Navigation Application

    Authors: Hyesu Jang, Wooseong Yang, Hanguen Kim, Dongje Lee, Yongjin Kim, Jinbum Park, Minsoo Jeon, Jaeseong Koh, Yejin Kang, Minwoo Jung, Sangwoo Jung, Chng Zhen Hao, Wong Yu Hin, Chew Yihang, Ayoung Kim

    Abstract: Maritime environmental sensing requires overcoming challenges from complex conditions such as harsh weather, platform perturbations, large dynamic objects, and the requirement for long detection ranges. While cameras and LiDAR are commonly used in ground vehicle navigation, their applicability in maritime settings is limited by range constraints and hardware maintenance issues. Radar sensors, howe…

    Submitted 12 May, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

    Comments: 10 pages, 9 figures, 3 tables

  43. arXiv:2412.01340  [pdf, other]

    cs.CL

    A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls

    Authors: Sheikh Shafayat, Dongkeun Yoon, Woori Jang, Jiwoo Choi, Alice Oh, Seohyon Jung

    Abstract: In this work, we propose and evaluate the feasibility of a two-stage pipeline to evaluate literary machine translation, in a fine-grained manner, from English to Korean. The results show that our framework provides fine-grained, interpretable metrics suited for literary translation and obtains a higher correlation with human judgment than traditional machine translation metrics. Nonetheless, it st…

    Submitted 1 January, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

  44. arXiv:2411.14770  [pdf, other]

    cs.RO

    Aim My Robot: Precision Local Navigation to Any Object

    Authors: Xiangyun Meng, Xuning Yang, Sanghun Jung, Fabio Ramos, Srid Sadhan Jujjavarapu, Sanjoy Paul, Dieter Fox

    Abstract: Existing navigation systems mostly consider "success" when the robot reaches within 1m radius to a goal. This precision is insufficient for emerging applications where the robot needs to be positioned precisely relative to an object for downstream tasks, such as docking, inspection, and manipulation. To this end, we design and implement Aim-My-Robot (AMR), a local navigation system that enables a…

    Submitted 27 December, 2024; v1 submitted 22 November, 2024; originally announced November 2024.

  45. arXiv:2411.14054  [pdf, other]

    cs.CL cs.AI

    FunctionChat-Bench: Comprehensive Evaluation of Language Models' Generative Capabilities in Korean Tool-use Dialogs

    Authors: Shinbok Lee, Gaeun Seo, Daniel Lee, Byeongil Ko, Sunghee Jung, Myeongcheol Shin

    Abstract: This study investigates language models' generative capabilities in tool-use dialogs. We categorize the models' outputs in tool-use dialogs into four distinct types: Tool Call, Answer Completion, Slot Question, and Relevance Detection, which serve as aspects for evaluation. We introduce FunctionChat-Bench, comprising 700 evaluation items and automated assessment programs. Using this benchmark, we…

    Submitted 21 November, 2024; originally announced November 2024.

    Comments: 8 pages

  46. arXiv:2411.05114  [pdf]

    cs.HC

    STEM: Soft Tactile Electromagnetic Actuator for Virtual Environment Interactions

    Authors: Heeju Mun, Seunggyeom Jung, Seung Mo Jeong, David Santiago Diaz Cortes, Ki-Uk Kyung

    Abstract: The research aims to expand tactile feedback beyond vibrations to various modes of stimuli, such as indentation, vibration, among others. By incorporating soft material into the design of a novel tactile actuator, we can achieve multi-modality and enhance the device's wearability, which encompasses compliance, safety, and portability. The proposed tactile device can elevate the presence and immers…

    Submitted 7 November, 2024; originally announced November 2024.

    Comments: Part of proceedings of 6th International Conference AsiaHaptics 2024

  47. arXiv:2410.23142  [pdf, ps, other]

    cs.LG cs.CV

    FAIR-TAT: Improving Model Fairness Using Targeted Adversarial Training

    Authors: Tejaswini Medi, Steffen Jung, Margret Keuper

    Abstract: Deep neural networks are susceptible to adversarial attacks and common corruptions, which undermine their robustness. In order to enhance model resilience against such challenges, Adversarial Training (AT) has emerged as a prominent solution. Nevertheless, adversarial robustness is often attained at the expense of model fairness during AT, i.e., disparity in class-wise robustness of the model. Whi…

    Submitted 14 June, 2025; v1 submitted 30 October, 2024; originally announced October 2024.

  48. arXiv:2410.07147  [pdf, other]

    cs.CL cs.AI cs.CY

    Taking a turn for the better: Conversation redirection throughout the course of mental-health therapy

    Authors: Vivian Nguyen, Sang Min Jung, Lillian Lee, Thomas D. Hull, Cristian Danescu-Niculescu-Mizil

    Abstract: Mental-health therapy involves a complex conversation flow in which patients and therapists continuously negotiate what should be talked about next. For example, therapists might try to shift the conversation's direction to keep the therapeutic process on track and avoid stagnation, or patients might push the discussion towards issues they want to focus on. How do such patient and therapist redi…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: To appear in the Proceedings of EMNLP (Findings) 2024. Code available at https://convokit.cornell.edu

  49. arXiv:2410.03105  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Mamba in Vision: A Comprehensive Survey of Techniques and Applications

    Authors: Md Maklachur Rahman, Abdullah Aman Tutul, Ankur Nath, Lamyanba Laishram, Soon Ki Jung, Tracy Hammond

    Abstract: Mamba is emerging as a novel approach to overcome the challenges faced by Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in computer vision. While CNNs excel at extracting local features, they often struggle to capture long-range dependencies without complex architectural modifications. In contrast, ViTs effectively model global relationships but suffer from high computational…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: Under Review

  50. arXiv:2409.05902  [pdf, ps, other]

    cs.LG cs.AR cs.CL

    OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models

    Authors: Jahyun Koo, Dahoon Park, Sangwoo Jung, Jaeha Kung

    Abstract: To overcome the burden on the memory size and bandwidth due to ever-increasing size of large language models (LLMs), aggressive weight quantization has been recently studied, while lacking research on quantizing activations. In this paper, we present a hardware-software co-design method that results in an energy-efficient LLM accelerator, named OPAL, for generation tasks. First of all, a novel act…

    Submitted 24 September, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

    Comments: 7 pages, 8 figures, DAC2024 accepted