
Showing 1–50 of 166 results for author: Ahn, J

Searching in archive cs.
  1. arXiv:2507.05556

    cs.AR cs.CR

    Per-Row Activation Counting on Real Hardware: Demystifying Performance Overheads

    Authors: Jumin Kim, Seungmin Baek, Minbok Wi, Hwayong Nam, Michael Jaemin Kim, Sukhan Lee, Kyomin Sohn, Jung Ho Ahn

    Abstract: Per-Row Activation Counting (PRAC), a DRAM read disturbance mitigation method, modifies key DRAM timing parameters, reportedly causing significant performance overheads in simulator-based studies. However, given known discrepancies between simulators and real hardware, real-machine experiments are vital for accurate PRAC performance estimation. We present the first real-machine performance analysi…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 4 pages, 4 figures, to appear at IEEE Computer Architecture Letters

  2. arXiv:2507.05234

    cs.PL cs.SE

    React-tRace: A Semantics for Understanding React Hooks

    Authors: Jay Lee, Joongwon Ahn, Kwangkeun Yi

    Abstract: React has become the most widely used web front-end framework, enabling the creation of user interfaces in a declarative and compositional manner. Hooks are a set of APIs that manage side effects in functional components in React. However, their semantics are often seen as opaque to developers, leading to UI bugs. In this paper, we formalize the semantics of the essence of React Hooks we name Reac…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: Conditionally accepted to OOPSLA 2025

  3. arXiv:2506.15918

    cs.CR cs.AR

    Sudoku: Decomposing DRAM Address Mapping into Component Functions

    Authors: Minbok Wi, Seungmin Baek, Seonyong Park, Mattan Erez, Jung Ho Ahn

    Abstract: Decomposing DRAM address mappings into component-level functions is critical for understanding memory behavior and enabling precise RowHammer attacks, yet existing reverse-engineering methods fall short. We introduce novel timing-based techniques leveraging DRAM refresh intervals and consecutive access latencies to infer component-specific functions. Based on this, we present Sudoku, the first sof…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 6 pages, 6 figures, 2 tables, DRAMSec 2025

  4. arXiv:2506.14107

    cs.DC cs.CV

    Déjà Vu: Efficient Video-Language Query Engine with Learning-based Inter-Frame Computation Reuse

    Authors: Jinwoo Hwang, Daeun Kim, Sangyeop Lee, Yoonsung Kim, Guseul Heo, Hojoon Kim, Yunseok Jeong, Tadiwos Meaza, Eunhyeok Park, Jeongseob Ahn, Jongse Park

    Abstract: Recently, Video-Language Models (VideoLMs) have demonstrated remarkable capabilities, offering significant potential for flexible and powerful video query systems. These models typically rely on Vision Transformers (ViTs), which process video frames individually to extract visual embeddings. However, generating embeddings for large-scale videos requires ViT inferencing across numerous frames, posi…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Accepted to 2025 VLDB

  5. arXiv:2506.03610

    cs.AI

    Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games

    Authors: Dongmin Park, Minkyu Kim, Beongjun Choi, Junhyuck Kim, Keon Lee, Jonghyun Lee, Inkyu Park, Byeong-Uk Lee, Jaeyoung Hwang, Jaewoo Ahn, Ameya S. Mahabaleshwarkar, Bilal Kartal, Pritam Biswas, Yoshi Suhara, Kangwook Lee, Jaewoong Cho

    Abstract: Large Language Model (LLM) agents are reshaping the game industry, particularly with more intelligent and human-preferable game characters. However, existing game benchmarks fall short of practical needs: they lack evaluations of diverse LLM capabilities across various game genres, studies of agentic modules crucial for complex gameplay, and fine-tuning datasets for aligning pre-trained LLMs into…

    Submitted 4 June, 2025; originally announced June 2025.

  6. arXiv:2506.02214

    cs.SE cs.CV

    Is PMBOK Guide the Right Fit for AI? Re-evaluating Project Management in the Face of Artificial Intelligence Projects

    Authors: Alexey Burdakov, Max Jaihyun Ahn

    Abstract: This paper critically evaluates the applicability of the Project Management Body of Knowledge (PMBOK) Guide framework to Artificial Intelligence (AI) software projects, highlighting key limitations and proposing tailored adaptations. Unlike traditional projects, AI initiatives rely heavily on complex data, iterative experimentation, and specialized expertise while navigating significant ethical co…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 9 pages, 1 figure

    ACM Class: D.2.9; I.4

  7. arXiv:2505.22943

    cs.CL cs.AI cs.CV cs.LG cs.SD

    Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates

    Authors: Jaewoo Ahn, Heeseung Yun, Dayoon Ko, Gunhee Kim

    Abstract: While pre-trained multimodal representations (e.g., CLIP) have shown impressive capabilities, they exhibit significant compositional vulnerabilities leading to counterintuitive judgments. We introduce Multimodal Adversarial Compositionality (MAC), a benchmark that leverages large language models (LLMs) to generate deceptive text samples to exploit these vulnerabilities across different modalities…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: ACL 2025 Main. Code is released at https://vision.snu.ac.kr/projects/mac

  8. arXiv:2505.17993

    math.CO cs.CC cs.DM

    Finding d-Cuts in Claw-free Graphs

    Authors: Jungho Ahn, Tala Eagling-Vose, Felicia Lucke, Daniël Paulusma, Siani Smith

    Abstract: The Matching Cut problem is to decide if the vertex set of a connected graph can be partitioned into two non-empty sets $B$ and $R$ such that the edges between $B$ and $R$ form a matching, that is, every vertex in $B$ has at most one neighbour in $R$, and vice versa. If for some integer $d\geq 1$, we allow every neighbour in $B$ to have at most $d$ neighbours in $R$, and vice versa, we obtain the…

    Submitted 23 May, 2025; originally announced May 2025.

  9. Cosmos: A CXL-Based Full In-Memory System for Approximate Nearest Neighbor Search

    Authors: Seoyoung Ko, Hyunjeong Shim, Wanju Doh, Sungmin Yun, Jinin So, Yongsuk Kwon, Sang-Soo Park, Si-Dong Roh, Minyong Yoon, Taeksang Song, Jung Ho Ahn

    Abstract: Retrieval-Augmented Generation (RAG) is crucial for improving the quality of large language models by injecting proper contexts extracted from external sources. RAG requires high-throughput, low-latency Approximate Nearest Neighbor Search (ANNS) over billion-scale vector databases. Conventional DRAM/SSD solutions face capacity/latency limits, whereas specialized hardware or RDMA clusters lack flex…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 4 pages, 5 figures, to appear at IEEE Computer Architecture Letters

  10. arXiv:2504.05615

    cs.LG cs.AI

    FedEFC: Federated Learning Using Enhanced Forward Correction Against Noisy Labels

    Authors: Seunghun Yu, Jin-Hyun Ahn, Joonhyuk Kang

    Abstract: Federated Learning (FL) is a powerful framework for privacy-preserving distributed learning. It enables multiple clients to collaboratively train a global model without sharing raw data. However, handling noisy labels in FL remains a major challenge due to heterogeneous data distributions and communication constraints, which can severely degrade model performance. To address this issue, we propose…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: 9 pages, 3 figures

  11. arXiv:2504.01282

    cs.CL

    Prompt-Reverse Inconsistency: LLM Self-Inconsistency Beyond Generative Randomness and Prompt Paraphrasing

    Authors: Jihyun Janice Ahn, Wenpeng Yin

    Abstract: While the inconsistency of LLMs is not a novel topic, prior research has predominantly addressed two types of generative inconsistencies: i) Randomness Inconsistency: running the same LLM multiple trials, yielding varying responses; ii) Paraphrase Inconsistency: paraphrased prompts result in different responses from the same LLM. Randomness Inconsistency arises from the inherent randomness due to…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 9 pages

  12. arXiv:2503.16870

    cs.LG cs.AI cs.CL

    Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs

    Authors: Anshumann, Mohd Abbas Zaidi, Akhil Kedia, Jinwoo Ahn, Taehwak Kwon, Kangwook Lee, Haejun Lee, Joohyung Lee

    Abstract: Knowledge distillation can be a cost-effective technique to distill knowledge in Large Language Models, if the teacher output logits can be pre-computed and cached. However, successfully applying this to pre-training remains largely unexplored. In this work, we prove that naive approaches for sparse knowledge distillation such as caching Top-K probabilities, while intuitive, provide biased estimat…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: Anshumann, Mohd Abbas Zaidi and Akhil Kedia have Equal Contribution

    MSC Class: 68T50 ACM Class: I.2.7

  13. arXiv:2503.03287

    cs.CV

    Deep Understanding of Sign Language for Sign to Subtitle Alignment

    Authors: Youngjoon Jang, Jeongsoo Choi, Junseok Ahn, Joon Son Chung

    Abstract: The objective of this work is to align asynchronous subtitles in sign language videos with limited labelled data. To achieve this goal, we propose a novel framework with the following contributions: (1) we leverage fundamental grammatical rules of British Sign Language (BSL) to pre-process the input subtitles, (2) we design a selective alignment loss to optimise the model for predicting the tempor…

    Submitted 5 March, 2025; originally announced March 2025.

  14. arXiv:2502.06086

    cs.CL

    Is a Peeled Apple Still Red? Evaluating LLMs' Ability for Conceptual Combination with Property Type

    Authors: Seokwon Song, Taehyun Lee, Jaewoo Ahn, Jae Hyuk Sung, Gunhee Kim

    Abstract: Conceptual combination is a cognitive process that merges basic concepts, enabling the creation of complex expressions. During this process, the properties of combination (e.g., the whiteness of a peeled apple) can be inherited from basic concepts, newly emerge, or be canceled. However, previous studies have evaluated a limited set of properties and have not examined the generative process. To add…

    Submitted 22 May, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

    Comments: NAACL 2025 Oral

  15. arXiv:2502.05352

    cs.AI cs.DC cs.MA

    ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks

    Authors: Saurabh Jha, Rohan Arora, Yuji Watanabe, Takumi Yanagawa, Yinfang Chen, Jackson Clark, Bhavya Bhavya, Mudit Verma, Harshit Kumar, Hirokuni Kitahara, Noah Zheutlin, Saki Takano, Divya Pathak, Felix George, Xinbo Wu, Bekir O. Turkkan, Gerard Vanloo, Michael Nidd, Ting Dai, Oishik Chatterjee, Pranjal Gupta, Suranjana Samanta, Pooja Aggarwal, Rong Lee, Pavankumar Murali, et al. (18 additional authors not shown)

    Abstract: Realizing the vision of using AI agents to automate critical IT tasks depends on the ability to measure and understand effectiveness of proposed solutions. We introduce ITBench, a framework that offers a systematic methodology for benchmarking AI agents to address real-world IT automation tasks. Our initial release targets three key areas: Site Reliability Engineering (SRE), Compliance and Securit…

    Submitted 7 February, 2025; originally announced February 2025.

  16. arXiv:2501.18877

    cs.CV cs.CR cs.LG

    Distorting Embedding Space for Safety: A Defense Mechanism for Adversarially Robust Diffusion Models

    Authors: Jaesin Ahn, Heechul Jung

    Abstract: Text-to-image diffusion models show remarkable generation performance following text prompts, but risk generating Not Safe For Work (NSFW) contents from unsafe prompts. Existing approaches, such as prompt filtering or concept unlearning, fail to defend against adversarial attacks while maintaining benign image quality. In this paper, we propose a novel approach called Distorting Embedding Space (D…

    Submitted 30 January, 2025; originally announced January 2025.

  17. arXiv:2501.12422

    cs.LG cs.AI cs.CV

    CroMe: Multimodal Fake News Detection using Cross-Modal Tri-Transformer and Metric Learning

    Authors: Eunjee Choi, Junhyun Ahn, XinYu Piao, Jong-Kook Kim

    Abstract: Multimodal Fake News Detection has received increasing attention recently. Existing methods rely on independently encoded unimodal data and overlook the advantages of capturing intra-modality relationships and integrating inter-modal similarities using advanced techniques. To address these issues, Cross-Modal Tri-Transformer and Metric Learning for Multimodal Fake News Detection (CroMe) is propose…

    Submitted 21 January, 2025; originally announced January 2025.

  18. arXiv:2501.00991

    cs.DM cs.DS math.CO

    Twin-width one

    Authors: Jungho Ahn, Hugo Jacob, Noleen Köhler, Christophe Paul, Amadeus Reinald, Sebastian Wiederrecht

    Abstract: We investigate the structure of graphs of twin-width at most $1$, and obtain the following results: - Graphs of twin-width at most $1$ are permutation graphs. In particular they have an intersection model and a linear structure. - There is always a $1$-contraction sequence closely following a given permutation diagram. - Based on a recursive decomposition theorem, we obtain a simple algorith…

    Submitted 1 January, 2025; originally announced January 2025.

    Comments: Accepted to STACS 2025

  19. arXiv:2412.19259

    eess.AS cs.SD

    VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis

    Authors: Jaemin Jung, Junseok Ahn, Chaeyoung Jung, Tan Dat Nguyen, Youngjoon Jang, Joon Son Chung

    Abstract: We present VoiceDiT, a multi-modal generative model for producing environment-aware speech and audio from text and visual prompts. While aligning speech with text is crucial for intelligible speech, achieving this alignment in noisy conditions remains a significant and underexplored challenge in the field. To address this, we present a novel audio generation pipeline named VoiceDiT. This pipeline…

    Submitted 26 December, 2024; originally announced December 2024.

    Comments: Accepted to ICASSP 2025

  20. arXiv:2412.04139

    cs.AI

    Monet: Mixture of Monosemantic Experts for Transformers

    Authors: Jungwoo Park, Young Jin Ahn, Kee-Eung Kim, Jaewoo Kang

    Abstract: Understanding the internal computations of large language models (LLMs) is crucial for aligning them with human values and preventing undesirable behaviors like toxic content generation. However, mechanistic interpretability is hindered by polysemanticity -- where individual neurons respond to multiple, unrelated concepts. While Sparse Autoencoders (SAEs) have attempted to disentangle these featur…

    Submitted 11 June, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

  21. arXiv:2411.15620

    cs.CV

    Fine-Grained Open-Vocabulary Object Recognition via User-Guided Segmentation

    Authors: Jinwoo Ahn, Hyeokjoon Kwon, Hwiyeon Yoo

    Abstract: Recent advent of vision-based foundation models has enabled efficient and high-quality object detection at ease. Despite the success of previous studies, object detection models face limitations on capturing small components from holistic objects and taking user intention into account. To address these challenges, we propose a novel foundation model-based detection method called FOCUS: Fine-graine…

    Submitted 23 November, 2024; originally announced November 2024.

  22. arXiv:2411.14137

    cs.CV cs.CL

    VAGUE: Visual Contexts Clarify Ambiguous Expressions

    Authors: Heejeong Nam, Jinwoo Ahn, Keummin Ka, Jiwan Chung, Youngjae Yu

    Abstract: Human communication often relies on visual cues to resolve ambiguity. While humans can intuitively integrate these cues, AI systems often find it challenging to engage in sophisticated multimodal reasoning. We introduce VAGUE, a benchmark evaluating multimodal AI systems' ability to integrate visual context for intent disambiguation. VAGUE consists of 1.6K ambiguous textual expressions, each paire…

    Submitted 11 March, 2025; v1 submitted 21 November, 2024; originally announced November 2024.

    Comments: 31 pages

  23. arXiv:2410.22394

    cs.CL

    AAAR-1.0: Assessing AI's Potential to Assist Research

    Authors: Renze Lou, Hanzi Xu, Sijia Wang, Jiangshu Du, Ryo Kamoi, Xiaoxin Lu, Jian Xie, Yuxuan Sun, Yusen Zhang, Jihyun Janice Ahn, Hongchao Fang, Zhuoyang Zou, Wenchao Ma, Xi Li, Kai Zhang, Congying Xia, Lifu Huang, Wenpeng Yin

    Abstract: Numerous studies have assessed the proficiency of AI systems, particularly large language models (LLMs), in facilitating everyday tasks such as email writing, question answering, and creative content generation. However, researchers face unique challenges and opportunities in leveraging LLMs for their own work, such as brainstorming research ideas, designing experiments, and writing or reviewing p…

    Submitted 25 May, 2025; v1 submitted 29 October, 2024; originally announced October 2024.

    Comments: ICML 2025. Project Webpage: https://renzelou.github.io/AAAR-1.0/

  24. arXiv:2410.09872

    cs.CV cs.MM

    Towards Reproducible Learning-based Compression

    Authors: Jiahao Pang, Muhammad Asad Lodhi, Junghyun Ahn, Yuning Huang, Dong Tian

    Abstract: A deep learning system typically suffers from a lack of reproducibility that is partially rooted in hardware or software implementation details. The irreproducibility leads to skepticism in deep learning technologies and it can hinder them from being deployed in many applications. In this work, the irreproducibility issue is analyzed where deep learning is employed in compression systems while the…

    Submitted 13 October, 2024; originally announced October 2024.

    Comments: Accepted at MMSP 2024

  25. arXiv:2410.08680

    cs.CV

    Gait Sequence Upsampling using Diffusion Models for Single LiDAR Sensors

    Authors: Jeongho Ahn, Kazuto Nakashima, Koki Yoshino, Yumi Iwashita, Ryo Kurazume

    Abstract: Recently, 3D LiDAR has emerged as a promising technique in the field of gait-based person identification, serving as an alternative to traditional RGB cameras, due to its robustness under varying lighting conditions and its ability to capture 3D geometric information. However, long capture distances or the use of low-cost LiDAR sensors often result in sparse human point clouds, leading to a declin…

    Submitted 14 October, 2024; v1 submitted 11 October, 2024; originally announced October 2024.

  26. arXiv:2409.18231

    cs.RO

    ReloPush: Multi-object Rearrangement in Confined Spaces with a Nonholonomic Mobile Robot Pusher

    Authors: Jeeho Ahn, Christoforos Mavrogiannis

    Abstract: We focus on push-based multi-object rearrangement planning using a nonholonomically constrained mobile robot. The simultaneous geometric, kinematic, and physics constraints make this problem especially challenging. Prior work on rearrangement planning often relaxes some of these constraints by assuming dexterous hardware, prehensile manipulation, or sparsely occupied workspaces. Our key insight is…

    Submitted 12 March, 2025; v1 submitted 26 September, 2024; originally announced September 2024.

    Comments: Preprint of final version, accepted to ICRA 2025

  27. arXiv:2409.13707

    cs.IR cs.AI cs.CL

    Retrieval Augmented Generation-Based Incident Resolution Recommendation System for IT Support

    Authors: Paulina Toro Isaza, Michael Nidd, Noah Zheutlin, Jae-wook Ahn, Chidansh Amitkumar Bhatt, Yu Deng, Ruchi Mahindru, Martin Franz, Hans Florian, Salim Roukos

    Abstract: Clients wishing to implement generative AI in the domain of IT Support and AIOps face two critical issues: domain coverage and model size constraints due to model choice limitations. Clients might choose to not use larger proprietary models such as GPT-4 due to cost and privacy concerns and so are limited to smaller models with potentially less domain coverage that do not generalize to the client'…

    Submitted 6 September, 2024; originally announced September 2024.

    Comments: 7 pages, 3 figures, 6 tables

  28. arXiv:2409.10903

    cs.RO

    Efficient Computation of Whole-Body Control Utilizing Simplified Whole-Body Dynamics via Centroidal Dynamics

    Authors: Junewhee Ahn, Jaesug Jung, Yisoo Lee, Hokyun Lee, Sami Haddadin, Jaeheung Park

    Abstract: In this study, we present a novel method for enhancing the computational efficiency of whole-body control for humanoid robots, a challenge accentuated by their high degrees of freedom. The reduced-dimension rigid body dynamics of a floating base robot is constructed by segmenting its kinematic chain into constrained and unconstrained chains, simplifying the dynamics of the unconstrained chain thro…

    Submitted 30 December, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

    Comments: submitted to IJCAS, under review

  29. MaDis-Stereo: Enhanced Stereo Matching via Distilled Masked Image Modeling

    Authors: Jihye Ahn, Hyesong Choi, Soomin Kim, Dongbo Min

    Abstract: In stereo matching, CNNs have traditionally served as the predominant architectures. Although Transformer-based stereo models have been studied recently, their performance still lags behind CNN-based stereo models due to the inherent data scarcity issue in the stereo matching task. In this paper, we propose Masked Image Modeling Distilled Stereo matching model, termed MaDis-Stereo, that enhances l…

    Submitted 4 September, 2024; originally announced September 2024.

  30. arXiv:2409.02545

    cs.CV

    UniTT-Stereo: Unified Training of Transformer for Enhanced Stereo Matching

    Authors: Soomin Kim, Hyesong Choi, Jihye Ahn, Dongbo Min

    Abstract: Unlike other vision tasks where Transformer-based approaches are becoming increasingly common, stereo depth estimation is still dominated by convolution-based approaches. This is mainly due to the limited availability of real-world ground truth for stereo matching, which is a limiting factor in improving the performance of Transformer-based stereo approaches. In this paper, we propose UniTT-Stereo…

    Submitted 4 September, 2024; originally announced September 2024.

  31. arXiv:2409.01141

    cs.AR cs.LG

    Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching

    Authors: Sungmin Yun, Kwanhee Kyung, Juhwan Cho, Jaewan Choi, Jongmin Kim, Byeongho Kim, Sukhan Lee, Kyomin Sohn, Jung Ho Ahn

    Abstract: Large language models (LLMs) have emerged due to their capability to generate high-quality content across diverse contexts. To reduce their explosively increasing demands for computing resources, a mixture of experts (MoE) has emerged. The MoE layer enables exploiting a huge number of parameters with less computation. Applying state-of-the-art continuous batching increases throughput; however, it…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 15 pages, 16 figures, accepted at MICRO 2024

  32. arXiv:2407.13055

    cs.CR cs.PF

    Cheddar: A Swift Fully Homomorphic Encryption Library for CUDA GPUs

    Authors: Jongmin Kim, Wonseok Choi, Jung Ho Ahn

    Abstract: Fully homomorphic encryption (FHE) is a cryptographic technology capable of resolving security and privacy problems in cloud computing by encrypting data in use. However, FHE introduces tremendous computational overhead for processing encrypted data, causing FHE workloads to become 2-6 orders of magnitude slower than their unencrypted counterparts. To mitigate the overhead, we propose Cheddar, an…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 12 pages, 5 figures

  33. arXiv:2407.11368

    cs.CL

    Ancient Korean Archive Translation: Comparison Analysis on Statistical phrase alignment, LLM in-context learning, and inter-methodological approach

    Authors: Sojung Lucia Kim, Taehong Jang, Joonmo Ahn

    Abstract: This study aims to compare three methods for translating ancient texts with sparse corpora: (1) the traditional statistical translation method of phrase alignment, (2) in-context LLM learning, and (3) proposed inter methodological approach - statistical machine translation method using sentence piece tokens derived from unified set of source-target corpus. The performance of the proposed approach…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ACL2024 submitted

  34. arXiv:2407.11017

    cs.CL cs.AI cs.LG

    Direct-Inverse Prompting: Analyzing LLMs' Discriminative Capacity in Self-Improving Generation

    Authors: Jihyun Janice Ahn, Ryo Kamoi, Lu Cheng, Rui Zhang, Wenpeng Yin

    Abstract: Mainstream LLM research has primarily focused on enhancing their generative capabilities. However, even the most advanced LLMs experience uncertainty in their outputs, often producing varied results on different runs or when faced with minor changes in input, despite no substantial change in content. Given multiple responses from the same LLM to the same input, we advocate leveraging the LLMs' dis…

    Submitted 26 June, 2024; originally announced July 2024.

    Comments: 4 pages, 3 tables

  35. arXiv:2407.10558

    cs.CV cs.LG

    ConTEXTure: Consistent Multiview Images to Texture

    Authors: Jaehoon Ahn, Sumin Cho, Harim Jung, Kibeom Hong, Seonghoon Ban, Moon-Ryul Jung

    Abstract: We introduce ConTEXTure, a generative network designed to create a texture map/atlas for a given 3D mesh using images from multiple viewpoints. The process begins with generating a front-view image from a text prompt, such as 'Napoleon, front view', describing the 3D mesh. Additional images from different viewpoints are derived from this front-view image and camera poses relative to it. ConTEXTure…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 11 pages, 7 figures

  36. arXiv:2406.15709

    cs.CR

    I Experienced More than 10 DeFi Scams: On DeFi Users' Perception of Security Breaches and Countermeasures

    Authors: Mingyi Liu, Jun Ho Huh, HyungSeok Han, Jaehyuk Lee, Jihae Ahn, Frank Li, Hyoungshick Kim, Taesoo Kim

    Abstract: Decentralized Finance (DeFi) offers a whole new investment experience and has quickly emerged as an enticing alternative to Centralized Finance (CeFi). Rapidly growing market size and active users, however, have also made DeFi a lucrative target for scams and hacks, with 1.95 billion USD lost in 2023. Unfortunately, no prior research thoroughly investigates DeFi users' security risk awareness leve…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: In Proceedings of the 33rd USENIX Security Symposium, Philadelphia, PA, USA, Aug. 2024

  37. arXiv:2406.12233

    cs.AI cs.CL cs.CV

    SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization

    Authors: Young Jin Ahn, Jungwoo Park, Sangha Park, Jonghyun Choi, Kee-Eung Kim

    Abstract: Visual Speech Recognition (VSR) stands at the intersection of computer vision and speech recognition, aiming to interpret spoken content from visual cues. A prominent challenge in VSR is the presence of homophenes-visually similar lip gestures that represent different phonemes. Prior approaches have sought to distinguish fine-grained visemes by aligning visual and auditory semantics, but often fel…

    Submitted 17 June, 2024; originally announced June 2024.

  38. arXiv:2406.05963

    cs.CV cs.AI

    Solution for SMART-101 Challenge of CVPR Multi-modal Algorithmic Reasoning Task 2024

    Authors: Jinwoo Ahn, Junhyeok Park, Min-Jun Kim, Kang-Hyeon Kim, So-Yeong Sohn, Yun-Ji Lee, Du-Seong Chang, Yu-Jung Heo, Eun-Sol Kim

    Abstract: In this paper, the solution of HYU MLLAB KT Team to the Multimodal Algorithmic Reasoning Task: SMART-101 CVPR 2024 Challenge is presented. Beyond conventional visual question-answering problems, the SMART-101 challenge aims to achieve human-level multimodal understanding by tackling complex visio-linguistic puzzles designed for children in the 6-8 age group. To solve this problem, we suggest two m…

    Submitted 9 June, 2024; originally announced June 2024.

  39. arXiv:2406.05602

    cs.CV cs.CL

    Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models

    Authors: Philip Wootaek Shin, Jihyun Janice Ahn, Wenpeng Yin, Jack Sampson, Vijaykrishnan Narayanan

    Abstract: It has been shown that many generative models inherit and amplify societal biases. To date, there is no uniform/systematic agreed standard to control/adjust for these biases. This study examines the presence and manipulation of societal biases in leading text-to-image models: Stable Diffusion, DALL-E 3, and Adobe Firefly. Through a comprehensive analysis combining base prompts with modifiers and t…

    Submitted 8 June, 2024; originally announced June 2024.

  40. arXiv:2405.18027

    cs.CL

    TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models

    Authors: Jaewoo Ahn, Taehyun Lee, Junyoung Lim, Jin-Hwa Kim, Sangdoo Yun, Hwaran Lee, Gunhee Kim

    Abstract: While Large Language Models (LLMs) can serve as agents to simulate human behaviors (i.e., role-playing agents), we emphasize the importance of point-in-time role-playing. This situates characters at specific moments in the narrative progression for three main reasons: (i) enhancing users' narrative immersion, (ii) avoiding spoilers, and (iii) fostering engagement in fandom role-playing. To accurat…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: ACL 2024 Findings. Code and dataset are released at https://ahnjaewoo.github.io/timechara

  41. arXiv:2405.10272

    cs.CV cs.AI cs.SD eess.AS eess.IV

    Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

    Authors: Youngjoon Jang, Ji-Hoon Kim, Junseok Ahn, Doyeop Kwak, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung

    Abstract: The goal of this work is to simultaneously generate natural talking faces and speech outputs from text. We achieve this by integrating Talking Face Generation (TFG) and Text-to-Speech (TTS) systems into a unified framework. We address the main challenges of each task: (1) generating a range of head poses representative of real-world scenarios, and (2) ensuring voice consistency despite variations…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  42. arXiv:2405.02499

    cs.CR cs.AR

    DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands

    Authors: Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn

    Abstract: The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerability. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses t…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: To appear at the 51st IEEE/ACM International Symposium on Computer Architecture (ISCA)

  43. arXiv:2404.14687  [pdf, other]

    cs.MM cs.AI cs.CL cs.CV

    Pegasus-v1 Technical Report

    Authors: Raehyuk Jung, Hyojun Go, Jaehyuk Yi, Jiho Jang, Daniel Kim, Jay Suh, Aiden Lee, Cooper Han, Jae Lee, Jeff Kim, Jin-Young Kim, Junwan Kim, Kyle Park, Lucas Lee, Mars Ha, Minjoon Seo, Abraham Jo, Ed Park, Hassan Kianinejad, SJ Kim, Tony Moon, Wade Jeong, Andrei Popescu, Esther Kim, EK Yoon, et al. (19 additional authors not shown)

    Abstract: This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced video content comprehension across various lengths. This technical report overviews Pegasus-1's archi…

    Submitted 22 April, 2024; originally announced April 2024.

  44. arXiv:2404.03602  [pdf, other]

    cs.CL

    Evaluating LLMs at Detecting Errors in LLM Responses

    Authors: Ryo Kamoi, Sarkar Snigdha Sarathi Das, Renze Lou, Jihyun Janice Ahn, Yilun Zhao, Xiaoxin Lu, Nan Zhang, Yusen Zhang, Ranran Haoran Zhang, Sujeeth Reddy Vummanthala, Salika Dave, Shaobo Qin, Arman Cohan, Wenpeng Yin, Rui Zhang

    Abstract: With Large Language Models (LLMs) being widely used across various tasks, detecting errors in their responses is increasingly crucial. However, little research has been conducted on error detection of LLM responses. Collecting error annotations on LLM responses is challenging due to the subjective nature of many NLP tasks, and thus previous research focuses on tasks of little practical value (e.g.…

    Submitted 27 July, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: COLM 2024, 46 pages, Benchmark and code: https://github.com/psunlpgroup/ReaLMistake

  45. arXiv:2404.02155  [pdf, other]

    cs.CV

    Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields

    Authors: Joshua Ahn, Haochen Wang, Raymond A. Yeh, Greg Shakhnarovich

    Abstract: Scale-ambiguity in 3D scene dimensions leads to magnitude-ambiguity of volumetric densities in neural radiance fields, i.e., the densities double when scene size is halved, and vice versa. We call this property alpha invariance. For NeRFs to better maintain alpha invariance, we recommend 1) parameterizing both distance and volume densities in log space, and 2) a discretization-agnostic initializat…

    Submitted 16 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. project page https://pals.ttic.edu/p/alpha-invariance

  46. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  47. arXiv:2403.20109  [pdf, ps, other]

    cs.LG cs.AI q-bio.BM

    Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation

    Authors: Jinyeong Park, Jaegyoon Ahn, Jonghwan Choi, Jibum Kim

    Abstract: Optimizing techniques for discovering molecular structures with desired properties is crucial in artificial intelligence (AI)-based drug discovery. Combining deep generative models with reinforcement learning has emerged as an effective strategy for generating molecules with specific properties. Despite its potential, this approach is ineffective in exploring the vast chemical space and optimizing…

    Submitted 29 March, 2024; originally announced March 2024.

  48. arXiv:2403.14963  [pdf, other]

    cs.CR

    Enabling Physical Localization of Uncooperative Cellular Devices

    Authors: Taekkyung Oh, Sangwook Bae, Junho Ahn, Yonghwa Lee, Tuan Dinh Hoang, Min Suk Kang, Nils Ole Tippenhauer, Yongdae Kim

    Abstract: In cellular networks, authorities may need to physically locate user devices to track criminals or illegal equipment. This process involves authorized agents tracing devices by monitoring uplink signals with cellular operator assistance. However, tracking uncooperative uplink signal sources remains challenging, even for operators and authorities. Three key challenges persist for fine-grained local…

    Submitted 26 September, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  49. arXiv:2403.05591  [pdf, other]

    cs.HC cs.LG

    Data-Driven Ergonomic Risk Assessment of Complex Hand-intensive Manufacturing Processes

    Authors: Anand Krishnan, Xingjian Yang, Utsav Seth, Jonathan M. Jeyachandran, Jonathan Y. Ahn, Richard Gardner, Samuel F. Pedigo, Adriana Blom-Schieber, Ashis G. Banerjee, Krithika Manohar

    Abstract: Hand-intensive manufacturing processes, such as composite layup and textile draping, require significant human dexterity to accommodate task complexity. These strenuous hand motions often lead to musculoskeletal disorders and rehabilitation surgeries. We develop a data-driven ergonomic risk assessment system with a special focus on hand and finger activity to better identify and address ergonomic…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 26 pages, 7 figures

  50. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1112 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 16 December, 2024; v1 submitted 8 March, 2024; originally announced March 2024.