
Showing 1–50 of 162 results for author: Chung, Y

Searching in archive cs.
  1. arXiv:2505.03467

    cs.CL

    Uncertainty-Aware Large Language Models for Explainable Disease Diagnosis

    Authors: Shuang Zhou, Jiashuo Wang, Zidu Xu, Song Wang, David Brauer, Lindsay Welton, Jacob Cogan, Yuen-Hei Chung, Lei Tian, Zaifu Zhan, Yu Hou, Mingquan Lin, Genevieve B. Melton, Rui Zhang

    Abstract: Explainable disease diagnosis, which leverages patient information (e.g., signs and symptoms) and computational models to generate probable diagnoses and reasonings, offers clear clinical values. However, when clinical notes encompass insufficient evidence for a definite diagnosis, such as the absence of definitive symptoms, diagnostic uncertainty usually arises, increasing the risk of misdiagnosi…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 22 pages, 8 figures

  2. arXiv:2504.17203

    cs.DB cs.LG

    High-Fidelity And Complex Test Data Generation For Real-World SQL Code Generation Services

    Authors: Shivasankari Kannan, Yeounoh Chung, Amita Gondi, Tristan Swadell, Fatma Ozcan

    Abstract: The demand for high-fidelity test data is paramount in industrial settings where access to production data is largely restricted. Traditional data generation methods often fall short, struggling with low-fidelity and the ability to model complex data structures and semantic relationships that are critical for testing complex SQL code generation services like Natural Language to SQL (NL2SQL). In th…

    Submitted 23 April, 2025; originally announced April 2025.

  3. arXiv:2503.17126

    cs.CL cs.LG

    Modifying Large Language Model Post-Training for Diverse Creative Writing

    Authors: John Joon Young Chung, Vishakh Padmakumar, Melissa Roemmele, Yuqian Sun, Max Kreminski

    Abstract: As creative writing tasks do not have singular correct answers, large language models (LLMs) trained to perform these tasks should be able to generate diverse valid outputs. However, LLM post-training often focuses on improving generation quality but neglects to facilitate output diversity. Hence, in creative writing generation, we investigate post-training approaches to promote both output divers…

    Submitted 21 March, 2025; originally announced March 2025.

  4. arXiv:2503.12745

    cs.CV

    ProtoDepth: Unsupervised Continual Depth Completion with Prototypes

    Authors: Patrick Rim, Hyoungseob Park, S. Gangopadhyay, Ziyao Zeng, Younjoon Chung, Alex Wong

    Abstract: We present ProtoDepth, a novel prototype-based approach for continual learning of unsupervised depth completion, the multimodal 3D reconstruction task of predicting dense depth maps from RGB images and sparse point clouds. The unsupervised learning paradigm is well-suited for continual learning, as ground truth is not needed. However, when training on new non-stationary distributions, depth comple…

    Submitted 16 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025

  5. arXiv:2503.06335

    cs.HC cs.CL

    Phraselette: A Poet's Procedural Palette

    Authors: Alex Calderwood, John Joon Young Chung, Yuqian Sun, Melissa Roemmele, Max Kreminski

    Abstract: According to the recently introduced theory of artistic support tools, creativity support tools exert normative influences over artistic production, instantiating a normative ground that shapes both the process and product of artistic expression. We argue that the normative ground of most existing automated writing tools is misaligned with writerly values and identify a potential alternative frame…

    Submitted 8 March, 2025; originally announced March 2025.

  6. arXiv:2502.15602

    cs.SD cs.AI cs.LG eess.AS

    KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation

    Authors: Yoonjin Chung, Pilsun Eu, Junwon Lee, Keunwoo Choi, Juhan Nam, Ben Sangbae Chon

    Abstract: Although being widely adopted for evaluating generated audio signals, the Fréchet Audio Distance (FAD) suffers from significant limitations, including reliance on Gaussian assumptions, sensitivity to sample size, and high computational complexity. As an alternative, we introduce the Kernel Audio Distance (KAD), a novel, distribution-free, unbiased, and computationally efficient metric based on Max…

    Submitted 9 March, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

  7. arXiv:2502.15419

    cs.CL cs.AI cs.CY

    Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking

    Authors: Yi-Ling Chung, Aurora Cobo, Pablo Serna

    Abstract: Robust automatic fact-checking systems have the potential to combat online misinformation at scale. However, most existing research primarily focuses on English. In this paper, we introduce MultiSynFact, the first large-scale multilingual fact-checking dataset containing 2.2M claim-source pairs designed to support Spanish, German, English, and other low-resource languages. Our dataset generation p…

    Submitted 21 February, 2025; originally announced February 2025.

    Comments: 15 pages, 1 figure, 18 tables

  8. arXiv:2502.11478

    cs.SD cs.LG eess.AS

    TAPS: Throat and Acoustic Paired Speech Dataset for Deep Learning-Based Speech Enhancement

    Authors: Yunsik Kim, Yonghun Song, Yoonyoung Chung

    Abstract: In high-noise environments such as factories, subways, and busy streets, capturing clear speech is challenging due to background noise. Throat microphones provide a solution with their noise-suppressing properties, reducing the noise while recording speech. However, a significant limitation remains: high-frequency information is attenuated as sound waves pass through skin and tissue, reducing spee…

    Submitted 17 February, 2025; originally announced February 2025.

  9. arXiv:2502.04599

    cs.HC

    Fuzzy Linkography: Automatic Graphical Summarization of Creative Activity Traces

    Authors: Amy Smith, Barrett R. Anderson, Jasmine Tan Otto, Isaac Karth, Yuqian Sun, John Joon Young Chung, Melissa Roemmele, Max Kreminski

    Abstract: Linkography -- the analysis of links between the design moves that make up an episode of creative ideation or design -- can be used for both visual and quantitative assessment of creative activity traces. Traditional linkography, however, is time-consuming, requiring a human coder to manually annotate both the design moves within an episode and the connections between them. As a result, linkograph…

    Submitted 6 February, 2025; originally announced February 2025.

  10. arXiv:2501.13284

    cs.HC cs.AI cs.CL

    Toyteller: AI-powered Visual Storytelling Through Toy-Playing with Character Symbols

    Authors: John Joon Young Chung, Melissa Roemmele, Max Kreminski

    Abstract: We introduce Toyteller, an AI-powered storytelling system where users generate a mix of story text and visuals by directly manipulating character symbols like they are toy-playing. Anthropomorphized symbol motions can convey rich and nuanced social interactions; Toyteller leverages these motions (1) to let users steer story text generation and (2) as a visual output format that accompanies story t…

    Submitted 22 January, 2025; originally announced January 2025.

    Comments: Accepted to CHI2025

  11. arXiv:2501.12372

    cs.DB cs.AI

    Is Long Context All You Need? Leveraging LLM's Extended Context for NL2SQL

    Authors: Yeounoh Chung, Gaurav T. Kakkar, Yu Gan, Brenton Milne, Fatma Ozcan

    Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities across a range of natural language processing tasks. In particular, improvements in reasoning abilities and the expansion of context windows have opened new avenues for leveraging these powerful models. NL2SQL is challenging in that the natural language question is inherently ambiguous, while the SQL generation requires a preci…

    Submitted 20 March, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

    Comments: 13 pages, 6 figures, VLDB 2025

  12. arXiv:2501.09099

    cs.HC

    Drama Llama: An LLM-Powered Storylets Framework for Authorable Responsiveness in Interactive Narrative

    Authors: Yuqian Sun, Phoebe J. Wang, John Joon Young Chung, Melissa Roemmele, Taewook Kim, Max Kreminski

    Abstract: In this paper, we present Drama Llama, an LLM-powered storylets framework that supports the authoring of responsive, open-ended interactive stories. DL combines the structural benefits of storylet-based systems with the generative capabilities of large language models, enabling authors to create responsive interactive narratives while maintaining narrative control. Rather than crafting complex log…

    Submitted 15 January, 2025; originally announced January 2025.

    Comments: 10 pages, 5 photos

  13. arXiv:2501.06488

    cs.CV cs.AI cs.HC cs.MM eess.IV

    NVS-SQA: Exploring Self-Supervised Quality Representation Learning for Neurally Synthesized Scenes without References

    Authors: Qiang Qu, Yiran Shen, Xiaoming Chen, Yuk Ying Chung, Weidong Cai, Tongliang Liu

    Abstract: Neural View Synthesis (NVS), such as NeRF and 3D Gaussian Splatting, effectively creates photorealistic scenes from sparse viewpoints, typically evaluated by quality assessment methods like PSNR, SSIM, and LPIPS. However, these full-reference methods, which compare synthesized views to reference views, may not fully capture the perceptual quality of neurally synthesized scenes (NSS), particularly…

    Submitted 11 January, 2025; originally announced January 2025.

  14. arXiv:2412.08029

    cs.CV cs.AI cs.HC cs.MM eess.IV

    NeRF-NQA: No-Reference Quality Assessment for Scenes Generated by NeRF and Neural View Synthesis Methods

    Authors: Qiang Qu, Hanxue Liang, Xiaoming Chen, Yuk Ying Chung, Yiran Shen

    Abstract: Neural View Synthesis (NVS) has demonstrated efficacy in generating high-fidelity dense viewpoint videos using an image set with sparse views. However, existing quality assessment methods like PSNR, SSIM, and LPIPS are not tailored for the scenes with dense viewpoints synthesized by NVS and NeRF variants, thus, they often fall short in capturing the perceptual quality, including spatial and angular…

    Submitted 10 December, 2024; originally announced December 2024.

    Journal ref: IEEE Transactions on Visualization and Computer Graphics, vol. 30, no. 5, pp. 2129-2139, May 2024

  15. arXiv:2412.07080

    cs.CV cs.AI cs.MM

    EvRepSL: Event-Stream Representation via Self-Supervised Learning for Event-Based Vision

    Authors: Qiang Qu, Xiaoming Chen, Yuk Ying Chung, Yiran Shen

    Abstract: Event-stream representation is the first step for many computer vision tasks using event cameras. It converts the asynchronous event-streams into a formatted structure so that conventional machine learning models can be applied easily. However, most of the state-of-the-art event-stream representations are manually designed and the quality of these representations cannot be guaranteed due to the no…

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: Published in IEEE Transactions on Image Processing

    Journal ref: IEEE Transactions on Image Processing, vol. 33, pp. 6579-6591, 2024

  16. arXiv:2411.16750

    cs.CV cs.CL cs.LG cs.MM

    PriorDiffusion: Leverage Language Prior in Diffusion Models for Monocular Depth Estimation

    Authors: Ziyao Zeng, Jingcheng Ni, Daniel Wang, Patrick Rim, Younjoon Chung, Fengyu Yang, Byung-Woo Hong, Alex Wong

    Abstract: Traditional monocular depth estimation suffers from inherent ambiguity and visual nuisance. We argue that language prior can enhance monocular depth estimation by leveraging the inductive bias learned during the text-to-image pre-training of diffusion models. The ability of these models to generate images that align with text indicates that they have learned the spatial relationships, size, and sh…

    Submitted 17 April, 2025; v1 submitted 24 November, 2024; originally announced November 2024.

  17. arXiv:2411.12440

    cs.CV

    Beyond Gaussians: Fast and High-Fidelity 3D Splatting with Linear Kernels

    Authors: Haodong Chen, Runnan Chen, Qiang Qu, Zhaoqing Wang, Tongliang Liu, Xiaoming Chen, Yuk Ying Chung

    Abstract: Recent advancements in 3D Gaussian Splatting (3DGS) have substantially improved novel view synthesis, enabling high-quality reconstruction and real-time rendering. However, blurring artifacts, such as floating primitives and over-reconstruction, remain challenging. Current methods address these issues by refining scene structure, enhancing geometric representations, addressing blur in training ima…

    Submitted 2 December, 2024; v1 submitted 19 November, 2024; originally announced November 2024.

  18. arXiv:2411.09072

    cs.LG

    Continuous GNN-based Anomaly Detection on Edge using Efficient Adaptive Knowledge Graph Learning

    Authors: Sanggeon Yun, Ryozo Masukawa, William Youngwoo Chung, Minhyoung Na, Nathaniel Bastian, Mohsen Imani

    Abstract: The increasing demand for robust security solutions across various industries has made Video Anomaly Detection (VAD) a critical task in applications such as intelligent surveillance, evidence investigation, and violence detection. Traditional approaches to VAD often rely on finetuning large pre-trained models, which can be computationally expensive and impractical for real-time or resource-constra…

    Submitted 13 January, 2025; v1 submitted 13 November, 2024; originally announced November 2024.

    Comments: Accepted to DATE 2025

  19. arXiv:2410.24177

    eess.AS cs.CL cs.LG cs.SD

    DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models

    Authors: Heng-Jui Chang, Hongyu Gong, Changhan Wang, James Glass, Yu-An Chung

    Abstract: Spoken language models (SLMs) have gained increasing attention with advancements in text-based, decoder-only language models. SLMs process text and speech, enabling simultaneous speech understanding and generation. This paper presents Double-Codebook Speaker-invariant Clustering (DC-Spin), which aims to improve speech tokenization by bridging audio signals and SLM tokens. DC-Spin extracts speaker-…

    Submitted 31 October, 2024; originally announced October 2024.

    Comments: Preprint

  20. arXiv:2410.02074

    cs.IR cs.LG

    Price-guided user attention in large-scale E-commerce group recommendation

    Authors: Yang Shi, Young-joo Chung

    Abstract: Existing group recommender systems utilize attention mechanisms to identify critical users who influence group decisions the most. We analyzed user attention scores from a widely-used group recommendation model on a real-world E-commerce dataset and found that item price and user interaction history significantly influence the selection of critical users. When item prices are low, users with exten…

    Submitted 2 October, 2024; originally announced October 2024.

  21. arXiv:2410.01943

    cs.LG cs.AI cs.CL cs.DB

    CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL

    Authors: Mohammadreza Pourreza, Hailong Li, Ruoxi Sun, Yeounoh Chung, Shayan Talaei, Gaurav Tarlok Kakkar, Yu Gan, Amin Saberi, Fatma Ozcan, Sercan O. Arik

    Abstract: In tackling the challenges of large language model (LLM) performance for Text-to-SQL tasks, we introduce CHASE-SQL, a new framework that employs innovative strategies, using test-time compute in multi-agent modeling to improve candidate generation and selection. CHASE-SQL leverages LLMs' intrinsic knowledge to generate diverse and high-quality SQL candidates using different LLM generators with: (1…

    Submitted 2 October, 2024; originally announced October 2024.

  22. arXiv:2410.00207

    cs.CL

    Evaluating the performance of state-of-the-art esg domain-specific pre-trained large language models in text classification against existing models and traditional machine learning techniques

    Authors: Tin Yuet Chung, Majid Latifi

    Abstract: This research investigates the classification of Environmental, Social, and Governance (ESG) information within textual disclosures. The aim is to develop and evaluate binary classification models capable of accurately identifying and categorizing E, S and G-related content respectively. The motivation for this research stems from the growing importance of ESG considerations in investment decisi…

    Submitted 30 September, 2024; originally announced October 2024.

    Comments: 56 pages, 9 figures

  23. arXiv:2409.15087

    eess.IV cs.CV cs.LG

    Towards Accountable AI-Assisted Eye Disease Diagnosis: Workflow Design, External Validation, and Continual Learning

    Authors: Qingyu Chen, Tiarnan D L Keenan, Elvira Agron, Alexis Allot, Emily Guan, Bryant Duong, Amr Elsawy, Benjamin Hou, Cancan Xue, Sanjeeb Bhandari, Geoffrey Broadhead, Chantal Cousineau-Krieger, Ellen Davis, William G Gensheimer, David Grasic, Seema Gupta, Luis Haddock, Eleni Konstantinou, Tania Lamba, Michele Maiberger, Dimosthenis Mantopoulos, Mitul C Mehta, Ayman G Nahri, Mutaz AL-Nawaflh, Arnold Oshinsky, et al. (13 additional authors not shown)

    Abstract: Timely disease diagnosis is challenging due to increasing disease burdens and limited clinician availability. AI shows promise in diagnosis accuracy but faces real-world application issues due to insufficient validation in clinical workflows and diverse populations. This study addresses gaps in medical AI downstream accountability through a case study on age-related macular degeneration (AMD) diag…

    Submitted 23 September, 2024; originally announced September 2024.

  24. arXiv:2409.00879

    cs.LG cs.AI

    Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts

    Authors: Youngseog Chung, Dhruv Malik, Jeff Schneider, Yuanzhi Li, Aarti Singh

    Abstract: The traditional viewpoint on Sparse Mixture of Experts (MoE) models is that instead of training a single large expert, which is computationally expensive, we can train many small experts. The hope is that if the total parameter count of the small experts equals that of the singular large expert, then we retain the representation power of the large expert while gaining computational tractability an…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 21 pages, 5 figures, 13 tables

  25. arXiv:2408.06731

    cs.CY cs.AI cs.CL

    Large language models can consistently generate high-quality content for election disinformation operations

    Authors: Angus R. Williams, Liam Burke-Moore, Ryan Sze-Yin Chan, Florence E. Enock, Federico Nanni, Tvesha Sippy, Yi-Ling Chung, Evelina Gabasova, Kobi Hackenburg, Jonathan Bright

    Abstract: Advances in large language models have raised concerns about their potential use in generating compelling election disinformation at scale. This study presents a two-part investigation into the capabilities of LLMs to automate stages of an election disinformation operation. First, we introduce DisElect, a novel evaluation dataset designed to measure LLM compliance with instructions to generate con…

    Submitted 13 August, 2024; originally announced August 2024.

  26. arXiv:2408.04112

    cs.HC cs.AI cs.CL

    Patchview: LLM-Powered Worldbuilding with Generative Dust and Magnet Visualization

    Authors: John Joon Young Chung, Max Kreminski

    Abstract: Large language models (LLMs) can help writers build story worlds by generating world elements, such as factions, characters, and locations. However, making sense of many generated elements can be overwhelming. Moreover, if the user wants to precisely control aspects of generated elements that are difficult to specify verbally, prompting alone may be insufficient. We introduce Patchview, a customiz…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted to UIST2024

  27. arXiv:2407.15329

    eess.IV cs.CV

    Efficient Multi-disparity Transformer for Light Field Image Super-resolution

    Authors: Zeke Zexi Hu, Haodong Chen, Yuk Ying Chung, Xiaoming Chen

    Abstract: This paper presents the Multi-scale Disparity Transformer (MDT), a novel Transformer tailored for light field image super-resolution (LFSR) that addresses the issues of computational redundancy and disparity entanglement caused by the indiscriminate processing of sub-aperture images inherent in conventional methods. MDT features a multi-branch structure, with each branch utilising independent disp…

    Submitted 21 July, 2024; originally announced July 2024.

  28. arXiv:2406.18375

    cs.CV

    From Majority to Minority: A Diffusion-based Augmentation for Underrepresented Groups in Skin Lesion Analysis

    Authors: Janet Wang, Yunsung Chung, Zhengming Ding, Jihun Hamm

    Abstract: AI-based diagnoses have demonstrated dermatologist-level performance in classifying skin cancer. However, such systems are prone to under-performing when tested on data from minority groups that lack sufficient representation in the training sets. Although data collection and annotation offer the best means for promoting minority groups, these processes are costly and time-consuming. Prior works h…

    Submitted 30 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  29. arXiv:2405.13954

    cs.LG cs.AI cs.CL

    What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions

    Authors: Sang Keun Choe, Hwijeen Ahn, Juhan Bae, Kewen Zhao, Minsoo Kang, Youngseog Chung, Adithya Pratapa, Willie Neiswanger, Emma Strubell, Teruko Mitamura, Jeff Schneider, Eduard Hovy, Roger Grosse, Eric Xing

    Abstract: Large language models (LLMs) are trained on a vast amount of human-written data, but data providers often remain uncredited. In response to this issue, data valuation (or data attribution), which quantifies the contribution or value of each data to the model output, has been discussed as a potential solution. Nevertheless, applying existing data valuation methods to recent LLMs and their vast trai…

    Submitted 22 May, 2024; originally announced May 2024.

  30. arXiv:2405.11703

    cs.LG

    QComp: A QSAR-Based Data Completion Framework for Drug Discovery

    Authors: Bingjia Yang, Yunsie Chung, Archer Y. Yang, Bo Yuan, Xiang Yu

    Abstract: In drug discovery, in vitro and in vivo experiments reveal biochemical activities related to the efficacy and toxicity of compounds. The experimental data accumulate into massive, ever-evolving, and sparse datasets. Quantitative Structure-Activity Relationship (QSAR) models, which predict biochemical activities using only the structural information of compounds, face challenges in integrating the…

    Submitted 19 May, 2024; originally announced May 2024.

  31. arXiv:2405.10345

    q-bio.QM cs.AI cs.LG

    Machine Learning Driven Biomarker Selection for Medical Diagnosis

    Authors: Divyagna Bavikadi, Ayushi Agarwal, Shashank Ganta, Yunro Chung, Lusheng Song, Ji Qiu, Paulo Shakarian

    Abstract: Recent advances in experimental methods have enabled researchers to collect data on thousands of analytes simultaneously. This has led to correlational studies that associated molecular measurements with diseases such as Alzheimer's, Liver, and Gastric Cancer. However, the use of thousands of biomarkers selected from the analytes is not practical for real-world medical diagnosis and is likely unde…

    Submitted 15 May, 2024; originally announced May 2024.

  32. arXiv:2405.05581

    cs.HC cs.AI cs.CL

    One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations

    Authors: Yoonjoo Lee, Kihoon Son, Tae Soo Kim, Jisu Kim, John Joon Young Chung, Eytan Adar, Juho Kim

    Abstract: As Large Language Models (LLMs) are nondeterministic, the same input can generate different outputs, some of which may be incorrect or hallucinated. If run again, the LLM may correct itself and produce the correct answer. Unfortunately, most LLM-powered systems resort to single results which, correct or not, users accept. Having the LLM produce multiple outputs may help identify disagreements or a…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted to FAccT 2024

  33. arXiv:2404.12416

    physics.plasm-ph cs.LG

    Full Shot Predictions for the DIII-D Tokamak via Deep Recurrent Networks

    Authors: Ian Char, Youngseog Chung, Joseph Abbate, Egemen Kolemen, Jeff Schneider

    Abstract: Although tokamaks are one of the most promising devices for realizing nuclear fusion as an energy source, there are still key obstacles when it comes to understanding the dynamics of the plasma and controlling it. As such, it is crucial that high quality models are developed to assist in overcoming these obstacles. In this work, we take an entirely data driven approach to learn such a model. In pa…

    Submitted 17 April, 2024; originally announced April 2024.

  34. arXiv:2403.20103

    cs.CL

    NLP for Counterspeech against Hate: A Survey and How-To Guide

    Authors: Helena Bonaldi, Yi-Ling Chung, Gavin Abercrombie, Marco Guerini

    Abstract: In recent years, counterspeech has emerged as one of the most promising strategies to fight online hate. These non-escalatory responses tackle online abuse while preserving the freedom of speech of the users, and can have a tangible impact in reducing online and offline violence. Recently, there has been growing interest from the Natural Language Processing (NLP) community in addressing the challe…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: To appear in Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics (findings)

  35. A Design Space for Intelligent and Interactive Writing Assistants

    Authors: Mina Lee, Katy Ilonka Gero, John Joon Young Chung, Simon Buckingham Shum, Vipul Raheja, Hua Shen, Subhashini Venugopalan, Thiemo Wambsganss, David Zhou, Emad A. Alghamdi, Tal August, Avinash Bhat, Madiha Zahrah Choksi, Senjuti Dutta, Jin L. C. Guo, Md Naimul Hoque, Yewon Kim, Simon Knight, Seyed Parsa Neshaei, Agnia Sergeyuk, Antonette Shibani, Disha Shrivastava, Lila Shroff, Jessi Stark, Sarah Sterman, et al. (11 additional authors not shown)

    Abstract: In our era of rapid technological advancement, the research landscape for writing assistants has become increasingly fragmented across various research communities. We seek to address this challenge by proposing a design space as a structured way to examine and explore the multidimensional space of intelligent and interactive writing assistants. Through a large community collaboration, we explore…

    Submitted 26 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: Published as a conference paper at CHI 2024

  36. arXiv:2403.09159

    cs.CL

    Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation

    Authors: Jaione Bengoetxea, Yi-Ling Chung, Marco Guerini, Rodrigo Agerri

    Abstract: Counter Narratives (CNs) are non-negative textual responses to Hate Speech (HS) aiming at defusing online hatred and mitigating its spreading across media. Despite the recent increase in HS content posted online, research on automatic CN generation has been relatively scarce and predominantly focused on English. In this paper, we present CONAN-EUS, a new Basque and Spanish dataset for CN generatio…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted for the Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING) 2024

  37. arXiv:2403.07592

    cs.CV

    Accurate Spatial Gene Expression Prediction by integrating Multi-resolution features

    Authors: Youngmin Chung, Ji Hun Ha, Kyeong Chan Im, Joo Sang Lee

    Abstract: Recent advancements in Spatial Transcriptomics (ST) technology have facilitated detailed gene expression analysis within tissue contexts. However, the high costs and methodological limitations of ST necessitate a more robust predictive model. In response, this paper introduces TRIPLEX, a novel deep learning framework designed to predict spatial gene expression from Whole Slide Images (WSIs). TRIPL…

    Submitted 25 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  38. Authors' Values and Attitudes Towards AI-bridged Scalable Personalization of Creative Language Arts

    Authors: Taewook Kim, Hyomin Han, Eytan Adar, Matthew Kay, John Joon Young Chung

    Abstract: Generative AI has the potential to create a new form of interactive media: AI-bridged creative language arts (CLA), which bridge the author and audience by personalizing the author's vision to the audience's context and taste at scale. However, it is unclear what the authors' values and attitudes would be regarding AI-bridged CLA. To identify these values and attitudes, we conducted an interview s…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 16 pages, 6 figures, 2 tables. Accepted to ACM CHI 2024

  39. arXiv:2402.11223

    cs.LG

    HEAL: Brain-inspired Hyperdimensional Efficient Active Learning

    Authors: Yang Ni, Zhuowen Zou, Wenjun Huang, Hanning Chen, William Youngwoo Chung, Samuel Cho, Ranganath Krishnan, Pietro Mercati, Mohsen Imani

    Abstract: Drawing inspiration from the outstanding learning capability of our human brains, Hyperdimensional Computing (HDC) emerges as a novel computing paradigm, and it leverages high-dimensional vector presentation and operations for brain-like lightweight Machine Learning (ML). Practical deployments of HDC have significantly enhanced the learning efficiency compared to current deep ML methods on a broad…

    Submitted 17 February, 2024; originally announced February 2024.

  40. arXiv:2402.08025

    cs.CV

    Beyond the Mud: Datasets and Benchmarks for Computer Vision in Off-Road Racing

    Authors: Jacob Tyo, Motolani Olarinre, Youngseog Chung, Zachary C. Lipton

    Abstract: Despite significant progress in optical character recognition (OCR) and computer vision systems, robustly recognizing text and identifying people in images taken in unconstrained in-the-wild environments remain an ongoing challenge. However, such obstacles must be overcome in practical applications of vision systems, such as identifying racers in photos taken during off-road racing events.…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2311.09256

  41. arXiv:2401.12295

    cs.CL

    Cheap Learning: Maximising Performance of Language Models for Social Data Science Using Minimal Data

    Authors: Leonardo Castro-Gonzalez, Yi-Ling Chung, Hannak Rose Kirk, John Francis, Angus R. Williams, Pica Johansson, Jonathan Bright

    Abstract: The field of machine learning has recently made significant progress in reducing the requirements for labelled training data when building new models. These 'cheaper' learning techniques hold significant potential for the social sciences, where development of large labelled training datasets is often a significant practical impediment to the use of machine learning for analytical tasks. In this ar…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 39 pages, 10 figures, 6 tables

    ACM Class: I.2.7; J.4

  42. arXiv:2401.09294  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS eess.SP

    T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis

    Authors: Yoonjin Chung, Junwon Lee, Juhan Nam

    Abstract: Foley sound, audio content inserted synchronously with videos, plays a critical role in the user experience of multimedia content. Recently, there has been active research in Foley sound synthesis, leveraging the advancements in deep generative models. However, such works mainly focus on replicating a single sound class or a textual sound description, neglecting temporal information, which is cruc…

    Submitted 17 January, 2024; originally announced January 2024.

  43. arXiv:2401.08117  [pdf, other]

    cs.CV cs.AI cs.MM cs.NE

    E2HQV: High-Quality Video Generation from Event Camera via Theory-Inspired Model-Aided Deep Learning

    Authors: Qiang Qu, Yiran Shen, Xiaoming Chen, Yuk Ying Chung, Tongliang Liu

    Abstract: The bio-inspired event cameras or dynamic vision sensors are capable of asynchronously capturing per-pixel brightness changes (called event-streams) in high temporal resolution and high dynamic range. However, the non-structural spatial-temporal event-streams make it challenging for providing intuitive visualization with rich semantic information for human vision. It calls for events-to-video (E2V…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted in AAAI2024

  44. Beyond Subspace Isolation: Many-to-Many Transformer for Light Field Image Super-resolution

    Authors: Zeke Zexi Hu, Xiaoming Chen, Vera Yuk Ying Chung, Yiran Shen

    Abstract: The effective extraction of spatial-angular features plays a crucial role in light field image super-resolution (LFSR) tasks, and the introduction of convolution and Transformers leads to significant improvement in this area. Nevertheless, due to the large 4D data volume of light field images, many existing methods opted to decompose the data into a number of lower-dimensional subspaces and perfor…

    Submitted 11 March, 2025; v1 submitted 1 January, 2024; originally announced January 2024.

    Comments: Accepted by IEEE Transactions on Multimedia

  45. arXiv:2312.11949  [pdf, other]

    cs.HC

    CreativeConnect: Supporting Reference Recombination for Graphic Design Ideation with Generative AI

    Authors: DaEun Choi, Sumin Hong, Jeongeon Park, John Joon Young Chung, Juho Kim

    Abstract: Graphic designers often get inspiration through the recombination of references. Our formative study (N=6) reveals that graphic designers focus on conceptual keywords during this process, and want support for discovering the keywords, expanding them, and exploring diverse recombination options of them, while still having room for designers' creativity. We propose CreativeConnect, a system with gen…

    Submitted 6 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  46. arXiv:2312.06279  [pdf, other]

    cs.LG cs.AI

    Regional Correlation Aided Mobile Traffic Prediction with Spatiotemporal Deep Learning

    Authors: JeongJun Park, Lusungu J. Mwasinga, Huigyu Yang, Syed M. Raza, Duc-Tai Le, Moonseong Kim, Min Young Chung, Hyunseung Choo

    Abstract: Mobile traffic data in urban regions shows differentiated patterns during different hours of the day. The exploitation of these patterns enables highly accurate mobile traffic prediction for proactive network management. However, recent Deep Learning (DL) driven studies have only exploited spatiotemporal features and have ignored the geographical correlations, causing high complexity and erroneous…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 4 pages, 5 figures, 1 table. This paper has been accepted at the IEEE Consumer Communications & Networking Conference (CCNC) 2024

  47. arXiv:2312.05187  [pdf, other]

    cs.CL cs.SD eess.AS

    Seamless: Multilingual Expressive and Streaming Speech Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, et al. (40 additional authors not shown)

    Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4…

    Submitted 8 December, 2023; originally announced December 2023.

  48. arXiv:2311.09256  [pdf, other]

    cs.CV

    Reading Between the Mud: A Challenging Motorcycle Racer Number Dataset

    Authors: Jacob Tyo, Youngseog Chung, Motolani Olarinre, Zachary C. Lipton

    Abstract: This paper introduces the off-road motorcycle Racer number Dataset (RnD), a new challenging dataset for optical character recognition (OCR) research. RnD contains 2,411 images from professional motorsports photographers that depict motorcycle racers in off-road competitions. The images exhibit a wide variety of factors that make OCR difficult, including mud occlusions, motion blur, non-standard fo…

    Submitted 14 November, 2023; originally announced November 2023.

  49. arXiv:2311.08488  [pdf, other]

    cs.CV

    MUDD: A New Re-Identification Dataset with Efficient Annotation for Off-Road Racers in Extreme Conditions

    Authors: Jacob Tyo, Motolani Olarinre, Youngseog Chung, Zachary C. Lipton

    Abstract: Re-identifying individuals in unconstrained environments remains an open challenge in computer vision. We introduce the Muddy Racer re-IDentification Dataset (MUDD), the first large-scale benchmark for matching identities of motorcycle racers during off-road competitions. MUDD exhibits heavy mud occlusion, motion blurring, complex poses, and extreme lighting conditions previously unseen in existin…

    Submitted 14 November, 2023; originally announced November 2023.

  50. arXiv:2309.07707  [pdf, other]

    cs.CL cs.SD eess.AS

    CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders

    Authors: Heng-Jui Chang, Ning Dong, Ruslan Mavlyutov, Sravya Popuri, Yu-An Chung

    Abstract: Large-scale self-supervised pre-trained speech encoders outperform conventional approaches in speech recognition and translation tasks. Due to the high cost of developing these large models, building new encoders for new tasks and deploying them to on-device applications are infeasible. Prior studies propose model compression methods to address this issue, but those works focus on smaller models a…

    Submitted 27 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024