Showing 1–23 of 23 results for author: Chong, D

Searching in archive cs.
  1. arXiv:2506.02863

    eess.AS cs.AI cs.SD

    CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech

    Authors: Helin Wang, Jiarui Hai, Dading Chong, Karan Thakkar, Tiantian Feng, Dongchao Yang, Junhyeok Lee, Laureano Moro Velazquez, Jesus Villalba, Zengyi Qin, Shrikanth Narayanan, Mounya Elhiali, Najim Dehak

    Abstract: Recent advancements in generative artificial intelligence have significantly transformed the field of style-captioned text-to-speech synthesis (CapTTS). However, adapting CapTTS to real-world applications remains challenging due to the lack of standardized, comprehensive datasets and limited research on downstream tasks built upon CapTTS. To address these gaps, we introduce CapSpeech, a new benchm…

    Submitted 3 June, 2025; originally announced June 2025.

  2. arXiv:2504.19314

    cs.CL

    BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese

    Authors: Peilin Zhou, Bruce Leon, Xiang Ying, Can Zhang, Yifan Shao, Qichen Ye, Dading Chong, Zhiling Jin, Chenxuan Xie, Meng Cao, Yuxin Gu, Sixin Hong, Jing Ren, Jian Chen, Chao Liu, Yining Hua

    Abstract: As large language models (LLMs) evolve into tool-using agents, the ability to browse the web in real-time has become a critical yardstick for measuring their reasoning and retrieval competence. Existing benchmarks such as BrowseComp concentrate on English and overlook the linguistic, infrastructural, and censorship-related complexities of other major information ecosystems -- most notably Chinese.…

    Submitted 1 May, 2025; v1 submitted 27 April, 2025; originally announced April 2025.

    Comments: Under Review

  3. arXiv:2409.13948

    cs.CL

    Aligning Language Models Using Follow-up Likelihood as Reward Signal

    Authors: Chen Zhang, Dading Chong, Feng Jiang, Chengguang Tang, Anningzhe Gao, Guohua Tang, Haizhou Li

    Abstract: In natural human-to-human conversations, participants often receive feedback signals from one another based on their follow-up reactions. These reactions can include verbal responses, facial expressions, changes in emotional state, and other non-verbal cues. Similarly, in human-machine interactions, the machine can leverage the user's follow-up utterances as feedback signals to assess whether it h…

    Submitted 23 February, 2025; v1 submitted 20 September, 2024; originally announced September 2024.

    Comments: Accepted by AAAI-2025, 16 pages, reward model, LLM Alignment, code repository at (https://github.com/e0397123/FLR)

  4. arXiv:2408.15916

    eess.AS cs.LG cs.SD

    Multi-modal Adversarial Training for Zero-Shot Voice Cloning

    Authors: John Janiczek, Dading Chong, Dongyang Dai, Arlo Faria, Chao Wang, Tao Wang, Yuzong Liu

    Abstract: A text-to-speech (TTS) model trained to reconstruct speech given text tends towards predictions that are close to the average characteristics of a dataset, failing to model the variations that make human speech sound natural. This problem is magnified for zero-shot voice cloning, a task that requires training data with high variance in speaking styles. We build off of recent works which have used…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted at INTERSPEECH 2024

  5. arXiv:2408.13893

    cs.SD cs.CL eess.AS

    SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models

    Authors: Dongchao Yang, Rongjie Huang, Yuanyuan Wang, Haohan Guo, Dading Chong, Songxiang Liu, Xixin Wu, Helen Meng

    Abstract: Scaling Text-to-speech (TTS) to large-scale datasets has been demonstrated as an effective method for improving the diversity and naturalness of synthesized speech. At the high level, previous large-scale TTS models can be categorized into either Auto-regressive (AR) based (e.g., VALL-E) or Non-auto-regressive (NAR) based models (e.g., NaturalSpeech 2/3). Although these works dem…

    Submitted 28 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: Submit to TASLP

  6. arXiv:2406.12009

    cs.CL

    FinTruthQA: A Benchmark Dataset for Evaluating the Quality of Financial Information Disclosure

    Authors: Ziyue Xu, Peilin Zhou, Xinyu Shi, Jiageng Wu, Yikang Jiang, Dading Chong, Bin Ke, Jie Yang

    Abstract: Accurate and transparent financial information disclosure is essential in accounting and finance, fostering trust and enabling informed investment decisions that drive economic development. Among many information disclosure platforms, the Chinese stock exchanges' investor interactive platform provides a novel and interactive way for listed firms to disclose information of interest to investors thr…

    Submitted 11 February, 2025; v1 submitted 17 June, 2024; originally announced June 2024.

  7. ClimateIQA: A New Dataset and Benchmark to Advance Vision-Language Models in Meteorology Anomalies Analysis

    Authors: Jian Chen, Peilin Zhou, Yining Hua, Dading Chong, Meng Cao, Yaowei Li, Wei Chen, Bing Zhu, Junwei Liang, Zixuan Yuan

    Abstract: Meteorological heatmaps play a vital role in deciphering extreme weather phenomena, yet their inherent complexities marked by irregular contours, unstructured patterns, and complex color variations present unique analytical hurdles for state-of-the-art Vision-Language Models (VLMs). Current state-of-the-art models like GPT-4o, Qwen-VL, and LLaVA 1.6 struggle with tasks such as precise color identi…

    Submitted 25 June, 2025; v1 submitted 14 June, 2024; originally announced June 2024.

  8. arXiv:2406.04555

    cs.CL cs.AI

    Creating an AI Observer: Generative Semantic Workspaces

    Authors: Pavan Holur, Shreyas Rajesh, David Chong, Vwani Roychowdhury

    Abstract: An experienced human Observer reading a document -- such as a crime report -- creates a succinct plot-like "Working Memory" comprising different actors, their prototypical roles and states at any point, their evolution over time based on their interactions, and even a map of missing Semantic parts anticipating them in the future.…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 37 pages with appendix, 28 figures

  9. arXiv:2405.20215

    cs.CL

    TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models

    Authors: Chen Zhang, Chengguang Tang, Dading Chong, Ke Shi, Guohua Tang, Feng Jiang, Haizhou Li

    Abstract: Mainstream approaches to aligning large language models (LLMs) heavily rely on human preference data, particularly when models require periodic updates. The standard process for iterative alignment of LLMs involves collecting new human feedback for each update. However, the data collection process is costly and challenging to scale. To address this issue, we introduce the "TS-Align" framework, whi…

    Submitted 29 September, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: EMNLP-2024 Findings

  10. arXiv:2310.17956

    cs.CV cs.AI cs.CL

    Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare

    Authors: Junling Liu, Ziming Wang, Qichen Ye, Dading Chong, Peilin Zhou, Yining Hua

    Abstract: Large Language Models (LLMs) have introduced a new era of proficiency in comprehending complex healthcare and biomedical topics. However, there is a noticeable lack of models in languages other than English and models that can interpret multi-modal input, which is crucial for global healthcare accessibility. In response, this study introduces Qilin-Med-VL, the first Chinese large vision-language m…

    Submitted 1 November, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

  11. arXiv:2310.09089

    cs.CL

    Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model

    Authors: Qichen Ye, Junling Liu, Dading Chong, Peilin Zhou, Yining Hua, Fenglin Liu, Meng Cao, Ziming Wang, Xuxin Cheng, Zhu Lei, Zhenhua Guo

    Abstract: Integrating large language models (LLMs) into healthcare holds great potential but faces challenges. Pre-training LLMs from scratch for domains like medicine is resource-heavy and often unfeasible. On the other hand, sole reliance on Supervised Fine-tuning (SFT) can result in overconfident predictions and may not tap into domain-specific insights. In response, we present a multi-stage training met…

    Submitted 17 April, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

  12. arXiv:2308.12241

    cs.IR cs.AI

    LLMRec: Benchmarking Large Language Models on Recommendation Task

    Authors: Junling Liu, Chao Liu, Peilin Zhou, Qichen Ye, Dading Chong, Kang Zhou, Yueqi Xie, Yuwei Cao, Shoujin Wang, Chenyu You, Philip S. Yu

    Abstract: Recently, the fast development of Large Language Models (LLMs) such as ChatGPT has significantly advanced NLP tasks by enhancing the capabilities of conversational models. However, the application of LLMs in the recommendation domain has not been thoroughly investigated. To bridge this gap, we propose LLMRec, a LLM-based recommender system designed for benchmarking LLMs on various recommendation t…

    Submitted 23 August, 2023; originally announced August 2023.

  13. arXiv:2306.03030

    cs.CL

    Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset

    Authors: Junling Liu, Peilin Zhou, Yining Hua, Dading Chong, Zhongyu Tian, Andrew Liu, Helin Wang, Chenyu You, Zhenhua Guo, Lei Zhu, Michael Lingzhi Li

    Abstract: Recent advancements in large language models (LLMs) have transformed the field of question answering (QA). However, evaluating LLMs in the medical field is challenging due to the lack of standardized and comprehensive datasets. To address this gap, we introduce CMExam, sourced from the Chinese National Medical Licensing Examination. CMExam consists of 60K+ multiple-choice questions for standardize…

    Submitted 22 October, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted by NeurIPS 2023 Datasets and Benchmarks Track

  14. arXiv:2211.06993

    cs.CL

    GreenPLM: Cross-Lingual Transfer of Monolingual Pre-Trained Language Models at Almost No Cost

    Authors: Qingcheng Zeng, Lucas Garay, Peilin Zhou, Dading Chong, Yining Hua, Jiageng Wu, Yikang Pan, Han Zhou, Rob Voigt, Jie Yang

    Abstract: Large pre-trained models have revolutionized natural language processing (NLP) research and applications, but high training costs and limited data resources have prevented their benefits from being shared equally amongst speakers of all the world's languages. To address issues of cross-linguistic access to such models and reduce energy consumption for sustainability during large-scale model traini…

    Submitted 26 May, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

    Comments: Accepted at IJCAI 2023 AI and Social Good Track

  15. arXiv:2209.13773

    cs.CL

    METS-CoV: A Dataset of Medical Entity and Targeted Sentiment on COVID-19 Related Tweets

    Authors: Peilin Zhou, Zeqiang Wang, Dading Chong, Zhijiang Guo, Yining Hua, Zichang Su, Zhiyang Teng, Jiageng Wu, Jie Yang

    Abstract: The COVID-19 pandemic continues to bring up various topics discussed or debated on social media. In order to explore the impact of pandemics on people's lives, it is crucial to understand the public's concerns and attitudes towards pandemic-related entities (e.g., drugs, vaccines) on social media. However, models trained on existing named entity recognition (NER) or targeted sentiment analysis (TS…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: 10 pages, 6 figures, 6 tables, accepted by NeurIPS 2022 Datasets and Benchmarks track

  16. arXiv:2206.12759

    cs.CL cs.SD eess.AS

    Low-resource Accent Classification in Geographically-proximate Settings: A Forensic and Sociophonetics Perspective

    Authors: Qingcheng Zeng, Dading Chong, Peilin Zhou, Jie Yang

    Abstract: Accented speech recognition and accent classification are relatively under-explored research areas in speech technology. Recently, deep learning-based methods and Transformer-based pretrained models have achieved superb performances in both areas. However, most accent classification tasks focused on classifying different kinds of English accents and little attention was paid to geographically-prox…

    Submitted 28 June, 2022; v1 submitted 25 June, 2022; originally announced June 2022.

    Comments: INTERSPEECH 2022

  17. arXiv:2205.12702

    cs.CL

    Detecting Label Errors by using Pre-Trained Language Models

    Authors: Derek Chong, Jenny Hong, Christopher D. Manning

    Abstract: We show that large pre-trained language models are inherently highly capable of identifying label errors in natural language datasets: simply examining out-of-sample data points in descending order of fine-tuned task loss significantly outperforms more complex error-detection mechanisms proposed in previous work. To this end, we contribute a novel method for introducing realistic, human-originat…

    Submitted 15 December, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: 18 pages, 10 figures. Accepted to EMNLP 2022; typesetting of this version slightly differs from conference version

  18. arXiv:2205.11008

    cs.CL cs.SD eess.AS

    Calibrate and Refine! A Novel and Agile Framework for ASR-error Robust Intent Detection

    Authors: Peilin Zhou, Dading Chong, Helin Wang, Qingcheng Zeng

    Abstract: The past ten years have witnessed the rapid development of text-based intent detection, whose benchmark performances have already been taken to a remarkable level by deep learning techniques. However, automatic speech recognition (ASR) errors are inevitable in real-world applications due to environmental noise, unique speech patterns, etc., leading to a sharp performance drop in state-of-the-art…

    Submitted 22 May, 2022; originally announced May 2022.

    Comments: Submit to INTERSPEECH 2022

  19. arXiv:2204.12768

    cs.SD eess.AS

    Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training

    Authors: Dading Chong, Helin Wang, Peilin Zhou, Qingcheng Zeng

    Abstract: Transformer-based models attain excellent results and generalize well when trained on sufficient amounts of data. However, constrained by the limited data available in the audio domain, most transformer-based models for audio tasks are finetuned from pre-trained models in other domains (e.g. image), which has a notable gap with the audio domain. Other methods explore the self-supervised learning a…

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: Submit to INTERSPEECH 2022

  20. arXiv:2108.00071

    cs.LG cs.AI stat.ML

    Foundations of data imbalance and solutions for a data democracy

    Authors: Ajay Kulkarni, Deri Chong, Feras A. Batarseh

    Abstract: Dealing with imbalanced data is a prevalent problem while performing classification on the datasets. Many times, this problem contributes to bias while making decisions or implementing policies. Thus, it is vital to understand the factors which cause imbalance in the data (or class imbalance). Such hidden biases and imbalances can lead to data tyranny and a major challenge to a data democracy. In…

    Submitted 30 July, 2021; originally announced August 2021.

    Comments: Published in Data Democracy: 1st Edition At the Nexus of Artificial Intelligence, Software Development, and Knowledge Engineering. (Chapter 5)

    Report number: eBook ISBN: 9780128189399, Paperback ISBN: 9780128183663

  21. arXiv:2011.14041

    cs.RO

    A RGB-D SLAM Algorithm for Indoor Dynamic Scene

    Authors: Deng Su, Dehong Chong

    Abstract: Visual SLAM technology is one of the key technologies for robots to explore unknown environments independently. Accurate estimation of camera pose based on visual sensors is the basis of autonomous navigation and positioning. However, most visual SLAM algorithms are based on the static-environment assumption and cannot estimate accurate camera pose in dynamic environments. In order to solve this problem,…

    Submitted 27 November, 2020; originally announced November 2020.

    Comments: in Chinese

  22. arXiv:2007.03781

    cs.SD eess.AS

    Acoustic Scene Classification with Spectrogram Processing Strategies

    Authors: Helin Wang, Yuexian Zou, Dading Chong

    Abstract: Recently, convolutional neural networks (CNN) have achieved the state-of-the-art performance in acoustic scene classification (ASC) task. The audio data is often transformed into two-dimensional spectrogram representations, which are then fed to the neural networks. In this paper, we study the problem of efficiently taking advantage of different spectrogram representations through discriminative p…

    Submitted 6 July, 2020; originally announced July 2020.

    Comments: Submitted to DCASE 2020 Workshop

  23. arXiv:1912.06808

    cs.SD cs.LG eess.AS

    Environmental Sound Classification with Parallel Temporal-spectral Attention

    Authors: Helin Wang, Yuexian Zou, Dading Chong, Wenwu Wang

    Abstract: Convolutional neural networks (CNN) are one of the best-performing neural network architectures for environmental sound classification (ESC). Recently, temporal attention mechanisms have been used in CNN to capture the useful information from the relevant time frames for audio classification, especially for weakly labelled data where the onset and offset times of the sound events are not applied.…

    Submitted 20 May, 2020; v1 submitted 14 December, 2019; originally announced December 2019.

    Comments: submitted to INTERSPEECH2020
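The ranking heuristic summarized in result 17 (Chong, Hong, and Manning), which examines out-of-sample data points in descending order of fine-tuned task loss, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes per-example predicted class probabilities have already been obtained out-of-sample (for instance by k-fold cross-validation, an assumption here), uses cross-entropy as the task loss, and the helper names cross_entropy_losses and rank_by_loss are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy_losses(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> list[float]:
    """Per-example negative log-likelihood of the assigned label.

    Assumption: probs[i] is the predicted class distribution for example i,
    produced by a model that did not see example i during fine-tuning
    (e.g. via k-fold cross-validation), so each loss is out-of-sample.
    """
    # Clamp to avoid log(0) when the model puts all mass on another class.
    return [-math.log(max(p[y], 1e-12)) for p, y in zip(probs, labels)]


def rank_by_loss(losses: Sequence[float]) -> list[int]:
    """Indices of examples ordered from highest to lowest loss (most suspicious first)."""
    return sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical out-of-sample predictions for four binary-labelled examples.
    probs = [[0.9, 0.1], [0.2, 0.8], [0.95, 0.05], [0.6, 0.4]]
    labels = [0, 0, 1, 1]
    order = rank_by_loss(cross_entropy_losses(probs, labels))
    print(order)  # [2, 1, 3, 0]: example 2 would be reviewed first
```

Reviewing examples from the front of this ordering concentrates annotation effort where the fine-tuned model most strongly disagrees with the assigned label; the 1e-12 clamp is an illustrative choice that only keeps the logarithm finite.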