
Showing 1–8 of 8 results for author: Chiu, I

Searching in archive cs.
  1. arXiv:2501.08238

    cs.SD eess.AS

    CodecFake+: A Large-Scale Neural Audio Codec-Based Deepfake Speech Dataset

    Authors: Xuanjun Chen, Jiawei Du, Haibin Wu, Lin Zhang, I-Ming Lin, I-Hsiang Chiu, Wenze Ren, Yuan Tseng, Yu Tsao, Jyh-Shing Roger Jang, Hung-yi Lee

    Abstract: With the rapid advancement of neural audio codecs, codec-based speech generation (CoSG) systems have become highly powerful. Unfortunately, CoSG also enables the creation of highly realistic deepfake speech, making it easier to mimic an individual's voice and spread misinformation. We refer to this emerging deepfake speech generated by CoSG systems as CodecFake. Detecting such CodecFake is an urge…

    Submitted 17 March, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

    Comments: Work in Progress

  2. arXiv:2412.04861

    cs.LG eess.SP

    MSECG: Incorporating Mamba for Robust and Efficient ECG Super-Resolution

    Authors: Jie Lin, I Chiu, Kuan-Chen Wang, Kai-Chun Liu, Hsin-Min Wang, Ping-Cheng Yeh, Yu Tsao

    Abstract: Electrocardiogram (ECG) signals play a crucial role in diagnosing cardiovascular diseases. To reduce power consumption in wearable or portable devices used for long-term ECG monitoring, super-resolution (SR) techniques have been developed, enabling these devices to collect and transmit signals at a lower sampling rate. In this study, we propose MSECG, a compact neural network model designed for EC…

    Submitted 6 December, 2024; originally announced December 2024.

    Comments: 5 pages, 3 figures

  3. arXiv:2411.07111

    cs.CL cs.SD eess.AS

    Building a Taiwanese Mandarin Spoken Language Model: A First Attempt

    Authors: Chih-Kai Yang, Yu-Kuan Fu, Chen-An Li, Yi-Cheng Lin, Yu-Xiang Lin, Wei-Chih Chen, Ho Lam Chung, Chun-Yi Kuan, Wei-Ping Huang, Ke-Han Lu, Tzu-Quan Lin, Hsiu-Hsuan Wang, En-Pei Hu, Chan-Jan Hsu, Liang-Hsuan Tseng, I-Hsiang Chiu, Ulin Sanga, Xuanjun Chen, Po-chun Hsu, Shu-wen Yang, Hung-yi Lee

    Abstract: This technical report presents our initial attempt to build a spoken large language model (LLM) for Taiwanese Mandarin, specifically tailored to enable real-time, speech-to-speech interaction in multi-turn conversations. Our end-to-end model incorporates a decoder-only transformer architecture and aims to achieve seamless interaction while preserving the conversational flow, including full-duplex…

    Submitted 27 December, 2024; v1 submitted 11 November, 2024; originally announced November 2024.

    Comments: Work in progress

  4. arXiv:2411.05361

    cs.CL eess.AS

    Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

    Authors: Chien-yu Huang, Wei-Chih Chen, Shu-wen Yang, Andy T. Liu, Chen-An Li, Yu-Xiang Lin, Wei-Cheng Tseng, Anuj Diwan, Yi-Jen Shih, Jiatong Shi, William Chen, Chih-Kai Yang, Wenze Ren, Xuanjun Chen, Chi-Yuan Hsiao, Puyuan Peng, Shih-Heng Wang, Chun-Yi Kuan, Ke-Han Lu, Kai-Wei Chang, Fabian Ritter-Gutierrez, Kuan-Po Huang, Siddhant Arora, You-Kuan Lin, Ming To Chuang, et al. (55 additional authors not shown)

    Abstract: Multimodal foundation models, such as Gemini and ChatGPT, have revolutionized human-machine interactions by seamlessly integrating various forms of data. Developing a universal spoken language model that comprehends a wide range of natural language instructions is critical for bridging communication gaps and facilitating more intuitive interactions. However, the absence of a comprehensive evaluati…

    Submitted 9 June, 2025; v1 submitted 8 November, 2024; originally announced November 2024.

    Comments: ICLR 2025

  5. arXiv:2409.08731

    cs.SD eess.AS

    DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset

    Authors: Jiawei Du, I-Ming Lin, I-Hsiang Chiu, Xuanjun Chen, Haibin Wu, Wenze Ren, Yu Tsao, Hung-yi Lee, Jyh-Shing Roger Jang

    Abstract: Mainstream zero-shot TTS production systems like Voicebox and Seed-TTS achieve human-parity speech by leveraging flow-matching and diffusion models, respectively. Unfortunately, human-level audio synthesis leads to identity misuse and information security issues. Currently, many anti-spoofing models have been developed against deepfake audio. However, the efficacy of current state-of-the-art anti-s…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: Accepted by IEEE SLT 2024

  6. arXiv:2309.10787

    eess.AS cs.CV cs.MM cs.SD

    AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models

    Authors: Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-yi Lee

    Abstract: Audio-visual representation learning aims to develop systems with human-like perception by utilizing correlation between auditory and visual information. However, current models often focus on a limited set of tasks, and generalization abilities of learned representations are unclear. To this end, we propose the AV-SUPERB benchmark that enables general-purpose evaluation of unimodal audio/visual a…

    Submitted 19 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024; Evaluation Code: https://github.com/roger-tseng/av-superb; Submission Platform: https://av.superbbenchmark.org

  7. Hyperspace Neighbor Penetration Approach to Dynamic Programming for Model-Based Reinforcement Learning Problems with Slowly Changing Variables in A Continuous State Space

    Authors: Vincent Zha, Ivey Chiu, Alexandre Guilbault, Jaime Tatis

    Abstract: Slowly changing variables in a continuous state space constitute an important category of reinforcement learning and find application in many domains, such as modeling a climate control system where temperature, humidity, etc. change slowly over time. However, this subject is less addressed in recent studies. Classical methods with certain variants, such as Dynamic Programming with Tile Coding…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: 8 pages, 7 figures

  8. arXiv:1203.5279

    cs.CY cs.HC

    Social Media and the Social Good: How Nonprofits Use Facebook to Communicate with the Public

    Authors: Gregory D. Saxton, Chao Guo, I-Hsuan Chiu, Bo Feng

    Abstract: In this study, we examine the social networking practices of the 100 largest nonprofit organizations in the United States. More specifically, we develop a comprehensive classification scheme to delineate these organizations' use of Facebook as a stakeholder engagement tool. We find that there are 5 primary categories of Facebook "statuses", which can be aggregated into three key dimensions - "in…

    Submitted 23 March, 2012; originally announced March 2012.

    Comments: Chinese-language article

    Journal ref: China Third Sector Research, Vol. 1, pp. 40-54, 2011