
Showing 1–30 of 30 results for author: Xie, K

Searching in archive eess. Search in all archives.
  1. arXiv:2509.02020  [pdf, ps, other]

    cs.SD eess.AS

    FireRedTTS-2: Towards Long Conversational Speech Generation for Podcast and Chatbot

    Authors: Kun Xie, Feiyu Shen, Junjie Li, Fenglong Xie, Xu Tang, Yao Hu

    Abstract: Current dialogue generation approaches typically require the complete dialogue text before synthesis and produce a single, inseparable speech containing all voices, making them unsuitable for interactive chat; moreover, they suffer from unstable synthesis, inaccurate speaker transitions, and incoherent prosody. In this work, we present FireRedTTS-2, a long-form streaming TTS system for multi-speak…

    Submitted 3 September, 2025; v1 submitted 2 September, 2025; originally announced September 2025.

  2. arXiv:2508.10934  [pdf, ps, other]

    cs.CV cs.GR cs.RO eess.IV

    ViPE: Video Pose Engine for 3D Geometric Perception

    Authors: Jiahui Huang, Qunjie Zhou, Hesam Rabeti, Aleksandr Korovko, Huan Ling, Xuanchi Ren, Tianchang Shen, Jun Gao, Dmitry Slepichev, Chen-Hsuan Lin, Jiawei Ren, Kevin Xie, Joydeep Biswas, Laura Leal-Taixe, Sanja Fidler

    Abstract: Accurate 3D geometric perception is an important prerequisite for a wide range of spatial AI systems. While state-of-the-art methods depend on large-scale training data, acquiring consistent and precise 3D annotations from in-the-wild videos remains a key challenge. In this work, we introduce ViPE, a handy and versatile video processing engine designed to bridge this gap. ViPE efficiently estimate…

    Submitted 12 August, 2025; originally announced August 2025.

    Comments: Paper website: https://research.nvidia.com/labs/toronto-ai/vipe/

  3. arXiv:2506.21754  [pdf, ps, other]

    eess.SY

    Online design of experiments by active learning for nonlinear system identification

    Authors: Kui Xie, Alberto Bemporad

    Abstract: We investigate the use of active-learning (AL) strategies to generate the input excitation signal at runtime for system identification of linear and nonlinear autoregressive and state-space models. We adapt various existing AL approaches for static model regression to the dynamic context, coupling them with a Kalman filter to update the model parameters recursively, and also cope with the presence…

    Submitted 26 June, 2025; originally announced June 2025.

  4. arXiv:2506.20493  [pdf]

    eess.SY cs.GT

    Analyzing the Impact of Strategic Bidding on the Reserve Capacity via a Bi-Level Model

    Authors: Yun Xu, Yunxiao Bai, Yunyong Zhang, Peng Wang, Xuelin Wang, Jiqun Guo, Kaijun Xie, Rusheng Zhao

    Abstract: The growing integration of renewable energy sources necessitates adequate reserve capacity to maintain power balance. However, in market clearing, power companies with flexible resources may submit strategic bids to maximize profits, potentially compromising system reserves. This paper examines the effects of such strategic behavior by modeling the market as a bi-level problem. The upper level rep…

    Submitted 25 June, 2025; originally announced June 2025.

  5. arXiv:2506.12106  [pdf, other]

    eess.IV cs.CV

    Enhancing Privacy: The Utility of Stand-Alone Synthetic CT and MRI for Tumor and Bone Segmentation

    Authors: André Ferreira, Kunpeng Xie, Caroline Wilpert, Gustavo Correia, Felix Barajas Ordonez, Tiago Gil Oliveira, Maike Bode, Robert Siepmann, Frank Hölzle, Rainer Röhrig, Jens Kleesiek, Daniel Truhn, Jan Egger, Victor Alves, Behrus Puladi

    Abstract: AI requires extensive datasets, while medical data is subject to high data protection. Anonymization is essential, but poses a challenge for some regions, such as the head, as identifying structures overlap with regions of clinical interest. Synthetic data offers a potential solution, but studies often lack rigorous evaluation of realism and utility. Therefore, we investigate to what extent synthe…

    Submitted 13 June, 2025; originally announced June 2025.

  6. arXiv:2505.21248  [pdf, ps, other]

    eess.SY

    Active Learning-Enhanced Dual Control for Angle-Only Initial Relative Orbit Determination

    Authors: Kui Xie, Giovanni Romagnoli, Giordana Bucchioni, Alberto Bemporad

    Abstract: Accurate relative orbit determination is a key challenge in modern space operations, particularly when relying on angle-only measurements. The inherent observability limitations of this approach make initial state estimation difficult, impacting mission safety and performance. This work explores the use of active learning (AL) techniques to enhance observability by dynamically designing the input…

    Submitted 27 May, 2025; originally announced May 2025.

  7. arXiv:2503.20499  [pdf, other]

    cs.SD eess.AS

    FireRedTTS-1S: An Upgraded Streamable Foundation Text-to-Speech System

    Authors: Hao-Han Guo, Yao Hu, Fei-Yu Shen, Xu Tang, Yi-Chen Wu, Feng-Long Xie, Kun Xie

    Abstract: In this work, we upgrade FireRedTTS to a new version, FireRedTTS-1S, a high-quality streaming foundation text-to-speech system. FireRedTTS-1S achieves streaming speech generation via two steps: text-to-semantic decoding and semantic-to-acoustic decoding. In text-to-semantic decoding, a semantic-aware speech tokenizer converts the speech signal into semantic tokens, which can be synthesized from th…

    Submitted 26 May, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

  8. arXiv:2503.06226  [pdf, ps, other]

    eess.SY cs.AI cs.MA math.OC

    Optimal Output Feedback Learning Control for Discrete-Time Linear Quadratic Regulation

    Authors: Kedi Xie, Martin Guay, Shimin Wang, Fang Deng, Maobin Lu

    Abstract: This paper studies the linear quadratic regulation (LQR) problem of unknown discrete-time systems via dynamic output feedback learning control. In contrast to the state feedback, the optimality of the dynamic output feedback control for solving the LQR problem requires an implicit condition on the convergence of the state observer. Moreover, due to unknown system matrices and the existence of obse…

    Submitted 27 May, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

    Comments: 16 pages, 5 figures

  9. arXiv:2412.16085  [pdf, other]

    eess.IV cs.CV

    Efficient MedSAMs: Segment Anything in Medical Images on Laptop

    Authors: Jun Ma, Feifei Li, Sumin Kim, Reza Asakereh, Bao-Hiep Le, Dang-Khoa Nguyen-Vu, Alexander Pfefferle, Muxin Wei, Ruochen Gao, Donghang Lyu, Songxiao Yang, Lennart Purucker, Zdravko Marinov, Marius Staring, Haisheng Lu, Thuy Thanh Dao, Xincheng Ye, Zhi Li, Gianluca Brugnara, Philipp Vollmuth, Martha Foltyn-Dumitru, Jaeyoung Cho, Mustafa Ahmed Mahmutoglu, Martin Bendszus, Irada Pflüger, et al. (57 additional authors not shown)

    Abstract: Promptable segmentation foundation models have emerged as a transformative approach to addressing the diverse needs in medical images, but most existing models require expensive computing, posing a big barrier to their adoption in clinical practice. In this work, we organized the first international competition dedicated to promptable medical image segmentation, featuring a large-scale dataset spa…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: CVPR 2024 MedSAM on Laptop Competition Summary: https://www.codabench.org/competitions/1847/

  10. arXiv:2411.02349  [pdf]

    eess.IV

    Drone Data Analytics for Measuring Traffic Metrics at Intersections in High-Density Areas

    Authors: Qingwen Pu, Yuan Zhu, Junqing Wang, Hong Yang, Kun Xie, Shunlai Cui

    Abstract: This study employed over 100 hours of high-altitude drone video data from eight intersections in Hohhot to generate a unique and extensive dataset encompassing high-density urban road intersections in China. This research has enhanced the YOLOUAV model to enable precise target recognition on unmanned aerial vehicle (UAV) datasets. An automated calibration algorithm is presented to create a functio…

    Submitted 8 November, 2024; v1 submitted 4 November, 2024; originally announced November 2024.

    Comments: 30 pages, 14 figures

    MSC Class: 68-11; ACM Class: I.4.1

  11. arXiv:2409.03283  [pdf, other]

    cs.SD eess.AS

    FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications

    Authors: Hao-Han Guo, Yao Hu, Kun Liu, Fei-Yu Shen, Xu Tang, Yi-Chen Wu, Feng-Long Xie, Kun Xie, Kai-Tuo Xu

    Abstract: This work proposes FireRedTTS, a foundation text-to-speech framework, to meet the growing demands for personalized and diverse generative speech applications. The framework comprises three parts: data processing, foundation system, and downstream applications. First, we comprehensively present our data processing pipeline, which transforms massive raw audio into a large-scale high-quality TTS data…

    Submitted 11 April, 2025; v1 submitted 5 September, 2024; originally announced September 2024.

  12. arXiv:2409.00933  [pdf, other]

    cs.SD eess.AS

    SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis

    Authors: Haohan Guo, Fenglong Xie, Kun Xie, Dongchao Yang, Dake Guo, Xixin Wu, Helen Meng

    Abstract: The long speech sequence has been troubling language models (LM) based TTS approaches in terms of modeling complexity and efficiency. This work proposes SoCodec, a semantic-ordered multi-stream speech codec, to address this issue. It compresses speech into a shorter, multi-stream discrete semantic sequence with multiple tokens at each frame. Meanwhile, the ordered product quantization is proposed…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted by SLT 2024

  13. arXiv:2407.05758  [pdf, other]

    eess.IV cs.AI cs.CV

    Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports

    Authors: Yutong Zhang, Yi Pan, Tianyang Zhong, Peixin Dong, Kangni Xie, Yuxiao Liu, Hanqi Jiang, Zhengliang Liu, Shijie Zhao, Tuo Zhang, Xi Jiang, Dinggang Shen, Tianming Liu, Xin Zhang

    Abstract: Medical images and radiology reports are crucial for diagnosing medical conditions, highlighting the importance of quantitative analysis for clinical decision-making. However, the diversity and cross-source heterogeneity of these data challenge the generalizability of current data-mining methods. Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecti…

    Submitted 8 July, 2024; originally announced July 2024.

  14. arXiv:2405.09586  [pdf, other]

    eess.IV cs.AI cs.CV

    Factual Serialization Enhancement: A Key Innovation for Chest X-ray Report Generation

    Authors: Kang Liu, Zhuoqi Ma, Mengmeng Liu, Zhicheng Jiao, Xiaolu Kang, Qiguang Miao, Kun Xie

    Abstract: A radiology report comprises presentation-style vocabulary, which ensures clarity and organization, and factual vocabulary, which provides accurate and objective descriptions based on observable findings. While manually writing these reports is time-consuming and labor-intensive, automatic report generation offers a promising alternative. A critical step in this process is to align radiographs wit…

    Submitted 11 September, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Code is available at https://github.com/mk-runner/FSE

  15. arXiv:2312.02773  [pdf, other]

    cs.SD eess.AS

    Integrating Plug-and-Play Data Priors with Weighted Prediction Error for Speech Dereverberation

    Authors: Ziye Yang, Wenxing Yang, Kai Xie, Jie Chen

    Abstract: Speech dereverberation aims to alleviate the detrimental effects of late-reverberant components. While the weighted prediction error (WPE) method has shown superior performance in dereverberation, there is still room for further improvement in terms of performance and robustness in complex and noisy environments. Recent research has highlighted the effectiveness of integrating physics-based and da…

    Submitted 5 December, 2023; originally announced December 2023.

  16. arXiv:2309.17329  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG eess.IV

    Efficient Anatomical Labeling of Pulmonary Tree Structures via Deep Point-Graph Representation-based Implicit Fields

    Authors: Kangxian Xie, Jiancheng Yang, Donglai Wei, Ziqiao Weng, Pascal Fua

    Abstract: Pulmonary diseases rank prominently among the principal causes of death worldwide. Curing them will require, among other things, a better understanding of the complex 3D tree-shaped structures within the pulmonary system, such as airways, arteries, and veins. Traditional approaches using high-resolution image stacks and standard CNNs on dense voxel grids face challenges in computational efficiency…

    Submitted 17 October, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: Accepted by Medical Image Analysis

    MSC Class: 68T45; 62P10; 68U10; 68U05; 05C90

  17. arXiv:2308.09489  [pdf, ps, other]

    cs.IT eess.SP

    STAR-RIS Aided MISO SWIPT-NOMA System with Energy Buffer: Performance Analysis and Optimization

    Authors: Kengyuan Xie, Guofa Cai, Jiguang He, Georges Kaddoum

    Abstract: In this paper, we propose a simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) and energy buffer aided multiple-input single-output (MISO) simultaneous wireless information and power transfer (SWIPT) non-orthogonal multiple access (NOMA) system, which consists of a STAR-RIS, an access point (AP), and reflection users and transmission users with energy buffers. I…

    Submitted 16 July, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

  18. arXiv:2307.04390  [pdf]

    eess.IV cs.CV cs.LG

    CT-based Subchondral Bone Microstructural Analysis in Knee Osteoarthritis via MR-Guided Distillation Learning

    Authors: Yuqi Hu, Xiangyu Zhao, Gaowei Qing, Kai Xie, Chenglei Liu, Lichi Zhang

    Abstract: Background: MR-based subchondral bone effectively predicts knee osteoarthritis. However, its clinical application is limited by the cost and time of MR. Purpose: We aim to develop a novel distillation-learning-based method named SRRD for subchondral bone microstructural analysis using easily-acquired CT images, which leverages paired MR images to enhance the CT-based analysis model during training…

    Submitted 11 July, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

    Comments: 5 figures, 4 tables

  19. arXiv:2306.16473  [pdf]

    eess.SY

    Coordinating O&M and Logistical Resources to Enhance Post-Disaster Resilience of Interdependent Power and Natural Gas Distribution Systems

    Authors: Wei Wang, Kaigui Xie, Hongbin Wang, Tao Chen, Hongzhou Chen, Yufei He

    Abstract: Electric power and natural gas systems are becoming increasingly interdependent, driven by the growth of natural gas-fired generation and the electrification of the gas industry. Recent energy crises have underscored the urgent need for enhanced resilience in these interdependent systems. In response to this challenge, this paper focuses on the interdependent electric power and natural gas distrib…

    Submitted 20 December, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: 21 pages, 9 figures

  20. arXiv:2306.01232  [pdf, other]

    eess.IV cs.CV

    Deep Reinforcement Learning Framework for Thoracic Diseases Classification via Prior Knowledge Guidance

    Authors: Weizhi Nie, Chen Zhang, Dan Song, Lina Zhao, Yunpeng Bai, Keliang Xie, Anan Liu

    Abstract: The chest X-ray is often utilized for diagnosing common thoracic diseases. In recent years, many approaches have been proposed to handle the problem of automatic diagnosis based on chest X-rays. However, the scarcity of labeled data for related diseases still poses a huge challenge to an accurate diagnosis. In this paper, we focus on the thorax disease diagnostic problem and propose a novel deep r…

    Submitted 1 June, 2023; originally announced June 2023.

  21. arXiv:2305.12072  [pdf, other]

    eess.IV cs.CV

    Chest X-ray Image Classification: A Causal Perspective

    Authors: Weizhi Nie, Chen Zhang, Dan Song, Lina Zhao, Yunpeng Bai, Keliang Xie, Anan Liu

    Abstract: The chest X-ray (CXR) is one of the most common and easy-to-get medical tests used to diagnose common diseases of the chest. Recently, many deep learning-based methods have been proposed that are capable of effectively classifying CXRs. Even though these techniques have worked quite well, it is difficult to establish whether what these algorithms actually learn is the cause-and-effect link between…

    Submitted 19 May, 2023; originally announced May 2023.

  22. arXiv:2305.12070  [pdf, other]

    eess.IV cs.CV

    Instrumental Variable Learning for Chest X-ray Classification

    Authors: Weizhi Nie, Chen Zhang, Dan Song, Yunpeng Bai, Keliang Xie, Anan Liu

    Abstract: The chest X-ray (CXR) is commonly employed to diagnose thoracic illnesses, but the challenge of achieving accurate automatic diagnosis through this method persists due to the complex relationship between pathologies. In recent years, various deep learning-based approaches have been suggested to tackle this problem, but confounding factors such as image resolution or noise problems often damage model…

    Submitted 19 May, 2023; originally announced May 2023.

  23. arXiv:2205.14315  [pdf, other]

    cs.LG eess.SP

    Efficient Federated Learning with Spike Neural Networks for Traffic Sign Recognition

    Authors: Kan Xie, Zhe Zhang, Bo Li, Jiawen Kang, Dusit Niyato, Shengli Xie, Yi Wu

    Abstract: With the gradual popularization of self-driving, it is becoming increasingly important for vehicles to smartly make the right driving decisions and autonomously obey traffic rules by correctly recognizing traffic signs. However, for machine learning-based traffic sign recognition on the Internet of Vehicles (IoV), a large amount of traffic sign data from distributed vehicles is needed to be gather…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: Submitted to IEEE Transactions on Vehicular Technology

  24. Elastic Significant Bit Quantization and Acceleration for Deep Neural Networks

    Authors: Cheng Gong, Ye Lu, Kunpeng Xie, Zongming Jin, Tao Li, Yanzhi Wang

    Abstract: Quantization has been proven to be a vital method for improving the inference efficiency of deep neural networks (DNNs). However, it is still challenging to strike a good balance between accuracy and efficiency while quantizing DNN weights or activation values from high-precision formats to their quantized counterparts. We propose a new method called elastic significant bit quantization (ESB) that…

    Submitted 17 November, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

    Comments: 15 pages, 14 figures

    ACM Class: B.2.4.a; I.2.6.g; I.5.1.d; I.5.4.b

    Journal ref: IEEE Transactions on Parallel and Distributed Systems, 2021

  25. arXiv:2105.03877  [pdf]

    eess.SY

    Non-iterative Optimization Algorithm for Active Distribution Grids Considering Uncertainty of Feeder Parameters

    Authors: J. Wu, M. Liu, W. Lu, K. Xie, M. Xie

    Abstract: To cope with fast-fluctuating distributed energy resources (DERs) and uncontrolled loads, this paper formulates a time-varying optimization problem for distribution grids with DERs and develops a novel non-iterative algorithm to track the optimal solutions. Different from existing methods, the proposed approach does not require iterations during the sampling interval. It only needs to perform a si…

    Submitted 9 May, 2021; originally announced May 2021.

    Comments: 9 pages, 10 figures. This work has been submitted to the IEEE for possible publication

  26. arXiv:2012.07261  [pdf]

    eess.IV cs.CV

    OCTA-500: A Retinal Dataset for Optical Coherence Tomography Angiography Study

    Authors: Mingchao Li, Kun Huang, Qiuzhuo Xu, Jiadong Yang, Yuhan Zhang, Zexuan Ji, Keren Xie, Songtao Yuan, Qinghuai Liu, Qiang Chen

    Abstract: Optical coherence tomography angiography (OCTA) is a novel imaging modality that has been widely utilized in ophthalmology and neuroscience studies to observe retinal vessels and microvascular systems. However, publicly available OCTA datasets remain scarce. In this paper, we introduce the largest and most comprehensive OCTA dataset dubbed OCTA-500, which contains OCTA imaging under two fields of…

    Submitted 25 December, 2022; v1 submitted 14 December, 2020; originally announced December 2020.

  27. arXiv:2011.00776  [pdf]

    eess.SY

    Incorporating Gas Pipeline Leakage Failure Modes in Risk Evaluation of Electricity-Gas Integrated Energy Systems

    Authors: Yi Tang, Yuan Zhao, Wenyuan Li, Kaigui Xie, Juan Yu

    Abstract: In the existing literature on the risk evaluation of electricity-gas integrated energy systems (EGIES), the impacts of gas leakage in pipelines are ignored. This paper presents a method to incorporate the failure modes of gas pipeline leakage in EGIES risk evaluation. A Markov state transition model of gas pipeline with multi-state and multi-mode transition process, and a bi-level Monte Carlo sam…

    Submitted 2 November, 2020; originally announced November 2020.

    Comments: 9 pages, 7 figures

  28. arXiv:1911.09401  [pdf, other]

    eess.IV cs.CV

    Segmenting Medical MRI via Recurrent Decoding Cell

    Authors: Ying Wen, Kai Xie, Lianghua He

    Abstract: The encoder-decoder networks are commonly used in medical image segmentation due to their remarkable performance in hierarchical feature fusion. However, the expanding path for feature decoding and spatial recovery does not consider the long-term dependency when fusing feature maps from different layers, and the universal encoder-decoder network does not make full use of the multi-modality informa…

    Submitted 21 November, 2019; originally announced November 2019.

    Comments: 8 pages, 7 figures, AAAI-20

  29. arXiv:1806.09250  [pdf]

    physics.ins-det eess.SP

    Electronics of Time-of-flight Measurement for Back-n at CSNS

    Authors: T. Yu, P. Cao, X. Y. Ji, L. K. Xie, X. R. Huang, Q. An, H. Y. Bai, J. Bao, Y. H. Chen, P. J. Cheng, Z. Q. Cui, R. R. Fan, C. Q. Feng, M. H. Gu, Z. J. Han, G. Z. He, Y. C. He, Y. F. He, H. X. Huang, W. L. Huang, X. L. Ji, H. Y. Jiang, W. Jiang, H. Y. Jing, L. Kang, et al. (46 additional authors not shown)

    Abstract: Back-n is a white neutron experimental facility at the China Spallation Neutron Source (CSNS). The time structure of the primary proton beam makes it well suited to the time-of-flight (TOF) method for measuring neutron energy. We implement the electronics of TOF measurement on the general-purpose readout electronics designed for all of the seven detectors in Back-n. The electronics is based on PXI…

    Submitted 24 June, 2018; originally announced June 2018.

    Comments: 4 pages, 13 figures, 21st IEEE Real Time Conference

  30. arXiv:1806.09249  [pdf]

    physics.ins-det eess.SP

    T0 Fan-out for Back-n White Neutron Facility at CSNS

    Authors: X. Y. Ji, P. Cao, T. Yu, L. K. Xie, X. R. Huang, Q. An, H. Y. Bai, J. Bao, Y. H. Chen, P. J. Cheng, Z. Q. Cui, R. R. Fan, C. Q. Feng, M. H. Gu, Z. J. Han, G. Z. He, Y. C. He, Y. F. He, H. X. Huang, W. L. Huang, X. L. Ji, H. Y. Jiang, W. Jiang, H. Y. Jing, L. Kang, et al. (46 additional authors not shown)

    Abstract: The main physics goal of the Back-n white neutron facility at the China Spallation Neutron Source (CSNS) is to measure nuclear data. The energy of neutrons is one of the most important parameters for measuring nuclear data. The time-of-flight (TOF) method is used to obtain the energy of neutrons. The time when proton bunches hit the thick tungsten target is considered as the start point of TOF. T0 signal,…

    Submitted 24 June, 2018; originally announced June 2018.

    Comments: 3 pages, 6 figures, the 21st IEEE Real Time Conference