Showing 1–14 of 14 results for author: Xiao, A

Searching in archive eess.
  1. arXiv:2505.20424

    cs.RO cs.AI eess.SY

    Robot Operation of Home Appliances by Reading User Manuals

    Authors: Jian Zhang, Hanbo Zhang, Anxing Xiao, David Hsu

    Abstract: Operating home appliances, among the most common tools in every household, is a critical capability for assistive home robots. This paper presents ApBot, a robot system that operates novel household appliances by "reading" their user manuals. ApBot faces multiple challenges: (i) infer goal-conditioned partial policies from their unstructured, textual descriptions in a user manual document, (ii) gr…

    Submitted 26 May, 2025; originally announced May 2025.

  2. arXiv:2310.09078

    cs.NI eess.SP

    DNFS-VNE: Deep Neuro Fuzzy System Driven Virtual Network Embedding

    Authors: Ailing Xiao, Ning Chen, Sheng Wu, Peiying Zhang, Linling Kuang, Chunxiao Jiang

    Abstract: By decoupling substrate resources, network virtualization (NV) is a promising solution for meeting diverse demands and ensuring differentiated quality of service (QoS). In particular, virtual network embedding (VNE) is a critical enabling technology that enhances the flexibility and scalability of network deployment by addressing the coupling of Internet processes and services. However, in the exi…

    Submitted 3 July, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

  3. arXiv:2309.02780

    cs.CL cs.SD eess.AS

    GRASS: Unified Generation Model for Speech-to-Semantic Tasks

    Authors: Aobo Xia, Shuyu Lei, Yushu Yang, Xiang Guo, Hua Chai

    Abstract: This paper explores the instruction fine-tuning technique for speech-to-semantic tasks by introducing a unified end-to-end (E2E) framework that generates target text conditioned on a task-related prompt for audio data. We pre-train the model using large and diverse data, where instruction-speech pairs are constructed via a text-to-speech (TTS) system. Extensive experiments demonstrate that our pro…

    Submitted 11 September, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

  4. arXiv:2203.03927

    cs.RO eess.SY

    Quadruped Guidance Robot for the Visually Impaired: A Comfort-Based Approach

    Authors: Yanbo Chen, Zhengzhe Xu, Zhuozhu Jian, Gengpan Tang, Yunong Yangli, Anxing Xiao, Xueqian Wang, Bin Liang

    Abstract: Guidance robots that can guide people and avoid various obstacles, could potentially be owned by more visually impaired people at a fairly low cost. Most of the previous guidance robots for the visually impaired ignored the human response behavior and comfort, treating the human as an appendage dragged by the robot, which can lead to imprecise guidance of the human and sudden changes in the tracti…

    Submitted 23 June, 2023; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: IEEE International Conference on Robotics and Automation (ICRA) 2023

  5. arXiv:2111.09983

    eess.AS cs.SD

    Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions

    Authors: Chunxi Liu, Michael Picheny, Leda Sarı, Pooja Chitkara, Alex Xiao, Xiaohui Zhang, Mark Chou, Andres Alvarado, Caner Hazirbas, Yatharth Saraf

    Abstract: It is well known that many machine learning systems demonstrate bias towards specific groups of individuals. This problem has been studied extensively in the Facial Recognition area, but much less so in Automatic Speech Recognition (ASR). This paper presents initial Speech Recognition results on "Casual Conversations" -- a publicly released 846 hour corpus designed to help researchers evaluate the…

    Submitted 18 November, 2021; originally announced November 2021.

    Comments: Submitted to ICASSP 2022. Our dataset will be publicly available at (https://ai.facebook.com/datasets/casual-conversations-downloads) for general use. We also would like to note that considering the limitations of our dataset, we limit the use of it for only evaluation purposes (see license agreement)

  6. arXiv:2111.05948

    cs.CL cs.SD eess.AS

    Scaling ASR Improves Zero and Few Shot Learning

    Authors: Alex Xiao, Weiyi Zheng, Gil Keren, Duc Le, Frank Zhang, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Abdelrahman Mohamed

    Abstract: With 4.5 million hours of English speech from 10 different sources across 120 countries and models of up to 10 billion parameters, we explore the frontiers of scale for automatic speech recognition. We propose data selection techniques to efficiently scale training data to find the most valuable samples in massive datasets. To efficiently scale model sizes, we leverage various optimizations such a…

    Submitted 29 November, 2021; v1 submitted 10 November, 2021; originally announced November 2021.

  7. arXiv:2110.06648

    cs.RO eess.SY

    Robotic Autonomous Trolley Collection with Progressive Perception and Nonlinear Model Predictive Control

    Authors: Anxing Xiao, Hao Luan, Ziqi Zhao, Yue Hong, Jieting Zhao, Weinan Chen, Jiankun Wang, Max Q.-H. Meng

    Abstract: Autonomous mobile manipulation robots that can collect trolleys are widely used to liberate human resources and fight epidemics. Most prior robotic trolley collection solutions only detect trolleys with 2D poses or are merely based on specific marks and lack the formal design of planning algorithms. In this paper, we present a novel mobile manipulation system with applications in luggage trolley c…

    Submitted 1 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: Accepted to the 2022 International Conference on Robotics and Automation (ICRA 2022)

  8. arXiv:2110.05241

    eess.AS cs.CL cs.LG

    Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution

    Authors: Yangyang Shi, Chunyang Wu, Dilin Wang, Alex Xiao, Jay Mahadeokar, Xiaohui Zhang, Chunxi Liu, Ke Li, Yuan Shangguan, Varun Nagaraja, Ozlem Kalinli, Mike Seltzer

    Abstract: This paper improves the streaming transformer transducer for speech recognition by using non-causal convolution. Many works apply the causal convolution to improve streaming transformer ignoring the lookahead context. We propose to use non-causal convolution to process the center block and lookahead context separately. This method leverages the lookahead context in convolution and maintains simila…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: 5 pages, 3 figures, submit to ICASSP 2022

  9. arXiv:2110.03174

    cs.SD cs.AI eess.AS

    Transferring Voice Knowledge for Acoustic Event Detection: An Empirical Study

    Authors: Dawei Liang, Yangyang Shi, Yun Wang, Nayan Singhal, Alex Xiao, Jonathan Shaw, Edison Thomaz, Ozlem Kalinli, Mike Seltzer

    Abstract: Detection of common events and scenes from audio is useful for extracting and understanding human contexts in daily life. Prior studies have shown that leveraging knowledge from a relevant domain is beneficial for a target acoustic event detection (AED) process. Inspired by the observation that many human-centered acoustic events in daily life involve voice elements, this paper investigates the po…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022

  10. arXiv:2107.00773

    cs.RO cs.AI eess.SY

    Autonomous Navigation for Quadrupedal Robots with Optimized Jumping through Constrained Obstacles

    Authors: Scott Gilroy, Derek Lau, Lizhi Yang, Ed Izaguirre, Kristen Biermayer, Anxing Xiao, Mengti Sun, Ayush Agrawal, Jun Zeng, Zhongyu Li, Koushil Sreenath

    Abstract: Quadrupeds are strong candidates for navigating challenging environments because of their agile and dynamic designs. This paper presents a methodology that extends the range of exploration for quadrupedal robots by creating an end-to-end navigation framework that exploits walking and jumping modes. To obtain a dynamic jumping maneuver while avoiding obstacles, dynamically-feasible trajectories are…

    Submitted 1 July, 2021; originally announced July 2021.

    Comments: Accepted to 2021 IEEE 17th International Conference on Automation Science and Engineering (CASE 2021)

  11. arXiv:2104.02232

    cs.SD cs.CL eess.AS

    Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios

    Authors: Jay Mahadeokar, Yangyang Shi, Yuan Shangguan, Chunyang Wu, Alex Xiao, Hang Su, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer

    Abstract: Often, the storage and computational constraints of embedded devices demand that a single on-device ASR model serve multiple use-cases / domains. In this paper, we propose a Flexible Transducer (FlexiT) for on-device automatic speech recognition to flexibly deal with multiple use-cases / domains with different accuracy and latency requirements. Specifically, using a single compact model, FlexiT provid…

    Submitted 5 April, 2021; originally announced April 2021.

    Comments: Submitted to Interspeech 2021 (under review)

  12. arXiv:2103.14300

    cs.RO eess.SY

    Robotic Guide Dog: Leading a Human with Leash-Guided Hybrid Physical Interaction

    Authors: Anxing Xiao, Wenzhe Tong, Lizhi Yang, Jun Zeng, Zhongyu Li, Koushil Sreenath

    Abstract: An autonomous robot that is able to physically guide humans through narrow and cluttered spaces could be a big boon to the visually-impaired. Most prior robotic guiding systems are based on wheeled platforms with large bases with actuated rigid guiding canes. The large bases and the actuated arms limit these prior approaches from operating in narrow and cluttered environments. We propose a method…

    Submitted 28 June, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

    Comments: Accepted to 2021 International Conference on Robotics and Automation (ICRA 2021)

  13. arXiv:2005.07850

    eess.AS cs.CL cs.SD

    Large scale weakly and semi-supervised learning for low-resource video ASR

    Authors: Kritika Singh, Vimal Manohar, Alex Xiao, Sergey Edunov, Ross Girshick, Vitaliy Liptchinsky, Christian Fuegen, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed

    Abstract: Many semi- and weakly-supervised approaches have been investigated for overcoming the labeling cost of building high quality speech recognition systems. On the challenging task of transcribing social media videos in low-resource conditions, we conduct a large scale systematic comparison between two self-labeling methods on one hand, and weakly-supervised pretraining using contextual metadata on th…

    Submitted 6 August, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

  14. Transformer-based Acoustic Modeling for Hybrid Speech Recognition

    Authors: Yongqiang Wang, Abdelrahman Mohamed, Duc Le, Chunxi Liu, Alex Xiao, Jay Mahadeokar, Hongzhao Huang, Andros Tjandra, Xiaohui Zhang, Frank Zhang, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer

    Abstract: We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition. Several modeling choices are discussed in this work, including various positional embedding methods and an iterated loss to enable training deep transformers. We also present a preliminary study of using limited right context in transformer models, which makes it possible for streaming applications. We d…

    Submitted 29 April, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: to appear in ICASSP 2020