
Showing 1–50 of 62 results for author: Chi, Z

Searching in archive cs.
  1. arXiv:2505.23585

    cs.LG cs.CL

    On-Policy RL with Optimal Reward Baseline

    Authors: Yaru Hao, Li Dong, Xun Wu, Shaohan Huang, Zewen Chi, Furu Wei

    Abstract: Reinforcement learning algorithms are fundamental to align large language models with human preferences and to enhance their reasoning capabilities. However, current reinforcement learning algorithms often suffer from training instability due to loose on-policy constraints and computational inefficiency due to auxiliary models. In this work, we propose On-Policy RL with Optimal reward baseline (OP…

    Submitted 3 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

  2. arXiv:2505.14674

    cs.CL

    Reward Reasoning Model

    Authors: Jiaxin Guo, Zewen Chi, Li Dong, Qingxiu Dong, Xun Wu, Shaohan Huang, Furu Wei

    Abstract: Reward models play a critical role in guiding large language models toward outputs that align with human expectations. However, an open challenge remains in effectively utilizing test-time compute to enhance reward model performance. In this work, we introduce Reward Reasoning Models (RRMs), which are specifically designed to execute a deliberate reasoning process before generating final rewards.…

    Submitted 20 May, 2025; originally announced May 2025.

  3. arXiv:2505.14631

    cs.CL

    Think Only When You Need with Large Hybrid-Reasoning Models

    Authors: Lingjie Jiang, Xun Wu, Shaohan Huang, Qingxiu Dong, Zewen Chi, Li Dong, Xingxing Zhang, Tengchao Lv, Lei Cui, Furu Wei

    Abstract: Recent Large Reasoning Models (LRMs) have shown substantially improved reasoning capabilities over traditional Large Language Models (LLMs) by incorporating extended thinking processes prior to producing final responses. However, excessively lengthy thinking introduces substantial overhead in terms of token consumption and latency, which is particularly unnecessary for simple queries. In this work…

    Submitted 21 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  4. arXiv:2503.03792

    cs.LG cs.AI

    Rebalanced Multimodal Learning with Data-aware Unimodal Sampling

    Authors: Qingyuan Jiang, Zhouyang Chi, Xiao Ma, Qirong Mao, Yang Yang, Jinhui Tang

    Abstract: To address the modality learning degeneration caused by modality imbalance, existing multimodal learning (MML) approaches primarily attempt to balance the optimization process of each modality from the perspective of model learning. However, almost all existing methods ignore the modality imbalance caused by unimodal data sampling, i.e., equal unimodal data sampling often results in discrepancies…

    Submitted 5 March, 2025; originally announced March 2025.

  5. arXiv:2502.00800

    cs.CV eess.IV

    Adversarial Semantic Augmentation for Training Generative Adversarial Networks under Limited Data

    Authors: Mengping Yang, Zhe Wang, Ziqiu Chi, Dongdong Li, Wenli Du

    Abstract: Generative adversarial networks (GANs) have made remarkable achievements in synthesizing images in recent years. Typically, training GANs requires massive data, and the performance of GANs deteriorates significantly when training data is limited. To improve the synthesis performance of GANs in low-data regimes, existing approaches use various data augmentation techniques to enlarge the training se…

    Submitted 2 February, 2025; originally announced February 2025.

    Comments: This work was completed in 2022 and submitted to an IEEE journal for potential publication

  6. arXiv:2412.05684

    cs.CG

    Recursive Computation of Path Homology for Stratified Digraphs

    Authors: Zhengtong Zhu, Zhiyi Chi

    Abstract: Stratified digraphs are popular models for feedforward neural networks. However, computation of their path homologies has been limited to low dimensions due to high computational complexity. A recursive algorithm is proposed to compute certain high-dimensional (reduced) path homologies of stratified digraphs. By recursion on matrix representations of homologies of subgraphs, the algorithm efficien…

    Submitted 11 December, 2024; v1 submitted 7 December, 2024; originally announced December 2024.

  7. arXiv:2410.16287

    cs.CV

    Solution for OOD-CV UNICORN Challenge 2024 Object Detection Assistance LLM Counting Ability Improvement

    Authors: Zhouyang Chi, Qingyuan Jiang, Yang Yang

    Abstract: This report provides a detailed description of the method that we explored and proposed in the ECCV OOD-CV UNICORN Challenge 2024, which focuses on the robustness of responses from large language models. The datasets of this competition are OODCA-VQA and SketchyQA. To test the robustness of the model, the organizer extended two variants of the dataset, OODCV-Counterfactual and Sketchy-Chall…

    Submitted 5 October, 2024; originally announced October 2024.

  8. arXiv:2407.12128

    cs.LG cs.CV

    Distribution Alignment for Fully Test-Time Adaptation with Dynamic Online Data Streams

    Authors: Ziqiang Wang, Zhixiang Chi, Yanan Wu, Li Gu, Zhi Liu, Konstantinos Plataniotis, Yang Wang

    Abstract: Given a model trained on source data, Test-Time Adaptation (TTA) enables adaptation and inference in test data streams with domain shifts from the source. Current methods predominantly optimize the model for each incoming test data batch using self-training loss. While these methods yield commendable results in ideal test data streams, where batches are independently and identically sampled from t…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  9. arXiv:2407.04587

    cs.LG cs.CV

    Multimodal Classification via Modal-Aware Interactive Enhancement

    Authors: Qing-Yuan Jiang, Zhouyang Chi, Yang Yang

    Abstract: Due to the notorious modality imbalance problem, multimodal learning (MML) leads to the phenomenon of optimization imbalance, thus struggling to achieve satisfactory performance. Recently, some representative methods have been proposed to boost the performance, mainly focusing on adaptively adjusting the optimization of each modality to rebalance the learning speed of dominant and non-dominant modal…

    Submitted 5 July, 2024; originally announced July 2024.

  10. arXiv:2407.04255

    cs.CV

    Second Place Solution of WSDM2023 Toloka Visual Question Answering Challenge

    Authors: Xiangyu Wu, Zhouyang Chi, Yang Yang, Jianfeng Lu

    Abstract: In this paper, we present our solution for the WSDM2023 Toloka Visual Question Answering Challenge. Inspired by the application of multimodal pre-trained models to various downstream tasks (e.g., visual question answering, visual grounding, and cross-modal retrieval), we approached this competition as a visual grounding task, where the input is an image and a question, guiding the model to answer t…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Second Place of WSDM2023 Toloka Visual Question Answering Challenge

  11. Could Chemical LLMs benefit from Message Passing

    Authors: Jiaqing Xie, Ziheng Chi

    Abstract: Pretrained language models (LMs) showcase significant capabilities in processing molecular text, while concurrently, message passing neural networks (MPNNs) demonstrate resilience and versatility in the domain of molecular science. Despite these advancements, we find there are limited studies investigating the bidirectional interactions between molecular structures and their corresponding textual…

    Submitted 26 August, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted at ACL @ Languages and Molecules 2024. In Proceedings of ACL 2024

  12. arXiv:2405.04065

    cs.CL

    FlashBack:Efficient Retrieval-Augmented Language Modeling for Long Context Inference

    Authors: Runheng Liu, Xingchen Xiao, Heyan Huang, Zewen Chi, Zhijing Wu

    Abstract: Retrieval-Augmented Language Modeling (RALM) by integrating large language models (LLM) with relevant documents from an external corpus is a proven method for enabling the LLM to generate information beyond the scope of its pre-training corpus. Previous work utilizing retrieved content by simply prepending it to the input poses a high runtime issue, which degrades the inference efficiency of the L…

    Submitted 13 June, 2025; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: ACL 2025 Findings, 14 pages

  13. arXiv:2405.02797

    cs.CV cs.LG

    Adapting to Distribution Shift by Visual Domain Prompt Generation

    Authors: Zhixiang Chi, Li Gu, Tao Zhong, Huan Liu, Yuanhao Yu, Konstantinos N Plataniotis, Yang Wang

    Abstract: In this paper, we aim to adapt a model at test-time using a few unlabeled data to address distribution shifts. To tackle the challenges of extracting domain knowledge from a limited amount of data, it is crucial to utilize correlated information from pre-trained backbones and source domains. Previous studies fail to utilize recent foundation models with strong out-of-distribution generalization. A…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: ICLR2024, code: https://github.com/Guliisgreat/VDPG

  14. arXiv:2404.01642

    cs.LG cs.CR

    Patch Synthesis for Property Repair of Deep Neural Networks

    Authors: Zhiming Chi, Jianan Ma, Pengfei Yang, Cheng-Chao Huang, Renjue Li, Xiaowei Huang, Lijun Zhang

    Abstract: Deep neural networks (DNNs) are prone to various dependability issues, such as adversarial attacks, which hinder their adoption in safety-critical domains. Recently, NN repair techniques have been proposed to address these issues while preserving original performance by locating and modifying guilty neurons and their parameters. However, existing repair approaches are often limited to specific dat…

    Submitted 31 January, 2025; v1 submitted 2 April, 2024; originally announced April 2024.

  15. arXiv:2403.17683

    cs.AI

    Solution for Emotion Prediction Competition of Workshop on Emotionally and Culturally Intelligent AI

    Authors: Shengdong Xu, Zhouyang Chi, Yang Yang

    Abstract: This report provides a detailed description of the method that we explored and proposed in the WECIA Emotion Prediction Competition (EPC), which predicts a person's emotion through an artistic work with a comment. The dataset of this competition is ArtELingo, designed to encourage work on diversity across languages and cultures. The dataset has two main challenges, namely modal imbalance problem an…

    Submitted 31 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  16. arXiv:2403.07920

    q-bio.BM cs.AI cs.CL cs.LG

    ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training

    Authors: Le Zhuo, Zewen Chi, Minghao Xu, Heyan Huang, Heqi Zheng, Conghui He, Xian-Ling Mao, Wentao Zhang

    Abstract: We propose ProtLLM, a versatile cross-modal large language model (LLM) for both protein-centric and protein-language tasks. ProtLLM features a unique dynamic protein mounting mechanism, enabling it to handle complex inputs where the natural language text is interspersed with an arbitrary number of proteins. Besides, we propose the protein-as-word language modeling approach to train ProtLLM. By dev…

    Submitted 27 February, 2024; originally announced March 2024.

    Comments: https://protllm.github.io/project/

  17. arXiv:2312.10165

    cs.CV

    Test-Time Domain Adaptation by Learning Domain-Aware Batch Normalization

    Authors: Yanan Wu, Zhixiang Chi, Yang Wang, Konstantinos N. Plataniotis, Songhe Feng

    Abstract: Test-time domain adaptation aims to adapt the model trained on source domains to unseen target domains using a few unlabeled images. Emerging research has shown that the label and domain information is separately embedded in the weight matrix and batch normalization (BN) layer. Previous works normally update the whole network naively without explicitly decoupling the knowledge between label and do…

    Submitted 16 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: AAAI2024 (Oral), see https://github.com/ynanwu/MABN

  18. arXiv:2311.02874

    eess.IV cs.CV cs.LG

    Dynamic Neural Fields for Learning Atlases of 4D Fetal MRI Time-series

    Authors: Zeen Chi, Zhongxiao Cong, Clinton J. Wang, Yingcheng Liu, Esra Abaci Turk, P. Ellen Grant, S. Mazdak Abulnaga, Polina Golland, Neel Dey

    Abstract: We present a method for fast biomedical image atlas construction using neural fields. Atlases are key to biomedical image analysis tasks, yet conventional and deep network estimation methods remain time-intensive. In this preliminary work, we frame subject-specific atlas building as learning a neural field of deformable spatiotemporal observations. We apply our method to learning subject-specific…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 6 pages, 2 figures. Accepted by Medical Imaging Meets NeurIPS 2023

  19. arXiv:2308.11063

    cs.CV

    MetaGCD: Learning to Continually Learn in Generalized Category Discovery

    Authors: Yanan Wu, Zhixiang Chi, Yang Wang, Songhe Feng

    Abstract: In this paper, we consider a real-world scenario where a model that is trained on pre-defined classes continually encounters unlabeled data that contains both known and novel classes. The goal is to continually discover novel classes while maintaining the performance in known classes. We name the setting Continual Generalized Category Discovery (C-GCD). Existing methods for novel class discovery c…

    Submitted 17 October, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: This paper has been accepted by ICCV2023

  20. arXiv:2308.09268

    cs.CV

    Progression-Guided Temporal Action Detection in Videos

    Authors: Chongkai Lu, Man-Wai Mak, Ruimin Li, Zheru Chi, Hong Fu

    Abstract: We present a novel framework, Action Progression Network (APN), for temporal action detection (TAD) in videos. The framework locates actions in videos by detecting the action evolution process. To encode the action evolution, we quantify a complete action process into 101 ordered stages (0%, 1%, ..., 100%), referred to as action progressions. We then train a neural network to recognize the acti…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Under Review. Code available at https://github.com/makecent/APN

  21. arXiv:2308.00520

    cs.CV

    NormKD: Normalized Logits for Knowledge Distillation

    Authors: Zhihao Chi, Tu Zheng, Hengjia Li, Zheng Yang, Boxi Wu, Binbin Lin, Deng Cai

    Abstract: Logit-based knowledge distillation has received less attention in recent years because feature-based methods perform better in most cases. Nevertheless, we find it still has untapped potential when we re-investigate the temperature, which is a crucial hyper-parameter to soften the logit outputs. For most of the previous works, it was set as a fixed value for the entire distillation procedure. However, as th…

    Submitted 1 August, 2023; originally announced August 2023.

  22. arXiv:2305.08800

    cs.CL

    Measuring Cross-Lingual Transferability of Multilingual Transformers on Sentence Classification

    Authors: Zewen Chi, Heyan Huang, Xian-Ling Mao

    Abstract: Recent studies have exhibited remarkable capabilities of pre-trained multilingual Transformers, especially cross-lingual transferability. However, current methods do not measure cross-lingual transferability well, hindering the understanding of multilingual Transformers. In this paper, we propose IGap, a cross-lingual transferability metric for multilingual Transformers on sentence classification…

    Submitted 15 May, 2023; originally announced May 2023.

  23. arXiv:2303.17815

    cs.CV

    APPT : Asymmetric Parallel Point Transformer for 3D Point Cloud Understanding

    Authors: Hengjia Li, Tu Zheng, Zhihao Chi, Zheng Yang, Wenxiao Wang, Boxi Wu, Binbin Lin, Deng Cai

    Abstract: Transformer-based networks have achieved impressive performance in 3D point cloud understanding. However, most of them concentrate on aggregating local features, but neglect to directly model global dependencies, which results in a limited effective receptive field. Besides, how to effectively incorporate local and global components also remains challenging. To tackle these problems, we propose As…

    Submitted 31 March, 2023; originally announced March 2023.

  24. arXiv:2302.14045

    cs.CL cs.CV

    Language Is Not All You Need: Aligning Perception with Language Models

    Authors: Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei

    Abstract: A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). Specifically, we train Kosmos-1 from scratch on web-scale multimodal co…

    Submitted 1 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  25. arXiv:2302.06455

    cs.AI cs.FL

    Incremental Satisfiability Modulo Theory for Verification of Deep Neural Networks

    Authors: Pengfei Yang, Zhiming Chi, Zongxin Liu, Mengyu Zhao, Cheng-Chao Huang, Shaowei Cai, Lijun Zhang

    Abstract: Constraint solving is an elementary way for verification of deep neural networks (DNN). In the domain of AI safety, a DNN might be modified in its structure and parameters for its repair or attack. For such situations, we propose the incremental DNN verification problem, which asks whether a safety property still holds after the DNN is modified. To solve the problem, we present an incremental sati…

    Submitted 9 February, 2023; originally announced February 2023.

  26. arXiv:2212.09611

    cs.CL cs.CV

    Optimizing Prompts for Text-to-Image Generation

    Authors: Yaru Hao, Zewen Chi, Li Dong, Furu Wei

    Abstract: Well-designed prompts can guide text-to-image models to generate amazing images. However, the performant prompts are often model-specific and misaligned with user input. Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts. Specifically, we first perform supervised fine-tuning with a pretr…

    Submitted 29 December, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Accepted by NeurIPS-23

  27. arXiv:2212.09353

    cs.CL

    Bridging The Gap: Entailment Fused-T5 for Open-retrieval Conversational Machine Reading Comprehension

    Authors: Xiao Zhang, Heyan Huang, Zewen Chi, Xian-Ling Mao

    Abstract: Open-retrieval conversational machine reading comprehension (OCMRC) simulates real-life conversational interaction scenes. Machines are required to make a decision of "Yes/No/Inquire" or generate a follow-up question when the decision is "Inquire" based on retrieved rule texts, user scenario, user question, and dialogue history. Recent studies explored the methods to reduce the information gap bet…

    Submitted 19 December, 2022; originally announced December 2022.

  28. arXiv:2212.08273

    cs.CV cs.AI cs.LG

    Learning for Vehicle-to-Vehicle Cooperative Perception under Lossy Communication

    Authors: Jinlong Li, Runsheng Xu, Xinyu Liu, Jin Ma, Zicheng Chi, Jiaqi Ma, Hongkai Yu

    Abstract: Deep learning has been widely used in the perception (e.g., 3D object detection) of intelligent vehicle driving. Due to the beneficial Vehicle-to-Vehicle (V2V) communication, the deep learning based features from other agents can be shared to the ego vehicle so as to improve the perception of the ego vehicle. It is named as Cooperative Perception in the V2V research, whose algorithms have been dra…

    Submitted 18 March, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: this paper was accepted by IEEE Transactions on Intelligent Vehicles

    Journal ref: 2023 IEEE Transactions on Intelligent Vehicles

  29. arXiv:2211.13184

    cs.LG cs.CL

    TorchScale: Transformers at Scale

    Authors: Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei

    Abstract: Large Transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries on scaling Transformers focus on improving training or inference with better parallelization. In this work, we present TorchScale, an open-source toolkit that allows researchers and developers to scale up Transformers efficiently and effectively. TorchScale has the implementation of several…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: Work in progress

  30. arXiv:2210.14867

    cs.CL cs.LG

    Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning

    Authors: Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei, Vishrav Chaudhary, Xia Song

    Abstract: In this paper, we elaborate upon recipes for building multilingual representation models that are not only competitive with existing state-of-the-art models but are also more parameter efficient, thereby promoting better adoption in resource-constrained scenarios and practical applications. We show that going beyond English-centric bitexts, coupled with a novel sampling strategy aimed at reducing…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Work in progress

  31. arXiv:2210.06546

    cs.LG stat.ML

    Auto-Encoding Goodness of Fit

    Authors: Aaron Palmer, Zhiyi Chi, Derek Aguiar, Jinbo Bi

    Abstract: We develop a new type of generative autoencoder called the Goodness-of-Fit Autoencoder (GoFAE), which incorporates GoF tests at two levels. At the minibatch level, it uses GoF test statistics as regularization objectives. At a more global level, it selects a regularization coefficient based on higher criticism, i.e., a test on the uniformity of the local GoF p-values. We justify the use of GoF tes…

    Submitted 18 March, 2025; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: The Eleventh International Conference on Learning Representations. 2023

  32. arXiv:2210.05461

    cs.CV

    FreGAN: Exploiting Frequency Components for Training GANs under Limited Data

    Authors: Mengping Yang, Zhe Wang, Ziqiu Chi, Yanbing Zhang

    Abstract: Training GANs under limited data often leads to discriminator overfitting and memorization issues, causing divergent training. Existing approaches mitigate the overfitting by employing data augmentations, model regularization, or attention mechanisms. However, they ignore the frequency bias of GANs and take poor consideration towards frequency information, especially high-frequency signals that co…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: To appear in NeurIPS 2022, GitHub: https://github.com/kobeshegu/FreGAN_NeurIPS2022

  33. arXiv:2210.03885

    cs.LG cs.CV

    Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts

    Authors: Tao Zhong, Zhixiang Chi, Li Gu, Yang Wang, Yuanhao Yu, Jin Tang

    Abstract: In this paper, we tackle the problem of domain shift. Most existing methods perform training on multiple source domains using a single model, and the same trained model is used on all unseen target domains. Such solutions are sub-optimal as each target domain exhibits its own specialty, which is not adapted. Furthermore, expecting single-model training to learn extensive knowledge from multiple so…

    Submitted 11 January, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: Accepted at NeurIPS2022

  34. arXiv:2210.00174

    cs.CV cs.LG

    Improving ProtoNet for Few-Shot Video Object Recognition: Winner of ORBIT Challenge 2022

    Authors: Li Gu, Zhixiang Chi, Huan Liu, Yuanhao Yu, Yang Wang

    Abstract: In this work, we present the winning solution for ORBIT Few-Shot Video Object Recognition Challenge 2022. Built upon the ProtoNet baseline, the performance of our method is improved with three effective techniques. These techniques include the embedding adaptation, the uniform video clip sampler and the invalid frame detection. In addition, we re-factor and re-implement the official codebase to en…

    Submitted 30 September, 2022; originally announced October 2022.

    Comments: Winner of ORBIT Challenge 2022

  35. arXiv:2209.11484

    cs.CL

    ET5: A Novel End-to-end Framework for Conversational Machine Reading Comprehension

    Authors: Xiao Zhang, Heyan Huang, Zewen Chi, Xian-Ling Mao

    Abstract: Conversational machine reading comprehension (CMRC) aims to assist computers to understand a natural language text and thereafter engage in a multi-turn conversation to answer questions related to the text. Existing methods typically require three steps: (1) decision making based on entailment reasoning; (2) span extraction if required by the above decision; (3) question rephrasing based on the e…

    Submitted 23 September, 2022; originally announced September 2022.

    Comments: Accepted by COLING2022

  36. arXiv:2208.10813

    cs.CL

    Unsupervised Question Answering via Answer Diversifying

    Authors: Yuxiang Nie, Heyan Huang, Zewen Chi, Xian-Ling Mao

    Abstract: Unsupervised question answering is an attractive task due to its independence from labeled data. Previous works usually make use of heuristic rules as well as pre-trained models to construct data and train QA models. However, most of these works regard named entity (NE) as the only answer type, which ignores the high diversity of answers in the real world. To tackle this problem, we propose a novel…

    Submitted 23 August, 2022; originally announced August 2022.

    Comments: Accepted by COLING 2022

  37. arXiv:2207.12305

    cs.CV

    Error-Aware Spatial Ensembles for Video Frame Interpolation

    Authors: Zhixiang Chi, Rasoul Mohammadi Nasiri, Zheng Liu, Yuanhao Yu, Juwei Lu, Jin Tang, Konstantinos N Plataniotis

    Abstract: Video frame interpolation (VFI) algorithms have improved considerably in recent years due to unprecedented progress in both data-driven algorithms and their implementations. Recent research has introduced advanced motion estimation or novel warping methods as the means to address challenging VFI scenarios. However, none of the published VFI works considers the spatially non-uniform characteristics…

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: 10 pages, 8 figures, demo video: https://www.youtube.com/watch?v=_32GNANSr5U

  38. arXiv:2207.11213

    cs.CV

    Few-Shot Class-Incremental Learning via Entropy-Regularized Data-Free Replay

    Authors: Huan Liu, Li Gu, Zhixiang Chi, Yang Wang, Yuanhao Yu, Jun Chen, Jin Tang

    Abstract: Few-shot class-incremental learning (FSCIL) has been proposed aiming to enable a deep learning system to incrementally learn new classes with limited data. Recently, a pioneering work claims that the commonly used replay-based method in class-incremental learning (CIL) is ineffective and thus not preferred for FSCIL. This, if true, has a significant influence on the field of FSCIL. In this paper, we sho…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: Accepted by ECCV 2022

  39. arXiv:2207.07288

    cs.CV eess.IV

    WaveGAN: Frequency-aware GAN for High-Fidelity Few-shot Image Generation

    Authors: Mengping Yang, Zhe Wang, Ziqiu Chi, Wenyi Feng

    Abstract: Existing few-shot image generation approaches typically employ fusion-based strategies, either on the image or the feature level, to produce new images. However, previous approaches struggle to synthesize high-frequency signals with fine details, deteriorating the synthesis quality. To address this, we propose WaveGAN, a frequency-aware model for few-shot image generation. Concretely, we disentang…

    Submitted 9 August, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

    Comments: Accepted by ECCV2022, code link: https://github.com/kobeshegu/ECCV2022_WaveGAN

  40. arXiv:2206.06336

    cs.CL

    Language Models are General-Purpose Interfaces

    Authors: Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei

    Abstract: Foundation models have received much attention due to their effectiveness across a broad range of downstream applications. Though there is a big convergence in terms of architecture, most pretrained models are typically still developed for specific tasks or modalities. In this work, we propose to use language models as a general-purpose interface to various foundation models. A collection of pretr…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: 32 pages. The first three authors contributed equally.

  41. arXiv:2204.09179

    cs.CL cs.LG

    On the Representation Collapse of Sparse Mixture of Experts

    Authors: Zewen Chi, Li Dong, Shaohan Huang, Damai Dai, Shuming Ma, Barun Patra, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei

    Abstract: Sparse mixture of experts provides larger model capacity while requiring a constant computational overhead. It employs the routing mechanism to distribute input tokens to the best-matched experts according to their hidden representations. However, learning such a routing mechanism encourages token clustering around expert centroids, implying a trend toward representation collapse. In this work, we…

    Submitted 12 October, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: NeurIPS 2022

  42. arXiv:2204.08887

    cs.CL

    Cross-Lingual Phrase Retrieval

    Authors: Heqi Zheng, Xiao Zhang, Zewen Chi, Heyan Huang, Tan Yan, Tian Lan, Wei Wei, Xian-Ling Mao

    Abstract: Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations at the word or sentence level. However, how to learn phrase representations for cross-lingual phrase retrieval is still an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase…

    Submitted 19 April, 2022; originally announced April 2022.

  43. arXiv:2112.05883

    cs.CV cs.LG

    Self-supervised Spatiotemporal Representation Learning by Exploiting Video Continuity

    Authors: Hanwen Liang, Niamul Quader, Zhixiang Chi, Lizhe Chen, Peng Dai, Juwei Lu, Yang Wang

    Abstract: Recent self-supervised video representation learning methods have found significant success by exploring essential properties of videos, e.g. speed, temporal order, etc. This work exploits an essential yet under-explored property of videos, the video continuity, to obtain supervision signals for self-supervised representation learning. Specifically, we formulate three novel continuity-related pret…

    Submitted 12 January, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

  44. arXiv:2110.07936  [pdf, other]

    cs.CL

    Unifying Cross-lingual Summarization and Machine Translation with Compression Rate

    Authors: Yu Bai, Heyan Huang, Kai Fan, Yang Gao, Yiming Zhu, Jiaao Zhan, Zewen Chi, Boxing Chen

    Abstract: Cross-Lingual Summarization (CLS) is a task that extracts important information from a source document and summarizes it into a summary in another language. It is a challenging task that requires a system to understand, summarize, and translate at the same time, making it highly related to Monolingual Summarization (MS) and Machine Translation (MT). In practice, the training resources for Machine…

    Submitted 24 April, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Accepted by SIGIR 2022

  45. arXiv:2109.11129  [pdf, other]

    cs.CL

    Cross-Lingual Language Model Meta-Pretraining

    Authors: Zewen Chi, Heyan Huang, Luyang Liu, Yu Bai, Xian-Ling Mao

    Abstract: The success of pretrained cross-lingual language models relies on two essential abilities, i.e., generalization ability for learning downstream tasks in a source language, and cross-lingual transferability for transferring the task knowledge to other languages. However, current methods jointly learn the two abilities in a single-phase cross-lingual pretraining process, resulting in a trade-off bet…

    Submitted 22 September, 2021; originally announced September 2021.

  46. arXiv:2106.16138  [pdf, other]

    cs.CL

    XLM-E: Cross-lingual Language Model Pre-training via ELECTRA

    Authors: Zewen Chi, Shaohan Huang, Li Dong, Shuming Ma, Bo Zheng, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei

    Abstract: In this paper, we introduce ELECTRA-style tasks to cross-lingual language model pre-training. Specifically, we present two pre-training tasks, namely multilingual replaced token detection, and translation replaced token detection. Besides, we pretrain the model, named as XLM-E, on both multilingual and parallel corpora. Our model outperforms the baseline models on various cross-lingual understandi…

    Submitted 19 April, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

    Comments: ACL 2022

  47. arXiv:2106.08226  [pdf, other]

    cs.CL

    Consistency Regularization for Cross-Lingual Fine-Tuning

    Authors: Bo Zheng, Li Dong, Shaohan Huang, Wenhui Wang, Zewen Chi, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei

    Abstract: Fine-tuning pre-trained cross-lingual language models can transfer task-specific supervision from one language to the others. In this work, we propose to improve cross-lingual fine-tuning with consistency regularization. Specifically, we use example consistency regularization to penalize the prediction sensitivity to four types of data augmentations, i.e., subword sampling, Gaussian noise, code-sw…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: ACL 2021

  48. arXiv:2106.06381  [pdf, other]

    cs.CL

    Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment

    Authors: Zewen Chi, Li Dong, Bo Zheng, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei

    Abstract: The cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences. In this paper, we introduce denoising word alignment as a new cross-lingual pre-training task. Specifically, the model first self-labels word alignments for parallel sentences. Then we randomly mask tokens in a bitext pair. Given a masked token, the model uses a poin…

    Submitted 13 September, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: ACL 2021

  49. arXiv:2104.08692  [pdf, other]

    cs.CL

    MT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs

    Authors: Zewen Chi, Li Dong, Shuming Ma, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei

    Abstract: Multilingual T5 (mT5) pretrains a sequence-to-sequence model on massive monolingual texts, which has shown promising results on many cross-lingual tasks. In this paper, we improve multilingual text-to-text transfer Transformer with translation pairs (mT6). Specifically, we explore three cross-lingual text-to-text pre-training tasks, namely, machine translation, translation pair span corruption, an…

    Submitted 13 September, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: EMNLP 2021

  50. DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python

    Authors: Jinglin Peng, Weiyuan Wu, Brandon Lockhart, Song Bian, Jing Nathan Yan, Linghao Xu, Zhixuan Chi, Jeffrey Rzeszotarski, Jiannan Wang

    Abstract: Exploratory Data Analysis (EDA) is a crucial step in any data science project. However, existing Python libraries fall short in supporting data scientists to complete common EDA tasks for statistical modeling. Their API design is either too low level, which is optimized for plotting rather than EDA, or too high level, which is hard to specify more fine-grained EDA tasks. In response, we propose Da…

    Submitted 10 April, 2021; v1 submitted 1 April, 2021; originally announced April 2021.