
Showing 1–50 of 104 results for author: Qiu, D

Searching in archive cs.
  1. arXiv:2507.05411

    cs.LG

    AXLearn: Modular Large Model Training on Heterogeneous Infrastructure

    Authors: Mark Lee, Tom Gunter, Chang Lan, John Peebles, Hanzhi Zhou, Kelvin Zou, Sneha Bangalore, Chung-Cheng Chiu, Nan Du, Xianzhi Du, Philipp Dufter, Ruixuan Hou, Haoshuo Huang, Dongseong Hwang, Xiang Kong, Jinhao Lei, Tao Lei, Meng Li, Li Li, Jiarui Lu, Zhiyun Lu, Yiping Ma, David Qiu, Vivek Rathod, Senyu Tong, et al. (12 additional authors not shown)

    Abstract: We design and implement AXLearn, a production deep learning system that facilitates scalable and high-performance training of large deep learning models. Compared to other state-of-the-art deep learning systems, AXLearn has a unique focus on modularity and support for heterogeneous hardware infrastructure. AXLearn's internal interfaces between software components follow strict encapsulation, allow…

    Submitted 7 July, 2025; originally announced July 2025.

  2. arXiv:2506.23542

    cs.CV

    Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric Attention

    Authors: Weida Wang, Changyong He, Jin Zeng, Di Qiu

    Abstract: Depth images captured by Time-of-Flight (ToF) sensors are prone to noise, requiring denoising for reliable downstream applications. Previous works either focus on single-frame processing, or perform multi-frame processing without considering depth variations at corresponding pixels across frames, leading to undesirable temporal inconsistency and spatial ambiguity. In this paper, we propose a novel…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: This paper has been accepted for publication at the International Conference on Computer Vision (ICCV) 2025

  3. arXiv:2506.18768

    cs.CL

    ASP2LJ: An Adversarial Self-Play Lawyer Augmented Legal Judgment Framework

    Authors: Ao Chang, Tong Zhou, Yubo Chen, Delai Qiu, Shengping Liu, Kang Liu, Jun Zhao

    Abstract: Legal Judgment Prediction (LJP) aims to predict judicial outcomes, including relevant legal charges, terms, and fines, which is a crucial process in Large Language Models (LLMs). However, LJP faces two key challenges: (1) Long Tail Distribution: Current datasets, derived from authentic cases, suffer from high human annotation costs and imbalanced distributions, leading to model performance degradation.…

    Submitted 11 June, 2025; originally announced June 2025.

  4. arXiv:2506.00830

    cs.CV

    SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers

    Authors: Zhengcong Fei, Hao Jiang, Di Qiu, Baoxuan Gu, Youqiang Zhang, Jiahua Wang, Jialin Bai, Debang Li, Mingyuan Fan, Guibin Chen, Yahui Zhou

    Abstract: The generation and editing of audio-conditioned talking portraits guided by multimodal inputs, including text, images, and videos, remains underexplored. In this paper, we present SkyReels-Audio, a unified framework for synthesizing high-fidelity and temporally coherent talking portrait videos. Built upon pretrained video diffusion transformers, our framework supports infinite-length generation a…

    Submitted 1 June, 2025; originally announced June 2025.

  5. arXiv:2505.19640

    cs.CL

    Interleaved Reasoning for Large Language Models via Reinforcement Learning

    Authors: Roy Xie, David Qiu, Deepak Gopinath, Dong Lin, Yanchao Sun, Chong Wang, Saloni Potdar, Bhuwan Dhingra

    Abstract: Long chain-of-thought (CoT) significantly enhances the reasoning capabilities of large language models (LLMs). However, the extensive reasoning traces lead to inefficiencies and an increased time-to-first-token (TTFT). We propose a novel training paradigm that uses reinforcement learning (RL) to guide reasoning LLMs to interleave thinking and answering for multi-hop questions. We observe that models inhe…

    Submitted 26 May, 2025; originally announced May 2025.

  6. arXiv:2505.18881

    cs.CV cs.AI cs.RO

    SD-OVON: A Semantics-aware Dataset and Benchmark Generation Pipeline for Open-Vocabulary Object Navigation in Dynamic Scenes

    Authors: Dicong Qiu, Jiadi You, Zeying Gong, Ronghe Qiu, Hui Xiong, Junwei Liang

    Abstract: We present the Semantics-aware Dataset and Benchmark Generation Pipeline for Open-vocabulary Object Navigation in Dynamic Scenes (SD-OVON). It utilizes pretrained multimodal foundation models to generate infinite unique photo-realistic scene variants that adhere to real-world semantics and daily commonsense for the training and the evaluation of navigation agents, accompanied by a plugin for ge…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: Preprint. 21 pages

  7. arXiv:2505.16385

    cs.CL

    Semantic Pivots Enable Cross-Lingual Transfer in Large Language Models

    Authors: Kaiyu He, Tong Zhou, Yubo Chen, Delai Qiu, Shengping Liu, Kang Liu, Jun Zhao

    Abstract: Large language models (LLMs) demonstrate remarkable ability in cross-lingual tasks. Understanding how LLMs acquire this ability is crucial for their interpretability. To quantify the cross-lingual ability of LLMs accurately, we propose a Word-Level Cross-Lingual Translation Task. To find out how LLMs learn cross-lingual ability, we trace the outputs of LLMs' intermediate layers in the word translation…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 14 pages, 10 figures

  8. Finding Counterfactual Evidences for Node Classification

    Authors: Dazhuo Qiu, Jinwen Chen, Arijit Khan, Yan Zhao, Francesco Bonchi

    Abstract: Counterfactual learning is emerging as an important paradigm, rooted in causality, which promises to alleviate common issues of graph neural networks (GNNs), such as fairness and interpretability. However, as in many real-world application domains where conducting randomized controlled trials is impractical, one has to rely on available observational (factual) data to detect counterfactuals. In th…

    Submitted 2 June, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

    Comments: Accepted by KDD 2025

  9. arXiv:2505.07635

    cs.LG cs.DB

    Interpreting Graph Inference with Skyline Explanations

    Authors: Dazhuo Qiu, Haolai Che, Arijit Khan, Yinghui Wu

    Abstract: Inference queries have been routinely issued to graph machine learning models such as graph neural networks (GNNs) for various network analytical tasks. Nevertheless, GNN outputs are often hard to interpret comprehensively. Existing methods typically resort to individual pre-defined explainability measures (such as fidelity), which often leads to biased, "one-sided" interpretations. This pa…

    Submitted 3 July, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

  10. arXiv:2504.14856

    cs.CL

    Transparentize the Internal and External Knowledge Utilization in LLMs with Trustworthy Citation

    Authors: Jiajun Shen, Tong Zhou, Yubo Chen, Delai Qiu, Shengping Liu, Kang Liu, Jun Zhao

    Abstract: While hallucinations of large language models can be alleviated through retrieval-augmented generation and citation generation, how the model utilizes internal knowledge is still opaque, and the trustworthiness of its generated answers remains questionable. In this work, we introduce Context-Prior Augmented Citation Generation task, requiring models to generate citations considering both exter…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 19 pages, 14 figures

  11. arXiv:2504.13074

    cs.CV

    SkyReels-V2: Infinite-length Film Generative Model

    Authors: Guibin Chen, Dixuan Lin, Jiangping Yang, Chunze Lin, Junchen Zhu, Mingyuan Fan, Hao Zhang, Sheng Chen, Zheng Chen, Chengcheng Ma, Weiming Xiong, Wei Wang, Nuo Pang, Kang Kang, Zhiheng Xu, Yuzhe Jin, Yupeng Liang, Yubing Song, Peng Zhao, Boyuan Xu, Di Qiu, Debang Li, Zhengcong Fei, Yang Li, Yahui Zhou

    Abstract: Recent advances in video generation have been driven by diffusion models and autoregressive frameworks, yet critical challenges persist in harmonizing prompt adherence, visual quality, motion dynamics, and duration: compromises in motion dynamics to enhance temporal visual quality, constrained video duration (5-10 seconds) to prioritize resolution, and inadequate shot-aware generation stemming fro…

    Submitted 21 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: 31 pages, 10 figures

  12. arXiv:2504.02436

    cs.CV

    SkyReels-A2: Compose Anything in Video Diffusion Transformers

    Authors: Zhengcong Fei, Debang Li, Di Qiu, Jiahua Wang, Yikun Dou, Rui Wang, Jingtao Xu, Mingyuan Fan, Guibin Chen, Yang Li, Yahui Zhou

    Abstract: This paper presents SkyReels-A2, a controllable video generation framework capable of assembling arbitrary visual elements (e.g., characters, objects, backgrounds) into synthesized videos based on textual prompts while maintaining strict consistency with reference images for each element. We term this task elements-to-video (E2V), whose primary challenges lie in preserving the fidelity of each ref…

    Submitted 3 April, 2025; originally announced April 2025.

  13. arXiv:2504.00625

    eess.SY cs.FL

    New Insights into the Decidability of Opacity in Timed Automata

    Authors: Weilin Deng, Daowen Qiu, Jingkai Yang

    Abstract: This paper investigates the decidability of opacity in timed automata (TA), a property that has been proven to be undecidable in general. First, we address a theoretical gap in recent work by J. An et al. (FM 2024) by providing necessary and sufficient conditions for the decidability of location-based opacity in TA. Based on these conditions, we identify a new decidable subclass of TA, called time…

    Submitted 1 April, 2025; originally announced April 2025.

  14. arXiv:2503.23329

    cs.AI

    A Multi-Agent Framework with Automated Decision Rule Optimization for Cross-Domain Misinformation Detection

    Authors: Hui Li, Ante Wang, kunquan li, Zhihao Wang, Liang Zhang, Delai Qiu, Qingsong Liu, Jinsong Su

    Abstract: Misinformation spans various domains, but detection methods trained on specific domains often perform poorly when applied to others. With the rapid development of Large Language Models (LLMs), researchers have begun to utilize LLMs for cross-domain misinformation detection. However, existing LLM-based methods often fail to adequately analyze news in the target domain, limiting their detection capa…

    Submitted 30 March, 2025; originally announced March 2025.

  15. arXiv:2503.21458

    cs.LG cs.DB

    DATA-WA: Demand-based Adaptive Task Assignment with Dynamic Worker Availability Windows

    Authors: Jinwen Chen, Jiannan Guo, Dazhuo Qiu, Yawen Li, Guanhua Ye, Yan Zhao, Kai Zheng

    Abstract: With the rapid advancement of mobile networks and the widespread use of mobile devices, spatial crowdsourcing, which involves assigning location-based tasks to mobile workers, has gained significant attention. However, most existing research focuses on task assignment at the current moment, overlooking the fluctuating demand and supply between tasks and workers over time. To address this issue, we…

    Submitted 27 March, 2025; originally announced March 2025.

  16. arXiv:2503.00059

    cs.CV cs.LG

    Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models

    Authors: Rui Hu, Delai Qiu, Shuyu Wei, Jiaming Zhang, Yining Wang, Shengping Liu, Jitao Sang

    Abstract: Omnimodal Large Language Models (OLLMs) have shown significant progress in integrating vision and text, but still struggle with integrating vision and audio, often exhibiting suboptimal performance when processing audio queries compared to text queries. This disparity is primarily due to insufficient alignment between vision and audio modalities during training, leading to inadequate attention to…

    Submitted 20 May, 2025; v1 submitted 26 February, 2025; originally announced March 2025.

    Comments: Accepted to ACL 2025 Findings

  17. arXiv:2502.10841

    cs.CV

    SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers

    Authors: Di Qiu, Zhengcong Fei, Rui Wang, Jialin Bai, Changqian Yu, Mingyuan Fan, Guibin Chen, Xiang Wen

    Abstract: We present SkyReels-A1, a simple yet effective framework built upon a video diffusion Transformer to facilitate portrait image animation. Existing methodologies still encounter issues, including identity distortion, background instability, and unrealistic facial dynamics, particularly in head-only animation scenarios. Besides, extending to accommodate diverse body proportions usually leads to visual…

    Submitted 15 February, 2025; originally announced February 2025.

  18. arXiv:2501.01790

    cs.CV

    Ingredients: Blending Custom Photos with Video Diffusion Transformers

    Authors: Zhengcong Fei, Debang Li, Di Qiu, Changqian Yu, Mingyuan Fan

    Abstract: This paper presents a powerful framework, referred to as Ingredients, to customize video creations by incorporating multiple specific identity (ID) photos with video diffusion Transformers. Generally, our method consists of three primary modules: (i) a facial extractor that captures versatile and precise facial features for each human ID from both global and local perspectives; (ii) a multi-scale…

    Submitted 18 March, 2025; v1 submitted 3 January, 2025; originally announced January 2025.

  19. arXiv:2412.11258

    cs.RO cs.AI cs.CV

    GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs

    Authors: Xinli Xu, Wenhang Ge, Dicong Qiu, ZhiFei Chen, Dongyu Yan, Zhuoyun Liu, Haoyu Zhao, Hanfeng Zhao, Shunsi Zhang, Junwei Liang, Ying-Cong Chen

    Abstract: Estimating physical properties for visual data is a crucial task in computer vision, graphics, and robotics, underpinning applications such as augmented reality, physical simulation, and robotic grasping. However, this area remains under-explored due to the inherent ambiguities in physical property estimation. To address these challenges, we introduce GaussianProperty, a training-free framework th…

    Submitted 15 December, 2024; originally announced December 2024.

    Comments: 17 pages, 17 figures

  20. arXiv:2412.10783

    cs.CV

    Video Diffusion Transformers are In-Context Learners

    Authors: Zhengcong Fei, Di Qiu, Debang Li, Changqian Yu, Mingyuan Fan

    Abstract: This paper investigates a solution for enabling in-context capabilities of video diffusion transformers, with minimal tuning required for activation. Specifically, we propose a simple pipeline to leverage in-context generation: (i) concatenate videos along the spatial or time dimension, (ii) jointly caption multi-scene video clips from one source, and (iii) apply task-…

    Submitted 22 March, 2025; v1 submitted 14 December, 2024; originally announced December 2024.

  21. arXiv:2411.18281

    cs.CV

    MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation

    Authors: Haopeng Fang, Di Qiu, Binjie Mao, Pengfei Yan, He Tang

    Abstract: Recent advancements in personalized Text-to-Video (T2V) generation highlight the importance of integrating character-specific identities and actions. However, previous T2V models struggle with identity consistency and controllable motion dynamics, mainly due to limited fine-grained facial and action-based textual prompts, and datasets that overlook key human attributes and actions. To address thes…

    Submitted 30 November, 2024; v1 submitted 27 November, 2024; originally announced November 2024.

  22. arXiv:2410.20974

    cs.CV

    MovieCharacter: A Tuning-Free Framework for Controllable Character Video Synthesis

    Authors: Di Qiu, Zheng Chen, Rui Wang, Mingyuan Fan, Changqian Yu, Junshi Huang, Xiang Wen

    Abstract: Recent advancements in character video synthesis still depend on extensive fine-tuning or complex 3D modeling processes, which can restrict accessibility and hinder real-time applicability. To address these challenges, we propose a simple yet effective tuning-free framework for character video synthesis, named MovieCharacter, designed to streamline the synthesis process while ensuring high-quality…

    Submitted 13 January, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

  23. arXiv:2408.00378

    cs.CE

    A deep spatio-temporal attention model of dynamic functional network connectivity shows sensitivity to Alzheimer's in asymptomatic individuals

    Authors: Yuxiang Wei, Anees Abrol, James Lah, Deqiang Qiu, Vince D. Calhoun

    Abstract: Alzheimer's disease (AD) progresses from asymptomatic changes to clinical symptoms, emphasizing the importance of early detection for proper treatment. Functional magnetic resonance imaging (fMRI), particularly dynamic functional network connectivity (dFNC), has emerged as an important biomarker for AD. Nevertheless, studies probing at-risk subjects in the pre-symptomatic stage using dFNC are limi…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted by EMBC 2024

  24. arXiv:2407.21075

    cs.AI cs.CL cs.LG

    Apple Intelligence Foundation Language Models

    Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, et al. (130 additional authors not shown)

    Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used…

    Submitted 29 July, 2024; originally announced July 2024.

  25. arXiv:2407.00295

    cs.CV

    A deep neural network framework for dynamic multi-valued mapping estimation and its applications

    Authors: Geng Li, Di Qiu, Lok Ming Lui

    Abstract: This paper addresses the problem of modeling and estimating dynamic multi-valued mappings. While most mathematical models provide a unique solution for a given input, real-world applications often lack deterministic solutions. In such scenarios, estimating dynamic multi-valued mappings is necessary to suggest different reasonable solutions for each input. This paper introduces a deep neural networ…

    Submitted 28 June, 2024; originally announced July 2024.

  26. arXiv:2406.18115

    cs.RO cs.AI cs.CV

    Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps

    Authors: Dicong Qiu, Wenzong Ma, Zhenfu Pan, Hui Xiong, Junwei Liang

    Abstract: Open-Vocabulary Mobile Manipulation (OVMM) is a crucial capability for autonomous robots, especially when faced with the challenges posed by unknown and dynamic environments. This task requires robots to explore and build a semantic understanding of their surroundings, generate feasible plans to achieve manipulation goals, adapt to environmental changes, and comprehend natural language instruction…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Open-vocabulary, Mobile Manipulation, Dynamic Environments, 3D Semantic Maps, Zero-shot, LLMs, VLMs, 18 pages, 2 figures

  27. arXiv:2405.13467

    cs.CV

    AdaFedFR: Federated Face Recognition with Adaptive Inter-Class Representation Learning

    Authors: Di Qiu, Xinyang Lin, Kaiye Wang, Xiangxiang Chu, Pengfei Yan

    Abstract: With the growing attention on data privacy and communication security in face recognition applications, federated learning has been introduced to learn a face recognition model with decentralized datasets in a privacy-preserving manner. However, existing works still face challenges such as unsatisfying performance and additional communication costs, limiting their applicability in real-world scena…

    Submitted 22 May, 2024; originally announced May 2024.

  28. arXiv:2405.11315

    cs.CV

    MediCLIP: Adapting CLIP for Few-shot Medical Image Anomaly Detection

    Authors: Ximiao Zhang, Min Xu, Dehui Qiu, Ruixin Yan, Ning Lang, Xiuzhuang Zhou

    Abstract: In the field of medical decision-making, precise anomaly detection in medical imaging plays a pivotal role in aiding clinicians. However, previous work relies on large-scale datasets for training anomaly detection models, which increases the development cost. This paper first focuses on the task of medical image anomaly detection in the few-shot setting, which is critically significant for the…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 12 pages, 3 figures, 5 tables, early accepted at MICCAI 2024

  29. arXiv:2404.19519

    cs.LG cs.DB

    Generating Robust Counterfactual Witnesses for Graph Neural Networks

    Authors: Dazhuo Qiu, Mengying Wang, Arijit Khan, Yinghui Wu

    Abstract: This paper introduces a new class of explanation structures, called robust counterfactual witnesses (RCWs), to provide robust, both counterfactual and factual explanations for graph neural networks. Given a graph neural network M, a robust counterfactual witness refers to the fraction of a graph G that is not only a counterfactual and factual explanation of the results of M over G, but also remains so for a…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by ICDE 2024

  30. arXiv:2404.02225

    cs.CV cs.AI

    CHOSEN: Contrastive Hypothesis Selection for Multi-View Depth Refinement

    Authors: Di Qiu, Yinda Zhang, Thabo Beeler, Vladimir Tankovich, Christian Häne, Sean Fanello, Christoph Rhemann, Sergio Orts Escolano

    Abstract: We propose CHOSEN, a simple yet flexible, robust and effective multi-view depth refinement framework. It can be employed in any existing multi-view stereo pipeline, with straightforward generalization capability for different multi-view capture systems such as camera relative positioning and lenses. Given an initial depth estimation, CHOSEN iteratively re-samples and selects the best hypotheses, a…

    Submitted 5 May, 2025; v1 submitted 2 April, 2024; originally announced April 2024.

  31. arXiv:2404.01296

    cs.CV

    MagicMirror: Fast and High-Quality Avatar Generation with a Constrained Search Space

    Authors: Armand Comas-Massagué, Di Qiu, Menglei Chai, Marcel Bühler, Amit Raj, Ruiqi Gao, Qiangeng Xu, Mark Matthews, Paulo Gotardo, Octavia Camps, Sergio Orts-Escolano, Thabo Beeler

    Abstract: We introduce a novel framework for 3D human avatar generation and personalization, leveraging text prompts to enhance user engagement and customization. Central to our approach are key innovations aimed at overcoming the challenges in photo-realistic avatar synthesis. Firstly, we utilize a conditional Neural Radiance Fields (NeRF) model, trained on a large-scale unannotated multi-view dataset, to…

    Submitted 1 April, 2024; originally announced April 2024.

  32. arXiv:2404.00667

    cs.CV

    Weakly-Supervised Cross-Domain Segmentation of Electron Microscopy with Sparse Point Annotation

    Authors: Dafei Qiu, Shan Xiong, Jiajin Yi, Jialin Peng

    Abstract: Accurate segmentation of organelle instances from electron microscopy (EM) images plays an essential role in much neuroscience research. However, practical scenarios usually suffer from high annotation costs, label scarcity, and large domain diversity. While unsupervised domain adaptation (UDA) that assumes no annotation effort on the target data is promising to alleviate these challenges, its p…

    Submitted 31 March, 2024; originally announced April 2024.

  33. arXiv:2401.08957

    cs.RO cs.AI

    Learning from Imperfect Demonstrations with Self-Supervision for Robotic Manipulation

    Authors: Kun Wu, Ning Liu, Zhen Zhao, Di Qiu, Jinming Li, Zhengping Che, Zhiyuan Xu, Jian Tang

    Abstract: Improving data utilization, especially for imperfect data from task failures, is crucial for robotic manipulation due to the challenging, time-consuming, and expensive data collection process in the real world. Current imitation learning (IL) typically discards imperfect data, focusing solely on successful expert data. While reinforcement learning (RL) can learn from explorations and failures, the…

    Submitted 17 March, 2025; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: 8 pages, 4 figures

    ACM Class: I.2.9

  34. arXiv:2401.02086

    cs.LG cs.DB

    View-based Explanations for Graph Neural Networks

    Authors: Tingyang Chen, Dazhuo Qiu, Yinghui Wu, Arijit Khan, Xiangyu Ke, Yunjun Gao

    Abstract: Generating explanations for graph neural networks (GNNs) has been studied to understand their behavior in analytical tasks such as graph classification. Existing approaches aim to understand the overall results of GNNs rather than providing explanations for specific class labels of interest, and may return explanation structures that are neither easy to access nor directly queryable. We propose GVEX, a no…

    Submitted 7 January, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: This paper has been accepted by SIGMOD 2024

  35. arXiv:2312.09463

    cs.CL

    Partial Rewriting for Multi-Stage ASR

    Authors: Antoine Bruguier, David Qiu, Yanzhang He

    Abstract: For many streaming automatic speech recognition tasks, it is important to provide timely intermediate streaming results, while refining a high quality final result. This can be done using a multi-stage architecture, where a small left-context only model creates streaming results and a larger left- and right-context model produces a final result at the end. While this significantly improves the qua…

    Submitted 7 December, 2023; originally announced December 2023.

  36. arXiv:2312.08553

    eess.AS cs.SD

    USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models

    Authors: Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal

    Abstract: End-to-end automatic speech recognition (ASR) models have seen revolutionary quality gains with the recent development of large-scale universal speech models (USM). However, deploying these massive USMs is extremely expensive due to the enormous memory usage and computational cost. Therefore, model compression is an important research topic to fit USM-based ASR under budget in real-world scenarios…

    Submitted 16 January, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024. Preprint

  37. arXiv:2312.03763

    cs.CV cs.GR cs.LG

    Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing

    Authors: Yushi Lan, Feitong Tan, Di Qiu, Qiangeng Xu, Kyle Genova, Zeng Huang, Sean Fanello, Rohit Pandey, Thomas Funkhouser, Chen Change Loy, Yinda Zhang

    Abstract: We present a novel framework for generating photorealistic 3D human heads and subsequently manipulating and reposing them with remarkable flexibility. The proposed approach leverages an implicit function representation of 3D human heads, employing 3D Gaussians anchored on a parametric face model. To enhance representational capabilities and encode spatial information, we embed a lightweight tri-pla…

    Submitted 19 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: project webpage: https://nirvanalan.github.io/projects/gaussian3diff/

  38. arXiv:2311.04643

    cs.SE

    Software Architecture Recovery with Information Fusion

    Authors: Yiran Zhang, Zhengzi Xu, Chengwei Liu, Hongxu Chen, Jianwen Sun, Dong Qiu, Yang Liu

    Abstract: Understanding the architecture is vital for effectively maintaining and managing large software systems. However, as software systems evolve over time, their architectures inevitably change. To keep up with the change, architects need to track the implementation-level changes and update the architectural documentation accordingly, which is time-consuming and error-prone. Therefore, many automatic…

    Submitted 8 November, 2023; originally announced November 2023.

  39. arXiv:2311.00353

    cs.CV

    LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation

    Authors: Yuxiang Bao, Di Qiu, Guoliang Kang, Baochang Zhang, Bo Jin, Kaiye Wang, Pengfei Yan

    Abstract: Leveraging the generative ability of image diffusion models offers great potential for zero-shot video-to-video translation. The key lies in how to maintain temporal consistency across generated video frames by image diffusion models. Previous methods typically adopt cross-frame attention, i.e., sharing the key and value tokens across attentions of different frames, to enc…

    Submitted 1 November, 2023; originally announced November 2023.

  40. arXiv:2309.11488

    cs.DC cs.AR

    An Evaluation and Comparison of GPU Hardware and Solver Libraries for Accelerating the OPM Flow Reservoir Simulator

    Authors: Tong Dong Qiu, Andreas Thune, Vinicius Oliveira Martins, Markus Blatt, Alf Birger Rustad, Razvan Nane

    Abstract: Realistic reservoir simulation is known to be prohibitively expensive in terms of computation time when increasing the accuracy of the simulation or enlarging the model grid size. One method to address this issue is to parallelize the computation by dividing the model into several partitions and using multiple CPUs to compute the result using techniques such as MPI and multi-threading. Alternativ…

    Submitted 11 April, 2025; v1 submitted 20 September, 2023; originally announced September 2023.

  41. arXiv:2307.03870

    cs.FL eess.SY

    Opacity of Parametric Discrete Event Systems: Models, Decidability, and Algorithms

    Authors: Weilin Deng, Daowen Qiu, Jingkai Yang

    Abstract: The finite automata (FA) model is a popular tool to characterize discrete event systems (DESs) due to its succinctness. However, for some complex systems, it is difficult to describe the necessary details by means of the FA model. In this paper, we consider a kind of extended finite automata (EFAs) in which each transition carries a predicate over state and event parameters. We also consider a type of s…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: 13 pages, 9 figures

  42. arXiv:2306.07719

    cs.AI

    Contextual Dictionary Lookup for Knowledge Graph Completion

    Authors: Jining Wang, Delai Qiu, YouMing Liu, Yining Wang, Chuan Chen, Zibin Zheng, Yuren Zhou

    Abstract: Knowledge graph completion (KGC) aims to solve the incompleteness of knowledge graphs (KGs) by predicting missing links from known triples. Numerous knowledge graph embedding (KGE) models have been proposed to perform KGC by learning embeddings. Nevertheless, most existing embedding models map each relation into a unique vector, overlooking their specific fine-grained semantics under diffe…

    Submitted 13 June, 2023; originally announced June 2023.

  43. arXiv:2305.15536  [pdf, other]

    eess.AS cs.LG

    RAND: Robustness Aware Norm Decay For Quantized Seq2seq Models

    Authors: David Qiu, David Rim, Shaojin Ding, Oleg Rybakov, Yanzhang He

    Abstract: With the rapid increase in the size of neural networks, model compression has become an important area of research. Quantization is an effective technique for decreasing the model size, memory access, and compute load of large models. Despite recent advances in quantization aware training (QAT) techniques, most papers present evaluations that are focused on computer vision tasks, which have differen…

    Submitted 24 May, 2023; originally announced May 2023.

  44. arXiv:2305.06194  [pdf, other]

    cs.RO

    Concentric Tube Robot Redundancy Resolution via Velocity/Compliance Manipulability Optimization

    Authors: Jia Shen, Yifan Wang, Milad Azizkhani, Deqiang Qiu, Yue Chen

    Abstract: Concentric Tube Robots (CTR) have the potential to enable effective minimally invasive surgeries. While extensive modeling and control schemes have been proposed in the past decade, limited efforts have been made to improve the trajectory tracking performance from the perspective of manipulability, which can be critical to generate safe motion and feasible actuator commands. In this paper, we pro…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: 8 pages, 5 figures

  45. arXiv:2304.01436  [pdf, other]

    cs.CV cs.GR

    Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos

    Authors: Ziqian Bai, Feitong Tan, Zeng Huang, Kripasindhu Sarkar, Danhang Tang, Di Qiu, Abhimitra Meka, Ruofei Du, Mingsong Dou, Sergio Orts-Escolano, Rohit Pandey, Ping Tan, Thabo Beeler, Sean Fanello, Yinda Zhang

    Abstract: We propose a method to learn a high-quality implicit 3D head avatar from a monocular RGB video captured in the wild. The learnt avatar is driven by a parametric face model to achieve user-controlled facial expressions and head poses. Our hybrid pipeline combines the geometry prior and dynamic tracking of a 3DMM with a neural radiance field to achieve fine-grained control and photorealism. To reduc…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: In CVPR 2023. Project page: https://augmentedperception.github.io/monoavatar/

  46. arXiv:2212.05719  [pdf, other]

    cs.CV

    Tensor Factorization via Transformed Tensor-Tensor Product for Image Alignment

    Authors: Sijia Xia, Duo Qiu, Xiongjun Zhang

    Abstract: In this paper, we study the problem of aligning a batch of linearly correlated images, where the observed images are deformed by some unknown domain transformations and corrupted simultaneously by additive Gaussian noise and sparse noise. By stacking these images as the frontal slices of a third-order tensor, we propose to utilize the tensor factorization method via transformed tensor-tensor prod…

    Submitted 13 December, 2022; v1 submitted 12 December, 2022; originally announced December 2022.

  47. arXiv:2210.13109  [pdf, other]

    cs.CV

    WDA-Net: Weakly-Supervised Domain Adaptive Segmentation of Electron Microscopy

    Authors: Dafei Qiu, Jiajin Yi, Jialin Peng

    Abstract: Accurate segmentation of organelle instances, e.g., mitochondria, is essential for electron microscopy analysis. Despite the outstanding performance of fully supervised methods, they rely heavily on sufficient per-pixel annotated data and are sensitive to domain shift. Aiming to develop a highly annotation-efficient approach with competitive performance, we focus on weakly-supervised domain adaptat…

    Submitted 30 October, 2022; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: Accepted by BIBM 2022: International Conference on Bioinformatics & Biomedicine

  48. arXiv:2207.14709  [pdf, other]

    eess.IV cs.CV

    Robust Quantitative Susceptibility Mapping via Approximate Message Passing with Parameter Estimation

    Authors: Shuai Huang, James J. Lah, Jason W. Allen, Deqiang Qiu

    Abstract: Purpose: For quantitative susceptibility mapping (QSM), the lack of ground truth in clinical settings makes it challenging to determine suitable parameters for the dipole inversion. We propose a probabilistic Bayesian approach for QSM with built-in parameter estimation, and incorporate the nonlinear formulation of the dipole inversion to achieve a robust recovery of the susceptibility maps. Theo…

    Submitted 30 May, 2023; v1 submitted 29 July, 2022; originally announced July 2022.

    Comments: Keywords: Approximate message passing, Compressive sensing, Outlier modelling, Parameter estimation, Quantitative susceptibility mapping

  49. arXiv:2205.13117  [pdf, other]

    cs.CV

    Learn to Cluster Faces via Pairwise Classification

    Authors: Junfu Liu, Di Qiu, Pengfei Yan, Xiaolin Wei

    Abstract: Face clustering plays an essential role in exploiting massive unlabeled face data. Recently, graph-based face clustering methods have become popular for their satisfactory performance. However, they usually suffer from excessive memory consumption, especially on large-scale graphs, and rely on empirical thresholds to determine the connectivities between samples in inference, which restricts their ap…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: Accepted by ICCV 2021

  50. Approximate Message Passing with Parameter Estimation for Heavily Quantized Measurements

    Authors: Shuai Huang, Deqiang Qiu, Trac D. Tran

    Abstract: Designing efficient sparse recovery algorithms that can handle noisy quantized measurements is important in a variety of applications, from radar to source localization, spectrum sensing, and wireless networking. We take advantage of the approximate message passing (AMP) framework to achieve this goal given its high computational efficiency and state-of-the-art performance. In AMP, the signal o…

    Submitted 20 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: text overlap with arXiv:2007.07679

    Journal ref: IEEE Transactions on Signal Processing, Vol. 70, pp. 2062-2077, Apr. 2022