
Showing 1–50 of 71 results for author: Ouyang, X

Searching in archive cs.
  1. arXiv:2506.14477  [pdf, ps, other]

    cs.AI

    GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies

    Authors: Jingqi Yang, Zhilong Song, Jiawei Chen, Mingli Song, Sheng Zhou, Linjun Sun, Xiaogang Ouyang, Chun Chen, Can Wang

    Abstract: The development of high-quality datasets is crucial for benchmarking and advancing research in Graphical User Interface (GUI) agents. Despite their importance, existing datasets are often constructed under idealized conditions, overlooking the diverse anomalies frequently encountered in real-world deployments. To address this limitation, we introduce GUI-Robust, a novel dataset designed for compre…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 10 pages, 4 figures, submitted to NeurIPS 2025

  2. arXiv:2506.11250  [pdf, ps, other]

    cs.LG cs.AI eess.SY

    Can Time-Series Foundation Models Perform Building Energy Management Tasks?

    Authors: Ozan Baris Mulayim, Pengrui Quan, Liying Han, Xiaomin Ouyang, Dezhi Hong, Mario Bergés, Mani Srivastava

    Abstract: Building energy management (BEM) tasks require processing and learning from a variety of time-series data. Existing solutions rely on bespoke task- and data-specific models to perform these tasks, limiting their broader applicability. Inspired by the transformative success of Large Language Models (LLMs), Time-Series Foundation Models (TSFMs), trained on diverse datasets, have the potential to cha…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: 30 pages, 5 tables, 8 figures. Under review for Data-Centric Engineering journal

  3. arXiv:2505.20306  [pdf, other]

    cs.AI eess.IV q-bio.QM

    Multi-Modal Artificial Intelligence of Embryo Grading and Pregnancy Prediction in Assisted Reproductive Technology: A Review

    Authors: Xueqiang Ouyang, Jia Wei

    Abstract: As a global disease, infertility has long affected people. The development of assisted reproductive technology can effectively treat this condition. However, the traditional in vitro fertilization-embryo transfer technology still faces many challenges in improving the pregnancy success rate, such as the subjectivity of embryo grading and the inefficiency of integrating multi-modal data. T…

    Submitted 19 May, 2025; originally announced May 2025.

  4. arXiv:2505.14035  [pdf, ps, other]

    cs.MM cs.CL

    ShieldVLM: Safeguarding the Multimodal Implicit Toxicity via Deliberative Reasoning with LVLMs

    Authors: Shiyao Cui, Qinglin Zhang, Xuan Ouyang, Renmiao Chen, Zhexin Zhang, Yida Lu, Hongning Wang, Han Qiu, Minlie Huang

    Abstract: Toxicity detection in multimodal text-image content faces growing challenges, especially with multimodal implicit toxicity, where each modality appears benign on its own but conveys hazard when combined. Multimodal implicit toxicity appears not only as formal statements on social platforms but also as prompts that can lead to toxic dialogs from Large Vision-Language Models (LVLMs). Despite the succes…

    Submitted 20 May, 2025; originally announced May 2025.

  5. arXiv:2504.09707  [pdf, other]

    cs.AI cs.IT cs.LG cs.MM

    InfoMAE: Pair-Efficient Cross-Modal Alignment for Multimodal Time-Series Sensing Signals

    Authors: Tomoyoshi Kimura, Xinlin Li, Osama Hanna, Yatong Chen, Yizhuo Chen, Denizhan Kara, Tianshi Wang, Jinyang Li, Xiaomin Ouyang, Shengzhong Liu, Mani Srivastava, Suhas Diggavi, Tarek Abdelzaher

    Abstract: Standard multimodal self-supervised learning (SSL) algorithms regard cross-modal synchronization as implicit supervisory labels during pretraining, thus posing high requirements on the scale and quality of multimodal samples. These constraints significantly limit the performance of sensing intelligence in IoT applications, as the heterogeneity and the non-interpretability of time-series signals re…

    Submitted 13 April, 2025; originally announced April 2025.

  6. Toward Foundation Models for Online Complex Event Detection in CPS-IoT: A Case Study

    Authors: Liying Han, Gaofeng Dong, Xiaomin Ouyang, Lance Kaplan, Federico Cerutti, Mani Srivastava

    Abstract: Complex events (CEs) play a crucial role in CPS-IoT applications, enabling high-level decision-making in domains such as smart monitoring and autonomous systems. However, most existing models focus on short-span perception tasks, lacking the long-term reasoning required for CE detection. CEs consist of sequences of short-time atomic events (AEs) governed by spatiotemporal dependencies. Detecting t…

    Submitted 25 April, 2025; v1 submitted 15 March, 2025; originally announced March 2025.

    Journal ref: FMSys Proc. 2 (2025) 1-6

  7. arXiv:2502.07250  [pdf, ps, other]

    cs.LG cs.AI

    NAROCE: A Neural Algorithmic Reasoner Framework for Online Complex Event Detection

    Authors: Liying Han, Gaofeng Dong, Xiaomin Ouyang, Lance Kaplan, Federico Cerutti, Mani Srivastava

    Abstract: Modern machine learning models excel at detecting individual actions, objects, or scene attributes from short, local observations. However, many real-world tasks, such as in smart cities and healthcare, require reasoning over complex events (CEs): (spatio)temporal, rule-governed patterns of short-term atomic events (AEs) that reflect high-level understanding and critical changes in the environment…

    Submitted 16 June, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

  8. arXiv:2501.15268  [pdf, other]

    cs.CL

    New Evaluation Paradigm for Lexical Simplification

    Authors: Jipeng Qiang, Minjiang Huang, Yi Zhu, Yunhao Yuan, Chaowei Zhang, Xiaoye Ouyang

    Abstract: Lexical Simplification (LS) methods use a three-step pipeline: complex word identification, substitute generation, and substitute ranking, each with separate evaluation datasets. We found that large language models (LLMs) can simplify sentences directly with a single prompt, bypassing the traditional pipeline. However, existing LS datasets are not suitable for evaluating these LLM-generated simplified…

    Submitted 25 January, 2025; originally announced January 2025.

  9. arXiv:2501.04353  [pdf, other]

    cs.CV cs.LG

    DeFusion: An Effective Decoupling Fusion Network for Multi-Modal Pregnancy Prediction

    Authors: Xueqiang Ouyang, Jia Wei, Wenjie Huo, Xiaocong Wang, Rui Li, Jianlong Zhou

    Abstract: Temporal embryo images and parental fertility table indicators are both valuable for pregnancy prediction in in vitro fertilization embryo transfer (IVF-ET). However, current machine learning models cannot make full use of the complementary information between the two modalities to improve pregnancy prediction performance. In this paper, we propose a Decoupling Fusion Network called DeFus…

    Submitted 8 January, 2025; originally announced January 2025.

  10. arXiv:2412.13735  [pdf, other]

    cs.CV

    3D Registration in 30 Years: A Survey

    Authors: Jiaqi Yang, Chu'ai Zhang, Zhengbao Wang, Xinyue Cao, Xuan Ouyang, Xiyu Zhang, Zhenxuan Zeng, Zhao Zeng, Borui Lu, Zhiyi Xia, Qian Zhang, Yulan Guo, Yanning Zhang

    Abstract: 3D point cloud registration is a fundamental problem in computer vision, computer graphics, robotics, remote sensing, and other fields. Over the last thirty years, we have witnessed remarkable advances in this area, with numerous kinds of solutions. Although a handful of relevant surveys have been conducted, their coverage is still limited. In this work, we present a comprehensive survey on 3D point clo…

    Submitted 19 December, 2024; v1 submitted 18 December, 2024; originally announced December 2024.

  11. arXiv:2412.13542  [pdf, other]

    cs.CL

    Multi-Granularity Open Intent Classification via Adaptive Granular-Ball Decision Boundary

    Authors: Yanhua Li, Xiaocao Ouyang, Chaofan Pan, Jie Zhang, Sen Zhao, Shuyin Xia, Xin Yang, Guoyin Wang, Tianrui Li

    Abstract: Open intent classification is critical for the development of dialogue systems, aiming to accurately classify known intents into their corresponding classes while identifying unknown intents. Prior boundary-based methods assumed known intents fit within compact spherical regions, focusing on coarse-grained representation and precise spherical decision boundaries. However, these assumptions are oft…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: This paper has been accepted at AAAI 2025

    Journal ref: AAAI 2025

  12. arXiv:2412.07196  [pdf, other]

    cs.CV

    Fine-grained Text to Image Synthesis

    Authors: Xu Ouyang, Ying Chen, Kaiyue Zhu, Gady Agam

    Abstract: Fine-grained text to image synthesis involves generating images from texts that belong to different categories. In contrast to general text to image synthesis, in fine-grained synthesis there is high similarity between images of different subclasses, and there may be linguistic discrepancy among texts describing the same image. Recent Generative Adversarial Networks (GANs), such as the Recurrent Af…

    Submitted 15 December, 2024; v1 submitted 10 December, 2024; originally announced December 2024.

  13. arXiv:2411.17691  [pdf, other]

    cs.LG cs.CL

    Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens

    Authors: Xu Ouyang, Tao Ge, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, Dong Yu

    Abstract: We reveal that low-bit quantization favors undertrained large language models (LLMs) by observing that models with larger sizes or fewer training tokens experience less quantization-induced degradation (QiD) when applying low-bit quantization, whereas smaller models with extensive training tokens suffer significant QiD. To gain deeper insights into this trend, we study over 1500 quantized LLM chec…

    Submitted 26 November, 2024; v1 submitted 26 November, 2024; originally announced November 2024.

    Comments: Work in Progress

  14. arXiv:2411.12126  [pdf, other]

    cs.LG

    MMBind: Unleashing the Potential of Distributed and Heterogeneous Data for Multimodal Learning in IoT

    Authors: Xiaomin Ouyang, Jason Wu, Tomoyoshi Kimura, Yihan Lin, Gunjan Verma, Tarek Abdelzaher, Mani Srivastava

    Abstract: Multimodal sensing systems are increasingly prevalent in various real-world applications. Most existing multimodal learning approaches heavily rely on training with a large amount of synchronized, complete multimodal data. However, such a setting is impractical in real-world IoT sensing applications where data is typically collected by distributed nodes with heterogeneous data modalities, and is a…

    Submitted 5 March, 2025; v1 submitted 18 November, 2024; originally announced November 2024.

  15. arXiv:2411.02047  [pdf, other]

    cs.LG stat.ML

    Theory-inspired Label Shift Adaptation via Aligned Distribution Mixture

    Authors: Ruidong Fan, Xiao Ouyang, Hong Tao, Yuhua Qian, Chenping Hou

    Abstract: As a prominent challenge in addressing real-world issues within a dynamic environment, label shift, which refers to the learning setting where the source (training) and target (testing) label distributions do not match, has recently received increasing attention. Existing label shift methods solely use unlabeled target samples to estimate the target label distribution, and do not involve them duri…

    Submitted 5 November, 2024; v1 submitted 4 November, 2024; originally announced November 2024.

  16. arXiv:2410.10741  [pdf, other]

    cs.AI cs.LG eess.SP

    SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing

    Authors: Pengrui Quan, Xiaomin Ouyang, Jeya Vikranth Jeyakumar, Ziqi Wang, Yang Xing, Mani Srivastava

    Abstract: Effective processing, interpretation, and management of sensor data have emerged as a critical component of cyber-physical systems. Traditionally, processing sensor data requires profound theoretical knowledge and proficiency in signal-processing tools. However, recent works show that Large Language Models (LLMs) have promising capabilities in processing sensory data, suggesting their potential as…

    Submitted 28 March, 2025; v1 submitted 14 October, 2024; originally announced October 2024.

  17. arXiv:2406.06796  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO eess.SP

    FlexLoc: Conditional Neural Networks for Zero-Shot Sensor Perspective Invariance in Object Localization with Distributed Multimodal Sensors

    Authors: Jason Wu, Ziqi Wang, Xiaomin Ouyang, Ho Lyun Jeong, Colin Samplawski, Lance Kaplan, Benjamin Marlin, Mani Srivastava

    Abstract: Localization is a critical technology for various applications ranging from navigation and surveillance to assisted living. Localization systems typically fuse information from sensors viewing the scene from different perspectives to estimate the target location while also employing multiple modalities for enhanced robustness and accuracy. Recently, such systems have employed end-to-end deep neura…

    Submitted 10 June, 2024; originally announced June 2024.

  18. arXiv:2405.12107  [pdf, other]

    cs.CV cs.CL

    Imp: Highly Capable Large Multimodal Models for Mobile Devices

    Authors: Zhenwei Shao, Zhou Yu, Jun Yu, Xuecheng Ouyang, Lihao Zheng, Zhenbiao Gai, Mingyang Wang, Jiajun Ding

    Abstract: By harnessing the capabilities of large language models (LLMs), recent large multimodal models (LMMs) have shown remarkable versatility in open-world multimodal understanding. Nevertheless, they are usually parameter-heavy and computation-intensive, thus hindering their applicability in resource-constrained scenarios. To this end, several lightweight LMMs have been proposed successively to maximiz…

    Submitted 29 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Fixed some typos and corrected a few numbers in the tables

  19. arXiv:2403.19857  [pdf, other]

    cs.AI

    LLMSense: Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces

    Authors: Xiaomin Ouyang, Mani Srivastava

    Abstract: Most studies on machine learning in sensing systems focus on low-level perception tasks that process raw sensory data within a short time window. However, many practical applications, such as human routine modeling and occupancy tracking, require high-level reasoning abilities to comprehend concepts and make inferences based on long-term sensor traces. Existing machine learning-based approaches fo…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 6 pages

  20. arXiv:2403.18198  [pdf, other]

    eess.IV cs.CV

    Generative Medical Segmentation

    Authors: Jiayu Huo, Xi Ouyang, Sébastien Ourselin, Rachel Sparks

    Abstract: Rapid advancements in medical image segmentation performance have been significantly driven by the development of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). These models follow the discriminative pixel-wise classification learning paradigm and often have limited ability to generalize across diverse medical imaging datasets. In this manuscript, we introduce Generative Medi…

    Submitted 19 August, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  21. arXiv:2402.10464  [pdf, other]

    cs.LG cs.NI

    FedKit: Enabling Cross-Platform Federated Learning for Android and iOS

    Authors: Sichang He, Beilong Tang, Boyan Zhang, Jiaoqi Shao, Xiaomin Ouyang, Daniel Nata Nugraha, Bing Luo

    Abstract: We present FedKit, a federated learning (FL) system tailored for cross-platform FL research on Android and iOS devices. FedKit pipelines cross-platform FL development by enabling model conversion, hardware-accelerated training, and cross-platform model aggregation. Our FL workflow supports flexible machine learning operations (MLOps) in production, facilitating continuous model delivery and traini…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: This work has been accepted for demonstration at the IEEE International Conference on Computer Communications (INFOCOM) 2024

  22. arXiv:2310.19642  [pdf, other]

    cs.DB

    Consistent Query Answering for Primary Keys on Rooted Tree Queries

    Authors: Paraschos Koutris, Xiating Ouyang, Jef Wijsen

    Abstract: We study the data complexity of consistent query answering (CQA) on databases that may violate the primary key constraints. A repair is a maximal subset of the database satisfying the primary key constraints. For a Boolean query q, the problem CERTAINTY(q) takes a database as input, and asks whether or not each repair satisfies q. The computational complexity of CERTAINTY(q) has been established w…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: To appear in PODS'24

  23. arXiv:2310.15301  [pdf, other]

    cs.LG

    ADMarker: A Multi-Modal Federated Learning System for Monitoring Digital Biomarkers of Alzheimer's Disease

    Authors: Xiaomin Ouyang, Xian Shuai, Yang Li, Li Pan, Xifan Zhang, Heming Fu, Sitong Cheng, Xinyan Wang, Shihua Cao, Jiang Xin, Hazel Mok, Zhenyu Yan, Doris Sau Fung Yu, Timothy Kwok, Guoliang Xing

    Abstract: Alzheimer's Disease (AD) and related dementias are a growing global health challenge due to the aging population. In this paper, we present ADMarker, the first end-to-end system that integrates multi-modal sensors and new federated learning algorithms for detecting multidimensional AD digital biomarkers in natural living environments. ADMarker features a novel three-stage multi-modal federated lear…

    Submitted 12 April, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  24. arXiv:2310.05385  [pdf, ps, other]

    cs.DB cs.LO

    Conjunctive Queries with Negation and Aggregation: A Linear Time Characterization

    Authors: Hangdong Zhao, Austen Z. Fan, Xiating Ouyang, Paraschos Koutris

    Abstract: In this paper, we study the complexity of evaluating Conjunctive Queries with negation (CQ¬). First, we present an algorithm with linear preprocessing time and constant delay enumeration for a class of CQs with negation called free-connex signed-acyclic queries. We show that no other queries admit such an algorithm subject to lower bound conjectures. Second, we extend our algorithm to Conjuncti…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: 39 pages

  25. arXiv:2310.02373  [pdf, other]

    cs.LG cs.CR

    SelectFormer: Private and Practical Data Selection for Transformers

    Authors: Xu Ouyang, Felix Xiaozhu Lin, Yangfeng Ji

    Abstract: Critical to a free data market is private data selection, i.e., the model owner selects and then appraises training data from the data owner before both parties commit to a transaction. To keep the data and model private, this process must evaluate the target model to be trained over Multi-Party Computation (MPC). While prior work suggests that evaluating Transformer-based models over M…

    Submitted 1 March, 2025; v1 submitted 3 October, 2023; originally announced October 2023.

  26. arXiv:2309.15270  [pdf, other]

    cs.DB

    Consistent Query Answering for Primary Keys on Path Queries

    Authors: Paraschos Koutris, Xiating Ouyang, Jef Wijsen

    Abstract: We study the data complexity of consistent query answering (CQA) on databases that may violate the primary key constraints. A repair is a maximal consistent subset of the database. For a Boolean query q, the problem CERTAINTY(q) takes a database as input, and asks whether or not each repair satisfies q. It is known that for any self-join-free Boolean conjunctive query q,…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: An evolved version of a paper published at PODS'21

  27. arXiv:2308.10837  [pdf, other]

    cs.IR

    Leveraging Large Language Models for Pre-trained Recommender Systems

    Authors: Zhixuan Chu, Hongyan Hao, Xin Ouyang, Simeng Wang, Yan Wang, Yue Shen, Jinjie Gu, Qing Cui, Longfei Li, Siqiao Xue, James Y Zhang, Sheng Li

    Abstract: Recent advancements in recommendation systems have shifted towards more comprehensive and personalized recommendations by utilizing large language models (LLMs). However, effectively integrating LLMs' commonsense knowledge and reasoning abilities into recommendation systems remains a challenging problem. In this paper, we propose RecSysLLM, a novel pre-trained recommendation model based on LLMs. Re…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: 13 pages, 4 figures

  28. arXiv:2308.10835  [pdf, other]

    cs.IR

    Enhancing Recommender Systems with Large Language Model Reasoning Graphs

    Authors: Yan Wang, Zhixuan Chu, Xin Ouyang, Simeng Wang, Hongyan Hao, Yue Shen, Jinjie Gu, Siqiao Xue, James Y Zhang, Qing Cui, Longfei Li, Jun Zhou, Sheng Li

    Abstract: Recommendation systems aim to provide users with relevant suggestions, but often lack interpretability and fail to capture higher-level semantic relationships between user behaviors and profiles. In this paper, we propose a novel approach that leverages large language models (LLMs) to construct personalized reasoning graphs. These graphs link a user's profile and behavioral sequences through causa…

    Submitted 24 January, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: 12 pages, 6 figures

  29. arXiv:2307.01220  [pdf, other]

    eess.IV cs.CV

    ARHNet: Adaptive Region Harmonization for Lesion-aware Augmentation to Improve Segmentation Performance

    Authors: Jiayu Huo, Yang Liu, Xi Ouyang, Alejandro Granados, Sebastien Ourselin, Rachel Sparks

    Abstract: Accurately segmenting brain lesions in MRI scans is critical for providing patients with prognoses and neurological monitoring. However, the performance of CNN-based segmentation methods is constrained by the limited training set size. Advanced data augmentation is an effective strategy to improve the model's robustness. However, such methods often introduce intensity disparities between foreground and ba…

    Submitted 2 July, 2023; originally announced July 2023.

    Comments: 9 pages, 4 figures, 3 tables

  30. arXiv:2305.08826  [pdf, other]

    cs.CV

    Learning Better Contrastive View from Radiologist's Gaze

    Authors: Sheng Wang, Zixu Zhuang, Xi Ouyang, Lichi Zhang, Zheren Li, Chong Ma, Tianming Liu, Dinggang Shen, Qian Wang

    Abstract: Recent self-supervised contrastive learning methods greatly benefit from the Siamese structure, which aims to minimize distances between positive pairs. These methods usually apply random data augmentation to input images, expecting the augmented views of the same image to be similar and positively paired. However, random augmentation may overlook image semantic information and degrade the qualit…

    Submitted 15 May, 2023; originally announced May 2023.

  31. arXiv:2304.10226  [pdf, other]

    cs.CV cs.LG

    Domain Generalization for Mammographic Image Analysis with Contrastive Learning

    Authors: Zheren Li, Zhiming Cui, Lichi Zhang, Sheng Wang, Chenjin Lei, Xi Ouyang, Dongdong Chen, Xiangyu Zhao, Yajia Gu, Zaiyi Liu, Chunling Liu, Dinggang Shen, Jie-Zhi Cheng

    Abstract: Deep learning has been shown to effectively address several image analysis tasks in computer-aided diagnosis schemes for mammography. Training an effective deep learning model requires large amounts of data with diverse styles and qualities. This diversity often comes from the use of scanners from various vendors. In practice, however, it is impractical to collect a sufficient…

    Submitted 7 September, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: arXiv admin note: text overlap with arXiv:2111.10827

  32. arXiv:2303.08322  [pdf, other]

    cs.LG cs.AI cs.DC cs.GT cs.NI

    Optimization Design for Federated Learning in Heterogeneous 6G Networks

    Authors: Bing Luo, Xiaomin Ouyang, Peng Sun, Pengchao Han, Ningning Ding, Jianwei Huang

    Abstract: With the rapid advancement of 5G networks, billions of smart Internet of Things (IoT) devices are generating an enormous amount of data at the network edge. While still at an early stage, the evolving 6G network is expected to adopt advanced artificial intelligence (AI) technologies to collect, transmit, and learn from this valuable data for innovative applications and intelligent ser…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted in IEEE Network

  33. arXiv:2303.04811  [pdf, other]

    cs.LG cs.DB

    Naive Bayes Classifiers over Missing Data: Decision and Poisoning

    Authors: Song Bian, Xiating Ouyang, Zhiwei Fan, Paraschos Koutris

    Abstract: We study the certifiable robustness of ML classifiers on dirty datasets that could contain missing values. A test point is certifiably robust for an ML classifier if the classifier returns the same prediction for that test point, regardless of which cleaned version (among exponentially many) of the dirty dataset the classifier is trained on. In this paper, we show theoretically that for Naive Baye…

    Submitted 28 May, 2024; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: 22 pages, 10 figures

    Journal ref: ICML 2024

  34. Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering

    Authors: Zhou Yu, Xuecheng Ouyang, Zhenwei Shao, Meng Wang, Jun Yu

    Abstract: Knowledge-based visual question answering (VQA) requires external knowledge beyond the image to answer the question. Early studies retrieve required knowledge from explicit knowledge bases (KBs), which often introduces irrelevant information to the question, hence restricting the performance of their models. Recent works have resorted to using a powerful large language model (LLM) as an implicit k…

    Submitted 28 April, 2025; v1 submitted 3 March, 2023; originally announced March 2023.

    Comments: An extended journal version of our CVPR 2023 paper, which has been accepted at IEEE T-PAMI 2025. The original conference version can be referred to as the v1 version

  35. arXiv:2302.07257  [pdf, other]

    cs.CV eess.IV

    ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models

    Authors: Sheng Wang, Zihao Zhao, Xi Ouyang, Qian Wang, Dinggang Shen

    Abstract: Large language models (LLMs) have recently demonstrated their potential in clinical applications, providing valuable medical knowledge and advice. For example, a large dialog LLM like ChatGPT has successfully passed part of the US medical licensing exam. However, LLMs currently have difficulty processing images, making it challenging to interpret information from medical images, which are rich in…

    Submitted 14 February, 2023; originally announced February 2023.

  36. e-G2C: A 0.14-to-8.31 μJ/Inference NN-based Processor with Continuous On-chip Adaptation for Anomaly Detection and ECG Conversion from EGM

    Authors: Yang Zhao, Yongan Zhang, Yonggan Fu, Xu Ouyang, Cheng Wan, Shang Wu, Anton Banta, Mathews M. John, Allison Post, Mehdi Razavi, Joseph Cavallaro, Behnaam Aazhang, Yingyan Lin

    Abstract: This work presents the first silicon-validated dedicated EGM-to-ECG (G2C) processor, dubbed e-G2C, featuring continuous lightweight anomaly detection, event-driven coarse/precise conversion, and on-chip adaptation. e-G2C utilizes neural network (NN) based G2C conversion and integrates 1) an architecture supporting anomaly detection and coarse/precise conversion via time multiplexing to balance the…

    Submitted 23 July, 2022; originally announced September 2022.

    Comments: Accepted by 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits)

  37. arXiv:2208.12339  [pdf, other]

    cs.DB

    LinCQA: Faster Consistent Query Answering with Linear Time Guarantees

    Authors: Zhiwei Fan, Paraschos Koutris, Xiating Ouyang, Jef Wijsen

    Abstract: Data analytical pipelines often encounter the problem of querying inconsistent data that violate predetermined integrity constraints. Data cleaning is an extensively studied paradigm that singles out a consistent repair of the inconsistent data. Consistent query answering (CQA) is an alternative approach to data cleaning that asks for all tuples guaranteed to be returned by a given query on…

    Submitted 25 August, 2022; originally announced August 2022.

  38. arXiv:2207.14386  [pdf, other]

    cs.CL

    Efficient NLP Model Finetuning via Multistage Data Filtering

    Authors: Xu Ouyang, Shahina Mohd Azam Ansari, Felix Xiaozhu Lin, Yangfeng Ji

    Abstract: As model finetuning is central to modern NLP, we set out to maximize its efficiency. Motivated by redundancy in training examples and the sheer sizes of pretrained models, we exploit a key opportunity: training only on important data. To this end, we filter training examples in a streaming fashion, in tandem with training the target model. We use two key techniques: (1) automatically deter…

    Submitted 18 May, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

  39. arXiv:2207.09389  [pdf, other]

    eess.IV cs.CV

    Image Synthesis with Disentangled Attributes for Chest X-Ray Nodule Augmentation and Detection

    Authors: Zhenrong Shen, Xi Ouyang, Bin Xiao, Jie-Zhi Cheng, Qian Wang, Dinggang Shen

    Abstract: Lung nodule detection in chest X-ray (CXR) images is common in early screening for lung cancer. Deep-learning-based Computer-Assisted Diagnosis (CAD) systems can support radiologists in nodule screening in CXR. However, training such robust and accurate CAD systems requires large-scale and diverse medical data with high-quality annotations. To alleviate the limited availability of such datasets, lung…

    Submitted 19 July, 2022; originally announced July 2022.

  40. arXiv:2207.03677  [pdf, other]

    cs.CV cs.LG

    SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning

    Authors: Haoran You, Baopu Li, Zhanyi Sun, Xu Ouyang, Yingyan Celine Lin

    Abstract: Neural architecture search (NAS) has demonstrated amazing success in searching for efficient deep neural networks (DNNs) from a given supernet. In parallel, the lottery ticket hypothesis has shown that DNNs contain small subnetworks that can be trained from scratch to achieve a comparable or higher accuracy than original DNNs. As such, it is currently a common practice to develop efficient DNNs vi…

    Submitted 3 March, 2025; v1 submitted 7 July, 2022; originally announced July 2022.

    Comments: Accepted by ECCV 2022

  41. arXiv:2206.08141  [pdf]

    cs.AR

    i-FlatCam: A 253 FPS, 91.49 μJ/Frame Ultra-Compact Intelligent Lensless Camera for Real-Time and Efficient Eye Tracking in VR/AR

    Authors: Yang Zhao, Ziyun Li, Yonggan Fu, Yongan Zhang, Chaojian Li, Cheng Wan, Haoran You, Shang Wu, Xu Ouyang, Vivek Boominathan, Ashok Veeraraghavan, Yingyan Celine Lin

    Abstract: We present a first-of-its-kind ultra-compact intelligent camera system, dubbed i-FlatCam, including a lensless camera with a computational (Comp.) chip. It features (1) a predict-then-focus eye tracking pipeline for boosted efficiency without compromising accuracy, (2) a unified compression scheme for single-chip processing and improved frames per second (FPS), and (3) dedicated intra-ch…

    Submitted 28 March, 2025; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: Accepted by VLSI 2022

  42. Follow My Eye: Using Gaze to Supervise Computer-Aided Diagnosis

    Authors: Sheng Wang, Xi Ouyang, Tianming Liu, Qian Wang, Dinggang Shen

    Abstract: When deep neural networks (DNNs) were first introduced to the medical image analysis community, researchers were impressed by their performance. However, it is now evident that a large amount of manually labeled data is often required to train a properly functioning DNN. This demand for supervision data and labels is a major bottleneck in current medical image analysis, since collecting a large number of…

    Submitted 6 April, 2022; originally announced April 2022.

  43. arXiv:2201.04318  [pdf, other]

    eess.IV cs.CV

    Knee Cartilage Defect Assessment by Graph Representation and Surface Convolution

    Authors: Zixu Zhuang, Liping Si, Sheng Wang, Kai Xuan, Xi Ouyang, Yiqiang Zhan, Zhong Xue, Lichi Zhang, Dinggang Shen, Weiwu Yao, Qian Wang

    Abstract: Knee osteoarthritis (OA) is the most common osteoarthritis and a leading cause of disability. Cartilage defects are regarded as major manifestations of knee OA, which are visible by magnetic resonance imaging (MRI). Thus early detection and assessment for knee cartilage defects are important for protecting patients from knee OA. In this way, many attempts have been made on knee cartilage defect as…

    Submitted 12 January, 2022; originally announced January 2022.

    Comments: 10 pages, 4 figures

  44. Learning Hierarchical Attention for Weakly-supervised Chest X-Ray Abnormality Localization and Diagnosis

    Authors: Xi Ouyang, Srikrishna Karanam, Ziyan Wu, Terrence Chen, Jiayu Huo, Xiang Sean Zhou, Qian Wang, Jie-Zhi Cheng

    Abstract: We consider the problem of abnormality localization for clinical applications. While deep learning has driven much recent progress in medical imaging, many clinical challenges are not fully addressed, limiting its broader usage. While recent methods report high diagnostic accuracies, physicians have concerns trusting these algorithm results for diagnostic decision-making purposes because of a gene…

    Submitted 22 December, 2021; originally announced December 2021.

    Journal ref: IEEE Transactions on Medical Imaging 2021

  45. arXiv:2111.10827  [pdf, other]

    eess.IV cs.CV

    Domain Generalization for Mammography Detection via Multi-style and Multi-view Contrastive Learning

    Authors: Zheren Li, Zhiming Cui, Sheng Wang, Yuji Qi, Xi Ouyang, Qitian Chen, Yuezhi Yang, Zhong Xue, Dinggang Shen, Jie-Zhi Cheng

    Abstract: Lesion detection is a fundamental problem in the computer-aided diagnosis scheme for mammography. Advances in deep learning techniques have made remarkable progress on this task, provided that the training data are large and sufficiently diverse in terms of image style and quality. In particular, the diversity of image style may be largely attributed to the vendor factor. However, the collec…

    Submitted 21 November, 2021; originally announced November 2021.

    Comments: Pages 98-108

    Journal ref: International Conference on Medical Image Computing and Computer-Assisted Intervention 2021

  46. arXiv:2111.01677  [pdf, other]

    cs.CV cs.LG

    Top1 Solution of QQ Browser 2021 Ai Algorithm Competition Track 1 : Multimodal Video Similarity

    Authors: Zhuoran Ma, Majing Lou, Xuan Ouyang

    Abstract: In this paper, we describe the solution to the QQ Browser 2021 Ai Algorithm Competition (AIAC) Track 1. We use the multi-modal transformer model for the video embedding extraction. In the pretrain phase, we train the model with three tasks, (1) Video Tag Classification (VTC), (2) Mask Language Modeling (MLM) and (3) Mask Frame Modeling (MFM). In the finetune phase, we train the model with video si…

    Submitted 30 October, 2021; originally announced November 2021.

  47. arXiv:2110.14068  [pdf, other]

    cs.LG

    Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks

    Authors: Yonggan Fu, Qixuan Yu, Yang Zhang, Shang Wu, Xu Ouyang, David Cox, Yingyan Celine Lin

    Abstract: Deep Neural Networks (DNNs) are known to be vulnerable to adversarial attacks, i.e., an imperceptible perturbation to the input can mislead DNNs trained on clean images into making erroneous predictions. To tackle this, adversarial training is currently the most effective defense method, by augmenting the training set with adversarial samples generated on the fly. Interestingly, we discover for th…

    Submitted 3 January, 2025; v1 submitted 26 October, 2021; originally announced October 2021.

    Comments: Accepted at NeurIPS 2021

  48. arXiv:2110.10639  [pdf, other]

    cs.CV

    Semi-supervised Domain Adaptation for Semantic Segmentation

    Authors: Ying Chen, Xu Ouyang, Kaiyue Zhu, Gady Agam

    Abstract: Deep learning approaches for semantic segmentation rely primarily on supervised learning approaches and require substantial efforts in producing pixel-level annotations. Further, such approaches may perform poorly when applied to unseen image domains. To cope with these limitations, both unsupervised domain adaptation (UDA) with full source supervision but without target supervision and semi-super…

    Submitted 20 October, 2021; originally announced October 2021.

  49. arXiv:2107.02137  [pdf, other]

    cs.CL

    ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

    Authors: Yu Sun, Shuohuan Wang, Shikun Feng, Siyu Ding, Chao Pang, Junyuan Shang, Jiaxiang Liu, Xuyi Chen, Yanbin Zhao, Yuxiang Lu, Weixin Liu, Zhihua Wu, Weibao Gong, Jianzhong Liang, Zhizhou Shang, Peng Sun, Wei Liu, Xuan Ouyang, Dianhai Yu, Hao Tian, Hua Wu, Haifeng Wang

    Abstract: Pre-trained models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. Recent works such as T5 and GPT-3 have shown that scaling up pre-trained language models can improve their generalization abilities. Particularly, the GPT-3 model with 175 billion parameters shows its strong task-agnostic zero-shot/few-shot learning capabilities. Despite their success, the…

    Submitted 5 July, 2021; originally announced July 2021.

  50. arXiv:2104.08215  [pdf, other]

    cs.LG cs.CV

    "BNN - BN = ?": Training Binary Neural Networks without Batch Normalization

    Authors: Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zhangyang Wang

    Abstract: Batch normalization (BN) is a key facilitator and considered essential for state-of-the-art binary neural networks (BNN). However, the BN layer is costly to calculate and is typically implemented with non-binary parameters, leaving a hurdle for the efficient implementation of BNN training. It also introduces undesirable dependence between samples within each batch. Inspired by the latest advance o…

    Submitted 16 April, 2021; originally announced April 2021.