-
Modeling Gene Expression Distributional Shifts for Unseen Genetic Perturbations
Authors:
Kalyan Ramakrishnan,
Jonathan G. Hedley,
Sisi Qu,
Puneet K. Dokania,
Philip H. S. Torr,
Cesar A. Prada-Medina,
Julien Fauqueur,
Kaspar Martens
Abstract:
We train a neural network to predict distributional responses in gene expression following genetic perturbations. This is an essential task in early-stage drug discovery, where such responses can offer insights into gene function and inform target identification. Existing methods only predict changes in the mean expression, overlooking stochasticity inherent in single-cell data. In contrast, we of…
▽ More
We train a neural network to predict distributional responses in gene expression following genetic perturbations. This is an essential task in early-stage drug discovery, where such responses can offer insights into gene function and inform target identification. Existing methods only predict changes in the mean expression, overlooking stochasticity inherent in single-cell data. In contrast, we offer a more realistic view of cellular responses by modeling expression distributions. Our model predicts gene-level histograms conditioned on perturbations and outperforms baselines in capturing higher-order statistics, such as variance, skewness, and kurtosis, at a fraction of the training cost. To generalize to unseen perturbations, we incorporate prior knowledge via gene embeddings from large language models (LLMs). While modeling a richer output space, the method remains competitive in predicting mean expression changes. This work offers a practical step towards more expressive and biologically informative models of perturbation effects.
△ Less
Submitted 1 July, 2025;
originally announced July 2025.
-
Automating Exploratory Multiomics Research via Language Models
Authors:
Shang Qu,
Ning Ding,
Linhai Xie,
Yifei Li,
Zaoqu Liu,
Kaiyan Zhang,
Yibai Xiong,
Yuxin Zuo,
Zhangren Chen,
Ermo Hua,
Xingtai Lv,
Youbang Sun,
Yang Li,
Dong Li,
Fuchu He,
Bowen Zhou
Abstract:
This paper introduces PROTEUS, a fully automated system that produces data-driven hypotheses from raw data files. We apply PROTEUS to clinical proteogenomics, a field where effective downstream data analysis and hypothesis proposal is crucial for producing novel discoveries. PROTEUS uses separate modules to simulate different stages of the scientific process, from open-ended data exploration to sp…
▽ More
This paper introduces PROTEUS, a fully automated system that produces data-driven hypotheses from raw data files. We apply PROTEUS to clinical proteogenomics, a field where effective downstream data analysis and hypothesis proposal is crucial for producing novel discoveries. PROTEUS uses separate modules to simulate different stages of the scientific process, from open-ended data exploration to specific statistical analysis and hypothesis proposal. It formulates research directions, tools, and results in terms of relationships between biological entities, using unified graph structures to manage complex research processes. We applied PROTEUS to 10 clinical multiomics datasets from published research, arriving at 360 total hypotheses. Results were evaluated through external data validation and automatic open-ended scoring. Through exploratory and iterative research, the system can navigate high-throughput and heterogeneous multiomics data to arrive at hypotheses that balance reliability and novelty. In addition to accelerating multiomic analysis, PROTEUS represents a path towards tailoring general autonomous systems to specialized scientific domains to achieve open-ended hypothesis generation from data.
△ Less
Submitted 9 June, 2025;
originally announced June 2025.
-
UrbanCraft: Urban View Extrapolation via Hierarchical Sem-Geometric Priors
Authors:
Tianhang Wang,
Fan Lu,
Sanqing Qu,
Guo Yu,
Shihang Du,
Ya Wu,
Yuan Huang,
Guang Chen
Abstract:
Existing neural rendering-based urban scene reconstruction methods mainly focus on the Interpolated View Synthesis (IVS) setting that synthesizes from views close to training camera trajectory. However, IVS can not guarantee the on-par performance of the novel view outside the training camera distribution (\textit{e.g.}, looking left, right, or downwards), which limits the generalizability of the…
▽ More
Existing neural rendering-based urban scene reconstruction methods mainly focus on the Interpolated View Synthesis (IVS) setting that synthesizes from views close to training camera trajectory. However, IVS can not guarantee the on-par performance of the novel view outside the training camera distribution (\textit{e.g.}, looking left, right, or downwards), which limits the generalizability of the urban reconstruction application. Previous methods have optimized it via image diffusion, but they fail to handle text-ambiguous or large unseen view angles due to coarse-grained control of text-only diffusion. In this paper, we design UrbanCraft, which surmounts the Extrapolated View Synthesis (EVS) problem using hierarchical sem-geometric representations serving as additional priors. Specifically, we leverage the partially observable scene to reconstruct coarse semantic and geometric primitives, establishing a coarse scene-level prior through an occupancy grid as the base representation. Additionally, we incorporate fine instance-level priors from 3D bounding boxes to enhance object-level details and spatial relationships. Building on this, we propose the \textbf{H}ierarchical \textbf{S}emantic-Geometric-\textbf{G}uided Variational Score Distillation (HSG-VSD), which integrates semantic and geometric constraints from pretrained UrbanCraft2D into the score distillation sampling process, forcing the distribution to be consistent with the observable scene. Qualitative and quantitative comparisons demonstrate the effectiveness of our methods on EVS problem.
△ Less
Submitted 29 May, 2025;
originally announced May 2025.
-
TTRL: Test-Time Reinforcement Learning
Authors:
Yuxin Zuo,
Kaiyan Zhang,
Li Sheng,
Shang Qu,
Ganqu Cui,
Xuekai Zhu,
Haozhan Li,
Yuchen Zhang,
Xinwei Long,
Ermo Hua,
Biqing Qi,
Youbang Sun,
Zhiyuan Ma,
Lifan Yuan,
Ning Ding,
Bowen Zhou
Abstract:
This paper investigates Reinforcement Learning (RL) on data without explicit labels for reasoning tasks in Large Language Models (LLMs). The core challenge of the problem is reward estimation during inference while not having access to ground-truth information. While this setting appears elusive, we find that common practices in Test-Time Scaling (TTS), such as majority voting, yield surprisingly…
▽ More
This paper investigates Reinforcement Learning (RL) on data without explicit labels for reasoning tasks in Large Language Models (LLMs). The core challenge of the problem is reward estimation during inference while not having access to ground-truth information. While this setting appears elusive, we find that common practices in Test-Time Scaling (TTS), such as majority voting, yield surprisingly effective rewards suitable for driving RL training. In this work, we introduce Test-Time Reinforcement Learning (TTRL), a novel method for training LLMs using RL on unlabeled data. TTRL enables self-evolution of LLMs by utilizing the priors in the pre-trained models. Our experiments demonstrate that TTRL consistently improves performance across a variety of tasks and models. Notably, TTRL boosts the pass@1 performance of Qwen-2.5-Math-7B by approximately 211% on the AIME 2024 with only unlabeled test data. Furthermore, although TTRL is only supervised by the maj@n metric, TTRL has demonstrated performance to consistently surpass the upper limit of the initial model maj@n, and approach the performance of models trained directly on test data with ground-truth labels. Our experimental findings validate the general effectiveness of TTRL across various tasks and highlight TTRL's potential for broader tasks and domains. GitHub: https://github.com/PRIME-RL/TTRL
△ Less
Submitted 30 June, 2025; v1 submitted 22 April, 2025;
originally announced April 2025.
-
Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models
Authors:
Xingtai Lv,
Youbang Sun,
Kaiyan Zhang,
Shang Qu,
Xuekai Zhu,
Yuchen Fan,
Yi Wu,
Ermo Hua,
Xinwei Long,
Ning Ding,
Bowen Zhou
Abstract:
State Space Models (SSMs) have emerged as a promising alternative to the popular transformer-based models and have been increasingly gaining attention. Compared to transformers, SSMs excel at tasks with sequential data or longer contexts, demonstrating comparable performances with significant efficiency gains. In this survey, we provide a coherent and systematic overview for SSMs, including their…
▽ More
State Space Models (SSMs) have emerged as a promising alternative to the popular transformer-based models and have been increasingly gaining attention. Compared to transformers, SSMs excel at tasks with sequential data or longer contexts, demonstrating comparable performances with significant efficiency gains. In this survey, we provide a coherent and systematic overview for SSMs, including their theoretical motivations, mathematical formulations, comparison with existing model classes, and various applications. We divide the SSM series into three main sections, providing a detailed introduction to the original SSM, the structured SSM represented by S4, and the selective SSM typified by Mamba. We put an emphasis on technicality, and highlight the various key techniques introduced to address the effectiveness and efficiency of SSMs. We hope this manuscript serves as an introduction for researchers to explore the theoretical foundations of SSMs.
△ Less
Submitted 14 March, 2025;
originally announced March 2025.
-
Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection
Authors:
Zhen Qu,
Xian Tao,
Xinyi Gong,
Shichen Qu,
Qiyu Chen,
Zhengtao Zhang,
Xingang Wang,
Guiguang Ding
Abstract:
Recently, vision-language models (e.g. CLIP) have demonstrated remarkable performance in zero-shot anomaly detection (ZSAD). By leveraging auxiliary data during training, these models can directly perform cross-category anomaly detection on target datasets, such as detecting defects on industrial product surfaces or identifying tumors in organ tissues. Existing approaches typically construct text…
▽ More
Recently, vision-language models (e.g. CLIP) have demonstrated remarkable performance in zero-shot anomaly detection (ZSAD). By leveraging auxiliary data during training, these models can directly perform cross-category anomaly detection on target datasets, such as detecting defects on industrial product surfaces or identifying tumors in organ tissues. Existing approaches typically construct text prompts through either manual design or the optimization of learnable prompt vectors. However, these methods face several challenges: 1) handcrafted prompts require extensive expert knowledge and trial-and-error; 2) single-form learnable prompts struggle to capture complex anomaly semantics; and 3) an unconstrained prompt space limits generalization to unseen categories. To address these issues, we propose Bayesian Prompt Flow Learning (Bayes-PFL), which models the prompt space as a learnable probability distribution from a Bayesian perspective. Specifically, a prompt flow module is designed to learn both image-specific and image-agnostic distributions, which are jointly utilized to regularize the text prompt space and improve the model's generalization on unseen categories. These learned distributions are then sampled to generate diverse text prompts, effectively covering the prompt space. Additionally, a residual cross-model attention (RCA) module is introduced to better align dynamic text embeddings with fine-grained image features. Extensive experiments on 15 industrial and medical datasets demonstrate our method's superior performance. The code is available at https://github.com/xiaozhen228/Bayes-PFL.
△ Less
Submitted 3 June, 2025; v1 submitted 13 March, 2025;
originally announced March 2025.
-
Range and Bird's Eye View Fused Cross-Modal Visual Place Recognition
Authors:
Jianyi Peng,
Fan Lu,
Bin Li,
Yuan Huang,
Sanqing Qu,
Guang Chen
Abstract:
Image-to-point cloud cross-modal Visual Place Recognition (VPR) is a challenging task where the query is an RGB image, and the database samples are LiDAR point clouds. Compared to single-modal VPR, this approach benefits from the widespread availability of RGB cameras and the robustness of point clouds in providing accurate spatial geometry and distance information. However, current methods rely o…
▽ More
Image-to-point cloud cross-modal Visual Place Recognition (VPR) is a challenging task where the query is an RGB image, and the database samples are LiDAR point clouds. Compared to single-modal VPR, this approach benefits from the widespread availability of RGB cameras and the robustness of point clouds in providing accurate spatial geometry and distance information. However, current methods rely on intermediate modalities that capture either the vertical or horizontal field of view, limiting their ability to fully exploit the complementary information from both sensors. In this work, we propose an innovative initial retrieval + re-rank method that effectively combines information from range (or RGB) images and Bird's Eye View (BEV) images. Our approach relies solely on a computationally efficient global descriptor similarity search process to achieve re-ranking. Additionally, we introduce a novel similarity label supervision technique to maximize the utility of limited training data. Specifically, we employ points average distance to approximate appearance similarity and incorporate an adaptive margin, based on similarity differences, into the vanilla triplet loss. Experimental results on the KITTI dataset demonstrate that our method significantly outperforms state-of-the-art approaches.
△ Less
Submitted 28 February, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
Proactive Depot Discovery: A Generative Framework for Flexible Location-Routing
Authors:
Site Qu,
Guoqiang Hu
Abstract:
The Location-Routing Problem (LRP), which combines the challenges of facility (depot) locating and vehicle route planning, is critically constrained by the reliance on predefined depot candidates, limiting the solution space and potentially leading to suboptimal outcomes. Previous research on LRP without predefined depots is scant and predominantly relies on heuristic algorithms that iteratively a…
▽ More
The Location-Routing Problem (LRP), which combines the challenges of facility (depot) locating and vehicle route planning, is critically constrained by the reliance on predefined depot candidates, limiting the solution space and potentially leading to suboptimal outcomes. Previous research on LRP without predefined depots is scant and predominantly relies on heuristic algorithms that iteratively attempt depot placements across a planar area. Such approaches lack the ability to proactively generate depot locations that meet specific geographic requirements, revealing a notable gap in current research landscape. To bridge this gap, we propose a data-driven generative DRL framework, designed to proactively generate depots for LRP without predefined depot candidates, solely based on customer requests data which include geographic and demand information. It can operate in two distinct modes: direct generation of exact depot locations, and the creation of a multivariate Gaussian distribution for flexible depots sampling. By extracting depots' geographic pattern from customer requests data, our approach can dynamically respond to logistical needs, identifying high-quality depot locations that further reduce total routing costs compared to traditional methods. Extensive experiments demonstrate that, for a same group of customer requests, compared with those depots identified through random attempts, our framework can proactively generate depots that lead to superior solution routes with lower routing cost. The implications of our framework potentially extend into real-world applications, particularly in emergency medical rescue and disaster relief logistics, where rapid establishment and adjustment of depot locations are paramount, showcasing its potential in addressing LRP for dynamic and unpredictable environments.
△ Less
Submitted 17 February, 2025;
originally announced February 2025.
-
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
Authors:
Yuxin Zuo,
Shang Qu,
Yifei Li,
Zhangren Chen,
Xuekai Zhu,
Ermo Hua,
Kaiyan Zhang,
Ning Ding,
Bowen Zhou
Abstract:
We introduce MedXpertQA, a highly challenging and comprehensive benchmark to evaluate expert-level medical knowledge and advanced reasoning. MedXpertQA includes 4,460 questions spanning 17 specialties and 11 body systems. It includes two subsets, Text for text evaluation and MM for multimodal evaluation. Notably, MM introduces expert-level exam questions with diverse images and rich clinical infor…
▽ More
We introduce MedXpertQA, a highly challenging and comprehensive benchmark to evaluate expert-level medical knowledge and advanced reasoning. MedXpertQA includes 4,460 questions spanning 17 specialties and 11 body systems. It includes two subsets, Text for text evaluation and MM for multimodal evaluation. Notably, MM introduces expert-level exam questions with diverse images and rich clinical information, including patient records and examination results, setting it apart from traditional medical multimodal benchmarks with simple QA pairs generated from image captions. MedXpertQA applies rigorous filtering and augmentation to address the insufficient difficulty of existing benchmarks like MedQA, and incorporates specialty board questions to improve clinical relevance and comprehensiveness. We perform data synthesis to mitigate data leakage risk and conduct multiple rounds of expert reviews to ensure accuracy and reliability. We evaluate 18 leading models on \benchmark. Moreover, medicine is deeply connected to real-world decision-making, providing a rich and representative setting for assessing reasoning abilities beyond mathematics and code. To this end, we develop a reasoning-oriented subset to facilitate the assessment of o1-like models. Code and data are available at: https://github.com/TsinghuaC3I/MedXpertQA
△ Less
Submitted 6 June, 2025; v1 submitted 30 January, 2025;
originally announced January 2025.
-
Patch-aware Vector Quantized Codebook Learning for Unsupervised Visual Defect Detection
Authors:
Qisen Cheng,
Shuhui Qu,
Janghwan Lee
Abstract:
Unsupervised visual defect detection is critical in industrial applications, requiring a representation space that captures normal data features while detecting deviations. Achieving a balance between expressiveness and compactness is challenging; an overly expressive space risks inefficiency and mode collapse, impairing detection accuracy. We propose a novel approach using an enhanced VQ-VAE fram…
▽ More
Unsupervised visual defect detection is critical in industrial applications, requiring a representation space that captures normal data features while detecting deviations. Achieving a balance between expressiveness and compactness is challenging; an overly expressive space risks inefficiency and mode collapse, impairing detection accuracy. We propose a novel approach using an enhanced VQ-VAE framework optimized for unsupervised defect detection. Our model introduces a patch-aware dynamic code assignment scheme, enabling context-sensitive code allocation to optimize spatial representation. This strategy enhances normal-defect distinction and improves detection accuracy during inference. Experiments on MVTecAD, BTAD, and MTSD datasets show our method achieves state-of-the-art performance.
△ Less
Submitted 15 January, 2025;
originally announced January 2025.
-
Unveiling the Potential of Text in High-Dimensional Time Series Forecasting
Authors:
Xin Zhou,
Weiqing Wang,
Shilin Qu,
Zhiqiang Zhang,
Christoph Bergmeir
Abstract:
Time series forecasting has traditionally focused on univariate and multivariate numerical data, often overlooking the benefits of incorporating multimodal information, particularly textual data. In this paper, we propose a novel framework that integrates time series models with Large Language Models to improve high-dimensional time series forecasting. Inspired by multimodal models, our method com…
▽ More
Time series forecasting has traditionally focused on univariate and multivariate numerical data, often overlooking the benefits of incorporating multimodal information, particularly textual data. In this paper, we propose a novel framework that integrates time series models with Large Language Models to improve high-dimensional time series forecasting. Inspired by multimodal models, our method combines time series and textual data in the dual-tower structure. This fusion of information creates a comprehensive representation, which is then processed through a linear layer to generate the final forecast. Extensive experiments demonstrate that incorporating text enhances high-dimensional time series forecasting performance. This work paves the way for further research in multimodal time series forecasting.
△ Less
Submitted 12 January, 2025;
originally announced January 2025.
-
Towards An Unsupervised Learning Scheme for Efficiently Solving Parameterized Mixed-Integer Programs
Authors:
Shiyuan Qu,
Fenglian Dong,
Zhiwei Wei,
Chao Shang
Abstract:
In this paper, we describe a novel unsupervised learning scheme for accelerating the solution of a family of mixed integer programming (MIP) problems. Distinct substantially from existing learning-to-optimize methods, our proposal seeks to train an autoencoder (AE) for binary variables in an unsupervised learning fashion, using data of optimal solutions to historical instances for a parametric fam…
▽ More
In this paper, we describe a novel unsupervised learning scheme for accelerating the solution of a family of mixed integer programming (MIP) problems. Distinct substantially from existing learning-to-optimize methods, our proposal seeks to train an autoencoder (AE) for binary variables in an unsupervised learning fashion, using data of optimal solutions to historical instances for a parametric family of MIPs. By a deliberate design of AE architecture and exploitation of its statistical implication, we present a simple and straightforward strategy to construct a class of cutting plane constraints from the decoder parameters of an offline-trained AE. These constraints reliably enclose the optimal binary solutions of new problem instances thanks to the representation strength of the AE. More importantly, their integration into the primal MIP problem leads to a tightened MIP with the reduced feasible region, which can be resolved at decision time using off-the-shelf solvers with much higher efficiency. Our method is applied to a benchmark batch process scheduling problem formulated as a mixed integer linear programming (MILP) problem. Comprehensive results demonstrate that our approach significantly reduces the computational cost of off-the-shelf MILP solvers while retaining a high solution quality. The codes of this work are open-sourced at https://github.com/qushiyuan/AE4BV.
△ Less
Submitted 24 December, 2024; v1 submitted 23 December, 2024;
originally announced December 2024.
-
GSGTrack: Gaussian Splatting-Guided Object Pose Tracking from RGB Videos
Authors:
Zhiyuan Chen,
Fan Lu,
Guo Yu,
Bin Li,
Sanqing Qu,
Yuan Huang,
Changhong Fu,
Guang Chen
Abstract:
Tracking the 6DoF pose of unknown objects in monocular RGB video sequences is crucial for robotic manipulation. However, existing approaches typically rely on accurate depth information, which is non-trivial to obtain in real-world scenarios. Although depth estimation algorithms can be employed, geometric inaccuracy can lead to failures in RGBD-based pose tracking methods. To address this challeng…
▽ More
Tracking the 6DoF pose of unknown objects in monocular RGB video sequences is crucial for robotic manipulation. However, existing approaches typically rely on accurate depth information, which is non-trivial to obtain in real-world scenarios. Although depth estimation algorithms can be employed, geometric inaccuracy can lead to failures in RGBD-based pose tracking methods. To address this challenge, we introduce GSGTrack, a novel RGB-based pose tracking framework that jointly optimizes geometry and pose. Specifically, we adopt 3D Gaussian Splatting to create an optimizable 3D representation, which is learned simultaneously with a graph-based geometry optimization to capture the object's appearance features and refine its geometry. However, the joint optimization process is susceptible to perturbations from noisy pose and geometry data. Thus, we propose an object silhouette loss to address the issue of pixel-wise loss being overly sensitive to pose noise during tracking. To mitigate the geometric ambiguities caused by inaccurate depth information, we propose a geometry-consistent image pair selection strategy, which filters out low-confidence pairs and ensures robust geometric optimization. Extensive experiments on the OnePose and HO3D datasets demonstrate the effectiveness of GSGTrack in both 6DoF pose tracking and object reconstruction.
△ Less
Submitted 3 December, 2024;
originally announced December 2024.
-
Scalable and Effective Negative Sample Generation for Hyperedge Prediction
Authors:
Shilin Qu,
Weiqing Wang,
Yuan-Fang Li,
Quoc Viet Hung Nguyen,
Hongzhi Yin
Abstract:
Hyperedge prediction is crucial in hypergraph analysis for understanding complex multi-entity interactions in various web-based applications, including social networks and e-commerce systems. Traditional methods often face difficulties in generating high-quality negative samples due to the imbalance between positive and negative instances. To address this, we present the Scalable and Effective Neg…
▽ More
Hyperedge prediction is crucial in hypergraph analysis for understanding complex multi-entity interactions in various web-based applications, including social networks and e-commerce systems. Traditional methods often face difficulties in generating high-quality negative samples due to the imbalance between positive and negative instances. To address this, we present the Scalable and Effective Negative Sample Generation for Hyperedge Prediction (SEHP) framework, which utilizes diffusion models to tackle these challenges. SEHP employs a boundary-aware loss function that iteratively refines negative samples, moving them closer to decision boundaries to improve classification performance. SEHP samples positive instances to form sub-hypergraphs for scalable batch processing. By using structural information from sub-hypergraphs as conditions within the diffusion process, SEHP effectively captures global patterns. To enhance efficiency, our approach operates directly in latent space, avoiding the need for discrete ID generation and resulting in significant speed improvements while preserving accuracy. Extensive experiments show that SEHP outperforms existing methods in accuracy, efficiency, and scalability, representing a substantial advancement in hyperedge prediction techniques. Our code is available here.
△ Less
Submitted 19 November, 2024;
originally announced November 2024.
-
Automating Exploratory Proteomics Research via Language Models
Authors:
Ning Ding,
Shang Qu,
Linhai Xie,
Yifei Li,
Zaoqu Liu,
Kaiyan Zhang,
Yibai Xiong,
Yuxin Zuo,
Zhangren Chen,
Ermo Hua,
Xingtai Lv,
Youbang Sun,
Yang Li,
Dong Li,
Fuchu He,
Bowen Zhou
Abstract:
With the development of artificial intelligence, its contribution to science is evolving from simulating a complex problem to automating entire research processes and producing novel discoveries. Achieving this advancement requires both specialized general models grounded in real-world scientific data and iterative, exploratory frameworks that mirror human scientific methodologies. In this paper,…
▽ More
With the development of artificial intelligence, its contribution to science is evolving from simulating a complex problem to automating entire research processes and producing novel discoveries. Achieving this advancement requires both specialized general models grounded in real-world scientific data and iterative, exploratory frameworks that mirror human scientific methodologies. In this paper, we present PROTEUS, a fully automated system for scientific discovery from raw proteomics data. PROTEUS uses large language models (LLMs) to perform hierarchical planning, execute specialized bioinformatics tools, and iteratively refine analysis workflows to generate high-quality scientific hypotheses. The system takes proteomics datasets as input and produces a comprehensive set of research objectives, analysis results, and novel biological hypotheses without human intervention. We evaluated PROTEUS on 12 proteomics datasets collected from various biological samples (e.g. immune cells, tumors) and different sample types (single-cell and bulk), generating 191 scientific hypotheses. These were assessed using both automatic LLM-based scoring on 5 metrics and detailed reviews from human experts. Results demonstrate that PROTEUS consistently produces reliable, logically coherent results that align well with existing literature while also proposing novel, evaluable hypotheses. The system's flexible architecture facilitates seamless integration of diverse analysis tools and adaptation to different proteomics data types. By automating complex proteomics analysis workflows and hypothesis generation, PROTEUS has the potential to considerably accelerate the pace of scientific discovery in proteomics research, enabling researchers to efficiently explore large-scale datasets and uncover biological insights.
△ Less
Submitted 6 November, 2024;
originally announced November 2024.
-
FastSTI: A Fast Conditional Pseudo Numerical Diffusion Model for Spatio-temporal Traffic Data Imputation
Authors:
Shaokang Cheng,
Nada Osman,
Shiru Qu,
Lamberto Ballan
Abstract:
High-quality spatiotemporal traffic data is crucial for intelligent transportation systems (ITS) and their data-driven applications. Inevitably, the issue of missing data caused by various disturbances threatens the reliability of data acquisition. Recent studies of diffusion probability models have demonstrated the superiority of deep generative models in imputation tasks by precisely capturing t…
▽ More
High-quality spatiotemporal traffic data is crucial for intelligent transportation systems (ITS) and their data-driven applications. Inevitably, the issue of missing data caused by various disturbances threatens the reliability of data acquisition. Recent studies of diffusion probability models have demonstrated the superiority of deep generative models in imputation tasks by precisely capturing the spatio-temporal correlation of traffic data. One drawback of diffusion models is their slow sampling/denoising process. In this work, we aim to accelerate the imputation process while retaining the performance. We propose a fast conditional diffusion model for spatiotemporal traffic data imputation (FastSTI). To speed up the process yet, obtain better performance, we propose the application of a high-order pseudo-numerical solver. Our method further revs the imputation by introducing a predefined alignment strategy of variance schedule during the sampling process. Evaluating FastSTI on two types of real-world traffic datasets (traffic speed and flow) with different missing data scenarios proves its ability to impute higher-quality samples in only six sampling steps, especially under high missing rates (60\% $\sim$ 90\%). The experimental results illustrate a speed-up of $\textbf{8.3} \times$ faster than the current state-of-the-art model while achieving better performance.
△ Less
Submitted 19 October, 2024;
originally announced October 2024.
-
Efficient Generation of Molecular Clusters with Dual-Scale Equivariant Flow Matching
Authors:
Akshay Subramanian,
Shuhui Qu,
Cheol Woo Park,
Sulin Liu,
Janghwan Lee,
Rafael Gómez-Bombarelli
Abstract:
Amorphous molecular solids offer a promising alternative to inorganic semiconductors, owing to their mechanical flexibility and solution processability. The packing structure of these materials plays a crucial role in determining their electronic and transport properties, which are key to enhancing the efficiency of devices like organic solar cells (OSCs). However, obtaining these optoelectronic p…
▽ More
Amorphous molecular solids offer a promising alternative to inorganic semiconductors, owing to their mechanical flexibility and solution processability. The packing structure of these materials plays a crucial role in determining their electronic and transport properties, which are key to enhancing the efficiency of devices like organic solar cells (OSCs). However, obtaining these optoelectronic properties computationally requires molecular dynamics (MD) simulations to generate a conformational ensemble, a process that can be computationally expensive due to the large system sizes involved. Recent advances have focused on using generative models, particularly flow-based models as Boltzmann generators, to improve the efficiency of MD sampling. In this work, we developed a dual-scale flow matching method that separates training and inference into coarse-grained and all-atom stages and enhances both the accuracy and efficiency of standard flow matching samplers. We demonstrate the effectiveness of this method on a dataset of Y6 molecular clusters obtained through MD simulations, and we benchmark its efficiency and accuracy against single-scale flow matching methods.
△ Less
Submitted 9 October, 2024;
originally announced October 2024.
-
Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues
Authors:
Shilin Qu,
Weiqing Wang,
Xin Zhou,
Haolan Zhan,
Zhuang Li,
Lizhen Qu,
Linhao Luo,
Yuan-Fang Li,
Gholamreza Haffari
Abstract:
Sociocultural norms serve as guiding principles for personal conduct in social interactions, emphasizing respect, cooperation, and appropriate behavior, which is able to benefit tasks including conversational information retrieval, contextual information retrieval and retrieval-enhanced machine learning. We propose a scalable approach for constructing a Sociocultural Norm (SCN) Base using Large La…
▽ More
Sociocultural norms serve as guiding principles for personal conduct in social interactions, emphasizing respect, cooperation, and appropriate behavior, which is able to benefit tasks including conversational information retrieval, contextual information retrieval and retrieval-enhanced machine learning. We propose a scalable approach for constructing a Sociocultural Norm (SCN) Base using Large Language Models (LLMs) for socially aware dialogues. We construct a comprehensive and publicly accessible Chinese Sociocultural NormBase. Our approach utilizes socially aware dialogues, enriched with contextual frames, as the primary data source to constrain the generating process and reduce the hallucinations. This enables extracting of high-quality and nuanced natural-language norm statements, leveraging the pragmatic implications of utterances with respect to the situation. As real dialogue annotated with gold frames are not readily available, we propose using synthetic data. Our empirical results show: (i) the quality of the SCNs derived from synthetic data is comparable to that from real dialogues annotated with gold frames, and (ii) the quality of the SCNs extracted from real data, annotated with either silver (predicted) or gold frames, surpasses that without the frame annotations. We further show the effectiveness of the extracted SCNs in a RAG-based (Retrieval-Augmented Generation) model to reason about multiple downstream dialogue tasks.
△ Less
Submitted 3 October, 2024;
originally announced October 2024.
-
Scalable Transformer for High Dimensional Multivariate Time Series Forecasting
Authors:
Xin Zhou,
Weiqing Wang,
Wray Buntine,
Shilin Qu,
Abishek Sriramulu,
Weicong Tan,
Christoph Bergmeir
Abstract:
Deep models for Multivariate Time Series (MTS) forecasting have recently demonstrated significant success. Channel-dependent models capture complex dependencies that channel-independent models cannot capture. However, the number of channels in real-world applications outpaces the capabilities of existing channel-dependent models, and contrary to common expectations, some models underperform the ch…
▽ More
Deep models for Multivariate Time Series (MTS) forecasting have recently demonstrated significant success. Channel-dependent models capture complex dependencies that channel-independent models cannot capture. However, the number of channels in real-world applications outpaces the capabilities of existing channel-dependent models, and contrary to common expectations, some models underperform the channel-independent models in handling high-dimensional data, which raises questions about the performance of channel-dependent models. To address this, our study first investigates the reasons behind the suboptimal performance of these channel-dependent models on high-dimensional MTS data. Our analysis reveals that two primary issues lie in the introduced noise from unrelated series that increases the difficulty of capturing the crucial inter-channel dependencies, and challenges in training strategies due to high-dimensional data. To address these issues, we propose STHD, the Scalable Transformer for High-Dimensional Multivariate Time Series Forecasting. STHD has three components: a) Relation Matrix Sparsity that limits the noise introduced and alleviates the memory issue; b) ReIndex applied as a training strategy to enable a more flexible batch size setting and increase the diversity of training data; and c) Transformer that handles 2-D inputs and captures channel dependencies. These components jointly enable STHD to manage the high-dimensional MTS while maintaining computational feasibility. Furthermore, experimental results show STHD's considerable improvement on three high-dimensional datasets: Crime-Chicago, Wiki-People, and Traffic. The source code and dataset are publicly available https://github.com/xinzzzhou/ScalableTransformer4HighDimensionMTSF.git.
△ Less
Submitted 8 August, 2024;
originally announced August 2024.
-
Few-Shot Medical Image Segmentation with Large Kernel Attention
Authors:
Xiaoxiao Wu,
Xiaowei Chen,
Zhenguo Gao,
Shulei Qu,
Yuanyuan Qiu
Abstract:
Medical image segmentation has witnessed significant advancements with the emergence of deep learning. However, the reliance of most neural network models on a substantial amount of annotated data remains a challenge for medical image segmentation. To address this issue, few-shot segmentation methods based on meta-learning have been employed. Presently, the methods primarily focus on aligning the…
▽ More
Medical image segmentation has witnessed significant advancements with the emergence of deep learning. However, the reliance of most neural network models on a substantial amount of annotated data remains a challenge for medical image segmentation. To address this issue, few-shot segmentation methods based on meta-learning have been employed. Presently, the methods primarily focus on aligning the support set and query set to enhance performance, but this approach hinders further improvement of the model's effectiveness. In this paper, our objective is to propose a few-shot medical segmentation model that acquire comprehensive feature representation capabilities, which will boost segmentation accuracy by capturing both local and long-range features. To achieve this, we introduce a plug-and-play attention module that dynamically enhances both query and support features, thereby improving the representativeness of the extracted features. Our model comprises four key modules: a dual-path feature extractor, an attention module, an adaptive prototype prediction module, and a multi-scale prediction fusion module. Specifically, the dual-path feature extractor acquires multi-scale features by obtaining features of 32{\times}32 size and 64{\times}64 size. The attention module follows the feature extractor and captures local and long-range information. The adaptive prototype prediction module automatically adjusts the anomaly score threshold to predict prototypes, while the multi-scale fusion prediction module integrates prediction masks of various scales to produce the final segmentation result. We conducted experiments on publicly available MRI datasets, namely CHAOS and CMR, and compared our method with other advanced techniques. The results demonstrate that our method achieves state-of-the-art performance.
△ Less
Submitted 26 July, 2024;
originally announced July 2024.
-
ALMRR: Anomaly Localization Mamba on Industrial Textured Surface with Feature Reconstruction and Refinement
Authors:
Shichen Qu,
Xian Tao,
Zhen Qu,
Xinyi Gong,
Zhengtao Zhang,
Mukesh Prasad
Abstract:
Unsupervised anomaly localization on industrial textured images has achieved remarkable results through reconstruction-based methods, yet existing approaches based on image reconstruction and feature reconstruc-tion each have their own shortcomings. Firstly, image-based methods tend to reconstruct both normal and anomalous regions well, which lead to over-generalization. Feature-based methods cont…
▽ More
Unsupervised anomaly localization on industrial textured images has achieved remarkable results through reconstruction-based methods, yet existing approaches based on image reconstruction and feature reconstruc-tion each have their own shortcomings. Firstly, image-based methods tend to reconstruct both normal and anomalous regions well, which lead to over-generalization. Feature-based methods contain a large amount of distin-guishable semantic information, however, its feature structure is redundant and lacks anomalous information, which leads to significant reconstruction errors. In this paper, we propose an Anomaly Localization method based on Mamba with Feature Reconstruction and Refinement(ALMRR) which re-constructs semantic features based on Mamba and then refines them through a feature refinement module. To equip the model with prior knowledge of anomalies, we enhance it by adding artificially simulated anomalies to the original images. Unlike image reconstruction or repair, the features of synthesized defects are repaired along with those of normal areas. Finally, the aligned features containing rich semantic information are fed in-to the refinement module to obtain the anomaly map. Extensive experiments have been conducted on the MVTec-AD-Textured dataset and other real-world industrial dataset, which has demonstrated superior performance com-pared to state-of-the-art (SOTA) methods.
△ Less
Submitted 24 July, 2024;
originally announced July 2024.
-
HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation
Authors:
Tianpei Zou,
Sanqing Qu,
Zhijun Li,
Alois Knoll,
Lianghua He,
Guang Chen,
Changjun Jiang
Abstract:
3D point cloud segmentation has received significant interest for its growing applications. However, the generalization ability of models suffers in dynamic scenarios due to the distribution shift between test and training data. To promote robustness and adaptability across diverse scenarios, test-time adaptation (TTA) has recently been introduced. Nevertheless, most existing TTA methods are devel…
▽ More
3D point cloud segmentation has received significant interest for its growing applications. However, the generalization ability of models suffers in dynamic scenarios due to the distribution shift between test and training data. To promote robustness and adaptability across diverse scenarios, test-time adaptation (TTA) has recently been introduced. Nevertheless, most existing TTA methods are developed for images, and limited approaches applicable to point clouds ignore the inherent hierarchical geometric structures in point cloud streams, i.e., local (point-level), global (object-level), and temporal (frame-level) structures. In this paper, we delve into TTA in 3D point cloud segmentation and propose a novel Hierarchical Geometry Learning (HGL) framework. HGL comprises three complementary modules from local, global to temporal learning in a bottom-up manner.Technically, we first construct a local geometry learning module for pseudo-label generation. Next, we build prototypes from the global geometry perspective for pseudo-label fine-tuning. Furthermore, we introduce a temporal consistency regularization module to mitigate negative transfer. Extensive experiments on four datasets demonstrate the effectiveness and superiority of our HGL. Remarkably, on the SynLiDAR to SemanticKITTI task, HGL achieves an overall mIoU of 46.91\%, improving GIPSO by 3.0\% and significantly reducing the required adaptation time by 80\%. The code is available at https://github.com/tpzou/HGL.
△ Less
Submitted 17 July, 2024;
originally announced July 2024.
-
An eight-neuron network for quadruped locomotion with hip-knee joint control
Authors:
Yide Liu,
Xiyan Liu,
Dongqi Wang,
Wei Yang,
shaoxing Qu
Abstract:
The gait generator, which is capable of producing rhythmic signals for coordinating multiple joints, is an essential component in the quadruped robot locomotion control framework. The biological counterpart of the gait generator is the Central Pattern Generator (abbreviated as CPG), a small neural network consisting of interacting neurons. Inspired by this architecture, researchers have designed a…
▽ More
The gait generator, which is capable of producing rhythmic signals for coordinating multiple joints, is an essential component in the quadruped robot locomotion control framework. The biological counterpart of the gait generator is the Central Pattern Generator (abbreviated as CPG), a small neural network consisting of interacting neurons. Inspired by this architecture, researchers have designed artificial neural networks composed of simulated neurons or oscillator equations. Despite the widespread application of these designed CPGs in various robot locomotion controls, some issues remain unaddressed, including: (1) Simplistic network designs often overlook the symmetry between signal and network structure, resulting in fewer gait patterns than those found in nature. (2) Due to minimal architectural consideration, quadruped control CPGs typically consist of only four neurons, which restricts the network's direct control to leg phases rather than joint coordination. (3) Gait changes are achieved by varying the neuron couplings or the assignment between neurons and legs, rather than through external stimulation. We apply symmetry theory to design an eight-neuron network, composed of Stein neuronal models, capable of achieving five gaits and coordinated control of the hip-knee joints. We validate the signal stability of this network as a gait generator through numerical simulations, which reveal various results and patterns encountered during gait transitions using neuronal stimulation. Based on these findings, we have developed several successful gait transition strategies through neuronal stimulations. Using a commercial quadruped robot model, we demonstrate the usability and feasibility of this network by implementing motion control and gait transitions.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
ELF-UA: Efficient Label-Free User Adaptation in Gaze Estimation
Authors:
Yong Wu,
Yang Wang,
Sanqing Qu,
Zhijun Li,
Guang Chen
Abstract:
We consider the problem of user-adaptive 3D gaze estimation. The performance of person-independent gaze estimation is limited due to interpersonal anatomical differences. Our goal is to provide a personalized gaze estimation model specifically adapted to a target user. Previous work on user-adaptive gaze estimation requires some labeled images of the target person data to fine-tune the model at te…
▽ More
We consider the problem of user-adaptive 3D gaze estimation. The performance of person-independent gaze estimation is limited due to interpersonal anatomical differences. Our goal is to provide a personalized gaze estimation model specifically adapted to a target user. Previous work on user-adaptive gaze estimation requires some labeled images of the target person data to fine-tune the model at test time. However, this can be unrealistic in real-world applications, since it is cumbersome for an end-user to provide labeled images. In addition, previous work requires the training data to have both gaze labels and person IDs. This data requirement makes it infeasible to use some of the available data. To tackle these challenges, this paper proposes a new problem called efficient label-free user adaptation in gaze estimation. Our model only needs a few unlabeled images of a target user for the model adaptation. During offline training, we have some labeled source data without person IDs and some unlabeled person-specific data. Our proposed method uses a meta-learning approach to learn how to adapt to a new user with only a few unlabeled images. Our key technical innovation is to use a generalization bound from domain adaptation to define the loss function in meta-learning, so that our method can effectively make use of both the labeled source data and the unlabeled person-specific data during training. Extensive experiments validate the effectiveness of our method on several challenging benchmarks.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
Authors:
Ge Zhang,
Scott Qu,
Jiaheng Liu,
Chenchen Zhang,
Chenghua Lin,
Chou Leuang Yu,
Danny Pan,
Esther Cheng,
Jie Liu,
Qunshu Lin,
Raven Yuan,
Tuney Zheng,
Wei Pang,
Xinrun Du,
Yiming Liang,
Yinghao Ma,
Yizhi Li,
Ziyang Ma,
Bill Lin,
Emmanouil Benetos,
Huan Yang,
Junting Zhou,
Kaijing Ma,
Minghao Liu,
Morry Niu
, et al. (20 additional authors not shown)
Abstract:
Large Language Models (LLMs) have made great strides in recent years to achieve unprecedented performance across different tasks. However, due to commercial interest, the most competitive models like GPT, Gemini, and Claude have been gated behind proprietary interfaces without disclosing the training details. Recently, many institutions have open-sourced several strong LLMs like LLaMA-3, comparabl…
▽ More
Large Language Models (LLMs) have made great strides in recent years to achieve unprecedented performance across different tasks. However, due to commercial interest, the most competitive models like GPT, Gemini, and Claude have been gated behind proprietary interfaces without disclosing the training details. Recently, many institutions have open-sourced several strong LLMs like LLaMA-3, comparable to existing closed-source LLMs. However, only the model's weights are provided with most details (e.g., intermediate checkpoints, pre-training corpus, and training code, etc.) being undisclosed. To improve the transparency of LLMs, the research community has formed to open-source truly open LLMs (e.g., Pythia, Amber, OLMo), where more details (e.g., pre-training corpus and training code) are being provided. These models have greatly advanced the scientific study of these large models including their strengths, weaknesses, biases and risks. However, we observe that the existing truly open LLMs on reasoning, knowledge, and coding tasks are still inferior to existing state-of-the-art LLMs with similar model sizes. To this end, we open-source MAP-Neo, a highly capable and transparent bilingual language model with 7B parameters trained from scratch on 4.5T high-quality tokens. Our MAP-Neo is the first fully open-sourced bilingual LLM with comparable performance compared to existing state-of-the-art LLMs. Moreover, we open-source all details to reproduce our MAP-Neo, where the cleaned pre-training corpus, data cleaning pipeline, checkpoints, and well-optimized training/evaluation framework are provided. Finally, we hope our MAP-Neo will enhance and strengthen the open research community and inspire more innovations and creativities to facilitate the further improvements of LLMs.
△ Less
Submitted 10 July, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
Multi-Task Learning for Fatigue Detection and Face Recognition of Drivers via Tree-Style Space-Channel Attention Fusion Network
Authors:
Shulei Qu,
Zhenguo Gao,
Xiaowei Chen,
Na Li,
Yakai Wang,
Xiaoxiao Wu
Abstract:
In driving scenarios, automobile active safety systems are increasingly incorporating deep learning technology. These systems typically need to handle multiple tasks simultaneously, such as detecting fatigue driving and recognizing the driver's identity. However, the traditional parallel-style approach of combining multiple single-task models tends to waste resources when dealing with similar task…
▽ More
In driving scenarios, automobile active safety systems are increasingly incorporating deep learning technology. These systems typically need to handle multiple tasks simultaneously, such as detecting fatigue driving and recognizing the driver's identity. However, the traditional parallel-style approach of combining multiple single-task models tends to waste resources when dealing with similar tasks. Therefore, we propose a novel tree-style multi-task modeling approach for multi-task learning, which rooted at a shared backbone, more dedicated separate module branches are appended as the model pipeline goes deeper. Following the tree-style approach, we propose a multi-task learning model for simultaneously performing driver fatigue detection and face recognition for identifying a driver. This model shares a common feature extraction backbone module, with further separated feature extraction and classification module branches. The dedicated branches exploit and combine spatial and channel attention mechanisms to generate space-channel fused-attention enhanced features, leading to improved detection performance. As only single-task datasets are available, we introduce techniques including alternating updation and gradient accumulation for training our multi-task model using only the single-task datasets. The effectiveness of our tree-style multi-task learning model is verified through extensive validations.
△ Less
Submitted 13 May, 2024;
originally announced May 2024.
-
Support-Query Prototype Fusion Network for Few-shot Medical Image Segmentation
Authors:
Xiaoxiao Wu,
Zhenguo Gao,
Xiaowei Chen,
Yakai Wang,
Shulei Qu,
Na Li
Abstract:
In recent years, deep learning based on Convolutional Neural Networks (CNNs) has achieved remarkable success in many applications. However, their heavy reliance on extensive labeled data and limited generalization ability to unseen classes pose challenges to their suitability for medical image processing tasks. Few-shot learning, which utilizes a small amount of labeled data to generalize to unsee…
▽ More
In recent years, deep learning based on Convolutional Neural Networks (CNNs) has achieved remarkable success in many applications. However, their heavy reliance on extensive labeled data and limited generalization ability to unseen classes pose challenges to their suitability for medical image processing tasks. Few-shot learning, which utilizes a small amount of labeled data to generalize to unseen classes, has emerged as a critical research area, attracting substantial attention. Currently, most studies employ a prototype-based approach, in which prototypical networks are used to construct prototypes from the support set, guiding the processing of the query set to obtain the final results. While effective, this approach heavily relies on the support set while neglecting the query set, resulting in notable disparities within the model classes. To mitigate this drawback, we propose a novel Support-Query Prototype Fusion Network (SQPFNet). SQPFNet initially generates several support prototypes for the foreground areas of the support images, thus producing a coarse segmentation mask. Subsequently, a query prototype is constructed based on the coarse segmentation mask, additionally exploiting pattern information in the query set. Thus, SQPFNet constructs high-quality support-query fused prototypes, upon which the query image is segmented to obtain the final refined query mask. Evaluation results on two public datasets, SABS and CMR, show that SQPFNet achieves state-of-the-art performance.
△ Less
Submitted 13 May, 2024;
originally announced May 2024.
-
Now Let's Make It Physical: Enabling Physically Trusted Certificate Issuance for Keyless Security in CAs
Authors:
Xiaolin Zhang,
Chenghao Chen,
Kailun Qin,
Yuxuan Wang,
Shipei Qu,
Tengfei Wang,
Chi Zhang,
Dawu Gu
Abstract:
The signing key protection of Certificate Authorities (CAs) remains a critical challenge in PKI. Traditional approaches struggle to eliminate the risk of key exposure due to those (un)intentional human errors. This long-standing dilemma motivates us to propose Armored Core, a novel PKI security extension using the trusted binding of Physically Unclonable Function (PUF) for CAs. PUFs leverage manuf…
▽ More
The signing key protection of Certificate Authorities (CAs) remains a critical challenge in PKI. Traditional approaches struggle to eliminate the risk of key exposure due to those (un)intentional human errors. This long-standing dilemma motivates us to propose Armored Core, a novel PKI security extension using the trusted binding of Physically Unclonable Function (PUF) for CAs. PUFs leverage manufacturing variations to generate unique and random responses. Combining with XOR and hash, they can make key exposure impossible for CAs through keyless certificate issuance.
In Armored Core, we design a set of PUF-based X.509v3 certificate functions for CAs to generate physically trusted "signatures" without using a digital key. Moreover, we introduce a novel PUF transparency mechanism to effectively monitor the PUF operations in CAs. We integrate Armored Core into real-world PKI systems including Let's Encrypt Pebble and Certbot. We also provide a PUF-embedded hardware prototype. The evaluation results show that Armored Core can achieve keyless certificate issuance while improving the computation performance by 4.9%~73.7%. It only incurs small communication and storage overhead (<4%).
△ Less
Submitted 13 January, 2025; v1 submitted 23 April, 2024;
originally announced April 2024.
-
Deep Reinforcement Learning Based Toolpath Generation for Thermal Uniformity in Laser Powder Bed Fusion Process
Authors:
Mian Qin,
Junhao Ding,
Shuo Qu,
Xu Song,
Charlie C. L. Wang,
Wei-Hsin Liao
Abstract:
Laser powder bed fusion (LPBF) is a widely used metal additive manufacturing technology. However, the accumulation of internal residual stress during printing can cause significant distortion and potential failure. Although various scan patterns have been studied to reduce possible accumulated stress, such as zigzag scanning vectors with changing directions or a chessboard-based scan pattern with…
▽ More
Laser powder bed fusion (LPBF) is a widely used metal additive manufacturing technology. However, the accumulation of internal residual stress during printing can cause significant distortion and potential failure. Although various scan patterns have been studied to reduce possible accumulated stress, such as zigzag scanning vectors with changing directions or a chessboard-based scan pattern with divided small islands, most conventional scan patterns cannot significantly reduce residual stress. The proposed adaptive toolpath generation (ATG) algorithms, aiming to minimize the thermal gradients, may result in extremely accumulated temperature fields in some cases. To address these issues, we developed a deep reinforcement learning (DRL)-based toolpath generation framework, with the goal of achieving uniformly distributed heat and avoiding extremely thermal accumulation regions during the LPBF process. We first developed an overall pipeline for the DRL-based toolpath generation framework, which includes uniformly sampling, agent moving and environment observation, action selection, moving constraints, rewards calculation, and the training process. To accelerate the training process, we simplified the data-intensive numerical model by considering the turning angles on the toolpath. We designed the action spaces with three options, including the minimum temperature value, the smoothest path, and the second smoothest path. The reward function was designed to minimize energy density to ensure the temperature field remains relatively stable. To verify the effectiveness of the proposed DRL-based toolpath generation framework, we performed numerical simulations of polygon shape printing domains. In addition, four groups of thin plate samples with different scan patterns were compared using the LPBF process.
△ Less
Submitted 16 February, 2024;
originally announced April 2024.
-
GLC++: Source-Free Universal Domain Adaptation through Global-Local Clustering and Contrastive Affinity Learning
Authors:
Sanqing Qu,
Tianpei Zou,
Florian Röhrbein,
Cewu Lu,
Guang Chen,
Dacheng Tao,
Changjun Jiang
Abstract:
Deep neural networks often exhibit sub-optimal performance under covariate and category shifts. Source-Free Domain Adaptation (SFDA) presents a promising solution to this dilemma, yet most SFDA approaches are restricted to closed-set scenarios. In this paper, we explore Source-Free Universal Domain Adaptation (SF-UniDA) aiming to accurately classify "known" data belonging to common categories and…
▽ More
Deep neural networks often exhibit sub-optimal performance under covariate and category shifts. Source-Free Domain Adaptation (SFDA) presents a promising solution to this dilemma, yet most SFDA approaches are restricted to closed-set scenarios. In this paper, we explore Source-Free Universal Domain Adaptation (SF-UniDA) aiming to accurately classify "known" data belonging to common categories and segregate them from target-private "unknown" data. We propose a novel Global and Local Clustering (GLC) technique, which comprises an adaptive one-vs-all global clustering algorithm to discern between target classes, complemented by a local k-NN clustering strategy to mitigate negative transfer. Despite the effectiveness, the inherent closed-set source architecture leads to uniform treatment of "unknown" data, impeding the identification of distinct "unknown" categories. To address this, we evolve GLC to GLC++, integrating a contrastive affinity learning strategy. We examine the superiority of GLC and GLC++ across multiple benchmarks and category shift scenarios. Remarkably, in the most challenging open-partial-set scenarios, GLC and GLC++ surpass GATE by 16.7% and 18.6% in H-score on VisDA, respectively. GLC++ enhances the novel category clustering accuracy of GLC by 4.3% in open-set scenarios on Office-Home. Furthermore, the introduced contrastive learning strategy not only enhances GLC but also significantly facilitates existing methodologies.
△ Less
Submitted 21 March, 2024;
originally announced March 2024.
-
MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection
Authors:
Boyang Peng,
Sanqing Qu,
Yong Wu,
Tianpei Zou,
Lianghua He,
Alois Knoll,
Guang Chen,
changjun jiang
Abstract:
Deep learning has achieved remarkable progress in various applications, heightening the importance of safeguarding the intellectual property (IP) of well-trained models. It entails not only authorizing usage but also ensuring the deployment of models in authorized data domains, i.e., making models exclusive to certain target domains. Previous methods necessitate concurrent access to source trainin…
▽ More
Deep learning has achieved remarkable progress in various applications, heightening the importance of safeguarding the intellectual property (IP) of well-trained models. It entails not only authorizing usage but also ensuring the deployment of models in authorized data domains, i.e., making models exclusive to certain target domains. Previous methods necessitate concurrent access to source training data and target unauthorized data when performing IP protection, making them risky and inefficient for decentralized private data. In this paper, we target a practical setting where only a well-trained source model is available and investigate how we can realize IP protection. To achieve this, we propose a novel MAsk Pruning (MAP) framework. MAP stems from an intuitive hypothesis, i.e., there are target-related parameters in a well-trained model, locating and pruning them is the key to IP protection. Technically, MAP freezes the source model and learns a target-specific binary mask to prevent unauthorized data usage while minimizing performance degradation on authorized data. Moreover, we introduce a new metric aimed at achieving a better balance between source and target performance degradation. To verify the effectiveness and versatility, we have evaluated MAP in a variety of scenarios, including vanilla source-available, practical source-free, and challenging data-free. Extensive experiments indicate that MAP yields new state-of-the-art performance.
△ Less
Submitted 6 March, 2024;
originally announced March 2024.
-
LEAD: Learning Decomposition for Source-free Universal Domain Adaptation
Authors:
Sanqing Qu,
Tianpei Zou,
Lianghua He,
Florian Röhrbein,
Alois Knoll,
Guang Chen,
Changjun Jiang
Abstract:
Universal Domain Adaptation (UniDA) targets knowledge transfer in the presence of both covariate and label shifts. Recently, Source-free Universal Domain Adaptation (SF-UniDA) has emerged to achieve UniDA without access to source data, which tends to be more practical due to data protection policies. The main challenge lies in determining whether covariate-shifted samples belong to target-private…
▽ More
Universal Domain Adaptation (UniDA) targets knowledge transfer in the presence of both covariate and label shifts. Recently, Source-free Universal Domain Adaptation (SF-UniDA) has emerged to achieve UniDA without access to source data, which tends to be more practical due to data protection policies. The main challenge lies in determining whether covariate-shifted samples belong to target-private unknown categories. Existing methods tackle this either through hand-crafted thresholding or by developing time-consuming iterative clustering strategies. In this paper, we propose a new idea of LEArning Decomposition (LEAD), which decouples features into source-known and -unknown components to identify target-private data. Technically, LEAD initially leverages the orthogonal decomposition analysis for feature decomposition. Then, LEAD builds instance-level decision boundaries to adaptively identify target-private data. Extensive experiments across various UniDA scenarios have demonstrated the effectiveness and superiority of LEAD. Notably, in the OPDA scenario on VisDA dataset, LEAD outperforms GLC by 3.5% overall H-score and reduces 75% time to derive pseudo-labeling decision boundaries. Besides, LEAD is also appealing in that it is complementary to most existing methods. The code is available at https://github.com/ispc-lab/LEAD.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
PCDepth: Pattern-based Complementary Learning for Monocular Depth Estimation by Best of Both Worlds
Authors:
Haotian Liu,
Sanqing Qu,
Fan Lu,
Zongtao Bu,
Florian Roehrbein,
Alois Knoll,
Guang Chen
Abstract:
Event cameras can record scene dynamics with high temporal resolution, providing rich scene details for monocular depth estimation (MDE) even at low-level illumination. Therefore, existing complementary learning approaches for MDE fuse intensity information from images and scene details from event data for better scene understanding. However, most methods directly fuse two modalities at pixel leve…
▽ More
Event cameras can record scene dynamics with high temporal resolution, providing rich scene details for monocular depth estimation (MDE) even at low-level illumination. Therefore, existing complementary learning approaches for MDE fuse intensity information from images and scene details from event data for better scene understanding. However, most methods directly fuse two modalities at pixel level, ignoring that the attractive complementarity mainly impacts high-level patterns that only occupy a few pixels. For example, event data is likely to complement contours of scene objects. In this paper, we discretize the scene into a set of high-level patterns to explore the complementarity and propose a Pattern-based Complementary learning architecture for monocular Depth estimation (PCDepth). Concretely, PCDepth comprises two primary components: a complementary visual representation learning module for discretizing the scene into high-level patterns and integrating complementary patterns across modalities and a refined depth estimator aimed at scene reconstruction and depth prediction while maintaining an efficiency-accuracy balance. Through pattern-based complementary learning, PCDepth fully exploits two modalities and achieves more accurate predictions than existing methods, especially in challenging nighttime scenarios. Extensive experiments on MVSEC and DSEC datasets verify the effectiveness and superiority of our PCDepth. Remarkably, compared with state-of-the-art, PCDepth achieves a 37.9% improvement in accuracy in MVSEC nighttime scenarios.
△ Less
Submitted 29 February, 2024;
originally announced February 2024.
-
RENOVI: A Benchmark Towards Remediating Norm Violations in Socio-Cultural Conversations
Authors:
Haolan Zhan,
Zhuang Li,
Xiaoxi Kang,
Tao Feng,
Yuncheng Hua,
Lizhen Qu,
Yi Ying,
Mei Rianto Chandra,
Kelly Rosalin,
Jureynolds Jureynolds,
Suraj Sharma,
Shilin Qu,
Linhao Luo,
Lay-Ki Soon,
Zhaleh Semnani Azad,
Ingrid Zukerman,
Gholamreza Haffari
Abstract:
Norm violations occur when individuals fail to conform to culturally accepted behaviors, which may lead to potential conflicts. Remediating norm violations requires social awareness and cultural sensitivity of the nuances at play. To equip interactive AI systems with a remediation ability, we offer ReNoVi - a large-scale corpus of 9,258 multi-turn dialogues annotated with social norms, as well as…
▽ More
Norm violations occur when individuals fail to conform to culturally accepted behaviors, which may lead to potential conflicts. Remediating norm violations requires social awareness and cultural sensitivity of the nuances at play. To equip interactive AI systems with a remediation ability, we offer ReNoVi - a large-scale corpus of 9,258 multi-turn dialogues annotated with social norms, as well as define a sequence of tasks to help understand and remediate norm violations step by step. ReNoVi consists of two parts: 512 human-authored dialogues (real data), and 8,746 synthetic conversations generated by ChatGPT through prompt learning. While collecting sufficient human-authored data is costly, synthetic conversations provide suitable amounts of data to help mitigate the scarcity of training data, as well as the chance to assess the alignment between LLMs and humans in the awareness of social norms. We thus harness the power of ChatGPT to generate synthetic training data for our task. To ensure the quality of both human-authored and synthetic data, we follow a quality control protocol during data collection. Our experimental results demonstrate the importance of remediating norm violations in socio-cultural conversations, as well as the improvement in performance obtained from synthetic data.
△ Less
Submitted 16 February, 2024;
originally announced February 2024.
-
Teamwork Makes TEE Work: Open and Resilient Remote Attestation on Decentralized Trust
Authors:
Xiaolin Zhang,
Kailun Qin,
Shipei Qu,
Tengfei Wang,
Chi Zhang,
Dawu Gu
Abstract:
Remote Attestation (RA) enables the integrity and authenticity of applications in Trusted Execution Environment (TEE) to be verified. Existing TEE RA designs employ a centralized trust model where they rely on a single provisioned secret key and a centralized verifier to establish trust for remote parties. This model is however brittle and can be untrusted under advanced attacks nowadays. Besides,…
▽ More
Remote Attestation (RA) enables the integrity and authenticity of applications in Trusted Execution Environment (TEE) to be verified. Existing TEE RA designs employ a centralized trust model where they rely on a single provisioned secret key and a centralized verifier to establish trust for remote parties. This model is however brittle and can be untrusted under advanced attacks nowadays. Besides, most designs only have fixed procedures once deployed, making them hard to adapt to different emerging situations and provide resilient functionalities.
Therefore, we propose JANUS, an open and resilient TEE RA scheme. To decentralize trust, we, on one hand, introduce Physically Unclonable Function (PUF) as an intrinsic root of trust (RoT) in TEE to directly provide physical trusted measurements. On the other hand, we design novel decentralized verification functions on smart contract with result audits and RA session snapshot. Furthermore, we design an automated switch mechanism that allows JANUS to remain resilient and offer flexible RA services under various situations. We provide a UC-based security proof and demonstrate the scalability and generality of JANUS by implementing an complete prototype.
△ Less
Submitted 9 August, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators
Authors:
Songyun Qu,
Shixin Zhao,
Bing Li,
Yintao He,
Xuyi Cai,
Lei Zhang,
Ying Wang
Abstract:
In recent years, various computing-in-memory (CIM) processors have been presented, showing superior performance over traditional architectures. To unleash the potential of various CIM architectures, such as device precision, crossbar size, and crossbar number, it is necessary to develop compilation tools that are fully aware of the CIM architectural details and implementation diversity. However, d…
▽ More
In recent years, various computing-in-memory (CIM) processors have been presented, showing superior performance over traditional architectures. To unleash the potential of various CIM architectures, such as device precision, crossbar size, and crossbar number, it is necessary to develop compilation tools that are fully aware of the CIM architectural details and implementation diversity. However, due to the lack of architectural support in current popular open-source compiling stacks, existing CIM designs either manually deploy networks or build their own compilers, which is time-consuming and labor-intensive. Although some works expose the specific CIM device programming interfaces to compilers, they are often bound to a fixed CIM architecture, lacking the flexibility to support the CIM architectures with different computing granularity. On the other hand, existing compilation works usually consider the scheduling of limited operation types (such as crossbar-bound matrix-vector multiplication). Unlike conventional processors, CIM accelerators are featured by their diverse architecture, circuit, and device, which cannot be simply abstracted by a single level if we seek to fully explore the advantages brought by CIM. Therefore, we propose CIM-MLC, a universal multi-level compilation framework for general CIM architectures. We first establish a general hardware abstraction for CIM architectures and computing modes to represent various CIM accelerators. Based on the proposed abstraction, CIM-MLC can compile tasks onto a wide range of CIM accelerators having different devices, architectures, and programming interfaces. More importantly, compared with existing compilation work, CIM-MLC can explore the mapping and scheduling strategies across multiple architectural tiers, which form a tractable yet effective design space, to achieve better scheduling and instruction generation results.
△ Less
Submitted 8 May, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
Multi-Attention Fusion Drowsy Driving Detection Model
Authors:
Shulei QU,
Zhenguo Gao,
Xiaoxiao Wu,
Yuanyuan Qiu
Abstract:
Drowsy driving represents a major contributor to traffic accidents, and the implementation of driver drowsy driving detection systems has been proven to significantly reduce the occurrence of such accidents. Despite the development of numerous drowsy driving detection algorithms, many of them impose specific prerequisites such as the availability of complete facial images, optimal lighting conditi…
▽ More
Drowsy driving represents a major contributor to traffic accidents, and the implementation of driver drowsy driving detection systems has been proven to significantly reduce the occurrence of such accidents. Despite the development of numerous drowsy driving detection algorithms, many of them impose specific prerequisites such as the availability of complete facial images, optimal lighting conditions, and the use of RGB images. In our study, we introduce a novel approach called the Multi-Attention Fusion Drowsy Driving Detection Model (MAF). MAF is aimed at significantly enhancing classification performance, especially in scenarios involving partial facial occlusion and low lighting conditions. It accomplishes this by capitalizing on the local feature extraction capabilities provided by multi-attention fusion, thereby enhancing the algorithm's overall robustness. To enhance our dataset, we collected real-world data that includes both occluded and unoccluded faces captured under nighttime and daytime lighting conditions. We conducted a comprehensive series of experiments using both publicly available datasets and our self-built data. The results of these experiments demonstrate that our proposed model achieves an impressive driver drowsiness detection accuracy of 96.8%.
△ Less
Submitted 28 December, 2023;
originally announced December 2023.
-
Hypergraph Node Representation Learning with One-Stage Message Passing
Authors:
Shilin Qu,
Weiqing Wang,
Yuan-Fang Li,
Xin Zhou,
Fajie Yuan
Abstract:
Hypergraphs as an expressive and general structure have attracted considerable attention from various research domains. Most existing hypergraph node representation learning techniques are based on graph neural networks, and thus adopt the two-stage message passing paradigm (i.e. node -> hyperedge -> node). This paradigm only focuses on local information propagation and does not effectively take i…
▽ More
Hypergraphs as an expressive and general structure have attracted considerable attention from various research domains. Most existing hypergraph node representation learning techniques are based on graph neural networks, and thus adopt the two-stage message passing paradigm (i.e. node -> hyperedge -> node). This paradigm only focuses on local information propagation and does not effectively take into account global information, resulting in less optimal representations. Our theoretical analysis of representative two-stage message passing methods shows that, mathematically, they model different ways of local message passing through hyperedges, and can be unified into one-stage message passing (i.e. node -> node). However, they still only model local information. Motivated by this theoretical analysis, we propose a novel one-stage message passing paradigm to model both global and local information propagation for hypergraphs. We integrate this paradigm into HGraphormer, a Transformer-based framework for hypergraph node representation learning. HGraphormer injects the hypergraph structure information (local information) into Transformers (global information) by combining the attention matrix and hypergraph Laplacian. Extensive experiments demonstrate that HGraphormer outperforms recent hypergraph learning methods on five representative benchmark datasets on the semi-supervised hypernode classification task, setting new state-of-the-art performance, with accuracy improvements between 2.52% and 6.70%. Our code and datasets are available.
△ Less
Submitted 30 November, 2023;
originally announced December 2023.
-
Abusing Processor Exception for General Binary Instrumentation on Bare-metal Embedded Devices
Authors:
Shipei Qu,
Xiaolin Zhang,
Chi Zhang,
Dawu Gu
Abstract:
Analyzing the security of closed-source drivers and libraries in embedded systems holds significant importance, given their fundamental role in the supply chain. Unlike x86, embedded platforms lack comprehensive binary manipulating tools, making it difficult for researchers and developers to effectively detect and patch security issues in such closed-source components. Existing works either depend…
▽ More
Analyzing the security of closed-source drivers and libraries in embedded systems holds significant importance, given their fundamental role in the supply chain. Unlike x86, embedded platforms lack comprehensive binary manipulating tools, making it difficult for researchers and developers to effectively detect and patch security issues in such closed-source components. Existing works either depend on full-fledged operating system features or suffer from tedious corner cases, restricting their application to bare-metal firmware prevalent in embedded environments. In this paper, we present PIFER (Practical Instrumenting Framework for Embedded fiRmware) that enables general and fine-grained static binary instrumentation for embedded bare-metal firmware. By abusing the built-in hardware exception-handling mechanism of the embedded processors, PIFER can perform instrumentation on arbitrary target addresses. Additionally, We propose an instruction translation-based scheme to guarantee the correct execution of the original firmware after patching. We evaluate PIFER against real-world, complex firmware, including Zephyr RTOS, CoreMark benchmark, and a close-sourced commercial product. The results indicate that PIFER correctly instrumented 98.9% of the instructions. Further, a comprehensive performance evaluation was conducted, demonstrating the practicality and efficiency of our work.
△ Less
Submitted 23 April, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
AI-accelerated Discovery of Altermagnetic Materials
Authors:
Ze-Feng Gao,
Shuai Qu,
Bocheng Zeng,
Yang Liu,
Ji-Rong Wen,
Hao Sun,
Peng-Jie Guo,
Zhong-Yi Lu
Abstract:
Altermagnetism, a new magnetic phase, has been theoretically proposed and experimentally verified to be distinct from ferromagnetism and antiferromagnetism. Although altermagnets have been found to possess many exotic physical properties, the limited availability of known altermagnetic materials hinders the study of such properties. Hence, discovering more types of altermagnetic materials with dif…
▽ More
Altermagnetism, a new magnetic phase, has been theoretically proposed and experimentally verified to be distinct from ferromagnetism and antiferromagnetism. Although altermagnets have been found to possess many exotic physical properties, the limited availability of known altermagnetic materials hinders the study of such properties. Hence, discovering more types of altermagnetic materials with different properties is crucial for a comprehensive understanding of altermagnetism and thus facilitating new applications in the next generation information technologies, e.g., storage devices and high-sensitivity sensors. Since each altermagnetic material has a unique crystal structure, we propose an automated discovery approach empowered by an AI search engine that employs a pre-trained graph neural network to learn the intrinsic features of the material crystal structure, followed by fine-tuning a classifier with limited positive samples to predict the altermagnetism probability of a given material candidate. Finally, we successfully discovered 50 new altermagnetic materials that cover metals, semiconductors, and insulators confirmed by the first-principles electronic structure calculations. The wide range of electronic structural characteristics reveals that various novel physical properties manifest in these newly discovered altermagnetic materials, e.g., anomalous Hall effect, anomalous Kerr effect, and topological property. Noteworthy, we discovered 4 $i$-wave altermagnetic materials for the first time. Overall, the AI search engine performs much better than human experts and suggests a set of new altermagnetic materials with unique properties, outlining its potential for accelerated discovery of the materials with targeted properties.
△ Less
Submitted 13 May, 2025; v1 submitted 7 November, 2023;
originally announced November 2023.
-
Curriculum-Driven Edubot: A Framework for Developing Language Learning Chatbots Through Synthesizing Conversational Data
Authors:
Yu Li,
Shang Qu,
Jili Shen,
Shangchao Min,
Zhou Yu
Abstract:
Chatbots have become popular in educational settings, revolutionizing how students interact with material and how teachers teach. We present Curriculum-Driven EduBot, a framework for developing a chatbot that combines the interactive features of chatbots with the systematic material of English textbooks to assist students in enhancing their conversational skills. We begin by extracting pertinent t…
▽ More
Chatbots have become popular in educational settings, revolutionizing how students interact with material and how teachers teach. We present Curriculum-Driven EduBot, a framework for developing a chatbot that combines the interactive features of chatbots with the systematic material of English textbooks to assist students in enhancing their conversational skills. We begin by extracting pertinent topics from textbooks and using large language models to generate dialogues related to these topics. We then fine-tune an open-source model using our generated conversational data to create our curriculum-driven chatbot. User studies demonstrate that EduBot outperforms ChatGPT in leading curriculum-based dialogues and adapting its dialogue to match the user's English proficiency level. By combining traditional textbook methodologies with conversational AI, our approach offers learners an interactive tool that aligns with their curriculum and provides user-tailored conversation practice. This facilitates meaningful student-bot dialogues and enriches the overall learning experience within the curriculum's pedagogical framework.
△ Less
Submitted 3 August, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
-
Reformulating Sequential Recommendation: Learning Dynamic User Interest with Content-enriched Language Modeling
Authors:
Junzhe Jiang,
Shang Qu,
Mingyue Cheng,
Qi Liu,
Zhiding Liu,
Hao Zhang,
Rujiao Zhang,
Kai Zhang,
Rui Li,
Jiatong Li,
Min Gao
Abstract:
Recommender systems are indispensable in the realm of online applications, and sequential recommendation has enjoyed considerable prevalence due to its capacity to encapsulate the dynamic shifts in user interests. However, previous sequential modeling methods still have limitations in capturing contextual information. The primary reason is the lack of understanding of domain-specific knowledge and…
▽ More
Recommender systems are indispensable in the realm of online applications, and sequential recommendation has enjoyed considerable prevalence due to its capacity to encapsulate the dynamic shifts in user interests. However, previous sequential modeling methods still have limitations in capturing contextual information. The primary reason is the lack of understanding of domain-specific knowledge and item-related textual content. Fortunately, the emergence of powerful language models has unlocked the potential to incorporate extensive world knowledge into recommendation algorithms, enabling them to go beyond simple item attributes and truly understand the world surrounding user preferences. To achieve this, we propose LANCER, which leverages the semantic understanding capabilities of pre-trained language models to generate personalized recommendations. Our approach bridges the gap between language models and recommender systems, resulting in more human-like recommendations. We demonstrate the effectiveness of our approach through a series of experiments conducted on multiple benchmark datasets, showing promising results and providing valuable insights into the influence of our model on sequential recommendation tasks. Furthermore, our experimental codes are publicly available at https://github.com/Gnimixy/lancer.
△ Less
Submitted 13 April, 2024; v1 submitted 19 September, 2023;
originally announced September 2023.
-
SHAPNN: Shapley Value Regularized Tabular Neural Network
Authors:
Qisen Cheng,
Shuhui Qu,
Janghwan Lee
Abstract:
We present SHAPNN, a novel deep tabular data modeling architecture designed for supervised learning. Our approach leverages Shapley values, a well-established technique for explaining black-box models. Our neural network is trained using standard backward propagation optimization methods, and is regularized with realtime estimated Shapley values. Our method offers several advantages, including the…
▽ More
We present SHAPNN, a novel deep tabular data modeling architecture designed for supervised learning. Our approach leverages Shapley values, a well-established technique for explaining black-box models. Our neural network is trained using standard backward propagation optimization methods, and is regularized with realtime estimated Shapley values. Our method offers several advantages, including the ability to provide valid explanations with no computational overhead for data instances and datasets. Additionally, prediction with explanation serves as a regularizer, which improves the model's performance. Moreover, the regularized prediction enhances the model's capability for continual learning. We evaluate our method on various publicly available datasets and compare it with state-of-the-art deep neural network models, demonstrating the superior performance of SHAPNN in terms of AUROC, transparency, as well as robustness to streaming data.
△ Less
Submitted 15 September, 2023;
originally announced September 2023.
-
NormMark: A Weakly Supervised Markov Model for Socio-cultural Norm Discovery
Authors:
Farhad Moghimifar,
Shilin Qu,
Tongtong Wu,
Yuan-Fang Li,
Gholamreza Haffari
Abstract:
Norms, which are culturally accepted guidelines for behaviours, can be integrated into conversational models to generate utterances that are appropriate for the socio-cultural context. Existing methods for norm recognition tend to focus only on surface-level features of dialogues and do not take into account the interactions within a conversation. To address this issue, we propose NormMark, a prob…
▽ More
Norms, which are culturally accepted guidelines for behaviours, can be integrated into conversational models to generate utterances that are appropriate for the socio-cultural context. Existing methods for norm recognition tend to focus only on surface-level features of dialogues and do not take into account the interactions within a conversation. To address this issue, we propose NormMark, a probabilistic generative Markov model to carry the latent features throughout a dialogue. These features are captured by discrete and continuous latent variables conditioned on the conversation history, and improve the model's ability in norm recognition. The model is trainable on weakly annotated data using the variational technique. On a dataset with limited norm annotations, we show that our approach achieves higher F1 score, outperforming current state-of-the-art methods, including GPT3.
△ Less
Submitted 25 May, 2023;
originally announced May 2023.
-
PIE: Personalized Interest Exploration for Large-Scale Recommender Systems
Authors:
Khushhall Chandra Mahajan,
Amey Porobo Dharwadker,
Romil Shah,
Simeng Qu,
Gaurav Bang,
Brad Schumitsch
Abstract:
Recommender systems are increasingly successful in recommending personalized content to users. However, these systems often capitalize on popular content. There is also a continuous evolution of user interests that need to be captured, but there is no direct way to systematically explore users' interests. This also tends to affect the overall quality of the recommendation pipeline as training data…
▽ More
Recommender systems are increasingly successful in recommending personalized content to users. However, these systems often capitalize on popular content. There is also a continuous evolution of user interests that need to be captured, but there is no direct way to systematically explore users' interests. This also tends to affect the overall quality of the recommendation pipeline as training data is generated from the candidates presented to the user. In this paper, we present a framework for exploration in large-scale recommender systems to address these challenges. It consists of three parts, first the user-creator exploration which focuses on identifying the best creators that users are interested in, second the online exploration framework and third a feed composition mechanism that balances explore and exploit to ensure optimal prevalence of exploratory videos. Our methodology can be easily integrated into an existing large-scale recommender system with minimal modifications. We also analyze the value of exploration by defining relevant metrics around user-creator connections and understanding how this helps the overall recommendation pipeline with strong online gains in creator and ecosystem value. In contrast to the regression on user engagement metrics generally seen while exploring, our method is able to achieve significant improvements of 3.50% in strong creator connections and 0.85% increase in novel creator connections. Moreover, our work has been deployed in production on Facebook Watch, a popular video discovery and sharing platform serving billions of users.
△ Less
Submitted 13 April, 2023;
originally announced April 2023.
-
TMA: Temporal Motion Aggregation for Event-based Optical Flow
Authors:
Haotian Liu,
Guang Chen,
Sanqing Qu,
Yanping Zhang,
Zhijun Li,
Alois Knoll,
Changjun Jiang
Abstract:
Event cameras have the ability to record continuous and detailed trajectories of objects with high temporal resolution, thereby providing intuitive motion cues for optical flow estimation. Nevertheless, most existing learning-based approaches for event optical flow estimation directly remould the paradigm of conventional images by representing the consecutive event stream as static frames, ignorin…
▽ More
Event cameras have the ability to record continuous and detailed trajectories of objects with high temporal resolution, thereby providing intuitive motion cues for optical flow estimation. Nevertheless, most existing learning-based approaches for event optical flow estimation directly remould the paradigm of conventional images by representing the consecutive event stream as static frames, ignoring the inherent temporal continuity of event data. In this paper, we argue that temporal continuity is a vital element of event-based optical flow and propose a novel Temporal Motion Aggregation (TMA) approach to unlock its potential. Technically, TMA comprises three components: an event splitting strategy to incorporate intermediate motion information underlying the temporal context, a linear lookup strategy to align temporally fine-grained motion features and a novel motion pattern aggregation module to emphasize consistent patterns for motion feature enhancement. By incorporating temporally fine-grained motion information, TMA can derive better flow estimates than existing methods at early stages, which not only enables TMA to obtain more accurate final predictions, but also greatly reduces the demand for a number of refinements. Extensive experiments on DSEC-Flow and MVSEC datasets verify the effectiveness and superiority of our TMA. Remarkably, compared to E-RAFT, TMA achieves a 6\% improvement in accuracy and a 40\% reduction in inference time on DSEC-Flow. Code will be available at \url{https://github.com/ispc-lab/TMA}.
△ Less
Submitted 21 August, 2023; v1 submitted 21 March, 2023;
originally announced March 2023.
-
A Policy Iteration Approach for Flock Motion Control
Authors:
Shuzheng Qu,
Mohammed Abouheaf,
Wail Gueaieb,
Davide Spinello
Abstract:
The flocking motion control is concerned with managing the possible conflicts between local and team objectives of multi-agent systems. The overall control process guides the agents while monitoring the flock-cohesiveness and localization. The underlying mechanisms may degrade due to overlooking the unmodeled uncertainties associated with the flock dynamics and formation. On another side, the effi…
▽ More
The flocking motion control is concerned with managing the possible conflicts between local and team objectives of multi-agent systems. The overall control process guides the agents while monitoring the flock-cohesiveness and localization. The underlying mechanisms may degrade due to overlooking the unmodeled uncertainties associated with the flock dynamics and formation. On another side, the efficiencies of the various control designs rely on how quickly they can adapt to different dynamic situations in real-time. An online model-free policy iteration mechanism is developed here to guide a flock of agents to follow an independent command generator over a time-varying graph topology. The strength of connectivity between any two agents or the graph edge weight is decided using a position adjacency dependent function. An online recursive least squares approach is adopted to tune the guidance strategies without knowing the dynamics of the agents or those of the command generator. It is compared with another reinforcement learning approach from the literature which is based on a value iteration technique. The simulation results of the policy iteration mechanism revealed fast learning and convergence behaviors with less computational effort.
△ Less
Submitted 17 March, 2023;
originally announced March 2023.
-
An Adaptive Fuzzy Reinforcement Learning Cooperative Approach for the Autonomous Control of Flock Systems
Authors:
Shuzheng Qu,
Mohammed Abouheaf,
Wail Gueaieb,
Davide Spinello
Abstract:
The flock-guidance problem enjoys a challenging structure where multiple optimization objectives are solved simultaneously. This usually necessitates different control approaches to tackle various objectives, such as guidance, collision avoidance, and cohesion. The guidance schemes, in particular, have long suffered from complex tracking-error dynamics. Furthermore, techniques that are based on li…
▽ More
The flock-guidance problem enjoys a challenging structure where multiple optimization objectives are solved simultaneously. This usually necessitates different control approaches to tackle various objectives, such as guidance, collision avoidance, and cohesion. The guidance schemes, in particular, have long suffered from complex tracking-error dynamics. Furthermore, techniques that are based on linear feedback strategies obtained at equilibrium conditions either may not hold or degrade when applied to uncertain dynamic environments. Pre-tuned fuzzy inference architectures lack robustness under such unmodeled conditions. This work introduces an adaptive distributed technique for the autonomous control of flock systems. Its relatively flexible structure is based on online fuzzy reinforcement learning schemes which simultaneously target a number of objectives; namely, following a leader, avoiding collision, and reaching a flock velocity consensus. In addition to its resilience in the face of dynamic disturbances, the algorithm does not require more than the agent position as a feedback signal. The effectiveness of the proposed method is validated with two simulation scenarios and benchmarked against a similar technique from the literature.
△ Less
Submitted 17 March, 2023;
originally announced March 2023.
-
Modality-Agnostic Debiasing for Single Domain Generalization
Authors:
Sanqing Qu,
Yingwei Pan,
Guang Chen,
Ting Yao,
Changjun Jiang,
Tao Mei
Abstract:
Deep neural networks (DNNs) usually fail to generalize well to outside of distribution (OOD) data, especially in the extreme case of single domain generalization (single-DG) that transfers DNNs from single domain to multiple unseen domains. Existing single-DG techniques commonly devise various data-augmentation algorithms, and remould the multi-source domain generalization methodology to learn dom…
▽ More
Deep neural networks (DNNs) usually fail to generalize well to outside of distribution (OOD) data, especially in the extreme case of single domain generalization (single-DG) that transfers DNNs from single domain to multiple unseen domains. Existing single-DG techniques commonly devise various data-augmentation algorithms, and remould the multi-source domain generalization methodology to learn domain-generalized (semantic) features. Nevertheless, these methods are typically modality-specific, thereby being only applicable to one single modality (e.g., image). In contrast, we target a versatile Modality-Agnostic Debiasing (MAD) framework for single-DG, that enables generalization for different modalities. Technically, MAD introduces a novel two-branch classifier: a biased-branch encourages the classifier to identify the domain-specific (superficial) features, and a general-branch captures domain-generalized features based on the knowledge from biased-branch. Our MAD is appealing in view that it is pluggable to most single-DG models. We validate the superiority of our MAD in a variety of single-DG scenarios with different modalities, including recognition on 1D texts, 2D images, 3D point clouds, and semantic segmentation on 2D images. More remarkably, for recognition on 3D point clouds and semantic segmentation on 2D images, MAD improves DSU by 2.82\% and 1.5\% in accuracy and mIOU.
△ Less
Submitted 13 March, 2023;
originally announced March 2023.
-
Upcycling Models under Domain and Category Shift
Authors:
Sanqing Qu,
Tianpei Zou,
Florian Roehrbein,
Cewu Lu,
Guang Chen,
Dacheng Tao,
Changjun Jiang
Abstract:
Deep neural networks (DNNs) often perform poorly in the presence of domain shift and category shift. How to upcycle DNNs and adapt them to the target task remains an important open problem. Unsupervised Domain Adaptation (UDA), especially recently proposed Source-free Domain Adaptation (SFDA), has become a promising technology to address this issue. Nevertheless, existing SFDA methods require that…
▽ More
Deep neural networks (DNNs) often perform poorly in the presence of domain shift and category shift. How to upcycle DNNs and adapt them to the target task remains an important open problem. Unsupervised Domain Adaptation (UDA), especially recently proposed Source-free Domain Adaptation (SFDA), has become a promising technology to address this issue. Nevertheless, existing SFDA methods require that the source domain and target domain share the same label space, consequently being only applicable to the vanilla closed-set setting. In this paper, we take one step further and explore the Source-free Universal Domain Adaptation (SF-UniDA). The goal is to identify "known" data samples under both domain and category shift, and reject those "unknown" data samples (not present in source classes), with only the knowledge from standard pre-trained source model. To this end, we introduce an innovative global and local clustering learning technique (GLC). Specifically, we design a novel, adaptive one-vs-all global clustering algorithm to achieve the distinction across different target classes and introduce a local k-NN clustering strategy to alleviate negative transfer. We examine the superiority of our GLC on multiple benchmarks with different category shift scenarios, including partial-set, open-set, and open-partial-set DA. Remarkably, in the most challenging open-partial-set DA scenario, GLC outperforms UMAD by 14.8\% on the VisDA benchmark. The code is available at https://github.com/ispc-lab/GLC.
△ Less
Submitted 13 March, 2023;
originally announced March 2023.