-
ManipLVM-R1: Reinforcement Learning for Reasoning in Embodied Manipulation with Large Vision-Language Models
Authors:
Zirui Song,
Guangxian Ouyang,
Mingzhe Li,
Yuheng Ji,
Chenxi Wang,
Zixiang Xu,
Zeyu Zhang,
Xiaoqing Zhang,
Qian Jiang,
Zhenhao Chen,
Zhongzhi Li,
Rui Yan,
Xiuying Chen
Abstract:
Large Vision-Language Models (LVLMs) have recently advanced robotic manipulation by leveraging vision for scene perception and language for instruction following. However, existing methods rely heavily on costly human-annotated training datasets, which limits their generalization and causes them to struggle in out-of-domain (OOD) scenarios, reducing real-world adaptability. To address these challe…
▽ More
Large Vision-Language Models (LVLMs) have recently advanced robotic manipulation by leveraging vision for scene perception and language for instruction following. However, existing methods rely heavily on costly human-annotated training datasets, which limits their generalization and causes them to struggle in out-of-domain (OOD) scenarios, reducing real-world adaptability. To address these challenges, we propose ManipLVM-R1, a novel reinforcement learning framework that replaces traditional supervision with Reinforcement Learning using Verifiable Rewards (RLVR). By directly optimizing for task-aligned outcomes, our method enhances generalization and physical reasoning while removing the dependence on costly annotations. Specifically, we design two rule-based reward functions targeting key robotic manipulation subtasks: an Affordance Perception Reward to enhance localization of interaction regions, and a Trajectory Match Reward to ensure the physical plausibility of action paths. These rewards provide immediate feedback and impose spatial-logical constraints, encouraging the model to go beyond shallow pattern matching and instead learn deeper, more systematic reasoning about physical interactions.
△ Less
Submitted 24 May, 2025; v1 submitted 22 May, 2025;
originally announced May 2025.
-
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
Authors:
Zirui Song,
Qian Jiang,
Mingxuan Cui,
Mingzhe Li,
Lang Gao,
Zeyu Zhang,
Zixiang Xu,
Yanbo Wang,
Chenxi Wang,
Guangxian Ouyang,
Zhenhao Chen,
Xiuying Chen
Abstract:
The rise of Large Audio Language Models (LAMs) brings both potential and risks, as their audio outputs may contain harmful or unethical content. However, current research lacks a systematic, quantitative evaluation of LAM safety especially against jailbreak attacks, which are challenging due to the temporal and semantic nature of speech. To bridge this gap, we introduce AJailBench, the first bench…
▽ More
The rise of Large Audio Language Models (LAMs) brings both potential and risks, as their audio outputs may contain harmful or unethical content. However, current research lacks a systematic, quantitative evaluation of LAM safety especially against jailbreak attacks, which are challenging due to the temporal and semantic nature of speech. To bridge this gap, we introduce AJailBench, the first benchmark specifically designed to evaluate jailbreak vulnerabilities in LAMs. We begin by constructing AJailBench-Base, a dataset of 1,495 adversarial audio prompts spanning 10 policy-violating categories, converted from textual jailbreak attacks using realistic text to speech synthesis. Using this dataset, we evaluate several state-of-the-art LAMs and reveal that none exhibit consistent robustness across attacks. To further strengthen jailbreak testing and simulate more realistic attack conditions, we propose a method to generate dynamic adversarial variants. Our Audio Perturbation Toolkit (APT) applies targeted distortions across time, frequency, and amplitude domains. To preserve the original jailbreak intent, we enforce a semantic consistency constraint and employ Bayesian optimization to efficiently search for perturbations that are both subtle and highly effective. This results in AJailBench-APT, an extended dataset of optimized adversarial audio samples. Our findings demonstrate that even small, semantically preserved perturbations can significantly reduce the safety performance of leading LAMs, underscoring the need for more robust and semantically aware defense mechanisms.
△ Less
Submitted 21 May, 2025;
originally announced May 2025.
-
Interpretable machine learning-guided design of Fe-based soft magnetic alloys
Authors:
Aditi Nachnani,
Kai K. Li-Caldwell,
Saptarshi Biswas,
Prince Sharma,
Gaoyuan Ouyang,
Prashant Singh
Abstract:
We present a machine-learning guided approach to predict saturation magnetization (MS) and coercivity (HC) in Fe-rich soft magnetic alloys, particularly Fe-Si-B systems. ML models trained on experimental data reveals that increasing Si and B content reduces MS from 1.81T (DFT~2.04 T) to ~1.54 T (DFT~1.56T) in Fe-Si-B, which is attributed to decreased magnetic density and structural modifications.…
▽ More
We present a machine-learning guided approach to predict saturation magnetization (MS) and coercivity (HC) in Fe-rich soft magnetic alloys, particularly Fe-Si-B systems. ML models trained on experimental data reveals that increasing Si and B content reduces MS from 1.81T (DFT~2.04 T) to ~1.54 T (DFT~1.56T) in Fe-Si-B, which is attributed to decreased magnetic density and structural modifications. Experimental validation of ML predicted magnetic saturation on Fe-1Si-1B (2.09T), Fe-5Si-5B (2.01T) and Fe-10Si-10B (1.54T) alloy compositions further support our findings. These trends are consistent with density functional theory (DFT) predictions, which link increased electronic disorder and band broadening to lower MS values. Experimental validation on selected alloys confirms the predictive accuracy of the ML model, with good agreement across compositions. Beyond predictive accuracy, detailed uncertainty quantification and model interpretability including through feature importance and partial dependence analysis reveals that MS is governed by a nonlinear interplay between Fe content, early transition metal ratios, and annealing temperature, while HC is more sensitive to processing conditions such as ribbon thickness and thermal treatment windows. The ML framework was further applied to Fe-Si-B/Cr/Cu/Zr/Nb alloys in a pseudo-quaternary compositional space, which shows comparable magnetic properties to NANOMET (Fe84.8Si0.5B9.4Cu0.8 P3.5C1), FINEMET (Fe73.5Si13.5B9 Cu1Nb3), NANOPERM (Fe88Zr7B4Cu1), and HITPERM (Fe44Co44Zr7B4Cu1. Our fundings demonstrate the potential of ML framework for accelerated search of high-performance, Co- and Ni-free, soft magnetic materials.
△ Less
Submitted 28 April, 2025;
originally announced April 2025.
-
Machine learning driven search of hydrogen storage materials
Authors:
Tanumoy Banerjee,
Kevin Ji,
Weiyi Xia,
Gaoyuan Ouyang,
Tyler Del Rose,
Ihor Z. Hlova,
Benjamin Ueland,
Duane D. Johnson,
Cai-Zhuan Wang,
Ganesh Balasubramanian,
Prashant Singh
Abstract:
The transition to a low-carbon economy demands efficient and sustainable energy-storage solutions, with hydrogen emerging as a promising clean-energy carrier and with metal hydrides recognized for their hydrogen-storage capacity. Here, we leverage machine learning (ML) to predict hydrogen-to-metal (H/M) ratios and solution energy by incorporating thermodynamic parameters and local lattice distorti…
▽ More
The transition to a low-carbon economy demands efficient and sustainable energy-storage solutions, with hydrogen emerging as a promising clean-energy carrier and with metal hydrides recognized for their hydrogen-storage capacity. Here, we leverage machine learning (ML) to predict hydrogen-to-metal (H/M) ratios and solution energy by incorporating thermodynamic parameters and local lattice distortion (LLD) as key features. Our best-performing ML model provides improvements to H/M ratios and solution energies over a broad class of ternary alloys (easily extendable to multi-principal-element alloys), such as Ti-Nb-X (X = Mo, Cr, Hf, Ta, V, Zr) and Co-Ni-X (X = Al, Mg, V). Ti-Nb-Mo alloys reveal compositional effects in H-storage behavior, in particular Ti, Nb, and V enhance H-storage capacity, while Mo reduces H/M and hydrogen weight percent by 40-50%. We attributed to slow hydrogen kinetics in molybdenum rich alloys, which is validated by our pressure-composition isotherm (PCT) experiments on pure Ti and Ti5Mo95 alloys. Density functional theory (DFT) and molecular simulations also confirm that Ti and Nb promote H diffusion, whereas Mo hinders it, highlighting the interplay between electronic structure, lattice distortions, and hydrogen uptake. Notably, our Gradient Boosting Regression model identifies LLD as a critical factor in H/M predictions. To aid material selection, we present two periodic tables illustrating elemental effects on (a) H2 wt% and (b) solution energy, derived from ML, and provide a reference for identifying alloying elements that enhance hydrogen solubility and storage.
△ Less
Submitted 5 March, 2025;
originally announced March 2025.
-
nvAgent: Automated Data Visualization from Natural Language via Collaborative Agent Workflow
Authors:
Geliang Ouyang,
Jingyao Chen,
Zhihe Nie,
Yi Gui,
Yao Wan,
Hongyu Zhang,
Dongping Chen
Abstract:
Natural Language to Visualization (NL2Vis) seeks to convert natural-language descriptions into visual representations of given tables, empowering users to derive insights from large-scale data. Recent advancements in Large Language Models (LLMs) show promise in automating code generation to transform tabular data into accessible visualizations. However, they often struggle with complex queries tha…
▽ More
Natural Language to Visualization (NL2Vis) seeks to convert natural-language descriptions into visual representations of given tables, empowering users to derive insights from large-scale data. Recent advancements in Large Language Models (LLMs) show promise in automating code generation to transform tabular data into accessible visualizations. However, they often struggle with complex queries that require reasoning across multiple tables. To address this limitation, we propose a collaborative agent workflow, termed nvAgent, for NL2Vis. Specifically, nvAgent comprises three agents: a processor agent for database processing and context filtering, a composer agent for planning visualization generation, and a validator agent for code translation and output verification. Comprehensive evaluations on the new VisEval benchmark demonstrate that nvAgent consistently surpasses state-of-the-art baselines, achieving a 7.88% improvement in single-table and a 9.23% improvement in multi-table scenarios. Qualitative analyses further highlight that nvAgent maintains nearly a 20% performance margin over previous models, underscoring its capacity to produce high-quality visual representations from complex, heterogeneous data sources.
△ Less
Submitted 7 February, 2025;
originally announced February 2025.
-
Hazards in Daily Life? Enabling Robots to Proactively Detect and Resolve Anomalies
Authors:
Zirui Song,
Guangxian Ouyang,
Meng Fang,
Hongbin Na,
Zijing Shi,
Zhenhao Chen,
Yujie Fu,
Zeyu Zhang,
Shiyu Jiang,
Miao Fang,
Ling Chen,
Xiuying Chen
Abstract:
Existing household robots have made significant progress in performing routine tasks, such as cleaning floors or delivering objects. However, a key limitation of these robots is their inability to recognize potential problems or dangers in home environments. For example, a child may pick up and ingest medication that has fallen on the floor, posing a serious risk. We argue that household robots sh…
▽ More
Existing household robots have made significant progress in performing routine tasks, such as cleaning floors or delivering objects. However, a key limitation of these robots is their inability to recognize potential problems or dangers in home environments. For example, a child may pick up and ingest medication that has fallen on the floor, posing a serious risk. We argue that household robots should proactively detect such hazards or anomalies within the home, and propose the task of anomaly scenario generation. We leverage foundational models instead of relying on manually labeled data to build simulated environments. Specifically, we introduce a multi-agent brainstorming approach, where agents collaborate and generate diverse scenarios covering household hazards, hygiene management, and child safety. These textual task descriptions are then integrated with designed 3D assets to simulate realistic environments. Within these constructed environments, the robotic agent learns the necessary skills to proactively discover and handle the proposed anomalies through task decomposition, and optimal learning approach selection. We demonstrate that our generated environment outperforms others in terms of task description and scene diversity, ultimately enabling robotic agents to better address potential household hazards.
△ Less
Submitted 16 October, 2024;
originally announced November 2024.
-
RLGNet: Repeating-Local-Global History Network for Temporal Knowledge Graph Reasoning
Authors:
Ao Lv,
Guige Ouyang,
Yongzhong Huang,
Yue Chen,
Haoran Xie
Abstract:
Temporal Knowledge Graph (TKG) reasoning involves predicting future events based on historical information. However, due to the unpredictability of future events, this task is highly challenging. To address this issue, we propose a multi-scale hybrid architecture model based on ensemble learning, called RLGNet (Repeating-Local-Global History Network). Inspired by the application of multi-scale inf…
▽ More
Temporal Knowledge Graph (TKG) reasoning involves predicting future events based on historical information. However, due to the unpredictability of future events, this task is highly challenging. To address this issue, we propose a multi-scale hybrid architecture model based on ensemble learning, called RLGNet (Repeating-Local-Global History Network). Inspired by the application of multi-scale information in other fields, we introduce the concept of multi-scale information into TKG reasoning. Specifically, RLGNet captures and integrates different levels of historical information by combining modules that process information at various scales. The model comprises three modules: the Repeating History Module focuses on identifying repetitive patterns and trends in historical data, the Local History Module captures short-term changes and details, and the Global History Module provides a macro perspective on long-term changes. Additionally, to address the limitations of previous single-architecture models in generalizing across single-step and multi-step reasoning tasks, we adopted architectures based on Recurrent Neural Networks (RNN) and Multi-Layer Perceptrons (MLP) for the Local and Global History Modules, respectively. This hybrid architecture design enables the model to complement both multi-step and single-step reasoning capabilities. Finally, to address the issue of noise in TKGs, we adopt an ensemble learning strategy, combining the predictions of the three modules to reduce the impact of noise on the final prediction results. In the evaluation on six benchmark datasets, our approach generally outperforms existing TKG reasoning models in multi-step and single-step reasoning tasks.
△ Less
Submitted 28 July, 2024; v1 submitted 31 March, 2024;
originally announced April 2024.
-
Clique-based Method for Social Network Clustering
Authors:
Guang Ouyang,
Dipak K. Dey,
Panpan Zhang
Abstract:
In this article, we develop a clique-based method for social network clustering. We introduce a new index to evaluate the quality of clustering results, and propose an efficient algorithm based on recursive bipartition to maximize an objective function of the proposed index. The optimization problem is NP-hard, so we approximate the semi-optimal solution via an implicitly restarted Lanczos method.…
▽ More
In this article, we develop a clique-based method for social network clustering. We introduce a new index to evaluate the quality of clustering results, and propose an efficient algorithm based on recursive bipartition to maximize an objective function of the proposed index. The optimization problem is NP-hard, so we approximate the semi-optimal solution via an implicitly restarted Lanczos method. One of the advantages of our algorithm is that the proposed index of each community in the clustering result is guaranteed to be higher than some predetermined threshold, $p$, which is completely controlled by users. We also account for the situation that $p$ is unknown. A statistical procedure of controlling both under-clustering and over-clustering errors simultaneously is carried out to select localized threshold for each subnetwork, such that the community detection accuracy is optimized. Accordingly, we propose a localized clustering algorithm based on binary tree structure. Finally, we exploit the stochastic blockmodels to conduct simulation studies and demonstrate the accuracy and efficiency of our algorithms, both numerically and graphically.
△ Less
Submitted 9 May, 2018; v1 submitted 25 August, 2017;
originally announced August 2017.
-
A Mixed-Membership Model for Social Network Clustering
Authors:
Guang Ouyang,
Dipak K. Dey,
Panpan Zhang
Abstract:
We propose a simple mixed membership model for social network clustering in this paper. A flexible function is adopted to measure affinities among a set of entities in a social network. The model not only allows each entity in the network to possess more than one membership, but also provides accurate statistical inference about network structure. We estimate the membership parameters using an MCM…
▽ More
We propose a simple mixed membership model for social network clustering in this paper. A flexible function is adopted to measure affinities among a set of entities in a social network. The model not only allows each entity in the network to possess more than one membership, but also provides accurate statistical inference about network structure. We estimate the membership parameters using an MCMC algorithm. We evaluate the performance of the proposed algorithm by applying our model to two empirical social network data, the Zachary club data and the bottlenose dolphin network data. We also conduct some numerical studies based on synthetic networks for further assessing the effectiveness of our algorithm. In the end, some concluding remarks and future work are addressed briefly.
△ Less
Submitted 23 August, 2023; v1 submitted 24 August, 2017;
originally announced August 2017.