Search | arXiv e-print repository

Exploring Multi-Modal Data with Tool-Augmented LLM Agents for Precise Causal Discovery

Authors: ChengAo Shen, Zhengzhang Chen, Dongsheng Luo, Dongkuan Xu, Haifeng Chen, Jingchao Ni

Abstract: Causal discovery is an imperative foundation for decision-making across domains, such as smart health, AI for drug discovery and AIOps. Traditional statistical causal discovery methods, while well-established, predominantly rely on observational data and often overlook the semantic cues inherent in cause-and-effect relationships. The advent of Large Language Models (LLMs) has ushered in an affordable way of leveraging the semantic cues for knowledge-driven causal discovery, but the development of LLMs for causal discovery lags behind other areas, particularly in the exploration of multi-modal data. To bridge the gap, we introduce MATMCD, a multi-agent system powered by tool-augmented LLMs. MATMCD has two key agents: a Data Augmentation agent that retrieves and processes modality-augmented data, and a Causal Constraint agent that integrates multi-modal data for knowledge-driven reasoning. The proposed design of the inner-workings ensures successful cooperation of the agents. Our empirical study across seven datasets suggests the significant potential of multi-modality enhanced causal discovery.

Submitted 31 May, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

arXiv:2412.05421 [pdf, other]

KEDformer: Knowledge Extraction Seasonal Trend Decomposition for Long-term Sequence Prediction

Authors: Zhenkai Qin, Baozhong Wei, Caifeng Gao, Jianyuan Ni

Abstract: Time series forecasting is a critical task in domains such as energy, finance, and meteorology, where accurate long-term predictions are essential. While Transformer-based models have shown promise in capturing temporal dependencies, their application to extended sequences is limited by computational inefficiencies and limited generalization. In this study, we propose KEDformer, a knowledge extraction-driven framework that integrates seasonal-trend decomposition to address these challenges. KEDformer leverages knowledge extraction methods that focus on the most informative weights within the self-attention mechanism to reduce computational overhead. Additionally, the proposed KEDformer framework decouples time series into seasonal and trend components. This decomposition enhances the model's ability to capture both short-term fluctuations and long-term patterns. Extensive experiments on five public datasets from energy, transportation, and weather domains demonstrate the effectiveness and competitiveness of KEDformer, providing an efficient solution for long-term time series forecasting.

Submitted 6 December, 2024; originally announced December 2024.

arXiv:2103.02164 [pdf, other]

Dynamic Gaussian Mixture based Deep Generative Model For Robust Forecasting on Sparse Multivariate Time Series

Authors: Yinjun Wu, Jingchao Ni, Wei Cheng, Bo Zong, Dongjin Song, Zhengzhang Chen, Yanchi Liu, Xuchao Zhang, Haifeng Chen, Susan Davidson

Abstract: Forecasting on sparse multivariate time series (MTS) aims to model the predictors of future values of time series given their incomplete past, which is important for many emerging applications. However, most existing methods process MTS's individually, and do not leverage the dynamic distributions underlying the MTS's, leading to sub-optimal results when the sparsity is high. To address this challenge, we propose a novel generative model, which tracks the transition of latent clusters, instead of isolated feature representations, to achieve robust modeling. It is characterized by a newly designed dynamic Gaussian mixture distribution, which captures the dynamics of clustering structures, and is used for emitting timeseries. The generative model is parameterized by neural networks. A structured inference network is also designed for enabling inductive analysis. A gating mechanism is further introduced to dynamically tune the Gaussian mixture distributions. Extensive experimental results on a variety of real-life datasets demonstrate the effectiveness of our method.

Submitted 2 March, 2021; originally announced March 2021.

Comments: This paper is accepted by AAAI 2021

arXiv:2011.12161

The thermal power generation and economic growth in the central and western China: A heterogeneous mixed panel Granger-Causality approach

Authors: Jie Ni, Jiayi Qian, Yixiao Lu, Hong Cheng

Abstract: The problem of the new energy economy has become a global hot issue. This study examines the causal relationship between the ratio of thermal power in total power generation (RTPG) and economic growth (GDP) in the western and central China by using the heterogeneous mixed panel Granger causality approach that accounts for both slope heterogeneity and cross-sectional dependence. For the overall panel, the empirical findings support the presence of unidirectional causality running from GDP to RTPG (in northwest China), and from RTPG to GDP (in central). At the provincial level, there is causality from GDP to RTPG in NeiMongol and Ningxia, and causality from RTPG to GDP in Shanxi, Anhui, and Jiangxi. As for the cross regions relationships, we find that GDP (in western) Granger-cause RTPG (in central), and RTPG (in southwest) Granger-cause GDP (in central and northwest). Moreover, panel regressions show the negative impact from GDP to RTPG in the northwest, and RTPG to GDP in the central. However, RTPG has a positive influence on GDP in the northwest. Therefore, to improve economic development without compromising the regions' competitiveness in central and western China, we can adjust the power generation structure, and increase investments in the renewable energy supply and energy efficiency.

Submitted 18 October, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

Comments: The results of this paper cannot be reproduced on the 2018 data. Some updated data obtained from government statistical reports may overturn the conclusions of this article. So we consider changing the method of analysis, conduct further research, and rewrite this article

arXiv:2008.13361 [pdf, other]

Multi-Scale One-Class Recurrent Neural Networks for Discrete Event Sequence Anomaly Detection

Authors: Zhiwei Wang, Zhengzhang Chen, Jingchao Ni, Hui Liu, Haifeng Chen, Jiliang Tang

Abstract: Discrete event sequences are ubiquitous, such as an ordered event series of process interactions in Information and Communication Technology systems. Recent years have witnessed increasing efforts in detecting anomalies with discrete-event sequences. However, it still remains an extremely difficult task due to several intrinsic challenges including data imbalance issues, the discrete property of the events, and sequential nature of the data. To address these challenges, in this paper, we propose OC4Seq, a multi-scale one-class recurrent neural network for detecting anomalies in discrete event sequences. Specifically, OC4Seq integrates the anomaly detection objective with recurrent neural networks (RNNs) to embed the discrete event sequences into latent spaces, where anomalies can be easily detected. In addition, given that an anomalous sequence could be caused by either individual events, subsequences of events, or the whole sequence, we design a multi-scale RNN framework to capture different levels of sequential patterns simultaneously. Experimental results on three benchmark datasets show that OC4Seq consistently outperforms various representative baselines by a large margin. Moreover, through both quantitative and qualitative analysis, the importance of capturing multi-scale sequential patterns for event anomaly detection is verified.

Submitted 31 August, 2020; originally announced August 2020.

arXiv:2006.00234 [pdf, other]

Integrating global spatial features in CNN based Hyperspectral/SAR imagery classification

Authors: Fan Zhang, MinChao Yan, Chen Hu, Jun Ni, Fei Ma

Abstract: The land cover classification has played an important role in remote sensing because it can intelligently identify things in one huge remote sensing image to reduce the work of humans. However, a lot of classification methods are designed based on the pixel feature or limited spatial feature of the remote sensing image, which limits the classification accuracy and universality of their methods. This paper proposed a novel method to take into the information of remote sensing image, i.e., geographic latitude-longitude information. In addition, a dual-branch convolutional neural network (CNN) classification method is designed in combination with the global information to mine the pixel features of the image. Then, the features of the two neural networks are fused with another fully neural network to realize the classification of remote sensing images. Finally, two remote sensing images are used to verify the effectiveness of our method, including hyperspectral imaging (HSI) and polarimetric synthetic aperture radar (PolSAR) imagery. The result of the proposed method is superior to the traditional single-channel convolutional neural network.

Submitted 15 June, 2020; v1 submitted 30 May, 2020; originally announced June 2020.

arXiv:2005.07427 [pdf, other]

Structural Temporal Graph Neural Networks for Anomaly Detection in Dynamic Graphs

Authors: Lei Cai, Zhengzhang Chen, Chen Luo, Jiaping Gui, Jingchao Ni, Ding Li, Haifeng Chen

Abstract: Detecting anomalies in dynamic graphs is a vital task, with numerous practical applications in areas such as security, finance, and social media. Previous network embedding based methods have been mostly focusing on learning good node representations, whereas largely ignoring the subgraph structural changes related to the target nodes in dynamic graphs. In this paper, we propose StrGNN, an end-to-end structural temporal Graph Neural Network model for detecting anomalous edges in dynamic graphs. In particular, we first extract the $h$-hop enclosing subgraph centered on the target edge and propose the node labeling function to identify the role of each node in the subgraph. Then, we leverage graph convolution operation and Sortpooling layer to extract the fixed-size feature from each snapshot/timestamp. Based on the extracted features, we utilize Gated recurrent units (GRUs) to capture the temporal information for anomaly detection. Extensive experiments on six benchmark datasets and a real enterprise security system demonstrate the effectiveness of StrGNN.

Submitted 25 May, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

arXiv:1912.01389 [pdf, other]

Towards Lingua Franca Named Entity Recognition with BERT

Authors: Taesun Moon, Parul Awasthy, Jian Ni, Radu Florian

Abstract: Information extraction is an important task in NLP, enabling the automatic extraction of data for relational database filling. Historically, research and data was produced for English text, followed in subsequent years by datasets in Arabic, Chinese (ACE/OntoNotes), Dutch, Spanish, German (CoNLL evaluations), and many others. The natural tendency has been to treat each language as a different dataset and build optimized models for each. In this paper we investigate a single Named Entity Recognition model, based on a multilingual BERT, that is trained jointly on many languages simultaneously, and is able to decode these languages with better accuracy than models trained only on one language. To improve the initial model, we study the use of regularization strategies such as multitask learning and partial gradient updates. In addition to being a single model that can tackle multiple languages (including code switch), the model could be used to make zero-shot predictions on a new language, even ones for which training data is not available, out of the box. The results show that this model not only performs competitively with monolingual models, but it also achieves state-of-the-art results on the CoNLL02 Dutch and Spanish datasets, OntoNotes Arabic and Chinese datasets. Moreover, it performs reasonably well on unseen languages, achieving state-of-the-art for zero-shot on three CoNLL languages.

Submitted 12 December, 2019; v1 submitted 19 November, 2019; originally announced December 2019.

arXiv:1910.08074 [pdf, other]

Heterogeneous Graph Matching Networks

Authors: Shen Wang, Zhengzhang Chen, Xiao Yu, Ding Li, Jingchao Ni, Lu-An Tang, Jiaping Gui, Zhichun Li, Haifeng Chen, Philip S. Yu

Abstract: Information systems have widely been the target of malware attacks. Traditional signature-based malicious program detection algorithms can only detect known malware and are prone to evasion techniques such as binary obfuscation, while behavior-based approaches highly rely on the malware training samples and incur prohibitively high training cost. To address the limitations of existing techniques, we propose MatchGNet, a heterogeneous Graph Matching Network model to learn the graph representation and similarity metric simultaneously based on the invariant graph modeling of the program's execution behaviors. We conduct a systematic evaluation of our model and show that it is accurate in detecting malicious program behavior and can help detect malware attacks with less false positives. MatchGNet outperforms the state-of-the-art algorithms in malware detection by generating 50% less false positives while keeping zero false negatives.

Submitted 17 October, 2019; originally announced October 2019.

arXiv:1905.03679 [pdf, other]

Adversarial Defense Framework for Graph Neural Network

Authors: Shen Wang, Zhengzhang Chen, Jingchao Ni, Xiao Yu, Zhichun Li, Haifeng Chen, Philip S. Yu

Abstract: Graph neural network (GNN), as a powerful representation learning model on graph data, attracts much attention across various disciplines. However, recent studies show that GNN is vulnerable to adversarial attacks. How to make GNN more robust? What are the key vulnerabilities in GNN? How to address the vulnerabilities and defense GNN against the adversarial attacks? In this paper, we propose DefNet, an effective adversarial defense framework for GNNs. In particular, we first investigate the latent vulnerabilities in every layer of GNNs and propose corresponding strategies including dual-stage aggregation and bottleneck perceptron. Then, to cope with the scarcity of training data, we propose an adversarial contrastive learning method to train the GNN in a conditional GAN manner by leveraging the high-level graph representation. Extensive experiments on three public datasets demonstrate the effectiveness of DefNet in improving the robustness of popular GNN variants, such as Graph Convolutional Network and GraphSAGE, under various types of adversarial attacks.

Submitted 10 May, 2019; v1 submitted 9 May, 2019; originally announced May 2019.

arXiv:1812.04064 [pdf, other]

Attentional Heterogeneous Graph Neural Network: Application to Program Reidentification

Authors: Shen Wang, Zhengzhang Chen, Ding Li, Lu-An Tang, Jingchao Ni, Zhichun Li, Junghwan Rhee, Haifeng Chen, Philip S. Yu

Abstract: Program or process is an integral part of almost every IT/OT system. Can we trust the identity/ID (e.g., executable name) of the program? To avoid detection, malware may disguise itself using the ID of a legitimate program, and a system tool (e.g., PowerShell) used by the attackers may have the fake ID of another common software, which is less sensitive. However, existing intrusion detection techniques often overlook this critical program reidentification problem (i.e., checking the program's identity). In this paper, we propose an attentional heterogeneous graph neural network model (DeepHGNN) to verify the program's identity based on its system behaviors. The key idea is to leverage the representation learning of the heterogeneous program behavior graph to guide the reidentification process. We formulate the program reidentification as a graph classification problem and develop an effective attentional heterogeneous graph embedding algorithm to solve it. Extensive experiments --- using real-world enterprise monitoring data and real attacks --- demonstrate the effectiveness of DeepHGNN across multiple popular metrics and the robustness to the normal dynamic changes like program version upgrades.

Submitted 8 May, 2019; v1 submitted 10 December, 2018; originally announced December 2018.

arXiv:1811.08055 [pdf, other]

A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data

Authors: Chuxu Zhang, Dongjin Song, Yuncong Chen, Xinyang Feng, Cristian Lumezanu, Wei Cheng, Jingchao Ni, Bo Zong, Haifeng Chen, Nitesh V. Chawla

Abstract: Nowadays, multivariate time series data are increasingly collected in various real world systems, e.g., power plants, wearable devices, etc. Anomaly detection and diagnosis in multivariate time series refer to identifying abnormal status in certain time steps and pinpointing the root causes. Building such a system, however, is challenging since it not only requires to capture the temporal dependency in each time series, but also need encode the inter-correlations between different pairs of time series. In addition, the system should be robust to noise and provide operators with different levels of anomaly scores based upon the severity of different incidents. Despite the fact that a number of unsupervised anomaly detection algorithms have been developed, few of them can jointly address these challenges. In this paper, we propose a Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED), to perform anomaly detection and diagnosis in multivariate time series data. Specifically, MSCRED first constructs multi-scale (resolution) signature matrices to characterize multiple levels of the system statuses in different time steps. Subsequently, given the signature matrices, a convolutional encoder is employed to encode the inter-sensor (time series) correlations and an attention based Convolutional Long-Short Term Memory (ConvLSTM) network is developed to capture the temporal patterns. Finally, based upon the feature maps which encode the inter-sensor correlations and temporal information, a convolutional decoder is used to reconstruct the input signature matrices and the residual signature matrices are further utilized to detect and diagnose anomalies. Extensive empirical studies based on a synthetic dataset and a real power plant dataset demonstrate that MSCRED can outperform state-of-the-art baseline methods.

Submitted 19 November, 2018; originally announced November 2018.

arXiv:1503.00463 [pdf, other]

3D Power-map for Smart Grids---An Integration of High-dimensional Analysis and Visualization

Authors: Xing He, Qian Ai, Robert C. Qiu, Jianmo Ni, Longjian Piao, Yiting Xu, Xinyi Xu

Abstract: Data with features of volume, velocity, variety, and veracity are challenging traditional tools to extract useful analysis for decision-making. By integrating high-dimensional analysis with visualization, this paper develops a 3D power-map animation as an effective solution to the challenge. An architecture design, with detailed data processing procedure, is proposed to realize the integration. Two of the most important components in the architecture are presented: the Single-Ring Law for random matrices as solid mathematic foundation, and the proposed statistical index MSR as high-dimensional data for visualization. The whole procedure is easy in logic, fast in speed, objective and even robust against bad data. Moreover, it is an unsupervised machine learning mechanism directly oriented to the raw data rather than logics or models based on simplifications and assumptions. A case study validates the effectiveness and performance of the developed 3D power-map in analysis extraction.

Submitted 2 March, 2015; originally announced March 2015.

Comments: 5 pages, 7 figures, submitted to PESGM 2015. arXiv admin note: substantial text overlap with arXiv:1502.00060

arXiv:1204.5540 [pdf, other]

doi 10.1214/12-SSY073

Learning Loosely Connected Markov Random Fields

Authors: Rui Wu, R. Srikant, Jian Ni

Abstract: We consider the structure learning problem for graphical models that we call loosely connected Markov random fields, in which the number of short paths between any pair of nodes is small, and present a new conditional independence test based algorithm for learning the underlying graph structure. The novel maximization step in our algorithm ensures that the true edges are detected correctly even when there are short cycles in the graph. The number of samples required by our algorithm is C*log p, where p is the size of the graph and the constant C depends on the parameters of the model. We show that several previously studied models are examples of loosely connected Markov random fields, and our algorithm achieves the same or lower computational complexity than the previously designed algorithms for individual cases. We also get new results for more general graphical models, in particular, our algorithm learns general Ising models on the Erdos-Renyi random graph G(p, c/p) correctly with running time O(np^5).

Submitted 4 February, 2014; v1 submitted 24 April, 2012; originally announced April 2012.

Comments: 45 pages, minor revision

Journal ref: Wu, Rui, Srikant, R., Ni, Jian, Learning loosely connected Markov random fields, Stochastic Systems, 3, (2013), 362-404

Showing 1–14 of 14 results for author: Ni, J