Search | arXiv e-print repository

From Correlation to Causation: Understanding Climate Change through Causal Analysis and LLM Interpretations

Abstract: This research presents a three-step causal inference framework that integrates correlation analysis, machine learning-based causality discovery, and LLM-driven interpretations to identify socioeconomic factors influencing carbon emissions and contributing to climate change. The approach begins with identifying correlations, progresses to causal analysis, and enhances decision making through LLM-generated inquiries about the context of climate change. The proposed framework offers adaptable solutions that support data-driven policy-making and strategic decision-making in climate-related contexts, uncovering causal relationships within the climate change domain.

Submitted 21 December, 2024; originally announced December 2024.

arXiv:2408.12888 [pdf, other]

Accelerated Markov Chain Monte Carlo Using Adaptive Weighting Scheme

Authors: Yanbo Wang, Wenyu Chen, Shimin Shan

Abstract: Gibbs sampling is one of the most commonly used Markov Chain Monte Carlo (MCMC) algorithms due to its simplicity and efficiency. It cycles through the latent variables, sampling each one from its distribution conditional on the current values of all the other variables. Conventional Gibbs sampling is based on the systematic scan (with a deterministic order of variables). In contrast, in recent years, Gibbs sampling with random scan has shown its advantage in some scenarios. However, almost all the analyses of Gibbs sampling with the random scan are based on uniform selection of variables. In this paper, we focus on a random scan Gibbs sampling method that selects each latent variable non-uniformly. Firstly, we show that this non-uniform scan Gibbs sampling leaves the target posterior distribution invariant. Then we explore how to determine the selection probability for latent variables. In particular, we construct an objective as a function of the selection probability and solve the constrained optimization problem. We further derive an analytic solution of the selection probability, which can be estimated easily. Our algorithm relies on the simple intuition that choosing the variable updates according to their marginal probabilities enhances the mixing time of the Markov chain. Finally, we validate the effectiveness of the proposed Gibbs sampler by conducting a set of experiments on real-world applications.

Submitted 23 August, 2024; originally announced August 2024.
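
Note: the non-uniform random scan described in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: the target (a standard bivariate normal with correlation `rho`), the fixed selection weights `probs`, and the helper names `random_scan_gibbs` and `summarize` are all assumptions made for illustration, and the paper's optimized selection probabilities are not reproduced here.

```python
import math
import random


def random_scan_gibbs(rho, probs, n_steps, seed=0):
    """Random-scan Gibbs sampler for a standard bivariate normal.

    rho   : correlation of the target distribution (illustrative target).
    probs : (p0, p1), fixed probabilities of updating coordinate 0 or 1.
            These weights are hypothetical; they are not the selection
            rule derived in the paper.
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]
    # For a standard bivariate normal, x_i | x_j ~ N(rho * x_j, 1 - rho^2).
    cond_sd = math.sqrt(1.0 - rho * rho)
    samples = []
    for _ in range(n_steps):
        # Pick which coordinate to update, non-uniformly.
        i = 0 if rng.random() < probs[0] else 1
        # Resample it from its exact conditional given the other coordinate.
        x[i] = rng.gauss(rho * x[1 - i], cond_sd)
        samples.append((x[0], x[1]))
    return samples


def summarize(samples):
    """Return (mean_x, mean_y, sample correlation) of the chain."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples)
    syy = sum((s[1] - my) ** 2 for s in samples)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples)
    return mx, my, sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    chain = random_scan_gibbs(rho=0.5, probs=(0.7, 0.3), n_steps=50000, seed=1)
    print(summarize(chain))
```

Each single-coordinate update leaves the target invariant, so mixing them with any fixed positive weights does too; the sketch only checks this empirically through the sample means and correlation.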

arXiv:2403.04231 [pdf]

doi 10.22004/ag.econ.348726

Identification of socioeconomic factors influencing global food price security using machine learning

Authors: Shan Shan

Abstract: Global concern over food prices and security has been exacerbated by the impacts of armed conflicts such as the Russia Ukraine War, pandemic diseases, and climate change. Traditionally, analyzing global food prices and their associations with socioeconomic factors has relied on static linear regression models. However, the complexity of socioeconomic factors and their implications extend beyond simple linear relationships. By incorporating determinants, critical characteristics identification, and comparative model analysis, this study aimed to identify the critical socioeconomic characteristics and multidimensional relationships associated with the underlying factors of food prices and security. Machine learning tools were used to uncover the socioeconomic factors influencing global food prices from 2000 to 2022. A total of 105 key variables from the World Development Indicators and the Food and Agriculture Organization of the United Nations were selected. Machine learning identified four key dimensions of food price security: economic and population metrics, military spending, health spending, and environmental factors. The top 30 determinants were selected for feature extraction using data mining. The efficiency of the support vector regression model allowed for precise prediction making and correlation analysis. Keywords: environment and growth, global economics, price fluctuation, support vector regression

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2306.11157 [pdf, other]

Human Limits in Machine Learning: Prediction of Plant Phenotypes Using Soil Microbiome Data

Authors: Rosa Aghdam, Xudong Tang, Shan Shan, Richard Lankau, Claudia Solís-Lemus

Abstract: The preservation of soil health is a critical challenge in the 21st century due to its significant impact on agriculture, human health, and biodiversity. We provide the first deep investigation of the predictive potential of machine learning models to understand the connections between soil and biological phenotypes. We investigate an integrative framework performing accurate machine learning-based prediction of plant phenotypes from biological, chemical, and physical properties of the soil via two models: random forest and Bayesian neural network. We show that prediction is improved when incorporating environmental features like soil physicochemical properties and microbial population density into the models, in addition to the microbiome information. Exploring various data preprocessing strategies confirms the significant impact of human decisions on predictive performance. We show that the naive total sum scaling normalization that is commonly used in microbiome research is not the optimal strategy to maximize predictive power. Also, we find that accurately defined labels are more important than normalization, taxonomic level or model characteristics. In cases where humans are unable to classify samples accurately, machine learning model performance is limited. Lastly, we provide domain scientists via a full model selection decision tree to identify the human choices that optimize model prediction power. Our work is accompanied by open source reproducible scripts (https://github.com/solislemuslab/soil-microbiome-nn) for maximum outreach among the microbiome research community.

Submitted 16 February, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

arXiv:2203.02867 [pdf, other]

Diffusion Maps : Using the Semigroup Property for Parameter Tuning

Authors: Shan Shan, Ingrid Daubechies

Abstract: Diffusion maps (DM) constitute a classic dimension reduction technique, for data lying on or close to a (relatively) low-dimensional manifold embedded in a much larger dimensional space. The DM procedure consists in constructing a spectral parametrization for the manifold from simulated random walks or diffusion paths on the data set. However, DM is hard to tune in practice. In particular, the task to set a diffusion time t when constructing the diffusion kernel matrix is critical. We address this problem by using the semigroup property of the diffusion operator. We propose a semigroup criterion for picking t. Experiments show that this principled approach is effective and robust.

Submitted 5 March, 2022; originally announced March 2022.

Comments: 14 pages, 12 figures

arXiv:2107.04855 [pdf, ps, other]

Kernel Mean Estimation by Marginalized Corrupted Distributions

Authors: Xiaobo Xia, Shuo Shan, Mingming Gong, Nannan Wang, Fei Gao, Haikun Wei, Tongliang Liu

Abstract: Estimating the kernel mean in a reproducing kernel Hilbert space is a critical component in many kernel learning algorithms. Given a finite sample, the standard estimate of the target kernel mean is the empirical average. Previous works have shown that better estimators can be constructed by shrinkage methods. In this work, we propose to corrupt data examples with noise from known distributions and present a new kernel mean estimator, called the marginalized kernel mean estimator, which estimates kernel mean under the corrupted distribution. Theoretically, we show that the marginalized kernel mean estimator introduces implicit regularization in kernel mean estimation. Empirically, we show on a variety of datasets that the marginalized kernel mean estimator obtains much lower estimation error than the existing estimators.

Submitted 10 July, 2021; originally announced July 2021.

arXiv:2104.12476 [pdf, other]

EigenGAN: Layer-Wise Eigen-Learning for GANs

Authors: Zhenliang He, Meina Kan, Shiguang Shan

Abstract: Recent studies on Generative Adversarial Network (GAN) reveal that different layers of a generative CNN hold different semantics of the synthesized images. However, few GAN models have explicit dimensions to control the semantic attributes represented in a specific layer. This paper proposes EigenGAN which is able to unsupervisedly mine interpretable and controllable dimensions from different generator layers. Specifically, EigenGAN embeds one linear subspace with orthogonal basis into each generator layer. Via generative adversarial training to learn a target distribution, these layer-wise subspaces automatically discover a set of "eigen-dimensions" at each layer corresponding to a set of semantic attributes or interpretable variations. By traversing the coefficient of a specific eigen-dimension, the generator can produce samples with continuous changes corresponding to a specific semantic attribute. Taking the human face for example, EigenGAN can discover controllable dimensions for high-level concepts such as pose and gender in the subspace of deep layers, as well as low-level concepts such as hue and color in the subspace of shallow layers. Moreover, in the linear case, we theoretically prove that our algorithm derives the principal components as PCA does. Codes can be found in https://github.com/LynnHo/EigenGAN-Tensorflow.

Submitted 9 August, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

Comments: ICCV 2021. Code: https://github.com/LynnHo/EigenGAN-Tensorflow

arXiv:2008.02676 [pdf, other]

Exchangeable Neural ODE for Set Modeling

Authors: Yang Li, Haidong Yi, Christopher M. Bender, Siyuan Shan, Junier B. Oliva

Abstract: Reasoning over an instance composed of a set of vectors, like a point cloud, requires that one accounts for intra-set dependent features among elements. However, since such instances are unordered, the elements' features should remain unchanged when the input's order is permuted. This property, permutation equivariance, is a challenging constraint for most neural architectures. While recent work has proposed global pooling and attention-based solutions, these may be limited in the way that intradependencies are captured in practice. In this work we propose a more general formulation to achieve permutation equivariance through ordinary differential equations (ODE). Our proposed module, Exchangeable Neural ODE (ExNODE), can be seamlessly applied for both discriminative and generative tasks. We also extend set modeling in the temporal dimension and propose a VAE based model for temporal set modeling. Extensive experiments demonstrate the efficacy of our method over strong baselines.

Submitted 6 August, 2020; originally announced August 2020.

arXiv:2002.08327 [pdf, ps, other]

Fawkes: Protecting Privacy against Unauthorized Deep Learning Models

Authors: Shawn Shan, Emily Wenger, Jiayun Zhang, Huiying Li, Haitao Zheng, Ben Y. Zhao

Abstract: Today's proliferation of powerful facial recognition systems poses a real threat to personal privacy. As Clearview.ai demonstrated, anyone can canvas the Internet for data and train highly accurate facial recognition models of individuals without their knowledge. We need tools to protect ourselves from potential misuses of unauthorized facial recognition systems. Unfortunately, no practical or effective solutions exist. In this paper, we propose Fawkes, a system that helps individuals inoculate their images against unauthorized facial recognition models. Fawkes achieves this by helping users add imperceptible pixel-level changes (we call them "cloaks") to their own photos before releasing them. When used to train facial recognition models, these "cloaked" images produce functional models that consistently cause normal images of the user to be misidentified. We experimentally demonstrate that Fawkes provides 95+% protection against user recognition regardless of how trackers train their models. Even when clean, uncloaked images are "leaked" to the tracker and used for training, Fawkes can still maintain an 80+% protection success rate. We achieve 100% success in experiments against today's state-of-the-art facial recognition services. Finally, we show that Fawkes is robust against a variety of countermeasures that try to detect or disrupt image cloaks.

Submitted 22 June, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

Journal ref: USENIX Security Symposium 2020

arXiv:1910.01226 [pdf, ps, other]

Piracy Resistant Watermarks for Deep Neural Networks

Authors: Huiying Li, Emily Wenger, Shawn Shan, Ben Y. Zhao, Haitao Zheng

Abstract: As companies continue to invest heavily in larger, more accurate and more robust deep learning models, they are exploring approaches to monetize their models while protecting their intellectual property. Model licensing is promising, but requires a robust tool for owners to claim ownership of models, i.e. a watermark. Unfortunately, current designs have not been able to address piracy attacks, where third parties falsely claim model ownership by embedding their own "pirate watermarks" into an already-watermarked model. We observe that resistance to piracy attacks is fundamentally at odds with the current use of incremental training to embed watermarks into models. In this work, we propose null embedding, a new way to build piracy-resistant watermarks into DNNs that can only take place at a model's initial training. A null embedding takes a bit string (watermark value) as input, and builds strong dependencies between the model's normal classification accuracy and the watermark. As a result, attackers cannot remove an embedded watermark via tuning or incremental training, and cannot add new pirate watermarks to already watermarked models. We empirically show that our proposed watermarks achieve piracy resistance and other watermark properties, over a wide range of tasks and models. Finally, we explore a number of adaptive counter-measures, and show our watermark remains robust against a variety of model modifications, including model fine-tuning, compression, and existing methods to detect/remove backdoors. Our watermarked models are also amenable to transfer learning without losing their watermark properties.

Submitted 2 December, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

Comments: 18 pages

arXiv:1909.09140 [pdf, other]

Meta-Neighborhoods

Authors: Siyuan Shan, Yang Li, Junier Oliva

Abstract: Making an adaptive prediction based on one's input is an important ability for general artificial intelligence. In this work, we step forward in this direction and propose a semi-parametric method, Meta-Neighborhoods, where predictions are made adaptively to the neighborhood of the input. We show that Meta-Neighborhoods is a generalization of $k$-nearest-neighbors. Due to the simpler manifold structure around a local neighborhood, Meta-Neighborhoods represent the predictive distribution $p(y \mid x)$ more accurately. To reduce memory and computation overhead, we propose induced neighborhoods that summarize the training data into a much smaller dictionary. A meta-learning based training mechanism is then exploited to jointly learn the induced neighborhoods and the model. Extensive studies demonstrate the superiority of our method.

Submitted 13 October, 2020; v1 submitted 18 September, 2019; originally announced September 2019.

Comments: To appear in NeurIPS 2020

arXiv:1904.08554 [pdf, ps, other]

doi 10.1145/3372297.3417231

Gotta Catch 'Em All: Using Honeypots to Catch Adversarial Attacks on Neural Networks

Authors: Shawn Shan, Emily Wenger, Bolun Wang, Bo Li, Haitao Zheng, Ben Y. Zhao

Abstract: Deep neural networks (DNN) are known to be vulnerable to adversarial attacks. Numerous efforts either try to patch weaknesses in trained models, or try to make it difficult or costly to compute adversarial examples that exploit them. In our work, we explore a new "honeypot" approach to protect DNN models. We intentionally inject trapdoors, honeypot weaknesses in the classification manifold that attract attackers searching for adversarial examples. Attackers' optimization algorithms gravitate towards trapdoors, leading them to produce attacks similar to trapdoors in the feature space. Our defense then identifies attacks by comparing neuron activation signatures of inputs to those of trapdoors. In this paper, we introduce trapdoors and describe an implementation of a trapdoor-enabled defense. First, we analytically prove that trapdoors shape the computation of adversarial attacks so that attack inputs will have feature representations very similar to those of trapdoors. Second, we experimentally show that trapdoor-protected models can detect, with high accuracy, adversarial examples generated by state-of-the-art attacks (PGD, optimization-based CW, Elastic Net, BPDA), with negligible impact on normal classification. These results generalize across classification domains, including image, facial, and traffic-sign recognition. We also present significant results measuring trapdoors' robustness against customized adaptive attacks (countermeasures).

Submitted 28 September, 2020; v1 submitted 17 April, 2019; originally announced April 2019.

Journal ref: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security

arXiv:1711.10678 [pdf, other]

AttGAN: Facial Attribute Editing by Only Changing What You Want

Authors: Zhenliang He, Wangmeng Zuo, Meina Kan, Shiguang Shan, Xilin Chen

Abstract: Facial attribute editing aims to manipulate single or multiple attributes of a face image, i.e., to generate a new face with desired attributes while preserving other details. Recently, generative adversarial net (GAN) and encoder-decoder architecture are usually incorporated to handle this task with promising results. Based on the encoder-decoder architecture, facial attribute editing is achieved by decoding the latent representation of the given face conditioned on the desired attributes. Some existing methods attempt to establish an attribute-independent latent representation for further attribute editing. However, such attribute-independent constraint on the latent representation is excessive because it restricts the capacity of the latent representation and may result in information loss, leading to over-smooth and distorted generation. Instead of imposing constraints on the latent representation, in this work we apply an attribute classification constraint to the generated image to just guarantee the correct change of desired attributes, i.e., to "change what you want". Meanwhile, the reconstruction learning is introduced to preserve attribute-excluding details, in other words, to "only change what you want". Besides, the adversarial learning is employed for visually realistic editing. These three components cooperate with each other forming an effective framework for high quality facial attribute editing, referred as AttGAN. Furthermore, our method is also directly applicable for attribute intensity control and can be naturally extended for attribute style manipulation. Experiments on CelebA dataset show that our method outperforms the state-of-the-arts on realistic attribute editing with facial details well preserved.

Submitted 25 July, 2018; v1 submitted 28 November, 2017; originally announced November 2017.

Comments: Submitted to IEEE Transactions on Image Processing, Code: https://github.com/LynnHo/AttGAN-Tensorflow

arXiv:1705.08516 [pdf, other]

An Open-Data Analysis of Heterogeneities in Lung Cancer Premature Mortality Rate and Associated Factors among Toronto Neighborhoods

Authors: Zhanwei Du, Jiming Liu, Songwei Shan

Abstract: In public health, various data are rigorously collected and published with open access. These data reflect the environmental and non-environmental characteristics of heterogeneous neighborhoods in cities. In the present study, we aimed to study the relations between these data and disease risks in heterogeneous neighborhoods. A flexible framework was developed to determine the key factors correlated with diseases and find the most relevant combination of factors to explain observations of diseases through nonlinear analyses. Taking Lung Cancer Premature Mortality Rate (LCPMR) in Toronto as an example, two environmental factors (green space, and industrial pollution) and two non-environmental factors (immigrants, and mental health visits) were identified in the relational analysis of all of the target neighborhoods. To determine the influence of the heterogeneity of the neighborhoods, they were clustered into three different classes. In the most severe class, two additional factors related to dwellings were determined to be involved, which increased the observation's deviance from 48.1% to 80%. The factors determined in this study may assist governments in improving public health policies.

Submitted 15 May, 2017; originally announced May 2017.

Showing 1–14 of 14 results for author: Shan, S