-
Deeper Insights into Deep Graph Convolutional Networks: Stability and Generalization
Authors:
Guangrui Yang,
Ming Li,
Han Feng,
Xiaosheng Zhuang
Abstract:
Graph convolutional networks (GCNs) have emerged as powerful models for graph learning tasks, exhibiting promising performance in various domains. While their empirical success is evident, there is a growing need to understand their essential ability from a theoretical perspective. Existing theoretical research has primarily focused on the analysis of single-layer GCNs, while a comprehensive theor…
▽ More
Graph convolutional networks (GCNs) have emerged as powerful models for graph learning tasks, exhibiting promising performance in various domains. While their empirical success is evident, there is a growing need to understand their essential ability from a theoretical perspective. Existing theoretical research has primarily focused on the analysis of single-layer GCNs, while a comprehensive theoretical exploration of the stability and generalization of deep GCNs remains limited. In this paper, we bridge this gap by delving into the stability and generalization properties of deep GCNs, aiming to provide valuable insights by characterizing rigorously the associated upper bounds. Our theoretical results reveal that the stability and generalization of deep GCNs are influenced by certain key factors, such as the maximum absolute eigenvalue of the graph filter operators and the depth of the network. Our theoretical studies contribute to a deeper understanding of the stability and generalization properties of deep GCNs, potentially paving the way for developing more reliable and well-performing models.
△ Less
Submitted 10 October, 2024;
originally announced October 2024.
-
Exploring Scaling Laws for Local SGD in Large Language Model Training
Authors:
Qiaozhi He,
Xiaomin Zhuang,
Zhihua Wu
Abstract:
This paper investigates scaling laws for local SGD in LLM training, a distributed optimization algorithm that facilitates training on loosely connected devices. Through extensive experiments, we show that local SGD achieves competitive results compared to conventional methods, given equivalent model parameters, datasets, and computational resources. Furthermore, we explore the application of local…
▽ More
This paper investigates scaling laws for local SGD in LLM training, a distributed optimization algorithm that facilitates training on loosely connected devices. Through extensive experiments, we show that local SGD achieves competitive results compared to conventional methods, given equivalent model parameters, datasets, and computational resources. Furthermore, we explore the application of local SGD in various practical scenarios, including multi-cluster setups and edge computing environments. Our findings elucidate the necessary conditions for effective multi-cluster LLM training and examine the potential and limitations of leveraging edge computing resources in the LLM training process. This demonstrates its viability as an alternative to single large-cluster training.
△ Less
Submitted 20 September, 2024;
originally announced September 2024.
-
Statistics-Informed Parameterized Quantum Circuit via Maximum Entropy Principle for Data Science and Finance
Authors:
Xi-Ning Zhuang,
Zhao-Yun Chen,
Cheng Xue,
Xiao-Fan Xu,
Chao Wang,
Huan-Yu Liu,
Tai-Ping Sun,
Yun-Jie Wang,
Yu-Chun Wu,
Guo-Ping Guo
Abstract:
Quantum machine learning has demonstrated significant potential in solving practical problems, particularly in statistics-focused areas such as data science and finance. However, challenges remain in preparing and learning statistical models on a quantum processor due to issues with trainability and interpretability. In this letter, we utilize the maximum entropy principle to design a statistics-i…
▽ More
Quantum machine learning has demonstrated significant potential in solving practical problems, particularly in statistics-focused areas such as data science and finance. However, challenges remain in preparing and learning statistical models on a quantum processor due to issues with trainability and interpretability. In this letter, we utilize the maximum entropy principle to design a statistics-informed parameterized quantum circuit (SI-PQC) for efficiently preparing and training of quantum computational statistical models, including arbitrary distributions and their weighted mixtures. The SI-PQC features a static structure with trainable parameters, enabling in-depth optimized circuit compilation, exponential reductions in resource and time consumption, and improved trainability and interpretability for learning quantum states and classical model parameters simultaneously. As an efficient subroutine for preparing and learning in various quantum algorithms, the SI-PQC addresses the input bottleneck and facilitates the injection of prior knowledge.
△ Less
Submitted 18 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Quantum Encoding and Analysis on Continuous Time Stochastic Process with Financial Applications
Authors:
Xi-Ning Zhuang,
Zhao-Yun Chen,
Cheng Xue,
Yu-Chun Wu,
Guo-Ping Guo
Abstract:
The continuous time stochastic process is a mainstream mathematical instrument modeling the random world with a wide range of applications involving finance, statistics, physics, and time series analysis, while the simulation and analysis of the continuous time stochastic process is a challenging problem for classical computers. In this work, a general framework is established to prepare the path…
▽ More
The continuous time stochastic process is a mainstream mathematical instrument modeling the random world with a wide range of applications involving finance, statistics, physics, and time series analysis, while the simulation and analysis of the continuous time stochastic process is a challenging problem for classical computers. In this work, a general framework is established to prepare the path of a continuous time stochastic process in a quantum computer efficiently. The storage and computation resource is exponentially reduced on the key parameter of holding time, as the qubit number and the circuit depth are both optimized via our compressed state preparation method. The desired information, including the path-dependent and history-sensitive information that is essential for financial problems, can be extracted efficiently from the compressed sampling path, and admits a further quadratic speed-up. Moreover, this extraction method is more sensitive to those discontinuous jumps capturing extreme market events. Two applications of option pricing in Merton jump diffusion model and ruin probability computing in the collective risk model are given.
△ Less
Submitted 27 September, 2023; v1 submitted 3 August, 2022;
originally announced August 2022.
-
A Global Benchmark of Algorithms for Segmenting Late Gadolinium-Enhanced Cardiac Magnetic Resonance Imaging
Authors:
Zhaohan Xiong,
Qing Xia,
Zhiqiang Hu,
Ning Huang,
Cheng Bian,
Yefeng Zheng,
Sulaiman Vesal,
Nishant Ravikumar,
Andreas Maier,
Xin Yang,
Pheng-Ann Heng,
Dong Ni,
Caizi Li,
Qianqian Tong,
Weixin Si,
Elodie Puybareau,
Younes Khoudli,
Thierry Geraud,
Chen Chen,
Wenjia Bai,
Daniel Rueckert,
Lingchao Xu,
Xiahai Zhuang,
Xinzhe Luo,
Shuman Jia
, et al. (19 additional authors not shown)
Abstract:
Segmentation of cardiac images, particularly late gadolinium-enhanced magnetic resonance imaging (LGE-MRI) widely used for visualizing diseased cardiac structures, is a crucial first step for clinical diagnosis and treatment. However, direct segmentation of LGE-MRIs is challenging due to its attenuated contrast. Since most clinical studies have relied on manual and labor-intensive approaches, auto…
▽ More
Segmentation of cardiac images, particularly late gadolinium-enhanced magnetic resonance imaging (LGE-MRI) widely used for visualizing diseased cardiac structures, is a crucial first step for clinical diagnosis and treatment. However, direct segmentation of LGE-MRIs is challenging due to its attenuated contrast. Since most clinical studies have relied on manual and labor-intensive approaches, automatic methods are of high interest, particularly optimized machine learning approaches. To address this, we organized the "2018 Left Atrium Segmentation Challenge" using 154 3D LGE-MRIs, currently the world's largest cardiac LGE-MRI dataset, and associated labels of the left atrium segmented by three medical experts, ultimately attracting the participation of 27 international teams. In this paper, extensive analysis of the submitted algorithms using technical and biological metrics was performed by undergoing subgroup analysis and conducting hyper-parameter analysis, offering an overall picture of the major design choices of convolutional neural networks (CNNs) and practical considerations for achieving state-of-the-art left atrium segmentation. Results show the top method achieved a dice score of 93.2% and a mean surface to a surface distance of 0.7 mm, significantly outperforming prior state-of-the-art. Particularly, our analysis demonstrated that double, sequentially used CNNs, in which a first CNN is used for automatic region-of-interest localization and a subsequent CNN is used for refined regional segmentation, achieved far superior results than traditional methods and pipelines containing single CNNs. This large-scale benchmarking study makes a significant step towards much-improved segmentation methods for cardiac LGE-MRIs, and will serve as an important benchmark for evaluating and comparing the future works in the field.
△ Less
Submitted 7 May, 2020; v1 submitted 26 April, 2020;
originally announced April 2020.
-
SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition
Authors:
Zhen Huang,
Tim Ng,
Leo Liu,
Henry Mason,
Xiaodan Zhuang,
Daben Liu
Abstract:
Very deep CNNs achieve state-of-the-art results in both computer vision and speech recognition, but are difficult to train. The most popular way to train very deep CNNs is to use shortcut connections (SC) together with batch normalization (BN). Inspired by Self- Normalizing Neural Networks, we propose the self-normalizing deep CNN (SNDCNN) based acoustic model topology, by removing the SC/BN and r…
▽ More
Very deep CNNs achieve state-of-the-art results in both computer vision and speech recognition, but are difficult to train. The most popular way to train very deep CNNs is to use shortcut connections (SC) together with batch normalization (BN). Inspired by Self- Normalizing Neural Networks, we propose the self-normalizing deep CNN (SNDCNN) based acoustic model topology, by removing the SC/BN and replacing the typical RELU activations with scaled exponential linear unit (SELU) in ResNet-50. SELU activations make the network self-normalizing and remove the need for both shortcut connections and batch normalization. Compared to ResNet- 50, we can achieve the same or lower (up to 4.5% relative) word error rate (WER) while boosting both training and inference speed by 60%-80%. We also explore other model inference optimization schemes to further reduce latency for production use.
△ Less
Submitted 23 March, 2020; v1 submitted 4 October, 2019;
originally announced October 2019.
-
Haar Graph Pooling
Authors:
Yu Guang Wang,
Ming Li,
Zheng Ma,
Guido Montufar,
Xiaosheng Zhuang,
Yanan Fan
Abstract:
Deep Graph Neural Networks (GNNs) are useful models for graph classification and graph-based regression tasks. In these tasks, graph pooling is a critical ingredient by which GNNs adapt to input graphs of varying size and structure. We propose a new graph pooling operation based on compressive Haar transforms -- HaarPooling. HaarPooling implements a cascade of pooling operations; it is computed by…
▽ More
Deep Graph Neural Networks (GNNs) are useful models for graph classification and graph-based regression tasks. In these tasks, graph pooling is a critical ingredient by which GNNs adapt to input graphs of varying size and structure. We propose a new graph pooling operation based on compressive Haar transforms -- HaarPooling. HaarPooling implements a cascade of pooling operations; it is computed by following a sequence of clusterings of the input graph. A HaarPooling layer transforms a given input graph to an output graph with a smaller node number and the same feature dimension; the compressive Haar transform filters out fine detail information in the Haar wavelet domain. In this way, all the HaarPooling layers together synthesize the features of any given input graph into a feature vector of uniform size. Such transforms provide a sparse characterization of the data and preserve the structure information of the input graph. GNNs implemented with standard graph convolution layers and HaarPooling layers achieve state of the art performance on diverse graph classification and regression problems.
△ Less
Submitted 24 June, 2020; v1 submitted 25 September, 2019;
originally announced September 2019.
-
An Energy Approach to the Solution of Partial Differential Equations in Computational Mechanics via Machine Learning: Concepts, Implementation and Applications
Authors:
Esteban Samaniego,
Cosmin Anitescu,
Somdatta Goswami,
Vien Minh Nguyen-Thanh,
Hongwei Guo,
Khader Hamdia,
Timon Rabczuk,
Xiaoying Zhuang
Abstract:
Partial Differential Equations (PDE) are fundamental to model different phenomena in science and engineering mathematically. Solving them is a crucial step towards a precise knowledge of the behaviour of natural and engineered systems. In general, in order to solve PDEs that represent real systems to an acceptable degree, analytical methods are usually not enough. One has to resort to discretizati…
▽ More
Partial Differential Equations (PDE) are fundamental to model different phenomena in science and engineering mathematically. Solving them is a crucial step towards a precise knowledge of the behaviour of natural and engineered systems. In general, in order to solve PDEs that represent real systems to an acceptable degree, analytical methods are usually not enough. One has to resort to discretization methods. For engineering problems, probably the best known option is the finite element method (FEM). However, powerful alternatives such as mesh-free methods and Isogeometric Analysis (IGA) are also available. The fundamental idea is to approximate the solution of the PDE by means of functions specifically built to have some desirable properties. In this contribution, we explore Deep Neural Networks (DNNs) as an option for approximation. They have shown impressive results in areas such as visual recognition. DNNs are regarded here as function approximation machines. There is great flexibility to define their structure and important advances in the architecture and the efficiency of the algorithms to implement them make DNNs a very interesting alternative to approximate the solution of a PDE. We concentrate in applications that have an interest for Computational Mechanics. Most contributions that have decided to explore this possibility have adopted a collocation strategy. In this contribution, we concentrate in mechanical problems and analyze the energetic format of the PDE. The energy of a mechanical system seems to be the natural loss function for a machine learning method to approach a mechanical problem. As proofs of concept, we deal with several problems and explore the capabilities of the method for applications in engineering.
△ Less
Submitted 2 September, 2019; v1 submitted 27 August, 2019;
originally announced August 2019.
-
Fast Haar Transforms for Graph Neural Networks
Authors:
Ming Li,
Zheng Ma,
Yu Guang Wang,
Xiaosheng Zhuang
Abstract:
Graph Neural Networks (GNNs) have become a topic of intense research recently due to their powerful capability in high-dimensional classification and regression tasks for graph-structured data. However, as GNNs typically define the graph convolution by the orthonormal basis for the graph Laplacian, they suffer from high computational cost when the graph size is large. This paper introduces Haar ba…
▽ More
Graph Neural Networks (GNNs) have become a topic of intense research recently due to their powerful capability in high-dimensional classification and regression tasks for graph-structured data. However, as GNNs typically define the graph convolution by the orthonormal basis for the graph Laplacian, they suffer from high computational cost when the graph size is large. This paper introduces Haar basis which is a sparse and localized orthonormal system for a coarse-grained chain on graph. The graph convolution under Haar basis, called Haar convolution, can be defined accordingly for GNNs. The sparsity and locality of the Haar basis allow Fast Haar Transforms (FHTs) on graph, by which a fast evaluation of Haar convolution between graph data and filters can be achieved. We conduct experiments on GNNs equipped with Haar convolution, which demonstrates state-of-the-art results on graph-based regression and node classification tasks.
△ Less
Submitted 30 October, 2019; v1 submitted 10 July, 2019;
originally announced July 2019.
-
A Fully-Automatic Framework for Parkinson's Disease Diagnosis by Multi-Modality Images
Authors:
Jiahang Xu,
Fangyang Jiao,
Yechong Huang,
Xinzhe Luo,
Qian Xu,
Ling Li,
Xueling Liu,
Chuantao Zuo,
Ping Wu,
Xiahai Zhuang
Abstract:
Background: Parkinson's disease (PD) is a prevalent long-term neurodegenerative disease. Though the diagnostic criteria of PD are relatively well defined, the current medical imaging diagnostic procedures are expertise-demanding, and thus call for a higher-integrated AI-based diagnostic algorithm. Methods: In this paper, we proposed an automatic, end-to-end, multi-modality diagnosis framework, inc…
▽ More
Background: Parkinson's disease (PD) is a prevalent long-term neurodegenerative disease. Though the diagnostic criteria of PD are relatively well defined, the current medical imaging diagnostic procedures are expertise-demanding, and thus call for a higher-integrated AI-based diagnostic algorithm. Methods: In this paper, we proposed an automatic, end-to-end, multi-modality diagnosis framework, including segmentation, registration, feature generation and machine learning, to process the information of the striatum for the diagnosis of PD. Multiple modalities, including T1- weighted MRI and 11C-CFT PET, were used in the proposed framework. The reliability of this framework was then validated on a dataset from the PET center of Huashan Hospital, as the dataset contains paired T1-MRI and CFT-PET images of 18 Normal (NL) subjects and 49 PD subjects. Results: We obtained an accuracy of 100% for the PD/NL classification task, besides, we conducted several comparative experiments to validate the diagnosis ability of our framework. Conclusion: Through experiment we illustrate that (1) automatic segmentation has the same classification effect as the manual segmentation, (2) the multi-modality images generates a better prediction than single modality images, and (3) volume feature is shown to be irrelevant to PD diagnosis.
△ Less
Submitted 26 February, 2019;
originally announced February 2019.