Search | arXiv e-print repository

Interpretable Tensor Fusion

Authors: Saurabh Varshneya, Antoine Ledent, Philipp Liznerski, Andriy Balinskyy, Purvanshi Mehta, Waleed Mustafa, Marius Kloft

Abstract: Conventional machine learning methods are predominantly designed to predict outcomes based on a single data type. However, practical applications may encompass data of diverse types, such as text, images, and audio. We introduce interpretable tensor fusion (InTense), a multimodal learning method for training neural networks to simultaneously learn multimodal data representations and their interpretable fusion. InTense can separately capture both linear combinations and multiplicative interactions of diverse data types, thereby disentangling higher-order interactions from the individual effects of each modality. InTense provides interpretability out of the box by assigning relevance scores to modalities and their associations. The approach is theoretically grounded and yields meaningful relevance scores on multiple synthetic and real-world datasets. Experiments on six real-world datasets show that InTense outperforms existing state-of-the-art multimodal interpretable approaches in terms of accuracy and interpretability.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2106.00115 [pdf, ps, other]

Fine-grained Generalization Analysis of Structured Output Prediction

Authors: Waleed Mustafa, Yunwen Lei, Antoine Ledent, Marius Kloft

Abstract: In machine learning we often encounter structured output prediction problems (SOPPs), i.e. problems where the output space admits a rich internal structure. Application domains where SOPPs naturally occur include natural language processing, speech recognition, and computer vision. Typical SOPPs have an extremely large label set, which grows exponentially as a function of the size of the output. Existing generalization analysis implies generalization bounds with at least a square-root dependency on the cardinality $d$ of the label set, which can be vacuous in practice. In this paper, we significantly improve the state of the art by developing novel high-probability bounds with a logarithmic dependency on $d$. Moreover, we leverage the lens of algorithmic stability to develop generalization bounds in expectation without any dependency on $d$. Our results therefore build a solid theoretical foundation for learning in large-scale SOPPs. Furthermore, we extend our results to learning with weakly dependent data.

Submitted 31 May, 2021; originally announced June 2021.

Comments: To appear in IJCAI 2021

arXiv:2011.14842 [pdf, other]

Sparse-View Spectral CT Reconstruction Using Deep Learning

Authors: Wail Mustafa, Christian Kehl, Ulrik Lund Olsen, Søren Kimmer Schou Gregersen, David Malmgren-Hansen, Jan Kehres, Anders Bjorholm Dahl

Abstract: Spectral computed tomography (CT) is an emerging technology capable of providing high chemical specificity, which is crucial for many applications such as detecting threats in luggage. This type of application requires both fast and high-quality image reconstruction and is often based on sparse-view (few) projections. The conventional filtered back projection (FBP) method is fast but it produces low-quality images dominated by noise and artifacts in sparse-view CT. Iterative methods with, e.g., total variation regularizers can circumvent that but they are computationally expensive, as the computational load proportionally increases with the number of spectral channels. Instead, we propose an approach for fast reconstruction of sparse-view spectral CT data using a U-Net convolutional neural network architecture with multi-channel input and output. The network is trained to output high-quality CT images from FBP input image reconstructions. Our method is fast at run-time and because the internal convolutions are shared between the channels, the computational load increases only at the first and last layers, making it an efficient approach to process spectral data with a large number of channels. We have validated our approach using real CT scans. Our results show qualitatively and quantitatively that our approach outperforms the state-of-the-art iterative methods. Furthermore, the results indicate that the network can exploit the coupling of the channels to enhance the overall quality and robustness.

Submitted 26 March, 2021; v1 submitted 30 November, 2020; originally announced November 2020.

Comments: 13 pages, 9 figures, submitted to The IEEE Transactions on Computational Imaging

arXiv:2009.06571 [pdf, other]

Input Hessian Regularization of Neural Networks

Authors: Waleed Mustafa, Robert A. Vandermeulen, Marius Kloft

Abstract: Regularizing the input gradient has been shown to be effective in promoting the robustness of neural networks. The regularization of the input's Hessian is therefore a natural next step. A key challenge here is the computational complexity. Computing the Hessian of inputs is computationally infeasible. In this paper we propose an efficient algorithm to train deep neural networks with Hessian operator-norm regularization. We analyze the approach theoretically and prove that the Hessian operator norm relates to the ability of a neural network to withstand an adversarial attack. We give a preliminary experimental evaluation on the MNIST and FMNIST datasets, which demonstrates that the new regularizer can, indeed, be feasible and, furthermore, that it increases the robustness of neural networks over input gradient regularization.

Submitted 14 September, 2020; originally announced September 2020.

Comments: Workshop on "Beyond first-order methods in ML systems" at the 37th International Conference on Machine Learning, Vienna, Austria, 2020

arXiv:1908.10661 [pdf, other]

Method and System for Image Analysis to Detect Cancer

Authors: Waleed A. Yousef, Ahmed A. Abouelkahire, Deyaaeldeen Almahallawi, Omar S. Marzouk, Sameh K. Mohamed, Waleed A. Mustafa, Omar M. Osama, Ali A. Saleh, Naglaa M. Abdelrazek

Abstract: Breast cancer is the most common cancer and is the leading cause of cancer death among women worldwide. Detection of breast cancer, while it is still small and confined to the breast, provides the best chance of effective treatment. Computer Aided Detection (CAD) systems that detect cancer from mammograms will help in reducing the human errors that lead to missing breast carcinoma. The literature is rich in scientific papers on methods of CAD design, yet with no complete system architecture to deploy those methods. On the other hand, commercial CADs are developed and deployed only to vendors' mammography machines with no availability to public access. This paper presents a complete CAD; it is complete since it combines, on the one hand, the rigor of algorithm design and assessment (method), and, on the other hand, the implementation and deployment of a system architecture for public accessibility (system). (1) We develop a novel algorithm for image enhancement so that mammograms acquired from any digital mammography machine look qualitatively of the same clarity to radiologists' inspection and are quantitatively standardized for the detection algorithms. (2) We develop novel algorithms for masses and microcalcifications detection with accuracy superior to both literature results and the majority of approved commercial systems. (3) We design, implement, and deploy a system architecture that is computationally effective to allow for deploying these algorithms to cloud for public access.

Submitted 26 August, 2019; originally announced August 2019.

arXiv:1906.09669 [pdf, other]

Nested Cavity Classifier: performance and remedy

Authors: Waleed A. Mustafa, Waleed A. Yousef

Abstract: Nested Cavity Classifier (NCC) is a classification rule that pursues partitioning the feature space, in parallel coordinates, into convex hulls to build decision regions. It is claimed in some of the literature that this geometric-based classifier is superior to many others, particularly in higher dimensions. First, we give an example of how NCC can be inefficient, then motivate a remedy by combining the NCC with the Linear Discriminant Analysis (LDA) classifier. We coin the term Nested Cavity Discriminant Analysis (NCDA) for the resulting classifier. Second, a simulation study is conducted to compare both NCC and NCDA to two other basic classifiers, Linear and Quadratic Discriminant Analysis. NCC alone proves to be inferior to others, while NCDA always outperforms NCC and competes with LDA and QDA.

Submitted 14 August, 2019; v1 submitted 23 June, 2019; originally announced June 2019.

Comments: This manuscript was composed in 2009 as part of a research pursued that time

arXiv:1905.12430 [pdf, other]

Norm-based generalisation bounds for multi-class convolutional neural networks

Authors: Antoine Ledent, Waleed Mustafa, Yunwen Lei, Marius Kloft

Abstract: We show generalisation error bounds for deep learning with two main improvements over the state of the art. (1) Our bounds have no explicit dependence on the number of classes except for logarithmic factors. This holds even when formulating the bounds in terms of the $L^2$-norm of the weight matrices, where previous bounds exhibit at least a square-root dependence on the number of classes. (2) We adapt the classic Rademacher analysis of DNNs to incorporate weight sharing -- a task of fundamental theoretical importance which was previously attempted only under very restrictive assumptions. In our results, each convolutional filter contributes only once to the bound, regardless of how many times it is applied. Further improvements exploiting pooling and sparse connections are provided. The presented bounds scale as the norms of the parameter matrices, rather than the number of parameters. In particular, contrary to bounds based on parameter counting, they are asymptotically tight (up to log factors) when the weights approach initialisation, making them suitable as a basic ingredient in bounds sensitive to the optimisation procedure. We also show how to adapt the recent technique of loss function augmentation to our situation to replace spectral norms by empirical analogues whilst maintaining the advantages of our approach.

Submitted 21 February, 2021; v1 submitted 29 May, 2019; originally announced May 2019.

arXiv:1810.11823 [pdf, other]

Multi-Spectral Imaging via Computed Tomography (MUSIC) - Comparing Unsupervised Spectral Segmentations for Material Differentiation

Authors: Christian Kehl, Wail Mustafa, Jan Kehres, Anders Bjorholm Dahl, Ulrik Lund Olsen

Abstract: Multi-spectral computed tomography is an emerging technology for the non-destructive identification of object materials and the study of their physical properties. Applications of this technology can be found in various scientific and industrial contexts, such as luggage scanning at airports. Material distinction and its identification is challenging, even with spectral x-ray information, due to acquisition noise, tomographic reconstruction artefacts and scanning setup application constraints. We present MUSIC - an open access multi-spectral CT dataset in 2D and 3D - to promote further research in the area of material identification. We demonstrate the value of this dataset on the image analysis challenge of object segmentation purely based on the spectral response of its composing materials. In this context, we compare the segmentation accuracy of fast adaptive mean shift (FAMS) and unconstrained graph cuts on both datasets. We further discuss the impact of reconstruction artefacts and segmentation controls on the achievable results. Dataset, related software packages and further documentation are made available to the imaging community in an open-access manner to promote further data-driven research on the subject.

Submitted 28 October, 2018; originally announced October 2018.

Comments: 21 pages, 24 figures (in articles), includes 2 appendices with 8 additional figures

Showing 1–8 of 8 results for author: Mustafa, W