-
A Comparative Study of U-Net Architectures for Change Detection in Satellite Images
Authors:
Yaxita Amin,
Naimisha S Trivedi,
Rashmi Bhattad
Abstract:
Remote sensing change detection is essential for monitoring the everchanging landscapes of the Earth. The U-Net architecture has gained popularity for its capability to capture spatial information and perform pixel-wise classification. However, their application in the Remote sensing field remains largely unexplored. Therefore, this paper fill the gap by conducting a comprehensive analysis of 34 p…
▽ More
Remote sensing change detection is essential for monitoring the everchanging landscapes of the Earth. The U-Net architecture has gained popularity for its capability to capture spatial information and perform pixel-wise classification. However, their application in the Remote sensing field remains largely unexplored. Therefore, this paper fill the gap by conducting a comprehensive analysis of 34 papers. This study conducts a comparison and analysis of 18 different U-Net variations, assessing their potential for detecting changes in remote sensing. We evaluate both benefits along with drawbacks of each variation within the framework of this particular application. We emphasize variations that are explicitly built for change detection, such as Siamese Swin-U-Net, which utilizes a Siamese architecture. The analysis highlights the significance of aspects such as managing data from different time periods and collecting relationships over a long distance to enhance the precision of change detection. This study provides valuable insights for researchers and practitioners that choose U-Net versions for remote sensing change detection tasks.
△ Less
Submitted 9 June, 2025;
originally announced June 2025.
-
Watermarking Degrades Alignment in Language Models: Analysis and Mitigation
Authors:
Apurv Verma,
NhatHai Phan,
Shubhendu Trivedi
Abstract:
Watermarking techniques for large language models (LLMs) can significantly impact output quality, yet their effects on truthfulness, safety, and helpfulness remain critically underexamined. This paper presents a systematic analysis of how two popular watermarking approaches-Gumbel and KGW-affect these core alignment properties across four aligned LLMs. Our experiments reveal two distinct degradati…
▽ More
Watermarking techniques for large language models (LLMs) can significantly impact output quality, yet their effects on truthfulness, safety, and helpfulness remain critically underexamined. This paper presents a systematic analysis of how two popular watermarking approaches-Gumbel and KGW-affect these core alignment properties across four aligned LLMs. Our experiments reveal two distinct degradation patterns: guard attenuation, where enhanced helpfulness undermines model safety, and guard amplification, where excessive caution reduces model helpfulness. These patterns emerge from watermark-induced shifts in token distribution, surfacing the fundamental tension that exists between alignment objectives.
To mitigate these degradations, we propose Alignment Resampling (AR), an inference-time sampling method that uses an external reward model to restore alignment. We establish a theoretical lower bound on the improvement in expected reward score as the sample size is increased and empirically demonstrate that sampling just 2-4 watermarked generations effectively recovers or surpasses baseline (unwatermarked) alignment scores. To overcome the limited response diversity of standard Gumbel watermarking, our modified implementation sacrifices strict distortion-freeness while maintaining robust detectability, ensuring compatibility with AR. Experimental results confirm that AR successfully recovers baseline alignment in both watermarking approaches, while maintaining strong watermark detectability. This work reveals the critical balance between watermark strength and model alignment, providing a simple inference-time solution to responsibly deploy watermarked LLMs in practice.
△ Less
Submitted 10 July, 2025; v1 submitted 4 June, 2025;
originally announced June 2025.
-
On the Need to Align Intent and Implementation in Uncertainty Quantification for Machine Learning
Authors:
Shubhendu Trivedi,
Brian D. Nord
Abstract:
Quantifying uncertainties for machine learning (ML) models is a foundational challenge in modern data analysis. This challenge is compounded by at least two key aspects of the field: (a) inconsistent terminology surrounding uncertainty and estimation across disciplines, and (b) the varying technical requirements for establishing trustworthy uncertainties in diverse problem contexts. In this positi…
▽ More
Quantifying uncertainties for machine learning (ML) models is a foundational challenge in modern data analysis. This challenge is compounded by at least two key aspects of the field: (a) inconsistent terminology surrounding uncertainty and estimation across disciplines, and (b) the varying technical requirements for establishing trustworthy uncertainties in diverse problem contexts. In this position paper, we aim to clarify the depth of these challenges by identifying these inconsistencies and articulating how different contexts impose distinct epistemic demands. We examine the current landscape of estimation targets (e.g., prediction, inference, simulation-based inference), uncertainty constructs (e.g., frequentist, Bayesian, fiducial), and the approaches used to map between them. Drawing on the literature, we highlight and explain examples of problematic mappings. To help address these issues, we advocate for standards that promote alignment between the \textit{intent} and \textit{implementation} of uncertainty quantification (UQ) approaches. We discuss several axes of trustworthiness that are necessary (if not sufficient) for reliable UQ in ML models, and show how these axes can inform the design and evaluation of uncertainty-aware ML systems. Our practical recommendations focus on scientific ML, offering illustrative cases and use scenarios, particularly in the context of simulation-based inference (SBI).
△ Less
Submitted 3 June, 2025;
originally announced June 2025.
-
On Universality Classes of Equivariant Networks
Authors:
Marco Pacini,
Gabriele Santin,
Bruno Lepri,
Shubhendu Trivedi
Abstract:
Equivariant neural networks provide a principled framework for incorporating symmetry into learning architectures and have been extensively analyzed through the lens of their separation power, that is, the ability to distinguish inputs modulo symmetry. This notion plays a central role in settings such as graph learning, where it is often formalized via the Weisfeiler-Leman hierarchy. In contrast,…
▽ More
Equivariant neural networks provide a principled framework for incorporating symmetry into learning architectures and have been extensively analyzed through the lens of their separation power, that is, the ability to distinguish inputs modulo symmetry. This notion plays a central role in settings such as graph learning, where it is often formalized via the Weisfeiler-Leman hierarchy. In contrast, the universality of equivariant models-their capacity to approximate target functions-remains comparatively underexplored. In this work, we investigate the approximation power of equivariant neural networks beyond separation constraints. We show that separation power does not fully capture expressivity: models with identical separation power may differ in their approximation ability. To demonstrate this, we characterize the universality classes of shallow invariant networks, providing a general framework for understanding which functions these architectures can approximate. Since equivariant models reduce to invariant ones under projection, this analysis yields sufficient conditions under which shallow equivariant networks fail to be universal. Conversely, we identify settings where shallow models do achieve separation-constrained universality. These positive results, however, depend critically on structural properties of the symmetry group, such as the existence of adequate normal subgroups, which may not hold in important cases like permutation symmetry.
△ Less
Submitted 2 June, 2025;
originally announced June 2025.
-
Reliability and Availability in Virtualized Networks: A Survey on Standards, Modeling Approaches, and Research Challenges
Authors:
Mario Di Mauro,
Walter Cerroni,
Fabio Postiglione,
Massimo Tornatore,
Kishor S. Trivedi
Abstract:
The rise of Network Function Virtualization (NFV) has transformed network infrastructures by replacing fixed hardware with software-based Virtualized Network Functions (VNFs), enabling greater agility, scalability, and cost efficiency. Virtualization increases the distribution of system components and introduces stronger interdependencies. As a result, failures become harder to predict, monitor, a…
▽ More
The rise of Network Function Virtualization (NFV) has transformed network infrastructures by replacing fixed hardware with software-based Virtualized Network Functions (VNFs), enabling greater agility, scalability, and cost efficiency. Virtualization increases the distribution of system components and introduces stronger interdependencies. As a result, failures become harder to predict, monitor, and manage compared to traditional monolithic networks. Reliability, i.e. the ability of a system to perform regularly under specified conditions, and availability, i.e. the probability of a system of being ready to use, are critical requirements that must be guaranteed to maintain seamless network operations. Accurate modeling of these aspects is crucial for designing robust, fault-tolerant virtualized systems that can withstand service disruptions. This survey focuses on reliability and availability attributes of virtualized networks from a modeling perspective. After introducing the NFV architecture and basic definitions, we discuss the standardization efforts of the European Telecommunications Standards Institute (ETSI), which provides guidelines and recommendations through a series of standard documents focusing on reliability and availability. Next, we explore several formalisms proposed in the literature for characterizing reliability and availability, with a focus on their application to modeling the failure and repair behavior of virtualized networks through practical examples. Then, we overview numerous references demonstrating how different authors adopt specific methods to characterize reliability and/or availability of virtualized systems. Moreover, we present a selection of the most valuable software tools that support modeling of reliable virtualized networks. Finally, we discuss a set of open problems with the aim to encourage readers to explore further advances in this field.
△ Less
Submitted 27 March, 2025;
originally announced March 2025.
-
MCQA-Eval: Efficient Confidence Evaluation in NLG with Gold-Standard Correctness Labels
Authors:
Xiaoou Liu,
Zhen Lin,
Longchao Da,
Chacha Chen,
Shubhendu Trivedi,
Hua Wei
Abstract:
Large Language Models (LLMs) require robust confidence estimation, particularly in critical domains like healthcare and law where unreliable outputs can lead to significant consequences. Despite much recent work in confidence estimation, current evaluation frameworks rely on correctness functions -- various heuristics that are often noisy, expensive, and possibly introduce systematic biases. These…
▽ More
Large Language Models (LLMs) require robust confidence estimation, particularly in critical domains like healthcare and law where unreliable outputs can lead to significant consequences. Despite much recent work in confidence estimation, current evaluation frameworks rely on correctness functions -- various heuristics that are often noisy, expensive, and possibly introduce systematic biases. These methodological weaknesses tend to distort evaluation metrics and thus the comparative ranking of confidence measures. We introduce MCQA-Eval, an evaluation framework for assessing confidence measures in Natural Language Generation (NLG) that eliminates dependence on an explicit correctness function by leveraging gold-standard correctness labels from multiple-choice datasets. MCQA-Eval enables systematic comparison of both internal state-based white-box (e.g. logit-based) and consistency-based black-box confidence measures, providing a unified evaluation methodology across different approaches. Through extensive experiments on multiple LLMs and widely used QA datasets, we report that MCQA-Eval provides efficient and more reliable assessments of confidence estimation methods than existing approaches.
△ Less
Submitted 20 February, 2025;
originally announced February 2025.
-
LoRE: Logit-Ranked Retriever Ensemble for Enhancing Open-Domain Question Answering
Authors:
Saikrishna Sanniboina,
Shiv Trivedi,
Sreenidhi Vijayaraghavan
Abstract:
Retrieval-based question answering systems often suffer from positional bias, leading to suboptimal answer generation. We propose LoRE (Logit-Ranked Retriever Ensemble), a novel approach that improves answer accuracy and relevance by mitigating positional bias. LoRE employs an ensemble of diverse retrievers, such as BM25 and sentence transformers with FAISS indexing. A key innovation is a logit-ba…
▽ More
Retrieval-based question answering systems often suffer from positional bias, leading to suboptimal answer generation. We propose LoRE (Logit-Ranked Retriever Ensemble), a novel approach that improves answer accuracy and relevance by mitigating positional bias. LoRE employs an ensemble of diverse retrievers, such as BM25 and sentence transformers with FAISS indexing. A key innovation is a logit-based answer ranking algorithm that combines the logit scores from a large language model (LLM), with the retrieval ranks of the passages. Experimental results on NarrativeQA, SQuAD demonstrate that LoRE significantly outperforms existing retrieval-based methods in terms of exact match and F1 scores. On SQuAD, LoRE achieves 14.5\%, 22.83\%, and 14.95\% improvements over the baselines for ROUGE-L, EM, and F1, respectively. Qualitatively, LoRE generates more relevant and accurate answers, especially for complex queries.
△ Less
Submitted 13 October, 2024;
originally announced October 2024.
-
Symmetry-Based Structured Matrices for Efficient Approximately Equivariant Networks
Authors:
Ashwin Samudre,
Mircea Petrache,
Brian D. Nord,
Shubhendu Trivedi
Abstract:
There has been much recent interest in designing neural networks (NNs) with relaxed equivariance, which interpolate between exact equivariance and full flexibility for consistent performance gains. In a separate line of work, structured parameter matrices with low displacement rank (LDR) -- which permit fast function and gradient evaluation -- have been used to create compact NNs, though primarily…
▽ More
There has been much recent interest in designing neural networks (NNs) with relaxed equivariance, which interpolate between exact equivariance and full flexibility for consistent performance gains. In a separate line of work, structured parameter matrices with low displacement rank (LDR) -- which permit fast function and gradient evaluation -- have been used to create compact NNs, though primarily benefiting classical convolutional neural networks (CNNs). In this work, we propose a framework based on symmetry-based structured matrices to build approximately equivariant NNs with fewer parameters. Our approach unifies the aforementioned areas using Group Matrices (GMs), a forgotten precursor to the modern notion of regular representations of finite groups. GMs allow the design of structured matrices similar to LDR matrices, which can generalize all the elementary operations of a CNN from cyclic groups to arbitrary finite groups. We show GMs can also generalize classical LDR theory to general discrete groups, enabling a natural formalism for approximate equivariance. We test GM-based architectures on various tasks with relaxed symmetry and find that our framework performs competitively with approximately equivariant NNs and other structured matrix-based methods, often with one to two orders of magnitude fewer parameters.
△ Less
Submitted 18 April, 2025; v1 submitted 18 September, 2024;
originally announced September 2024.
-
Improving Equivariant Model Training via Constraint Relaxation
Authors:
Stefanos Pertigkiozoglou,
Evangelos Chatzipantazis,
Shubhendu Trivedi,
Kostas Daniilidis
Abstract:
Equivariant neural networks have been widely used in a variety of applications due to their ability to generalize well in tasks where the underlying data symmetries are known. Despite their successes, such networks can be difficult to optimize and require careful hyperparameter tuning to train successfully. In this work, we propose a novel framework for improving the optimization of such models by…
▽ More
Equivariant neural networks have been widely used in a variety of applications due to their ability to generalize well in tasks where the underlying data symmetries are known. Despite their successes, such networks can be difficult to optimize and require careful hyperparameter tuning to train successfully. In this work, we propose a novel framework for improving the optimization of such models by relaxing the hard equivariance constraint during training: We relax the equivariance constraint of the network's intermediate layers by introducing an additional non-equivariant term that we progressively constrain until we arrive at an equivariant solution. By controlling the magnitude of the activation of the additional relaxation term, we allow the model to optimize over a larger hypothesis space containing approximate equivariant networks and converge back to an equivariant solution at the end of training. We provide experimental results on different state-of-the-art network architectures, demonstrating how this training framework can result in equivariant models with improved generalization performance. Our code is available at https://github.com/StefanosPert/Equivariant_Optimization_CR
△ Less
Submitted 2 January, 2025; v1 submitted 23 August, 2024;
originally announced August 2024.
-
Contextualized Sequence Likelihood: Enhanced Confidence Scores for Natural Language Generation
Authors:
Zhen Lin,
Shubhendu Trivedi,
Jimeng Sun
Abstract:
The advent of large language models (LLMs) has dramatically advanced the state-of-the-art in numerous natural language generation tasks. For LLMs to be applied reliably, it is essential to have an accurate measure of their confidence. Currently, the most commonly used confidence score function is the likelihood of the generated sequence, which, however, conflates semantic and syntactic components.…
▽ More
The advent of large language models (LLMs) has dramatically advanced the state-of-the-art in numerous natural language generation tasks. For LLMs to be applied reliably, it is essential to have an accurate measure of their confidence. Currently, the most commonly used confidence score function is the likelihood of the generated sequence, which, however, conflates semantic and syntactic components. For instance, in question-answering (QA) tasks, an awkward phrasing of the correct answer might result in a lower probability prediction. Additionally, different tokens should be weighted differently depending on the context. In this work, we propose enhancing the predicted sequence probability by assigning different weights to various tokens using attention values elicited from the base LLM. By employing a validation set, we can identify the relevant attention heads, thereby significantly improving the reliability of the vanilla sequence probability confidence measure. We refer to this new score as the Contextualized Sequence Likelihood (CSL). CSL is easy to implement, fast to compute, and offers considerable potential for further improvement with task-specific prompts. Across several QA datasets and a diverse array of LLMs, CSL has demonstrated significantly higher reliability than state-of-the-art baselines in predicting generation quality, as measured by the AUROC or AUARC.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation
Authors:
Atrisha Sarkar,
Andrei Ioan Muresanu,
Carter Blair,
Aaryam Sharma,
Rakshit S Trivedi,
Gillian K Hadfield
Abstract:
Generative agents, which implement behaviors using a large language model (LLM) to interpret and evaluate an environment, has demonstrated the capacity to solve complex tasks across many social and technological domains. However, when these agents interact with other agents and humans in presence of social structures such as existing norms, fostering cooperation between them is a fundamental chall…
▽ More
Generative agents, which implement behaviors using a large language model (LLM) to interpret and evaluate an environment, has demonstrated the capacity to solve complex tasks across many social and technological domains. However, when these agents interact with other agents and humans in presence of social structures such as existing norms, fostering cooperation between them is a fundamental challenge. In this paper, we develop the framework of a 'Normative Module': an architecture designed to enhance cooperation by enabling agents to recognize and adapt to the normative infrastructure of a given environment. We focus on the equilibrium selection aspect of the cooperation problem and inform our agent design based on the existence of classification institutions that implement correlated equilibrium to provide effective resolution of the equilibrium selection problem. Specifically, the normative module enables agents to learn through peer interactions which of multiple candidate institutions in the environment, does a group treat as authoritative. By enabling normative competence in this sense, agents gain ability to coordinate their sanctioning behaviour; coordinated sanctioning behaviour in turn shapes primary behaviour within a social environment, leading to higher average welfare. We design a new environment that supports institutions and evaluate the proposed framework based on two key criteria derived from agent interactions with peers and institutions: (i) the agent's ability to disregard non-authoritative institutions and (ii) the agent's ability to identify authoritative institutions among several options. We show that these capabilities allow the agent to achieve more stable cooperative outcomes compared to baseline agents without the normative module, paving the way for research in a new avenue of designing environments and agents that account for normative infrastructure.
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
Position Paper: Generalized grammar rules and structure-based generalization beyond classical equivariance for lexical tasks and transduction
Authors:
Mircea Petrache,
Shubhendu Trivedi
Abstract:
Compositional generalization is one of the main properties which differentiates lexical learning in humans from state-of-art neural networks. We propose a general framework for building models that can generalize compositionally using the concept of Generalized Grammar Rules (GGRs), a class of symmetry-based compositional constraints for transduction tasks, which we view as a transduction analogue…
▽ More
Compositional generalization is one of the main properties which differentiates lexical learning in humans from state-of-art neural networks. We propose a general framework for building models that can generalize compositionally using the concept of Generalized Grammar Rules (GGRs), a class of symmetry-based compositional constraints for transduction tasks, which we view as a transduction analogue of equivariance constraints in physics-inspired tasks. Besides formalizing generalized notions of symmetry for language transduction, our framework is general enough to contain many existing works as special cases. We present ideas on how GGRs might be implemented, and in the process draw connections to reinforcement learning and other areas of research.
△ Less
Submitted 2 February, 2024;
originally announced February 2024.
-
Understanding Container-based Services under Software Aging: Dependability and Performance Views
Authors:
Jing Bai,
Xiaolin Chang,
Fumio Machida,
Kishor S. Trivedi
Abstract:
Container technology, as the key enabler behind microservice architectures, is widely applied in Cloud and Edge Computing. A long and continuous running of operating system (OS) host-ing container-based services can encounter software aging that leads to performance deterioration and even causes system fail-ures. OS rejuvenation techniques can mitigate the impact of software aging but the rejuvena…
▽ More
Container technology, as the key enabler behind microservice architectures, is widely applied in Cloud and Edge Computing. A long and continuous running of operating system (OS) host-ing container-based services can encounter software aging that leads to performance deterioration and even causes system fail-ures. OS rejuvenation techniques can mitigate the impact of software aging but the rejuvenation trigger interval needs to be carefully determined to reduce the downtime cost due to rejuve-nation. This paper proposes a comprehensive semi-Markov-based approach to quantitatively evaluate the effect of OS reju-venation on the dependability and the performance of a con-tainer-based service. In contrast to the existing studies, we nei-ther restrict the distributions of time intervals of events to be exponential nor assume that backup resources are always avail-able. Through the numerical study, we show the optimal con-tainer-migration trigger intervals that can maximize the de-pendability or minimize the performance of a container-based service.
△ Less
Submitted 24 August, 2023;
originally announced August 2023.
-
Towards Semi-Markov Model-based Dependability Evaluation of VM-based Multi-Domain Service Function Chain
Authors:
Lina Liu,
Jing Bai,
Xiaolin Chang,
Fumio Machida,
Kishor S. Trivedi,
Haoran Zhu
Abstract:
In NFV networks, service functions (SFs) can be deployed on virtual machines (VMs) across multiple domains and then form a service function chain (MSFC) for end-to-end network service provision. However, any software component in a VM-based MSFC must experience software aging issue after a long period of operation. This paper quantitatively investigates the capability of proactive rejuvenation tec…
▽ More
In NFV networks, service functions (SFs) can be deployed on virtual machines (VMs) across multiple domains and then form a service function chain (MSFC) for end-to-end network service provision. However, any software component in a VM-based MSFC must experience software aging issue after a long period of operation. This paper quantitatively investigates the capability of proactive rejuvenation techniques in reducing the damage of software aging on a VM-based MSFC. We develop a semi-Markov model to capture the behaviors of SFs, VMs and virtual machine monitors (VMMs) from software aging to recovery under the condition that failure times and recovery times follow general distributions. We derive the formulas for calculating the steady-state availability and reliability of the VM-based MSFC composed of multiple SFs running on VMs hosted by VMMs. Sensitivity analysis is also conducted to identify potential dependability bottlenecks.
△ Less
Submitted 24 August, 2023;
originally announced August 2023.
-
Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models
Authors:
Zhen Lin,
Shubhendu Trivedi,
Jimeng Sun
Abstract:
Large language models (LLMs) specializing in natural language generation (NLG) have recently started exhibiting promising capabilities across a variety of domains. However, gauging the trustworthiness of responses generated by LLMs remains an open challenge, with limited research on uncertainty quantification (UQ) for NLG. Furthermore, existing literature typically assumes white-box access to lang…
▽ More
Large language models (LLMs) specializing in natural language generation (NLG) have recently started exhibiting promising capabilities across a variety of domains. However, gauging the trustworthiness of responses generated by LLMs remains an open challenge, with limited research on uncertainty quantification (UQ) for NLG. Furthermore, existing literature typically assumes white-box access to language models, which is becoming unrealistic either due to the closed-source nature of the latest LLMs or computational constraints. In this work, we investigate UQ in NLG for *black-box* LLMs. We first differentiate *uncertainty* vs *confidence*: the former refers to the ``dispersion'' of the potential predictions for a fixed input, and the latter refers to the confidence on a particular prediction/generation. We then propose and compare several confidence/uncertainty measures, applying them to *selective NLG* where unreliable results could either be ignored or yielded for further assessment. Experiments were carried out with several popular LLMs on question-answering datasets (for evaluation purposes). Results reveal that a simple measure for the semantic dispersion can be a reliable predictor of the quality of LLM responses, providing valuable insights for practitioners on uncertainty management when adopting LLMs. The code to replicate our experiments is available at https://github.com/zlin7/UQ-NLG.
△ Less
Submitted 19 May, 2024; v1 submitted 30 May, 2023;
originally announced May 2023.
-
Approximation-Generalization Trade-offs under (Approximate) Group Equivariance
Authors:
Mircea Petrache,
Shubhendu Trivedi
Abstract:
The explicit incorporation of task-specific inductive biases through symmetry has emerged as a general design precept in the development of high-performance machine learning models. For example, group equivariant neural networks have demonstrated impressive performance across various domains and applications such as protein and drug design. A prevalent intuition about such models is that the integ…
▽ More
The explicit incorporation of task-specific inductive biases through symmetry has emerged as a general design precept in the development of high-performance machine learning models. For example, group equivariant neural networks have demonstrated impressive performance across various domains and applications such as protein and drug design. A prevalent intuition about such models is that the integration of relevant symmetry results in enhanced generalization. Moreover, it is posited that when the data and/or the model may only exhibit $\textit{approximate}$ or $\textit{partial}$ symmetry, the optimal or best-performing model is one where the model symmetry aligns with the data symmetry. In this paper, we conduct a formal unified investigation of these intuitions. To begin, we present general quantitative bounds that demonstrate how models capturing task-specific symmetries lead to improved generalization. In fact, our results do not require the transformations to be finite or even form a group and can work with partial or approximate equivariance. Utilizing this quantification, we examine the more general question of model mis-specification i.e. when the model symmetries don't align with the data symmetries. We establish, for a given symmetry group, a quantitative comparison between the approximate/partial equivariance of the model and that of the data distribution, precisely connecting model equivariance error and data equivariance error. Our result delineates conditions under which the model equivariance error is optimal, thereby yielding the best-performing model for the given task and data. Our results are the most general results of their type in the literature.
△ Less
Submitted 18 April, 2025; v1 submitted 27 May, 2023;
originally announced May 2023.
-
Fast Online Value-Maximizing Prediction Sets with Conformal Cost Control
Authors:
Zhen Lin,
Shubhendu Trivedi,
Cao Xiao,
Jimeng Sun
Abstract:
Many real-world multi-label prediction problems involve set-valued predictions that must satisfy specific requirements dictated by downstream usage. We focus on a typical scenario where such requirements, separately encoding $\textit{value}$ and $\textit{cost}$, compete with each other. For instance, a hospital might expect a smart diagnosis system to capture as many severe, often co-morbid, disea…
▽ More
Many real-world multi-label prediction problems involve set-valued predictions that must satisfy specific requirements dictated by downstream usage. We focus on a typical scenario where such requirements, separately encoding $\textit{value}$ and $\textit{cost}$, compete with each other. For instance, a hospital might expect a smart diagnosis system to capture as many severe, often co-morbid, diseases as possible (the value), while maintaining strict control over incorrect predictions (the cost). We present a general pipeline, dubbed as FavMac, to maximize the value while controlling the cost in such scenarios. FavMac can be combined with almost any multi-label classifier, affording distribution-free theoretical guarantees on cost control. Moreover, unlike prior works, it can handle real-world large-scale applications via a carefully designed online update mechanism, which is of independent interest. Our methodological and theoretical contributions are supported by experiments on several healthcare tasks and synthetic datasets - FavMac furnishes higher value compared with several variants and baselines while maintaining strict cost control. Our code is available at https://github.com/zlin7/FavMac
△ Less
Submitted 25 April, 2023; v1 submitted 1 February, 2023;
originally announced February 2023.
-
A Novel IoT-based Framework for Non-Invasive Human Hygiene Monitoring using Machine Learning Techniques
Authors:
Md Jobair Hossain Faruk,
Shashank Trivedi,
Mohammad Masum,
Maria Valero,
Hossain Shahriar,
Sheikh Iqbal Ahamed
Abstract:
People's personal hygiene habits speak volumes about the condition of taking care of their bodies and health in daily lifestyle. Maintaining good hygiene practices not only reduces the chances of contracting a disease but could also reduce the risk of spreading illness within the community. Given the current pandemic, daily habits such as washing hands or taking regular showers have taken primary…
▽ More
People's personal hygiene habits speak volumes about the condition of taking care of their bodies and health in daily lifestyle. Maintaining good hygiene practices not only reduces the chances of contracting a disease but could also reduce the risk of spreading illness within the community. Given the current pandemic, daily habits such as washing hands or taking regular showers have taken primary importance among people, especially for the elderly population living alone at home or in an assisted living facility. This paper presents a novel and non-invasive framework for monitoring human hygiene using vibration sensors where we adopt Machine Learning techniques. The approach is based on a combination of a geophone sensor, a digitizer, and a cost-efficient computer board in a practical enclosure. Monitoring daily hygiene routines may help healthcare professionals be proactive rather than reactive in identifying and controlling the spread of potential outbreaks within the community. The experimental result indicates that applying a Support Vector Machine (SVM) for binary classification exhibits a promising accuracy of ~95% in the classification of different hygiene habits. Furthermore, both tree-based classifier (Random Forrest and Decision Tree) outperforms other models by achieving the highest accuracy (100%), which means that classifying hygiene events using vibration and non-invasive sensors is possible for monitoring hygiene activity.
△ Less
Submitted 7 July, 2022;
originally announced July 2022.
-
Conformal Prediction Intervals with Temporal Dependence
Authors:
Zhen Lin,
Shubhendu Trivedi,
Jimeng Sun
Abstract:
Cross-sectional prediction is common in many domains such as healthcare, including forecasting tasks using electronic health records, where different patients form a cross-section. We focus on the task of constructing valid prediction intervals (PIs) in time series regression with a cross-section. A prediction interval is considered valid if it covers the true response with (a pre-specified) high…
▽ More
Cross-sectional prediction is common in many domains such as healthcare, including forecasting tasks using electronic health records, where different patients form a cross-section. We focus on the task of constructing valid prediction intervals (PIs) in time series regression with a cross-section. A prediction interval is considered valid if it covers the true response with (a pre-specified) high probability. We first distinguish between two notions of validity in such a setting: cross-sectional and longitudinal. Cross-sectional validity is concerned with validity across the cross-section of the time series data, while longitudinal validity accounts for the temporal dimension. Coverage guarantees along both these dimensions are ideally desirable; however, we show that distribution-free longitudinal validity is theoretically impossible. Despite this limitation, we propose Conformal Prediction with Temporal Dependence (CPTD), a procedure that is able to maintain strict cross-sectional validity while improving longitudinal coverage. CPTD is post-hoc and light-weight, and can easily be used in conjunction with any prediction model as long as a calibration set is available. We focus on neural networks due to their ability to model complicated data such as diagnosis codes for time series regression, and perform extensive experimental validation to verify the efficacy of our approach. We find that CPTD outperforms baselines on a variety of datasets by improving longitudinal coverage and often providing more efficient (narrower) PIs.
△ Less
Submitted 2 October, 2022; v1 submitted 25 May, 2022;
originally announced May 2022.
-
Conformal Prediction with Temporal Quantile Adjustments
Authors:
Zhen Lin,
Shubhendu Trivedi,
Jimeng Sun
Abstract:
We develop Temporal Quantile Adjustment (TQA), a general method to construct efficient and valid prediction intervals (PIs) for regression on cross-sectional time series data. Such data is common in many domains, including econometrics and healthcare. A canonical example in healthcare is predicting patient outcomes using physiological time-series data, where a population of patients composes a cro…
▽ More
We develop Temporal Quantile Adjustment (TQA), a general method to construct efficient and valid prediction intervals (PIs) for regression on cross-sectional time series data. Such data is common in many domains, including econometrics and healthcare. A canonical example in healthcare is predicting patient outcomes using physiological time-series data, where a population of patients composes a cross-section. Reliable PI estimators in this setting must address two distinct notions of coverage: cross-sectional coverage across a cross-sectional slice, and longitudinal coverage along the temporal dimension for each time series. Recent works have explored adapting Conformal Prediction (CP) to obtain PIs in the time series context. However, none handles both notions of coverage simultaneously. CP methods typically query a pre-specified quantile from the distribution of nonconformity scores on a calibration set. TQA adjusts the quantile to query in CP at each time $t$, accounting for both cross-sectional and longitudinal coverage in a theoretically-grounded manner. The post-hoc nature of TQA facilitates its use as a general wrapper around any time series regression model. We validate TQA's performance through extensive experimentation: TQA generally obtains efficient PIs and improves longitudinal coverage while preserving cross-sectional coverage.
△ Less
Submitted 23 May, 2022; v1 submitted 19 May, 2022;
originally announced May 2022.
-
Taking a Step Back with KCal: Multi-Class Kernel-Based Calibration for Deep Neural Networks
Authors:
Zhen Lin,
Shubhendu Trivedi,
Jimeng Sun
Abstract:
Deep neural network (DNN) classifiers are often overconfident, producing miscalibrated class probabilities. In high-risk applications like healthcare, practitioners require $\textit{fully calibrated}$ probability predictions for decision-making. That is, conditioned on the prediction $\textit{vector}$, $\textit{every}$ class' probability should be close to the predicted value. Most existing calibr…
▽ More
Deep neural network (DNN) classifiers are often overconfident, producing miscalibrated class probabilities. In high-risk applications like healthcare, practitioners require $\textit{fully calibrated}$ probability predictions for decision-making. That is, conditioned on the prediction $\textit{vector}$, $\textit{every}$ class' probability should be close to the predicted value. Most existing calibration methods either lack theoretical guarantees for producing calibrated outputs, reduce classification accuracy in the process, or only calibrate the predicted class. This paper proposes a new Kernel-based calibration method called KCal. Unlike existing calibration procedures, KCal does not operate directly on the logits or softmax outputs of the DNN. Instead, KCal learns a metric space on the penultimate-layer latent embedding and generates predictions using kernel density estimates on a calibration set. We first analyze KCal theoretically, showing that it enjoys a provable $\textit{full}$ calibration guarantee. Then, through extensive experiments across a variety of datasets, we show that KCal consistently outperforms baselines as measured by the calibration error and by proper scoring rules like the Brier Score.
△ Less
Submitted 8 December, 2022; v1 submitted 15 February, 2022;
originally announced February 2022.
-
Capacity of Group-invariant Linear Readouts from Equivariant Representations: How Many Objects can be Linearly Classified Under All Possible Views?
Authors:
Matthew Farrell,
Blake Bordelon,
Shubhendu Trivedi,
Cengiz Pehlevan
Abstract:
Equivariance has emerged as a desirable property of representations of objects subject to identity-preserving transformations that constitute a group, such as translations and rotations. However, the expressivity of a representation constrained by group equivariance is still not fully understood. We address this gap by providing a generalization of Cover's Function Counting Theorem that quantifies…
▽ More
Equivariance has emerged as a desirable property of representations of objects subject to identity-preserving transformations that constitute a group, such as translations and rotations. However, the expressivity of a representation constrained by group equivariance is still not fully understood. We address this gap by providing a generalization of Cover's Function Counting Theorem that quantifies the number of linearly separable and group-invariant binary dichotomies that can be assigned to equivariant representations of objects. We find that the fraction of separable dichotomies is determined by the dimension of the space that is fixed by the group action. We show how this relation extends to operations such as convolutions, element-wise nonlinearities, and global and local pooling. While other operations do not change the fraction of separable dichotomies, local pooling decreases the fraction, despite being a highly nonlinear operation. Finally, we test our theory on intermediate representations of randomly initialized and fully trained convolutional neural networks and find perfect agreement.
△ Less
Submitted 5 February, 2022; v1 submitted 14 October, 2021;
originally announced October 2021.
-
Locally Valid and Discriminative Prediction Intervals for Deep Learning Models
Authors:
Zhen Lin,
Shubhendu Trivedi,
Jimeng Sun
Abstract:
Crucial for building trust in deep learning models for critical real-world applications is efficient and theoretically sound uncertainty quantification, a task that continues to be challenging. Useful uncertainty information is expected to have two key properties: It should be valid (guaranteeing coverage) and discriminative (more uncertain when the expected risk is high). Moreover, when combined…
▽ More
Crucial for building trust in deep learning models for critical real-world applications is efficient and theoretically sound uncertainty quantification, a task that continues to be challenging. Useful uncertainty information is expected to have two key properties: It should be valid (guaranteeing coverage) and discriminative (more uncertain when the expected risk is high). Moreover, when combined with deep learning (DL) methods, it should be scalable and affect the DL model performance minimally. Most existing Bayesian methods lack frequentist coverage guarantees and usually affect model performance. The few available frequentist methods are rarely discriminative and/or violate coverage guarantees due to unrealistic assumptions. Moreover, many methods are expensive or require substantial modifications to the base neural network. Building upon recent advances in conformal prediction [13, 33] and leveraging the classical idea of kernel regression, we propose Locally Valid and Discriminative prediction intervals (LVD), a simple, efficient, and lightweight method to construct discriminative prediction intervals (PIs) for almost any DL model. With no assumptions on the data distribution, such PIs also offer finite-sample local coverage guarantees (contrasted to the simpler marginal coverage). We empirically verify, using diverse datasets, that besides being the only locally valid method for DL, LVD also exceeds or matches the performance (including coverage rate and prediction accuracy) of existing uncertainty quantification methods, while offering additional benefits in scalability and flexibility.
△ Less
Submitted 26 October, 2021; v1 submitted 1 June, 2021;
originally announced June 2021.
-
DeepSZ: Identification of Sunyaev-Zel'dovich Galaxy Clusters using Deep Learning
Authors:
Zhen Lin,
Nicholas Huang,
Camille Avestruz,
W. L. Kimmy Wu,
Shubhendu Trivedi,
João Caldeira,
Brian Nord
Abstract:
Galaxy clusters identified from the Sunyaev Zel'dovich (SZ) effect are a key ingredient in multi-wavelength cluster-based cosmology. We present a comparison between two methods of cluster identification: the standard Matched Filter (MF) method in SZ cluster finding and a method using Convolutional Neural Networks (CNN). We further implement and show results for a `combined' identifier. We apply th…
▽ More
Galaxy clusters identified from the Sunyaev Zel'dovich (SZ) effect are a key ingredient in multi-wavelength cluster-based cosmology. We present a comparison between two methods of cluster identification: the standard Matched Filter (MF) method in SZ cluster finding and a method using Convolutional Neural Networks (CNN). We further implement and show results for a `combined' identifier. We apply the methods to simulated millimeter maps for several observing frequencies for an SPT-3G-like survey. There are some key differences between the methods. The MF method requires image pre-processing to remove point sources and a model for the noise, while the CNN method requires very little pre-processing of images. Additionally, the CNN requires tuning of hyperparameters in the model and takes as input, cutout images of the sky. Specifically, we use the CNN to classify whether or not an 8 arcmin $\times$ 8 arcmin cutout of the sky contains a cluster. We compare differences in purity and completeness. The MF signal-to-noise ratio depends on both mass and redshift. Our CNN, trained for a given mass threshold, captures a different set of clusters than the MF, some of which have SNR below the MF detection threshold. However, the CNN tends to mis-classify cutouts whose clusters are located near the edge of the cutout, which can be mitigated with staggered cutouts. We leverage the complementarity of the two methods, combining the scores from each method for identification. The purity and completeness of the MF alone are both 0.61, assuming a standard detection threshold. The purity and completeness of the CNN alone are 0.59 and 0.61. The combined classification method yields 0.60 and 0.77, a significant increase for completeness with a modest decrease in purity. We advocate for combined methods that increase the confidence of many lower signal-to-noise clusters.
△ Less
Submitted 8 March, 2021; v1 submitted 25 February, 2021;
originally announced February 2021.
-
Rotation-Invariant Autoencoders for Signals on Spheres
Authors:
Suhas Lohit,
Shubhendu Trivedi
Abstract:
Omnidirectional images and spherical representations of $3D$ shapes cannot be processed with conventional 2D convolutional neural networks (CNNs) as the unwrapping leads to large distortion. Using fast implementations of spherical and $SO(3)$ convolutions, researchers have recently developed deep learning methods better suited for classifying spherical images. These newly proposed convolutional la…
▽ More
Omnidirectional images and spherical representations of $3D$ shapes cannot be processed with conventional 2D convolutional neural networks (CNNs) as the unwrapping leads to large distortion. Using fast implementations of spherical and $SO(3)$ convolutions, researchers have recently developed deep learning methods better suited for classifying spherical images. These newly proposed convolutional layers naturally extend the notion of convolution to functions on the unit sphere $S^2$ and the group of rotations $SO(3)$ and these layers are equivariant to 3D rotations. In this paper, we consider the problem of unsupervised learning of rotation-invariant representations for spherical images. In particular, we carefully design an autoencoder architecture consisting of $S^2$ and $SO(3)$ convolutional layers. As 3D rotations are often a nuisance factor, the latent space is constrained to be exactly invariant to these input transformations. As the rotation information is discarded in the latent space, we craft a novel rotation-invariant loss function for training the network. Extensive experiments on multiple datasets demonstrate the usefulness of the learned representations on clustering, retrieval and classification applications.
△ Less
Submitted 8 December, 2020;
originally announced December 2020.
-
The Expected Jacobian Outerproduct: Theory and Empirics
Authors:
Shubhendu Trivedi,
J. Wang
Abstract:
The expected gradient outerproduct (EGOP) of an unknown regression function is an operator that arises in the theory of multi-index regression, and is known to recover those directions that are most relevant to predicting the output. However, work on the EGOP, including that on its cheap estimators, is restricted to the regression setting. In this work, we adapt this operator to the multi-class se…
▽ More
The expected gradient outerproduct (EGOP) of an unknown regression function is an operator that arises in the theory of multi-index regression, and is known to recover those directions that are most relevant to predicting the output. However, work on the EGOP, including that on its cheap estimators, is restricted to the regression setting. In this work, we adapt this operator to the multi-class setting, which we dub the expected Jacobian outerproduct (EJOP). Moreover, we propose a simple rough estimator of the EJOP and show that somewhat surprisingly, it remains statistically consistent under mild assumptions. Furthermore, we show that the eigenvalues and eigenspaces also remain consistent. Finally, we show that the estimated EJOP can be used as a metric to yield improvements in real-world non-parametric classification tasks: both by its use as a metric, and also as cheap initialization in metric learning tasks.
△ Less
Submitted 5 June, 2020;
originally announced June 2020.
-
Response to NITRD, NCO, NSF Request for Information on "Update to the 2016 National Artificial Intelligence Research and Development Strategic Plan"
Authors:
J. Amundson,
J. Annis,
C. Avestruz,
D. Bowring,
J. Caldeira,
G. Cerati,
C. Chang,
S. Dodelson,
D. Elvira,
A. Farahi,
K. Genser,
L. Gray,
O. Gutsche,
P. Harris,
J. Kinney,
J. B. Kowalkowski,
R. Kutschke,
S. Mrenna,
B. Nord,
A. Para,
K. Pedro,
G. N. Perdue,
A. Scheinker,
P. Spentzouris,
J. St. John
, et al. (5 additional authors not shown)
Abstract:
We present a response to the 2018 Request for Information (RFI) from the NITRD, NCO, NSF regarding the "Update to the 2016 National Artificial Intelligence Research and Development Strategic Plan." Through this document, we provide a response to the question of whether and how the National Artificial Intelligence Research and Development Strategic Plan (NAIRDSP) should be updated from the perspect…
▽ More
We present a response to the 2018 Request for Information (RFI) from the NITRD, NCO, NSF regarding the "Update to the 2016 National Artificial Intelligence Research and Development Strategic Plan." Through this document, we provide a response to the question of whether and how the National Artificial Intelligence Research and Development Strategic Plan (NAIRDSP) should be updated from the perspective of Fermilab, America's premier national laboratory for High Energy Physics (HEP). We believe the NAIRDSP should be extended in light of the rapid pace of development and innovation in the field of Artificial Intelligence (AI) since 2016, and present our recommendations below. AI has profoundly impacted many areas of human life, promising to dramatically reshape society --- e.g., economy, education, science --- in the coming years. We are still early in this process. It is critical to invest now in this technology to ensure it is safe and deployed ethically. Science and society both have a strong need for accuracy, efficiency, transparency, and accountability in algorithms, making investments in scientific AI particularly valuable. Thus far the US has been a leader in AI technologies, and we believe as a national Laboratory it is crucial to help maintain and extend this leadership. Moreover, investments in AI will be important for maintaining US leadership in the physical sciences.
△ Less
Submitted 4 November, 2019;
originally announced November 2019.
-
Asymmetric Multiresolution Matrix Factorization
Authors:
Pramod Kaushik Mudrakarta,
Shubhendu Trivedi,
Risi Kondor
Abstract:
Multiresolution Matrix Factorization (MMF) was recently introduced as an alternative to the dominant low-rank paradigm in order to capture structure in matrices at multiple different scales. Using ideas from multiresolution analysis (MRA), MMF teased out hierarchical structure in symmetric matrices by constructing a sequence of wavelet bases. While effective for such matrices, there is plenty of d…
▽ More
Multiresolution Matrix Factorization (MMF) was recently introduced as an alternative to the dominant low-rank paradigm in order to capture structure in matrices at multiple different scales. Using ideas from multiresolution analysis (MRA), MMF teased out hierarchical structure in symmetric matrices by constructing a sequence of wavelet bases. While effective for such matrices, there is plenty of data that is more naturally represented as nonsymmetric matrices (e.g. directed graphs), but nevertheless has similar hierarchical structure. In this paper, we explore techniques for extending MMF to any square matrix. We validate our approach on numerous matrix compression tasks, demonstrating its efficacy compared to low-rank methods. Moreover, we also show that a combined low-rank and MMF approach, which amounts to removing a small global-scale component of the matrix and then extracting hierarchical structure from the residual, is even more effective than each of the two complementary methods for matrix compression.
△ Less
Submitted 10 October, 2019;
originally announced October 2019.
-
Deep Learning for Automated Classification and Characterization of Amorphous Materials
Authors:
Kirk Swanson,
Shubhendu Trivedi,
Joshua Lequieu,
Kyle Swanson,
Risi Kondor
Abstract:
It is difficult to quantify structure-property relationships and to identify structural features of complex materials. The characterization of amorphous materials is especially challenging because their lack of long-range order makes it difficult to define structural metrics. In this work, we apply deep learning algorithms to accurately classify amorphous materials and characterize their structura…
▽ More
It is difficult to quantify structure-property relationships and to identify structural features of complex materials. The characterization of amorphous materials is especially challenging because their lack of long-range order makes it difficult to define structural metrics. In this work, we apply deep learning algorithms to accurately classify amorphous materials and characterize their structural features. Specifically, we show that convolutional neural networks and message passing neural networks can classify two-dimensional liquids and liquid-cooled glasses from molecular dynamics simulations with greater than 0.98 AUC, with no a priori assumptions about local particle relationships, even when the liquids and glasses are prepared at the same inherent structure energy. Furthermore, we demonstrate that message passing neural networks surpass convolutional neural networks in this context in both accuracy and interpretability. We extract a clear interpretation of how message passing neural networks evaluate liquid and glass structures by using a self-attention mechanism. Using this interpretation, we derive three novel structural metrics that accurately characterize glass formation. The methods presented here provide us with a procedure to identify important structural features in materials that could be missed by standard techniques and give us a unique insight into how these neural networks process data.
△ Less
Submitted 10 September, 2019;
originally announced September 2019.
-
DeepCMB: Lensing Reconstruction of the Cosmic Microwave Background with Deep Neural Networks
Authors:
João Caldeira,
W. L. Kimmy Wu,
Brian Nord,
Camille Avestruz,
Shubhendu Trivedi,
Kyle T. Story
Abstract:
Next-generation cosmic microwave background (CMB) experiments will have lower noise and therefore increased sensitivity, enabling improved constraints on fundamental physics parameters such as the sum of neutrino masses and the tensor-to-scalar ratio r. Achieving competitive constraints on these parameters requires high signal-to-noise extraction of the projected gravitational potential from the C…
▽ More
Next-generation cosmic microwave background (CMB) experiments will have lower noise and therefore increased sensitivity, enabling improved constraints on fundamental physics parameters such as the sum of neutrino masses and the tensor-to-scalar ratio r. Achieving competitive constraints on these parameters requires high signal-to-noise extraction of the projected gravitational potential from the CMB maps. Standard methods for reconstructing the lensing potential employ the quadratic estimator (QE). However, the QE performs suboptimally at the low noise levels expected in upcoming experiments. Other methods, like maximum likelihood estimators (MLE), are under active development. In this work, we demonstrate reconstruction of the CMB lensing potential with deep convolutional neural networks (CNN) - ie, a ResUNet. The network is trained and tested on simulated data, and otherwise has no physical parametrization related to the physical processes of the CMB and gravitational lensing. We show that, over a wide range of angular scales, ResUNets recover the input gravitational potential with a higher signal-to-noise ratio than the QE method, reaching levels comparable to analytic approximations of MLE methods. We demonstrate that the network outputs quantifiably different lensing maps when given input CMB maps generated with different cosmologies. We also show we can use the reconstructed lensing map for cosmological parameter estimation. This application of CNN provides a few innovations at the intersection of cosmology and machine learning. First, while training and regressing on images, we predict a continuous-variable field rather than discrete classes. Second, we are able to establish uncertainty measures for the network output that are analogous to standard methods. We expect this approach to excel in capturing hard-to-model non-Gaussian astrophysical foreground and noise contributions.
△ Less
Submitted 12 June, 2020; v1 submitted 2 October, 2018;
originally announced October 2018.
-
Discriminative Learning of Similarity and Group Equivariant Representations
Authors:
Shubhendu Trivedi
Abstract:
One of the most fundamental problems in machine learning is to compare examples: Given a pair of objects we want to return a value which indicates degree of (dis)similarity. Similarity is often task specific, and pre-defined distances can perform poorly, leading to work in metric learning. However, being able to learn a similarity-sensitive distance function also presupposes access to a rich, disc…
▽ More
One of the most fundamental problems in machine learning is to compare examples: Given a pair of objects we want to return a value which indicates degree of (dis)similarity. Similarity is often task specific, and pre-defined distances can perform poorly, leading to work in metric learning. However, being able to learn a similarity-sensitive distance function also presupposes access to a rich, discriminative representation for the objects at hand. In this dissertation we present contributions towards both ends. In the first part of the thesis, assuming good representations for the data, we present a formulation for metric learning that makes a more direct attempt to optimize for the k-NN accuracy as compared to prior work. We also present extensions of this formulation to metric learning for kNN regression, asymmetric similarity learning and discriminative learning of Hamming distance. In the second part, we consider a situation where we are on a limited computational budget i.e. optimizing over a space of possible metrics would be infeasible, but access to a label aware distance metric is still desirable. We present a simple, and computationally inexpensive approach for estimating a well motivated metric that relies only on gradient estimates, discussing theoretical and experimental results. In the final part, we address representational issues, considering group equivariant convolutional neural networks (GCNNs). Equivariance to symmetry transformations is explicitly encoded in GCNNs; a classical CNN being the simplest example. In particular, we present a SO(3)-equivariant neural network architecture for spherical data, that operates entirely in Fourier space, while also providing a formalism for the design of fully Fourier neural networks that are equivariant to the action of any continuous compact group.
△ Less
Submitted 30 August, 2022; v1 submitted 29 August, 2018;
originally announced August 2018.
-
Clebsch-Gordan Nets: a Fully Fourier Space Spherical Convolutional Neural Network
Authors:
Risi Kondor,
Zhen Lin,
Shubhendu Trivedi
Abstract:
Recent work by Cohen \emph{et al.} has achieved state-of-the-art results for learning spherical images in a rotation invariant way by using ideas from group representation theory and noncommutative harmonic analysis. In this paper we propose a generalization of this work that generally exhibits improved performace, but from an implementation point of view is actually simpler. An unusual feature of…
▽ More
Recent work by Cohen \emph{et al.} has achieved state-of-the-art results for learning spherical images in a rotation invariant way by using ideas from group representation theory and noncommutative harmonic analysis. In this paper we propose a generalization of this work that generally exhibits improved performace, but from an implementation point of view is actually simpler. An unusual feature of the proposed architecture is that it uses the Clebsch--Gordan transform as its only source of nonlinearity, thus avoiding repeated forward and backward Fourier transforms. The underlying ideas of the paper generalize to constructing neural networks that are invariant to the action of other compact groups.
△ Less
Submitted 10 November, 2018; v1 submitted 24 June, 2018;
originally announced June 2018.
-
On the Generalization of Equivariance and Convolution in Neural Networks to the Action of Compact Groups
Authors:
Risi Kondor,
Shubhendu Trivedi
Abstract:
Convolutional neural networks have been extremely successful in the image recognition domain because they ensure equivariance to translations. There have been many recent attempts to generalize this framework to other domains, including graphs and data lying on manifolds. In this paper we give a rigorous, theoretical treatment of convolution and equivariance in neural networks with respect to not…
▽ More
Convolutional neural networks have been extremely successful in the image recognition domain because they ensure equivariance to translations. There have been many recent attempts to generalize this framework to other domains, including graphs and data lying on manifolds. In this paper we give a rigorous, theoretical treatment of convolution and equivariance in neural networks with respect to not just translations, but the action of any compact group. Our main result is to prove that (given some natural constraints) convolutional structure is not just a sufficient, but also a necessary condition for equivariance to the action of a compact group. Our exposition makes use of concepts from representation theory and noncommutative harmonic analysis and derives new generalized convolution formulae.
△ Less
Submitted 10 November, 2018; v1 submitted 10 February, 2018;
originally announced February 2018.
-
Covariant Compositional Networks For Learning Graphs
Authors:
Risi Kondor,
Hy Truong Son,
Horace Pan,
Brandon Anderson,
Shubhendu Trivedi
Abstract:
Most existing neural networks for learning graphs address permutation invariance by conceiving of the network as a message passing scheme, where each node sums the feature vectors coming from its neighbors. We argue that this imposes a limitation on their representation power, and instead propose a new general architecture for representing objects consisting of a hierarchy of parts, which we call…
▽ More
Most existing neural networks for learning graphs address permutation invariance by conceiving of the network as a message passing scheme, where each node sums the feature vectors coming from its neighbors. We argue that this imposes a limitation on their representation power, and instead propose a new general architecture for representing objects consisting of a hierarchy of parts, which we call Covariant Compositional Networks (CCNs). Here, covariance means that the activation of each neuron must transform in a specific way under permutations, similarly to steerability in CNNs. We achieve covariance by making each activation transform according to a tensor representation of the permutation group, and derive the corresponding tensor aggregation rules that each neuron must implement. Experiments show that CCNs can outperform competing methods on standard graph learning benchmarks.
△ Less
Submitted 7 January, 2018;
originally announced January 2018.
-
The Utility of Clustering in Prediction Tasks
Authors:
Shubhendu Trivedi,
Zachary A. Pardos,
Neil T. Heffernan
Abstract:
We explore the utility of clustering in reducing error in various prediction tasks. Previous work has hinted at the improvement in prediction accuracy attributed to clustering algorithms if used to pre-process the data. In this work we more deeply investigate the direct utility of using clustering to improve prediction accuracy and provide explanations for why this may be so. We look at a number o…
▽ More
We explore the utility of clustering in reducing error in various prediction tasks. Previous work has hinted at the improvement in prediction accuracy attributed to clustering algorithms if used to pre-process the data. In this work we more deeply investigate the direct utility of using clustering to improve prediction accuracy and provide explanations for why this may be so. We look at a number of datasets, run k-means at different scales and for each scale we train predictors. This produces k sets of predictions. These predictions are then combined by a naïve ensemble. We observed that this use of a predictor in conjunction with clustering improved the prediction accuracy in most datasets. We believe this indicates the predictive utility of exploiting structure in the data and the data compression handed over by clustering. We also found that using this method improves upon the prediction of even a Random Forests predictor which suggests this method is providing a novel, and useful source of variance in the prediction process.
△ Less
Submitted 21 September, 2015;
originally announced September 2015.
-
NDTAODV: Neighbor Defense Technique for Ad Hoc On-Demand Distance Vector(AODV) to mitigate flood attack in MANETS
Authors:
Akshai Aggarwal,
Savita Gandhi,
Nirbhay Chaubey,
Naren Tada,
Srushti Trivedi
Abstract:
Mobile Ad Hoc Networks (MANETs) are collections of mobile nodes that can communicate with one another using multihop wireless links. MANETs are often deployed in the environments, where there is no fixed infrastructure and centralized management. The nodes of mobile ad hoc networks are susceptible to compromise. In such a scenario, designing an efficient, reliable and secure routing protocol has b…
▽ More
Mobile Ad Hoc Networks (MANETs) are collections of mobile nodes that can communicate with one another using multihop wireless links. MANETs are often deployed in the environments, where there is no fixed infrastructure and centralized management. The nodes of mobile ad hoc networks are susceptible to compromise. In such a scenario, designing an efficient, reliable and secure routing protocol has been a major challengesue over the last many years. The routing protocol Ad hoc On-demand Distance Vector (AODV) has no security measures in-built in it. It is vulnerable to many types of routing attacks. The flood attack is one of them. In this paper, we propose a simple and effective technique to secure Ad hoc Ondemand Distance Vector (AODV) routing protocol against flood attacks. To deal with a flood attack, we have proposed Neighbor Defense Technique for Ad hoc On-demand Distance Vector (NDTAODV). This makes AODV more robust. The proposed technique has been designed to isolate the flood attacker with the use of timers, peak value and hello alarm technique. We have simulated our work in Network Simulator NS-2.33 (NS-2) with different pause times by way of different number of malicious nodes. We have compared the performance of NDTAODV with the AODV in normal situation as well as in the presence of malicious attacks. We have considered Packet Delivery Fraction (PDF), Average Throughput (AT) and Normalized Routing Load (NRL) for comparing the performance of NDTAODV and AODV.
△ Less
Submitted 10 February, 2014;
originally announced May 2014.
-
A Practical Regularity Partitioning Algorithm and its Applications in Clustering
Authors:
Gábor N. Sárközy,
Fei Song,
Endre Szemerédi,
Shubhendu Trivedi
Abstract:
In this paper we introduce a new clustering technique called Regularity Clustering. This new technique is based on the practical variants of the two constructive versions of the Regularity Lemma, a very useful tool in graph theory. The lemma claims that every graph can be partitioned into pseudo-random graphs. While the Regularity Lemma has become very important in proving theoretical results, it…
▽ More
In this paper we introduce a new clustering technique called Regularity Clustering. This new technique is based on the practical variants of the two constructive versions of the Regularity Lemma, a very useful tool in graph theory. The lemma claims that every graph can be partitioned into pseudo-random graphs. While the Regularity Lemma has become very important in proving theoretical results, it has no direct practical applications so far. An important reason for this lack of practical applications is that the graph under consideration has to be astronomically large. This requirement makes its application restrictive in practice where graphs typically are much smaller. In this paper we propose modifications of the constructive versions of the Regularity Lemma that work for smaller graphs as well. We call this the Practical Regularity partitioning algorithm. The partition obtained by this is used to build the reduced graph which can be viewed as a compressed representation of the original graph. Then we apply a pairwise clustering method such as spectral clustering on this reduced graph to get a clustering of the original graph that we call Regularity Clustering. We present results of using Regularity Clustering on a number of benchmark datasets and compare them with standard clustering techniques, such as $k$-means and spectral clustering. These empirical results are very encouraging. Thus in this paper we report an attempt to harness the power of the Regularity Lemma for real-world applications.
△ Less
Submitted 28 September, 2012;
originally announced September 2012.