-
Next Generation Multi-element monolithic Germanium detectors for Spectroscopy: First integration at ESRF facility
Authors:
N. Goyal,
S. Aplin,
A. Balerna,
P. Bell,
J. Casas,
M. Cascella,
S. Chatterji,
C. Cohen,
E. Collet,
P. Fajardo,
E. N. Gimenez,
H. Graafsma,
H. Hiresmann,
F. J. Iguaz,
K. Klementiev,
T. Kolodziej,
L. Manzanillas,
T. Martin,
R. H. Menk,
M. Porro,
M. Quispe,
B. Schmitt,
S. Scully,
M. Turcato,
C. Ward
, et al. (1 additional author not shown)
Abstract:
The XAFS-DET work package of the European LEAPS-INNOV project is developing high-purity Germanium detectors for synchrotron applications requiring spectroscopic-grade response. The detectors integrate three key features: (1) newly designed monolithic Germanium sensors optimised to mitigate charge-sharing events, (2) an improved cooling and mechanical design structure supported by thermal simulations, and (3) a complete electronic chain featuring a low-noise CMOS technology-based preamplifier, enabling high X-ray count rate capability over a broad energy range (5-100 keV). This paper discusses the first integration and characterization of one of the two multi-element Ge detectors at the European Synchrotron Radiation Facility (ESRF). The integration phase included validating the high-throughput front-end electronics, integrating them with the Ge sensor, and operating them at liquid nitrogen temperature. The experimental characterization consisted of an electronics noise study and a spectroscopic performance evaluation.
Submitted 23 April, 2025;
originally announced April 2025.
-
Progress in the Development of Multi-Element Monolithic Germanium Detectors in LEAPS-INNOV Project: Insights from Detector Performance Simulation
Authors:
N. Goyal,
S. Aplin,
A. Balerna,
P. Bell,
J. Casas,
M. Cascella,
S. Chatterji,
C. Cohen,
E. Collet,
G. Dennis,
P. Fajardo,
E. N. Gimenez,
H. Graafsma,
H. Hiresmann,
F. J. Iguaz,
K. Klementiev,
T. Kolodziej,
L. Manzanillas,
T. Martin,
R. H. Menk,
M. Porro,
M. Quispe,
B. Schmitt,
S. Scully,
M. Turcato
, et al. (2 additional authors not shown)
Abstract:
This study presents a detailed simulation-based analysis of the detection limits of multi-element monolithic Germanium (Ge) detectors for cadmium traces in environmental soil samples. Using the Geant4 Monte Carlo toolkit in combination with the Solid State Detector Package, we evaluated how the detection limit varies with the sample-to-detector distance and photon flux. These simulations were conducted to mimic realistic conditions, with a photon flux measured at the SAMBA beamline of the SOLEIL synchrotron facility. Our findings on the detection limit for low-concentration trace pollutants such as cadmium in soil provide valuable insights for optimizing experimental setups in environmental monitoring and synchrotron-based applications, where precise detection of trace elements is critical.
Submitted 20 April, 2025;
originally announced April 2025.
-
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Authors:
Rylan Schaeffer,
Punit Singh Koura,
Binh Tang,
Ranjan Subramanian,
Aaditya K Singh,
Todor Mihaylov,
Prajjwal Bhargava,
Lovish Madaan,
Niladri S. Chatterji,
Vedanuj Goswami,
Sergey Edunov,
Dieuwke Hupkes,
Sanmi Koyejo,
Sharan Narang
Abstract:
The explosion of high-performing conversational language models (LMs) has spurred a shift from classic natural language processing (NLP) benchmarks to expensive, time-consuming and noisy human evaluations - yet the relationship between these two evaluation strategies remains hazy. In this paper, we conduct a large-scale study of four Chat Llama 2 models, comparing their performance on 160 standard NLP benchmarks (e.g., MMLU, ARC, BIG-Bench Hard) against extensive human preferences on more than 11k single-turn and 2k multi-turn dialogues from over 2k human annotators. Our findings are striking: most NLP benchmarks strongly correlate with human evaluations, suggesting that cheaper, automated metrics can serve as surprisingly reliable predictors of human preferences. Three human evaluations, such as adversarial dishonesty and safety, are anticorrelated with NLP benchmarks, while two are uncorrelated. Moreover, through overparameterized linear regressions, we show that NLP scores can accurately predict human evaluations across different model scales, offering a path to reduce costly human annotation without sacrificing rigor. Overall, our results affirm the continued value of classic benchmarks and illuminate how to harness them to anticipate real-world user satisfaction - pointing to how NLP benchmarks can be leveraged to meet evaluation needs of our new era of conversational AI.
Submitted 23 February, 2025;
originally announced February 2025.
-
Think Smart, Act SMARL! Analyzing Probabilistic Logic Shields for Multi-Agent Reinforcement Learning
Authors:
Satchit Chatterji,
Erman Acar
Abstract:
Safe reinforcement learning (RL) is crucial for real-world applications, and multi-agent interactions introduce additional safety challenges. While Probabilistic Logic Shields (PLS) have been a powerful proposal to enforce safety in single-agent RL, their generalizability to multi-agent settings remains unexplored. In this paper, we address this gap by conducting extensive analyses of PLS within decentralized, multi-agent environments, and in doing so, propose Shielded Multi-Agent Reinforcement Learning (SMARL) as a general framework for steering MARL towards norm-compliant outcomes. Our key contributions are: (1) a novel Probabilistic Logic Temporal Difference (PLTD) update for shielded, independent Q-learning, which incorporates probabilistic constraints directly into the value update process; (2) a probabilistic logic policy gradient method for shielded PPO with formal safety guarantees for MARL; and (3) comprehensive evaluation across symmetric and asymmetrically shielded $n$-player game-theoretic benchmarks, demonstrating fewer constraint violations and significantly better cooperation under normative constraints. These results position SMARL as an effective mechanism for equilibrium selection, paving the way toward safer, socially aligned multi-agent systems.
Submitted 14 May, 2025; v1 submitted 7 November, 2024;
originally announced November 2024.
-
Decomposability and Strategy-proofness in Multidimensional Models
Authors:
Shurojit Chatterji,
Huaxia Zeng
Abstract:
We introduce the notion of a multidimensional hybrid preference domain on a (finite) set of alternatives that is a Cartesian product of finitely many components. We demonstrate that in a model of public goods provision, multidimensional hybrid preferences arise naturally through assembling marginal preferences under the condition of semi-separability - a weakening of separability. The main result shows that under a suitable "richness" condition, every strategy-proof rule on this domain can be decomposed into component-wise strategy-proof rules, and more importantly every domain of preferences that reconciles decomposability of rules with strategy-proofness must be a multidimensional hybrid domain.
Submitted 16 November, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
-
Deep Linear Networks can Benignly Overfit when Shallow Ones Do
Authors:
Niladri S. Chatterji,
Philip M. Long
Abstract:
We bound the excess risk of interpolating deep linear networks trained using gradient flow. In a setting previously used to establish risk bounds for the minimum $\ell_2$-norm interpolant, we show that randomly initialized deep linear networks can closely approximate or even match known bounds for the minimum $\ell_2$-norm interpolant. Our analysis also reveals that interpolating deep linear models have exactly the same conditional variance as the minimum $\ell_2$-norm solution. Since the noise affects the excess risk only through the conditional variance, this implies that depth does not improve the algorithm's ability to "hide the noise". Our simulations verify that aspects of our bounds reflect typical behavior for simple data distributions. We also find that similar phenomena are seen in simulations with ReLU networks, although the situation there is more nuanced.
Submitted 6 February, 2023; v1 submitted 19 September, 2022;
originally announced September 2022.
-
Undersampling is a Minimax Optimal Robustness Intervention in Nonparametric Classification
Authors:
Niladri S. Chatterji,
Saminul Haque,
Tatsunori Hashimoto
Abstract:
While a broad range of techniques have been proposed to tackle distribution shift, the simple baseline of training on an $\textit{undersampled}$ balanced dataset often achieves close to state-of-the-art accuracy across several popular benchmarks. This is rather surprising, since undersampling algorithms discard excess majority group data. To understand this phenomenon, we ask if learning is fundamentally constrained by a lack of minority group samples. We prove that this is indeed the case in the setting of nonparametric binary classification. Our results show that in the worst case, an algorithm cannot outperform undersampling unless there is a high degree of overlap between the train and test distributions (which is unlikely to be the case in real-world datasets), or if the algorithm leverages additional structure about the distribution shift. In particular, in the case of label shift we show that there is always an undersampling algorithm that is minimax optimal. In the case of group-covariate shift we show that there is an undersampling algorithm that is minimax optimal when the overlap between the group distributions is small. We also perform an experimental case study on a label shift dataset and find that in line with our theory, the test accuracy of robust neural network classifiers is constrained by the number of minority samples.
Submitted 19 June, 2023; v1 submitted 25 May, 2022;
originally announced May 2022.
-
Outcome coding choice in randomized trials of programs to reduce violence
Authors:
Christopher Boyer,
Sangeeta Chatterji,
Jasper Cooper,
Lori Heise
Abstract:
Over the last decade, the number of randomized trials of programs to reduce intimate partner violence (IPV) has grown precipitously. However, most trials continue to measure and code violence using standards originally designed for global prevalence surveys. This choice may have consequences in terms of bias, power, and efficiency of trial estimates and may limit what we can learn about how programs are working. In this paper, we return to first principles to develop a generative model for violence reduction. We then use this model to better understand trade-offs in outcome coding choices via simulation. We re-analyze results from seven recent trials in Southern and Eastern Africa to highlight some of our findings. We conclude with a discussion of key take-aways for trialists.
Submitted 27 September, 2022; v1 submitted 26 April, 2022;
originally announced April 2022.
-
Random Feature Amplification: Feature Learning and Generalization in Neural Networks
Authors:
Spencer Frei,
Niladri S. Chatterji,
Peter L. Bartlett
Abstract:
In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate. We develop a novel proof technique that shows that at initialization, the vast majority of neurons function as random features that are only weakly correlated with useful features, and the gradient descent dynamics 'amplify' these weak, random features to strong, useful features.
Submitted 13 September, 2023; v1 submitted 15 February, 2022;
originally announced February 2022.
-
Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data
Authors:
Spencer Frei,
Niladri S. Chatterji,
Peter L. Bartlett
Abstract:
Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization. We assume the data comes from well-separated class-conditional log-concave distributions and allow for a constant fraction of the training labels to be corrupted by an adversary. We show that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error. In contrast to previous work on benign overfitting that requires linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.
Submitted 13 September, 2023; v1 submitted 11 February, 2022;
originally announced February 2022.
-
A Taxonomy of Non-dictatorial Unidimensional Domains
Authors:
Shurojit Chatterji,
Huaxia Zeng
Abstract:
A preference domain is called a non-dictatorial domain if it allows the design of unanimous social choice functions (henceforth, rules) that are non-dictatorial and strategy-proof. We study a class of preference domains called unidimensional domains and establish that the unique seconds property (introduced by Aswal, Chatterji, and Sen (2003)) characterizes all non-dictatorial domains. The principal contribution is the subsequent exhaustive classification of all non-dictatorial, unidimensional domains and canonical strategy-proof rules on these domains, based on a simple property of two-voter rules called invariance. The preference domains that constitute the classification are semi-single-peaked domains (introduced by Chatterji, Sanver, and Sen (2013)) and semi-hybrid domains (introduced here) which are two appropriate weakenings of single-peaked domains and are shown to allow strategy-proof rules to depend on non-peak information of voters' preferences; the canonical rules for these domains are the projection rule and the hybrid rule respectively. As a refinement of the classification, single-peaked domains and hybrid domains emerge as the only unidimensional domains that force strategy-proof rules to be determined completely by voters' preference peaks.
Submitted 24 October, 2022; v1 submitted 3 January, 2022;
originally announced January 2022.
-
Is Importance Weighting Incompatible with Interpolating Classifiers?
Authors:
Ke Alexander Wang,
Niladri S. Chatterji,
Saminul Haque,
Tatsunori Hashimoto
Abstract:
Importance weighting is a classic technique to handle distribution shifts. However, prior work has presented strong empirical and theoretical evidence demonstrating that importance weights can have little to no effect on overparameterized neural networks. Is importance weighting truly incompatible with the training of overparameterized neural networks? Our paper answers this in the negative. We show that importance weighting fails not because of the overparameterization, but instead, as a result of using exponentially-tailed losses like the logistic or cross-entropy loss. As a remedy, we show that polynomially-tailed losses restore the effects of importance reweighting in correcting distribution shift in overparameterized models. We characterize the behavior of gradient descent on importance weighted polynomially-tailed losses with overparameterized linear models, and theoretically demonstrate the advantage of using polynomially-tailed losses in a label shift setting. Surprisingly, our theory shows that using weights that are obtained by exponentiating the classical unbiased importance weights can improve performance. Finally, we demonstrate the practical value of our analysis with neural network experiments on a subpopulation shift and a label shift dataset. When reweighted, our loss function can outperform reweighted cross-entropy by as much as 9% in test accuracy. Our loss function also gives test accuracies comparable to, or even exceeding, well-tuned state-of-the-art methods for correcting distribution shifts.
Submitted 4 March, 2022; v1 submitted 24 December, 2021;
originally announced December 2021.
-
Probabilistic Impact Score Generation using Ktrain-BERT to Identify Hate Words from Twitter Discussions
Authors:
Sourav Das,
Prasanta Mandal,
Sanjay Chatterji
Abstract:
Social media has seen a worrying rise in hate speech in recent times. Although it branches into several distinct categories such as cyberbullying, gender discrimination, or racism, such derogatory content can be classified under the combined label of toxic content. This paper presents experiments with a Keras-wrapped lightweight BERT model that identifies hate speech and predicts a probabilistic impact score for it, in order to extract the hateful words within sentences. The dataset used for this task is the English Hate Speech and Offensive Content Detection (HASOC 2021) data from FIRE 2021. Our system obtained a validation accuracy of 82.60%, with a maximum F1-Score of 82.68%. Subsequently, our predictive cases performed well in generating impact scores for identifying hate tweets as well as the hateful words from tweet pools.
Submitted 8 January, 2022; v1 submitted 25 November, 2021;
originally announced November 2021.
-
Foolish Crowds Support Benign Overfitting
Authors:
Niladri S. Chatterji,
Philip M. Long
Abstract:
We prove a lower bound on the excess risk of sparse interpolating procedures for linear regression with Gaussian data in the overparameterized regime. We apply this result to obtain a lower bound for basis pursuit (the minimum $\ell_1$-norm interpolant) that implies that its excess risk can converge at an exponentially slower rate than OLS (the minimum $\ell_2$-norm interpolant), even when the ground truth is sparse. Our analysis exposes the benefit of an effect analogous to the "wisdom of the crowd", except here the harm arising from fitting the $\textit{noise}$ is ameliorated by spreading it among many directions -- the variance reduction arises from a $\textit{foolish}$ crowd.
Submitted 17 March, 2022; v1 submitted 6 October, 2021;
originally announced October 2021.
-
The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks
Authors:
Niladri S. Chatterji,
Philip M. Long,
Peter L. Bartlett
Abstract:
The recent success of neural network models has shone light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data. Understanding this phenomenon of $\textit{benign overfitting}$ has attracted intense theoretical and empirical study. In this paper, we consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk when the covariates satisfy sub-Gaussianity and anti-concentration properties, and the noise is independent and sub-Gaussian. By leveraging recent results that characterize the implicit bias of this estimator, our bounds emphasize the role of both the quality of the initialization as well as the properties of the data covariance matrix in achieving low excess risk.
Submitted 9 September, 2022; v1 submitted 25 August, 2021;
originally announced August 2021.
-
On the Theory of Reinforcement Learning with Once-per-Episode Feedback
Authors:
Niladri S. Chatterji,
Aldo Pacchiano,
Peter L. Bartlett,
Michael I. Jordan
Abstract:
We study a theory of reinforcement learning (RL) in which the learner receives binary feedback only once at the end of an episode. While this is an extreme test case for theory, it is also arguably more representative of real-world applications than the traditional requirement in RL practice that the learner receive feedback at every time step. Indeed, in many real-world applications of reinforcement learning, such as self-driving cars and robotics, it is easier to evaluate whether a learner's complete trajectory was either "good" or "bad," but harder to provide a reward signal at each step. To show that learning is possible in this more challenging setting, we study the case where trajectory labels are generated by an unknown parametric model, and provide a statistically and computationally efficient algorithm that achieves sublinear regret.
Submitted 21 August, 2022; v1 submitted 29 May, 2021;
originally announced May 2021.
-
Probabilistic Fixed Ballot Rules and Hybrid Domains
Authors:
Shurojit Chatterji,
Souvik Roy,
Soumyarup Sadhukhan,
Arunava Sen,
Huaxia Zeng
Abstract:
We study a class of preference domains that satisfies the familiar properties of minimal richness, diversity and no-restoration. We show that a specific preference restriction, hybridness, has been embedded in these domains so that the preferences are single-peaked at the "extremes" and unrestricted in the "middle". We also study the structure of strategy-proof and unanimous Random Social Choice Functions on these domains. We show them to be special cases of probabilistic fixed ballot rules (introduced by Ehlers, Peters, and Storcken (2002)).
Submitted 4 January, 2022; v1 submitted 22 May, 2021;
originally announced May 2021.
-
When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations?
Authors:
Niladri S. Chatterji,
Philip M. Long,
Peter L. Bartlett
Abstract:
We establish conditions under which gradient descent applied to fixed-width deep networks drives the logistic loss to zero, and prove bounds on the rate of convergence. Our analysis applies for smoothed approximations to the ReLU, such as Swish and the Huberized ReLU, proposed in previous applied work. We provide two sufficient conditions for convergence. The first is simply a bound on the loss at initialization. The second is a data separation condition used in prior analyses.
Submitted 1 July, 2021; v1 submitted 9 February, 2021;
originally announced February 2021.
-
When does gradient descent with logistic loss find interpolating two-layer networks?
Authors:
Niladri S. Chatterji,
Philip M. Long,
Peter L. Bartlett
Abstract:
We study the training of finite-width two-layer smoothed ReLU networks for binary classification using the logistic loss. We show that gradient descent drives the training loss to zero if the initial loss is small enough. When the data satisfies certain cluster and separation conditions and the network is wide enough, we show that one step of gradient descent reduces the loss sufficiently that the first result applies.
Submitted 1 July, 2021; v1 submitted 4 December, 2020;
originally announced December 2020.
-
Finite-sample Analysis of Interpolating Linear Classifiers in the Overparameterized Regime
Authors:
Niladri S. Chatterji,
Philip M. Long
Abstract:
We prove bounds on the population risk of the maximum margin algorithm for two-class linear classification. For linearly separable training data, the maximum margin algorithm has been shown in previous work to be equivalent to a limit of training with logistic loss using gradient descent, as the training error is driven to zero. We analyze this algorithm applied to random data including misclassification noise. Our assumptions on the clean data include the case in which the class-conditional distributions are standard normal distributions. The misclassification noise may be chosen by an adversary, subject to a limit on the fraction of corrupted labels. Our bounds show that, with sufficient over-parameterization, the maximum margin algorithm trained on noisy data can achieve nearly optimal population risk.
Submitted 1 June, 2021; v1 submitted 24 April, 2020;
originally announced April 2020.
-
Oracle Lower Bounds for Stochastic Gradient Sampling Algorithms
Authors:
Niladri S. Chatterji,
Peter L. Bartlett,
Philip M. Long
Abstract:
We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^d$, and prove an information theoretic lower bound on the number of stochastic gradient queries of the log density needed. Several popular sampling algorithms (including many Markov chain Monte Carlo methods) operate by using stochastic gradients of the log density to generate a sample; our results establish an information theoretic limit for all these algorithms.
We show that for every algorithm, there exists a well-conditioned strongly log-concave target density for which the distribution of points generated by the algorithm would be at least $\varepsilon$ away from the target in total variation distance if the number of gradient queries is less than $Ω(σ^2 d/\varepsilon^2)$, where $σ^2 d$ is the variance of the stochastic gradient. Our lower bound follows by combining the ideas of Le Cam deficiency routinely used in the comparison of statistical experiments along with standard information theoretic tools used in lower bounding Bayes risk functions. To the best of our knowledge our results provide the first nontrivial dimension-dependent lower bound for this problem.
Submitted 3 July, 2021; v1 submitted 1 February, 2020;
originally announced February 2020.
-
The intriguing role of module criticality in the generalization of deep networks
Authors:
Niladri S. Chatterji,
Behnam Neyshabur,
Hanie Sedghi
Abstract:
We study the phenomenon that some modules of deep neural networks (DNNs) are more critical than others, meaning that rewinding their parameter values back to initialization, while keeping other modules fixed at the trained parameters, results in a large drop in the network's performance. Our analysis reveals interesting properties of the loss landscape which lead us to propose a complexity measure, called module criticality, based on the shape of the valleys that connect the initial and final values of the module parameters. We formulate how generalization relates to the module criticality, and show that this measure is able to explain the superior generalization performance of some architectures over others, whereas earlier measures fail to do so.
Submitted 14 February, 2020; v1 submitted 1 December, 2019;
originally announced December 2019.
-
Langevin Monte Carlo without smoothness
Authors:
Niladri S. Chatterji,
Jelena Diakonikolas,
Michael I. Jordan,
Peter L. Bartlett
Abstract:
Langevin Monte Carlo (LMC) is an iterative algorithm used to generate samples from a distribution that is known only up to a normalizing constant. The nonasymptotic dependence of its mixing time on the dimension and target accuracy is understood mainly in the setting of smooth (gradient-Lipschitz) log-densities, a serious limitation for applications in machine learning. In this paper, we remove this limitation, providing polynomial-time convergence guarantees for a variant of LMC in the setting of nonsmooth log-concave distributions. At a high level, our results follow by leveraging the implicit smoothing of the log-density that comes from a small Gaussian perturbation that we add to the iterates of the algorithm and controlling the bias and variance that are induced by this perturbation.
Submitted 24 February, 2020; v1 submitted 30 May, 2019;
originally announced May 2019.
-
OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits
Authors:
Niladri S. Chatterji,
Vidya Muthukumar,
Peter L. Bartlett
Abstract:
We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information. Algorithms that are designed solely for one of the regimes are known to be sub-optimal for the alternate regime. We design a single computationally efficient algorithm that simultaneously obtains problem-dependent optimal regret rates in the simple multi-armed bandit regime and minimax optimal regret rates in the linear contextual bandit regime, without knowing a priori which of the two models generates the rewards. These results are proved under the condition of stochasticity of contextual information over multiple rounds. Our results should be viewed as a step towards principled data-dependent policy class selection for contextual bandits.
Submitted 5 October, 2020; v1 submitted 24 May, 2019;
originally announced May 2019.
-
Sharp convergence rates for Langevin dynamics in the nonconvex setting
Authors:
Xiang Cheng,
Niladri S. Chatterji,
Yasin Abbasi-Yadkori,
Peter L. Bartlett,
Michael I. Jordan
Abstract:
We study the problem of sampling from a distribution $p^*(x) \propto \exp\left(-U(x)\right)$, where the function $U$ is $L$-smooth everywhere and $m$-strongly convex outside a ball of radius $R$, but potentially nonconvex inside this ball. We study both overdamped and underdamped Langevin MCMC and establish upper bounds on the number of steps required to obtain a sample from a distribution that is within $ε$ of $p^*$ in $1$-Wasserstein distance. For the first-order method (overdamped Langevin MCMC), the iteration complexity is $\tilde{\mathcal{O}}\left(e^{cLR^2}d/ε^2\right)$, where $d$ is the dimension of the underlying space. For the second-order method (underdamped Langevin MCMC), the iteration complexity is $\tilde{\mathcal{O}}\left(e^{cLR^2}\sqrt{d}/ε\right)$ for an explicit positive constant $c$. Surprisingly, the iteration complexity for both these algorithms is only polynomial in the dimension $d$ and the target accuracy $ε$. It is exponential, however, in the problem parameter $LR^2$, which is a measure of non-log-concavity of the target distribution.
Submitted 6 July, 2020; v1 submitted 4 May, 2018;
originally announced May 2018.
-
Online learning with kernel losses
Authors:
Aldo Pacchiano,
Niladri S. Chatterji,
Peter L. Bartlett
Abstract:
We present a generalization of the adversarial linear bandits framework, where the underlying losses are kernel functions (with an associated reproducing kernel Hilbert space) rather than linear functions. We study a version of the exponential weights algorithm and bound its regret in this setting. Under conditions on the eigendecay of the kernel we provide a sharp characterization of the regret for this algorithm. When we have polynomial eigendecay $μ_j \le \mathcal{O}(j^{-β})$, we find that the regret is bounded by $\mathcal{R}_n \le \mathcal{O}(n^{β/(2(β-1))})$; while under the assumption of exponential eigendecay $μ_j \le \mathcal{O}(e^{-βj })$, we get an even tighter bound on the regret $\mathcal{R}_n \le \mathcal{O}(n^{1/2}\log(n)^{1/2})$. We also study the full information setting when the underlying losses are kernel functions and present an adapted exponential weights algorithm and a conditional gradient descent algorithm.
Submitted 27 February, 2018;
originally announced February 2018.
-
On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo
Authors:
Niladri S. Chatterji,
Nicolas Flammarion,
Yi-An Ma,
Peter L. Bartlett,
Michael I. Jordan
Abstract:
We provide convergence guarantees in Wasserstein distance for a variety of variance-reduction methods: SAGA Langevin diffusion, SVRG Langevin diffusion and control-variate underdamped Langevin diffusion. We analyze these methods under a uniform set of assumptions on the log-posterior distribution, assuming it to be smooth, strongly convex and Hessian Lipschitz. This is achieved by a new proof technique combining ideas from finite-sum optimization and the analysis of sampling methods. Our sharp theoretical bounds allow us to identify regimes of interest where each method performs better than the others. Our theory is verified with experiments on real-world and synthetic datasets.
Submitted 15 February, 2018;
originally announced February 2018.
-
Alternating minimization for dictionary learning: Local Convergence Guarantees
Authors:
Niladri S. Chatterji,
Peter L. Bartlett
Abstract:
We present theoretical guarantees for an alternating minimization algorithm for the dictionary learning/sparse coding problem. The dictionary learning problem is to factorize vector samples $y^{1},y^{2},\ldots, y^{n}$ into an appropriate basis (dictionary) $A^*$ and sparse vectors $x^{1*},\ldots,x^{n*}$. Our algorithm is a simple alternating minimization procedure that switches between $\ell_1$ minimization and gradient descent in alternate steps. Dictionary learning and specifically alternating minimization algorithms for dictionary learning are well studied both theoretically and empirically. However, in contrast to previous theoretical analyses for this problem, we replace a condition on the operator norm (that is, the largest magnitude singular value) of the true underlying dictionary $A^*$ with a condition on the matrix infinity norm (that is, the largest magnitude term). Our guarantees are under a reasonable generative model that allows for dictionaries with growing operator norms, and can handle an arbitrary level of overcompleteness, while having sparsity that is information theoretically optimal. We also establish upper bounds on the sample complexity of our algorithm.
Submitted 30 July, 2019; v1 submitted 9 November, 2017;
originally announced November 2017.
-
Underdamped Langevin MCMC: A non-asymptotic analysis
Authors:
Xiang Cheng,
Niladri S. Chatterji,
Peter L. Bartlett,
Michael I. Jordan
Abstract:
We study the underdamped Langevin diffusion when the log of the target distribution is smooth and strongly concave. We present an MCMC algorithm based on its discretization and show that it achieves $\varepsilon$ error (in 2-Wasserstein distance) in $\mathcal{O}(\sqrt{d}/\varepsilon)$ steps. This is a significant improvement over the best known rate for overdamped Langevin MCMC, which is $\mathcal{O}(d/\varepsilon^2)$ steps under the same smoothness/concavity assumptions.
The underdamped Langevin MCMC scheme can be viewed as a version of Hamiltonian Monte Carlo (HMC) which has been observed to outperform overdamped Langevin MCMC methods in a number of application areas. We provide quantitative rates that support this empirical wisdom.
Submitted 26 January, 2018; v1 submitted 12 July, 2017;
originally announced July 2017.
-
Optimization of various isolation techniques to develop low noise, radiation hard double-sided silicon strip detectors for the CBM Silicon Tracking System
Authors:
S. Chatterji,
M. Singla,
W. F. J. Mueller,
J. M. Heuser
Abstract:
This paper reports on the design optimization done for Double Sided silicon microStrip Detectors (DSSDs) to reduce the Equivalent Noise Charge (ENC) and to maximize the breakdown voltage and Charge Collection Efficiency. Various isolation techniques have been explored and compared in detail to optimize the detector performance. For the evaluation of the performance of the silicon detectors, a radiation damage model has been included. The neutron fluence is expected to be $2\times10^{13}\,n_{eq}$ cm$^{-2}$ per year for five years of expected CBM run with intermediate periods of warm maintenance, cold maintenance and shutdown. Transient simulations have been performed to estimate the charge collection performance of the irradiated detectors and simulations have been verified with experimental data.
Submitted 27 November, 2012;
originally announced November 2012.
-
Curve Reconstruction in Riemannian Manifolds: Ordering Motion Frames
Authors:
Pratik Shah,
Samaresh Chatterji
Abstract:
In this article we extend the computational geometric curve reconstruction approach to curves in Riemannian manifolds. We prove that the minimal spanning tree, given a sufficiently dense sample, correctly reconstructs the smooth arcs and further closed and simple curves in Riemannian manifolds. The proof is based on the behaviour of the curve segment inside the tubular neighbourhood of the curve. To take care of the local topological changes of the manifold, the tubular neighbourhood is constructed in consideration with the injectivity radius of the underlying Riemannian manifold. We also present examples of successfully reconstructed curves and show an application of curve reconstruction to ordering motion frames.
Submitted 20 December, 2010; v1 submitted 15 December, 2010;
originally announced December 2010.
-
A search for gravitational waves associated with the August 2006 timing glitch of the Vela pulsar
Authors:
The LIGO Scientific Collaboration,
J. Abadie,
B. P. Abbott,
R. Abbott,
R. Adhikari,
P. Ajith,
B. Allen,
G. Allen,
E. Amador Ceron,
R. S. Amin,
S. B. Anderson,
W. G. Anderson,
M. A. Arain,
M. Araya,
Y. Aso,
S. Aston,
P. Aufmuth,
C. Aulbert,
S. Babak,
P. Baker,
S. Ballmer,
D. Barker,
B. Barr,
P. Barriga,
L. Barsotti
, et al. (477 additional authors not shown)
Abstract:
The physical mechanisms responsible for pulsar timing glitches are thought to excite quasi-normal mode oscillations in their parent neutron star that couple to gravitational wave emission. In August 2006, a timing glitch was observed in the radio emission of PSR B0833-45, the Vela pulsar. At the time of the glitch, the two co-located Hanford gravitational wave detectors of the Laser Interferometer Gravitational-wave observatory (LIGO) were operational and taking data as part of the fifth LIGO science run (S5). We present the first direct search for the gravitational wave emission associated with oscillations of the fundamental quadrupole mode excited by a pulsar timing glitch. No gravitational wave detection candidate was found. We place Bayesian 90% confidence upper limits of 6.3e-21 to 1.4e-20 on the peak intrinsic strain amplitude of gravitational wave ring-down signals, depending on which spherical harmonic mode is excited. The corresponding range of energy upper limits is 5.0e44 to 1.3e45 erg.
Submitted 23 November, 2010; v1 submitted 5 November, 2010;
originally announced November 2010.
-
An Algebraic Approach for Computing Equilibria of a Subclass of Finite Normal Form Games
Authors:
Samaresh Chatterji,
Ratnik Gandhi
Abstract:
A Nash equilibrium has become an important solution concept for analyzing decision making in game theory. In this paper, we consider the problem of computing Nash equilibria of a subclass of generic finite normal form games. We define "rational payoff irrational equilibria games" to be the games with all rational payoffs and all irrational equilibria. We present a purely algebraic method for computing all Nash equilibria of these games that uses knowledge of Galois groups. Some results showing properties of the class of games, and an example showing the working of the method, conclude the paper.
Submitted 30 May, 2010;
originally announced May 2010.
-
Methods for Reducing False Alarms in Searches for Compact Binary Coalescences in LIGO Data
Authors:
J. Slutsky,
L. Blackburn,
D. A. Brown,
L. Cadonati,
J. Cain,
M. Cavaglià,
S. Chatterji,
N. Christensen,
M. Coughlin,
S. Desai,
G. González,
T. Isogai,
E. Katsavounidis,
B. Rankins,
T. Reed,
K. Riles,
P. Shawhan,
J. R. Smith,
N. Zotov,
J. Zweizig
Abstract:
The LIGO detectors are sensitive to a variety of noise transients of non-astrophysical origin. Instrumental glitches and environmental disturbances increase the false alarm rate in the searches for gravitational waves. Using times already identified when the interferometers produced data of questionable quality, or when the channels that monitor the interferometer indicated non-stationarity, we have developed techniques to safely and effectively veto false triggers from the compact binary coalescences (CBCs) search pipeline.
Submitted 8 April, 2010; v1 submitted 6 April, 2010;
originally announced April 2010.
-
Some Algebraic Properties of a Subclass of Finite Normal Form Games
Authors:
Samaresh Chatterji,
Ratnik Gandhi
Abstract:
We study the problem of computing all Nash equilibria of a subclass of finite normal form games. With an algebraic characterization of the games, we present a method for computing all their Nash equilibria. Further, we present a method for deciding membership in the class of games, with its related results. An appendix, containing an example to show the working of each of the presented methods, concludes the work.
Submitted 27 January, 2010;
originally announced January 2010.
-
Search for gravitational-wave inspiral signals associated with short Gamma-Ray Bursts during LIGO's fifth and Virgo's first science run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
J. Abadie,
B. P. Abbott,
R. Abbott,
T. Accadia,
F. Acernese,
R. Adhikari,
P. Ajith,
B. Allen,
G. Allen,
E. Amador Ceron,
R. S. Amin,
S. B. Anderson,
W. G. Anderson,
F. Antonucci,
S. Aoudia,
M. A. Arain,
M. Araya,
K. G. Arun,
Y. Aso,
S. Aston,
P. Astone,
P. Aufmuth,
C. Aulbert
, et al. (643 additional authors not shown)
Abstract:
Progenitor scenarios for short gamma-ray bursts (short GRBs) include coalescences of two neutron stars or a neutron star and black hole, which would necessarily be accompanied by the emission of strong gravitational waves. We present a search for these known gravitational-wave signatures in temporal and directional coincidence with 22 GRBs that had sufficient gravitational-wave data available in multiple instruments during LIGO's fifth science run, S5, and Virgo's first science run, VSR1. We find no statistically significant gravitational-wave candidates within a [-5, +1) s window around the trigger time of any GRB. Using the Wilcoxon-Mann-Whitney U test, we find no evidence for an excess of weak gravitational-wave signals in our sample of GRBs. We exclude neutron star-black hole progenitors to a median 90% CL exclusion distance of 6.7 Mpc.
Submitted 3 March, 2010; v1 submitted 4 January, 2010;
originally announced January 2010.
-
Searches for gravitational waves from known pulsars with S5 LIGO data
Authors:
The LIGO Scientific Collaboration,
The Virgo Collaboration,
B. P. Abbott,
R. Abbott,
F. Acernese,
R. Adhikari,
P. Ajith,
B. Allen,
G. Allen,
M. Alshourbagy,
R. S. Amin,
S. B. Anderson,
W. G. Anderson,
F. Antonucci,
S. Aoudia,
M. A. Arain,
M. Araya,
H. Armandula,
P. Armor,
K. G. Arun,
Y. Aso,
S. Aston,
P. Astone,
P. Aufmuth,
C. Aulbert
, et al. (656 additional authors not shown)
Abstract:
We present a search for gravitational waves from 116 known millisecond and young pulsars using data from the fifth science run of the LIGO detectors. For this search ephemerides overlapping the run period were obtained for all pulsars using radio and X-ray observations. We demonstrate an updated search method that allows for small uncertainties in the pulsar phase parameters to be included in the search. We report no signal detection from any of the targets and therefore interpret our results as upper limits on the gravitational wave signal strength. The most interesting limits are those for young pulsars. We present updated limits on gravitational radiation from the Crab pulsar, where the measured limit is now a factor of seven below the spin-down limit. This limits the power radiated via gravitational waves to be less than ~2% of the available spin-down power. For the X-ray pulsar J0537-6910 we reach the spin-down limit under the assumption that any gravitational wave signal from it stays phase locked to the X-ray pulses over timing glitches, and for pulsars J1913+1011 and J1952+3252 we are only a factor of a few above the spin-down limit. Of the recycled millisecond pulsars several of the measured upper limits are only about an order of magnitude above their spin-down limits. For these our best (lowest) upper limit on gravitational wave amplitude is 2.3x10^-26 for J1603-7202 and our best (lowest) limit on the inferred pulsar ellipticity is 7.0x10^-8 for J2124-3358.
Submitted 26 February, 2010; v1 submitted 19 September, 2009;
originally announced September 2009.
-
Search for gravitational-wave bursts associated with gamma-ray bursts using data from LIGO Science Run 5 and Virgo Science Run 1
Authors:
LIGO Scientific Collaboration,
Virgo Collaboration,
B. P. Abbott,
R. Abbott,
F. Acernese,
R. Adhikari,
P. Ajith,
B. Allen,
G. Allen,
M. Alshourbagy,
R. S. Amin,
S. B. Anderson,
W. G. Anderson,
F. Antonucci,
S. Aoudia,
M. A. Arain,
M. Araya,
H. Armandula,
P. Armor,
K. G. Arun,
Y. Aso,
S. Aston,
P. Astone,
P. Aufmuth,
C. Aulbert
, et al. (643 additional authors not shown)
Abstract:
We present the results of a search for gravitational-wave bursts associated with 137 gamma-ray bursts (GRBs) that were detected by satellite-based gamma-ray experiments during the fifth LIGO science run and first Virgo science run. The data used in this analysis were collected from 2005 November 4 to 2007 October 1, and most of the GRB triggers were from the Swift satellite. The search uses a coherent network analysis method that takes into account the different locations and orientations of the interferometers at the three LIGO-Virgo sites. We find no evidence for gravitational-wave burst signals associated with this sample of GRBs. Using simulated short-duration (<1 s) waveforms, we set upper limits on the amplitude of gravitational waves associated with each GRB. We also place lower bounds on the distance to each GRB under the assumption of a fixed energy emission in gravitational waves, with typical limits of D ~ 15 Mpc (E_GW^iso / 0.01 M_o c^2)^1/2 for emission at frequencies around 150 Hz, where the LIGO-Virgo detector network has best sensitivity. We present astrophysical interpretations and implications of these results, and prospects for corresponding searches during future LIGO-Virgo runs.
Submitted 7 April, 2010; v1 submitted 26 August, 2009;
originally announced August 2009.
-
X-Pipeline: An analysis package for autonomous gravitational-wave burst searches
Authors:
Patrick J. Sutton,
Gareth Jones,
Shourov Chatterji,
Peter Michael Kalmus,
Isabel Leonor,
Stephen Poprocki,
Jameson Rollins,
Antony Searle,
Leo Stein,
Massimo Tinto,
Michal Was
Abstract:
Autonomous gravitational-wave searches -- fully automated analyses of data that run without human intervention or assistance -- are desirable for a number of reasons. They are necessary for the rapid identification of gravitational-wave burst candidates, which in turn will allow for follow-up observations by other observatories and the maximum exploitation of their scientific potential. A fully automated analysis would also circumvent the traditional "by hand" setup and tuning of burst searches that is both laborious and time consuming. We demonstrate a fully automated search with X-Pipeline, a software package for the coherent analysis of data from networks of interferometers for detecting bursts associated with GRBs and other astrophysical triggers. We discuss the methods X-Pipeline uses for automated running, including background estimation, efficiency studies, unbiased optimal tuning of search thresholds, and prediction of upper limits. These are all done automatically via Monte Carlo with multiple independent data samples, and without requiring human intervention. As a demonstration of the power of this approach, we apply X-Pipeline to LIGO data to search for gravitational-wave emission associated with GRB 031108. We find that X-Pipeline is sensitive to signals approximately a factor of 2 weaker in amplitude than those detectable by the cross-correlation technique used in LIGO searches to date. We conclude with the prospects for running X-Pipeline as a fully autonomous, near real-time triggered burst search in the next LSC-Virgo Science Run.
Submitted 7 April, 2010; v1 submitted 25 August, 2009;
originally announced August 2009.
-
Un-modeled search for black hole binary systems in the NINJA project
Authors:
Laura Cadonati,
Shourov Chatterji,
Sebastian Fischetti,
Gianluca Guidi,
Satyanarayan R. P. Mohapatra,
Riccardo Sturani,
Andrea Viceré
Abstract:
The gravitational wave signature from binary black hole coalescences is an important target for LIGO and VIRGO. The Numerical INJection Analysis (NINJA) project brought together the numerical relativity and gravitational wave data analysis communities, with the goal of optimizing the detectability of these events. In its first instantiation, the NINJA project produced a simulated data set with numerical waveforms from binary black hole coalescences of various morphologies (spin, mass ratio, initial conditions), superimposed on Gaussian colored noise at the design sensitivity for initial LIGO and VIRGO. We analyzed this simulated data set with the Q-pipeline burst algorithm. This code, designed for the all-sky detection of gravitational wave bursts with minimal assumptions on the shape of the waveform, filters the data with a bank of sine-Gaussians, that is, sinusoids with a Gaussian envelope. The algorithm's performance was compared to matched filtering with ring-down templates. The results are qualitatively consistent; however, due to the low simulation statistics in the first NINJA project, it is premature to draw quantitative conclusions at this stage.
Submitted 24 October, 2010; v1 submitted 12 June, 2009;
originally announced June 2009.
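As an illustration of the sine-Gaussian filter bank mentioned in the abstract above, the following is a minimal sketch rather than the Q-pipeline implementation. The function names, the envelope-width convention `tau = q / (sqrt(2) * pi * f0)`, the sample rate, and the template frequencies are all assumptions made for this example.

```python
import numpy as np

def sine_gaussian(t, t0, f0, q):
    """Sinusoid at frequency f0 with a Gaussian envelope centred on t0."""
    # Envelope width from the quality factor q; one common convention, assumed here.
    tau = q / (np.sqrt(2.0) * np.pi * f0)
    return np.exp(-((t - t0) ** 2) / tau ** 2) * np.cos(2.0 * np.pi * f0 * (t - t0))

def filter_bank_response(data, t, t0, frequencies, q):
    """Normalised inner product of the data with each sine-Gaussian template."""
    responses = []
    for f0 in frequencies:
        template = sine_gaussian(t, t0, f0, q)
        responses.append(np.dot(data, template) / np.sqrt(np.dot(template, template)))
    return np.array(responses)

# Hypothetical test case: a 200 Hz sine-Gaussian buried in weak white noise.
sample_rate = 4096.0
t = np.arange(0.0, 1.0, 1.0 / sample_rate)
rng = np.random.default_rng(0)
data = sine_gaussian(t, 0.5, 200.0, 9.0) + 0.1 * rng.standard_normal(t.size)

frequencies = np.array([50.0, 100.0, 200.0, 400.0])
response = filter_bank_response(data, t, 0.5, frequencies, 9.0)
best_frequency = frequencies[np.argmax(np.abs(response))]
```

In this toy setup the template whose frequency matches the injected signal gives the largest response. A real search would also scan over central time and quality factor and whiten the data first.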
-
Testing gravitational-wave searches with numerical relativity waveforms: Results from the first Numerical INJection Analysis (NINJA) project
Authors:
Benjamin Aylott,
John G. Baker,
William D. Boggs,
Michael Boyle,
Patrick R. Brady,
Duncan A. Brown,
Bernd Brügmann,
Luisa T. Buchman,
Alessandra Buonanno,
Laura Cadonati,
Jordan Camp,
Manuela Campanelli,
Joan Centrella,
Shourov Chatterji,
Nelson Christensen,
Tony Chu,
Peter Diener,
Nils Dorband,
Zachariah B. Etienne,
Joshua Faber,
Stephen Fairhurst,
Benjamin Farr,
Sebastian Fischetti,
Gianluca Guidi,
Lisa M. Goggin
, et al. (52 additional authors not shown)
Abstract:
The Numerical INJection Analysis (NINJA) project is a collaborative effort between members of the numerical relativity and gravitational-wave data analysis communities. The purpose of NINJA is to study the sensitivity of existing gravitational-wave search algorithms using numerically generated waveforms and to foster closer collaboration between the numerical relativity and data analysis communities. We describe the results of the first NINJA analysis which focused on gravitational waveforms from binary black hole coalescence. Ten numerical relativity groups contributed numerical data which were used to generate a set of gravitational-wave signals. These signals were injected into a simulated data set, designed to mimic the response of the Initial LIGO and Virgo gravitational-wave detectors. Nine groups analysed this data using search and parameter-estimation pipelines. Matched filter algorithms, un-modelled-burst searches and Bayesian parameter-estimation and model-selection algorithms were applied to the data. We report the efficiency of these search methods in detecting the numerical waveforms and measuring their parameters. We describe preliminary comparisons between the different search methods and suggest improvements for future NINJA analyses.
Submitted 9 July, 2009; v1 submitted 28 January, 2009;
originally announced January 2009.
-
Enhancing the capabilities of LIGO time-frequency plane searches through clustering
Authors:
Rubab Khan,
Shourov Chatterji
Abstract:
One class of gravitational wave signals LIGO is searching for consists of short-duration bursts of unknown waveform. Potential sources include core collapse supernovae, gamma ray burst progenitors, and mergers of binary black holes or neutron stars. We present a density-based clustering algorithm to improve the performance of time-frequency searches for such gravitational-wave bursts when they are extended in time and/or frequency, and not sufficiently well known to permit matched filtering. We have implemented this algorithm as an extension to the QPipeline, a gravitational-wave data analysis pipeline for the detection of bursts, which currently determines the statistical significance of events based solely on the peak significance observed in minimum uncertainty regions of the time-frequency plane. Density-based clustering improves the performance of such a search by considering the aggregate significance of arbitrarily shaped regions in the time-frequency plane and rejecting the isolated minimum uncertainty features expected from the background detector noise. In this paper, we present test results for simulated signals and demonstrate that density-based clustering improves the performance of the QPipeline for signals extended in time and/or frequency.
Submitted 18 June, 2009; v1 submitted 23 January, 2009;
originally announced January 2009.
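The idea of density-based clustering of time-frequency tiles described in the abstract above can be sketched with a generic DBSCAN-style procedure. This is not the QPipeline extension itself: the function name, the distance metric on normalised (time, frequency) coordinates, the parameters, and the use of a plain sum as the "aggregate significance" are all assumptions for illustration.

```python
import numpy as np

def density_cluster(points, radius, min_neighbours):
    """DBSCAN-style labels for points; -1 marks isolated (noise) points.

    A point is a core point if at least min_neighbours points (itself
    included) lie within radius of it.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbours = [np.flatnonzero(dist[i] <= radius) for i in range(n)]
    core = np.array([len(nb) >= min_neighbours for nb in neighbours])

    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        # Grow a new cluster outwards from this unvisited core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            if not core[j]:
                continue  # border points join a cluster but do not extend it
            for k in neighbours[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels

# Hypothetical tiles: (time, frequency) in normalised units, with significances.
tiles = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (0.3, 0.1), (5.0, 5.0), (9.0, 1.0)]
significance = np.array([3.0, 4.0, 5.0, 4.0, 6.0, 5.0])

labels = density_cluster(tiles, radius=0.25, min_neighbours=2)
# Aggregate significance of the first cluster (a plain sum, assumed here).
aggregate = significance[labels == 0].sum()
```

Here the four neighbouring tiles form one cluster, while the two isolated tiles are rejected even though one of them has the highest individual significance.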
-
Beating the spin-down limit on gravitational wave emission from the Crab pulsar
Authors:
The LIGO Scientific Collaboration,
B. Abbott,
R. Abbott,
R. Adhikari,
P. Ajith,
B. Allen,
G. Allen,
R. Amin,
S. B. Anderson,
W. G. Anderson,
M. A. Arain,
M. Araya,
H. Armandula,
P. Armor,
Y. Aso,
S. Aston,
P. Aufmuth,
C. Aulbert,
S. Babak,
S. Ballmer,
H. Bantilan,
B. C. Barish,
C. Barker,
D. Barker,
B. Barr
, et al. (419 additional authors not shown)
Abstract:
We present direct upper limits on gravitational wave emission from the Crab pulsar using data from the first nine months of the fifth science run of the Laser Interferometer Gravitational-wave Observatory (LIGO). These limits are based on two searches. In the first we assume that the gravitational wave emission follows the observed radio timing, giving an upper limit on gravitational wave emission that beats indirect limits inferred from the spin-down and braking index of the pulsar and the energetics of the nebula. In the second we allow for a small mismatch between the gravitational and radio signal frequencies and interpret our results in the context of two possible gravitational wave emission mechanisms.
Submitted 22 July, 2008; v1 submitted 30 May, 2008;
originally announced May 2008.
-
The LSC Glitch Group : Monitoring Noise Transients during the fifth LIGO Science Run
Authors:
L. Blackburn,
L. Cadonati,
S. Caride,
S. Caudill,
S. Chatterji,
N. Christensen,
J. Dalrymple,
S. Desai,
A. Di Credico,
G. Ely,
J. Garofoli,
L. Goggin,
G. González,
R. Gouaty,
C. Gray,
A. Gretarsson,
D. Hoak,
T. Isogai,
E. Katsavounidis,
J. Kissel,
S. Klimenko,
R. A. Mercer,
S. Mohapatra,
S. Mukherjee,
F. Raab
, et al. (11 additional authors not shown)
Abstract:
The LIGO Scientific Collaboration (LSC) glitch group is part of the LIGO detector characterization effort. It consists of data analysts and detector experts who, during and after science runs, collaborate for a better understanding of noise transients in the detectors. Goals of the glitch group during the fifth LIGO science run (S5) included (1) offline assessment of the detector data quality, with focus on noise transients, (2) veto recommendations for astrophysical analysis and (3) feedback to the commissioning team on anomalies seen in gravitational wave and auxiliary data channels. Other activities included the study of auto-correlation of triggers from burst searches, stationarity of the detector noise and veto studies. The group identified causes for several noise transients that triggered false alarms in the gravitational wave searches; the times of such transients were identified and vetoed from the data generating the LSC astrophysical results.
Submitted 14 July, 2008; v1 submitted 4 April, 2008;
originally announced April 2008.
-
CompostBin: A DNA composition-based algorithm for binning environmental shotgun reads
Authors:
Sourav Chatterji,
Ichitaro Yamazaki,
Zhaojun Bai,
Jonathan Eisen
Abstract:
A major hindrance to studies of microbial diversity has been that the vast majority of microbes cannot be cultured in the laboratory and thus are not amenable to traditional methods of characterization. Environmental shotgun sequencing (ESS) overcomes this hurdle by sequencing the DNA from the organisms present in a microbial community. The interpretation of this metagenomic data can be greatly facilitated by associating every sequence read with its source organism. We report the development of CompostBin, a DNA composition-based algorithm for analyzing metagenomic sequence reads and distributing them into taxon-specific bins. Unlike previous methods that seek to bin assembled contigs and often require training on known reference genomes, CompostBin has the ability to accurately bin raw sequence reads without need for assembly or training. It applies principal component analysis to project the data into an informative lower-dimensional space, and then uses the normalized cut clustering algorithm on this filtered data set to classify sequences into taxon-specific bins. We demonstrate the algorithm's accuracy on a variety of simulated data sets and on one metagenomic data set with known species assignments. CompostBin is a work in progress, with several refinements of the algorithm planned for the future.
Submitted 22 August, 2007;
originally announced August 2007.
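The composition-based binning idea in the abstract above can be illustrated with a toy example: k-mer frequencies are computed per read and projected onto the first principal component. This is only a sketch under stated assumptions and not the CompostBin code. The choice of k = 2, the synthetic reads, and especially the final split on the sign of the first component (a simple stand-in for the normalized cut clustering named in the abstract) are assumptions.

```python
import itertools
import numpy as np

def kmer_frequencies(read, k):
    """Relative frequency of every k-mer over the alphabet ACGT in a read."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(read) - k + 1):
        counts[index[read[i:i + k]]] += 1
    return counts / counts.sum()

def first_principal_component(features):
    """Project mean-centred feature rows onto their first principal axis."""
    centred = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[0]

rng = np.random.default_rng(1)

def random_read(probabilities, length=200):
    """Synthetic read with the given base probabilities (order: A, C, G, T)."""
    return "".join(rng.choice(list("ACGT"), size=length, p=probabilities))

# Two hypothetical "taxa": one AT-rich, one GC-rich.
reads = [random_read([0.4, 0.1, 0.1, 0.4]) for _ in range(10)]
reads += [random_read([0.1, 0.4, 0.4, 0.1]) for _ in range(10)]

features = np.array([kmer_frequencies(read, 2) for read in reads])
scores = first_principal_component(features)
bins = (scores > 0).astype(int)  # sign split, standing in for normalized cut
```

With such strongly different base compositions the two groups of reads fall on opposite sides of the first principal component. Real metagenomic reads are far less cleanly separated, which is where a proper graph-based clustering step becomes relevant.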
-
Study of Direct Photon plus Jet production in CMS Experiment at $\sqrt{s}$=14 TeV
Authors:
Pooja Gupta,
B. C. Choudhary,
S. Chatterji,
S. Bhattacharya
Abstract:
We present simulation results of $γ$ + Jet analysis using CMS (Compact Muon Solenoid) Object-Oriented software at the Large Hadron Collider (LHC) center of mass energy $\sqrt{s}$=14 TeV. The study of direct photon production helps in validating perturbative Quantum Chromodynamics (pQCD) and provides information on the gluon distribution in the nucleons. Direct photon processes also constitute a major background to several other Standard Model (SM) processes and signals of new physics. Thus these processes need to be understood precisely in the new energy regime. In this work, we have done a detailed study of the GEANT4 simulated $γ$ + jet events generated with Pythia, and the related background processes. Isolation cuts have been optimized for direct photons, which improves the signal-to-background (S/B) ratio by $\sim25\%$ compared to previous studies done in CMS. The inclusion of a large $Δφ$ cut between the photon and the leading jet at $40^\circ$ in the analysis leads to a further increase of $\sim15\%$ in S/B, thus giving an overall gain of $\sim42\%$ in the S/B ratio.
Submitted 24 October, 2007; v1 submitted 18 May, 2007;
originally announced May 2007.
-
Detailed comparison of LIGO and Virgo Inspiral Pipelines in Preparation for a Joint Search
Authors:
F. Beauville,
M. -A. Bizouard,
L. Blackburn,
L. Bosi,
L. Brocco,
D. Brown,
D. Buskulic,
F. Cavalier,
S. Chatterji,
N. Christensen,
A. -C. Clapson,
S. Fairhurst,
D. Grosjean,
G. Guidi,
P. Hello,
S. Heng,
M. Hewitson,
E. Katsavounidis,
S. Klimenko,
M. Knight,
A. Lazzarini,
N. Leroy,
F. Marion,
J. Markowitz,
C. Melachrinos
, et al. (5 additional authors not shown)
Abstract:
Presented in this paper is a detailed and direct comparison of the LIGO and Virgo binary neutron star detection pipelines. In order to test the search programs, numerous inspiral signals were added to 24 hours of simulated detector data. The efficiencies of the different pipelines were tested, and found to be comparable. Parameter estimation routines were also tested. We demonstrate that there are definite benefits to be had if LIGO and Virgo conduct a joint coincident analysis; these advantages include increased detection efficiency and the ability to provide source sky location information.
Submitted 3 January, 2007;
originally announced January 2007.
-
A comparison of methods for gravitational wave burst searches from LIGO and Virgo
Authors:
F. Beauville,
M. -A. Bizouard,
L. Blackburn,
L. Bosi,
L. Brocco,
D. Brown,
D. Buskulic,
F. Cavalier,
S. Chatterji,
N. Christensen,
A. -C. Clapson,
S. Fairhurst,
D. Grosjean,
G. Guidi,
P. Hello,
S. Heng,
M. Hewitson,
E. Katsavounidis,
S. Klimenko,
M. Knight,
A. Lazzarini,
N. Leroy,
F. Marion,
J. Markowitz,
C. Melachrinos
, et al. (5 additional authors not shown)
Abstract:
The search procedure for burst gravitational waves has been studied using 24 hours of simulated data in a network of three interferometers (Hanford 4-km, Livingston 4-km and Virgo 3-km are the example interferometers). Several methods to detect burst events developed in the LIGO Scientific Collaboration (LSC) and Virgo collaboration have been studied and compared. We have performed coincidence analysis of the triggers obtained in the different interferometers with and without simulated signals added to the data. The benefits of having multiple interferometers of similar sensitivity are demonstrated by comparing the detection performance of the joint coincidence analysis with LSC-only and Virgo-only burst searches. Adding Virgo to the LIGO detector network can increase the detection efficiency for this search by 50%. Another advantage of a joint LIGO-Virgo network is the ability to reconstruct the source sky position. The reconstruction accuracy depends on the timing measurement accuracy of the events in each interferometer, and is displayed in this paper with a fixed source position example.
Submitted 3 January, 2007;
originally announced January 2007.
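The coincidence analysis of triggers from different interferometers described in the abstract above can be sketched as a simple time-window match between two trigger lists. This is a minimal illustration, not the procedure used in the paper: the function name, the trigger times, and the 30 ms window are hypothetical values chosen for the example.

```python
import numpy as np

def coincident_triggers(times_a, times_b, window):
    """Return index pairs (i, j) with |times_a[i] - times_b[j]| <= window."""
    times_a = np.asarray(times_a, dtype=float)
    times_b = np.asarray(times_b, dtype=float)
    order = np.argsort(times_b)          # sort once so each lookup is a binary search
    sorted_b = times_b[order]
    pairs = []
    for i, t in enumerate(times_a):
        lo = np.searchsorted(sorted_b, t - window, side="left")
        hi = np.searchsorted(sorted_b, t + window, side="right")
        pairs.extend((i, int(order[j])) for j in range(lo, hi))
    return pairs

# Hypothetical trigger times in seconds from two detectors.
hanford_times = [10.000, 52.300, 80.100]
virgo_times = [10.020, 30.500, 80.400]

pairs = coincident_triggers(hanford_times, virgo_times, window=0.030)
```

Only the first trigger in each list survives the coincidence requirement here. In practice the window would be set from the light travel time between sites plus the timing uncertainty of each detection method.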
-
Coherent network analysis technique for discriminating gravitational-wave bursts from instrumental noise
Authors:
Shourov Chatterji,
Albert Lazzarini,
Leo Stein,
Patrick Sutton,
Antony Searle,
Massimo Tinto
Abstract:
Existing coherent network analysis techniques for detecting gravitational-wave bursts simultaneously test data from multiple observatories for consistency with the expected properties of the signals. These techniques assume the output of the detector network to be the sum of a stationary Gaussian noise process and a gravitational-wave signal, and they may fail in the presence of transient non-stationarities, which are common in real detectors. In order to address this problem we introduce a consistency test that is robust against noise non-stationarities and allows one to distinguish between gravitational-wave bursts and noise transients. This technique does not require any a priori knowledge of the putative burst waveform.
Submitted 1 May, 2006; v1 submitted 28 April, 2006;
originally announced May 2006.
-
Benefits of joint LIGO -- Virgo coincidence searches for burst and inspiral signals
Authors:
F. Beauville,
M. -A. Bizouard,
L. Blackburn,
L. Bosi,
P. Brady,
L. Brocco,
D. Brown,
D. Buskulic,
F. Cavalier,
S. Chatterji,
N. Christensen,
A. -C. Clapson,
S. Fairhurst,
D. Grosjean,
G. Guidi,
P. Hello,
E. Katsavounidis,
M. Knight,
A. Lazzarini,
N. Leroy,
F. Marion,
B. Mours,
F. Ricci,
A. Vicere,
M. Zanolin
Abstract:
We examine the benefits of performing a joint LIGO--Virgo search for transient signals. We do this by adding burst and inspiral signals to 24 hours of simulated detector data. We find significant advantages to performing a joint coincidence analysis over either a LIGO-only or a Virgo-only search. These include an increased detection efficiency, at a fixed false alarm rate, for both burst and inspiral events and an ability to reconstruct the sky location of a signal.
Submitted 12 September, 2005;
originally announced September 2005.