Showing 1–12 of 12 results for author: Paul, M

Searching in archive stat.
  1. arXiv:2506.04166  [pdf, ps, other]

    cs.LG stat.CO stat.ML

    N$^2$: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion

    Authors: Caleb Chin, Aashish Khubchandani, Harshvardhan Maskara, Kyuseong Choi, Jacob Feitelberg, Albert Gong, Manit Paul, Tathagata Sadhukhan, Anish Agarwal, Raaz Dwivedi

    Abstract: Nearest neighbor (NN) methods have re-emerged as competitive tools for matrix completion, offering strong empirical performance and recent theoretical guarantees, including entry-wise error bounds, confidence intervals, and minimax optimality. Despite their simplicity, recent work has shown that NN approaches are robust to a range of missingness patterns and effective across diverse applications.…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 21 pages, 6 figures

  2. arXiv:2505.09612  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Adaptively-weighted Nearest Neighbors for Matrix Completion

    Authors: Tathagata Sadhukhan, Manit Paul, Raaz Dwivedi

    Abstract: In this technical note, we introduce and analyze AWNN: an adaptively weighted nearest neighbor method for performing matrix completion. Nearest neighbor (NN) methods are widely used in missing data problems across multiple disciplines such as in recommender systems and for performing counterfactual inference in panel data settings. Prior works have shown that in addition to being very intuitive an…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 25 pages, 6 figures

  3. arXiv:2503.23711  [pdf, other]

    math.ST stat.ME

    Finite sample valid confidence sets of mode

    Authors: Manit Paul, Arun Kumar Kuchibhotla

    Abstract: Estimating the mode of a unimodal distribution is a classical problem in statistics. Although there are several approaches for point-estimation of mode in the literature, very little has been explored about the interval-estimation of mode. Our work proposes a collection of novel methods of obtaining finite sample valid confidence set of the mode of a unimodal distribution. We analyze the behaviour…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: 38 pages, 1 figure

  4. arXiv:2411.12965  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    On adaptivity and minimax optimality of two-sided nearest neighbors

    Authors: Tathagata Sadhukhan, Manit Paul, Raaz Dwivedi

    Abstract: Nearest neighbor (NN) algorithms have been extensively used for missing data problems in recommender systems and sequential decision-making systems. Prior theoretical analysis has established favorable guarantees for NN when the underlying data is sufficiently smooth and the missingness probabilities are lower bounded. Here we analyze NN with non-smooth non-linear functions with vast amounts of mi…

    Submitted 19 November, 2024; originally announced November 2024.

    Comments: 29 pages, 7 figures

  5. Data-driven Crop Growth Simulation on Time-varying Generated Images using Multi-conditional Generative Adversarial Networks

    Authors: Lukas Drees, Dereje T. Demie, Madhuri R. Paul, Johannes Leonhardt, Sabine J. Seidel, Thomas F. Döring, Ribana Roscher

    Abstract: Image-based crop growth modeling can substantially contribute to precision agriculture by revealing spatial crop development over time, which allows an early and location-specific estimation of relevant future plant traits, such as leaf area or biomass. A prerequisite for realistic and sharp crop image generation is the integration of multiple growth-influencing conditions in a model, such as an i…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 26 pages, 16 figures, code available at https://github.com/luked12/crop-growth-cgan

  6. arXiv:2210.07744  [pdf, ps, other]

    stat.AP stat.ME

    Effect of influence in voter models and its application in detecting significant interference in political elections

    Authors: Manit Paul, Rishideep Roy, Soudeep Deb

    Abstract: In this article, we study the effect of vector-valued interventions in votes under a binary voter model, where each voter expresses their vote as a $0-1$ valued random variable to choose between two candidates. We assume that the outcome is determined by the majority function, which is true for a democratic system. The term intervention includes cases of counting errors, reporting irregularities,…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: 23 pages, 4 figures

  7. arXiv:2210.03044  [pdf, other]

    cs.LG cs.AI stat.ML

    Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?

    Authors: Mansheej Paul, Feng Chen, Brett W. Larsen, Jonathan Frankle, Surya Ganguli, Gintare Karolina Dziugaite

    Abstract: Modern deep learning involves training costly, highly overparameterized networks, thus motivating the search for sparser networks that can still be trained to the same accuracy as the full network (i.e. matching). Iterative magnitude pruning (IMP) is a state of the art algorithm that can find such highly sparse matching subnetworks, known as winning tickets. IMP operates by iterative cycles of tra…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: The first three authors contributed equally

  8. arXiv:2206.01278  [pdf, other]

    cs.LG cs.AI stat.ML

    Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks

    Authors: Mansheej Paul, Brett W. Larsen, Surya Ganguli, Jonathan Frankle, Gintare Karolina Dziugaite

    Abstract: A striking observation about iterative magnitude pruning (IMP; Frankle et al. 2020) is that, after just a few hundred steps of dense training, the method can find a sparse sub-network that can be trained to the same accuracy as the dense network. However, the same does not hold at step 0, i.e. random initialization. In this work, we seek to understand how this ear…

    Submitted 2 June, 2022; originally announced June 2022.

    Comments: The first two authors contributed equally

  9. arXiv:2010.15110  [pdf, other]

    cs.LG stat.ML

    Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel

    Authors: Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli

    Abstract: In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well-approximated by a linear weight expansion of the network at initialization. Standard training, however, diverges from its linearization in ways that are poorly understood. We study the relationship between the training dynamics…

    Submitted 28 October, 2020; originally announced October 2020.

    Comments: 19 pages, 19 figures, In Advances in Neural Information Processing Systems 34 (NeurIPS 2020)

  10. arXiv:2004.05113  [pdf, ps, other]

    cs.CY cs.LG stat.ML

    Automatically Assessing Quality of Online Health Articles

    Authors: Fariha Afsana, Muhammad Ashad Kabir, Naeemul Hassan, Manoranjan Paul

    Abstract: The information ecosystem today is overwhelmed by an unprecedented quantity of data on versatile topics with varied quality. However, the quality of information disseminated in the field of medicine has been questioned as the negative health consequences of health misinformation can be life-threatening. There is currently no generic automated tool for evaluating the quality of online health in…

    Submitted 6 April, 2020; originally announced April 2020.

  11. Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance

    Authors: Mauricio Santillana, Andre T. Nguyen, Mark Dredze, Michael J. Paul, John S. Brownstein

    Abstract: We present a machine learning-based methodology capable of providing real-time ("nowcast") and forecast estimates of influenza activity in the US by leveraging data from multiple data sources including: Google searches, Twitter microblogs, nearly real-time hospital visit records, and data from a participatory surveillance system. Our main contribution consists of combining multiple influenza-like…

    Submitted 27 August, 2015; originally announced August 2015.

  12. arXiv:1401.2642  [pdf, other]

    stat.AP

    Hierarchical modelling of faecal egg counts to assess anthelmintic efficacy

    Authors: Michaela Paul, Paul R. Torgerson, Johan Höglund, Reinhard Furrer

    Abstract: Counting the number of parasite eggs in faecal samples is a widely used diagnostic method to evaluate parasite burden. Typically a sub-sample of the diluted faeces is examined for eggs. The resulting egg counts are multiplied by a specific correction factor to estimate the mean parasite burden. To detect anthelmintic resistance, the mean parasite burden from treated and untreated animals are compa…

    Submitted 12 January, 2014; originally announced January 2014.

    Comments: 14 pages, 7 figures, 1 table