Showing 1–13 of 13 results for author: Sahoo, S

Searching in archive stat.
  1. arXiv:2504.04371  [pdf]

    cs.LG stat.ML

    A Novel Cholesky Kernel based Support Vector Classifier

    Authors: Satyajeet Sahoo, Jhareswar Maiti

    Abstract: Support Vector Machine (SVM) is a popular supervised classification model that works by first finding the margin boundaries for the training data classes and then calculating the decision boundary, which is then used to classify the test data. This study demonstrates limitations of traditional support vector classification which uses Cartesian coordinate geometry to find the margin and decision bo…

    Submitted 6 April, 2025; originally announced April 2025.

  2. arXiv:2503.00307  [pdf, other]

    cs.LG stat.ML

    Remasking Discrete Diffusion Models with Inference-Time Scaling

    Authors: Guanghan Wang, Yair Schiff, Subham Sekhar Sahoo, Volodymyr Kuleshov

    Abstract: Part of the success of diffusion models stems from their ability to perform iterative refinement, i.e., repeatedly correcting outputs during generation. However, modern masked discrete diffusion lacks this capability: when a token is generated, it cannot be updated again, even when it introduces an error. Here, we address this limitation by introducing the remasking diffusion model (ReMDM) sampler…

    Submitted 21 May, 2025; v1 submitted 28 February, 2025; originally announced March 2025.

    Comments: Project page: https://remdm.github.io

  3. arXiv:2502.02233  [pdf]

    stat.ML cs.LG

    Variance-Adjusted Cosine Distance as Similarity Metric

    Authors: Satyajeet Sahoo, Jhareswar Maiti

    Abstract: Cosine similarity is a popular measure of the similarity between two vectors in the inner product space. It is widely used in many data classification algorithms like K-Nearest Neighbors, clustering, etc. This study demonstrates limitations of the application of cosine similarity. Particularly, this study demonstrates that the traditional cosine similarity metric is valid only in the Eu…

    Submitted 4 February, 2025; originally announced February 2025.

    Comments: 6 Pages

  4. arXiv:2412.14527  [pdf, other]

    stat.ML cs.LG

    Statistical Undersampling with Mutual Information and Support Points

    Authors: Alex Mak, Shubham Sahoo, Shivani Pandey, Yidan Yue, Linglong Kong

    Abstract: Class imbalance and distributional differences in large datasets present significant challenges for classification tasks in machine learning, often leading to biased models and poor predictive performance for minority classes. This work introduces two novel undersampling approaches: mutual information-based stratified simple random sampling and support points optimization. These methods prioritize re…

    Submitted 18 December, 2024; originally announced December 2024.

  5. arXiv:2404.01468  [pdf, other]

    eess.SY math.DS stat.AP

    Performance triggered adaptive model reduction for soil moisture estimation in precision irrigation

    Authors: Sarupa Debnath, Bernard T. Agyeman, Soumya R. Sahoo, Xunyuan Yin, Jinfeng Liu

    Abstract: Accurate soil moisture information is crucial for developing precise irrigation control strategies to enhance water use efficiency. Soil moisture estimation based on limited soil moisture sensors is crucial for obtaining comprehensive soil moisture information when dealing with large-scale agricultural fields. The major challenge in soil moisture estimation lies in the high dimensionality of the s…

    Submitted 1 April, 2024; originally announced April 2024.

  6. arXiv:2206.06672  [pdf, other]

    cs.LG stat.ML

    Semi-Autoregressive Energy Flows: Exploring Likelihood-Free Training of Normalizing Flows

    Authors: Phillip Si, Zeyi Chen, Subham Sekhar Sahoo, Yair Schiff, Volodymyr Kuleshov

    Abstract: Training normalizing flow generative models can be challenging due to the need to calculate computationally expensive determinants of Jacobians. This paper studies the likelihood-free training of flows and proposes the energy objective, an alternative sample-based loss based on proper scoring rules. The energy objective is determinant-free and supports flexible model architectures that are not eas…

    Submitted 22 June, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: 9 pages, 3 figures, 8 tables, 11 pages appendix

    MSC Class: 68T37 (Primary) 68T07 (Secondary)

  7. arXiv:2203.06548  [pdf, other]

    stat.AP cs.CY eess.SY math.OC

    Impact of sensor placement in soil water estimation: A real-case study

    Authors: Erfan Orouskhani, Soumya R. Sahoo, Bernard T. Agyeman, Song Bo, Jinfeng Liu

    Abstract: One of the essential elements in implementing a closed-loop irrigation system is soil moisture estimation based on a limited number of available sensors. One associated problem is the determination of the optimal locations to install the sensors such that good soil moisture estimation can be obtained. In our previous work, the modal degree of observability was employed to address the problem of op…

    Submitted 12 March, 2022; originally announced March 2022.

  8. arXiv:2010.03228  [pdf, other]

    stat.ML cs.AI cs.LG

    FairMixRep : Self-supervised Robust Representation Learning for Heterogeneous Data with Fairness constraints

    Authors: Souradip Chakraborty, Ekansh Verma, Saswata Sahoo, Jyotishka Datta

    Abstract: Representation Learning in a heterogeneous space with mixed variables of numerical and categorical types has interesting challenges due to its complex feature manifold. Moreover, feature learning in an unsupervised setup, without class labels and a suitable learning loss function, adds to the problem complexity. Further, the learned representation and subsequent predictions should not reflect disc…

    Submitted 14 October, 2020; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: This paper has been accepted at the ICDM'2020 DLC Workshop

  9. arXiv:2009.09634  [pdf, other]

    cs.LG stat.AP

    Learning Representation for Mixed Data Types with a Nonlinear Deep Encoder-Decoder Framework

    Authors: Saswata Sahoo, Souradip Chakraborty

    Abstract: Representing data with mixed variables of numerical and categorical types to obtain a suitable feature map is a challenging task, as important information lies in a complex non-linear manifold. The feature transformation should be able to incorporate marginal information of the individual variables and the complex cross-dependence structure among the mixed types of variables simultaneously. In this work, we…

    Submitted 21 September, 2020; originally announced September 2020.

  10. arXiv:2006.16322  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Scaling Symbolic Methods using Gradients for Neural Model Explanation

    Authors: Subham Sekhar Sahoo, Subhashini Venugopalan, Li Li, Rishabh Singh, Patrick Riley

    Abstract: Symbolic techniques based on Satisfiability Modulo Theory (SMT) solvers have been proposed for analyzing and verifying neural network properties, but their usage has been fairly limited owing to their poor scalability with larger networks. In this work, we propose a technique for combining gradient-based methods with symbolic techniques to scale such analyses and demonstrate its application for mo…

    Submitted 5 May, 2021; v1 submitted 29 June, 2020; originally announced June 2020.

  11. arXiv:2005.02817  [pdf, other]

    stat.ML cs.LG stat.AP

    Graph Spectral Feature Learning for Mixed Data of Categorical and Numerical Type

    Authors: Saswata Sahoo, Souradip Chakraborty

    Abstract: Feature learning in the presence of mixed types of variables, numerical and categorical, is an important issue for related modeling problems. For simple neighborhood queries under mixed data space, standard practice is to consider numerical and categorical variables separately and combine them based on some suitable distance functions. Alternatives, such as Kernel learning or Principal Co…

    Submitted 6 May, 2020; originally announced May 2020.

  12. arXiv:1806.07259  [pdf, other]

    cs.LG stat.ML

    Learning Equations for Extrapolation and Control

    Authors: Subham S. Sahoo, Christoph H. Lampert, Georg Martius

    Abstract: We present an approach to identify concise equations from data using a shallow neural network approach. In contrast to ordinary black-box regression, this approach allows understanding functional relations and generalizing them from observed data to unseen parts of the parameter space. We show how to extend the class of learnable equations for a recently proposed equation learning network to inclu…

    Submitted 19 June, 2018; originally announced June 2018.

    Comments: 9 pages, 9 figures, ICML 2018

    MSC Class: 68T05; 68T30; 68T40; 62M20; 62J02; 65D15; 70E60; 93C40 ACM Class: I.2.6; I.2.8

  13. arXiv:1612.06738  [pdf, other]

    cs.CV cs.IR eess.IV eess.SP stat.AP

    Local Sparse Approximation for Image Restoration with Adaptive Block Size Selection

    Authors: Sujit Kumar Sahoo

    Abstract: In this paper, the problem of image restoration (denoising and inpainting) is approached using sparse approximation of local image blocks. The local image blocks are extracted by sliding square windows over the image. An adaptive block size selection procedure for local sparse approximation is proposed, which affects the global recovery of the underlying image. Ideally the adaptive local block selectio…

    Submitted 20 December, 2016; originally announced December 2016.