Data Analysis, Statistics and Probability

Showing new listings for Thursday, 21 August 2025

Total of 5 entries

Cross submissions (showing 2 of 2 entries)

[1] arXiv:2508.14078 (cross-list from cs.LG)
Title: Out-of-Sample Hydrocarbon Production Forecasting: Time Series Machine Learning using Productivity Index-Driven Features and Inductive Conformal Prediction
Mohamed Hassan Abdalla Idris, Jakub Marek Cebula, Jebraeel Gholinezhad, Shamsul Masum, Hongjie Ma
Subjects: Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)

This research introduces a machine learning (ML) framework designed to make out-of-sample hydrocarbon production forecasting more robust, specifically for multivariate time series. The methodology integrates Productivity Index (PI)-driven feature selection, a concept derived from reservoir engineering, with Inductive Conformal Prediction (ICP) for rigorous uncertainty quantification. Using historical data from the Volve (wells PF14, PF12) and Norne (well E1H) oil fields, the study evaluates several predictive algorithms, namely Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and eXtreme Gradient Boosting (XGBoost), in forecasting historical oil production rates (OPR_H). All models produced genuine out-of-sample production forecasts for a future timeframe. Model performance was evaluated using traditional error metrics (e.g., MAE), supplemented by Forecast Bias and Prediction Direction Accuracy (PDA) to assess bias and trend-capturing capabilities. The PI-based feature selection effectively reduced input dimensionality compared to conventional numerical simulation workflows. Uncertainty was quantified with the ICP framework, a distribution-free approach that guarantees valid prediction intervals (e.g., 95% coverage) without reliance on distributional assumptions, offering a distinct advantage over traditional confidence intervals, particularly for complex, non-normal data. The LSTM model performed best, achieving the lowest MAE on test data (19.468) and on genuine out-of-sample forecast data (29.638) for well PF14, with subsequent validation on Norne well E1H. These findings highlight the significant potential of combining domain-specific knowledge with advanced ML techniques to improve the reliability of hydrocarbon production forecasts.
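As an illustration of the inductive (split) conformal idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes absolute residuals on a held-out calibration set as the nonconformity score, which is a common choice; the function names and the toy numbers are hypothetical.

```python
import math

def conformal_radius(cal_true, cal_pred, alpha=0.05):
    """Half-width of a split (inductive) conformal prediction interval.

    Assumes absolute residuals on a held-out calibration set as the
    nonconformity score (an illustrative choice, not taken from the paper).
    """
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Rank of the finite-sample-corrected quantile: ceil((n + 1) * (1 - alpha)).
    # The small tolerance guards against floating-point round-up.
    k = math.ceil((n + 1) * (1 - alpha) - 1e-12)
    if k > n:
        # Too few calibration points to support this coverage level.
        return math.inf
    return scores[k - 1]

def prediction_interval(point_forecast, radius):
    """Symmetric interval around a point forecast."""
    return point_forecast - radius, point_forecast + radius

# Toy calibration data, made up purely for illustration.
cal_true = [10.0, 12.0, 15.0]
cal_pred = [11.0, 10.0, 11.0]
radius = conformal_radius(cal_true, cal_pred, alpha=0.5)
interval = prediction_interval(20.0, radius)
```

With only three calibration points, a 95% level cannot be supported and the sketch returns an infinite radius, which is why practical use needs a reasonably large calibration set.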

[2] arXiv:2508.14680 (cross-list from cond-mat.stat-mech)
Title: Size-structured populations with growth fluctuations: Feynman–Kac formula and decoupling
Ethan Levien, Yair Heïn, Farshid Jafarpour
Comments: 29 pages, 4 figures
Subjects: Statistical Mechanics (cond-mat.stat-mech); Data Analysis, Statistics and Probability (physics.data-an); Populations and Evolution (q-bio.PE)

We study a size-structured population model of proliferating cells in which biomass accumulation and binary division occur at rates modulated by fluctuating internal phenotypes. We quantify how fluctuations in internal variables that influence both growth and division shape the distribution of population phenotypes. We derive conditions under which the distributions of size and internal state decouple. Under this decoupling, population-level expectations are obtained from lineage-level expectations by an exponential tilting given by the Feynman–Kac formula. We further characterize weaker (ensemble-specific) versions of decoupling that hold in the lineage or the population ensemble but not both. Finally, we provide a more general interpretation of the tilted expectations in terms of the mass-weighted phenotype distribution.
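For orientation, an exponential tilting of this kind generically takes the following form. This is a sketch of the standard Feynman–Kac (many-to-one) structure, not the paper's exact statement: the symbols $X_t$ (internal state along a lineage), $r$ (a state-dependent rate), and $f$ (an observable) are assumptions introduced here, and the weights used in the paper may differ.

```latex
\langle f \rangle_{\mathrm{pop}}(t)
  = \frac{\mathbb{E}_{\mathrm{lin}}\!\left[ f(X_t)\,
      \exp\!\left( \int_0^t r(X_s)\,\mathrm{d}s \right) \right]}
         {\mathbb{E}_{\mathrm{lin}}\!\left[
      \exp\!\left( \int_0^t r(X_s)\,\mathrm{d}s \right) \right]}
```

Lineages that accumulate a larger integrated rate are over-represented in the population, which is what the exponential weight encodes.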

Replacement submissions (showing 3 of 3 entries)

[3] arXiv:2502.00038 (replaced)
Title: The Spectral Barycentre of a Set of Graphs with Community Structure
François G. Meyer
Comments: 28 pages
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

The notion of a barycentre graph is of crucial importance for machine learning algorithms that process graph-valued data. The barycentre graph is a "summary graph" that captures the mean topology and connectivity structure of a training dataset of graphs. The construction of a barycentre requires the definition of a metric to quantify distances between pairs of graphs. In this work, we use a multiscale spectral distance that is defined using the eigenvalues of the normalized graph Laplacian. The eigenvalues, but not the eigenvectors, of the normalized Laplacian of the barycentre graph can be determined from the optimization problem that defines the barycentre. We propose a structural constraint on the eigenvectors of the normalized graph Laplacian of the barycentre graph that guarantees that the barycentre inherits the topological structure of the graphs in the sample dataset. The eigenvectors can be computed using an algorithm that explores the large library of Soules bases. When the graphs are random realizations of a balanced stochastic block model, our algorithm returns a barycentre that converges almost surely, in the limit of large graph size, to the population mean of the graphs. We perform Monte Carlo simulations to validate the theoretical properties of the estimator, and we conduct experiments on real-life graphs that suggest that our approach works beyond the controlled environment of stochastic block models.
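The spectral distance above is built from the normalized graph Laplacian. The sketch below only constructs that matrix, L = I - D^{-1/2} A D^{-1/2}, for a small undirected graph; it does not implement the barycentre algorithm or the Soules-basis search. The function name and the example graph are hypothetical, and the eigenvalues would be obtained by passing the result to a numerical eigensolver.

```python
import math

def normalized_laplacian(adj):
    """Normalized Laplacian I - D^{-1/2} A D^{-1/2} of an undirected graph.

    `adj` is a symmetric adjacency matrix given as a list of lists.
    Isolated nodes (degree 0) get an all-zero row and column.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                lap[i][j] = 1.0 if deg[i] > 0 else 0.0
            elif adj[i][j] and deg[i] > 0 and deg[j] > 0:
                lap[i][j] = -adj[i][j] / math.sqrt(deg[i] * deg[j])
    return lap

# Path graph on three nodes: 0 - 1 - 2 (illustrative input only).
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
L = normalized_laplacian(path)
trace = sum(L[i][i] for i in range(len(L)))
```

The trace equals the sum of the eigenvalues, so for a graph with no isolated nodes it equals the number of nodes, which gives a quick sanity check on the construction.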

[4] arXiv:2502.08615 (replaced)
Title: Learning Selection Cuts With Gradients
Mike Hance, Juan Robles
Comments: 14 pages, 8 figures
Subjects: High Energy Physics - Experiment (hep-ex); Data Analysis, Statistics and Probability (physics.data-an)

Many analyses in high-energy physics rely on selection thresholds (cuts) applied to detector, particle, or event properties. Initial cut values can often be guessed from physical intuition, but cut optimization, especially for multiple features, is commonly performed by hand or skipped entirely in favor of multivariate algorithms like BDTs or neural networks. We revisit this problem and develop a cut optimization approach based on gradient descent. Cut thresholds are learned as parameters of a network with a simple architecture, and can be tuned to achieve a target signal efficiency through the use of custom loss functions. Contractive terms in the loss can be used to ensure a smooth evolution of cuts as functions of efficiency, particle kinematics, or event features. The method is used to classify events in a search for supersymmetry, and the performance is compared with common classification tools. An implementation of this approach is available in a public code repository and Python package.
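To make the idea of learning a cut by gradient descent concrete, here is a minimal one-feature sketch. It is not the authors' network or their package: it assumes a sigmoid relaxation of the hard cut and a squared-error loss on the signal efficiency, and every name and hyperparameter below is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_efficiency(values, cut, temperature):
    """Differentiable stand-in for the fraction of events with value > cut."""
    return sum(sigmoid((v - cut) / temperature) for v in values) / len(values)

def learn_cut(values, target_eff, cut=0.5, temperature=0.05, lr=0.2, steps=200):
    """Gradient descent on (soft_efficiency - target_eff)^2 with respect to the cut."""
    n = len(values)
    for _ in range(steps):
        passes = [sigmoid((v - cut) / temperature) for v in values]
        eff = sum(passes) / n
        # Derivative of the soft efficiency with respect to the cut value.
        deff_dcut = sum(-p * (1.0 - p) / temperature for p in passes) / n
        grad = 2.0 * (eff - target_eff) * deff_dcut
        cut -= lr * grad
    return cut

# Toy "signal" feature, evenly spaced on [0, 1); made up for illustration.
signal = [i / 100 for i in range(100)]
cut = learn_cut(signal, target_eff=0.7)
hard_eff = sum(v > cut for v in signal) / len(signal)
```

The temperature controls how closely the relaxed cut tracks the hard one: smaller values are more faithful but give gradients concentrated near the threshold.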

[5] arXiv:2508.13767 (replaced)
Title: Mueller Matrix Polarimetry of Fiber Bragg Grating Strain and Torsion
Hani J. Kbashi, Alberto R. Cuevas, Sergey Sergeyev
Comments: Nine pages, seven figures
Subjects: Optics (physics.optics); Data Analysis, Statistics and Probability (physics.data-an); Instrumentation and Detectors (physics.ins-det)

We experimentally demonstrate a polarimetric dual-comb spectroscopy technique for simultaneous strain and torsion sensing using a single-cavity mode-locked fiber laser and fiber Bragg grating (FBG) sensors. Dual-comb generation in a single-cavity fiber laser was achieved by utilizing a piece of high-birefringence fiber and adjusting the in-cavity polarization controller. Fast Fourier Transform analysis was applied to the time-domain Stokes parameters, enabling the detection of FBG spectral shifts induced by strain and torsion. To further enhance discrimination between strain and torsion, we applied a novel approach to extract Mueller matrix elements without using complex adjustable polarization components. We explored the analysis of polarimetric purity of the FBG's Mueller matrix in terms of polarizance, diattenuation, and structural polarization response as a function of FBG strain and torsion. The obtained results enabled the measurement of strain and torsion based on a single FBG, which paves the way for the development of cost-effective shape sensing technologies.
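Two of the quantities named in the abstract, diattenuation and polarizance, have standard definitions in terms of the first row and first column of a 4x4 Mueller matrix. The sketch below computes them under those textbook definitions; it does not reproduce the authors' extraction method, and the function names and example matrix are hypothetical.

```python
import math

def diattenuation(m):
    """Diattenuation magnitude: norm of the first-row vector (m01, m02, m03) over m00."""
    return math.sqrt(m[0][1] ** 2 + m[0][2] ** 2 + m[0][3] ** 2) / m[0][0]

def polarizance(m):
    """Polarizance magnitude: norm of the first-column vector (m10, m20, m30) over m00."""
    return math.sqrt(m[1][0] ** 2 + m[2][0] ** 2 + m[3][0] ** 2) / m[0][0]

# Ideal horizontal linear polarizer (textbook Mueller matrix).
polarizer = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
d_pol = diattenuation(polarizer)
p_pol = polarizance(polarizer)
```

An ideal polarizer gives the maximum value of 1 for both quantities, while an identity Mueller matrix (no polarization effect) gives 0 for both.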
