-
Spatial Data Generators
Authors:
Tin Vu,
Sara Migliorini,
Ahmed Eldawy,
Alberto Belussi
Abstract:
This gem describes a standard method for generating synthetic spatial data that can be used in benchmarking and scalability tests. The goal is to improve the reproducibility of, and increase trust in, experiments on synthetic data by using standard, widely accepted dataset distributions. In addition, this article describes how to assign a unique identifier to each synthetic dataset that can be shared in papers for reproducibility of results. Finally, this gem provides supplementary material that gives a reference implementation for all the provided distributions.
Submitted 27 September, 2021; v1 submitted 17 July, 2021;
originally announced July 2021.
-
The impact of using reconditioned correlated observation error covariance matrices in the Met Office 1D-Var system
Authors:
Jemima M. Tabeart,
Sarah L. Dance,
Amos S. Lawless,
Stefano Migliorini,
Nancy K. Nichols,
Fiona Smith,
Joanne A. Waller
Abstract:
Recent developments in numerical weather prediction have led to the use of correlated observation error covariance (OEC) information in data assimilation and forecasting systems. However, diagnosed OEC matrices are often ill-conditioned and may cause convergence problems for variational data assimilation procedures. Reconditioning methods are used to improve the conditioning of covariance matrices while retaining correlation information. In this paper we study the impact of using the 'ridge regression' method of reconditioning to assimilate Infrared Atmospheric Sounding Interferometer (IASI) observations in the Met Office 1D-Var system. This is the first systematic investigation of how changing target condition numbers affects convergence of a 1D-Var routine. This procedure is used for quality control, and to estimate key variables (skin temperature, cloud top pressure, cloud fraction) that are not analysed by the main 4D-Var data assimilation system. Our new results show that the current (uncorrelated) OEC matrix requires more iterations to reach convergence than any choice of correlated OEC matrix studied. This suggests that using a correlated OEC matrix in the 1D-Var routine would have computational benefits for IASI observations. Using reconditioned correlated OEC matrices also increases the number of observations that pass quality control. However, the impact on skin temperature, cloud fraction and cloud top pressure is less clear. As the reconditioning parameter is increased, differences between retrieved variables for correlated OEC matrices and the operational diagonal OEC matrix reduce. As correlated choices of OEC matrix yield faster convergence, using stricter convergence criteria along with these matrices may increase efficiency and improve quality control.
Submitted 12 August, 2019;
originally announced August 2019.
-
Cache-based Multi-query Optimization for Data-intensive Scalable Computing Frameworks
Authors:
Pietro Michiardi,
Damiano Carra,
Sara Migliorini
Abstract:
In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work, for example scanning and processing the same subset of data. Instead of optimizing jobs independently, which may result in redundant and wasteful processing, multi-query optimization techniques can be employed to save a considerable amount of cluster resources. In this work, we introduce a novel method combining in-memory cache primitives and multi-query optimization, to improve the efficiency of data-intensive, scalable computing frameworks. By careful selection and exploitation of common (sub)expressions, while satisfying memory constraints, our method transforms a batch of queries into a new, more efficient one which avoids unnecessary recomputations. To find feasible and efficient execution plans, our method uses a cost-based optimization formulation akin to the multiple-choice knapsack problem. Extensive experiments on a prototype implementation of our system show significant benefits of worksharing for both TPC-DS workloads and detailed micro-benchmarks.
Submitted 22 May, 2018;
originally announced May 2018.
-
Northern JHK Standard Stars for Array Detectors
Authors:
L. K. Hunt,
F. Mannucci,
L. Testi,
S. Migliorini,
R. M. Stanga,
C. Baffa,
F. Lisi,
L. Vanzi
Abstract:
We report J, H and K photometry of 86 stars in 40 fields in the northern hemisphere. The fields are smaller than or comparable to a 4x4 arcmin field-of-view, and are roughly uniformly distributed over the sky, making them suitable for a homogeneous broadband calibration network for near-infrared panoramic detectors. K magnitudes range from 8.5 to 14, and J-K colors from -0.1 to 1.2. The photometry is derived from a total of 3899 reduced images; each star has been measured, on average, 26.0 times per filter on 5.5 nights. Typical errors on the photometry are about 0.012.
Submitted 22 October, 1999; v1 submitted 13 March, 1998;
originally announced March 1998.