-
Using ChatGPT to refine draft conceptual schemata in supply-driven design of multidimensional cubes
Authors:
Stefano Rizzi
Abstract:
Refinement is a critical step in supply-driven conceptual design of multidimensional cubes because it can hardly be automated. In fact, it includes steps such as the labeling of attributes as descriptive and the removal of uninteresting attributes, thus relying on the end-users' requirements on the one hand, and on the semantics of measures, dimensions, and attributes on the other. As a consequence, it is normally carried out manually by designers in close collaboration with end-users. The goal of this work is to check whether LLMs can act as facilitators for the refinement task, so as to let it be carried out entirely -- or mostly -- by end-users. The Dimensional Fact Model is the target formalism for our study; as a representative LLM, we use ChatGPT's model GPT-4o. To achieve our goal, we formulate three research questions aimed at (i) understanding the basic competences of ChatGPT in multidimensional modeling; (ii) understanding the basic competences of ChatGPT in refinement; and (iii) investigating whether the latter can be improved via prompt engineering. The results of our experiments show that careful prompt engineering can indeed significantly improve the accuracy of refinement, and that the residual errors can quickly be fixed via one additional prompt. However, we conclude that, at present, some involvement of designers in refinement is still necessary to ensure the validity of the refined schemata.
Submitted 4 February, 2025;
originally announced February 2025.
-
Deciphering boundary layer dynamics in high-Rayleigh-number convection using 3360 GPUs and a high-scaling in-situ workflow
Authors:
Mathis Bode,
Damian Alvarez,
Paul Fischer,
Christos E. Frouzakis,
Jens Henrik Göbbert,
Joseph A. Insley,
Yu-Hsiang Lan,
Victor A. Mateevitsi,
Misun Min,
Michael E. Papka,
Silvio Rizzi,
Roshan J. Samuel,
Jörg Schumacher
Abstract:
Turbulent heat and momentum transfer processes due to thermal convection cover many scales and are of great importance for several natural and technical flows. One consequence is that a fully resolved three-dimensional analysis of these turbulent transfers at high Rayleigh numbers, which includes the boundary layers, is possible only using supercomputers. The visualization of these dynamics poses an additional hurdle since the thermal and viscous boundary layers in thermal convection fluctuate strongly. In order to track these fluctuations continuously, data must be tapped at high frequency for visualization, which is difficult to achieve using conventional methods. This paper makes two main contributions in this context. First, it discusses the simulations of turbulent Rayleigh-Bénard convection up to Rayleigh numbers of $Ra=10^{12}$ computed with NekRS on GPUs. The largest simulation was run on 840 nodes with 3360 GPUs on the JUWELS Booster supercomputer. Second, an in-situ workflow using ASCENT is presented, which was successfully used to visualize the high-frequency turbulent fluctuations.
Submitted 22 January, 2025;
originally announced January 2025.
-
Visualization Requirements for Business Intelligence Analytics: A Goal-Based, Iterative Framework
Authors:
Ana Lavalle,
Alejandro Maté,
Juan Trujillo,
Stefano Rizzi
Abstract:
Information visualization plays a key role in business intelligence analytics. With ever larger amounts of data that need to be interpreted, using the right visualizations is crucial in order to understand the underlying patterns and results obtained by analysis algorithms. Despite its importance, defining the right visualization is still a challenging task. Business users are rarely experts in information visualization, and they may not know exactly which visualization tools or patterns are most suitable for their goals. Consequently, misinterpreted graphs and wrong results can be obtained, leading to missed opportunities and significant losses for companies. The underlying problem is a lack of tools and methodologies that allow non-expert users to define their visualization and data analysis goals in business terms. In order to tackle this problem, we present an iterative goal-oriented approach based on the i* language for the automatic derivation of data visualizations. Our approach links non-expert user requirements to the data to be analyzed, choosing the most suited visualization techniques in a semi-automatic way. The great advantage of our proposal is that we provide non-expert users with the best-suited visualizations according to their information needs and their data, with little effort and without requiring expertise in information visualization.
Submitted 14 February, 2024;
originally announced February 2024.
-
Scaling Computational Fluid Dynamics: In Situ Visualization of NekRS using SENSEI
Authors:
Victor A. Mateevitsi,
Mathis Bode,
Nicola Ferrier,
Paul Fischer,
Jens Henrik Göbbert,
Joseph A. Insley,
Yu-Hsiang Lan,
Misun Min,
Michael E. Papka,
Saumil Patel,
Silvio Rizzi,
Jonathan Windgassen
Abstract:
In the realm of Computational Fluid Dynamics (CFD), the demand for memory and computation resources is extreme, necessitating the use of leadership-scale computing platforms for practical domain sizes. This intensive requirement renders traditional checkpointing methods ineffective due to the significant slowdown in simulations while saving state data to disk. As we progress towards exascale and GPU-driven High-Performance Computing (HPC) and confront larger problem sizes, the choice becomes increasingly stark: to compromise data fidelity or to reduce resolution. To navigate this challenge, this study advocates for the use of in situ analysis and visualization techniques. These allow more frequent data "snapshots" to be taken directly from memory, thus avoiding the need for disruptive checkpointing. We detail our approach of instrumenting NekRS, a GPU-focused thermal-fluid simulation code employing the spectral element method (SEM), and describe varied in situ and in transit strategies for data rendering. Additionally, we provide concrete scientific use-cases and report on runs performed on Polaris, Argonne Leadership Computing Facility's (ALCF) 44 Petaflop supercomputer, and on Jülich Wizard for European Leadership Science (JUWELS) Booster, Jülich Supercomputing Centre's (JSC) 71 Petaflop HPC system, offering practical insight into the implications of our methodology.
Submitted 18 December, 2023; v1 submitted 15 December, 2023;
originally announced December 2023.
-
Distributed Neural Representation for Reactive in situ Visualization
Authors:
Qi Wu,
Joseph A. Insley,
Victor A. Mateevitsi,
Silvio Rizzi,
Michael E. Papka,
Kwan-Liu Ma
Abstract:
Implicit neural representations (INRs) have emerged as a powerful tool for compressing large-scale volume data. This opens up new possibilities for in situ visualization. However, the efficient application of INRs to distributed data remains an underexplored area. In this work, we develop a distributed volumetric neural representation and optimize it for in situ visualization. Our technique eliminates data exchanges between processes, achieving state-of-the-art compression speed, quality and ratios. Our technique also enables the implementation of an efficient strategy for caching large-scale simulation data at high temporal frequencies, further facilitating the use of reactive in situ visualization in a wider range of scientific problems. We integrate this system with the Ascent infrastructure and evaluate its performance and usability using real-world simulations.
Submitted 20 July, 2024; v1 submitted 27 March, 2023;
originally announced April 2023.
-
Beyond Roll-Up's and Drill-Down's: An Intentional Analytics Model to Reinvent OLAP (long-version)
Authors:
Panos Vassiliadis,
Patrick Marcel,
Stefano Rizzi
Abstract:
This paper structures a novel vision for OLAP by fundamentally redefining several of the pillars on which OLAP has been based for the last 20 years. We redefine OLAP queries, in order to move to higher degrees of abstraction from roll-up's and drill-down's, and we propose a set of novel intentional OLAP operators, namely, describe, assess, explain, predict, and suggest, which express the user's need for results. We fundamentally redefine what a query answer is, and escape from the constraint that the answer is a set of tuples; on the contrary, we complement the set of tuples with models (typically, but not exclusively, results of data mining algorithms over the involved data) that concisely represent the internal structure or correlations of the data. Due to the diverse nature of the involved models, we come up (for the first time ever, to the best of our knowledge) with a unifying framework for them, that places its pillars on the extension of each data cell of a cube with information about the models that pertain to it -- practically converting the small parts that build up the models to data that annotate each cell. We exploit this data-to-model mapping to provide highlights of the data, by isolating data and models that maximize the delivery of new information to the user. We introduce a new interestingness measure for assessing the surprise that a new query result brings to the user, with respect to the information contained in the results the user has previously seen. The individual parts of our proposal are integrated in a new data model for OLAP, which we call the Intentional Analytics Model. We complement our contribution with a list of significant open problems for the community to address.
Submitted 8 December, 2020; v1 submitted 19 December, 2018;
originally announced December 2018.