-
Active Sybil Attack and Efficient Defense Strategy in IPFS DHT
Authors:
V. H. M. Netto,
T. Cholez,
C. L. Ignat
Abstract:
The InterPlanetary File System (IPFS) is a decentralized peer-to-peer (P2P) storage system that relies on Kademlia, a Distributed Hash Table (DHT) structure commonly used in P2P systems for its proven scalability. However, DHTs are known to be vulnerable to Sybil attacks, in which a single entity controls multiple malicious nodes. Recent studies have shown that IPFS is affected by a passive content eclipse attack, leveraging Sybils, in which adversarial nodes hide received indexed information from other peers, making the content appear unavailable. Fortunately, the latest mitigation strategy, which couples attack detection based on statistical tests with a wider publication strategy upon detection, was able to circumvent it.
In this work, we present a new active attack, with malicious nodes responding with semantically correct but intentionally false data, exploiting both an optimized placement of Sybils to stay below the detection threshold and an early trigger of the content discovery termination in Kubo, the main IPFS implementation. Our attack completely eclipses content on the latest Kubo release. When evaluated against the most recent known mitigation, it successfully denies access to the target content in approximately 80% of lookup attempts.
To address this vulnerability, we propose a new mitigation called SR-DHT-Store, which enables efficient, Sybil-resistant content publication without relying on attack detection but instead on a systematic and precise use of region-based queries, defined by a dynamically computed XOR distance to the target ID. SR-DHT-Store can be combined with other defense mechanisms resulting in a defense strategy that completely mitigates both passive and active Sybil attacks at a lower overhead, while allowing an incremental deployment.
Submitted 2 May, 2025;
originally announced May 2025.
-
A Thermodynamic Model for Dark Energy Including Particle Creation or Destruction Processes
Authors:
José Medeiros da Costa Netto,
Heydson Henrique Brito da Silva
Abstract:
Thermodynamic analyses of dark energy as a relativistic fluid indicate that this intriguing component of the universe mimics a bulk viscous pressure when the parameter of its barotropic equation of state varies with time. Since in cosmology bulk viscosity and creation or destruction of matter are closely linked processes, we propose in this work a brief thermodynamic study of dark energy considering that particles can be created or destroyed in the fluid. We derive new expressions for quantities such as particle density, entropy density, etc., that have been shown to be sensitive to this new ingredient. We also obtain new thermodynamic constraints and compare them with those where the number of particles is conserved. In particular, we find that in the presence of a sink, dark energy tends towards the cosmological constant over time regardless of the sign of its chemical potential and without violating the laws of thermodynamics.
Submitted 28 February, 2025;
originally announced March 2025.
-
Simplifying HPC resource selection: A tool for optimizing execution time and cost on Azure
Authors:
Marco A. S. Netto,
Wolfgang De Savador,
Davide Vanzo
Abstract:
Azure Cloud offers a wide range of resources for running HPC workloads, requiring users to configure their deployment by selecting VM types, number of VMs, and processes per VM. Suboptimal decisions may lead to longer execution times or additional costs for the user. We are developing an open-source tool to assist users in making these decisions by considering application input parameters, as they influence resource consumption. The tool automates the time-consuming process of setting up the cloud environment, executing the benchmarking runs, handling output, and providing users with resource selection recommendations as high-level insights on run times and costs across different VM types and numbers of VMs. In this work, we present initial results and insights on reducing the number of cloud executions needed to provide such guidance, leveraging data analytics and optimization techniques with two well-known HPC applications: OpenFOAM and LAMMPS.
Submitted 2 December, 2024;
originally announced December 2024.
-
HPCAdvisor: A Tool for Assisting Users in Selecting HPC Resources in the Cloud
Authors:
Marco A. S. Netto
Abstract:
Cloud platforms are increasingly being used to run HPC workloads. Major cloud providers offer a wide variety of virtual machine (VM) types, enabling users to find the optimal balance between performance and cost. However, this extensive selection of VM types can also present challenges, as users must decide not only which VM types to use but also how many nodes are required for a given workload. Although benchmarking data is available for well-known applications from major cloud providers, the choice of resources is also influenced by the specifics of the user's application input. This paper presents the vision and current implementation of HPCAdvisor, a tool designed to assist users in defining their HPC clusters in the cloud. It considers the application's input and utilizes a major cloud provider as a use case for its back-end component.
Submitted 22 November, 2024;
originally announced November 2024.
-
Participation Factors for Nonlinear Autonomous Dynamical Systems in the Koopman Operator Framework
Authors:
Kenji Takamichi,
Yoshihiko Susuki,
Marcos Netto
Abstract:
We devise a novel formulation and propose the concept of modal participation factors for nonlinear dynamical systems. The original definition of modal participation factors (or simply participation factors) provides a simple yet effective metric. It finds use in theory and practice, quantifying the interplay between states and modes of oscillation in a linear time-invariant (LTI) system. In this paper, with the Koopman operator framework, we present the results of participation factors for nonlinear dynamical systems with an asymptotically stable equilibrium point or limit cycle. We show that participation factors are defined for the entire domain of attraction, beyond the vicinity of an attractor, where the original definition of participation factors for LTI systems is a special case. Finally, we develop a numerical method to estimate participation factors using time series data from the underlying nonlinear dynamical system. The numerical method can be implemented by leveraging a well-established numerical scheme in the Koopman operator framework called dynamic mode decomposition.
Submitted 16 September, 2024;
originally announced September 2024.
-
Urban Scaling Laws
Authors:
Fabiano L. Ribeiro,
Vinicius M. Netto
Abstract:
Understanding how size influences the internal characteristics of a system is a crucial concern across various fields. Concepts like scale invariance, universalities, and fractals are fundamental to this inquiry and find application in biology, physics, and particularly urbanism. Size profoundly impacts how cities develop and function economically and socially. For example, what are the pros and cons of residing in larger cities? Is life really more expensive or less safe in larger cities? Or do they really offer more opportunities and generally higher incomes than smaller ones? To address such inquiries, we utilize theoretical tools from scaling theory, enabling a quantitative description of how a system's behavior changes across different scales, from micro to macro. Drawing parallels with research in biology and spatial economics, this chapter explores recent discoveries, ongoing progress, and unanswered questions regarding urban scaling.
Submitted 3 April, 2024;
originally announced April 2024.
-
Measurement Uncertainty Impact on Koopman Operator Estimation of Power System Dynamics
Authors:
P. Algikar,
P. Sharma,
M. Netto,
L. Mili
Abstract:
Sensor measurements are mission-critical for monitoring and controlling power systems because they provide real-time insight into the grid operating condition; however, confidence in these insights depends greatly on the quality of the sensor data. Uncertainty in sensor measurements is an intrinsic aspect of the measurement process. In this paper, we develop an analytical method to quantify the impact of measurement uncertainties in numerical methods that employ the Koopman operator to identify nonlinear dynamics based on recorded data. In particular, we quantify the confidence interval of each element in the push-forward matrix from which a subset of the Koopman operator's discrete spectrum is estimated. We provide a detailed numerical analysis of the developed method applied to numerical simulations and field data collected from experiments conducted in a megawatt-scale facility at the National Renewable Energy Laboratory.
Submitted 25 March, 2024;
originally announced March 2024.
-
Statistical analysis and method to quantify the impact of measurement uncertainty on dynamic mode decomposition
Authors:
P. Algikar,
P. Sharma,
M. Netto,
L. Mili
Abstract:
We apply random matrix theory to study the impact of measurement uncertainty on dynamic mode decomposition. Specifically, when the measurements follow a normal probability density function, we show how the moments of that density propagate through the dynamic mode decomposition. While we focus on the first and second moments, the analytical expressions we derive are general and can be extended to higher-order moments. Furthermore, the proposed numerical method for propagating uncertainty is agnostic of specific dynamic mode decomposition formulations. Of particular relevance, the estimated second moments provide confidence bounds that may be used as a metric of trustworthiness, that is, how much one can rely on a finite-dimensional linear operator to represent an underlying dynamical system. We perform numerical experiments on two canonical systems and verify the estimated confidence levels by comparing the moments with those obtained from Monte Carlo simulations.
Submitted 24 January, 2025; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Entropy and the City: Origins, trajectories and explorations of the concept in urban science
Authors:
Vinicius M. Netto,
Otavio Peres,
Caio Cacholas
Abstract:
Entropy is arguably one of the most powerful concepts to understand the world, from the behavior of molecules to the expansion of the universe, from how life emerges to how hybrid complex systems like cities come into being and continue existing. Yet, despite its widespread application, it is also one of the most misunderstood concepts across the sciences. This chapter seeks to demystify entropy and its main interpretations, along with some of its explorations in the context of cities. It first establishes the foundations of the concept by describing its trajectory since its inception in thermodynamics and statistical mechanics in the 19th century, its different incarnations from Boltzmann's pioneering formulation and Shannon's information theory to its absorption in biology and the social sciences, until it reaches a nascent urban science in the 1960s. The chapter then identifies some of the main domains in which entropy has been explored to understand cities as complex systems, from entropy-maximizing models of spatial interaction and applications as a measure of urban form, diversity, and complexity to a tool for understanding conditions of self-organization and urban sustainability.
Submitted 2 April, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
Propagating Parameter Uncertainty in Power System Nonlinear Dynamic Simulations Using a Koopman Operator-Based Surrogate Model
Authors:
Yijun Xu,
Marcos Netto,
Lamine Mili
Abstract:
We propose a Koopman operator-based surrogate model for propagating parameter uncertainties in power system nonlinear dynamic simulations. First, we augment the a priori known state-space model by reformulating parameters deemed uncertain as pseudo-state variables. Then, we apply the Koopman operator theory to the resulting state-space model and obtain a linear dynamical system model. This transformation allows us to analyze the evolution of the system dynamics through its Koopman eigenfunctions, eigenvalues, and modes. Of particular importance for this letter, the obtained linear dynamical system is a surrogate that enables the evaluation of parameter uncertainties by simply perturbing the initial conditions of the Koopman eigenfunctions associated with the pseudo-state variables. Simulations carried out on the New England test system reveal the excellent performance of the proposed method in terms of accuracy and computational efficiency.
Submitted 31 March, 2023;
originally announced April 2023.
-
A mode-in-state contribution factor based on Koopman operator and its application to power system analysis
Authors:
Kenji Takamichi,
Yoshihiko Susuki,
Marcos Netto,
Atsushi Ishigame
Abstract:
This paper proposes a mode-in-state contribution factor for a class of nonlinear dynamical systems by utilizing spectral properties of the Koopman operator and sensitivity analysis. Using eigenfunctions of the Koopman operator for a target nonlinear system, we show that the relative contribution between modes and state variables can be quantified beyond a linear regime, where the nonlinearity of the system is taken into consideration. The proposed contribution factor is applied to the numerical analysis of large-signal simulations for an interconnected AC/multi-terminal DC power system.
Submitted 22 May, 2022;
originally announced May 2022.
-
Entropy and hierarchical clustering: characterising the morphology of the urban fabric in different spatial cultures
Authors:
E. Brigatti,
V. M. Netto,
F. N. M. de Sousa Filho,
C. Cacholas
Abstract:
In this work, we develop a general method for estimating the Shannon entropy of a bidimensional sequence based on the extrapolation of block entropies. We apply this method to analyse the spatial configurations of cities of different cultures and regions of the world. Findings suggest that this approach can identify similarities between cities, generating accurate results for recognising and classifying different urban morphologies. The hierarchical clustering analysis based on this metric also opens up new questions about the possibility that urban form can embody characteristics related to different cultural identities, historical processes and geographical regions.
Submitted 12 August, 2021;
originally announced August 2021.
-
Context-aware Execution Migration Tool for Data Science Jupyter Notebooks on Hybrid Clouds
Authors:
Renato L. F. Cunha,
Lucas V. Real,
Renan Souza,
Bruno Silva,
Marco A. S. Netto
Abstract:
Interactive computing notebooks, such as Jupyter notebooks, have become a popular tool for developing and improving data-driven models. Such notebooks tend to be executed either on the user's own machine or in a cloud environment, with drawbacks and benefits in both approaches. This paper presents a solution developed as a Jupyter extension that automatically selects which cells, as well as in which scenarios, such cells should be migrated to a more suitable platform for execution. We describe how we reduce the execution state of the notebook to decrease migration time and we explore the knowledge of user interactivity patterns with the notebook to determine which blocks of cells should be migrated. Using notebooks from Earth science (remote sensing), image recognition, and handwritten digit identification (machine learning), our experiments show notebook state reductions of up to 55x and migration decisions leading to performance gains of up to 3.25x when the user interactivity with the notebook is taken into consideration.
Submitted 30 June, 2021;
originally announced July 2021.
-
Measurement placement in electric power transmission and distribution grids: Review of concepts, methods, and research needs
Authors:
Marcos Netto,
Venkat Krishnan,
Yingchen Zhang,
Lamine Mili
Abstract:
Sensing and measurement systems are quintessential to the safe and reliable operation of electric power grids. Their strategic placement is of ultimate importance because it is not economically viable to install measurement systems on every node and branch of a power grid, though they need to be monitored. An overwhelming number of strategies have been developed to meet oftentimes multiple conflicting objectives. The prime challenge in formulating the problem lies in developing a heuristic or an optimization model that, though mathematically tractable and constrained in cost, leads to trustworthy technical solutions. Further, large-scale, long-term deployments pose additional challenges because the boundary conditions change as technologies evolve. For instance, the advent of new technologies in sensing and measurement, as well as in communications and networking, might impact the cost and performance of available solutions and shift initially set conditions. Also, the placement strategies developed for transmission grids might not be suitable for distribution grids, and vice versa, because of unique characteristics; therefore, the strategies need to be flexible, to a certain extent, because no two power grids are alike. Despite the extensive literature on the present topic, the focus of published works tends to be on a specific subject, such as the optimal placement of measurements to ensure observability in transmission grids. There is a dearth of work providing a comprehensive picture for developing optimal placement strategies. Because of the ongoing efforts on the modernization of electric power grids, there is a need to consolidate the status quo while exposing its limitations to inform policymakers, industry stakeholders, and researchers on the research-and-development needs to push the boundaries for innovation.
Submitted 30 September, 2021; v1 submitted 31 May, 2021;
originally announced May 2021.
-
Robust Dynamic Mode Decomposition
Authors:
Amir Hossein Abolmasoumi,
Marcos Netto,
Lamine Mili
Abstract:
This paper develops a robust dynamic mode decomposition (RDMD) method endowed with statistical and numerical robustness. Statistical robustness ensures estimation efficiency at the Gaussian and non-Gaussian probability distributions, including heavy-tailed distributions. The proposed RDMD is statistically robust because the outliers in the data set are flagged via projection statistics and suppressed using a Schweppe-type Huber generalized maximum-likelihood estimator that minimizes a convex Huber cost function. The latter is solved using the iteratively reweighted least-squares algorithm that is known to exhibit a better convergence property and numerical stability than the Newton algorithms. Several numerical simulations using canonical models of dynamical systems demonstrate the excellent performance of the proposed RDMD method. The results reveal that it outperforms several other methods proposed in the literature.
Submitted 11 October, 2021; v1 submitted 20 May, 2021;
originally announced May 2021.
-
A robust extended Kalman filter for power system dynamic state estimation using PMU measurements
Authors:
Marcos Netto,
Junbo Zhao,
Lamine Mili
Abstract:
This paper develops a robust extended Kalman filter to estimate the rotor angles and the rotor speeds of synchronous generators of a multimachine power system. Using a batch-mode regression form, the filter processes the predicted state vector and PMU measurements together to track the system dynamics faster than the standard extended Kalman filter. Our proposed filter is based on a robust GM-estimator that bounds the influence of vertical outliers and bad leverage points, which are identified by means of the projection statistics. Good statistical efficiency under the Gaussian distribution assumption of the process and the observation noise is achieved thanks to the use of the Huber cost function, which is minimized via the iteratively reweighted least squares algorithm. The asymptotic covariance matrix of the state estimation error vector is derived via the covariance matrix of the total influence function of the GM-estimator. Simulations carried out on the IEEE 39-bus test system reveal that our robust extended Kalman filter exhibits good tracking capabilities under Gaussian process and observation noise while suppressing observation outliers, even in positions of leverage. This good performance is obtained only when the linear approximation of the power system model is valid.
Submitted 5 April, 2021;
originally announced April 2021.
-
Workflow Provenance in the Lifecycle of Scientific Machine Learning
Authors:
Renan Souza,
Leonardo G. Azevedo,
Vítor Lourenço,
Elton Soares,
Raphael Thiago,
Rafael Brandão,
Daniel Civitarese,
Emilio Vital Brazil,
Marcio Moreno,
Patrick Valduriez,
Marta Mattoso,
Renato Cerqueira,
Marco A. S. Netto
Abstract:
Machine Learning (ML) has already fundamentally changed several businesses. More recently, it has also been profoundly impacting the computational science and engineering domains, like geoscience, climate science, and health science. In these domains, users need to perform comprehensive data analyses combining scientific data and ML models to provide for critical requirements, such as reproducibility, model explainability, and experiment data understanding. However, scientific ML is multidisciplinary, heterogeneous, and affected by the physical constraints of the domain, making such analyses even more challenging. In this work, we leverage workflow provenance techniques to build a holistic view to support the lifecycle of scientific ML. We contribute (i) a characterization of the lifecycle and a taxonomy for data analyses; (ii) design principles to build this view, with a W3C PROV compliant data representation and a reference system architecture; and (iii) lessons learned after an evaluation in an Oil & Gas case using an HPC cluster with 393 nodes and 946 GPUs. The experiments show that the principles enable queries that integrate domain semantics with ML models while keeping low overhead (<1%), high scalability, and an order of magnitude of query acceleration under certain workloads compared with not using our representation.
Submitted 25 August, 2021; v1 submitted 30 September, 2020;
originally announced October 2020.
-
On analytical construction of observable functions in extended dynamic mode decomposition for nonlinear estimation and prediction
Authors:
Marcos Netto,
Yoshihiko Susuki,
Venkat Krishnan,
Yingchen Zhang
Abstract:
We propose an analytical construction of observable functions in the extended dynamic mode decomposition (EDMD) algorithm. EDMD is a numerical method for approximating the spectral properties of the Koopman operator. The choice of observable functions is fundamental for the application of EDMD to nonlinear problems arising in systems and control. Existing methods either start from a set of dictionary functions and look for the subset that best fits the underlying nonlinear dynamics or they rely on machine learning algorithms to "learn" observable functions. Conversely, in this paper, we start from the dynamical system model and lift it through the Lie derivatives, rendering it into a polynomial form. This proposed transformation into a polynomial form is exact, and it provides an adequate set of observable functions. The strength of the proposed approach is its applicability to a broader class of nonlinear dynamical systems, particularly those with nonpolynomial functions and compositions thereof. Moreover, it retains the physical interpretability of the underlying dynamical system and can be readily integrated into existing numerical libraries. The proposed approach is illustrated with an application to electric power systems. The modeled system consists of a single generator connected to an infinite bus, where nonlinear terms include sine and cosine functions. The results demonstrate the effectiveness of the proposed procedure in off-attractor nonlinear dynamics for estimation and prediction; the observable functions obtained from the proposed construction outperform methods that use dictionary functions comprising monomials or radial basis functions.
Submitted 5 January, 2021; v1 submitted 28 August, 2020;
originally announced August 2020.
-
From form to information: Analysing built environments in different spatial cultures
Authors:
Vinicius M. Netto,
Edgardo Brigatti,
Caio Cacholas
Abstract:
Cities are different around the world, but does this fact have any relation to culture? The idea that urban form embodies idiosyncrasies related to cultural identities captures the imagination of many in urban studies, but it is an assumption yet to be carefully examined. Approaching spatial configurations in the built environment as a proxy of urban culture, this paper searches for differences potentially consistent with specific regional cultures or cultures of planning in urban development. It does so focusing on the elementary components shaping cities: buildings and how they are aggregated in cellular complexes of built form. Exploring Shannon's work, we introduce an entropy measure to analyse the probability distribution of cellular arrangements in built form systems. We apply it to downtown areas of 45 cities from different regions of the world as a similarity measure to compare and cluster cities potentially consistent with specific spatial cultures. Findings suggest a classification scheme that sheds further light on what we call the "cultural hypothesis": the possibility that different cultures and regions find different ways of ordering space.
Submitted 26 June, 2020; v1 submitted 24 June, 2020;
originally announced June 2020.
-
Roles of Dynamic State Estimation in Power System Modeling, Monitoring and Operation
Authors:
Junbo Zhao,
Marcos Netto,
Zhenyu Huang,
Samson Shenglong Yu,
Antonio Gomez-Exposito,
Shaobu Wang,
Innocent Kamwa,
Shahrokh Akhlaghi,
Lamine Mili,
Vladimir Terzija,
A. P. Sakis Meliopoulos,
Bikash Pal,
Abhinav Kumar Singh,
Ali Abur,
Tianshu Bi,
Alireza Rouhani
Abstract:
Power system dynamic state estimation (DSE) remains an active research area. This is driven by the absence of accurate models, the increasing availability of fast-sampled, time-synchronized measurements, and the advances in the capability, scalability, and affordability of computing and communications. This paper discusses the advantages of DSE as compared to static state estimation, and the implementation differences between the two, including the measurement configuration, modeling framework and support software features. The important roles of DSE are discussed from modeling, monitoring and operation aspects for today's synchronous machine dominated systems and the future power electronics-interfaced generation systems. Several examples are presented to demonstrate the benefits of DSE on enhancing the operational robustness and resilience of 21st century power system through time critical applications. Future research directions are identified and discussed, paving the way for developing the next generation of energy management systems.
Submitted 11 May, 2020;
originally announced May 2020.
-
Assessing Spatial Information in Physical Environments
Authors:
Vinicius M. Netto,
Edgardo Brigatti,
Caio Cacholas,
Vinicius Gomes Aleixo
Abstract:
Many approaches have dealt with the hypothesis that the environment contains information, mostly focusing on how humans decode information from the environment in visual perception, navigation, and spatial decision-making. A question yet to be fully explored is how the built environment could encode forms of information in its own physical structures. This paper explores a new measure of spatial information, and applies it to twenty cities from different spatial cultures and regions of the world. Findings suggest that this methodology is able to identify similarities between cities, generating a classification scheme that opens up new questions about what we call the "cultural hypothesis": the idea that spatial configurations find consistent differences between cultures and regions.
Submitted 14 October, 2019;
originally announced October 2019.
-
Provenance Data in the Machine Learning Lifecycle in Computational Science and Engineering
Authors:
Renan Souza,
Leonardo Azevedo,
Vítor Lourenço,
Elton Soares,
Raphael Thiago,
Rafael Brandão,
Daniel Civitarese,
Emilio Vital Brazil,
Marcio Moreno,
Patrick Valduriez,
Marta Mattoso,
Renato Cerqueira,
Marco A. S. Netto
Abstract:
Machine Learning (ML) has become essential in several industries. In Computational Science and Engineering (CSE), the complexity of the ML lifecycle comes from the large variety of data, scientists' expertise, tools, and workflows. If data are not tracked properly during the lifecycle, it becomes infeasible to recreate an ML model from scratch or to explain to stakeholders how it was created. The main limitation of provenance tracking solutions is that they cannot cope with provenance capture and integration of domain and ML data processed in the multiple workflows in the lifecycle while keeping the provenance capture overhead low. To handle this problem, in this paper we contribute a detailed characterization of provenance data in the ML lifecycle in CSE; a new provenance data representation, called PROV-ML, built on top of W3C PROV and ML Schema; and extensions to a system that tracks provenance from multiple workflows to address the characteristics of ML and CSE, and to allow for provenance queries with a standard vocabulary. We show a practical use in a real case in the Oil and Gas industry, along with its evaluation using 48 GPUs in parallel.
Submitted 21 October, 2019; v1 submitted 9 October, 2019;
originally announced October 2019.
-
On the relation between Transversal and Longitudinal Scaling in Cities
Authors:
Fabiano L. Ribeiro,
Joao Meirelles,
Vinicius M. Netto,
Camilo Rodrigues Neto,
Andrea Baronchelli
Abstract:
Given that a group of cities follows a scaling law connecting urban population with socio-economic or infrastructural metrics (transversal scaling), should we expect that each city would follow the same behavior over time (longitudinal scaling)? This assumption has important policy implications, although rigorous empirical tests have been so far hindered by the lack of suitable data. Here, we advance the debate by looking into the temporal evolution of the scaling laws for 5507 municipalities in Brazil. We focus on the relationship between population size and two urban variables, GDP and water network length, analyzing the time evolution of the system of cities as well as their individual trajectories. We find that longitudinal (individual) scaling exponents are city-specific, but they are distributed around an average value that approaches the transversal scaling exponent when the data are decomposed to eliminate external factors, and when we only consider cities with a sufficiently large growth rate. Such results give support to the idea that the longitudinal dynamics is a micro-scaling version of the transversal dynamics of the entire urban system. Finally, we propose a mathematical framework that connects the microscopic level to global behavior, and, in all analyzed cases, we find good agreement between theoretical prediction and empirical evidence.
Submitted 4 October, 2019;
originally announced October 2019.
-
DeepDownscale: a Deep Learning Strategy for High-Resolution Weather Forecast
Authors:
Eduardo R. Rodrigues,
Igor Oliveira,
Renato L. F. Cunha,
Marco A. S. Netto
Abstract:
Running high-resolution physical models is computationally expensive and essential for many disciplines. Agriculture, transportation, and energy are sectors that depend on high-resolution weather models, which typically consume many hours of large High Performance Computing (HPC) systems to deliver timely results. Many users cannot afford to run the desired resolution and are forced to use low resolution output. One simple solution is to interpolate results for visualization. It is also possible to combine an ensemble of low resolution models to obtain a better prediction. However, these approaches fail to capture the redundant information and patterns in the low-resolution input that could help improve the quality of prediction. In this paper, we propose and evaluate a strategy based on a deep neural network to learn a high-resolution representation from low-resolution predictions using weather forecast as a practical use case. We take a supervised learning approach, since obtaining labeled data can be done automatically. Our results show significant improvement when compared with standard practices and the strategy is still lightweight enough to run on modest computer systems.
Submitted 15 August, 2018;
originally announced August 2018.
-
A Scalable Machine Learning System for Pre-Season Agriculture Yield Forecast
Authors:
Igor Oliveira,
Renato L. F. Cunha,
Bruno Silva,
Marco A. S. Netto
Abstract:
Yield forecast is essential to agriculture stakeholders and can be obtained with the use of machine learning models and data coming from multiple sources. Most solutions for yield forecast rely on NDVI (Normalized Difference Vegetation Index) data, which is time-consuming to acquire and process. To bring scalability to yield forecast, in the present paper we describe a system that incorporates satellite-derived precipitation and soil properties datasets, seasonal climate forecasting data from physical models and other sources to produce a pre-season prediction of soybean/maize yield, with no need for NDVI data. This system provides significantly useful results by eliminating the need for high-resolution remote-sensing data and allowing farmers to prepare for adverse climate influence on the crop cycle. In our studies, we forecast the soybean and maize yields for Brazil and USA, which corresponded to 44% of the world's grain production in 2016. Results show the error metrics for soybean and maize yield forecasts are comparable to similar systems that only provide yield forecast information in the first weeks to months of the crop cycle.
Submitted 15 October, 2018; v1 submitted 24 June, 2018;
originally announced June 2018.
-
Data-Driven Participation Factors for Nonlinear Systems Based on Koopman Mode Decomposition
Authors:
Marcos Netto,
Yoshihiko Susuki,
Lamine Mili
Abstract:
This paper develops a novel data-driven technique to compute the participation factors for nonlinear systems based on the Koopman mode decomposition. Provided that certain conditions are satisfied, it is shown that the proposed technique generalizes the original definition of the linear mode-in-state participation factors. Two numerical examples are provided to demonstrate the performance of our approach: one relying on a canonical nonlinear dynamical system, and the other based on the two-area four-machine power system. The Koopman mode decomposition is capable of coping with a large class of nonlinearity, thereby making our technique able to deal with oscillations arising in practice due to nonlinearities while being fast to compute and compatible with real-time applications.
Submitted 19 September, 2018; v1 submitted 4 June, 2018;
originally announced June 2018.
-
JobPruner: A Machine Learning Assistant for Exploring Parameter Spaces in HPC Applications
Authors:
Bruno Silva,
Marco A. S. Netto,
Renato L. F. Cunha
Abstract:
High Performance Computing (HPC) applications are essential for scientists and engineers to create and understand models and their properties. These professionals depend on the execution of large sets of computational jobs that explore combinations of parameter values. Avoiding the execution of unnecessary jobs brings not only speed to these experiments, but also reductions in infrastructure usage, which is particularly important due to the shift of these applications to HPC cloud platforms. Our hypothesis is that data generated by these experiments can help users in identifying such jobs. To address this hypothesis we need to understand the similarity levels among multiple experiments necessary for job elimination decisions and the steps required to automate this process. In this paper we present a study and a machine learning-based tool called JobPruner to support parameter exploration in HPC experiments. The tool was evaluated with three real-world use cases from different domains including seismic analysis and agronomy. We observed the tool reduced 93% of jobs in a single experiment, while improving quality in most scenarios. In addition, reduction in job executions was possible even considering past experiments with low correlations.
Submitted 14 February, 2018; v1 submitted 3 February, 2018;
originally announced February 2018.
-
A Manifesto for Future Generation Cloud Computing: Research Directions for the Next Decade
Authors:
Rajkumar Buyya,
Satish Narayana Srirama,
Giuliano Casale,
Rodrigo Calheiros,
Yogesh Simmhan,
Blesson Varghese,
Erol Gelenbe,
Bahman Javadi,
Luis Miguel Vaquero,
Marco A. S. Netto,
Adel Nadjaran Toosi,
Maria Alejandra Rodriguez,
Ignacio M. Llorente,
Sabrina De Capitani di Vimercati,
Pierangela Samarati,
Dejan Milojicic,
Carlos Varela,
Rami Bahsoon,
Marcos Dias de Assuncao,
Omer Rana,
Wanlei Zhou,
Hai Jin,
Wolfgang Gentzsch,
Albert Y. Zomaya,
Haiying Shen
Abstract:
The Cloud computing paradigm has revolutionised the computer science horizon during the past decade and has enabled the emergence of computing as the fifth utility. It has captured significant attention of academia, industries, and government bodies. Now, it has emerged as the backbone of modern economy by offering subscription-based services anytime, anywhere following a pay-as-you-go model. This has instigated (1) shorter establishment times for start-ups, (2) creation of scalable global enterprise applications, (3) better cost-to-value associativity for scientific and high performance computing applications, and (4) different invocation/execution models for pervasive and ubiquitous applications. The recent technological developments and paradigms such as serverless computing, software-defined networking, Internet of Things, and processing at network edge are creating new opportunities for Cloud computing. However, they are also posing several new challenges and creating the need for new approaches and research strategies, as well as the re-evaluation of the models that were developed to address issues such as scalability, elasticity, reliability, security, sustainability, and application models. The proposed manifesto addresses them by identifying the major open challenges in Cloud computing, emerging trends, and impact areas. It then offers research directions for the next decade, thus helping in the realisation of Future Generation Cloud Computing.
Submitted 24 August, 2018; v1 submitted 24 November, 2017;
originally announced November 2017.
-
HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges
Authors:
Marco A. S. Netto,
Rodrigo N. Calheiros,
Eduardo R. Rodrigues,
Renato L. F. Cunha,
Rajkumar Buyya
Abstract:
High Performance Computing (HPC) clouds are becoming an alternative to on-premise clusters for executing scientific applications and business analytics services. Most research efforts in HPC cloud aim to understand the cost-benefit of moving resource-intensive applications from on-premise environments to public cloud platforms. Industry trends show hybrid environments are the natural path to get the best of the on-premise and cloud resources: steady (and sensitive) workloads can run on on-premise resources and peak demand can leverage remote resources in a pay-as-you-go manner. Nevertheless, there are plenty of questions to be answered in HPC cloud, which range from how to extract the best performance of an unknown underlying platform to what services are essential to make its usage easier. Moreover, the discussion on the right pricing and contractual models to fit small and large users is relevant for the sustainability of HPC clouds. This paper brings a survey and taxonomy of efforts in HPC cloud and a vision on what we believe is ahead of us, including a set of research challenges that, once tackled, can help advance businesses and scientific discoveries. This becomes particularly relevant due to the rapidly growing wave of new HPC applications coming from big data and artificial intelligence.
Submitted 2 February, 2018; v1 submitted 24 October, 2017;
originally announced October 2017.
-
SLA-aware Interactive Workflow Assistant for HPC Parameter Sweeping Experiments
Authors:
Bruno Silva,
Marco A. S. Netto,
Renato L. F. Cunha
Abstract:
A common workflow in science and engineering is to (i) set up and deploy large experiments with tasks comprising an application and multiple parameter values; (ii) generate intermediate results; (iii) analyze them; and (iv) reprioritize the tasks. These steps are repeated until the desired goal is achieved, which can be the evaluation/simulation of complex systems or model calibration. Due to time and cost constraints, sweeping all possible parameter values of the user application is not always feasible. Experimental Design techniques can help users reorganize submission-execution-analysis workflows to bring a solution in a more timely manner. This paper introduces a novel tool that leverages users' feedback on analyzing intermediate results of parameter sweeping experiments to advise them about their strategies on parameter selections tied to their SLA constraints. We evaluated our tool with three applications of distinct domains and search space shapes. Our main finding is that users with submission-execution-analysis workflows can benefit from their interaction with intermediate results and adapt themselves according to their domain expertise and SLA constraints.
Submitted 9 November, 2016;
originally announced November 2016.
-
Helping HPC Users Specify Job Memory Requirements via Machine Learning
Authors:
Eduardo R. Rodrigues,
Renato L. F. Cunha,
Marco A. S. Netto,
Michael Spriggs
Abstract:
Resource allocation in High Performance Computing (HPC) settings is still not easy for end-users due to the wide variety of application and environment configuration options. Users have difficulty estimating the number of processors and amount of memory required by their jobs, selecting the queue and partition, and estimating when job output will be available to plan for next experiments. Apart from wasting infrastructure resources by making wrong allocation decisions, overall user response time can also be negatively impacted. Techniques that exploit batch scheduler systems to predict waiting time and runtime of user jobs have already been proposed. However, we observed that such techniques are not suitable for predicting job memory usage. In this paper we introduce a tool to help users predict their memory requirements using machine learning. We describe the integration of the tool with a batch scheduler system, discuss how batch scheduler log data can be exploited to generate memory usage predictions through machine learning, and present results from two production systems containing thousands of jobs.
Submitted 9 November, 2016;
originally announced November 2016.
-
Job Placement Advisor Based on Turnaround Predictions for HPC Hybrid Clouds
Authors:
Renato L. F. Cunha,
Eduardo R. Rodrigues,
Leonardo P. Tizzei,
Marco A. S. Netto
Abstract:
Several companies and research institutes are moving their CPU-intensive applications to hybrid High Performance Computing (HPC) cloud environments. Such a shift depends on the creation of software systems that help users decide where a job should be placed considering execution time and queue wait time to access on-premise clusters. Relying blindly on turnaround prediction techniques will negatively affect response times inside HPC cloud environments. This paper introduces a tool to make job placement decisions in HPC hybrid cloud environments taking into account the inaccuracy of execution and waiting time predictions. We used job traces from real supercomputing centers to run our experiments, and compared the performance between environments using real speedup curves. We also extended a state-of-the-art machine learning based predictor to work with data from the cluster scheduler. Our main findings are: (i) depending on workload characteristics, there is a turning point where predictions should be disregarded in favor of a more conservative decision to minimize job turnaround times and (ii) scheduler data plays a key role in improving predictions generated with machine learning using job trace data; our experiments showed around 20% prediction accuracy improvements.
Submitted 26 August, 2016; v1 submitted 22 August, 2016;
originally announced August 2016.
-
Exploiting Workload Cycles for Orchestration of Virtual Machine Live Migrations in Clouds
Authors:
Artur Baruchi,
Edson T. Midorikawa,
Liria M. Sato,
Marco A. S. Netto
Abstract:
Virtual machine live migration in cloud environments aims at reducing energy costs and increasing resource utilization. However, its potential has not been fully explored because of simultaneous migrations that may cause user application performance degradation and network congestion. Research efforts on live migration orchestration policies still mostly rely on system level metrics. This work introduces an Application-aware Live Migration Architecture (ALMA) that selects suitable moments for migrations using application characterization data. This characterization consists in recognizing resource usage cycles via Fast Fourier Transform. From our experiments, live migration times were reduced by up to 74% for benchmarks and by up to 67% for real applications, when compared to migration policies with no application workload analysis. Network data transfer during the live migration was reduced by up to 62%.
Submitted 26 July, 2016;
originally announced July 2016.
-
An SLA-based Advisor for Placement of HPC Jobs on Hybrid Clouds
Authors:
Kiran Mantripragada,
Leonardo P. Tizzei,
Alecio P. D. Binotto,
Marco A. S. Netto
Abstract:
Several scientific and industry applications require High Performance Computing (HPC) resources to process and/or simulate complex models. Not long ago, companies, research institutes, and universities used to acquire and maintain on-premise computer clusters, but recently cloud computing has emerged as an alternative for a subset of HPC applications. This poses a challenge to end-users, who have to decide where to run their jobs: on local clusters or burst to a remote cloud service provider. While current research on HPC cloud has focused on comparing performance of on-premise clusters against cloud resources, we build on top of existing efforts and introduce an advisory service to help users make this decision considering the trade-offs of resource costs, performance, and availability on hybrid clouds. We evaluated our service using a real test-bed with a seismic processing application based on Full Waveform Inversion, a technique used by geophysicists in the oil & gas industry and earthquake prediction. We also discuss how the advisor can be used for other applications and highlight the main lessons learned in constructing this service to reduce costs and turnaround times.
Submitted 20 July, 2015;
originally announced July 2015.
-
Using Application Data for SLA-aware Auto-scaling in Cloud Environments
Authors:
Andre Abrantes D. P. Souza,
Marco A. S. Netto
Abstract:
With the establishment of cloud computing as the environment of choice for most modern applications, auto-scaling is an economic matter of great importance. For applications like stream computing that process ever-changing amounts of data, modifying the number and configuration of resources to meet performance requirements becomes essential. Current solutions on auto-scaling are mostly rule-based using infrastructure level metrics such as CPU/memory/network utilization, and system level metrics such as throughput and response time. In this paper, we introduce a study on how effective auto-scaling can be using data generated by the application itself. To make this assessment, two algorithms are proposed that use a priori knowledge of the data stream and use sentiment analysis from soccer-related tweets, triggering auto-scaling operations according to rapid changes in the public sentiment about the soccer players that happen just before big bursts of messages. Our application-based auto-scaling was able to reduce the number of SLA violations by up to 95% and reduce resource requirements by up to 33%.
Submitted 17 June, 2015;
originally announced June 2015.
-
Big Data Computing and Clouds: Trends and Future Directions
Authors:
Marcos D. Assuncao,
Rodrigo N. Calheiros,
Silvia Bianchi,
Marco A. S. Netto,
Rajkumar Buyya
Abstract:
This paper discusses approaches and environments for carrying out analytics on Clouds for Big Data applications. It revolves around four important areas of analytics and Big Data, namely (i) data management and supporting architectures; (ii) model development and scoring; (iii) visualisation and user interaction; and (iv) business models. Through a detailed survey, we identify possible gaps in technology and provide recommendations for the research community on future directions on Cloud-supported Big Data computing and analytics solutions.
Submitted 22 August, 2014; v1 submitted 17 December, 2013;
originally announced December 2013.
-
Patience-aware Scheduling for Cloud Services: Freeing Users from the Chains of Boredom
Authors:
Carlos Cardonha,
Marcos D. Assunção,
Marco A. S. Netto,
Renato L. F. Cunha,
Carlos Queiroz
Abstract:
Scheduling of service requests in Cloud computing has traditionally focused on the reduction of pre-service wait, generally termed as waiting time. Under certain conditions such as peak load, however, it is not always possible to give reasonable response times to all users. This work explores the fact that different users may have their own levels of tolerance or patience with response delays. We introduce scheduling strategies that produce better assignment plans by prioritising requests from users who expect to receive the results earlier and by postponing servicing jobs from those who are more tolerant to response delays. Our analytical results show that the behaviour of users' patience plays a key role in the evaluation of scheduling techniques, and our computational evaluation demonstrates that, under peak load, the new algorithms typically provide better user experience than the traditional FIFO strategy.
Submitted 19 August, 2013;
originally announced August 2013.
-
Biochemical analysis of human breast tissues using FT-Raman spectroscopy
Authors:
Renata Andrade Bitar,
Herculano da Silva Martinho,
Carlos Julio Tierra Criollo,
Leandra Naira Zambelli Ramalho,
Mario Mourao Netto,
Airton Abrahao Martin
Abstract:
In this work we employ Fourier Transform Raman Spectroscopy to study normal and tumoral human breast tissues, including several subtypes of cancers. We analyzed 194 Raman spectra from breast tissues that were separated into 9 groups according to their corresponding histopathological diagnosis. The assignment of the relevant Raman bands enabled us to connect the several kinds of breast tissues (normal and pathological) to the corresponding alterations in their biochemical moieties and to distinguish among 7 groups: normal breast, fibrocystic condition, duct carcinoma-in-situ, duct carcinoma-in-situ with necrosis, infiltrating duct carcinoma not otherwise specified, colloid infiltrating duct carcinoma and invasive lobular carcinomas. We were able to establish the biochemical basis for each spectrum, relating the observed peaks to specific biomolecules that play a special role in the carcinogenesis process. This work is very useful for the premature optical diagnosis of a broad range of breast pathologies. We noticed that we were not able to differentiate inflammatory and medullary duct carcinomas from infiltrating duct carcinoma not otherwise specified.
Submitted 31 March, 2006;
originally announced March 2006.