-
Advancing GDP Forecasting: The Potential of Machine Learning Techniques in Economic Predictions
Authors:
Bogdan Oancea
Abstract:
The quest for accurate economic forecasting has traditionally been dominated by econometric models, which most of the times rely on the assumptions of linear relationships and stationarity in of the data. However, the complex and often nonlinear nature of global economies necessitates the exploration of alternative approaches. Machine learning methods offer promising advantages over traditional ec…
▽ More
The quest for accurate economic forecasting has traditionally been dominated by econometric models, which most of the times rely on the assumptions of linear relationships and stationarity in of the data. However, the complex and often nonlinear nature of global economies necessitates the exploration of alternative approaches. Machine learning methods offer promising advantages over traditional econometric techniques for Gross Domestic Product forecasting, given their ability to model complex, nonlinear interactions and patterns without the need for explicit specification of the underlying relationships. This paper investigates the efficacy of Recurrent Neural Networks, in forecasting GDP, specifically LSTM networks. These models are compared against a traditional econometric method, SARIMA. We employ the quarterly Romanian GDP dataset from 1995 to 2023 and build a LSTM network to forecast to next 4 values in the series. Our findings suggest that machine learning models, consistently outperform traditional econometric models in terms of predictive accuracy and flexibility
△ Less
Submitted 27 February, 2025;
originally announced February 2025.
-
Text classification using machine learning methods
Authors:
Bogdan Oancea
Abstract:
In this paper we present the results of an experiment aimed to use machine learning methods to obtain models that can be used for the automatic classification of products. In order to apply automatic classification methods, we transformed the product names from a text representation to numeric vectors, a process called word embedding. We used several embedding methods: Count Vectorization, TF-IDF,…
▽ More
In this paper we present the results of an experiment aimed to use machine learning methods to obtain models that can be used for the automatic classification of products. In order to apply automatic classification methods, we transformed the product names from a text representation to numeric vectors, a process called word embedding. We used several embedding methods: Count Vectorization, TF-IDF, Word2Vec, FASTTEXT, and GloVe. Having the product names in a form of numeric vectors, we proceeded with a set of machine learning methods for automatic classification: Logistic Regression, Multinomial Naive Bayes, kNN, Artificial Neural Networks, Support Vector Machines, and Decision trees with several variants. The results show an impressive accuracy of the classification process for Support Vector Machines, Logistic Regression, and Random Forests. Regarding the word embedding methods, the best results were obtained with the FASTTEXT technique.
△ Less
Submitted 27 February, 2025;
originally announced February 2025.
-
Urban Green Index estimation based on data collected by remote sensing for Romanian cities
Authors:
Marian Necula,
Tudorel Andrei,
Bogdan Oancea,
Mihaela Păun
Abstract:
The modernization of offi cial statistics involves the use of new data sources, such as data collected through remote sensing. The document contains a description of how an urban green index, derived from the SDG 11.7 objective, was obtained for Romania's 41 county seat cities based on free data sets collected by remote sensing from the European and North American space agencies. The main result i…
▽ More
The modernization of offi cial statistics involves the use of new data sources, such as data collected through remote sensing. The document contains a description of how an urban green index, derived from the SDG 11.7 objective, was obtained for Romania's 41 county seat cities based on free data sets collected by remote sensing from the European and North American space agencies. The main result is represented by an estimate of the areas of surfaces covered with vegetation for the 40 county seat towns and the municipality of Bucharest, relative to the total surface. To estimate the area covered with vegetation, we used two data sets obtained by remote sensing, namely data provided by the MODIS mission, the TERRA satellite, and data provided by the Sentinel 2 mission from the Copernicus space program. Based on the results obtained, namely the surface area covered with vegetation, estimated in square kilometers, and the percentage of the total surface area or urban green index, we have created a national top of the county seat cities
△ Less
Submitted 28 February, 2024;
originally announced February 2024.
-
A set of R packages to estimate population counts from mobile phone data
Authors:
Bogdan Oancea,
David Salgado,
Luis Sanguiao Sande,
Sandra Barragan
Abstract:
In this paper, we describe the software implementation of the methodological framework designed to incorporate mobile phone data into the current production chain of official statistics during the ESSnet Big Data II project. We present an overview of the architecture of the software stack, its components, the interfaces between them, and show how they can be used. Our software implementation consi…
▽ More
In this paper, we describe the software implementation of the methodological framework designed to incorporate mobile phone data into the current production chain of official statistics during the ESSnet Big Data II project. We present an overview of the architecture of the software stack, its components, the interfaces between them, and show how they can be used. Our software implementation consists in four R packages: destim for estimation of the spatial distribution of the mobile devices, deduplication for classification of the devices as being in 1:1 or 2:1 correspondence with its owner, aggregation for estimation of the number of individuals detected by the network starting from the geolocation probabilities and the duplicity probabilities and inference which combines the number of individuals provided by the previous package with other information like the population counts from an official register and the mobile operator penetration rates to provide an estimation of the target population counts.
△ Less
Submitted 4 November, 2021;
originally announced November 2021.
-
The performances of R GPU implementations of the GMRES method
Authors:
Bogdan Oancea,
Richard Pospisil
Abstract:
Although the performance of commodity computers has improved drastically with the introduction of multicore processors and GPU computing, the standard R distribution is still based on single-threaded model of computation, using only a small fraction of the computational power available now for most desktops and laptops. Modern statistical software packages rely on high performance implementations…
▽ More
Although the performance of commodity computers has improved drastically with the introduction of multicore processors and GPU computing, the standard R distribution is still based on single-threaded model of computation, using only a small fraction of the computational power available now for most desktops and laptops. Modern statistical software packages rely on high performance implementations of the linear algebra routines there are at the core of several important leading edge statistical methods. In this paper we present a GPU implementation of the GMRES iterative method for solving linear systems. We compare the performance of this implementation with a pure single threaded version of the CPU. We also investigate the performance of our implementation using different GPU packages available now for R such as gmatrix, gputools or gpuR which are based on CUDA or OpenCL frameworks.
△ Less
Submitted 6 February, 2018;
originally announced February 2018.
-
The return to higher education: evidence from Romania
Authors:
Bogdan Oancea,
Richard Pospisil,
Raluca Mariana Dragoescu
Abstract:
Education is one of the most important components of the human capital, and an important determinant of the personal income. Estimating the rate of return to education is a main topic of economic research. In this paper we analyzed the rate of return to higher education in Romania using the well-known Mincer equation. Besides the educational level and the number of years of experience on the labor…
▽ More
Education is one of the most important components of the human capital, and an important determinant of the personal income. Estimating the rate of return to education is a main topic of economic research. In this paper we analyzed the rate of return to higher education in Romania using the well-known Mincer equation. Besides the educational level and the number of years of experience on the labor market we also used a series of socio-demographic variables such as gender, civil status, the area of residence. We were interested mainly in calculating the rate of return to higher education, therefore we computed this rate for bachelor, master and doctoral degrees separately. We also investigated the rate of return to higher education on technical, science, economics, law, medicine, and arts fields. Our results showed that the rate of return to higher education has a greater value than most of the developed countries of EU and the field of higher education that brings the highest rate of return is medicine
△ Less
Submitted 17 September, 2017;
originally announced November 2017.
-
Improving the performance of the linear systems solvers using CUDA
Authors:
Bogdan Oancea,
Tudorel Andrei,
Raluca Mariana Dragoescu
Abstract:
Parallel computing can offer an enormous advantage regarding the performance for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high performance many-core processors that can obtain very high FLOP rates. Since the first idea of using GPU for general purpose computing, things have evolved and now there are several…
▽ More
Parallel computing can offer an enormous advantage regarding the performance for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high performance many-core processors that can obtain very high FLOP rates. Since the first idea of using GPU for general purpose computing, things have evolved and now there are several approaches to GPU programming: CUDA from NVIDIA and Stream from AMD. CUDA is now a popular programming model for general purpose computations on GPU for C/C++ programmers. A great number of applications were ported to CUDA programming model and they obtain speedups of orders of magnitude comparing to optimized CPU implementations. In this paper we present an implementation of a library for solving linear systems using the CCUDA framework. We present the results of performance tests and show that using GPU one can obtain speedups of about of approximately 80 times comparing with a CPU implementation.
△ Less
Submitted 23 November, 2015;
originally announced November 2015.
-
Developing a High Performance Software Library with MPI and CUDA for Matrix Computations
Authors:
Bogdan Oancea,
Tudorel Andrei
Abstract:
Nowadays, the paradigm of parallel computing is changing. CUDA is now a popular programming model for general purpose computations on GPUs and a great number of applications were ported to CUDA obtaining speedups of orders of magnitude comparing to optimized CPU implementations. Hybrid approaches that combine the message passing model with the shared memory model for parallel computing are a solut…
▽ More
Nowadays, the paradigm of parallel computing is changing. CUDA is now a popular programming model for general purpose computations on GPUs and a great number of applications were ported to CUDA obtaining speedups of orders of magnitude comparing to optimized CPU implementations. Hybrid approaches that combine the message passing model with the shared memory model for parallel computing are a solution for very large applications. We considered a heterogeneous cluster that combines the CPU and GPU computations using MPI and CUDA for developing a high performance linear algebra library. Our library deals with large linear systems solvers because they are a common problem in the fields of science and engineering. Direct methods for computing the solution of such systems can be very expensive due to high memory requirements and computational cost. An efficient alternative are iterative methods which computes only an approximation of the solution. In this paper we present an implementation of a library that uses a hybrid model of computation using MPI and CUDA implementing both direct and iterative linear systems solvers. Our library implements LU and Cholesky factorization based solvers and some of the non-stationary iterative methods using the MPI/CUDA combination. We compared the performance of our MPI/CUDA implementation with classic programs written to be run on a single CPU.
△ Less
Submitted 23 November, 2015;
originally announced November 2015.
-
Accelerating R with high performance linear algebra libraries
Authors:
Bogdan Oancea,
Tudorel Andrei,
Raluca Mariana Dragoescu
Abstract:
Linear algebra routines are basic building blocks for the statistical software. In this paper we analyzed how can we can improve R performance for matrix computations. We benchmarked few matrix operations using the standard linear algebra libraries included in the R distribution and high performance libraries like OpenBLAS, GotoBLAS and MKL. Our tests showed the the best results are obtained with…
▽ More
Linear algebra routines are basic building blocks for the statistical software. In this paper we analyzed how can we can improve R performance for matrix computations. We benchmarked few matrix operations using the standard linear algebra libraries included in the R distribution and high performance libraries like OpenBLAS, GotoBLAS and MKL. Our tests showed the the best results are obtained with the MKL library, the other two libraries having similar performances, but lower than MKL
△ Less
Submitted 4 August, 2015;
originally announced August 2015.
-
GPGPU Computing
Authors:
Bogdan Oancea,
Tudorel Andrei,
Raluca Mariana Dragoescu
Abstract:
Since the first idea of using GPU to general purpose computing, things have evolved over the years and now there are several approaches to GPU programming. GPU computing practically began with the introduction of CUDA (Compute Unified Device Architecture) by NVIDIA and Stream by AMD. These are APIs designed by the GPU vendors to be used together with the hardware that they provide. A new emerging…
▽ More
Since the first idea of using GPU to general purpose computing, things have evolved over the years and now there are several approaches to GPU programming. GPU computing practically began with the introduction of CUDA (Compute Unified Device Architecture) by NVIDIA and Stream by AMD. These are APIs designed by the GPU vendors to be used together with the hardware that they provide. A new emerging standard, OpenCL (Open Computing Language) tries to unify different GPU general computing API implementations and provides a framework for writing programs executed across heterogeneous platforms consisting of both CPUs and GPUs. OpenCL provides parallel computing using task-based and data-based parallelism. In this paper we will focus on the CUDA parallel computing architecture and programming model introduced by NVIDIA. We will present the benefits of the CUDA programming model. We will also compare the two main approaches, CUDA and AMD APP (STREAM) and the new framwork, OpenCL that tries to unify the GPGPU computing models.
△ Less
Submitted 29 August, 2014;
originally announced August 2014.
-
Integrating R and Hadoop for Big Data Analysis
Authors:
Bogdan Oancea,
Raluca Mariana Dragoescu
Abstract:
Analyzing and working with big data could be very diffi cult using classical means like relational database management systems or desktop software packages for statistics and visualization. Instead, big data requires large clusters with hundreds or even thousands of computing nodes. Offi cial statistics is increasingly considering big data for deriving new statistics because big data sources could…
▽ More
Analyzing and working with big data could be very diffi cult using classical means like relational database management systems or desktop software packages for statistics and visualization. Instead, big data requires large clusters with hundreds or even thousands of computing nodes. Offi cial statistics is increasingly considering big data for deriving new statistics because big data sources could produce more relevant and timely statistics than traditional sources. One of the software tools successfully and wide spread used for storage and processing of big data sets on clusters of commodity hardware is Hadoop. Hadoop framework contains libraries, a distributed fi le-system (HDFS), a resource-management platform and implements a version of the MapReduce programming model for large scale data processing. In this paper we investigate the possibilities of integrating Hadoop with R which is a popular software used for statistical computing and data visualization. We present three ways of integrating them: R with Streaming, Rhipe and RHadoop and we emphasize the advantages and disadvantages of each solution.
△ Less
Submitted 18 July, 2014;
originally announced July 2014.
-
Time series forecasting using neural networks
Authors:
Bogdan Oancea,
ŞTefan Cristian Ciucu
Abstract:
Recent studies have shown the classification and prediction power of the Neural Networks. It has been demonstrated that a NN can approximate any continuous function. Neural networks have been successfully used for forecasting of financial data series. The classical methods used for time series prediction like Box-Jenkins or ARIMA assumes that there is a linear relationship between inputs and outpu…
▽ More
Recent studies have shown the classification and prediction power of the Neural Networks. It has been demonstrated that a NN can approximate any continuous function. Neural networks have been successfully used for forecasting of financial data series. The classical methods used for time series prediction like Box-Jenkins or ARIMA assumes that there is a linear relationship between inputs and outputs. Neural Networks have the advantage that can approximate nonlinear functions. In this paper we compared the performances of different feed forward and recurrent neural networks and training algorithms for predicting the exchange rate EUR/RON and USD/RON. We used data series with daily exchange rates starting from 2005 until 2013.
△ Less
Submitted 7 January, 2014;
originally announced January 2014.