Search | arXiv e-print repository

Turing complete Navier-Stokes steady states via cosymplectic geometry

Authors: Søren Dyhr, Ángel González-Prieto, Eva Miranda, Daniel Peralta-Salas

Abstract: In this article, we construct stationary solutions to the Navier-Stokes equations on certain Riemannian $3$-manifolds that exhibit Turing completeness, in the sense that they are capable of performing universal computation. This universality arises on manifolds admitting nonvanishing harmonic 1-forms, thus showing that computational universality is not obstructed by viscosity, provided the underlying geometry satisfies a mild cohomological condition. The proof makes use of a correspondence between nonvanishing harmonic $1$-forms and cosymplectic geometry, which extends the classical correspondence between Beltrami fields and Reeb flows on contact manifolds.

Submitted 10 July, 2025; originally announced July 2025.

Comments: 11 pages, no figures

arXiv:2503.16100

Topological Kleene Field Theories: A new model of computation

Authors: Ángel González-Prieto, Eva Miranda, Daniel Peralta-Salas

Abstract: In this article, we establish the foundations of a computational field theory, which we term Topological Kleene Field Theory (TKFT), inspired by Stephen Kleene's seminal work on partial recursive functions. Our central result shows that any computable function can be simulated by the flow on a smooth bordism of a vector field with good local properties. More precisely, we prove that reaching functions on clean dynamical bordisms are exactly equivalent to computable functions, setting an alternative model of computation to Turing machines. The use of non-trivial topologies for the bordisms involved is essential for this equivalence, suggesting interesting connections between the topological structure of these flows and the computational complexity inherent in the functions. We emphasize that TKFT has the potential to surpass the computational complexity of both Turing machines and quantum computation.

Submitted 20 March, 2025; originally announced March 2025.

Comments: 29 pages

arXiv:2412.02384

Theory building for empirical software engineering in qualitative research: Operationalization

Authors: Jorge Pérez, Jessica Díaz, Ángel González-Prieto, Sergio Gil-Borrás

Abstract: Context: This work is part of a research project whose ultimate goal is to systematize theory building in qualitative research in the field of software engineering. The proposed methodology involves four phases: conceptualization, operationalization, testing, and application. In previous work, we performed the conceptualization of a theory that investigates the structure of IT departments and teams when software-intensive organizations adopt a culture called DevOps. Objective: This paper presents a set of procedures to systematize the operationalization phase in theory building and their application in the context of DevOps team structures. Method: We operationalize the concepts and propositions that make up our theory to generate constructs and empirically testable hypotheses. Instead of using causal relations to operationalize the propositions, we adopt logical implication, which avoids the problems associated with causal reasoning. Strategies are proposed to ensure that the resulting theory aligns with the criterion of parsimony. Results: The operationalization phase is described from three perspectives: specification, implementation, and practical application. First, the operationalization process is formally defined. Second, a set of procedures for operationalizing both concepts and propositions is described. Finally, the usefulness of the proposed procedures is demonstrated in a case study. Conclusions: This paper is a pioneering contribution in offering comprehensive guidelines for theory operationalization using logical implication.
By following established procedures and using concrete examples, researchers can better ensure the success of their theory-building efforts through careful operationalization.

Submitted 3 December, 2024; originally announced December 2024.

Comments: 22 pages, 7 figures

ACM Class: D.2.0

arXiv:2410.16838

doi 10.9781/ijimai.2021.08.010

Neural Collaborative Filtering Classification Model to Obtain Prediction Reliabilities

Authors: Jesús Bobadilla, Abraham Gutiérrez, Santiago Alonso, Ángel González-Prieto

Abstract: Neural collaborative filtering is the state-of-the-art field in the recommender systems area; it provides some models that obtain accurate predictions and recommendations. These models are regression-based, and they just return rating predictions. This paper proposes the use of a classification-based approach, returning both rating predictions and their reliabilities. The extra information (prediction reliabilities) can be used in a variety of relevant collaborative filtering areas such as detection of shilling attacks, recommendation explanation, or navigational tools to show user and item dependencies. Additionally, recommendation reliabilities can be gracefully provided to users: "probably you will like this film", "almost certainly you will like this song", etc. This paper provides the proposed neural architecture; it also verifies that the quality of its recommendation results is as good as that of state-of-the-art baselines. Remarkably, individual rating predictions are improved by using the proposed architecture compared to baselines. Experiments have been performed making use of four popular public datasets, showing generalizable quality results. Overall, the proposed architecture improves the quality of individual rating predictions, maintains recommendation results and opens the doors to a set of relevant collaborative filtering fields.

Submitted 24 October, 2024; v1 submitted 22 October, 2024; originally announced October 2024.

Comments: 9 pages, 7 figures

Journal ref: International Journal of Interactive Multimedia and Artificial Intelligence, Volume 7, number 4, Pages 18-26, 2022

arXiv:2308.02058

Incorporating Recklessness to Collaborative Filtering based Recommender Systems

Authors: Diego Pérez-López, Fernando Ortega, Ángel González-Prieto, Jorge Dueñas-Lerín

Abstract: Recommender systems are intrinsically tied to a reliability/coverage dilemma: the more reliable we want the forecasts to be, the more conservative the decision will be and thus, the fewer items will be recommended. This harms the predictive capability of the system, as it is only able to estimate potential interest in items for which there is a consensus in their evaluation, rather than being able to estimate potential interest in any item. In this paper, we propose the inclusion of a new term in the learning process of matrix factorization-based recommender systems, called recklessness, that takes into account the variance of the output probability distribution of the predicted ratings. In this way, by adjusting this recklessness measure we can force a spikier output distribution, enabling control of the risk level desired when making decisions about the reliability of a prediction. Experimental results demonstrate that recklessness not only allows for risk regulation but also improves the quantity and quality of predictions provided by the recommender system.

Submitted 21 May, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: 15 pages, 4 figures, 2 tables

MSC Class: Primary: 68T05; Secondary: 68T42; 62M20 ACM Class: I.2; I.5
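The reliability/coverage dilemma described in the two preceding entries can be illustrated with a minimal sketch, assuming a model that outputs a probability distribution over a 1 to 5 star scale. Here the predicted rating is taken to be the most probable class and its probability is used as the reliability; the variance of the distribution measures how spread out it is. The names `RATINGS`, `summarize` and `coverage` are hypothetical, and the sketch reproduces neither the recklessness term nor either paper's architecture; it only shows how raising a reliability threshold shrinks coverage.

```python
from typing import List, Sequence, Tuple

# Hypothetical 5-star rating scale (an assumption, not taken from the papers).
RATINGS = (1, 2, 3, 4, 5)


def summarize(probs: Sequence[float]) -> Tuple[int, float, float]:
    """Return (predicted rating, reliability, variance) for one distribution.

    The prediction is the most probable rating, its probability is used as
    the reliability, and the variance is that of the rating distribution
    (a spikier distribution has a lower variance).
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    best = max(range(len(probs)), key=lambda i: probs[i])
    mean = sum(r * p for r, p in zip(RATINGS, probs))
    var = sum(p * (r - mean) ** 2 for r, p in zip(RATINGS, probs))
    return RATINGS[best], probs[best], var


def coverage(dists: List[Sequence[float]], threshold: float) -> float:
    """Fraction of predictions whose reliability reaches `threshold`."""
    kept = [d for d in dists if summarize(d)[1] >= threshold]
    return len(kept) / len(dists)
```

On three toy distributions with reliabilities 0.9, 0.2 and 0.6, raising the threshold from 0.5 to 0.8 drops coverage from 2/3 to 1/3: the more reliable the accepted forecasts, the fewer items are recommended.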

arXiv:2303.09909

An evaluation framework for dimensionality reduction through sectional curvature

Authors: Raúl Lara-Cabrera, Ángel González-Prieto, Diego Pérez-López, Diego Trujillo, Fernando Ortega

Abstract: Unsupervised machine learning lacks ground truth by definition. This poses a major difficulty when designing metrics to evaluate the performance of such algorithms. In sharp contrast with supervised learning, for which plenty of quality metrics have been studied in the literature, in the field of dimensionality reduction only a few over-simplistic metrics have been proposed. In this work, we aim to introduce the first highly non-trivial dimensionality reduction performance metric. This metric is based on the sectional curvature behaviour arising from Riemannian geometry. To test its feasibility, this metric has been used to evaluate the performance of the most commonly used dimension reduction algorithms in the state of the art. Furthermore, to make the evaluation of the algorithms robust and representative, a new parameterized problem instance generator, based on curvature properties of planar curves, has been constructed in the form of a function generator. Experimental results are consistent with what could be expected based on the design and characteristics of the evaluated algorithms and the features of the data instances used to feed the method.

Submitted 17 March, 2023; originally announced March 2023.

Comments: 16 pages, 4 figures, submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

arXiv:2210.10619

Restricted Bernoulli Matrix Factorization: Balancing the trade-off between prediction accuracy and coverage in classification based collaborative filtering

Authors: Ángel González-Prieto, Abraham Gutiérrez, Fernando Ortega, Raúl Lara-Cabrera

Abstract: Reliability measures associated with the prediction of the machine learning models are critical to strengthening user confidence in artificial intelligence. Therefore, those models that are able to provide not only predictions, but also reliability, enjoy greater popularity. In the field of recommender systems, reliability is crucial, since users tend to prefer those recommendations that are sure to interest them, that is, high predictions with high reliabilities. In this paper, we propose Restricted Bernoulli Matrix Factorization (ResBeMF), a new algorithm aimed at enhancing the performance of classification-based collaborative filtering. The proposed model has been compared to other existing solutions in the literature in terms of prediction quality (Mean Absolute Error and accuracy scores), prediction quantity (coverage score) and recommendation quality (Mean Average Precision score). The experimental results demonstrate that the proposed model provides a good balance in terms of the quality measures used compared to other recommendation models.

Submitted 21 December, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: Several changes performed, including a title change. 21 pages, 7 figures, 2 tables

MSC Class: Primary: 68T05; Secondary: 62M20 ACM Class: I.2; I.5

arXiv:2209.01842

doi 10.3390/math9040325

Dynamics of Fourier Modes in Torus Generative Adversarial Networks

Authors: Ángel González-Prieto, Alberto Mozo, Edgar Talavera, Sandra Gómez-Canaval

Abstract: Generative Adversarial Networks (GANs) are powerful Machine Learning models capable of generating fully synthetic samples of a desired phenomenon with a high resolution. Despite their success, the training process of a GAN is highly unstable and typically it is necessary to implement several accessory heuristics to the networks to reach an acceptable convergence of the model. In this paper, we introduce a novel method to analyze the convergence and stability in the training of Generative Adversarial Networks. For this purpose, we propose to decompose the objective function of the adversarial min-max game defining a periodic GAN into its Fourier series. By studying the dynamics of the truncated Fourier series for the continuous Alternating Gradient Descent algorithm, we are able to approximate the real flow and to identify the main features of the convergence of the GAN. This approach is confirmed empirically by studying the training flow in a $2$-parametric GAN aiming to generate an unknown exponential distribution. As a byproduct, we show that convergent orbits in GANs are small perturbations of periodic orbits, so the Nash equilibria are spiral attractors. This theoretically justifies the slow and unstable training observed in GANs.

Submitted 5 September, 2022; originally announced September 2022.

Comments: 27 pages, 8 figures, 1 table. Minor typos corrected from the published version

MSC Class: Primary: 37N99; Secondary: 68T07 ACM Class: G.1.7; I.2.6

arXiv:2206.13508

doi 10.1007/s00521-023-08459-3

Data Augmentation techniques in time series domain: A survey and taxonomy

Authors: Guillermo Iglesias, Edgar Talavera, Ángel González-Prieto, Alberto Mozo, Sandra Gómez-Canaval

Abstract: With the latest advances in Deep Learning-based generative models, researchers have been quick to take advantage of their remarkable performance in the area of time series. Deep neural networks used to work with time series heavily depend on the size and consistency of the datasets used in training. These features are not usually abundant in the real world, where they are usually limited and often have constraints that must be guaranteed. Therefore, an effective way to increase the amount of data is by using Data Augmentation techniques, either by adding noise or permutations or by generating new synthetic data. This work systematically reviews the current state of the art in the area to provide an overview of all available algorithms and proposes a taxonomy of the most relevant research. The efficiency of the different variants is evaluated as a central part of the process, and the metrics used to evaluate performance, as well as the main problems concerning each model, are analysed. The ultimate aim of this study is to provide a summary of the evolution and performance of areas that produce better results to guide future researchers in this field.

Submitted 16 February, 2024; v1 submitted 25 June, 2022; originally announced June 2022.

Comments: 33 pages, 9 figures

ACM Class: I.2.6

Journal ref: Neural Computing and Applications, 35(14), 10123-10145 (2023)
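The two simplest augmentation families named in the survey abstract, adding noise and permutations, can be sketched in a few lines. This is a minimal illustration, assuming a univariate series stored as a list of floats; the function names `jitter` and `permute_segments` and the default `sigma` are hypothetical choices, not the survey's definitions.

```python
import random
from typing import List, Optional, Sequence


def jitter(series: Sequence[float], sigma: float = 0.03,
           rng: Optional[random.Random] = None) -> List[float]:
    """Add i.i.d. Gaussian noise with standard deviation `sigma` to every point."""
    rng = random.Random() if rng is None else rng
    return [x + rng.gauss(0.0, sigma) for x in series]


def permute_segments(series: Sequence[float], n_segments: int = 4,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Split the series into `n_segments` contiguous chunks and shuffle their order."""
    rng = random.Random() if rng is None else rng
    n = len(series)
    # Segment boundaries spread as evenly as possible over the series.
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    segments = [list(series[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]
    rng.shuffle(segments)
    return [x for seg in segments for x in seg]
```

Jittering keeps the temporal order and perturbs values, while segment permutation keeps the values and perturbs the order; which one is appropriate depends on which invariances the downstream model should learn.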

arXiv:2110.15914

doi 10.1016/j.ins.2022.07.066

Improving the quality of generative models through Smirnov transformation

Authors: Ángel González-Prieto, Alberto Mozo, Sandra Gómez-Canaval, Edgar Talavera

Abstract: Solving the convergence issues of Generative Adversarial Networks (GANs) is one of the most outstanding problems in generative models. In this work, we propose a novel activation function to be used as output of the generator agent. This activation function is based on the Smirnov probabilistic transformation and it is specifically designed to improve the quality of the generated data. In sharp contrast with previous works, our activation function provides a more general approach that deals not only with the replication of categorical variables but with any type of data distribution (continuous or discrete). Moreover, our activation function is differentiable and, therefore, it can be seamlessly integrated in the backpropagation computations during the GAN training processes. To validate this approach, we evaluate our proposal against two different data sets: a) an artificially rendered data set containing a mixture of discrete and continuous variables, and b) a real data set of flow-based network traffic data containing both normal connections and cryptomining attacks. To evaluate the fidelity of the generated data, we analyze them both in terms of statistical quality measures and in terms of their usefulness for feeding a nested machine learning-based classifier. The experimental results show that the GAN tuned with this new activation function clearly outperforms both a naïve mean-based generator and a standard GAN.
The quality of the data is so high that the generated data can fully substitute real data for training the nested classifier without a fall in the obtained accuracy. This result encourages the use of GANs to produce high-quality synthetic data that are applicable in scenarios in which data privacy must be guaranteed.

Submitted 29 October, 2021; originally announced October 2021.

Comments: 28 pages, 16 Figures, 4 Tables

ACM Class: I.2.m

Journal ref: Information Sciences Volume 609, 2022
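The Smirnov transformation mentioned in the entry above is the classical inverse-transform method: if $U$ is uniform on $(0, 1)$ and $F$ is a cumulative distribution function with generalized inverse $F^{-1}(u) = \inf\{x : F(x) \ge u\}$, then $F^{-1}(U)$ is distributed according to $F$. The minimal sketch below applies it to an empirical CDF, which covers discrete and continuous data alike. It shows only the textbook transform; the paper's contribution, a differentiable activation function built on it, is not reproduced, and the names `empirical_inverse_cdf` and `smirnov_sample` are hypothetical.

```python
import bisect
import random
from typing import Callable, List, Optional, Sequence


def empirical_inverse_cdf(data: Sequence[float]) -> Callable[[float], float]:
    """Return the generalized inverse F^{-1} of the empirical CDF of `data`.

    F^{-1}(u) is the smallest sample x whose empirical CDF value is >= u.
    """
    xs = sorted(data)
    n = len(xs)
    cdf = [(k + 1) / n for k in range(n)]  # F evaluated at each sorted sample

    def inverse(u: float) -> float:
        i = bisect.bisect_left(cdf, u)  # first index with cdf[i] >= u
        return xs[min(i, n - 1)]

    return inverse


def smirnov_sample(data: Sequence[float], size: int,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Draw `size` values by pushing uniform noise through F^{-1}."""
    rng = random.Random() if rng is None else rng
    inv = empirical_inverse_cdf(data)
    return [inv(rng.random()) for _ in range(size)]
```

For the sample `[3, 1, 2, 2]` the empirical CDF jumps to 0.25, 0.75 and 1.0 at the values 1, 2 and 3, so uniform draws are mapped to 1, 2 and 3 with probabilities 1/4, 1/2 and 1/4.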

arXiv:2107.14776

doi 10.1038/s41598-022-06057-2

Synthetic flow-based cryptomining attack generation through Generative Adversarial Networks

Authors: Alberto Mozo, Ángel González-Prieto, Antonio Pastor, Sandra Gómez-Canaval, Edgar Talavera

Abstract: Due to the growing number of cyber attacks on the Internet, flow-based data sets are crucial to increase the performance of the Machine Learning (ML) components that run in network-based intrusion detection systems (IDS). To overcome the existing network traffic data shortage in attack analysis, recent works propose Generative Adversarial Networks (GANs) for synthetic flow-based network traffic generation. Data privacy is increasingly becoming a strong requirement when processing such network data, which motivates finding solutions where synthetic data can fully replace real data. Because of the ill-convergence of GAN training, none of the existing solutions can generate high-quality fully synthetic data that can totally substitute real data in the training of IDS ML components. Therefore, they mix real with synthetic data, so the synthetic data act only as data augmentation, leading to privacy breaches as real data is used. In sharp contrast, in this work we propose a novel deterministic way to measure the quality of the synthetic data produced by a GAN both with respect to the real data and to its performance when used for ML tasks. As a byproduct, we present a heuristic that uses these metrics for selecting the best performing generator during GAN training, leading to a stopping criterion. An additional heuristic is proposed to select the best performing GANs when different types of synthetic data are to be used in the same ML task.
We demonstrate the adequacy of our proposal by generating synthetic cryptomining attack traffic and normal traffic flow-based data using an enhanced version of a Wasserstein GAN. We show that the generated synthetic network traffic can completely replace real data when training an ML-based cryptomining detector, obtaining similar performance and avoiding privacy violations, since real data is not used in the training of the ML-based detector.

Submitted 30 July, 2021; originally announced July 2021.

Comments: 35 pages, 13 figures, 8 tables

ACM Class: I.2.1

Journal ref: Scientific Reports (2022)

arXiv:2107.12677

Deep Variational Models for Collaborative Filtering-based Recommender Systems

Authors: Jesús Bobadilla, Fernando Ortega, Abraham Gutiérrez, Ángel González-Prieto

Abstract: Deep learning provides accurate collaborative filtering models to improve recommender system results. Deep matrix factorization and its related collaborative neural networks are the state of the art in the field; nevertheless, both models lack the necessary stochasticity to create the robust, continuous, and structured latent spaces that variational autoencoders exhibit. On the other hand, data augmentation through variational autoencoders does not provide accurate results in the collaborative filtering field due to the high sparsity of recommender systems. Our proposed models apply the variational concept to inject stochasticity in the latent space of the deep architecture, introducing the variational technique in the neural collaborative filtering field. This method does not depend on the particular model used to generate the latent representation. In this way, this approach can be applied as a plugin to any current and future specific models. The proposed models have been tested using four representative open datasets, three different quality measures, and state-of-the-art baselines. The results show the superiority of the proposed approach in scenarios where the variational enrichment exceeds the injected noise effect. Additionally, a framework is provided to enable the reproducibility of the conducted experiments.

Submitted 27 July, 2021; originally announced July 2021.

Comments: 14 pages, 8 figures, 3 tables

ACM Class: I.5.1
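In variational autoencoders, the "variational concept" referred to in the entry above is commonly implemented with the reparameterization trick: instead of sampling the latent code directly, one samples $\varepsilon \sim \mathcal{N}(0, I)$ and sets $z = \mu + \sigma \odot \varepsilon$, usually regularized by the KL divergence to a standard normal prior. The minimal sketch below shows only this standard VAE mechanism in plain Python; it is not the authors' models, and the function names `reparameterize` and `kl_to_standard_normal` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(log_var / 2).

    Writing the sample this way keeps z a deterministic, differentiable
    function of (mu, log_var); all randomness lives in eps.
    """
    return [m + math.exp(lv / 2.0) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL(N(mu, diag(sigma^2)) || N(0, I)) = 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))
```

The KL term is zero exactly when the latent distribution equals the prior, which is what limits how much the injected noise can be ignored by the rest of the network.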

arXiv:2107.11449

Applying Inter-rater Reliability and Agreement in Grounded Theory Studies in Software Engineering

Authors: Jessica Díaz, Jorge Pérez, Carolina Gallardo, Ángel González-Prieto

Abstract: In recent years, qualitative research on empirical software engineering that applies Grounded Theory has been increasing. Grounded Theory (GT) is a technique for developing theory inductively and iteratively from qualitative data, whose main characteristics are theoretical sampling, coding, constant comparison, memoing, and saturation. Large or controversial GT studies may involve multiple researchers in collaborative coding, which requires a kind of rigor and consensus that an individual coder does not. Although many qualitative researchers reject quantitative measures in favor of other qualitative criteria, many others are committed to measuring consensus through Inter-Rater Reliability (IRR) and/or Inter-Rater Agreement (IRA) techniques to develop a shared understanding of the phenomenon being studied. However, there are no specific guidelines about how and when to apply IRR/IRA during the iterative process of GT, so researchers have been using ad hoc methods for years. This paper presents a process for systematically applying IRR/IRA in GT studies that respects the iterative nature of this qualitative research method, which is supported by a previous systematic literature review on applying IRR/IRA in GT studies in software engineering. This process allows researchers to incrementally generate a theory while ensuring consensus on the constructs that support it and, thus, improving the rigor of qualitative research. This formalization helps researchers to apply IRR/IRA to GT studies when various raters are involved in coding.
Measuring consensus among raters promotes communicability, transparency, reflexivity, replicability, and trustworthiness of the research.

Submitted 23 July, 2021; originally announced July 2021.

Comments: 20 pages, 5 figures, 8 tables

ACM Class: D.2

arXiv:2106.11847

Machine learning for risk assessment in gender-based crime

Authors: Ángel González-Prieto, Antonio Brú, Juan Carlos Nuño, José Luis González-Álvarez

Abstract: Gender-based crime is one of the most concerning scourges of contemporary society. Governments worldwide have invested lots of economic and human resources to radically eliminate this threat. Despite these efforts, providing accurate predictions of the risk that a victim of gender violence has of being attacked again is still a very hard open problem. The development of new methods for issuing accurate, fair and quick predictions would allow police forces to select the most appropriate measures to prevent recidivism. In this work, we propose to apply Machine Learning (ML) techniques to create models that accurately predict the recidivism risk of a gender-violence offender. The relevance of the contribution of this work is threefold: (i) the proposed ML method outperforms the preexisting risk assessment algorithm based on classical statistical techniques, (ii) the study has been conducted through an official specific-purpose database with more than 40,000 reports of gender violence, and (iii) two new quality measures are proposed for assessing the effective police protection that a model supplies and the overload in the invested resources that it generates. Additionally, we propose a hybrid model that combines the statistical prediction methods with the ML method, permitting authorities to implement a smooth transition from the preexisting model to the ML-based model. This hybrid nature enables a decision-making process to optimally balance between the efficiency of the police system and aggressiveness of the protection measures taken.

Submitted 22 June, 2021; originally announced June 2021.

Comments: 17 pages, 5 figures, 4 tables. This work has been submitted to the IEEE for possible publication

ACM Class: I.5.4; J.4; K.4

arXiv:2101.02361

DevOps Team Structures: Characterization and Implications

Authors: Daniel López-Fernández, Jessica Díaz, Javier García, Jorge Pérez, Ángel González-Prieto

Abstract: Context: DevOps can be defined as a cultural movement to improve and accelerate the delivery of business value by making the collaboration between development and operations effective. Objective: This paper aims to help practitioners and researchers to better understand the organizational structure and characteristics of teams adopting DevOps. Method: We conducted an exploratory study by leveraging in-depth, semi-structured interviews with relevant stakeholders of 31 multinational software-intensive companies, together with industrial workshops and observations at organizations' facilities that supported triangulation. We used Grounded Theory as the qualitative research method to explore the structure and characteristics of teams, and statistical analysis to discover their implications in software delivery performance. Results: We describe a taxonomy of team structure patterns that shows emerging, stable and consolidated product teams that are classified according to six variables, such as collaboration frequency, product ownership sharing, autonomy, among others, as well as their implications on software delivery performance. These teams are often supported by horizontal teams (DevOps platform teams, Centers of Excellence, and chapters) that provide them with platform technical capability, mentoring and evangelization, and even temporarily provide human resources.
Conclusion: This study aims to strengthen evidence and support practitioners in making better-informed decisions about organizational team structures by analyzing their main characteristics and implications in software delivery performance.

Submitted 6 January, 2021; originally announced January 2021.

Comments: 18 pages, 5 figures

arXiv:2008.00977 [pdf, other]

Reliability in Software Engineering Qualitative Research through Inter-Coder Agreement: A guide using Krippendorff's $\alpha$ & Atlas.ti

Authors: Ángel González-Prieto, Jorge Perez, Jessica Diaz, Daniel López-Fernández

Abstract: In recent years, research in empirical software engineering that uses qualitative data analysis (e.g., case studies, interview surveys, and grounded theory studies) has been increasing. However, most of this research does not delve into the reliability and validity of its findings, specifically the reliability of the coding on which these methodologies rely, even though a variety of statistical techniques known as Inter-Coder Agreement (ICA) exist for analyzing consensus in team coding. This paper aims to establish a novel theoretical framework that enables a methodological approach for conducting this validity analysis. This framework is based on a set of coefficients for measuring the degree of agreement that different coders achieve when judging a common matter. We analyze different reliability coefficients and provide detailed examples of calculation, with special attention to Krippendorff's $\alpha$ coefficients. We systematically review several variants of Krippendorff's $\alpha$ reported in the literature and provide a novel common mathematical framework in which all of them are unified through a universal $\alpha$ coefficient. Finally, this paper provides a detailed guide to the use of this theoretical framework in a large case study on DevOps culture. We explain how $\alpha$ coefficients are computed and interpreted using a widely used software tool for qualitative analysis, Atlas.ti. We expect that this work will help empirical researchers, particularly in software engineering, to improve the quality and trustworthiness of their studies.

Submitted 10 January, 2021; v1 submitted 31 July, 2020; originally announced August 2020.

Comments: 35 pages, 15 figures, 12 tables

ACM Class: A.1; D.2.1; G.3

arXiv:2006.12379 [pdf, other]

doi 10.1007/s00521-020-05494-2

Deep Learning feature selection to unhide demographic recommender systems factors

Authors: Jesús Bobadilla, Ángel González-Prieto, Fernando Ortega, Raúl Lara-Cabrera

Abstract: Extracting demographic features from hidden factors is an innovative concept that provides multiple relevant applications. The matrix factorization model generates factors that do not incorporate semantic knowledge. This paper provides a deep learning-based method, DeepUnHide, able to extract demographic information from the user and item factors in collaborative filtering recommender systems. The core of the proposed method is the gradient-based localization used in the image processing literature to highlight the representative areas of each class. Validation experiments make use of two public datasets and current baselines. Results show the superiority of DeepUnHide for feature selection and demographic classification, compared to state-of-the-art feature selection methods. Relevant and direct applications include recommendation explanation, fairness in collaborative filtering, and recommendation to groups of users.

Submitted 17 June, 2020; originally announced June 2020.

Comments: 20 pages, 14 figures, 1 table

ACM Class: I.5.1

Journal ref: Neural Computing and Applications, 1-18, 2020

arXiv:2006.05255 [pdf, other]

doi 10.9781/ijimai.2020.11.001

DeepFair: Deep Learning for Improving Fairness in Recommender Systems

Authors: Jesús Bobadilla, Raúl Lara-Cabrera, Ángel González-Prieto, Fernando Ortega

Abstract: The lack of bias management in Recommender Systems leads to minority groups receiving unfair recommendations. Moreover, the trade-off between equity and precision makes it difficult to obtain recommendations that meet both criteria. Here we propose a Deep Learning-based Collaborative Filtering algorithm that provides recommendations with an optimum balance between fairness and accuracy without knowing demographic information about the users. Experimental results show that it is possible to make fair recommendations without losing a significant proportion of accuracy.

Submitted 9 June, 2020; originally announced June 2020.

Comments: 18 pages, 9 figures, 4 tables

ACM Class: I.5.1

Journal ref: International Journal of Interactive Multimedia and Artificial Intelligence, 2020

arXiv:2006.03481 [pdf, other]

doi 10.1016/j.ins.2020.12.001

Providing reliability in Recommender Systems through Bernoulli Matrix Factorization

Authors: Fernando Ortega, Raúl Lara-Cabrera, Ángel González-Prieto, Jesús Bobadilla

Abstract: Beyond accuracy, quality measures are gaining importance in modern recommender systems, with reliability being one of the most important indicators in the context of collaborative filtering. This paper proposes Bernoulli Matrix Factorization (BeMF), a matrix factorization model that provides both prediction values and reliability values. BeMF is an innovative approach from several perspectives: a) it acts on model-based collaborative filtering rather than on memory-based filtering, b) it does not use external methods or extended architectures to provide reliability, as existing solutions do, c) it is based on a classification-based model instead of traditional regression-based models, and d) its matrix factorization formalism is supported by the Bernoulli distribution to exploit the binary nature of the designed classification model. The experimental results show that the more reliable a prediction is, the less likely it is to be wrong: recommendation quality improves after the most reliable predictions are selected. Tests with state-of-the-art quality measures for reliability show that BeMF outperforms previous baseline methods and models.

Submitted 4 March, 2022; v1 submitted 5 June, 2020; originally announced June 2020.

Comments: 28 pages, 8 figures, 8 tables

ACM Class: I.5.1

Journal ref: Information Sciences, 2020

arXiv:2005.10388 [pdf, other]

doi 10.1007/s10664-020-09919-3

Why are many businesses instilling a DevOps culture into their organization?

Authors: Jessica Diaz, Daniel López-Fernández, Jorge Perez, Ángel González-Prieto

Abstract: Context: DevOps can be defined as a cultural movement to improve and accelerate the delivery of business value by making the collaboration between development and operations effective. Although this movement is relatively recent, there is already intensive research around DevOps. However, the real reasons why companies move to DevOps and the results they expect to obtain have received little attention in real contexts. Objective: This paper aims to help practitioners and researchers better understand the context and the problems that many companies face day to day in their organizations when they try to accelerate software delivery, and the main drivers that move these companies to adopt DevOps. Method: We conducted an exploratory study by leveraging in-depth, semi-structured interviews with relevant stakeholders of 30 multinational software-intensive companies, together with industrial workshops and observations at the organizations' facilities that supported triangulation. Additionally, we conducted an inter-coder agreement analysis, which is not usually addressed in qualitative studies in software engineering, to increase the reliability and reduce author bias of the drawn findings. Results: The research explores the problems and expected outcomes that moved companies to adopt DevOps and reveals a set of patterns and anti-patterns about the reasons why companies are instilling a DevOps culture.
Conclusions: This study aims to strengthen evidence and help practitioners make better-informed decisions about which problems trigger a DevOps transition and what the most common expected results are.

Submitted 10 March, 2021; v1 submitted 20 May, 2020; originally announced May 2020.

Comments: 47 pages, 9 figures, 17 tables

Journal ref: Empir Software Eng 26, 25 (2021)

Showing 1–20 of 20 results for author: González-Prieto, Á