-
Assessment of FAIR (Findability, Accessibility, Interoperability, and Reusability) data implementation frameworks: a parametric approach
Authors:
Ranjeet Kumar Singh,
Akanksha Nagpal,
Arun Jadhav,
Devika P. Madalli
Abstract:
The open science movement has established reproducibility, transparency, and validation of research outputs as essential norms for conducting scientific research. It advocates for open access to research outputs, especially research data, to enable verification of published findings and their optimal reuse. The FAIR (Findable, Accessible, Interoperable, and Reusable) data principles support the philosophy of open science and have emerged as a foundational framework for making digital assets machine-actionable and enhancing their reusability and value in various domains, particularly in scientific research and data management. In response to the growing demand for making data FAIR, organizations have developed a range of FAIR implementation frameworks to raise the scientific community's awareness of FAIR and its principles and to make the adoption and implementation of FAIR easier. This paper provides a comprehensive review of the openly available FAIR implementation frameworks based on a parametric evaluation of these frameworks. The current work identifies 13 frameworks and compares their coverage of the four foundational principles of FAIR, assessing them against 36 parameters related to technical specifications, basic features, FAIR implementation features, and FAIR coverage. The study finds that most of the frameworks offer only a step-by-step guide to FAIR implementation and seem to adopt a technology-first approach, mostly guiding the deployment of various tools for FAIR implementation. Many frameworks do not explain the what, why, and how of the four foundational principles of FAIR, and give less consideration to the social aspects of FAIR. The study concludes that more such frameworks should be developed with a people-first approach rather than a technology-first one.
Submitted 27 December, 2024;
originally announced April 2025.
-
A Theoretical Framework for AI-driven data quality monitoring in high-volume data environments
Authors:
Nikhil Bangad,
Vivekananda Jayaram,
Manjunatha Sughaturu Krishnappa,
Amey Ram Banarse,
Darshan Mohan Bidkar,
Akshay Nagpal,
Vidyasagar Parlapalli
Abstract:
This paper presents a theoretical framework for an AI-driven data quality monitoring system designed to address the challenges of maintaining data quality in high-volume environments. We examine the limitations of traditional methods in managing the scale, velocity, and variety of big data and propose a conceptual approach leveraging advanced machine learning techniques. Our framework outlines a system architecture that incorporates anomaly detection, classification, and predictive analytics for real-time, scalable data quality management. Key components include an intelligent data ingestion layer, adaptive preprocessing mechanisms, context-aware feature extraction, and AI-based quality assessment modules. A continuous learning paradigm is central to our framework, ensuring adaptability to evolving data patterns and quality requirements. We also address implications for scalability, privacy, and integration within existing data ecosystems. While practical results are not provided, the framework lays a robust theoretical foundation for future research and implementations, advancing data quality management and encouraging the exploration of AI-driven solutions in dynamic environments.
Submitted 11 October, 2024;
originally announced October 2024.
-
Audience Creation for Consumables -- Simple and Scalable Precision Merchandising for a Growing Marketplace
Authors:
Shreyas S,
Harsh Maheshwari,
Avijit Saha,
Samik Datta,
Shashank Jain,
Disha Makhija,
Anuj Nagpal,
Sneha Shukla,
Suyash S
Abstract:
Consumable categories, such as grocery and fast-moving consumer goods, are essential to the growth of e-commerce marketplaces in developing countries. In this work, we present the design and implementation of a precision merchandising system, which creates audience sets from over 10 million consumers and is deployed at Flipkart Supermart, one of the largest online grocery stores in India. We employ a temporal point process to model the latent periodicity and mutual excitation in the purchase dynamics of consumables. Further, we develop a likelihood-free estimation procedure that is robust against the data sparsity, censoring, and noise typical of a growing marketplace. Lastly, we scale the inference by quantizing the triggering kernels and exploiting the sparse matrix-vector multiplication primitive available on a commercial distributed linear algebra backend. In operation for more than a year, we have observed a consistent increase in click-through rate in the range of 25-70% for banner-based merchandising in the storefront, and in the range of 12-26% for push notification-based campaigns.
Submitted 17 November, 2020;
originally announced November 2020.
-
On the Benefit of Combining Neural, Statistical and External Features for Fake News Identification
Authors:
Gaurav Bhatt,
Aman Sharma,
Shivam Sharma,
Ankush Nagpal,
Balasubramanian Raman,
Ankush Mittal
Abstract:
Identifying the veracity of a news article is an interesting problem, and automating this process can be a challenging task. Detecting whether a news article is fake is still an open question, as it is contingent on many factors that the current state-of-the-art models fail to incorporate. In this paper, we explore a subtask of fake news identification: stance detection. Given a news article, the task is to determine the relevance of the body to its claim. We present a novel idea that combines neural, statistical, and external features to provide an efficient solution to this problem. We compute the neural embedding from a deep recurrent model, statistical features from a weighted n-gram bag-of-words model, and handcrafted external features with the help of feature engineering heuristics. Finally, all the features are combined using a deep neural layer, thereby classifying the headline-body news pair as agree, disagree, discuss, or unrelated. We compare our proposed technique with the current state-of-the-art models on the fake news challenge dataset. Through extensive experiments, we find that the proposed model outperforms all the state-of-the-art techniques, including the submissions to the fake news challenge.
Submitted 11 December, 2017;
originally announced December 2017.
-
Career Path Suggestion using String Matching and Decision Trees
Authors:
Akshay Nagpal,
Supriya P. Panda
Abstract:
High school and college graduates often struggle to decide which courses they should major in to achieve their target career. In this paper, we work on suggesting a career path that lets a graduate reach his/her dream career given the current educational status. First, we collected the career data of professionals and academicians from various career fields and compiled the data set by using the necessary information from the data. This data set was then used as the basis for suggesting the most appropriate career path for a person given his/her current educational status. Decision trees and string matching algorithms were employed to suggest the appropriate career path. Finally, we analyze the results, which point to further improvements in the model.
Submitted 23 May, 2015;
originally announced May 2015.