-
Insights into mobile health application market via a content analysis of marketplace data with machine learning
Authors:
Gokhan Aydin,
Gokhan Silahtaroglu
Abstract:
Background Despite the benefits offered by an abundance of health applications promoted on app marketplaces (e.g., Google Play Store), the wide adoption of mobile health and e-health apps is yet to come. Objective This study aims to investigate the current landscape of smartphone apps that focus on improving and sustaining health and wellbeing. Understanding the categories that popular apps focus…
▽ More
Background Despite the benefits offered by an abundance of health applications promoted on app marketplaces (e.g., Google Play Store), the wide adoption of mobile health and e-health apps is yet to come. Objective This study aims to investigate the current landscape of smartphone apps that focus on improving and sustaining health and wellbeing. Understanding the categories that popular apps focus on and the relevant features provided to users, which lead to higher user scores and downloads will offer insights to enable higher adoption in the general populace. This study on 1,000 mobile health applications aims to shed light on the reasons why particular apps are liked and adopted while many are not. Methods User-generated data (i.e. review scores) and company-generated data (i.e. app descriptions) were collected from app marketplaces and manually coded and categorized by two researchers. For analysis, Artificial Neural Networks, Random Forest and Naïve Bayes Artificial Intelligence algorithms were used. Results The analysis led to features that attracted more download behavior and higher user scores. The findings suggest that apps that mention a privacy policy or provide videos in description lead to higher user scores, whereas free apps with in-app purchase possibilities, social networking and sharing features and feedback mechanisms lead to higher number of downloads. Moreover, differences in user scores and the total number of downloads are detected in distinct subcategories of mobile health apps. Conclusion This study contributes to the current knowledge of m-health application use by reviewing mobile health applications using content analysis and machine learning algorithms. The content analysis adds significant value by providing classification, keywords and factors that influence download behavior and user scores in a m-health context.
△ Less
Submitted 16 February, 2024;
originally announced February 2024.
-
Find the Funding: Entity Linking with Incomplete Funding Knowledge Bases
Authors:
Gizem Aydin,
Seyed Amin Tabatabaei,
Giorgios Tsatsaronis,
Faegheh Hasibi
Abstract:
Automatic extraction of funding information from academic articles adds significant value to industry and research communities, such as tracking research outcomes by funding organizations, profiling researchers and universities based on the received funding, and supporting open access policies. Two major challenges of identifying and linking funding entities are: (i) sparse graph structure of the…
▽ More
Automatic extraction of funding information from academic articles adds significant value to industry and research communities, such as tracking research outcomes by funding organizations, profiling researchers and universities based on the received funding, and supporting open access policies. Two major challenges of identifying and linking funding entities are: (i) sparse graph structure of the Knowledge Base (KB), which makes the commonly used graph-based entity linking approaches suboptimal for the funding domain, (ii) missing entities in KB, which (unlike recent zero-shot approaches) requires marking entity mentions without KB entries as NIL. We propose an entity linking model that can perform NIL prediction and overcome data scarcity issues in a time and data-efficient manner. Our model builds on a transformer-based mention detection and bi-encoder model to perform entity linking. We show that our model outperforms strong existing baselines.
△ Less
Submitted 20 September, 2022; v1 submitted 1 September, 2022;
originally announced September 2022.
-
Classification of Scientific Papers With Big Data Technologies
Authors:
Selen Gurbuz,
Galip Aydin
Abstract:
Data sizes that cannot be processed by conventional data storage and analysis systems are named as Big Data.It also refers to nex technologies developed to store, process and analyze large amounts of data. Automatic information retrieval about the contents of a large number of documents produced by different sources, identifying research fields and topics, extraction of the document abstracts, or…
▽ More
Data sizes that cannot be processed by conventional data storage and analysis systems are named as Big Data.It also refers to nex technologies developed to store, process and analyze large amounts of data. Automatic information retrieval about the contents of a large number of documents produced by different sources, identifying research fields and topics, extraction of the document abstracts, or discovering patterns are some of the topics that have been studied in the field of big data.In this study, Naive Bayes classification algorithm, which is run on a data set consisting of scientific articles, has been tried to automatically determine the classes to which these documents belong. We have developed an efficient system that can analyze the Turkish scientific documents with the distributed document classification algorithm run on the Cloud Computing infrastructure. The Apache Mahout library is used in the study. The servers required for classifying and clustering distributed documents are
△ Less
Submitted 14 February, 2018;
originally announced February 2018.
-
Distributed Readability Analysis Of Turkish Elementary School Textbooks
Authors:
Betul Karakus,
Ibrahim Riza Hallac,
Galip Aydin
Abstract:
The readability assessment deals with estimating the level of difficulty in reading texts.Many readability tests, which do not indicate execution efficiency, have been applied on specific texts to measure the reading grade level in science textbooks. In this paper, we analyze the content covered in elementary school Turkish textbooks by employing a distributed parallel processing framework based o…
▽ More
The readability assessment deals with estimating the level of difficulty in reading texts.Many readability tests, which do not indicate execution efficiency, have been applied on specific texts to measure the reading grade level in science textbooks. In this paper, we analyze the content covered in elementary school Turkish textbooks by employing a distributed parallel processing framework based on popular MapReduce paradigm. We outline the architecture of a distributed Big Data processing system which uses Hadoop for full-text readability analysis. The readability scores of the textbooks and system performance measurements are also given in the paper.
△ Less
Submitted 11 February, 2018;
originally announced February 2018.
-
Distributed NLP
Authors:
Galip Aydin,
Ibrahim Riza Hallac
Abstract:
In this paper we present the performance of parallel text processing with Map Reduce on a cloud platform. Scientific papers in Turkish language are processed using Zemberek NLP library. Experiments were run on a Hadoop cluster and compared with the single machines performance.
In this paper we present the performance of parallel text processing with Map Reduce on a cloud platform. Scientific papers in Turkish language are processed using Zemberek NLP library. Experiments were run on a Hadoop cluster and compared with the single machines performance.
△ Less
Submitted 10 February, 2018;
originally announced February 2018.
-
Running genetic algorithms on Hadoop for solving high dimensional optimization problems
Authors:
Güngör Yildirim,
İbrahim R Hallac,
Galip Aydin,
Yetkin Tatar
Abstract:
Hadoop is a popular MapReduce framework for developing parallel applications in distributed environments. Several advantages of MapReduce such as programming ease and ability to use commodity hardware make the applicability of soft computing methods for parallel and distributed systems easier than before. In this paper, we present the results of an experimental study on running soft computing algo…
▽ More
Hadoop is a popular MapReduce framework for developing parallel applications in distributed environments. Several advantages of MapReduce such as programming ease and ability to use commodity hardware make the applicability of soft computing methods for parallel and distributed systems easier than before. In this paper, we present the results of an experimental study on running soft computing algorithms using Hadoop. This study shows how a simple genetic algorithm running on Hadoop can be used to produce solutions for high dimensional optimization problems. In addition, a simple but effective technique, which did not need MapReduce chains, has been proposed.
△ Less
Submitted 10 February, 2018;
originally announced February 2018.
-
Document Classification Using Distributed Machine Learning
Authors:
Galip Aydin,
Ibrahim Riza Hallac
Abstract:
In this paper, we investigate the performance and success rates of Naïve Bayes Classification Algorithm for automatic classification of Turkish news into predetermined categories like economy, life, health etc. We use Apache Big Data technologies such as Hadoop, HDFS, Spark and Mahout, and apply these distributed technologies to Machine Learning.
In this paper, we investigate the performance and success rates of Naïve Bayes Classification Algorithm for automatic classification of Turkish news into predetermined categories like economy, life, health etc. We use Apache Big Data technologies such as Hadoop, HDFS, Spark and Mahout, and apply these distributed technologies to Machine Learning.
△ Less
Submitted 10 February, 2018;
originally announced February 2018.
-
Distributed Log Analysis on the Cloud Using MapReduce
Authors:
Galip Aydin,
Ibrahim Riza Hallac
Abstract:
In this paper we describe our work on designing a web based, distributed data analysis system based on the popular MapReduce framework deployed on a small cloud; developed specifically for analyzing web server logs. The log analysis system consists of several cluster nodes, it splits the large log files on a distributed file system and quickly processes them using MapReduce programming model. The…
▽ More
In this paper we describe our work on designing a web based, distributed data analysis system based on the popular MapReduce framework deployed on a small cloud; developed specifically for analyzing web server logs. The log analysis system consists of several cluster nodes, it splits the large log files on a distributed file system and quickly processes them using MapReduce programming model. The cluster is created using an open source cloud infrastructure, which allows us to easily expand the computational power by adding new nodes. This gives us the ability to automatically resize the cluster according to the data analysis requirements. We implemented MapReduce programs for basic log analysis needs like frequency analysis, error detection, busy hour detection etc. as well as more complex analyses which require running several jobs. The system can automatically identify and analyze several web server log types such as Apache, IIS, Squid etc. We use open source projects for creating the cloud infrastructure and running MapReduce jobs.
△ Less
Submitted 10 February, 2018;
originally announced February 2018.
-
Preparation of Improved Turkish DataSet for Sentiment Analysis in Social Media
Authors:
Semiha Makinist,
Ibrahim Riza Hallac,
Betul Ay Karakus,
Galip Aydin
Abstract:
A public dataset, with a variety of properties suitable for sentiment analysis [1], event prediction, trend detection and other text mining applications, is needed in order to be able to successfully perform analysis studies. The vast majority of data on social media is text-based and it is not possible to directly apply machine learning processes into these raw data, since several different proce…
▽ More
A public dataset, with a variety of properties suitable for sentiment analysis [1], event prediction, trend detection and other text mining applications, is needed in order to be able to successfully perform analysis studies. The vast majority of data on social media is text-based and it is not possible to directly apply machine learning processes into these raw data, since several different processes are required to prepare the data before the implementation of the algorithms. For example, different misspellings of same word enlarge the word vector space unnecessarily, thereby it leads to reduce the success of the algorithm and increase the computational power requirement. This paper presents an improved Turkish dataset with an effective spelling correction algorithm based on Hadoop [2]. The collected data is recorded on the Hadoop Distributed File System and the text based data is processed by MapReduce programming model. This method is suitable for the storage and processing of large sized text based social media data. In this study, movie reviews have been automatically recorded with Apache ManifoldCF (MCF) [3] and data clusters have been created. Various methods compared such as Levenshtein and Fuzzy String Matching have been proposed to create a public dataset from collected data. Experimental results show that the proposed algorithm, which can be used as an open source dataset in sentiment analysis studies, have been performed successfully to the detection and correction of spelling errors.
△ Less
Submitted 31 January, 2018; v1 submitted 30 January, 2018;
originally announced January 2018.