-
Data Lakes: A Survey of Functions and Systems
Authors:
Rihan Hai,
Christos Koutras,
Christoph Quix,
Matthias Jarke
Abstract:
Data lakes are becoming increasingly prevalent for big data management and data analytics. In contrast to traditional 'schema-on-write' approaches such as data warehouses, data lakes are repositories storing raw data in its original formats and providing a common access interface. Despite the strong interest raised from both academia and industry, there is a large body of ambiguity regarding the definition, functions and available technologies for data lakes. A complete, coherent picture of data lake challenges and solutions is still missing. This survey reviews the development, architectures, and systems of data lakes. We provide a comprehensive overview of research questions for designing and building data lakes. We classify the existing approaches and systems based on their provided functions for data lakes, which makes this survey a useful technical reference for designing, implementing and deploying data lakes. We hope that the thorough comparison of existing solutions and the discussion of open research challenges in this survey will motivate the future development of data lake research and practice.
Submitted 17 February, 2023; v1 submitted 17 June, 2021;
originally announced June 2021.
-
Knowledge-driven Data Ecosystems Towards Data Transparency
Authors:
Sandra Geisler,
Maria-Esther Vidal,
Cinzia Cappiello,
Bernadette Farias Lóscio,
Avigdor Gal,
Matthias Jarke,
Maurizio Lenzerini,
Paolo Missier,
Boris Otto,
Elda Paja,
Barbara Pernici,
Jakob Rehof
Abstract:
A Data Ecosystem offers a keystone-player or alliance-driven infrastructure that enables the interaction of different stakeholders and the resolution of interoperability issues among shared data. However, despite years of research in data governance and management, trustability is still affected by the absence of transparent and traceable data-driven pipelines. In this work, we focus on requirements and challenges that data ecosystems face when ensuring data transparency. Requirements are derived from the data and organizational management, as well as from broader legal and ethical considerations. We propose a novel knowledge-driven data ecosystem architecture, providing the pillars for satisfying the analyzed requirements. We illustrate the potential of our proposal in a real-world scenario. Lastly, we discuss and rate the potential of the proposed architecture in the fulfillment of these requirements.
Submitted 21 May, 2021; v1 submitted 19 May, 2021;
originally announced May 2021.
-
Ranking and Cooperation in Real-World Complex Networks
Authors:
Mohsen Shahriari,
Ralf Klamma,
Matthias Jarke
Abstract:
People participate and are active in online social networks, and thus a tremendous amount of network data is generated: data regarding their interactions, interests and activities. Some people seek answers to specific questions through online social platforms such as forums, and they may receive a suitable response from experts. To categorize people as experts and to evaluate their willingness to cooperate, one can use ranking and cooperation problems from complex networks. In this paper, we investigate classical ranking algorithms alongside the prisoner's dilemma game to simulate cooperation and defection of agents. We compute the correlation between node rank and node cooperativity via three strategies. The first strategy operates at the node level, whereas the other strategies are calculated over the neighborhoods of nodes. We find correlations between specific ranking algorithms and the cooperativity of nodes. Our observations may be applied to estimate the propensity of people (experts) to cooperate in the future based on their ranking values.
Submitted 19 January, 2019;
originally announced January 2019.
-
Investigating Cooperativity of Overlapping Community Structures in Social Networks
Authors:
Mohsen Shahriari,
Ralf Klamma,
Matthias Jarke
Abstract:
Many real-world networks can be modeled by networks of interacting agents. Analysis of these interactions can reveal fundamental properties of these networks. Estimating the amount of collaboration in a network corresponding to connections in a learning environment can reveal to what extent learners share their experience and knowledge with other learners. Alternatively, analyzing the network of interactions in an open source software project can reveal indicators of the efficiency of collaborations. One central problem in such domains is the low cooperativity of networks due to the low cooperativity of their respective communities. Administrators should therefore not only understand and predict the cooperativity of networks but also evaluate their respective community structures. To approach this issue, in this paper, we address two domains: open source software projects and learning forums. We calculate the amount of cooperativity in the corresponding networks and communities of these domains by applying several community detection algorithms. Moreover, we investigate the community properties and identify the significant properties for estimating network and community cooperativity. Correspondingly, we identify to what extent various community detection algorithms affect the identification of significant properties and the prediction of cooperativity. We also build binary and regression prediction models using the community properties. Our results and constructed models can be used to infer the cooperativity of community structures from their respective properties. When highly defective structures are predicted in networks, administrators can look for useful drivers to increase collaboration.
Submitted 19 January, 2019;
originally announced January 2019.
-
Development of Computer Science Disciplines - A Social Network Analysis Approach
Authors:
Manh Cuong Pham,
Ralf Klamma,
Matthias Jarke
Abstract:
In contrast to many other scientific disciplines, computer science places significant weight on conference publications. Conferences have the advantage of providing fast publication of papers and of bringing researchers together to present and discuss their papers with peers. Previous work on knowledge mapping focused on the map of all sciences or of a particular domain based on the Journal Citation Reports (JCR) published by ISI. Although this data covers most important journals, it lacks computer science conference and workshop proceedings, which results in an imprecise and incomplete analysis of computer science knowledge. This paper presents an analysis of the computer science knowledge network constructed from all types of publications, aiming to provide a complete view of computer science research. Based on the combination of two important digital libraries (DBLP and CiteSeerX), we study the knowledge network created at the journal/conference level using citation linkage to identify the development of sub-disciplines. We investigate the collaborative and citation behavior of journals/conferences by analyzing the properties of their co-authorship and citation subgraphs. The paper draws several important conclusions. First, conferences constitute social structures that shape computer science knowledge. Second, computer science is becoming more interdisciplinary. Third, experts are the key success factor for the sustainability of journals/conferences.
Submitted 10 March, 2011;
originally announced March 2011.
-
A Performance Evaluation of Mobile Web Services Security
Authors:
Satish Narayana Srirama,
Matthias Jarke,
Wolfgang Prinz
Abstract:
It is now feasible to host basic web services on a smart phone due to advances in wireless devices and mobile communication technologies. The market share of mobile web services has also increased significantly in recent years. While the applications are promising, the ability to provide secure and reliable communication in vulnerable and volatile mobile ad-hoc topologies is becoming increasingly necessary. Even though many standardized security specifications, such as WS-Security and SAML, exist for web services in wired networks, not much has been analyzed and standardized in wireless environments. In this paper, we give our analysis of adapting some of the security standards, especially WS-Security, to the cellular domain, with performance statistics. The performance latencies are obtained and analyzed while observing the performance and quality of service of our Mobile Host.
Submitted 21 July, 2010;
originally announced July 2010.
-
Security Aware Mobile Web Service Provisioning
Authors:
Satish Narayana Srirama,
Matthias Jarke,
Wolfgang Prinz,
Kiran Pendyala
Abstract:
Mobile data services in combination with profluent web services are seemingly the path-breaking domain in current information research. Effectively, these mobile web services will pave the way for exciting performance and security challenges, the core issues that need to be addressed. On the security front, though many standardized security specifications and implementations exist for web services in wired networks, not much has been analysed and standardized in wireless environments. This paper addresses some of the critical challenges in providing security to the mobile web service domain. We first explore mobile web services and their key security issues, with special focus on provisioning based on a mobile web service provider realized by us. We then discuss state-of-the-art security awareness in wired and wireless web services, and finally address the realization of security for mobile web service provisioning with performance analysis results.
Submitted 21 July, 2010;
originally announced July 2010.
-
Mobile Web Service Discovery in Peer to Peer Networks
Authors:
Satish Narayana Srirama,
Matthias Jarke,
Wolfgang Prinz
Abstract:
The advanced features of today's smart phones and handheld devices, like increased memory and processing capabilities, allow them to act even as information providers. Thus a smart phone hosting web services is no longer a fantasy. But the relevant discovery of the services provided by smart phones has become quite complex because of the volume of services possible, with each Mobile Host providing some services. Centralized registries have severe drawbacks in such a scenario, and alternative means of service discovery need to be addressed. The P2P domain, with its resource-sharing capabilities, comes in quite handy, and in this paper we provide an alternative to the UDDI registry for discovering mobile web services. The services are published into the P2P network as JXTA modules, and the discovery issues of these module advertisements are addressed. The approach also provides alternative means of identifying the Mobile Host.
Submitted 21 July, 2010;
originally announced July 2010.
-
A Mediation Framework for Mobile Web Service Provisioning
Authors:
Satish Narayana Srirama,
Matthias Jarke,
Wolfgang Prinz
Abstract:
Web Services and mobile data services are the newest trends in information systems engineering in the wired and wireless domains, respectively. Web Services have a broad range of service distributions, while mobile phones have a large and expanding user base. To address the confluence of Web Services and pervasive mobile devices and communication environments, a basic mobile Web Service provider was developed for smart phones. The performance of this Mobile Host was also analyzed in detail. Further analysis of the Mobile Host, aimed at providing proper QoS and checking the Mobile Host's feasibility in P2P networks, identified the necessity of a mediation framework. The paper describes the research conducted with the Mobile Host, identifies the tasks of the mediation framework and then discusses the feasible realization details of such a mobile Web Services mediation framework.
Submitted 18 July, 2010;
originally announced July 2010.