-
Adaptive Indexing for Approximate Query Processing in Exploratory Data Analysis
Authors:
Stavros Maroulis,
Nikos Bikakis,
Vassilis Stamatopoulos,
George Papastefanatos
Abstract:
Minimizing data-to-analysis time while enabling real-time interaction and efficient analytical computations on large datasets are fundamental objectives of contemporary exploratory systems. Although some of the recent adaptive indexing and on-the-fly processing approaches address most of these needs, there are cases where they do not guarantee reliable performance. Examples of such cases include: exploring areas with a high density of objects; executing the first exploratory queries or exploring previously unseen areas (where the index has not yet adapted sufficiently); and working with very large data files on commodity hardware, such as low-specification laptops. In such demanding cases, approximate and incremental techniques can be exploited to ensure efficiency and scalability by allowing users to prioritize response time over result accuracy, acknowledging that exact results are not always necessary. Therefore, approximation mechanisms that enable smooth user interaction by defining the trade-off between accuracy and performance based on vital factors (e.g., task, preferences, available resources) are of great importance. In this work, we present an adaptive approximate query processing framework for interactive on-the-fly analysis (without a preprocessing phase) over large raw data. The core component of the framework is a main-memory adaptive indexing scheme (VALINOR-A) that interoperates with user-driven sampling and incremental aggregation computations. Additionally, an effective error-bounded approximation strategy is designed and integrated into query processing. We conduct extensive experiments using both real and synthetic datasets, demonstrating the efficiency and effectiveness of the proposed framework.
Submitted 26 May, 2025;
originally announced May 2025.
-
Visual Analytics Challenges and Trends in the Age of AI: The BigVis Community Perspective
Authors:
Nikos Bikakis,
Panos K. Chrysanthis,
Guoliang Li,
George Papastefanatos,
Lingyun Yu
Abstract:
This report provides insights into the challenges, emerging topics, and opportunities related to human-data interaction and visual analytics in the AI era. The BigVis 2024 organizing committee conducted a survey among experts in the field, inviting the Program Committee members and the authors of accepted papers to share their views. Thirty-two scientists from diverse research communities, including Databases, Information Visualization, and Human-Computer Interaction, participated in the study. These scientists, representing both industry and academia, provided valuable insights into the current and future landscape of the field.
In this report, we analyze the survey responses and compare them to the findings of a similar study conducted four years ago. The results reveal some interesting insights. First, many of the critical challenges identified in the previous survey remain highly relevant today, despite being unrelated to AI. Meanwhile, the field's landscape has significantly evolved, with most of today's vital challenges not even being mentioned in the earlier survey, underscoring the profound impact of AI-related advancements.
By summarizing the perspectives of the research community, this report aims to shed light on the key challenges, emerging trends, and potential research directions in human-data interaction and visual analytics in the AI era.
Submitted 30 April, 2025;
originally announced April 2025.
-
Partial Adaptive Indexing for Approximate Query Answering
Authors:
Stavros Maroulis,
Nikos Bikakis,
Vassilis Stamatopoulos,
George Papastefanatos
Abstract:
In data exploration, users need to analyze large data files quickly, aiming to minimize data-to-analysis time. While recent adaptive indexing approaches address this need, there are cases where they demonstrate poor performance, particularly during the initial queries, in regions with a high density of objects, and in very large files over commodity hardware. This work introduces an approach for adaptive indexing driven by both query workload and user-defined accuracy constraints to support approximate query answering. The approach is based on partial index adaptation, which reduces the costs associated with reading data files and refining indexes. We leverage a hierarchical tile-based indexing scheme and its stored metadata to provide efficient query evaluation, ensuring accuracy within user-specified bounds. Our preliminary evaluation demonstrates improvement in query evaluation time, especially during initial user exploration.
Submitted 26 July, 2024;
originally announced July 2024.
-
Attendance Maximization for Successful Social Event Planning
Authors:
Nikos Bikakis,
Vana Kalogeraki,
Dimitrios Gunopulos
Abstract:
Social event planning has received a great deal of attention in recent years, as various entities, such as event planners and marketing companies, organizations, venues, or users in Event-based Social Networks, organize numerous social events (e.g., festivals, conferences, promotion parties). Recent studies show that "attendance" is the most common metric used to capture the success of social events, since the number of attendees has great impact on the event's expected gains (e.g., revenue, artist/brand publicity). In this work, we study the Social Event Scheduling (SES) problem, which aims at identifying and assigning social events to appropriate time slots, so that the number of event attendees is maximized. We show that, even in highly restricted instances, the SES problem is NP-hard to be approximated over a factor. To solve the SES problem, we design three efficient and scalable algorithms. These algorithms exploit several novel schemes that we design. We conduct extensive experiments using several real and synthetic datasets, and demonstrate that the proposed algorithms perform on average half the computations compared to the existing solution and, in several cases, are 3-5 times faster.
Submitted 28 November, 2018;
originally announced November 2018.
-
Social Event Scheduling
Authors:
Nikos Bikakis,
Vana Kalogeraki,
Dimitrios Gunopulos
Abstract:
A major challenge for social event organizers (e.g., event planning and marketing companies, venues) is attracting the maximum number of participants, since it has great impact on the success of the event, and, consequently, the expected gains (e.g., revenue, artist/brand publicity). In this paper, we introduce the Social Event Scheduling (SES) problem, which schedules a set of social events considering user preferences and behavior, events' spatiotemporal conflicts, and competing events, in order to maximize the overall number of attendees. We show that SES is strongly NP-hard, even in highly restricted instances. To cope with the hardness of the SES problem, we design a greedy approximation algorithm. Finally, we evaluate our method experimentally using a dataset from the Meetup event-based social network.
Submitted 6 March, 2018; v1 submitted 30 January, 2018;
originally announced January 2018.
-
Big Data Visualization Tools
Authors:
Nikos Bikakis
Abstract:
Data visualization and analytics are nowadays among the cornerstones of Data Science, turning the abundance of Big Data being produced through modern systems into actionable knowledge. Indeed, the Big Data era has realized the availability of voluminous datasets that are dynamic, noisy and heterogeneous in nature. Transforming a data-curious user into someone who can access and analyze that data is even more burdensome now for a great number of users with little or no support and expertise on the data processing part. Thus, the area of data visualization and analysis has gained great attention recently, calling for joint action from different research areas and communities such as information visualization, data management and mining, human-computer interaction, and computer graphics. This article presents the limitations of traditional visualization systems in the Big Data era. Additionally, it discusses the major prerequisites and challenges that should be addressed by modern visualization systems. Finally, the state-of-the-art methods that have been developed in the context of Big Data visualization and analytics are presented, considering methods from the Data Management and Mining, Information Visualization and Human-Computer Interaction communities.
Submitted 19 November, 2023; v1 submitted 25 January, 2018;
originally announced January 2018.
-
The XML and Semantic Web Worlds: Technologies, Interoperability and Integration. A Survey of the State of the Art
Authors:
Nikos Bikakis,
Chrisa Tsinaraki,
Nektarios Gioldasis,
Ioannis Stavrakantonakis,
Stavros Christodoulakis
Abstract:
In the context of the emergent Web of Data, a large number of organizations, institutes and companies (e.g., DBpedia, Geonames, PubMed, ACM, IEEE, NASA, BBC) adopt the Linked Data practices and publish their data utilizing Semantic Web (SW) technologies. On the other hand, the dominant standard for information exchange in the Web today is XML. Many international standards (e.g., Dublin Core, MPEG-7, METS, TEI, IEEE LOM) have been expressed in XML Schema, resulting in a large number of XML datasets. The SW and XML worlds and their developed infrastructures are based on different data models, semantics and query languages. Thus, it is crucial to provide interoperability and integration mechanisms to bridge the gap between the SW and XML worlds. In this chapter, we give an overview and a comparison of the technologies and the standards adopted by the XML and SW worlds. In addition, we outline the latest efforts from the W3C groups, including the latest working drafts and recommendations (e.g., OWL 2, SPARQL 1.1, XML Schema 1.1). Moreover, we present a survey of the research approaches which aim to provide interoperability and integration between the XML and SW worlds. Finally, we present the SPARQL2XQuery and XS2OWL Frameworks, which bridge the gap and create an interoperable environment between the two worlds. These Frameworks provide mechanisms for: (a) Query translation (SPARQL to XQuery translation); (b) Mapping specification and generation (Ontology to XML Schema mapping); and (c) Schema transformation (XML Schema to OWL transformation).
Submitted 11 August, 2016;
originally announced August 2016.
-
graphVizdb: A Scalable Platform for Interactive Large Graph Visualization
Authors:
Nikos Bikakis,
John Liagouris,
Maria Krommyda,
George Papastefanatos,
Timos Sellis
Abstract:
We present a novel platform for the interactive visualization of very large graphs. The platform enables the user to interact with the visualized graph in a way that is very similar to the exploration of maps at multiple levels. Our approach involves an offline preprocessing phase that builds the layout of the graph by assigning coordinates to its nodes with respect to a Euclidean plane. The respective points are indexed with a spatial data structure, i.e., an R-tree, and stored in a database. Multiple abstraction layers of the graph based on various criteria are also created offline, and they are indexed similarly so that the user can explore the dataset at different levels of granularity, depending on her particular needs. Then, our system translates user operations into simple and very efficient spatial operations (i.e., window queries) in the backend. This technique allows for a fine-grained access to very large graphs with extremely low latency and memory requirements and without compromising the functionality of the tool. Our web-based prototype supports three main operations: (1) interactive navigation, (2) multi-level exploration, and (3) keyword search on the graph metadata.
Submitted 20 February, 2016;
originally announced February 2016.
-
Exploration and Visualization in the Web of Big Linked Data: A Survey of the State of the Art
Authors:
Nikos Bikakis,
Timos Sellis
Abstract:
Data exploration and visualization systems are of great importance in the Big Data era. Exploring and visualizing very large datasets has become a major research challenge, of which scalability is a vital requirement. In this survey, we describe the major prerequisites and challenges that should be addressed by modern exploration and visualization systems. Considering these challenges, we present how state-of-the-art approaches from the Database and Information Visualization communities attempt to handle them. Finally, we survey the systems developed by the Semantic Web community in the context of the Web of Linked Data, and discuss to what extent these satisfy the contemporary requirements.
Submitted 29 January, 2016;
originally announced January 2016.
-
A Hierarchical Aggregation Framework for Efficient Multilevel Visual Exploration and Analysis
Authors:
Nikos Bikakis,
George Papastefanatos,
Melina Skourla,
Timos Sellis
Abstract:
Data exploration and visualization systems are of great importance in the Big Data era, in which the volume and heterogeneity of available information make it difficult for humans to manually explore and analyse data. Most traditional systems operate in an offline way, limited to accessing preprocessed (static) sets of data. They also restrict themselves to dealing with small dataset sizes, which can be easily handled with conventional techniques. However, the Big Data era has realized the availability of a great amount and variety of big datasets that are dynamic in nature; most of them offer APIs or query endpoints for online access, or the data is received in a streaming fashion. Therefore, modern systems must address the challenge of on-the-fly scalable visualizations over large dynamic sets of data, offering efficient exploration techniques, as well as mechanisms for information abstraction and summarization. In this work, we present a generic model for personalized multilevel exploration and analysis over large dynamic sets of numeric and temporal data. Our model is built on top of a lightweight tree-based structure which can be efficiently constructed on-the-fly for a given set of data. This tree structure aggregates input objects into a hierarchical multiscale model. Considering different exploration scenarios over large datasets, the proposed model enables efficient multilevel exploration, offering incremental construction and prefetching via user interaction, and dynamic adaptation of the hierarchies based on user preferences. A thorough theoretical analysis is presented, illustrating the efficiency of the proposed model. The proposed model is realized in a web-based prototype tool, called SynopsViz, that offers multilevel visual exploration and analysis over Linked Data datasets.
Submitted 19 February, 2016; v1 submitted 15 November, 2015;
originally announced November 2015.
-
Finding Desirable Objects under Group Categorical Preferences
Authors:
Nikos Bikakis,
Karim Benouaret,
Dimitris Sacharidis
Abstract:
Considering a group of users, each specifying individual preferences over categorical attributes, the problem of determining a set of objects that are objectively preferable by all users is challenging on two levels. First, we need to determine the preferable objects based on the categorical preferences for each user, and second we need to reconcile possible conflicts among users' preferences. A naive solution would first assign degrees of match between each user and each object, by taking into account all categorical attributes, and then for each object combine these matching degrees across users to compute the total score of an object. Such an approach, however, performs two series of aggregation, among categorical attributes and then across users, which completely obscure and blur individual preferences. Our solution, instead of combining individual matching degrees, is to directly operate on categorical attributes, and define an objective Pareto-based aggregation for group preferences. Building on our interpretation, we tackle two distinct but relevant problems: finding the Pareto-optimal objects, and objectively ranking objects with respect to the group preferences. To increase the efficiency when dealing with categorical attributes, we introduce an elegant transformation of categorical attribute values into numerical values, which exhibits certain nice properties and allows us to use well-known index structures to accelerate the solutions to the two problems. In fact, experiments on real and synthetic data show that our index-based techniques are an order of magnitude faster than baseline approaches, scaling up to millions of objects and thousands of users.
Submitted 29 September, 2015;
originally announced September 2015.
-
Towards Scalable Visual Exploration of Very Large RDF Graphs
Authors:
Nikos Bikakis,
John Liagouris,
Maria Krommyda,
George Papastefanatos,
Timos Sellis
Abstract:
In this paper, we outline our work on developing a disk-based infrastructure for efficient visualization and graph exploration operations over very large graphs. The proposed platform, called graphVizdb, is based on a novel technique for indexing and storing the graph. Particularly, the graph layout is indexed with a spatial data structure, i.e., an R-tree, and stored in a database. At runtime, user operations are translated into efficient spatial operations (i.e., window queries) in the backend.
Submitted 16 June, 2015; v1 submitted 13 June, 2015;
originally announced June 2015.
-
rdf:SynopsViz - A Framework for Hierarchical Linked Data Visual Exploration and Analysis
Authors:
Nikos Bikakis,
Melina Skourla,
George Papastefanatos
Abstract:
The purpose of data visualization is to offer intuitive ways for information perception and manipulation, especially for non-expert users. The Web of Data has realized the availability of a huge number of datasets. However, the volume and heterogeneity of available information make it difficult for humans to manually explore and analyse large datasets. In this paper, we present rdf:SynopsViz, a tool for hierarchical charting and visual exploration of Linked Open Data (LOD). Hierarchical LOD exploration is based on the creation of multiple levels of hierarchically related groups of resources based on the values of one or more properties. The adopted hierarchical model provides effective information abstraction and summarization. Also, it allows efficient on-the-fly statistical computations, using aggregations over the hierarchy levels.
Submitted 27 June, 2017; v1 submitted 13 August, 2014;
originally announced August 2014.
-
Supporting SPARQL Update Queries in RDF-XML Integration
Authors:
Nikos Bikakis,
Chrisa Tsinaraki,
Ioannis Stavrakantonakis,
Stavros Christodoulakis
Abstract:
The Web of Data encourages organizations and companies to publish their data according to the Linked Data practices and offer SPARQL endpoints. On the other hand, the dominant standard for information exchange is XML. The SPARQL2XQuery Framework focuses on the automatic translation of SPARQL queries into XQuery expressions in order to access XML data across the Web. In this paper, we outline our ongoing work on supporting update queries in the RDF-XML integration scenario.
Submitted 27 August, 2014; v1 submitted 12 August, 2014;
originally announced August 2014.
-
The SPARQL2XQuery Interoperability Framework. Utilizing Schema Mapping, Schema Transformation and Query Translation to Integrate XML and the Semantic Web
Authors:
Nikos Bikakis,
Chrisa Tsinaraki,
Ioannis Stavrakantonakis,
Nektarios Gioldasis,
Stavros Christodoulakis
Abstract:
The Web of Data is an open environment consisting of a great number of large inter-linked RDF datasets from various domains. In this environment, organizations and companies adopt the Linked Data practices utilizing Semantic Web (SW) technologies, in order to publish their data and offer SPARQL endpoints (i.e., SPARQL-based search services). On the other hand, the dominant standard for information exchange in the Web today is XML. The SW and XML worlds and their developed infrastructures are based on different data models, semantics and query languages. Thus, it is crucial to develop interoperability mechanisms that allow the Web of Data users to access XML datasets, using SPARQL, from their own working environments. It is unrealistic to expect that all the existing legacy data (e.g., Relational, XML, etc.) will be transformed into SW data. Therefore, publishing legacy data as Linked Data and providing SPARQL endpoints over them has become a major research challenge. In this direction, we introduce the SPARQL2XQuery Framework which creates an interoperable environment, where SPARQL queries are automatically translated to XQuery queries, in order to access XML data across the Web. The SPARQL2XQuery Framework provides a mapping model for the expression of OWL-RDF/S to XML Schema mappings as well as a method for SPARQL to XQuery translation. To this end, our Framework supports both manual and automatic mapping specification between ontologies and XML Schemas. In the automatic mapping specification scenario, the SPARQL2XQuery exploits the XS2OWL component which transforms XML Schemas into OWL ontologies. Finally, extensive experiments have been conducted in order to evaluate the schema transformation, mapping generation, query translation and query evaluation efficiency, using both real and synthetic datasets.
Submitted 1 January, 2014; v1 submitted 3 November, 2013;
originally announced November 2013.
-
Publishing Life Science Data as Linked Open Data: the Case Study of miRBase
Authors:
Theodore Dalamagas,
Nikos Bikakis,
George Papastefanatos,
Yannis Stavrakas,
Artemis G. Hatzigeorgiou
Abstract:
This paper presents our Linked Open Data (LOD) infrastructures for genomic and experimental data related to microRNA biomolecules. Legacy data from two well-known microRNA databases with experimental data and observations, as well as change and version information about microRNA entities, are fused and exported as LOD. Our LOD server helps biologists explore biological entities and their evolution, and provides a SPARQL endpoint for applications and services to query historical miRNA data and track changes, their causes and effects.
Submitted 10 May, 2012;
originally announced May 2012.