Showing 1–10 of 10 results for author: Floratou, A

  1. arXiv:2310.00815

    cs.DB

    ReAcTable: Enhancing ReAct for Table Question Answering

    Authors: Yunjia Zhang, Jordan Henkel, Avrilia Floratou, Joyce Cahoon, Shaleen Deep, Jignesh M. Patel

    Abstract: Table Question Answering (TQA) presents a substantial challenge at the intersection of natural language processing and data analytics. This task involves answering natural language (NL) questions on top of tabular data, demanding proficiency in logical reasoning, understanding of data semantics, and fundamental analytical capabilities. Due to its significance, a substantial volume of research has…

    Submitted 1 October, 2023; originally announced October 2023.

  2. arXiv:2309.12436

    cs.DB

    Rapidash: Efficient Constraint Discovery via Rapid Verification

    Authors: Zifan Liu, Shaleen Deep, Anna Fariha, Fotis Psallidas, Ashish Tiwari, Avrilia Floratou

    Abstract: Denial Constraint (DC) is a well-established formalism that captures a wide range of integrity constraints commonly encountered, including candidate keys, functional dependencies, and ordering constraints, among others. Given their significance, there has been considerable research interest in achieving fast verification and discovery of exact DCs within the database community. Despite the signifi…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: comments and suggestions are welcome!

  3. arXiv:2305.01598

    cs.DB cs.AI cs.HC

    From Words to Code: Harnessing Data for Program Synthesis from Natural Language

    Authors: Anirudh Khatry, Joyce Cahoon, Jordan Henkel, Shaleen Deep, Venkatesh Emani, Avrilia Floratou, Sumit Gulwani, Vu Le, Mohammad Raza, Sherry Shi, Mukul Singh, Ashish Tiwari

    Abstract: Creating programs to correctly manipulate data is a difficult task, as the underlying programming languages and APIs can be challenging to learn for many users who are not skilled programmers. Large language models (LLMs) demonstrate remarkable potential for generating code from natural language, but in the data manipulation domain, apart from the natural language (NL) description of the intended…

    Submitted 3 May, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: 14 pages

  4. LST-Bench: Benchmarking Log-Structured Tables in the Cloud

    Authors: Jesús Camacho-Rodríguez, Ashvin Agrawal, Anja Gruenheid, Ashit Gosalia, Cristian Petculescu, Josep Aguilar-Saborit, Avrilia Floratou, Carlo Curino, Raghu Ramakrishnan

    Abstract: Data processing engines increasingly leverage distributed file systems for scalable, cost-effective storage. While the Apache Parquet columnar format has become a popular choice for data storage and retrieval, the immutability of Parquet files renders it impractical to meet the demands of frequent updates in contemporary analytical workloads. Log-Structured Tables (LSTs), such as Delta Lake, Apach…

    Submitted 19 January, 2024; v1 submitted 1 May, 2023; originally announced May 2023.

    Journal ref: Proceedings of the ACM on Management of Data (2024) Volume 2 Issue 1

  5. arXiv:2210.14047

    cs.DB

    OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance from Database Logs [Technical Report]

    Authors: Fotis Psallidas, Ashvin Agrawal, Chandru Sugunan, Khaled Ibrahim, Konstantinos Karanasos, Jesús Camacho-Rodríguez, Avrilia Floratou, Carlo Curino, Raghu Ramakrishnan

    Abstract: Provenance encodes information that connects datasets, their generation workflows, and associated metadata (e.g., who or when executed a query). As such, it is instrumental for a wide range of critical governance applications (e.g., observability and auditing). Unfortunately, in the context of database systems, extracting coarse-grained provenance is a long-standing problem due to the complexity a…

    Submitted 3 March, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

    ACM Class: H.2

  6. arXiv:2001.01861

    cs.LG cs.DC stat.ML

    Vamsa: Automated Provenance Tracking in Data Science Scripts

    Authors: Mohammad Hossein Namaki, Avrilia Floratou, Fotis Psallidas, Subru Krishnan, Ashvin Agrawal, Yinghui Wu, Yiwen Zhu, Markus Weimer

    Abstract: There has recently been a lot of ongoing research in the areas of fairness, bias and explainability of machine learning (ML) models due to the self-evident or regulatory requirements of various ML applications. We make the following observation: All of these approaches require a robust understanding of the relationship between ML models and the data used to train them. In this work, we introduce t…

    Submitted 30 July, 2020; v1 submitted 6 January, 2020; originally announced January 2020.

  7. arXiv:1912.09536

    cs.LG cs.DC stat.ML

    Data Science through the looking glass and what we found there

    Authors: Fotis Psallidas, Yiwen Zhu, Bojan Karlas, Matteo Interlandi, Avrilia Floratou, Konstantinos Karanasos, Wentao Wu, Ce Zhang, Subru Krishnan, Carlo Curino, Markus Weimer

    Abstract: The recent success of machine learning (ML) has led to an explosive growth both in terms of new systems and algorithms built in industry and academia, and new applications built by an ever-growing community of data science (DS) practitioners. This quickly shifting panorama of technologies and applications is challenging for builders and practitioners alike to follow. In this paper, we set out to c…

    Submitted 19 December, 2019; originally announced December 2019.

  8. arXiv:1909.00084

    cs.DB cs.DC cs.LG

    Cloudy with high chance of DBMS: A 10-year prediction for Enterprise-Grade ML

    Authors: Ashvin Agrawal, Rony Chatterjee, Carlo Curino, Avrilia Floratou, Neha Gowdal, Matteo Interlandi, Alekh Jindal, Konstantinos Karanasos, Subru Krishnan, Brian Kroth, Jyoti Leeka, Kwanghyun Park, Hiren Patel, Olga Poppe, Fotis Psallidas, Raghu Ramakrishnan, Abhishek Roy, Karla Saur, Rathijit Sen, Markus Weimer, Travis Wright, Yiwen Zhu

    Abstract: Machine learning (ML) has proven itself in high-value web applications such as search ranking and is emerging as a powerful tool in a much broader range of enterprise scenarios including voice recognition and conversational understanding for customer support, autotuning for videoconferencing, intelligent feedback loops in large-scale sysops, manufacturing and autonomous vehicle management, complex…

    Submitted 27 December, 2019; v1 submitted 30 August, 2019; originally announced September 2019.

  9. arXiv:1208.4166

    cs.DB

    Can the Elephants Handle the NoSQL Onslaught?

    Authors: Avrilia Floratou, Nikhil Teletia, David J. DeWitt, Jignesh M. Patel, Donghui Zhang

    Abstract: In this new era of "big data", traditional DBMSs are under attack from two sides. At one end of the spectrum, the use of document store NoSQL systems (e.g. MongoDB) threatens to move modern Web 2.0 applications away from traditional RDBMSs. At the other end of the spectrum, big data DSS analytics that used to be the domain of parallel RDBMSs is now under attack by another class of NoSQL data analy…

    Submitted 20 August, 2012; originally announced August 2012.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 12, pp. 1712-1723 (2012)

  10. arXiv:1105.4252

    cs.DB cs.DC

    Column-Oriented Storage Techniques for MapReduce

    Authors: Avrilia Floratou, Jignesh Patel, Eugene Shekita, Sandeep Tata

    Abstract: Users of MapReduce often run into performance problems when they scale up their workloads. Many of the problems they encounter can be overcome by applying techniques learned from over three decades of research on parallel DBMSs. However, translating these techniques to a MapReduce implementation such as Hadoop presents unique challenges that can lead to new design choices. This paper describes how…

    Submitted 21 May, 2011; originally announced May 2011.

    Comments: VLDB2011

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 4, No. 7, pp. 419-429 (2011)