Showing 1–22 of 22 results for author: Deep, S

  1. arXiv:2504.12251  [pdf, other]

    cs.DB

    An Evaluation of N-Gram Selection Strategies for Regular Expression Indexing in Contemporary Text Analysis Tasks

    Authors: Ling Zhang, Shaleen Deep, Jignesh M. Patel, Karthikeyan Sankaralingam

    Abstract: Efficient evaluation of regular expressions (regex, for short) is crucial for text analysis, and n-gram indexes are fundamental to achieving fast regex evaluation performance. However, these indexes face scalability challenges because of the exponential number of possible n-grams that must be indexed. Many existing selection strategies, developed decades ago, have not been rigorously evaluated on…

    Submitted 16 April, 2025; originally announced April 2025.

  2. arXiv:2406.07847  [pdf, other]

    cs.DB

    Output-sensitive Conjunctive Query Evaluation

    Authors: Shaleen Deep, Hangdong Zhao, Austen Z. Fan, Paraschos Koutris

    Abstract: Join evaluation is one of the most fundamental operations performed by database systems and arguably the most well-studied problem in the Database community. A staggering number of join algorithms have been developed, and commercial database engines use finely tuned join heuristics that take into account many factors including the selectivity of predicates, memory, IO, etc. However, most of the re…

    Submitted 23 October, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: 24 pages, accepted to PODS'2025

  3. arXiv:2403.12436  [pdf, ps, other]

    cs.DB

    Evaluating Datalog over Semirings: A Grounding-based Approach

    Authors: Hangdong Zhao, Shaleen Deep, Paraschos Koutris, Sudeepa Roy, Val Tannen

    Abstract: Datalog is a powerful yet elegant language that allows expressing recursive computation. Although Datalog evaluation has been extensively studied in the literature, so far, only loose upper bounds are known on how fast a Datalog program can be evaluated. In this work, we ask the following question: given a Datalog program over a naturally-ordered semiring $σ$, what is the tightest possible runtime…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: To appear at PODS 2024

  4. arXiv:2311.04824  [pdf, other]

    cs.DB cs.DC cs.PL

    Multi-Relational Algebra and Its Applications to Data Insights

    Authors: Xi Wu, Zichen Zhu, Xiangyao Yu, Shaleen Deep, Stratis Viglas, John Cieslewicz, Somesh Jha, Jeffrey F. Naughton

    Abstract: A range of data insight analytical tasks involves analyzing a large set of tables of different schemas, possibly induced by various groupings, to find salient patterns. This paper presents Multi-Relational Algebra, an extension of the classic Relational Algebra, to facilitate such transformations and their compositions. Multi-Relational Algebra has two main characteristics: (1) Information Unit. T…

    Submitted 29 September, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

  5. arXiv:2310.00815  [pdf]

    cs.DB

    ReAcTable: Enhancing ReAct for Table Question Answering

    Authors: Yunjia Zhang, Jordan Henkel, Avrilia Floratou, Joyce Cahoon, Shaleen Deep, Jignesh M. Patel

    Abstract: Table Question Answering (TQA) presents a substantial challenge at the intersection of natural language processing and data analytics. This task involves answering natural language (NL) questions on top of tabular data, demanding proficiency in logical reasoning, understanding of data semantics, and fundamental analytical capabilities. Due to its significance, a substantial volume of research has…

    Submitted 1 October, 2023; originally announced October 2023.

  6. arXiv:2309.12436  [pdf, other]

    cs.DB

    Rapidash: Efficient Constraint Discovery via Rapid Verification

    Authors: Zifan Liu, Shaleen Deep, Anna Fariha, Fotis Psallidas, Ashish Tiwari, Avrilia Floratou

    Abstract: Denial Constraint (DC) is a well-established formalism that captures a wide range of integrity constraints commonly encountered, including candidate keys, functional dependencies, and ordering constraints, among others. Given their significance, there has been considerable research interest in achieving fast verification and discovery of exact DCs within the database community. Despite the signifi…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: comments and suggestions are welcome!

  7. arXiv:2308.09284  [pdf, ps, other]

    cs.FL

    The Fine-Grained Complexity of CFL Reachability

    Authors: Paraschos Koutris, Shaleen Deep

    Abstract: Many problems in static program analysis can be modeled as the context-free language (CFL) reachability problem on directed labeled graphs. The CFL reachability problem can be generally solved in time $O(n^3)$, where $n$ is the number of vertices in the graph, with some specific cases that can be solved faster. In this work, we ask the following question: given a specific CFL, what is the exact ex…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Appeared in POPL 2023. Please note the erratum on the first page

  8. arXiv:2305.01598  [pdf, other]

    cs.DB cs.AI cs.HC

    From Words to Code: Harnessing Data for Program Synthesis from Natural Language

    Authors: Anirudh Khatry, Joyce Cahoon, Jordan Henkel, Shaleen Deep, Venkatesh Emani, Avrilia Floratou, Sumit Gulwani, Vu Le, Mohammad Raza, Sherry Shi, Mukul Singh, Ashish Tiwari

    Abstract: Creating programs to correctly manipulate data is a difficult task, as the underlying programming languages and APIs can be challenging to learn for many users who are not skilled programmers. Large language models (LLMs) demonstrate remarkable potential for generating code from natural language, but in the data manipulation domain, apart from the natural language (NL) description of the intended…

    Submitted 3 May, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: 14 pages

  9. Space-Time Tradeoffs for Conjunctive Queries with Access Patterns

    Authors: Hangdong Zhao, Shaleen Deep, Paraschos Koutris

    Abstract: In this paper, we investigate space-time tradeoffs for answering conjunctive queries with access patterns (CQAPs). The goal is to create a space-efficient data structure in an initial preprocessing phase and use it for answering (multiple) queries in an online phase. Previous work has developed data structures that trade off space usage for answering time for queries of practical interest, such a…

    Submitted 2 May, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

  10. arXiv:2302.00120  [pdf, other]

    cs.DB cs.DC cs.PL

    Holistic Cube Analysis: A Query Framework for Data Insights

    Authors: Xi Wu, Shaleen Deep, Joe Benassi, Fengan Li, Yaqi Zhang, Uyeong Jang, James Foster, Stella Kim, Yujing Sun, Long Nguyen, Stratis Viglas, Somesh Jha, John Cieslewicz, Jeffrey F. Naughton

    Abstract: Many data insight questions can be viewed as searching in a large space of tables and finding important ones, where the notion of importance is defined in some ad hoc, user-defined manner. This paper presents Holistic Cube Analysis (HoCA), a framework that augments the capabilities of relational queries for such problems. HoCA first augments the relational data model and introduces a new data type A…

    Submitted 1 July, 2023; v1 submitted 31 January, 2023; originally announced February 2023.

    Comments: Establishing initial concepts of HoCA

  11. arXiv:2201.05566  [pdf, other]

    cs.DB cs.DS

    Ranked Enumeration of Join Queries with Projections

    Authors: Shaleen Deep, Xiao Hu, Paraschos Koutris

    Abstract: Join query evaluation with ordering is a fundamental data processing task in relational database management systems. SQL and custom graph query languages such as Cypher offer this functionality by allowing users to specify the order via the ORDER BY clause. In many scenarios, the users also want to see the first $k$ results quickly (expressed by the LIMIT clause), but the value of $k$ is not prede…

    Submitted 22 January, 2022; v1 submitted 14 January, 2022; originally announced January 2022.

    Comments: Accepted at VLDB 2022. Comments and suggestions are always welcome

  12. arXiv:2109.10889  [pdf, ps, other]

    cs.DS cs.DB

    General Space-Time Tradeoffs via Relational Queries

    Authors: Shaleen Deep, Xiao Hu, Paraschos Koutris

    Abstract: In this paper, we investigate space-time tradeoffs for answering Boolean conjunctive queries. The goal is to create a data structure in an initial preprocessing phase and use it for answering (multiple) queries. Previous work has developed data structures that trade off space usage for answering time and has proved conditional space lower bounds for queries of practical interest such as the path a…

    Submitted 13 August, 2023; v1 submitted 22 September, 2021; originally announced September 2021.

    Comments: Appeared in WADS 2023. Comments and suggestions are always welcome

  13. arXiv:2101.03712  [pdf, other]

    cs.DB cs.DS

    Enumeration Algorithms for Conjunctive Queries with Projection

    Authors: Shaleen Deep, Xiao Hu, Paraschos Koutris

    Abstract: We investigate the enumeration of query results for an important subset of CQs with projections, namely star and path queries. The task is to design data structures and algorithms that allow for efficient enumeration with delay guarantees after a preprocessing phase. Our main contribution is a series of results based on the idea of interleaving precomputed output with further join processing to ma…

    Submitted 26 May, 2025; v1 submitted 11 January, 2021; originally announced January 2021.

    Comments: Accepted journal version for LMCS

  14. arXiv:2011.05549  [pdf, other]

    cs.DB

    Comprehensive and Efficient Workload Compression

    Authors: Shaleen Deep, Anja Gruenheid, Paraschos Koutris, Jeffrey Naughton, Stratis Viglas

    Abstract: This work studies the problem of constructing a representative workload from a given input analytical query workload where the former serves as an approximation with guarantees of the latter. We discuss our work in the context of workload analysis and monitoring. As an example, evolving system usage patterns in a database system can cause load imbalance and performance regressions which can be con…

    Submitted 3 February, 2021; v1 submitted 11 November, 2020; originally announced November 2020.

  15. arXiv:2002.12459  [pdf, other]

    cs.DB

    Fast Join Project Query Evaluation using Matrix Multiplication

    Authors: Shaleen Deep, Xiao Hu, Paraschos Koutris

    Abstract: In the last few years, much effort has been devoted to developing join algorithms in order to achieve worst-case optimality for join queries over relational databases. Towards this end, the database community has had considerable success in developing succinct algorithms that achieve worst-case optimal runtime for full join queries, i.e., the join is over all variables present in the input database.…

    Submitted 27 February, 2020; originally announced February 2020.

  16. arXiv:2002.02154  [pdf, other]

    cs.CL

    Related Tasks can Share! A Multi-task Framework for Affective language

    Authors: Kumar Shikhar Deep, Md Shad Akhtar, Asif Ekbal, Pushpak Bhattacharyya

    Abstract: Expressing the polarity of sentiment as 'positive' and 'negative' usually has limited scope compared with the intensity/degree of polarity. These two tasks (i.e. sentiment classification and sentiment intensity prediction) are closely related and may offer assistance to each other during the learning process. In this paper, we propose to leverage the relatedness of multiple tasks in a multi-task…

    Submitted 6 February, 2020; originally announced February 2020.

    Comments: 12 pages, 3 figures and 3 tables. Accepted in 20th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2019. To be published in Springer LNCS volume

    ACM Class: I.2.7

  17. A Deep Neural Framework for Contextual Affect Detection

    Authors: Kumar Shikhar Deep, Asif Ekbal, Pushpak Bhattacharyya

    Abstract: A short and simple text carrying no emotion can represent some strong emotions when read along with its context, i.e., the same sentence can express extreme anger as well as happiness depending on its context. In this paper, we propose a Contextual Affect Detection (CAD) framework which learns the inter-dependence of words in a sentence, and at the same time the inter-dependence of sentences in…

    Submitted 28 January, 2020; originally announced January 2020.

    Comments: 12 pages, 5 tables and 3 figures. Accepted in ICONIP 2019 (International Conference on Neural Information Processing) Published in Lecture Notes in Computer Science, vol 11955. Springer, Cham https://link.springer.com/chapter/10.1007/978-3-030-36718-3_34

    ACM Class: I.2.7

    Journal ref: LNCS 11955 (2019) 398-409

  18. arXiv:1909.00845  [pdf, other]

    cs.DB

    Revenue Maximization for Query Pricing

    Authors: Shuchi Chawla, Shaleen Deep, Paraschos Koutris, Yifeng Teng

    Abstract: Buying and selling of data online has increased substantially over the last few years. Several frameworks have already been proposed that study query pricing in theory and practice. The key guiding principle in these works is the notion of arbitrage-freeness where the broker can set different prices for different queries made to the dataset, but must ensure that the pricing function does not…

    Submitted 9 September, 2019; v1 submitted 2 September, 2019; originally announced September 2019.

    Comments: To appear in PVLDB; version 2 with some cosmetic changes

  19. arXiv:1903.00846  [pdf, other]

    cs.NI cs.CR

    A survey of security and privacy issues in the Internet of Things from the layered context

    Authors: Samundra Deep, Xi Zheng, Alireza Jolfaei, Dongjin Yu, Pouya Ostovari, Ali Kashif Bashir

    Abstract: Internet of Things (IoT) is a novel paradigm, which not only facilitates a large number of devices to be ubiquitously connected over the Internet but also provides a mechanism to remotely control these devices. The IoT is pervasive and is almost an integral part of our daily life. As devices are becoming increasingly connected, privacy and security issues become more and more critical and these ne…

    Submitted 24 February, 2020; v1 submitted 3 March, 2019; originally announced March 2019.

  20. Ranked Enumeration of Conjunctive Query Results

    Authors: Shaleen Deep, Paraschos Koutris

    Abstract: We study the problem of enumerating answers of Conjunctive Queries ranked according to a given ranking function. Our main contribution is a novel algorithm with small preprocessing time, logarithmic delay, and non-trivial space usage during execution. To allow for efficient enumeration, we exploit certain properties of ranking functions that frequently occur in practice. To this end, we introduce…

    Submitted 15 May, 2025; v1 submitted 7 February, 2019; originally announced February 2019.

    Journal ref: Logical Methods in Computer Science, Volume 21, Issue 2 (May 16, 2025) lmcs:8638

  21. arXiv:1709.06186  [pdf, ps, other]

    cs.DB

    Compressed Representations of Conjunctive Query Results

    Authors: Shaleen Deep, Paraschos Koutris

    Abstract: Relational queries, and in particular join queries, often generate large output results when executed over a huge dataset. In such cases, it is often infeasible to store the whole materialized output if we plan to reuse it further down a data processing pipeline. Motivated by this problem, we study the construction of space-efficient compressed representations of the output of conjunctive queries,…

    Submitted 27 March, 2018; v1 submitted 18 September, 2017; originally announced September 2017.

    Comments: To appear in PODS'18; 35 pages; comments welcome

  22. arXiv:1606.09376  [pdf, ps, other]

    cs.DB cs.GT

    The Design of Arbitrage-Free Data Pricing Schemes

    Authors: Shaleen Deep, Paraschos Koutris

    Abstract: Motivated by a growing market that involves buying and selling data over the web, we study pricing schemes that assign value to queries issued over a database. Previous work studied pricing mechanisms that compute the price of a query by extending a data seller's explicit prices on certain queries, or investigated the properties that a pricing function should exhibit without detailing a generic co…

    Submitted 30 June, 2016; originally announced June 2016.

    Comments: full paper