Showing 1–16 of 16 results for author: Patel, J M

Searching in archive cs.
  1. arXiv:2508.05029  [pdf, ps, other]

    cs.DC cs.DB

    Theseus: A Distributed and Scalable GPU-Accelerated Query Processing Platform Optimized for Efficient Data Movement

    Authors: Felipe Aramburú, William Malpica, Kaouther Abrougui, Amin Aramoon, Romulo Auccapuclla, Claude Brisson, Matthijs Brobbel, Colby Farrell, Pradeep Garigipati, Joost Hoozemans, Supun Kamburugamuve, Akhil Nair, Alexander Ocsa, Johan Peltenburg, Rubén Quesada López, Deepak Sihag, Ahmet Uyar, Dhruv Vats, Michael Wendt, Jignesh M. Patel, Rodrigo Aramburú

    Abstract: Online analytical processing of queries on datasets in the many-terabyte range is only possible with costly distributed computing systems. To decrease the cost and increase the throughput, systems can leverage accelerators such as GPUs, which are now ubiquitous in the compute infrastructure. This introduces many challenges, the majority of which are related to when, where, and how to best move dat…

    Submitted 7 August, 2025; originally announced August 2025.

    Comments: 6 pages, 6 figures

    ACM Class: H.2.4

  2. arXiv:2507.18320  [pdf, ps, other]

    cs.LG

    State of Health Estimation of Batteries Using a Time-Informed Dynamic Sequence-Inverted Transformer

    Authors: Janak M. Patel, Milad Ramezankhani, Anirudh Deodhar, Dagnachew Birru

    Abstract: The rapid adoption of battery-powered vehicles and energy storage systems over the past decade has made battery health monitoring increasingly critical. Batteries play a central role in the efficiency and safety of these systems, yet they inevitably degrade over time due to repeated charge-discharge cycles. This degradation leads to reduced energy efficiency and potential overheating, posing signi…

    Submitted 24 July, 2025; originally announced July 2025.

    Comments: 11 pages, 3 figures

  3. arXiv:2506.13906  [pdf, ps, other]

    cs.LG

    GITO: Graph-Informed Transformer Operator for Learning Complex Partial Differential Equations

    Authors: Milad Ramezankhani, Janak M. Patel, Anirudh Deodhar, Dagnachew Birru

    Abstract: We present a novel graph-informed transformer operator (GITO) architecture for learning complex partial differential equation systems defined on irregular geometries and non-uniform meshes. GITO consists of two main modules: a hybrid graph transformer (HGT) and a transformer neural operator (TNO). HGT leverages a graph neural network (GNN) to encode local spatial relationships and a transformer to…

    Submitted 16 June, 2025; originally announced June 2025.

  4. arXiv:2504.12251  [pdf, ps, other]

    cs.DB

    An Evaluation of N-Gram Selection Strategies for Regular Expression Indexing in Contemporary Text Analysis Tasks. Extended Version

    Authors: Ling Zhang, Shaleen Deep, Jignesh M. Patel, Karthikeyan Sankaralingam

    Abstract: Efficient evaluation of regular expressions (regex, for short) is crucial for text analysis, and n-gram indexes are fundamental to achieving fast regex evaluation performance. However, these indexes face scalability challenges because of the exponential number of possible n-grams that must be indexed. Many existing selection strategies, developed decades ago, have not been rigorously evaluated on…

    Submitted 4 September, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

  5. arXiv:2504.11259  [pdf, ps, other]

    cs.DB

    The Cambridge Report on Database Research

    Authors: Anastasia Ailamaki, Samuel Madden, Daniel Abadi, Gustavo Alonso, Sihem Amer-Yahia, Magdalena Balazinska, Philip A. Bernstein, Peter Boncz, Michael Cafarella, Surajit Chaudhuri, Susan Davidson, David DeWitt, Yanlei Diao, Xin Luna Dong, Michael Franklin, Juliana Freire, Johannes Gehrke, Alon Halevy, Joseph M. Hellerstein, Mark D. Hill, Stratos Idreos, Yannis Ioannidis, Christoph Koch, Donald Kossmann, Tim Kraska, et al. (21 additional authors not shown)

    Abstract: On October 19 and 20, 2023, the authors of this report convened in Cambridge, MA, to discuss the state of the database research field, its recent accomplishments and ongoing challenges, and future directions for research and community engagement. This gathering continues a long standing tradition in the database community, dating back to the late 1980s, in which researchers meet roughly every five…

    Submitted 15 April, 2025; originally announced April 2025.

  6. Accelerated Gradient-based Design Optimization Via Differentiable Physics-Informed Neural Operator: A Composites Autoclave Processing Case Study

    Authors: Janak M. Patel, Milad Ramezankhani, Anirudh Deodhar, Dagnachew Birru

    Abstract: Simulation and optimization are crucial for advancing the engineering design of complex systems and processes. Traditional optimization methods require substantial computational time and effort due to their reliance on resource-intensive simulations, such as finite element analysis, and the complexity of rigorous optimization algorithms. Data-agnostic AI-based surrogate models, such as Physics-Inf…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: 15 pages, 7 figures

    Journal ref: j.compositesb. 1359-8368 (2025) 112935

  7. arXiv:2310.00815  [pdf]

    cs.DB

    ReAcTable: Enhancing ReAct for Table Question Answering

    Authors: Yunjia Zhang, Jordan Henkel, Avrilia Floratou, Joyce Cahoon, Shaleen Deep, Jignesh M. Patel

    Abstract: Table Question Answering (TQA) presents a substantial challenge at the intersection of natural language processing and data analytics. This task involves answering natural language (NL) questions on top of tabular data, demanding proficiency in logical reasoning, understanding of data semantics, and fundamental analytical capabilities. Due to its significance, a substantial volume of research has…

    Submitted 1 October, 2023; originally announced October 2023.

  8. arXiv:2206.12380  [pdf, other]

    cs.DB

    VIP Hashing -- Adapting to Skew in Popularity of Data on the Fly (extended version)

    Authors: Aarati Kakaraparthy, Jignesh M. Patel, Brian P. Kroth, Kwanghyun Park

    Abstract: All data is not equally popular. Often, some portion of data is more frequently accessed than the rest, which causes a skew in popularity of the data items. Adapting to this skew can improve performance, and this topic has been studied extensively in the past for disk-based settings. In this work, we consider an in-memory data structure, namely hash table, and show how one can leverage the skew in…

    Submitted 24 June, 2022; originally announced June 2022.

  9. arXiv:2002.00866  [pdf, other]

    cs.DB

    To pipeline or not to pipeline, that is the question

    Authors: Harshad Deshmukh, Bruhathi Sundarmurthy, Jignesh M. Patel

    Abstract: In designing query processing primitives, a crucial design choice is the method for data transfer between two operators in a query plan. As we were considering this critical design mechanism for an in-memory database system that we are building, we quickly realized that (surprisingly) there isn't a clear definition of this concept. Papers are full of ad hoc use of terms like pipelining and blockin…

    Submitted 3 February, 2020; originally announced February 2020.

  10. arXiv:1704.02996  [pdf, other]

    cs.PL cs.DB cs.PF

    ROSA: R Optimizations with Static Analysis

    Authors: Rathijit Sen, Jianqiao Zhu, Jignesh M. Patel, Somesh Jha

    Abstract: R is a popular language and programming environment for data scientists. It is increasingly co-packaged with both relational and Hadoop-based data platforms and can often be the most dominant computational component in data analytics pipelines. Recent work has highlighted inefficiencies in executing R programs, both in terms of execution time and memory requirements, which in practice limit the si…

    Submitted 3 July, 2017; v1 submitted 10 April, 2017; originally announced April 2017.

    Comments: A talk on this work will be presented at RIOT 2017 (3rd Workshop on R Implementation, Optimization and Tooling)

  11. arXiv:1702.06943  [pdf, other]

    cs.LG cs.DB stat.ML

    Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent

    Authors: Fengan Li, Lingjiao Chen, Yijing Zeng, Arun Kumar, Jeffrey F. Naughton, Jignesh M. Patel, Xi Wu

    Abstract: Data compression is a popular technique for improving the efficiency of data processing workloads such as SQL queries and more recently, machine learning (ML) with classical batch gradient methods. But the efficacy of such ideas for mini-batch stochastic gradient descent (MGD), arguably the workhorse algorithm of modern ML, is an open question. MGD's unique data access pattern renders prior art, i…

    Submitted 20 January, 2019; v1 submitted 22 February, 2017; originally announced February 2017.

    Comments: Accepted to SIGMOD 2019

  12. arXiv:1612.07448  [pdf, other]

    cs.DB

    Towards Linear Algebra over Normalized Data

    Authors: Lingjiao Chen, Arun Kumar, Jeffrey Naughton, Jignesh M. Patel

    Abstract: Providing machine learning (ML) over relational data is a mainstream requirement for data analytics systems. While almost all the ML tools require the input data to be presented as a single table, many datasets are multi-table, which forces data scientists to join those tables first, leading to data redundancy and runtime waste. Recent works on "factorized" ML mitigate this issue for a few specifi…

    Submitted 26 June, 2017; v1 submitted 22 December, 2016; originally announced December 2016.

  13. arXiv:1208.4166  [pdf, other]

    cs.DB

    Can the Elephants Handle the NoSQL Onslaught?

    Authors: Avrilia Floratou, Nikhil Teletia, David J. Dewitt, Jignesh M. Patel, Donghui Zhang

    Abstract: In this new era of "big data", traditional DBMSs are under attack from two sides. At one end of the spectrum, the use of document store NoSQL systems (e.g. MongoDB) threatens to move modern Web 2.0 applications away from traditional RDBMSs. At the other end of the spectrum, big data DSS analytics that used to be the domain of parallel RDBMSs is now under attack by another class of NoSQL data analy…

    Submitted 20 August, 2012; originally announced August 2012.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 12, pp. 1712-1723 (2012)

  14. arXiv:1208.1933  [pdf, other]

    cs.DB

    Towards Energy-Efficient Database Cluster Design

    Authors: Willis Lang, Stavros Harizopoulos, Jignesh M. Patel, Mehul A. Shah, Dimitris Tsirogiannis

    Abstract: Energy is a growing component of the operational cost for many "big data" deployments, and hence has become increasingly important for practitioners of large-scale data analysis who require scale-out clusters or parallel DBMS appliances. Although a number of recent studies have investigated the energy efficiency of DBMSs, none of these studies have looked at the architectural design space of energ…

    Submitted 9 August, 2012; originally announced August 2012.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 11, pp. 1684-1695 (2012)

  15. arXiv:1201.0228  [pdf, other]

    cs.DB

    High-Performance Concurrency Control Mechanisms for Main-Memory Databases

    Authors: Per-Åke Larson, Spyros Blanas, Cristian Diaconu, Craig Freedman, Jignesh M. Patel, Mike Zwilling

    Abstract: A database system optimized for in-memory storage can support much higher transaction rates than current systems. However, standard concurrency control methods used today do not scale to the high transaction rates achievable by such systems. In this paper we introduce two efficient concurrency control methods specifically designed for main-memory databases. Both use multiversioning to isolate read…

    Submitted 31 December, 2011; originally announced January 2012.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 4, pp. 298-309 (2011)

  16. arXiv:1201.0226  [pdf, other]

    cs.DB

    Towards Cost-Effective Storage Provisioning for DBMSs

    Authors: Ning Zhang, Junichi Tatemura, Jignesh M. Patel, Hakan Hacıgümüş

    Abstract: Data center operators face a bewildering set of choices when considering how to provision resources on machines with complex I/O subsystems. Modern I/O subsystems often have a rich mix of fast, high performing, but expensive SSDs sitting alongside with cheaper but relatively slower (for random accesses) traditional hard disk drives. The data center operators need to determine how to provision the…

    Submitted 31 December, 2011; originally announced January 2012.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 4, pp. 274-285 (2011)