Showing 1–17 of 17 results for author: Afrati, F

  1. arXiv:2208.09671  [pdf, other]

    cs.DB

    Safe Subjoins in Acyclic Joins

    Authors: Foto N. Afrati

    Abstract: It is expensive to compute joins, often due to large intermediate relations. For acyclic joins, monotone join expressions are guaranteed to produce intermediate relations not larger than the size of the output of the join when it is computed on a fully reduced database. Any subexpression of an acyclic join does not offer this guarantee, as it is easy to prove. In this paper, we consider joins with…

    Submitted 20 August, 2022; originally announced August 2022.

  2. arXiv:2102.06563  [pdf, other]

    cs.DB

    Querying collections of tree-structured records in the presence of within-record referential constraints

    Authors: Foto N. Afrati, Matthew Damigos

    Abstract: In this paper, we consider a tree-structured data model used in many commercial databases like Dremel, F1, JSON stores. We define identity and referential constraints within each tree-structured record. The query language is a variant of SQL and flattening is used as an evaluation mechanism. We investigate querying in the presence of these constraints, and point out the challenges that arise from…

    Submitted 30 August, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

  3. arXiv:2008.10986  [pdf, other]

    cs.DB

    On the complexity of query containment and computing certain answers in the presence of ACs

    Authors: Foto N. Afrati, Matthew Damigos

    Abstract: We often add arithmetic to extend the expressiveness of query languages and study the complexity of problems such as testing query containment and finding certain answers in the framework of answering queries using views. When adding arithmetic comparisons, the complexity of such problems is higher than the complexity of their counterparts without them. It has been observed that we can achieve low…

    Submitted 18 November, 2020; v1 submitted 25 August, 2020; originally announced August 2020.

  4. SharesSkew: An Algorithm to Handle Skew for Joins in MapReduce

    Authors: Foto Afrati, Nikos Stasinopoulos, Jeffrey D. Ullman, Angelos Vassilakopoulos

    Abstract: In this paper, we investigate the problem of computing a multiway join in one round of MapReduce when the data may be skewed. We optimize on communication cost, i.e., the amount of data that is transferred from the mappers to the reducers. We identify join attributes values that appear very frequently, Heavy Hitters (HH). We distribute HH valued records to reducers avoiding skew by using an adapta…

    Submitted 12 December, 2015; originally announced December 2015.

  5. arXiv:1509.08855  [pdf, ps, other]

    cs.DB

    Computing Marginals Using MapReduce

    Authors: Foto Afrati, Shantanu Sharma, Jeffrey D. Ullman, Jonathan R. Ullman

    Abstract: We consider the problem of computing the data-cube marginals of a fixed order $k$ (i.e., all marginals that aggregate over $k$ dimensions), using a single round of MapReduce. The focus is on the relationship between the reducer size (number of inputs allowed at a single reducer) and the replication rate (number of reducers to which an input is sent). We show that the replication rate is minimized…

    Submitted 29 September, 2015; originally announced September 2015.

  6. arXiv:1508.01171  [pdf, other]

    cs.DB cs.DC

    Meta-MapReduce: A Technique for Reducing Communication in MapReduce Computations

    Authors: Foto Afrati, Shlomi Dolev, Shantanu Sharma, Jeffrey D. Ullman

    Abstract: MapReduce has proven to be one of the most useful paradigms in the revolution of distributed computing, where cloud services and cluster computing become the standard venue for computing. The federation of cloud and big data activities is the next challenge where MapReduce should be modified to avoid (big) data migration across remote (cloud) sites. This is exactly our scope of research, where onl…

    Submitted 28 July, 2016; v1 submitted 5 August, 2015; originally announced August 2015.

  7. arXiv:1507.04461  [pdf, other]

    cs.DB cs.CC cs.DC

    Assignment Problems of Different-Sized Inputs in MapReduce

    Authors: Foto Afrati, Shlomi Dolev, Ephraim Korach, Shantanu Sharma, Jeffrey D. Ullman

    Abstract: A MapReduce algorithm can be described by a mapping schema, which assigns inputs to a set of reducers, such that for each required output there exists a reducer that receives all the inputs that participate in the computation of this output. Reducers have a capacity, which limits the sets of inputs that they can be assigned. However, individual inputs may vary in terms of size. We consider, for th…

    Submitted 20 October, 2016; v1 submitted 16 July, 2015; originally announced July 2015.

    Comments: This paper is accepted in ACM Transactions on Knowledge Discovery from Data (TKDD), August 2016. Preliminary versions of this paper have appeared in the proceeding of DISC 2014 and BeyondMR 2015

  8. arXiv:1504.03247  [pdf, other]

    cs.DB

    Handling Skew in Multiway Joins in Parallel Processing

    Authors: Foto N. Afrati, Jeffrey D. Ullman, Angelos Vasilakopoulos

    Abstract: Handling skew is one of the major challenges in query processing. In distributed computational environments such as MapReduce, uneven distribution of the data to the servers is not desired. One of the dominant measures that we want to optimize in distributed environments is communication cost. In a MapReduce job this is the amount of data that is transferred from the mappers to the reducers. In th…

    Submitted 13 April, 2015; originally announced April 2015.

    Comments: 4 pages

  9. arXiv:1503.00650  [pdf, other]

    cs.DB

    Consistent Answers of Conjunctive Queries on Graphs

    Authors: Foto N. Afrati, Phokion G. Kolaitis, Angelos Vasilakopoulos

    Abstract: During the past decade, there has been an extensive investigation of the computational complexity of the consistent answers of Boolean conjunctive queries under primary key constraints. Much of this investigation has focused on self-join-free Boolean conjunctive queries. In this paper, we study the consistent answers of Boolean conjunctive queries involving a single binary relation, i.e., we consi…

    Submitted 2 March, 2015; originally announced March 2015.

  10. arXiv:1501.06758  [pdf, ps, other]

    cs.DB

    Assignment of Different-Sized Inputs in MapReduce

    Authors: Foto Afrati, Shlomi Dolev, Ephraim Korach, Shantanu Sharma, Jeffrey D. Ullman

    Abstract: A MapReduce algorithm can be described by a mapping schema, which assigns inputs to a set of reducers, such that for each required output there exists a reducer that receives all the inputs that participate in the computation of this output. Reducers have a capacity, which limits the sets of inputs that they can be assigned. However, individual inputs may vary in terms of size. We consider, for th…

    Submitted 27 January, 2015; originally announced January 2015.

    Comments: Brief announcement in International Symposium on Distributed Computing (DISC), 2014

  11. arXiv:1410.4156  [pdf, other]

    cs.DB

    GYM: A Multiround Join Algorithm In MapReduce

    Authors: Foto Afrati, Manas Joglekar, Christopher Ré, Semih Salihoglu, Jeffrey D. Ullman

    Abstract: Multiround algorithms are now commonly used in distributed data processing systems, yet the extent to which algorithms can benefit from running more rounds is not well understood. This paper answers this question for a spectrum of rounds for the problem of computing the equijoin of $n$ relations. Specifically, given any query $Q$ with width $\w$, {\em intersection width} $\iw$, input size…

    Submitted 25 January, 2017; v1 submitted 15 October, 2014; originally announced October 2014.

  12. arXiv:1312.2990  [pdf, ps, other]

    cs.DB

    Efficient Lineage for SUM Aggregate Queries

    Authors: Foto N. Afrati, Dimitris Fotakis, Angelos Vasilakopoulos

    Abstract: AI systems typically make decisions and find patterns in data based on the computation of aggregate and specifically sum functions, expressed as queries, on data's attributes. This computation can become costly or even inefficient when these queries concern the whole or big parts of the data and especially when we are dealing with big data. New types of intelligent analytics require also the expla…

    Submitted 9 June, 2014; v1 submitted 10 December, 2013; originally announced December 2013.

  13. arXiv:1208.0615  [pdf, ps, other]

    cs.DC

    Enumerating Subgraph Instances Using Map-Reduce

    Authors: Foto N. Afrati, Dimitris Fotakis, Jeffrey D. Ullman

    Abstract: The theme of this paper is how to find all instances of a given "sample" graph in a larger "data graph," using a single round of map-reduce. For the simplest sample graph, the triangle, we improve upon the best known such algorithm. We then examine the general case, considering both the communication cost between mappers and reducers and the total computation cost at the reducers. To minimize comm…

    Submitted 21 November, 2012; v1 submitted 2 August, 2012; originally announced August 2012.

    Comments: 37 pages

  14. arXiv:1206.4377  [pdf, other]

    cs.DC cs.DS

    Upper and Lower Bounds on the Cost of a Map-Reduce Computation

    Authors: Foto N. Afrati, Anish Das Sarma, Semih Salihoglu, Jeffrey D. Ullman

    Abstract: In this paper we study the tradeoff between parallelism and communication cost in a map-reduce computation. For any problem that is not "embarrassingly parallel," the finer we partition the work of the reducers so that more parallelism can be extracted, the greater will be the total communication between mappers and reducers. We introduce a model of problems that can be solved in a single round of…

    Submitted 19 June, 2012; originally announced June 2012.

    Comments: 14 pages

  15. arXiv:1204.1754  [pdf, other]

    cs.DB cs.DC

    Vision Paper: Towards an Understanding of the Limits of Map-Reduce Computation

    Authors: Foto N. Afrati, Anish Das Sarma, Semih Salihoglu, Jeffrey D. Ullman

    Abstract: A significant amount of recent research work has addressed the problem of solving various data management problems in the cloud. The major algorithmic challenges in map-reduce computations involve balancing a multitude of factors such as the number of machines available for mappers/reducers, their memory requirements, and communication cost (total amount of data sent from mappers to reducers). Mos…

    Submitted 8 April, 2012; originally announced April 2012.

    Comments: 5 pages

  16. A New Framework for Join Product Skew

    Authors: Foto Afrati, Victor Kyritsis, Paraskevas V. Lekeas, Dora Souliou

    Abstract: Different types of data skew can result in load imbalance in the context of parallel joins under the shared nothing architecture. We study one important type of skew, join product skew (JPS). A static approach based on frequency classes is proposed which takes for granted the data distribution of join attribute values. It comes from the observation that the join selectivity can be expressed as a s…

    Submitted 3 June, 2010; v1 submitted 31 May, 2010; originally announced May 2010.

  17. arXiv:cs/0510012  [pdf, ps, other]

    cs.LO

    On relating CTL to Datalog

    Authors: Foto Afrati, Theodore Andronikos, Vassia Pavlaki, Eugenie Foustoucos, Irene Guessarian

    Abstract: CTL is the dominant temporal specification language in practice mainly due to the fact that it admits model checking in linear time. Logic programming and the database query language Datalog are often used as an implementation platform for logic languages. In this paper we present the exact relation between CTL and Datalog and moreover we build on this relation and known efficient algorithms for…

    Submitted 6 October, 2005; v1 submitted 4 October, 2005; originally announced October 2005.

    Comments: 34 pages, 1 figure (file .eps)