Search | arXiv e-print repository

arXiv:2507.06261

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3278 additional authors not shown)

Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and can now process up to 3 hours of video content. Its unique combination of long-context, multimodal, and reasoning capabilities can unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements, and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

Submitted 7 July, 2025; originally announced July 2025.

Comments: 72 pages, 17 figures

arXiv:2503.09312

Terrier: A Deep Learning Repeat Classifier

Authors: Robert Turnbull, Neil D. Young, Edoardo Tescari, Lee F. Skerratt, Tiffany A. Kosch

Abstract: Repetitive DNA sequences underpin genome architecture and evolutionary processes, yet they remain challenging to classify accurately. Terrier is a deep learning model designed to overcome these challenges by classifying repetitive DNA sequences using a publicly available, curated repeat sequence library trained under the RepeatMasker schema. Poor representation of taxa within repeat databases often limits the classification accuracy and reproducibility of current repeat annotation methods, which in turn constrains our understanding of repeat evolution and function. Terrier overcomes these challenges by leveraging deep learning for improved accuracy. Trained on Repbase, which includes over 100,000 repeat families (four times more than Dfam), Terrier maps 97.1% of Repbase sequences to RepeatMasker categories, offering the most comprehensive classification system available. When benchmarked against DeepTE, TERL, and TEclass2 in model organisms (rice, fruit flies, humans, and mice), Terrier achieved superior accuracy while classifying a broader range of sequences. Further validation in non-model amphibian, flatworm, and Northern krill genomes highlights its effectiveness in improving classification in non-model species, facilitating research on repeat-driven evolution, genomic instability, and phenotypic variation.

Submitted 8 July, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

Comments: 14 pages, 9 figures

ACM Class: I.2

arXiv:2501.07663

Enhancing Talent Employment Insights Through Feature Extraction with LLM Finetuning

Authors: Karishma Thakrar, Nick Young

Abstract: This paper explores the application of large language models (LLMs) to extract nuanced and complex job features from unstructured job postings. Using a dataset of 1.2 million job postings provided by AdeptID, we developed a robust pipeline to identify and classify variables such as remote work availability, remuneration structures, educational requirements, and work experience preferences. Our methodology combines semantic chunking, retrieval-augmented generation (RAG), and fine-tuned DistilBERT models to overcome the limitations of traditional parsing tools. By leveraging these techniques, we achieved significant improvements in identifying variables that are often mislabeled or overlooked, such as non-salary-based compensation and inferred remote work categories. We present a comprehensive evaluation of our fine-tuned models and analyze their strengths, limitations, and potential for scaling. This work highlights the promise of LLMs in labor market analytics, providing a foundation for more accurate and actionable insights into job data.

Submitted 13 January, 2025; originally announced January 2025.

arXiv:2408.12065

Transformers As Approximations of Solomonoff Induction

Authors: Nathan Young, Michael Witbrock

Abstract: Solomonoff Induction is an optimal-in-the-limit unbounded algorithm for sequence prediction, representing a Bayesian mixture of every computable probability distribution and performing close to optimally in predicting any computable sequence. Because it is an optimal form of computational sequence prediction, it could plausibly serve as a model against which other methods of sequence prediction are compared. We put forth and explore the hypothesis that Transformer models (the basis of Large Language Models) approximate Solomonoff Induction better than any other extant sequence prediction method. We explore evidence for and against this hypothesis, give alternative hypotheses that take this evidence into account, and outline next steps for modelling Transformers and other kinds of AI in this way.

Submitted 21 August, 2024; originally announced August 2024.
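Solomonoff Induction itself is uncomputable, but its "Bayesian mixture" idea can be illustrated over a small, finite hypothesis class. The toy below is an assumption-laden sketch, not the authors' method: each hypothesis repeats a fixed bit pattern forever, and the prior weight 2^(-2p) for a pattern of period p is an arbitrary simplicity penalty standing in for 2^(-K), where K would be program length. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def periodic_hypotheses(max_period):
    """Enumerate hypotheses of the form 'repeat this bit pattern forever'.

    The prior 2**-(2 * period) is an illustrative stand-in for 2**-K(h);
    it is NOT a Kolmogorov complexity, just a simplicity penalty.
    """
    hyps = []
    for period in range(1, max_period + 1):
        for pattern in product((0, 1), repeat=period):
            hyps.append((pattern, Fraction(1, 2 ** (2 * period))))
    return hyps


def prob_next_is_one(history, hyps):
    """Mixture probability that the next bit is 1, given the history."""
    evidence = Fraction(0)  # prior mass of hypotheses consistent with history
    mass_one = Fraction(0)  # part of that mass predicting a 1 next
    t = len(history)
    for pattern, prior in hyps:
        # Deterministic hypotheses have likelihood 1 if they reproduce the
        # history exactly and 0 otherwise (Bayesian conditioning).
        if all(bit == pattern[i % len(pattern)] for i, bit in enumerate(history)):
            evidence += prior
            if pattern[t % len(pattern)] == 1:
                mass_one += prior
    # If no hypothesis survives, fall back to an uninformative 1/2.
    return mass_one / evidence if evidence else Fraction(1, 2)
```

For example, after observing 0, 1, 0 with patterns up to period 3, only the period-2 pattern 01 (prior 1/16) and the period-3 pattern 010 (prior 1/64) survive, so the mixture assigns probability 4/5 to the next bit being 1. The simpler explanation dominates the prediction.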

arXiv:2404.10179

Scaling Instructable Agents Across Many Simulated Worlds

Authors: SIMA Team, Maria Abi Raad, Arun Ahuja, Catarina Barros, Frederic Besse, Andrew Bolt, Adrian Bolton, Bethanie Brownfield, Gavin Buttimore, Max Cant, Sarah Chakera, Stephanie C. Y. Chan, Jeff Clune, Adrian Collister, Vikki Copeman, Alex Cullum, Ishita Dasgupta, Dario de Cesare, Julia Di Trapani, Yani Donchev, Emma Dunleavy, Martin Engelcke, Ryan Faulkner, Frankie Garcia, Charles Gbadamosi, et al. (69 additional authors not shown)

Abstract: Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to accomplish complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructions across a diverse range of virtual 3D environments, including curated research environments as well as open-ended, commercial video games. Our goal is to develop an instructable agent that can accomplish anything a human can do in any simulated 3D environment. Our approach focuses on language-driven generality while imposing minimal assumptions. Our agents interact with environments in real-time using a generic, human-like interface: the inputs are image observations and language instructions and the outputs are keyboard-and-mouse actions. This general approach is challenging, but it allows agents to ground language across many visually complex and semantically rich environments while also allowing us to readily run agents in new environments. In this paper we describe our motivation and goal, the initial progress we have made, and promising preliminary results on several diverse research environments and a variety of commercial video games.

Submitted 11 October, 2024; v1 submitted 13 March, 2024; originally announced April 2024.

arXiv:2305.12599

Abstract Meaning Representation-Based Logic-Driven Data Augmentation for Logical Reasoning

Authors: Qiming Bao, Alex Yuxuan Peng, Zhenyun Deng, Wanjun Zhong, Gael Gendron, Timothy Pistotti, Neset Tan, Nathan Young, Yang Chen, Yonghua Zhu, Paul Denny, Michael Witbrock, Jiamou Liu

Abstract: Combining large language models with logical reasoning enhances their capacity to address problems in a robust and reliable manner. Nevertheless, the intricate nature of logical reasoning poses challenges when gathering reliable data from the web to build comprehensive training datasets, subsequently affecting performance on downstream tasks. To address this, we introduce a novel logic-driven data augmentation approach, AMR-LDA. AMR-LDA converts the original text into an Abstract Meaning Representation (AMR) graph, a structured semantic representation that encapsulates the logical structure of the sentence, upon which operations are performed to generate logically modified AMR graphs. The modified AMR graphs are subsequently converted back into text to create augmented data. Notably, our methodology is architecture-agnostic and enhances both generative large language models, such as GPT-3.5 and GPT-4, through prompt augmentation, and discriminative large language models through contrastive learning with logic-driven data augmentation. Empirical evidence underscores the efficacy of our proposed method, with improved performance on seven downstream tasks, including reading comprehension requiring logical reasoning, textual entailment, and natural language inference. Furthermore, our method leads on the ReClor leaderboard at https://eval.ai/web/challenges/challenge-page/503/leaderboard/1347. The source code and data are publicly available at https://github.com/Strong-AI-Lab/Logical-Equivalence-driven-AMR-Data-Augmentation-for-Representation-Learning.

Submitted 17 April, 2025; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: 21 pages, 8 figures, Findings of ACL 2024
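The abstract does not list the specific graph operations, so the following is only an assumed illustration of a logic-preserving rewrite. It applies contraposition (A implies B becomes not-B implies not-A) to a toy propositional formula written as nested tuples, not to an AMR graph, and checks by truth table that the rewrite preserves meaning. The tuple encoding and the `to_text` verbalizer are hypothetical stand-ins for AMR parsing and AMR-to-text generation.

```python
from itertools import product


def evaluate(f, env):
    """Truth value of formula f under the assignment env (name -> bool)."""
    op = f[0]
    if op == "var":
        return env[f[1]]
    if op == "not":
        return not evaluate(f[1], env)
    if op == "implies":
        return (not evaluate(f[1], env)) or evaluate(f[2], env)
    raise ValueError(f"unknown operator {op!r}")


def contrapositive(f):
    """Rewrite (A -> B) as (not B -> not A)."""
    if f[0] != "implies":
        raise ValueError("contraposition applies only to implications")
    return ("implies", ("not", f[2]), ("not", f[1]))


def variables(f):
    """Set of variable names occurring in f."""
    if f[0] == "var":
        return {f[1]}
    return set().union(*(variables(g) for g in f[1:]))


def equivalent(f, g):
    """Check logical equivalence by enumerating every truth assignment."""
    names = sorted(variables(f) | variables(g))
    for values in product((False, True), repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(f, env) != evaluate(g, env):
            return False
    return True


def to_text(f):
    """Naive verbalization, standing in for AMR-to-text generation."""
    op = f[0]
    if op == "var":
        return f[1]
    if op == "not":
        return f"it is not the case that {to_text(f[1])}"
    return f"if {to_text(f[1])}, then {to_text(f[2])}"


# Hypothetical example sentence: "if it rains, then the ground is wet".
rule = ("implies", ("var", "it rains"), ("var", "the ground is wet"))
augmented = contrapositive(rule)
```

Here `to_text(augmented)` yields "if it is not the case that the ground is wet, then it is not the case that it rains", which is logically equivalent to the original sentence. The converse ("if the ground is wet, then it rains") would fail the `equivalent` check, so it would not be a valid augmentation.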

arXiv:2302.09692

doi 10.1145/3709361

Classification via Two-Way Comparisons

Authors: Marek Chrobak, Neal E. Young

Abstract: Given a weighted, ordered query set $Q$ and a partition of $Q$ into classes, we study the problem of computing a minimum-cost decision tree that, given any query $q$ in $Q$, uses equality tests and less-than comparisons to determine the class to which $q$ belongs. Such a tree can be much smaller than a lookup table, and much faster and smaller than a conventional search tree. We give the first polynomial-time algorithm for the problem. The algorithm extends naturally to the setting where each query has multiple allowed classes.

Submitted 25 January, 2025; v1 submitted 19 February, 2023; originally announced February 2023.

Comments: Appears in WADS 2023 and TALG 2024

MSC Class: 68P10; 68P30; 68W25; 94A45 ACM Class: E.4; G.1.6; G.2.2; H.3.1; I.4.2

Journal ref: ACM Transactions on Algorithms (2024)
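To illustrate the tree model (not the paper's algorithm for finding a minimum-cost tree), here is a sketch with hypothetical node types: `Eq` performs an equality test, `Lt` a less-than comparison, and `Leaf` reports a class. Under the assumption that cost means the number of comparisons on a query's root-to-leaf path, the weighted cost of a tree is the sum of those counts times the query weights.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str


@dataclass
class Eq:
    key: int
    if_equal: "Node"
    otherwise: "Node"


@dataclass
class Lt:
    key: int
    if_less: "Node"
    otherwise: "Node"


Node = Union[Leaf, Eq, Lt]


def classify(node: Node, q: int) -> tuple[str, int]:
    """Return (class label, number of comparisons used) for query q."""
    comparisons = 0
    while not isinstance(node, Leaf):
        comparisons += 1
        if isinstance(node, Eq):
            node = node.if_equal if q == node.key else node.otherwise
        else:
            node = node.if_less if q < node.key else node.otherwise
    return node.label, comparisons


def expected_cost(node: Node, weights: dict[int, float]) -> float:
    """Sum over queries of weight times comparisons used."""
    return sum(w * classify(node, q)[1] for q, w in weights.items())


# Hypothetical example: queries 1..100; 7 is its own class, the rest split at 50.
tree = Eq(7, Leaf("special"), Lt(50, Leaf("small"), Leaf("large")))
```

With queries 1 to 100 of unit weight, query 7 costs one comparison and every other query costs two, for a total of 199, whereas a lookup table would need 100 entries. This is the kind of saving the abstract refers to.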

arXiv:2302.04106

Detecting Data Type Inconsistencies in a Property Graph Database

Authors: Joshua R. Porter, Michael N. Young, Aleks Y. M. Ontman

Abstract: Some property graph databases do not have a fixed schema, which can result in data type inconsistencies for properties on nodes and relationships, especially when importing data into a running database. Here we present a tool which can rapidly produce a detailed report on every property in the graph. When executed on a large knowledge graph, it allowed us to debug a complex ETL process and enforce 100% data type consistency.

Submitted 8 February, 2023; originally announced February 2023.

Comments: 5 pages, 3 figures, general approach applied to production databases

ACM Class: E.0
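The abstract does not describe the tool's internals, so the following is only a minimal sketch of the underlying idea. It assumes node (or relationship) properties have already been exported as Python dictionaries, tallies the runtime type of every property value, and flags properties that appear with more than one type. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def property_type_report(records):
    """Map each property name to {type name: count} over all records."""
    report = defaultdict(lambda: defaultdict(int))
    for props in records:
        for name, value in props.items():
            report[name][type(value).__name__] += 1
    return {name: dict(counts) for name, counts in report.items()}


def inconsistent_properties(report):
    """Names of properties whose values were seen with more than one type."""
    return sorted(name for name, counts in report.items() if len(counts) > 1)


# Hypothetical node properties, e.g. after a partial ETL import.
nodes = [
    {"name": "Ada", "born": 1815},
    {"name": "Alan", "born": "1912"},  # year stored as a string by mistake
    {"name": "Grace"},
]
report = property_type_report(nodes)
```

Here `inconsistent_properties(report)` returns `["born"]`, pointing directly at the import step that wrote a string where an integer was expected. Missing properties (as on the third node) are not counted as inconsistencies in this sketch.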

arXiv:2206.05579

doi 10.1007/s00453-024-01270-z

Online Paging with Heterogeneous Cache Slots

Authors: Marek Chrobak, Samuel Haney, Mehraneh Liaee, Debmalya Panigrahi, Rajmohan Rajaraman, Ravi Sundaram, Neal E. Young

Abstract: It is natural to generalize the online $k$-Server problem by allowing each request to specify not only a point $p$, but also a subset $S$ of servers that may serve it. For uniform metrics, the problem is equivalent to a generalization of Paging in which each request specifies not only a page $p$, but also a subset $S$ of cache slots, and is satisfied by having a copy of $p$ in some slot in $S$. We call this problem Slot-Heterogeneous Paging. We parameterize the problem by specifying a family $\mathcal S \subseteq 2^{[k]}$ of requestable slot sets, and we establish bounds on the competitive ratio as a function of the cache size $k$ and family $\mathcal S$:

- If all request sets are allowed ($\mathcal S=2^{[k]}\setminus\{\emptyset\}$), the optimal deterministic and randomized competitive ratios are exponentially worse than for standard Paging ($\mathcal S=\{[k]\}$).
- As a function of $|\mathcal S|$ and $k$, the optimal deterministic ratio is polynomial: at most $O(k^2|\mathcal S|)$ and at least $\Omega(\sqrt{|\mathcal S|})$.
- For any laminar family $\mathcal S$ of height $h$, the optimal ratios are $O(hk)$ (deterministic) and $O(h^2\log k)$ (randomized).
- The special case of laminar $\mathcal S$ that we call All-or-One Paging extends standard Paging by allowing each request to specify a particular slot for the requested page. The optimal deterministic ratio for weighted All-or-One Paging is $\Theta(k)$. Offline All-or-One Paging is NP-hard.

Some results for the laminar case are shown via a reduction to the generalization of Paging in which each request specifies a set $\mathcal P$ of pages, and is satisfied by fetching any page from $\mathcal P$ into the cache. The optimal ratios for the latter problem (with a laminar family of height $h$) are at most $hk$ (deterministic) and $h\,H_k$ (randomized).

Submitted 19 October, 2024; v1 submitted 11 June, 2022; originally announced June 2022.

Comments: conference and journal versions appear in STACS 2023 and Algorithmica (2024)

ACM Class: F.2.0; F.1.2; C.0

Journal ref: Algorithmica (2024)
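To make the request model of Slot-Heterogeneous Paging concrete, here is a simulator for a deliberately naive policy: LRU restricted to the allowed slots. It is a hypothetical baseline for illustration only, not one of the algorithms analyzed in the paper, and no claim is made about its competitive ratio. A request `(page, allowed_slots)` is a hit if the page already sits in one of its allowed slots; otherwise the page is loaded into the least recently used allowed slot.

```python
def count_faults(k, requests):
    """Count faults of a naive LRU-within-allowed-slots policy.

    requests: iterable of (page, allowed_slots), where allowed_slots is a
    non-empty subset of range(k). Empty slots count as least recent.
    """
    cache = [None] * k      # page currently held by each slot
    last_used = [-1] * k    # last time each slot served a request
    faults = 0
    for t, (page, allowed) in enumerate(requests):
        slots = sorted(allowed)  # deterministic tie-breaking
        if not slots or slots[0] < 0 or slots[-1] >= k:
            raise ValueError(f"invalid slot set {allowed!r}")
        slot = next((s for s in slots if cache[s] == page), None)
        if slot is None:
            faults += 1
            # Evict from the least recently used slot among the allowed ones.
            slot = min(slots, key=lambda s: last_used[s])
            cache[slot] = page
        last_used[slot] = t
    return faults
```

With every request allowed in both slots of a size-2 cache (standard Paging), the sequence a, b, a incurs 2 faults. If each request instead pins the page to slot 0, as an All-or-One request may, the same sequence incurs 3 faults, because a and b compete for a single slot.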

arXiv:2203.12186

AbductionRules: Training Transformers to Explain Unexpected Inputs

Authors: Nathan Young, Qiming Bao, Joshua Bensemann, Michael Witbrock

Abstract: Transformers have recently been shown to be capable of reliably performing logical reasoning over facts and rules expressed in natural language, but abductive reasoning - inference to the best explanation of an unexpected observation - has been underexplored despite significant applications to scientific discovery, common-sense reasoning, and model interpretability. We present AbductionRules, a group of natural language datasets designed to train and test generalisable abduction over natural-language knowledge bases. We use these datasets to finetune pretrained Transformers and discuss their performance, finding that our models learned generalisable abductive techniques but also learned to exploit the structure of our data. Finally, we discuss the viability of this approach to abductive reasoning and ways in which it may be improved in future work.

Submitted 23 March, 2022; originally announced March 2022.

Comments: Findings of ACL 2022

arXiv:2103.11294

doi 10.1002/rob.21794

High Precision Control of Tracked Field Robots in the Presence of Unknown Traction Coefficients

Authors: Erkan Kayacan, Sierra N. Young, Joshua M. Peschel, Girish Chowdhary

Abstract: Accurate steering through crop rows that avoids crop damage is one of the most important tasks for agricultural robots utilized in various field operations, such as monitoring, mechanical weeding, or spraying. In practice, varying soil conditions can result in off-track navigation due to unknown traction coefficients, which can cause crop damage. To address this problem, this paper presents the development, application, and experimental results of a real-time receding horizon estimation and control (RHEC) framework applied to a fully autonomous mobile robotic platform to increase its steering accuracy. Recent advances in cheap and fast microprocessors, as well as advances in solution methods for nonlinear optimization problems, have made nonlinear receding horizon control (RHC) and receding horizon estimation (RHE) methods suitable for field robots that require high-frequency (millisecond-scale) updates. A real-time RHEC framework is developed and applied to a fully autonomous mobile robotic platform designed by the authors for in-field phenotyping applications in Sorghum fields. Nonlinear RHE is used to estimate constrained states and parameters, and nonlinear RHC is designed based on an adaptive system model which contains time-varying parameters. The capabilities of the real-time RHEC framework are verified experimentally, and the results show accurate tracking performance on a bumpy, wet soil field. The mean Euclidean error and the mean computation time of the RHEC framework are $0.0423$ m and $0.88$ ms, respectively.

Submitted 20 March, 2021; originally announced March 2021.

Journal ref: Journal of Field Robotics, vol. 35, pp. 1050-1062, 2018

arXiv:2103.01084

doi 10.1145/3477910

A Simple Algorithm for Optimal Search Trees with Two-Way Comparisons

Authors: Marek Chrobak, Mordecai Golin, J. Ian Munro, Neal E. Young

Abstract: We present a simple $O(n^4)$-time algorithm for computing optimal search trees with two-way comparisons. The only previous solution to this problem, by Anderson et al., has the same running time, but is significantly more complicated and is restricted to the variant where only successful queries are allowed. Our algorithm extends directly to solve the standard full variant of the problem, which also allows unsuccessful queries and for which no polynomial-time algorithm was previously known. The correctness proof of our algorithm relies on a new structural theorem for two-way-comparison search trees.

Submitted 4 October, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

Comments: v3 adds Appendix B, with a stronger alternative to Theorem 1

MSC Class: 68P10; 68P30; 68W25; 94A45 ACM Class: E.4; G.1.6; G.2.2; H.3.1; I.4.2

Journal ref: ACM Transactions on Algorithms 18(1) (2022) 1-11

arXiv:2103.01052

doi 10.1016/j.ic.2021.104707

On the Cost of Unsuccessful Searches in Search Trees with Two-way Comparisons

Authors: Marek Chrobak, Mordecai Golin, J. Ian Munro, Neal E. Young

Abstract: Search trees are commonly used to implement access operations to a set of stored keys. If this set is static and the probabilities of membership queries are known in advance, then one can precompute an optimal search tree, namely one that minimizes the expected access cost. For a non-key query, a search tree can determine its approximate location by returning the inter-key interval containing the query. This is in contrast to other dictionary data structures, like hash tables, that only report a failed search. We address the question: what is the additional cost of determining approximate locations for non-key queries? We prove that for two-way comparison trees this additional cost is at most 1. Our proof is based on a novel probabilistic argument that involves converting a search tree that does not identify non-key queries into a random tree that does.

Submitted 9 March, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

Comments: v2 has updated bibliography

MSC Class: 68P10; 68P30; 68W25; 94A45 ACM Class: E.4; G.1.6; G.2.2; H.3.1; I.4.2

Journal ref: Information and Computation 281 (2021)

arXiv:2011.02615

doi 10.1145/3672614

Competitive Data-Structure Dynamization

Authors: Claire Mathieu, Rajmohan Rajaraman, Neal E. Young, Arman Yousefi

Abstract: Data-structure dynamization is a general approach for making static data structures dynamic. It is used extensively in geometric settings and in the guise of so-called merge (or compaction) policies in big-data databases such as Google Bigtable and LevelDB (our focus). Previous theoretical work is based on worst-case analyses for uniform inputs (insertions of one item at a time and a constant read rate). In practice, merge policies must not only handle batch insertions and varying read/write ratios; they can also take advantage of such non-uniformity to reduce cost on a per-input basis. To model this, we initiate the study of data-structure dynamization through the lens of competitive analysis, via two new online set-cover problems. For each, the input is a sequence of disjoint sets of weighted items. The sets are revealed one at a time. The algorithm must respond to each with a set cover that covers all items revealed so far. It obtains the cover incrementally from the previous cover by adding one or more sets and optionally removing existing sets. For each new set, the algorithm incurs a build cost equal to the total weight of the items in the set. In the first problem, the objective is to minimize total build cost plus total query cost, where the algorithm incurs a query cost at each time $t$ equal to the current cover size. In the second problem, the objective is to minimize the build cost while keeping the query cost from exceeding $k$ (a given parameter) at any time.
We give deterministic online algorithms for both variants, with competitive ratios of $\Theta(\log^* n)$ and $k$, respectively. The latter ratio is optimal for the second variant.

Submitted 23 July, 2024; v1 submitted 4 November, 2020; originally announced November 2020.

Comments: Conference version in SODA (2021). Journal version in ACM TALG (accepted June 2024)

MSC Class: 68W27; 68P15; 68R05 ACM Class: F.1.2; H.2.4

Journal ref: ACM Trans. Algorithms. June 2024
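To make the second problem concrete, here is a deliberately naive policy, hypothetical and far simpler than the paper's $k$-competitive algorithm. Each newly revealed set is built on its own while the cover has fewer than $k$ sets; otherwise, everything is rebuilt into one merged set (adding the union and removing the old sets). The sketch tracks only set weights, since build cost depends only on the total weight of the items in each new set.

```python
def naive_bounded_policy(set_weights, k):
    """Naive policy keeping the cover size (query cost) at most k.

    set_weights[t] is the total item weight of the t-th revealed set.
    Returns (total build cost, list of cover sizes after each step).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    cover = []        # total weight of each set currently in the cover
    build_cost = 0
    sizes = []
    for w in set_weights:
        if len(cover) < k:
            cover.append(w)           # build just the new set
            build_cost += w
        else:
            merged = sum(cover) + w   # build one set holding every item so far
            cover = [merged]
            build_cost += merged
        sizes.append(len(cover))
    return build_cost, sizes
```

For five unit-weight sets and $k=2$, the policy builds sets of weight 1, 1, 3, 1, 5, for a total build cost of 11, and the cover sizes are 1, 2, 1, 2, 1. Summing those sizes (7) gives the query cost that the first problem would add to the build cost.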

arXiv:2005.13645

doi 10.1142/S0219720007002977

Algorithmic approaches to selecting control clones in DNA array hybridization experiments

Authors: Qi Fu, Elizabeth Bent, James Borneman, Marek Chrobak, Neal E. Young

Abstract: We study the problem of selecting control clones in DNA array hybridization experiments. The problem arises in the OFRG method for analyzing microbial communities. The OFRG method performs classification of rRNA gene clones using binary fingerprints created from a series of hybridization experiments, where each experiment consists of hybridizing a collection of arrayed clones with a single oligonucleotide probe. Each experiment produces analog signals, one for each clone, which then need to be classified, that is, converted into binary values 1 and 0 that represent hybridization and non-hybridization events. In addition to the sample rRNA gene clones, the array contains a number of control clones needed to calibrate the classification procedure of the hybridization signals. These control clones must be selected with care to optimize the classification process. We formulate this as a combinatorial optimization problem called Balanced Covering. We prove that the problem is NP-hard, and we show some results on hardness of approximation. We propose approximation algorithms based on randomized rounding and we show that, with high probability, our algorithms approximate the optimum solution well. The experimental results confirm that the algorithms find high-quality control clones. The algorithms have been implemented and are publicly available as part of the software package called CloneTools.

Submitted 27 May, 2020; originally announced May 2020.

Journal ref: Journal of Bioinformatics and Computational Biology 5(4) 937-961, 2007

arXiv:2005.13628

doi 10.1007/s00446-011-0127-7

Distributed algorithms for covering, packing and maximum weighted matching

Authors: Christos Koufogiannakis, Neal E. Young

Abstract: This paper gives poly-logarithmic-round, distributed D-approximation algorithms for covering problems with submodular cost and monotone covering constraints (Submodular-cost Covering). The approximation ratio D is the maximum number of variables in any constraint. Special cases include Covering Mixed Integer Linear Programs (CMIP), and Weighted Vertex Cover (with D=2). Via duality, the paper also gives poly-logarithmic-round, distributed D-approximation algorithms for Fractional Packing linear programs (where D is the maximum number of constraints in which any variable occurs), and for Max Weighted c-Matching in hypergraphs (where D is the maximum size of any of the hyperedges; for graphs D=2). The paper also gives parallel (RNC) 2-approximation algorithms for CMIP with two variables per constraint and Weighted Vertex Cover. The algorithms are randomized. All of the approximation ratios exactly match those of comparable centralized algorithms.

Submitted 27 May, 2020; originally announced May 2020.

MSC Class: 90C26; 68W15 ACM Class: C.2.4; G.1.6

Journal ref: Distributed Computing 24, 45--63 (2011)
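For the D=2 special case, Weighted Vertex Cover, the comparable centralized ratio of 2 is achieved by the classic sequential local-ratio algorithm sketched below. This is a swapped-in centralized technique shown for comparison, not the paper's distributed or RNC algorithm. Vertices and edges are plain Python values chosen for illustration. Each uncovered edge charges both endpoints equally until one of them is fully paid for, and the fully paid vertices form the cover.

```python
def local_ratio_vertex_cover(weights, edges):
    """Sequential local-ratio 2-approximation for Weighted Vertex Cover.

    weights: dict vertex -> non-negative weight; edges: iterable of (u, v).
    """
    residual = dict(weights)
    for u, v in edges:
        if residual[u] == 0 or residual[v] == 0:
            continue  # already covered by a vertex whose residual hit 0
        # Charge both endpoints equally; at least one residual drops to 0.
        delta = min(residual[u], residual[v])
        residual[u] -= delta
        residual[v] -= delta
    return {x for x, r in residual.items() if r == 0}
```

On a star whose center weighs 3 and whose three leaves weigh 1 each, the sketch returns all four vertices (weight 6), while the optimum is the center alone (weight 3). This shows that the factor of 2 can be attained.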

arXiv:1901.03783

doi 10.1007/s00236-021-00411-z

On Huang and Wong's Algorithm for Generalized Binary Split Trees

Authors: Marek Chrobak, Mordecai Golin, J. Ian Munro, Neal E. Young

Abstract: Huang and Wong [1984] proposed a polynomial-time dynamic-programming algorithm for computing optimal generalized binary split trees. We show that their algorithm is incorrect. Thus, it remains open whether such trees can be computed in polynomial time. Spuler [1994] proposed modifying Huang and Wong's algorithm to obtain an algorithm for a different problem: computing optimal two-way-comparison search trees. We show that the dynamic program underlying Spuler's algorithm is not valid, in that it does not satisfy the necessary optimal-substructure property and its proposed recurrence relation is incorrect. It remains unknown whether the algorithm is guaranteed to compute a correct overall solution.

Submitted 14 February, 2022; v1 submitted 11 January, 2019; originally announced January 2019.

MSC Class: 68P10; 68P30; 68W25; 94A45 ACM Class: E.4; G.1.6; G.2.2; H.3.1; I.4.2

Journal ref: Acta Informatica (2022)

arXiv:1803.11119 [pdf, other]

Decentralized Control Systems Laboratory Using Human Centered Robotic Actuators

Authors: Binghan He, Kunye Chen, Rachel Schlossman, Neal Ormsbee, Mara Altman, Nathan Young, Matt Mangum, Luis Sentis

Abstract: University laboratories deliver unique hands-on experimentation for STEM students but often lack state-of-the-art equipment and provide limited access to their equipment. The University of Texas Cloud Laboratory provides remote access to cutting-edge series elastic actuators for student experimentation regarding human-centered robotics, dynamical systems, and controls. Through a browser-based interface, students are provided with various learning materials using the remote hardware-in-the-loop system for effective experiment-based education. This paper discusses the methods used to connect remote hardware to mobile browsers, the adaptation of textbook materials regarding system identification and feedback control, data processing to generate clean and useful results for student interpretation, and initial usage of the end-to-end system for individual and group learning.

Submitted 8 April, 2019; v1 submitted 29 March, 2018; originally announced March 2018.

arXiv:1710.03358 [pdf, other]

Balanced power diagrams for redistricting

Authors: Vincent Cohen-Addad, Philip N. Klein, Neal E. Young

Abstract: We propose a method for redistricting, decomposing a geographical area into subareas, called districts, so that the populations of the districts are as close as possible and the districts are compact and contiguous. Each district is the intersection of a polygon with the geographical area. The polygons are convex and the average number of sides per polygon is less than six. The polygons tend to be quite compact. With each polygon is associated a center. The center is the centroid of the locations of the residents associated with the polygon. The algorithm can be viewed as a heuristic for finding centers and a balanced assignment of residents to centers so as to minimize the sum of squared distances of residents to centers; hence the solution can be said to have low dispersion.

Submitted 7 January, 2018; v1 submitted 9 October, 2017; originally announced October 2017.

arXiv:1709.10180 [pdf, other]

Possibilistic Fuzzy Local Information C-Means for Sonar Image Segmentation

Authors: Alina Zare, Nicholas Young, Daniel Suen, Thomas Nabelek, Aquila Galusha, James Keller

Abstract: Side-look synthetic aperture sonar (SAS) can produce very high quality images of the sea-floor. When viewing this imagery, a human observer can often easily identify various sea-floor textures such as sand ripple, hard-packed sand, sea grass and rock. In this paper, we present the Possibilistic Fuzzy Local Information C-Means (PFLICM) approach to segment SAS imagery into sea-floor regions that exhibit these various natural textures. The proposed PFLICM method incorporates fuzzy and possibilistic clustering methods and leverages (local) spatial information to perform soft segmentation. Results are shown on several SAS scenes and compared to alternative segmentation approaches.

Submitted 28 September, 2017; originally announced September 2017.

Comments: 8 pages, 11 figures, to appear in the 2017 IEEE Symposium Series on Computational Intelligence (SSCI) Proceedings

arXiv:1505.00357 [pdf, other]

doi 10.1007/978-3-662-48971-0_7

Optimal Search Trees with 2-Way Comparisons

Authors: Marek Chrobak, Mordecai Golin, J. Ian Munro, Neal E. Young

Abstract: In 1971, Knuth gave an $O(n^2)$-time algorithm for the classic problem of finding an optimal binary search tree. Knuth's algorithm works only for search trees based on 3-way comparisons, while most modern computers support only 2-way comparisons (e.g., $<, \le, =, \ge$, and $>$). Until this paper, the problem of finding an optimal search tree using 2-way comparisons remained open -- poly-time algorithms were known only for restricted variants. We solve the general case, giving (i) an $O(n^4)$-time algorithm and (ii) an $O(n \log n)$-time additive-3 approximation algorithm. Also, for finding optimal binary split trees, we (iii) obtain a linear speedup and (iv) prove some previous work incorrect.

Submitted 9 March, 2021; v1 submitted 2 May, 2015; originally announced May 2015.

Comments: ERRATUM: The proof of Theorem 3 of the ISAAC'15 paper (v4 here) is incorrect. Version v5 here contains: a full erratum, proofs of the other results, and pointers to journal versions expanding those results

MSC Class: 68P10; 68P30; 68W25; 94A45 ACM Class: E.4; G.1.6; G.2.2; H.3.1; I.4.2

Journal ref: Optimal Search Trees with 2-Way Comparisons. In: Elbassioni K., Makino K. (eds) Algorithms and Computation. ISAAC 2015. Lecture Notes in Computer Science, vol 9472 (2015). Springer, Berlin, Heidelberg
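For contrast with the 2-way setting, the 3-way-comparison baseline the abstract cites can be sketched as Knuth's O(n^2) dynamic program. The Python below is a minimal illustration under simplifying assumptions (only key access weights, no weights for unsuccessful searches, cost = weighted depth with the root at depth 1); the function name `optimal_bst_cost` is hypothetical, and this is not the paper's 2-way-comparison algorithm.

```python
def optimal_bst_cost(p):
    """Knuth-style O(n^2) DP for an optimal 3-way-comparison search tree.

    p[i] is the access weight of the i-th smallest key. The cost of a tree is
    the sum over keys of weight * depth, with the root at depth 1. Only
    successful searches are modelled (an assumption of this sketch).
    """
    n = len(p)
    if n == 0:
        return 0
    # prefix[i] = p[0] + ... + p[i-1], so the weight of keys i..j-1 is prefix[j] - prefix[i]
    prefix = [0]
    for w in p:
        prefix.append(prefix[-1] + w)
    # cost[i][j]: optimal cost for the half-open key range i..j-1; root[i][j]: its best root
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    root = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        cost[i][i + 1] = p[i]
        root[i][i + 1] = i
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best, best_r = None, None
            # Knuth's monotonicity: the best root lies between root[i][j-1] and root[i+1][j]
            for r in range(root[i][j - 1], root[i + 1][j] + 1):
                c = cost[i][r] + cost[r + 1][j]
                if best is None or c < best:
                    best, best_r = c, r
            # every key in the range moves one level deeper under the new root
            cost[i][j] = best + prefix[j] - prefix[i]
            root[i][j] = best_r
    return cost[0][n]
```

For example, `optimal_bst_cost([5, 1, 1])` returns 10, with the heavy key at the root. Restricting each root search to the monotone range is what brings the straightforward O(n^3) recurrence down to O(n^2) total work, since the ranges telescope along each diagonal.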

arXiv:1407.3015 [pdf, ps, other]

Nearly Linear-Work Algorithms for Mixed Packing/Covering and Facility-Location Linear Programs

Authors: Neal E. Young

Abstract: We describe the first nearly linear-time approximation algorithms for explicitly given mixed packing/covering linear programs, and for (non-metric) fractional facility location. We also describe the first parallel algorithms requiring only near-linear total work and finishing in polylog time. The algorithms compute $(1+ε)$-approximate solutions in time (and work) $O^*(N/ε^2)$, where $N$ is the number of non-zeros in the constraint matrix. For facility location, $N$ is the number of eligible client/facility pairs.

Submitted 5 November, 2014; v1 submitted 10 July, 2014; originally announced July 2014.

MSC Class: 90-08; 90C05; 49M29; 65K05 ACM Class: F.2.1; G.1.6

arXiv:1407.3008 [pdf, other]

Bigtable Merge Compaction

Authors: Claire Mathieu, Carl Staelin, Neal E. Young, Arman Yousefi

Abstract: NoSQL databases are widely used for massive data storage and real-time web applications. Yet important aspects of these data structures are not well understood. For example, NoSQL databases write most of their data to a collection of files on disk, meanwhile periodically compacting subsets of these files. A compaction policy must choose which files to compact, and when to compact them, without knowing the future workload. Although these choices can affect computational efficiency by orders of magnitude, existing literature lacks tools for designing and analyzing online compaction policies; policies are now chosen largely by trial and error. Here we introduce tools for the design and analysis of compaction policies for Google Bigtable, propose new policies, give average-case and worst-case competitive analyses, and present preliminary empirical benchmarks.

Submitted 9 July, 2015; v1 submitted 10 July, 2014; originally announced July 2014.

MSC Class: 68W27; 68P15; 68R05 ACM Class: F.1.2; H.2.4

Journal ref: SUPERSEDED BY https://arxiv.boxedpaper.com/abs/2011.02615

arXiv:1307.5296 [pdf, ps, other]

First-Come-First-Served for Online Slot Allocation and Huffman Coding

Authors: Monik Khare, Claire Mathieu, Neal E. Young

Abstract: Can one choose a good Huffman code on the fly, without knowing the underlying distribution? Online Slot Allocation (OSA) models this and similar problems: There are n slots, each with a known cost. There are n items. Requests for items are drawn i.i.d. from a fixed but hidden probability distribution p. After each request, if the item, i, was not previously requested, then the algorithm (knowing the slot costs and the requests so far, but not p) must place the item in some vacant slot j(i). The goal is to minimize the sum, over the items, of the probability of the item times the cost of its assigned slot. The optimal offline algorithm is trivial: put the most probable item in the cheapest slot, the second most probable item in the second cheapest slot, etc. The optimal online algorithm is First Come First Served (FCFS): put the first requested item in the cheapest slot, the second (distinct) requested item in the second cheapest slot, etc. The optimal competitive ratios for any online algorithm are 1+H(n-1) ~ ln n for general costs and 2 for concave costs. For logarithmic costs, the ratio is, asymptotically, 1: FCFS gives cost opt + O(log opt). For Huffman coding, FCFS yields an online algorithm (one that allocates codewords on demand, without knowing the underlying probability distribution) that guarantees asymptotically optimal cost: at most opt + 2 log(1+opt) + 2.

Submitted 7 October, 2013; v1 submitted 19 July, 2013; originally announced July 2013.

Comments: ACM-SIAM Symposium on Discrete Algorithms (SODA) 2014

MSC Class: 68W40; 68Q87 ACM Class: F.1.2; F.2.0; H.1.1
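The FCFS rule and the trivial offline optimum described in the abstract are simple enough to state directly in code. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes every item has been requested at least once, so that each item has a slot when the cost is evaluated.

```python
def fcfs_allocate(slot_costs, requests):
    """First Come First Served: the i-th distinct item requested gets the
    i-th cheapest slot. Returns a dict mapping item -> cost of its slot."""
    costs = sorted(slot_costs)  # cheapest slot first
    assignment = {}
    next_slot = 0
    for item in requests:
        if item not in assignment:  # repeat requests change nothing
            assignment[item] = costs[next_slot]
            next_slot += 1
    return assignment


def allocation_cost(assignment, prob):
    """Sum over items of prob[item] * cost of the item's slot."""
    return sum(prob[item] * cost for item, cost in assignment.items())


def offline_optimum(slot_costs, prob):
    """Most probable item in the cheapest slot, the next most probable item
    in the next cheapest slot, and so on."""
    items = sorted(prob, key=prob.get, reverse=True)
    return sum(prob[item] * cost for item, cost in zip(items, sorted(slot_costs)))


# Hypothetical example: 3 slots, 3 items, and the requests seen so far.
slots = [1, 2, 3]
prob = {"a": 0.5, "b": 0.3, "c": 0.2}
fcfs = fcfs_allocate(slots, ["b", "a", "b", "c"])
print(fcfs, allocation_cost(fcfs, prob), offline_optimum(slots, prob))
```

Here the less likely item "b" happens to arrive first and takes the cheapest slot, so FCFS pays about 1.9 against the offline optimum of about 1.7; the abstract's competitive ratios bound this gap in expectation over the random request sequence.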

arXiv:1303.2920 [pdf, ps, other]

Approximating 1-dimensional TSP Requires Omega(n log n) Comparisons

Authors: Neal E. Young

Abstract: We give a short proof that any comparison-based n^(1-epsilon)-approximation algorithm for the 1-dimensional Traveling Salesman Problem (TSP) requires Omega(n log n) comparisons.

Submitted 26 March, 2013; v1 submitted 12 March, 2013; originally announced March 2013.

Comments: Superseded by "On the complexity of approximating Euclidean traveling salesman tours and minimum spanning trees", by Das et al; Algorithmica 19:447-460 (1997)

MSC Class: 68W25 ACM Class: F.2.2; G.2.2

arXiv:1212.3233 [pdf, other]

doi 10.1007/s10951-014-0392-y

Approximation Algorithms for the Joint Replenishment Problem with Deadlines

Authors: Marcin Bienkowski, Jaroslaw Byrka, Marek Chrobak, Neil Dobbs, Tomasz Nowicki, Maxim Sviridenko, Grzegorz Swirszcz, Neal E. Young

Abstract: The Joint Replenishment Problem (JRP) is a fundamental optimization problem in supply-chain management, concerned with optimizing the flow of goods from a supplier to retailers. Over time, in response to demands at the retailers, the supplier ships orders, via a warehouse, to the retailers. The objective is to schedule these orders to minimize the sum of ordering costs and retailers' waiting costs. We study the approximability of JRP-D, the version of JRP with deadlines, where instead of waiting costs the retailers impose strict deadlines. We study the integrality gap of the standard linear-program (LP) relaxation, giving a lower bound of 1.207, a stronger, computer-assisted lower bound of 1.245, as well as an upper bound and approximation ratio of 1.574. The best previous upper bound and approximation ratio was 1.667; no lower bound was previously published. For the special case when all demand periods are of equal length we give an upper bound of 1.5, a lower bound of 1.2, and show APX-hardness.

Submitted 2 December, 2015; v1 submitted 13 December, 2012; originally announced December 2012.

MSC Class: 68W25; 90C05 ACM Class: G.1.6

Journal ref: J. Scheduling 18(6): 545-560 (2015)

arXiv:1208.2724 [pdf, ps, other]

Caching with rental cost and zapping

Authors: Monik Khare, Neal E. Young

Abstract: The \emph{file caching} problem is defined as follows. Given a cache of size $k$ (a positive integer), the goal is to minimize the total retrieval cost for the given sequence of requests to files. A file $f$ has size $size(f)$ (a positive integer) and retrieval cost $cost(f)$ (a non-negative number) for bringing the file into the cache. A \emph{miss} or \emph{fault} occurs when the requested file is not in the cache and the file has to be retrieved into the cache by paying the retrieval cost, and some other file may have to be removed (\emph{evicted}) from the cache so that the total size of the files in the cache does not exceed $k$. We study the following variants of the online file caching problem. \textbf{\emph{Caching with Rental Cost} (or \emph{Rental Caching})}: There is a rental cost $λ$ (a positive number) for each file in the cache at each time unit. The goal is to minimize the sum of the retrieval costs and the rental costs. \textbf{\emph{Caching with Zapping}}: A file can be \emph{zapped} by paying a zapping cost $N \ge 1$. Once a file is zapped, all future requests of the file don't incur any cost. The goal is to minimize the sum of the retrieval costs and the zapping costs. We study these two variants and also the variant which combines these two (rental caching with zapping). We present deterministic lower and upper bounds in the competitive-analysis framework. We study and extend the online covering algorithm from \citep{young02online} to give deterministic online algorithms.
We also present randomized lower and upper bounds for some of these problems.

Submitted 18 October, 2012; v1 submitted 13 August, 2012; originally announced August 2012.

Comments: Caching with rental cost, caching with zapping

arXiv:1208.0257 [pdf, other]

doi 10.4086/toc.2013.v009a022

Hamming Approximation of NP Witnesses

Authors: Daniel Sheldon, Neal E. Young

Abstract: Given a satisfiable 3-SAT formula, how hard is it to find an assignment to the variables that has Hamming distance at most n/2 to a satisfying assignment? More generally, consider any polynomial-time verifier for any NP-complete language. A d(n)-Hamming-approximation algorithm for the verifier is one that, given any member x of the language, outputs in polynomial time a string a with Hamming distance at most d(n) to some witness w, where (x,w) is accepted by the verifier. Previous results have shown that, if P != NP, then every NP-complete language has a verifier for which there is no (n/2-n^(2/3+d))-Hamming-approximation algorithm, for various constants d > 0. Our main result is that, if P != NP, then every paddable NP-complete language has a verifier that admits no (n/2+O(sqrt(n log n)))-Hamming-approximation algorithm. That is, one cannot get even half the bits right. We also consider natural verifiers for various well-known NP-complete problems. They do have n/2-Hamming-approximation algorithms, but, if P != NP, have no (n/2-n^epsilon)-Hamming-approximation algorithms for any constant epsilon > 0. We show similar results for randomized algorithms.

Submitted 19 July, 2013; v1 submitted 1 August, 2012; originally announced August 2012.

MSC Class: 03D15; 68Q25; 90C59 ACM Class: F.1.3; F.2.2

Journal ref: Theory of Computing 9(22), 2013, pp. 685-702

arXiv:1111.5305 [pdf, other]

doi 10.1137/120887928

On a Linear Program for Minimum-Weight Triangulation

Authors: Arman Yousefi, Neal E. Young

Abstract: Minimum-weight triangulation (MWT) is NP-hard. It has a polynomial-time constant-factor approximation algorithm, and a variety of effective polynomial-time heuristics that, for many instances, can find the exact MWT. Linear programs (LPs) for MWT are well-studied, but previously no connection was known between any LP and any approximation algorithm or heuristic for MWT. Here we show the first such connections: for an LP formulation due to Dantzig et al. (1985): (i) the integrality gap is bounded by a constant; (ii) given any instance, if the aforementioned heuristics find the MWT, then so does the LP.

Submitted 4 October, 2013; v1 submitted 22 November, 2011; originally announced November 2011.

Comments: To appear in SICOMP. Extended abstract appeared in SODA 2012

MSC Class: 68W25; 90C05 ACM Class: G.1.6; I.3.5

Journal ref: SIAM Journal on Computing 43(1):25-51(2014)

arXiv:1007.0217 [pdf, ps, other]

A Bound on the Sum of Weighted Pairwise Distances of Points Constrained to Balls

Authors: Neal E. Young

Abstract: We consider the problem of choosing Euclidean points to maximize the sum of their weighted pairwise distances, when each point is constrained to a ball centered at the origin. We derive a dual minimization problem and show strong duality holds (i.e., the resulting upper bound is tight) when some locally optimal configuration of points is affinely independent. We sketch a polynomial-time algorithm for finding a near-optimal set of points.

Submitted 1 July, 2010; originally announced July 2010.

Comments: Cornell ORIE Tech Report

Report number: 1103 MSC Class: 90C27 (Primary) 90C22; 52A40 (Secondary) ACM Class: G.1.6

arXiv:0807.0644 [pdf, other]

doi 10.1007/978-3-642-02927-1_53

Greedy D-Approximation Algorithm for Covering with Arbitrary Constraints and Submodular Cost

Authors: Christos Koufogiannakis, Neal E. Young

Abstract: This paper describes a simple greedy D-approximation algorithm for any covering problem whose objective function is submodular and non-decreasing, and whose feasible region can be expressed as the intersection of arbitrary (closed upwards) covering constraints, each of which constrains at most D variables of the problem. (A simple example is Vertex Cover, with D = 2.) The algorithm generalizes previous approximation algorithms for fundamental covering problems and online paging and caching problems.

Submitted 30 December, 2011; v1 submitted 4 July, 2008; originally announced July 2008.

MSC Class: 68W25 ACM Class: G.1.6

Journal ref: Algorithmica 66(1):113-152 (2013)
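For the D = 2 special case (Weighted Vertex Cover), the flavor of such greedy covering algorithms can be illustrated with the classic local-ratio 2-approximation of Bar-Yehuda and Even. This is a swapped-in textbook method, not the paper's general algorithm for submodular costs; the Python below is a minimal sketch with a hypothetical function name.

```python
def local_ratio_vertex_cover(weights, edges):
    """2-approximate weighted vertex cover by the local-ratio method.

    weights: dict vertex -> non-negative weight; edges: iterable of (u, v).
    Each still-uncovered edge lowers the residual weight of both endpoints by
    the same amount until one of them reaches zero; zero-residual vertices
    join the cover. The charges form a feasible edge packing (no vertex is
    charged more than its weight), so their sum is at most OPT, and every
    cover vertex is fully charged, so the cover weighs at most twice that sum.
    """
    residual = dict(weights)
    cover = set()
    for u, v in edges:
        if u in cover or v in cover:
            continue  # edge already covered
        delta = min(residual[u], residual[v])
        residual[u] -= delta
        residual[v] -= delta
        # at least one endpoint now has residual weight exactly 0
        if residual[u] == 0:
            cover.add(u)
        if residual[v] == 0:
            cover.add(v)
    return cover
```

On a star whose center weighs 3 and whose three leaves weigh 1 each, this sketch returns all four vertices (weight 6 against an optimum of 3), so the factor of 2 = D can be attained.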

arXiv:0801.1987 [pdf, ps, other]

doi 10.1007/s00453-013-9771-6

A Nearly Linear-Time PTAS for Explicit Fractional Packing and Covering Linear Programs

Authors: Christos Koufogiannakis, Neal E. Young

Abstract: We give an approximation algorithm for packing and covering linear programs (linear programs with non-negative coefficients). Given a constraint matrix with n non-zeros, r rows, and c columns, the algorithm computes feasible primal and dual solutions whose costs are within a factor of 1+eps of the optimal cost in time O((r+c)log(n)/eps^2 + n).

Submitted 13 March, 2013; v1 submitted 13 January, 2008; originally announced January 2008.

Comments: corrected version of FOCS 2007 paper: 10.1109/FOCS.2007.62. Accepted to Algorithmica, 2013

MSC Class: 68W25 ACM Class: G.1.6

Journal ref: Algorithmica 70(4):648-674(2014)
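To give a feel for how packing LPs are approximated with multiplicative weights, here is a minimal Python sketch of the classic Garg-Könemann scheme for max c·x subject to Ax ≤ 1, x ≥ 0. It is a swapped-in, simpler technique, not the paper's nearly linear-time algorithm. It assumes a non-negative A with a positive entry in every column and strictly positive profits c; the function name and the default eps are hypothetical choices.

```python
import math


def packing_mwu(A, c, eps=0.1):
    """Approximately solve max c.x subject to A x <= 1, x >= 0.

    Keeps a price y[i] per row; repeatedly pushes one bottleneck unit into
    the column with the smallest price-to-profit ratio and raises the prices
    of the rows that column uses. Finally scales x down to be feasible.
    """
    m, n = len(A), len(c)
    delta = (1 + eps) / ((1 + eps) * m) ** (1 / eps)
    y = [delta] * m
    x = [0.0] * n
    while sum(y) < 1:
        # column minimising (sum_i A[i][k] * y[i]) / c[k]
        j = min(range(n), key=lambda k: sum(A[i][k] * y[i] for i in range(m)) / c[k])
        u = max(A[i][j] for i in range(m))  # bottleneck entry of column j
        x[j] += 1 / u
        for i in range(m):
            y[i] *= 1 + eps * A[i][j] / u
    # Row i's load L_i satisfies delta * (1 + eps) ** L_i <= y[i] < 1 + eps,
    # so dividing by log_{1+eps}((1 + eps) / delta) makes every row feasible.
    scale = math.log((1 + eps) / delta, 1 + eps)
    return [xj / scale for xj in x]
```

The final division is what guarantees feasibility: each unit of load multiplies a row's price by at least 1 + eps, and no price exceeds 1 + eps when the loop stops. Smaller eps gives a better objective value at the cost of more iterations.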

arXiv:cs/0504104 [pdf, ps, other]

doi 10.1016/j.ipl.2005.09.009

The reverse greedy algorithm for the metric k-median problem

Authors: Marek Chrobak, Claire Kenyon, Neal E. Young

Abstract: The Reverse Greedy algorithm (RGreedy) for the k-median problem works as follows. It starts by placing facilities on all nodes. At each step, it removes a facility to minimize the resulting total distance from the customers to the remaining facilities. It stops when k facilities remain. We prove that, if the distance function is metric, then the approximation ratio of RGreedy is between Ω(log n / log log n) and O(log n).

Submitted 27 September, 2005; v1 submitted 27 April, 2005; originally announced April 2005.

Comments: To appear in IPL. Preliminary version in COCOON '05

ACM Class: G.1.6; G.2.2; F.2.2

Journal ref: Information Processing Letters 97:68-72(2006)
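The Reverse Greedy procedure described in the abstract translates almost line for line into code. This Python sketch evaluates every candidate removal by brute force and breaks ties toward the smallest index. For simplicity it assumes that every node is both a customer and a candidate facility site and that 1 ≤ k ≤ n; the function name is hypothetical.

```python
def reverse_greedy_k_median(dist, k):
    """Reverse Greedy (RGreedy) for k-median.

    dist[u][f] is the distance from customer u to a facility at node f.
    Start with facilities on all nodes, then repeatedly delete the facility
    whose removal leaves the smallest total customer-to-nearest-facility
    distance, until k facilities remain. Returns (facilities, cost).
    """
    n = len(dist)

    def total_cost(facilities):
        return sum(min(dist[u][f] for f in facilities) for u in range(n))

    open_set = set(range(n))
    while len(open_set) > k:
        # try every single removal; sorted() makes tie-breaking deterministic
        victim = min(sorted(open_set), key=lambda f: total_cost(open_set - {f}))
        open_set.remove(victim)
    return open_set, total_cost(open_set)


# Hypothetical instance: four points on a line, two natural clusters.
pts = [0, 1, 10, 11]
D = [[abs(a - b) for b in pts] for a in pts]
print(reverse_greedy_k_median(D, 2))
```

On this instance the sketch keeps one facility per cluster, for a total cost of 2. The abstract's lower bound shows that on other metric instances the greedy removals can be off by a factor of roughly log n / log log n.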

arXiv:cs/0504103 [pdf, other]

doi 10.1007/s00453-007-9005-x

Incremental Medians via Online Bidding

Authors: Marek Chrobak, Claire Kenyon, John Noga, Neal E. Young

Abstract: In the k-median problem we are given sets of facilities and customers, and distances between them. For a given set F of facilities, the cost of serving a customer u is the minimum distance between u and a facility in F. The goal is to find a set F of k facilities that minimizes the sum, over all customers, of their service costs. Following Mettu and Plaxton, we study the incremental medians problem, where k is not known in advance, and the algorithm produces a nested sequence of facility sets where the kth set has size k. The algorithm is c-cost-competitive if the cost of each set is at most c times the cost of the optimum set of size k. We give improved incremental algorithms for the metric version: an 8-cost-competitive deterministic algorithm, a 2e ~ 5.44-cost-competitive randomized algorithm, a (24+epsilon)-cost-competitive, poly-time deterministic algorithm, and a (6e+epsilon ~ 16.31)-cost-competitive, poly-time randomized algorithm. The algorithm is s-size-competitive if the cost of the kth set is at most the minimum cost of any set of size k, and has size at most s k. The optimal size-competitive ratios for this problem are 4 (deterministic) and e (randomized). We present the first poly-time O(log m)-size-approximation algorithm for the offline problem and first poly-time O(log m)-size-competitive algorithm for the incremental problem. Our proofs reduce incremental medians to the following online bidding problem: faced with an unknown threshold T, an algorithm submits "bids" until it submits a bid that is at least the threshold.
It pays the sum of all its bids. We prove that folklore algorithms for online bidding are optimally competitive.

Submitted 28 May, 2020; v1 submitted 26 April, 2005; originally announced April 2005.

Comments: conference version appeared in LATIN 2006 as "Oblivious Medians via Online Bidding"

ACM Class: G.1.6; G.2.2; F.2.2

Journal ref: Algorithmica 50(4):455-478(2008)
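The reduction ends in the online bidding problem, and the folklore deterministic strategy for it is doubling. The Python sketch below is illustrative (hypothetical function name) and assumes the threshold T is at least 1. If 2^(k-1) < T ≤ 2^k, the bids 1, 2, ..., 2^k total 2^(k+1) - 1 < 4T, consistent with the deterministic ratio of 4 quoted above. The randomized e-competitive variant is not shown.

```python
def doubling_bids(threshold):
    """Folklore deterministic online-bidding strategy: bid 1, 2, 4, ...
    until some bid is at least the (unknown) threshold.
    Returns (list of bids, total amount paid). Assumes threshold >= 1."""
    bids = []
    bid = 1
    while True:
        bids.append(bid)  # every bid submitted is paid for
        if bid >= threshold:
            return bids, sum(bids)
        bid *= 2
```

For example, `doubling_bids(5)` returns `([1, 2, 4, 8], 15)`, which is less than 4 × 5 = 20.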

arXiv:math/0205218 [pdf, ps, other]

doi 10.1006/jcta.1996.0087

A New Operation on Sequences: the Boustrophedon Transform

Authors: Jessica Millar, N. J. A. Sloane, Neal E. Young

Abstract: A generalization of the Seidel-Entringer-Arnold method for calculating the alternating permutation numbers (or secant-tangent numbers) leads to a new operation on integer sequences, the Boustrophedon transform.

Submitted 24 June, 2002; v1 submitted 20 May, 2002; originally announced May 2002.

Comments: very minor change: corrected typo in author list. June 24 2002: correction to a proof; additional references

MSC Class: 05A15

Journal ref: J. Combinatorial Theory, Series A 76(1):44-54 (1996)
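The Boustrophedon transform itself is a short triangle computation. The Python sketch below uses one common indexing of the Seidel-Entringer-Arnold triangle (an assumption of this sketch; the paper's presentation may index it differently), and the function name is hypothetical. Applied to 1, 0, 0, 0, ... it yields the zigzag (secant-tangent) numbers 1, 1, 1, 2, 5, 16, 61, 272, and applied to the all-ones sequence it begins 1, 2, 4, 9, 24.

```python
def boustrophedon(a):
    """Boustrophedon transform of the finite prefix a[0], a[1], ...

    Builds the triangle
        T(n, 0) = a[n],   T(n, k) = T(n, k-1) + T(n-1, n-k)  for 1 <= k <= n,
    reading each previous row in reverse order (the "ox-plough" pattern),
    and returns b[n] = T(n, n).
    """
    b, prev = [], []
    for n, an in enumerate(a):
        row = [an]
        for k in range(1, n + 1):
            row.append(row[k - 1] + prev[n - k])  # prev is row n-1
        b.append(row[n])
        prev = row
    return b
```

Because b[n] depends only on a[0..n], transforming a finite prefix gives the exact prefix of the transformed sequence.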

arXiv:cs/0205077 [pdf, ps, other]

doi 10.1016/0020-0190(94)90044-2

Designing Multi-Commodity Flow Trees

Authors: Samir Khuller, Balaji Raghavachari, Neal E. Young

Abstract: The traditional multi-commodity flow problem assumes a given flow network in which multiple commodities are to be maximally routed in response to given demands. This paper considers the multi-commodity flow network-design problem: given a set of multi-commodity flow demands, find a network subject to certain constraints such that the commodities can be maximally routed. This paper focuses on the case when the network is required to be a tree. The main result is an approximation algorithm for the case when the tree is required to be of constant degree. The algorithm reduces the problem to the minimum-weight balanced-separator problem; the performance guarantee of the algorithm is within a factor of 4 of the performance guarantee of the balanced-separator procedure. If Leighton and Rao's balanced-separator procedure is used, the performance guarantee is O(log n). This improves the O(log^2 n) approximation factor that is trivial to obtain by a direct application of the balanced-separator method.

Submitted 30 May, 2002; originally announced May 2002.

Comments: Conference version in WADS'93

ACM Class: F.2.2; G.2.2

Journal ref: Information Processing Letters 50:49-55 (1994)

arXiv:cs/0205051 [pdf, ps, other]

doi 10.1287/moor.1030.0086

Rounding Algorithms for a Geometric Embedding of Minimum Multiway Cut

Authors: David Karger, Phil Klein, Cliff Stein, Mikkel Thorup, Neal E. Young

Abstract: The multiway-cut problem is, given a weighted graph and k >= 2 terminal nodes, to find a minimum-weight set of edges whose removal separates all the terminals. The problem is NP-hard, and even NP-hard to approximate within 1+delta for some small delta > 0. Calinescu, Karloff, and Rabani (1998) gave an algorithm with performance guarantee 3/2-1/k, based on a geometric relaxation of the problem. In this paper, we give improved randomized rounding schemes for their relaxation, yielding a 12/11-approximation algorithm for k=3 and a 1.3438-approximation algorithm in general. Our approach hinges on the observation that the problem of designing a randomized rounding scheme for a geometric relaxation is itself a linear programming problem. The paper explores computational solutions to this problem, and gives a proof that for a general class of geometric relaxations, there are always randomized rounding schemes that match the integrality gap.

Submitted 15 September, 2003; v1 submitted 19 May, 2002; originally announced May 2002.

Comments: Conference version in ACM Symposium on Theory of Computing (1999). To appear in Mathematics of Operations Research

ACM Class: F.2.0; G.1.6; G.2.2

Journal ref: Mathematics of Operations Research 29(3):436-461(2004)
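The 12/11 and 1.3438 guarantees are measured against the optimal multiway cut, and on tiny instances that optimum can be computed exhaustively. The sketch below is a plain brute-force baseline for checking the problem definition, not the randomized rounding scheme from the paper. The function name and input format are assumptions. It relies on the fact that every multiway cut corresponds to assigning each non-terminal to one terminal's side, with the cut consisting of all edges whose endpoints receive different labels.

```python
from itertools import product

def multiway_cut_bruteforce(nodes, edges, terminals):
    """Exact minimum multiway cut by exhaustive labeling (exponential time).

    nodes:     iterable of node names
    edges:     list of (u, v, weight) triples, undirected
    terminals: list of k >= 2 distinct terminal nodes
    Returns (cost, cut_edges).
    """
    others = [v for v in nodes if v not in terminals]
    best_cost, best_cut = float("inf"), []
    # Every non-terminal chooses which terminal's component it joins;
    # terminals always keep distinct labels, so they end up separated.
    for choice in product(range(len(terminals)), repeat=len(others)):
        label = {t: i for i, t in enumerate(terminals)}
        label.update(zip(others, choice))
        cut = [(u, v, w) for u, v, w in edges if label[u] != label[v]]
        cost = sum(w for _, _, w in cut)
        if cost < best_cost:
            best_cost, best_cut = cost, cut
    return best_cost, best_cut
```

For a star with center x joined to terminals a, b, c by weights 1, 1, 3, the optimum puts x with c and cuts the two weight-1 edges, for a cost of 2.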

arXiv:cs/0205050 [pdf, ps, other]

doi 10.1006/jagm.1997.0862

A Network-Flow Technique for Finding Low-Weight Bounded-Degree Spanning Trees

Authors: S. Fekete, S. Khuller, M. Klemmstein, B. Raghavachari, Neal E. Young

Abstract: The problem considered is the following. Given a graph with edge weights satisfying the triangle inequality, and a degree bound for each vertex, compute a low-weight spanning tree such that the degree of each vertex is at most its specified bound. The problem is NP-hard (it generalizes Traveling Salesman (TSP)). This paper describes a network-flow heuristic for modifying a given tree T to meet the constraints. Choosing T to be a minimum spanning tree (MST) yields approximation algorithms with performance guarantee less than 2 for the problem on geometric graphs with L_p-norms. The paper also describes a Euclidean graph whose minimum TSP costs twice the MST, disproving a conjecture made in ``Low-Degree Spanning Trees of Small Weight'' (1996).

Submitted 18 May, 2002; originally announced May 2002.

ACM Class: F.2.2; G.2.2

Journal ref: Journal of Algorithms 24(2):310-324 (1997)

arXiv:cs/0205049 [pdf, ps, other]

doi 10.1137/S0097539794268388

Prefix Codes: Equiprobable Words, Unequal Letter Costs

Authors: Mordecai Golin, Neal E. Young

Abstract: Describes a near-linear-time algorithm for a variant of Huffman coding, in which the letters may have non-uniform lengths (as in Morse code), but with the restriction that each word to be encoded has equal probability. [See also ``Huffman Coding with Unequal Letter Costs'' (2002).]

Submitted 18 May, 2002; originally announced May 2002.

Comments: proceedings version in ICALP (1994)

ACM Class: F.2.0; E.4; I.4.2

Journal ref: SIAM J. Computing 25(6):1281-1304 (1996)

arXiv:cs/0205048 [pdf, other]

doi 10.1137/100794092

Huffman Coding with Letter Costs: A Linear-Time Approximation Scheme

Authors: Mordecai Golin, Claire Mathieu, Neal E. Young

Abstract: We give a polynomial-time approximation scheme for the generalization of Huffman Coding in which codeword letters have non-uniform costs (as in Morse code, where the dash is twice as long as the dot). The algorithm computes a (1+epsilon)-approximate solution in time O(n + f(epsilon) log^3 n), where n is the input size.

Submitted 23 April, 2012; v1 submitted 18 May, 2002; originally announced May 2002.

ACM Class: F.2.0; E.4; I.4.2

Journal ref: SIAM Journal on Computing 41(3):684-713(2012)

arXiv:cs/0205047 [pdf, ps, other]

K-Medians, Facility Location, and the Chernoff-Wald Bound

Authors: Neal E. Young

Abstract: The paper gives approximation algorithms for the k-medians and facility-location problems (both NP-hard). For k-medians, the algorithm returns a solution using at most ln(n+n/epsilon)k medians and having cost at most (1+epsilon) times the cost of the best solution that uses at most k medians. Here epsilon > 0 is an input to the algorithm. In comparison, the best previous algorithm (Jyh-Han Lin and Jeff Vitter, 1992) had a (1+1/epsilon)ln(n) term instead of the ln(n+n/epsilon) term in the performance guarantee. For facility location, the algorithm returns a solution of cost at most d+ln(n) k, provided there exists a solution of cost d+k where d is the assignment cost and k is the facility cost. In comparison, the best previous algorithm (Dorit Hochbaum, 1982) returned a solution of cost at most ln(n)(d+k). For both problems, the algorithms currently provide the best performance guarantee known for the general (non-metric) problems. The paper also introduces a new probabilistic bound (called "Chernoff-Wald bound") for bounding the expectation of the maximum of a collection of sums of random variables, when each sum contains a random number of terms. The bound is used to analyze the randomized rounding scheme that underlies the algorithms.

Submitted 8 April, 2005; v1 submitted 18 May, 2002; originally announced May 2002.

ACM Class: F.2.1; G.1.6; G.2.2; G.3

Journal ref: ACM-SIAM Symposium on Discrete Algorithms (2000)

arXiv:cs/0205046 [pdf, ps, other]

doi 10.1007/3-540-48777-8_24

On the Number of Iterations for Dantzig-Wolfe Optimization and Packing-Covering Approximation Algorithms

Authors: Phil Klein, Neal E. Young

Abstract: We give a lower bound on the iteration complexity of a natural class of Lagrangean-relaxation algorithms for approximately solving packing/covering linear programs. We show that, given an input with $m$ random 0/1-constraints on $n$ variables, with high probability, any such algorithm requires $Ω(ρ\log(m)/ε^2)$ iterations to compute a $(1+ε)$-approximate solution, where $ρ$ is the width of the input. The bound is tight for a range of the parameters $(m,n,ρ,ε)$. The algorithms in the class include Dantzig-Wolfe decomposition, Benders' decomposition, Lagrangean relaxation as developed by Held and Karp [1971] for lower-bounding TSP, and many others (e.g. by Plotkin, Shmoys, and Tardos [1988] and Grigoriadis and Khachiyan [1996]). To prove the bound, we use a discrepancy argument to show an analogous lower bound on the support size of $(1+ε)$-approximate mixed strategies for random two-player zero-sum 0/1-matrix games.

Submitted 19 November, 2015; v1 submitted 19 May, 2002; originally announced May 2002.

ACM Class: F.2.1; G.1.6

Journal ref: LNCS 1610 (IPCO): 320-327 (1999); SIAM Journal on Computing 44(4):1154-1172(2015)

arXiv:cs/0205045 [pdf, ps, other]

doi 10.1007/BF01294129

Balancing Minimum Spanning and Shortest Path Trees

Authors: Samir Khuller, Balaji Raghavachari, Neal E. Young

Abstract: This paper gives a simple linear-time algorithm that, given a weighted digraph, finds a spanning tree that simultaneously approximates a shortest-path tree and a minimum spanning tree. The algorithm provides a continuous trade-off: given the two trees and epsilon > 0, the algorithm returns a spanning tree in which the distance between any vertex and the root of the shortest-path tree is at most 1+epsilon times the shortest-path distance, and yet the total weight of the tree is at most 1+2/epsilon times the weight of a minimum spanning tree. This is the best trade-off possible. The paper also describes a fast parallel implementation.

Submitted 18 May, 2002; originally announced May 2002.

Comments: conference version: ACM-SIAM Symposium on Discrete Algorithms (1993)

ACM Class: F.2.2; G.2.2

Journal ref: Algorithmica 14(4):305-322 (1995)

arXiv:cs/0205044 [pdf, ps, other]

doi 10.1007/BF01189992

The K-Server Dual and Loose Competitiveness for Paging

Authors: Neal E. Young

Abstract: This paper has two results. The first is based on the surprising observation that the well-known ``least-recently-used'' paging algorithm and the ``balance'' algorithm for weighted caching are linear-programming primal-dual algorithms. This observation leads to a strategy (called ``Greedy-Dual'') that generalizes them both and has an optimal performance guarantee for weighted caching. For the second result, the paper presents empirical studies of paging algorithms, documenting that in practice, on ``typical'' cache sizes and sequences, the performance of paging strategies is much better than their worst-case analyses in the standard model suggest. The paper then presents theoretical results that support and explain this. For example: on any input sequence, with almost all cache sizes, either the performance guarantee of least-recently-used is O(log k) or the fault rate (in an absolute sense) is insignificant. Both of these results are strengthened and generalized in ``On-line File Caching'' (1998).

Submitted 18 May, 2002; originally announced May 2002.

Comments: conference version: "On-Line Caching as Cache Size Varies", SODA (1991)

ACM Class: F.1.2; F.2.0; C.2.0

Journal ref: Algorithmica 11(6):525-541 (1994)
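To make the Greedy-Dual strategy concrete, here is a minimal simulation of one common instantiation. Each cached page holds a credit. On a fault with a full cache, the minimum credit is subtracted from every page and a zero-credit page is evicted. Every requested page then has its credit reset to its full fetch cost. Resetting to the full cost on hits and breaking ties by least recent use are assumptions of this sketch. With those choices and unit costs, the eviction order coincides with least-recently-used. The function name and input format are hypothetical.

```python
def greedy_dual(requests, k, cost):
    """Return the total fetch cost paid by Greedy-Dual for weighted caching.

    requests: sequence of page ids
    k:        cache capacity (number of pages)
    cost:     dict mapping page id -> cost of fetching it
    Ties among zero-credit pages are broken by least recent use
    (an assumption of this sketch).
    """
    credit = {}      # page -> current credit H(p), for cached pages only
    last_used = {}   # page -> time of its most recent request
    total = 0
    for time, p in enumerate(requests):
        if p not in credit:
            total += cost[p]
            if len(credit) >= k:
                # Lower every credit by the minimum, then evict a zero-credit page.
                m = min(credit.values())
                for q in credit:
                    credit[q] -= m
                victim = min((q for q in credit if credit[q] == 0),
                             key=lambda q: last_used[q])
                del credit[victim]
        # On a fault or a hit, the requested page's credit is reset to its cost.
        credit[p] = cost[p]
        last_used[p] = time
    return total
```

With k = 2, costs {"a": 10, "b": 1, "c": 1}, and requests a, b, c, b, c, a, the expensive page a survives the cheap pages cycling through the cache, so the total cost is 14. Least-recently-used would instead evict a at the third request and pay 22.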

arXiv:cs/0205043 [pdf, ps, other]

doi 10.1137/S0097539794264585

Low-Degree Spanning Trees of Small Weight

Authors: Samir Khuller, Balaji Raghavachari, Neal E. Young

Abstract: The degree-d spanning tree problem asks for a minimum-weight spanning tree in which the degree of each vertex is at most d. When d=2 the problem is TSP, and in this case, the well-known Christofides algorithm provides a 1.5-approximation algorithm (assuming the edge weights satisfy the triangle inequality). In 1984, Christos Papadimitriou and Umesh Vazirani posed the challenge of finding an algorithm with performance guarantee less than 2 for Euclidean graphs (points in R^n) and d > 2. This paper gives the first answer to that challenge, presenting an algorithm to compute a degree-3 spanning tree of cost at most 5/3 times the MST. For points in the plane, the ratio improves to 3/2 and the algorithm can also find a degree-4 spanning tree of cost at most 5/4 times the MST.

Submitted 18 May, 2002; originally announced May 2002.

Comments: conference version in Symposium on Theory of Computing (1994)

ACM Class: F.2.2; G.2.2

Journal ref: SIAM J. Computing 25(2):355-368 (1996)

arXiv:cs/0205042 [pdf, ps, other]

doi 10.1016/S0020-0190(97)00129-4

Orienting Graphs to Optimize Reachability

Authors: S. L. Hakimi, E. Schmeichel, Neal E. Young

Abstract: The paper focuses on two problems: (i) how to orient the edges of an undirected graph in order to maximize the number of ordered vertex pairs (x,y) such that there is a directed path from x to y, and (ii) how to orient the edges so as to minimize the number of such pairs. The paper describes a quadratic-time algorithm for the first problem, and a proof that the second problem is NP-hard to approximate within some constant 1+epsilon > 1. The latter proof also shows that the second problem is equivalent to ``comparability graph completion''; neither problem was previously known to be NP-hard.

Submitted 18 May, 2002; originally announced May 2002.

ACM Class: F.2.2; G.2.2

Journal ref: Information Processing Letters 63:229-235 (1997)
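Both problems share the same objective, the number of ordered pairs (x, y) joined by a directed path. The sketch below evaluates that objective for a given orientation and brute-forces all 2^m orientations of a small graph to find the extremes. It is a checking aid, not the quadratic-time algorithm from the paper. Excluding the trivial pairs with x = y, the function names, and the input formats are all assumptions.

```python
from collections import defaultdict
from itertools import product

def reachable_pairs(nodes, arcs):
    """Count ordered pairs (x, y), x != y, with a directed path from x to y.

    nodes: iterable of vertices
    arcs:  list of (u, v) directed edges, i.e. an orientation of the graph
    Runs one breadth-first search per vertex.
    """
    out = defaultdict(list)
    for u, v in arcs:
        out[u].append(v)
    total = 0
    for s in nodes:
        seen = {s}
        frontier = [s]
        while frontier:
            nxt = []
            for x in frontier:
                for y in out[x]:
                    if y not in seen:
                        seen.add(y)
                        nxt.append(y)
            frontier = nxt
        total += len(seen) - 1  # exclude the pair (s, s)
    return total

def extreme_orientations(nodes, edges):
    """Brute-force (max, min) reachability over all 2^m orientations."""
    values = [
        reachable_pairs(nodes, [(u, v) if keep else (v, u)
                                for (u, v), keep in zip(edges, flips)])
        for flips in product([True, False], repeat=len(edges))
    ]
    return max(values), min(values)
```

On a triangle, the cyclic orientation makes every vertex reach every other (6 pairs). Any acyclic orientation is transitive and yields only 3 pairs, so `extreme_orientations` returns (6, 3).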

arXiv:cs/0205041 [pdf, ps, other]

doi 10.1002/net.3230210206

Faster Parametric Shortest Path and Minimum Balance Algorithms

Authors: Neal Young, Robert Tarjan, James Orlin

Abstract: The parametric shortest path problem is to find the shortest paths in a graph where the edge costs are of the form w_ij+lambda where each w_ij is constant and lambda is a parameter that varies. The problem is to find shortest path trees for every possible value of lambda. The minimum-balance problem is to find a ``weighting'' of the vertices so that adjusting the edge costs by the vertex weights yields a graph in which, for every cut, the minimum weight of any edge crossing the cut in one direction equals the minimum weight of any edge crossing the cut in the other direction. The paper presents fast algorithms for both problems. The algorithms run in O(nm+n^2 log n) time. The paper also describes empirical studies of the algorithms on random graphs, suggesting that the expected time for finding a minimum-mean cycle (an important special case of both problems) is O(n log(n) + m).

Submitted 18 May, 2002; originally announced May 2002.

ACM Class: F.2.2; G.2.2; G.1.6

Journal ref: Networks 21(2):205-221 (1991)
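The minimum-mean cycle mentioned above can be computed with a short dynamic program. The sketch below uses Karp's classical O(nm) characterization (Karp, 1978), which is a different and well-known method, not the parametric algorithm of this paper. Let d[k][v] be the minimum weight of a walk with exactly k arcs ending at v and starting anywhere; starting anywhere is equivalent to adding a zero-weight super-source. The minimum cycle mean is then the minimum over v of the maximum over 0 <= k < n of (d[n][v] - d[k][v]) / (n - k). The function name and input format are assumptions.

```python
from fractions import Fraction

def min_mean_cycle_value(n, arcs):
    """Minimum cycle mean via Karp's formula, in O(nm) time.

    n:    number of vertices, labeled 0..n-1
    arcs: list of (u, v, w) directed edges with integer weights
    Returns the minimum mean as a Fraction, or None if the graph is acyclic.
    """
    # d[k][v]: min weight of a walk with exactly k arcs ending at v
    # (None means no such walk exists). Walks of length 0 cost 0 everywhere.
    d = [[0] * n] + [[None] * n for _ in range(n)]
    for k in range(1, n + 1):
        for u, v, w in arcs:
            if d[k - 1][u] is not None:
                cand = d[k - 1][u] + w
                if d[k][v] is None or cand < d[k][v]:
                    d[k][v] = cand
    best = None
    for v in range(n):
        if d[n][v] is None:
            continue  # no walk of length n ends at v
        worst = max(
            Fraction(d[n][v] - d[k][v], n - k)
            for k in range(n) if d[k][v] is not None
        )
        if best is None or worst < best:
            best = worst
    return best
```

Exact `Fraction` arithmetic avoids rounding when comparing candidate means. For the 3-cycle 0 -> 1 -> 2 -> 0 with weights 1, 2, 3, the result is 2.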

arXiv:cs/0205040 [pdf, ps, other]

doi 10.1137/S0097539793256685

Approximating the Minimum Equivalent Digraph

Authors: Samir Khuller, Balaji Raghavachari, Neal E. Young

Abstract: The MEG (minimum equivalent graph) problem is, given a directed graph, to find a small subset of the edges that maintains all reachability relations between nodes. The problem is NP-hard. This paper gives an approximation algorithm with performance guarantee of pi^2/6 ~ 1.64. The algorithm and its analysis are based on the simple idea of contracting long cycles. (This result is strengthened slightly in ``On strongly connected digraphs with bounded cycle length'' (1996).) The analysis applies directly to 2-Exchange, a simple ``local improvement'' algorithm, showing that its performance guarantee is 1.75.

Submitted 18 May, 2002; originally announced May 2002.

Comments: conference version in ACM-SIAM Symposium on Discrete Algorithms (1994)

ACM Class: F.2.2; G.2.2

Journal ref: SIAM J. Computing 24(4):859-872 (1995)
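As a point of comparison for the MEG problem, the simplest heuristic deletes arcs one at a time whenever the deletion leaves every reachability relation intact. This yields a minimal equivalent subgraph, meaning that no single remaining arc can be dropped. That subgraph is not necessarily minimum, and this is neither the cycle-contraction algorithm nor 2-Exchange from the paper. The function names and input format are assumptions of this sketch.

```python
def reach_sets(nodes, arcs):
    """Map each node to the set of nodes reachable from it (including itself)."""
    out = {v: [] for v in nodes}
    for u, v in arcs:
        out[u].append(v)
    result = {}
    for s in nodes:
        seen, stack = {s}, [s]
        while stack:
            x = stack.pop()
            for y in out[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        result[s] = seen
    return result

def minimal_equivalent_subgraph(nodes, arcs):
    """Greedily drop arcs whose removal preserves all reachability relations."""
    target = reach_sets(nodes, arcs)
    kept = list(arcs)
    i = 0
    while i < len(kept):
        trial = kept[:i] + kept[i + 1:]  # remove only the i-th arc
        if reach_sets(nodes, trial) == target:
            kept = trial  # arc i was redundant; the next arc shifts into slot i
        else:
            i += 1  # arc i is needed, keep it
    return kept
```

Because reachability can only shrink as arcs are removed, an arc that was needed at the moment it was examined stays needed in the final result, so a single pass suffices for minimality. On a 3-cycle with an extra chord 1 -> 3, the chord is dropped and the cycle remains.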

arXiv:cs/0205039 [pdf, ps, other]

doi 10.1109/SFCS.2001.959930

Sequential and Parallel Algorithms for Mixed Packing and Covering

Authors: Neal E. Young

Abstract: Mixed packing and covering problems are problems that can be formulated as linear programs using only non-negative coefficients. Examples include multicommodity network flow, the Held-Karp lower bound on TSP, fractional relaxations of set cover, bin-packing, knapsack, scheduling problems, minimum-weight triangulation, etc. This paper gives approximation algorithms for the general class of problems. The sequential algorithm is a simple greedy algorithm that can be implemented to find an epsilon-approximate solution in O(epsilon^-2 log m) linear-time iterations. The parallel algorithm does comparable work but finishes in polylogarithmic time. The results generalize previous work on pure packing and covering (the special case when the constraints are all "less-than" or all "greater-than") by Michael Luby and Noam Nisan (1993) and Naveen Garg and Jochen Konemann (1998).

Submitted 18 May, 2002; originally announced May 2002.

ACM Class: F.2.1; G.1.6

arXiv:cs/0205038 [pdf, ps, other]

doi 10.1016/0196-6774(91)90041-V

Competitive Paging Algorithms

Authors: Amos Fiat, Richard Karp, Mike Luby, Lyle McGeoch, Daniel Sleator, Neal E. Young

Abstract: The paging problem is that of deciding which pages to keep in a memory of k pages in order to minimize the number of page faults. This paper introduces the marking algorithm, a simple randomized on-line algorithm for the paging problem, and gives a proof that its performance guarantee (competitive ratio) is O(log k). In contrast, no deterministic on-line algorithm can have a performance guarantee better than k.

Submitted 18 May, 2002; originally announced May 2002.

ACM Class: F.2.0; F.1.2; C.0

Journal ref: Journal of Algorithms 12:685-699 (1991)
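The marking algorithm, as it is usually described, fits in a few lines of simulation. Every requested page is marked. On a fault with a full cache, if every cached page is already marked, all marks are cleared to start a new phase, and then a uniformly random unmarked page is evicted. The function name, the optional `rng` parameter, and the sorting step (added so that runs are reproducible for a fixed seed regardless of set iteration order) are choices of this sketch.

```python
import random

def marking_faults(requests, k, rng=None):
    """Simulate the randomized marking algorithm; return the number of faults.

    requests: sequence of page ids
    k:        cache size in pages
    rng:      optional random.Random instance for reproducible runs
    """
    rng = rng or random.Random()
    cache = set()
    marked = set()   # always a subset of cache
    faults = 0
    for p in requests:
        if p not in cache:
            faults += 1
            if len(cache) >= k:
                if marked == cache:   # every cached page is marked:
                    marked.clear()    # start a new phase
                # Evict a uniformly random unmarked page.
                unmarked = sorted(cache - marked, key=repr)
                victim = rng.choice(unmarked)
                cache.remove(victim)
            cache.add(p)
        marked.add(p)  # requested pages are always marked
    return faults
```

For example, `marking_faults([1, 2, 1, 2, 3], 2, random.Random(0))` returns 3 whatever page is evicted, since only the first requests to pages 1, 2, and 3 can fault. On adversarial sequences, the random choice among unmarked pages is what lets the algorithm beat the factor-k lower bound that applies to deterministic strategies.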

Showing 1–50 of 64 results for author: Young, N