Search | arXiv e-print repository

arXiv:2410.20268

Centaur: a foundation model of human cognition

Authors: Marcel Binz, Elif Akata, Matthias Bethge, Franziska Brändle, Fred Callaway, Julian Coda-Forno, Peter Dayan, Can Demircan, Maria K. Eckstein, Noémi Éltető, Thomas L. Griffiths, Susanne Haridi, Akshay K. Jagadish, Li Ji-An, Alexander Kipnis, Sreejan Kumar, Tobias Ludwig, Marvin Mathony, Marcelo Mattar, Alireza Modirshanechi, Surabhi S. Nath, Joshua C. Peterson, Milena Rmus, Evan M. Russek, Tankred Saanum, et al. (15 additional authors not shown)

Abstract: Establishing a unified theory of cognition has been a major goal of psychology. While there have been previous attempts to instantiate such theories by building computational models, we currently do not have one model that captures the human mind in its entirety. A first step in this direction is to create a model that can predict human behavior in a wide range of settings. Here we introduce Centaur, a computational model that can predict and simulate human behavior in any experiment expressible in natural language. We derived Centaur by finetuning a state-of-the-art language model on a novel, large-scale data set called Psych-101. Psych-101 reaches an unprecedented scale, covering trial-by-trial data from over 60,000 participants performing over 10,000,000 choices in 160 experiments. Centaur not only captures the behavior of held-out participants better than existing cognitive models, but also generalizes to new cover stories, structural task modifications, and entirely new domains. Furthermore, we find that the model's internal representations become more aligned with human neural activity after finetuning. Taken together, our results demonstrate that it is possible to discover computational models that capture human behavior across a wide range of domains. We believe that such models provide tremendous potential for guiding the development of cognitive theories and present a case study to demonstrate this.

Submitted 28 April, 2025; v1 submitted 26 October, 2024; originally announced October 2024.

arXiv:2302.00558

Ideas for the future of Prolog inspired by Oz

Authors: Peter Van Roy, Seif Haridi

Abstract: Both Prolog and Oz are multiparadigm languages with a logic programming core. There is a significant subset of Oz that is a syntactic variant of Prolog: pure Prolog programs with green or blue cuts and bagof/3 or setof/3 can be translated directly to Oz. Because of this close relationship between Prolog and Oz, we propose that the extensions made by Oz to logic programming can be an inspiration for the future evolution of Prolog. We explain three extensions, namely deterministic logic programming, lazy concurrent functional programming, and purely functional distributed computing. We briefly present these extensions and we explain how they can help Prolog evolve in its next 50 years.

Submitted 1 February, 2023; originally announced February 2023.

Comments: 15 pages, 0 figures

ACM Class: D.3

arXiv:2008.13456

Lecture Notes on Leader-based Sequence Paxos -- An Understandable Sequence Consensus Algorithm

Authors: Seif Haridi, Lars Kroll, Paris Carbone

Abstract: Agreement among a set of processes and in the presence of partial failures is one of the fundamental problems of distributed systems. In the most general case, many decisions must be agreed upon over the lifetime of a system with dynamically changing membership. Such a sequence of decisions represents a distributed log, and can form the underlying abstraction for driving a replicated state machine. While this abstraction is at the core of many systems with strong consistency requirements, algorithms that achieve such sequence consensus are often poorly understood by developers and have presented a significant challenge to many students of distributed systems. In these lecture notes we present a complete and practical Paxos-based algorithm for reconfigurable sequence consensus in the fail-recovery model, and a clear path of simple step-by-step transformations to it from the basic Paxos algorithm.

Submitted 31 August, 2020; originally announced August 2020.

Comments: First public draft
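The notes start from the basic Paxos algorithm. As a point of reference, here is a minimal in-memory sketch of single-decree Paxos (one acceptor class plus one proposer function running both phases against a majority). The names `Acceptor` and `propose` are hypothetical, there is no networking, persistence or leader election, and this is not the lecture notes' Sequence Paxos.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                           # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None   # (ballot, value) last accepted

    def prepare(self, ballot: int) -> Tuple[bool, Optional[Tuple[int, str]]]:
        """Phase 1b: promise to ignore lower ballots and report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2b: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: str) -> Optional[str]:
    """Run one ballot; return the chosen value, or None if no majority agreed."""
    majority = len(acceptors) // 2 + 1
    # Phase 1: gather promises from a majority.
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [prev for ok, prev in replies if ok]
    if len(promises) < majority:
        return None
    # A value accepted under the highest earlier ballot must be re-proposed.
    earlier = [prev for prev in promises if prev is not None]
    if earlier:
        value = max(earlier)[1]
    # Phase 2: ask acceptors to accept (ballot, value).
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None
```

With three acceptors, `propose(accs, 1, "a")` chooses `"a"`; a later `propose(accs, 2, "b")` still returns `"a"` because ballot 2 learns of the earlier accepted value, and a stale `propose(accs, 1, "c")` is rejected and returns `None`.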

arXiv:1705.04669

KompicsTesting - Unit Testing Event Streams

Authors: Ifeanyi W. Ubah, Lars Kroll, Alexandru A. Ormenisan, Seif Haridi

Abstract: In this paper we present KompicsTesting, a framework for unit testing components in the Kompics component model. Components in Kompics are event-driven entities which communicate asynchronously solely by message passing. Similar to actors in the actor model, they do not share their internal state, making them less prone to errors than other models of concurrency that use shared state. However, they are immune neither to simple logical and specification errors nor to errors, such as data races, that stem from nondeterminism. As a result, there exists a need for tools that enable rapid and iterative development and testing of message-passing components in general, in a manner similar to the xUnit frameworks for functions and modular segments of code. These frameworks work in an imperative manner, which is ill-suited for testing message-passing components, given that the behavior of such components is encoded in the streams of messages that they send and receive. In this work, we present a theoretical framework for describing and verifying the behavior of message-passing components, independent of the model and framework implementation, in a manner similar to describing a stream of characters using regular expressions. We show how this approach can be used to perform both black-box and white-box testing of components and illustrate its feasibility through the design and implementation of a prototype based on this approach, KompicsTesting.

Submitted 12 May, 2017; originally announced May 2017.
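The abstract compares specifying a component's message stream to describing a character stream with a regular expression. A minimal sketch of that analogy, assuming a hypothetical encoding in which each observed event type maps to one character so that Python's `re` module can check a whole recorded trace; the event names, `SYMBOLS`, `encode` and `conforms` are illustrative and are not KompicsTesting's API.

```python
import re
from typing import Dict, List

# Hypothetical mapping from event types to single-character symbols.
SYMBOLS: Dict[str, str] = {"Ping": "p", "Pong": "o", "Timeout": "t"}


def encode(trace: List[str]) -> str:
    """Turn a recorded stream of events into a string of symbols."""
    return "".join(SYMBOLS[event] for event in trace)


def conforms(trace: List[str], spec: str) -> bool:
    """True if the entire event stream matches the regular-expression spec."""
    return re.fullmatch(spec, encode(trace)) is not None


# Expected behaviour: one or more Ping/Pong exchanges, optionally ending in a Timeout.
SPEC = r"(po)+t?"
```

Here `conforms(["Ping", "Pong", "Timeout"], SPEC)` holds while `conforms(["Ping", "Ping"], SPEC)` does not. Mapping each event to one symbol keeps the sketch small; a real tester would match on event contents and direction (sent or received), not just type names.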

arXiv:1608.02442

A Fault-Tolerant Sequentially Consistent DSM With a Compositional Correctness Proof

Authors: Niklas Ekström, Seif Haridi

Abstract: We present the SC-ABD algorithm that implements sequentially consistent distributed shared memory (DSM). The algorithm tolerates crash-stop failures of fewer than half of the processes. Compared to the multi-writer ABD algorithm, SC-ABD requires one instead of two round-trips of communication to perform a write operation, and an equal number of round-trips (two) to perform a read operation. Although sequential consistency is not a compositional consistency condition, the provided correctness proof is compositional.

Submitted 8 August, 2016; originally announced August 2016.

Comments: Paper presented at the 4th Edition of The International Conference on NETworked sYStems, May 18-20, 2016
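The "fewer than half of the processes" bound matches the usual majority-quorum argument behind ABD-style algorithms: each round-trip waits for a majority, any two majorities share at least one process, and a majority can still respond as long as fewer than half have crashed. The brute-force check below illustrates only that counting argument under this assumption; the helper names are hypothetical and it does not implement SC-ABD.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest quorum size q with 2q > n."""
    return n // 2 + 1


def quorums_intersect(n: int, q: int) -> bool:
    """Brute force: does every pair of size-q subsets of n processes overlap?"""
    quorums = [set(c) for c in combinations(range(n), q)]
    return all(a & b for a in quorums for b in quorums)


def quorum_available(n: int, crashed: int) -> bool:
    """After `crashed` crash-stop failures, can a majority still respond?"""
    return n - crashed >= majority(n)
```

For five processes, majorities of three always overlap and survive two crashes but not three; with four processes, quorums of two can be disjoint, which is why the quorum must be a strict majority.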

arXiv:1607.02646

doi 10.1109/TKDE.2017.2762294

High-Level Programming Abstractions for Distributed Graph Processing

Authors: Vasiliki Kalavri, Vladimir Vlassov, Seif Haridi

Abstract: Efficient processing of large-scale graphs in distributed environments has been an increasingly popular topic of research in recent years. Inter-connected data that can be modeled as graphs arise in application domains such as machine learning, recommendation, web search, and social network analysis. Writing distributed graph applications is inherently hard and requires programming models that can cover a diverse set of problem domains, including iterative refinement algorithms, graph transformations, graph aggregations, pattern matching, ego-network analysis, and graph traversals. Several high-level programming abstractions have been proposed and adopted by distributed graph processing systems and big data platforms. Even though significant work has been done to experimentally compare distributed graph processing frameworks, no qualitative study and comparison of graph programming abstractions has been conducted yet. In this survey, we review and analyze the most prevalent high-level programming models for distributed graph processing, in terms of their semantics and applicability. We identify the classes of graph applications that can be naturally expressed by each abstraction and we also give examples of applications that are hard or impossible to express. We review 34 distributed graph processing systems with respect to their programming abstractions, execution models, and communication mechanisms. Finally, we discuss trends and open research questions in the area of distributed graph processing.

Submitted 9 July, 2016; originally announced July 2016.
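To make "iterative refinement algorithms" concrete, here is a single-process sketch in the vertex-centric ("think like a vertex") style that is common among such abstractions: in each superstep, vertices whose label changed send it to their neighbours, and each vertex keeps the smallest label it receives, which yields connected components. The function is a hypothetical illustration, not the API of any system reviewed in the survey.

```python
from typing import Dict, List, Set


def connected_components(adj: Dict[int, List[int]]) -> Dict[int, int]:
    """Min-label propagation in supersteps; adj must be undirected and list every vertex as a key."""
    labels = {v: v for v in adj}
    active: Set[int] = set(adj)          # vertices whose label changed last superstep
    while active:
        # Scatter: every active vertex messages its current label to its neighbours.
        inbox: Dict[int, List[int]] = {}
        for v in active:
            for u in adj[v]:
                inbox.setdefault(u, []).append(labels[v])
        # Compute: each vertex adopts the smallest incoming label if it improves.
        active = set()
        for u, msgs in inbox.items():
            smallest = min(msgs)
            if smallest < labels[u]:
                labels[u] = smallest
                active.add(u)
    return labels
```

The loop halts when no vertex changes its label, mirroring the "vote to halt" termination used by vertex-centric engines.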

arXiv:1606.01588

HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases

Authors: Salman Niazi, Mahmoud Ismail, Steffen Grohsschmiedt, Mikael Ronström, Seif Haridi, Jim Dowling

Abstract: Recent improvements in both the performance and scalability of shared-nothing, transactional, in-memory NewSQL databases have reopened the research question of whether distributed metadata for hierarchical file systems can be managed using commodity databases. In this paper, we introduce HopsFS, a next-generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS' single-node, in-memory metadata service with a distributed metadata service built on a NewSQL database. By removing the metadata bottleneck, HopsFS enables an order of magnitude larger and higher throughput clusters compared to HDFS. Metadata capacity has been increased to at least 37 times HDFS' capacity, and in experiments based on a workload trace from Spotify, we show that HopsFS supports 16 to 37 times the throughput of Apache HDFS. HopsFS also has lower latency for many concurrent clients, and no downtime during failover. Finally, as metadata is now stored in a commodity database, it can be safely extended and easily exported to external systems for online analysis and free-text search.

Submitted 22 February, 2017; v1 submitted 5 June, 2016; originally announced June 2016.

Journal ref: The 15th USENIX Conference on File and Storage Technologies (FAST 17) (2017) 89-104
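The central idea of keeping hierarchical namespace metadata as rows in a transactional database can be sketched with Python's built-in `sqlite3`. The schema (one row per inode, unique on parent id and name), `ROOT_ID`, and the `resolve`/`create` helpers are hypothetical illustrations, not HopsFS's actual schema; a single embedded SQLite database is neither distributed nor NewSQL.

```python
import sqlite3
from typing import Optional

ROOT_ID = 1  # hypothetical inode id of "/"


def make_store() -> sqlite3.Connection:
    """Create an in-memory inode table holding only the root directory."""
    db = sqlite3.connect(":memory:")
    with db:  # commit schema and root row together
        db.execute(
            "CREATE TABLE inodes (id INTEGER PRIMARY KEY, parent_id INTEGER,"
            " name TEXT NOT NULL, is_dir INTEGER NOT NULL, UNIQUE (parent_id, name))"
        )
        db.execute("INSERT INTO inodes VALUES (?, NULL, '', 1)", (ROOT_ID,))
    return db


def resolve(db: sqlite3.Connection, path: str) -> Optional[int]:
    """Walk the path one component at a time using (parent_id, name) lookups."""
    inode = ROOT_ID
    for part in (p for p in path.split("/") if p):
        row = db.execute(
            "SELECT id FROM inodes WHERE parent_id = ? AND name = ?", (inode, part)
        ).fetchone()
        if row is None:
            return None
        inode = row[0]
    return inode


def create(db: sqlite3.Connection, path: str, is_dir: bool = False) -> int:
    """Insert a new inode under an existing parent inside one transaction."""
    parent_path, _, name = path.rstrip("/").rpartition("/")
    parent = resolve(db, parent_path)
    if parent is None:
        raise FileNotFoundError(parent_path or "/")
    with db:
        cur = db.execute(
            "INSERT INTO inodes (parent_id, name, is_dir) VALUES (?, ?, ?)",
            (parent, name, int(is_dir)),
        )
    return cur.lastrowid
```

Because the namespace is ordinary rows, it can also be queried or exported with plain SQL, which is the property the abstract highlights for online analysis and search.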

arXiv:1506.08603

Lightweight Asynchronous Snapshots for Distributed Dataflows

Authors: Paris Carbone, Gyula Fóra, Stephan Ewen, Seif Haridi, Kostas Tzoumas

Abstract: Distributed stateful stream processing enables the deployment and execution of large-scale continuous computations in the cloud, targeting both low latency and high throughput. One of the most fundamental challenges of this paradigm is providing processing guarantees under potential failures. Existing approaches rely on periodic global state snapshots that can be used for failure recovery. Those approaches suffer from two main drawbacks. First, they often stall the overall computation, which impacts ingestion. Second, they eagerly persist all records in transit along with the operation states, which results in larger snapshots than required. In this work we propose Asynchronous Barrier Snapshotting (ABS), a lightweight algorithm suited for modern dataflow execution engines that minimises space requirements. ABS persists only operator states on acyclic execution topologies while keeping a minimal record log on cyclic dataflows. We implemented ABS on Apache Flink, a distributed analytics engine that supports stateful stream processing. Our evaluation shows that our algorithm does not have a heavy impact on the execution, maintaining linear scalability and performing well with frequent snapshots.

Submitted 29 June, 2015; originally announced June 2015.

Comments: 8 pages, 7 figures

Report number: ISBN 978-91-7595-651-0
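For the acyclic case, the key step can be sketched as barrier alignment at a single operator: once a barrier arrives on one input channel, later records on that channel are held back until barriers have arrived on all inputs; the operator then records only its own state, forwards the barrier, and releases the held records. The class, the `BARRIER` marker and the running-sum state are hypothetical; this is not Flink's implementation and it omits cyclic dataflows, persistence and recovery.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Union

BARRIER = "BARRIER"  # hypothetical marker injected into every input stream
Item = Union[int, str]


class SummingOperator:
    """One operator with several input channels; its state is a running sum."""

    def __init__(self, channels: List[str]) -> None:
        self.channels = channels
        self.state = 0
        self.blocked: Set[str] = set()                  # channels whose barrier arrived
        self.buffers: Dict[str, Deque[Item]] = {c: deque() for c in channels}
        self.snapshots: List[int] = []                  # recorded operator states
        self.output: List[Item] = []                    # records/barriers sent downstream

    def on_input(self, channel: str, item: Item) -> None:
        if channel in self.blocked:
            # Alignment: hold records that arrive after this channel's barrier.
            self.buffers[channel].append(item)
            return
        if item == BARRIER:
            self.blocked.add(channel)
            if self.blocked == set(self.channels):
                self.snapshots.append(self.state)       # operator state only, no in-flight records
                self.output.append(BARRIER)             # forward the barrier downstream
                self.blocked.clear()
                self._release()
            return
        self.state += item
        self.output.append(self.state)

    def _release(self) -> None:
        """Replay held records, which belong to the next snapshot epoch."""
        held, self.buffers = self.buffers, {c: deque() for c in self.channels}
        for channel in self.channels:
            while held[channel]:
                self.on_input(channel, held[channel].popleft())
```

In the usage below, the record `10` arrives on channel `a` after `a`'s barrier, so it is excluded from the snapshot (which captures the sum `3`) and is applied only after both barriers have arrived.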

arXiv:0710.0386

Comparing Maintenance Strategies for Overlays

Authors: Supriya Krishnamurthy, Sameh El-Ansary, Erik Aurell, Seif Haridi

Abstract: In this paper, we present an analytical tool for understanding the performance of structured overlay networks under churn based on the master-equation approach of physics. We motivate and derive an equation for the average number of hops taken by lookups during churn, for the Chord network. We analyse this equation in detail to understand the behaviour with and without churn. We then use this understanding to predict how lookups will scale for varying peer population as well as varying the sizes of the routing tables. We then consider a change in the maintenance algorithm of the overlay, from periodic stabilisation to a reactive one which corrects fingers only when a change is detected. We generalise our earlier analysis to understand how the reactive strategy compares with the periodic one.

Submitted 1 October, 2007; originally announced October 2007.

Comments: 10 pages, 8 figures

Report number: Tech. Report TR-2007-01, Swedish Institute of Computer Science
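The analysis concerns the average number of hops taken by Chord lookups. As a no-churn baseline, the sketch below performs Chord-style greedy finger routing on a static ring and counts forwarding hops. The 6-bit ring, the node ids and the helper names are hypothetical; stabilisation, churn and the paper's master-equation analysis are not modelled.

```python
from bisect import bisect_left
from statistics import mean
from typing import List, Tuple

M = 6              # identifier bits (hypothetical small ring)
RING = 2 ** M      # identifiers are 0 .. RING - 1


def successor(nodes: List[int], ident: int) -> int:
    """First node clockwise from ident (nodes must be sorted)."""
    i = bisect_left(nodes, ident % RING)
    return nodes[i % len(nodes)]


def in_open(x: int, a: int, b: int) -> bool:
    """x lies in the ring interval (a, b)."""
    return a < x < b if a < b else (x > a or x < b)


def in_half_open(x: int, a: int, b: int) -> bool:
    """x lies in the ring interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)


def lookup(nodes: List[int], start: int, key: int) -> Tuple[int, int]:
    """Greedy finger routing; returns (node responsible for key, forwarding hops)."""
    node, hops = start, 0
    while True:
        succ = successor(nodes, node + 1)
        if in_half_open(key, node, succ):
            return succ, hops
        # Forward to the farthest finger that still precedes the key.
        for i in reversed(range(M)):
            finger = successor(nodes, node + 2 ** i)
            if in_open(finger, node, key):
                node = finger
                break
        hops += 1


def average_hops(nodes: List[int]) -> float:
    """Mean hop count over all (start node, key) pairs on a churn-free ring."""
    return float(mean(lookup(nodes, s, k)[1] for s in nodes for k in range(RING)))
```

For example, with nodes `[1, 8, 14, 21, 32, 38, 42, 48, 51, 56]`, `lookup(nodes, 8, 54)` forwards 8 to 42 to 51 and returns `(56, 2)`. Varying the number of nodes or `M` in `average_hops` gives a static counterpart to the scaling questions the paper studies under churn.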

arXiv:0710.0270

doi 10.1109/TNET.2007.905590

An Analytical Study of a Structured Overlay in the presence of Dynamic Membership

Authors: Supriya Krishnamurthy, Sameh El-Ansary, Erik Aurell, Seif Haridi

Abstract: In this paper we present an analytical study of dynamic membership (aka churn) in structured peer-to-peer networks. We use a fluid model approach to describe steady-state or transient phenomena, and apply it to the Chord system. For any rate of churn and stabilization rates, and any system size, we accurately account for the functional form of the probability of network disconnection as well as the fraction of failed or incorrect successor and finger pointers. We show how we can use these quantities to predict both the performance and consistency of lookups under churn. All theoretical predictions match simulation results. The analysis includes both features that are generic to structured overlays deploying a ring as well as Chord-specific details, and opens the door to a systematic comparative analysis of, at least, ring-based structured overlay systems under churn.

Submitted 1 October, 2007; originally announced October 2007.

Comments: 12 pages, 14 figures, to appear in IEEE/ACM Transactions on Networking

arXiv:cs/0501069

A Statistical Theory of Chord under Churn

Authors: Supriya Krishnamurthy, Sameh El-Ansary, Erik Aurell, Seif Haridi

Abstract: Most earlier studies of Distributed Hash Tables (DHTs) under churn have either depended on simulations as the primary investigation tool, or on establishing bounds for DHTs to function. In this paper, we present a complete analytical study of churn using a master-equation-based approach, used traditionally in non-equilibrium statistical mechanics to describe steady-state or transient phenomena. Simulations are used to verify all theoretical predictions. We demonstrate the application of our methodology to the Chord system. For any rate of churn and stabilization rates, and any system size, we accurately predict the fraction of failed or incorrect successor and finger pointers and show how we can use these quantities to predict the performance and consistency of lookups under churn. We also discuss briefly how churn may actually be of different 'types' and the implications this will have for the functioning of DHTs in general.

Submitted 24 January, 2005; originally announced January 2005.

Comments: 6 pages, In the 4th International Workshop on Peer-to-Peer Systems (IPTPS'05), Ithaca, New York, USA, 2005

ACM Class: I.6; G.3; E.1

arXiv:cs/0208029

Logic programming in the context of multiparadigm programming: the Oz experience

Authors: Peter Van Roy, Per Brand, Denys Duchier, Seif Haridi, Martin Henz, Christian Schulte

Abstract: Oz is a multiparadigm language that supports logic programming as one of its major paradigms. A multiparadigm language is designed to support different programming paradigms (logic, functional, constraint, object-oriented, sequential, concurrent, etc.) with equal ease. This article has two goals: to give a tutorial of logic programming in Oz and to show how logic programming fits naturally into the wider context of multiparadigm programming. Our experience shows that there are two classes of problems, which we call algorithmic and search problems, for which logic programming can help formulate practical solutions. Algorithmic problems have known efficient algorithms. Search problems do not have known efficient algorithms but can be solved with search. The Oz support for logic programming targets these two problem classes specifically, using the concepts needed for each. This is in contrast to the Prolog approach, which targets both classes with one set of concepts, which results in less than optimal support for each class. To explain the essential difference between algorithmic and search programs, we define the Oz execution model. This model subsumes both concurrent logic programming (committed-choice-style) and search-based logic programming (Prolog-style). Instead of Horn clause syntax, Oz has a simple, fully compositional, higher-order syntax that accommodates the abilities of the language. We conclude with lessons learned from this work, a brief history of Oz, and many entry points into the Oz literature.

Submitted 20 August, 2002; originally announced August 2002.

Comments: 48 pages, to appear in the journal "Theory and Practice of Logic Programming"

ACM Class: D.1.6; D.3.2; D.3.3; F.3.3

Showing 1–12 of 12 results for author: Haridi, S