-
Conditional Deep Canonical Time Warping
Authors:
Afek Steinberg,
Ran Eisenberg,
Ofir Lindenbaum
Abstract:
Temporal alignment of sequences is a fundamental challenge in many applications, such as computer vision and bioinformatics, where local time shifting needs to be accounted for. Misalignment can lead to poor model generalization, especially in high-dimensional sequences. Existing methods often struggle with optimization when dealing with high-dimensional sparse data, falling into poor alignments.…
▽ More
Temporal alignment of sequences is a fundamental challenge in many applications, such as computer vision and bioinformatics, where local time shifting needs to be accounted for. Misalignment can lead to poor model generalization, especially in high-dimensional sequences. Existing methods often struggle with optimization when dealing with high-dimensional sparse data, falling into poor alignments. Feature selection is frequently used to enhance model performance for sparse data. However, a fixed set of selected features would not generally work for dynamically changing sequences and would need to be modified based on the state of the sequence. Therefore, modifying the selected feature based on contextual input would result in better alignment. Our suggested method, Conditional Deep Canonical Temporal Time Warping (CDCTW), is designed for temporal alignment in sparse temporal data to address these challenges. CDCTW enhances alignment accuracy for high dimensional time-dependent views be performing dynamic time warping on data embedded in maximally correlated subspace which handles sparsity with novel feature selection method. We validate the effectiveness of CDCTW through extensive experiments on various datasets, demonstrating superior performance over previous techniques.
△ Less
Submitted 9 January, 2025; v1 submitted 24 December, 2024;
originally announced December 2024.
-
Self Supervised Correlation-based Permutations for Multi-View Clustering
Authors:
Ran Eisenberg,
Jonathan Svirsky,
Ofir Lindenbaum
Abstract:
Combining data from different sources can improve data analysis tasks such as clustering. However, most of the current multi-view clustering methods are limited to specific domains or rely on a suboptimal and computationally intensive two-stage process of representation learning and clustering. We propose an end-to-end deep learning-based multi-view clustering framework for general data types (suc…
▽ More
Combining data from different sources can improve data analysis tasks such as clustering. However, most of the current multi-view clustering methods are limited to specific domains or rely on a suboptimal and computationally intensive two-stage process of representation learning and clustering. We propose an end-to-end deep learning-based multi-view clustering framework for general data types (such as images and tables). Our approach involves generating meaningful fused representations using a novel permutation-based canonical correlation objective. We provide a theoretical analysis showing how the learned embeddings approximate those obtained by supervised linear discriminant analysis (LDA). Cluster assignments are learned by identifying consistent pseudo-labels across multiple views. Additionally, we establish a theoretical bound on the error caused by incorrect pseudo-labels in the unsupervised representations compared to LDA. Extensive experiments on ten multi-view clustering benchmark datasets provide empirical evidence for the effectiveness of the proposed model.
△ Less
Submitted 20 May, 2025; v1 submitted 26 February, 2024;
originally announced February 2024.
-
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Authors:
BigScience Workshop,
:,
Teven Le Scao,
Angela Fan,
Christopher Akiki,
Ellie Pavlick,
Suzana Ilić,
Daniel Hesslow,
Roman Castagné,
Alexandra Sasha Luccioni,
François Yvon,
Matthias Gallé,
Jonathan Tow,
Alexander M. Rush,
Stella Biderman,
Albert Webson,
Pawan Sasanka Ammanamanchi,
Thomas Wang,
Benoît Sagot,
Niklas Muennighoff,
Albert Villanova del Moral,
Olatunji Ruwase,
Rachel Bawden,
Stas Bekman,
Angelina McMillan-Major
, et al. (369 additional authors not shown)
Abstract:
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…
▽ More
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
△ Less
Submitted 27 June, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
Eiger: Auditable, executable, flexible legal regulations
Authors:
Alexander Bernauer,
Richard A. Eisenberg
Abstract:
Despite recent advances in communication and automation, regulations are still written in natural-language prose, subject to ambiguity, inconsistency, and incompleteness. How can we craft regulations with precision? Our solution is embodied in Eiger, a domain-specific programming language embedded in Haskell. A domain expert pairs with a software engineer to write regulations in Eiger. The domain…
▽ More
Despite recent advances in communication and automation, regulations are still written in natural-language prose, subject to ambiguity, inconsistency, and incompleteness. How can we craft regulations with precision? Our solution is embodied in Eiger, a domain-specific programming language embedded in Haskell. A domain expert pairs with a software engineer to write regulations in Eiger. The domain expert needs only to read and audit the code, but not write it. A first, limited, user study suggests that this works well in practice because Eiger code mostly looks like Excel formulas with simple SQL queries.
Eiger forms the kernel of a new strategy to deliver value to clients in our professional services business with increased automation and precision. The framework is executable: based on client data, we can use Eiger both to deduce how best to adapt to a new regulation and then maintain compliance. This paper reviews the design of Eiger and walks through its implementation. To preserve a straightforward surface syntax but with monadic semantics, we have leveraged advanced features, including GHC.Generics, the new OverloadedRecordDot extension, and a novel approach to performing class instance selection at run-time.
△ Less
Submitted 11 September, 2022;
originally announced September 2022.
-
Seeking Stability by being Lazy and Shallow
Authors:
Gert-Jan Bottu,
Richard A. Eisenberg
Abstract:
Designing a language feature often requires a choice between several, similarly expressive possibilities. Given that user studies are generally impractical, we propose using stability as a way of making such decisions. Stability is a measure of whether the meaning of a program alters under small, seemingly innocuous changes in the code. Directly motivated by a need to pin down a feature in GHC/Has…
▽ More
Designing a language feature often requires a choice between several, similarly expressive possibilities. Given that user studies are generally impractical, we propose using stability as a way of making such decisions. Stability is a measure of whether the meaning of a program alters under small, seemingly innocuous changes in the code. Directly motivated by a need to pin down a feature in GHC/Haskell, we apply this notion of stability to analyse four approaches to the instantiation of polymorphic types, concluding that the most stable approach is lazy (instantiate a polytype only when absolutely necessary) and shallow (instantiate only top-level type variables, not variables that appear after explicit arguments).
△ Less
Submitted 5 July, 2021; v1 submitted 28 June, 2021;
originally announced June 2021.
-
Linearly Qualified Types: Generic inference for capabilities and uniqueness
Authors:
Arnaud Spiwack,
Csongor Kiss,
Jean-Philippe Bernardy,
Nicolas Wu,
Richard Eisenberg
Abstract:
A linear parameter must be consumed exactly once in the body of its function. When declaring resources such as file handles and manually managed memory as linear arguments, a linear type system can verify that these resources are used safely. However, writing code with explicit linear arguments requires bureaucracy. This paper presents linear constraints, a front-end feature for linear typing that…
▽ More
A linear parameter must be consumed exactly once in the body of its function. When declaring resources such as file handles and manually managed memory as linear arguments, a linear type system can verify that these resources are used safely. However, writing code with explicit linear arguments requires bureaucracy. This paper presents linear constraints, a front-end feature for linear typing that decreases the bureaucracy of working with linear types. Linear constraints are implicit linear arguments that are filled in automatically by the compiler. We present linear constraints as a qualified type system,together with an inference algorithm which extends GHC's existing constraint solver algorithm. Soundness of linear constraints is ensured by the fact that they desugar into Linear Haskell.
△ Less
Submitted 22 July, 2022; v1 submitted 10 March, 2021;
originally announced March 2021.
-
A graded dependent type system with a usage-aware semantics (extended version)
Authors:
Pritam Choudhury,
Harley Eades III,
Richard A. Eisenberg,
Stephanie C Weirich
Abstract:
Graded Type Theory provides a mechanism to track and reason about resource usage in type systems. In this paper, we develop GraD, a novel version of such a graded dependent type system that includes functions, tensor products, additive sums, and a unit type. Since standard operational semantics is resource-agnostic, we develop a heap-based operational semantics and prove a soundness theorem that s…
▽ More
Graded Type Theory provides a mechanism to track and reason about resource usage in type systems. In this paper, we develop GraD, a novel version of such a graded dependent type system that includes functions, tensor products, additive sums, and a unit type. Since standard operational semantics is resource-agnostic, we develop a heap-based operational semantics and prove a soundness theorem that shows correct accounting of resource usage. Several useful properties, including the standard type soundness theorem, non-interference of irrelevant resources in computation and single pointer property for linear resources, can be derived from this theorem. We hope that our work will provide a base for integrating linearity, irrelevance and dependent types in practical programming languages like Haskell.
△ Less
Submitted 6 January, 2021; v1 submitted 8 November, 2020;
originally announced November 2020.
-
Kind Inference for Datatypes: Technical Supplement
Authors:
Ningning Xie,
Richard A. Eisenberg,
Bruno C. d. S. Oliveira
Abstract:
In recent years, languages like Haskell have seen a dramatic surge of new features that significantly extends the expressive power of their type systems. With these features, the challenge of kind inference for datatype declarations has presented itself and become a worthy research problem on its own.
This paper studies kind inference for datatypes. Inspired by previous research on type-inferenc…
▽ More
In recent years, languages like Haskell have seen a dramatic surge of new features that significantly extends the expressive power of their type systems. With these features, the challenge of kind inference for datatype declarations has presented itself and become a worthy research problem on its own.
This paper studies kind inference for datatypes. Inspired by previous research on type-inference, we offer declarative specifications for what datatype declarations should be accepted, both for Haskell98 and for a more advanced system we call PolyKinds, based on the extensions in modern Haskell, including a limited form of dependent types. We believe these formulations to be novel and without precedent, even for Haskell98. These specifications are complemented with implementable algorithmic versions. We study soundness, completeness and the existence of principal kinds in these systems, proving the properties where they hold. This work can serve as a guide both to language designers who wish to formalize their datatype declarations and also to implementors keen to have principled inference of principal types.
This technical supplement to Kind Inference for Datatypes serves to expand upon the text in the main paper. It contains detailed typing rules, proofs, and connections to the Glasgow Haskell Compiler (GHC).
△ Less
Submitted 11 November, 2019;
originally announced November 2019.
-
A Role for Dependent Types in Haskell (Extended version)
Authors:
Stephanie Weirich,
Pritam Choudhury,
Antoine Voizard,
Richard A. Eisenberg
Abstract:
Modern Haskell supports zero-cost coercions, a mechanism where types that share the same run-time representation may be freely converted between. To make sure such conversions are safe and desirable, this feature relies on a mechanism of roles to prohibit invalid coercions. In this work, we show how to integrate roles with dependent type systems and prove, using the Coq proof assistant, that the r…
▽ More
Modern Haskell supports zero-cost coercions, a mechanism where types that share the same run-time representation may be freely converted between. To make sure such conversions are safe and desirable, this feature relies on a mechanism of roles to prohibit invalid coercions. In this work, we show how to integrate roles with dependent type systems and prove, using the Coq proof assistant, that the resulting system is sound. We have designed this work as a foundation for the addition of dependent types to the Glasgow Haskell Compiler, but we also expect that it will be of use to designers of other dependently-typed languages who might want to adopt Haskell's safe coercions feature.
△ Less
Submitted 31 May, 2019;
originally announced May 2019.
-
Type variables in patterns
Authors:
Richard A. Eisenberg,
Joachim Breitner,
Simon Peyton Jones
Abstract:
For many years, GHC has implemented an extension to Haskell that allows type variables to be bound in type signatures and patterns, and to scope over terms. This extension was never properly specified. We rectify that oversight here. With the formal specification in hand, the otherwise-labyrinthine path toward a design for binding type variables in patterns becomes blindingly clear. We thus extend…
▽ More
For many years, GHC has implemented an extension to Haskell that allows type variables to be bound in type signatures and patterns, and to scope over terms. This extension was never properly specified. We rectify that oversight here. With the formal specification in hand, the otherwise-labyrinthine path toward a design for binding type variables in patterns becomes blindingly clear. We thus extend ScopedTypeVariables to bind type variables explicitly, obviating the Proxy workaround to the dustbin of history.
△ Less
Submitted 9 June, 2018;
originally announced June 2018.
-
Constrained Type Families
Authors:
J. Garrett Morris,
Richard Eisenberg
Abstract:
We present an approach to support partiality in type-level computation without compromising expressiveness or type safety. Existing frameworks for type-level computation either require totality or implicitly assume it. For example, type families in Haskell provide a powerful, modular means of defining type-level computation. However, their current design implicitly assumes that type families are t…
▽ More
We present an approach to support partiality in type-level computation without compromising expressiveness or type safety. Existing frameworks for type-level computation either require totality or implicitly assume it. For example, type families in Haskell provide a powerful, modular means of defining type-level computation. However, their current design implicitly assumes that type families are total, introducing nonsensical types and significantly complicating the metatheory of type families and their extensions. We propose an alternative design, using qualified types to pair type-level computations with predicates that capture their domains. Our approach naturally captures the intuitive partiality of type families, simplifying their metatheory. As evidence, we present the first complete proof of consistency for a language with closed type families.
△ Less
Submitted 29 June, 2017;
originally announced June 2017.
-
Dependent Types in Haskell: Theory and Practice
Authors:
Richard A. Eisenberg
Abstract:
Haskell, as implemented in the Glasgow Haskell Compiler (GHC), has been adding new type-level programming features for some time. Many of these features---chiefly: generalized algebraic datatypes (GADTs), type families, kind polymorphism, and promoted datatypes---have brought Haskell to the doorstep of dependent types. Many dependently typed programs can even currently be encoded, but often the co…
▽ More
Haskell, as implemented in the Glasgow Haskell Compiler (GHC), has been adding new type-level programming features for some time. Many of these features---chiefly: generalized algebraic datatypes (GADTs), type families, kind polymorphism, and promoted datatypes---have brought Haskell to the doorstep of dependent types. Many dependently typed programs can even currently be encoded, but often the constructions are painful.
In this dissertation, I describe Dependent Haskell, which supports full dependent types via a backward-compatible extension to today's Haskell. An important contribution of this work is an implementation, in GHC, of a portion of Dependent Haskell, with the rest to follow. The features I have implemented are already released, in GHC 8.0. This dissertation contains several practical examples of Dependent Haskell code, a full description of the differences between Dependent Haskell and today's Haskell, a novel type-safe dependently typed lambda-calculus (called Pico) suitable for use as an intermediate language for compiling Dependent Haskell, and a type inference and elaboration algorithm, Bake, that translates Dependent Haskell to type-correct Pico.
△ Less
Submitted 12 August, 2017; v1 submitted 25 October, 2016;
originally announced October 2016.