-
A Community-driven vision for a new Knowledge Resource for AI
Authors:
Vinay K Chaudhri,
Chaitan Baru,
Brandon Bennett,
Mehul Bhatt,
Darion Cassel,
Anthony G Cohn,
Rina Dechter,
Esra Erdem,
Dave Ferrucci,
Ken Forbus,
Gregory Gelfond,
Michael Genesereth,
Andrew S. Gordon,
Benjamin Grosof,
Gopal Gupta,
Jim Hendler,
Sharat Israni,
Tyler R. Josephson,
Patrick Kyllonen,
Yuliya Lierler,
Vladimir Lifschitz,
Clifton McFate,
Hande K. McGinty,
Leora Morgenstern,
Alessandro Oltramari
, et al. (7 additional authors not shown)
Abstract:
The long-standing goal of creating a comprehensive, multi-purpose knowledge resource, reminiscent of the 1984 Cyc project, still persists in AI. Despite the success of knowledge resources like WordNet, ConceptNet, Wolfram|Alpha and other commercial knowledge graphs, verifiable, general-purpose widely available sources of knowledge remain a critical deficiency in AI infrastructure. Large language m…
▽ More
The long-standing goal of creating a comprehensive, multi-purpose knowledge resource, reminiscent of the 1984 Cyc project, still persists in AI. Despite the success of knowledge resources like WordNet, ConceptNet, Wolfram|Alpha and other commercial knowledge graphs, verifiable, general-purpose widely available sources of knowledge remain a critical deficiency in AI infrastructure. Large language models struggle due to knowledge gaps; robotic planning lacks necessary world knowledge; and the detection of factually false information relies heavily on human expertise. What kind of knowledge resource is most needed in AI today? How can modern technology shape its development and evaluation? A recent AAAI workshop gathered over 50 researchers to explore these questions. This paper synthesizes our findings and outlines a community-driven vision for a new knowledge infrastructure. In addition to leveraging contemporary advances in knowledge representation and reasoning, one promising idea is to build an open engineering framework to exploit knowledge modules effectively within the context of practical applications. Such a framework should include sets of conventions and social structures that are adopted by contributors.
△ Less
Submitted 19 June, 2025;
originally announced June 2025.
-
Modal Verification Patterns for Systems
Authors:
Ismail Kuru,
Colin S. Gordon
Abstract:
Although they differ in the functionality they offer, low-level systems exhibit certain patterns of design and utilization of computing resources. In this paper, we argue the position that modalities, in the sense of modal logic, should be a go-to approach when specifying and verifying low-level systems code. We explain how the concept of a resource context helps guide the design of new modalities…
▽ More
Although they differ in the functionality they offer, low-level systems exhibit certain patterns of design and utilization of computing resources. In this paper, we argue the position that modalities, in the sense of modal logic, should be a go-to approach when specifying and verifying low-level systems code. We explain how the concept of a resource context helps guide the design of new modalities for verification of systems code, and we justify our perspective by discussing prior systems that have used modalities for systems verification successfully, arguing that they fit into the verification design pattern we articulate, and explaining how this approach might apply to other systems verification challenges.
△ Less
Submitted 25 June, 2025; v1 submitted 2 June, 2025;
originally announced June 2025.
-
Hybrid Team Tetris: A New Platform For Hybrid Multi-Agent, Multi-Human Teaming
Authors:
Kaleb Mcdowell,
Nick Waytowich,
Javier Garcia,
Stephen Gordon,
Bryce Bartlett,
Jeremy Gaston
Abstract:
Metcalfe et al (1) argue that the greatest potential for human-AI partnerships lies in their application to highly complex problem spaces. Herein, we discuss three different forms of hybrid team intelligence and posit that across all three forms, the hybridization of man and machine intelligence can be effective under the right conditions. We foresee two significant research and development (R&D)…
▽ More
Metcalfe et al (1) argue that the greatest potential for human-AI partnerships lies in their application to highly complex problem spaces. Herein, we discuss three different forms of hybrid team intelligence and posit that across all three forms, the hybridization of man and machine intelligence can be effective under the right conditions. We foresee two significant research and development (R&D) challenges underlying the creation of effective hybrid intelligence. First, rapid advances in machine intelligence and/or fundamental changes in human behaviors or capabilities over time can outpace R&D. Second, the future conditions under which hybrid intelligence will operate are unknown, but unlikely to be the same as the conditions of today. Overcoming both of these challenges requires a deep understanding of multiple human-centric and machine-centric disciplines that creates a large barrier to entry into the field. Herein, we outline an open, shareable research platform that creates a form of hybrid team intelligence that functions under representative future conditions. The intent for the platform is to facilitate new forms of hybrid intelligence research allowing individuals with human-centric or machine-centric backgrounds to rapidly enter the field and initiate research. Our hope is that through open, community research on the platform, state-of-the-art advances in human and machine intelligence can quickly be communicated across what are currently different R&D communities and allow hybrid team intelligence research to stay at the forefront of scientific advancement.
△ Less
Submitted 28 February, 2025;
originally announced February 2025.
-
Actor Capabilities for Message Ordering (Extended Version)
Authors:
Colin S. Gordon
Abstract:
Actor systems are a flexible model of concurrent and distributed programming, which are efficiently implementable, and avoid many classic concurrency bugs by construction. However actor systems must still deal with the challenge of messages arriving in unexpected orderings.
We describe an approach to restricting the orders in which actors send messages to each other, by equipping actor reference…
▽ More
Actor systems are a flexible model of concurrent and distributed programming, which are efficiently implementable, and avoid many classic concurrency bugs by construction. However actor systems must still deal with the challenge of messages arriving in unexpected orderings.
We describe an approach to restricting the orders in which actors send messages to each other, by equipping actor references -- the handle used to address another actor -- with a protocol restricting which message types can be sent to another actor and in which order using that particular actor reference. This endows the actor references with the properties of static (flow-sensitive) capabilities, which we call actor capabilities.
By sending other actors only restricted actor references, they may control which messages are sent in which orders by other actors. Rules for duplicating (splitting) actor references ensure that these restrictions apply even in the presence of delegation. The capabilities themselves restrict message ordering, which may form the foundation for stronger forms of reasoning. We demonstrate this by layering an effect system over the base type system, where the relationships enforced between the actor capabilities and the effects of an actor's behaviour ensure that an actor's behaviour is always prepared to handle any message that may arrive.
△ Less
Submitted 11 February, 2025;
originally announced February 2025.
-
Data-NoMAD: A Tool for Boosting Confidence in the Integrity of Social Science Survey Data
Authors:
Sanford C. Gordon,
Cyrus Samii,
Zhihao Su
Abstract:
To safeguard against data fabrication and enhance trust in quantitative social science, we present Data Non-Manipulation Authentication Digest (Data-NoMAD). Data-NoMAD is a tool that allows researchers to certify, and others to verify, that a dataset has not been inappropriately manipulated between the point of data collection and the point at which a replication archive is made publicly available…
▽ More
To safeguard against data fabrication and enhance trust in quantitative social science, we present Data Non-Manipulation Authentication Digest (Data-NoMAD). Data-NoMAD is a tool that allows researchers to certify, and others to verify, that a dataset has not been inappropriately manipulated between the point of data collection and the point at which a replication archive is made publicly available. Data-NoMAD creates and stores a column hash digest of a raw dataset upon initial download from a survey platform (the current version works with Qualtrics and SurveyCTO), but before it is subject to appropriate manipulations such as anonymity-preserving redactions. Data-NoMAD can later be used to verify the integrity of a publicly archived dataset by identifying columns that have been deleted, added, or altered. Data-NoMAD complements existing efforts at ensuring research integrity and integrates seamlessly with extant replication practices.
△ Less
Submitted 24 January, 2025;
originally announced January 2025.
-
Monotone Contractions
Authors:
Eleni Batziou,
John Fearnley,
Spencer Gordon,
Ruta Mehta,
Rahul Savani
Abstract:
We study functions $f : [0, 1]^d \rightarrow [0, 1]^d$ that are both monotone and contracting, and we consider the problem of finding an $\varepsilon$-approximate fixed point of $f$. We show that the problem lies in the complexity class UEOPL. We give an algorithm that finds an $\varepsilon$-approximate fixed point of a three-dimensional monotone contraction using $O(\log (1/\varepsilon))$ queries…
▽ More
We study functions $f : [0, 1]^d \rightarrow [0, 1]^d$ that are both monotone and contracting, and we consider the problem of finding an $\varepsilon$-approximate fixed point of $f$. We show that the problem lies in the complexity class UEOPL. We give an algorithm that finds an $\varepsilon$-approximate fixed point of a three-dimensional monotone contraction using $O(\log (1/\varepsilon))$ queries to $f$. We also give a decomposition theorem that allows us to use this result to obtain an algorithm that finds an $\varepsilon$-approximate fixed point of a $d$-dimensional monotone contraction using $O((c \cdot \log (1/\varepsilon))^{\lceil d / 3 \rceil})$ queries to $f$ for some constant $c$. Moreover, each step of both of our algorithms takes time that is polynomial in the representation of $f$. These results are strictly better than the best-known results for functions that are only monotone, or only contracting.
All of our results also apply to Shapley stochastic games, which are known to be reducible to the monotone contraction problem. Thus we put Shapley games in UEOPL, and we give a faster algorithm for approximating the value of a Shapley game.
△ Less
Submitted 27 March, 2025; v1 submitted 15 November, 2024;
originally announced November 2024.
-
From Test-Taking to Test-Making: Examining LLM Authoring of Commonsense Assessment Items
Authors:
Melissa Roemmele,
Andrew S. Gordon
Abstract:
LLMs can now perform a variety of complex writing tasks. They also excel in answering questions pertaining to natural language inference and commonsense reasoning. Composing these questions is itself a skilled writing task, so in this paper we consider LLMs as authors of commonsense assessment items. We prompt LLMs to generate items in the style of a prominent benchmark for commonsense reasoning,…
▽ More
LLMs can now perform a variety of complex writing tasks. They also excel in answering questions pertaining to natural language inference and commonsense reasoning. Composing these questions is itself a skilled writing task, so in this paper we consider LLMs as authors of commonsense assessment items. We prompt LLMs to generate items in the style of a prominent benchmark for commonsense reasoning, the Choice of Plausible Alternatives (COPA). We examine the outcome according to analyses facilitated by the LLMs and human annotation. We find that LLMs that succeed in answering the original COPA benchmark are also more successful in authoring their own items.
△ Less
Submitted 18 October, 2024;
originally announced October 2024.
-
Self-attention-based non-linear basis transformations for compact latent space modelling of dynamic optical fibre transmission matrices
Authors:
Yijie Zheng,
Robert J. Kilpatrick,
David B. Phillips,
George S. D. Gordon
Abstract:
Multimode optical fibres are hair-thin strands of glass that efficiently transport light. They promise next-generation medical endoscopes that provide unprecedented sub-cellular image resolution deep inside the body. However, confining light to such fibres means that images are inherently scrambled in transit. Conventionally, this scrambling has been compensated by pre-calibrating how a specific f…
▽ More
Multimode optical fibres are hair-thin strands of glass that efficiently transport light. They promise next-generation medical endoscopes that provide unprecedented sub-cellular image resolution deep inside the body. However, confining light to such fibres means that images are inherently scrambled in transit. Conventionally, this scrambling has been compensated by pre-calibrating how a specific fibre scrambles light and solving a stationary linear matrix equation that represents a physical model of the fibre. However, as the technology develops towards real-world deployment, the unscrambling process must account for dynamic changes in the matrix representing the fibre's effect on light, due to factors such as movement and temperature shifts, and non-linearities resulting from the inaccessibility of the fibre tip when inside the body. Such complex, dynamic and nonlinear behaviour is well-suited to approximation by neural networks, but most leading image reconstruction networks rely on convolutional layers, which assume strong correlations between adjacent pixels, a strong inductive bias that is inappropriate for fibre matrices which may be expressed in a range of arbitrary coordinate representations with long-range correlations. We introduce a new concept that uses self-attention layers to dynamically transform the coordinate representations of varying fibre matrices to a basis that admits compact, low-dimensional representations suitable for further processing. We demonstrate the effectiveness of this approach on diverse fibre matrix datasets. We show our models significantly improve the sparsity of fibre bases in their transformed bases with a participation ratio, p, as a measure of sparsity, of between 0.01 and 0.11. Further, we show that these transformed representations admit reconstruction of the original matrices with < 10% reconstruction error, demonstrating the invertibility.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Interpretable cancer cell detection with phonon microscopy using multi-task conditional neural networks for inter-batch calibration
Authors:
Yijie Zheng,
Rafael Fuentes-Dominguez,
Matt Clark,
George S. D. Gordon,
Fernando Perez-Cota
Abstract:
Advances in artificial intelligence (AI) show great potential in revealing underlying information from phonon microscopy (high-frequency ultrasound) data to identify cancerous cells. However, this technology suffers from the 'batch effect' that comes from unavoidable technical variations between each experiment, creating confounding variables that the AI model may inadvertently learn. We therefore…
▽ More
Advances in artificial intelligence (AI) show great potential in revealing underlying information from phonon microscopy (high-frequency ultrasound) data to identify cancerous cells. However, this technology suffers from the 'batch effect' that comes from unavoidable technical variations between each experiment, creating confounding variables that the AI model may inadvertently learn. We therefore present a multi-task conditional neural network framework to simultaneously achieve inter-batch calibration, by removing confounding variables, and accurate cell classification of time-resolved phonon-derived signals. We validate our approach by training and validating on different experimental batches, achieving a balanced precision of 89.22% and an average cross-validated precision of 89.07% for classifying background, healthy and cancerous regions. Classification can be performed in 0.5 seconds with only simple prior batch information required for multiple batch corrections. Further, we extend our model to reconstruct denoised signals, enabling physical interpretation of salient features indicating disease state including sound velocity, sound attenuation and cell-adhesion to substrate.
△ Less
Submitted 26 March, 2024;
originally announced March 2024.
-
Characterizing Role Models in Software Practitioners' Career: An Interview Study
Authors:
Mary Sánchez-Gordón,
Ricardo Colomo-Palacios,
Alex Sanchez Gordon
Abstract:
A role model is a person who serves as an example for others to follow, especially in terms of values, behavior, achievements, and personal characteristics. In this paper, authors study how role models influence software practitioners careers, an aspect not studied in the literature before. By means of this study, authors aim to understand if there are any salient role model archetypes and what ch…
▽ More
A role model is a person who serves as an example for others to follow, especially in terms of values, behavior, achievements, and personal characteristics. In this paper, authors study how role models influence software practitioners careers, an aspect not studied in the literature before. By means of this study, authors aim to understand if there are any salient role model archetypes and what characteristics are valued by participants in their role models. To do so, authors use a thematic coding approach to analyze the data collected from interviewing ten Latin American software practitioners. Findings reveal that role models were perceived as sources of knowledge, yet the majority of participants, regardless of their career stage, displayed a stronger interest in the human side and the moral values that their role models embodied. This study also shows that any practitioner can be viewed as a role model.
△ Less
Submitted 15 February, 2024;
originally announced February 2024.
-
Two Choices are Enough for P-LCPs, USOs, and Colorful Tangents
Authors:
Michaela Borzechowski,
John Fearnley,
Spencer Gordon,
Rahul Savani,
Patrick Schnider,
Simon Weber
Abstract:
We provide polynomial-time reductions between three search problems from three distinct areas: the P-matrix linear complementarity problem (P-LCP), finding the sink of a unique sink orientation (USO), and a variant of the $α$-Ham Sandwich problem. For all three settings, we show that "two choices are enough", meaning that the general non-binary version of the problem can be reduced in polynomial t…
▽ More
We provide polynomial-time reductions between three search problems from three distinct areas: the P-matrix linear complementarity problem (P-LCP), finding the sink of a unique sink orientation (USO), and a variant of the $α$-Ham Sandwich problem. For all three settings, we show that "two choices are enough", meaning that the general non-binary version of the problem can be reduced in polynomial time to the binary version. This specifically means that generalized P-LCPs are equivalent to P-LCPs, and grid USOs are equivalent to cube USOs. These results are obtained by showing that both the P-LCP and our $α$-Ham Sandwich variant are equivalent to a new problem we introduce, P-Lin-Bellman. This problem can be seen as a new tool for formulating problems as P-LCPs.
△ Less
Submitted 21 May, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Causal Discovery under Latent Class Confounding
Authors:
Bijan Mazaheri,
Spencer Gordon,
Yuval Rabani,
Leonard Schulman
Abstract:
An acyclic causal structure can be described with directed acyclic graph (DAG), where arrows indicate the possibility of direct causation. The task of learning this structure from data is known as "causal discovery." Diverse populations or changing environments can sometimes give rise to data that is heterogeneous in the following sense: each population/environment is a "source" which idiosyncrati…
▽ More
An acyclic causal structure can be described with directed acyclic graph (DAG), where arrows indicate the possibility of direct causation. The task of learning this structure from data is known as "causal discovery." Diverse populations or changing environments can sometimes give rise to data that is heterogeneous in the following sense: each population/environment is a "source" which idiosyncratically determines the forms of those direct causal effects. From this perspective, the source is a latent common cause for every observed variable. While some methods for causal discovery are able to work around latent confounding in special cases, especially when only few observables are confounded, a global confounder is a difficult challenge. The only known ways to deal with latent global confounding involve assumptions that limit the structural equations and/or noise functions. We demonstrate that globally confounded causal structures can still be identifiable with arbitrary structural equations and noise functions, so long as the number of latent classes remains small relative to the size and sparsity of the underlying DAG.
△ Less
Submitted 16 October, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Identifiability of Product of Experts Models
Authors:
Spencer L. Gordon,
Manav Kant,
Eric Ma,
Leonard J. Schulman,
Andrei Staicu
Abstract:
Product of experts (PoE) are layered networks in which the value at each node is an AND (or product) of the values (possibly negated) at its inputs. These were introduced as a neural network architecture that can efficiently learn to generate high-dimensional data which satisfy many low-dimensional constraints -- thereby allowing each individual expert to perform a simple task. PoEs have found a v…
▽ More
Product of experts (PoE) are layered networks in which the value at each node is an AND (or product) of the values (possibly negated) at its inputs. These were introduced as a neural network architecture that can efficiently learn to generate high-dimensional data which satisfy many low-dimensional constraints -- thereby allowing each individual expert to perform a simple task. PoEs have found a variety of applications in learning.
We study the problem of identifiability of a product of experts model having a layer of binary latent variables, and a layer of binary observables that are iid conditional on the latents. The previous best upper bound on the number of observables needed to identify the model was exponential in the number of parameters. We show: (a) When the latents are uniformly distributed, the model is identifiable with a number of observables equal to the number of parameters (and hence best possible). (b) In the more general case of arbitrarily distributed latents, the model is identifiable for a number of observables that is still linear in the number of parameters (and within a factor of two of best-possible). The proofs rely on root interlacing phenomena for some special three-term recurrences.
△ Less
Submitted 13 October, 2023;
originally announced October 2023.
-
Trustworthy Formal Natural Language Specifications
Authors:
Colin S. Gordon,
Sergey Matskevich
Abstract:
Interactive proof assistants are computer programs carefully constructed to check a human-designed proof of a mathematical claim with high confidence in the implementation. However, this only validates truth of a formal claim, which may have been mistranslated from a claim made in natural language. This is especially problematic when using proof assistants to formally verify the correctness of sof…
▽ More
Interactive proof assistants are computer programs carefully constructed to check a human-designed proof of a mathematical claim with high confidence in the implementation. However, this only validates truth of a formal claim, which may have been mistranslated from a claim made in natural language. This is especially problematic when using proof assistants to formally verify the correctness of software with respect to a natural language specification. The translation from informal to formal remains a challenging, time-consuming process that is difficult to audit for correctness.
This paper shows that it is possible to build support for specifications written in expressive subsets of natural language, within existing proof assistants, consistent with the principles used to establish trust and auditability in proof assistants themselves. We implement a means to provide specifications in a modularly extensible formal subset of English, and have them automatically translated into formal claims, entirely within the Lean proof assistant. Our approach is extensible (placing no permanent restrictions on grammatical structure), modular (allowing information about new words to be distributed alongside libraries), and produces proof certificates explaining how each word was interpreted and how the sentence's structure was used to compute the meaning.
We apply our prototype to the translation of various English descriptions of formal specifications from a popular textbook into Lean formalizations; all can be translated correctly with a modest lexicon with only minor modifications related to lexicon size.
△ Less
Submitted 5 October, 2023;
originally announced October 2023.
-
Identification of Mixtures of Discrete Product Distributions in Near-Optimal Sample and Time Complexity
Authors:
Spencer L. Gordon,
Erik Jahn,
Bijan Mazaheri,
Yuval Rabani,
Leonard J. Schulman
Abstract:
We consider the problem of identifying, from statistics, a distribution of discrete random variables $X_1,\ldots,X_n$ that is a mixture of $k$ product distributions. The best previous sample complexity for $n \in O(k)$ was $(1/ζ)^{O(k^2 \log k)}$ (under a mild separation assumption parameterized by $ζ$). The best known lower bound was $\exp(Ω(k))$. It is known that $n\geq 2k-1$ is necessary and su…
▽ More
We consider the problem of identifying, from statistics, a distribution of discrete random variables $X_1,\ldots,X_n$ that is a mixture of $k$ product distributions. The best previous sample complexity for $n \in O(k)$ was $(1/ζ)^{O(k^2 \log k)}$ (under a mild separation assumption parameterized by $ζ$). The best known lower bound was $\exp(Ω(k))$. It is known that $n\geq 2k-1$ is necessary and sufficient for identification. We show, for any $n\geq 2k-1$, how to achieve sample complexity and run-time complexity $(1/ζ)^{O(k)}$. We also extend the known lower bound of $e^{Ω(k)}$ to match our upper bound across a broad range of $ζ$. Our results are obtained by combining (a) a classic method for robust tensor decomposition, (b) a novel way of bounding the condition number of key matrices called Hadamard extensions, by studying their action only on flattened rank-1 tensors.
△ Less
Submitted 25 September, 2023;
originally announced September 2023.
-
Error Localization for Sequential Effect Systems (Extended Version)
Authors:
Colin S. Gordon,
Chaewon Yun
Abstract:
We describe a new concrete approach to giving predictable error locations for sequential (flow-sensitive) effect systems. Prior implementations of sequential effect systems rely on either computing a bottom-up effect and comparing it to a declaration (e.g., method annotation) or leaning on constraint-based type inference. These approaches do not necessarily report program locations that precisely…
▽ More
We describe a new concrete approach to giving predictable error locations for sequential (flow-sensitive) effect systems. Prior implementations of sequential effect systems rely on either computing a bottom-up effect and comparing it to a declaration (e.g., method annotation) or leaning on constraint-based type inference. These approaches do not necessarily report program locations that precisely indicate where a program may "go wrong" at runtime.
Instead of relying on constraint solving, we draw on the notion of a residual from literature on ordered algebraic structures. Applying these to effect quantales (a large class of sequential effect systems) yields an implementation approach which accepts exactly the same program as an original effect quantale, but for effect-incorrect programs is guaranteed to fail type-checking with predictable error locations tied to evaluation order. We have implemented this idea in a generic effect system implementation framework for Java, and report on experiences applying effect systems from the literature and novel effect systems to Java programs. We find that the reported error locations with our technique are significantly closer to the program points that lead to failed effect checks.
△ Less
Submitted 28 July, 2023;
originally announced July 2023.
-
Modal Abstractions for Virtualizing Memory Addresses
Authors:
Ismail Kuru,
Colin S. Gordon
Abstract:
Operating system kernels employ virtual memory subsystems, which use a CPU's memory management units (MMUs) to virtualize the addresses of memory regions Operating systems manipulate these virtualized memory mappings to isolate untrusted processes, restrict which memory is accessible to different processes, hide memory limits from user programs, ensure process isolation, implement demand-paging an…
▽ More
Operating system kernels employ virtual memory subsystems, which use a CPU's memory management units (MMUs) to virtualize the addresses of memory regions Operating systems manipulate these virtualized memory mappings to isolate untrusted processes, restrict which memory is accessible to different processes, hide memory limits from user programs, ensure process isolation, implement demand-paging and copy-on-write behaviors for performance and resource controls.
Virtual memory management (VMM) code is a critical piece of general-purpose OS kernels, but verification of this functionality is challenging due to the complexity of the hardware interface. In this paper, we introduce a modal abstraction to describe the truth of assertions relative to a specific virtual address space: [r]P indicating that P holds in the virtual address space rooted at r. Such modal assertions allow different address spaces to refer to each other, enabling complete verification of instruction sequences manipulating multiple address spaces. Using them effectively requires working with other assertions, such as points-to assertions in our separation logic, as relative to a given address space. We therefore define virtual points-to relations, which mimic hardware address translation, relative to a page table root. We demonstrate our approach with challenging fragments of VMM code showing that our approach handles examples beyond what prior work can address, including reasoning about a sequence of instructions as it changes address spaces. All definitions and theorems mentioned in this paper including the operational model of a RISC-like fragment of x86-64, a simple language run on this operational model, and a logic as an instantiation of the Iris framework are mechanized inside Coq.
△ Less
Submitted 14 September, 2024; v1 submitted 26 July, 2023;
originally announced July 2023.
-
Decoding Neural Activity to Assess Individual Latent State in Ecologically Valid Contexts
Authors:
Stephen M. Gordon,
Jonathan R. McDaniel,
Kevin W. King,
Vernon J. Lawhern,
Jonathan Touryan
Abstract:
There exist very few ways to isolate cognitive processes, historically defined via highly controlled laboratory studies, in more ecologically valid contexts. Specifically, it remains unclear as to what extent patterns of neural activity observed under such constraints actually manifest outside the laboratory in a manner that can be used to make an accurate inference about the latent state, associa…
▽ More
There exist very few ways to isolate cognitive processes, historically defined via highly controlled laboratory studies, in more ecologically valid contexts. Specifically, it remains unclear as to what extent patterns of neural activity observed under such constraints actually manifest outside the laboratory in a manner that can be used to make an accurate inference about the latent state, associated cognitive process, or proximal behavior of the individual. Improving our understanding of when and how specific patterns of neural activity manifest in ecologically valid scenarios would provide validation for laboratory-based approaches that study similar neural phenomena in isolation and meaningful insight into the latent states that occur during complex tasks. We argue that domain generalization methods from the brain-computer interface community have the potential to address this challenge. We previously used such an approach to decode phasic neural responses associated with visual target discrimination. Here, we extend that work to more tonic phenomena such as internal latent states. We use data from two highly controlled laboratory paradigms to train two separate domain-generalized models. We apply the trained models to an ecologically valid paradigm in which participants performed multiple, concurrent driving-related tasks. Using the pretrained models, we derive estimates of the underlying latent state and associated patterns of neural activity. Importantly, as the patterns of neural activity change along the axis defined by the original training data, we find changes in behavior and task performance consistent with the observations from the original, laboratory paradigms. We argue that these results lend ecological validity to those experimental designs and provide a methodology for understanding the relationship between observed neural activity and behavior during complex tasks.
△ Less
Submitted 18 April, 2023;
originally announced April 2023.
-
Preprocessing Source Code Comments for Linguistic Models
Authors:
Sergey Matskevich,
Colin S. Gordon
Abstract:
Comments are an important part of the source code and are a primary source of documentation. This has driven interest in using large bodies of comments to train or evaluate tools that consume or produce them -- such as generating oracles or even code from comments, or automatically generating code summaries. Most of this work makes strong assumptions about the structure and quality of comments, su…
▽ More
Comments are an important part of the source code and are a primary source of documentation. This has driven interest in using large bodies of comments to train or evaluate tools that consume or produce them -- such as generating oracles or even code from comments, or automatically generating code summaries. Most of this work makes strong assumptions about the structure and quality of comments, such as assuming they consist mostly of proper English sentences. However, we know little about the actual quality of existing comments for these use cases. Comments often contain unique structures and elements that are not seen in other types of text, and filtering or extracting information from them requires some extra care. This paper explores the contents and quality of Python comments drawn from 840 most popular open source projects from GitHub and 8422 projects from SriLab dataset, and the impact of naïve vs. in-depth filtering can have on the use of existing comments for training and evaluation of systems that generate comments.
△ Less
Submitted 26 August, 2022; v1 submitted 23 August, 2022;
originally announced August 2022.
-
Co-creation and ownership for AI radio
Authors:
Skylar Gordon,
Robert Mahari,
Manaswi Mishra,
Ziv Epstein
Abstract:
Recent breakthroughs in AI-generated music open the door for new forms for co-creation and co-creativity. We present Artificial$.\!$fm, a proof-of-concept casual creator that blends AI-music generation, subjective ratings, and personalized recommendation for the creation and curation of AI-generated music. Listeners can rate emergent songs to steer the evolution of future music. They can also pers…
▽ More
Recent breakthroughs in AI-generated music open the door for new forms for co-creation and co-creativity. We present Artificial$.\!$fm, a proof-of-concept casual creator that blends AI-music generation, subjective ratings, and personalized recommendation for the creation and curation of AI-generated music. Listeners can rate emergent songs to steer the evolution of future music. They can also personalize their preferences to better navigate the possibility space. As a "slow creator" with many human stakeholders, Artificial$.\!$fm is an example of how casual creators can leverage human curation at scale to collectively navigate a possibility space. It also provides a case study to reflect on how ownership should be considered in these contexts. We report on the design and development of Artificial$.\!$fm, and provide a legal analysis on the ownership of artifacts generated on the platform.
△ Less
Submitted 1 June, 2022;
originally announced June 2022.
-
Natural Language Specifications in Proof Assistants
Authors:
Colin S. Gordon,
Sergey Matskevich
Abstract:
Interactive proof assistants are computer programs carefully constructed to check a human-designed proof of a mathematical claim with high confidence in the implementation. However, this only validates truth of a formal claim, which may have been mistranslated from a claim made in natural language. This is especially problematic when using proof assistants to formally verify the correctness of sof…
▽ More
Interactive proof assistants are computer programs carefully constructed to check a human-designed proof of a mathematical claim with high confidence in the implementation. However, this only validates truth of a formal claim, which may have been mistranslated from a claim made in natural language. This is especially problematic when using proof assistants to formally verify the correctness of software with respect to a natural language specification. The translation from informal to formal remains a challenging, time-consuming process that is difficult to audit for correctness. This paper argues that it is possible to build support for natural language specifications within existing proof assistants, in a way that complements the principles used to establish trust and auditability in proof assistants themselves.
△ Less
Submitted 16 May, 2022;
originally announced May 2022.
-
2021 BEETL Competition: Advancing Transfer Learning for Subject Independence & Heterogenous EEG Data Sets
Authors:
Xiaoxi Wei,
A. Aldo Faisal,
Moritz Grosse-Wentrup,
Alexandre Gramfort,
Sylvain Chevallier,
Vinay Jayaram,
Camille Jeunet,
Stylianos Bakas,
Siegfried Ludwig,
Konstantinos Barmpas,
Mehdi Bahri,
Yannis Panagakis,
Nikolaos Laskaris,
Dimitrios A. Adamos,
Stefanos Zafeiriou,
William C. Duong,
Stephen M. Gordon,
Vernon J. Lawhern,
Maciej Śliwowski,
Vincent Rouanne,
Piotr Tempczyk
Abstract:
Transfer learning and meta-learning offer some of the most promising avenues to unlock the scalability of healthcare and consumer technologies driven by biosignal data. This is because current methods cannot generalise well across human subjects' data and handle learning from different heterogeneously collected data sets, thus limiting the scale of training data. On the other side, developments in…
▽ More
Transfer learning and meta-learning offer some of the most promising avenues to unlock the scalability of healthcare and consumer technologies driven by biosignal data. This is because current methods cannot generalise well across human subjects' data and handle learning from different heterogeneously collected data sets, thus limiting the scale of training data. On the other side, developments in transfer learning would benefit significantly from a real-world benchmark with immediate practical application. Therefore, we pick electroencephalography (EEG) as an exemplar for what makes biosignal machine learning hard. We design two transfer learning challenges around diagnostics and Brain-Computer-Interfacing (BCI), that have to be solved in the face of low signal-to-noise ratios, major variability among subjects, differences in the data recording sessions and techniques, and even between the specific BCI tasks recorded in the dataset. Task 1 is centred on the field of medical diagnostics, addressing automatic sleep stage annotation across subjects. Task 2 is centred on Brain-Computer Interfacing (BCI), addressing motor imagery decoding across both subjects and data sets. The BEETL competition with its over 30 competing teams and its 3 winning entries brought attention to the potential of deep transfer learning and combinations of set theory and conventional machine learning techniques to overcome the challenges. The results set a new state-of-the-art for the real-world BEETL benchmark.
△ Less
Submitted 14 February, 2022;
originally announced February 2022.
-
Towards Property-Based Tests in Natural Language
Authors:
Colin S. Gordon
Abstract:
We consider a new approach to generate tests from natural language. Rather than relying on machine learning or templated extraction from structured comments, we propose to apply classic ideas from linguistics to translate natural-language sentences into executable tests. This paper explores the application of combinatory categorial grammars (CCGs) to generating property-based tests. Our prototype…
▽ More
We consider a new approach to generate tests from natural language. Rather than relying on machine learning or templated extraction from structured comments, we propose to apply classic ideas from linguistics to translate natural-language sentences into executable tests. This paper explores the application of combinatory categorial grammars (CCGs) to generating property-based tests. Our prototype is able to generate tests from English descriptions for each example in a textbook chapter on property-based testing.
△ Less
Submitted 7 February, 2022;
originally announced February 2022.
-
Causal Inference Despite Limited Global Confounding via Mixture Models
Authors:
Spencer L. Gordon,
Bijan Mazaheri,
Yuval Rabani,
Leonard J. Schulman
Abstract:
A Bayesian Network is a directed acyclic graph (DAG) on a set of $n$ random variables (the vertices); a Bayesian Network Distribution (BND) is a probability distribution on the random variables that is Markovian on the graph. A finite $k$-mixture of such models is graphically represented by a larger graph which has an additional ``hidden'' (or ``latent'') random variable $U$, ranging in…
▽ More
A Bayesian Network is a directed acyclic graph (DAG) on a set of $n$ random variables (the vertices); a Bayesian Network Distribution (BND) is a probability distribution on the random variables that is Markovian on the graph. A finite $k$-mixture of such models is graphically represented by a larger graph which has an additional ``hidden'' (or ``latent'') random variable $U$, ranging in $\{1,\ldots,k\}$, and a directed edge from $U$ to every other vertex. Models of this type are fundamental to causal inference, where $U$ models an unobserved confounding effect of multiple populations, obscuring the causal relationships in the observable DAG. By solving the mixture problem and recovering the joint probability distribution with $U$, traditionally unidentifiable causal relationships become identifiable. Using a reduction to the more well-studied ``product'' case on empty graphs, we give the first algorithm to learn mixtures of non-empty DAGs.
△ Less
Submitted 31 May, 2023; v1 submitted 21 December, 2021;
originally announced December 2021.
-
On the computational equivalence of co-NP refutations of a matrix being a P-matrix
Authors:
Spencer Gordon,
Kevin Shu
Abstract:
A P-matrix is a square matrix $X$ such that all principal submatrices of $X$ have positive determinant.
Such matrices appear naturally in instances of the linear complementarity problem, where these are precisely the matrices for which the corresponding linear complementarity problem has a unique solution for any input vector.
Testing whether or not a square matrix is a P-matrix is co-NP compl…
▽ More
A P-matrix is a square matrix $X$ such that all principal submatrices of $X$ have positive determinant.
Such matrices appear naturally in instances of the linear complementarity problem, where these are precisely the matrices for which the corresponding linear complementarity problem has a unique solution for any input vector.
Testing whether or not a square matrix is a P-matrix is co-NP complete, so while it is possible to exhibit polynomially-sized witnesses for the fact that a matrix is not a P-matrix, it is believed that there is no efficient way to prove that a given matrix is a P-matrix.
We will show that several well known witnesses for the fact that a matrix is not a P-matrix are computationally equivalent, so that we are able to convert between them in polynomial time, answering a question raised in arXiv:1811.03841 .
△ Less
Submitted 11 October, 2021;
originally announced October 2021.
-
Compressed Oblivious Encoding for Homomorphically Encrypted Search
Authors:
Seung Geol Choi,
Dana Dachman-Soled,
S. Dov Gordon,
Linsheng Liu,
Arkady Yerukhimovich
Abstract:
Fully homomorphic encryption (FHE) enables a simple, attractive framework for secure search. Compared to other secure search systems, no costly setup procedure is necessary; it is sufficient for the client merely to upload the encrypted database to the server. Confidentiality is provided because the server works only on the encrypted query and records. While the search functionality is enabled by…
▽ More
Fully homomorphic encryption (FHE) enables a simple, attractive framework for secure search. Compared to other secure search systems, no costly setup procedure is necessary; it is sufficient for the client merely to upload the encrypted database to the server. Confidentiality is provided because the server works only on the encrypted query and records. While the search functionality is enabled by the full homomorphism of the encryption scheme.
For this reason, researchers have been paying increasing attention to this problem. Since Akavia et al. (CCS 2018) presented a framework for secure search on FHE encrypted data and gave a working implementation called SPiRiT, several more efficient realizations have been proposed.
In this paper, we identify the main bottlenecks of this framework and show how to significantly improve the performance of FHE-base secure search. In particular,
1. To retrieve $\ell$ matching items, the existing framework needs to repeat the protocol $\ell$ times sequentially. In our new framework, all matching items are retrieved in parallel in a single protocol execution.
2. The most recent work by Wren et al. (CCS 2020) requires $O(n)$ multiplications to compute the first matching index. Our solution requires no homomorphic multiplication, instead using only additions and scalar multiplications to encode all matching indices.
3. Our implementation and experiments show that to fetch 16 matching records, our system gives an 1800X speed-up over the state of the art in fetching the query results resulting in a 26X speed-up for the full search functionality.
△ Less
Submitted 16 September, 2021;
originally announced September 2021.
-
First-Generation Inference Accelerator Deployment at Facebook
Authors:
Michael Anderson,
Benny Chen,
Stephen Chen,
Summer Deng,
Jordan Fix,
Michael Gschwind,
Aravind Kalaiah,
Changkyu Kim,
Jaewon Lee,
Jason Liang,
Haixin Liu,
Yinghai Lu,
Jack Montgomery,
Arun Moorthy,
Satish Nadathur,
Sam Naghshineh,
Avinash Nayak,
Jongsoo Park,
Chris Petersen,
Martin Schatz,
Narayanan Sundaram,
Bangsheng Tang,
Peter Tang,
Amy Yang,
Jiecao Yu
, et al. (90 additional authors not shown)
Abstract:
In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses, large model sizes, as well as high compute, memory and network bandwidth requirements. We co-designed a high-performance, energy-efficient inference accelerator platform based on these requirements. We describe the in…
▽ More
In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses, large model sizes, as well as high compute, memory and network bandwidth requirements. We co-designed a high-performance, energy-efficient inference accelerator platform based on these requirements. We describe the inference accelerator platform ecosystem we developed and deployed at Facebook: both hardware, through Open Compute Platform (OCP), and software framework and tooling, through Pytorch/Caffe2/Glow. A characteristic of this ecosystem from the start is its openness to enable a variety of AI accelerators from different vendors. This platform, with six low-power accelerator cards alongside a single-socket host CPU, allows us to serve models of high complexity that cannot be easily or efficiently run on CPUs. We describe various performance optimizations, at both platform and accelerator level, which enables this platform to serve production traffic at Facebook. We also share deployment challenges, lessons learned during performance optimization, as well as provide guidance for future inference hardware co-design.
△ Less
Submitted 4 August, 2021; v1 submitted 8 July, 2021;
originally announced July 2021.
-
Markov Cricket: Using Forward and Inverse Reinforcement Learning to Model, Predict And Optimize Batting Performance in One-Day International Cricket
Authors:
Manohar Vohra,
George S. D. Gordon
Abstract:
In this paper, we model one-day international cricket games as Markov processes, applying forward and inverse Reinforcement Learning (RL) to develop three novel tools for the game. First, we apply Monte-Carlo learning to fit a nonlinear approximation of the value function for each state of the game using a score-based reward model. We show that, when used as a proxy for remaining scoring resources…
▽ More
In this paper, we model one-day international cricket games as Markov processes, applying forward and inverse Reinforcement Learning (RL) to develop three novel tools for the game. First, we apply Monte-Carlo learning to fit a nonlinear approximation of the value function for each state of the game using a score-based reward model. We show that, when used as a proxy for remaining scoring resources, this approach outperforms the state-of-the-art Duckworth-Lewis-Stern method used in professional matches by 3 to 10 fold. Next, we use inverse reinforcement learning, specifically a variant of guided-cost learning, to infer a linear model of rewards based on expert performances, assumed here to be play sequences of winning teams. From this model we explicitly determine the optimal policy for each state and find this agrees with common intuitions about the game. Finally, we use the inferred reward models to construct a game simulator that models the posterior distribution of final scores under different policies. We envisage our prediction and simulation techniques may provide a fairer alternative for estimating final scores in interrupted games, while the inferred reward model may provide useful insights for the professional game to optimize playing strategy. Further, we anticipate our method of applying RL to this game may have broader application to other sports with discrete states of play where teams take turns, such as baseball and rounders.
△ Less
Submitted 7 March, 2021;
originally announced March 2021.
-
Hadamard Extensions and the Identification of Mixtures of Product Distributions
Authors:
Spencer L. Gordon,
Leonard J. Schulman
Abstract:
The Hadamard Extension of a matrix is the matrix consisting of all Hadamard products of subsets of its rows. This construction arises in the context of identifying a mixture of product distributions on binary random variables: full column rank of such extensions is a necessary ingredient of identification algorithms. We provide several results concerning when a Hadamard Extension has full column r…
▽ More
The Hadamard Extension of a matrix is the matrix consisting of all Hadamard products of subsets of its rows. This construction arises in the context of identifying a mixture of product distributions on binary random variables: full column rank of such extensions is a necessary ingredient of identification algorithms. We provide several results concerning when a Hadamard Extension has full column rank.
△ Less
Submitted 12 February, 2021; v1 submitted 27 January, 2021;
originally announced January 2021.
-
Source Identification for Mixtures of Product Distributions
Authors:
Spencer L. Gordon,
Bijan Mazaheri,
Yuval Rabani,
Leonard J. Schulman
Abstract:
We give an algorithm for source identification of a mixture of $k$ product distributions on $n$ bits. This is a fundamental problem in machine learning with many applications. Our algorithm identifies the source parameters of an identifiable mixture, given, as input, approximate values of multilinear moments (derived, for instance, from a sufficiently large sample), using $2^{O(k^2)} n^{O(k)}$ ari…
▽ More
We give an algorithm for source identification of a mixture of $k$ product distributions on $n$ bits. This is a fundamental problem in machine learning with many applications. Our algorithm identifies the source parameters of an identifiable mixture, given, as input, approximate values of multilinear moments (derived, for instance, from a sufficiently large sample), using $2^{O(k^2)} n^{O(k)}$ arithmetic operations. Our result is the first explicit bound on the computational complexity of source identification of such mixtures. The running time improves previous results by Feldman, O'Donnell, and Servedio (FOCS 2005) and Chen and Moitra (STOC 2019) that guaranteed only learning the mixture (without parametric identification of the source). Our analysis gives a quantitative version of a qualitative characterization of identifiable sources that is due to Tahmasebi, Motahari, and Maddah-Ali (ISIT 2018).
△ Less
Submitted 28 December, 2020;
originally announced December 2020.
-
Interpolating GANs to Scaffold Autotelic Creativity
Authors:
Ziv Epstein,
Océane Boulais,
Skylar Gordon,
Matt Groh
Abstract:
The latent space modeled by generative adversarial networks (GANs) represents a large possibility space. By interpolating categories generated by GANs, it is possible to create novel hybrid images. We present "Meet the Ganimals," a casual creator built on interpolations of BigGAN that can generate novel, hybrid animals called ganimals by efficiently searching this possibility space. Like tradition…
▽ More
The latent space modeled by generative adversarial networks (GANs) represents a large possibility space. By interpolating categories generated by GANs, it is possible to create novel hybrid images. We present "Meet the Ganimals," a casual creator built on interpolations of BigGAN that can generate novel, hybrid animals called ganimals by efficiently searching this possibility space. Like traditional casual creators, the system supports a simple creative flow that encourages rapid exploration of the possibility space. Users can discover new ganimals, create their own, and share their reactions to aesthetic, emotional, and morphological characteristics of the ganimals. As users provide input to the system, the system adapts and changes the distribution of categories upon which ganimals are generated. As one of the first GAN-based casual creators, Meet the Ganimals is an example how casual creators can leverage human curation and citizen science to discover novel artifacts within a large possibility space.
△ Less
Submitted 21 July, 2020;
originally announced July 2020.
-
The Sparse Hausdorff Moment Problem, with Application to Topic Models
Authors:
Spencer Gordon,
Bijan Mazaheri,
Leonard J. Schulman,
Yuval Rabani
Abstract:
We consider the problem of identifying, from its first $m$ noisy moments, a probability distribution on $[0,1]$ of support $k<\infty$. This is equivalent to the problem of learning a distribution on $m$ observable binary random variables $X_1,X_2,\dots,X_m$ that are iid conditional on a hidden random variable $U$ taking values in $\{1,2,\dots,k\}$. Our focus is on accomplishing this with $m=2k$, w…
▽ More
We consider the problem of identifying, from its first $m$ noisy moments, a probability distribution on $[0,1]$ of support $k<\infty$. This is equivalent to the problem of learning a distribution on $m$ observable binary random variables $X_1,X_2,\dots,X_m$ that are iid conditional on a hidden random variable $U$ taking values in $\{1,2,\dots,k\}$. Our focus is on accomplishing this with $m=2k$, which is the minimum $m$ for which verifying that the source is a $k$-mixture is possible (even with exact statistics). This problem, so simply stated, is quite useful: e.g., by a known reduction, any algorithm for it lifts to an algorithm for learning pure topic models.
We give an algorithm for identifying a $k$-mixture using samples of $m=2k$ iid binary random variables using a sample of size $\left(1/w_{\min}\right)^2 \cdot\left(1/ζ\right)^{O(k)}$ and post-sampling runtime of only $O(k^{2+o(1)})$ arithmetic operations. Here $w_{\min}$ is the minimum probability of an outcome of $U$, and $ζ$ is the minimum separation between the distinct success probabilities of the $X_i$s. Stated in terms of the moment problem, it suffices to know the moments to additive accuracy $w_{\min}\cdotζ^{O(k)}$. It is known that the sample complexity of any solution to the identification problem must be at least exponential in $k$. Previous results demonstrated either worse sample complexity and worse $O(k^c)$ runtime for some $c$ substantially larger than $2$, or similar sample complexity and much worse $k^{O(k^2)}$ runtime.
△ Less
Submitted 7 September, 2020; v1 submitted 16 July, 2020;
originally announced July 2020.
-
Designing with Static Capabilities and Effects: Use, Mention, and Invariants
Authors:
Colin S. Gordon
Abstract:
Capabilities (whether object or reference capabilities) are fundamentally tools to restrict effects. Thus static capabilities (object or reference) and effect systems take different technical machinery to the same core problem of statically restricting or reasoning about effects in programs. Any time two approaches can in principle address the same sets of problems, it becomes important to underst…
▽ More
Capabilities (whether object or reference capabilities) are fundamentally tools to restrict effects. Thus static capabilities (object or reference) and effect systems take different technical machinery to the same core problem of statically restricting or reasoning about effects in programs. Any time two approaches can in principle address the same sets of problems, it becomes important to understand the trade-offs between the approaches, how these trade-offs might interact with the problem at hand.
Experts who have worked in these areas tend to find the trade-offs somewhat obvious, having considered them in context before. However, this kind of design discussion is often written down only implicitly as comparison between two approaches for a specific program reasoning problem, rather than as a discussion of general trade-offs between general classes of techniques. As a result, it is not uncommon to set out to solve a problem with one technique, only to find the other better-suited.
We discuss the trade-offs between static capabilities (specifically reference capabilities) and effect systems, articulating the challenges each approach tends to have in isolation, and how these are sometimes mitigated. We also put our discussion in context, by appealing to examples of how these trade-offs were considered in the course of developing prior systems in the area. Along the way, we highlight how seemingly-minor aspects of type systems -- weakening/framing and the mere existence of type contexts -- play a subtle role in the efficacy of these systems.
△ Less
Submitted 26 May, 2020; v1 submitted 22 May, 2020;
originally announced May 2020.
-
Sequential Effect Systems with Control Operators
Authors:
Colin S. Gordon
Abstract:
Sequential effect systems are a class of effect system that exploits information about program order, rather than discarding it as traditional commutative effect systems do. This extra expressive power allows effect systems to reason about behavior over time, capturing properties such as atomicity, unstructured lock ownership, or even general safety properties. While we now understand the essentia…
▽ More
Sequential effect systems are a class of effect system that exploits information about program order, rather than discarding it as traditional commutative effect systems do. This extra expressive power allows effect systems to reason about behavior over time, capturing properties such as atomicity, unstructured lock ownership, or even general safety properties. While we now understand the essential denotational (categorical) models fairly well, application of these ideas to real software is hampered by the sheer variety of source level control flow constructs in real languages. Denotational approaches are general enough to accommodate any particular control flow construct, but provide no guidance on the details, let alone applications.
We address this new problem by appeal to a classic idea: macro-expression of commonly-used programming constructs in terms of control operators. We give an effect system for a subset of Racket's tagged delimited control operators, as a lifting of an effect system for a language without direct control operators. This gives the first account of sequential effects in the presence of general control operators. Using this system, we also re-derive the sequential effect system rules for control flow constructs previously shown sound directly, and derive sequential effect rules for new constructs not previously studied in the context of source-level sequential effect systems. This offers a way to directly extend source-level support for sequential effect systems to real programming languages.
△ Less
Submitted 15 May, 2020; v1 submitted 29 November, 2018;
originally announced November 2018.
-
Safe Deferred Memory Reclamation with Types
Authors:
Ismail Kuru,
Colin S. Gordon
Abstract:
Memory management in lock-free data structures remains a major challenge in concurrent programming. Design techniques including read-copy-update (RCU) and hazard pointers provide workable solutions, and are widely used to great effect. These techniques rely on the concept of a grace period: nodes that should be freed are not deallocated immediately, and all threads obey a protocol to ensure that t…
▽ More
Memory management in lock-free data structures remains a major challenge in concurrent programming. Design techniques including read-copy-update (RCU) and hazard pointers provide workable solutions, and are widely used to great effect. These techniques rely on the concept of a grace period: nodes that should be freed are not deallocated immediately, and all threads obey a protocol to ensure that the deallocating thread can detect when all possible readers have completed their use of the object. This provides an approach to safe deallocation, but only when these subtle protocols are implemented correctly.
We present a static type system to ensure the correct use of RCU memory management: that nodes removed from a data structure are always scheduled for subsequent deallocation, and that nodes are scheduled for deallocation at most once. As part of our soundness proof, we give an abstract semantics for RCU memory management primitives which captures the fundamental properties of RCU. Our type system allows us to give the first proofs of memory safety for RCU linked list and binary search tree implementations without requiring full verification.
△ Less
Submitted 18 February, 2019; v1 submitted 28 November, 2018;
originally announced November 2018.
-
Unique End of Potential Line
Authors:
John Fearnley,
Spencer Gordon,
Ruta Mehta,
Rahul Savani
Abstract:
This paper studies the complexity of problems in PPAD $\cap$ PLS that have unique solutions. Three well-known examples of such problems are the problem of finding a fixpoint of a contraction map, finding the unique sink of a Unique Sink Orientation (USO), and solving the P-matrix Linear Complementarity Problem (P-LCP). Each of these are promise-problems, and when the promise holds, they always pos…
▽ More
This paper studies the complexity of problems in PPAD $\cap$ PLS that have unique solutions. Three well-known examples of such problems are the problem of finding a fixpoint of a contraction map, finding the unique sink of a Unique Sink Orientation (USO), and solving the P-matrix Linear Complementarity Problem (P-LCP). Each of these are promise-problems, and when the promise holds, they always possess unique solutions.
We define the complexity class UEOPL to capture problems of this type. We first define a class that we call EOPL, which consists of all problems that can be reduced to End-of-Potential-Line. This problem merges the canonical PPAD-complete problem End-of-Line, with the canonical PLS-complete problem Sink-of-Dag, and so EOPL captures problems that can be solved by a line-following algorithm that also simultaneously decreases a potential function.
Promise-UEOPL is a promise-subclass of EOPL in which the line in the End-of-Potential-Line instance is guaranteed to be unique via a promise. We turn this into a non-promise class UEOPL, by adding an extra solution type to EOPL that captures any pair of points that are provably on two different lines.
We show that UEOPL $\subseteq$ EOPL $\subseteq$ CLS, and that all of our motivating problems are contained in UEOPL: specifically USO, P-LCP, and finding a fixpoint of a Piecewise-Linear Contraction under an $\ell_p$-norm all lie in UEOPL. Our results also imply that parity games, mean-payoff games, discounted games, and simple-stochastic games lie in UEOPL.
All of our containment results are proved via a reduction to a problem that we call One-Permutation Discrete Contraction (OPDC). This problem is motivated by a discretized version of contraction, but it is also closely related to the USO problem. We show that OPDC lies in UEOPL, and we are also able to show that OPDC is UEOPL-complete.
△ Less
Submitted 9 November, 2018;
originally announced November 2018.
-
Synthesizing Program-Specific Static Analyses
Authors:
Colin S. Gordon
Abstract:
Designing a static analysis is generally a substantial undertaking, requiring significant expertise in both program analysis and the domain of the program analysis, and significant development resources. As a result, most program analyses target properties that are universallly of interest (e.g., absence of null pointer dereference) or nearly so (e.g., deadlock freedom). However, many interesting…
▽ More
Designing a static analysis is generally a substantial undertaking, requiring significant expertise in both program analysis and the domain of the program analysis, and significant development resources. As a result, most program analyses target properties that are universallly of interest (e.g., absence of null pointer dereference) or nearly so (e.g., deadlock freedom). However, many interesting program properties that would benefit from static checking are specific to individual programs, or sometimes programs utilizing a certain library. It is impractical to devote program analysis and verification experts to these problems.
We propose instead to work on example-based synthesis of program analyses within well-understood domains like type qualifier systems and effect systems. The dynamic behaviors behind the classes of problems these systems prevent correspond to examples that developers who lack expertise in static analysis can readily provide (data flow paths, or stack traces).
△ Less
Submitted 15 October, 2018;
originally announced October 2018.
-
Generating Comments From Source Code with CCGs
Authors:
Sergey Matskevich,
Colin S. Gordon
Abstract:
Good comments help developers understand software faster and provide better maintenance. However, comments are often missing, generally inaccurate, or out of date. Many of these problems can be avoided by automatic comment generation. This paper presents a method to generate informative comments directly from the source code using general-purpose techniques from natural language processing. We gen…
▽ More
Good comments help developers understand software faster and provide better maintenance. However, comments are often missing, generally inaccurate, or out of date. Many of these problems can be avoided by automatic comment generation. This paper presents a method to generate informative comments directly from the source code using general-purpose techniques from natural language processing. We generate comments using an existing natural language model that couples words with their individual logical meaning and grammar rules, allowing comment generation to proceed by search from declarative descriptions of program text. We evaluate our algorithm on several classic algorithms implemented in Python.
△ Less
Submitted 15 October, 2018;
originally announced October 2018.
-
Polymorphic Iterable Sequential Effect Systems
Authors:
Colin S. Gordon
Abstract:
Effect systems are lightweight extensions to type systems that can verify a wide range of important properties with modest developer burden. But our general understanding of effect systems is limited primarily to systems where the order of effects is irrelevant. Understanding such systems in terms of a semilattice of effects grounds understanding of the essential issues, and provides guidance when…
▽ More
Effect systems are lightweight extensions to type systems that can verify a wide range of important properties with modest developer burden. But our general understanding of effect systems is limited primarily to systems where the order of effects is irrelevant. Understanding such systems in terms of a semilattice of effects grounds understanding of the essential issues, and provides guidance when designing new effect systems. By contrast, sequential effect systems -- where the order of effects is important -- lack an established algebraic structure on effects.
We present an abstract polymorphic effect system parameterized by an effect quantale -- an algebraic structure with well-defined properties that can model the effects of a range of existing sequential effect systems. We define effect quantales, derive useful properties, and show how they cleanly model a variety of known sequential effect systems.
We show that for most effect quantales, there is an induced notion of iterating a sequential effect; that for systems we consider the derived iteration agrees with the manually designed iteration operators in prior work; and that this induced notion of iteration is as precise as possible when defined. We also position effect quantales with respect to work on categorical semantics for sequential effect systems, clarifying the distinctions between these systems and our own in the course of giving a thorough survey of these frameworks. Our derived iteration construct should generalize to these semantic structures, addressing limitations of that work. Finally, we consider the relationship between sequential effects and Kleene Algebras, where the latter may be used as instances of the former.
△ Less
Submitted 15 July, 2021; v1 submitted 6 August, 2018;
originally announced August 2018.
-
Agreement Rate Initialized Maximum Likelihood Estimator for Ensemble Classifier Aggregation and Its Application in Brain-Computer Interface
Authors:
Dongrui Wu,
Vernon J. Lawhern,
Stephen Gordon,
Brent J. Lance,
Chin-Teng Lin
Abstract:
Ensemble learning is a powerful approach to construct a strong learner from multiple base learners. The most popular way to aggregate an ensemble of classifiers is majority voting, which assigns a sample to the class that most base classifiers vote for. However, improved performance can be obtained by assigning weights to the base classifiers according to their accuracy. This paper proposes an agr…
▽ More
Ensemble learning is a powerful approach to construct a strong learner from multiple base learners. The most popular way to aggregate an ensemble of classifiers is majority voting, which assigns a sample to the class that most base classifiers vote for. However, improved performance can be obtained by assigning weights to the base classifiers according to their accuracy. This paper proposes an agreement rate initialized maximum likelihood estimator (ARIMLE) to optimally fuse the base classifiers. ARIMLE first uses a simplified agreement rate method to estimate the classification accuracy of each base classifier from the unlabeled samples, then employs the accuracies to initialize a maximum likelihood estimator (MLE), and finally uses the expectation-maximization algorithm to refine the MLE. Extensive experiments on visually evoked potential classification in a brain-computer interface application show that ARIMLE outperforms majority voting, and also achieves better or comparable performance with several other state-of-the-art classifier combination approaches.
△ Less
Submitted 12 May, 2018;
originally announced May 2018.
-
Offline EEG-Based Driver Drowsiness Estimation Using Enhanced Batch-Mode Active Learning (EBMAL) for Regression
Authors:
Dongrui Wu,
Vernon J. Lawhern,
Stephen Gordon,
Brent J. Lance,
Chin-Teng Lin
Abstract:
There are many important regression problems in real-world brain-computer interface (BCI) applications, e.g., driver drowsiness estimation from EEG signals. This paper considers offline analysis: given a pool of unlabeled EEG epochs recorded during driving, how do we optimally select a small number of them to label so that an accurate regression model can be built from them to label the rest? Acti…
▽ More
There are many important regression problems in real-world brain-computer interface (BCI) applications, e.g., driver drowsiness estimation from EEG signals. This paper considers offline analysis: given a pool of unlabeled EEG epochs recorded during driving, how do we optimally select a small number of them to label so that an accurate regression model can be built from them to label the rest? Active learning is a promising solution to this problem, but interestingly, to our best knowledge, it has not been used for regression problems in BCI so far. This paper proposes a novel enhanced batch-mode active learning (EBMAL) approach for regression, which improves upon a baseline active learning algorithm by increasing the reliability, representativeness and diversity of the selected samples to achieve better regression performance. We validate its effectiveness using driver drowsiness estimation from EEG signals. However, EBMAL is a general approach that can also be applied to many other offline regression problems beyond BCI.
△ Less
Submitted 12 May, 2018;
originally announced May 2018.
-
End of Potential Line
Authors:
John Fearnley,
Spencer Gordon,
Ruta Mehta,
Rahul Savani
Abstract:
We introduce the problem EndOfPotentialLine and the corresponding complexity class EOPL of all problems that can be reduced to it in polynomial time. This class captures problems that admit a single combinatorial proof of their joint membership in the complexity classes PPAD of fixpoint problems and PLS of local search problems. EOPL is a combinatorially-defined alternative to the class CLS (for C…
▽ More
We introduce the problem EndOfPotentialLine and the corresponding complexity class EOPL of all problems that can be reduced to it in polynomial time. This class captures problems that admit a single combinatorial proof of their joint membership in the complexity classes PPAD of fixpoint problems and PLS of local search problems. EOPL is a combinatorially-defined alternative to the class CLS (for Continuous Local Search), which was introduced in with the goal of capturing the complexity of some well-known problems in PPAD $\cap$ PLS that have resisted, in some cases for decades, attempts to put them in polynomial time. Two of these are Contraction, the problem of finding a fixpoint of a contraction map, and P-LCP, the problem of solving a P-matrix Linear Complementarity Problem.
We show that EndOfPotentialLine is in CLS via a two-way reduction to EndOfMeteredLine. The latter was defined in to show query and cryptographic lower bounds for CLS. Our two main results are to show that both PL-Contraction (Piecewise-Linear Contraction) and P-LCP are in EOPL. Our reductions imply that the promise versions of PL-Contraction and P-LCP are in the promise class UniqueEOPL, which corresponds to the case of a single potential line. This also shows that simple-stochastic, discounted, mean-payoff, and parity games are in EOPL.
Using the insights from our reduction for PL-Contraction, we obtain the first polynomial-time algorithms for finding fixed points of contraction maps in fixed dimension for any $\ell_p$ norm, where previously such algorithms were only known for the $\ell_2$ and $\ell_\infty$ norms. Our reduction from P-LCP to EndOfPotentialLine allows a technique of Aldous to be applied, which in turn gives the fastest-known randomized algorithm for the P-LCP.
△ Less
Submitted 18 April, 2018; v1 submitted 10 April, 2018;
originally announced April 2018.
-
An efficient MPI/OpenMP parallelization of the Hartree-Fock method for the second generation of Intel Xeon Phi processor
Authors:
Vladimir Mironov,
Yuri Alexeev,
Kristopher Keipert,
Michael D'mello,
Alexander Moskovsky,
Mark S. Gordon
Abstract:
Modern OpenMP threading techniques are used to convert the MPI-only Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two separate implementations that differ by the sharing or replication of key data structures among threads are considered, density and Fock matrices. All implementations are benchmarked on a super-computer of 3,000 Intel Xeon Phi processors. With 64 cores p…
▽ More
Modern OpenMP threading techniques are used to convert the MPI-only Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two separate implementations that differ by the sharing or replication of key data structures among threads are considered, density and Fock matrices. All implementations are benchmarked on a super-computer of 3,000 Intel Xeon Phi processors. With 64 cores per processor, scaling numbers are reported on up to 192,000 cores. The hybrid MPI/OpenMP implementation reduces the memory footprint by approximately 200 times compared to the legacy code. The MPI/OpenMP code was shown to run up to six times faster than the original for a range of molecular system sizes.
△ Less
Submitted 14 August, 2017; v1 submitted 31 July, 2017;
originally announced August 2017.
-
A Generic Approach to Flow-Sensitive Polymorphic Effects (Extended Version)
Authors:
Colin S. Gordon
Abstract:
Effect systems are lightweight extensions to type systems that can verify a wide range of important properties with modest developer burden. But our general understanding of effect systems is limited primarily to systems where the order of effects is irrelevant. Understanding such systems in terms of a lattice of effects grounds understanding of the essential issues, and provides guidance when des…
▽ More
Effect systems are lightweight extensions to type systems that can verify a wide range of important properties with modest developer burden. But our general understanding of effect systems is limited primarily to systems where the order of effects is irrelevant. Understanding such systems in terms of a lattice of effects grounds understanding of the essential issues, and provides guidance when designing new effect systems. By contrast, sequential effect systems --- where the order of effects is important --- lack a clear algebraic characterization.
We derive an algebraic characterization from the shape of prior concrete sequential effect systems. We present an abstract polymorphic effect system with singleton effects parameterized by an effect quantale --- an algebraic structure with well-defined properties that can model a range of existing order-sensitive effect systems. We define effect quantales, derive useful properties, and show how they cleanly model a variety of known sequential effect systems. We show that effect quantales provide a free, general notion of iterating a sequential effect, and that for systems we consider the derived iteration agrees with the manually designed iteration operators in prior work. Identifying and applying the right algebraic structure led us to subtle insights into the design of order-sensitive effect systems, which provides guidance on non-obvious points of designing order-sensitive effect systems. Effect quantales have clear relationships to the recent category theoretic work on order-sensitive effect systems, but are explained without recourse to category theory. In addition, our derived iteration construct should generalize to these semantic structures, addressing limitations of that work.
△ Less
Submitted 5 May, 2017;
originally announced May 2017.
-
EEG-Based User Reaction Time Estimation Using Riemannian Geometry Features
Authors:
Dongrui Wu,
Brent J. Lance,
Vernon J. Lawhern,
Stephen Gordon,
Tzyy-Ping Jung,
Chin-Teng Lin
Abstract:
Riemannian geometry has been successfully used in many brain-computer interface (BCI) classification problems and demonstrated superior performance. In this paper, for the first time, it is applied to BCI regression problems, an important category of BCI applications. More specifically, we propose a new feature extraction approach for Electroencephalogram (EEG) based BCI regression problems: a spa…
▽ More
Riemannian geometry has been successfully used in many brain-computer interface (BCI) classification problems and demonstrated superior performance. In this paper, for the first time, it is applied to BCI regression problems, an important category of BCI applications. More specifically, we propose a new feature extraction approach for Electroencephalogram (EEG) based BCI regression problems: a spatial filter is first used to increase the signal quality of the EEG trials and also to reduce the dimensionality of the covariance matrices, and then Riemannian tangent space features are extracted. We validate the performance of the proposed approach in reaction time estimation from EEG signals measured in a large-scale sustained-attention psychomotor vigilance task, and show that compared with the traditional powerband features, the tangent space features can reduce the root mean square estimation error by 4.30-8.30%, and increase the estimation correlation coefficient by 6.59-11.13%.
△ Less
Submitted 27 April, 2017;
originally announced April 2017.
-
CLS: New Problems and Completeness
Authors:
John Fearnley,
Spencer Gordon,
Ruta Mehta,
Rahul Savani
Abstract:
The complexity class CLS was introduced by Daskalakis and Papadimitriou with the goal of capturing the complexity of some well-known problems in PPAD$~\cap~$PLS that have resisted, in some cases for decades, attempts to put them in polynomial time. No complete problem was known for CLS, and in previous work, the problems ContractionMap, i.e., the problem of finding an approximate fixpoint of a con…
▽ More
The complexity class CLS was introduced by Daskalakis and Papadimitriou with the goal of capturing the complexity of some well-known problems in PPAD$~\cap~$PLS that have resisted, in some cases for decades, attempts to put them in polynomial time. No complete problem was known for CLS, and in previous work, the problems ContractionMap, i.e., the problem of finding an approximate fixpoint of a contraction map, and PLCP, i.e., the problem of solving a P-matrix Linear Complementarity Problem, were identified as prime candidates.
First, we present a new CLS-complete problem MetaMetricContractionMap, which is closely related to the ContractionMap. Second, we introduce EndOfPotentialLine, which captures aspects of PPAD and PLS directly via a monotonic directed path, and show that EndOfPotentialLine is in CLS via a two-way reduction to EndOfMeteredLine. The latter was defined to keep track of how far a vertex is on the PPAD path via a restricted potential function. Third, we reduce PLCP to EndOfPotentialLine, thus making EndOfPotentialLine and EndOfMeteredLine at least as likely to be hard for CLS as PLCP. This last result leverages the monotonic structure of Lemke paths for PLCP problems, making EndOfPotentialLine a likely candidate to capture the exact complexity of PLCP; we note that the structure of Lemke-Howson paths for finding a Nash equilibrium in a two-player game very directly motivated the definition of the complexity class PPAD, which eventually ended up capturing this problem's complexity exactly.
△ Less
Submitted 7 April, 2017; v1 submitted 20 February, 2017;
originally announced February 2017.
-
Driver Drowsiness Estimation from EEG Signals Using Online Weighted Adaptation Regularization for Regression (OwARR)
Authors:
Dongrui Wu,
Vernon J. Lawhern,
Stephen Gordon,
Brent J. Lance,
Chin-Teng Lin
Abstract:
One big challenge that hinders the transition of brain-computer interfaces (BCIs) from laboratory settings to real-life applications is the availability of high-performance and robust learning algorithms that can effectively handle individual differences, i.e., algorithms that can be applied to a new subject with zero or very little subject-specific calibration data. Transfer learning and domain a…
▽ More
One big challenge that hinders the transition of brain-computer interfaces (BCIs) from laboratory settings to real-life applications is the availability of high-performance and robust learning algorithms that can effectively handle individual differences, i.e., algorithms that can be applied to a new subject with zero or very little subject-specific calibration data. Transfer learning and domain adaptation have been extensively used for this purpose. However, most previous works focused on classification problems. This paper considers an important regression problem in BCI, namely, online driver drowsiness estimation from EEG signals. By integrating fuzzy sets with domain adaptation, we propose a novel online weighted adaptation regularization for regression (OwARR) algorithm to reduce the amount of subject-specific calibration data, and also a source domain selection (SDS) approach to save about half of the computational cost of OwARR. Using a simulated driving dataset with 15 subjects, we show that OwARR and OwARR-SDS can achieve significantly smaller estimation errors than several other approaches. We also provide comprehensive analyses on the robustness of OwARR and OwARR-SDS.
△ Less
Submitted 9 February, 2017;
originally announced February 2017.
-
EEGNet: A Compact Convolutional Network for EEG-based Brain-Computer Interfaces
Authors:
Vernon J. Lawhern,
Amelia J. Solon,
Nicholas R. Waytowich,
Stephen M. Gordon,
Chou P. Hung,
Brent J. Lance
Abstract:
Brain computer interfaces (BCI) enable direct communication with a computer, using neural activity as the control signal. This neural signal is generally chosen from a variety of well-studied electroencephalogram (EEG) signals. For a given BCI paradigm, feature extractors and classifiers are tailored to the distinct characteristics of its expected EEG control signal, limiting its application to th…
▽ More
Brain computer interfaces (BCI) enable direct communication with a computer, using neural activity as the control signal. This neural signal is generally chosen from a variety of well-studied electroencephalogram (EEG) signals. For a given BCI paradigm, feature extractors and classifiers are tailored to the distinct characteristics of its expected EEG control signal, limiting its application to that specific signal. Convolutional Neural Networks (CNNs), which have been used in computer vision and speech recognition, have successfully been applied to EEG-based BCIs; however, they have mainly been applied to single BCI paradigms and thus it remains unclear how these architectures generalize to other paradigms. Here, we ask if we can design a single CNN architecture to accurately classify EEG signals from different BCI paradigms, while simultaneously being as compact as possible. In this work we introduce EEGNet, a compact convolutional network for EEG-based BCIs. We introduce the use of depthwise and separable convolutions to construct an EEG-specific model which encapsulates well-known EEG feature extraction concepts for BCI. We compare EEGNet to current state-of-the-art approaches across four BCI paradigms: P300 visual-evoked potentials, error-related negativity responses (ERN), movement-related cortical potentials (MRCP), and sensory motor rhythms (SMR). We show that EEGNet generalizes across paradigms better than the reference algorithms when only limited training data is available. We demonstrate three different approaches to visualize the contents of a trained EEGNet model to enable interpretation of the learned features. Our results suggest that EEGNet is robust enough to learn a wide variety of interpretable features over a range of BCI tasks, suggesting that the observed performances were not due to artifact or noise sources in the data.
△ Less
Submitted 15 May, 2018; v1 submitted 23 November, 2016;
originally announced November 2016.
-
Type Inference for Static Compilation of JavaScript (Extended Version)
Authors:
Satish Chandra,
Colin S. Gordon,
Jean-Baptiste Jeannin,
Cole Schlesinger,
Manu Sridharan,
Frank Tip,
Youngil Choi
Abstract:
We present a type system and inference algorithm for a rich subset of JavaScript equipped with objects, structural subtyping, prototype inheritance, and first-class methods. The type system supports abstract and recursive objects, and is expressive enough to accommodate several standard benchmarks with only minor workarounds. The invariants enforced by the types enable an ahead-of-time compiler to…
▽ More
We present a type system and inference algorithm for a rich subset of JavaScript equipped with objects, structural subtyping, prototype inheritance, and first-class methods. The type system supports abstract and recursive objects, and is expressive enough to accommodate several standard benchmarks with only minor workarounds. The invariants enforced by the types enable an ahead-of-time compiler to carry out optimizations typically beyond the reach of static compilers for dynamic languages. Unlike previous inference techniques for prototype inheritance, our algorithm uses a combination of lower and upper bound propagation to infer types and discover type errors in all code, including uninvoked functions. The inference is expressed in a simple constraint language, designed to leverage off-the-shelf fixed point solvers. We prove soundness for both the type system and inference algorithm. An experimental evaluation showed that the inference is powerful, handling the aforementioned benchmarks with no manual type annotation, and that the inferred types enable effective static compilation.
△ Less
Submitted 18 October, 2016; v1 submitted 25 August, 2016;
originally announced August 2016.
-
An Adaptation of Topic Modeling to Sentences
Authors:
Ruey-Cheng Chen,
Reid Swanson,
Andrew S. Gordon
Abstract:
Advances in topic modeling have yielded effective methods for characterizing the latent semantics of textual data. However, applying standard topic modeling approaches to sentence-level tasks introduces a number of challenges. In this paper, we adapt the approach of latent-Dirichlet allocation to include an additional layer for incorporating information about the sentence boundaries in documents.…
▽ More
Advances in topic modeling have yielded effective methods for characterizing the latent semantics of textual data. However, applying standard topic modeling approaches to sentence-level tasks introduces a number of challenges. In this paper, we adapt the approach of latent-Dirichlet allocation to include an additional layer for incorporating information about the sentence boundaries in documents. We show that the addition of this minimal information of document structure improves the perplexity results of a trained model.
△ Less
Submitted 20 July, 2016;
originally announced July 2016.