-
Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling Approaches
Authors:
Daniel Fried,
Nicholas Tomlin,
Jennifer Hu,
Roma Patel,
Aida Nematzadeh
Abstract:
People rely heavily on context to enrich meaning beyond what is literally said, enabling concise but effective communication. To interact successfully and naturally with people, user-facing artificial intelligence systems will require similar skills in pragmatics: relying on various types of context -- from shared linguistic goals and conventions, to the visual and embodied world -- to use language effectively. We survey existing grounded settings and pragmatic modeling approaches and analyze how the task goals, environmental contexts, and communicative affordances in each work enrich linguistic meaning. We present recommendations for future grounded task design to naturally elicit pragmatic phenomena, and suggest directions that focus on a broader range of communicative contexts and affordances.
Submitted 21 November, 2023; v1 submitted 15 November, 2022;
originally announced November 2022.
-
Contrastive Decoding: Open-ended Text Generation as Optimization
Authors:
Xiang Lisa Li,
Ari Holtzman,
Daniel Fried,
Percy Liang,
Jason Eisner,
Tatsunori Hashimoto,
Luke Zettlemoyer,
Mike Lewis
Abstract:
Given a language model (LM), maximum probability is a poor decoding objective for open-ended generation, because it produces short and repetitive text. On the other hand, sampling can often produce incoherent text that drifts from the original topics. We propose contrastive decoding (CD), a reliable decoding approach that optimizes a contrastive objective subject to a plausibility constraint. The contrastive objective returns the difference between the likelihood under a large LM (called the expert, e.g., OPT-13B) and a small LM (called the amateur, e.g., OPT-125M), and the constraint ensures that the outputs are plausible. CD is inspired by the fact that the failures of larger LMs (e.g., repetition, incoherence) are even more prevalent in smaller LMs, and that this difference signals which texts should be preferred. CD requires zero additional training, and produces higher quality text than decoding from the larger LM alone. It also works across model scales (OPT-13B and GPT2-1.5B) and significantly outperforms four strong decoding algorithms (e.g., nucleus, top-k) in automatic and human evaluations across Wikipedia, news, and story domains.
Submitted 10 July, 2023; v1 submitted 26 October, 2022;
originally announced October 2022.
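A minimal sketch of a single contrastive decoding step, as described in the abstract above: score each candidate token by the expert log-probability minus the amateur log-probability, restricted to tokens the expert finds plausible. The abstract does not spell out the plausibility constraint; the cutoff used here (keep tokens whose expert probability is at least `alpha` times the expert's top probability) and the names `contrastive_step`, `alpha`, and the toy distributions are assumptions made for illustration only.

```python
import math


def contrastive_step(expert_logp, amateur_logp, alpha=0.1):
    """Return the token that maximizes expert minus amateur log-probability.

    expert_logp, amateur_logp: dicts mapping token -> log-probability.
    alpha: assumed plausibility cutoff, relative to the expert's best token.
    """
    # Plausibility constraint (assumed form): drop tokens the expert
    # itself considers very unlikely.
    cutoff = math.log(alpha) + max(expert_logp.values())
    plausible = [tok for tok, lp in expert_logp.items() if lp >= cutoff]
    # Contrastive objective: difference of the two log-likelihoods.
    return max(plausible, key=lambda tok: expert_logp[tok] - amateur_logp[tok])


# Toy next-token distributions (hypothetical numbers, not from the paper).
expert = {"the": math.log(0.55), "Paris": math.log(0.44), "zzz": math.log(0.01)}
amateur = {"the": math.log(0.70), "Paris": math.log(0.2999), "zzz": math.log(0.0001)}

choice = contrastive_step(expert, amateur)
unconstrained_choice = contrastive_step(expert, amateur, alpha=1e-9)
```

In this toy case the constrained step picks "Paris", while an almost-disabled constraint picks "zzz", a token that both models consider unlikely but the amateur considers far less likely. That is the kind of implausible output the constraint is meant to exclude.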
-
Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs
Authors:
Maarten Sap,
Ronan LeBras,
Daniel Fried,
Yejin Choi
Abstract:
Social intelligence and Theory of Mind (ToM), i.e., the ability to reason about the different mental states, intents, and reactions of all people involved, allow humans to effectively navigate and understand everyday social interactions. As NLP systems are used in increasingly complex social situations, their ability to grasp social dynamics becomes crucial. In this work, we examine the open question of social intelligence and Theory of Mind in modern NLP systems from an empirical and theory-based perspective. We show that one of today's largest language models (GPT-3; Brown et al., 2020) lacks this kind of social intelligence out-of-the-box, using two tasks: SocialIQa (Sap et al., 2019), which measures models' ability to understand intents and reactions of participants of social interactions, and ToMi (Le et al., 2019), which measures whether models can infer mental states and realities of participants of situations. Our results show that models struggle substantially at these Theory of Mind tasks, with well-below-human accuracies of 55% and 60% on SocialIQa and ToMi, respectively. To conclude, we draw on theories from pragmatics to contextualize this shortcoming of large language models, by examining the limitations stemming from their data, neural architecture, and training paradigms. Challenging the prevalent narrative that only scale is needed, we posit that person-centric NLP approaches might be more effective towards neural Theory of Mind.
In our updated version, we also analyze newer instruction-tuned and RLHF models for neural ToM. We find that even ChatGPT and GPT-4 do not display emergent Theory of Mind; strikingly, even GPT-4 reaches only 60% accuracy on the ToMi questions related to mental states and realities.
Submitted 3 April, 2023; v1 submitted 24 October, 2022;
originally announced October 2022.
-
Taming Genus 0 (or 1) components on variables-separated equations
Authors:
Michael D. Fried
Abstract:
To determine properties of a curve of the form $C_{f,g} = \{(x,y) \mid f(x) - g(y) = 0\}$ you must address the genus 0 and 1 components of its projective normalization $\tilde C_{f,g}$. For $f$ and $g$ polynomials with $f$ indecomposable, [Fr73a] distinguished $\tilde C_{f,g}$ with $u=1$ versus $u > 1$ components (Schinzel's problem). For $u = 1$, [Prop. 1, Fr73b] gave a direct genus formula. To complete $u > 1$ required an ad hoc genus computation.
[Pak22] dropped the indecomposable and polynomial restrictions but added the assumption that $\tilde C_{f,g}$ is irreducible ($u = 1$). He showed that, for fixed $f$, unless the Galois closure of the cover for $f$ has genus 0 or 1, the genus grows linearly in deg($g$). Method I and Method II extend [Prop. 1, Fr73b] using Nielsen classes to generalize Pakovich's formulation for $u > 1$.
Method I plays on the covers $f$ and $g$ to the $z$-line, $P^1_z$, from which we compute the fiber product.
Method II uses the projection to the $y$-line, $P^1_y$, based on explicitly computing branch cycles for this cover.
Hurwitz families track the significance of these components. Expanding on [Prop. 2, Fr73a] shows how to approach Pakovich's problem. With no loss, start with ($f^*,g^*$) which have the same Galois closures, and for which their canonical representations are entangled. They, therefore, produce more than one component on the fiber product.
Then, we classify the possible component types, $W$, that appear on $\tilde C_{f^*,g^*}$ using the branch cycles for $W$ that come from Method II. The result is a Nielsen class formulation telling explicitly which $g_1$ to avoid to assure the growth of the component genera of $\tilde C_{f^*,g^*\circ g_1}$ as deg($g_1$) increases. Of particular note is the use and expansion of Nielsen classes and of the solution of the genus 0 problem (classifying the monodromy groups of indecomposable rational functions).
Submitted 19 August, 2022;
originally announced August 2022.
-
Diophantine statements over Residue fields: Galois stratification and uniformity
Authors:
Michael D. Fried
Abstract:
Using Felgner's problem I revisit a key issue in using the "Galois Stratification Procedure" that first appeared in [FrS76]. The emphasis here is on using arithmetic homotopy to make the production of Poincaré series attached to general diophantine statements canonical.
According to work in progress of Michael Benedikt and E. Hrushovski, Galois stratification - over one finite field - is as efficient as is possible: on a statement of length n, it requires time bounded by a stack of exponentials of length linear in n. This doesn't take advantage of problems prepped for using homotopy aspects, Chow Motives, efficiently as in the main example which comes from my paper on the generalization of exceptional covers.
That example [FrJ, Chap. 30] simplifies aspects of the original procedure. It combines this with the later theory of Frobenius fields to produce objects over Q whose reductions mod primes give the stratification procedure at the prime. The paper separates two different uses of the Chebotarev non-regular analog.
1. Field crossing to interpret Poincaré series coefficients directly from traces on Chow motives (providing valuable statements on variation with the prime p); versus
2. Chebotarev using Lang-Weil to approximate the number of points on an appropriate variety for the Galois stratification procedure.
We consider variables taking values in the algebraic closure of Z/p but fixed by respective powers of the Frobenius: we call these Frobenius vectors. For this there is a twisted Chebotarev version stemming from a conjecture of Deligne, and outlined in a preprint of Hrushovski. This paper expands on the work of D. Wan, J. Denef and F. Loeser, J. Nicaise, I. Tomasic and E. Hrushovski, all relevant to taking the Galois stratification procedure beyond the original finite field framework.
Submitted 19 August, 2022;
originally announced August 2022.
-
Deep Learning Models for Automated Classification of Dog Emotional States from Facial Expressions
Authors:
Tali Boneh-Shitrit,
Shir Amir,
Annika Bremhorst,
Daniel S. Mills,
Stefanie Riemer,
Dror Fried,
Anna Zamansky
Abstract:
Similarly to humans, facial expressions in animals are closely linked with emotional states. However, in contrast to the human domain, automated recognition of emotional states from facial expressions in animals is underexplored, mainly due to difficulties in data collection and establishment of ground truth concerning emotional states of non-verbal users. We apply recent deep learning techniques to classify (positive) anticipation and (negative) frustration of dogs on a dataset collected in a controlled experimental setting. We explore the suitability of different backbones (e.g. ResNet, ViT) under different supervisions to this task, and find that features of a self-supervised pretrained ViT (DINO-ViT) are superior to the other alternatives. To the best of our knowledge, this work is the first to address the task of automatic classification of canine emotions on data acquired in a controlled experiment.
Submitted 11 June, 2022;
originally announced June 2022.
-
Mimicking Behaviors in Separated Domains
Authors:
Giuseppe De Giacomo,
Dror Fried,
Fabio Patrizi,
Shufang Zhu
Abstract:
Devising a strategy that makes one system mimic the behaviors of another is a problem that naturally arises in many areas of Computer Science. In this work, we interpret this problem in the context of intelligent agents, from the perspective of LTLf, a formalism commonly used in AI for expressing finite-trace properties. Our model consists of two separated dynamic domains, D_A and D_B, and an LTLf specification that formalizes the notion of mimicking by mapping properties on behaviors (traces) of D_A into properties on behaviors of D_B. The goal is to synthesize a strategy that step-by-step maps every behavior of D_A into a behavior of D_B so that the specification is met. We consider several forms of mapping specifications, ranging from simple ones to full LTLf, and for each we study synthesis algorithms and computational properties.
Submitted 18 May, 2022;
originally announced May 2022.
-
Natural Language to Code Translation with Execution
Authors:
Freda Shi,
Daniel Fried,
Marjan Ghazvininejad,
Luke Zettlemoyer,
Sida I. Wang
Abstract:
Generative models of code, pretrained on large corpora of programs, have shown great success in translating natural language to code (Chen et al., 2021; Austin et al., 2021; Li et al., 2022, inter alia). While these models do not explicitly incorporate program semantics (i.e., execution results) during training, they are able to generate correct solutions for many problems. However, choosing a single correct program from a generated set for each problem remains challenging. In this work, we introduce execution result--based minimum Bayes risk decoding (MBR-EXEC) for program selection and show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks. We select output programs from a generated candidate set by marginalizing over program implementations that share the same semantics. Because exact equivalence is intractable, we execute each program on a small number of test inputs to approximate semantic equivalence. Across datasets, execution or simulated execution significantly outperforms the methods that do not involve program semantics. We find that MBR-EXEC consistently improves over all execution-unaware selection methods, suggesting it as an effective approach for natural language to code translation. We open-source our code at github.com/facebookresearch/mbr-exec and data at dl.fbaipublicfiles.com/mbr-exec/mbr-exec-release.zip
Submitted 1 November, 2022; v1 submitted 25 April, 2022;
originally announced April 2022.
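The selection idea in the abstract above can be sketched as follows: run every candidate program on a few test inputs, treat candidates with identical outputs as semantically equivalent, and return a candidate from the largest agreement group. This is only an illustration under simplifying assumptions (candidates are plain Python callables, agreement is exact match on all outputs, exceptions are recorded as outputs); the function names are hypothetical and are not taken from the released mbr-exec code.

```python
from collections import Counter


def run_signature(program, test_inputs):
    """Execute one candidate on each input and summarize its behavior."""
    outputs = []
    for x in test_inputs:
        try:
            outputs.append(repr(program(x)))
        except Exception as exc:  # a crash is treated as an observable result
            outputs.append("error:" + type(exc).__name__)
    return tuple(outputs)


def select_by_execution(candidates, test_inputs):
    """Return the index of the first candidate in the largest agreement group."""
    signatures = [run_signature(p, test_inputs) for p in candidates]
    counts = Counter(signatures)
    return max(range(len(candidates)), key=lambda i: counts[signatures[i]])


# Hypothetical candidates for "double the input".
candidates = [
    lambda x: x * 2,
    lambda x: x + x,
    lambda x: x ** 2,  # wrong, but agrees with the others on x == 2
    lambda x: 2 * x,
    lambda x: x / 0,   # always raises
]
best = select_by_execution(candidates, [1, 3, 5])
```

With the inputs `[1, 3, 5]` the three correct candidates form the largest group. With the single input `[2]` the squaring candidate would be indistinguishable from them, which shows why execution on a handful of inputs only approximates semantic equivalence. A real system would also need to sandbox execution of generated code.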
-
InCoder: A Generative Model for Code Infilling and Synthesis
Authors:
Daniel Fried,
Armen Aghajanyan,
Jessy Lin,
Sida Wang,
Eric Wallace,
Freda Shi,
Ruiqi Zhong,
Wen-tau Yih,
Luke Zettlemoyer,
Mike Lewis
Abstract:
Code is seldom written in a single left-to-right pass and is instead repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code, where regions of code have been randomly masked and moved to the end of each file, allowing code infilling with bidirectional context. Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on challenging tasks such as type inference, comment generation, and variable re-naming. We find that the ability to condition on bidirectional context substantially improves performance on these tasks, while still performing comparably on standard program synthesis benchmarks in comparison to left-to-right only models pretrained at similar scale. The InCoder models and code are publicly released. https://sites.google.com/view/incoder-code-models
Submitted 9 April, 2023; v1 submitted 12 April, 2022;
originally announced April 2022.
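The training transformation described in the abstract above (mask a region and move it to the end of the file) can be illustrated on a token list. This is a sketch under assumptions: the sentinel strings `<MASK:0>` and `<EOM>`, the single-span setting, and the function names are hypothetical and are not claimed to match the released InCoder tokenizer or code.

```python
MASK = "<MASK:0>"  # hypothetical sentinel marking the masked region
EOM = "<EOM>"      # hypothetical end-of-mask marker


def mask_and_move(tokens, start, end):
    """Replace tokens[start:end] with a sentinel and append the span at the end."""
    span = tokens[start:end]
    return tokens[:start] + [MASK] + tokens[end:] + [MASK] + span + [EOM]


def restore(example):
    """Invert mask_and_move: put the trailing span back where the sentinel is."""
    first = example.index(MASK)
    second = example.index(MASK, first + 1)
    span = example[second + 1 : example.index(EOM, second)]
    return example[:first] + span + example[first + 1 : second]


tokens = ["def", "f", "(", "x", ")", ":", "return", "x", "+", "1"]
example = mask_and_move(tokens, 2, 5)
```

Because the masked span sits after both its left and right context in the transformed sequence, a left-to-right model that generates the tokens following the second sentinel is conditioning on bidirectional context, which is what enables infilling at inference time.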
-
Inferring Rewards from Language in Context
Authors:
Jessy Lin,
Daniel Fried,
Dan Klein,
Anca Dragan
Abstract:
In classic instruction following, language like "I'd like the JetBlue flight" maps to actions (e.g., selecting that flight). However, language also conveys information about a user's underlying reward function (e.g., a general preference for JetBlue), which can allow a model to carry out desirable actions in new contexts. We present a model that infers rewards from language pragmatically: reasoning about how speakers choose utterances not only to elicit desired actions, but also to reveal information about their preferences. On a new interactive flight-booking task with natural language, our model more accurately infers rewards and predicts optimal actions in unseen environments, in comparison to past work that first maps language to actions (instruction following) and then maps actions to rewards (inverse reinforcement learning).
Submitted 5 April, 2022;
originally announced April 2022.
-
An Improved Algorithm for The $k$-Dyck Edit Distance Problem
Authors:
Dvir Fried,
Shay Golan,
Tomasz Kociumaka,
Tsvi Kopelowitz,
Ely Porat,
Tatiana Starikovskaya
Abstract:
A Dyck sequence is a sequence of opening and closing parentheses (of various types) that is balanced. The Dyck edit distance of a given sequence of parentheses $S$ is the smallest number of edit operations (insertions, deletions, and substitutions) needed to transform $S$ into a Dyck sequence. We consider the threshold Dyck edit distance problem, where the input is a sequence of parentheses $S$ and a positive integer $k$, and the goal is to compute the Dyck edit distance of $S$ only if the distance is at most $k$, and otherwise report that the distance is larger than $k$. Backurs and Onak [PODS'16] showed that the threshold Dyck edit distance problem can be solved in $O(n+k^{16})$ time.
In this work, we design new algorithms for the threshold Dyck edit distance problem that run in $O(n+k^{4.544184})$ time with high probability or $O(n+k^{4.853059})$ time deterministically. Our algorithms combine several new structural properties of the Dyck edit distance problem, a refined algorithm for fast $(\min,+)$ matrix product, and a careful modification of ideas used in Valiant's parsing algorithm.
Submitted 22 August, 2022; v1 submitted 3 November, 2021;
originally announced November 2021.
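To make the problem definition above concrete, here is a simple cubic-time interval dynamic program that computes the Dyck edit distance exactly for short inputs, plus a thresholded wrapper. It is a reference sketch only and is not the algorithm of the paper, whose running time is $O(n+k^{4.544184})$. It assumes the input uses only the three bracket types listed in `PAIRS`, and it relies on the observation that inserting a partner for a symbol costs the same as deleting that symbol, so insertions need no separate case.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}  # assumed alphabet of parentheses
CLOSERS = set(PAIRS.values())


def pair_cost(a, b):
    """Substitutions needed so that a ... b becomes a matching pair."""
    if a in PAIRS and PAIRS[a] == b:
        return 0
    if a in PAIRS or b in CLOSERS:
        return 1  # rewrite the other symbol to match
    return 2      # a is a closer and b is an opener: rewrite both


def dyck_edit_distance(s):
    n = len(s)
    # d[i][j] is the distance of the substring s[i:j]; empty substrings cost 0.
    d = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best = 1 + d[i + 1][j]  # delete s[i] (same cost as inserting its partner)
            for k in range(i + 1, j):
                # Pair s[i] with s[k]; the inside and the remainder must balance.
                best = min(best, pair_cost(s[i], s[k]) + d[i + 1][k] + d[k + 1][j])
            d[i][j] = best
    return d[0][n]


def threshold_dyck_edit_distance(s, k):
    """Return the distance if it is at most k, otherwise None."""
    dist = dyck_edit_distance(s)
    return dist if dist <= k else None
```

For example, `dyck_edit_distance("([)]")` is 2 (two substitutions give `([])`), so the thresholded version returns `None` for `k = 1` and `2` for `k = 2`.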
-
Reference-Centric Models for Grounded Collaborative Dialogue
Authors:
Daniel Fried,
Justin T. Chiu,
Dan Klein
Abstract:
We present a grounded neural dialogue model that successfully collaborates with people in a partially-observable reference game. We focus on a setting where two agents each observe an overlapping part of a world context and need to identify and agree on some object they share. Therefore, the agents should pool their information and communicate pragmatically to solve the task. Our dialogue agent accurately grounds referents from the partner's utterances using a structured reference resolver, conditions on these referents using a recurrent memory, and uses a pragmatic generation procedure to ensure the partner can resolve the references the agent produces. We evaluate on the OneCommon spatial grounding dialogue task (Udagawa and Aizawa 2019), involving a number of dots arranged on a board with continuously varying positions, sizes, and shades. Our agent substantially outperforms the previous state of the art for the task, obtaining a 20% relative improvement in successful task completion in self-play evaluations and a 50% relative improvement in success in human evaluations.
Submitted 10 September, 2021;
originally announced September 2021.
-
Adapting Behaviors via Reactive Synthesis
Authors:
Gal Amram,
Suguman Bansal,
Dror Fried,
Lucas M. Tabajara,
Moshe Y. Vardi,
Gera Weiss
Abstract:
In the \emph{Adapter Design Pattern}, a programmer implements a \emph{Target} interface by constructing an \emph{Adapter} that accesses existing \emph{Adaptee} code. In this work, we present a reactive synthesis interpretation of the adapter design pattern, wherein an algorithm takes an \emph{Adaptee} transducer and a \emph{Target} transducer, and the aim is to synthesize an \emph{Adapter} transducer that, when composed with the \emph{Adaptee}, generates a behavior that is equivalent to the behavior of the \emph{Target}. One use of such an algorithm is to synthesize controllers that achieve similar goals on different hardware platforms. While this problem can be solved with existing synthesis algorithms, current state-of-the-art tools fail to scale. To cope with the computational complexity of the problem, we introduce a special specification format, called \emph{Separated GR($k$)}, which can be solved with a scalable synthesis algorithm but still allows for a large set of realistic specifications. We solve the realizability and the synthesis problems for Separated GR($k$), and show how to exploit the separated nature of our specification to construct better algorithms, in terms of time complexity, than known algorithms for GR($k$) synthesis. We then describe a tool, called SGR($k$), that we have implemented based on the above approach and show, by experimental evaluation, how our tool outperforms current state-of-the-art tools on various benchmarks and test cases.
Submitted 28 May, 2021;
originally announced May 2021.
-
Modular Networks for Compositional Instruction Following
Authors:
Rodolfo Corona,
Daniel Fried,
Coline Devin,
Dan Klein,
Trevor Darrell
Abstract:
Standard architectures used in instruction following often struggle on novel compositions of subgoals (e.g. navigating to landmarks or picking up objects) observed during training. We propose a modular architecture for following natural language instructions that describe sequences of diverse subgoals. In our approach, subgoal modules each carry out natural language instructions for a specific subgoal type. A sequence of modules to execute is chosen by learning to segment the instructions and predicting a subgoal type for each segment. When compared to standard, non-modular sequence-to-sequence approaches on ALFRED, a challenging instruction following benchmark, we find that modularization improves generalization to novel subgoal compositions, as well as to environments unseen in training.
Submitted 13 April, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.
-
Taming Discrete Integration via the Boon of Dimensionality
Authors:
Jeffrey M. Dudek,
Dror Fried,
Kuldeep S. Meel
Abstract:
Discrete integration is a fundamental problem in computer science that concerns the computation of discrete sums over exponentially large sets. Despite intense interest from researchers for over three decades, the design of scalable techniques for computing estimates with rigorous guarantees for discrete integration remains the holy grail. The key contribution of this work addresses this scalability challenge via an efficient reduction of discrete integration to model counting. The proposed reduction is achieved via a significant increase in the dimensionality that, contrary to conventional wisdom, leads to solving an instance of the relatively simpler problem of model counting.
Building on the promising approach proposed by Chakraborty et al., our work overcomes the key weakness of their approach: a restriction to dyadic weights. We augment our proposed reduction, called DeWeight, with a state-of-the-art efficient approximate model counter and perform detailed empirical analysis over benchmarks arising from neural network verification domains, an emerging application area of critical importance. DeWeight, to the best of our knowledge, is the first technique to compute estimates with provable guarantees for this class of benchmarks.
Submitted 20 October, 2020;
originally announced October 2020.
-
Moduli relations between l-adic representations and the regular inverse Galois problem
Authors:
Michael David Fried
Abstract:
There are two famous Abel Theorems. Most well-known is his description of abelian (analytic) functions on a one-dimensional compact complex torus. The other collects together those complex tori, with their prime degree isogenies, into one space. Riemann's generalization of the first features his famous theta functions. His deepest work aimed at extending Abel's second theorem; he died before he fulfilled this. That extension is often pictured on higher-dimensional complex tori. For Riemann, though, it was to spaces of Jacobians of compact Riemann surfaces, W, of genus g, toward studying functions φ: W -> P^1_z, on them. Data for such pairs (W,φ) starts with a monodromy group, G, and conjugacy classes C in G. Many applications come from putting all such covers attached to (G,C) in natural -- Hurwitz -- families. We connect two such applications: The Regular Inverse Galois Problem (RIGP) and Serre's Open Image Theorem (OIT). We call the connecting device Modular Towers (MTs). Backdrop for the OIT and RIGP uses Serre's books Abelian l-adic representations and elliptic curves (1968) and Topics in Galois theory (1992). Serre's OIT example is the case where MT levels identify as modular curves. With an example that isn't modular curves, we explain conjectured MT properties -- generalizing a Theorem of Hilbert's -- that would conclude an OIT for all MTs. Solutions of pieces on both ends of these connections are known in significant cases.
Submitted 2 August, 2020;
originally announced August 2020.
-
Syntactic Structure Distillation Pretraining For Bidirectional Encoders
Authors:
Adhiguna Kuncoro,
Lingpeng Kong,
Daniel Fried,
Dani Yogatama,
Laura Rimell,
Chris Dyer,
Phil Blunsom
Abstract:
Textual representation learners trained on large amounts of data have achieved notable success on downstream tasks; intriguingly, they have also performed well on challenging tests of syntactic competence. Given this success, it remains an open question whether scalable learners like BERT can become fully proficient in the syntax of natural language by virtue of data scale alone, or whether they still benefit from more explicit syntactic biases. To answer this question, we introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining, by distilling the syntactically informative predictions of a hierarchical---albeit harder to scale---syntactic language model. Since BERT models masked words in bidirectional context, we propose to distill the approximate marginal distribution over words in context from the syntactic LM. Our approach reduces relative error by 2-21% on a diverse set of structured prediction tasks, although we obtain mixed results on the GLUE benchmark. Our findings demonstrate the benefits of syntactic biases, even in representation learners that exploit large amounts of data, and contribute to a better understanding of where syntactic biases are most helpful in benchmarks of natural language understanding.
Submitted 27 May, 2020;
originally announced May 2020.
-
Learning to Segment Actions from Observation and Narration
Authors:
Daniel Fried,
Jean-Baptiste Alayrac,
Phil Blunsom,
Chris Dyer,
Stephen Clark,
Aida Nematzadeh
Abstract:
We apply a generative segmental model of task structure, guided by narration, to action segmentation in video. We focus on unsupervised and weakly-supervised settings where no action labels are known during training. Despite its simplicity, our model performs competitively with previous work on a dataset of naturalistic instructional videos. Our model allows us to vary the sources of supervision used in training, and we find that both task structure and narrative language provide large benefits in segmentation quality.
Submitted 11 August, 2020; v1 submitted 7 May, 2020;
originally announced May 2020.
-
Cross-Domain Generalization of Neural Constituency Parsers
Authors:
Daniel Fried,
Nikita Kitaev,
Dan Klein
Abstract:
Neural parsers obtain state-of-the-art results on benchmark treebanks for constituency parsing -- but to what degree do they generalize to other domains? We present three results about the generalization of neural parsers in a zero-shot setting: training on trees from one corpus and evaluating on out-of-domain corpora. First, neural and non-neural parsers generalize comparably to new domains. Second, incorporating pre-trained encoder representations into neural parsers substantially improves their performance across all domains, but does not give a larger relative improvement for out-of-domain treebanks. Finally, despite the rich input representations they learn, neural parsers still benefit from structured output prediction of output trees, yielding higher exact match accuracy and stronger generalization both to larger text spans and to out-of-domain corpora. We analyze generalization on English and Chinese corpora, and in the process obtain state-of-the-art parsing results for the Brown, Genia, and English Web treebanks.
Submitted 9 July, 2019;
originally announced July 2019.
-
Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation
Authors:
Ronghang Hu,
Daniel Fried,
Anna Rohrbach,
Dan Klein,
Trevor Darrell,
Kate Saenko
Abstract:
Vision-and-Language Navigation (VLN) requires grounding instructions, such as "turn right and stop at the door", to routes in a visual environment. The actual grounding can connect language to the environment through multiple modalities, e.g. "stop at the door" might ground into visual objects, while "turn right" might rely only on the geometric structure of a route. We investigate where the natural language empirically grounds under two recent state-of-the-art VLN models. Surprisingly, we discover that visual features may actually hurt these models: models which only use route structure, ablating visual features, outperform their visual counterparts in unseen new environments on the benchmark Room-to-Room dataset. To better use all the available modalities, we propose to decompose the grounding procedure into a set of expert models with access to different modalities (including object detections) and ensemble them at prediction time, improving the performance of state-of-the-art models on the VLN task.
Submitted 9 June, 2019; v1 submitted 2 June, 2019;
originally announced June 2019.
-
Pragmatically Informative Text Generation
Authors:
Sheng Shen,
Daniel Fried,
Jacob Andreas,
Dan Klein
Abstract:
We improve the informativeness of models for conditional text generation using techniques from computational pragmatics. These techniques formulate language production as a game between speakers and listeners, in which a speaker should generate output text that a listener can use to correctly identify the original input that the text describes. While such approaches are widely used in cognitive science and grounded language learning, they have received less attention for more standard language generation tasks. We consider two pragmatic modeling methods for text generation: one where pragmatics is imposed by information preservation, and another where pragmatics is imposed by explicit modeling of distractors. We find that these methods improve the performance of strong existing systems for abstractive summarization and generation from structured meaning representations.
Submitted 4 April, 2019; v1 submitted 2 April, 2019;
originally announced April 2019.
-
Sequential Relational Decomposition
Authors:
Dror Fried,
Axel Legay,
Joël Ouaknine,
Moshe Y. Vardi
Abstract:
The concept of decomposition in computer science and engineering is considered a fundamental component of computational thinking and is prevalent in the design of algorithms, software construction, hardware design, and more. We propose a simple and natural formalization of sequential decomposition, in which a task is decomposed into two sequential sub-tasks, with the first sub-task to be executed before the second sub-task is executed. These tasks are specified by means of input/output relations. We define and study decomposition problems, which ask whether a given specification can be sequentially decomposed. Our main result is that decomposition itself is a difficult computational problem. More specifically, we study decomposition problems in three settings: where the input task is specified explicitly, by means of Boolean circuits, and by means of automatic relations. We show that in the first setting decomposition is NP-complete, in the second setting it is NEXPTIME-complete, and in the third setting there is evidence to suggest that it is undecidable. Our results indicate that the intuitive idea of decomposition as a system-design approach requires further investigation. In particular, we show that adding a human to the loop by asking for a decomposition hint lowers the complexity of decomposition problems considerably.
Submitted 2 March, 2022; v1 submitted 4 March, 2019;
originally announced March 2019.
-
Functional Synthesis via Input-Output Separation
Authors:
Supratik Chakraborty,
Dror Fried,
Lucas M. Tabajara,
Moshe Y. Vardi
Abstract:
Boolean functional synthesis is the process of constructing a Boolean function from a Boolean specification that relates input and output variables. Despite significant recent developments in synthesis algorithms, Boolean functional synthesis remains a challenging problem even when state-of-the-art methods are used for decomposing the specification. In this work we bring a fresh decomposition approach, orthogonal to existing methods, that explores the decomposition of the specification into separate input and output components. We make use of an input-output decomposition of a given specification described as a CNF formula, by alternatingly analyzing the separate input and output components. We exploit well-defined properties of these components to ultimately synthesize a solution for the entire specification. We first provide a theoretical result that, for input components with specific structures, synthesis for CNF formulas via this framework can be performed more efficiently than in the general case. We then show by experimental evaluations that our algorithm performs well also in practice on instances which are challenging for existing state-of-the-art tools, serving as a good complement to modern synthesis techniques.
Submitted 24 August, 2018;
originally announced August 2018.
-
Policy Gradient as a Proxy for Dynamic Oracles in Constituency Parsing
Authors:
Daniel Fried,
Dan Klein
Abstract:
Dynamic oracles provide strong supervision for training constituency parsers with exploration, but must be custom defined for a given parser's transition system. We explore using a policy gradient method as a parser-agnostic alternative. In addition to directly optimizing for a tree-level metric such as F1, policy gradient has the potential to reduce exposure bias by allowing exploration during training; moreover, it does not require a dynamic oracle for supervision. On four constituency parsers in three languages, the method substantially outperforms static oracle likelihood training in almost all settings. For parsers where a dynamic oracle is available (including a novel oracle which we define for the transition system of Dyer et al. 2016), policy gradient typically recaptures a substantial fraction of the performance gain afforded by the dynamic oracle.
Submitted 8 June, 2018;
originally announced June 2018.
-
Speaker-Follower Models for Vision-and-Language Navigation
Authors:
Daniel Fried,
Ronghang Hu,
Volkan Cirik,
Anna Rohrbach,
Jacob Andreas,
Louis-Philippe Morency,
Taylor Berg-Kirkpatrick,
Kate Saenko,
Dan Klein,
Trevor Darrell
Abstract:
Navigation guided by natural language instructions presents a challenging reasoning problem for instruction followers. Natural language instructions typically identify only a few high-level decisions and landmarks rather than complete low-level motor behaviors; much of the missing information must be inferred based on perceptual context. In machine learning settings, this is doubly challenging: it is difficult to collect enough annotated data to enable learning of this reasoning process from scratch, and also difficult to implement the reasoning process using generic sequence models. Here we describe an approach to vision-and-language navigation that addresses both these issues with an embedded speaker model. We use this speaker model to (1) synthesize new instructions for data augmentation and to (2) implement pragmatic reasoning, which evaluates how well candidate action sequences explain an instruction. Both steps are supported by a panoramic action space that reflects the granularity of human-generated instructions. Experiments show that all three components of this approach---speaker-driven data augmentation, pragmatic reasoning and panoramic action space---dramatically improve the performance of a baseline instruction follower, more than doubling the success rate over the best existing approach on a standard benchmark.
Submitted 26 October, 2018; v1 submitted 7 June, 2018;
originally announced June 2018.
-
An optimal approximation of discrete random variables with respect to the Kolmogorov distance
Authors:
Liat Cohen,
Dror Fried,
Gera Weiss
Abstract:
We present an algorithm that takes a discrete random variable $X$ and a number $m$ and computes a random variable whose support (set of possible outcomes) is of size at most $m$ and whose Kolmogorov distance from $X$ is minimal. In addition to a formal theoretical analysis of the correctness and of the computational complexity of the algorithm, we present a detailed empirical evaluation that shows how the proposed approach performs in practice in different applications and domains.
Submitted 19 May, 2018;
originally announced May 2018.
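For readers unfamiliar with the metric named in the abstract above, the Kolmogorov distance between two random variables is the largest absolute gap between their cumulative distribution functions. The sketch below computes it for two discrete distributions given as outcome-to-probability dictionaries; it illustrates the distance only, not the paper's approximation algorithm, and the function name `kolmogorov_distance` is chosen for this example.

```python
def kolmogorov_distance(p, q):
    """Return max over x of |F_p(x) - F_q(x)| for discrete distributions.

    p, q: dicts mapping a numeric outcome to its probability.
    Both CDFs are step functions that change only at support points,
    so it suffices to check the gap at each point of the joint support.
    """
    support = sorted(set(p) | set(q))
    cdf_p = cdf_q = 0.0
    dist = 0.0
    for x in support:
        cdf_p += p.get(x, 0.0)
        cdf_q += q.get(x, 0.0)
        dist = max(dist, abs(cdf_p - cdf_q))
    return dist

# Toy example: X has three outcomes, Y is a two-outcome approximation.
X = {1: 0.25, 2: 0.25, 3: 0.5}
Y = {1: 0.5, 3: 0.5}
d = kolmogorov_distance(X, Y)
```

In this toy case the CDFs differ only at outcome 1, where the gap is 0.25.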
-
Introduction to moduli, l-adic representations and the Regular Version of the Inverse Galois Problem
Authors:
Michael D. Fried
Abstract:
Sect 1 introduces Nielsen classes attached to (G,C), where C is r conjugacy classes in a finite group G, and a braid action on them. These give reduced Hurwitz spaces, denoted H(G,C)^rd. The section concludes with a braid formula for the genus of these spaces when r = 4.
If there is at least one prime l for which G is divisible by l, but has no Z/l quotient, then there is a canonical tower of reduced Hurwitz spaces over H(G,C)^rd, using the Universal Frattini cover, ~G, of G, and ~G_ab, its abelianized version. The towers are nonempty assuming C are l' classes satisfying a cohomological condition from a lift invariant. A M(odular)T(ower) is a projective sequence of components of the canonical tower.
Sect 2 introduces the book [Fr18], which takes on generalizing Serre's O(pen)I(mage)T(heorem), interpreted as the case when G is a dihedral group D_l and C is four repetitions of the involution conjugacy class. Serre's Theorem separated decomposition groups of projective sequences of points in the modular curve towers into two types: CM (complex multiplication) and GL_2. When r = 4, all MT levels are upper half-plane quotients ramified at 0 (of order 3), 1 (of order 2) and \infty (corresponding to the cusps). They are appropriate therefore to compare Serre's OIT with the cusps and decomposition group fibers. [Fr18] emphasizes new phenomena in cusps, and components, while still showing in high tower levels a valid comparison with modular curves.
It also aims to show how MTs can expand the applications usual for modular curve towers, recognizing those problems directly interpret from the Inverse Galois Problem. The l-adic representations of the title come from the abelianized version of the Universal Frattini cover.
Submitted 28 March, 2018;
originally announced March 2018.
-
Unified Pragmatic Models for Generating and Following Instructions
Authors:
Daniel Fried,
Jacob Andreas,
Dan Klein
Abstract:
We show that explicit pragmatic inference aids in correctly generating and following natural language instructions for complex, sequential tasks. Our pragmatics-enabled models reason about why speakers produce certain instructions, and about how listeners will react upon hearing them. Like previous pragmatic models, we use learned base listener and speaker models to build a pragmatic speaker that uses the base listener to simulate the interpretation of candidate descriptions, and a pragmatic listener that reasons counterfactually about alternative descriptions. We extend these models to tasks with sequential structure. Evaluation of language generation and interpretation shows that pragmatic inference improves state-of-the-art listener models (at correctly interpreting human instructions) and speaker models (at producing instructions correctly interpreted by humans) in diverse settings.
Submitted 28 May, 2018; v1 submitted 14 November, 2017;
originally announced November 2017.
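The pragmatic speaker described in the abstract above can be pictured as a rescoring step: candidate instructions from a base speaker are re-ranked by how well a base listener would interpret them. The sketch below is a minimal version under stated assumptions: the log-probabilities are supplied as plain dictionaries, and the name `pragmatic_speaker` and the interpolation parameter `weight` are hypothetical choices for illustration rather than the paper's formulation.

```python
def pragmatic_speaker(candidates, speaker_logp, listener_logp, weight=0.5):
    """Pick the candidate instruction with the best combined score.

    speaker_logp[u]: assumed base-speaker log-probability of instruction u.
    listener_logp[u]: assumed base-listener log-probability of recovering
        the intended action sequence when given u.
    weight: assumed trade-off; 0 uses only the speaker, 1 only the listener.
    """
    def score(u):
        return weight * listener_logp[u] + (1 - weight) * speaker_logp[u]
    return max(candidates, key=score)

# Toy scores: the short instruction is fluent but ambiguous to the listener.
candidates = ["go left", "go left past the couch"]
speaker_logp = {"go left": -1.0, "go left past the couch": -2.0}
listener_logp = {"go left": -3.0, "go left past the couch": -0.5}
choice = pragmatic_speaker(candidates, speaker_logp, listener_logp)
```

With equal weighting the longer, less ambiguous instruction wins; with `weight=0` the base speaker's preferred short instruction is returned.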
-
Proton network flexibility enables robustness and large electric fields in the ketosteroid isomerase active site
Authors:
Lu Wang,
Stephen D. Fried,
Thomas E. Markland
Abstract:
Hydrogen bond networks play vital roles in biological functions ranging from protein folding to enzyme catalysis. Here we combine electronic structure calculations and ab initio path integral molecular dynamics simulations, which incorporate both nuclear and electronic quantum effects, to show why the network of short hydrogen bonds in the active site of ketosteroid isomerase is remarkably robust to mutations along the network and how this gives rise to large local electric fields. We demonstrate that these properties arise from the network's ability to respond to a perturbation by shifting proton positions and redistributing electronic charge density. This flexibility leads to small changes in properties such as the partial ionization of residues and $pK_a$ isotope effects upon mutation of the residues, consistent with recent experiments. This proton flexibility is further enhanced when an extended hydrogen bond network forms in the presence of an intermediate analog, which allows us to explain the chemical origins of the large electric fields in the enzyme's active site observed in recent experiments.
Submitted 15 August, 2017;
originally announced August 2017.
-
Effective Inference for Generative Neural Parsing
Authors:
Mitchell Stern,
Daniel Fried,
Dan Klein
Abstract:
Generative neural models have recently achieved state-of-the-art results for constituency parsing. However, without a feasible search procedure, their use has so far been limited to reranking the output of external parsers in which decoding is more tractable. We describe an alternative to the conventional action-level beam search used for discriminative neural models that enables us to decode directly in these generative models. We then show that by improving our basic candidate selection strategy and using a coarse pruning function, we can improve accuracy while exploring significantly less of the search space. Applied to the model of Choe and Charniak (2016), our inference procedure obtains 92.56 F1 on section 23 of the Penn Treebank, surpassing prior state-of-the-art results for single-model systems.
Submitted 27 July, 2017;
originally announced July 2017.
-
Improving Neural Parsing by Disentangling Model Combination and Reranking Effects
Authors:
Daniel Fried,
Mitchell Stern,
Dan Klein
Abstract:
Recent work has proposed several generative neural models for constituency parsing that achieve state-of-the-art results. Since direct search in these generative models is difficult, they have primarily been used to rescore candidate outputs from base parsers in which decoding is more straightforward. We first present an algorithm for direct search in these generative models. We then demonstrate that the rescoring results are at least partly due to implicit model combination rather than reranking effects. Finally, we show that explicit model combination can improve performance even further, resulting in new state-of-the-art numbers on the PTB of 94.25 F1 when training only on gold data and 94.66 F1 when using external data.
Submitted 10 July, 2017;
originally announced July 2017.
-
Towards using social media to identify individuals at risk for preventable chronic illness
Authors:
Dane Bell,
Daniel Fried,
Luwen Huangfu,
Mihai Surdeanu,
Stephen Kobourov
Abstract:
We describe a strategy for the acquisition of training data necessary to build a social-media-driven early detection system for individuals at risk for (preventable) type 2 diabetes mellitus (T2DM). The strategy uses a game-like quiz with data and questions acquired semi-automatically from Twitter. The questions are designed to inspire participant engagement and collect relevant data to train a public-health model applied to individuals. Prior systems designed to use social media such as Twitter to predict obesity (a risk factor for T2DM) operate on entire communities such as states, counties, or cities, based on statistics gathered by government agencies. Because there is considerable variation among individuals within these groups, training data on the individual level would be more effective, but this data is difficult to acquire. The approach proposed here aims to address this issue. Our strategy has two steps. First, we trained a random forest classifier on data gathered from (public) Twitter statuses and state-level statistics with state-of-the-art accuracy. We then converted this classifier into a 20-questions-style quiz and made it available online. In doing so, we achieved high engagement with individuals that took the quiz, while also building a training set of voluntarily supplied individual-level data for future classification.
Submitted 11 March, 2016;
originally announced March 2016.
-
Constrained Sampling and Counting: Universal Hashing Meets SAT Solving
Authors:
Kuldeep S. Meel,
Moshe Vardi,
Supratik Chakraborty,
Daniel J. Fremont,
Sanjit A. Seshia,
Dror Fried,
Alexander Ivrii,
Sharad Malik
Abstract:
Constrained sampling and counting are two fundamental problems in artificial intelligence with a diverse range of applications, spanning probabilistic reasoning and planning to constrained-random verification. While the theory of these problems was thoroughly investigated in the 1980s, prior work either did not scale to industrial size instances or gave up correctness guarantees to achieve scalability. Recently, we proposed a novel approach that combines universal hashing and SAT solving and scales to formulas with hundreds of thousands of variables without giving up correctness guarantees. This paper provides an overview of the key ingredients of the approach and discusses challenges that need to be overcome to handle larger real-world instances.
Submitted 21 December, 2015;
originally announced December 2015.
-
Quantum delocalization of protons in the hydrogen bond network of an enzyme active site
Authors:
Lu Wang,
Stephen D. Fried,
Steven G. Boxer,
Thomas E. Markland
Abstract:
Enzymes utilize protein architectures to create highly specialized structural motifs that can greatly enhance the rates of complex chemical transformations. Here we use experiments, combined with ab initio simulations that exactly include nuclear quantum effects, to show that a triad of strongly hydrogen bonded tyrosine residues within the active site of the enzyme ketosteroid isomerase (KSI) facilitates quantum proton delocalization. This delocalization dramatically stabilizes the deprotonation of an active site tyrosine residue, resulting in a very large isotope effect on its acidity. When an intermediate analog is docked, it is incorporated into the hydrogen bond network, giving rise to extended quantum proton delocalization in the active site. These results shed light on the role of nuclear quantum effects in the hydrogen bond network that stabilizes the reactive intermediate of KSI, and the behavior of protons in biological systems containing strong hydrogen bonds.
Submitted 31 December, 2014;
originally announced January 2015.
-
Incorporating Both Distributional and Relational Semantics in Word Representations
Authors:
Daniel Fried,
Kevin Duh
Abstract:
We investigate the hypothesis that word representations ought to incorporate both distributional and relational semantics. To this end, we employ the Alternating Direction Method of Multipliers (ADMM), which flexibly optimizes a distributional objective on raw text and a relational objective on WordNet. Preliminary results on knowledge base completion, analogy tests, and parsing show that word representations trained on both objectives can give improvements in some cases.
Submitted 21 March, 2015; v1 submitted 18 December, 2014;
originally announced December 2014.
-
Incorporating Both Distributional and Relational Semantics in Word Representations
Authors:
Daniel Fried,
Kevin Duh
Abstract:
We investigate the hypothesis that word representations ought to incorporate both distributional and relational semantics. To this end, we employ the Alternating Direction Method of Multipliers (ADMM), which flexibly optimizes a distributional objective on raw text and a relational objective on WordNet. Preliminary results on knowledge base completion, analogy tests, and parsing show that word representations trained on both objectives can give improvements in some cases.
Submitted 21 March, 2015; v1 submitted 14 December, 2014;
originally announced December 2014.
-
Analyzing the Language of Food on Social Media
Authors:
Daniel Fried,
Mihai Surdeanu,
Stephen Kobourov,
Melanie Hingle,
Dane Bell
Abstract:
We investigate the predictive power behind the language of food on social media. We collect a corpus of over three million food-related posts from Twitter and demonstrate that many latent population characteristics can be directly predicted from this data: overweight rate, diabetes rate, political leaning, and home geographical location of authors. For all tasks, our language-based models significantly outperform the majority-class baselines. Performance is further improved with more complex natural language processing, such as topic modeling. We analyze which textual features have most predictive power for these datasets, providing insight into the connections between the language of food, geographic locale, and community characteristics. Lastly, we design and implement an online system for real-time query and visualization of the dataset. Visualization tools, such as geo-referenced heatmaps, semantics-preserving wordclouds and temporal histograms, allow us to discover more complex, global patterns mirrored in the language of food.
Submitted 11 September, 2014; v1 submitted 7 September, 2014;
originally announced September 2014.
-
Relationship of Time Reversal Symmetry Breaking with Optical Kerr Rotation
Authors:
Alexander D. Fried
Abstract:
We prove an instance of the Reciprocity Theorem that demonstrates that Kerr rotation, also known as the magneto-optical Kerr effect, may only arise in materials that break microscopic time reversal symmetry. This argument applies in the linear response regime, and only fails for nonlinear effects. Recent measurements with a modified Sagnac Interferometer have found finite Kerr rotation in a variety of superconductors. The Sagnac Interferometer is a probe for nonreciprocity, so it must be that time reversal symmetry is broken in these materials.
Submitted 18 September, 2014; v1 submitted 8 June, 2014;
originally announced June 2014.
-
Pulsed Laser Deposition of High-Quality Thin Films of the Insulating Ferromagnet EuS
Authors:
Qi I. Yang,
Jinfeng Zhao,
Li Zhang,
Merav Dolev,
Alexander D. Fried,
Ann F. Marshall,
Subhash H. Risbud,
Aharon Kapitulnik
Abstract:
High-quality thin films of the ferromagnetic-insulator europium(II) sulfide (EuS) were fabricated by pulsed laser deposition on Al2O3 (0001) and Si (100) substrates. A single orientation was obtained with the [100] planes parallel to the substrates, and atomic-scale smoothness indicates a near-ideal surface topography. The films exhibit uniform ferromagnetism below 15.9 K, with a substantial component of the magnetization perpendicular to the plane of the films. Optimization of the growth conditions also yielded truly insulating films with immeasurably large resistance. This combination of magnetic and electric properties opens the gate for novel devices that require a true ferromagnetic insulator.
Submitted 25 February, 2014; v1 submitted 17 August, 2013;
originally announced August 2013.
-
Emerging Weak Localization Effects on Topological Insulator-Insulating Ferromagnet (Bi_2Se_3-EuS) Interface
Authors:
Qi I. Yang,
Merav Dolev,
Li Zhang,
Jinfeng Zhao,
Alexander D. Fried,
Elizabeth Schemm,
Min Liu,
Alexander Palevski,
Ann F. Marshall,
Subhash H. Risbud,
Aharon Kapitulnik
Abstract:
Thin films of topological insulator Bi_2Se_3 were deposited directly on insulating ferromagnetic EuS. Unusual negative magnetoresistance was observed near the zero field below the Curie temperature (T_C), resembling the weak localization effect; whereas the usual positive magnetoresistance was recovered above T_C. Such negative magnetoresistance was only observed for Bi_2Se_3 layers thinner than t~4nm, when its top and bottom surfaces are coupled. These results provide evidence for a proximity effect between a topological insulator and an insulating ferromagnet, laying the foundation for future realization of the half-integer quantized anomalous Hall effect in three-dimensional topological insulators.
Submitted 28 August, 2013; v1 submitted 9 June, 2013;
originally announced June 2013.
-
Maps of Computer Science
Authors:
Daniel Fried,
Stephen G. Kobourov
Abstract:
We describe a practical approach for visual exploration of research papers. Specifically, we use the titles of papers from the DBLP database to create what we call maps of computer science (MoCS). Words and phrases from the paper titles are the cities in the map, and countries are created based on word and phrase similarity, calculated using co-occurrence. With the help of heatmaps, we can visualize the profile of a particular conference or journal over the base map. Similarly, heatmap profiles can be made of individual researchers or groups such as a department. The visualization system also makes it possible to change the data used to generate the base map. For example, a specific journal or conference can be used to generate the base map and then the heatmap overlays can be used to show the evolution of research topics in the field over the years. As before, the profiles of individual researchers or research groups can be visualized using heatmap overlays, but this time over the journal or conference base map. Finally, research papers or abstracts can easily be turned into visual abstracts that give a visual representation of the distribution of topics in the paper. We outline a modular and extensible system for term extraction using natural language processing techniques, and show the applicability of methods of information retrieval to calculation of term similarity and creation of a topic map. The system is available at mocs.cs.arizona.edu.
Submitted 9 April, 2013;
originally announced April 2013.
-
Complexity of Canadian Traveler Problem Variants
Authors:
Dror Fried,
Solomon Eyal Shimony,
Amit Benbassat,
Cenny Wenner
Abstract:
The Canadian traveler problem (CTP) is the problem of traversing a given graph, where some of the edges may be blocked - a state which is revealed only upon reaching an incident vertex. Originally stated by Papadimitriou and Yannakakis (1991), the adversarial version of CTP was shown to be PSPACE-complete, with the stochastic version shown to be #P-hard. We show that stochastic CTP is also PSPACE-complete: initially proving PSPACE-hardness for the dependent version of stochastic CTP, and proceeding with gadgets that allow us to extend the proof to the independent case. Since for disjoint-path graphs, CTP can be solved in polynomial time, we examine the complexity of the more general remote-sensing CTP, and show that it is NP-hard even for disjoint-path graphs.
Submitted 19 July, 2012;
originally announced July 2012.
-
Schinzel's Problem: Imprimitive covers and the monodromy method
Authors:
Michael D. Fried,
Ivica Gusic
Abstract:
Schinzel's original problem was to describe when an expression f(x)-g(y), with f,g nonconstant and having complex coefficients, is reducible. We call such an (f,g) a Schinzel pair if this happens nontrivially: f(x)-g(y) is newly reducible. Fried accomplished this as a special case of a result in dav-red.pdf (http://www.math.uci.edu/~mfried/paplist-ff/dav-red.pdf), when f is indecomposable. That work featured using primitive permutation representations. Even after 42 years, going beyond using primitivity is a challenge to the monodromy method despite many intervening related papers (see UMStory.pdf, http://www.math.uci.edu/~mfried/paplist-ff/UMStory.pdf). Here we develop a formula for branch cycles that characterizes Schinzel pairs satisfying a condition of Avanzi, Gusic and Zannier and relate it to this ongoing story.
Submitted 4 December, 2011; v1 submitted 9 April, 2011;
originally announced April 2011.
-
Variables separated equations: Strikingly different roles for the Branch Cycle Lemma and the Finite Simple Group Classification
Authors:
Michael D. Fried
Abstract:
H. Davenport's Problem asks: What can we expect of two polynomials, over the integers, with the same ranges on almost all residue class fields? This stood out among many separated variable problems posed by Davenport, D.J. Lewis and A. Schinzel.
By bounding the degrees, but expanding the maps and variables in Davenport's Problem, Galois stratification enhanced the separated variable theme, solving an Ax and Kochen problem from their Artin Conjecture work. J. Denef and F. Loeser applied this to add Chow motive coefficients to previously introduced zeta functions on a diophantine statement.
By restricting the variables, but leaving the degrees unbounded, we found the striking distinction between Davenport's problem over the rationals, solved by applying the Branch Cycle Lemma, and its generalization over any number field, solved using the simple group classification. This encouraged J. Thompson to formulate the genus 0 problem on rational function monodromy groups. R. Guralnick and Thompson led its solution in stages.
We look at two developments since the solution of Davenport's problem.
* Stemming from C. MacCluer's 1967 thesis, identifying a general class of problems, including Davenport's, as monodromy precise.
* R(iemann) E(xistence) T(heorem)'s role as a converse to problems generalizing Davenport's, and Schinzel's (on reducibility).
We use these to consider: Going beyond the simple group classification to handle imprimitive groups; and what is the role of covers and correspondences in going from algebraic equations to zeta functions with Chow motive coefficients.
Submitted 11 August, 2011; v1 submitted 23 December, 2010;
originally announced December 2010.
-
Moduli of relatively nilpotent extensions
Authors:
Michael D. Fried
Abstract:
Gives the most precise available description of the p-Frattini module for any p-perfect finite group G=G_0 (Thm. 2.8), and therefore of the groups G_{k,ab}, k \ge 0, from which we form the abelianized M(odular) T(ower). §4 includes a classification of Schur multiplier quotients, from which we figure two points (see the html file http://www.math.uci.edu/~mfried/paplist-mt/rims-rev.html):
1. Whether there is a non-empty MT over a given Hurwitz space component at level 0; and
2. whether all cusps above a given level 0 o-p' cusp are p-cusps.
The diophantine discussions of §5 remind how Demjanenko-Manin worked on modular curve towers, showing why we still need Faltings' Thm. to conclude the Main MT conjecture when the p-Frattini module has dimension exceeding 1 (G_0 is not p-super singular). By 2009 there was a successful resolution of the Main Conjecture when the MT levels (reduced Hurwitz spaces) have dimension 1. http://www.math.uci.edu/~mfried/paplist-mt/MTTLine-domain.html reviews all inputs and results of the Modular Tower program starting with books of Serre and Shimura.
Submitted 21 October, 2009;
originally announced October 2009.
-
Relating two genus 0 problems of John Thompson
Authors:
Michael D. Fried
Abstract:
The "relating" entwines three problems:
1. Davenport's Problem, describing pairs of polynomials over Q whose ranges on Z/p are the same for almost all p.
2. Showing that the monodromy groups of rational function maps over the complexes are limited to a finite set of groups, outside of groups close to alternating groups (example, symmetric groups) with special representations, and dihedral and cyclic groups.
3. Relating the genus 0 modular curves to the character group of the Monster simple group, so-called Monstrous Moonshine. http://www.math.uci.edu/~mfried/pathlist-cov/thomp-genus0.html has a more detailed exposition on the paper; http://www.math.uci.edu/~mfried/deflist-cov/Genus0-Prob.html gives a separate description of genus 0 problem #2.
Submitted 20 October, 2009;
originally announced October 2009.
-
The place of exceptional covers among all diophantine relations
Authors:
Michael D. Fried
Abstract:
A cover of normal varieties is exceptional over a finite field if the map on points over infinitely many extensions of the field is one-one. A cover over a number field is exceptional if it is exceptional over infinitely many residue class fields. The first result: The category of exceptional covers of a normal variety, Z, over a finite field, F_q, has fiber products, and therefore a natural Galois group (with permutation representation) limit. This has many applications to considering Poincare series attached to diophantine questions. The paper follows three lines:
* The historical role of the Galois Theoretic property of exceptionality, first considered by Davenport and Lewis.
* How the tower structure on the category of exceptional covers of a pair (Z,F_q) allows forming subtowers that separate known results from unknown territory.
* The use of Serre's OIT, especially the GL_2 case, to consider cryptology periods and functional composition aspects of exceptionality.
A more extensive html description of the paper is at http://www.math.uci.edu/~mfried/paplist-ff/exceptTowYFFTA_519.html
Submitted 17 October, 2009;
originally announced October 2009.
-
The Main Conjecture of Modular Towers and its higher rank generalization
Authors:
Michael D. Fried
Abstract:
The genus of projective curves discretely separates decidedly different two variable algebraic relations. So, we can focus on the connected moduli M_g of genus g curves. Yet, modern applications require a data variable (function) on such curves. The resulting spaces are versions, depending on our need from this data variable, of Hurwitz spaces. A Nielsen class is a set defined by r \ge 3 conjugacy classes C in the data variable monodromy G. It gives a striking genus analog.
Using Frattini covers of G, every Nielsen class produces a projective system of related Nielsen classes for any prime p dividing |G|. A nonempty (infinite) projective system of braid orbits in these Nielsen classes is an infinite (G,C) component (tree) branch. These correspond to projective systems of irreducible (dim r-3) components from {H(G_{p,k}(G),C)}_{k=0}^{\infty}, the (G,C,p) Modular Tower (MT). The classical modular curve towers {Y_1(p^{k+1})}_{k=0}^\infty (simplest case: G is dihedral, r=4, C are involution classes) are an avatar.
The (weak) Main Conjecture says, if G is p-perfect, there are no rational points at high levels of a component branch. When r=4, MT levels (minus their cusps) are upper half plane quotients covering the j-line. Our topics.
* Identifying component branches on a MT from g-p', p and Weigel cusp branches using the MT generalization of spin structures.
* Listing cusp branch properties that imply the (weak) Main Conjecture and extracting the small list of towers that could possibly fail the conjecture.
* Formulating a (strong) Main Conjecture for higher rank MTs (with examples): almost all primes produce a modular curve-like system.
Submitted 19 November, 2006;
originally announced November 2006.
-
Alternating groups and moduli space lifting Invariants
Authors:
Michael D. Fried
Abstract:
Main Theorem: Spaces of r-branch point 3-cycle covers, degree n or Galois of degree n!/2 have one (resp. two) component(s) if r=n-1 (resp. r\ge n). Improves Fried-Serre on deciding when sphere covers with odd-order branching lift to unramified Spin covers. We produce Hurwitz-Torelli automorphic functions on Hurwitz spaces, and draw Inverse Galois conclusions. Example: Absolute spaces of 3-cycle covers with +1 (resp. -1) lift invariant carry canonical even (resp. odd) theta functions when r is even (resp. odd). For inner spaces the result is independent of r. Another use appears in "Connectedness of families of sphere covers of A_n-Type" (http://www.math.uci.edu/~mfried/paplist-mt/twoorbit.html). This shows the M(odular) T(ower)s for the prime p=2 lying over Hurwitz spaces first studied by Liu and Osserman (http://www.math.uci.edu/~mfried/othlist-cov/hurwitzLiu-Oss.pdf) have 2-cusps. That is sufficient to establish the Main Conjecture: (*) High tower levels are general-type varieties and have no rational points. For infinitely many of those MTs, the tree of cusps contains a subtree -- a spire -- isomorphic to the tree of cusps on a modular curve tower. This makes plausible a version of Serre's O(pen) I(mage) T(heorem) on such MTs. Establishing these modular curve-like properties opens, to MTs, modular curve-like thinking where modular curves have never gone before. A fuller html description of this paper is at http://www.math.uci.edu/~mfried/paplist-cov/hf-can0611591.html .
Submitted 7 November, 2009; v1 submitted 19 November, 2006;
originally announced November 2006.
-
Hurwitz monodromy, spin separation and higher levels of a modular tower
Authors:
Paul Bailey,
Michael D. Fried
Abstract:
Each finite $p$-perfect group $G$ ($p$ a prime) has a universal central $p$-extension. For a perfect group these central extensions come from its Schur multiplier. Serre gave a Stiefel-Whitney class approach to analyzing spin covers of alternating groups ($p=2$) aimed at geometric covering space problems. This included the regular version of the Inverse Galois Problem.
Every finite simple group with order divisible by $p$ has an infinite string of perfect centerless group covers exhibiting nontrivial Schur multipliers for the prime $p$. Sequences of moduli spaces of curves attached to $G$ and $p$, called Modular Towers, capture the geometry of these many appearances of Schur multipliers in degeneration phenomena of Harbater-Mumford cover representatives. These modular curve tower generalizations inspire conjectures akin to Serre's open image theorem. This includes that at suitably high levels we expect no rational points. Guided by two papers of Serre's, these cases reveal common appearance of spin structures producing $θ$-nulls on these moduli spaces. The results immediately apply to all the expected Inverse Galois topics. This includes systematic exposure of moduli spaces having points where the field of moduli is a field of definition and other points where it is not.
Submitted 16 June, 2005; v1 submitted 17 April, 2001;
originally announced April 2001.