-
Learning Personal Food Preferences via Food Logs Embedding
Authors:
Ahmed A. Metwally,
Ariel K. Leong,
Aman Desai,
Anvith Nagarjuna,
Dalia Perelman,
Michael Snyder
Abstract:
Diet management is key to managing chronic diseases such as diabetes. Automated food recommender systems may be able to assist by providing meal recommendations that conform to a user's nutrition goals and food preferences. Current recommendation systems suffer from a lack of accuracy that is in part due to a lack of knowledge of food preferences, namely foods users like to and are able to eat fre…
▽ More
Diet management is key to managing chronic diseases such as diabetes. Automated food recommender systems may be able to assist by providing meal recommendations that conform to a user's nutrition goals and food preferences. Current recommendation systems suffer from a lack of accuracy that is in part due to a lack of knowledge of food preferences, namely foods users like to and are able to eat frequently. In this work, we propose a method for learning food preferences from food logs, a comprehensive but noisy source of information about users' dietary habits. We also introduce accompanying metrics. The method generates and compares word embeddings to identify the parent food category of each food entry and then calculates the most popular. Our proposed approach identifies 82% of a user's ten most frequently eaten foods. Our method is publicly available on (https://github.com/aametwally/LearningFoodPreferences)
△ Less
Submitted 22 November, 2021; v1 submitted 28 October, 2021;
originally announced October 2021.
-
Information-theoretic User Interaction: Significant Inputs for Program Synthesis
Authors:
Ashish Tiwari,
Arjun Radhakrishna,
Sumit Gulwani,
Daniel Perelman
Abstract:
Programming-by-example technologies are being deployed in industrial products for real-time synthesis of various kinds of data transformations. These technologies rely on the user to provide few representative examples of the transformation task. Motivated by the need to find the most pertinent question to ask the user, in this paper, we introduce the {\em significant questions problem}, and show…
▽ More
Programming-by-example technologies are being deployed in industrial products for real-time synthesis of various kinds of data transformations. These technologies rely on the user to provide few representative examples of the transformation task. Motivated by the need to find the most pertinent question to ask the user, in this paper, we introduce the {\em significant questions problem}, and show that it is hard in general. We then develop an information-theoretic greedy approach for solving the problem. We justify the greedy algorithm using the conditional entropy result, which informally says that the question that achieves the maximum information gain is the one that we know least about.
In the context of interactive program synthesis, we use the above result to develop an {\em{active program learner}} that generates the significant inputs to pose as queries to the user in each iteration. The procedure requires extending a {\em{passive program learner}} to a {\em{sampling program learner}} that is able to sample candidate programs from the set of all consistent programs to enable estimation of information gain. It also uses clustering of inputs based on features in the inputs and the corresponding outputs to sample a small set of candidate significant inputs. Our active learner is able to tradeoff false negatives for false positives and converge in a small number of iterations on a real-world dataset of %around 800 string transformation tasks.
△ Less
Submitted 22 June, 2020;
originally announced June 2020.
-
Twins: BFT Systems Made Robust
Authors:
Shehar Bano,
Alberto Sonnino,
Andrey Chursin,
Dmitri Perelman,
Zekun Li,
Avery Ching,
Dahlia Malkhi
Abstract:
This paper presents Twins, an automated unit test generator of Byzantine attacks. Twins implements three types of Byzantine behaviors: (i) leader equivocation, (ii) double voting, and (iii) losing internal state such as forgetting 'locks' guarding voted values. To emulate interesting attacks by a Byzantine node, it instantiates twin copies of the node instead of one, giving both twins the same ide…
▽ More
This paper presents Twins, an automated unit test generator of Byzantine attacks. Twins implements three types of Byzantine behaviors: (i) leader equivocation, (ii) double voting, and (iii) losing internal state such as forgetting 'locks' guarding voted values. To emulate interesting attacks by a Byzantine node, it instantiates twin copies of the node instead of one, giving both twins the same identities and network credentials. To the rest of the system, the twins appear indistinguishable from a single node behaving in a 'questionable' manner. Twins can systematically generate Byzantine attack scenarios at scale, execute them in a controlled manner, and examine their behavior. Twins scenarios iterate over protocol rounds and vary the communication patterns among nodes. Twins runs in a production setting within DiemBFT where it can execute 44M Twins-generated scenarios daily. Whereas the system at hand did not manifest errors, subtle safety bugs that were deliberately injected for the purpose of validating the implementation of Twins itself were exposed within minutes. Twins can prevent developers from regressing correctness when updating the codebase, introducing new features, or performing routine maintenance tasks. Twins only requires a thin wrapper over DiemBFT, we thus envision other systems using it. Building on this idea, one new attack and several known attacks against other BFT protocols were materialized as Twins scenarios. In all cases, the target protocols break within fewer than a dozen protocol rounds, hence it is realistic for the Twins approach to expose the problems.
△ Less
Submitted 14 January, 2022; v1 submitted 22 April, 2020;
originally announced April 2020.
-
FlashProfile: A Framework for Synthesizing Data Profiles
Authors:
Saswat Padhi,
Prateek Jain,
Daniel Perelman,
Oleksandr Polozov,
Sumit Gulwani,
Todd Millstein
Abstract:
We address the problem of learning a syntactic profile for a collection of strings, i.e. a set of regex-like patterns that succinctly describe the syntactic variations in the strings. Real-world datasets, typically curated from multiple sources, often contain data in various syntactic formats. Thus, any data processing task is preceded by the critical step of data format identification. However, m…
▽ More
We address the problem of learning a syntactic profile for a collection of strings, i.e. a set of regex-like patterns that succinctly describe the syntactic variations in the strings. Real-world datasets, typically curated from multiple sources, often contain data in various syntactic formats. Thus, any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify the different formats is infeasible in standard big-data scenarios.
Prior techniques are restricted to a small set of pre-defined patterns (e.g. digits, letters, words, etc.), and provide no control over granularity of profiles. We define syntactic profiling as a problem of clustering strings based on syntactic similarity, followed by identifying patterns that succinctly describe each cluster. We present a technique for synthesizing such profiles over a given language of patterns, that also allows for interactive refinement by requesting a desired number of clusters.
Using a state-of-the-art inductive synthesis framework, PROSE, we have implemented our technique as FlashProfile. Across $153$ tasks over $75$ large real datasets, we observe a median profiling time of only $\sim\,0.7\,$s. Furthermore, we show that access to syntactic profiles may allow for more accurate synthesis of programs, i.e. using fewer examples, in programming-by-example (PBE) workflows such as FlashFill.
△ Less
Submitted 16 April, 2019; v1 submitted 17 September, 2017;
originally announced September 2017.
-
Interactive Program Synthesis
Authors:
Vu Le,
Daniel Perelman,
Oleksandr Polozov,
Mohammad Raza,
Abhishek Udupa,
Sumit Gulwani
Abstract:
Program synthesis from incomplete specifications (e.g. input-output examples) has gained popularity and found real-world applications, primarily due to its ease-of-use. Since this technology is often used in an interactive setting, efficiency and correctness are often the key user expectations from a system based on such technologies. Ensuring efficiency is challenging since the highly combinatori…
▽ More
Program synthesis from incomplete specifications (e.g. input-output examples) has gained popularity and found real-world applications, primarily due to its ease-of-use. Since this technology is often used in an interactive setting, efficiency and correctness are often the key user expectations from a system based on such technologies. Ensuring efficiency is challenging since the highly combinatorial nature of program synthesis algorithms does not fit in a 1-2 second response expectation of a user-facing system. Meeting correctness expectations is also difficult, given that the specifications provided are incomplete, and that the users of such systems are typically non-programmers.
In this paper, we describe how interactivity can be leveraged to develop efficient synthesis algorithms, as well as to decrease the cognitive burden that a user endures trying to ensure that the system produces the desired program. We build a formal model of user interaction along three dimensions: incremental algorithm, step-based problem formulation, and feedback-based intent refinement. We then illustrate the effectiveness of each of these forms of interactivity with respect to synthesis performance and correctness on a set of real-world case studies.
△ Less
Submitted 9 March, 2017;
originally announced March 2017.
-
FRAPPE: Fast Replication Platform for Elastic Services
Authors:
Vita Bortnikov,
Gregory Chockler,
Dmitri Perelman,
Alexey Roytman,
Shlomit Shachor,
lya Shnayderman
Abstract:
Elasticity is critical for today's cloud services, which must be able to quickly adapt to dynamically changing load conditions and resource availability. We introduce FRAPPE, a new consistent replication platform aiming at improving elasticity of the replicated services hosted in clouds or large data centers. In the core of FRAPPE is a novel replicated state machine protocol, which employs specula…
▽ More
Elasticity is critical for today's cloud services, which must be able to quickly adapt to dynamically changing load conditions and resource availability. We introduce FRAPPE, a new consistent replication platform aiming at improving elasticity of the replicated services hosted in clouds or large data centers. In the core of FRAPPE is a novel replicated state machine protocol, which employs speculative executions to ensure continuous operation during the reconfiguration periods as well as in situations where failures prevent the agreement on the next stable configuration from being reached in a timely fashion. We present the FRAPPE's architecture and describe the basic techniques underlying the implementation of our speculative state machine protocol.
△ Less
Submitted 20 April, 2016;
originally announced April 2016.
-
Reconfigurable State Machine Replication from Non-Reconfigurable Building Blocks
Authors:
Vita Bortnikov,
Gregory Chockler,
Dmitri Perelman,
Alexey Roytman,
Shlomit Shachor,
Ilya Shnayderman
Abstract:
Reconfigurable state machine replication is an important enabler of elasticity for replicated cloud services, which must be able to dynamically adjust their size as a function of changing load and resource availability. We introduce a new generic framework to allow the reconfigurable state machine implementation to be derived from a collection of arbitrary non-reconfigurable state machines. Our re…
▽ More
Reconfigurable state machine replication is an important enabler of elasticity for replicated cloud services, which must be able to dynamically adjust their size as a function of changing load and resource availability. We introduce a new generic framework to allow the reconfigurable state machine implementation to be derived from a collection of arbitrary non-reconfigurable state machines. Our reduction framework follows the black box approach, and does not make any assumptions with respect to its execution environment apart from reliable channels. It allows higher-level services to leverage speculative command execution to ensure uninterrupted progress during the reconfiguration periods as well as in situations where failures prevent the reconfiguration agreement from being reached in a timely fashion. We apply our framework to obtain a reconfigurable speculative state machine from the non-reconfigurable Paxos implementation, and analyze its performance on a realistic distributed testbed. Our results show that our framework incurs negligible overheads in the absence of reconfiguration, and allows steady throughput to be maintained throughout the reconfiguration periods.
△ Less
Submitted 30 December, 2015;
originally announced December 2015.