-
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
Authors:
Rui Yang,
Hanyang Chen,
Junyu Zhang,
Mark Zhao,
Cheng Qian,
Kangrui Wang,
Qineng Wang,
Teja Venkat Koripella,
Marziyeh Movahedi,
Manling Li,
Heng Ji,
Huan Zhang,
Tong Zhang
Abstract:
Leveraging Multi-modal Large Language Models (MLLMs) to create embodied agents offers a promising avenue for tackling real-world tasks. While language-centric embodied agents have garnered substantial attention, MLLM-based embodied agents remain underexplored due to the lack of comprehensive evaluation frameworks. To bridge this gap, we introduce EmbodiedBench, an extensive benchmark designed to evaluate vision-driven embodied agents. EmbodiedBench features: (1) a diverse set of 1,128 testing tasks across four environments, ranging from high-level semantic tasks (e.g., household) to low-level tasks involving atomic actions (e.g., navigation and manipulation); and (2) six meticulously curated subsets evaluating essential agent capabilities like commonsense reasoning, complex instruction understanding, spatial awareness, visual perception, and long-term planning. Through extensive experiments, we evaluated 24 leading proprietary and open-source MLLMs within EmbodiedBench. Our findings reveal that: MLLMs excel at high-level tasks but struggle with low-level manipulation, with the best model, GPT-4o, scoring only 28.9% on average. EmbodiedBench provides a multifaceted standardized evaluation platform that not only highlights existing challenges but also offers valuable insights to advance MLLM-based embodied agents. Our code and dataset are available at https://embodiedbench.github.io.
Submitted 5 June, 2025; v1 submitted 13 February, 2025;
originally announced February 2025.
-
Risk Analysis in the Selection of Project Managers Based on ANP and FMEA
Authors:
Armin Asaadi,
Armita Atrian,
Hesam Nik Hoseini,
Mohammad Mahdi Movahedi
Abstract:
Project managers play a crucial role in the success of projects. The selection of an appropriate project manager is a primary concern for senior managers in firms. Typically, this process involves candidate interviews and assessments of their abilities. There are various criteria for selecting a project manager, and the importance of each criterion depends on the project type, its conditions, and the risks associated with their absence in the chosen candidate. Often, senior managers in engineering companies lack awareness of the significance of these criteria and the potential risks linked to their absence. This research aims to identify these risks in selecting project managers for civil engineering projects, utilizing a combined ANP-FMEA approach. Through a comprehensive literature review, five risk categories have been identified: individual skills, power-related issues, knowledge and expertise, experience, and personality traits. Subsequently, these risks, along with their respective sub-criteria and internal relationships, were analysed using the combined ANP-FMEA technique. The results highlighted that the lack of political influence, absence of construction experience, and deficiency in project management expertise represent the most substantial risks in selecting a project manager. Moreover, upon comparison with the traditional FMEA approach, this study demonstrates the superior ability of the ANP-FMEA model in differentiating risks and pinpointing factors with elevated risk levels.
Submitted 6 November, 2023;
originally announced November 2023.
-
The Privacy-preserving Padding Problem: Non-negative Mechanisms for Conservative Answers with Differential Privacy
Authors:
Benjamin M. Case,
James Honaker,
Mahnush Movahedi
Abstract:
Differentially private noise mechanisms commonly use symmetric noise distributions. This is attractive both for achieving the differential privacy definition, and for unbiased expectations in the noised answers. However, there are contexts in which a noisy answer only has utility if it is conservative, that is, has known-signed error, which we call a padded answer. Seemingly, it is paradoxical to satisfy the DP definition with one-sided error, but we show how it is possible to bury the paradox into approximate DP's delta parameter. We develop a few one-sided padding mechanisms that always give conservative answers, but still achieve approximate differential privacy. We show how these mechanisms can be applied in a few select areas, including making the cardinalities of set intersections and unions revealed in Private Set Intersection protocols differentially private, and enabling multiparty computation protocols to compute on sparse data whose exact sizes are made differentially private rather than performing a fully oblivious, more expensive computation.
Submitted 15 October, 2021;
originally announced October 2021.
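The idea of one-sided ("padded") noise in the abstract above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `padded_count`, the sensitivity-1 assumption, and the choice of a shifted Laplace distribution clamped at zero are all illustrative assumptions. The shift is chosen so that the clamped negative tail has probability mass `delta`, which is one way the one-sidedness might be absorbed into the delta parameter; the sketch is not a privacy proof.

```python
import math
import random


def padded_count(true_count: int, epsilon: float, delta: float) -> float:
    """Return true_count plus non-negative noise (illustrative sketch only)."""
    if not (0.0 < delta < 0.5) or epsilon <= 0.0:
        raise ValueError("expected epsilon > 0 and 0 < delta < 0.5")
    # Laplace scale for a count with sensitivity 1 (assumption).
    scale = 1.0 / epsilon
    # Shift chosen so that P(Laplace noise < -shift) equals delta.
    shift = scale * math.log(1.0 / (2.0 * delta))
    # The difference of two i.i.d. exponentials with rate epsilon
    # is Laplace-distributed with mean 0 and scale 1/epsilon.
    laplace = random.expovariate(epsilon) - random.expovariate(epsilon)
    # Clamp the rare negative tail to zero so the answer never undershoots.
    noise = max(0.0, shift + laplace)
    return true_count + noise
```

Under these assumptions every returned value is at least `true_count`, so a consumer that must never underestimate (for example, when allocating buffer space for a sparse dataset) can use the answer directly.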
-
Privacy-Preserving Randomized Controlled Trials: A Protocol for Industry Scale Deployment
Authors:
Mahnush Movahedi,
Benjamin M. Case,
Andrew Knox,
James Honaker,
Li Li,
Yiming Paul Li,
Sanjay Saravanan,
Shubho Sengupta,
Erik Taubeneck
Abstract:
In this paper, we outline a way to deploy a privacy-preserving protocol for multiparty Randomized Controlled Trials on the scale of 500 million rows of data and more than a billion gates. Randomized Controlled Trials (RCTs) are widely used to improve business and policy decisions in various sectors such as healthcare, education, criminology, and marketing. A Randomized Controlled Trial is a scientifically rigorous method to measure the effectiveness of a treatment. This is accomplished by randomly allocating subjects to two or more groups, treating them differently, and then comparing the outcomes across groups. In many scenarios, multiple parties hold different parts of the data for conducting and analyzing RCTs. Given privacy requirements and expectations of each of these parties, it is often challenging to have a centralized store of data to conduct and analyze RCTs.
We accomplish this by a three-stage solution. The first stage uses the Private Secret Share Set Intersection (PS$^3$I) solution to create a joined set and establish secret shares without revealing membership, while discarding individuals who were placed into more than one group. The second stage runs multiple instances of a general purpose MPC over a sharded database to aggregate statistics about each experimental group while discarding individuals who took an action before they received treatment. The third stage adds distributed and calibrated Differential Privacy (DP) noise to the aggregate statistics and uncertainty measures, providing formal two-sided privacy guarantees.
We also evaluate the performance of multiple open source general purpose MPC libraries for this task. We additionally demonstrate how we have used this to create a working ads effectiveness measurement product capable of measuring hundreds of millions of individuals per experiment.
Submitted 10 August, 2021; v1 submitted 12 January, 2021;
originally announced January 2021.
-
DFINITY Technology Overview Series, Consensus System
Authors:
Timo Hanke,
Mahnush Movahedi,
Dominic Williams
Abstract:
The DFINITY blockchain computer provides a secure, performant and flexible consensus mechanism. At its core, DFINITY contains a decentralized randomness beacon which acts as a verifiable random function (VRF) that produces a stream of outputs over time. The novel technique behind the beacon relies on the existence of a unique-deterministic, non-interactive, DKG-friendly threshold signatures scheme. The only known examples of such a scheme are pairing-based and derived from BLS.
The DFINITY blockchain is layered on top of the DFINITY beacon and uses the beacon as its source of randomness for leader selection and leader ranking. A "weight" is attributed to a chain based on the ranks of the leaders who propose the blocks in the chain, and that weight is used to select between competing chains. The DFINITY blockchain is further hardened by a notarization process which dramatically improves the time to finality and eliminates the nothing-at-stake and selfish mining attacks.
The DFINITY consensus algorithm is made to scale through continuous quorum selections driven by the random beacon. In practice, DFINITY achieves block times of a few seconds and transaction finality after only two confirmations. The system gracefully handles temporary losses of network synchrony including network splits, while it is provably secure under synchrony.
Submitted 11 May, 2018;
originally announced May 2018.
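The chain-selection rule described in the abstract above (a weight per chain derived from the ranks of its block proposers) can be sketched in Python as follows. The abstract does not give the weighting function, so `block_weight` below, which halves the weight for each step down in rank, is a hypothetical choice made purely for illustration, as are all function names.

```python
from typing import List


def block_weight(rank: int) -> float:
    # Hypothetical weighting: rank 0 is the highest-priority leader,
    # and each lower-priority rank contributes half as much.
    return 2.0 ** -rank


def chain_weight(leader_ranks: List[int]) -> float:
    # A chain is represented only by the ranks of the leaders
    # who proposed its blocks, in order.
    return sum(block_weight(rank) for rank in leader_ranks)


def select_chain(chains: List[List[int]]) -> List[int]:
    # Prefer the competing chain with the largest total weight.
    return max(chains, key=chain_weight)
```

With this assumed weighting, a chain whose blocks all come from rank-0 leaders outweighs any chain of the same length that contains a block from a lower-priority leader.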
-
Interactive Communication with Unknown Noise Rate
Authors:
Varsha Dani,
Thomas P. Hayes,
Mahnush Movahedi,
Jared Saia,
Maxwell Young
Abstract:
Alice and Bob want to run a protocol over a noisy channel, where a certain number of bits are flipped adversarially. Several results take a protocol requiring $L$ bits of noise-free communication and make it robust over such a channel. In a recent breakthrough result, Haeupler described an algorithm that sends a number of bits that is conjectured to be near optimal in such a model. However, his algorithm critically requires a priori knowledge of the number of bits that will be flipped by the adversary.
We describe an algorithm requiring no such knowledge. If an adversary flips $T$ bits, our algorithm sends $L + O\left(\sqrt{L(T+1)\log L} + T\right)$ bits in expectation and succeeds with high probability in $L$. It does so without any a priori knowledge of $T$. Assuming a conjectured lower bound by Haeupler, our result is optimal up to logarithmic factors.
Our algorithm critically relies on the assumption of a private channel. We show that privacy is necessary when the amount of noise is unknown.
Submitted 13 August, 2015; v1 submitted 23 April, 2015;
originally announced April 2015.
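To get a feel for the bound $L + O\left(\sqrt{L(T+1)\log L} + T\right)$ stated in the abstract above, the following Python sketch evaluates it numerically. The constant hidden by the big-O is set to 1 and the logarithm is taken base 2; both are assumptions made only for illustration, so the numbers show growth behaviour rather than actual bit counts.

```python
import math


def expected_bits(L: int, T: int) -> float:
    """Evaluate L + sqrt(L * (T + 1) * log2(L)) + T.

    Assumes the hidden big-O constant is 1 and the log is base 2.
    """
    if L < 2 or T < 0:
        raise ValueError("expected L >= 2 and T >= 0")
    return L + math.sqrt(L * (T + 1) * math.log2(L)) + T


def overhead_ratio(L: int, T: int) -> float:
    # Extra bits sent relative to the noise-free protocol length L.
    return (expected_bits(L, T) - L) / L
```

Even with no flipped bits ($T = 0$) the overhead term $\sqrt{L \log L}$ is nonzero, but relative to $L$ it shrinks as $L$ grows; `overhead_ratio` makes that comparison easy to tabulate under the stated assumptions.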
-
On Optimal Decision-Making in Ant Colonies
Authors:
Mahnush Movahedi,
Mahdi Zamani
Abstract:
Colonies of ants can collectively choose the best of several nests, even when many of the active ants who organize the move visit only one site. Understanding such behavior can help us design efficient distributed decision-making algorithms. Marshall et al. propose a model for house-hunting in colonies of the ant Temnothorax albipennis. Unfortunately, their model does not achieve optimal decision-making, while laboratory experiments show that, in fact, colonies usually achieve optimality during the house-hunting process. In this paper, we argue that the model of Marshall et al. can achieve optimality by including nest size information in their mathematical model. We use lab results of Pratt et al. to re-define the differential equations of Marshall et al. Finally, we sketch our strategy for testing the optimality of the new model.
Submitted 19 August, 2014;
originally announced August 2014.
-
Secure Anonymous Broadcast
Authors:
Mahnush Movahedi,
Jared Saia,
Mahdi Zamani
Abstract:
In anonymous broadcast, one or more parties want to anonymously send messages to all parties. This problem is increasingly important as a black-box in many privacy-preserving applications such as anonymous communication, distributed auctions, and multi-party computation. In this paper, we design decentralized protocols for anonymous broadcast that require each party to send (and compute) a polylogarithmic number of bits (and operations) per anonymous bit delivered with $O(\log n)$ rounds of communication. Our protocol is provably secure against traffic analysis, does not require any trusted party, and is completely load-balanced. The protocol tolerates up to $n/6$ statically-scheduled Byzantine parties that are controlled by a computationally unbounded adversary. Our main strategy for achieving scalability is to perform local communications (and computations) among a logarithmic number of parties. We provide simulation results to show that our protocol improves significantly over previous work. We finally show that using a common cryptographic tool in our protocol one can achieve practical results for anonymous broadcast.
Submitted 21 May, 2014;
originally announced May 2014.
-
A DDoS-Aware IDS Model Based on Danger Theory and Mobile Agents
Authors:
Mahdi Zamani,
Mahnush Movahedi,
Mohammad Ebadzadeh,
Hossein Pedram
Abstract:
We propose an artificial immune model for intrusion detection in distributed systems based on a relatively recent theory in immunology called Danger theory. Based on Danger theory, immune response in natural systems is a result of sensing corruption as well as sensing unknown substances. In contrast, traditional self-nonself discrimination theory states that immune response is only initiated by sensing nonself (unknown) patterns. Danger theory solves many problems that could only be partially explained by the traditional model. Although the traditional model is simpler, such problems result in high false positive rates in immune-inspired intrusion detection systems. We believe using danger theory in a multi-agent environment that computationally emulates the behavior of natural immune systems is effective in reducing false positive rates. We first describe a simplified scenario of immune response in natural systems based on danger theory and then convert it to a computational model as a network protocol. In our protocol, we define several immune signals and model cell signaling via message passing between agents that emulate cells. Most messages include application-specific patterns that must be meaningfully extracted from various system properties. We show how to model these messages in practice by performing a case study on the problem of detecting distributed denial-of-service attacks in wireless sensor networks. We conduct a set of systematic experiments to find a set of performance metrics that can accurately distinguish malicious patterns. The results indicate that the system can be efficiently used to detect malicious patterns with a high level of accuracy.
Submitted 28 December, 2014; v1 submitted 31 December, 2013;
originally announced January 2014.
-
Machine Learning Techniques for Intrusion Detection
Authors:
Mahdi Zamani,
Mahnush Movahedi
Abstract:
An Intrusion Detection System (IDS) is software that monitors a single computer or a network of computers for malicious activities (attacks) that are aimed at stealing or censoring information or corrupting network protocols. Most techniques used in today's IDS are not able to deal with the dynamic and complex nature of cyber attacks on computer networks. Hence, efficient adaptive methods like various techniques of machine learning can result in higher detection rates, lower false alarm rates and reasonable computation and communication costs. In this paper, we study several such schemes and compare their performance. We divide the schemes into methods based on classical artificial intelligence (AI) and methods based on computational intelligence (CI). We explain how various characteristics of CI techniques can be used to build efficient IDS.
Submitted 9 May, 2015; v1 submitted 8 December, 2013;
originally announced December 2013.
-
Quorums Quicken Queries: Efficient Asynchronous Secure Multiparty Computation
Authors:
Varsha Dani,
Valerie King,
Mahnush Movahedi,
Jared Saia
Abstract:
We describe an asynchronous algorithm to solve secure multiparty computation (MPC) over n players, when strictly less than a 1/8 fraction of the players are controlled by a static adversary. For any function f over a field that can be computed by a circuit with m gates, our algorithm requires each player to send a number of field elements and perform an amount of computation that is $O(m/n + \sqrt{n})$. This significantly improves over traditional algorithms, which require each player to both send a number of messages and perform computation that is Ω(nm). Additionally, we define the threshold counting problem and present a distributed algorithm to solve it in the asynchronous communication model. Our algorithm is load balanced, with computation, communication and latency complexity of O(log n), and may be of independent interest to other applications with a load balancing goal in mind.
Submitted 13 October, 2013;
originally announced October 2013.
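The gap between the per-player cost of $m/n + \sqrt{n}$ and the traditional $nm$ quoted in the abstract above is easiest to see with concrete numbers. The Python sketch below drops all hidden constants and logarithmic factors, which is an assumption made only for illustration; the function names are likewise hypothetical.

```python
import math


def quorum_cost(m: int, n: int) -> float:
    # Per-player cost of the form m/n + sqrt(n), constants dropped.
    return m / n + math.sqrt(n)


def traditional_cost(m: int, n: int) -> int:
    # Per-player cost of the form n * m, constants dropped.
    return n * m


def improvement_factor(m: int, n: int) -> float:
    # How many times smaller the quorum-based cost is.
    return traditional_cost(m, n) / quorum_cost(m, n)
```

For a circuit with a million gates and ten thousand players, the first expression evaluates to 200 units of work per player while the second evaluates to ten billion, under the stated simplifications.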
-
Scalable Mechanisms for Rational Secret Sharing
Authors:
Varsha Dani,
Mahnush Movahedi,
Jared Saia
Abstract:
We consider the classical secret sharing problem in the case where all agents are selfish but rational. In recent work, Kol and Naor show that, when there are two players, in the non-simultaneous communication model, i.e. when rushing is possible, there is no Nash equilibrium that ensures both players learn the secret. However, they describe a mechanism for this problem, for any number of players, that is an epsilon-Nash equilibrium, in that no player can gain more than epsilon utility by deviating from it. Unfortunately, the Kol and Naor mechanism, and, to the best of our knowledge, all previous mechanisms for this problem require each agent to send O(n) messages in expectation, where n is the number of agents. This may be problematic for some applications of rational secret sharing such as secure multi-party computation and simulation of a mediator.
We address this issue by describing mechanisms for rational secret sharing that are designed for large n. Both of our results hold for n > 2, and are Nash equilibria, rather than just epsilon-Nash equilibria.
Our first result is a mechanism for n-out-of-n rational secret sharing that is scalable in the sense that it requires each agent to send only an expected O(log n) bits. Moreover, the latency of this mechanism is O(log n) in expectation, compared to O(n) expected latency for the Kol and Naor result. Our second result is a mechanism for a relaxed variant of rational m-out-of-n secret sharing where m = Theta(n). It requires each processor to send O(log n) bits and has O(log n) latency. Both of our mechanisms are non-cryptographic, and are not susceptible to backwards induction.
Submitted 2 May, 2012;
originally announced May 2012.
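For readers unfamiliar with the n-out-of-n setting mentioned in the abstract above, the following Python sketch shows plain additive secret sharing over a prime field, where all n shares are needed to recover the secret. It illustrates only the underlying sharing primitive, not the rational mechanisms of the paper; the modulus `PRIME` and the function names are assumptions chosen for the example.

```python
import secrets
from typing import List

# Field modulus chosen for illustration (the Mersenne prime 2^61 - 1).
PRIME = 2**61 - 1


def share(secret: int, n: int) -> List[int]:
    """Split secret into n additive shares modulo PRIME."""
    if n < 2 or not (0 <= secret < PRIME):
        raise ValueError("expected n >= 2 and 0 <= secret < PRIME")
    # The first n - 1 shares are uniformly random field elements.
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    # The last share is fixed so that all shares sum to the secret.
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def reconstruct(shares: List[int]) -> int:
    # Every share is required; any n - 1 of them are uniformly random.
    return sum(shares) % PRIME
```

In the rational setting the difficulty is not this arithmetic but incentives: a selfish player would prefer to receive everyone else's share while withholding its own, which is the problem the mechanisms in the abstract are designed to address.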
-
Secure Multi-Party Computation in Large Networks
Authors:
Varsha Dani,
Valerie King,
Mahnush Movahedi,
Jared Saia,
Mahdi Zamani
Abstract:
We describe scalable protocols for solving the secure multi-party computation (MPC) problem among a large number of parties. We consider both the synchronous and the asynchronous communication models. In the synchronous setting, our protocol is secure against a static malicious adversary corrupting less than a $1/3$ fraction of the parties. In the asynchronous setting, we allow the adversary to corrupt less than a $1/8$ fraction of parties. For any deterministic function that can be computed by an arithmetic circuit with $m$ gates, both of our protocols require each party to send a number of field elements and perform an amount of computation that is $\tilde{O}(m/n + \sqrt n)$. We also show that our protocols provide perfect and universally-composable security.
To achieve our asynchronous MPC result, we define the \emph{threshold counting problem} and present a distributed protocol to solve it in the asynchronous setting. This protocol is load balanced, with computation, communication and latency complexity of $O(\log{n})$, and can also be used for designing other load-balanced applications in the asynchronous communication model.
Submitted 27 September, 2015; v1 submitted 1 March, 2012;
originally announced March 2012.