Search | arXiv e-print repository

arXiv:2507.06261 [pdf, ps, other]

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu , et al. (3278 additional authors not shown)

Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde… ▽ More In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving. △ Less

Submitted 7 July, 2025; originally announced July 2025.

Comments: 72 pages, 17 figures

arXiv:2507.03786 [pdf, ps, other]

Bicriteria approximation for $k$-edge-connectivity

Authors: Zeev Nutov, Reut Cohen

Abstract: In the $k$-Edge Connected Spanning Subgraph ($k$-ECSS) problem we are given a (multi-)graph $G=(V,E)$ with edge costs and an integer $k$, and seek a min-cost $k$-edge-connected spanning subgraph of $G$. The problem admits a $2$-approximation algorithm and no better approximation ratio is known. Recently, Hershkowitz, Klein, and Zenklusen [STOC 24] gave a bicriteria $(1,k-10)$-approximation algorit… ▽ More In the $k$-Edge Connected Spanning Subgraph ($k$-ECSS) problem we are given a (multi-)graph $G=(V,E)$ with edge costs and an integer $k$, and seek a min-cost $k$-edge-connected spanning subgraph of $G$. The problem admits a $2$-approximation algorithm and no better approximation ratio is known. Recently, Hershkowitz, Klein, and Zenklusen [STOC 24] gave a bicriteria $(1,k-10)$-approximation algorithm that computes a $(k-10)$-edge-connected spanning subgraph of cost at most the optimal value of a standard Cut-LP for $k$-ECSS. We improve the bicriteria approximation to $(1,k-4)$, and also give another non-trivial bicriteria approximation $(3/2,k-2)$. The $k$-Edge-Connected Spanning Multi-subgraph ($k$-ECSM) problem is almost the same as $k$-ECSS, except that any edge can be selected multiple times at the same cost. A $(1,k-p)$ bicriteria approximation for $k$-ECSS w.r.t. Cut-LP implies approximation ratio $1+p/k$ for $k$-ECSM, hence our result also improves the approximation ratio for $k$-ECSM. △ Less

Submitted 4 July, 2025; originally announced July 2025.

arXiv:2505.21218 [pdf, ps, other]

Pretrained LLMs Learn Multiple Types of Uncertainty

Authors: Roi Cohen, Omri Fahn, Gerard de Melo

Abstract: Large Language Models are known to capture real-world knowledge, allowing them to excel in many downstream tasks. Despite recent advances, these models are still prone to what are commonly known as hallucinations, causing them to emit unwanted and factually incorrect text. In this work, we study how well LLMs capture uncertainty, without explicitly being trained for that. We show that, if consider… ▽ More Large Language Models are known to capture real-world knowledge, allowing them to excel in many downstream tasks. Despite recent advances, these models are still prone to what are commonly known as hallucinations, causing them to emit unwanted and factually incorrect text. In this work, we study how well LLMs capture uncertainty, without explicitly being trained for that. We show that, if considering uncertainty as a linear concept in the model's latent space, it might indeed be captured, even after only pretraining. We further show that, though unintuitive, LLMs appear to capture several different types of uncertainty, each of which can be useful to predict the correctness for a specific task or benchmark. Furthermore, we provide in-depth results such as demonstrating a correlation between our correction prediction and the model's ability to abstain from misinformation using words, and the lack of impact of model scaling for capturing uncertainty. Finally, we claim that unifying the uncertainty types as a single one using instruction-tuning or [IDK]-token tuning is helpful for the model in terms of correctness prediction. △ Less

Submitted 27 May, 2025; originally announced May 2025.

arXiv:2505.20487 [pdf, other]

InFact: Informativeness Alignment for Improved LLM Factuality

Authors: Roi Cohen, Russa Biswas, Gerard de Melo

Abstract: Factual completeness is a general term that captures how detailed and informative a factually correct text is. For instance, the factual sentence ``Barack Obama was born in the United States'' is factually correct, though less informative than the factual sentence ``Barack Obama was born in Honolulu, Hawaii, United States''. Despite the known fact that LLMs tend to hallucinate and generate factual… ▽ More Factual completeness is a general term that captures how detailed and informative a factually correct text is. For instance, the factual sentence ``Barack Obama was born in the United States'' is factually correct, though less informative than the factual sentence ``Barack Obama was born in Honolulu, Hawaii, United States''. Despite the known fact that LLMs tend to hallucinate and generate factually incorrect text, they might also tend to choose to generate factual text that is indeed factually correct and yet less informative than other, more informative choices. In this work, we tackle this problem by proposing an informativeness alignment mechanism. This mechanism takes advantage of recent factual benchmarks to propose an informativeness alignment objective. This objective prioritizes answers that are both correct and informative. A key finding of our work is that when training a model to maximize this objective or optimize its preference, we can improve not just informativeness but also factuality. △ Less

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2504.01481 [pdf, other]

Identifying Obfuscated Code through Graph-Based Semantic Analysis of Binary Code

Authors: Roxane Cohen, Robin David, Florian Yger, Fabrice Rossi

Abstract: Protecting sensitive program content is a critical issue in various situations, ranging from legitimate use cases to unethical contexts. Obfuscation is one of the most used techniques to ensure such protection. Consequently, attackers must first detect and characterize obfuscation before launching any attack against it. This paper investigates the problem of function-level obfuscation detection us… ▽ More Protecting sensitive program content is a critical issue in various situations, ranging from legitimate use cases to unethical contexts. Obfuscation is one of the most used techniques to ensure such protection. Consequently, attackers must first detect and characterize obfuscation before launching any attack against it. This paper investigates the problem of function-level obfuscation detection using graph-based approaches, comparing algorithms, from elementary baselines to promising techniques like GNN (Graph Neural Networks), on different feature choices. We consider various obfuscation types and obfuscators, resulting in two complex datasets. Our findings demonstrate that GNNs need meaningful features that capture aspects of function semantics to outperform baselines. Our approach shows satisfactory results, especially in a challenging 11-class classification task and in a practical malware analysis example. △ Less

Submitted 2 April, 2025; originally announced April 2025.

Comments: The 13th International Conference on Complex Networks and their Applications, Dec 2024, Istabul, Turkey

arXiv:2503.20429 [pdf, other]

Latent Beam Diffusion Models for Decoding Image Sequences

Authors: Guilherme Fernandes, Vasco Ramos, Regev Cohen, Idan Szpektor, João Magalhães

Abstract: While diffusion models excel at generating high-quality images from text prompts, they struggle with visual consistency in image sequences. Existing methods generate each image independently, leading to disjointed narratives - a challenge further exacerbated in non-linear storytelling, where scenes must connect beyond adjacent frames. We introduce a novel beam search strategy for latent space expl… ▽ More While diffusion models excel at generating high-quality images from text prompts, they struggle with visual consistency in image sequences. Existing methods generate each image independently, leading to disjointed narratives - a challenge further exacerbated in non-linear storytelling, where scenes must connect beyond adjacent frames. We introduce a novel beam search strategy for latent space exploration, enabling conditional generation of full image sequences with beam search decoding. Unlike prior approaches that use fixed latent priors, our method dynamically searches for an optimal sequence of latent representations, ensuring coherent visual transitions. As the latent denoising space is explored, the beam search graph is pruned with a cross-attention mechanism that efficiently scores search paths, prioritizing alignment with both textual prompts and visual context. Human and automatic evaluations confirm that BeamDiffusion outperforms other baseline methods, producing full sequences with superior coherence, visual continuity, and textual alignment. △ Less

Submitted 28 May, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

arXiv:2502.00012 [pdf, ps, other]

Lessons from complexity theory for AI governance

Authors: Noam Kolt, Michal Shur-Ofry, Reuven Cohen

Abstract: The study of complex adaptive systems, pioneered in physics, biology, and the social sciences, offers important lessons for AI governance. Contemporary AI systems and the environments in which they operate exhibit many of the properties characteristic of complex systems, including nonlinear growth patterns, emergent phenomena, and cascading effects that can lead to tail risks. Complexity theory ca… ▽ More The study of complex adaptive systems, pioneered in physics, biology, and the social sciences, offers important lessons for AI governance. Contemporary AI systems and the environments in which they operate exhibit many of the properties characteristic of complex systems, including nonlinear growth patterns, emergent phenomena, and cascading effects that can lead to tail risks. Complexity theory can help illuminate the features of AI that pose central challenges for policymakers, such as feedback loops induced by training AI models on synthetic data and the interconnectedness between AI systems and critical infrastructure. Drawing on insights from other domains shaped by complex systems, including public health and climate change, we examine how efforts to govern AI are marked by deep uncertainty. To contend with this challenge, we propose a set of complexity-compatible principles concerning the timing and structure of AI governance, and the risk thresholds that should trigger regulatory intervention. △ Less

Submitted 3 March, 2025; v1 submitted 7 January, 2025; originally announced February 2025.

arXiv:2412.06676 [pdf, other]

I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token

Authors: Roi Cohen, Konstantin Dobler, Eden Biran, Gerard de Melo

Abstract: Large Language Models are known to capture real-world knowledge, allowing them to excel in many downstream tasks. Despite recent advances, these models are still prone to what are commonly known as hallucinations, causing them to emit unwanted and factually incorrect text. In this work, we propose a novel calibration method that can be used to combat hallucinations. We add a special [IDK] ("I don'… ▽ More Large Language Models are known to capture real-world knowledge, allowing them to excel in many downstream tasks. Despite recent advances, these models are still prone to what are commonly known as hallucinations, causing them to emit unwanted and factually incorrect text. In this work, we propose a novel calibration method that can be used to combat hallucinations. We add a special [IDK] ("I don't know") token to the model's vocabulary and introduce an objective function that shifts probability mass to the [IDK] token for incorrect predictions. This approach allows the model to express uncertainty in its output explicitly. We evaluate our proposed method across multiple model architectures and factual downstream tasks. We find that models trained with our method are able to express uncertainty in places where they would previously make mistakes while suffering only a small loss of encoded knowledge. We further perform extensive ablation studies of multiple variations of our approach and provide a detailed analysis of the precision-recall tradeoff of our method. △ Less

Submitted 9 December, 2024; originally announced December 2024.

Comments: Published at NeurIPS 2024

arXiv:2411.11869 [pdf, other]

A Multi-Modal Unsupervised Machine Learning Approach for Biomedical Signal Processing in CPR

Authors: Saidul Islam, Jamal Bentahar, Robin Cohen, Gaith Rjoub

Abstract: Cardiopulmonary resuscitation (CPR) is a critical, life-saving intervention aimed at restoring blood circulation and breathing in individuals experiencing cardiac arrest or respiratory failure. Accurate and real-time analysis of biomedical signals during CPR is essential for monitoring and decision-making, from the pre-hospital stage to the intensive care unit (ICU). However, CPR signals are often… ▽ More Cardiopulmonary resuscitation (CPR) is a critical, life-saving intervention aimed at restoring blood circulation and breathing in individuals experiencing cardiac arrest or respiratory failure. Accurate and real-time analysis of biomedical signals during CPR is essential for monitoring and decision-making, from the pre-hospital stage to the intensive care unit (ICU). However, CPR signals are often corrupted by noise and artifacts, making precise interpretation challenging. Traditional denoising methods, such as filters, struggle to adapt to the varying and complex noise patterns present in CPR signals. Given the high-stakes nature of CPR, where rapid and accurate responses can determine survival, there is a pressing need for more robust and adaptive denoising techniques. In this context, an unsupervised machine learning (ML) methodology is particularly valuable, as it removes the dependence on labeled data, which can be scarce or impractical in emergency scenarios. This paper introduces a novel unsupervised ML approach for denoising CPR signals using a multi-modality framework, which leverages multiple signal sources to enhance the denoising process. The proposed approach not only improves noise reduction and signal fidelity but also preserves critical inter-signal correlations (0.9993) which is crucial for downstream tasks. Furthermore, it outperforms existing methods in an unsupervised context in terms of signal-to-noise ratio (SNR) and peak signal-to-noise ratio (PSNR), making it highly effective for real-time applications. The integration of multi-modality further enhances the system's adaptability to various biomedical signals beyond CPR, improving both automated CPR systems and clinical decision-making. △ Less

Submitted 3 November, 2024; originally announced November 2024.

arXiv:2411.03131 [pdf, other]

Machine Learning Innovations in CPR: A Comprehensive Survey on Enhanced Resuscitation Techniques

Authors: Saidul Islam, Gaith Rjoub, Hanae Elmekki, Jamal Bentahar, Witold Pedrycz, Robin Cohen

Abstract: This survey paper explores the transformative role of Machine Learning (ML) and Artificial Intelligence (AI) in Cardiopulmonary Resuscitation (CPR). It examines the evolution from traditional CPR methods to innovative ML-driven approaches, highlighting the impact of predictive modeling, AI-enhanced devices, and real-time data analysis in improving resuscitation outcomes. The paper provides a compr… ▽ More This survey paper explores the transformative role of Machine Learning (ML) and Artificial Intelligence (AI) in Cardiopulmonary Resuscitation (CPR). It examines the evolution from traditional CPR methods to innovative ML-driven approaches, highlighting the impact of predictive modeling, AI-enhanced devices, and real-time data analysis in improving resuscitation outcomes. The paper provides a comprehensive overview, classification, and critical analysis of current applications, challenges, and future directions in this emerging field. △ Less

Submitted 3 November, 2024; originally announced November 2024.

arXiv:2410.19791 [pdf, other]

Data-Driven Cellular Network Selector for Vehicle Teleoperations

Authors: Barak Gahtan, Reuven Cohen, Alex M. Bronstein, Eli Shapira

Abstract: Remote control of robotic systems, also known as teleoperation, is crucial for the development of autonomous vehicle (AV) technology. It allows a remote operator to view live video from AVs and, in some cases, to make real-time decisions. The effectiveness of video-based teleoperation systems is heavily influenced by the quality of the cellular network and, in particular, its packet loss rate and… ▽ More Remote control of robotic systems, also known as teleoperation, is crucial for the development of autonomous vehicle (AV) technology. It allows a remote operator to view live video from AVs and, in some cases, to make real-time decisions. The effectiveness of video-based teleoperation systems is heavily influenced by the quality of the cellular network and, in particular, its packet loss rate and latency. To optimize these parameters, an AV can be connected to multiple cellular networks and determine in real time over which cellular network each video packet will be transmitted. We present an algorithm, called Active Network Selector (ANS), which uses a time series machine learning approach for solving this problem. We compare ANS to a baseline non-learning algorithm, which is used today in commercial systems, and show that ANS performs much better, with respect to both packet loss and packet latency. △ Less

Submitted 15 October, 2024; originally announced October 2024.

Comments: IEEE Network of Future 2024

arXiv:2410.06140 [pdf, other]

Estimating the Number of HTTP/3 Responses in QUIC Using Deep Learning

Authors: Barak Gahtan, Robert J. Shahla, Reuven Cohen, Alex M. Bronstein

Abstract: QUIC, a new and increasingly used transport protocol, enhances TCP by offering improved security, performance, and stream multiplexing. These features, however, also impose challenges for network middle-boxes that need to monitor and analyze web traffic. This paper proposes a novel method to estimate the number of HTTP/3 responses in a given QUIC connection by an observer. This estimation reveals… ▽ More QUIC, a new and increasingly used transport protocol, enhances TCP by offering improved security, performance, and stream multiplexing. These features, however, also impose challenges for network middle-boxes that need to monitor and analyze web traffic. This paper proposes a novel method to estimate the number of HTTP/3 responses in a given QUIC connection by an observer. This estimation reveals server behavior, client-server interactions, and data transmission efficiency, which is crucial for various applications such as designing a load balancing solution and detecting HTTP/3 flood attacks. The proposed scheme transforms QUIC connection traces into image sequences and uses machine learning (ML) models, guided by a tailored loss function, to predict response counts. Evaluations on more than seven million images-derived from 100,000 traces collected across 44,000 websites over four months-achieve up to 97% accuracy in both known and unknown server settings and 92% accuracy on previously unseen complete QUIC traces. △ Less

Submitted 28 April, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

arXiv:2410.03728 [pdf, other]

Exploring QUIC Dynamics: A Large-Scale Dataset for Encrypted Traffic Analysis

Authors: Barak Gahtan, Robert J. Shahla, Alex M. Bronstein, Reuven Cohen

Abstract: The increasing adoption of the QUIC transport protocol has transformed encrypted web traffic, necessitating new methodologies for network analysis. However, existing datasets lack the scope, metadata, and decryption capabilities required for robust benchmarking in encrypted traffic research. We introduce VisQUIC, a large-scale dataset of 100,000 labeled QUIC traces from over 44,000 websites, colle… ▽ More The increasing adoption of the QUIC transport protocol has transformed encrypted web traffic, necessitating new methodologies for network analysis. However, existing datasets lack the scope, metadata, and decryption capabilities required for robust benchmarking in encrypted traffic research. We introduce VisQUIC, a large-scale dataset of 100,000 labeled QUIC traces from over 44,000 websites, collected over four months. Unlike prior datasets, VisQUIC provides SSL keys for controlled decryption, supports multiple QUIC implementations (Chromium QUIC, Facebooks mvfst, Cloudflares quiche), and introduces a novel image-based representation that enables machine learning-driven encrypted traffic analysis. The dataset includes standardized benchmarking tools, ensuring reproducibility. To demonstrate VisQUICs utility, we present a benchmarking task for estimating HTTP/3 responses in encrypted QUIC traffic, achieving 97% accuracy using only observable packet features. By publicly releasing VisQUIC, we provide an open foundation for advancing encrypted traffic analysis, QUIC security research, and network monitoring. △ Less

Submitted 24 May, 2025; v1 submitted 30 September, 2024; originally announced October 2024.

Comments: The dataset and the supplementary material can be provided upon request

arXiv:2410.02914 [pdf, other]

Streamlining Conformal Information Retrieval via Score Refinement

Authors: Yotam Intrator, Ori Kelner, Regev Cohen, Roman Goldenberg, Ehud Rivlin, Daniel Freedman

Abstract: Information retrieval (IR) methods, like retrieval augmented generation, are fundamental to modern applications but often lack statistical guarantees. Conformal prediction addresses this by retrieving sets guaranteed to include relevant information, yet existing approaches produce large-sized sets, incurring high computational costs and slow response times. In this work, we introduce a score refin… ▽ More Information retrieval (IR) methods, like retrieval augmented generation, are fundamental to modern applications but often lack statistical guarantees. Conformal prediction addresses this by retrieving sets guaranteed to include relevant information, yet existing approaches produce large-sized sets, incurring high computational costs and slow response times. In this work, we introduce a score refinement method that applies a simple monotone transformation to retrieval scores, leading to significantly smaller conformal sets while maintaining their statistical guarantees. Experiments on various BEIR benchmarks validate the effectiveness of our approach in producing compact sets containing relevant information. △ Less

Submitted 3 October, 2024; originally announced October 2024.

Comments: 6 pages

arXiv:2408.12570 [pdf, other]

Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

Authors: Jamba Team, Barak Lenz, Alan Arazi, Amir Bergman, Avshalom Manevich, Barak Peleg, Ben Aviram, Chen Almagor, Clara Fridman, Dan Padnos, Daniel Gissin, Daniel Jannai, Dor Muhlgay, Dor Zimberg, Edden M Gerber, Elad Dolev, Eran Krakovsky, Erez Safahi, Erez Schwartz, Gal Cohen, Gal Shachaf, Haim Rozenblum, Hofit Bata, Ido Blass, Inbal Magar , et al. (36 additional authors not shown)

Abstract: We present Jamba-1.5, new instruction-tuned large language models based on our Jamba architecture. Jamba is a hybrid Transformer-Mamba mixture of experts architecture, providing high throughput and low memory usage across context lengths, while retaining the same or better quality as Transformer models. We release two model sizes: Jamba-1.5-Large, with 94B active parameters, and Jamba-1.5-Mini, wi… ▽ More We present Jamba-1.5, new instruction-tuned large language models based on our Jamba architecture. Jamba is a hybrid Transformer-Mamba mixture of experts architecture, providing high throughput and low memory usage across context lengths, while retaining the same or better quality as Transformer models. We release two model sizes: Jamba-1.5-Large, with 94B active parameters, and Jamba-1.5-Mini, with 12B active parameters. Both models are fine-tuned for a variety of conversational and instruction-following capabilties, and have an effective context length of 256K tokens, the largest amongst open-weight models. To support cost-effective inference, we introduce ExpertsInt8, a novel quantization technique that allows fitting Jamba-1.5-Large on a machine with 8 80GB GPUs when processing 256K-token contexts without loss of quality. When evaluated on a battery of academic and chatbot benchmarks, Jamba-1.5 models achieve excellent results while providing high throughput and outperforming other open-weight models on long-context benchmarks. The model weights for both sizes are publicly available under the Jamba Open Model License and we release ExpertsInt8 as open source. △ Less

Submitted 22 August, 2024; originally announced August 2024.

Comments: Webpage: https://www.ai21.com/jamba

arXiv:2407.15153 [pdf, other]

Anchored Diffusion for Video Face Reenactment

Authors: Idan Kligvasser, Regev Cohen, George Leifman, Ehud Rivlin, Michael Elad

Abstract: Video generation has drawn significant interest recently, pushing the development of large-scale models capable of producing realistic videos with coherent motion. Due to memory constraints, these models typically generate short video segments that are then combined into long videos. The merging process poses a significant challenge, as it requires ensuring smooth transitions and overall consisten… ▽ More Video generation has drawn significant interest recently, pushing the development of large-scale models capable of producing realistic videos with coherent motion. Due to memory constraints, these models typically generate short video segments that are then combined into long videos. The merging process poses a significant challenge, as it requires ensuring smooth transitions and overall consistency. In this paper, we introduce Anchored Diffusion, a novel method for synthesizing relatively long and seamless videos. We extend Diffusion Transformers (DiTs) to incorporate temporal information, creating our sequence-DiT (sDiT) model for generating short video segments. Unlike previous works, we train our model on video sequences with random non-uniform temporal spacing and incorporate temporal information via external guidance, increasing flexibility and allowing it to capture both short and long-term relationships. Furthermore, during inference, we leverage the transformer architecture to modify the diffusion process, generating a batch of non-uniform sequences anchored to a common frame, ensuring consistency regardless of temporal distance. To demonstrate our method, we focus on face reenactment, the task of creating a video from a source image that replicates the facial expressions and movements from a driving video. Through comprehensive experiments, we show our approach outperforms current techniques in producing longer consistent high-quality videos while offering editing capabilities. △ Less

Submitted 21 July, 2024; originally announced July 2024.

arXiv:2406.07024 [pdf, other]

Plant-and-Steal: Truthful Fair Allocations via Predictions

Authors: Ilan Reuven Cohen, Alon Eden, Talya Eden, Arsen Vasilyan

Abstract: We study truthful mechanisms for approximating the Maximin-Share (MMS) allocation of agents with additive valuations for indivisible goods. Algorithmically, constant factor approximations exist for the problem for any number of agents. When adding incentives to the mix, a jarring result by Amanatidis, Birmpas, Christodoulou, and Markakis [EC 2017] shows that the best possible approximation for two… ▽ More We study truthful mechanisms for approximating the Maximin-Share (MMS) allocation of agents with additive valuations for indivisible goods. Algorithmically, constant factor approximations exist for the problem for any number of agents. When adding incentives to the mix, a jarring result by Amanatidis, Birmpas, Christodoulou, and Markakis [EC 2017] shows that the best possible approximation for two agents and $m$ items is $\lfloor \frac{m}{2} \rfloor$. We adopt a learning-augmented framework to investigate what is possible when some prediction on the input is given. For two agents, we give a truthful mechanism that takes agents' ordering over items as prediction. When the prediction is accurate, we give a $2$-approximation to the MMS (consistency), and when the prediction is off, we still get an $\lceil \frac{m}{2} \rceil$-approximation to the MMS (robustness). We further show that the mechanism's performance degrades gracefully in the number of ``mistakes" in the prediction; i.e., we interpolate (up to constant factors) between the two extremes: when there are no mistakes, and when there is a maximum number of mistakes. We also show an impossibility result on the obtainable consistency for mechanisms with finite robustness. For the general case of $n\ge 2$ agents, we give a 2-approximation mechanism for accurate predictions, with relaxed fallback guarantees. Finally, we give experimental results which illustrate when different components of our framework, made to insure consistency and robustness, come into play. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.16475 [pdf, other]

Looks Too Good To Be True: An Information-Theoretic Analysis of Hallucinations in Generative Restoration Models

Authors: Regev Cohen, Idan Kligvasser, Ehud Rivlin, Daniel Freedman

Abstract: The pursuit of high perceptual quality in image restoration has driven the development of revolutionary generative models, capable of producing results often visually indistinguishable from real data. However, as their perceptual quality continues to improve, these models also exhibit a growing tendency to generate hallucinations - realistic-looking details that do not exist in the ground truth im… ▽ More The pursuit of high perceptual quality in image restoration has driven the development of revolutionary generative models, capable of producing results often visually indistinguishable from real data. However, as their perceptual quality continues to improve, these models also exhibit a growing tendency to generate hallucinations - realistic-looking details that do not exist in the ground truth images. Hallucinations in these models create uncertainty about their reliability, raising major concerns about their practical application. This paper investigates this phenomenon through the lens of information theory, revealing a fundamental tradeoff between uncertainty and perception. We rigorously analyze the relationship between these two factors, proving that the global minimal uncertainty in generative models grows in tandem with perception. In particular, we define the inherent uncertainty of the restoration problem and show that attaining perfect perceptual quality entails at least twice this uncertainty. Additionally, we establish a relation between distortion, uncertainty and perception, through which we prove the aforementioned uncertainly-perception tradeoff induces the well-known perception-distortion tradeoff. We demonstrate our theoretical findings through experiments with super-resolution and inpainting algorithms. This work uncovers fundamental limitations of generative models in achieving both high perceptual quality and reliable predictions for image restoration. Thus, we aim to raise awareness among practitioners about this inherent tradeoff, empowering them to make informed decisions and potentially prioritize safety over perceptual performance. △ Less

Submitted 25 October, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.11566 [pdf, other]

Uncertainty-Aware PPG-2-ECG for Enhanced Cardiovascular Diagnosis using Diffusion Models

Authors: Omer Belhasin, Idan Kligvasser, George Leifman, Regev Cohen, Erin Rainaldi, Li-Fang Cheng, Nishant Verma, Paul Varghese, Ehud Rivlin, Michael Elad

Abstract: Analyzing the cardiovascular system condition via Electrocardiography (ECG) is a common and highly effective approach, and it has been practiced and perfected over many decades. ECG sensing is non-invasive and relatively easy to acquire, and yet it is still cumbersome for holter monitoring tests that may span over hours and even days. A possible alternative in this context is Photoplethysmography… ▽ More Analyzing the cardiovascular system condition via Electrocardiography (ECG) is a common and highly effective approach, and it has been practiced and perfected over many decades. ECG sensing is non-invasive and relatively easy to acquire, and yet it is still cumbersome for holter monitoring tests that may span over hours and even days. A possible alternative in this context is Photoplethysmography (PPG): An optically-based signal that measures blood volume fluctuations, as typically sensed by conventional ``wearable devices''. While PPG presents clear advantages in acquisition, convenience, and cost-effectiveness, ECG provides more comprehensive information, allowing for a more precise detection of heart conditions. This implies that a conversion from PPG to ECG, as recently discussed in the literature, inherently involves an unavoidable level of uncertainty. In this paper we introduce a novel methodology for addressing the PPG-2-ECG conversion, and offer an enhanced classification of cardiovascular conditions using the given PPG, all while taking into account the uncertainties arising from the conversion process. We provide a mathematical justification for our proposed computational approach, and present empirical studies demonstrating its superior performance compared to state-of-the-art baseline methods. △ Less

Submitted 20 April, 2025; v1 submitted 19 May, 2024; originally announced May 2024.

arXiv:2404.16859 [pdf, other]

Rumour Evaluation with Very Large Language Models

Authors: Dahlia Shehata, Robin Cohen, Charles Clarke

Abstract: Conversational prompt-engineering-based large language models (LLMs) have enabled targeted control over the output creation, enhancing versatility, adaptability and adhoc retrieval. From another perspective, digital misinformation has reached alarming levels. The anonymity, availability and reach of social media offer fertile ground for rumours to propagate. This work proposes to leverage the adva… ▽ More Conversational prompt-engineering-based large language models (LLMs) have enabled targeted control over the output creation, enhancing versatility, adaptability and adhoc retrieval. From another perspective, digital misinformation has reached alarming levels. The anonymity, availability and reach of social media offer fertile ground for rumours to propagate. This work proposes to leverage the advancement of prompting-dependent LLMs to combat misinformation by extending the research efforts of the RumourEval task on its Twitter dataset. To the end, we employ two prompting-based LLM variants (GPT-3.5-turbo and GPT-4) to extend the two RumourEval subtasks: (1) veracity prediction, and (2) stance classification. For veracity prediction, three classifications schemes are experimented per GPT variant. Each scheme is tested in zero-, one- and few-shot settings. Our best results outperform the precedent ones by a substantial margin. For stance classification, prompting-based-approaches show comparable performance to prior results, with no improvement over finetuning methods. Rumour stance subtask is also extended beyond the original setting to allow multiclass classification. All of the generated predictions for both subtasks are equipped with confidence scores determining their trustworthiness degree according to the LLM, and post-hoc justifications for explainability and interpretability purposes. Our primary aim is AI for social good. △ Less

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2402.12423 [pdf, other]

On the Semantic Latent Space of Diffusion-Based Text-to-Speech Models

Authors: Miri Varshavsky-Hassid, Roy Hirsch, Regev Cohen, Tomer Golany, Daniel Freedman, Ehud Rivlin

Abstract: The incorporation of Denoising Diffusion Models (DDMs) in the Text-to-Speech (TTS) domain is rising, providing great value in synthesizing high quality speech. Although they exhibit impressive audio quality, the extent of their semantic capabilities is unknown, and controlling their synthesized speech's vocal properties remains a challenge. Inspired by recent advances in image synthesis, we explor… ▽ More The incorporation of Denoising Diffusion Models (DDMs) in the Text-to-Speech (TTS) domain is rising, providing great value in synthesizing high quality speech. Although they exhibit impressive audio quality, the extent of their semantic capabilities is unknown, and controlling their synthesized speech's vocal properties remains a challenge. Inspired by recent advances in image synthesis, we explore the latent space of frozen TTS models, which is composed of the latent bottleneck activations of the DDM's denoiser. We identify that this space contains rich semantic information, and outline several novel methods for finding semantic directions within it, both supervised and unsupervised. We then demonstrate how these enable off-the-shelf audio editing, without any further training, architectural changes or data requirements. We present evidence of the semantic and acoustic qualities of the edited audio, and provide supplemental samples: https://latent-analysis-grad-tts.github.io/speech-samples/. △ Less

Submitted 4 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: Accepted to ACL 2024

arXiv:2402.00857 [pdf, other]

Early Time Classification with Accumulated Accuracy Gap Control

Authors: Liran Ringel, Regev Cohen, Daniel Freedman, Michael Elad, Yaniv Romano

Abstract: Early time classification algorithms aim to label a stream of features without processing the full input stream, while maintaining accuracy comparable to that achieved by applying the classifier to the entire input. In this paper, we introduce a statistical framework that can be applied to any sequential classifier, formulating a calibrated stopping rule. This data-driven rule attains finite-sampl… ▽ More Early time classification algorithms aim to label a stream of features without processing the full input stream, while maintaining accuracy comparable to that achieved by applying the classifier to the entire input. In this paper, we introduce a statistical framework that can be applied to any sequential classifier, formulating a calibrated stopping rule. This data-driven rule attains finite-sample, distribution-free control of the accuracy gap between full and early-time classification. We start by presenting a novel method that builds on the Learn-then-Test calibration framework to control this gap marginally, on average over i.i.d. instances. As this algorithm tends to yield an excessively high accuracy gap for early halt times, our main contribution is the proposal of a framework that controls a stronger notion of error, where the accuracy gap is controlled conditionally on the accumulated halt times. Numerical experiments demonstrate the effectiveness, applicability, and usefulness of our method. We show that our proposed early stopping mechanism reduces up to 94% of timesteps used for classification while achieving rigorous accuracy gap control. △ Less

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.02501 [pdf, other]

A Kolmogorov metric embedding for live cell microscopy signaling patterns

Authors: Layton Aho, Mark Winter, Marc DeCarlo, Agne Frismantiene, Yannick Blum, Paolo Armando Gagliardi, Olivier Pertz, Andrew R. Cohen

Abstract: We present a metric embedding that captures spatiotemporal patterns of cell signaling dynamics in 5-D $(x,y,z,channel,time)$ live cell microscopy movies. The embedding uses a metric distance called the normalized information distance (NID) based on Kolmogorov complexity theory, an absolute measure of information content between digital objects. The NID uses statistics of lossless compression to co… ▽ More We present a metric embedding that captures spatiotemporal patterns of cell signaling dynamics in 5-D $(x,y,z,channel,time)$ live cell microscopy movies. The embedding uses a metric distance called the normalized information distance (NID) based on Kolmogorov complexity theory, an absolute measure of information content between digital objects. The NID uses statistics of lossless compression to compute a theoretically optimal metric distance between pairs of 5-D movies, requiring no a priori knowledge of expected pattern dynamics, and no training data. The cell signaling structure function (SSF) is defined using a class of metric 3-D image filters that compute at each spatiotemporal cell centroid the voxel intensity configuration of the nucleus w.r.t. the surrounding cytoplasm, or a functional output e.g. velocity. The only parameter is the expected cell radii ($μm$). The SSF can be optionally combined with segmentation and tracking algorithms. The resulting lossless compression pipeline represents each 5-D input movie as a single point in a metric embedding space. The utility of a metric embedding follows from Euclidean distance between any points in the embedding space approximating optimally the pattern difference, as measured by the NID, between corresponding pairs of 5-D movies. This is true throughout the embedding space, not only at points corresponding to input images. Examples are shown for synthetic data, for 2-D+time movies of ERK and AKT signaling under different oncogenic mutations in human epithelial (MCF10A) cells, for 3-D MCF10A spheroids under optogenetic manipulation of ERK, and for ERK dynamics during colony differentiation in human induced pluripotent stem cells. △ Less

Submitted 5 February, 2025; v1 submitted 4 January, 2024; originally announced January 2024.

arXiv:2312.14506 [pdf, other]

Concurrent Asynchronous Byzantine Agreement in Expected-Constant Rounds, Revisited

Authors: Ran Cohen, Pouyan Forghani, Juan Garay, Rutvik Patel, Vassilis Zikas

Abstract: It is well known that without randomization, Byzantine agreement (BA) requires a linear number of rounds in the synchronous setting, while it is flat out impossible in the asynchronous setting. The primitive which allows to bypass the above limitation is known as oblivious common coin (OCC). It allows parties to agree with constant probability on a random coin, where agreement is oblivious, i.e.,… ▽ More It is well known that without randomization, Byzantine agreement (BA) requires a linear number of rounds in the synchronous setting, while it is flat out impossible in the asynchronous setting. The primitive which allows to bypass the above limitation is known as oblivious common coin (OCC). It allows parties to agree with constant probability on a random coin, where agreement is oblivious, i.e., players are not aware whether or not agreement has been achieved. The starting point of our work is the observation that no known protocol exists for information-theoretic multi-valued OCC with optimal resiliency in the asynchronous setting (with eventual message delivery). This apparent hole in the literature is particularly problematic, as multi-valued OCC is implicitly or explicitly used in several constructions. In this paper, we present the first information-theoretic multi-valued OCC protocol in the asynchronous setting with optimal resiliency, i.e., tolerating $t < n/3$ corruptions, thereby filling this important gap. Further, our protocol efficiently implements OCC with an exponential-size domain, a property which is not even achieved by known constructions in the simpler, synchronous setting. We then turn to the problem of round-preserving parallel composition of asynchronous BA. A protocol for this task was proposed by Ben-Or and El-Yaniv [Distributed Computing '03]. Their construction, however, is flawed in several ways. Thus, as a second contribution, we provide a simpler, more modular protocol for the above task. Finally, and as a contribution of independent interest, we provide proofs in Canetti's Universal Composability framework; this makes our work the first one offering composability guarantees, which are important as BA is a core building block of secure multi-party computation protocols. △ Less

Submitted 22 December, 2023; originally announced December 2023.

Comments: A preliminary version of this work appeared in TCC 2023

arXiv:2310.17209 [pdf, other]

Weakly-Supervised Surgical Phase Recognition

Authors: Roy Hirsch, Regev Cohen, Mathilde Caron, Tomer Golany, Daniel Freedman, Ehud Rivlin

Abstract: A key element of computer-assisted surgery systems is phase recognition of surgical videos. Existing phase recognition algorithms require frame-wise annotation of a large number of videos, which is time and money consuming. In this work we join concepts of graph segmentation with self-supervised learning to derive a random-walk solution for per-frame phase prediction. Furthermore, we utilize withi… ▽ More A key element of computer-assisted surgery systems is phase recognition of surgical videos. Existing phase recognition algorithms require frame-wise annotation of a large number of videos, which is time and money consuming. In this work we join concepts of graph segmentation with self-supervised learning to derive a random-walk solution for per-frame phase prediction. Furthermore, we utilize within our method two forms of weak supervision: sparse timestamps or few-shot learning. The proposed algorithm enjoys low complexity and can operate in lowdata regimes. We validate our method by running experiments with the public Cholec80 dataset of laparoscopic cholecystectomy videos, demonstrating promising performance in multiple setups. △ Less

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2309.01466 [pdf, ps, other]

Communication Lower Bounds for Cryptographic Broadcast Protocols

Authors: Erica Blum, Elette Boyle, Ran Cohen, Chen-Da Liu-Zhang

Abstract: Broadcast protocols enable a set of $n$ parties to agree on the input of a designated sender, even facing attacks by malicious parties. In the honest-majority setting, randomization and cryptography were harnessed to achieve low-communication broadcast with sub-quadratic total communication and balanced sub-linear cost per party. However, comparatively little is known in the dishonest-majority set… ▽ More Broadcast protocols enable a set of $n$ parties to agree on the input of a designated sender, even facing attacks by malicious parties. In the honest-majority setting, randomization and cryptography were harnessed to achieve low-communication broadcast with sub-quadratic total communication and balanced sub-linear cost per party. However, comparatively little is known in the dishonest-majority setting. Here, the most communication-efficient constructions are based on Dolev and Strong (SICOMP '83), and sub-quadratic broadcast has not been achieved. On the other hand, the only nontrivial $ω(n)$ communication lower bounds are restricted to deterministic protocols, or against strong adaptive adversaries that can perform "after the fact" removal of messages. We provide new communication lower bounds in this space, which hold against arbitrary cryptography and setup assumptions, as well as a simple protocol showing near tightness of our first bound. 1) We demonstrate a tradeoff between resiliency and communication for protocols secure against $n-o(n)$ static corruptions. For example, $Ω(n\cdot {\sf polylog}(n))$ messages are needed when the number of honest parties is $n/{\sf polylog}(n)$; $Ω(n\sqrt{n})$ messages are needed for $O(\sqrt{n})$ honest parties; and $Ω(n^2)$ messages are needed for $O(1)$ honest parties. Complementarily, we demonstrate broadcast with $O(n\cdot{\sf polylog}(n))$ total communication facing any constant fraction of static corruptions. 2) Our second bound considers $n/2 + k$ corruptions and a weakly adaptive adversary that cannot remove messages "after the fact." We show that any broadcast protocol within this setting can be attacked to force an arbitrary party to send messages to $k$ other parties. This rules out, for example, broadcast facing 51% corruptions in which all non-sender parties have sublinear communication locality. △ Less

Submitted 4 September, 2023; originally announced September 2023.

Comments: A preliminary version of this work appeared in DISC 2023

arXiv:2308.16585 [pdf]

doi 10.1016/S2589-7500(23)00135-8

Development and validation of an interpretable machine learning-based calculator for predicting 5-year weight trajectories after bariatric surgery: a multinational retrospective cohort SOPHIA study

Authors: Patrick Saux, Pierre Bauvin, Violeta Raverdy, Julien Teigny, Hélène Verkindt, Tomy Soumphonphakdy, Maxence Debert, Anne Jacobs, Daan Jacobs, Valerie Monpellier, Phong Ching Lee, Chin Hong Lim, Johanna C Andersson-Assarsson, Lena Carlsson, Per-Arne Svensson, Florence Galtier, Guelareh Dezfoulian, Mihaela Moldovanu, Severine Andrieux, Julien Couster, Marie Lepage, Erminia Lembo, Ornella Verrastro, Maud Robert, Paulina Salminen , et al. (9 additional authors not shown)

Abstract: Background Weight loss trajectories after bariatric surgery vary widely between individuals, and predicting weight loss before the operation remains challenging. We aimed to develop a model using machine learning to provide individual preoperative prediction of 5-year weight loss trajectories after surgery. Methods In this multinational retrospective observational study we enrolled adult participa… ▽ More Background Weight loss trajectories after bariatric surgery vary widely between individuals, and predicting weight loss before the operation remains challenging. We aimed to develop a model using machine learning to provide individual preoperative prediction of 5-year weight loss trajectories after surgery. Methods In this multinational retrospective observational study we enrolled adult participants (aged $\ge$18 years) from ten prospective cohorts (including ABOS [NCT01129297], BAREVAL [NCT02310178], the Swedish Obese Subjects study, and a large cohort from the Dutch Obesity Clinic [Nederlandse Obesitas Kliniek]) and two randomised trials (SleevePass [NCT00793143] and SM-BOSS [NCT00356213]) in Europe, the Americas, and Asia, with a 5 year followup after Roux-en-Y gastric bypass, sleeve gastrectomy, or gastric band. Patients with a previous history of bariatric surgery or large delays between scheduled and actual visits were excluded. The training cohort comprised patients from two centres in France (ABOS and BAREVAL). The primary outcome was BMI at 5 years. A model was developed using least absolute shrinkage and selection operator to select variables and the classification and regression trees algorithm to build interpretable regression trees. The performances of the model were assessed through the median absolute deviation (MAD) and root mean squared error (RMSE) of BMI. Findings10 231 patients from 12 centres in ten countries were included in the analysis, corresponding to 30 602 patient-years. Among participants in all 12 cohorts, 7701 (75$\bullet$3%) were female, 2530 (24$\bullet$7%) were male. Among 434 baseline attributes available in the training cohort, seven variables were selected: height, weight, intervention type, age, diabetes status, diabetes duration, and smoking status. At 5 years, across external testing cohorts the overall mean MAD BMI was 2$\bullet$8 kg/m${}^2$ (95% CI 2$\bullet$6-3$\bullet$0) and mean RMSE BMI was 4$\bullet$7 kg/m${}^2$ (4$\bullet$4-5$\bullet$0), and the mean difference between predicted and observed BMI was-0$\bullet$3 kg/m${}^2$ (SD 4$\bullet$7). This model is incorporated in an easy to use and interpretable web-based prediction tool to help inform clinical decision before surgery. InterpretationWe developed a machine learning-based model, which is internationally validated, for predicting individual 5-year weight loss trajectories after three common bariatric interventions. △ Less

Submitted 31 August, 2023; originally announced August 2023.

Comments: The Lancet Digital Health, 2023

arXiv:2308.12394 [pdf, other]

doi 10.1007/978-3-031-43904-9_55

Self-Supervised Learning for Endoscopic Video Analysis

Authors: Roy Hirsch, Mathilde Caron, Regev Cohen, Amir Livne, Ron Shapiro, Tomer Golany, Roman Goldenberg, Daniel Freedman, Ehud Rivlin

Abstract: Self-supervised learning (SSL) has led to important breakthroughs in computer vision by allowing learning from large amounts of unlabeled data. As such, it might have a pivotal role to play in biomedicine where annotating data requires a highly specialized expertise. Yet, there are many healthcare domains for which SSL has not been extensively explored. One such domain is endoscopy, minimally inva… ▽ More Self-supervised learning (SSL) has led to important breakthroughs in computer vision by allowing learning from large amounts of unlabeled data. As such, it might have a pivotal role to play in biomedicine where annotating data requires a highly specialized expertise. Yet, there are many healthcare domains for which SSL has not been extensively explored. One such domain is endoscopy, minimally invasive procedures which are commonly used to detect and treat infections, chronic inflammatory diseases or cancer. In this work, we study the use of a leading SSL framework, namely Masked Siamese Networks (MSNs), for endoscopic video analysis such as colonoscopy and laparoscopy. To fully exploit the power of SSL, we create sizable unlabeled endoscopic video datasets for training MSNs. These strong image representations serve as a foundation for secondary training with limited annotated datasets, resulting in state-of-the-art performance in endoscopic benchmarks like surgical phase recognition during laparoscopy and colonoscopic polyp characterization. Additionally, we achieve a 50% reduction in annotated data size without sacrificing performance. Thus, our work provides evidence that SSL can dramatically reduce the need of annotated data in endoscopy. △ Less

Submitted 23 August, 2023; originally announced August 2023.

Comments: Accepted to MICCAI 2023

Journal ref: MICCAI 2023

arXiv:2308.11033 [pdf, other]

Constructing cost-effective infrastructure networks

Authors: Rotem Brand, Reuven Cohen, Baruch Barzel, Simi Haber

Abstract: The need for reliable and low-cost infrastructure is crucial in today's world. However, achieving both at the same time is often challenging. Traditionally, infrastructure networks are designed with a radial topology lacking redundancy, which makes them vulnerable to disruptions. As a result, network topologies have evolved towards a ring topology with only one redundant edge and, from there, to m… ▽ More The need for reliable and low-cost infrastructure is crucial in today's world. However, achieving both at the same time is often challenging. Traditionally, infrastructure networks are designed with a radial topology lacking redundancy, which makes them vulnerable to disruptions. As a result, network topologies have evolved towards a ring topology with only one redundant edge and, from there, to more complex mesh networks. However, we prove that large rings are unreliable. Our research shows that a sparse mesh network with a small number of redundant edges that follow some design rules can significantly improve reliability while remaining cost-effective. Moreover, we have identified key areas where adding redundant edges can impact network reliability the most by using the SAIDI index, which measures the expected number of consumers disconnected from the source node. These findings offer network planners a valuable tool for quickly identifying and addressing reliability issues without the need for complex simulations. Properly planned sparse mesh networks can thus provide a reliable and a cost-effective solution to modern infrastructure challenges. △ Less

Submitted 30 July, 2023; originally announced August 2023.

arXiv:2307.12976 [pdf, other]

Evaluating the Ripple Effects of Knowledge Editing in Language Models

Authors: Roi Cohen, Eden Biran, Ori Yoran, Amir Globerson, Mor Geva

Abstract: Modern language models capture a large body of factual knowledge. However, some facts can be incorrectly induced or become obsolete over time, resulting in factually incorrect generations. This has led to the development of various editing methods that allow updating facts encoded by the model. Evaluation of these methods has primarily focused on testing whether an individual fact has been success… ▽ More Modern language models capture a large body of factual knowledge. However, some facts can be incorrectly induced or become obsolete over time, resulting in factually incorrect generations. This has led to the development of various editing methods that allow updating facts encoded by the model. Evaluation of these methods has primarily focused on testing whether an individual fact has been successfully injected, and if similar predictions for other subjects have not changed. Here we argue that such evaluation is limited, since injecting one fact (e.g. ``Jack Depp is the son of Johnny Depp'') introduces a ``ripple effect'' in the form of additional facts that the model needs to update (e.g.``Jack Depp is the sibling of Lily-Rose Depp''). To address this issue, we propose a novel set of evaluation criteria that consider the implications of an edit on related facts. Using these criteria, we then construct RippleEdits, a diagnostic benchmark of 5K factual edits, capturing a variety of types of ripple effects. We evaluate prominent editing methods on RippleEdits, showing that current methods fail to introduce consistent changes in the model's knowledge. In addition, we find that a simple in-context editing baseline obtains the best scores on our benchmark, suggesting a promising research direction for model editing. △ Less

Submitted 20 December, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

Comments: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2024. Author's final version

arXiv:2307.09312 [pdf, other]

Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media

Authors: Liam Hebert, Gaurav Sahu, Yuxuan Guo, Nanda Kishore Sreenivas, Lukasz Golab, Robin Cohen

Abstract: We present the Multi-Modal Discussion Transformer (mDT), a novel methodfor detecting hate speech in online social networks such as Reddit discussions. In contrast to traditional comment-only methods, our approach to labelling a comment as hate speech involves a holistic analysis of text and images grounded in the discussion context. This is done by leveraging graph transformers to capture the cont… ▽ More We present the Multi-Modal Discussion Transformer (mDT), a novel methodfor detecting hate speech in online social networks such as Reddit discussions. In contrast to traditional comment-only methods, our approach to labelling a comment as hate speech involves a holistic analysis of text and images grounded in the discussion context. This is done by leveraging graph transformers to capture the contextual relationships in the discussion surrounding a comment and grounding the interwoven fusion layers that combine text and image embeddings instead of processing modalities separately. To evaluate our work, we present a new dataset, HatefulDiscussions, comprising complete multi-modal discussions from multiple online communities on Reddit. We compare the performance of our model to baselines that only process individual comments and conduct extensive ablation studies. △ Less

Submitted 22 February, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

Comments: Accepted to AAAI 2024 (AI for Social Impact Track)

arXiv:2305.18861 [pdf, ps, other]

A General Framework for Learning-Augmented Online Allocation

Authors: Ilan Reuven Cohen, Debmalya Panigrahi

Abstract: Online allocation is a broad class of problems where items arriving online have to be allocated to agents who have a fixed utility/cost for each assigned item so to maximize/minimize some objective. This framework captures a broad range of fundamental problems such as the Santa Claus problem (maximizing minimum utility), Nash welfare maximization (maximizing geometric mean of utilities), makespan… ▽ More Online allocation is a broad class of problems where items arriving online have to be allocated to agents who have a fixed utility/cost for each assigned item so to maximize/minimize some objective. This framework captures a broad range of fundamental problems such as the Santa Claus problem (maximizing minimum utility), Nash welfare maximization (maximizing geometric mean of utilities), makespan minimization (minimizing maximum cost), minimization of $\ell_p$-norms, and so on. We focus on divisible items (i.e., fractional allocations) in this paper. Even for divisible items, these problems are characterized by strong super-constant lower bounds in the classical worst-case online model. In this paper, we study online allocations in the {\em learning-augmented} setting, i.e., where the algorithm has access to some additional (machine-learned) information about the problem instance. We introduce a {\em general} algorithmic framework for learning-augmented online allocation that produces nearly optimal solutions for this broad range of maximization and minimization objectives using only a single learned parameter for every agent. As corollaries of our general framework, we improve prior results of Lattanzi et al. (SODA 2020) and Li and Xian (ICML 2021) for learning-augmented makespan minimization, and obtain the first learning-augmented nearly-optimal algorithms for the other objectives such as Santa Claus, Nash welfare, $\ell_p$-minimization, etc. We also give tight bounds on the resilience of our algorithms to errors in the learned parameters, and study the learnability of these parameters. △ Less

Submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.13281 [pdf, other]

LM vs LM: Detecting Factual Errors via Cross Examination

Authors: Roi Cohen, May Hamri, Mor Geva, Amir Globerson

Abstract: A prominent weakness of modern language models (LMs) is their tendency to generate factually incorrect text, which hinders their usability. A natural question is whether such factual errors can be detected automatically. Inspired by truth-seeking mechanisms in law, we propose a factuality evaluation framework for LMs that is based on cross-examination. Our key idea is that an incorrect claim is li… ▽ More A prominent weakness of modern language models (LMs) is their tendency to generate factually incorrect text, which hinders their usability. A natural question is whether such factual errors can be detected automatically. Inspired by truth-seeking mechanisms in law, we propose a factuality evaluation framework for LMs that is based on cross-examination. Our key idea is that an incorrect claim is likely to result in inconsistency with other claims that the model generates. To discover such inconsistencies, we facilitate a multi-turn interaction between the LM that generated the claim and another LM (acting as an examiner) which introduces questions to discover inconsistencies. We empirically evaluate our method on factual claims made by multiple recent LMs on four benchmarks, finding that it outperforms existing methods and baselines, often by a large gap. Our results demonstrate the potential of using interacting LMs for capturing factual errors. △ Less

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.12737 [pdf, other]

The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning

Authors: Zhuang Li, Lizhen Qu, Philip R. Cohen, Raj V. Tumuluri, Gholamreza Haffari

Abstract: Multilingual semantic parsing aims to leverage the knowledge from the high-resource languages to improve low-resource semantic parsing, yet commonly suffers from the data imbalance problem. Prior works propose to utilize the translations by either humans or machines to alleviate such issues. However, human translations are expensive, while machine translations are cheap but prone to error and bias… ▽ More Multilingual semantic parsing aims to leverage the knowledge from the high-resource languages to improve low-resource semantic parsing, yet commonly suffers from the data imbalance problem. Prior works propose to utilize the translations by either humans or machines to alleviate such issues. However, human translations are expensive, while machine translations are cheap but prone to error and bias. In this work, we propose an active learning approach that exploits the strengths of both human and machine translations by iteratively adding small batches of human translations into the machine-translated training set. Besides, we propose novel aggregated acquisition criteria that help our active learning method select utterances to be manually translated. Our experiments demonstrate that an ideal utterance selection can significantly reduce the error and bias in the translated data, resulting in higher parser accuracies than the parsers merely trained on the machine-translated data. △ Less

Submitted 22 May, 2023; originally announced May 2023.

Comments: ACL 2023

arXiv:2305.11428 [pdf, other]

doi 10.1007/s00145-023-09460-8

Must the Communication Graph of MPC Protocols be an Expander?

Authors: Elette Boyle, Ran Cohen, Deepesh Data, Pavel Hubáček

Abstract: Secure multiparty computation (MPC) on incomplete communication networks has been studied within two primary models: (1) Where a partial network is fixed a priori, and thus corruptions can occur dependent on its structure, and (2) Where edges in the communication graph are determined dynamically as part of the protocol. Whereas a rich literature has succeeded in mapping out the feasibility and lim… ▽ More Secure multiparty computation (MPC) on incomplete communication networks has been studied within two primary models: (1) Where a partial network is fixed a priori, and thus corruptions can occur dependent on its structure, and (2) Where edges in the communication graph are determined dynamically as part of the protocol. Whereas a rich literature has succeeded in mapping out the feasibility and limitations of graph structures supporting secure computation in the fixed-graph model (including strong classical lower bounds), these bounds do not apply in the latter dynamic-graph setting, which has recently seen exciting new results, but remains relatively unexplored. In this work, we initiate a similar foundational study of MPC within the dynamic-graph model. As a first step, we investigate the property of graph expansion. All existing protocols (implicitly or explicitly) yield communication graphs which are expanders, but it is not clear whether this is inherent. Our results consist of two types (for constant fraction of corruptions): * Upper bounds: We demonstrate secure protocols whose induced communication graphs are not expander graphs, within a wide range of settings (computational, information theoretic, with low locality, even with low locality and adaptive security), each assuming some form of input-independent setup. * Lower bounds: In the plain model (no setup) with adaptive corruptions, we demonstrate that for certain functionalities, no protocol can maintain a non-expanding communication graph against all adversarial strategies. Our lower bound relies only on protocol correctness (not privacy), and requires a surprisingly delicate argument. More generally, we provide a formal framework for analyzing the evolving communication graph of MPC protocols, giving a starting point for studying the relation between secure computation and further, more general graph properties. △ Less

Submitted 21 June, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Journal ref: Journal of Cryptology 36, 20 (2023)

arXiv:2303.12942 [pdf, other]

doi 10.1109/TNSM.2023.3282740

A Survey on Explainable Artificial Intelligence for Cybersecurity

Authors: Gaith Rjoub, Jamal Bentahar, Omar Abdel Wahab, Rabeb Mizouni, Alyssa Song, Robin Cohen, Hadi Otrok, Azzam Mourad

Abstract: The black-box nature of artificial intelligence (AI) models has been the source of many concerns in their use for critical applications. Explainable Artificial Intelligence (XAI) is a rapidly growing research field that aims to create machine learning models that can provide clear and interpretable explanations for their decisions and actions. In the field of network cybersecurity, XAI has the pot… ▽ More The black-box nature of artificial intelligence (AI) models has been the source of many concerns in their use for critical applications. Explainable Artificial Intelligence (XAI) is a rapidly growing research field that aims to create machine learning models that can provide clear and interpretable explanations for their decisions and actions. In the field of network cybersecurity, XAI has the potential to revolutionize the way we approach network security by enabling us to better understand the behavior of cyber threats and to design more effective defenses. In this survey, we review the state of the art in XAI for cybersecurity in network systems and explore the various approaches that have been proposed to address this important problem. The review follows a systematic classification of network-driven cybersecurity threats and issues. We discuss the challenges and limitations of current XAI methods in the context of cybersecurity and outline promising directions for future research. △ Less

Submitted 11 June, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

arXiv:2302.09646 [pdf]

An Explainable Collaborative Dialogue System using a Theory of Mind

Authors: Philip R. Cohen, Lucian Galescu, Maayan Shvo

Abstract: Eva is a neuro-symbolic domain-independent multimodal collaborative dialogue system that takes seriously that the purpose of task-oriented dialogue is to assist the user. To do this, the system collaborates by inferring their intentions and plans, detects obstacles to success, finds plans to overcome them or to achieve higher-level goals, and plans its actions, including speech acts, to help users… ▽ More Eva is a neuro-symbolic domain-independent multimodal collaborative dialogue system that takes seriously that the purpose of task-oriented dialogue is to assist the user. To do this, the system collaborates by inferring their intentions and plans, detects obstacles to success, finds plans to overcome them or to achieve higher-level goals, and plans its actions, including speech acts, to help users accomplish those goals. In doing so, the system maintains and reasons with its own declaratively-specified beliefs, goals and intentions, and explicitly reasons about those of its user. Because Eva can track different users' mental states, it can engage multiple agents in multi-party dialogues. Reasoning is accomplished with a modal Horn-clause meta-interpreter that enables computable inference within the subset of logic implemented. The system employs both hierarchical and backward-chaining planning, operating over a rich modal logic-based knowledge and action representation. The planning and reasoning subsystems obey the principles of persistent goals and intentions including: 1) The formation and decomposition of intentions to perform complex actions, 2) the conditions under which persistent goals and intentions can be given up, and 3) persistent goal and intention revision using the relativizing formulas that are created during the planning process. The system treats its speech acts just like its other actions. This general approach enables Eva to plan a variety of speech acts, including requests, informs, questions, confirmations, offers, acceptances, and emotive expressions. Because the dialogue engine is a planner, as the dialogue proceeds, the system can flexibly generate, execute, and potentially repair its plans using physical, digital, and speech actions. Importantly, Eva can explain its utterances because it has created a plan that caused it to utter them. △ Less

Submitted 20 June, 2024; v1 submitted 19 February, 2023; originally announced February 2023.

Comments: 46 pages, 7 figures, 2 appendices

ACM Class: I.2.7; I.2.8; I.2.4; I.2.3; I.2.11

arXiv:2301.12810 [pdf, other]

Crawling the Internal Knowledge-Base of Language Models

Authors: Roi Cohen, Mor Geva, Jonathan Berant, Amir Globerson

Abstract: Language models are trained on large volumes of text, and as a result their parameters might contain a significant body of factual knowledge. Any downstream task performed by these models implicitly builds on these facts, and thus it is highly desirable to have means for representing this body of knowledge in an interpretable way. However, there is currently no mechanism for such a representation.… ▽ More Language models are trained on large volumes of text, and as a result their parameters might contain a significant body of factual knowledge. Any downstream task performed by these models implicitly builds on these facts, and thus it is highly desirable to have means for representing this body of knowledge in an interpretable way. However, there is currently no mechanism for such a representation. Here, we propose to address this goal by extracting a knowledge-graph of facts from a given language model. We describe a procedure for ``crawling'' the internal knowledge-base of a language model. Specifically, given a seed entity, we expand a knowledge-graph around it. The crawling procedure is decomposed into sub-tasks, realized through specially designed prompts that control for both precision (i.e., that no wrong facts are generated) and recall (i.e., the number of facts generated). We evaluate our approach on graphs crawled starting from dozens of seed entities, and show it yields high precision graphs (82-92%), while emitting a reasonable number of facts per entity. △ Less

Submitted 30 January, 2023; originally announced January 2023.

Comments: To be published in EACL 2023 (Findings)

arXiv:2301.10871 [pdf, other]

Qualitative Analysis of a Graph Transformer Approach to Addressing Hate Speech: Adapting to Dynamically Changing Content

Authors: Liam Hebert, Hong Yi Chen, Robin Cohen, Lukasz Golab

Abstract: Our work advances an approach for predicting hate speech in social media, drawing out the critical need to consider the discussions that follow a post to successfully detect when hateful discourse may arise. Using graph transformer networks, coupled with modelling attention and BERT-level natural language processing, our approach can capture context and anticipate upcoming anti-social behaviour. I… ▽ More Our work advances an approach for predicting hate speech in social media, drawing out the critical need to consider the discussions that follow a post to successfully detect when hateful discourse may arise. Using graph transformer networks, coupled with modelling attention and BERT-level natural language processing, our approach can capture context and anticipate upcoming anti-social behaviour. In this paper, we offer a detailed qualitative analysis of this solution for hate speech detection in social networks, leading to insights into where the method has the most impressive outcomes in comparison with competitors and identifying scenarios where there are challenges to achieving ideal performance. Included is an exploration of the kinds of posts that permeate social media today, including the use of hateful images. This suggests avenues for extending our model to be more comprehensive. A key insight is that the focus on reasoning about the concept of context positions us well to be able to support multi-modal analysis of online posts. We conclude with a reflection on how the problem we are addressing relates especially well to the theme of dynamic change, a critical concern for all AI solutions for social impact. We also comment briefly on how mental health well-being can be advanced with our work, through curated content attuned to the extent of hate in posts. △ Less

Submitted 30 April, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

Comments: Accepted at AAAI 2023 AI for Social Good

arXiv:2301.04248 [pdf, other]

Predicting Hateful Discussions on Reddit using Graph Transformer Networks and Communal Context

Authors: Liam Hebert, Lukasz Golab, Robin Cohen

Abstract: We propose a system to predict harmful discussions on social media platforms. Our solution uses contextual deep language models and proposes the novel idea of integrating state-of-the-art Graph Transformer Networks to analyze all conversations that follow an initial post. This framework also supports adapting to future comments as the conversation unfolds. In addition, we study whether a community… ▽ More We propose a system to predict harmful discussions on social media platforms. Our solution uses contextual deep language models and proposes the novel idea of integrating state-of-the-art Graph Transformer Networks to analyze all conversations that follow an initial post. This framework also supports adapting to future comments as the conversation unfolds. In addition, we study whether a community-specific analysis of hate speech leads to more effective detection of hateful discussions. We evaluate our approach on 333,487 Reddit discussions from various communities. We find that community-specific modeling improves performance two-fold and that models which capture wider-discussion context improve accuracy by 28\% (35\% for the most hateful content) compared to limited context models. △ Less

Submitted 10 January, 2023; originally announced January 2023.

Comments: Accepted and Presented at WI-IAT 22

arXiv:2211.15211 [pdf, other]

What's Behind the Mask: Estimating Uncertainty in Image-to-Image Problems

Authors: Gilad Kutiel, Regev Cohen, Michael Elad, Daniel Freedman

Abstract: Estimating uncertainty in image-to-image networks is an important task, particularly as such networks are being increasingly deployed in the biological and medical imaging realms. In this paper, we introduce a new approach to this problem based on masking. Given an existing image-to-image network, our approach computes a mask such that the distance between the masked reconstructed image and the ma… ▽ More Estimating uncertainty in image-to-image networks is an important task, particularly as such networks are being increasingly deployed in the biological and medical imaging realms. In this paper, we introduce a new approach to this problem based on masking. Given an existing image-to-image network, our approach computes a mask such that the distance between the masked reconstructed image and the masked true image is guaranteed to be less than a specified threshold, with high probability. The mask thus identifies the more certain regions of the reconstructed image. Our approach is agnostic to the underlying image-to-image network, and only requires triples of the input (degraded), reconstructed and true images for training. Furthermore, our method is agnostic to the distance metric used. As a result, one can use $L_p$-style distances or perceptual distances like LPIPS, which contrasts with interval-based approaches to uncertainty. Our theoretical guarantees derive from a conformal calibration procedure. We evaluate our mask-based approach to uncertainty on image colorization, image completion, and super-resolution tasks, demonstrating high quality performance on each. △ Less

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2210.01423 [pdf, other]

Using Deep Reinforcement Learning for mmWave Real-Time Scheduling

Authors: Barak Gahtan, Reuven Cohen, Alex M. Bronstein, Gil Kedar

Abstract: We study the problem of real-time scheduling in a multi-hop millimeter-wave (mmWave) mesh. We develop a model-free deep reinforcement learning algorithm called Adaptive Activator RL (AARL), which determines the subset of mmWave links that should be activated during each time slot and the power level for each link. The most important property of AARL is its ability to make scheduling decisions with… ▽ More We study the problem of real-time scheduling in a multi-hop millimeter-wave (mmWave) mesh. We develop a model-free deep reinforcement learning algorithm called Adaptive Activator RL (AARL), which determines the subset of mmWave links that should be activated during each time slot and the power level for each link. The most important property of AARL is its ability to make scheduling decisions within the strict time slot constraints of typical 5G mmWave networks. AARL can handle a variety of network topologies, network loads, and interference models, it can also adapt to different workloads. We demonstrate the operation of AARL on several topologies: a small topology with 10 links, a moderately-sized mesh with 48 links, and a large topology with 96 links. We show that for each topology, we compare the throughput obtained by AARL to that of a benchmark algorithm called RPMA (Residual Profit Maximizer Algorithm). The most important advantage of AARL compared to RPMA is that it is much faster and can make the necessary scheduling decisions very rapidly during every time slot, while RPMA cannot. In addition, the quality of the scheduling decisions made by AARL outperforms those made by RPMA. △ Less

Submitted 18 February, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

arXiv:2206.02848 [pdf]

Plagiarism deterrence for introductory programming

Authors: Simon J. Cohen, Michael J. Martin, Chance A. Shipley, Abhishek Kumar, Andrew R. Cohen

Abstract: Plagiarism in introductory programming courses is an enormous challenge for both students and institutions. For students, relying on the work of others too early in their academic development can make it impossible to acquire necessary skills for independent success in the future. For institutions, widespread student cheating can dilute the quality of the educational experience being offered. Curr… ▽ More Plagiarism in introductory programming courses is an enormous challenge for both students and institutions. For students, relying on the work of others too early in their academic development can make it impossible to acquire necessary skills for independent success in the future. For institutions, widespread student cheating can dilute the quality of the educational experience being offered. Currently available solutions consider only pairwise comparisons between student submissions and focus on punitive deterrence. Our approach instead relies on a class-wide statistical characterization that can be clearly and securely shared with students via an intuitive new p-value representing independence of student effort. A pairwise, compression-based similarity detection algorithm captures relationships between assignments more accurately. An automated deterrence system is used to warn students that their behavior is being closely monitored. High-confidence instances are made directly available for instructor review using our open-source toolkit. An unbiased scoring system aids students and the instructor in understanding true independence of effort. Preliminary results indicate that the system can provide meaningful measurements of independence from week one, improving the efficacy of technical education. △ Less

Submitted 6 June, 2022; originally announced June 2022.

arXiv:2205.13697 [pdf, other]

FedFormer: Contextual Federation with Attention in Reinforcement Learning

Authors: Liam Hebert, Lukasz Golab, Pascal Poupart, Robin Cohen

Abstract: A core issue in multi-agent federated reinforcement learning is defining how to aggregate insights from multiple agents. This is commonly done by taking the average of each participating agent's model weights into one common model (FedAvg). We instead propose FedFormer, a novel federation strategy that utilizes Transformer Attention to contextually aggregate embeddings from models originating from… ▽ More A core issue in multi-agent federated reinforcement learning is defining how to aggregate insights from multiple agents. This is commonly done by taking the average of each participating agent's model weights into one common model (FedAvg). We instead propose FedFormer, a novel federation strategy that utilizes Transformer Attention to contextually aggregate embeddings from models originating from different learner agents. In so doing, we attentively weigh the contributions of other agents with respect to the current agent's environment and learned relationships, thus providing a more effective and efficient federation. We evaluate our methods on the Meta-World environment and find that our approach yields significant improvements over FedAvg and non-federated Soft Actor-Critic single-agent methods. Our results compared to Soft Actor-Critic show that FedFormer achieves higher episodic return while still abiding by the privacy constraints of federated learning. Finally, we also demonstrate improvements in effectiveness with increased agent pools across all methods in certain tasks. This is contrasted by FedAvg, which fails to make noticeable improvements when scaled. △ Less

Submitted 2 March, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: Our source code can be found at https://github.com/liamhebert/FedFormer. Accepted at AAMAS 2023

arXiv:2203.11891 [pdf, other]

doi 10.1098/rspb.2022.2409

Evolution is Driven by Natural Autoencoding: Reframing Species, Interaction Codes, Cooperation, and Sexual Reproduction

Authors: Irun R. Cohen, Assaf Marron

Abstract: The continuity of life and its evolution, we proposed, emerge from an interactive group process manifested in networks of interaction. We term this process \textit{survival-of-the-fitted}. Here, we reason that survival of the fitted results from a natural computational process we term \textit{natural autoencoding}. Natural autoencoding works by retaining repeating biological interactions while non… ▽ More The continuity of life and its evolution, we proposed, emerge from an interactive group process manifested in networks of interaction. We term this process \textit{survival-of-the-fitted}. Here, we reason that survival of the fitted results from a natural computational process we term \textit{natural autoencoding}. Natural autoencoding works by retaining repeating biological interactions while non-repeatable interactions disappear. (1) We define a species by its \textit{species interaction code}, which consists of a compact description of the repeating interactions of species organisms with their external and internal environments. Species interaction codes are descriptions recorded in the biological infrastructure that enables repeating interactions. Encoding and decoding are interwoven. (2) Evolution proceeds by natural autoencoding of sustained changes in species interaction codes. DNA is only one element in natural autoencoding. (3) Natural autoencoding accounts for the paradox of genome randomization in sexual reproduction -- recombined genomes are analogous to the diversified inputs required for artificial autoencoding. The increase in entropy generated by genome randomization compensates for the decrease in entropy generated by organized life. (4) Natural autoencoding and artificial autoencoding algorithms manifest defined similarities and differences. Recognition of the importance of fittedness could well serve the future of a humanly livable biosphere. △ Less

Submitted 3 February, 2023; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: In version 7 we added various clarifications including to terminology, definitions, and to the reference to the second law of thermodynamics

Journal ref: Proc. Roy. Soc. B v.290 no.1994 (2023)

arXiv:2203.09016 [pdf, other]

Natural Language Communication with a Teachable Agent

Authors: Rachel Love, Edith Law, Philip R. Cohen, Dana Kulić

Abstract: Conversational teachable agents offer a promising platform to support learning, both in the classroom and in remote settings. In this context, the agent takes the role of the novice, while the student takes on the role of teacher. This framing is significant for its ability to elicit the Protégé effect in the student-teacher, a pedagogical phenomenon known to increase engagement in the teaching ta… ▽ More Conversational teachable agents offer a promising platform to support learning, both in the classroom and in remote settings. In this context, the agent takes the role of the novice, while the student takes on the role of teacher. This framing is significant for its ability to elicit the Protégé effect in the student-teacher, a pedagogical phenomenon known to increase engagement in the teaching task, and also improve cognitive outcomes. In prior work, teachable agents often take a passive role in the learning interaction, and there are few studies in which the agent and student engage in natural language dialogue during the teaching task. This work investigates the effect of teaching modality when interacting with a virtual agent, via the web-based teaching platform, the Curiosity Notebook. A method of teaching the agent by selecting sentences from source material is compared to a method paraphrasing the source material and typing text input to teach. A user study has been conducted to measure the effect teaching modality on the learning outcomes and engagement of the participants. The results indicate that teaching via paraphrasing and text input has a positive effect on learning outcomes for the material covered, and also on aspects of affective engagement. Furthermore, increased paraphrasing effort, as measured by the similarity between the source material and the material the teacher conveyed to the robot, improves learning outcomes for participants. △ Less

Submitted 16 March, 2022; originally announced March 2022.

Comments: This work has been submitted to the IEEE for possible publication

arXiv:2201.04807 [pdf, other]

Active Learning-Based Multistage Sequential Decision-Making Model with Application on Common Bile Duct Stone Evaluation

Authors: Hongzhen Tian, Reuven Zev Cohen, Chuck Zhang, Yajun Mei

Abstract: Multistage sequential decision-making scenarios are commonly seen in the healthcare diagnosis process. In this paper, an active learning-based method is developed to actively collect only the necessary patient data in a sequential manner. There are two novelties in the proposed method. First, unlike the existing ordinal logistic regression model which only models a single stage, we estimate the pa… ▽ More Multistage sequential decision-making scenarios are commonly seen in the healthcare diagnosis process. In this paper, an active learning-based method is developed to actively collect only the necessary patient data in a sequential manner. There are two novelties in the proposed method. First, unlike the existing ordinal logistic regression model which only models a single stage, we estimate the parameters for all stages together. Second, it is assumed that the coefficients for common features in different stages are kept consistent. The effectiveness of the proposed method is validated in both a simulation study and a real case study. Compared with the baseline method where the data is modeled individually and independently, the proposed method improves the estimation efficiency by 62\%-1838\%. For both simulation and testing cohorts, the proposed method is more effective, stable, interpretable, and computationally efficient on parameter estimation. The proposed method can be easily extended to a variety of scenarios where decision-making can be done sequentially with only necessary information. △ Less

Submitted 13 January, 2022; originally announced January 2022.

arXiv:2201.01222 [pdf, other]

The cluster structure function

Authors: Andrew R. Cohen, Paul M. B. Vitányi

Abstract: For each partition of a data set into a given number of parts there is a partition such that every part is as much as possible a good model (an "algorithmic sufficient statistic") for the data in that part. Since this can be done for every number between one and the number of data, the result is a function, the cluster structure function. It maps the number of parts of a partition to values relate… ▽ More For each partition of a data set into a given number of parts there is a partition such that every part is as much as possible a good model (an "algorithmic sufficient statistic") for the data in that part. Since this can be done for every number between one and the number of data, the result is a function, the cluster structure function. It maps the number of parts of a partition to values related to the deficiencies of being good models by the parts. Such a function starts with a value at least zero for no partition of the data set and descents to zero for the partition of the data set into singleton parts. The optimal clustering is the one chosen to minimize the cluster structure function. The theory behind the method is expressed in algorithmic information theory (Kolmogorov complexity). In practice the Kolmogorov complexities involved are approximated by a concrete compressor. We give examples using real data sets: the MNIST handwritten digits and the segmentation of real cells as used in stem cell research. △ Less

Submitted 14 October, 2022; v1 submitted 4 January, 2022; originally announced January 2022.

arXiv:2112.06883 [pdf, other]

A Methodology for a Scalable, Collaborative, and Resource-Efficient Platform to Facilitate Healthcare AI Research

Authors: Raphael Y. Cohen, Vesela P. Kovacheva

Abstract: Healthcare AI holds the potential to increase patient safety, augment efficiency and improve patient outcomes, yet research is often limited by data access, cohort curation, and tooling for analysis. Collection and translation of electronic health record data, live data, and real-time high resolution device data can be challenging and time-consuming. The development of real-world AI tools requires… ▽ More Healthcare AI holds the potential to increase patient safety, augment efficiency and improve patient outcomes, yet research is often limited by data access, cohort curation, and tooling for analysis. Collection and translation of electronic health record data, live data, and real-time high resolution device data can be challenging and time-consuming. The development of real-world AI tools requires overcoming challenges in data acquisition, scarce hospital resources and high needs for data governance. These bottlenecks may result in resource-heavy needs and long delays in research and development of AI systems. We present a system and methodology to accelerate data acquisition, dataset development and analysis, and AI model development. We created an interactive platform that relies on a scalable microservice backend. This system can ingest 15,000 patient records per hour, where each record represents thousands of multimodal measurements, text notes, and high resolution data. Collectively, these records can approach a terabyte of data. The system can further perform cohort generation and preliminary dataset analysis in 2-5 minutes. As a result, multiple users can collaborate simultaneously to iterate on datasets and models in real time. We anticipate that this approach will drive real-world AI model development, and, in the long run, meaningfully improve healthcare delivery. △ Less

Submitted 13 December, 2021; originally announced December 2021.

arXiv:2112.00794 [pdf, other]

DFTS2: Simulating Deep Feature Transmission Over Packet Loss Channels

Authors: Ashiv Dhondea, Robert A. Cohen, Ivan V. Bajić

Abstract: In edge-cloud collaborative intelligence (CI), an unreliable transmission channel exists in the information path of the AI model performing the inference. It is important to be able to simulate the performance of the CI system across an imperfect channel in order to understand system behavior and develop appropriate error control strategies. In this paper we present a simulation framework called D… ▽ More In edge-cloud collaborative intelligence (CI), an unreliable transmission channel exists in the information path of the AI model performing the inference. It is important to be able to simulate the performance of the CI system across an imperfect channel in order to understand system behavior and develop appropriate error control strategies. In this paper we present a simulation framework called DFTS2, which enables researchers to define the components of the CI system in TensorFlow~2, select a packet-based channel model with various parameters, and simulate system behavior under various channel conditions and error/loss control strategies. Using DFTS2, we also present the most comprehensive study to date of the packet loss concealment methods for collaborative image classification models. △ Less

Submitted 1 December, 2021; originally announced December 2021.

Comments: 6 pages, 4 figures, IEEE Conference on Visual Communications and Image Processing (VCIP) 2021

Showing 1–50 of 123 results for author: Cohen, R