Showing 1–50 of 245 results for author: Cohen, A

Searching in archive cs.
  1. arXiv:2505.08050  [pdf, ps, other]

    cs.CR

    Browser Security Posture Analysis: A Client-Side Security Assessment Framework

    Authors: Avihay Cohen

    Abstract: Modern web browsers have effectively become the new operating system for business applications, yet their security posture is often under-scrutinized. This paper presents a novel, comprehensive Browser Security Posture Analysis Framework[1], a browser-based client-side security assessment toolkit that runs entirely in JavaScript and WebAssembly within the browser. It performs a battery of over 120…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 30 pages

  2. arXiv:2505.05291  [pdf, other]

    eess.IV cs.AI cs.CV q-bio.TO

    Benchmarking Ophthalmology Foundation Models for Clinically Significant Age Macular Degeneration Detection

    Authors: Benjamin A. Cohen, Jonathan Fhima, Meishar Meisel, Baskin Meital, Luis Filipe Nakayama, Eran Berkowitz, Joachim A. Behar

    Abstract: Self-supervised learning (SSL) has enabled Vision Transformers (ViTs) to learn robust representations from large-scale natural image datasets, enhancing their generalization across domains. In retinal imaging, foundation models pretrained on either natural or ophthalmic data have shown promise, but the benefits of in-domain pretraining remain uncertain. To investigate this, we benchmark six SSL-pr…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 10 pages, 3 figures

  3. arXiv:2505.02499  [pdf, other]

    cs.CR cs.IT

    An Efficient Hybrid Key Exchange Mechanism

    Authors: Benjamin D. Kim, Vipindev Adat Vasudevan, Alejandro Cohen, Rafael G. L. D'Oliveira, Thomas Stahlbuhk, Muriel Médard

    Abstract: We present \textsc{CHOKE}, a novel code-based hybrid key-encapsulation mechanism (KEM) designed to securely and efficiently transmit multiple session keys simultaneously. By encoding $n$ independent session keys with an individually secure linear code and encapsulating each resulting coded symbol using a separate KEM, \textsc{CHOKE} achieves computational individual security -- each key remains se…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 7 pages, 2 figures

  4. arXiv:2504.06304  [pdf, other]

    q-bio.GN cs.CV cs.LG

    Leveraging State Space Models in Long Range Genomics

    Authors: Matvei Popov, Aymen Kallala, Anirudha Ramesh, Narimane Hennouni, Shivesh Khaitan, Rick Gentry, Alain-Sam Cohen

    Abstract: Long-range dependencies are critical for understanding genomic structure and function, yet most conventional methods struggle with them. Widely adopted transformer-based models, while excelling at short-context tasks, are limited by the attention module's quadratic computational complexity and inability to extrapolate to sequences longer than those seen in training. In this work, we explore State…

    Submitted 11 May, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

    Comments: Accepted at ICLR 2025 (Spotlight @ LMRL) - Project page: https://anirudharamesh.github.io/iclr-long-range-genomics/

  5. arXiv:2503.17670  [pdf, other]

    cs.HC

    Do You "Trust" This Visualization? An Inventory to Measure Trust in Visualizations

    Authors: Huichen Will Wang, Kylie Lin, Andrew Cohen, Ryan Kennedy, Zach Zwald, Carolina Nobre, Cindy Xiong Bearfield

    Abstract: Trust plays a critical role in visual data communication and decision-making, yet existing visualization research employs varied trust measures, making it challenging to compare and synthesize findings across studies. In this work, we first took a bottom-up, data-driven approach to understand what visualization readers mean when they say they "trust" a visualization. We compiled and adapted a broa…

    Submitted 22 March, 2025; originally announced March 2025.

  6. arXiv:2503.11751  [pdf, other]

    cs.CL

    reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs

    Authors: Zhaofeng Wu, Michihiro Yasunaga, Andrew Cohen, Yoon Kim, Asli Celikyilmaz, Marjan Ghazvininejad

    Abstract: Reward models have become a staple in modern NLP, serving as not only a scalable text evaluator, but also an indispensable component in many alignment recipes and inference-time algorithms. However, while recent reward models increase performance on standard benchmarks, this may partly be due to overfitting effects, which would confound an understanding of their true capability. In this work, we s…

    Submitted 14 March, 2025; originally announced March 2025.

  7. arXiv:2503.05873  [pdf, other]

    cs.IT

    Coding-Based Hybrid Post-Quantum Cryptosystem for Non-Uniform Information

    Authors: Saar Tarnopolsky, Alejandro Cohen

    Abstract: We introduce for non-uniform messages a novel hybrid universal network coding cryptosystem (NU-HUNCC) in the finite blocklength regime that provides Post-Quantum (PQ) security at high communication rates. Recently, hybrid cryptosystems offered PQ security by premixing the data using secure linear coding schemes and encrypting only a small portion of it. The data is assumed to be uniformly distribu…

    Submitted 26 January, 2025; originally announced March 2025.

    Comments: Parts of this work were accepted for publication at the IEEE International Symposium on Information Theory, ISIT 2024

  8. arXiv:2503.01894  [pdf, other]

    cs.CV cs.AI cs.HC

    LIVS: A Pluralistic Alignment Dataset for Inclusive Public Spaces

    Authors: Rashid Mushkani, Shravan Nayak, Hugo Berard, Allison Cohen, Shin Koseki, Hadrien Bertrand

    Abstract: We introduce the Local Intersectional Visual Spaces (LIVS) dataset, a benchmark for multi-criteria alignment, developed through a two-year participatory process with 30 community organizations to support the pluralistic alignment of text-to-image (T2I) models in inclusive urban planning. The dataset encodes 37,710 pairwise comparisons across 13,462 images, structured along six criteria - Accessibi…

    Submitted 7 May, 2025; v1 submitted 27 February, 2025; originally announced March 2025.

    Comments: ICML 2025

  9. arXiv:2502.15226  [pdf, other]

    cs.CL cs.AI cs.HC

    Understand User Opinions of Large Language Models via LLM-Powered In-the-Moment User Experience Interviews

    Authors: Mengqiao Liu, Tevin Wang, Cassandra A. Cohen, Sarah Li, Chenyan Xiong

    Abstract: Which large language model (LLM) is better? Every evaluation tells a story, but what do users really think about current LLMs? This paper presents CLUE, an LLM-powered interviewer that conducts in-the-moment user experience interviews, right after users interacted with LLMs, and automatically gathers insights about user opinions from massive interview logs. We conduct a study with thousands of use…

    Submitted 21 February, 2025; originally announced February 2025.

  10. arXiv:2502.13645  [pdf, other]

    cs.CL

    Measuring the Effect of Transcription Noise on Downstream Language Understanding Tasks

    Authors: Ori Shapira, Shlomo E. Chazan, Amir DN Cohen

    Abstract: With the increasing prevalence of recorded human speech, spoken language understanding (SLU) is essential for its efficient processing. In order to process the speech, it is commonly transcribed using automatic speech recognition technology. This speech-to-text transition introduces errors into the transcripts, which subsequently propagate to downstream NLP tasks, such as dialogue summarization. W…

    Submitted 19 February, 2025; originally announced February 2025.

  11. arXiv:2502.11984  [pdf, other]

    cs.IT cs.NI

    Blank Space: Adaptive Causal Coding for Streaming Communications Over Multi-Hop Networks

    Authors: Adina Waxman, Shai Ginzach, Aviel Glam, Alejandro Cohen

    Abstract: In this work, we introduce Blank Space AC-RLNC (BS), a novel Adaptive and Causal Network Coding (AC-RLNC) solution designed to mitigate the triplet trade-off between throughput-delay-efficiency in multi-hop networks. BS leverages the network's physical limitations considering the bottleneck from each node to the destination. In particular, BS introduces a light-computational re-encoding algorithm,…

    Submitted 28 April, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  12. arXiv:2502.08794  [pdf, other]

    cs.LG

    Spectral Journey: How Transformers Predict the Shortest Path

    Authors: Andrew Cohen, Andrey Gromov, Kaiyu Yang, Yuandong Tian

    Abstract: Decoder-only transformers lead to a step-change in capability of large language models. However, opinions are mixed as to whether they are really planning or reasoning. A path to making progress in this direction is to study the model's behavior in a setting with carefully controlled data. Then interpret the learned representations and reverse-engineer the computation performed internally. We stud…

    Submitted 12 February, 2025; originally announced February 2025.

    Comments: 12 pages

  13. arXiv:2502.08524  [pdf, other]

    cs.LG cs.CL

    LLM Pretraining with Continuous Concepts

    Authors: Jihoon Tack, Jack Lanchantin, Jane Yu, Andrew Cohen, Ilia Kulikov, Janice Lan, Shibo Hao, Yuandong Tian, Jason Weston, Xian Li

    Abstract: Next token prediction has been the standard training objective used in large language model pretraining. Representations are learned as a result of optimizing for token-level perplexity. We propose Continuous Concept Mixing (CoCoMix), a novel pretraining framework that combines discrete next token prediction with continuous concepts. Specifically, CoCoMix predicts continuous concepts learned from…

    Submitted 12 February, 2025; originally announced February 2025.

  14. arXiv:2502.02774  [pdf, other]

    cs.IT cs.CR

    Optimal Computational Secret Sharing

    Authors: Igor L. Aureliano, Alejandro Cohen, Rafael G. L. D'Oliveira

    Abstract: In $(t, n)$-threshold secret sharing, a secret $S$ is distributed among $n$ participants such that any subset of size $t$ can recover $S$, while any subset of size $t-1$ or fewer learns nothing about it. For information-theoretic secret sharing, it is known that the share size must be at least as large as the secret, i.e., $|S|$. When computational security is employed using cryptographic encrypti…

    Submitted 4 February, 2025; originally announced February 2025.

  15. arXiv:2501.17899  [pdf, other]

    cs.CY cs.AI cs.HC

    The Right to AI

    Authors: Rashid Mushkani, Hugo Berard, Allison Cohen, Shin Koseki

    Abstract: This paper proposes a Right to AI, which asserts that individuals and communities should meaningfully participate in the development and governance of the AI systems that shape their lives. Motivated by the increasing deployment of AI in critical domains and inspired by Henri Lefebvre's concept of the Right to the City, we reconceptualize AI as a societal infrastructure, rather than merely a produ…

    Submitted 7 May, 2025; v1 submitted 28 January, 2025; originally announced January 2025.

    Comments: ICML 2025

  16. arXiv:2501.17002  [pdf, other]

    cs.IT

    Covert Adversarial Actuators in Finite MDPs

    Authors: Edoardo David Santi, Gongpu Chen, Deniz Gündüz, Asaf Cohen

    Abstract: We consider a Markov decision process (MDP) in which actions prescribed by the controller are executed by a separate actuator, which may behave adversarially. At each time step, the controller selects and transmits an action to the actuator; however, the actuator may deviate from the intended action to degrade the control reward. Given that the controller observes only the sequence of visited stat…

    Submitted 28 January, 2025; originally announced January 2025.

  17. arXiv:2501.15645  [pdf, other]

    cs.IT

    Individual Confidential Computing of Polynomials over Non-Uniform Information

    Authors: Saar Tarnopolsky, Zirui Deng, Vinayak Ramkumar, Netanel Raviv, Alejandro Cohen

    Abstract: In this paper, we address the problem of secure distributed computation in scenarios where user data is not uniformly distributed, extending existing frameworks that assume uniformity, an assumption that is challenging to enforce in data for computation. Motivated by the pervasive reliance on single service providers for data storage and computation, we propose a privacy-preserving scheme that ach…

    Submitted 26 January, 2025; originally announced January 2025.

    Comments: Parts of this work were submitted to ISIT 2025

  18. arXiv:2501.15076  [pdf, other]

    cs.CR cs.IT cs.LG

    Cryptanalysis via Machine Learning Based Information Theoretic Metrics

    Authors: Benjamin D. Kim, Vipindev Adat Vasudevan, Rafael G. L. D'Oliveira, Alejandro Cohen, Thomas Stahlbuhk, Muriel Médard

    Abstract: The fields of machine learning (ML) and cryptanalysis share an interestingly common objective of creating a function, based on a given set of inputs and outputs. However, the approaches and methods in doing so vary vastly between the two fields. In this paper, we explore integrating the knowledge from the ML domain to provide empirical evaluations of cryptosystems. Particularly, we utilize informa…

    Submitted 24 January, 2025; originally announced January 2025.

  19. arXiv:2501.15014  [pdf, other]

    cs.LG cs.AI cs.NE

    On Accelerating Edge AI: Optimizing Resource-Constrained Environments

    Authors: Jacob Sander, Achraf Cohen, Venkat R. Dasari, Brent Venable, Brian Jalaian

    Abstract: Resource-constrained edge deployments demand AI solutions that balance high performance with stringent compute, memory, and energy limitations. In this survey, we present a comprehensive overview of the primary strategies for accelerating deep learning models under such constraints. First, we examine model compression techniques-pruning, quantization, tensor decomposition, and knowledge distillati…

    Submitted 28 January, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 26 pages, 13 Figures

  20. arXiv:2501.11740  [pdf, other]

    cs.IT

    PIR Over Wireless Channels: Achieving Privacy With Public Responses

    Authors: Or Elimelech, Asaf Cohen

    Abstract: In this paper, we address the problem of Private Information Retrieval (PIR) over a public Additive White Gaussian Noise (AWGN) channel. In such a setup, the server's responses are visible to other servers. Thus, a curious server can listen to the other responses, compromising the user's privacy. Indeed, previous works on PIR over a shared medium assumed the servers cannot instantaneously listen t…

    Submitted 24 January, 2025; v1 submitted 20 January, 2025; originally announced January 2025.

  21. arXiv:2501.11459  [pdf, other]

    cs.IT

    Multi-Stage Active Sequential Hypothesis Testing with Clustered Hypotheses

    Authors: George Vershinin, Asaf Cohen, Omer Gurewitz

    Abstract: We consider the problem where an active Decision-Maker (DM) is tasked to identify the true hypothesis using as few as possible observations while maintaining accuracy. The DM collects observations according to its determined actions and knows the distributions under each hypothesis. We propose a deterministic and adaptive multi-stage hypothesis-elimination strategy where the DM selects an action,…

    Submitted 28 April, 2025; v1 submitted 20 January, 2025; originally announced January 2025.

    Comments: 7 pages, 2 figures

  22. arXiv:2412.16040  [pdf, other]

    cs.HC

    An Experimental Study Of Netflix Use and the Effects of Autoplay on Watching Behaviors

    Authors: Brennan Schaffner, Yaretzi Ulloa, Riya Sahni, Jiatong Li, Ava Kim Cohen, Natasha Messier, Lan Gao, Marshini Chetty

    Abstract: Prior work on dark patterns, or manipulative online interfaces, suggests they have potentially detrimental effects on user autonomy. Dark pattern features, like those designed for attention capture, can potentially extend platform sessions beyond that users would have otherwise intended. Existing research, however, has not formally measured the quantitative effects of these features on user engage…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: CSCW 2025 Preprint

  23. arXiv:2412.05434  [pdf, other]

    cs.CL cs.LG

    Diversity Over Quantity: A Lesson From Few Shot Relation Classification

    Authors: Amir DN Cohen, Shauli Ravfogel, Shaltiel Shmidman, Yoav Goldberg

    Abstract: In few-shot relation classification (FSRC), models must generalize to novel relations with only a few labeled examples. While much of the recent progress in NLP has focused on scaling data size, we argue that diversity in relation types is more crucial for FSRC performance. In this work, we demonstrate that training on a diverse set of relations significantly enhances a model's ability to generali…

    Submitted 6 December, 2024; originally announced December 2024.

  24. arXiv:2412.04305  [pdf, other]

    cs.CL cs.LG

    ALMA: Alignment with Minimal Annotation

    Authors: Michihiro Yasunaga, Leonid Shamis, Chunting Zhou, Andrew Cohen, Jason Weston, Luke Zettlemoyer, Marjan Ghazvininejad

    Abstract: Recent approaches to large language model (LLM) alignment typically require millions of human annotations or rely on external aligned models for synthetic data generation. This paper introduces ALMA: Alignment with Minimal Annotation, demonstrating that effective alignment can be achieved using only 9,000 labeled examples -- less than 1% of conventional approaches. ALMA generates large amounts of…

    Submitted 5 December, 2024; originally announced December 2024.

  25. arXiv:2412.03421  [pdf, other]

    physics.soc-ph cs.SI nlin.AO

    Governance as a complex, networked, democratic, satisfiability problem

    Authors: Laurent Hébert-Dufresne, Nicholas W. Landry, Juniper Lovato, Jonathan St-Onge, Jean-Gabriel Young, Marie-Ève Couture-Ménard, Stéphane Bernatchez, Catherine Choquette, Alan A. Cohen

    Abstract: Democratic governments comprise a subset of a population whose goal is to produce coherent decisions, solving societal challenges while respecting the will of the people. New governance frameworks represent this as a social network rather than as a hierarchical pyramid with centralized authority. But how should this network be structured? We model the decisions a population must make as a satisfia…

    Submitted 17 April, 2025; v1 submitted 4 December, 2024; originally announced December 2024.

  26. arXiv:2411.13904  [pdf, other]

    cs.CL

    Towards Full Delegation: Designing Ideal Agentic Behaviors for Travel Planning

    Authors: Song Jiang, Da JU, Andrew Cohen, Sasha Mitts, Aaron Foss, Justine T Kao, Xian Li, Yuandong Tian

    Abstract: How are LLM-based agents used in the future? While many of the existing work on agents has focused on improving the performance of a specific family of objective and challenging tasks, in this work, we take a different perspective by thinking about full delegation: agents take over humans' routine decision-making processes and are trusted by humans to find solutions that fit people's personalized…

    Submitted 21 November, 2024; originally announced November 2024.

  27. arXiv:2410.17051  [pdf, other]

    cs.CL

    Data-driven Coreference-based Ontology Building

    Authors: Shir Ashury-Tahan, Amir David Nissan Cohen, Nadav Cohen, Yoram Louzoun, Yoav Goldberg

    Abstract: While coreference resolution is traditionally used as a component in individual document understanding, in this work we take a more global view and explore what can we learn about a domain from the set of all document-level coreference relations that are present in a large corpus. We derive coreference chains from a corpus of 30 million biomedical abstracts and construct a graph based on the strin…

    Submitted 22 October, 2024; originally announced October 2024.

    Journal ref: EMNLP 2024

  28. arXiv:2410.16456  [pdf, other]

    cs.CL

    To the Globe (TTG): Towards Language-Driven Guaranteed Travel Planning

    Authors: Da JU, Song Jiang, Andrew Cohen, Aaron Foss, Sasha Mitts, Arman Zharmagambetov, Brandon Amos, Xian Li, Justine T Kao, Maryam Fazel-Zarandi, Yuandong Tian

    Abstract: Travel planning is a challenging and time-consuming task that aims to find an itinerary which satisfies multiple, interdependent constraints regarding flights, accommodations, attractions, and other travel arrangements. In this paper, we propose To the Globe (TTG), a real-time demo system that takes natural language requests from users, translates it to symbolic form via a fine-tuned Large Languag…

    Submitted 21 October, 2024; originally announced October 2024.

    Journal ref: EMNLP 2024 Demo Track

  29. arXiv:2409.15359  [pdf, other]

    cs.CL cs.AI cs.LG

    Watch Your Steps: Observable and Modular Chains of Thought

    Authors: Cassandra A. Cohen, William W. Cohen

    Abstract: We propose a variant of chain of thought (CoT) prompting called Program Trace Prompting that makes explanations more observable while preserving the power, generality and flexibility of CoT. In our approach, few-shot CoT demonstrations are wrapped in a formal syntax based on Python, and each prompt: identifies and names steps; defines the input/output behavior of steps; and replaces CoT explanatio…

    Submitted 1 October, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

  30. arXiv:2409.06801  [pdf, other]

    cs.CY

    Understanding and Mitigating the Impacts of Differentially Private Census Data on State Level Redistricting

    Authors: Christian Cianfarani, Aloni Cohen

    Abstract: Data from the Decennial Census is published only after applying a disclosure avoidance system (DAS). Data users were shaken by the adoption of differential privacy in the 2020 DAS, a radical departure from past methods. The change raises the question of whether redistricting law permits, forbids, or requires taking account of the effect of disclosure avoidance. Such uncertainty creates legal risks…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: 24 pages, 5 figures, 7 tables

  31. arXiv:2409.03864  [pdf, other]

    cs.PL

    The MLIR Transform Dialect. Your compiler is more powerful than you think

    Authors: Martin Paul Lücke, Oleksandr Zinenko, William S. Moses, Michel Steuwer, Albert Cohen

    Abstract: To take full advantage of a specific hardware target, performance engineers need to gain control on compilers in order to leverage their domain knowledge about the program and hardware. Yet, modern compilers are poorly controlled, usually by configuring a sequence of coarse-grained monolithic black-box passes, or by means of predefined compiler annotations/pragmas. These can be effective, but ofte…

    Submitted 9 September, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

  32. arXiv:2408.14740  [pdf]

    cs.CY

    Properties of Effective Information Anonymity Regulations

    Authors: Aloni Cohen, Micah Altman, Francesca Falzon, Evangelina Anna Markatou, Kobbi Nissim

    Abstract: A firm seeks to analyze a dataset and to release the results. The dataset contains information about individual people, and the firm is subject to some regulation that forbids the release of the dataset itself. The regulation also imposes conditions on the release of the results. What properties should the regulation satisfy? We restrict our attention to regulations tailored to controlling the dow…

    Submitted 26 August, 2024; originally announced August 2024.

  33. arXiv:2408.06143  [pdf, other]

    cs.RO

    Motion Planning for Minimally Actuated Serial Robots

    Authors: Avi Cohen, Avishai Sintov, David Zarrouk

    Abstract: Modern manipulators are acclaimed for their precision but often struggle to operate in confined spaces. This limitation has driven the development of hyper-redundant and continuum robots. While these present unique advantages, they face challenges in, for instance, weight, mechanical complexity, modeling and costs. The Minimally Actuated Serial Robot (MASR) has been proposed as a light-weight, low…

    Submitted 12 August, 2024; originally announced August 2024.

    Journal ref: IEEE RA-L, 2024

  34. arXiv:2407.07080  [pdf, other]

    cs.CL

    Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities

    Authors: Shaltiel Shmidman, Avi Shmidman, Amir DN Cohen, Moshe Koppel

    Abstract: Training large language models (LLMs) in low-resource languages such as Hebrew poses unique challenges. In this paper, we introduce DictaLM2.0 and DictaLM2.0-Instruct, two LLMs derived from the Mistral model, trained on a substantial corpus of approximately 200 billion tokens in both Hebrew and English. Adapting a pre-trained model to a new language involves specialized techniques that differ sign…

    Submitted 9 July, 2024; originally announced July 2024.

  35. arXiv:2406.12406  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Fast Rates for Bandit PAC Multiclass Classification

    Authors: Liad Erez, Alon Cohen, Tomer Koren, Yishay Mansour, Shay Moran

    Abstract: We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct. Our main contribution is in designing a novel learning algorithm for the agnostic $(\varepsilon,δ)$-PAC version of the problem, with sample complexity of…

    Submitted 18 June, 2024; originally announced June 2024.

  36. arXiv:2405.11244  [pdf, other]

    cs.SC cs.PL

    Strided Difference Bound Matrices

    Authors: Arjun Pitchanathan, Albert Cohen, Oleksandr Zinenko, Tobias Grosser

    Abstract: A wide range of symbolic analysis and optimization problems can be formalized using polyhedra. Sub-classes of polyhedra, also known as sub-polyhedral domains, are sought for their lower space and time complexity. We introduce the Strided Difference Bound Matrix (SDBM) domain, which represents a sweet spot in the context of optimizing compilers. Its expressiveness and efficient algorithms are parti…

    Submitted 4 July, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

    Comments: Preprint and extended from the CAV 2024 conference version. Fixed issue in arxiv version where URLs were not wrapped

  37. arXiv:2405.11109  [pdf, other]

    cs.CR cs.AI cs.CL

    Watermarking Language Models for Many Adaptive Users

    Authors: Aloni Cohen, Alexander Hoover, Gabe Schoenbach

    Abstract: We study watermarking schemes for language models with provable guarantees. As we show, prior works offer no robustness guarantees against adaptive prompting: when a user queries a language model more than once, as even benign users do. And with just a single exception (Christ and Gunn, 2024), prior works are restricted to zero-bit watermarking: machine-generated text can be detected as such, but…

    Submitted 28 June, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: 39 pages

  38. arXiv:2405.10027  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    The Real Price of Bandit Information in Multiclass Classification

    Authors: Liad Erez, Alon Cohen, Tomer Koren, Yishay Mansour, Shay Moran

    Abstract: We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input classifies to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary inquiry is with regard to the dependency on the number of labels $K$, and whether $T$-step regret bounds in this setting can be…

    Submitted 19 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  39. arXiv:2405.06773  [pdf, ps, other]

    cs.IT

    A Monotone Circuit Construction for Individually-Secure Multi-Secret Sharing

    Authors: Cailyn Bass, Alejandro Cohen, Rafael G. L. D'Oliveira, Muriel Médard

    Abstract: In this work, we introduce a new technique for taking a single-secret sharing scheme with a general access structure and transforming it into an individually secure multi-secret sharing scheme where every secret has the same general access structure. To increase the information rate, we consider Individual Security which guarantees zero mutual information with each secret individually, for any una…

    Submitted 10 May, 2024; originally announced May 2024.

  40. arXiv:2405.05107  [pdf, other]

    cs.ET cs.AR eess.SY

    Leveraging AES Padding: dBs for Nothing and FEC for Free in IoT Systems

    Authors: Jongchan Woo, Vipindev Adat Vasudevan, Benjamin D. Kim, Rafael G. L. D'Oliveira, Alejandro Cohen, Thomas Stahlbuhk, Ken R. Duffy, Muriel Médard

    Abstract: The Internet of Things (IoT) represents a significant advancement in digital technology, with its rapidly growing network of interconnected devices. This expansion, however, brings forth critical challenges in data security and reliability, especially under the threat of increasing cyber vulnerabilities. Addressing the security concerns, the Advanced Encryption Standard (AES) is commonly employed…

    Submitted 8 May, 2024; originally announced May 2024.

  41. arXiv:2405.01495  [pdf, other]

    cs.IT cs.CR

    Error Correction Capabilities of Non-Linear Cryptographic Hash Functions

    Authors: Alejandro Cohen, Rafael G. L. D'Oliveira

    Abstract: Linear hashes are known to possess error-correcting capabilities. However, in most applications, non-linear hashes with pseudorandom outputs are utilized instead. It has also been established that classical non-systematic random codes, both linear and non-linear, are capacity achieving in the asymptotic regime. Thus, it is reasonable to expect that non-linear hashes might also exhibit good error-c…

    Submitted 2 May, 2024; originally announced May 2024.

  42. arXiv:2404.17686  [pdf, other]

    cs.NI cs.IT

    On the Benefits of Coding for Network Slicing

    Authors: Homa Esfahanizadeh, Vipindev Adat Vasudevan, Benjamin D. Kim, Shruti Siva, Jennifer Kim, Alejandro Cohen, Muriel Médard

    Abstract: Network slicing has emerged as an integral concept in 5G, aiming to partition the physical network infrastructure into isolated slices, customized for specific applications. We theoretically formulate the key performance metrics of an application, in terms of goodput and delivery delay, at a cost of network resources in terms of bandwidth. We explore an un-coded communication protocol that uses fe…

    Submitted 26 April, 2024; originally announced April 2024.

  43. arXiv:2403.18375  [pdf, other]

    cs.LG eess.SP

    Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates

    Authors: Natalie Lang, Alejandro Cohen, Nir Shlezinger

    Abstract: Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning. It typically involves a set of heterogeneous devices locally training neural network (NN) models in parallel with periodic centralized aggregations. As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers. Conventional approaches…

    Submitted 27 March, 2024; originally announced March 2024.

  44. arXiv:2403.17011  [pdf, other]

    cs.LG cs.AI cs.CY

    SUDO: a framework for evaluating clinical artificial intelligence systems without ground-truth annotations

    Authors: Dani Kiyasseh, Aaron Cohen, Chengsheng Jiang, Nicholas Altieri

    Abstract: A clinical artificial intelligence (AI) system is often validated on a held-out set of data which it has not been exposed to before (e.g., data from a different hospital with a distinct electronic health record system). This evaluation process is meant to mimic the deployment of an AI system on data in the wild; those which are currently unseen by the system yet are expected to be encountered in a…

    Submitted 2 January, 2024; originally announced March 2024.

  45. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1112 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 16 December, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  46. Large language models surpass human experts in predicting neuroscience results

    Authors: Xiaoliang Luo, Akilles Rechardt, Guangzhi Sun, Kevin K. Nejad, Felipe Yáñez, Bati Yilmaz, Kangjoo Lee, Alexandra O. Cohen, Valentina Borghesani, Anton Pashkov, Daniele Marinazzo, Jonathan Nicholas, Alessandro Salatiello, Ilia Sucholutsky, Pasquale Minervini, Sepehr Razavi, Roberta Rocca, Elkhan Yusifov, Tereza Okalova, Nianlong Gu, Martin Ferianc, Mikail Khona, Kaustubh R. Patil, Pui-Shee Lee, Rui Mata, et al. (14 additional authors not shown)

    Abstract: Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. Large language models (LLMs) offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. To evaluate this possibility, we created Brain…

    Submitted 28 November, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: The latest version of this paper has been published in Nature Human Behaviour; please see https://www.nature.com/articles/s41562-024-02046-9

  47. arXiv:2402.11119  [pdf, ps, other]

    cs.LG cs.CR cs.DS

    Private PAC Learning May be Harder than Online Learning

    Authors: Mark Bun, Aloni Cohen, Rathin Desai

    Abstract: We continue the study of the computational complexity of differentially private PAC learning and how it is situated within the foundations of machine learning. A recent line of work uncovered a qualitative equivalence between the private PAC model and Littlestone's mistake-bounded model of online learning, in particular, showing that any concept class of Littlestone dimension $d$ can be privately…

    Submitted 16 February, 2024; originally announced February 2024.

  48. arXiv:2402.10018  [pdf, other]

    cs.IT q-bio.QM stat.AP

    Non-Adaptive Multi-Stage Algorithm and Bounds for Group Testing with Prior Statistics

    Authors: Ayelet C. Portnoy, Amit Solomon, Alejandro Cohen

    Abstract: In this paper, we propose an efficient multi-stage algorithm for non-adaptive Group Testing (GT) with general correlated prior statistics. The proposed solution can be applied to any correlated statistical prior represented in trellis, e.g., finite state machines and Markov processes. We introduce a variation of List Viterbi Algorithm (LVA) to enable accurate recovery using much fewer tests than o…

    Submitted 4 February, 2025; v1 submitted 15 February, 2024; originally announced February 2024.

  49. arXiv:2402.08407  [pdf, other]

    cs.CR

    Coding-Based Hybrid Post-Quantum Cryptosystem for Non-Uniform Information

    Authors: Saar Tarnopolsky, Alejandro Cohen

    Abstract: We introduce for non-uniform messages a novel hybrid universal network coding cryptosystem (NU-HUNCC) in the finite blocklength regime that provides Post-Quantum (PQ) security at high communication rates. Recently, hybrid cryptosystems offered PQ security by premixing the data using secure coding schemes and encrypting only a small portion of it, assuming the data is uniformly distributed. An assu…

    Submitted 13 February, 2024; originally announced February 2024.

  50. arXiv:2402.07229  [pdf, other]

    cs.IT cs.AI

    Successive Refinement in Large-Scale Computation: Advancing Model Inference Applications

    Authors: Homa Esfahanizadeh, Alejandro Cohen, Shlomo Shamai, Muriel Medard

    Abstract: Modern computationally-intensive applications often operate under time constraints, necessitating acceleration methods and distribution of computational workloads across multiple entities. However, the outcome is either achieved within the desired timeline or not, and in the latter case, valuable resources are wasted. In this paper, we introduce solutions for layered-resolution computation. These…

    Submitted 11 February, 2024; originally announced February 2024.

    Comments: 13 pages, partially appeared in proceedings of IEEE Cloudnet 2022, submitted and under review for IEEE Transactions on Signal Processing