HPP-Voice: A Large-Scale Evaluation of Speech Embeddings for Multi-Phenotypic Classification

Authors: David Krongauz, Hido Pinto, Sarah Kohn, Yanir Marmor, Eran Segal

Abstract: Human speech contains paralinguistic cues that reflect a speaker's physiological and neurological state, potentially enabling non-invasive detection of various medical phenotypes. We introduce the Human Phenotype Project Voice corpus (HPP-Voice): a dataset of 7,188 recordings in which Hebrew-speaking adults count for 30 seconds, with each speaker linked to up to 15 potentially voice-related phenotypes spanning respiratory, sleep, mental health, metabolic, immune, and neurological conditions. We present a systematic comparison of 14 modern speech embedding models, in which embeddings from these 30-second counting tasks outperform MFCCs and demographics for downstream health condition classifications. We found that embeddings learned from a speaker identification model can predict objectively measured moderate to severe sleep apnea in males with an AUC of 0.64 $\pm$ 0.03, while MFCC and demographic features led to AUCs of 0.56 $\pm$ 0.02 and 0.57 $\pm$ 0.02, respectively. Additionally, our results reveal gender-specific patterns in model effectiveness across different medical domains. For males, speaker identification and diarization models consistently outperformed speech foundation models for respiratory conditions (e.g., asthma: 0.61 $\pm$ 0.03 vs. 0.56 $\pm$ 0.02) and sleep-related conditions (insomnia: 0.65 $\pm$ 0.04 vs. 0.59 $\pm$ 0.05). For females, speaker diarization models performed best for smoking status (0.61 $\pm$ 0.02 vs. 0.55 $\pm$ 0.02), while Hebrew-specific models performed best (0.59 $\pm$ 0.02 vs. 0.58 $\pm$ 0.02) in classifying anxiety compared to speech foundation models. Our findings provide evidence that a simple counting task can support large-scale, multi-phenotypic voice screening and highlight which embedding families generalize best to specific conditions, insights that can guide future vocal biomarker research and clinical deployment.

Submitted 25 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

Comments: supplementary figures added; typos corrected
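
The HPP-Voice abstract above reports each result as an AUC with a $\pm$ spread. As a reading aid, here is a minimal sketch of one common way to obtain such numbers: a rank-based AUC plus a bootstrap estimate of its variability. The abstract does not say how its $\pm$ values were computed, so the bootstrap, the function names `auc` and `bootstrap_auc`, and the toy labels and scores are all assumptions made for illustration.

```python
import random
import statistics


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_boot=200, seed=0):
    """Mean and standard deviation of the AUC over bootstrap resamples.

    Hypothetical procedure: resample examples with replacement and skip
    resamples that contain only one class, where the AUC is undefined.
    """
    rng = random.Random(seed)
    indices = range(len(labels))
    aucs = []
    while len(aucs) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue
        aucs.append(auc(ys, [scores[i] for i in sample]))
    return statistics.mean(aucs), statistics.stdev(aucs)


if __name__ == "__main__":
    # Toy data only; not taken from the paper.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print("AUC:", auc(toy_labels, toy_scores))
    print("bootstrap mean, std:", bootstrap_auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to chance, which is why the reported gaps (for example 0.64 against 0.56 and 0.57) are best read together with their spreads.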

arXiv:2505.00949 [pdf, other]

Llama-Nemotron: Efficient Reasoning Models

Authors: Akhiad Bercovich, Itay Levy, Izik Golan, Mohammad Dabbah, Ran El-Yaniv, Omri Puny, Ido Galil, Zach Moshe, Tomer Ronen, Najeeb Nabwani, Ido Shahaf, Oren Tropp, Ehud Karpas, Ran Zilberstein, Jiaqi Zeng, Soumye Singhal, Alexander Bukharin, Yian Zhang, Tugrul Konuk, Gerald Shen, Ameya Sunil Mahabaleshwarkar, Bilal Kartal, Yoshi Suhara, Olivier Delalleau, Zijia Chen , et al. (109 additional authors not shown)

Abstract: We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three sizes -- Nano (8B), Super (49B), and Ultra (253B) -- and performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior inference throughput and memory efficiency. In this report, we discuss the training procedure for these models, which entails using neural architecture search from Llama 3 models for accelerated inference, knowledge distillation, and continued pretraining, followed by a reasoning-focused post-training stage consisting of two main parts: supervised fine-tuning and large scale reinforcement learning. Llama-Nemotron models are the first open-source models to support a dynamic reasoning toggle, allowing users to switch between standard chat and reasoning modes during inference. To further support open research and facilitate model development, we provide the following resources: 1. We release the Llama-Nemotron reasoning models -- LN-Nano, LN-Super, and LN-Ultra -- under the commercially permissive NVIDIA Open Model License Agreement. 2. We release the complete post-training dataset: Llama-Nemotron-Post-Training-Dataset. 3. We also release our training codebases: NeMo, NeMo-Aligner, and Megatron-LM.

Submitted 14 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

arXiv:2504.03624 [pdf, other]

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

Authors: NVIDIA, :, Aaron Blakeman, Aarti Basant, Abhinav Khattar, Adithya Renduchintala, Akhiad Bercovich, Aleksander Ficek, Alexis Bjorlin, Ali Taghibakhshi, Amala Sanjay Deshmukh, Ameya Sunil Mahabaleshwarkar, Andrew Tao, Anna Shors, Ashwath Aithal, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Bobby Chen, Boris Ginsburg, Boxin Wang, Brandon Norick, Brian Butterfield, Bryan Catanzaro, Carlo del Mundo , et al. (176 additional authors not shown)

Abstract: As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly becoming important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transformer model architecture with Mamba layers that perform constant computation and require constant memory per generated token. We show that Nemotron-H models offer either better or on-par accuracy compared to other similarly-sized state-of-the-art open-sourced Transformer models (e.g., Qwen-2.5-7B/72B and Llama-3.1-8B/70B), while being up to 3$\times$ faster at inference. To further increase inference speed and reduce the memory required at inference time, we created Nemotron-H-47B-Base from the 56B model using a new compression via pruning and distillation technique called MiniPuzzle. Nemotron-H-47B-Base achieves similar accuracy to the 56B model, but is 20% faster to infer. In addition, we introduce an FP8-based training recipe and show that it can achieve on par results with BF16-based training. This recipe is used to train the 56B model. We are releasing Nemotron-H base model checkpoints with support in Hugging Face and NeMo.

Submitted 15 April, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

arXiv:2504.00036 [pdf, other]

Improving Diseases Predictions Utilizing External Bio-Banks

Authors: Hido Pinto, Eran Segal

Abstract: Machine learning has been successfully used in critical domains, such as medicine. However, extracting meaningful insights from biomedical data is often constrained by the lack of available disease labels. In this research, we demonstrate how machine learning can be leveraged to enhance explainability and uncover biologically meaningful associations, even when predictive improvements in disease modeling are limited. We train LightGBM models from scratch on our dataset (10K) to impute metabolomics features and apply them to the UK Biobank (UKBB) for downstream analysis. The imputed metabolomics features are then used in survival analysis to assess their impact on disease-related risk factors. As a result, our approach successfully identified biologically relevant connections that were not previously known to the predictive models. Additionally, we applied a genome-wide association study (GWAS) on key metabolomics features, revealing a link between vascular dementia and smoking. Although this is a well-established epidemiological relationship, the link was not embedded in the model's training data, which validated the method's ability to extract meaningful signals. Furthermore, by integrating survival models as inputs in the 10K data, we uncovered associations between metabolic substances and obesity, demonstrating the ability to infer disease risk for future patients without requiring direct outcome labels. These findings highlight the potential of leveraging external bio-banks to extract valuable biomedical insights, even in data-limited scenarios. Our results demonstrate that machine learning models trained on smaller datasets can still be used to uncover real biological associations when carefully integrated with survival analysis and genetic studies.

Submitted 30 March, 2025; originally announced April 2025.

arXiv:2503.18908 [pdf, other]

FFN Fusion: Rethinking Sequential Computation in Large Language Models

Authors: Akhiad Bercovich, Mohammad Dabbah, Omri Puny, Ido Galil, Amnon Geifman, Yonatan Geifman, Izhak Golan, Ehud Karpas, Itay Levy, Zach Moshe, Najeeb Nabwani, Tomer Ronen, Itamar Schen, Elad Segal, Ido Shahaf, Oren Tropp, Ran Zilberstein, Ran El-Yaniv

Abstract: We introduce FFN Fusion, an architectural optimization technique that reduces sequential computation in large language models by identifying and exploiting natural opportunities for parallelization. Our key insight is that sequences of Feed-Forward Network (FFN) layers, particularly those remaining after the removal of specific attention layers, can often be parallelized with minimal accuracy impact. We develop a principled methodology for identifying and fusing such sequences, transforming them into parallel operations that significantly reduce inference latency while preserving model behavior. Applying these techniques to Llama-3.1-405B-Instruct, we create Llama-Nemotron-Ultra-253B-Base (Ultra-253B-Base), an efficient and soon-to-be publicly available model that achieves a 1.71X speedup in inference latency and 35X lower per-token cost while maintaining strong performance across benchmarks. Through extensive experiments on models from 49B to 253B parameters, we demonstrate that FFN Fusion becomes increasingly effective at larger scales and can complement existing optimization techniques like quantization and pruning. Most intriguingly, we find that even full transformer blocks containing both attention and FFN layers can sometimes be parallelized, suggesting new directions for neural architecture design.

Submitted 24 March, 2025; originally announced March 2025.
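
The FFN Fusion abstract above describes turning a sequence of FFN layers into parallel operations. The sketch below illustrates one reading of that idea under stated assumptions: each layer is a plain residual FFN with a ReLU activation (real Llama-style blocks use gated activations), and "fusing" means stacking the up-projections and concatenating the down-projections so that the parallel layers become a single wider FFN. All function names and the toy weights are hypothetical. In this toy setup, the step from sequential to parallel is the approximation, while the step from parallel to fused is exact.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, a) for a in vector]


def ffn(w_up, w_down, x):
    """Simplified FFN: down-projection of ReLU(up-projection of x)."""
    return matvec(w_down, relu(matvec(w_up, x)))


def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(values) for values in zip(*vectors)]


def sequential(x, layers):
    """Original form: each residual FFN sees the previous layer's output."""
    for w_up, w_down in layers:
        x = add(x, ffn(w_up, w_down, x))
    return x


def parallel(x, layers):
    """Parallelized form: every FFN reads the same input x."""
    return add(x, *(ffn(w_up, w_down, x) for w_up, w_down in layers))


def fuse(layers):
    """Build one wide FFN equal to the sum of the parallel FFNs."""
    # Stack up-projection rows: hidden units of all layers, one after another.
    w_up = [row for up, _ in layers for row in up]
    # Concatenate down-projection columns in the same hidden-unit order.
    d_model = len(layers[0][1])
    w_down = [[v for _, down in layers for v in down[i]] for i in range(d_model)]
    return w_up, w_down


def fused(x, layers):
    w_up, w_down = fuse(layers)
    return add(x, ffn(w_up, w_down, x))


if __name__ == "__main__":
    # Toy weights, chosen only for illustration.
    layer_a = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
               [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]])
    layer_b = ([[-1.0, 1.0], [2.0, 0.0]],
               [[0.5, 1.0], [1.0, 0.0]])
    x = [1.0, -2.0]
    print("sequential:", sequential(x, [layer_a, layer_b]))
    print("parallel:  ", parallel(x, [layer_a, layer_b]))
    print("fused:     ", fused(x, [layer_a, layer_b]))
```

The fused form replaces several dependent matrix products with one larger pair, which is the kind of change that can shorten the sequential critical path at inference time.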

arXiv:2412.16276 [pdf, other]

SGAC: A Graph Neural Network Framework for Imbalanced and Structure-Aware AMP Classification

Authors: Yingxu Wang, Victor Liang, Nan Yin, Siwei Liu, Eran Segal

Abstract: Classifying antimicrobial peptides (AMPs) from the vast array of peptides mined from metagenomic sequencing data is a significant approach to addressing the issue of antibiotic resistance. However, current AMP classification methods, primarily relying on sequence-based data, neglect the spatial structure of peptides, thereby limiting the accurate classification of AMPs. Additionally, the number of known AMPs is significantly lower than that of non-AMPs, leading to imbalanced datasets that reduce predictive accuracy for AMPs. To alleviate these two limitations, we first employ Omegafold to predict the three-dimensional spatial structures of AMPs and non-AMPs, constructing peptide graphs based on the amino acids' C$_\alpha$ positions. Building upon this, we propose a novel classification model named Spatial GNN-based AMP Classifier (SGAC). Our SGAC model employs a graph encoder based on Graph Neural Networks (GNNs) to process peptide graphs, generating high-dimensional representations that capture essential features from the three-dimensional spatial structure of amino acids. Then, to address the inherently imbalanced datasets, SGAC first incorporates Weight-enhanced Contrastive Learning, which clusters similar peptides while ensuring separation between dissimilar ones, using weighted contributions to emphasize AMP-specific features. Furthermore, SGAC employs Weight-enhanced Pseudo-label Distillation to dynamically generate high-confidence pseudo labels for ambiguous peptides, further refining predictions and promoting balanced learning between AMPs and non-AMPs. Experiments on publicly available AMP and non-AMP datasets demonstrate that SGAC significantly outperforms traditional sequence-based methods and achieves state-of-the-art performance among graph-based models, validating its effectiveness in AMP classification.

Submitted 20 December, 2024; originally announced December 2024.

arXiv:2412.14748 [pdf, other]

A short guide to GKZ

Authors: Ed Segal

Abstract: These notes are a brief summary of the main results from the book `Discriminants, Resultants and Multidimensional Determinants' by Gelfand-Kapranov-Zelevinsky. We sketch the key ideas involved in the proofs, using as little technical background as possible.

Submitted 19 December, 2024; originally announced December 2024.

Comments: 21 pages

MSC Class: 14-01

arXiv:2412.06993 [pdf, other]

Toward AI-Driven Digital Organism: Multiscale Foundation Models for Predicting, Simulating and Programming Biology at All Levels

Authors: Le Song, Eran Segal, Eric Xing

Abstract: We present an approach of using AI to model and simulate biology and life. Why is it important? Because at the core of medicine, pharmacy, public health, longevity, agriculture and food security, environmental protection, and clean energy, it is biology at work. Biology in the physical world is too complex to manipulate and always expensive and risky to tamper with. In this perspective, we lay out an engineering-viable approach to address this challenge by constructing an AI-Driven Digital Organism (AIDO), a system of integrated multiscale foundation models, in a modular, connectable, and holistic fashion to reflect biological scales, connectedness, and complexities. An AIDO opens up a safe, affordable and high-throughput alternative platform for predicting, simulating and programming biology at all levels from molecules to cells to individuals. We envision that an AIDO is poised to trigger a new wave of better-guided wet-lab experimentation and better-informed first-principle reasoning, which can eventually help us better decode and improve life.

Submitted 9 December, 2024; originally announced December 2024.

arXiv:2411.06518 [pdf, other]

Causal Representation Learning from Multimodal Biomedical Observations

Authors: Yuewen Sun, Lingjing Kong, Guangyi Chen, Loka Li, Gongxu Luo, Zijian Li, Yixuan Zhang, Yujia Zheng, Mengyue Yang, Petar Stojanov, Eran Segal, Eric P. Xing, Kun Zhang

Abstract: Prevalent in biomedical applications (e.g., human phenotype research), multimodal datasets can provide valuable insights into the underlying physiological mechanisms. However, current machine learning (ML) models designed to analyze these datasets often lack interpretability and identifiability guarantees, which are essential for biomedical research. Recent advances in causal representation learning have shown promise in identifying interpretable latent causal variables with formal theoretical guarantees. Unfortunately, most current work on multimodal distributions either relies on restrictive parametric assumptions or yields only coarse identification results, limiting its applicability to biomedical research that favors a detailed understanding of the mechanisms. In this work, we aim to develop flexible identification conditions for multimodal data and principled methods to facilitate the understanding of biomedical datasets. Theoretically, we consider a nonparametric latent distribution (cf. parametric assumptions in previous work) that allows for causal relationships across potentially different modalities. We establish identifiability guarantees for each latent component, extending the subspace identification results from previous work. Our key theoretical contribution is the structural sparsity of causal connections between modalities, which, as we will discuss, is natural for a large collection of biomedical systems. Empirically, we present a practical framework to instantiate our theoretical insights. We demonstrate the effectiveness of our approach through extensive experiments on both numerical and synthetic datasets. Results on a real-world human phenotype dataset are consistent with established biomedical research, validating our theoretical and methodological framework.

Submitted 16 March, 2025; v1 submitted 10 November, 2024; originally announced November 2024.

arXiv:2408.17421 [pdf, other]

Generative AI Enables Medical Image Segmentation in Ultra Low-Data Regimes

Authors: Li Zhang, Basu Jindal, Ahmed Alaa, Robert Weinreb, David Wilson, Eran Segal, James Zou, Pengtao Xie

Abstract: Semantic segmentation of medical images is pivotal in applications like disease diagnosis and treatment planning. While deep learning has excelled in automating this task, a major hurdle is the need for numerous annotated segmentation masks, which are resource-intensive to produce due to the required expertise and time. This scenario often leads to ultra low-data regimes, where annotated images are extremely limited, posing significant challenges for the generalization of conventional deep learning methods on test images. To address this, we introduce a generative deep learning framework, which uniquely generates high-quality paired segmentation masks and medical images, serving as auxiliary data for training robust models in data-scarce environments. Unlike traditional generative models that treat data generation and segmentation model training as separate processes, our method employs multi-level optimization for end-to-end data generation. This approach allows segmentation performance to directly influence the data generation process, ensuring that the generated data is specifically tailored to enhance the performance of the segmentation model. Our method demonstrated strong generalization performance across 9 diverse medical image segmentation tasks and on 16 datasets, in ultra-low data regimes, spanning various diseases, organs, and imaging modalities. When applied to various segmentation models, it achieved performance improvements of 10-20% (absolute), in both same-domain and out-of-domain scenarios. Notably, it requires 8 to 20 times less training data than existing methods to achieve comparable results. This advancement significantly improves the feasibility and cost-effectiveness of applying deep learning in medical imaging, particularly in scenarios with limited data availability.

Submitted 30 August, 2024; originally announced August 2024.

arXiv:2408.11876 [pdf]

From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis

Authors: Guy Lutsker, Gal Sapir, Smadar Shilo, Jordi Merino, Anastasia Godneva, Jerry R Greenfield, Dorit Samocha-Bonet, Raja Dhir, Francisco Gude, Shie Mannor, Eli Meirom, Gal Chechik, Hagai Rossman, Eran Segal

Abstract: Recent advances in self-supervised learning (SSL) have enabled novel medical AI models, known as foundation models, that offer great potential for better characterizing health from diverse biomedical data. Continuous glucose monitoring (CGM) provides rich, temporal data on glycemic patterns, but its full potential for predicting broader health outcomes remains underutilized. Here, we present GluFormer, a generative foundation model for CGM data that learns nuanced glycemic patterns and translates them into predictive representations of metabolic health. Trained on over 10 million CGM measurements from 10,812 adults, primarily without diabetes, GluFormer uses autoregressive token prediction to capture longitudinal glucose dynamics. We show that GluFormer generalizes to 19 external cohorts (n=6,044) spanning different ethnicities and ages, 5 countries, 8 CGM devices, and diverse pathophysiological states. GluFormer's representations exceed the performance of current CGM metrics, such as the Glucose Management Indicator (GMI), for forecasting clinical measures. In a longitudinal study of 580 adults with CGM data and 12-year follow-up, GluFormer identifies individuals at elevated risk of developing diabetes more effectively than blood HbA1C%, capturing 66% of all new-onset diabetes diagnoses in the top quartile versus 7% in the bottom quartile. Similarly, 69% of cardiovascular-death events occurred in the top quartile with none in the bottom quartile, demonstrating powerful risk stratification beyond traditional glycemic metrics. We also show that CGM representations from pre-intervention periods in Randomized Clinical Trials outperform other methods in predicting primary and secondary outcomes. When integrating dietary data into GluFormer, we show that the multi-modal version of the model can accurately generate CGM data based on dietary intake data, simulate outcomes of dietary interventions, and predict individual responses to specific foods.

Submitted 7 January, 2025; v1 submitted 20 August, 2024; originally announced August 2024.

arXiv:2404.11087 [pdf, other]

FrackyFrac: A Standalone UniFrac Calculator

Authors: Amit Lavon, Smadar Shilo, Ayya Keshet, Eran Segal

Abstract: UniFrac is a family of distance metrics over microbial abundances that take taxonomic relatedness into account. Current tools and libraries for calculating UniFrac have specific requirements regarding the user's technical expertise, operating system, and pre-installed software, which might exclude potential users. FrackyFrac is a native command-line tool that can run on any platform and has no requirements. It can also generate the phylogenetic trees required for the calculation. We show that FrackyFrac's performance is on par with currently existing implementations. FrackyFrac can make UniFrac accessible to researchers who may otherwise skip it due to the effort involved, and it can simplify analysis pipelines for those who already use it.

Submitted 17 April, 2024; originally announced April 2024.
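
The FrackyFrac abstract above describes UniFrac as a distance over microbial communities that accounts for relatedness through a phylogenetic tree. To make that concrete, here is a minimal sketch of the unweighted variant as it is commonly defined: the branch length found in exactly one of two samples, divided by the branch length found in either. This is not FrackyFrac's code or interface. The tree encoding (a dict from node to parent and branch length), the function names, and the toy tree are assumptions made for illustration.

```python
def branches_covered(tree, leaves):
    """Return every node whose branch lies on a path from a sampled leaf to the root.

    `tree` maps node -> (parent, branch_length); the root has no entry.
    """
    covered = set()
    for node in leaves:
        # Walk upward until the root, or until an already covered ancestor.
        while node in tree and node not in covered:
            covered.add(node)
            node = tree[node][0]
    return covered


def unweighted_unifrac(tree, sample_a, sample_b):
    """Unique branch length divided by total branch length observed in either sample."""
    a = branches_covered(tree, sample_a)
    b = branches_covered(tree, sample_b)
    total = sum(tree[n][1] for n in a | b)
    if total == 0:
        raise ValueError("samples cover no branch length")
    unique = sum(tree[n][1] for n in a ^ b)
    return unique / total


if __name__ == "__main__":
    # Toy tree: root -> X -> {A, B}, and root -> C.
    toy_tree = {
        "A": ("X", 1.0),
        "B": ("X", 1.0),
        "X": ("root", 1.0),
        "C": ("root", 2.0),
    }
    print(unweighted_unifrac(toy_tree, {"A", "B"}, {"B", "C"}))
    print(unweighted_unifrac(toy_tree, {"A"}, {"C"}))
```

Identical samples give a distance of 0 and samples with no shared branches give 1, so the measure behaves like a tree-aware analogue of a presence/absence dissimilarity.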

arXiv:2403.09672 [pdf, other]

COMPRER: A Multimodal Multi-Objective Pretraining Framework for Enhanced Medical Image Representation

Authors: Guy Lutsker, Hagai Rossman, Nastya Godiva, Eran Segal

Abstract: Substantial advances in multi-modal Artificial Intelligence (AI) facilitate the combination of diverse medical modalities to achieve holistic health assessments. We present COMPRER, a novel multi-modal, multi-objective pretraining framework which enhances medical-image representation, diagnostic inferences, and prognosis of diseases. COMPRER employs a multi-objective training framework, where each objective introduces distinct knowledge to the model. This includes a multimodal loss that consolidates information across different imaging modalities; a temporal loss that imparts the ability to discern patterns over time; a medical-measure prediction objective that adds appropriate medical insights; and a reconstruction loss that ensures the integrity of image structure within the latent space. Despite the concern that multiple objectives could weaken task performance, our findings show that this combination actually boosts outcomes on certain tasks. Here, we apply this framework to both fundus images and carotid ultrasound, and validate our downstream task capabilities by predicting both current and future cardiovascular conditions. COMPRER achieved higher Area Under the Curve (AUC) scores in evaluating medical conditions compared to existing models on held-out data. On the out-of-distribution (OOD) UK-Biobank dataset, COMPRER maintains favorable performance over well-established models with more parameters, even though these models were trained on $75\times$ more data than COMPRER. In addition, to better assess our model's performance in contrastive learning, we introduce a novel evaluation metric, providing deeper understanding of the effectiveness of the latent space pairing.

Submitted 4 February, 2024; originally announced March 2024.

arXiv:2402.05763 [pdf, ps, other]

The McKay correspondence in type $D_4$ via VGIT

Authors: Tarig Abdelgadir, Ed Segal

Abstract: We present an explicit GIT construction which produces both the minimal resolution of the type $D_4$ surface singularity, and also the orbifold resolution. Our construction is based on a Tannakian approach which is in principle applicable to arbitrary quotient singularities.

Submitted 8 February, 2024; originally announced February 2024.

Comments: 12 pages, comments welcome

MSC Class: 14J17; 14L24; 14D23

arXiv:2312.07160 [pdf, other]

Audience Prospecting for Dynamic-Product-Ads in Native Advertising

Authors: Eliran Abutbul, Yohay Kaplan, Naama Krasne, Oren Somekh, Or David, Omer Duvdevany, Evgeny Segal

Abstract: With yearly revenue exceeding one billion USD, Yahoo Gemini native advertising marketplace serves more than two billion impressions daily to hundreds of millions of unique users. One of the fastest growing segments of Gemini native is dynamic-product-ads (DPA), where major advertisers, such as Amazon and Walmart, provide catalogs with millions of products for the system to choose from and present to users. The subject of this work is finding and expanding the right audience for each DPA ad, which is one of the many challenges DPA presents. Approaches such as targeting various user groups, e.g., users who already visited the advertisers' websites (Retargeting), users that searched for certain products (Search-Prospecting), or users that reside in preferred locations (Location-Prospecting), have limited audience expansion capabilities. In this work we present two new approaches for audience expansion that also maintain predefined performance goals. The Conversion-Prospecting approach predicts DPA conversion rates based on Gemini native logged data, and calculates the expected cost-per-action (CPA) for determining users' eligibility to products and optimizing DPA bids in Gemini native auctions. To support new advertisers and products, the Trending-Prospecting approach matches trending products to users by learning their tendency towards products from advertisers' sites logged events. The tendency scores indicate the popularity of the product and the similarity of the user to those who have previously engaged with this product. The two new prospecting approaches were tested online, serving real Gemini native traffic, demonstrating impressive DPA delivery and DPA revenue lifts while maintaining most traffic within the acceptable CPA range (i.e., performance goal). After a successful testing phase, the proposed approaches are currently in production and serve all Gemini native traffic.

Submitted 13 December, 2023; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: In Proc. IeeeBigData'2023 (Industry and Government Program)

arXiv:2311.08979 [pdf, other]

A Multimodal Dataset of 21,412 Recorded Nights for Sleep and Respiratory Research

Authors: Alon Diament, Maria Gorodetski, Adam Jankelow, Ayya Keshet, Tal Shor, Daphna Weissglas-Volkov, Hagai Rossman, Eran Segal

Abstract: This study introduces a novel, rich dataset obtained from home sleep apnea tests using the FDA-approved WatchPAT-300 device, collected from 7,077 participants over 21,412 nights. The dataset comprises three levels of sleep data: raw multi-channel time-series from sensors, annotated sleep events, and computed summary statistics, which include 447 features related to sleep architecture, sleep apnea, and heart rate variability (HRV). We present reference values for Apnea/Hypopnea Index (AHI), sleep efficiency, Wake After Sleep Onset (WASO), and HRV sample entropy, stratified by age and sex. Moreover, we demonstrate that the dataset improves the predictive capability for various health related traits, including body composition, bone density, blood sugar levels and cardiovascular health. These results illustrate the dataset's potential to advance sleep research, personalized healthcare, and machine learning applications in biomedicine.

Submitted 15 November, 2023; originally announced November 2023.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 14 pages

arXiv:2309.01724 [pdf, other]

doi 10.1051/0004-6361/202347074

Neural network-based emulation of interstellar medium models

Authors: Pierre Palud, Lucas Einig, Franck Le Petit, Emeric Bron, Pierre Chainais, Jocelyn Chanussot, Jérôme Pety, Pierre-Antoine Thouvenin, David Languignon, Ivana Bešlić, Miriam G. Santa-Maria, Jan H. Orkisz, Léontine E. Ségal, Antoine Zakardjian, Sébastien Bardeau, Maryvonne Gerin, Javier R. Goicoechea, Pierre Gratier, Viviana V. Guzman, Annie Hughes, François Levrier, Harvey S. Liszt, Jacques Le Bourlot, Antoine Roueff, Albrecht Sievers

Abstract: The interpretation of observations of atomic and molecular tracers in the galactic and extragalactic interstellar medium (ISM) requires comparisons with state-of-the-art astrophysical models to infer some physical conditions. Usually, ISM models are too time-consuming for such inference procedures, as they call for numerous model evaluations. As a result, they are often replaced by an interpolation of a grid of precomputed models. We propose a new general method to derive faster, lighter, and more accurate approximations of the model from a grid of precomputed models. These emulators are defined with artificial neural networks (ANNs) designed and trained to address the specificities inherent in ISM models. Indeed, such models often predict many observables (e.g., line intensities) from just a few input physical parameters and can yield outliers due to numerical instabilities or physical bistabilities. We propose applying five strategies to address these characteristics: 1) an outlier removal procedure; 2) a clustering method that yields homogeneous subsets of lines that are simpler to predict with different ANNs; 3) a dimension reduction technique that makes it possible to adequately size the network architecture; 4) the physical inputs are augmented with a polynomial transform to ease the learning of nonlinearities; and 5) a dense architecture to ease the learning of simple relations. We compare the proposed ANNs with standard classes of interpolation methods to emulate the Meudon PDR code, a representative ISM numerical model.
Combinations of the proposed strategies outperform all interpolation methods by a factor of 2 on the average error, reaching 4.5% on the Meudon PDR code. These networks are also 1000 times faster than accurate interpolation methods and require ten to forty times less memory. This work will enable efficient inferences on wide-field multiline observations of the ISM.

Submitted 4 September, 2023; originally announced September 2023.

Journal ref: A&A 678, A198 (2023)

arXiv:2306.04971 [pdf, other]

A Melting Pot of Evolution and Learning

Authors: Moshe Sipper, Achiya Elyasaf, Tomer Halperin, Zvika Haramaty, Raz Lapid, Eyal Segal, Itai Tzruia, Snir Vitrack Tamam

Abstract: We survey eight recent works by our group, involving the successful blending of evolutionary algorithms with machine learning and deep learning: 1. Binary and Multinomial Classification through Evolutionary Symbolic Regression, 2. Classy Ensemble: A Novel Ensemble Algorithm for Classification, 3. EC-KitY: Evolutionary Computation Tool Kit in Python, 4. Evolution of Activation Functions for Deep Learning-Based Image Classification, 5. Adaptive Combination of a Genetic Algorithm and Novelty Search for Deep Neuroevolution, 6. An Evolutionary, Gradient-Free, Query-Efficient, Black-Box Algorithm for Generating Adversarial Instances in Deep Networks, 7. Foiling Explanations in Deep Neural Networks, 8. Patch of Invisibility: Naturalistic Black-Box Adversarial Attacks on Object Detectors.

Submitted 8 June, 2023; originally announced June 2023.

Comments: To Appear in Proceedings of Genetic Programming Theory & Practice XX, 2023

arXiv:2304.10969 [pdf, ps, other]

Equivariant Fukaya categories at singular values

Authors: Yanki Lekili, Ed Segal

Abstract: Given a Hamiltonian torus action on a symplectic manifold, Teleman and Fukaya have proposed that the Fukaya category of each symplectic quotient should be equivalent to an equivariant Fukaya category of the original manifold. We lay out new conjectures that extend this story - in certain situations - to singular values of the moment map. These include a proposal for how, in some cases, we can recover the non-equivariant Fukaya category of the original manifold starting from data on the quotient. To justify our conjectures we pass through the mirror and work out numerous examples, using well-established heuristics in toric mirror symmetry. We also discuss the algebraic and categorical structures that underlie our story.

Submitted 21 April, 2023; originally announced April 2023.

MSC Class: 53D37; 14J33

arXiv:2211.00262 [pdf, other]

Training Vision-Language Models with Less Bimodal Supervision

Authors: Elad Segal, Ben Bogin, Jonathan Berant

Abstract: Standard practice in pretraining multimodal models, such as vision-language models, is to rely on pairs of aligned inputs from both modalities, for example, aligned image-text pairs. However, such pairs can be difficult to obtain in low-resource settings and for some modality pairs (e.g., structured tables and images). In this work, we investigate the extent to which we can reduce the reliance on such parallel data, which we term \emph{bimodal supervision}, and use models that are pretrained on each modality independently. We experiment with a high-performing vision-language model, and analyze the effect of bimodal supervision on three vision-language tasks. We find that on simpler tasks, such as VQAv2 and GQA, one can eliminate bimodal supervision completely, suffering only a minor loss in performance. Conversely, for NLVR2, which requires more complex reasoning, training without bimodal supervision leads to random performance. Nevertheless, using only 5\% of the bimodal data (142K images along with their captions), or leveraging weak supervision in the form of a list of machine-generated labels for each image, leads to only a moderate degradation compared to using 3M image-text pairs: 74\%$\rightarrow$$\sim$70\%. Our code is available at https://github.com/eladsegal/less-bimodal-sup.

Submitted 1 November, 2022; originally announced November 2022.

Comments: AKBC 2022

arXiv:2209.03618 [pdf, other]

Adaptive Combination of a Genetic Algorithm and Novelty Search for Deep Neuroevolution

Authors: Eyal Segal, Moshe Sipper

Abstract: Evolutionary Computation (EC) has been shown to be able to quickly train Deep Artificial Neural Networks (DNNs) to solve Reinforcement Learning (RL) problems. While a Genetic Algorithm (GA) is well-suited for exploiting reward functions that are neither deceptive nor sparse, it struggles when the reward function is either of those. To that end, Novelty Search (NS) has been shown to be able to outperform gradient-following optimizers in some cases, while under-performing in others. We propose a new algorithm: Explore-Exploit $γ$-Adaptive Learner ($E^2γAL$, or EyAL). By preserving a dynamically-sized niche of novelty-seeking agents, the algorithm manages to maintain population diversity, exploiting the reward signal when possible and exploring otherwise. The algorithm combines both the exploitation power of a GA and the exploration power of NS, while maintaining their simplicity and elegance. Our experiments show that EyAL outperforms NS in most scenarios, while being on par with a GA -- and in some scenarios it can outperform both. EyAL also allows the substitution of the exploiting component (GA) and the exploring component (NS) with other algorithms, e.g., Evolution Strategy and Surprise Search, thus opening the door for future research.

Submitted 8 September, 2022; originally announced September 2022.

Journal ref: Proceedings of the 14th International Joint Conference on Computational Intelligence (IJCCI 2022)

arXiv:2208.08895 [pdf]

Self-Assembled Fatty Acid Crystalline Coatings Display Non-Toxic Superhydrophobic Antimicrobial Properties

Authors: Elena Prudnikov, Iryna Polishchuk, Andy Sand, Hanan Abu Hamad, Naama Massad-Ivanir, Ester Segal, Boaz Pokroy

Abstract: Superhydrophobicity is a well-known wetting phenomenon found in numerous plants and insects. It is achieved by the combination of the surface's chemical properties and its surface roughness. Inspired by nature, numerous synthetic superhydrophobic surfaces have been developed for various applications. Designated surface coating is one of the fabrication routes to achieve superhydrophobicity. Yet, many of these coatings, such as fluorine-based formulations, may pose severe health and environmental risks, limiting their applicability. Herein, we present a new family of superhydrophobic coatings comprised of natural saturated fatty acids, which are not only a part of our daily diet, but can be produced from renewable feedstock, providing a safe and sustainable alternative to the existing state of the art. These crystalline coatings are readily fabricated via single-step deposition routes, thermal deposition or spray-coating. The fatty acids self-assemble into highly hierarchical crystalline structures exhibiting a water contact angle of about 165 degrees and contact angle hysteresis lower than 6 degrees, while their properties and morphology depend on the specific fatty acid used as well as on the deposition technique. Moreover, the fatty acid coatings demonstrate excellent thermal stability. Importantly, this new family of coatings displays excellent anti-biofouling and antimicrobial properties against Escherichia coli and Listeria innocua, used as relevant model Gram-negative and Gram-positive bacteria, respectively.
We believe that these coatings have great application potential in fields where other alternatives are prohibited due to safety limitations, while at the same time their usage in other regulation-free applications is not limited.

Submitted 10 August, 2022; originally announced August 2022.

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline.
Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2205.04793 [pdf, ps, other]

Serre functors of residual categories via hybrid models

Authors: Federico Barbacovi, Ed Segal

Abstract: In this short note we observe that the Serre functor on the residual category of a complete intersection can be easily described in the framework of hybrid models. Using this description we recover some recent results of Kuznetsov and Perry.

Submitted 23 May, 2023; v1 submitted 10 May, 2022; originally announced May 2022.

Comments: v1:9 pages. v2: We thank the referee for pointing out that our Prop. 2.1.2 had already appeared in a paper of Favero-Kelly, with an identical proof. To appear in Bull. LMS

MSC Class: 14F08

arXiv:2201.03533 [pdf, other]

SCROLLS: Standardized CompaRison Over Long Language Sequences

Authors: Uri Shaham, Elad Segal, Maor Ivgi, Avia Efrat, Ori Yoran, Adi Haviv, Ankit Gupta, Wenhan Xiong, Mor Geva, Jonathan Berant, Omer Levy

Abstract: NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild. We introduce SCROLLS, a suite of tasks that require reasoning over long texts. We examine existing long-text datasets, and handpick ones where the text is naturally long, while prioritizing tasks that involve synthesizing information across the input. SCROLLS contains summarization, question answering, and natural language inference tasks, covering multiple domains, including literature, science, business, and entertainment. Initial baselines, including Longformer Encoder-Decoder, indicate that there is ample room for improvement on SCROLLS. We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.

Submitted 11 October, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

Comments: EMNLP 2022

arXiv:2106.05745 [pdf, other]

Line fields on punctured surfaces and twisted derived categories

Authors: Ed Segal

Abstract: The Fukaya category of a punctured surface can be reconstructed from a pair-of-pants decomposition using a formal construction that attaches a category to a trivalent graph. We extend this formal construction to include a choice of line field on the surface; this requires a certain decoration on the graph. On the mirror side we show that this leads to a kind of twisted derived category which has not been widely studied. Mirror symmetry predicts that our category should be an invariant of decorated graphs and we prove that this is indeed the case, using only B-model methods. We also give B-model proofs that a few different mirror constructions are equivalent.

Submitted 10 June, 2021; originally announced June 2021.

Comments: 47 pages

MSC Class: 14F08; 14J33

arXiv:2103.07524 [pdf]

doi 10.1021/acssensors.1c00787

Signal Processing Techniques to Reduce the Limit of Detection for Thin Film Biosensors

Authors: Simon J. Ward, Rabeb Layouni, Sofia Arshavsky-Graham, Ester Segal, Sharon M. Weiss

Abstract: The ultimate detection limit of optical biosensors is often limited by various noise sources, including those introduced by the optical measurement setup. While sophisticated modifications to instrumentation may reduce noise, a simpler approach that can benefit all sensor platforms is the application of signal processing to minimize the deleterious effects of noise. In this work, we show that applying complex Morlet wavelet convolution to Fabry-Pérot interference fringes characteristic of thin film reflectometric biosensors effectively filters out white noise and low frequency reflectance variations. Subsequent calculation of an average difference in phase between the filtered analyte and reference signals enables a significant reduction in the limit of detection (LOD) enabling closer competition with current state-of-the-art techniques. This method is applied on experimental data sets of thin film porous silicon sensors (PSi) in buffered solution and complex media obtained from two different laboratories. The demonstrated improvement in LOD achieved using wavelet convolution and average phase difference paves the way for PSi optical biosensors to operate with clinically relevant detection limits for medical diagnostics, environmental monitoring, and food safety.

Submitted 12 March, 2021; originally announced March 2021.

Comments: 11 pages, 3 Figures, 2 Tables

Journal ref: ACS Sensors 6, 2967-2978 (2021)

arXiv:2102.08412 [pdf, other]

doi 10.1007/s00220-021-04298-2

Discriminants and semi-orthogonal decompositions

Authors: Alex Kite, Ed Segal

Abstract: The derived categories of toric varieties admit semi-orthogonal decompositions coming from wall-crossing in GIT. We prove that these decompositions satisfy a Jordan-Hölder property: the subcategories that appear, and their multiplicities, are independent of the choices made. For Calabi-Yau toric varieties wall-crossing instead gives derived equivalences and autoequivalences, and mirror symmetry relates these to monodromy around the GKZ discriminant locus. We formulate a conjecture equating intersection multiplicities in the discriminant with the multiplicities appearing in certain semi-orthogonal decompositions. We then prove this conjecture in some cases.

Submitted 2 February, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: v1: 21 pages. v2: minor revisions. Published in Comm. Math. Phys

MSC Class: 14F08; 14J33

arXiv:2101.02235 [pdf, other]

Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies

Authors: Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, Jonathan Berant

Abstract: A key limitation in current datasets for multi-hop reasoning is that the required steps for answering the question are mentioned in it explicitly. In this work, we introduce StrategyQA, a question answering (QA) benchmark where the required reasoning steps are implicit in the question, and should be inferred using a strategy. A fundamental challenge in this setup is how to elicit such creative questions from crowdsourcing workers, while covering a broad range of potential strategies. We propose a data collection procedure that combines term-based priming to inspire annotators, careful control over the annotator population, and adversarial filtering for eliminating reasoning shortcuts. Moreover, we annotate each question with (1) a decomposition into reasoning steps for answering it, and (2) Wikipedia paragraphs that contain the answers to each step. Overall, StrategyQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs. Analysis shows that questions in StrategyQA are short, topic-diverse, and cover a wide range of strategies. Empirically, we show that humans perform well (87%) on this task, while our best baseline reaches an accuracy of $\sim$66%.

Submitted 6 January, 2021; originally announced January 2021.

Comments: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2021. Author's final version

arXiv:1909.13375 [pdf, other]

A Simple and Effective Model for Answering Multi-span Questions

Authors: Elad Segal, Avia Efrat, Mor Shoham, Amir Globerson, Jonathan Berant

Abstract: Models for reading comprehension (RC) commonly restrict their output space to the set of all single contiguous spans from the input, in order to alleviate the learning problem and avoid the need for a model that generates text explicitly. However, forcing an answer to be a single span can be restrictive, and some recent datasets also include multi-span questions, i.e., questions whose answer is a set of non-contiguous spans in the text. Naturally, models that return single spans cannot answer these questions. In this work, we propose a simple architecture for answering multi-span questions by casting the task as a sequence tagging problem, namely, predicting for each input token whether it should be part of the output or not. Our model substantially improves performance on span extraction questions from DROP and Quoref by 9.9 and 5.5 EM points respectively.

Submitted 5 October, 2020; v1 submitted 29 September, 2019; originally announced September 2019.

Comments: EMNLP 2020

arXiv:1805.08691 [pdf, other]

Highly Efficient 8-bit Low Precision Inference of Convolutional Neural Networks with IntelCaffe

Authors: Jiong Gong, Haihao Shen, Guoming Zhang, Xiaoli Liu, Shane Li, Ge Jin, Niharika Maheshwari, Evarist Fomenko, Eden Segal

Abstract: High throughput and low latency inference of deep neural networks are critical for the deployment of deep learning applications. This paper presents the efficient inference techniques of IntelCaffe, the first Intel optimized deep learning framework that supports efficient 8-bit low precision inference and model optimization techniques of convolutional neural networks on Intel Xeon Scalable Processors. The 8-bit optimized model is automatically generated with a calibration process from the FP32 model without the need for fine-tuning or retraining. We show that the inference throughput and latency with ResNet-50, Inception-v3 and SSD are improved by 1.38X-2.9X and 1.35X-3X respectively with negligible accuracy loss from the IntelCaffe FP32 baseline and by 56X-75X and 26X-37X from BVLC Caffe. All these techniques have been open-sourced on IntelCaffe GitHub, and the artifact is provided to reproduce the result on Amazon AWS Cloud.

Submitted 4 May, 2018; originally announced May 2018.

Comments: 1st Reproducible Tournament on Pareto-efficient Image Classification, co-held with ASPLOS 2018

arXiv:1805.06440 [pdf, other]

Regularization Learning Networks: Deep Learning for Tabular Datasets

Authors: Ira Shavitt, Eran Segal

Abstract: Despite their impressive performance, Deep Neural Networks (DNNs) typically underperform Gradient Boosting Trees (GBTs) on many tabular-dataset learning tasks. We propose that applying a different regularization coefficient to each weight might boost the performance of DNNs by allowing them to make more use of the more relevant inputs. However, this will lead to an intractable number of hyperparameters. Here, we introduce Regularization Learning Networks (RLNs), which overcome this challenge by introducing an efficient hyperparameter tuning scheme which minimizes a new Counterfactual Loss. Our results show that RLNs significantly improve DNNs on tabular datasets, and achieve comparable results to GBTs, with the best performance achieved with an ensemble that combines GBTs and RLNs. RLNs produce extremely sparse networks, eliminating up to 99.8% of the network edges and 82% of the input features, thus providing more interpretable models and revealing the importance that the network assigns to different inputs. RLNs could efficiently learn a single network in datasets that comprise both tabular and unstructured data, such as in the setting of medical imaging accompanied by electronic health records. An open source implementation of RLN can be found at https://github.com/irashavitt/regularization_learning_networks.

Submitted 23 October, 2018; v1 submitted 16 May, 2018; originally announced May 2018.

Comments: Accepted to the 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montreal, Canada

arXiv:1705.01366 [pdf, ps, other]

A non-commutative Bertini theorem

Authors: Jørgen Vold Rennemo, Ed Segal, Michel Van den Bergh

Abstract: We prove a version of the classical 'generic smoothness' theorem with smooth varieties replaced by non-commutative resolutions of singular varieties. This in particular implies a non-commutative version of the Bertini theorem.

Submitted 22 June, 2020; v1 submitted 3 May, 2017; originally announced May 2017.

Comments: 6 pages. v2: added funder acknowledgement. Published in J. Noncommutative Geometry

MSC Class: 14A22 (Primary); 14E15; 16S38 (Secondary)

Journal ref: J. Noncommutative Geometry 13 (2019), no. 2, 609-616

arXiv:1609.04045 [pdf, ps, other]

doi 10.1215/00127094-2019-0014

Hori-mological projective duality

Authors: Jørgen Vold Rennemo, Ed Segal

Abstract: Kuznetsov has conjectured that Pfaffian varieties should admit non-commutative crepant resolutions which satisfy his Homological Projective Duality. We prove half the cases of this conjecture, by interpreting and proving a duality of non-abelian gauged linear sigma models proposed by Hori.

Submitted 22 June, 2020; v1 submitted 13 September, 2016; originally announced September 2016.

Comments: 55 pages. V2: slightly rewritten to take advantage of the `non-commutative Bertini theorem' recently proved by the authors and Van den Bergh. V3: lots of changes in exposition following referees' comments. Section 5 has been mostly cut because it was boring. To appear in Duke Math. J. V3: added funder acknowledgement

MSC Class: 14F05; 81T30; 16E35; 16S38

Journal ref: Duke Math. J. 168, no. 11 (2019), 2127-2205

arXiv:1603.06717 [pdf, ps, other]

All autoequivalences are spherical twists

Authors: Ed Segal

Abstract: In this short note we observe that, for purely formal reasons, any autoequivalence can be constructed as a twist around a spherical functor. As an example, we show how the P-twists constructed by Huybrechts and Thomas can be formulated as spherical twists.

Submitted 13 October, 2020; v1 submitted 22 March, 2016; originally announced March 2016.

Comments: 11 pages at a relaxed pace. V2: Added funder acknowledgement. Published in IMRN. V3: `Proposition' 3.11 (which was only sketched) is wrong - added a counterexample due to Merlin Christ

MSC Class: 14F05; 18E30

Journal ref: Int. Math. Res. Not. 2018 (2018), no. 10, 3137-3154

arXiv:1506.06999 [pdf, ps, other]

doi 10.1112/blms/bdw026

A new 5-fold flop and derived equivalence

Authors: Ed Segal

Abstract: We describe a new example of a flop in 5-dimensions, due to Roland Abuaf, with the nice feature that the contracting loci on either side are not isomorphic. We prove that the two sides are derived equivalent.

Submitted 27 January, 2016; v1 submitted 23 June, 2015; originally announced June 2015.

Comments: v1. It may well be that this example has appeared before - references welcome! v2. Minor changes. Final version, to appear in Bull. London Math. Soc

MSC Class: 14E05; 13D09

arXiv:1410.6829 [pdf, ps, other]

Quintic threefolds and Fano elevenfolds

Authors: Ed Segal, Richard P. Thomas

Abstract: The derived category of coherent sheaves on a general quintic threefold is a central object in mirror symmetry. We show that it can be embedded into the derived category of a certain Fano elevenfold. Our proof also generates related examples in different dimensions.

Submitted 17 November, 2015; v1 submitted 24 October, 2014; originally announced October 2014.

Comments: V1: 12 pages. V2: added reference to work of Iliev and Manivel. V3: persistent sign error corrected. Other minor changes following referee's suggestions. To appear in Crelle

MSC Class: 14F05; 14J33

arXiv:1410.0027 [pdf, ps, other]

doi 10.4310/PAMQ.2015.v11.n2.a3

K-Theoretic and Categorical Properties of Toric Deligne--Mumford Stacks

Authors: Tom Coates, Hiroshi Iritani, Yunfeng Jiang, Ed Segal

Abstract: We prove the following results for toric Deligne-Mumford stacks, under minimal compactness hypotheses: the Localization Theorem in equivariant K-theory; the equivariant Hirzebruch-Riemann-Roch theorem; the Fourier--Mukai transformation associated to a crepant toric wall-crossing gives an equivariant derived equivalence.

Submitted 25 August, 2016; v1 submitted 30 September, 2014; originally announced October 2014.

Comments: 14 pages, no figures. v2: references updated, v3: minor revision, final version

MSC Class: 14A20 (Primary); 19L47; 14F05 (Secondary)

Journal ref: Pure and Applied Mathematics Quarterly, Vol. 11, No. 2 (2015), pp. 239-266

arXiv:1401.3661 [pdf, ps, other]

doi 10.14231/AG-2015-015

The Pfaffian-Grassmannian equivalence revisited

Authors: Nicolas Addington, Will Donovan, Ed Segal

Abstract: We give a new proof of the 'Pfaffian-Grassmannian' derived equivalence between certain pairs of non-birational Calabi-Yau threefolds. Our proof follows the physical constructions of Hori and Tong, and we factor the equivalence into three steps by passing through some intermediate categories of (global) matrix factorizations. The first step is global Knoerrer periodicity, the second comes from a birational map between Landau-Ginzburg B-models, and for the third we develop some new techniques.

Submitted 25 November, 2014; v1 submitted 15 January, 2014; originally announced January 2014.

Comments: Improved exposition, minor corrections. 32 pages

MSC Class: Primary 14F05; 14J32; 18E30; 81T30; Secondary 14M15

Journal ref: Alg. Geom. 2(3):332-364, 2015

arXiv:1310.7877 [pdf, ps, other]

doi 10.1007/s00220-014-2226-3

Mixed braid group actions from deformations of surface singularities

Authors: Will Donovan, Ed Segal

Abstract: We consider a set of toric Calabi-Yau varieties which arise as deformations of the small resolutions of type A surface singularities. By careful analysis of the heuristics of B-brane transport in the associated GLSMs, we predict the existence of a mixed braid group action on the derived category of each variety, and then prove that this action does indeed exist. This generalizes the braid group action found by Seidel and Thomas for the undeformed resolutions. We also show that the actions for different deformations are related, in a way that is predicted by the physical heuristics.

Submitted 29 October, 2013; originally announced October 2013.

Comments: 37 pages, including many figures and examples

MSC Class: Primary 14F05; 18E30; Secondary 14J33; 20F36

arXiv:1301.2289 [pdf]

Exact Inference in Networks with Discrete Children of Continuous Parents

Authors: Uri Lerner, Eran Segal, Daphne Koller

Abstract: Many real life domains contain a mixture of discrete and continuous variables and can be modeled as hybrid Bayesian Networks. An important subclass of hybrid BNs are conditional linear Gaussian (CLG) networks, where the conditional distribution of the continuous variables given an assignment to the discrete variables is a multivariate Gaussian. Lauritzen's extension to the clique tree algorithm can be used for exact inference in CLG networks. However, many domains also include discrete variables that depend on continuous ones, and CLG networks do not allow such dependencies to be represented. No exact inference algorithm has been proposed for these enhanced CLG networks. In this paper, we generalize Lauritzen's algorithm, providing the first "exact" inference algorithm for augmented CLG networks - networks where continuous nodes are conditional linear Gaussians but that also allow discrete children of continuous parents. Our algorithm is exact in the sense that it computes the exact distributions over the discrete nodes, and the exact first and second moments of the continuous ones, up to the accuracy obtained by numerical integration used within the algorithm. When the discrete children are modeled with softmax CPDs (as is the case in many real world domains) the approximation of the continuous distributions using the first two moments is particularly accurate. Our algorithm is simple to implement and often comparable in its complexity to Lauritzen's algorithm. We show empirically that it achieves substantially higher accuracy than previous approximate algorithms.

Submitted 10 January, 2013; originally announced January 2013.

Comments: Appears in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI2001)

Report number: UAI-P-2001-PG-319-328

arXiv:1212.2517 [pdf]

Learning Module Networks

Authors: Eran Segal, Dana Pe'er, Aviv Regev, Daphne Koller, Nir Friedman

Abstract: Methods for learning Bayesian network structure can discover dependency structure between observed variables, and have been shown to be useful in many applications. However, in domains that involve a large number of variables, the space of possible network structures is enormous, making it difficult, for both computational and statistical reasons, to identify a good model. In this paper, we consider a solution to this problem, suitable for domains where many variables have similar behavior. Our method is based on a new class of models, which we call module networks. A module network explicitly represents the notion of a module - a set of variables that have the same parents in the network and share the same conditional probability distribution. We define the semantics of module networks, and describe an algorithm that learns a module network from data. The algorithm learns both the partitioning of the variables into modules and the dependency structure between the variables. We evaluate our algorithm on synthetic data, and on real data in the domains of gene expression and the stock market. Our results show that module networks generalize better than Bayesian networks, and that the learned module network structure reveals regularities that are obscured in learned Bayesian networks.

Submitted 19 October, 2012; originally announced December 2012.

Comments: Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003)

Report number: UAI-P-2003-PG-525-534

arXiv:1211.2446 [pdf, ps, other]

doi 10.4310/ATMP.2014.v18.n6.a5

D-brane probes, branched double covers, and noncommutative resolutions

Authors: Nicolas Addington, Edward Segal, Eric Sharpe

Abstract: This paper describes D-brane probes of theories arising in abelian gauged linear sigma models (GLSMs) describing branched double covers and noncommutative resolutions thereof, via nonperturbative effects rather than as the critical locus of a superpotential. As these theories can be described as IR limits of Landau-Ginzburg models, technically this paper is an exercise in utilizing (sheafy) matrix factorizations. For Landau-Ginzburg models which are believed to flow in the IR to smooth branched double covers, our D-brane probes recover the structure of the branched double cover (and flat nontrivial B fields), verifying previous results. In addition to smooth branched double covers, the same class of Landau-Ginzburg models is also believed to sometimes flow to `noncommutative resolutions' of singular spaces. These noncommutative resolutions are abstract conformal field theories without a global geometric description, but D-brane probes perceive them as non-Kahler small resolutions of a singular Calabi-Yau. We conjecture that such non-Kahler small resolutions are typical in D-brane probes of such theories.

Submitted 11 November, 2012; originally announced November 2012.

Comments: 61 pages, LaTeX

Journal ref: Adv. Theor. Math. Phys. 18(6):1369-1436, 2014

arXiv:1206.0219 [pdf, ps, other]

doi 10.1112/S0010437X13007641

Window shifts, flop equivalences and Grassmannian twists

Authors: Will Donovan, Ed Segal

Abstract: We introduce a new class of autoequivalences that act on the derived categories of certain vector bundles over Grassmannians. These autoequivalences arise from Grassmannian flops: they generalize Seidel-Thomas spherical twists, which can be seen as arising from standard flops. We first give a simple algebraic construction, which is well-suited to explicit computations. We then give a geometric construction using spherical functors which we prove is equivalent.

Submitted 2 October, 2012; v1 submitted 1 June, 2012; originally announced June 2012.

Comments: Improved structure and formatting. Minor edits to some explanations. Added acknowledgements and addresses. 38 pages, 7 figures

MSC Class: 14F05; 18E30 (Primary) 14M15 (Secondary)

Journal ref: Compositio Math. 150 (2014) 942-978

arXiv:0910.5534 [pdf, ps, other]

doi 10.1007/s00220-011-1232-y

Equivalences between GIT quotients of Landau-Ginzburg B-models

Authors: Ed Segal

Abstract: We define the category of B-branes in a (not necessarily affine) Landau-Ginzburg B-model, incorporating the notion of R-charge. Our definition is a direct generalization of the category of perfect complexes. We then consider pairs of Landau-Ginzburg B-models that arise as different GIT quotients of a vector space by a one-dimensional torus, and show that for each such pair the two categories of B-branes are quasi-equivalent. In fact we produce a whole set of quasi-equivalences indexed by the integers, and show that the resulting auto-equivalences are all spherical twists.

Submitted 24 November, 2010; v1 submitted 29 October, 2009; originally announced October 2009.

Comments: v3: Added two references. Final version, to appear in Comm. Math. Phys

Journal ref: Commun.Math.Phys.304:411-432,2011

arXiv:0904.1339 [pdf, other]

The closed state space of affine Landau-Ginzburg B-models

Authors: Ed Segal

Abstract: We study the category of perfect cdg-modules over a curved algebra, and in particular the category of B-branes in an affine Landau-Ginzburg model. We construct an explicit chain map from the Hochschild complex of the category to the closed state space of the model, and prove that this is a quasi-isomorphism from the Borel-Moore Hochschild complex. Using the lowest-order term of our map we derive Kapustin and Li's formula for the correlator of an open-string state over a disc.

Submitted 19 April, 2011; v1 submitted 8 April, 2009; originally announced April 2009.

Comments: Completely rewritten due to errors in the first version

arXiv:0902.3239 [pdf, ps, other]

Gauge Theory in higher dimensions, II

Authors: Simon Donaldson, Ed Segal

Abstract: The main aim of the paper is to develop the "Floer theory" associated to Calabi-Yau 3-folds, extending the analogy of Thomas' "holomorphic Casson invariant". The treatment in the body of the paper is largely formal, assuming appropriate compactness properties of moduli spaces of $G_{2}$-instantons, but in the last section we make some remarks about these compactness issues. Section 3 of the paper contains a general discussion of deformations of the equations, for gauge field and submanifolds, associated to manifolds with exceptional holonomy.

Submitted 18 February, 2009; originally announced February 2009.

arXiv:math/0702539 [pdf, ps, other]

The A-infinity Deformation Theory of a Point and the Derived Categories of Local Calabi-Yaus

Authors: Ed Segal

Abstract: Let A be an augmented algebra over a semi-simple algebra S. We show that the Ext algebra of S as an A-module, enriched with its natural A-infinity structure, can be used to reconstruct the completion of A at the augmentation ideal. We use this technical result to justify a calculation in the physics literature describing algebras that are derived equivalent to certain non-compact Calabi-Yau three-folds. Since the calculation produces superpotentials for these algebras we also include some discussion of superpotential algebras and their invariants.

Submitted 11 July, 2008; v1 submitted 19 February, 2007; originally announced February 2007.

Comments: Final version, to be published in J. Algebra

Showing 1–48 of 48 results for author: Segal, E