-
HPP-Voice: A Large-Scale Evaluation of Speech Embeddings for Multi-Phenotypic Classification
Authors:
David Krongauz,
Hido Pinto,
Sarah Kohn,
Yanir Marmor,
Eran Segal
Abstract:
Human speech contains paralinguistic cues that reflect a speaker's physiological and neurological state, potentially enabling non-invasive detection of various medical phenotypes. We introduce the Human Phenotype Project Voice corpus (HPP-Voice): a dataset of 7,188 recordings in which Hebrew-speaking adults count for 30 seconds, with each speaker linked to up to 15 potentially voice-related phenotypes spanning respiratory, sleep, mental health, metabolic, immune, and neurological conditions. We present a systematic comparison of 14 modern speech embedding models and show that embeddings from these 30-second counting tasks outperform MFCCs and demographics for downstream health condition classification. We found that embeddings learned from a speaker identification model can predict objectively measured moderate to severe sleep apnea in males with an AUC of 0.64 $\pm$ 0.03, while MFCC and demographic features led to AUCs of 0.56 $\pm$ 0.02 and 0.57 $\pm$ 0.02, respectively. Additionally, our results reveal gender-specific patterns in model effectiveness across different medical domains. For males, speaker identification and diarization models consistently outperformed speech foundation models for respiratory conditions (e.g., asthma: 0.61 $\pm$ 0.03 vs. 0.56 $\pm$ 0.02) and sleep-related conditions (insomnia: 0.65 $\pm$ 0.04 vs. 0.59 $\pm$ 0.05). For females, speaker diarization models performed best for smoking status (0.61 $\pm$ 0.02 vs. 0.55 $\pm$ 0.02), while Hebrew-specific models performed best (0.59 $\pm$ 0.02 vs. 0.58 $\pm$ 0.02) in classifying anxiety compared to speech foundation models. Our findings provide evidence that a simple counting task can support large-scale, multi-phenotypic voice screening and highlight which embedding families generalize best to specific conditions, insights that can guide future vocal biomarker research and clinical deployment.
Submitted 25 May, 2025; v1 submitted 22 May, 2025;
originally announced May 2025.
-
Llama-Nemotron: Efficient Reasoning Models
Authors:
Akhiad Bercovich,
Itay Levy,
Izik Golan,
Mohammad Dabbah,
Ran El-Yaniv,
Omri Puny,
Ido Galil,
Zach Moshe,
Tomer Ronen,
Najeeb Nabwani,
Ido Shahaf,
Oren Tropp,
Ehud Karpas,
Ran Zilberstein,
Jiaqi Zeng,
Soumye Singhal,
Alexander Bukharin,
Yian Zhang,
Tugrul Konuk,
Gerald Shen,
Ameya Sunil Mahabaleshwarkar,
Bilal Kartal,
Yoshi Suhara,
Olivier Delalleau,
Zijia Chen
, et al. (109 additional authors not shown)
Abstract:
We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three sizes -- Nano (8B), Super (49B), and Ultra (253B) -- and performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior inference throughput and memory efficiency. In this report, we discuss the training procedure for these models, which entails using neural architecture search from Llama 3 models for accelerated inference, knowledge distillation, and continued pretraining, followed by a reasoning-focused post-training stage consisting of two main parts: supervised fine-tuning and large scale reinforcement learning. Llama-Nemotron models are the first open-source models to support a dynamic reasoning toggle, allowing users to switch between standard chat and reasoning modes during inference. To further support open research and facilitate model development, we provide the following resources: 1. We release the Llama-Nemotron reasoning models -- LN-Nano, LN-Super, and LN-Ultra -- under the commercially permissive NVIDIA Open Model License Agreement. 2. We release the complete post-training dataset: Llama-Nemotron-Post-Training-Dataset. 3. We also release our training codebases: NeMo, NeMo-Aligner, and Megatron-LM.
Submitted 14 May, 2025; v1 submitted 1 May, 2025;
originally announced May 2025.
-
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
Authors:
NVIDIA,
Aaron Blakeman,
Aarti Basant,
Abhinav Khattar,
Adithya Renduchintala,
Akhiad Bercovich,
Aleksander Ficek,
Alexis Bjorlin,
Ali Taghibakhshi,
Amala Sanjay Deshmukh,
Ameya Sunil Mahabaleshwarkar,
Andrew Tao,
Anna Shors,
Ashwath Aithal,
Ashwin Poojary,
Ayush Dattagupta,
Balaram Buddharaju,
Bobby Chen,
Boris Ginsburg,
Boxin Wang,
Brandon Norick,
Brian Butterfield,
Bryan Catanzaro,
Carlo del Mundo
, et al. (176 additional authors not shown)
Abstract:
As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly becoming important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transformer model architecture with Mamba layers that perform constant computation and require constant memory per generated token. We show that Nemotron-H models offer either better or on-par accuracy compared to other similarly-sized state-of-the-art open-sourced Transformer models (e.g., Qwen-2.5-7B/72B and Llama-3.1-8B/70B), while being up to 3$\times$ faster at inference. To further increase inference speed and reduce the memory required at inference time, we created Nemotron-H-47B-Base from the 56B model using a new compression via pruning and distillation technique called MiniPuzzle. Nemotron-H-47B-Base achieves similar accuracy to the 56B model, but is 20% faster to infer. In addition, we introduce an FP8-based training recipe and show that it can achieve on par results with BF16-based training. This recipe is used to train the 56B model. We are releasing Nemotron-H base model checkpoints with support in Hugging Face and NeMo.
Submitted 15 April, 2025; v1 submitted 4 April, 2025;
originally announced April 2025.
-
Improving Diseases Predictions Utilizing External Bio-Banks
Authors:
Hido Pinto,
Eran Segal
Abstract:
Machine learning has been successfully used in critical domains, such as medicine. However, extracting meaningful insights from biomedical data is often constrained by the lack of available disease labels. In this research, we demonstrate how machine learning can be leveraged to enhance explainability and uncover biologically meaningful associations, even when predictive improvements in disease modeling are limited. We train LightGBM models from scratch on our dataset (10K) to impute metabolomics features and apply them to the UK Biobank (UKBB) for downstream analysis. The imputed metabolomics features are then used in survival analysis to assess their impact on disease-related risk factors. As a result, our approach successfully identified biologically relevant connections that were not previously known to the predictive models. Additionally, we applied a genome-wide association study (GWAS) on key metabolomics features, revealing a link between vascular dementia and smoking. Although this is a well-established epidemiological relationship, the link was not embedded in the model's training data, which validated the method's ability to extract meaningful signals. Furthermore, by integrating survival models as inputs in the 10K data, we uncovered associations between metabolic substances and obesity, demonstrating the ability to infer disease risk for future patients without requiring direct outcome labels. These findings highlight the potential of leveraging external bio-banks to extract valuable biomedical insights, even in data-limited scenarios. Our results demonstrate that machine learning models trained on smaller datasets can still be used to uncover real biological associations when carefully integrated with survival analysis and genetic studies.
Submitted 30 March, 2025;
originally announced April 2025.
-
FFN Fusion: Rethinking Sequential Computation in Large Language Models
Authors:
Akhiad Bercovich,
Mohammad Dabbah,
Omri Puny,
Ido Galil,
Amnon Geifman,
Yonatan Geifman,
Izhak Golan,
Ehud Karpas,
Itay Levy,
Zach Moshe,
Najeeb Nabwani,
Tomer Ronen,
Itamar Schen,
Elad Segal,
Ido Shahaf,
Oren Tropp,
Ran Zilberstein,
Ran El-Yaniv
Abstract:
We introduce FFN Fusion, an architectural optimization technique that reduces sequential computation in large language models by identifying and exploiting natural opportunities for parallelization. Our key insight is that sequences of Feed-Forward Network (FFN) layers, particularly those remaining after the removal of specific attention layers, can often be parallelized with minimal accuracy impact. We develop a principled methodology for identifying and fusing such sequences, transforming them into parallel operations that significantly reduce inference latency while preserving model behavior. Applying these techniques to Llama-3.1-405B-Instruct, we create Llama-Nemotron-Ultra-253B-Base (Ultra-253B-Base), an efficient and soon-to-be publicly available model that achieves a 1.71X speedup in inference latency and 35X lower per-token cost while maintaining strong performance across benchmarks. Through extensive experiments on models from 49B to 253B parameters, we demonstrate that FFN Fusion becomes increasingly effective at larger scales and can complement existing optimization techniques like quantization and pruning. Most intriguingly, we find that even full transformer blocks containing both attention and FFN layers can sometimes be parallelized, suggesting new directions for neural architecture design.
Submitted 24 March, 2025;
originally announced March 2025.
-
SGAC: A Graph Neural Network Framework for Imbalanced and Structure-Aware AMP Classification
Authors:
Yingxu Wang,
Victor Liang,
Nan Yin,
Siwei Liu,
Eran Segal
Abstract:
Classifying antimicrobial peptides (AMPs) from the vast array of peptides mined from metagenomic sequencing data is a significant approach to addressing the issue of antibiotic resistance. However, current AMP classification methods, primarily relying on sequence-based data, neglect the spatial structure of peptides, thereby limiting the accurate classification of AMPs. Additionally, the number of known AMPs is significantly lower than that of non-AMPs, leading to imbalanced datasets that reduce predictive accuracy for AMPs. To alleviate these two limitations, we first employ Omegafold to predict the three-dimensional spatial structures of AMPs and non-AMPs, constructing peptide graphs based on the amino acids' C$_α$ positions. Building upon this, we propose a novel classification model named Spatial GNN-based AMP Classifier (SGAC). Our SGAC model employs a graph encoder based on Graph Neural Networks (GNNs) to process peptide graphs, generating high-dimensional representations that capture essential features from the three-dimensional spatial structure of amino acids. Then, to address the inherent dataset imbalance, SGAC first incorporates Weight-enhanced Contrastive Learning, which clusters similar peptides while ensuring separation between dissimilar ones, using weighted contributions to emphasize AMP-specific features. Furthermore, SGAC employs Weight-enhanced Pseudo-label Distillation to dynamically generate high-confidence pseudo labels for ambiguous peptides, further refining predictions and promoting balanced learning between AMPs and non-AMPs. Experiments on publicly available AMP and non-AMP datasets demonstrate that SGAC significantly outperforms traditional sequence-based methods and achieves state-of-the-art performance among graph-based models, validating its effectiveness in AMP classification.
Submitted 20 December, 2024;
originally announced December 2024.
-
A short guide to GKZ
Authors:
Ed Segal
Abstract:
These notes are a brief summary of the main results from the book `Discriminants, Resultants and Multidimensional Determinants' by Gelfand-Kapranov-Zelevinsky. We sketch the key ideas involved in the proofs, using as little technical background as possible.
Submitted 19 December, 2024;
originally announced December 2024.
-
Toward AI-Driven Digital Organism: Multiscale Foundation Models for Predicting, Simulating and Programming Biology at All Levels
Authors:
Le Song,
Eran Segal,
Eric Xing
Abstract:
We present an approach to using AI to model and simulate biology and life. Why is it important? Because at the core of medicine, pharmacy, public health, longevity, agriculture and food security, environmental protection, and clean energy, it is biology at work. Biology in the physical world is too complex to manipulate and always expensive and risky to tamper with. In this perspective, we lay out an engineering-viable approach to address this challenge by constructing an AI-Driven Digital Organism (AIDO), a system of integrated multiscale foundation models, in a modular, connectable, and holistic fashion to reflect biological scales, connectedness, and complexities. An AIDO opens up a safe, affordable and high-throughput alternative platform for predicting, simulating and programming biology at all levels from molecules to cells to individuals. We envision that an AIDO is poised to trigger a new wave of better-guided wet-lab experimentation and better-informed first-principle reasoning, which can eventually help us better decode and improve life.
Submitted 9 December, 2024;
originally announced December 2024.
-
Causal Representation Learning from Multimodal Biomedical Observations
Authors:
Yuewen Sun,
Lingjing Kong,
Guangyi Chen,
Loka Li,
Gongxu Luo,
Zijian Li,
Yixuan Zhang,
Yujia Zheng,
Mengyue Yang,
Petar Stojanov,
Eran Segal,
Eric P. Xing,
Kun Zhang
Abstract:
Prevalent in biomedical applications (e.g., human phenotype research), multimodal datasets can provide valuable insights into the underlying physiological mechanisms. However, current machine learning (ML) models designed to analyze these datasets often lack interpretability and identifiability guarantees, which are essential for biomedical research. Recent advances in causal representation learning have shown promise in identifying interpretable latent causal variables with formal theoretical guarantees. Unfortunately, most current work on multimodal distributions either relies on restrictive parametric assumptions or yields only coarse identification results, limiting their applicability to biomedical research that favors a detailed understanding of the mechanisms.
In this work, we aim to develop flexible identification conditions for multimodal data and principled methods to facilitate the understanding of biomedical datasets. Theoretically, we consider a nonparametric latent distribution (cf. parametric assumptions in previous work) that allows for causal relationships across potentially different modalities. We establish identifiability guarantees for each latent component, extending the subspace identification results from previous work. Our key theoretical contribution is the structural sparsity of causal connections between modalities, which, as we will discuss, is natural for a large collection of biomedical systems. Empirically, we present a practical framework to instantiate our theoretical insights. We demonstrate the effectiveness of our approach through extensive experiments on both numerical and synthetic datasets. Results on a real-world human phenotype dataset are consistent with established biomedical research, validating our theoretical and methodological framework.
Submitted 16 March, 2025; v1 submitted 10 November, 2024;
originally announced November 2024.
-
Generative AI Enables Medical Image Segmentation in Ultra Low-Data Regimes
Authors:
Li Zhang,
Basu Jindal,
Ahmed Alaa,
Robert Weinreb,
David Wilson,
Eran Segal,
James Zou,
Pengtao Xie
Abstract:
Semantic segmentation of medical images is pivotal in applications like disease diagnosis and treatment planning. While deep learning has excelled in automating this task, a major hurdle is the need for numerous annotated segmentation masks, which are resource-intensive to produce due to the required expertise and time. This scenario often leads to ultra low-data regimes, where annotated images are extremely limited, posing significant challenges for the generalization of conventional deep learning methods on test images. To address this, we introduce a generative deep learning framework, which uniquely generates high-quality paired segmentation masks and medical images, serving as auxiliary data for training robust models in data-scarce environments. Unlike traditional generative models that treat data generation and segmentation model training as separate processes, our method employs multi-level optimization for end-to-end data generation. This approach allows segmentation performance to directly influence the data generation process, ensuring that the generated data is specifically tailored to enhance the performance of the segmentation model. Our method demonstrated strong generalization performance across 9 diverse medical image segmentation tasks and on 16 datasets, in ultra-low data regimes, spanning various diseases, organs, and imaging modalities. When applied to various segmentation models, it achieved performance improvements of 10-20\% (absolute), in both same-domain and out-of-domain scenarios. Notably, it requires 8 to 20 times less training data than existing methods to achieve comparable results. This advancement significantly improves the feasibility and cost-effectiveness of applying deep learning in medical imaging, particularly in scenarios with limited data availability.
Submitted 30 August, 2024;
originally announced August 2024.
-
From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis
Authors:
Guy Lutsker,
Gal Sapir,
Smadar Shilo,
Jordi Merino,
Anastasia Godneva,
Jerry R Greenfield,
Dorit Samocha-Bonet,
Raja Dhir,
Francisco Gude,
Shie Mannor,
Eli Meirom,
Gal Chechik,
Hagai Rossman,
Eran Segal
Abstract:
Recent advances in self-supervised learning (SSL) have enabled novel medical AI models, known as foundation models, which offer great potential for better characterizing health from diverse biomedical data. Continuous glucose monitoring (CGM) provides rich, temporal data on glycemic patterns, but its full potential for predicting broader health outcomes remains underutilized. Here, we present GluFormer, a generative foundation model for CGM data that learns nuanced glycemic patterns and translates them into predictive representations of metabolic health. Trained on over 10 million CGM measurements from 10,812 adults, primarily without diabetes, GluFormer uses autoregressive token prediction to capture longitudinal glucose dynamics. We show that GluFormer generalizes to 19 external cohorts (n=6,044) spanning different ethnicities and ages, 5 countries, 8 CGM devices, and diverse pathophysiological states. GluFormer's representations exceed the performance of current CGM metrics, such as the Glucose Management Indicator (GMI), for forecasting clinical measures. In a longitudinal study of 580 adults with CGM data and 12-year follow-up, GluFormer identifies individuals at elevated risk of developing diabetes more effectively than blood HbA1C%, capturing 66% of all new-onset diabetes diagnoses in the top quartile versus 7% in the bottom quartile. Similarly, 69% of cardiovascular-death events occurred in the top quartile with none in the bottom quartile, demonstrating powerful risk stratification beyond traditional glycemic metrics. We also show that CGM representations from pre-intervention periods in Randomized Clinical Trials outperform other methods in predicting primary and secondary outcomes. When integrating dietary data into GluFormer, we show that the multi-modal version of the model can accurately generate CGM data based on dietary intake data, simulate outcomes of dietary interventions, and predict individual responses to specific foods.
Submitted 7 January, 2025; v1 submitted 20 August, 2024;
originally announced August 2024.
-
FrackyFrac: A Standalone UniFrac Calculator
Authors:
Amit Lavon,
Smadar Shilo,
Ayya Keshet,
Eran Segal
Abstract:
UniFrac is a family of distance metrics over microbial abundances that take taxonomic relatedness into account. Current tools and libraries for calculating UniFrac have specific requirements regarding the user's technical expertise, operating system, and pre-installed software, which might exclude potential users. FrackyFrac is a native command-line tool that can run on any platform and has no requirements. It can also generate the phylogenetic trees required for the calculation. We show that FrackyFrac's performance is on par with currently existing implementations. FrackyFrac can make UniFrac accessible to researchers who may otherwise skip it due to the effort involved, and it can simplify analysis pipelines for those who already use it.
Submitted 17 April, 2024;
originally announced April 2024.
-
COMPRER: A Multimodal Multi-Objective Pretraining Framework for Enhanced Medical Image Representation
Authors:
Guy Lutsker,
Hagai Rossman,
Nastya Godiva,
Eran Segal
Abstract:
Substantial advances in multi-modal Artificial Intelligence (AI) facilitate the combination of diverse medical modalities to achieve holistic health assessments. We present COMPRER, a novel multi-modal, multi-objective pretraining framework which enhances medical-image representation, diagnostic inferences, and prognosis of diseases. COMPRER employs a multi-objective training framework, where each objective introduces distinct knowledge to the model. This includes a multimodal loss that consolidates information across different imaging modalities; a temporal loss that imparts the ability to discern patterns over time; a medical-measure prediction objective that adds appropriate medical insights; and a reconstruction loss that ensures the integrity of image structure within the latent space. Despite the concern that multiple objectives could weaken task performance, our findings show that this combination actually boosts outcomes on certain tasks. Here, we apply this framework to both fundus images and carotid ultrasound, and validate our downstream-task capabilities by predicting both current and future cardiovascular conditions. COMPRER achieved higher Area Under the Curve (AUC) scores in evaluating medical conditions compared to existing models on held-out data. On the Out-of-distribution (OOD) UK-Biobank dataset, COMPRER maintains favorable performance over well-established models with more parameters, even though these models were trained on $75\times$ more data than COMPRER. In addition, to better assess our model's performance in contrastive learning, we introduce a novel evaluation metric, providing deeper understanding of the effectiveness of the latent space pairing.
Submitted 4 February, 2024;
originally announced March 2024.
-
The McKay correspondence in type $D_4$ via VGIT
Authors:
Tarig Abdelgadir,
Ed Segal
Abstract:
We present an explicit GIT construction which produces both the minimal resolution of the type $D_4$ surface singularity, and also the orbifold resolution. Our construction is based on a Tannakian approach which is in principle applicable to arbitrary quotient singularities.
Submitted 8 February, 2024;
originally announced February 2024.
-
Audience Prospecting for Dynamic-Product-Ads in Native Advertising
Authors:
Eliran Abutbul,
Yohay Kaplan,
Naama Krasne,
Oren Somekh,
Or David,
Omer Duvdevany,
Evgeny Segal
Abstract:
With yearly revenue exceeding one billion USD, Yahoo Gemini native advertising marketplace serves more than two billion impressions daily to hundreds of millions of unique users. One of the fastest growing segments of Gemini native is dynamic-product-ads (DPA), where major advertisers, such as Amazon and Walmart, provide catalogs with millions of products for the system to choose from and present to users. The subject of this work is finding and expanding the right audience for each DPA ad, which is one of the many challenges DPA presents. Approaches such as targeting various user groups, e.g., users who already visited the advertisers' websites (Retargeting), users that searched for certain products (Search-Prospecting), or users that reside in preferred locations (Location-Prospecting), have limited audience expansion capabilities. In this work we present two new approaches for audience expansion that also maintain predefined performance goals. The Conversion-Prospecting approach predicts DPA conversion rates based on Gemini native logged data, and calculates the expected cost-per-action (CPA) for determining users' eligibility to products and optimizing DPA bids in Gemini native auctions. To support new advertisers and products, the Trending-Prospecting approach matches trending products to users by learning their tendency towards products from advertisers' sites logged events. The tendency scores indicate the popularity of the product and the similarity of the user to those who have previously engaged with this product. The two new prospecting approaches were tested online, serving real Gemini native traffic, demonstrating impressive DPA delivery and DPA revenue lifts while maintaining most traffic within the acceptable CPA range (i.e., performance goal). After a successful testing phase, the proposed approaches are currently in production and serve all Gemini native traffic.
Submitted 13 December, 2023; v1 submitted 12 December, 2023;
originally announced December 2023.
-
A Multimodal Dataset of 21,412 Recorded Nights for Sleep and Respiratory Research
Authors:
Alon Diament,
Maria Gorodetski,
Adam Jankelow,
Ayya Keshet,
Tal Shor,
Daphna Weissglas-Volkov,
Hagai Rossman,
Eran Segal
Abstract:
This study introduces a novel, rich dataset obtained from home sleep apnea tests using the FDA-approved WatchPAT-300 device, collected from 7,077 participants over 21,412 nights. The dataset comprises three levels of sleep data: raw multi-channel time-series from sensors, annotated sleep events, and computed summary statistics, which include 447 features related to sleep architecture, sleep apnea, and heart rate variability (HRV). We present reference values for Apnea/Hypopnea Index (AHI), sleep efficiency, Wake After Sleep Onset (WASO), and HRV sample entropy, stratified by age and sex. Moreover, we demonstrate that the dataset improves the predictive capability for various health related traits, including body composition, bone density, blood sugar levels and cardiovascular health. These results illustrate the dataset's potential to advance sleep research, personalized healthcare, and machine learning applications in biomedicine.
Submitted 15 November, 2023;
originally announced November 2023.
-
Neural network-based emulation of interstellar medium models
Authors:
Pierre Palud,
Lucas Einig,
Franck Le Petit,
Emeric Bron,
Pierre Chainais,
Jocelyn Chanussot,
Jérôme Pety,
Pierre-Antoine Thouvenin,
David Languignon,
Ivana Bešlić,
Miriam G. Santa-Maria,
Jan H. Orkisz,
Léontine E. Ségal,
Antoine Zakardjian,
Sébastien Bardeau,
Maryvonne Gerin,
Javier R. Goicoechea,
Pierre Gratier,
Viviana V. Guzman,
Annie Hughes,
François Levrier,
Harvey S. Liszt,
Jacques Le Bourlot,
Antoine Roueff,
Albrecht Sievers
Abstract:
The interpretation of observations of atomic and molecular tracers in the galactic and extragalactic interstellar medium (ISM) requires comparisons with state-of-the-art astrophysical models to infer some physical conditions. Usually, ISM models are too time-consuming for such inference procedures, as they call for numerous model evaluations. As a result, they are often replaced by an interpolation of a grid of precomputed models.
We propose a new general method to derive faster, lighter, and more accurate approximations of the model from a grid of precomputed models.
These emulators are defined with artificial neural networks (ANNs) designed and trained to address the specificities inherent in ISM models. Indeed, such models often predict many observables (e.g., line intensities) from just a few input physical parameters and can yield outliers due to numerical instabilities or physical bistabilities. We propose applying five strategies to address these characteristics: 1) an outlier removal procedure; 2) a clustering method that yields homogeneous subsets of lines that are simpler to predict with different ANNs; 3) a dimension reduction technique that makes it possible to size the network architecture adequately; 4) a polynomial transform of the physical inputs to ease the learning of nonlinearities; and 5) a dense architecture to ease the learning of simple relations.
We compare the proposed ANNs with standard classes of interpolation methods to emulate the Meudon PDR code, a representative ISM numerical model. Combinations of the proposed strategies outperform all interpolation methods by a factor of 2 on the average error, reaching 4.5% on the Meudon PDR code. These networks are also 1000 times faster than accurate interpolation methods and require ten to forty times less memory.
This work will enable efficient inferences on wide-field multiline observations of the ISM.
Submitted 4 September, 2023;
originally announced September 2023.
-
A Melting Pot of Evolution and Learning
Authors:
Moshe Sipper,
Achiya Elyasaf,
Tomer Halperin,
Zvika Haramaty,
Raz Lapid,
Eyal Segal,
Itai Tzruia,
Snir Vitrack Tamam
Abstract:
We survey eight recent works by our group, involving the successful blending of evolutionary algorithms with machine learning and deep learning: 1. Binary and Multinomial Classification through Evolutionary Symbolic Regression, 2. Classy Ensemble: A Novel Ensemble Algorithm for Classification, 3. EC-KitY: Evolutionary Computation Tool Kit in Python, 4. Evolution of Activation Functions for Deep Learning-Based Image Classification, 5. Adaptive Combination of a Genetic Algorithm and Novelty Search for Deep Neuroevolution, 6. An Evolutionary, Gradient-Free, Query-Efficient, Black-Box Algorithm for Generating Adversarial Instances in Deep Networks, 7. Foiling Explanations in Deep Neural Networks, 8. Patch of Invisibility: Naturalistic Black-Box Adversarial Attacks on Object Detectors.
Submitted 8 June, 2023;
originally announced June 2023.
-
Equivariant Fukaya categories at singular values
Authors:
Yanki Lekili,
Ed Segal
Abstract:
Given a Hamiltonian torus action on a symplectic manifold, Teleman and Fukaya have proposed that the Fukaya category of each symplectic quotient should be equivalent to an equivariant Fukaya category of the original manifold. We lay out new conjectures that extend this story - in certain situations - to singular values of the moment map. These include a proposal for how, in some cases, we can recover the non-equivariant Fukaya category of the original manifold starting from data on the quotient.
To justify our conjectures we pass through the mirror and work out numerous examples, using well-established heuristics in toric mirror symmetry. We also discuss the algebraic and categorical structures that underlie our story.
Submitted 21 April, 2023;
originally announced April 2023.
-
Training Vision-Language Models with Less Bimodal Supervision
Authors:
Elad Segal,
Ben Bogin,
Jonathan Berant
Abstract:
Standard practice in pretraining multimodal models, such as vision-language models, is to rely on pairs of aligned inputs from both modalities, for example, aligned image-text pairs. However, such pairs can be difficult to obtain in low-resource settings and for some modality pairs (e.g., structured tables and images). In this work, we investigate the extent to which we can reduce the reliance on such parallel data, which we term \emph{bimodal supervision}, and use models that are pretrained on each modality independently. We experiment with a high-performing vision-language model, and analyze the effect of bimodal supervision on three vision-language tasks. We find that on simpler tasks, such as VQAv2 and GQA, one can eliminate bimodal supervision completely, suffering only a minor loss in performance. Conversely, for NLVR2, which requires more complex reasoning, training without bimodal supervision leads to random performance. Nevertheless, using only 5\% of the bimodal data (142K images along with their captions), or leveraging weak supervision in the form of a list of machine-generated labels for each image, leads to only a moderate degradation compared to using 3M image-text pairs: 74\%$\rightarrow$$\sim$70\%. Our code is available at https://github.com/eladsegal/less-bimodal-sup.
Submitted 1 November, 2022;
originally announced November 2022.
-
Adaptive Combination of a Genetic Algorithm and Novelty Search for Deep Neuroevolution
Authors:
Eyal Segal,
Moshe Sipper
Abstract:
Evolutionary Computation (EC) has been shown to be able to quickly train Deep Artificial Neural Networks (DNNs) to solve Reinforcement Learning (RL) problems. While a Genetic Algorithm (GA) is well-suited for exploiting reward functions that are neither deceptive nor sparse, it struggles when the reward function is either of those. To that end, Novelty Search (NS) has been shown to be able to outperform gradient-following optimizers in some cases, while under-performing in others. We propose a new algorithm: Explore-Exploit $γ$-Adaptive Learner ($E^2γAL$, or EyAL). By preserving a dynamically-sized niche of novelty-seeking agents, the algorithm manages to maintain population diversity, exploiting the reward signal when possible and exploring otherwise. The algorithm combines both the exploitation power of a GA and the exploration power of NS, while maintaining their simplicity and elegance. Our experiments show that EyAL outperforms NS in most scenarios, while being on par with a GA -- and in some scenarios it can outperform both. EyAL also allows the substitution of the exploiting component (GA) and the exploring component (NS) with other algorithms, e.g., Evolution Strategy and Surprise Search, thus opening the door for future research.
Submitted 8 September, 2022;
originally announced September 2022.
-
Self-Assembled Fatty Acid Crystalline Coatings Display Non-Toxic Superhydrophobic Antimicrobial Properties
Authors:
Elena Prudnikov,
Iryna Polishchuk,
Andy Sand,
Hanan Abu Hamad,
Naama Massad-Ivanir,
Ester Segal,
Boaz Pokroy
Abstract:
Superhydrophobicity is a well-known wetting phenomenon found in numerous plants and insects. It is achieved by the combination of the surface's chemical properties and its roughness. Inspired by nature, numerous synthetic superhydrophobic surfaces have been developed for various applications. Designated surface coating is one of the fabrication routes to achieve superhydrophobicity. Yet, many of these coatings, such as fluorine-based formulations, may pose severe health and environmental risks, limiting their applicability. Herein, we present a new family of superhydrophobic coatings composed of natural saturated fatty acids, which are not only a part of our daily diet, but can be produced from renewable feedstock, providing a safe and sustainable alternative to the existing state of the art. These crystalline coatings are readily fabricated via single-step deposition routes: thermal deposition or spray-coating. The fatty acids self-assemble into highly hierarchical crystalline structures exhibiting a water contact angle of about 165 degrees and contact angle hysteresis lower than 6 degrees, while their properties and morphology depend on the specific fatty acid used as well as on the deposition technique. Moreover, the fatty acid coatings demonstrate excellent thermal stability. Importantly, this new family of coatings displays excellent anti-biofouling and antimicrobial properties against Escherichia coli and Listeria innocua, used as relevant model Gram-negative and Gram-positive bacteria, respectively. We believe that these coatings have great application potential in fields where other alternatives are prohibited due to safety limitations, while at the same time their usage in other regulation-free applications is not limited.
Submitted 10 August, 2022;
originally announced August 2022.
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza
, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
Serre functors of residual categories via hybrid models
Authors:
Federico Barbacovi,
Ed Segal
Abstract:
In this short note we observe that the Serre functor on the residual category of a complete intersection can be easily described in the framework of hybrid models. Using this description we recover some recent results of Kuznetsov and Perry.
Submitted 23 May, 2023; v1 submitted 10 May, 2022;
originally announced May 2022.
-
SCROLLS: Standardized CompaRison Over Long Language Sequences
Authors:
Uri Shaham,
Elad Segal,
Maor Ivgi,
Avia Efrat,
Ori Yoran,
Adi Haviv,
Ankit Gupta,
Wenhan Xiong,
Mor Geva,
Jonathan Berant,
Omer Levy
Abstract:
NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild. We introduce SCROLLS, a suite of tasks that require reasoning over long texts. We examine existing long-text datasets, and handpick ones where the text is naturally long, while prioritizing tasks that involve synthesizing information across the input. SCROLLS contains summarization, question answering, and natural language inference tasks, covering multiple domains, including literature, science, business, and entertainment. Initial baselines, including Longformer Encoder-Decoder, indicate that there is ample room for improvement on SCROLLS. We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
Submitted 11 October, 2022; v1 submitted 10 January, 2022;
originally announced January 2022.
-
Line fields on punctured surfaces and twisted derived categories
Authors:
Ed Segal
Abstract:
The Fukaya category of a punctured surface can be reconstructed from a pair-of-pants decomposition using a formal construction that attaches a category to a trivalent graph. We extend this formal construction to include a choice of line field on the surface; this requires a certain decoration on the graph. On the mirror side we show that this leads to a kind of twisted derived category which has not been widely studied.
Mirror symmetry predicts that our category should be an invariant of decorated graphs and we prove that this is indeed the case, using only B-model methods. We also give B-model proofs that a few different mirror constructions are equivalent.
Submitted 10 June, 2021;
originally announced June 2021.
-
Signal Processing Techniques to Reduce the Limit of Detection for Thin Film Biosensors
Authors:
Simon J. Ward,
Rabeb Layouni,
Sofia Arshavsky-Graham,
Ester Segal,
Sharon M. Weiss
Abstract:
The ultimate detection limit of optical biosensors is often limited by various noise sources, including those introduced by the optical measurement setup. While sophisticated modifications to instrumentation may reduce noise, a simpler approach that can benefit all sensor platforms is the application of signal processing to minimize the deleterious effects of noise. In this work, we show that applying complex Morlet wavelet convolution to Fabry-Pérot interference fringes characteristic of thin film reflectometric biosensors effectively filters out white noise and low frequency reflectance variations. Subsequent calculation of an average difference in phase between the filtered analyte and reference signals enables a significant reduction in the limit of detection (LOD), enabling closer competition with current state-of-the-art techniques. This method is applied to experimental data sets of thin film porous silicon (PSi) sensors in buffered solution and complex media obtained from two different laboratories. The demonstrated improvement in LOD achieved using wavelet convolution and average phase difference paves the way for PSi optical biosensors to operate with clinically relevant detection limits for medical diagnostics, environmental monitoring, and food safety.
Submitted 12 March, 2021;
originally announced March 2021.
-
Discriminants and semi-orthogonal decompositions
Authors:
Alex Kite,
Ed Segal
Abstract:
The derived categories of toric varieties admit semi-orthogonal decompositions coming from wall-crossing in GIT. We prove that these decompositions satisfy a Jordan-Hölder property: the subcategories that appear, and their multiplicities, are independent of the choices made.
For Calabi-Yau toric varieties wall-crossing instead gives derived equivalences and autoequivalences, and mirror symmetry relates these to monodromy around the GKZ discriminant locus. We formulate a conjecture equating intersection multiplicities in the discriminant with the multiplicities appearing in certain semi-orthogonal decompositions. We then prove this conjecture in some cases.
Submitted 2 February, 2022; v1 submitted 16 February, 2021;
originally announced February 2021.
-
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Authors:
Mor Geva,
Daniel Khashabi,
Elad Segal,
Tushar Khot,
Dan Roth,
Jonathan Berant
Abstract:
A key limitation in current datasets for multi-hop reasoning is that the required steps for answering the question are mentioned in it explicitly. In this work, we introduce StrategyQA, a question answering (QA) benchmark where the required reasoning steps are implicit in the question, and should be inferred using a strategy. A fundamental challenge in this setup is how to elicit such creative questions from crowdsourcing workers, while covering a broad range of potential strategies. We propose a data collection procedure that combines term-based priming to inspire annotators, careful control over the annotator population, and adversarial filtering for eliminating reasoning shortcuts. Moreover, we annotate each question with (1) a decomposition into reasoning steps for answering it, and (2) Wikipedia paragraphs that contain the answers to each step. Overall, StrategyQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs. Analysis shows that questions in StrategyQA are short, topic-diverse, and cover a wide range of strategies. Empirically, we show that humans perform well (87%) on this task, while our best baseline reaches an accuracy of $\sim$66%.
Submitted 6 January, 2021;
originally announced January 2021.
-
A Simple and Effective Model for Answering Multi-span Questions
Authors:
Elad Segal,
Avia Efrat,
Mor Shoham,
Amir Globerson,
Jonathan Berant
Abstract:
Models for reading comprehension (RC) commonly restrict their output space to the set of all single contiguous spans from the input, in order to alleviate the learning problem and avoid the need for a model that generates text explicitly. However, forcing an answer to be a single span can be restrictive, and some recent datasets also include multi-span questions, i.e., questions whose answer is a set of non-contiguous spans in the text. Naturally, models that return single spans cannot answer these questions. In this work, we propose a simple architecture for answering multi-span questions by casting the task as a sequence tagging problem, namely, predicting for each input token whether it should be part of the output or not. Our model substantially improves performance on span extraction questions from DROP and Quoref by 9.9 and 5.5 EM points respectively.
Submitted 5 October, 2020; v1 submitted 29 September, 2019;
originally announced September 2019.
-
Highly Efficient 8-bit Low Precision Inference of Convolutional Neural Networks with IntelCaffe
Authors:
Jiong Gong,
Haihao Shen,
Guoming Zhang,
Xiaoli Liu,
Shane Li,
Ge Jin,
Niharika Maheshwari,
Evarist Fomenko,
Eden Segal
Abstract:
High throughput and low latency inference of deep neural networks are critical for the deployment of deep learning applications. This paper presents the efficient inference techniques of IntelCaffe, the first Intel-optimized deep learning framework that supports efficient 8-bit low precision inference and model optimization techniques of convolutional neural networks on Intel Xeon Scalable Processors. The 8-bit optimized model is automatically generated with a calibration process from an FP32 model without the need for fine-tuning or retraining. We show that the inference throughput and latency with ResNet-50, Inception-v3 and SSD are improved by 1.38X-2.9X and 1.35X-3X respectively with negligible accuracy loss from the IntelCaffe FP32 baseline and by 56X-75X and 26X-37X from BVLC Caffe. All these techniques have been open-sourced on IntelCaffe GitHub, and the artifact is provided to reproduce the result on Amazon AWS Cloud.
Submitted 4 May, 2018;
originally announced May 2018.
-
Regularization Learning Networks: Deep Learning for Tabular Datasets
Authors:
Ira Shavitt,
Eran Segal
Abstract:
Despite their impressive performance, Deep Neural Networks (DNNs) typically underperform Gradient Boosting Trees (GBTs) on many tabular-dataset learning tasks. We propose that applying a different regularization coefficient to each weight might boost the performance of DNNs by allowing them to make more use of the more relevant inputs. However, this will lead to an intractable number of hyperparameters. Here, we introduce Regularization Learning Networks (RLNs), which overcome this challenge by introducing an efficient hyperparameter tuning scheme which minimizes a new Counterfactual Loss. Our results show that RLNs significantly improve DNNs on tabular datasets, and achieve comparable results to GBTs, with the best performance achieved with an ensemble that combines GBTs and RLNs. RLNs produce extremely sparse networks, eliminating up to 99.8% of the network edges and 82% of the input features, thus providing more interpretable models and revealing the importance that the network assigns to different inputs. RLNs could efficiently learn a single network in datasets that comprise both tabular and unstructured data, such as in the setting of medical imaging accompanied by electronic health records. An open source implementation of RLN can be found at https://github.com/irashavitt/regularization_learning_networks.
Submitted 23 October, 2018; v1 submitted 16 May, 2018;
originally announced May 2018.
-
A non-commutative Bertini theorem
Authors:
Jørgen Vold Rennemo,
Ed Segal,
Michel Van den Bergh
Abstract:
We prove a version of the classical 'generic smoothness' theorem with smooth varieties replaced by non-commutative resolutions of singular varieties. This in particular implies a non-commutative version of the Bertini theorem.
Submitted 22 June, 2020; v1 submitted 3 May, 2017;
originally announced May 2017.
-
Hori-mological projective duality
Authors:
Jørgen Vold Rennemo,
Ed Segal
Abstract:
Kuznetsov has conjectured that Pfaffian varieties should admit non-commutative crepant resolutions which satisfy his Homological Projective Duality. We prove half the cases of this conjecture, by interpreting and proving a duality of non-abelian gauged linear sigma models proposed by Hori.
Submitted 22 June, 2020; v1 submitted 13 September, 2016;
originally announced September 2016.
-
All autoequivalences are spherical twists
Authors:
Ed Segal
Abstract:
In this short note we observe that, for purely formal reasons, any autoequivalence can be constructed as a twist around a spherical functor. As an example, we show how the P-twists constructed by Huybrechts and Thomas can be formulated as spherical twists.
Submitted 13 October, 2020; v1 submitted 22 March, 2016;
originally announced March 2016.
-
A new 5-fold flop and derived equivalence
Authors:
Ed Segal
Abstract:
We describe a new example of a flop in 5-dimensions, due to Roland Abuaf, with the nice feature that the contracting loci on either side are not isomorphic. We prove that the two sides are derived equivalent.
Submitted 27 January, 2016; v1 submitted 23 June, 2015;
originally announced June 2015.
-
Quintic threefolds and Fano elevenfolds
Authors:
Ed Segal,
Richard P. Thomas
Abstract:
The derived category of coherent sheaves on a general quintic threefold is a central object in mirror symmetry. We show that it can be embedded into the derived category of a certain Fano elevenfold.
Our proof also generates related examples in different dimensions.
Submitted 17 November, 2015; v1 submitted 24 October, 2014;
originally announced October 2014.
-
K-Theoretic and Categorical Properties of Toric Deligne-Mumford Stacks
Authors:
Tom Coates,
Hiroshi Iritani,
Yunfeng Jiang,
Ed Segal
Abstract:
We prove the following results for toric Deligne-Mumford stacks, under minimal compactness hypotheses: the Localization Theorem in equivariant K-theory; the equivariant Hirzebruch-Riemann-Roch theorem; the Fourier-Mukai transformation associated to a crepant toric wall-crossing gives an equivariant derived equivalence.
Submitted 25 August, 2016; v1 submitted 30 September, 2014;
originally announced October 2014.
-
The Pfaffian-Grassmannian equivalence revisited
Authors:
Nicolas Addington,
Will Donovan,
Ed Segal
Abstract:
We give a new proof of the 'Pfaffian-Grassmannian' derived equivalence between certain pairs of non-birational Calabi-Yau threefolds. Our proof follows the physical constructions of Hori and Tong, and we factor the equivalence into three steps by passing through some intermediate categories of (global) matrix factorizations. The first step is global Knoerrer periodicity, the second comes from a birational map between Landau-Ginzburg B-models, and for the third we develop some new techniques.
Submitted 25 November, 2014; v1 submitted 15 January, 2014;
originally announced January 2014.
-
Mixed braid group actions from deformations of surface singularities
Authors:
Will Donovan,
Ed Segal
Abstract:
We consider a set of toric Calabi-Yau varieties which arise as deformations of the small resolutions of type A surface singularities. By careful analysis of the heuristics of B-brane transport in the associated GLSMs, we predict the existence of a mixed braid group action on the derived category of each variety, and then prove that this action does indeed exist. This generalizes the braid group action found by Seidel and Thomas for the undeformed resolutions. We also show that the actions for different deformations are related, in a way that is predicted by the physical heuristics.
Submitted 29 October, 2013;
originally announced October 2013.
-
Exact Inference in Networks with Discrete Children of Continuous Parents
Authors:
Uri Lerner,
Eran Segal,
Daphne Koller
Abstract:
Many real life domains contain a mixture of discrete and continuous variables and can be modeled as hybrid Bayesian Networks. An important subclass of hybrid BNs are conditional linear Gaussian (CLG) networks, where the conditional distribution of the continuous variables given an assignment to the discrete variables is a multivariate Gaussian. Lauritzen's extension to the clique tree algorithm can be used for exact inference in CLG networks. However, many domains also include discrete variables that depend on continuous ones, and CLG networks do not allow such dependencies to be represented. No exact inference algorithm has been proposed for these enhanced CLG networks. In this paper, we generalize Lauritzen's algorithm, providing the first "exact" inference algorithm for augmented CLG networks - networks where continuous nodes are conditional linear Gaussians but that also allow discrete children of continuous parents. Our algorithm is exact in the sense that it computes the exact distributions over the discrete nodes, and the exact first and second moments of the continuous ones, up to the accuracy obtained by numerical integration used within the algorithm. When the discrete children are modeled with softmax CPDs (as is the case in many real world domains) the approximation of the continuous distributions using the first two moments is particularly accurate. Our algorithm is simple to implement and often comparable in its complexity to Lauritzen's algorithm. We show empirically that it achieves substantially higher accuracy than previous approximate algorithms.
Submitted 10 January, 2013;
originally announced January 2013.
-
Learning Module Networks
Authors:
Eran Segal,
Dana Pe'er,
Aviv Regev,
Daphne Koller,
Nir Friedman
Abstract:
Methods for learning Bayesian network structure can discover dependency structure between observed variables, and have been shown to be useful in many applications. However, in domains that involve a large number of variables, the space of possible network structures is enormous, making it difficult, for both computational and statistical reasons, to identify a good model. In this paper, we consider a solution to this problem, suitable for domains where many variables have similar behavior. Our method is based on a new class of models, which we call module networks. A module network explicitly represents the notion of a module - a set of variables that have the same parents in the network and share the same conditional probability distribution. We define the semantics of module networks, and describe an algorithm that learns a module network from data. The algorithm learns both the partitioning of the variables into modules and the dependency structure between the variables. We evaluate our algorithm on synthetic data, and on real data in the domains of gene expression and the stock market. Our results show that module networks generalize better than Bayesian networks, and that the learned module network structure reveals regularities that are obscured in learned Bayesian networks.
Submitted 19 October, 2012;
originally announced December 2012.
-
D-brane probes, branched double covers, and noncommutative resolutions
Authors:
Nicolas Addington,
Edward Segal,
Eric Sharpe
Abstract:
This paper describes D-brane probes of theories arising in abelian gauged linear sigma models (GLSMs) describing branched double covers and noncommutative resolutions thereof, via nonperturbative effects rather than as the critical locus of a superpotential. As these theories can be described as IR limits of Landau-Ginzburg models, technically this paper is an exercise in utilizing (sheafy) matrix factorizations. For Landau-Ginzburg models which are believed to flow in the IR to smooth branched double covers, our D-brane probes recover the structure of the branched double cover (and flat nontrivial B fields), verifying previous results. In addition to smooth branched double covers, the same class of Landau-Ginzburg models is also believed to sometimes flow to `noncommutative resolutions' of singular spaces. These noncommutative resolutions are abstract conformal field theories without a global geometric description, but D-brane probes perceive them as non-Kahler small resolutions of a singular Calabi-Yau. We conjecture that such non-Kahler small resolutions are typical in D-brane probes of such theories.
Submitted 11 November, 2012;
originally announced November 2012.
-
Window shifts, flop equivalences and Grassmannian twists
Authors:
Will Donovan,
Ed Segal
Abstract:
We introduce a new class of autoequivalences that act on the derived categories of certain vector bundles over Grassmannians. These autoequivalences arise from Grassmannian flops: they generalize Seidel-Thomas spherical twists, which can be seen as arising from standard flops. We first give a simple algebraic construction, which is well-suited to explicit computations. We then give a geometric construction using spherical functors which we prove is equivalent.
Submitted 2 October, 2012; v1 submitted 1 June, 2012;
originally announced June 2012.
-
Equivalences between GIT quotients of Landau-Ginzburg B-models
Authors:
Ed Segal
Abstract:
We define the category of B-branes in a (not necessarily affine) Landau-Ginzburg B-model, incorporating the notion of R-charge. Our definition is a direct generalization of the category of perfect complexes. We then consider pairs of Landau-Ginzburg B-models that arise as different GIT quotients of a vector space by a one-dimensional torus, and show that for each such pair the two categories of B-branes are quasi-equivalent. In fact we produce a whole set of quasi-equivalences indexed by the integers, and show that the resulting auto-equivalences are all spherical twists.
Submitted 24 November, 2010; v1 submitted 29 October, 2009;
originally announced October 2009.
-
The closed state space of affine Landau-Ginzburg B-models
Authors:
Ed Segal
Abstract:
We study the category of perfect cdg-modules over a curved algebra, and in particular the category of B-branes in an affine Landau-Ginzburg model. We construct an explicit chain map from the Hochschild complex of the category to the closed state space of the model, and prove that this is a quasi-isomorphism from the Borel-Moore Hochschild complex. Using the lowest-order term of our map we derive Kapustin and Li's formula for the correlator of an open-string state over a disc.
Submitted 19 April, 2011; v1 submitted 8 April, 2009;
originally announced April 2009.
-
Gauge Theory in higher dimensions, II
Authors:
Simon Donaldson,
Ed Segal
Abstract:
The main aim of the paper is to develop the "Floer theory" associated to Calabi-Yau 3-folds, extending the analogy of Thomas' "holomorphic Casson invariant". The treatment in the body of the paper is largely formal, assuming appropriate compactness properties of moduli spaces of $G_{2}$-instantons, but in the last section we make some remarks about these compactness issues. Section 3 of the paper contains a general discussion of deformations of the equations, for gauge fields and submanifolds, associated to manifolds with exceptional holonomy.
Submitted 18 February, 2009;
originally announced February 2009.
-
The A-infinity Deformation Theory of a Point and the Derived Categories of Local Calabi-Yaus
Authors:
Ed Segal
Abstract:
Let A be an augmented algebra over a semi-simple algebra S. We show that the Ext algebra of S as an A-module, enriched with its natural A-infinity structure, can be used to reconstruct the completion of A at the augmentation ideal. We use this technical result to justify a calculation in the physics literature describing algebras that are derived equivalent to certain non-compact Calabi-Yau three-folds. Since the calculation produces superpotentials for these algebras we also include some discussion of superpotential algebras and their invariants.
Submitted 11 July, 2008; v1 submitted 19 February, 2007;
originally announced February 2007.