-
MiMo-VL Technical Report
Authors:
Xiaomi LLM-Core Team,
Zihao Yue,
Zhenru Lin,
Yifan Song,
Weikun Wang,
Shuhuai Ren,
Shuhao Gu,
Shicheng Li,
Peidian Li,
Liang Zhao,
Lei Li,
Kainan Bao,
Hao Tian,
Hailin Zhang,
Gang Wang,
Dawei Zhu,
Cici,
Chenhong He,
Bowen Ye,
Bowen Shen,
Zihan Zhang,
Zihan Jiang,
Zhixian Zheng,
Zhichao Song
, et al. (50 additional authors not shown)
Abstract:
We open-source MiMo-VL-7B-SFT and MiMo-VL-7B-RL, two powerful vision-language models delivering state-of-the-art performance in both general visual understanding and multimodal reasoning. MiMo-VL-7B-RL outperforms Qwen2.5-VL-7B on 35 out of 40 evaluated tasks, and scores 59.4 on OlympiadBench, surpassing models with up to 78B parameters. For GUI grounding applications, it sets a new standard with 56.1 on OSWorld-G, even outperforming specialized models such as UI-TARS. Our training combines four-stage pre-training (2.4 trillion tokens) with Mixed On-policy Reinforcement Learning (MORL) integrating diverse reward signals. We identify the importance of incorporating high-quality reasoning data with long Chain-of-Thought into pre-training stages, and the benefits of mixed RL despite challenges in simultaneous multi-domain optimization. We also contribute a comprehensive evaluation suite covering 50+ tasks to promote reproducibility and advance the field. The model checkpoints and full evaluation suite are available at https://github.com/XiaomiMiMo/MiMo-VL.
Submitted 4 June, 2025;
originally announced June 2025.
-
Cell-Scale Dynamic Modeling of Membrane Interactions with Arbitrarily Shaped Particles
Authors:
Didarul Ahasan Redwan,
Justin Reicher,
Xin Yong
Abstract:
Modeling membrane interactions with arbitrarily shaped colloidal particles, such as environmental micro- and nanoplastics, at the cell scale remains particularly challenging, owing to the complexity of particle geometries and the need to resolve fully coupled translational and rotational dynamics. Here, we present a force-based computational framework capable of capturing dynamic interactions between deformable lipid vesicles and rigid particles of irregular shapes. Both vesicle and particle surfaces are represented using triangulated meshes, and Langevin dynamics resolves membrane deformation alongside rigid-body particle motion. Adhesive interactions between the particle and membrane surfaces are modeled using two numerical schemes: a vertex-to-vertex mapping and a vertex-to-surface projection. The latter yields more accurate wrapping energetics, as demonstrated by benchmark comparisons against ideal spheres. The dynamic simulations reveal that lower particle-to-vesicle mass ratios facilitate frequent particle reorientation and complete membrane wrapping, while higher mass ratios limit orientation changes and stabilize partial wrapping. To illustrate the framework's versatility, we simulate interactions involving cubical, rod-like, bowl-shaped, and tetrahedral particles with spherical, cigar-shaped, or biconcave vesicles. This generalizable modeling approach enables predictive, cell-scale studies of membrane-particle interactions across a wide range of geometries, with applications in environmental biophysics and nanomedicine.
Submitted 2 June, 2025;
originally announced June 2025.
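The abstract above mentions Langevin dynamics and a dependence on the particle-to-vesicle mass ratio. As a rough illustration only, the following sketch advances a single 1D degree of freedom with an Euler-Maruyama discretization of underdamped Langevin dynamics. It is not the authors' framework: the function name `langevin_step`, the 1D setting, the integrator, and all parameter values are assumptions chosen for brevity.

```python
import math
import random
from typing import Callable, Tuple


def langevin_step(
    x: float,
    v: float,
    force: Callable[[float], float],
    mass: float,
    gamma: float,
    kT: float,
    dt: float,
    rng: random.Random,
) -> Tuple[float, float]:
    """One Euler-Maruyama step of 1D underdamped Langevin dynamics.

    Discretizes  m dv = F(x) dt - m*gamma*v dt + sqrt(2*m*gamma*kT) dW,
    where gamma is a friction rate (1/time). Illustrative only.
    """
    # Thermal kick on the velocity; it vanishes when kT == 0.
    noise = math.sqrt(2.0 * gamma * kT * dt / mass) * rng.gauss(0.0, 1.0)
    v_new = v + dt * (force(x) / mass - gamma * v) + noise
    # Position is advanced with the updated velocity.
    x_new = x + dt * v_new
    return x_new, v_new


# Free particle at zero temperature: the velocity decays by (1 - gamma*dt) per step.
rng = random.Random(0)
x, v = 0.0, 1.0
for _ in range(10):
    x, v = langevin_step(x, v, lambda pos: 0.0, mass=1.0, gamma=1.0, kT=0.0, dt=0.1, rng=rng)
```

In this toy form the mass enters both the deterministic acceleration `force(x) / mass` and the noise amplitude, so a lighter body responds more strongly to the same force and to thermal kicks.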
-
Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens
Authors:
Xixian Yong,
Xiao Zhou,
Yingying Zhang,
Jinlin Li,
Yefeng Zheng,
Xian Wu
Abstract:
The recent rise of Large Reasoning Models (LRMs) has significantly improved multi-step reasoning performance, but often at the cost of generating excessively long reasoning chains. This paper revisits the efficiency of such reasoning processes through an information-theoretic lens, revealing a fundamental trade-off between reasoning length and semantic efficiency. We propose two metrics, InfoBias and InfoGain, to quantify divergence from ideal reasoning paths and stepwise information contribution, respectively. Empirical analyses show that longer reasoning chains tend to exhibit higher information bias and diminishing information gain, especially for incorrect answers. Motivated by these findings, we introduce an entropy-based Adaptive Think strategy that dynamically halts reasoning once confidence is sufficiently high, improving efficiency while maintaining competitive accuracy. Compared to the Vanilla Think approach (default mode), our strategy yields a 1.10% improvement in average accuracy and a 50.80% reduction in token usage on QwQ-32B across six benchmark tasks spanning diverse reasoning types and difficulty levels, demonstrating superior efficiency and reasoning performance. These results underscore the promise of entropy-based methods for enhancing both accuracy and cost-efficiency in large language model deployment.
Submitted 23 May, 2025;
originally announced May 2025.
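To make the entropy-based halting idea in the abstract above concrete, here is a minimal sketch that computes the Shannon entropy of a distribution over candidate answers after each reasoning step and stops at the first step where the entropy drops below a cutoff. The abstract does not say how confidence is measured, so the per-step answer distributions, the function names, and the threshold of 0.3 nats are all hypothetical.

```python
import math
from typing import Iterable, List, Optional, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def first_confident_step(
    step_distributions: Iterable[Sequence[float]],
    threshold: float = 0.3,  # hypothetical cutoff, not taken from the paper
) -> Optional[int]:
    """Return the index of the first step whose entropy is below threshold.

    Returns None if no step is confident enough to halt.
    """
    for step, probs in enumerate(step_distributions):
        if shannon_entropy(probs) < threshold:
            return step
    return None


# Made-up answer distributions after three successive reasoning steps.
steps: List[List[float]] = [
    [0.25, 0.25, 0.25, 0.25],
    [0.60, 0.20, 0.10, 0.10],
    [0.97, 0.01, 0.01, 0.01],
]
halt_at = first_confident_step(steps)
```

A lower threshold makes the rule more conservative (more reasoning steps before halting), while a higher one trades accuracy for fewer tokens.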
-
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Authors:
LLM-Core Xiaomi,
Bingquan Xia,
Bowen Shen,
Cici,
Dawei Zhu,
Di Zhang,
Gang Wang,
Hailin Zhang,
Huaqiu Liu,
Jiebao Xiao,
Jinhao Dong,
Liang Zhao,
Peidian Li,
Peng Wang,
Shihua Yu,
Shimao Chen,
Weikun Wang,
Wenhan Ma,
Xiangwei Deng,
Yi Huang,
Yifan Song,
Zihan Jiang,
Bowen Ye,
Can Cai
, et al. (40 additional authors not shown)
Abstract:
We present MiMo-7B, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing pipeline and employ a three-stage data mixing strategy to strengthen the base model's reasoning potential. MiMo-7B-Base is pre-trained on 25 trillion tokens, with an additional Multi-Token Prediction objective for enhanced performance and accelerated inference speed. During post-training, we curate a dataset of 130K verifiable mathematics and programming problems for reinforcement learning, integrating a test-difficulty-driven code-reward scheme to alleviate sparse-reward issues and employing strategic data resampling to stabilize training. Extensive evaluations show that MiMo-7B-Base possesses exceptional reasoning potential, outperforming even much larger 32B models. The final RL-tuned model, MiMo-7B-RL, achieves superior performance on mathematics, code and general reasoning tasks, surpassing the performance of OpenAI o1-mini. The model checkpoints are available at https://github.com/xiaomimimo/MiMo.
Submitted 5 June, 2025; v1 submitted 12 May, 2025;
originally announced May 2025.
-
SEA-LION: Southeast Asian Languages in One Network
Authors:
Raymond Ng,
Thanh Ngan Nguyen,
Yuli Huang,
Ngee Chia Tai,
Wai Yi Leong,
Wei Qi Leong,
Xianbin Yong,
Jian Gang Ngui,
Yosephine Susanto,
Nicholas Cheng,
Hamsawardhini Rengarajan,
Peerat Limkonchotiwat,
Adithya Venkatadri Hulagadri,
Kok Wai Teng,
Yeo Yeow Tong,
Bryan Siow,
Wei Yi Teo,
Wayne Lau,
Choon Meng Tan,
Brandon Ong,
Zhi Hao Ong,
Jann Railey Montalan,
Adwin Chan,
Sajeban Antonyrex,
Ren Lee
, et al. (6 additional authors not shown)
Abstract:
Recently, Large Language Models (LLMs) have dominated much of the artificial intelligence scene with their ability to process and generate natural languages. However, the majority of LLM research and development remains English-centric, leaving low-resource languages such as those in the Southeast Asian (SEA) region under-represented. To address this representation gap, we introduce Llama-SEA-LION-v3-8B-IT and Gemma-SEA-LION-v3-9B-IT, two cutting-edge multilingual LLMs designed for SEA languages. The SEA-LION family of LLMs supports 11 SEA languages, namely English, Chinese, Indonesian, Vietnamese, Malay, Thai, Burmese, Lao, Filipino, Tamil, and Khmer. Our work leverages large-scale multilingual continued pre-training with a comprehensive post-training regime involving multiple stages of instruction fine-tuning, alignment, and model merging. Evaluation results on multilingual benchmarks indicate that our models achieve state-of-the-art performance across LLMs supporting SEA languages. We open-source the models to benefit the wider SEA community.
Submitted 15 April, 2025; v1 submitted 8 April, 2025;
originally announced April 2025.
-
Anomalous energy gap in superconducting La$_{2.85}$Pr$_{0.15}$Ni$_2$O$_7$/SrLaAlO$_4$ heterostructures
Authors:
Jianchang Shen,
Yu Miao,
Zhipeng Ou,
Guangdi Zhou,
Yaqi Chen,
Runqing Luan,
Hongxu Sun,
Zikun Feng,
Xinru Yong,
Peng Li,
Yueying Li,
Lizhi Xu,
Wei Lv,
Zihao Nie,
Heng Wang,
Haoliang Huang,
Yu-Jie Sun,
Qi-Kun Xue,
Zhuoyu Chen,
Junfeng He
Abstract:
The discovery of superconductivity in bilayer nickelate thin films under ambient pressure provides an opportunity to directly investigate characteristic energy scales of the superconducting state from electronic structure. Here, we successfully probe the energy gap and dispersion renormalization in one unit-cell La$_{2.85}$Pr$_{0.15}$Ni$_2$O$_7$ films epitaxially grown on SrLaAlO$_4$ substrates, by developing an ultra-high vacuum quenching technique for in-situ angle-resolved photoemission spectroscopy measurements. The energy gap is observed on the underlying Fermi surface without showing a node along the Brillouin zone diagonal. This gap exhibits particle-hole symmetric behavior and persists to a temperature higher than the superconducting transition temperature, indicating the existence of a pseudogap. An abrupt band renormalization is observed with a dispersion anomaly at ~70 meV below the Fermi level, pointing to another energy scale besides the energy gap. These observations provide initial information on fundamental properties of the superconducting state in bilayer nickelates under ambient pressure.
Submitted 24 February, 2025;
originally announced February 2025.
-
SEA-HELM: Southeast Asian Holistic Evaluation of Language Models
Authors:
Yosephine Susanto,
Adithya Venkatadri Hulagadri,
Jann Railey Montalan,
Jian Gang Ngui,
Xian Bin Yong,
Weiqi Leong,
Hamsawardhini Rengarajan,
Peerat Limkonchotiwat,
Yifan Mai,
William Chandra Tjhi
Abstract:
With the rapid emergence of novel capabilities in Large Language Models (LLMs), the need for rigorous multilingual and multicultural benchmarks that are integrated has become more pronounced. Though existing LLM benchmarks are capable of evaluating specific capabilities of LLMs in English as well as in various mid- to low-resource languages, including those in the Southeast Asian (SEA) region, a comprehensive and culturally representative evaluation suite for the SEA languages has not been developed thus far. Here, we present SEA-HELM, a holistic linguistic and cultural LLM evaluation suite that emphasises SEA languages, comprising five core pillars: (1) NLP Classics, (2) LLM-specifics, (3) SEA Linguistics, (4) SEA Culture, (5) Safety. SEA-HELM currently supports Filipino, Indonesian, Tamil, Thai, and Vietnamese. We also introduce the SEA-HELM leaderboard, which allows users to understand models' multilingual and multicultural performance in a systematic and user-friendly manner. We make the SEA-HELM evaluation code publicly available.
Submitted 2 June, 2025; v1 submitted 20 February, 2025;
originally announced February 2025.
-
PRIME: Phase Reversed Interleaved Multi-Echo acquisition enables highly accelerated distortion-free diffusion MRI
Authors:
Yohan Jun,
Qiang Liu,
Ting Gong,
Jaejin Cho,
Shohei Fujita,
Xingwang Yong,
Susie Y Huang,
Lipeng Ning,
Anastasia Yendiki,
Yogesh Rathi,
Berkin Bilgic
Abstract:
Purpose: To develop and evaluate a new pulse sequence for highly accelerated distortion-free diffusion MRI (dMRI) by inserting an additional echo without prolonging TR, when generalized slice dithered enhanced resolution (gSlider) radiofrequency encoding is used for volumetric acquisition. Methods: A phase-reversed interleaved multi-echo acquisition (PRIME) was developed for rapid, high-resolution, and distortion-free dMRI, which includes two echoes where the first echo is for target diffusion-weighted imaging (DWI) acquisition at high resolution and the second echo is acquired with either 1) lower resolution for high-fidelity field map estimation, or 2) matching resolution to enable efficient diffusion relaxometry acquisitions. The sequence was evaluated on in vivo data acquired from healthy volunteers on clinical and Connectome 2.0 scanners. Results: In vivo experiments demonstrated that 1) high in-plane acceleration (Rin-plane of 5-fold with 2D partial Fourier) was achieved using the high-fidelity field maps estimated from the second echo, which was made at a lower resolution/acceleration to increase its SNR while matching the effective echo spacing of the first readout, 2) high-resolution diffusion relaxometry parameters were estimated from dual-echo PRIME data using a white matter model of multi-TE spherical mean technique (MTE-SMT), and 3) high-fidelity mesoscale DWI at 550 μm isotropic resolution could be obtained in vivo by capitalizing on the high-performance gradients of the Connectome 2.0 scanner. Conclusion: The proposed PRIME sequence enabled highly accelerated, high-resolution, and distortion-free dMRI using an additional echo without prolonging scan time when gSlider encoding is utilized.
Submitted 11 September, 2024;
originally announced September 2024.
-
MuseCL: Predicting Urban Socioeconomic Indicators via Multi-Semantic Contrastive Learning
Authors:
Xixian Yong,
Xiao Zhou
Abstract:
Predicting socioeconomic indicators within urban regions is crucial for fostering inclusivity, resilience, and sustainability in cities and human settlements. While pioneering studies have attempted to leverage multi-modal data for socioeconomic prediction, jointly exploring their underlying semantics remains a significant challenge. To address the gap, this paper introduces a Multi-Semantic Contrastive Learning (MuseCL) framework for fine-grained urban region profiling and socioeconomic prediction. Within this framework, we initiate the process by constructing contrastive sample pairs for street view and remote sensing images, capitalizing on the similarities in human mobility and Point of Interest (POI) distribution to derive semantic features from the visual modality. Additionally, we extract semantic insights from POI texts embedded within these regions, employing a pre-trained text encoder. To merge the acquired visual and textual features, we devise an innovative cross-modality-based attentional fusion module, which leverages a contrastive mechanism for integration. Experimental results across multiple cities and indicators consistently highlight the superiority of MuseCL, demonstrating an average improvement of 10% in $R^2$ compared to various competitive baseline models. The code of this work is publicly available at https://github.com/XixianYong/MuseCL.
Submitted 23 June, 2024;
originally announced July 2024.
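The abstract above describes building contrastive sample pairs but does not state the loss. As a generic illustration of contrastive learning, and not necessarily the objective MuseCL uses, the sketch below computes an InfoNCE-style loss for one anchor embedding, one positive, and a list of negatives. The function names, the dot-product similarity, and the temperature of 0.1 are assumptions.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot-product similarity between two equal-length embeddings."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.1,  # hypothetical value
) -> float:
    """InfoNCE loss: -log( exp(s_pos/T) / sum_k exp(s_k/T) ).

    The sum runs over the positive and all negatives.
    """
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


# Toy 2D embeddings: a well-matched pair gives a lower loss than a mismatched one.
matched = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
mismatched = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Minimizing this quantity pulls the anchor toward its positive and pushes it away from the negatives, which is the general mechanism behind pair-based contrastive training.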
-
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
Authors:
David Romero,
Chenyang Lyu,
Haryo Akbarianto Wibowo,
Teresa Lynn,
Injy Hamed,
Aditya Nanda Kishore,
Aishik Mandal,
Alina Dragonetti,
Artem Abzaliev,
Atnafu Lambebo Tonja,
Bontu Fufa Balcha,
Chenxi Whitehouse,
Christian Salamea,
Dan John Velasco,
David Ifeoluwa Adelani,
David Le Meur,
Emilio Villa-Cueva,
Fajri Koto,
Fauzan Farooqui,
Frederico Belcavello,
Ganzorig Batnasan,
Gisela Vallejo,
Grainne Caulfield,
Guido Ivetta,
Haiyue Song
, et al. (51 additional authors not shown)
Abstract:
Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recent efforts have tried to increase the number of languages covered on VQA datasets, they still lack diversity in low-resource languages. More importantly, although these datasets often extend their linguistic range via translation or some other approaches, they usually keep images the same, resulting in narrow cultural representation. To address these limitations, we construct CVQA, a new Culturally-diverse multilingual Visual Question Answering benchmark, designed to cover a rich set of languages and cultures, where we engage native speakers and cultural experts in the data collection process. As a result, CVQA includes culturally-driven images and questions from across 30 countries on four continents, covering 31 languages with 13 scripts, providing a total of 10k questions. We then benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models. This benchmark can serve as a probing evaluation suite for assessing the cultural capability and bias of multimodal models and hopefully encourage more research efforts toward increasing cultural awareness and linguistic diversity in this field.
Submitted 4 November, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.
-
Dismai-Bench: Benchmarking and designing generative models using disordered materials and interfaces
Authors:
Adrian Xiao Bin Yong,
Tianyu Su,
Elif Ertekin
Abstract:
Generative models have received significant attention in recent years for materials science applications, particularly in the area of inverse design for materials discovery. However, these models are usually assessed based on newly generated, unverified materials, which provide a narrow evaluation of a model's performance. Also, current efforts for inorganic materials have predominantly focused on small crystals, even though the capability to generate large disordered structures would significantly expand the applicability of generative modeling. In this work, we present the Disordered Materials & Interfaces Benchmark (Dismai-Bench), a generative model benchmark that uses datasets of disordered alloys, interfaces, and amorphous silicon (256-264 atoms per structure). Models are trained on each dataset independently, and evaluated through direct structural comparisons between training and generated structures. Benchmarking was performed on two graph diffusion models and two (coordinate-based) U-Net diffusion models. The graph models were found to significantly outperform the U-Net models due to the higher expressive power of graphs. While noise in the less expressive models can assist in discovering materials by facilitating exploration beyond the training distribution, these models face significant challenges when confronted with more complex structures. To further demonstrate the benefits of this benchmarking in the development process of a generative model, we considered the case of developing a point-cloud-based generative adversarial network (GAN) to generate low-energy disordered interfaces. We show that the best performing architecture, CryinGAN, outperforms the U-Net models, and is competitive against the graph models despite its lack of invariances and weaker expressive power. This work provides a new framework and insights to guide the development of future generative models.
Submitted 13 July, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
NLCG-Net: A Model-Based Zero-Shot Learning Framework for Undersampled Quantitative MRI Reconstruction
Authors:
Xinrui Jiang,
Yohan Jun,
Jaejin Cho,
Mengze Gao,
Xingwang Yong,
Berkin Bilgic
Abstract:
Typical quantitative MRI (qMRI) methods estimate parameter maps after image reconstruction, which is prone to biases and error propagation. We propose a Nonlinear Conjugate Gradient (NLCG) optimizer for model-based T2/T1 estimation, which incorporates U-Net regularization trained in a scan-specific manner. This end-to-end method directly estimates qMRI maps from undersampled k-space data using mono-exponential signal modeling with zero-shot scan-specific neural network regularization to enable high fidelity T1 and T2 mapping. T2 and T1 mapping results demonstrate the ability of the proposed NLCG-Net to improve estimation quality compared to subspace reconstruction at high accelerations.
Submitted 22 January, 2024;
originally announced January 2024.
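For readers unfamiliar with the mono-exponential signal model named in the abstract above, T2 decay is commonly written as S(TE) = S0 * exp(-TE / T2). The sketch below fits that model for a single voxel by ordinary least squares on the log-signal. This is deliberately much simpler than NLCG-Net, which works from undersampled k-space with network regularization; the function name, the echo times, and the synthetic values are all made up for illustration.

```python
import math
from typing import Sequence, Tuple


def fit_mono_exponential(te_ms: Sequence[float], signal: Sequence[float]) -> Tuple[float, float]:
    """Fit S(TE) = S0 * exp(-TE / T2) by least squares on log(S).

    Returns (S0, T2), with T2 in the same units as te_ms.
    Requires strictly positive signal values.
    """
    n = len(te_ms)
    log_s = [math.log(s) for s in signal]
    mean_te = sum(te_ms) / n
    mean_log = sum(log_s) / n
    sxx = sum((t - mean_te) ** 2 for t in te_ms)
    sxy = sum((t - mean_te) * (y - mean_log) for t, y in zip(te_ms, log_s))
    slope = sxy / sxx                       # equals -1 / T2
    intercept = mean_log - slope * mean_te  # equals log(S0)
    return math.exp(intercept), -1.0 / slope


# Noise-free synthetic voxel with made-up values S0 = 1000 and T2 = 80 ms.
echo_times = [10.0, 30.0, 50.0, 70.0, 90.0]
samples = [1000.0 * math.exp(-te / 80.0) for te in echo_times]
s0_hat, t2_hat = fit_mono_exponential(echo_times, samples)
```

The log-linear fit is only a convenient baseline: with noisy magnitude data it is biased at low SNR, which is one reason nonlinear, model-based estimation is attractive.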
-
Inclusion in Virtual Reality Technology: A Scoping Review
Authors:
Xiaofeng Yong,
Ali Arya
Abstract:
Despite the significant growth in virtual reality applications and research, the notion of inclusion in virtual reality is not well studied. Inclusion refers to the active involvement of different groups of people in the adoption, use, design, and development of VR technology and applications. In this review, we provide a scoping analysis of existing virtual reality research literature about inclusion. We categorize the literature based on target group into ability, gender, and age, followed by those that study community-based design of VR experiences. In the latter group, we focus mainly on Indigenous Peoples as a clearer and more important example. We also briefly review the approaches to model and consider the role of users in technology adoption and design as a background for inclusion studies. We identify a series of generic barriers and research gaps and some specific ones for each group, resulting in suggested directions for future research.
Submitted 23 October, 2023;
originally announced October 2023.
-
What Language Model to Train if You Have One Million GPU Hours?
Authors:
Teven Le Scao,
Thomas Wang,
Daniel Hesslow,
Lucile Saulnier,
Stas Bekman,
M Saiful Bari,
Stella Biderman,
Hady Elsahar,
Niklas Muennighoff,
Jason Phang,
Ofir Press,
Colin Raffel,
Victor Sanh,
Sheng Shen,
Lintang Sutawika,
Jaesung Tae,
Zheng Xin Yong,
Julien Launay,
Iz Beltagy
Abstract:
The crystallization of modeling methods around the Transformer architecture has been a boon for practitioners. Simple, well-motivated architectural variations can transfer across tasks and scale, increasing the impact of modeling research. However, with the emergence of state-of-the-art 100B+ parameter models, large language models are increasingly expensive to accurately design and train. Notably, it can be difficult to evaluate how modeling decisions may impact emergent capabilities, given that these capabilities arise mainly from sheer scale. In the process of building BLOOM--the Big Science Large Open-science Open-access Multilingual language model--our goal is to identify an architecture and training setup that makes the best use of our 1,000,000 A100-GPU-hours budget. Specifically, we perform an ablation study at the billion-parameter scale comparing different modeling practices and their impact on zero-shot generalization. In addition, we study the impact of various popular pre-training corpora on zero-shot generalization. We also study the performance of a multilingual model and how it compares to the English-only one. Finally, we consider the scaling behaviour of Transformers to choose the target model size, shape, and training setup. All our models and code are open-sourced at https://huggingface.co/bigscience.
Submitted 7 November, 2022; v1 submitted 27 October, 2022;
originally announced October 2022.
-
An Exploration of Neural Radiance Field Scene Reconstruction: Synthetic, Real-world and Dynamic Scenes
Authors:
Benedict Quartey,
Tuluhan Akbulut,
Wasiwasi Mgonzo,
Zheng Xin Yong
Abstract:
This project presents an exploration into 3D scene reconstruction of synthetic and real-world scenes using Neural Radiance Field (NeRF) approaches. We primarily take advantage of the reduction in training and rendering time offered by the multi-resolution hash encoding of neural graphics primitives to reconstruct static video game scenes and real-world scenes, comparing and observing reconstruction detail and limitations. Additionally, we explore dynamic scene reconstruction using Neural Radiance Fields for Dynamic Scenes (D-NeRF). Finally, we extend the implementation of D-NeRF, originally constrained to handle synthetic scenes, to also handle real-world dynamic scenes.
Submitted 21 October, 2022;
originally announced October 2022.
-
Effect of local stress on accurate modeling of bacterial outer membranes using all-atom molecular dynamics
Authors:
Emad Pirhadi,
Juan M. Vanegas,
Mithila Farin,
Jeffrey W. Schertzer,
Xin Yong
Abstract:
Biological membranes are fundamental components of living organisms that play an undeniable role in their survival. Molecular dynamics (MD) serves as an essential computational tool for studying biomembranes on molecular and atomistic scales. The status quo of MD simulations of biomembranes studies a nanometer-sized membrane patch periodically extended under periodic boundary conditions (PBC). In nature, membranes are usually composed of different lipids in their two layers (referred to as leaflets). This compositional asymmetry imposes a fixed ratio of lipid numbers between the two leaflets in a periodically constrained membrane, which needs to be set appropriately. The widely adopted methods of defining leaflet lipid ratio suffer from the lack of control over the mechanical tension of each leaflet, which could significantly influence research findings. In this study, we investigate the role of membrane-building protocol and the resulting initial stress state on the interaction between small molecules and asymmetric membranes. We model the outer membrane of Pseudomonas aeruginosa bacteria using two different building protocols and probe their interactions with the Pseudomonas Quinolone Signal (PQS). Our results show that differential stress could shift the position of free energy minimum for the PQS molecule between the two leaflets of the asymmetric membrane. This work provides critical insights into the relationship between the initial per-leaflet tension and the spontaneous intercalation of PQS.
Submitted 20 October, 2022;
originally announced October 2022.
-
Effects of Shape on Interaction Dynamics of Tetrahedral Nanoplastics and the Cell Membrane
Authors:
Xin Yong,
Ke Du
Abstract:
Cellular uptake of nanoplastics is instrumental in their environmental accumulation and transfer to humans through the food chain. Despite extensive studies using spherical plastic nanoparticles, the influence of the morphological characteristics of environmentally released nanoplastics is understudied. Using dissipative particle dynamics simulations, we modeled the interactions between a cell membrane and hydrophobic nanotetrahedra, which feature high shape anisotropy and large surface curvature seen for environmental nanoplastics. We observe robust uptake of nanotetrahedra with sharp vertices and edges by the lipid membrane. Two local energy minimum configurations of nanotetrahedra embedded in the membrane bilayer were identified for particles of large sizes. Further analysis of particle dynamics within the membrane shows that the two interaction states exhibit distinct translational and rotational dynamics in the directions normal and parallel to the plane of the membrane. The membrane confinement significantly arrests the out-of-plane motion, resulting in caged translation and subdiffusive rotation. While the in-plane diffusion remains Brownian, we find that the translational and rotational modes decouple from each other as the particle size increases. The rotational diffusion decreases by a greater extent compared to the translational diffusion, deviating from the continuum theory predictions. These results provide fundamental insights into the shape effect on the nanoparticle dynamics in crowded lipid membranes.
Submitted 27 August, 2023; v1 submitted 20 October, 2022;
originally announced October 2022.
-
Bayesian autoencoders with uncertainty quantification: Towards trustworthy anomaly detection
Authors:
Bang Xiang Yong,
Alexandra Brintrup
Abstract:
Despite numerous studies of deep autoencoders (AEs) for unsupervised anomaly detection, AEs still lack a way to express uncertainty in their predictions, which is crucial for ensuring safe and trustworthy machine learning systems in high-stakes applications. Therefore, in this work, the formulation of Bayesian autoencoders (BAEs) is adopted to quantify the total anomaly uncertainty, comprising epistemic and aleatoric uncertainties. To evaluate the quality of uncertainty, we consider the task of classifying anomalies with the additional option of rejecting predictions of high uncertainty. In addition, we use the accuracy-rejection curve and propose the weighted average accuracy as a performance metric. Our experiments demonstrate the effectiveness of the BAE and total anomaly uncertainty on a set of benchmark datasets and two real datasets for manufacturing: one for condition monitoring, the other for quality inspection.
Submitted 25 February, 2022;
originally announced February 2022.
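As an illustration of the accuracy-rejection curve mentioned in the abstract above, the following is a minimal sketch, not the authors' implementation. It assumes each prediction comes with a correctness flag and a scalar uncertainty score, rejects the most uncertain fraction, and reports accuracy on what remains. The function name, the toy data, and the rejection rates are all hypothetical; the paper's weighted average accuracy metric is not reproduced here because the abstract does not define it.

```python
def accuracy_rejection_curve(correct, uncertainty, rejection_rates):
    """Return accuracy on retained predictions for each rejection rate.

    correct: list of 0/1 flags (1 = prediction was right).
    uncertainty: list of scores, higher = less certain (assumed convention).
    rejection_rates: fractions of the most uncertain predictions to reject.
    """
    # Sort indices so the most certain predictions come first.
    order = sorted(range(len(correct)), key=lambda i: uncertainty[i])
    sorted_correct = [correct[i] for i in order]
    n = len(sorted_correct)
    curve = []
    for rate in rejection_rates:
        # Keep at least one prediction so the accuracy is always defined.
        keep = max(n - int(rate * n), 1)
        kept = sorted_correct[:keep]
        curve.append(sum(kept) / keep)
    return curve


# Hypothetical toy data: wrong predictions happen to carry high uncertainty.
correct = [1, 1, 0, 1, 0, 0, 1, 1]
uncertainty = [0.1, 0.2, 0.9, 0.3, 0.8, 0.7, 0.15, 0.4]
curve = accuracy_rejection_curve(correct, uncertainty, [0.0, 0.25, 0.5])
print(curve)
```

When uncertainty is informative, accuracy should rise as more predictions are rejected; a flat or falling curve would suggest the uncertainty score carries little signal about correctness.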
-
Do autoencoders need a bottleneck for anomaly detection?
Authors:
Bang Xiang Yong,
Alexandra Brintrup
Abstract:
A common belief in designing deep autoencoders (AEs), a type of unsupervised neural network, is that a bottleneck is required to prevent learning the identity function. Learning the identity function renders the AEs useless for anomaly detection. In this work, we challenge this limiting belief and investigate the value of non-bottlenecked AEs.
The bottleneck can be removed in two ways: (1) overparameterising the latent layer, and (2) introducing skip connections. However, limited works have reported on the use of one of the ways. For the first time, we carry out extensive experiments covering various combinations of bottleneck removal schemes, types of AEs and datasets. In addition, we propose the infinitely-wide AEs as an extreme example of non-bottlenecked AEs.
Their improvement over the baseline implies that learning the identity function is not as trivial as previously assumed. Moreover, we find that non-bottlenecked architectures (highest AUROC=0.857) can outperform their bottlenecked counterparts (highest AUROC=0.696) on the popular task of CIFAR (inliers) vs SVHN (anomalies), among other tasks, shedding light on the potential of developing non-bottlenecked AEs for improving anomaly detection.
Submitted 25 February, 2022;
originally announced February 2022.
-
Coalitional Bayesian Autoencoders -- Towards explainable unsupervised deep learning
Authors:
Bang Xiang Yong,
Alexandra Brintrup
Abstract:
This paper aims to improve the explainability of autoencoder (AE) predictions by proposing two explanation methods based on the mean and epistemic uncertainty of the log-likelihood estimate, which naturally arise from the probabilistic formulation of the AE called Bayesian Autoencoders (BAE). To quantitatively evaluate the performance of explanation methods, we test them in sensor network applications, and propose three metrics based on covariate shift of sensors: (1) G-mean of Spearman drift coefficients, (2) G-mean of sensitivity-specificity of explanation ranking and (3) sensor explanation quality index (SEQI) which combines the two aforementioned metrics. Surprisingly, we find that explanations of BAE's predictions suffer from high correlation resulting in misleading explanations. To alleviate this, a "Coalitional BAE" is proposed, which is inspired by agent-based system theory. Our comprehensive experiments on publicly available condition monitoring datasets demonstrate the improved quality of explanations using the Coalitional BAE.
Submitted 19 October, 2021;
originally announced October 2021.
-
Multitask Prompted Training Enables Zero-Shot Task Generalization
Authors:
Victor Sanh,
Albert Webson,
Colin Raffel,
Stephen H. Bach,
Lintang Sutawika,
Zaid Alyafeai,
Antoine Chaffin,
Arnaud Stiegler,
Teven Le Scao,
Arun Raja,
Manan Dey,
M Saiful Bari,
Canwen Xu,
Urmish Thakker,
Shanya Sharma Sharma,
Eliza Szczechla,
Taewoon Kim,
Gunjan Chhablani,
Nihal Nayak,
Debajyoti Datta,
Jonathan Chang,
Mike Tian-Jian Jiang,
Han Wang,
Matteo Manica,
Sheng Shen
, et al. (16 additional authors not shown)
Abstract:
Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts with diverse wording. These prompted datasets allow for benchmarking the ability of a model to perform completely held-out tasks. We fine-tune a pretrained encoder-decoder model (Raffel et al., 2020; Lester et al., 2021) on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models up to 16x its size. Further, our approach attains strong performance on a subset of tasks from the BIG-bench benchmark, outperforming models up to 6x its size. All trained models are available at https://github.com/bigscience-workshop/t-zero and all prompts are available at https://github.com/bigscience-workshop/promptsource.
Submitted 17 March, 2022; v1 submitted 15 October, 2021;
originally announced October 2021.
-
Bayesian Autoencoders: Analysing and Fixing the Bernoulli likelihood for Out-of-Distribution Detection
Authors:
Bang Xiang Yong,
Tim Pearce,
Alexandra Brintrup
Abstract:
After an autoencoder (AE) has learnt to reconstruct one dataset, it might be expected that the likelihood on an out-of-distribution (OOD) input would be low. This has been studied as an approach to detect OOD inputs. Recent work showed this intuitive approach can fail for the dataset pairs FashionMNIST vs MNIST. This paper suggests this is due to the use of Bernoulli likelihood and analyses why this is the case, proposing two fixes: 1) Compute the uncertainty of likelihood estimate by using a Bayesian version of the AE. 2) Use alternative distributions to model the likelihood.
Submitted 28 July, 2021;
originally announced July 2021.
-
Multi Agent System for Machine Learning Under Uncertainty in Cyber Physical Manufacturing System
Authors:
Bang Xiang Yong,
Alexandra Brintrup
Abstract:
Recent advancements in predictive machine learning have led to its application in various use cases in manufacturing. Most research focused on maximising predictive accuracy without addressing the uncertainty associated with it. While accuracy is important, focusing primarily on it poses an overfitting danger, exposing manufacturers to risk, ultimately hindering the adoption of these techniques. In this paper, we determine the sources of uncertainty in machine learning and establish the success criteria of a machine learning system to function well under uncertainty in a cyber-physical manufacturing system (CPMS) scenario. Then, we propose a multi-agent system architecture which leverages probabilistic machine learning as a means of achieving such criteria. We propose possible scenarios for which our proposed architecture is useful and discuss future work. Experimentally, we implement Bayesian Neural Networks for multi-task classification on a public dataset for the real-time condition monitoring of a hydraulic system and demonstrate the usefulness of the system by evaluating the probability of a prediction being accurate given its uncertainty. We deploy these models using our proposed agent-based framework and integrate web visualisation to demonstrate its real-time feasibility.
Submitted 28 July, 2021;
originally announced July 2021.
-
Bayesian Autoencoders for Drift Detection in Industrial Environments
Authors:
Bang Xiang Yong,
Yasmin Fathy,
Alexandra Brintrup
Abstract:
Autoencoders are unsupervised models which have been used for detecting anomalies in multi-sensor environments. A typical use includes training a predictive model with data from sensors operating under normal conditions and using the model to detect anomalies. Anomalies can come either from real changes in the environment (real drift) or from faulty sensory devices (virtual drift); however, the use of Autoencoders to distinguish between different anomalies has not yet been considered. To this end, we first propose the development of Bayesian Autoencoders to quantify epistemic and aleatoric uncertainties. We then test the Bayesian Autoencoder using a real-world industrial dataset for hydraulic condition monitoring. The system is injected with noise and drifts, and we have found the epistemic uncertainty to be less sensitive to sensor perturbations as compared to the reconstruction loss. By observing the reconstructed signals with the uncertainties, we gain interpretable insights, and these uncertainties offer a potential avenue for distinguishing real and virtual drifts.
Submitted 28 July, 2021;
originally announced July 2021.
-
Numerical and Theoretical Modeling of Droplet Impact on Spherical Surfaces
Authors:
Hussein Dalgamoni,
Xin Yong
Abstract:
Droplet impact on solid surfaces is a fluid phenomenon widely involved in additive manufacturing, heat management, and coating, in which the ability to exert control over the impact dynamics and duration is critical. While past studies have established a comprehensive understanding of the impact on flat substrates, what we know about the impact dynamics on curved solid surfaces is still limited. This work aims to elucidate the physics of droplet impact on spherical surfaces with different Weber numbers ($We$), radii ($R_s$), and surface wettability ($θ^{eq}$) using a combination of the axisymmetric lattice Boltzmann method (LBM) and theoretical analysis. The model developed in our previous work [H. Dalgamoni and X. Yong, Phys. Rev. E 98, 13102 (2018)] was extended and modified for simulating the normal impact of a droplet on curved substrates in the low Weber number regime (i.e., $We \leq 15$), in which the axisymmetric assumption of droplet deformation holds. The LBM simulations show that $We$, $R_s$, and $θ^{eq}$ significantly affect the spreading and recoiling of the droplet during impact. The parametric studies uncover five outcomes of impact, which range from complete deposition to total rebound. A simulation-predicted phase diagram was constructed and correlated with the total time that the droplet was in contact with the solid. In addition, a theoretical model based on the energy budget during impact was developed to predict the rebound threshold for impact on spherical targets when varying $We$, $R_s$, and $θ^{eq}$ independently, which agrees well with simulation observations. These findings provide fundamental insight into surface structure design for controlling droplet hydrodynamics and the contact time during impact.
Submitted 12 May, 2021; v1 submitted 16 May, 2020;
originally announced May 2020.
-
Disease Knowledge Transfer across Neurodegenerative Diseases
Authors:
Razvan V. Marinescu,
Marco Lorenzi,
Stefano B. Blumberg,
Alexandra L. Young,
Pere P. Morell,
Neil P. Oxtoby,
Arman Eshaghi,
Keir X. Yong,
Sebastian J. Crutch,
Polina Golland,
Daniel C. Alexander
Abstract:
We introduce Disease Knowledge Transfer (DKT), a novel technique for transferring biomarker information between related neurodegenerative diseases. DKT infers robust multimodal biomarker trajectories in rare neurodegenerative diseases even when only limited, unimodal data is available, by transferring information from larger multimodal datasets from common neurodegenerative diseases. DKT is a joint-disease generative model of biomarker progressions, which exploits biomarker relationships that are shared across diseases. Our proposed method allows, for the first time, the estimation of plausible, multimodal biomarker trajectories in Posterior Cortical Atrophy (PCA), a rare neurodegenerative disease where only unimodal MRI data is available. For this we train DKT on a combined dataset containing subjects with two distinct diseases and sizes of data available: 1) a larger, multimodal typical AD (tAD) dataset from the TADPOLE Challenge, and 2) a smaller unimodal Posterior Cortical Atrophy (PCA) dataset from the Dementia Research Centre (DRC), for which only a limited number of Magnetic Resonance Imaging (MRI) scans are available. Although validation is challenging due to lack of data in PCA, we validate DKT on synthetic data and two patient datasets (TADPOLE and PCA cohorts), showing it can estimate the ground truth parameters in the simulation and predict unseen biomarkers on the two patient datasets. While we demonstrated DKT on Alzheimer's variants, we note DKT is generalisable to other forms of related neurodegenerative diseases. Source code for DKT is available online: https://github.com/mrazvan22/dkt.
Submitted 29 July, 2019; v1 submitted 11 January, 2019;
originally announced January 2019.
-
Unextendible Maximally Entangled Bases in $\mathbb{C}^{pd}\otimes \mathbb{C}^{qd}$
Authors:
Gui-Jun Zhang,
Yuan-Hong Tao,
Yi-Fan Han,
Xin-Lei Yong,
Shao-Ming Fei
Abstract:
The construction of unextendible maximally entangled bases is tightly related to quantum information processing like local state discrimination. We put forward two constructions of UMEBs in $\mathbb {C}^{pd}\otimes \mathbb {C}^{qd}$($p\leq q$) based on the constructions of UMEBs in $\mathbb {C}^{d}\otimes \mathbb {C}^{d}$ and in $\mathbb {C}^{p}\otimes \mathbb {C}^{q}$, which generalizes the results in [Phys. Rev. A. 94, 052302 (2016)] by two approaches. Two different 48-member UMEBs in $\mathbb {C}^{6}\otimes \mathbb {C}^{9}$ have been constructed in detail.
Submitted 23 October, 2018; v1 submitted 18 October, 2018;
originally announced October 2018.
-
Constructions of Unextendible Maximally Entangled Bases in \(\mathbb {C}^{d}\otimes \mathbb {C}^{d^{\prime}}\)
Authors:
Gui-Jun Zhang,
Yuan-Hong Tao,
Yi-Fan Han,
Xin-Lei Yong,
Shao-Ming Fei
Abstract:
We study unextendible maximally entangled bases (UMEBs) in \(\mathbb {C}^{d}\otimes \mathbb {C}^{d^{\prime}}\) ($d<d'$). An operational method to construct UMEBs containing $d(d^{\prime}-1)$ maximally entangled vectors is established, and two UMEBs in \(\mathbb {C}^{5}\otimes \mathbb {C}^{6}\) and \(\mathbb {C}^{5}\otimes \mathbb {C}^{12}\) are given as examples. Furthermore, a systematic way of constructing UMEBs containing $d(d^{\prime}-r)$ maximally entangled vectors in \(\mathbb {C}^{d}\otimes \mathbb {C}^{d^{\prime}}\) is presented for $r=1,2,\cdots, d-1$. Correspondingly, two UMEBs in \(\mathbb {C}^{3}\otimes \mathbb {C}^{10}\) are obtained.
Submitted 21 February, 2018;
originally announced February 2018.
-
Emergence of Network Bifurcation Triggered by Entanglement
Authors:
Xi Yong,
Man-Hong Yung,
Xue-Ke Song,
Xun Gao,
Angsheng Li
Abstract:
In many non-linear systems, such as plasma oscillation, boson condensation, chemical reaction, and even predator-prey oscillation, the coarse-grained dynamics are governed by an equation containing anti-symmetric transitions, known as the anti-symmetric Lotka-Volterra (ALV) equations. In this work, we prove the existence of a novel bifurcation mechanism for the ALV equations, where the equilibrium state can be drastically changed by flipping the stability of a pair of fixed points. As an application, we focus on the implications of the bifurcation mechanism for evolutionary networks; we found that the bifurcation point can be determined quantitatively by the microscopic quantum entanglement. The equilibrium state can be critically changed from one type of global demographic condensation to another state that supports global cooperation for homogeneous networks. In other words, our results indicate that there exists a class of many-body systems where the macroscopic properties are invariant with a certain amount of microscopic entanglement, but they can be changed abruptly once the entanglement exceeds a critical value. Furthermore, we provide numerical evidence showing that the emergence of bifurcation is robust against the change of the network topologies, and the critical values are in good agreement with our theoretical prediction. These results show that the bifurcation mechanism could be ubiquitous in many physical systems, in addition to evolutionary networks.
Submitted 1 April, 2019; v1 submitted 12 September, 2016;
originally announced September 2016.
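For readers unfamiliar with the anti-symmetric Lotka-Volterra equations named in the abstract above, one commonly used form is sketched below. The symbols $x_i$ (population or occupation of state $i$) and $A_{ij}$ (transition-rate matrix) are assumptions chosen for illustration; the paper's own notation and normalization may differ.

```latex
% Anti-symmetric Lotka-Volterra dynamics (assumed notation):
\frac{dx_i}{dt} = x_i \sum_{j} A_{ij}\, x_j ,
\qquad A_{ij} = -A_{ji} .

% Anti-symmetry of A makes the double sum vanish, so the total is conserved:
\frac{d}{dt} \sum_i x_i = \sum_{i,j} x_i A_{ij} x_j = 0 .
```

The conservation law follows because each pair $(i, j)$ contributes $x_i A_{ij} x_j + x_j A_{ji} x_i = 0$.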