Search | arXiv e-print repository

doi 10.18653/v1/2024.nlp4science-1.9

Soft Measures for Extracting Causal Collective Intelligence

Authors: Maryam Berijanian, Spencer Dork, Kuldeep Singh, Michael Riley Millikan, Ashlin Riggs, Aadarsh Swaminathan, Sarah L. Gibbs, Scott E. Friedman, Nathan Brugnone

Abstract: Understanding and modeling collective intelligence is essential for addressing complex social systems. Directed graphs called fuzzy cognitive maps (FCMs) offer a powerful tool for encoding causal mental models, but extracting high-integrity FCMs from text is challenging. This study presents an approach using large language models (LLMs) to automate FCM extraction. We introduce novel graph-based similarity measures and evaluate them by correlating their outputs with human judgments through the Elo rating system. Results show positive correlations with human evaluations, but even the best-performing measure exhibits limitations in capturing FCM nuances. Fine-tuning LLMs improves performance, but existing measures still fall short. This study highlights the need for soft similarity measures tailored to FCM extraction, advancing collective intelligence modeling with NLP.

Submitted 27 September, 2024; originally announced September 2024.

Comments: Camera-ready version accepted for publication in the EMNLP 2024 Workshop NLP4Science

arXiv:2408.16218

Large-Scale Targeted Cause Discovery with Data-Driven Learning

Authors: Jang-Hyun Kim, Claudia Skok Gibbs, Sangdoo Yun, Hyun Oh Song, Kyunghyun Cho

Abstract: We propose a novel machine learning approach for inferring causal variables of a target variable from observations. Our focus is on directly inferring a set of causal factors without requiring full causal graph reconstruction, which is computationally challenging in large-scale systems. The identified causal set consists of all potential regulators of the target variable under experimental settings, enabling efficient regulation when intervention costs and feasibility vary across variables. To achieve this, we train a neural network using supervised learning on simulated data to infer causality. By employing a local-inference strategy, our approach scales with linear complexity in the number of variables, efficiently scaling up to thousands of variables. Empirical results demonstrate superior performance in identifying causal relationships within large-scale gene regulatory networks, outperforming existing methods that emphasize full-graph discovery. We validate our model's generalization capability across out-of-distribution graph structures and generating mechanisms, including gene regulatory networks of E. coli and the human K562 cell line. Implementation codes are available at https://github.com/snu-mllab/Targeted-Cause-Discovery.

Submitted 7 April, 2025; v1 submitted 28 August, 2024; originally announced August 2024.

Comments: v2: add intervention analysis

arXiv:2405.06694

SUTRA: Scalable Multilingual Language Model Architecture

Authors: Abhijit Bendale, Michael Sapienza, Steven Ripplinger, Simon Gibbs, Jaewon Lee, Pranav Mistry

Abstract: In this paper, we introduce SUTRA, multilingual Large Language Model architecture capable of understanding, reasoning, and generating text in over 50 languages. SUTRA's design uniquely decouples core conceptual understanding from language-specific processing, which facilitates scalable and efficient multilingual alignment and learning. Employing a Mixture of Experts framework both in language and concept processing, SUTRA demonstrates both computational efficiency and responsiveness. Through extensive evaluations, SUTRA is demonstrated to surpass existing models like GPT-3.5, Llama2 by 20-30% on leading Massive Multitask Language Understanding (MMLU) benchmarks for multilingual tasks. SUTRA models are also online LLMs that can use knowledge from the internet to provide hallucination-free, factual and up-to-date responses while retaining their multilingual capabilities. Furthermore, we explore the broader implications of its architecture for the future of multilingual AI, highlighting its potential to democratize access to AI technology globally and to improve the equity and utility of AI in regions with predominantly non-English languages. Our findings suggest that SUTRA not only fills pivotal gaps in multilingual model capabilities but also establishes a new benchmark for operational efficiency and scalability in AI applications.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2302.11707

A Deep Neural Network Based Approach to Building Budget-Constrained Models for Big Data Analysis

Authors: Rui Ming, Haiping Xu, Shannon E. Gibbs, Donghui Yan, Ming Shao

Abstract: Deep learning approaches require collection of data on many different input features or variables for accurate model training and prediction. Since data collection on input features could be costly, it is crucial to reduce the cost by selecting a subset of features and developing a budget-constrained model (BCM). In this paper, we introduce an approach to eliminating less important features for big data analysis using Deep Neural Networks (DNNs). Once a DNN model has been developed, we identify the weak links and weak neurons, and remove some input features to bring the model cost within a given budget. The experimental results show our approach is feasible and supports user selection of a suitable BCM within a given budget.

Submitted 22 February, 2023; originally announced February 2023.

Comments: 8 pages

arXiv:1810.03382

doi 10.1038/s42256-019-0019-2

Deep learning cardiac motion analysis for human survival prediction

Authors: Ghalib A. Bello, Timothy J. W. Dawes, Jinming Duan, Carlo Biffi, Antonio de Marvao, Luke S. G. E. Howard, J. Simon R. Gibbs, Martin R. Wilkins, Stuart A. Cook, Daniel Rueckert, Declan P. O'Regan

Abstract: Motion analysis is used in computer vision to understand the behaviour of moving objects in sequences of images. Optimising the interpretation of dynamic biological systems requires accurate and precise motion tracking as well as efficient representations of high-dimensional motion trajectories so that these can be used for prediction tasks. Here we use image sequences of the heart, acquired using cardiac magnetic resonance imaging, to create time-resolved three-dimensional segmentations using a fully convolutional network trained on anatomical shape priors. This dense motion model formed the input to a supervised denoising autoencoder (4Dsurvival), which is a hybrid network consisting of an autoencoder that learns a task-specific latent code representation trained on observed outcome data, yielding a latent representation optimised for survival prediction. To handle right-censored survival outcomes, our network used a Cox partial likelihood loss function. In a study of 302 patients the predictive accuracy (quantified by Harrell's C-index) was significantly higher (p < .0001) for our model C=0.73 (95% CI: 0.68 - 0.78) than the human benchmark of C=0.59 (95% CI: 0.53 - 0.65). This work demonstrates how a complex computer vision task using high-dimensional medical image data can efficiently predict human survival.

Submitted 8 October, 2018; originally announced October 2018.

Journal ref: Nature Machine Intelligence, 1, 95-104 (2019)

arXiv:1709.09510

doi 10.1615/AnnualRevHeatTransfer.2018019042

Thermophysical Phenomena in Metal Additive Manufacturing by Selective Laser Melting: Fundamentals, Modeling, Simulation and Experimentation

Authors: Christoph Meier, Ryan W. Penny, Yu Zou, Jonathan S. Gibbs, A. John Hart

Abstract: Among the many additive manufacturing (AM) processes for metallic materials, selective laser melting (SLM) is arguably the most versatile in terms of its potential to realize complex geometries along with tailored microstructure. However, the complexity of the SLM process, and the need for predictive relation of powder and process parameters to the part properties, demands further development of computational and experimental methods. This review addresses the fundamental physical phenomena of SLM, with a special emphasis on the associated thermal behavior. Simulation and experimental methods are discussed according to three primary categories. First, macroscopic approaches aim to answer questions at the component level and consider for example the determination of residual stresses or dimensional distortion effects prevalent in SLM. Second, mesoscopic approaches focus on the detection of defects such as excessive surface roughness, residual porosity or inclusions that occur at the mesoscopic length scale of individual powder particles. Third, microscopic approaches investigate the metallurgical microstructure evolution resulting from the high temperature gradients and extreme heating and cooling rates induced by the SLM process. Consideration of physical phenomena on all of these three length scales is mandatory to establish the understanding needed to realize high part quality in many applications, and to fully exploit the potential of SLM and related metal AM processes.

Submitted 4 September, 2017; originally announced September 2017.

Showing 1–6 of 6 results for author: Gibbs, S