-
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Authors:
NVIDIA,
Alisson Azzolini,
Junjie Bai,
Hannah Brandon,
Jiaxin Cao,
Prithvijit Chattopadhyay,
Huayu Chen,
Jinju Chu,
Yin Cui,
Jenna Diamond,
Yifan Ding,
Liang Feng,
Francesco Ferroni,
Rama Govindaraju,
Jinwei Gu,
Siddharth Gururani,
Imad El Hanafi,
Zekun Hao,
Jacob Huffman,
Jingyi Jin,
Brendan Johnson,
Rizwan Khan,
George Kurian,
Elena Lantz
, et al. (29 additional authors not shown)
Abstract:
Physical AI systems need to perceive, understand, and perform complex actions in the physical world. In this paper, we present the Cosmos-Reason1 models that can understand the physical world and generate appropriate embodied decisions (e.g., next step action) in natural language through long chain-of-thought reasoning processes. We begin by defining key capabilities for Physical AI reasoning, with a focus on physical common sense and embodied reasoning. To represent physical common sense, we use a hierarchical ontology that captures fundamental knowledge about space, time, and physics. For embodied reasoning, we rely on a two-dimensional ontology that generalizes across different physical embodiments. Building on these capabilities, we develop two multimodal large language models, Cosmos-Reason1-7B and Cosmos-Reason1-56B. We curate data and train our models in two stages: Physical AI supervised fine-tuning (SFT) and Physical AI reinforcement learning (RL). To evaluate our models, we build comprehensive benchmarks for physical common sense and embodied reasoning according to our ontologies. Evaluation results show that Physical AI SFT and RL bring significant improvements. To facilitate the development of Physical AI, we make our code and pre-trained models available under the NVIDIA Open Model License at https://github.com/nvidia-cosmos/cosmos-reason1.
Submitted 19 May, 2025; v1 submitted 18 March, 2025;
originally announced March 2025.
-
Construal Level and Cognitive Reflection in Newsvendor Games: Unveiling the Influence of Individual Heterogeneity on Decision-Making
Authors:
Kuldeep Singh,
Sumanth Cheemalapati,
George Kurian,
Prathamesh Muzumdar
Abstract:
During the last decade, scholars have studied decision-making behavior in newsvendor settings and have identified numerous behavioral patterns that deviate from normative behavior. However, there is a dearth of research that has examined the influence of individual heterogeneity on decision-making in newsvendor settings. This study examines the effect of construal level (abstract and concrete), as described by construal level theory (CLT), on performance in newsvendor games. In addition, this study measures the cognitive reflection of individuals ex ante using the cognitive reflection test (CRT) to analyze the true impact of how people construe a problem on their decision-making.
Submitted 13 January, 2025;
originally announced February 2025.
-
The Dead Internet Theory: A Survey on Artificial Interactions and the Future of Social Media
Authors:
Prathamesh Muzumdar,
Sumanth Cheemalapati,
Srikanth Reddy RamiReddy,
Kuldeep Singh,
George Kurian,
Apoorva Muley
Abstract:
The Dead Internet Theory (DIT) suggests that much of today's internet, particularly social media, is dominated by non-human activity, AI-generated content, and corporate agendas, leading to a decline in authentic human interaction. This study explores the origins, core claims, and implications of DIT, emphasizing its relevance in the context of social media platforms. The theory emerged as a response to the perceived homogenization of online spaces, highlighting issues like the proliferation of bots, algorithmically generated content, and the prioritization of engagement metrics over genuine user interaction. AI technologies play a central role in this phenomenon, as social media platforms increasingly use algorithms and machine learning to curate content, drive engagement, and maximize advertising revenue. While these tools enhance scalability and personalization, they also prioritize virality and consumption over authentic communication, contributing to the erosion of trust, the loss of content diversity, and a dehumanized internet experience. This study redefines DIT in the context of social media, proposing that the commodification of content consumption for revenue has taken precedence over meaningful human connectivity. By focusing on engagement metrics, platforms foster a sense of artificiality and disconnection, underscoring the need for human-centric approaches to revive authentic online interaction and community building.
Submitted 6 January, 2025;
originally announced February 2025.
-
Determinants of Human Development Index (HDI): A Regression Analysis of Economic and Social Indicators
Authors:
Kuldeep Singh,
Sumanth Cheemalapati,
Srikanth Reddy RamiReddy,
George Kurian,
Prathamesh Muzumdar,
Apoorva Muley
Abstract:
This study aims to investigate the factors influencing the Human Development Index (HDI). Five variables (GDP per capita, health expenditure, education expenditure, infant mortality rate per 1,000 live births, and average years of schooling) were analyzed to develop a regression model assessing their impact on HDI. The results indicate that GDP per capita, infant mortality rate, and average years of schooling are significant predictors of HDI. Specifically, the study finds that GDP per capita and average years of schooling are positively related to HDI, while infant mortality rate is negatively associated with HDI.
Submitted 6 January, 2025;
originally announced February 2025.
-
Scalable Machine Learning Training Infrastructure for Online Ads Recommendation and Auction Scoring Modeling at Google
Authors:
George Kurian,
Somayeh Sardashti,
Ryan Sims,
Felix Berger,
Gary Holt,
Yang Li,
Jeremiah Willcock,
Kaiyuan Wang,
Herve Quiroz,
Abdulrahman Salem,
Julian Grady
Abstract:
Large-scale Ads recommendation and auction scoring models at Google scale demand immense computational resources. While specialized hardware like TPUs has improved linear algebra computations, bottlenecks persist in large-scale systems. This paper proposes solutions for three critical challenges that must be addressed for efficient end-to-end execution in a widely used production infrastructure: (1) Input Generation and Ingestion Pipeline: Efficiently transforming raw features (e.g., "search query") into numerical inputs and streaming them to TPUs; (2) Large Embedding Tables: Optimizing conversion of sparse features into dense floating-point vectors for neural network consumption; (3) Interruptions and Error Handling: Minimizing resource wastage in large-scale shared datacenters. To tackle these challenges, we propose a shared input generation technique that reduces the computational load of input generation by amortizing costs across many models. Furthermore, we propose partitioning, pipelining, and RPC (Remote Procedure Call) coalescing software techniques to optimize embedding operations. To maintain efficiency at scale, we describe novel preemption notice and training hold mechanisms that minimize resource wastage and ensure prompt error resolution. These techniques have demonstrated significant improvement in Google production, achieving a 116% performance boost and an 18% reduction in training costs across representative models.
Submitted 17 January, 2025;
originally announced January 2025.
-
Navigating the Docker Ecosystem: A Comprehensive Taxonomy and Survey
Authors:
Prathamesh Muzumdar,
Amol Bhosale,
Ganga Prasad Basyal,
George Kurian
Abstract:
The cloud computing landscape is rapidly expanding and growing in complexity. Cloud computing has emerged as a widely adopted model for efficiently processing large volumes of data by harnessing clusters of commodity computers. This evolution enables the handling of massive data through on-demand services, relying on numerous microservices with diverse dependencies. Container technology ensures secure storage, allowing for large-scale data processing with high scalability and portability. Container technology, particularly as exemplified by Docker in the last decade, plays a pivotal role in this scenario. It empowers microservices to process data swiftly, enabling developers to dynamically scale these services in real time. This paper begins by establishing a comprehensive taxonomy for delineating container architecture. Focusing specifically on Docker containers, we scrutinize the existing container-related literature. Through this taxonomy and survey, we not only discern similarities and disparities in the architectural approaches of Docker container technology but also pinpoint areas necessitating further research.
Submitted 3 January, 2024;
originally announced March 2024.
-
A Latent Dirichlet Allocation (LDA) Semantic Text Analytics Approach to Explore Topical Features in Charity Crowdfunding Campaigns
Authors:
Prathamesh Muzumdar,
George Kurian,
Ganga Prasad Basyal
Abstract:
Crowdfunding in the realm of the Social Web has received substantial attention, with prior research examining various aspects of campaigns, including project objectives, durations, and influential project categories for successful fundraising. These factors are crucial for entrepreneurs seeking donor support. However, the terrain of charity crowdfunding within the Social Web remains relatively unexplored, lacking comprehension of the motivations driving donations that often lack concrete reciprocation. Distinct from conventional crowdfunding that offers tangible returns, charity crowdfunding relies on intangible rewards like tax advantages, recognition posts, or advisory roles. Such details are often embedded within campaign narratives, yet the analysis of textual content in charity crowdfunding is limited. This study introduces an inventive text analytics framework, utilizing Latent Dirichlet Allocation (LDA) to extract latent themes from textual descriptions of charity campaigns. The study explores four themes, two each in campaign and incentive descriptions. Campaign description themes focus on child and elderly health, mainly those diagnosed with terminal diseases. Incentive description themes are based on tax benefits, certificates, and appreciation posts. These themes, combined with numerical parameters, predict campaign success. The study successfully used a Random Forest classifier to predict campaign success using both thematic and numerical parameters. The study distinguishes thematic categories, particularly medical need-based charity and general causes, based on project and incentive descriptions. In conclusion, this research bridges the gap by showcasing the utility of topic modelling in uncharted charity crowdfunding domains.
Submitted 3 January, 2024;
originally announced January 2024.
-
Risk of AI in Healthcare: A Comprehensive Literature Review and Study Framework
Authors:
Apoorva Muley,
Prathamesh Muzumdar,
George Kurian,
Ganga Prasad Basyal
Abstract:
This study conducts a thorough examination of the research stream focusing on AI risks in healthcare, aiming to explore the distinct genres within this domain. Selection criteria were applied to choose 39 articles, which were carefully analyzed to identify three primary genres of AI risks prevalent in healthcare: clinical data risks, technical risks, and socio-ethical risks. The selection criteria were based on journal ranking and impact factor. The research seeks to provide a valuable resource for future healthcare researchers, furnishing them with a comprehensive understanding of the complex challenges posed by AI implementation in healthcare settings. By categorizing and elucidating these genres, the study aims to facilitate the development of empirical qualitative and quantitative research, fostering evidence-based approaches to address AI-related risks in healthcare effectively. This endeavor contributes to building a robust knowledge base that can inform the formulation of risk mitigation strategies, ensuring safe and efficient integration of AI technologies in healthcare practices. Thus, it is important to study AI risks in healthcare to build better and more efficient AI systems and mitigate risks.
Submitted 25 September, 2023;
originally announced September 2023.
-
Econometrics Modelling Approach to Examine the Effect of STEM Policy Changes on Asian Students Enrollment Decision in USA
Authors:
Prathamesh Muzumdar,
George Kurian,
Ganga Prasad Basyal,
Apoorva Muley
Abstract:
Academic research has shown significant interest in international student mobility, with previous literature primarily focusing on the migration industry from a political and public policy perspective. For many countries, international student mobility plays a crucial role in bolstering their economies through financial gains and attracting skilled immigrants. While previous studies have explored the determinants of mobility and country economic policies, only a few have examined the impact of policy changes on mobility trends. In this study, the researchers investigate the influence of immigration policy changes, particularly the optional practical training (OPT) extension on STEM programs, on Asian students' preference for enrolling in STEM majors at universities. The study utilizes observational data and employs a quasi-experimental design, analysing the information using the difference-in-difference technique. The findings of the research indicate that the implementation of the STEM extension policy in 2008 has a significant effect on Asian students' decisions to enroll in a STEM major. Additionally, the study highlights the noteworthy role of individual factors such as the specific STEM major, terminal degree pursued, and gender in influencing Asian students' enrollment decisions.
Submitted 17 August, 2023;
originally announced August 2023.
-
TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings
Authors:
Norman P. Jouppi,
George Kurian,
Sheng Li,
Peter Ma,
Rahul Nagarajan,
Lifeng Nai,
Nishant Patil,
Suvinay Subramanian,
Andy Swing,
Brian Towles,
Cliff Young,
Xiang Zhou,
Zongwei Zhou,
David Patterson
Abstract:
In response to innovations in machine learning (ML) models, production workloads changed radically and rapidly. TPU v4 is the fifth Google domain specific architecture (DSA) and its third supercomputer for such ML models. Optical circuit switches (OCSes) dynamically reconfigure its interconnect topology to improve scale, availability, utilization, modularity, deployment, security, power, and performance; users can pick a twisted 3D torus topology if desired. Much cheaper, lower power, and faster than Infiniband, OCSes and underlying optical components are <5% of system cost and <3% of system power. Each TPU v4 includes SparseCores, dataflow processors that accelerate models that rely on embeddings by 5x-7x yet use only 5% of die area and power. Deployed since 2020, TPU v4 outperforms TPU v3 by 2.1x and improves performance/Watt by 2.7x. The TPU v4 supercomputer is 4x larger at 4096 chips and thus ~10x faster overall, which along with OCS flexibility helps large language models. For similar sized systems, it is ~4.3x-4.5x faster than the Graphcore IPU Bow and is 1.2x-1.7x faster and uses 1.3x-1.9x less power than the Nvidia A100. TPU v4s inside the energy-optimized warehouse scale computers of Google Cloud use ~3x less energy and produce ~20x less CO2e than contemporary DSAs in a typical on-premise data center.
Submitted 20 April, 2023; v1 submitted 3 April, 2023;
originally announced April 2023.
-
Single-beam room-temperature atomic magnetometer with large bandwidth and dynamic range
Authors:
K. K. George Kurian,
Sushree S. Sahoo,
P. K. Madhu,
G. Rajalakshmi
Abstract:
We present a single-beam atomic magnetometer operating at room temperature for the measurement of ac magnetic fields. The magnetometer functions in the non-linear regime of magneto-optical rotation of $^{85}$Rb atomic vapour. We demonstrate a sensitivity of $\sim 0.9$ pT$/ \sqrt{Hz}$ at 2 kHz and a large bandwidth of 24 kHz. The dynamic range of measurement is $10^6$, making the sensor effective even in Earth's field. We present the signal-to-noise and bandwidth characteristics of the system for both shielded and unshielded modes of operation. Moreover, we perform theoretical analysis for the atom-light system for the single laser beam configuration. The effect of light intensity and detuning on the magnetometer are studied theoretically as well as experimentally to understand the strengths and limitations of the technique.
Submitted 9 September, 2022;
originally announced September 2022.
-
Empirical study to explore the influence of salesperson's customer orientation on customer loyalty
Authors:
Prathamesh Muzumdar,
George Kurian
Abstract:
This study examines the influence of a salesperson's customer orientation on customer loyalty. Customer orientation is the approach taken by a salesperson to improve customer relationships and increase sales. Many organizations prefer sales orientation as a strategic approach towards increasing sales. Though successful in its objective, sales orientation fails to attract repeat purchases. It has become a necessity to train frontline employees to better understand customer needs, keeping in mind the firm's ultimate objective. This study examines the improvements customer orientation can bring to increase repurchases, thus leading to customer loyalty. The findings suggest that product assortment, long lines of customers, customers' annual income, and the listening skills of the salesperson were the significant antecedents of customer loyalty.
Submitted 28 February, 2021;
originally announced March 2021.
-
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Authors:
Yonghui Wu,
Mike Schuster,
Zhifeng Chen,
Quoc V. Le,
Mohammad Norouzi,
Wolfgang Macherey,
Maxim Krikun,
Yuan Cao,
Qin Gao,
Klaus Macherey,
Jeff Klingner,
Apurva Shah,
Melvin Johnson,
Xiaobing Liu,
Łukasz Kaiser,
Stephan Gouws,
Yoshikiyo Kato,
Taku Kudo,
Hideto Kazawa,
Keith Stevens,
George Kurian,
Nishant Patil,
Wei Wang,
Cliff Young,
Jason Smith
, et al. (6 additional authors not shown)
Abstract:
Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.
Submitted 8 October, 2016; v1 submitted 26 September, 2016;
originally announced September 2016.