Data Commons
Authors:
Ramanathan V. Guha,
Prashanth Radhakrishnan,
Bo Xu,
Wei Sun,
Carolyn Au,
Ajai Tirumali,
Muhammad J. Amjad,
Samantha Piekos,
Natalie Diaz,
Jennifer Chen,
Julia Wu,
Prem Ramaswami,
James Manyika
Abstract:
Publicly available data from open sources (e.g., United States Census Bureau (Census), World Health Organization (WHO), Intergovernmental Panel on Climate Change (IPCC)) are vital resources for policy makers, students, and researchers across different disciplines. Combining data from different sources requires the user to reconcile differences in schemas, formats, assumptions, and more. This data wrangling is time-consuming and tedious, and it must be repeated by every user of the data. Our goal with Data Commons (DC) is to help make public data accessible and useful to those who want to understand this data and use it to address societal challenges and opportunities. We do the data processing and make the processed data widely available via standard schemas and Cloud APIs. Data Commons is a distributed network of sites that publish data in a common schema and interoperate using the Data Commons APIs. Data from different Data Commons can be joined easily. The aggregate of these Data Commons can be viewed as a single Knowledge Graph. This Knowledge Graph can then be searched using natural language questions, drawing on advances in Large Language Models. This paper describes the architecture of Data Commons and some of the major deployments, and highlights directions for future work.
Submitted 7 September, 2023;
originally announced September 2023.
ML for Flood Forecasting at Scale
Authors:
Sella Nevo,
Vova Anisimov,
Gal Elidan,
Ran El-Yaniv,
Pete Giencke,
Yotam Gigi,
Avinatan Hassidim,
Zach Moshe,
Mor Schlesinger,
Guy Shalev,
Ajai Tirumali,
Ami Wiesel,
Oleg Zlydenko,
Yossi Matias
Abstract:
Effective riverine flood forecasting at scale is hindered by a multitude of factors, most notably the need to rely on human calibration in current methodology, the limited amount of data for a specific location, and the computational difficulty of building continental- or global-level models that are sufficiently accurate. Machine learning (ML) is primed to be useful in this scenario: learned models often surpass human experts in complex high-dimensional scenarios, and the framework of transfer or multitask learning is an appealing solution for leveraging local signals to achieve improved global performance. We propose to build on these strengths and develop ML systems for timely and accurate riverine flood prediction.
Submitted 28 January, 2019;
originally announced January 2019.