Google says it has built one of the largest historical datasets ever assembled for flash floods by using its Gemini AI model to analyze global news reports.
The company has unveiled the system, dubbed Groundsource, as a framework designed to convert unstructured news coverage of disasters into structured scientific data that can be used to train forecasting models and improve early-warning systems.
The first dataset generated through the system contains 2.6 million flash flood records spanning more than 150 countries and covering events from 2000 through the present.
According to Google, Groundsource was built to address a longstanding problem in disaster science: the lack of consistent historical data for fast-moving events like flash floods.
Traditional monitoring systems, such as satellite databases and international disaster registries, capture only a fraction of real-world events, particularly smaller or localized floods that do not trigger large-scale emergency alerts.
By contrast, Google’s system mines public reporting across global media sources and converts those reports into a structured archive.
The tech titan says Gemini handles the key step in the process: analyzing articles in more than 80 languages, extracting flood events described in news coverage, and converting them into timestamped, geolocated entries.
Gemini performs several verification tasks during extraction, including identifying whether a report describes an actual flood event, resolving time references such as “last Tuesday,” and mapping specific locations like neighborhoods or streets into geographic coordinates.
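Google has not published Groundsource's internal schema, but the extraction steps described above can be sketched in miniature. The record shape and the weekday-resolution helper below are hypothetical illustrations, assuming an article's publication date is known; they show how a phrase like "last Tuesday" can be pinned to a calendar date before an event is archived.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class FloodEventRecord:
    """Hypothetical shape of one extracted entry; not Groundsource's actual schema."""
    event_date: date
    latitude: float
    longitude: float
    source_language: str

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def resolve_relative_weekday(phrase: str, published: date) -> date:
    """Resolve a phrase like 'last Tuesday' against the article's publication date."""
    day_name = phrase.lower().replace("last", "").strip()
    target = WEEKDAYS.index(day_name)
    # Step back to the most recent prior occurrence of that weekday (1-7 days).
    delta = (published.weekday() - target) % 7
    return published - timedelta(days=delta or 7)
```

For example, against an article published on Friday, 2024-05-10, `resolve_relative_weekday("last Tuesday", ...)` returns 2024-05-07; a production pipeline would also handle time zones, ambiguous phrasing, and non-English date expressions.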
Google says the resulting dataset dramatically expands the historical record available to researchers.
The 2.6 million-event archive dwarfs traditional systems such as the Global Disaster Alert and Coordination System (GDACS), which tracks roughly 10,000 major events.
In testing, the Groundsource pipeline successfully captured between 85% and 100% of severe flood events recorded by GDACS between 2020 and 2026. Manual reviews also showed that 60% of the extracted events were precisely accurate in both location and timing, while 82% were accurate enough to be useful for real-world analysis.
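The 85-100% capture figure is effectively a recall measurement against a reference catalog. A minimal sketch of that computation, assuming both catalogs expose comparable event identifiers (an assumption; matching real flood records across datasets requires fuzzy spatial and temporal joins):

```python
def recall(captured_ids, reference_ids):
    """Fraction of reference-catalog events also present in the extracted dataset."""
    reference = set(reference_ids)
    if not reference:
        return 0.0
    return len(reference & set(captured_ids)) / len(reference)

# Toy example: 2 of 3 reference events were captured.
score = recall(["gdacs-1", "gdacs-2", "news-9"], ["gdacs-1", "gdacs-2", "gdacs-3"])
```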
Google says the new data is already powering improvements to its flood forecasting systems.
Using the expanded historical dataset, the company says it can now generate near-global urban flash flood forecasts up to 24 hours before an event.
Those forecasts are being rolled out through Google’s Flood Hub platform, which provides public warnings and mapping tools for flood-prone regions.
The company says the same AI-driven methodology could eventually be used to build global historical datasets for other hazards that lack reliable records, including droughts, landslides and avalanches.
Disclaimer: Opinions expressed at CapitalAI Daily are not investment advice. Investors should do their own due diligence before making any decisions involving securities, cryptocurrencies, or digital assets. Your transfers and trades are at your own risk, and any losses you may incur are your responsibility. CapitalAI Daily does not recommend the buying or selling of any assets, nor is CapitalAI Daily an investment advisor. See our Editorial Standards and Terms of Use.

