AI and flood modeling in the real world


AI has the potential to be transformative in applications from flood forecasting to water resource management. It is not, however, a silver bullet. Dr Chris Lucas, Fathom’s Principal Machine Learning Engineer, talks about the possibilities and limitations of AI and machine learning, and how Fathom uses tried-and-tested methods in new ways to push the boundaries of flood modeling.

The urgent problem of flooding

Every year, extreme flooding destroys and disrupts lives and communities around the world and costs billions of dollars in economic losses. As the climate changes and populations grow and become more urbanized, the impact of these events will only become more severe; in the US, flood exposure is expected to double within the next century.

No surprise, then, that the race is on to advance new technologies that drive more accurate flood forecasting and estimation, and help communities become better prepared and more resilient. Artificial intelligence (and particularly its subset, machine learning) is one of those technologies, with huge potential to transform the way we model flooding.

AI and machine learning’s role

The deep learning models around today are incredibly complex neural networks (NNs), involving many layers of computation. Recent progress in AI has been possible thanks to advances in computational power and in the quality and sheer amount of data that can be used to ‘train’ the models.

Scientists have already made great strides in applying powerful AI and machine learning (ML) models to reduce flood risk to communities, from estimating flood defense standards to forecasting events in data-scarce areas.

Google Flood Hub is one example; its ML-powered models forecast floods up to seven days in advance using deep learning and public data sources. Amazon has followed suit, with ML projects in Spain that include an early flood-warning system and an initiative to help local farmers maximize crop yields while reducing their water footprint. 

But before we get on to how ML can help advance flood modeling, let’s explore what it is and how it works. 

Machine learning demystified: ML vs traditional programming

A good place to start when understanding machine learning is how it differs from traditional programming. 

Traditional programming involves writing a set of explicit instructions (code) that the computer follows to perform a task. We input the data and the model makes complex, physics-based calculations to arrive at its output – e.g. flood maps showing the location, depth and extent of inundation.

In machine learning, the process is flipped: we supply the computer with input data and the desired output data. The model then automatically learns the rules or patterns that connect them and repeatedly adjusts the internal model parameters so that it improves its ability to make predictions or decisions. 
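
To make the contrast concrete, here is a minimal, hypothetical Python sketch (the rainfall–depth rule and the data are invented for illustration): the traditional function encodes an explicit rule, while the learned model starts with arbitrary parameters and adjusts them to fit example input/output pairs.

```python
import numpy as np

# Traditional programming: we write the rule ourselves.
def depth_from_rainfall(rainfall_mm):
    # Explicit (made-up) rule: depth rises linearly past a threshold.
    return np.maximum(0.0, 0.02 * (rainfall_mm - 10.0))

# Machine learning: we supply inputs and desired outputs,
# and the model learns the rule connecting them.
rainfall = np.array([0.0, 20.0, 40.0, 60.0, 80.0])     # input data
observed_depth = np.array([0.0, 0.2, 0.6, 1.0, 1.4])   # desired output

w, b = 0.0, 0.0                   # internal model parameters
for _ in range(5000):             # repeatedly adjust to reduce error
    error = (w * rainfall + b) - observed_depth
    w -= 1e-4 * (error * rainfall).mean()   # nudge slope
    b -= 1e-1 * error.mean()                # nudge intercept

print(w, b)  # a rule recovered from data, not written by hand
```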

Neurons, layers and learning

Modern ML models are complicated artificial neural networks (ANNs), inspired by the human brain. They are designed to interpret data, which can be more or less anything that can be converted into numbers – audio, text, satellite images, videos and so on.

These ML systems are made up of layers of “neurons” (tiny processing units). Each neuron takes some input, does some complicated mathematics and passes the result along to the next layer.

To understand one of these systems, let’s consider a specific task: finding flood defenses from satellite images.

Layers

  • The input layer receives raw data (e.g. pixels from a satellite image).
  • The hidden layers do the processing. This is where the network learns to uncover the complex patterns in the data.
  • The output layer gives the final result (e.g. “this is a flood defense”).
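
As a minimal sketch of that layered structure, the Python snippet below pushes one (randomly generated) image patch through an input layer, a hidden layer and an output layer. The shapes and weights are illustrative only – nothing here comes from a real trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input layer: a flattened 8x8 satellite image patch (64 pixel values).
pixels = rng.random(64)

# Hidden layer: each of 16 neurons computes a weighted sum of its
# inputs plus a bias, then applies a nonlinearity (here, ReLU).
W1, b1 = rng.normal(size=(16, 64)), np.zeros(16)
hidden = np.maximum(0.0, W1 @ pixels + b1)

# Output layer: a single neuron squashed to a probability –
# "how likely is this patch to contain a flood defense?"
W2, b2 = rng.normal(size=(1, 16)), np.zeros(1)
probability = 1.0 / (1.0 + np.exp(-(W2 @ hidden + b2)))

print(f"P(flood defense) = {probability[0]:.2f}")
```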

Learning

In the case of “supervised” learning, the process begins by training the model on labeled datasets, with the model learning to associate specific visual features (e.g. the presence of water, buildings or surrounding topography) with some target (e.g. the location of a known flood defense). The network learns by adjusting the strength of the connections between neurons (called “weights”).

Crucially, we measure how close the model gets to the correct answer (via the “loss function” – here, essentially whether the model correctly identified a flood defense) and then use that information to adjust all of the thousands or even millions of internal parameters, so that over time the model iteratively gets closer to the correct answer.
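
In code, that loop might look something like the following PyTorch sketch. The tiny network and the random patches and labels are placeholders for a real labeled dataset; the mechanics – compute the loss, backpropagate, update the weights – are the standard supervised-learning recipe.

```python
import torch
import torch.nn as nn

# Placeholder data: 100 flattened image patches with 0/1 labels
# (1 = patch contains a known flood defense).
inputs = torch.randn(100, 64)
labels = torch.randint(0, 2, (100, 1)).float()

model = nn.Sequential(nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()   # measures distance from the correct answer
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(100):
    loss = loss_fn(model(inputs), labels)  # how wrong are we right now?
    optimizer.zero_grad()
    loss.backward()                        # how should each weight change?
    optimizer.step()                       # adjust all internal parameters
```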

Output

If the model learns to solve the task well, it should be able to tell us the locations of flood defenses in images. In other cases, these kinds of models can be used to improve data: with our FathomDEM (digital elevation model, see below), for example, we used ML to spot and fix errors in elevation data.

The power and limitations of machine learning

AI’s power stems from its ability to recognize incredibly complex patterns in massive datasets. This is particularly useful in domains where we can’t simply write down the answer (e.g. correcting DEMs) or where the computational cost of traditional methods becomes a limiting factor (e.g. climate simulations).

However, much of the processing is carried out in the model’s hidden layers, which means it functions like a “black box”. When we input the data and tell the model the output we want, we do not tell it how to reach that output. And because we don’t know how or why it arrives at a result, we cannot assume it is recognizing the correct patterns – or, therefore, that the output is correct.

Most common deep learning models, by their nature, do not make their calculations using known physical equations – for example, the conservation and hydraulics equations that govern hydrodynamic processes. This lack of inherent physical understanding can lead to results that are not physically plausible.
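
One practical consequence is that ML outputs need explicit sanity checks. As a hypothetical illustration (the function, thresholds and data below are invented, not Fathom’s actual checks), a predicted water-depth grid can at least be screened for physically impossible values before it is trusted:

```python
import numpy as np

def check_physical_plausibility(depth_m, cell_area_m2, expected_volume_m3,
                                tolerance=0.05):
    """Hypothetical guardrail checks on an ML-predicted depth grid."""
    # Negative water depths are physically impossible.
    if (depth_m < 0).any():
        return False, "negative depths found"
    # Total water volume should roughly match the known inflow
    # (a crude stand-in for the conservation laws a physics model obeys).
    volume = float(depth_m.sum()) * cell_area_m2
    if abs(volume - expected_volume_m3) / expected_volume_m3 > tolerance:
        return False, f"volume off by more than {tolerance:.0%}"
    return True, "passed"

depths = np.clip(np.random.default_rng(1).normal(0.3, 0.2, (100, 100)), 0, None)
print(check_physical_plausibility(depths, cell_area_m2=900.0,
                                  expected_volume_m3=float(depths.sum()) * 900.0))
```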

In order to harness the enormous potential of ML, it’s crucial to apply strict guardrails and principles. At Fathom, we apply a set of rules:

Fathom’s guardrails and principles

  • We do not use machine learning for everything. We target models at well-understood problems – for example, the lack of accurate global terrain data (see ‘Machine learning in action’, below).
  • We combine the strengths of ML with the rigor of physics and human expertise. We do not replace one with the other. 
  • Our models must be useful for real-world application. 
  • We don’t assume the output is correct – we always apply rigorous evaluation using our in-house domain knowledge in flood, terrain and climate modeling.

Machine learning in action

Isolating the problem of terrain data for FABDEM

Modeling floods on a global scale is notoriously difficult. This is largely because, while we have plenty of high-quality local data, there are huge areas of the world where data is poor-quality or non-existent.

A good example of this is terrain modeling. Terrain data is crucial to understanding how and where water flows. While there is high-quality terrain data (usually LiDAR) in some parts of the world, in many others the data that is available is much coarser and full of systematic errors.

At Fathom, we isolated this well-understood problem of representing terrain on a global scale and used an ML regression model to improve the data that was available and create our cutting-edge, ‘bare earth’ digital elevation model, FABDEM. This was the first global elevation model with buildings and forests removed at 1 arcsecond (∼30 m) grid spacing.

The next generation: FathomDEM

After successfully applying tried-and-tested ML techniques with FABDEM, we applied them to Fathom’s next-generation digital elevation model, FathomDEM.

FathomDEM is a global map of Earth’s surface heights, created using an advanced hybrid AI model that combines two powerful techniques: vision transformers and convolutional neural networks (CNNs). The model is used to identify and fix biases in the existing COPDEM (Copernicus Digital Elevation Model) data by understanding patterns in the surrounding landscape.

To do this, the model was set up within a UNet framework, where it analyzed each pixel and predicted adjustments to improve accuracy. It was trained using high-quality LiDAR data from diverse regions around the world, refined step by step as it gradually learned to correct elevation errors.

Once trained, it was used to improve elevation data globally, including in places without reference data. Finally, the individual predictions were blended together to create a seamless, more accurate global elevation map.
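
A much-simplified sketch of that final blending step is shown below, assuming a hypothetical `correct_tile` model that returns per-pixel elevation adjustments for one tile. Overlapping tiles are weighted so each prediction fades out towards its edges, and neighbouring tiles merge without visible seams.

```python
import numpy as np

def blend_tiles(dem, correct_tile, tile=256, overlap=64):
    """Apply a per-tile correction model and blend the overlapping results.

    `correct_tile` stands in for the trained model: it takes a
    (tile, tile) elevation array and returns per-pixel adjustments.
    Assumes the DEM dimensions align with the tiling, for simplicity.
    """
    corrected = np.zeros_like(dem, dtype=float)
    weight_sum = np.zeros_like(dem, dtype=float)

    # A 2D weight window: high in the tile centre, low at the edges,
    # so adjacent tiles hand over to each other smoothly.
    ramp = np.minimum(np.arange(tile) + 1, np.arange(tile)[::-1] + 1)
    window = np.outer(ramp, ramp).astype(float)

    step = tile - overlap
    h, w = dem.shape
    for i in range(0, h - tile + 1, step):
        for j in range(0, w - tile + 1, step):
            patch = dem[i:i + tile, j:j + tile]
            adjusted = patch + correct_tile(patch)   # model's prediction
            corrected[i:i + tile, j:j + tile] += adjusted * window
            weight_sum[i:i + tile, j:j + tile] += window

    return corrected / np.maximum(weight_sum, 1e-9)

# Usage with a stand-in "model" that applies a constant correction:
dem = np.random.default_rng(0).normal(100.0, 5.0, (1024, 1024))
seamless = blend_tiles(dem, lambda patch: -0.1 * np.ones_like(patch))
```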

FathomDEM has been evaluated by an independent team of researchers and benchmarked against six global DEMs in a study published in December 2025, which found that “FathomDEM consistently performs best”.

Applied machine learning in FathomDEM+

In 2026, Fathom launched the next-generation global terrain dataset designed to provide a single, uniform and independently benchmarked view of terrain worldwide – FathomDEM+. This dataset resolves the legacy issue of inconsistent elevation data, enabling businesses to work from one consistent view of terrain.

We combined the bespoke machine-learning techniques from FathomDEM with the world’s largest curated collection of LiDAR and other high-resolution datasets, spanning over 10 million km². This means FathomDEM+ can deliver near-LiDAR-quality elevation data globally. Where LiDAR is available, it is incorporated directly; elsewhere, the base DEM (FathomDEM) uses machine learning trained on LiDAR terrain data to correct elevation bias and remove surface artifacts, improving consistency across regions.

Learn more about our Global Terrain Data – FathomDEM+ >

How does FathomDEM differ from the FABDEM approach?

In simple terms, FABDEM was created by correcting one pixel at a time using a method called random forest regression. It used various pieces of information (predictors) along with accurate LiDAR data to guide the corrections. While this worked well, it sometimes missed the bigger picture, because it wasn’t able to easily consider the surrounding context of its predictions. 
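
To illustrate that per-pixel approach (with invented predictors and synthetic data – not FABDEM’s actual feature set), a random forest regression in scikit-learn might look like this:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 10_000  # pixels where accurate LiDAR reference data exists

# Per-pixel predictors, e.g. raw elevation, canopy height, built-up fraction.
X = np.column_stack([
    rng.normal(100.0, 20.0, n),   # raw DEM elevation (m)
    rng.gamma(2.0, 5.0, n),       # canopy-height proxy (m)
    rng.random(n),                # building-density proxy
])
# Target: the raw DEM's error relative to the LiDAR ground truth
# (here simulated as a function of vegetation and buildings).
y = 0.6 * X[:, 1] + 3.0 * X[:, 2] + rng.normal(0.0, 0.5, n)

model = RandomForestRegressor(n_estimators=100, n_jobs=-1).fit(X, y)

# The correction is applied one pixel at a time – with no spatial
# context, which is exactly the limitation FathomDEM addresses.
corrected_elevation = X[:, 0] - model.predict(X)
```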

FathomDEM, on the other hand, takes the whole landscape into account. It uses 2D spatial information to consider how nearby areas are related when deciding how to make corrections, and makes its predictions spatially rather than pixel by pixel. This helps it improve an entire region in a more consistent way.

What’s next for machine learning?

AI and machine learning, like all fields of technology, are constantly developing and advancing. At the same time, we are seeing greater availability of extremely rich datasets, which can be used both to train AI models and to evaluate them.

These include data generated by the recent NASA-led SWOT (Surface Water and Ocean Topography) satellite mission, which provides an incredibly detailed, nuanced view of all of the Earth’s water systems, whether rivers, reservoirs, oceans or lakes.

You can learn more about the SWOT mission and Fathom’s involvement here.

These exciting advancements could revolutionize flood science, especially the field of hydrology, and we could see machine learning vastly improve the way we model flooding, for the benefit of communities and businesses around the world. 


Want to take a deeper dive into machine learning?

Check out our 3-part webinar series on machine learning and its application in flood modeling, in partnership with the Chartered Institution of Water and Environmental Management.

Related content

  • Machine learning series pt. 1: An introduction to machine learning
  • Machine learning series pt. 2: The future of terrain data
  • Machine learning series pt. 3: Unlocking the power of global flood maps
  • Research overview: FathomDEM – A new benchmark for digital elevation models