Coastal regions are extremely vulnerable to storms and tropical cyclones, which have caused significant economic losses and numerous fatalities. This underscores the urgent need for action to protect the sustainability and resilience of coastal communities. In this paper, we present an AI-driven geospatial analysis pipeline that automates coastal disaster assessment by detecting building damage in satellite imagery. First, we propose an effective and scalable pipeline for training an artificial intelligence (AI) model for damaged-building detection using a limited dataset. Specifically, we use Microsoft's Building Footprint dataset as pretraining data, allowing our AI model to quickly adapt to the Puerto Rico landscape. Subsequently, we fine-tune the model in a carefully engineered sequence using manually annotated data and self-annotated data. After training, we use our AI model to generate geospatial heatmaps of damaged-building counts and damage ratios, which are useful for assessing storm damage and coastal vulnerability. Our approach placed us in the top 5% of the public leaderboard, earning us a shortlist spot in the global semi-final rounds.
1. Objective: The goal of the challenge is to develop a machine learning model that identifies and detects “damaged” and “undamaged” coastal infrastructure (residential and commercial buildings) impacted by natural calamities such as hurricanes and cyclones. Participants are given pre- and post-cyclone satellite images of a site impacted by Hurricane Maria in 2017 and must build a machine learning model that detects four different object classes in a satellite image of a cyclone-impacted area:
2. Mandatory Dataset:
3. Optional Dataset (that we used):
1. Dataset Collection: Manually annotating all four classes in the provided high-resolution satellite dataset from Maxar's GEO-1 mission, covering 327 sq. km of San Juan, Puerto Rico, is time-consuming. With a competition duration of only one month, this task poses significant challenges in terms of time and effort.
2. Class Imbalance: The dataset contains four unique classes. However, our analysis indicates that damaged buildings are significantly underrepresented compared to undamaged ones, and residential buildings are more prevalent than commercial ones. This imbalance may bias the model toward the majority classes (see the class-distribution sketch after this list).
3. Out-of-Distribution Data: We noticed that the competition's validation dataset comprises only buildings in rural settings, whereas the training dataset mixes images from rural settings, industrial zones, and urban areas. Our empirical study reveals that mixing in images from non-rural settings can severely degrade model learning.
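The class imbalance in challenge 2 can be quantified directly from the YOLO-format label files. Below is a minimal sketch; the label directory path and the class-id mapping are illustrative assumptions, since the actual mapping comes from the challenge's dataset configuration:

```python
from collections import Counter
from pathlib import Path

# Hypothetical class-id mapping; the real one is defined by the challenge data.
CLASS_NAMES = {
    0: "undamaged residential",
    1: "damaged residential",
    2: "undamaged commercial",
    3: "damaged commercial",
}

def class_distribution(label_dir: str) -> Counter:
    """Count object instances per class across YOLO-format label files
    (one 'class x_center y_center width height' row per object)."""
    counts = Counter()
    for label_file in Path(label_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            if line.strip():
                counts[int(line.split()[0])] += 1
    return counts

if __name__ == "__main__":
    counts = class_distribution("datasets/train/labels")  # hypothetical path
    total = sum(counts.values())
    for cls_id, n in counts.most_common():
        print(f"{CLASS_NAMES.get(cls_id, cls_id)}: {n} ({n / total:.1%})")
```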
Before delving into the proposed methodology, we introduce the key elements and assumptions as shown here:
# | Key Element | Description |
---|---|---|
1 | Target Region | Puerto Rico |
2 | Object Detection Model | YOLOv8n |
3 | Microsoft BF dataset | Only the Puerto Rico region |
4 | Puerto Rico dataset | 5,690 unique samples |
5 | Non-experts | Annotators with limited expertise in the given task |
6 | Experts | Annotators with expertise in the given task |
7 | Crowdsourced dataset | Dataset annotated by non-experts (200 unique samples) |
8 | Expert dataset | Dataset annotated by experts (28 unique samples) |
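Because the Microsoft BF dataset (key element 3) ships geo-referenced building polygons rather than detection labels, pretraining on it requires converting footprints into YOLO-style bounding boxes. The sketch below is one way to do this, assuming GeoTIFF image tiles and a GeoJSON of footprints in WGS84; all file names and paths are hypothetical:

```python
import json

import rasterio
from rasterio.warp import transform_geom
from shapely.geometry import shape

def footprints_to_yolo(tile_path: str, footprints_path: str, label_path: str,
                       class_id: int = 0) -> None:
    """Convert geo-referenced building footprints that intersect one image
    tile into a YOLO-format label file (class x_center y_center width height)."""
    with rasterio.open(tile_path) as src:
        with open(footprints_path) as f:
            features = json.load(f)["features"]
        lines = []
        for feature in features:
            # Reproject the polygon from WGS84 into the tile's CRS.
            geom = shape(transform_geom("EPSG:4326", src.crs, feature["geometry"]))
            minx, miny, maxx, maxy = geom.bounds
            # Map world coordinates to pixel rows/cols (top-left, bottom-right).
            row_min, col_min = src.index(minx, maxy)
            row_max, col_max = src.index(maxx, miny)
            # Skip footprints that fall entirely outside this tile.
            if col_max < 0 or row_max < 0 or col_min >= src.width or row_min >= src.height:
                continue
            # Clamp to the image and convert to normalized YOLO coordinates.
            col_min, col_max = max(col_min, 0), min(col_max, src.width - 1)
            row_min, row_max = max(row_min, 0), min(row_max, src.height - 1)
            x_c = (col_min + col_max) / 2 / src.width
            y_c = (row_min + row_max) / 2 / src.height
            w = (col_max - col_min) / src.width
            h = (row_max - row_min) / src.height
            if w > 0 and h > 0:
                lines.append(f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    with open(label_path, "w") as f:
        f.write("\n".join(lines))
```

In this sketch every footprint maps to a single generic building class, since the footprints carry no damage information; damage labels come only from the fine-tuning datasets.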
Our assumptions:
The goal of Phase 1 is to identify and detect “damaged” and “undamaged” coastal infrastructure, which is an object detection task. To tackle this challenge, our team opted for Ultralytics YOLOv8, a state-of-the-art (SOTA) object detection model renowned for its speed and accuracy. Despite the availability of competitors such as YOLOv9, we prefer Ultralytics YOLOv8 for its user-friendliness and well-documented workflows that streamline training and deployment. We chose the smallest variant, YOLOv8n, since it is unwise to use a larger model with a limited dataset, as it may lead to overfitting. Given more time and a bigger dataset, we would explore other YOLOv8 variants and other SOTA models. Meanwhile, our empirical study revealed that the main factor influencing detection accuracy is the quantity and quality of the annotated dataset. Hence, we argue that the main focus of the challenge should be data annotation. We provide details on how we built our training dataset in the next section.
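Fine-tuning YOLOv8n with the Ultralytics API is straightforward. A minimal sketch follows; the dataset YAML name and the hyperparameters are illustrative assumptions, not our exact competition settings:

```python
from ultralytics import YOLO

# Start from COCO-pretrained YOLOv8n weights (the smallest variant).
model = YOLO("yolov8n.pt")

# Fine-tune on a dataset described by a YOLO data.yaml that lists the image
# directories and the four class names; values here are illustrative.
model.train(
    data="coastal_damage.yaml",  # hypothetical dataset config
    epochs=100,
    imgsz=640,
    batch=16,
)

# Evaluate on the validation split defined in the same YAML.
metrics = model.val()
print(metrics.box.map50)  # mAP@0.5
```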
We conducted a comprehensive series of experiments, submitting a total of 30 entries. Table I shows selected highlights:
Setup | Pretraining | Crowdsourced Dataset | Expert Dataset | MLOps | mAP |
---|---|---|---|---|---|
A | ✓ | | | | 0.10 |
B | ✓ | ✓ | | | 0.44 |
C | ✓ | | ✓ | | 0.39 |
D | ✓ | ✓ | ✓ | | 0.50 |
E | | ✓ | ✓ | | 0.24 |
F | ✓ | ✓ | ✓ | ✓ | 0.51 |
Further experiments were conducted, including:
1. Dataset quality is what you need: Our study yields two observations. First, data quality is as important as data quantity. Second, annotators with expertise in building damage assessment are crucial for producing the high-quality 'expert dataset'; non-experts tend to generate a lower-quality dataset, which we refer to as the 'crowdsourced dataset'. However, a high-quality dataset tends to be smaller, because careful annotation takes time, while a high-quantity dataset tends to have lower quality due to a lack of expertise and attention. This mirrors the real-world quality-quantity tradeoff. Fortunately, we found that we can combine the strengths of both datasets, as demonstrated by Setup D in our ablation study in Table I: we fine-tune the pretrained model on the crowdsourced dataset first, then fine-tune on the expert dataset (a sketch of this sequence follows this list).
2. Start with a small model: We recommend starting with a smaller model; using a larger model on a limited dataset is unwise, as it may lead to overfitting. Our empirical study supports this hypothesis: we failed to achieve a high mAP score with the bigger YOLOv8 variants. Given more time and a larger dataset, we would explore the bigger YOLOv8 variants and other state-of-the-art (SOTA) models.
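The two-stage fine-tuning of Setup D can be expressed with the Ultralytics API by chaining training runs, resuming each stage from the previous stage's best checkpoint. A minimal sketch, where the weights paths, YAML names, and hyperparameters are illustrative assumptions:

```python
from ultralytics import YOLO

# Stage 0: model pretrained on the Microsoft Building Footprint data
# (hypothetical weights path from our pretraining run).
model = YOLO("runs/pretrain_msbf/weights/best.pt")

# Stage 1: fine-tune on the larger but noisier crowdsourced dataset (200 samples).
model.train(data="crowdsourced.yaml", epochs=50, imgsz=640)

# Stage 2: resume from stage 1's best checkpoint (output path is illustrative)
# and fine-tune on the small but clean expert dataset (28 samples),
# using a gentler initial learning rate to avoid forgetting.
model = YOLO("runs/detect/train/weights/best.pt")
model.train(data="expert.yaml", epochs=30, imgsz=640, lr0=1e-4)
```

The ordering matters: the larger, noisier dataset shapes general features first, and the small expert set refines the decision boundaries last.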