Tags: Sentinel-2, cloud masking, processing, compositing

Dealing with Clouds in Sentinel-2 Data: Masking Techniques That Work

Kazushi Motomura | January 2, 2026 | 5 min read
Quick Answer: Clouds are the biggest practical limitation of optical satellite data. Sentinel-2 Level-2A includes a Scene Classification Map (SCL) that classifies each pixel as cloud, cloud shadow, vegetation, water, etc. Mask pixels with SCL values 3, 8, 9, and 10 to remove clouds and shadows, plus 0 and 1 to drop no-data and defective pixels. For persistent cloud cover, build temporal composites using the median or best-pixel approach across multiple dates. In tropical regions, expect 60-80% cloud cover and plan for composite windows of several months (3-6 months in the dry season).

I once spent a week processing Sentinel-2 data over Borneo, only to realize that every single acquisition in my three-month window had more than 70% cloud cover. Welcome to tropical remote sensing.

Clouds are not a minor inconvenience in optical satellite work — they're the primary limiting factor. Globally, about 67% of Earth's surface is covered by clouds at any given time. In equatorial regions, the figure exceeds 80% during monsoon seasons.

The SCL Approach

Sentinel-2 Level-2A products include the Scene Classification Map (SCL), generated by the Sen2Cor atmospheric correction processor. It classifies every pixel into one of 12 categories:

| SCL Value | Class | Action |
|---|---|---|
| 0 | No data | Exclude |
| 1 | Saturated/defective | Exclude |
| 2 | Dark area pixels | Keep (with caution) |
| 3 | Cloud shadows | Mask |
| 4 | Vegetation | Keep |
| 5 | Bare soils | Keep |
| 6 | Water | Keep |
| 7 | Cloud low probability | Keep or mask |
| 8 | Cloud medium probability | Mask |
| 9 | Cloud high probability | Mask |
| 10 | Thin cirrus | Mask |
| 11 | Snow/ice | Context-dependent |

The conservative approach: mask everything with SCL values 0, 1, 3, 8, 9, and 10. This removes medium- and high-probability clouds, cloud shadows, thin cirrus, and no-data or defective pixels.

The aggressive approach: also mask SCL 7 (cloud low probability) and 2 (dark area pixels, which sometimes indicate undetected cloud shadow). This removes more potential contamination but also throws away more valid data.
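The two masking strategies can be sketched with NumPy as below. This is a minimal illustration, not a reference implementation: the function names and the tiny synthetic arrays are made up for the example, and it assumes the SCL layer (delivered at 20 m) has already been resampled to the same grid as the band being masked.

```python
import numpy as np

# SCL class codes taken from the table above.
CONSERVATIVE_MASK = (0, 1, 3, 8, 9, 10)
AGGRESSIVE_MASK = CONSERVATIVE_MASK + (2, 7)


def valid_pixels(scl, masked_classes=CONSERVATIVE_MASK):
    """Return a boolean array that is True where a pixel should be kept."""
    return ~np.isin(scl, masked_classes)


def apply_mask(band, scl, masked_classes=CONSERVATIVE_MASK):
    """Return a float32 copy of the band with masked pixels set to NaN."""
    out = band.astype("float32")  # astype returns a copy, the input is untouched
    out[~valid_pixels(scl, masked_classes)] = np.nan
    return out


# Synthetic 2x3 tile (hypothetical values, for illustration only).
scl = np.array([[4, 9, 3],
                [7, 2, 6]], dtype=np.uint8)
band = np.array([[0.05, 0.60, 0.02],
                 [0.20, 0.03, 0.04]])

conservative = apply_mask(band, scl)
aggressive = apply_mask(band, scl, AGGRESSIVE_MASK)
```

In this toy tile the conservative mask drops the high-probability cloud and the shadow pixel, while the aggressive mask additionally drops the SCL 7 and SCL 2 pixels.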

SCL Limitations

The SCL isn't perfect. I've encountered several recurring issues:

Commission errors over bright surfaces: White sand beaches, salt flats, and limestone exposures are sometimes classified as clouds. If your study area includes bright surfaces, verify the SCL against the actual imagery.

Missed thin cirrus: Thin, semi-transparent cirrus clouds can slip past the classification or land in the "cloud low probability" class (SCL 7), which the conservative mask keeps. These contaminate reflectance values without being obvious in the classification.

Cloud shadow misplacement: Shadow detection depends on estimating cloud height and solar geometry. The shadow mask can be displaced by several hundred meters, missing the actual shadow while masking valid pixels nearby.

Snow/cloud confusion: In mountainous areas during winter, distinguishing fresh snow from clouds is genuinely difficult. The SWIR-based discrimination helps (snow absorbs strongly in the SWIR, while clouds stay bright there), but it's not foolproof.
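To make the SWIR contrast concrete, here is a small sketch of the Normalized Difference Snow Index, (green - SWIR) / (green + SWIR). It is not Sen2Cor's actual classification logic. The band choice (B3 for green, B11 for SWIR), the 0.4 threshold, and the reflectance values are assumptions for illustration, so treat the threshold as a starting point to tune per scene.

```python
import numpy as np


def ndsi(green, swir):
    """Normalized Difference Snow Index; NaN where green + swir is zero."""
    green = np.asarray(green, dtype="float64")
    swir = np.asarray(swir, dtype="float64")
    total = green + swir
    out = np.full(total.shape, np.nan)
    np.divide(green - swir, total, out=out, where=total != 0)
    return out


# Made-up reflectances: first pixel snow-like, second pixel cloud-like.
green = np.array([0.80, 0.60])
swir = np.array([0.10, 0.45])

index = ndsi(green, swir)
looks_like_snow = index > 0.4  # assumed threshold, not a fixed rule
```

A bright pixel that SCL labels as cloud but that has a high NDSI is worth a second look in the imagery.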

Building Cloud-Free Composites

When single acquisitions are too cloudy, compositing aggregates multiple dates to fill gaps.

Median Composite

Take all valid (non-clouded) pixels from a time window and compute the median value for each pixel. The median is preferred over the mean because it's resistant to outliers — a partially unmasked cloud or shadow won't corrupt the result as badly.

Window selection matters: Too short and you don't have enough clear observations. Too long and genuine surface changes (crop growth, seasonal vegetation shifts) contaminate the composite. Rules of thumb:

  • Temperate regions, summer: 1-month window usually sufficient
  • Temperate regions, winter: 2-3 months (more clouds, less frequent clear sky)
  • Tropical regions: 3-6 months for dry season composite; may not be possible during wet season
  • Arid regions: 2-week window often sufficient
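A per-pixel median composite might look like the sketch below. It assumes the scenes are already co-registered and stacked as a (time, rows, cols) array with a matching boolean validity mask; the function name and the sample values are hypothetical.

```python
import warnings

import numpy as np


def median_composite(stack, valid):
    """Median over time of the valid observations at each pixel.

    stack: (time, rows, cols) reflectance values.
    valid: boolean array of the same shape, True where the pixel is usable.
    Returns the composite (NaN where no date was valid) and the number of
    clear observations per pixel.
    """
    masked = np.where(valid, stack.astype("float64"), np.nan)
    n_clear = valid.sum(axis=0)
    with warnings.catch_warnings():
        # nanmedian warns on all-NaN pixels; NaN is the result we want there.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        composite = np.nanmedian(masked, axis=0)
    return composite, n_clear


# Three dates over a 1x2 tile. The first pixel has an unmasked cloud (0.90)
# on the second date; the second pixel is masked on every date.
stack = np.array([[[0.10, 0.30]],
                  [[0.90, 0.31]],
                  [[0.12, 0.29]]])
valid = np.array([[[True, False]],
                  [[True, False]],
                  [[True, False]]])

composite, n_clear = median_composite(stack, valid)
```

For the first pixel the median stays at 0.12 despite the 0.90 outlier, whereas a mean would be pulled up to about 0.37. The clear-observation count is useful for judging whether the window is long enough.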

Best-Pixel Composite

Instead of the median, select the single "best" observation for each pixel — usually the one with the highest NDVI (for vegetation) or the lowest blue-band reflectance (as a proxy for least atmospheric contamination).

Best-pixel composites preserve more spectral fidelity than median composites because every band value for a pixel comes from one actual observation, whereas a per-band median can combine values from different dates into a spectrum that was never observed. The downside is that adjacent pixels may come from different dates, creating spatial inconsistencies in rapidly changing landscapes.
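A maximum-NDVI selection, one common form of best-pixel compositing, could be sketched as follows. The function name, the array layout (time, rows, cols), and the sample reflectances are assumptions for illustration.

```python
import numpy as np


def best_pixel_composite(red, nir, valid):
    """Pick, per pixel, the date with the highest NDVI among valid dates.

    red, nir, valid: arrays of shape (time, rows, cols).
    Returns the selected red and NIR values plus the index of the chosen
    date (-1 and NaN where no date was usable).
    """
    red = red.astype("float64")
    nir = nir.astype("float64")
    total = nir + red
    usable = valid & (total != 0)
    ndvi = np.full(total.shape, -np.inf)  # unusable dates can never win
    np.divide(nir - red, total, out=ndvi, where=usable)

    best = np.argmax(ndvi, axis=0)
    has_data = usable.any(axis=0)

    def pick(band):
        chosen = np.take_along_axis(band, best[None], axis=0)[0]
        return np.where(has_data, chosen, np.nan)

    return pick(red), pick(nir), np.where(has_data, best, -1)


# Two dates over a 1x2 tile (made-up reflectances).
red = np.array([[[0.05, 0.30]], [[0.10, 0.08]]])
nir = np.array([[[0.45, 0.32]], [[0.30, 0.40]]])
valid = np.ones(red.shape, dtype=bool)

best_red, best_nir, date_index = best_pixel_composite(red, nir, valid)
```

In this toy case the two neighbouring pixels are drawn from different dates (index 0 and index 1), which is exactly the spatial inconsistency described above. Keeping the date index alongside the composite makes that visible.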

Harmonic Fitting

For annual monitoring, fitting a harmonic function (sine/cosine curves) to the time series of valid observations provides a modeled estimate for any date. This handles irregular observation gaps elegantly and produces smooth, continuous time series. It's more complex to implement but produces cleaner results for phenology tracking.
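As a rough sketch of the idea, the snippet below fits a constant plus a single annual sine/cosine pair to one pixel's time series by ordinary least squares. The single-harmonic model, the 365.25-day period, the function name, and the synthetic series are all assumptions; real phenology work often adds higher-order harmonics or a trend term.

```python
import numpy as np


def fit_harmonic(t_days, values, period=365.25):
    """Least-squares fit of a + b*cos(wt) + c*sin(wt), ignoring NaN values.

    Returns the coefficients (a, b, c) and a function that evaluates the
    fitted curve at arbitrary days.
    """
    t = np.asarray(t_days, dtype="float64")
    y = np.asarray(values, dtype="float64")
    keep = ~np.isnan(y)

    def design(times):
        w = 2 * np.pi * np.atleast_1d(np.asarray(times, dtype="float64")) / period
        return np.column_stack([np.ones_like(w), np.cos(w), np.sin(w)])

    coeffs, *_ = np.linalg.lstsq(design(t[keep]), y[keep], rcond=None)

    def predict(times):
        return design(times) @ coeffs

    return coeffs, predict


# Synthetic, irregularly sampled series with two cloudy (NaN) dates.
days = np.array([10, 45, 80, 130, 200, 260, 300, 350], dtype=float)
truth = 0.5 + 0.3 * np.cos(2 * np.pi * days / 365.25)
observed = truth.copy()
observed[[2, 5]] = np.nan

coeffs, predict = fit_harmonic(days, observed)
estimate = predict(np.array([170.0]))  # modeled value on a date with no data
```

Because the fit uses only the valid dates, gaps need no special handling; the cost is that abrupt changes (harvest, clearing) get smoothed over by the model.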

Practical Workflow

Here's the workflow I use most frequently:

  1. Download Level-2A data for your area and time window
  2. Apply SCL mask: Remove pixels with SCL values 0, 1, 3, 8, 9, 10
  3. Check coverage: Calculate the percentage of valid pixels per scene. Discard scenes with less than 20% valid coverage — they contribute noise without useful data
  4. Composite: If single-date coverage is insufficient, build a median composite from remaining valid observations
  5. Visual QC: Always inspect the result. Zoom to areas where you suspect residual cloud contamination. Compare against the input scenes.
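Steps 2 and 3 of this workflow can be sketched as a scene-selection helper. It assumes the SCL layers for all candidate scenes are stacked as (scene, rows, cols) over the same grid; the function name and the sample data are hypothetical.

```python
import numpy as np

MASKED_CLASSES = (0, 1, 3, 8, 9, 10)  # step 2: SCL values to remove
MIN_VALID_FRACTION = 0.20             # step 3: discard scenes below this


def select_scenes(scl_stack, min_valid=MIN_VALID_FRACTION):
    """Return per-pixel validity, per-scene valid fraction, and a keep flag."""
    valid = ~np.isin(scl_stack, MASKED_CLASSES)
    fraction = valid.mean(axis=(1, 2))  # share of usable pixels in each scene
    keep = fraction >= min_valid
    return valid, fraction, keep


# Three synthetic 2x5 scenes: fully clear, almost fully cloudy, half cloudy.
clear = np.full((2, 5), 4, dtype=np.uint8)
cloudy = np.full((2, 5), 9, dtype=np.uint8)
cloudy[0, 0] = 4
half = np.vstack([np.full((1, 5), 4), np.full((1, 5), 9)]).astype(np.uint8)
scl_stack = np.stack([clear, cloudy, half])

valid, fraction, keep = select_scenes(scl_stack)
kept_validity = valid[keep]  # feed this, with the matching bands, to step 4
```

Here the second scene has only 10% valid pixels and is dropped before compositing.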

The Cloud Cover Metadata Trap

Sentinel-2 metadata includes a scene-level cloud cover percentage. It's tempting to use this to filter — "give me all scenes with less than 20% clouds." But this number represents the entire scene, not your specific area of interest.

A scene might have 15% overall cloud cover, but if those clouds sit directly over your study area, the image is useless for your purpose. Conversely, a 60% cloud cover scene might be perfectly clear over your region.

Always check cloud cover spatially, not just numerically.
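One way to check this numerically is to compute the cloud fraction from the SCL layer inside your area of interest rather than trusting the scene-level figure. The sketch below assumes the AOI can be expressed as a rectangular row/column window; counting cloud shadow (SCL 3) as "cloudy" is my choice here, since shadowed pixels are unusable too. The names and the synthetic scene are illustrative.

```python
import numpy as np

# Classes treated as cloud-affected (shadow included by assumption).
CLOUD_CLASSES = (3, 8, 9, 10)


def cloud_fraction(scl, window=None):
    """Fraction of observed pixels that are cloud-affected.

    window: optional (row_slice, col_slice) tuple selecting the AOI.
    No-data pixels (SCL 0) are excluded from the denominator.
    """
    tile = scl if window is None else scl[window]
    observed = tile != 0
    if not observed.any():
        return float("nan")
    return float(np.isin(tile, CLOUD_CLASSES)[observed].mean())


# Synthetic 10x10 scene: vegetation everywhere except a 3x5 cloud bank.
scl = np.full((10, 10), 4, dtype=np.uint8)
scl[0:3, 0:5] = 9

aoi = (slice(0, 3), slice(0, 5))  # hypothetical study area under the clouds
scene_level = cloud_fraction(scl)
aoi_level = cloud_fraction(scl, aoi)
```

The scene-level value here is 15%, yet the study area itself is 100% cloudy, which mirrors the situation described above.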

When Clouds Win

Sometimes there's no optical solution. Monsoon-season data in Southeast Asia, wet-season data in Central Africa, persistent stratus over coastal deserts — these situations defeat any amount of temporal compositing.

That's when SAR data becomes essential. Sentinel-1's C-band radar sees through cloud cover, with only very heavy rainfall noticeably affecting the signal. You lose the spectral information of Sentinel-2, but you gain reliable, all-weather observations. Many operational monitoring systems, such as flood mapping and deforestation detection in the tropics, rely on SAR precisely because clouds make optical monitoring unreliable.

The most robust monitoring systems fuse both: optical data when available (for its spectral richness) and SAR data when clouds prevent optical observations (for its all-weather reliability). This complementary approach acknowledges what no amount of cloud masking can fix — sometimes the sky simply isn't cooperating.

Kazushi Motomura

Remote sensing specialist with 10+ years in satellite data processing. Founder of Off-Nadir Lab. Master's in Satellite Oceanography (Kyushu University).