Scaling Workflows

We demonstrate how GitHub Actions can be used to scale computationally expensive workflows, using as an example a pipeline that measures glacier surface velocity from satellite imagery. The example shows:

  • how to perform batch computing by running many workflows in parallel

  • how to build complex pipelines by calling workflows from another workflow

  • how to specify input parameters when running a workflow

Measuring Glacier Surface Velocity

Quinn Brencher, University of Washington

This set of GitHub Actions workflows measures horizontal glacier surface velocity from Sentinel-2 image pairs using the autoRIFT software. No external accounts or API keys are required. These workflows were created for the GitHub Actions for Scientific Data Workflows workshop at the 2024 SciPy conference.

Usage

We use three workflows to batch-process Sentinel-2 image pairs into glacier surface velocity maps. For demonstration purposes, the workflows are set up to work only over the Yazghil Glacier in Pakistan. To run them, fork this repository, open the “Actions” tab, and choose the batch_image_correlation workflow, which in turn runs the other two workflows.

1. image_correlation_pair

This workflow calls a Python script (image_correlation.py) that uses autoRIFT to perform image correlation on a pair of spatially overlapping Sentinel-2 L2A images. It requires the product names of the two images as inputs. The images are downloaded from AWS using the Element 84 Earth Search API. Only the near-infrared band (NIR, B08), which has a spatial resolution of 10 m, is used. Search distances are scaled with the temporal baseline, assuming a maximum surface velocity of 1000 m/yr, so pairs acquired farther apart in time take longer to process. The resulting surface velocity maps are saved as GeoTIFFs and uploaded as GitHub Artifacts.
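
The scaling of search distance with temporal baseline comes down to a simple bound: a feature cannot move farther than the maximum velocity multiplied by the time between acquisitions. The function below is a minimal sketch of that arithmetic, assuming the 1000 m/yr maximum and the 10 m NIR pixel size stated above. The function name and the `margin` safety factor are hypothetical, and this is not necessarily the parameterization that image_correlation.py or autoRIFT actually use.

```python
import math
from datetime import date

DAYS_PER_YEAR = 365.25


def max_search_distance_px(date1, date2, v_max_m_per_yr=1000.0,
                           pixel_size_m=10.0, margin=1.0):
    """Upper bound on feature displacement, in pixels, between two acquisitions.

    Hypothetical helper: displacement <= v_max * dt, converted to pixels and
    rounded up. `margin` is an assumed multiplicative safety factor.
    """
    dt_days = abs((date2 - date1).days)          # temporal baseline in days
    max_disp_m = v_max_m_per_yr * dt_days / DAYS_PER_YEAR
    return math.ceil(margin * max_disp_m / pixel_size_m)


if __name__ == "__main__":
    print(max_search_distance_px(date(2023, 6, 1), date(2023, 7, 1)))  # 9
    print(max_search_distance_px(date(2023, 6, 1), date(2023, 9, 1)))  # 26
```

A 30-day pair gives a bound of about 82 m, or 9 pixels after rounding up, while a 92-day pair gives 26 pixels. That larger search window is why widely separated pairs take longer to correlate.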

2. batch_image_correlation

This workflow creates surface velocity maps from many pairs of Sentinel-2 images. Required inputs include the maximum cloud cover percentage, the start month (5, i.e. May, or later is recommended to minimize snow cover), the end month (10, i.e. October, or earlier is recommended to minimize snow cover), and the number of pairs per image, for example:

  • 1 pair per image: (img_i, img_{i+1}), (img_{i+1}, img_{i+2}), (img_{i+2}, img_{i+3}), …

  • 2 pairs per image: (img_i, img_{i+1}), (img_i, img_{i+2}), (img_{i+1}, img_{i+2}), …

  • 3 pairs per image: (img_i, img_{i+1}), (img_i, img_{i+2}), (img_i, img_{i+3}), …
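
The pairing scheme in the list above pairs each image with the next n images in time. A minimal Python sketch of that rule, assuming the images are already sorted by acquisition date; `make_pairs` is a hypothetical name, and the batch_image_correlation workflow may implement the rule differently:

```python
def make_pairs(images, pairs_per_image):
    """Pair each image with the next `pairs_per_image` images in the list.

    Assumes `images` is sorted by acquisition date. Images near the end of
    the list get fewer pairs because fewer later images remain.
    """
    pairs = []
    for i, first in enumerate(images):
        # Slicing past the end of the list simply yields fewer partners.
        for second in images[i + 1 : i + 1 + pairs_per_image]:
            pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    print(make_pairs(["img0", "img1", "img2", "img3"], 2))
```

With four images and 2 pairs per image, this yields (img0, img1), (img0, img2), (img1, img2), (img1, img3), (img2, img3), matching the second bullet.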

Only the first suitable image in each month is selected. Once the image pairs are identified, a matrix job runs image_correlation_pair once for each pair, in parallel. Finally, summary_statistics runs on the resulting velocity maps.
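
Both steps in this paragraph can be sketched in Python under some assumptions: scene records are plain dicts with hypothetical keys `id`, `date` and `cloud_cover`, and "suitable" is taken to mean inside the month window and under the cloud cover limit. The second function serializes the pairs as a matrix with an `include` list. When run inside a GitHub Actions step, it appends the matrix to the file named by the `GITHUB_OUTPUT` environment variable. That is how a step sets an output, which a later job can expand with `fromJSON` in its `strategy.matrix`. The `img1`/`img2` keys are hypothetical.

```python
import json
import os
from datetime import date


def first_per_month(scenes, max_cloud_cover, start_month, end_month):
    """Keep the earliest scene per (year, month) that passes the filters.

    `scenes` is a list of dicts with hypothetical keys "id", "date"
    (datetime.date) and "cloud_cover" (percent).
    """
    selected = {}
    for scene in sorted(scenes, key=lambda s: s["date"]):
        d = scene["date"]
        if not (start_month <= d.month <= end_month):
            continue  # outside the allowed season
        if scene["cloud_cover"] > max_cloud_cover:
            continue  # too cloudy
        # Scenes are visited in date order, so the first one stored wins.
        selected.setdefault((d.year, d.month), scene)
    return list(selected.values())


def write_matrix(pairs):
    """Expose image pairs as a job matrix for a downstream job."""
    matrix = {"include": [{"img1": a, "img2": b} for a, b in pairs]}
    payload = json.dumps(matrix, separators=(",", ":"))
    output_file = os.environ.get("GITHUB_OUTPUT")  # set by GitHub Actions
    if output_file:
        with open(output_file, "a") as fh:
            fh.write(f"matrix={payload}\n")
    return matrix


if __name__ == "__main__":
    scenes = [
        {"id": "A", "date": date(2023, 6, 3), "cloud_cover": 5},
        {"id": "B", "date": date(2023, 6, 10), "cloud_cover": 1},
        {"id": "C", "date": date(2023, 7, 25), "cloud_cover": 2},
    ]
    chosen = first_per_month(scenes, max_cloud_cover=10, start_month=5, end_month=10)
    print(write_matrix([(chosen[0]["id"], chosen[1]["id"])]))
```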

3. summary_statistics

This workflow downloads all of the velocity maps created during a batch_image_correlation run and uses them to calculate and plot the median velocity, the standard deviation of velocity, and the valid pixel count across all velocity maps. The summary statistics plot is uploaded as a GitHub Artifact.
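
The per-pixel statistics reduce to NaN-aware reductions over a stack of co-registered velocity maps. Below is a minimal NumPy sketch. It assumes the maps have already been read (for example, from the downloaded GeoTIFFs) into arrays of equal shape, with NaN marking invalid pixels. The function name and the NaN convention are assumptions, not necessarily what the summary_statistics workflow does.

```python
import warnings

import numpy as np


def summary_statistics(velocity_maps):
    """Per-pixel median, standard deviation and valid-pixel count.

    `velocity_maps` is a sequence of equally shaped 2-D arrays (m/yr) in
    which NaN marks pixels without a valid velocity (an assumed convention).
    """
    # Stack into shape (n_maps, rows, cols) so reductions run across maps.
    stack = np.stack([np.asarray(m, dtype=float) for m in velocity_maps])
    count = (~np.isnan(stack)).sum(axis=0)
    with warnings.catch_warnings():
        # Pixels with no valid value in any map yield NaN plus a RuntimeWarning.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        median = np.nanmedian(stack, axis=0)
        std = np.nanstd(stack, axis=0)  # population std (ddof=0)
    return median, std, count
```

Pixels that are NaN in every map come back as NaN for both the median and the standard deviation, with a count of 0. These three arrays are what a plotting step would then display.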

Acknowledgements