— 3 min read
TODO: pricing section. Graph of costs per request usage. Mapbox free until 750,000 requests. https://www.mapbox.com/pricing/#tile
Serving aerial imagery map tiles on the fly with AWS Lambda and Cloud-Optimized GeoTIFF.
Aerial imagery is huge, and generating and storing map tiles is non-trivial.
For a rough bounding box of the U.S. we can use mercantile
to
count the number of tiles at a mid-zoom.
> echo "[-126.71,24.49,-66.59,49.48]" | mercantile tiles 10 | wc -l
15824
That's 15,000 tiles at zoom 10. At each additional zoom each tile is split into 4. So at zoom 16, which displays the NAIP tiles in full resolution (at retina @2x resolution), the number of tiles is
15,000×416−10≈65,000,000That's a lot of tiles. From previous exploration of tiling
NAIP imagery, I found that each retina tile is around 500KB uncompressed or
100KB when compressed with pngquant
. Thus even compressed NAIP tiles for the
contiguous U.S. would take up 6.5 TB of space, and that's just for zoom 16. Zoom
15 would take another ~1.5 TB of space, and so on.
NAIP data and Landsat images are hosted on AWS open datasets. The former is in requester pays buckets; the latter is free for anyone to download. Crucially, both are stored in Cloud-Optimized GeoTIFF (COG) format. This format uses HTTP range requests to download a small portion of a larger COG file. This means it's feasible to quickly work with portions of large images.
Install cogeo-mosaic
, which takes .tif
files as input and creates a
MosaicJSON. cogeo-mosaic
depends on pygeos
, which I've been unable to
install through pip, so I first install that through Conda.
conda install pygeos -c conda-forge -y
pip install cogeo-mosaic
First download the manifest.txt
. This file contains a listing of all files
stored in the naip-visualization
S3 bucket.
> aws s3 cp s3://naip-visualization/manifest.txt ./ --request-payer
> head -n 5 manifest.txt
al/2011/100cm/fgdc/30085/m_3008501_ne_16_1_20110815.txt
al/2011/100cm/fgdc/30085/m_3008501_nw_16_1_20110815.txt
al/2011/100cm/fgdc/30085/m_3008502_ne_16_1_20110815.txt
al/2011/100cm/fgdc/30085/m_3008502_nw_16_1_20110815.txt
al/2011/100cm/fgdc/30085/m_3008503_ne_16_1_20110815.txt
Aerial imagery is recorded by state, and most states have multiple years of availability. For example, imagery was taken of Alabama in 2011, 2013, 2015, and 2017. If I create a MosaicJSON using all years; the on-the-fly computations would be more difficult because more source imagery would have to be compared to generate a tile. I only care about most-recent imagery, so I'll extract the tif image URLs of the latest year for each state.
# Generate file with state abbrevations
cat manifest.txt \
`# Extract text before first /, i.e. the state abbr` \
| awk -F '/' '{print $1}' \
`# Deduplicate` \
| uniq \
`# Remove an extraneous line` \
| sed '/manifest.test/d' \
`# Save to states.txt` \
> states.txt
# For each state abbrevation, get the most recent year of images
# For example, `al/2017`
cat states.txt | while read state
do
cat manifest.txt \
`# Keep only images of this state` \
| grep "^${state}/" \
`# Extract year` \
| awk -F '/' '{print $2}' \
`# Deduplicate` \
| uniq \
`# Sort by year, descending` \
| sort -nr \
`# Keep most recent year` \
| head -n 1 \
`# Prepend with state, so that it's {state}/{year}` \
| sed -e "s|^|${state}/|" \
`# Save to states_latest.txt` \
>> states_latest.txt
done
# For each latest state-year combination, keep URLs of applicable tif images
cat states_latest.txt | while read state_latest
do
cat manifest.txt \
| grep "^${state_latest}/" \
| grep ".tif" \
| sed -e 's|^|s3://naip-visualization/|' \
>> tif_latest.txt
done
See how many tif images per state
cat states.txt | while read state
do
# printf "State: $state "
cat tif_latest.txt \
| grep "^s3://naip-visualization/${state}/" \
| wc -l
done
Example with Rhode Island
cat tif_latest.txt \
| grep "^s3://naip-visualization/ri/" \
| cogeo-mosaic footprint - > footprint.geojson
Total number of files
> wc -l tif_latest.txt
219068 tif_latest.txt
NAIP imagery tiffs are in a requester pays bucket. In order to access them, you
need to set the AWS_REQUEST_PAYER
environment variable:
export AWS_REQUEST_PAYER="requester"
I also found that on an AWS EC2 instance; cogeo-mosaic create
was failing
while it was working on my local computer. In general, if cogeo-mosaic create
isn't working for some URL; you should run rio info <URL>
and see what the
error is, since cogeo-mosaic
uses rasterio
internally, but doesn't currently
print rasterio
errors to stdout. In my case, I had to set the certificates
path (see
cogeotiff/rio-tiler#19,
mapbox/rasterio#942).
export CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
Then create the MosaicJSON file. GET requests are priced at $0.0004
per 1000
requests, so creating the MosaicJSON should cost 0.0004 * (219068 / 1000) =
0.087
. 9 cents!
This took about 1.5GB of memory.
cat tif_latest.txt \
| cogeo-mosaic create - \
> naip_mosaic.json
git clone https://github.com/developmentseed/cogeo-mosaic-tiler.git
# Create lambda package
cd cogeo-mosaic-tiler & make package
# Deploy
npm install serverless -g
sls deploy --bucket kylebarron-landsat-test --region us-west-2
Add the mosaic json
Create Mosaic id (https://github.com/developmentseed/cogeo-mosaic-tiler/blob/master/doc/API.md#mosaicjson-path)
bucket_name="..."
gzip -c naip_mosaic.json > naip_mosaic.json.gz
hash=$(sha224sum naip_mosaic.json.gz | awk '{print $1}')
aws s3 cp naip_mosaic.json.gz "s3://${bucket_name}/mosaics/${hash}.json.gz"
git clone https://github.com/developmentseed/awspds-mosaic
cd awspds-mosaic
make package
cd services/landsat
sls deploy \
--region us-west-2 \
--bucket a-bucket-where-you-store-data \
--token {OPTIONAL MAPBOX TOKEN}
This StackOverflow answer is a good overview of how to connect Cloudflare as a proxy for API Gateway, so I won't go over the entire process. Just a couple edits I made:
landsat-on-aws
and the old naip-on-aws
appear to have only served a single image at a time. The former is still running, and you can see the website at https://landsatonaws.com/.
rio-tiler-mosaic
.rio glui
: inspect a COG in the browser on the fly. I see this as a raster equivalent of the great mbview
, which lets you inspect vector tiles easily. rio glui
can only inspect one COG at a time, but still helpful.