asf_search Best Practices#

In addition to covering best practices, this page also contains advanced search techniques and serves as the "philosophy of asf_search".

Topics covered include:

General recommendations, including working with results, performance, common search filters and types, and count
Specifics for some datasets
Granule and product searches and the preferred method for these
Secondary searches such as stacking
Download and recommended authentication method
Advanced search techniques, including ranges, subclasses, large result sets, and more

General Recommendations#

This section contains information on result sets, general performance, the different search types available, common filter examples, and count.

Result Sets#

Search results are returned as an ASFSearchResults object, a subclass of UserList, containing a list of ASFProduct objects. Each of these classes provides some additional functionality to aid in working with the results and individual products. ASFProduct provides a number of metadata fields, such as:

Geographic coordinates
Latitude/Longitude
Shape type
Scene and product metadata
Path, frame
Platform, beam, polarization
File name, size, URL

Geographic coordinates are stored in the geometry attribute:

results[0].geometry

Other metadata is available through the properties attribute:

results[0].properties

ASFProduct objects provide geojson-based serialization, in the form of a geojson feature snippet:

print(results[0])
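To illustrate how the geometry and properties fit together in a geojson feature, here is a minimal sketch that computes a bounding box from a Polygon feature. The `feature` dictionary is hand-written sample data, not real search output, and the property keys and values in it are placeholders; the `bounding_box()` helper is hypothetical and not part of asf_search. With real results you would start from the dictionary returned by a product's geojson() method.

```python
def bounding_box(feature):
    """Return (min_lon, min_lat, max_lon, max_lat) for a GeoJSON Polygon feature."""
    # A GeoJSON Polygon stores a list of rings; the first ring is the exterior.
    exterior = feature["geometry"]["coordinates"][0]
    lons = [point[0] for point in exterior]
    lats = [point[1] for point in exterior]
    return (min(lons), min(lats), max(lons), max(lats))


# Hand-written sample shaped like a GeoJSON feature (values are illustrative only).
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-91.97, 28.78], [-88.85, 28.78], [-88.85, 30.31],
            [-91.97, 30.31], [-91.97, 28.78],
        ]],
    },
    "properties": {"sceneName": "EXAMPLE_SCENE", "platform": "EXAMPLE"},
}

bbox = bounding_box(feature)
print(feature["properties"]["sceneName"], bbox)
```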

ASFSearchResults also supports the following output formats:

csv
jsonlite
jsonlite2
metalink
kml

General performance#

When searching for multiple products, it is faster to request them all in a single search than to run a separate query for each product, because each query involves its own HTTPS requests.

import asf_search as asf

granules = ['S1B_IW_GRDH_1SDV_20161124T032008_20161124T032033_003095_005430_9906', 'S1-GUNW-D-R-087-tops-20190301_20190223-161540-20645N_18637N-PP-7a85-v2_0_1', 'ALPSRP111041130']

# THIS IS SLOW AND MAKES MORE NETWORK REQUESTS THAN NECESSARY
batched_results = asf.ASFSearchResults([])
for granule in granules:
    unbatched_response = asf.granule_search(granule_list=granule)
    batched_results.extend(unbatched_response)

# THIS WILL ALWAYS BE FASTER
fast_results = asf.granule_search(granule_list=granules)

If you need to perform intermediate operations on large results (such as writing metadata to a file or calling some external process on results), use the search_generator() method to operate on results as they're returned page-by-page (default page size is 250).

import asf_search as asf

opts = asf.ASFSearchOptions(platform=asf.PLATFORM.SENTINEL1, maxResults=1000)

for page in asf.search_generator(opts=opts):
    foo(page)
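As one possible intermediate operation, the sketch below writes selected metadata fields to CSV as each page arrives, so nothing has to be held in memory until the search finishes. It is a sketch under assumptions: `write_pages_to_csv()` is a hypothetical helper, the pages are offline stand-ins built with SimpleNamespace, and the field names are only examples of keys a product's properties might contain.

```python
import csv
import io
from types import SimpleNamespace


def write_pages_to_csv(pages, handle, fields):
    """Write one CSV row per product, page by page; return the row count."""
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    total = 0
    for page in pages:
        for product in page:
            # Missing keys are written as empty cells rather than raising.
            writer.writerow({f: product.properties.get(f, "") for f in fields})
            total += 1
    return total


# Stand-in pages so the sketch runs offline; a real call would pass
# asf.search_generator(opts=opts) as `pages` instead.
fake_pages = [
    [SimpleNamespace(properties={"fileID": "A-SLC", "platform": "Sentinel-1A"})],
    [SimpleNamespace(properties={"fileID": "B-SLC"})],
]

buffer = io.StringIO()
rows_written = write_pages_to_csv(fake_pages, buffer, ["fileID", "platform"])
print(rows_written)
print(buffer.getvalue())
```

In practice `handle` would be a file opened with `newline=""`, which is what the csv module recommends for files it writes.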

Differences between search types#

To see details on different search types, see the Searching section.

Common Filters#

Search options can be specified using kwargs, which also allows them to be handled using a dictionary:

opts = {
    'platform': asf.PLATFORM.ALOS,
    'start': '2010-01-01T00:00:00Z',
    'end': '2010-02-01T23:59:59Z'
}

Below are some common filter examples:

results = asf.geo_search(
    intersectsWith='POLYGON((-91.97 28.78,-88.85 28.78,-88.85 30.31,-91.97 30.31,-91.97 28.78))',
    platform=asf.PLATFORM.UAVSAR,
    processingLevel=asf.PRODUCT_TYPE.METADATA,
    maxResults=250)

search_count()#

You may use the search_count() method to return the count of total results matching the passed search options.

This example returns the current size of the SENTINEL1 catalog:

opts = {
    'platform': asf.PLATFORM.SENTINEL1
}
count = asf.search_count(**opts)
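One way to use the count is as a guard before launching a potentially huge search. The sketch below assumes a hypothetical helper, `search_if_small()`, that takes the count and search functions as arguments so it can run offline with stand-ins; with asf_search you might pass asf.search_count and asf.search. The limit value is arbitrary.

```python
def search_if_small(count_fn, search_fn, limit, **opts):
    """Run search_fn only when count_fn reports at most `limit` matches."""
    total = count_fn(**opts)
    if total > limit:
        # Too many matches: return the count so the caller can narrow the filters.
        return total, None
    return total, search_fn(**opts)


# Offline stand-ins for the real count and search functions.
def fake_count(**opts):
    return 3


def fake_search(**opts):
    return ["p1", "p2", "p3"]


small = search_if_small(fake_count, fake_search, limit=10, platform="ALOS")
large = search_if_small(fake_count, fake_search, limit=2, platform="ALOS")
print(small)
print(large)
```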

Dataset Specifics#

Constants are provided for each dataset. The list of constants can be found here.

Basic dataset search example:

sentinel_results = asf.search(dataset=asf.DATASET.SENTINEL1, maxResults=250)

You can view the metadata for your results via the properties dictionary:

sentinel_results[0].properties

Or you can view the metadata as a geojson formatted dictionary:

sentinel_results.geojson()

NISAR#

asf_search supports searching for lists of short names by the shortName keyword. The currently available NISAR data that CMR provides lacks searchable additional attributes. Therefore, the best way to search for NISAR results is via combinations of shortName, dataset, platform, and granule_list/product_list keywords.

NISAR example:

nisar_gslc_gunw = asf.search(shortName=['NISAR_L2_GSLC_V1', 'NISAR_L2_GUNW_V1'], opts=search_opts, maxResults=250)
print(nisar_gslc_gunw)

Opera-S1#

The Opera dataset has both standard products and CalVal (calibration/validation) products available. Please note that the CalVal products are treated as their own dataset in asf_search. Both can be found in the constants list.

SLC-Burst#

The SLC Burst dataset has both tiff and xml data associated with a single entry in CMR. To access the xml data, see the section on downloading additional files.

fullBurstID, relativeBurstID, and absoluteBurstID are SLC Burst specific filters. To get a temporal stack of products over a single burst frame, use fullBurstID, which is shared between all bursts over a single frame.

Search Specifics#

This section contains information on granule and product searches, secondary searches, and other search details.

Granule and Product Search#

granule_search() and product_search() are similar. A granule (also called a scene) search returns all file types for each specified granule, whereas a product search targets one specific file type. Granule searches can therefore be 1:many, whereas a product search is always 1:1.

Granule search example:

granule_list = [
    'S1B_IW_GRDH_1SDV_20190822T151551_20190822T151616_017700_0214D2_6084',
    'S1B_IW_GRDH_1SDV_20190810T151550_20190810T151615_017525_020F5A_2F74',
    'S1B_IW_GRDH_1SDV_20190729T151549_20190729T151614_017350_020A0A_C3E2',
    'S1B_IW_GRDH_1SDV_20190717T151548_20190717T151613_017175_0204EA_4181',
    'S1B_IW_GRDH_1SDV_20190705T151548_20190705T151613_017000_01FFC4_24EC',
    'S1B_IW_GRDH_1SDV_20190623T151547_20190623T151612_016825_01FA95_14B9',
    'S1B_IW_GRDH_1SDV_20190611T151546_20190611T151611_016650_01F566_D7CE',
    'S1B_IW_GRDH_1SDV_20190530T151546_20190530T151611_016475_01F02E_BF97',
    'S1B_IW_GRDH_1SDV_20190518T151545_20190518T151610_016300_01EAD8_9308',
    'S1B_IW_GRDH_1SDV_20190506T151544_20190506T151609_016125_01E56C_1D67'
]
results = asf.granule_search(granule_list)
print(results)

Product search example:

product_list = [
    'S1A_IW_GRDH_1SDV_20190809T001336_20190809T001401_028485_033839_78A1-GRD_HD',
    'S1A_IW_GRDH_1SDV_20150322T000454_20150322T000524_005137_006794_56E3-GRD_HD',
    'S1A_IW_GRDH_1SDV_20160121T001256_20160121T001321_009585_00DF26_5B84-GRD_HD',
    'S1A_IW_GRDH_1SDV_20151117T000448_20151117T000513_008637_00C455_3DC2-GRD_HD'
]
results = asf.product_search(product_list)
print(results)

granule_search() and product_search() do not make use of any other search filters, but will accept kwargs for consistency with other search functions:

results = asf.granule_search(granule_list=granule_list)
print(f'{len(results)} results found')

Note about incorrect methods#

It is generally preferable to "collapse" many small queries into fewer large queries. It may seem easy and logical to run a number of small granule_search() queries in a for loop over the items in the original granule list. Please do not do this. It consumes a lot of resources at both ASF and CMR.

Instead, combine your small queries into a single large query where possible, as shown above, and then post-process the results locally. granule_search() and product_search() can support very large lists, and will break them up internally when needed.
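A common piece of local post-processing after one large query is checking which requested granules came back with no match. The following is a minimal sketch, assuming each product exposes its scene name under a `sceneName` key in properties; `missing_granules()` is a hypothetical helper and the results are offline stand-ins.

```python
from types import SimpleNamespace


def missing_granules(requested, results):
    """Return requested granule names that no result matched, in request order."""
    found = {product.properties.get("sceneName") for product in results}
    return [name for name in requested if name not in found]


requested = ["SCENE_A", "SCENE_B", "SCENE_C"]

# Stand-in results; a granule search can return several products per scene.
results = [
    SimpleNamespace(properties={"sceneName": "SCENE_A"}),
    SimpleNamespace(properties={"sceneName": "SCENE_C"}),
    SimpleNamespace(properties={"sceneName": "SCENE_C"}),
]

not_found = missing_granules(requested, results)
print(not_found)
```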

frame vs asfframe#

When using the frame keyword with certain platforms/datasets, asf_search will implicitly swap to using the asfframe keyword instead at search time. The platforms/datasets this affects are:

SENTINEL-1A/B
ALOS

In the query to CMR, this means searching by the FRAME_NUMBER instead of the CENTER_ESA_FRAME additional attribute. A way to avoid this on searches and use CENTER_ESA_FRAME with the above platforms/datasets is to use cmr_keywords:

asf.search(platform=asf.PLATFORM.SENTINEL1, cmr_keywords=[('attribute[]', 'int,CENTER_ESA_FRAME,1001')], maxResults=250)
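If you need several such attribute filters, it can help to build the keyword pairs programmatically. The helper below is hypothetical; it only reproduces the `'int,NAME,VALUE'` string format used in the example above and makes no claim about other attribute types.

```python
def int_attribute(name, value):
    """Build a cmr_keywords pair for an integer additional attribute."""
    # int() rejects non-numeric input early instead of sending a bad query.
    return ("attribute[]", f"int,{name},{int(value)}")


keywords = [int_attribute("CENTER_ESA_FRAME", 1001)]
print(keywords)
```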

Stacking#

Once you have identified a result set or a product id, you may wish to build a baseline stack based on those results. You may use either the stack() or stack_from_id() methods to accomplish this.

stack_from_id() is provided largely as a convenience: internally, it performs a product_search() using the provided ID, and then returns the results of that product's stack() method. For this reason, it is recommended that if you have an ASFProduct object at hand, you use that to build your stack directly, as it removes the need for the additional search action. For other cases where you have parameters describing your reference scene but not an ASFProduct object itself, it is appropriate to use one of the various search features available to obtain an ASFProduct first.

A basic example using ASFProduct.stack():

import asf_search as asf

reference = asf.product_search('S1A_IW_SLC__1SDV_20220215T225119_20220215T225146_041930_04FE2E_9252-SLC')[0]

print(reference.stack())

The results are a standard ASFSearchResults object containing a list of ASFProduct objects, each with all the usual functionality. There are 2 additional fields in the ASFProduct objects: temporalBaseline and perpendicularBaseline. temporalBaseline describes the temporal offset in days from the reference scene used to build the stack. perpendicularBaseline describes the perpendicular offset in meters from the reference scene used to build the stack. The reference scene is included in the stack and will always have a temporal and perpendicular baseline of 0.
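A typical next step is to narrow a stack to pairs within some baseline limits. The sketch below assumes the two baseline fields are read from each product's properties dictionary and that a baseline may be missing (None) for some products; `filter_stack()` is a hypothetical helper, the thresholds are arbitrary, and the stack is an offline stand-in.

```python
from types import SimpleNamespace


def filter_stack(stack, max_days, max_meters):
    """Keep products within the given temporal and perpendicular baseline limits."""
    kept = []
    for product in stack:
        temporal = product.properties.get("temporalBaseline")
        perpendicular = product.properties.get("perpendicularBaseline")
        # Skip products whose baselines could not be determined.
        if temporal is None or perpendicular is None:
            continue
        if abs(temporal) <= max_days and abs(perpendicular) <= max_meters:
            kept.append(product)
    return kept


def make_product(file_id, temporal, perpendicular):
    """Build a stand-in product for this offline example."""
    return SimpleNamespace(properties={
        "fileID": file_id,
        "temporalBaseline": temporal,
        "perpendicularBaseline": perpendicular,
    })


stack = [
    make_product("REF", 0, 0),
    make_product("P12", 12, 50),
    make_product("M24", -24, 180),
    make_product("NOPERP", 36, None),
]

kept_ids = [p.properties["fileID"] for p in filter_stack(stack, max_days=24, max_meters=100)]
print(kept_ids)
```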

Platform vs Dataset#

asf_search provides 2 major keywords with subtle differences:

platform
dataset

platform maps to the platform[] CMR keyword; values like Sentinel-1A, UAVSAR, ALOS. A limitation of searching by platform is that for platforms like Sentinel-1A there are a lot of Sentinel-1 derived product types (OPERA-S1, SLC-BURST). For every SLC product, there are 27 additional OPERA-S1 and SLC-BURST products, which can lead to result sets dominated by those derived products, depending on your search filters.

The dataset keyword serves as a solution for this. Each "dataset" is a collection of concept ids generally associated with commonly used datasets.

# At the time of writing will likely contain mostly `OPERA-S1` and/or `SLC-BURST` products
platform_results = asf.search(platform=asf.PLATFORM.SENTINEL1, maxResults=250)

# Will contain everything but `OPERA-S1` and/or `SLC-BURST` products
dataset_results = asf.search(dataset=asf.DATASET.SENTINEL1, maxResults=250)

# Will contain OPERA-S1 Products
opera_results = asf.search(dataset=asf.DATASET.OPERA_S1, maxResults=250)

# Will contain SLC-BURST products
slc_burst_results = asf.search(dataset=asf.DATASET.SLC_BURST, maxResults=250)
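To see how a result set is actually distributed across product types, you can tally it locally. This is a minimal sketch, assuming each product reports its type under a `processingLevel` key in properties; `count_by_processing_level()` is a hypothetical helper and the products are offline stand-ins with made-up values.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_processing_level(results):
    """Tally products by their processingLevel property."""
    return Counter(
        product.properties.get("processingLevel", "UNKNOWN") for product in results
    )


# Stand-in results; a real call would pass an ASFSearchResults object.
sample_results = [
    SimpleNamespace(properties={"processingLevel": level})
    for level in ["BURST", "BURST", "CSLC", "SLC"]
]

tally = count_by_processing_level(sample_results)
print(tally.most_common())
```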

CMR UAT Host#

asf_search defaults to querying against the production CMR API, cmr.earthdata.nasa.gov. In order to use another CMR host, set the host keyword with ASFSearchOptions.

uat_opts = asf.ASFSearchOptions(host='cmr.uat.earthdata.nasa.gov', maxResults=250)
uat_results = asf.search(opts=uat_opts)

Campaign lists#

asf_search provides a built-in method for searching for campaigns via platform.

asf.campaigns(platform=asf.PLATFORM.SENTINEL1A)

CMR Keyword Aliasing#

asf_search aliases the following keywords behind the scenes with corresponding collection concept ids for improved search performance:

platform
processingLevel

The alias lists are updated as needed with each release, but if you're not finding expected results, the alias list may be out of date. To skip the aliasing step, set the collectionAlias keyword to False with ASFSearchOptions:

opts = asf.ASFSearchOptions(collectionAlias=False, maxResults=250)
unaliased_results = asf.search(opts=opts)

Please note that this will result in slower average search times. If any results are missing from new datasets, please report it as an issue on GitHub with the concept id and name of the collection missing from the dataset.

Download#

This Jupyter notebook covers the available authentication methods. Once authenticated, it provides a workflow for downloading search results.

Recommended Authentication#

Using .netrc credentials is the preferred method for authentication. This guide will show you how to set up a .netrc file. Requests will attempt to get the authentication credentials for the URL’s hostname from your .netrc file. The .netrc file overrides raw HTTP authentication headers set with headers=. If credentials for the hostname are found, the request is sent with HTTP Basic Auth.
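For orientation, a .netrc entry is a single line naming the machine, login, and password. The sketch below writes a sample entry to a temporary file and parses it with Python's standard netrc module to confirm it is well-formed. The host `urs.earthdata.nasa.gov` is the Earthdata Login host; the username and password are placeholders, and the temporary path is used only so the example does not touch your real file.

```python
import netrc
import os
import tempfile

# Earthdata Login host; the username and password below are placeholders.
EDL_HOST = "urs.earthdata.nasa.gov"
sample = f"machine {EDL_HOST} login my_username password my_password\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_netrc")
    with open(path, "w") as handle:
        handle.write(sample)
    # Parse the file with the standard library to check that the entry is readable.
    login, _account, password = netrc.netrc(path).authenticators(EDL_HOST)

print(login)
```

A real file typically lives at `~/.netrc` and is usually restricted to its owner (for example with `chmod 600`), since it stores the password in plain text.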

Advanced Search Techniques#

Below you will find recommendations for advanced search techniques, such as subclassing, authentication, and the preferred method for large searches.

Sentinel-1 and GroupID#

Sentinel-1 products as well as most Sentinel-1 derived datasets (OPERA-S1, SLC-Burst) have a group id associated with them. This means that getting the original source scene, or any product associated with that scene, is as simple as using the groupID keyword in a search.

import asf_search as asf

burst_name = 'S1_279916_IW1_20230418T162849_VV_A7E1-BURST'
burst_granule = asf.search(granule_list=[burst_name])[0]

groupID = burst_granule.properties['groupID']

# gets the parent SLC of the burst product
parent_slc = asf.search(groupID=groupID, processingLevel=asf.PRODUCT_TYPE.SLC)[0]

# gets all other SLC Bursts associated with the same parent SLC
bursts_in_same_scene = asf.search(groupID=groupID, processingLevel=asf.PRODUCT_TYPE.BURST)

# gets ALL Sentinel-1 products and derived products available for the parent scene
all_products_for_scene = asf.search(groupID=groupID)

Subclassing#

ASFProduct is the base class for all search result objects. There are several subclasses of ASFProduct that are used for specific platforms and product types with unique properties/functionality.

Key Methods:

geojson()
download()
stack()
get_stack_opts() (returns None in ASFProduct, implemented by ASFStackableProduct subclass and its subclasses)
centroid()
remotezip() (requires optional dependency to be installed)
get_property_paths() (gets product's keywords and their paths in umm dictionary)
translate_product() (reads properties from umm, populates properties with associated keyword)
get_sort_keys()
umm_get()

Key Properties:

properties
_base_properties (what get_property_paths() uses to find values in umm JSON properties)
umm (the product's umm JSON from CMR)
metadata (the product's metadata JSON from CMR)

ASFStackableProduct is an important ASFProduct subclass, from which stackable product types meant for time series analysis are derived. ASFStackableProduct has a class enum, BaselineCalcType, that determines how perpendicular stack calculations are handled. Each subclass keeps track of their baseline calculation type via the baseline_type property.

Inherits: ASFProduct

Inherited By: ALOSProduct; ERSProduct; JERSProduct; RADARSATProduct; S1Product; S1BurstProduct; OPERAS1Product; ARIAS1GUNWProduct

Key Methods:

get_baseline_calc_properties()
get_stack_opts() (overrides ASFProduct)
is_valid_reference()
get_default_baseline_product_type()

Key Definitions for class enum BaselineCalcType:

PRE_CALCULATED: has pre-calculated insarBaseline value that will be used for perpendicular calculations
CALCULATED: uses position/velocity state vectors and ascending node time for perpendicular calculations

Key Fields:

baseline
baseline_type (BaselineCalcType.PRE_CALCULATED by default or BaselineCalcType.CALCULATED)

Because ASFProduct is built for subclassing, you can provide your own custom subclasses derived directly from ASFProduct or from a pre-existing subclass like S1Product or OPERAS1Product.

For more information on subclassing, see the Jupyter notebook.

Using authenticated searches#

Downloading data, and accessing some data, requires an authenticated session with Earthdata Login. To simplify this workflow, the ASFSession class is provided. It offers the following authentication methods:

auth_with_creds()
auth_with_token()
auth_with_cookiejar()

Creating an authenticated session example:

from getpass import getpass
session = asf.ASFSession()
session.auth_with_creds(input('EDL Username'), getpass('EDL Password'))

The ASFSearchOptions class is provided for storing and validating search parameters. Creating an ASFSearchOptions object is required to pass our authenticated session to search().

search_opts = asf.ASFSearchOptions(
    dataset=asf.DATASET.NISAR,
    session=session)

nisar_response = asf.search(opts=search_opts, maxResults=250)

search_generator() for large result sets#

The recommended way to perform large, long-running searches is to use search_generator() to yield CMR results page by page. This allows you to stream results to a file in the event CMR times out. Different output formats can be used.

Note that asf_search queries CMR with page sizes of 250, so setting maxResults=500 means asf_search will have to query CMR twice, each time returning 250 products:

large_results_generator = asf.search_generator(maxResults=500, platform=asf.PLATFORM.SENTINEL1A)

with open("search_results.metalink", "w") as f:
    f.writelines(asf.export.results_to_metalink(large_results_generator))

Another usage example:

import asf_search as asf
opts = asf.ASFSearchOptions(shortName='ARIA_S1_GUNW')
urs = []
for page in asf.search_generator(opts=opts):
    urs.extend(product.properties['fileID'] for product in page)
    print(len(urs))

Downloading additional files#

Some product types, such as SLC Bursts or Opera-S1 products, have several files that can be downloaded. We can specify which files to download by setting the fileType and using the FileDownloadType enum class.

Additional files are stored in this array:

product.properties['additionalUrls']

To download only the additional files:

FileDownloadType.ADDITIONAL_FILES    # everything in 'additionalUrls'

To download the default file:

FileDownloadType.DEFAULT_FILE        # The default data file, 'url'

To download both:

FileDownloadType.ALL_FILES           # all of the above

This example will download all additional files under the additionalUrls attribute:

cslc_results[0].download(session=session, path='./', fileType=asf.FileDownloadType.ADDITIONAL_FILES)

To be more specific about which files are downloaded, we can use the download_urls() or download_url() methods:

print(f"Additional urls: {opera_results[0].properties['additionalUrls']}")

url = opera_results[0].properties['additionalUrls'][0]
fileName = url.split('/')[-1]

asf.download_url(url, session=session, path='./', filename=fileName)
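When only some of the additional files are wanted (for example, just the xml files of a burst), the URL list can be filtered locally before downloading. The sketch below is an assumption-laden illustration: `urls_with_suffix()` is a hypothetical helper, the URLs are made up, and the match is a plain file-name suffix check that does not account for query strings.

```python
def urls_with_suffix(urls, suffix):
    """Return the URLs whose file name ends with `suffix` (case-insensitive)."""
    suffix = suffix.lower()
    return [url for url in urls if url.split("/")[-1].lower().endswith(suffix)]


# Made-up URLs standing in for product.properties['additionalUrls'].
additional_urls = [
    "https://example.com/data/scene_one.xml",
    "https://example.com/data/scene_one.png",
    "https://example.com/data/SCENE_TWO.XML",
]

xml_urls = urls_with_suffix(additional_urls, ".xml")
print(xml_urls)
```

The filtered list could then be handed to download_urls() along with a session and a download path.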

S3 URIs#

Some product types (Sentinel-1, BURST, OPERA, NISAR) have s3 direct access URIs available. They are accessible under the s3Urls properties key:

ASFProduct.properties['s3Urls']

CMR Keywords Search Parameter#

You can also search for granules using readable_granule_name via pattern matching.

To do this, you can pass the CMR search keyword config directly with the cmr_keywords search parameter. This allows you to pass any valid CMR keyword-value pair that isn't covered by asf_search directly, as well as configure existing parameter behavior.

More info on pattern matching and parameter options can be found here.

Example:

gslc_results = asf.search(granule_list=['*046_009_A_095*'], cmr_keywords=('options[readable_granule_name][pattern]', 'true'), opts=search_opts)

for product in gslc_results:
    print(product.properties['fileID'])