File Lifecycle
-
States [ all possible states for a given file ]
- DISCOVERED : raw file is available in the NASA open directory
- DOWNLOADING : download of the file to local storage has begun
- DOWNLOADED : file successfully downloaded to local storage
- DOWNLOADING_FAILED : file download attempt failed
- IGNORE : file will no longer be downloaded; maximum download retries reached
- SKIPPED : file is not eligible for processing, based on its metadata or an outdated timestamp
- READY : file is eligible for processing (files in PROCESSING_FAILED also return here for retry)
- PROCESSING : file processing has started
- PROCESSED : file successfully processed
- PROCESSING_FAILED : file failed to process successfully
- ABANDONED : file failed processing and has reached the maximum number of processing retry attempts
-
Terminal States
- IGNORE
- SKIPPED
- PROCESSED (reopened only if corruption validation later fails)
- ABANDONED
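The states and terminal set above map naturally onto a Python `Enum`. The sketch below is a minimal version under that assumption. The names `FileState`, `TERMINAL_STATES` and `is_terminal` are hypothetical; only the state names come from this document.

```python
from enum import Enum


class FileState(Enum):
    """Hypothetical enum mirroring the file states listed above."""
    DISCOVERED = "DISCOVERED"
    DOWNLOADING = "DOWNLOADING"
    DOWNLOADED = "DOWNLOADED"
    DOWNLOADING_FAILED = "DOWNLOADING_FAILED"
    IGNORE = "IGNORE"
    SKIPPED = "SKIPPED"
    READY = "READY"
    PROCESSING = "PROCESSING"
    PROCESSED = "PROCESSED"
    PROCESSING_FAILED = "PROCESSING_FAILED"
    ABANDONED = "ABANDONED"


# Terminal states from the list above: the lifecycle normally ends here.
# PROCESSED can still be moved to PROCESSING_FAILED by corruption validation.
TERMINAL_STATES = frozenset({
    FileState.IGNORE,
    FileState.SKIPPED,
    FileState.PROCESSED,
    FileState.ABANDONED,
})


def is_terminal(state: FileState) -> bool:
    """Return True if the state is one of the terminal states."""
    return state in TERMINAL_STATES
```

Using the state name as the enum value allows a stored string to be mapped back directly, e.g. `FileState("READY")`.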
-
Transitions [ how the state transition will take place ]
- DISCOVERED ➡ DOWNLOADING
- DISCOVERED ➡ DOWNLOADED
- DOWNLOADING ➡ DOWNLOADING_FAILED
- DOWNLOADING ➡ DOWNLOADED
- DOWNLOADING_FAILED ➡ DOWNLOADING
- DOWNLOADING_FAILED ➡ IGNORE
- DOWNLOADED ➡ DOWNLOADING_FAILED
- DOWNLOADED ➡ SKIPPED
- DOWNLOADED ➡ READY
- READY ➡ PROCESSING
- PROCESSING ➡ PROCESSING_FAILED
- PROCESSING ➡ PROCESSED
- PROCESSING_FAILED ➡ READY
- PROCESSING_FAILED ➡ ABANDONED
- PROCESSED ➡ PROCESSING_FAILED
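The transition list can be encoded as an adjacency map and checked before every state update. This is a sketch only. `ALLOWED_TRANSITIONS` and `transition` are hypothetical names, and states are plain strings so the example stays self-contained.

```python
# Allowed transitions, copied from the list above (source state -> targets).
ALLOWED_TRANSITIONS: dict[str, frozenset[str]] = {
    "DISCOVERED": frozenset({"DOWNLOADING", "DOWNLOADED"}),
    "DOWNLOADING": frozenset({"DOWNLOADING_FAILED", "DOWNLOADED"}),
    "DOWNLOADING_FAILED": frozenset({"DOWNLOADING", "IGNORE"}),
    "DOWNLOADED": frozenset({"DOWNLOADING_FAILED", "SKIPPED", "READY"}),
    "READY": frozenset({"PROCESSING"}),
    "PROCESSING": frozenset({"PROCESSING_FAILED", "PROCESSED"}),
    "PROCESSING_FAILED": frozenset({"READY", "ABANDONED"}),
    "PROCESSED": frozenset({"PROCESSING_FAILED"}),
}


def transition(current: str, target: str) -> str:
    """Return target if current -> target is allowed, else raise ValueError.

    States with no entry (IGNORE, SKIPPED, ABANDONED) have no outgoing transitions.
    """
    if target not in ALLOWED_TRANSITIONS.get(current, frozenset()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```

The Failure recovery rules also move a timed-out file directly from DOWNLOADING to IGNORE and from PROCESSING to ABANDONED. Those edges are not in the list above, so a table used for enforcement would need them added.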
-
Transition Conditions
- DOWNLOADED ➡ READY
  Conditions:
  - file passes validation
  - file belongs to an ACTIVE slot
  - processing prerequisites are satisfied (these depend on the algorithm)
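The three conditions can be read as a guard that is evaluated for each DOWNLOADED file. In this sketch, a file that fails validation goes to DOWNLOADING_FAILED, as described under corruption validation. Two further behaviours are assumptions not specified by this document: a file outside an ACTIVE slot goes to SKIPPED, and a file with unmet prerequisites stays in DOWNLOADED. The names `DownloadedFile` and `next_state_after_download` are also hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DownloadedFile:
    """Hypothetical check results for a file in DOWNLOADED."""
    passes_validation: bool  # e.g. size, format and readability checks
    in_active_slot: bool     # file belongs to an ACTIVE slot
    prerequisites_met: bool  # algorithm-specific processing prerequisites


def next_state_after_download(f: DownloadedFile) -> Optional[str]:
    """Return the next state for a DOWNLOADED file, or None to stay DOWNLOADED."""
    if not f.passes_validation:
        # Corruption validation: DOWNLOADED -> DOWNLOADING_FAILED.
        return "DOWNLOADING_FAILED"
    if not f.in_active_slot:
        # Assumption: files outside an ACTIVE slot are not eligible.
        return "SKIPPED"
    if not f.prerequisites_met:
        # Assumption: wait and re-check later instead of skipping.
        return None
    return "READY"
```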
-
Scenarios [ what normal, failure, and other notable scenarios look like ]
Notation: max_limit[ X ➡ Y ] means the bracketed steps repeat until the maximum number of attempts is reached.
-
Normal workflow
DISCOVERED ➡ DOWNLOADING ➡ DOWNLOADED ➡ READY ➡ PROCESSING ➡ PROCESSED
-
Failure1 workflow [ file failed to download ]
DISCOVERED ➡ DOWNLOADING ➡ DOWNLOADING_FAILED ➡ max_limit[ DOWNLOADING ➡ DOWNLOADING_FAILED ] ➡ IGNORE
-
Failure2 workflow [ file skipped, not eligible for processing ]
DISCOVERED ➡ DOWNLOADING ➡ DOWNLOADED ➡ SKIPPED
-
Failure3 workflow [ file failed to process ]
DISCOVERED ➡ DOWNLOADING ➡ DOWNLOADED ➡ READY ➡ PROCESSING ➡ PROCESSING_FAILED ➡ max_limit[ READY ➡ PROCESSING ➡ PROCESSING_FAILED ] ➡ ABANDONED
-
Failure4 workflow [ file failed to download due to timeout ]
DISCOVERED ➡ DOWNLOADING ➡ (timeout) ➡ DOWNLOADING_FAILED ...
-
Failure5 workflow [ file failed to process due to timeout ]
DISCOVERED ➡ DOWNLOADING ➡ DOWNLOADED ➡ READY ➡ PROCESSING ➡ (timeout) ➡ PROCESSING_FAILED ...
-
Failure6 workflow [ worker crash during download ]
DISCOVERED ➡ DOWNLOADING ➡ (worker crash) ➡ (timeout recovery) ➡ DOWNLOADING_FAILED ...
-
Failure7 workflow [ worker crash during processing ]
DISCOVERED ➡ DOWNLOADING ➡ DOWNLOADED ➡ READY ➡ PROCESSING ➡ (worker crash) ➡ (timeout recovery) ➡ PROCESSING_FAILED ...
-
Failure8 workflow [ duplicate discovery attempt ]
DISCOVERED ➡ (metadata polled again) ➡ duplicate record ignored due to a unique constraint (raw_file_hash is unique)
-
Operational scenario [ file already present locally ]
DISCOVERED ➡ (file already exists) ➡ DOWNLOADED
-
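Failure8 depends on the database rejecting a second record with the same raw_file_hash. The sketch below illustrates this with Python's built-in `sqlite3` and an in-memory database. The `files` table layout and the `record_discovery` helper are hypothetical; only the uniqueness of raw_file_hash comes from this document.

```python
import sqlite3

# Hypothetical schema; only UNIQUE(raw_file_hash) is taken from the document.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE files (
        id INTEGER PRIMARY KEY,
        raw_file_hash TEXT NOT NULL UNIQUE,
        state TEXT NOT NULL DEFAULT 'DISCOVERED'
    )
    """
)


def record_discovery(conn: sqlite3.Connection, raw_file_hash: str) -> bool:
    """Insert a DISCOVERED row; return False if the hash is already recorded."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO files (raw_file_hash) VALUES (?)",
                (raw_file_hash,),
            )
    except sqlite3.IntegrityError:
        # The unique constraint rejected the duplicate discovery; ignore it.
        return False
    return True


print(record_discovery(conn, "abc123"))  # True  (first metadata poll)
print(record_discovery(conn, "abc123"))  # False (repeat poll, ignored)
```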
Failure recovery
- Download recovery
  DOWNLOADING AND now - download_started_at > download_timeout AND download_attempts < max_download_attempts ➡ DOWNLOADING_FAILED
  DOWNLOADING AND now - download_started_at > download_timeout AND download_attempts >= max_download_attempts ➡ IGNORE
- Processing recovery
  PROCESSING AND now - processing_started_at > processing_timeout AND processing_attempts < max_processing_attempts ➡ PROCESSING_FAILED
  PROCESSING AND now - processing_started_at > processing_timeout AND processing_attempts >= max_processing_attempts ➡ ABANDONED
- Corruption validation
  DOWNLOADED ➡ DOWNLOADING_FAILED
  PROCESSED ➡ PROCESSING_FAILED
  These transitions occur when file validation fails. Validation checks could include file size, file format, and whether the file can be read.
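A worker crash leaves a file stuck in DOWNLOADING or PROCESSING. A periodic sweep can apply the timeout rules above to recover it. This is a minimal sketch under a few assumptions: timestamps are timezone-aware `datetime` values, timeouts and limits are passed in as parameters, and `FileRecord` and `recover_stuck` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class FileRecord:
    """Hypothetical row shape; field names follow the recovery rules above."""
    state: str
    download_started_at: Optional[datetime] = None
    download_attempts: int = 0
    processing_started_at: Optional[datetime] = None
    processing_attempts: int = 0


def recover_stuck(
    f: FileRecord,
    now: datetime,
    download_timeout: timedelta,
    max_download_attempts: int,
    processing_timeout: timedelta,
    max_processing_attempts: int,
) -> Optional[str]:
    """Return the recovery target for a timed-out file, or None if it is not stuck."""
    if f.state == "DOWNLOADING" and f.download_started_at is not None:
        if now - f.download_started_at > download_timeout:
            if f.download_attempts < max_download_attempts:
                return "DOWNLOADING_FAILED"
            return "IGNORE"
    if f.state == "PROCESSING" and f.processing_started_at is not None:
        if now - f.processing_started_at > processing_timeout:
            if f.processing_attempts < max_processing_attempts:
                return "PROCESSING_FAILED"
            return "ABANDONED"
    return None


now = datetime.now(timezone.utc)
stuck = FileRecord("DOWNLOADING", download_started_at=now - timedelta(minutes=30), download_attempts=1)
print(recover_stuck(stuck, now, timedelta(minutes=10), 3, timedelta(hours=1), 3))  # DOWNLOADING_FAILED
```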
-
Retry policy
-
Download
- Retry when DOWNLOADING_FAILED AND download_attempts < max_download_attempts
- Retry occurs immediately
- download_attempts is incremented when a download attempt begins
- If the attempt limit is reached (download_attempts >= max_download_attempts) ➡ IGNORE
-
Processing
- Retry when PROCESSING_FAILED AND processing_attempts < max_processing_attempts
- Retry occurs immediately (PROCESSING_FAILED ➡ READY ➡ PROCESSING)
- processing_attempts is incremented when processing begins
- If the attempt limit is reached (processing_attempts >= max_processing_attempts) ➡ ABANDONED
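The download retry policy can be simulated as a loop. The counter is incremented as each attempt begins, the retry happens immediately, and the file ends in IGNORE once the limit is used up. The same shape applies to processing, with READY between attempts and ABANDONED as the final state. This sketch covers only the download side. `download_with_retries` and the `attempt_download` callable are hypothetical. A real pipeline would persist each state change; here the path is only collected in a list.

```python
from typing import Callable


def download_with_retries(
    attempt_download: Callable[[], bool],
    max_download_attempts: int,
) -> tuple[str, int, list[str]]:
    """Simulate the download retry policy.

    attempt_download is a hypothetical callable that returns True on success.
    Returns (final_state, download_attempts, state_path).
    """
    path = ["DISCOVERED"]
    attempts = 0
    while attempts < max_download_attempts:
        attempts += 1  # incremented when the download attempt begins
        path.append("DOWNLOADING")
        if attempt_download():
            path.append("DOWNLOADED")
            return "DOWNLOADED", attempts, path
        path.append("DOWNLOADING_FAILED")  # retried immediately if attempts remain
    path.append("IGNORE")  # attempt limit reached
    return "IGNORE", attempts, path


outcomes = iter([False, True])
print(download_with_retries(lambda: next(outcomes), 3))
# ('DOWNLOADED', 2, ['DISCOVERED', 'DOWNLOADING', 'DOWNLOADING_FAILED', 'DOWNLOADING', 'DOWNLOADED'])
```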
-