File Lifecycle

  1. States [ all possible states for a given file ]

    • DISCOVERED : raw file is available in the NASA open directory
    • DOWNLOADING : file downloading to local has begun
    • DOWNLOADED : file successfully downloaded to local
    • DOWNLOADING_FAILED : file failed to download successfully
    • IGNORE : file excluded from further download attempts because the maximum number of download retries was reached
    • SKIPPED : file not eligible for processing, based on metadata or an outdated timestamp
    • READY : file eligible for processing (PROCESSING_FAILED files also return here for retry)
    • PROCESSING : file processing started
    • PROCESSED : file successfully processed
    • PROCESSING_FAILED : file failed to process successfully
    • ABANDONED : file failed processing and reached the maximum number of processing retry attempts
  2. Terminal States

    • IGNORE
    • SKIPPED
    • PROCESSED
    • ABANDONED
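A minimal sketch of how the states and terminal states above might be represented in code. The names `FileState`, `TERMINAL_STATES`, and `is_terminal` are illustrative assumptions, not part of the original design.

```python
from enum import Enum


class FileState(Enum):
    """All lifecycle states listed in section 1 (class name is hypothetical)."""
    DISCOVERED = "DISCOVERED"
    DOWNLOADING = "DOWNLOADING"
    DOWNLOADED = "DOWNLOADED"
    DOWNLOADING_FAILED = "DOWNLOADING_FAILED"
    IGNORE = "IGNORE"
    SKIPPED = "SKIPPED"
    READY = "READY"
    PROCESSING = "PROCESSING"
    PROCESSED = "PROCESSED"
    PROCESSING_FAILED = "PROCESSING_FAILED"
    ABANDONED = "ABANDONED"


# Terminal states exactly as listed in section 2.
TERMINAL_STATES = frozenset({
    FileState.IGNORE,
    FileState.SKIPPED,
    FileState.PROCESSED,
    FileState.ABANDONED,
})


def is_terminal(state: FileState) -> bool:
    """Return True if the state is one of the listed terminal states."""
    return state in TERMINAL_STATES
```

Using an enum rather than bare strings lets a typo in a state name fail at the point of use instead of silently creating an unknown state.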
  3. Transitions [ how the state transition will take place ]

    • DISCOVERED ➡ DOWNLOADING
    • DISCOVERED ➡ DOWNLOADED
    • DOWNLOADING ➡ DOWNLOADING_FAILED
    • DOWNLOADING ➡ DOWNLOADED
    • DOWNLOADING_FAILED ➡ DOWNLOADING
    • DOWNLOADING_FAILED ➡ IGNORE
    • DOWNLOADED ➡ DOWNLOADING_FAILED
    • DOWNLOADED ➡ SKIPPED
    • DOWNLOADED ➡ READY
    • READY ➡ PROCESSING
    • PROCESSING ➡ PROCESSING_FAILED
    • PROCESSING ➡ PROCESSED
    • PROCESSING_FAILED ➡ READY
    • PROCESSING_FAILED ➡ ABANDONED
    • PROCESSED ➡ PROCESSING_FAILED
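The transition list above can be sketched as a lookup table that rejects any move not listed. This is only an illustration: `ALLOWED_TRANSITIONS`, `can_transition`, and `transition` are hypothetical names. The table mirrors the list as written, including PROCESSED ➡ PROCESSING_FAILED, even though PROCESSED also appears under terminal states.

```python
# Each key is a current state; the value is the set of states it may move to,
# copied from the transition list in section 3.
ALLOWED_TRANSITIONS = {
    "DISCOVERED": {"DOWNLOADING", "DOWNLOADED"},
    "DOWNLOADING": {"DOWNLOADING_FAILED", "DOWNLOADED"},
    "DOWNLOADING_FAILED": {"DOWNLOADING", "IGNORE"},
    "DOWNLOADED": {"DOWNLOADING_FAILED", "SKIPPED", "READY"},
    "READY": {"PROCESSING"},
    "PROCESSING": {"PROCESSING_FAILED", "PROCESSED"},
    "PROCESSING_FAILED": {"READY", "ABANDONED"},
    "PROCESSED": {"PROCESSING_FAILED"},
}


def can_transition(current: str, target: str) -> bool:
    """Return True if current -> target appears in the transition list."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not listed."""
    if not can_transition(current, target):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

For example, `transition("READY", "PROCESSING")` returns `"PROCESSING"`, while `transition("READY", "PROCESSED")` raises `ValueError` because a file must pass through PROCESSING first.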
  4. Transition Conditions

    • DOWNLOADED ➡ READY
      Conditions:
      • file passes validation
      • file belongs to an ACTIVE slot
      • processing prerequisites satisfied, depends on algorithm
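The three conditions for DOWNLOADED ➡ READY could be combined as in the sketch below. The `ReadinessCheck` type and its field names are assumptions made for illustration; how each flag is computed (in particular the algorithm-dependent prerequisites) is not specified here.

```python
from dataclasses import dataclass


@dataclass
class ReadinessCheck:
    """Results of the three checks (hypothetical structure)."""
    passes_validation: bool        # file passes validation
    slot_is_active: bool           # file belongs to an ACTIVE slot
    prerequisites_satisfied: bool  # algorithm-dependent prerequisites


def is_ready(check: ReadinessCheck) -> bool:
    """DOWNLOADED -> READY only when all three conditions hold."""
    return (
        check.passes_validation
        and check.slot_is_active
        and check.prerequisites_satisfied
    )
```

The sketch deliberately returns only a boolean: which state a file moves to when a given condition fails is governed by the other transitions and is not decided here.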
  5. Scenarios [ what does a normal, failure and other particular scenarios look like ]

    • Normal workflow
      DISCOVERED ➡ DOWNLOADING ➡ DOWNLOADED ➡ READY ➡ PROCESSING ➡ PROCESSED

    • Failure1 workflow [ file failed to be downloaded ]
      DISCOVERED ➡ DOWNLOADING ➡ DOWNLOADING_FAILED ➡ max_limit[ DOWNLOADING ➡ DOWNLOADING_FAILED ] ➡ IGNORE

    • Failure2 workflow [ file skipped for processing ]
      DISCOVERED ➡ DOWNLOADING ➡ DOWNLOADED ➡ SKIPPED

    • Failure3 workflow [ file failed to be processed ]
      DISCOVERED ➡ DOWNLOADING ➡ DOWNLOADED ➡ READY ➡ PROCESSING ➡ PROCESSING_FAILED ➡ max_limit[ PROCESSING ➡ PROCESSING_FAILED ] ➡ ABANDONED

    • Failure4 workflow [ file failed to download due to timeout ]
      DISCOVERED ➡ DOWNLOADING ➡ timeout ➡ DOWNLOADING_FAILED...

    • Failure5 workflow [ file failed to process due to timeout ]
      DISCOVERED ➡ DOWNLOADING ➡ DOWNLOADED ➡ READY ➡ PROCESSING ➡ timeout ➡ PROCESSING_FAILED...

    • Failure6 workflow [ worker crash during download ]
      DISCOVERED ➡ DOWNLOADING ➡ system crash ➡ timeout recovery ➡ DOWNLOADING_FAILED...

    • Failure7 workflow [ worker crash during processing ]
      DISCOVERED ➡ DOWNLOADING ➡ DOWNLOADED ➡ READY ➡ PROCESSING ➡ system crash ➡ timeout recovery ➡ PROCESSING_FAILED...

    • Failure8 workflow [ duplicate discovery attempt ]
      DISCOVERED ➡ metadata poll again ➡ ignore due to unique constraint
      (raw_file_hash is unique)

    • Operational scenario
      DISCOVERED ➡ file already exists ➡ DOWNLOADED
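The duplicate discovery scenario (Failure8) relies on a unique constraint on `raw_file_hash`. A minimal sketch of that behaviour follows, assuming SQLite and a hypothetical `files` table; the actual database and schema are not specified in this document.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE files (
        id            INTEGER PRIMARY KEY,
        raw_file_hash TEXT NOT NULL UNIQUE,
        state         TEXT NOT NULL DEFAULT 'DISCOVERED'
    )
    """
)


def record_discovery(conn: sqlite3.Connection, raw_file_hash: str) -> bool:
    """Insert a newly discovered file; return False if it was already known."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO files (raw_file_hash) VALUES (?)",
        (raw_file_hash,),
    )
    conn.commit()
    # rowcount is 1 when a row was inserted, 0 when the unique
    # constraint caused the insert to be ignored.
    return cur.rowcount == 1
```

With this approach a repeated metadata poll is harmless: the second insert for the same hash is ignored and the existing row keeps its current state.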

  6. Failure recovery

    • Download recovery

    DOWNLOADING AND now - download_started_at > download_timeout AND download_attempts < max_download_attempts ➡ DOWNLOADING_FAILED

    DOWNLOADING AND now - download_started_at > download_timeout AND download_attempts >= max_download_attempts ➡ IGNORE

    • Processing recovery

    PROCESSING AND now - processing_started_at > processing_timeout AND processing_attempts < max_processing_attempts ➡ PROCESSING_FAILED

    PROCESSING AND now - processing_started_at > processing_timeout AND processing_attempts >= max_processing_attempts ➡ ABANDONED

    • Corruption validation

    DOWNLOADED ➡ DOWNLOADING_FAILED

    PROCESSED ➡ PROCESSING_FAILED

    These transitions apply when file validation fails. Validation checks could include file size, file format, and whether the file can be read.
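The timeout-based download and processing recovery rules above can be sketched as a single helper with two thin wrappers. The function names are hypothetical; the parameter names follow the fields used in the rules, and the comparisons match them exactly (strictly greater than the timeout, `<` versus `>=` on attempts).

```python
from datetime import datetime, timedelta


def _recover(state, in_progress, failed, exhausted,
             now, started_at, timeout, attempts, max_attempts):
    """Apply one recovery rule; return the state unchanged if it does not apply."""
    if state != in_progress:
        return state
    if now - started_at <= timeout:
        return state  # still within the allowed time window
    return failed if attempts < max_attempts else exhausted


def recover_download(state: str, now: datetime, download_started_at: datetime,
                     download_timeout: timedelta, download_attempts: int,
                     max_download_attempts: int) -> str:
    """DOWNLOADING past its timeout -> DOWNLOADING_FAILED or IGNORE."""
    return _recover(state, "DOWNLOADING", "DOWNLOADING_FAILED", "IGNORE",
                    now, download_started_at, download_timeout,
                    download_attempts, max_download_attempts)


def recover_processing(state: str, now: datetime, processing_started_at: datetime,
                       processing_timeout: timedelta, processing_attempts: int,
                       max_processing_attempts: int) -> str:
    """PROCESSING past its timeout -> PROCESSING_FAILED or ABANDONED."""
    return _recover(state, "PROCESSING", "PROCESSING_FAILED", "ABANDONED",
                    now, processing_started_at, processing_timeout,
                    processing_attempts, max_processing_attempts)
```

A periodic job could call these for every file in DOWNLOADING or PROCESSING, which would also cover the worker crash scenarios, since a crashed worker simply leaves the file in an in-progress state until the timeout elapses.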

  7. Retry policy

    • Download
      Retry when state is DOWNLOADING_FAILED and download_attempts < max_download_attempts
      Retry occurs immediately
      download_attempts is incremented when a download attempt begins
      If download_attempts >= max_download_attempts ➡ IGNORE

    • Processing
      Retry when state is PROCESSING_FAILED and processing_attempts < max_processing_attempts
      Retry occurs immediately
      processing_attempts is incremented when a processing attempt begins
      If processing_attempts >= max_processing_attempts ➡ ABANDONED
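The retry policy could look roughly like the sketch below. `FileRecord` and the function names are assumptions for illustration. Following the transition list, a failed download goes straight back to DOWNLOADING, while a failed processing attempt returns to READY and is picked up again from there.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical in-memory view of one tracked file."""
    state: str
    download_attempts: int = 0
    processing_attempts: int = 0


def begin_download(record: FileRecord) -> None:
    """Start a download attempt; the counter increments when the attempt begins."""
    record.download_attempts += 1
    record.state = "DOWNLOADING"


def begin_processing(record: FileRecord) -> None:
    """Start a processing attempt; the counter increments when processing begins."""
    record.processing_attempts += 1
    record.state = "PROCESSING"


def resolve_download_failure(record: FileRecord, max_download_attempts: int) -> None:
    """DOWNLOADING_FAILED -> DOWNLOADING (retry) or IGNORE (attempts exhausted)."""
    if record.download_attempts < max_download_attempts:
        begin_download(record)
    else:
        record.state = "IGNORE"


def resolve_processing_failure(record: FileRecord, max_processing_attempts: int) -> None:
    """PROCESSING_FAILED -> READY (retry) or ABANDONED (attempts exhausted)."""
    if record.processing_attempts < max_processing_attempts:
        record.state = "READY"
    else:
        record.state = "ABANDONED"
```

Because the counters increment when an attempt begins rather than when it fails, an attempt interrupted by a crash still counts toward the limit.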