
ProcessedFileRepository

Repository responsible for tracking file lifecycle state through downloading and processing stages.

Responsibilities

  • Persist ProcessedFile entities
  • Update lifecycle states
  • Query retryable files
  • Track processed outputs

Lifecycle States

  • DISCOVERED
  • DOWNLOADING
  • DOWNLOADED
  • READY
  • PROCESSING
  • PROCESSED
  • FAILED states
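The lifecycle states could be modeled as a Python enum, as in the minimal sketch below. The class name FileStatus and the two failure members, DOWNLOAD_FAILED and PROCESSING_FAILED, are hypothetical: the list above only says "FAILED states". HAPPY_PATH assumes the states are listed in the order a file moves through them.

```python
from enum import Enum


class FileStatus(str, Enum):
    """File lifecycle states (hypothetical sketch)."""

    DISCOVERED = "DISCOVERED"
    DOWNLOADING = "DOWNLOADING"
    DOWNLOADED = "DOWNLOADED"
    READY = "READY"
    PROCESSING = "PROCESSING"
    PROCESSED = "PROCESSED"
    # Hypothetical failure members; the real names are not documented.
    DOWNLOAD_FAILED = "DOWNLOAD_FAILED"
    PROCESSING_FAILED = "PROCESSING_FAILED"

    @property
    def is_failed(self) -> bool:
        # In this sketch every failure member's name ends in "_FAILED".
        return self.name.endswith("_FAILED")


# Assumed happy-path order, from discovery to a processed output.
HAPPY_PATH = [
    FileStatus.DISCOVERED,
    FileStatus.DOWNLOADING,
    FileStatus.DOWNLOADED,
    FileStatus.READY,
    FileStatus.PROCESSING,
    FileStatus.PROCESSED,
]
```

Mixing in str lets a member compare equal to its stored text value, which is convenient when statuses are read back from a database column.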

API Reference

backend.database.repositories.processed_file_repository.ProcessedFileRepository

This class manages the processed_file table, which keeps a log of which files have been processed and which have not.

create_file

create_file(raw_file_name, raw_file_hash, raw_file_path, raw_file_size, processed_file_name, processed_file_hash, processed_file_path, processed_file_size, datetime_of_observation, instrument, status, error_message, downloaded_at, last_downloading_attempt_at, downloading_attempt_count, processed_at, last_processing_attempt_at, processing_attempt_count, previous_file_name)

Checks whether status is passed as an enum, then inserts the file details as a new row in the table.

:param raw_file_name: raw file name, as it appears in img-hdr.txt (primary key).
:param raw_file_hash: hash value of the unprocessed file.
:param raw_file_path: raw file path.
:param raw_file_size: raw file size in bytes.
:param processed_file_name: renamed processed file.
:param processed_file_hash: hash value of the processed file.
:param processed_file_path: processed file path.
:param processed_file_size: processed file size in bytes.
:param datetime_of_observation: date and time of the observation.
:param instrument: instrument used for the observation.
:param status: file processing status.
:param error_message: error raised when file processing failed.
:param downloaded_at: UTC timestamp when the file was downloaded.
:param last_downloading_attempt_at: UTC timestamp of the most recent download attempt.
:param downloading_attempt_count: number of download attempts made.
:param processed_at: UTC timestamp when processing completed.
:param last_processing_attempt_at: UTC timestamp of the most recent processing attempt.
:param processing_attempt_count: number of processing attempts made.
:param previous_file_name: previous file name, which helps with processing.
:return: True only if exactly one row was created.
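A minimal sketch of the enum check and the row-count check, assuming a SQLite backend through Python's standard sqlite3 module (the documentation does not name the database). The signature and table are cut down to four columns, and FileStatus, img_0001 and CAM_A are made-up illustration values, not the repository's real definitions.

```python
import sqlite3
from enum import Enum


class FileStatus(Enum):
    # Hypothetical subset of the lifecycle states.
    DISCOVERED = "DISCOVERED"
    DOWNLOADING = "DOWNLOADING"


def create_file(conn, raw_file_name, instrument, status, datetime_of_observation):
    # Reduced, hypothetical signature: the real method takes every column.
    # Enum members are stored by value so the column holds plain text.
    if isinstance(status, Enum):
        status = status.value
    cur = conn.execute(
        "INSERT INTO processed_file "
        "(raw_file_name, instrument, status, datetime_of_observation) "
        "VALUES (?, ?, ?, ?)",
        (raw_file_name, instrument, status, datetime_of_observation),
    )
    conn.commit()
    # True only if exactly one row was created.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_file (raw_file_name TEXT PRIMARY KEY, "
    "instrument TEXT, status TEXT, datetime_of_observation TEXT)"
)
created = create_file(
    conn, "img_0001", "CAM_A", FileStatus.DISCOVERED, "2024-01-01T00:00:00+00:00"
)
```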

create_indexes_sql classmethod

create_indexes_sql()

Query to create the index on the processed_file table that speeds up filtering by status and datetime_of_observation.
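One way create_indexes_sql could look, assuming SQLite and a single composite index; the index name, the column order and the sketch class name are assumptions. Putting status (an equality filter) before datetime_of_observation (a range filter) lets one index serve queries that filter on both.

```python
import sqlite3


class ProcessedFileRepositorySketch:
    """Hypothetical stand-in for the real repository class."""

    @classmethod
    def create_indexes_sql(cls) -> str:
        # IF NOT EXISTS makes the statement safe to run on every startup.
        return (
            "CREATE INDEX IF NOT EXISTS idx_processed_file_status_obs "
            "ON processed_file (status, datetime_of_observation)"
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_file (raw_file_name TEXT PRIMARY KEY, "
    "status TEXT, datetime_of_observation TEXT)"
)
conn.execute(ProcessedFileRepositorySketch.create_indexes_sql())
```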

create_table_sql classmethod

create_table_sql()

Query to create the processed_file table. Timestamps are stored as ISO-8601 UTC strings.
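Producing and reading back ISO-8601 UTC strings can be done with the standard datetime module, as in the sketch below. The helper names to_utc_iso and from_utc_iso are hypothetical, and rejecting naive datetimes is a choice of this sketch, not documented behavior.

```python
from datetime import datetime, timezone


def to_utc_iso(dt: datetime) -> str:
    """Serialize a timezone-aware datetime as an ISO-8601 UTC string."""
    if dt.tzinfo is None:
        # A naive datetime has no offset, so it cannot be converted to UTC safely.
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).isoformat()


def from_utc_iso(value: str) -> datetime:
    """Parse a string produced by to_utc_iso back into an aware datetime."""
    return datetime.fromisoformat(value)
```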

delete_file

delete_file(file)

Deletes a processed file record.

:param file: processed file domain entity.
:return: True if a row was deleted, otherwise False.
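A sketch of delete_file, assuming SQLite via sqlite3 and a reduced, hypothetical ProcessedFile dataclass that carries only the primary key. Reading cursor.rowcount after the DELETE tells whether a row was actually removed, so deleting the same file twice returns True and then False.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ProcessedFile:
    # Hypothetical, reduced domain entity: only the primary key is needed here.
    raw_file_name: str


def delete_file(conn: sqlite3.Connection, file: ProcessedFile) -> bool:
    cur = conn.execute(
        "DELETE FROM processed_file WHERE raw_file_name = ?",
        (file.raw_file_name,),
    )
    conn.commit()
    # rowcount is the number of rows removed by this DELETE.
    return cur.rowcount > 0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_file (raw_file_name TEXT PRIMARY KEY)")
conn.execute("INSERT INTO processed_file VALUES ('img_0001')")
first = delete_file(conn, ProcessedFile("img_0001"))
second = delete_file(conn, ProcessedFile("img_0001"))
```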

exists_by_name

exists_by_name(raw_file_name)

Checks if a processed file record exists.

:param raw_file_name: raw file name.
:return: True if the record exists.
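An existence check does not need the full row. This sketch assumes SQLite via sqlite3; SELECT 1 ... LIMIT 1 is a common idiom, not necessarily the query the repository runs.

```python
import sqlite3


def exists_by_name(conn: sqlite3.Connection, raw_file_name: str) -> bool:
    # Select a constant instead of the columns; only presence matters.
    row = conn.execute(
        "SELECT 1 FROM processed_file WHERE raw_file_name = ? LIMIT 1",
        (raw_file_name,),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_file (raw_file_name TEXT PRIMARY KEY)")
conn.execute("INSERT INTO processed_file VALUES ('img_0001')")
```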

get_downloaded_files_by_time

get_downloaded_files_by_time(instrument, download_start_utc, download_end_utc)

Gets downloaded files for an instrument within a downloaded_at time window.

:param instrument: instrument used for the observation.
:param download_start_utc: start timestamp (UTC).
:param download_end_utc: end timestamp (UTC).
:return: list of processed file domain entities.
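A sketch of the time-window query, assuming SQLite, an inclusive window (BETWEEN), and that downloaded_at holds ISO-8601 UTC strings in one consistent format, so comparing them as text matches chronological order. To stay short it returns raw file names rather than domain entities.

```python
import sqlite3


def get_downloaded_files_by_time(conn, instrument, download_start_utc, download_end_utc):
    # Assumes uniform ISO-8601 UTC text, so string comparison is chronological.
    # BETWEEN includes both ends; NULL downloaded_at rows never match.
    rows = conn.execute(
        "SELECT raw_file_name FROM processed_file "
        "WHERE instrument = ? AND downloaded_at BETWEEN ? AND ? "
        "ORDER BY downloaded_at",
        (instrument, download_start_utc, download_end_utc),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_file (raw_file_name TEXT PRIMARY KEY, "
    "instrument TEXT, downloaded_at TEXT)"
)
conn.executemany(
    "INSERT INTO processed_file VALUES (?, ?, ?)",
    [
        ("img_a", "CAM_A", "2024-01-01T10:00:00+00:00"),
        ("img_b", "CAM_A", "2024-01-02T10:00:00+00:00"),
        ("img_c", "CAM_B", "2024-01-01T12:00:00+00:00"),
        ("img_d", "CAM_A", None),  # not downloaded yet
    ],
)
```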

get_files_by_observation

get_files_by_observation(instrument, observation_start_utc, observation_end_utc)

Returns files for an instrument within an observation period.

:param instrument: instrument used for the observation.
:param observation_start_utc: start UTC timestamp of the observation period.
:param observation_end_utc: end UTC timestamp of the observation period.
:return: list of processed file domain entities.
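Because this method returns domain entities, each row has to be mapped onto an object. This sketch, assuming SQLite and a hypothetical four-field ProcessedFile dataclass, sets sqlite3.Row on a dedicated cursor so columns can be read by name without changing the connection's own row factory.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessedFile:
    # Hypothetical, reduced domain entity; the real one has more fields.
    raw_file_name: str
    instrument: str
    status: str
    datetime_of_observation: str


def get_files_by_observation(conn, instrument, observation_start_utc, observation_end_utc):
    cur = conn.cursor()
    cur.row_factory = sqlite3.Row  # name-addressable rows for this cursor only
    cur.execute(
        "SELECT raw_file_name, instrument, status, datetime_of_observation "
        "FROM processed_file WHERE instrument = ? "
        "AND datetime_of_observation BETWEEN ? AND ? "
        "ORDER BY datetime_of_observation",
        (instrument, observation_start_utc, observation_end_utc),
    )
    # Selected column names match the dataclass fields, so rows unpack directly.
    return [ProcessedFile(**dict(row)) for row in cur.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_file (raw_file_name TEXT PRIMARY KEY, "
    "instrument TEXT, status TEXT, datetime_of_observation TEXT)"
)
conn.executemany(
    "INSERT INTO processed_file VALUES (?, ?, ?, ?)",
    [
        ("img_0001", "CAM_A", "PROCESSED", "2024-01-05T08:00:00+00:00"),
        ("img_0002", "CAM_A", "READY", "2024-02-05T08:00:00+00:00"),
        ("img_0003", "CAM_B", "READY", "2024-01-06T08:00:00+00:00"),
    ],
)
files = get_files_by_observation(
    conn, "CAM_A", "2024-01-01T00:00:00+00:00", "2024-01-31T23:59:59+00:00"
)
```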

get_files_by_observation_and_status

get_files_by_observation_and_status(instrument, status, observation_start_utc, observation_end_utc, limit, offset)

Returns files for an instrument with a given status within an observation period, paginated with limit and offset.

:param instrument: instrument used for the observation.
:param status: file status.
:param observation_start_utc: start UTC timestamp of the observation period.
:param observation_end_utc: end UTC timestamp of the observation period.
:param limit: maximum number of rows to fetch.
:param offset: number of rows to skip.
:return: list of processed file domain entities.
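A sketch of the paginated variant, assuming SQLite's LIMIT ... OFFSET ... syntax. The ORDER BY clause is an assumption of this sketch: without a deterministic order, consecutive pages may overlap or skip rows.

```python
import sqlite3


def get_files_by_observation_and_status(
    conn, instrument, status, observation_start_utc, observation_end_utc, limit, offset
):
    rows = conn.execute(
        "SELECT raw_file_name FROM processed_file "
        "WHERE instrument = ? AND status = ? "
        "AND datetime_of_observation BETWEEN ? AND ? "
        # Deterministic order (primary key as tie-breaker) keeps pages stable.
        "ORDER BY datetime_of_observation, raw_file_name "
        "LIMIT ? OFFSET ?",
        (instrument, status, observation_start_utc, observation_end_utc, limit, offset),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_file (raw_file_name TEXT PRIMARY KEY, "
    "instrument TEXT, status TEXT, datetime_of_observation TEXT)"
)
for i in range(5):
    conn.execute(
        "INSERT INTO processed_file VALUES (?, ?, ?, ?)",
        (f"img_{i}", "CAM_A", "READY", f"2024-01-0{i + 1}T00:00:00+00:00"),
    )
# A row with another status, which the status filter must exclude.
conn.execute(
    "INSERT INTO processed_file VALUES "
    "('img_x', 'CAM_A', 'PROCESSED', '2024-01-02T00:00:00+00:00')"
)
```

Fetching page n of size k then means passing limit=k and offset=n*k.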

get_files_by_status

get_files_by_status(instrument, status, order_by, ascending=True)

Fetches files for an instrument by status, sorted by the given column.

:param instrument: instrument used for the observation.
:param status: status of the file.
:param order_by: column used for sorting.
:param ascending: if True (the default), sort in ascending order; otherwise descending.
:return: list of processed file domain entities.
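Because order_by names a column, it cannot be passed as a bound ? parameter and has to be interpolated into the SQL text. A common safeguard, sketched below under the assumption of SQLite, is to check it against an allow-list first; SORTABLE_COLUMNS is hypothetical and the real repository may validate differently.

```python
import sqlite3

# Hypothetical allow-list: column names cannot be bound as ? parameters,
# so order_by is validated before being interpolated into the SQL text.
SORTABLE_COLUMNS = {"raw_file_name", "datetime_of_observation", "downloaded_at"}


def get_files_by_status(conn, instrument, status, order_by, ascending=True):
    if order_by not in SORTABLE_COLUMNS:
        raise ValueError(f"cannot sort by {order_by!r}")
    direction = "ASC" if ascending else "DESC"
    rows = conn.execute(
        "SELECT raw_file_name FROM processed_file "
        f"WHERE instrument = ? AND status = ? ORDER BY {order_by} {direction}",
        (instrument, status),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_file (raw_file_name TEXT PRIMARY KEY, instrument TEXT, "
    "status TEXT, datetime_of_observation TEXT, downloaded_at TEXT)"
)
conn.executemany(
    "INSERT INTO processed_file VALUES (?, ?, ?, ?, ?)",
    [
        ("img_1", "CAM_A", "DOWNLOADED", "2024-01-01T00:00:00+00:00", "2024-01-03T00:00:00+00:00"),
        ("img_2", "CAM_A", "DOWNLOADED", "2024-01-02T00:00:00+00:00", "2024-01-02T00:00:00+00:00"),
    ],
)
```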

read_file_by_name

read_file_by_name(raw_file_name)

Fetches a processed file record by its raw file name.

:param raw_file_name: computed name of the raw file (primary key).
:return: the complete file processing record.

save

save(file)

Persists latest state of a processed file domain entity.

:param file: domain object containing the latest state.
:return: True if exactly one row was updated.
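Unlike create_file, save updates an existing row. This sketch, assuming SQLite and a hypothetical three-field ProcessedFile, writes two of the mutable columns and uses the row count to detect a record that does not exist.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ProcessedFile:
    # Hypothetical, reduced domain entity.
    raw_file_name: str
    status: str
    processing_attempt_count: int


def save(conn, file):
    cur = conn.execute(
        "UPDATE processed_file SET status = ?, processing_attempt_count = ? "
        "WHERE raw_file_name = ?",
        (file.status, file.processing_attempt_count, file.raw_file_name),
    )
    conn.commit()
    # The primary key matches at most one row; 0 means there was nothing to update.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_file (raw_file_name TEXT PRIMARY KEY, "
    "status TEXT, processing_attempt_count INTEGER)"
)
conn.execute("INSERT INTO processed_file VALUES ('img_0001', 'READY', 0)")
```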