ProcessedFileRepository¶
Repository responsible for tracking file lifecycle state through downloading and processing stages.
Responsibilities¶
- Persist ProcessedFile entities
- Update lifecycle states
- Query retryable files
- Track processed outputs
Lifecycle States¶
- DISCOVERED
- DOWNLOADING
- DOWNLOADED
- READY
- PROCESSING
- PROCESSED
- FAILED states
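The states above could be modelled as a Python enum. This is a minimal sketch, not the project's actual definition: the `FileStatus` name and the `parse_status` helper are assumptions, and the FAILED states are left out because this page does not name them.

```python
from enum import Enum


class FileStatus(Enum):
    """Hypothetical enum mirroring the lifecycle states listed above."""

    DISCOVERED = "DISCOVERED"
    DOWNLOADING = "DOWNLOADING"
    DOWNLOADED = "DOWNLOADED"
    READY = "READY"
    PROCESSING = "PROCESSING"
    PROCESSED = "PROCESSED"
    # The FAILED states are not enumerated on this page, so they are omitted.


def parse_status(value):
    """Return the matching FileStatus, or raise ValueError for unknown values."""
    if isinstance(value, FileStatus):
        return value
    # Enum lookup by value raises ValueError when no member matches.
    return FileStatus(value)
```

A check of this kind is one way a repository method could reject a status that is not a valid enum member before writing a row.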
API Reference¶
backend.database.repositories.processed_file_repository.ProcessedFileRepository ¶
Repository for the processed_file table, which keeps a log
of which files have been processed and which have not.
create_file ¶
create_file(raw_file_name, raw_file_hash, raw_file_path, raw_file_size, processed_file_name, processed_file_hash, processed_file_path, processed_file_size, datetime_of_observation, instrument, status, error_message, downloaded_at, last_downloading_attempt_at, downloading_attempt_count, processed_at, last_processing_attempt_at, processing_attempt_count, previous_file_name)
Validates that the file status is an enum value, then inserts the file details as a new row in the table.
:param raw_file_name: raw file name as it appears in img-hdr.txt; primary key.
:param raw_file_hash: hash value of the unprocessed file.
:param raw_file_path: raw file path.
:param raw_file_size: raw file size in bytes.
:param processed_file_name: name of the renamed processed file.
:param processed_file_hash: hash value of the processed file.
:param processed_file_path: processed file path.
:param processed_file_size: processed file size in bytes.
:param datetime_of_observation: date and time of the observation.
:param instrument: instrument used for the observation.
:param status: file processing status.
:param error_message: error raised when file processing failed.
:param downloaded_at: UTC timestamp when the file was downloaded.
:param last_downloading_attempt_at: UTC timestamp of the most recent downloading attempt.
:param downloading_attempt_count: number of downloading attempts made.
:param processed_at: UTC timestamp when processing completed.
:param last_processing_attempt_at: UTC timestamp of the most recent processing attempt.
:param processing_attempt_count: number of processing attempts made.
:param previous_file_name: name of the previous file, used to help with processing.
:return: True only if exactly one row was created.
create_indexes_sql classmethod ¶
create_indexes_sql()
SQL query that creates indexes on the processed_file table
to speed up filtering by status and datetime_of_observation.
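As an illustration only, an index over those two columns might look like the following. The sketch assumes SQLite via Python's standard `sqlite3` module; the trimmed-down schema and the index name `idx_processed_file_status_observation` are hypothetical and are not taken from the repository.

```python
import sqlite3

# Hypothetical, trimmed-down schema: only the columns needed for the example.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS processed_file (
    raw_file_name TEXT PRIMARY KEY,
    instrument TEXT,
    status TEXT,
    datetime_of_observation TEXT
)
"""

# Hypothetical index name; the composite index covers filters on
# status and datetime_of_observation.
CREATE_INDEX = """
CREATE INDEX IF NOT EXISTS idx_processed_file_status_observation
ON processed_file (status, datetime_of_observation)
"""

conn = sqlite3.connect(":memory:")
conn.execute(CREATE_TABLE)
conn.execute(CREATE_INDEX)

# PRAGMA index_list returns one row per index; column 1 holds the index name.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('processed_file')")]
```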
create_table_sql classmethod ¶
create_table_sql()
SQL query that creates the processed_file table.
Timestamps are stored as ISO-8601 UTC strings.
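A small sketch of what storing timestamps as ISO-8601 UTC strings can look like in Python. The helper names `to_utc_iso` and `from_utc_iso` are assumptions made for illustration, not functions from this repository.

```python
from datetime import datetime, timezone


def to_utc_iso(dt):
    """Serialize a timezone-aware datetime as an ISO-8601 UTC string."""
    if dt.tzinfo is None:
        # Naive datetimes are ambiguous, so refuse them instead of guessing.
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).isoformat()


def from_utc_iso(text):
    """Parse a string produced by to_utc_iso back into an aware datetime."""
    return datetime.fromisoformat(text)


stamp = to_utc_iso(datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
restored = from_utc_iso(stamp)
```

Normalizing to UTC before serializing keeps every stored value in the same offset, which makes time-window comparisons on the stored strings easier to reason about.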
delete_file ¶
delete_file(file)
Deletes a processed file record.
:param file: processed file domain entity.
:return: True if a record was deleted.
exists_by_name ¶
exists_by_name(raw_file_name)
Checks if a processed file record exists.
:param raw_file_name: raw file name.
:return: True if the record exists.
get_downloaded_files_by_time ¶
get_downloaded_files_by_time(instrument, download_start_utc, download_end_utc)
Gets downloaded files by instrument and downloaded_at time window.
:param instrument: instrument used for the observation.
:param download_start_utc: start timestamp (UTC).
:param download_end_utc: end timestamp (UTC).
:return: list of processed file domain entities.
get_files_by_observation ¶
get_files_by_observation(instrument, observation_start_utc, observation_end_utc)
Returns files for a given instrument within an observation period.
:param instrument: instrument used for the observation.
:param observation_start_utc: starting UTC timestamp of the observation.
:param observation_end_utc: ending UTC timestamp of the observation.
:return: list of processed file domain entities.
get_files_by_observation_and_status ¶
get_files_by_observation_and_status(instrument, status, observation_start_utc, observation_end_utc, limit, offset)
Returns files for a given instrument, observation time period, and status.
:param instrument: instrument used for the observation.
:param status: file status.
:param observation_start_utc: starting UTC timestamp of the observation.
:param observation_end_utc: ending UTC timestamp of the observation.
:param limit: maximum number of rows to fetch.
:param offset: number of rows to skip.
:return: list of processed file domain entities.
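The limit and offset parameters suggest page-by-page retrieval. The sketch below shows that pattern with `sqlite3` under several assumptions: the reduced schema, the `fetch_page` helper, the half-open time window, and the ordering by datetime_of_observation are all illustrative choices, not confirmed behaviour of this method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced schema for the example.
conn.execute(
    "CREATE TABLE processed_file ("
    "raw_file_name TEXT PRIMARY KEY, instrument TEXT, status TEXT, "
    "datetime_of_observation TEXT)"
)
rows = [
    (f"file_{i}.raw", "cam1", "READY", f"2024-01-01T00:0{i}:00+00:00")
    for i in range(5)
]
conn.executemany("INSERT INTO processed_file VALUES (?, ?, ?, ?)", rows)


def fetch_page(conn, instrument, status, start_utc, end_utc, limit, offset):
    """Return one page of raw file names, ordered by observation time."""
    query = (
        "SELECT raw_file_name FROM processed_file "
        "WHERE instrument = ? AND status = ? "
        # Assumed half-open window: start inclusive, end exclusive.
        "AND datetime_of_observation >= ? AND datetime_of_observation < ? "
        "ORDER BY datetime_of_observation "
        "LIMIT ? OFFSET ?"
    )
    params = (instrument, status, start_utc, end_utc, limit, offset)
    return [r[0] for r in conn.execute(query, params)]


# Second page of two rows: skips file_0 and file_1.
page = fetch_page(
    conn, "cam1", "READY",
    "2024-01-01T00:00:00+00:00", "2024-01-01T01:00:00+00:00",
    limit=2, offset=2,
)
```

A stable ORDER BY matters here: without it, consecutive pages are not guaranteed to be disjoint.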
get_files_by_status ¶
get_files_by_status(instrument, status, order_by, ascending=True)
Fetches files by status.
:param instrument: instrument used for the observation.
:param status: status of the file.
:param order_by: column used for sorting.
:param ascending: boolean that sets the sorting order.
:return: list of processed file domain entities.
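Because a column name cannot be bound as a SQL parameter, an order_by argument is commonly checked against an allow-list before it is interpolated into the query. The sketch below illustrates that idea; `ALLOWED_ORDER_COLUMNS` and `build_status_query` are hypothetical names, and whether the repository validates order_by this way is not stated on this page.

```python
# Hypothetical allow-list, built from column names that appear on this page.
ALLOWED_ORDER_COLUMNS = {"datetime_of_observation", "downloaded_at", "processed_at"}


def build_status_query(order_by, ascending=True):
    """Build a SELECT with a validated ORDER BY clause."""
    if order_by not in ALLOWED_ORDER_COLUMNS:
        # Reject anything outside the allow-list to avoid SQL injection.
        raise ValueError(f"unsupported order_by column: {order_by!r}")
    direction = "ASC" if ascending else "DESC"
    return (
        "SELECT * FROM processed_file "
        "WHERE instrument = ? AND status = ? "
        f"ORDER BY {order_by} {direction}"
    )


query = build_status_query("processed_at", ascending=False)
```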
read_file_by_name ¶
read_file_by_name(raw_file_name)
Fetches a processed file record by its raw file name.
:param raw_file_name: computed raw file name; primary key.
:return: the complete processed file record.
save ¶
save(file)
Persists the latest state of a processed file domain entity.
:param file: domain object containing the latest state.
:return: True if exactly one row was updated.
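The "exactly one row" return convention can be sketched with the `rowcount` attribute of a `sqlite3` cursor. This assumes SQLite and a reduced two-column table; the `update_status` helper is hypothetical and stands in for whatever UPDATE the real save method issues.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced schema for the example.
conn.execute(
    "CREATE TABLE processed_file (raw_file_name TEXT PRIMARY KEY, status TEXT)"
)
conn.execute("INSERT INTO processed_file VALUES ('a.raw', 'DOWNLOADED')")


def update_status(conn, raw_file_name, status):
    """Update one record and report whether exactly one row changed."""
    cursor = conn.execute(
        "UPDATE processed_file SET status = ? WHERE raw_file_name = ?",
        (status, raw_file_name),
    )
    conn.commit()
    # rowcount is the number of rows modified by the UPDATE statement.
    return cursor.rowcount == 1


updated = update_status(conn, "a.raw", "READY")
missing = update_status(conn, "missing.raw", "READY")
```

Returning False for a missing record lets the caller distinguish a successful save from an update that matched nothing.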