ProcessedFile¶
ProcessedFile is the central lifecycle entity responsible for tracking
the complete state evolution of LASCO image files throughout the pipeline.
It models:
- raw file discovery
- downloading
- processing readiness
- image processing execution
- retry handling
- terminal completion states
Responsibilities¶
The entity maintains lifecycle consistency for:
- raw FITS files
- processed image outputs
- retry attempts
- processing timestamps
- failure recovery
- predecessor relationships (C3 running difference)
Lifecycle Diagram¶
Core Guarantees¶
Immutable Transitions¶
All lifecycle transitions return a new immutable entity instance.
This prevents accidental mutation and ensures predictable state evolution.
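A minimal sketch of the immutable-transition idea, assuming a frozen dataclass. `FileSketch` and `with_status` are hypothetical stand-ins with only two fields and are not the real `ProcessedFile` API; the enum models just two of the documented states.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from enum import Enum


class FileStatus(Enum):
    # Only two states are modeled here; the real enum has more members.
    DISCOVERED = "DISCOVERED"
    DOWNLOADING = "DOWNLOADING"


@dataclass(frozen=True)
class FileSketch:
    """Hypothetical stand-in for ProcessedFile with two fields."""

    raw_file_name: str
    status: FileStatus

    def with_status(self, new_status: FileStatus) -> "FileSketch":
        # replace() copies every field and overrides only `status`.
        return replace(self, status=new_status)


original = FileSketch("example.fts", FileStatus.DISCOVERED)
updated = original.with_status(FileStatus.DOWNLOADING)

# Direct mutation of a frozen dataclass raises FrozenInstanceError.
try:
    original.status = FileStatus.DOWNLOADING
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

Because the original instance is never modified, callers holding a reference to it keep seeing the state they were given.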
Retry-Safe Processing¶
The entity explicitly tracks:
- download retry count
- processing retry count
- last retry timestamps
This allows recovery-oriented workflows without duplicating work.
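The retry bookkeeping might look roughly like the sketch below. The comparison rule (attempts made strictly below the limit) and the `record_attempt` helper are assumptions for illustration; only the field names and `can_retry_downloading` come from this page.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DownloadRetryState:
    """Hypothetical subset of ProcessedFile's download retry fields."""

    downloading_attempt_count: int = 0
    last_downloading_attempt_at: Optional[datetime] = None

    def can_retry_downloading(self, max_downloading_attempts: int) -> bool:
        # Assumed rule: retry while the attempts made are below the limit.
        return self.downloading_attempt_count < max_downloading_attempts

    def record_attempt(self, now: datetime) -> "DownloadRetryState":
        # Hypothetical helper: bump the counter and stamp the attempt time.
        return replace(
            self,
            downloading_attempt_count=self.downloading_attempt_count + 1,
            last_downloading_attempt_at=now,
        )


state = DownloadRetryState()
# Simulate a download that keeps failing until the budget is used up.
while state.can_retry_downloading(max_downloading_attempts=3):
    state = state.record_attempt(datetime.now(timezone.utc))
```

Since the counter travels with the entity, a restarted worker can see how much of the retry budget is already spent.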
Terminal State Enforcement¶
Terminal states prevent further transitions once processing is complete or permanently abandoned.
Terminal states include:
- PROCESSED
- SKIPPED
- IGNORE
- ABANDONED
State Categories¶
| Category | States |
|---|---|
| Discovery | DISCOVERED |
| Downloading | DOWNLOADING, DOWNLOADED, DOWNLOADING_FAILED |
| Processing Preparation | READY |
| Processing | PROCESSING, PROCESSED, PROCESSING_FAILED |
| Terminal | SKIPPED, IGNORE, ABANDONED |
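As a rough illustration, the states in the table could be modeled as an enum with a terminal-state check. The member names come from this page; the string values and the `TERMINAL_STATUSES` constant are assumptions. The terminal set follows the list under Terminal State Enforcement, which also counts PROCESSED as terminal.

```python
from enum import Enum


class FileStatus(Enum):
    # Discovery
    DISCOVERED = "DISCOVERED"
    # Downloading
    DOWNLOADING = "DOWNLOADING"
    DOWNLOADED = "DOWNLOADED"
    DOWNLOADING_FAILED = "DOWNLOADING_FAILED"
    # Processing preparation
    READY = "READY"
    # Processing
    PROCESSING = "PROCESSING"
    PROCESSED = "PROCESSED"
    PROCESSING_FAILED = "PROCESSING_FAILED"
    # Terminal
    SKIPPED = "SKIPPED"
    IGNORE = "IGNORE"
    ABANDONED = "ABANDONED"


# Hypothetical constant: the states that allow no further transitions.
TERMINAL_STATUSES = frozenset(
    {
        FileStatus.PROCESSED,
        FileStatus.SKIPPED,
        FileStatus.IGNORE,
        FileStatus.ABANDONED,
    }
)


def is_terminal(status: FileStatus) -> bool:
    """Return True when no further processing or retries should occur."""
    return status in TERMINAL_STATUSES
```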
C3 Processing Relationship¶
For LASCO C3 processing, the entity also stores:
previous_file_name
This enables running-difference processing between sequential observations.
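A running difference subtracts the preceding frame from the current one, pixel by pixel. The sketch below shows how a stored predecessor name could drive that step. It uses nested lists instead of real FITS image arrays, and the file names, the in-memory `frames` store, and the `difference_for` helper are all hypothetical.

```python
from typing import Dict, List, Optional

Frame = List[List[float]]


def running_difference(current: Frame, previous: Frame) -> Frame:
    """Subtract the previous frame from the current one, pixel by pixel."""
    return [
        [cur - prev for cur, prev in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]


# Hypothetical in-memory store of image data keyed by raw file name.
frames: Dict[str, Frame] = {
    "c3_0001.fts": [[10.0, 12.0], [11.0, 13.0]],
    "c3_0002.fts": [[11.0, 12.0], [15.0, 13.0]],
}

# Maps each file to its predecessor, mirroring previous_file_name.
previous_file_name: Dict[str, Optional[str]] = {
    "c3_0001.fts": None,
    "c3_0002.fts": "c3_0001.fts",
}


def difference_for(name: str) -> Optional[Frame]:
    """Return the running difference, or None when no predecessor exists."""
    previous = previous_file_name[name]
    if previous is None:
        return None
    return running_difference(frames[name], frames[previous])
```

Storing the predecessor on the entity means the pairing does not have to be re-derived from directory listings at processing time.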
Design Notes¶
The lifecycle model is intentionally deterministic and state-driven.
The processing pipeline never infers workflow state from filesystem conditions alone — all orchestration decisions are derived from persisted lifecycle state.
This provides:
- resumability
- idempotent execution
- crash recovery
- workflow observability
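One way to picture state-driven orchestration is a planner that looks only at persisted statuses. The status names below come from this page; the action names, the `NEXT_ACTION` mapping, and the `plan` function are illustrative assumptions, not the pipeline's actual scheduler.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from persisted status to the scheduler's next step.
# Statuses missing from the mapping are treated as needing no new work here.
NEXT_ACTION: Dict[str, str] = {
    "DISCOVERED": "download",
    "DOWNLOADING_FAILED": "retry_download",
    "READY": "process",
    "PROCESSING_FAILED": "retry_processing",
}


def plan(persisted: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (file name, action) pairs for every file that needs work."""
    return [
        (name, NEXT_ACTION[status])
        for name, status in sorted(persisted.items())
        if status in NEXT_ACTION
    ]


# Stand-in for rows loaded from the database after a restart.
persisted_state = {
    "a.fts": "PROCESSED",
    "b.fts": "READY",
    "c.fts": "DOWNLOADING_FAILED",
}
work = plan(persisted_state)
```

Running `plan` again on the same persisted state yields the same work list, which is what makes a restart after a crash safe to repeat.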
API Reference¶
backend.database.domain.processed_file.ProcessedFile dataclass ¶
Domain entity representing the processing state of a raw image file.
Tracks the transformation of a raw file into its processed output, including hash integrity, storage paths, retry attempts, and lifecycle status within the pipeline.
Attributes:
| Name | Type | Description |
|---|---|---|
| raw_file_name | str | Original raw file name; primary key. |
| raw_file_hash | Optional[str] | Content hash of the original raw file. |
| raw_file_path | Optional[Path] | Storage path of the raw file. |
| raw_file_size | Optional[int] | Size of the raw file in bytes. |
| processed_file_name | Optional[str] | Processed file name. |
| processed_file_hash | Optional[str] | Content hash of the processed file. |
| processed_file_path | Optional[Path] | Storage path of the processed file. |
| processed_file_size | Optional[int] | Size of the processed file in bytes. |
| datetime_of_observation | datetime | Date and time of the observation. |
| instrument | Instrument | Instrument used for the observation. |
| status | FileStatus | Current processing lifecycle state. |
| error_message | Optional[str] | Error details if processing failed. |
| downloaded_at | Optional[datetime] | UTC timestamp when the file was downloaded. |
| last_downloading_attempt_at | Optional[datetime] | UTC timestamp of the most recent downloading attempt. |
| downloading_attempt_count | int | Number of downloading attempts made. |
| processed_at | Optional[datetime] | UTC timestamp when processing completed. |
| last_processing_attempt_at | Optional[datetime] | UTC timestamp of the most recent processing attempt. |
| processing_attempt_count | int | Number of processing attempts made. |
| previous_file_name | Optional[str] | Name of the preceding file, used for C3 running-difference processing. |
Invariants
- status is always a member of the FileStatus enum
can_retry_downloading ¶
can_retry_downloading(max_downloading_attempts)
Determines whether downloading can be retried based on attempt limits.
:param max_downloading_attempts: Maximum allowed download attempts.
:return: True if a download retry is allowed, False otherwise.
can_retry_processing ¶
can_retry_processing(max_processing_attempts)
Determines whether processing can be retried based on attempt limits.
:param max_processing_attempts: Maximum allowed processing attempts.
:return: True if a processing retry is allowed, False otherwise.
can_transition ¶
can_transition(new_status)
Checks whether the file can legally transition from its current status to the given new status, based on the lifecycle state machine.
:param new_status: Target status to validate the transition against.
:return: True if the transition is allowed, False otherwise.
from_row classmethod ¶
from_row(row)
Creates a ProcessedFile domain entity from a database row.
:param row: Database row containing processed_file table data.
:return: Constructed domain entity populated from the database row.
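A hedged sketch of what a row-to-entity conversion can involve. The storage format is an assumption here (paths and timestamps as text, status stored by enum name), the row is modeled as a plain mapping, and `FileSketch` carries only four of the documented fields.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from pathlib import Path
from typing import Any, Mapping, Optional


class FileStatus(Enum):
    DOWNLOADED = "DOWNLOADED"  # subset of the real enum


@dataclass(frozen=True)
class FileSketch:
    """Hypothetical stand-in carrying a few ProcessedFile fields."""

    raw_file_name: str
    raw_file_path: Optional[Path]
    status: FileStatus
    downloaded_at: Optional[datetime]

    @classmethod
    def from_row(cls, row: Mapping[str, Any]) -> "FileSketch":
        # Assumption: NULL columns arrive as None and must stay None.
        path = row["raw_file_path"]
        stamp = row["downloaded_at"]
        return cls(
            raw_file_name=row["raw_file_name"],
            raw_file_path=Path(path) if path is not None else None,
            status=FileStatus[row["status"]],  # lookup by member name
            downloaded_at=(
                datetime.fromisoformat(stamp) if stamp is not None else None
            ),
        )


entity = FileSketch.from_row(
    {
        "raw_file_name": "example.fts",
        "raw_file_path": "raw/example.fts",
        "status": "DOWNLOADED",
        "downloaded_at": "2024-01-01T00:00:00+00:00",
    }
)
```

An unknown status string raises `KeyError` in this sketch, which keeps the invariant that status is always a `FileStatus` member.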
identity ¶
identity()
Returns the unique identity of the processed file domain entity.
The raw file name acts as the natural identity since it uniquely represents the source file across the pipeline.
:returns: Raw file name (primary identity).
is_download_complete ¶
is_download_complete()
Checks whether the raw file has been successfully downloaded and is ready for further pipeline decisions.
A file is considered download-complete once it has reached the DOWNLOADED state.
:return: True if file status is DOWNLOADED, False otherwise.
is_terminal ¶
is_terminal()
Checks whether the current status of the file is a terminal lifecycle status. Terminal states indicate that no further processing or retries will occur.
:return: True if the file is in a terminal state, False otherwise.
transition_to ¶
transition_to(new_status)
Creates a new immutable ProcessedFile instance with updated status after validating that the transition is allowed.
:param new_status: Target lifecycle status.
:return: New instance with the updated status.
Raises:
ValueError: If the transition is not allowed by the lifecycle rules.
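The validate-then-copy behavior of can_transition and transition_to can be sketched as a lookup in a transition table. The edges in `ALLOWED` are illustrative assumptions, since the actual state machine is not listed on this page; only the empty sets for terminal states and the `ValueError` follow directly from the documentation. `FileSketch` is a hypothetical stand-in.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, FrozenSet


class FileStatus(Enum):
    # Processing-side subset of the documented states.
    READY = "READY"
    PROCESSING = "PROCESSING"
    PROCESSED = "PROCESSED"
    PROCESSING_FAILED = "PROCESSING_FAILED"
    ABANDONED = "ABANDONED"


# Illustrative edges only; the real state machine may allow different moves.
ALLOWED: Dict[FileStatus, FrozenSet[FileStatus]] = {
    FileStatus.READY: frozenset({FileStatus.PROCESSING}),
    FileStatus.PROCESSING: frozenset(
        {FileStatus.PROCESSED, FileStatus.PROCESSING_FAILED}
    ),
    FileStatus.PROCESSING_FAILED: frozenset(
        {FileStatus.PROCESSING, FileStatus.ABANDONED}
    ),
    FileStatus.PROCESSED: frozenset(),  # terminal: no outgoing edges
    FileStatus.ABANDONED: frozenset(),  # terminal: no outgoing edges
}


@dataclass(frozen=True)
class FileSketch:
    raw_file_name: str
    status: FileStatus

    def can_transition(self, new_status: FileStatus) -> bool:
        return new_status in ALLOWED[self.status]

    def transition_to(self, new_status: FileStatus) -> "FileSketch":
        if not self.can_transition(new_status):
            raise ValueError(
                f"Illegal transition: {self.status.name} -> {new_status.name}"
            )
        # Return a copy; the current instance stays unchanged.
        return replace(self, status=new_status)


ready = FileSketch("example.fts", FileStatus.READY)
done = ready.transition_to(FileStatus.PROCESSING).transition_to(
    FileStatus.PROCESSED
)

# A terminal state rejects any further transition.
try:
    done.transition_to(FileStatus.PROCESSING)
    rejected = False
except ValueError:
    rejected = True
```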