
ProcessedFile

ProcessedFile is the central lifecycle entity responsible for tracking the complete state evolution of LASCO image files throughout the pipeline.

It models:

  • raw file discovery
  • downloading
  • processing readiness
  • image processing execution
  • retry handling
  • terminal completion states

Responsibilities

The entity maintains lifecycle consistency for:

  • raw FITS files
  • processed image outputs
  • retry attempts
  • processing timestamps
  • failure recovery
  • predecessor relationships (C3 running difference)

Lifecycle Diagram

[Figure: Processed File Lifecycle]


Core Guarantees

Immutable Transitions

All lifecycle transitions return a new immutable entity instance.

This prevents accidental mutation and ensures predictable state evolution.
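In Python this guarantee is commonly obtained with a frozen dataclass, where `dataclasses.replace` builds a modified copy instead of changing the original. The minimal sketch below assumes ProcessedFile is declared with `frozen=True`, which this page does not confirm. `ProcessedFileSketch`, the two-member `FileStatus` subset, and the file name are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from enum import Enum


class FileStatus(Enum):
    # Hypothetical subset of the real FileStatus enum, for illustration only.
    DOWNLOADED = "DOWNLOADED"
    READY = "READY"


@dataclass(frozen=True)
class ProcessedFileSketch:
    # Hypothetical, trimmed-down stand-in for ProcessedFile.
    raw_file_name: str
    status: FileStatus


original = ProcessedFileSketch("example_raw.fts", FileStatus.DOWNLOADED)

# A transition builds a new instance; the original is left untouched.
updated = replace(original, status=FileStatus.READY)

assert original.status is FileStatus.DOWNLOADED
assert updated.status is FileStatus.READY
assert updated is not original

# In-place mutation is rejected on a frozen dataclass.
try:
    original.status = FileStatus.READY
except FrozenInstanceError:
    print("mutation rejected")
```

Because old instances never change, any code still holding a reference to `original` keeps seeing a consistent snapshot.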

Retry-Safe Processing

The entity explicitly tracks:

  • download retry count
  • processing retry count
  • timestamps of the most recent download and processing attempts

This lets the pipeline resume after a failure and retry only the files that still need work, without redoing completed downloads or processing.
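Because the counters and timestamps live on the persisted entity, a restarted worker can see how many attempts a file has already had. A minimal sketch of recording attempts, assuming each attempt increments the relevant counter and stamps a UTC time; `RetryTracking`, `record_downloading_attempt`, and `record_processing_attempt` are hypothetical names, not part of the documented API.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RetryTracking:
    # Hypothetical subset of ProcessedFile's retry-tracking fields.
    raw_file_name: str
    downloading_attempt_count: int = 0
    last_downloading_attempt_at: Optional[datetime] = None
    processing_attempt_count: int = 0
    last_processing_attempt_at: Optional[datetime] = None


def record_downloading_attempt(entity: RetryTracking, now: datetime) -> RetryTracking:
    # Hypothetical helper: bump the download counter and stamp the attempt time.
    return replace(
        entity,
        downloading_attempt_count=entity.downloading_attempt_count + 1,
        last_downloading_attempt_at=now,
    )


def record_processing_attempt(entity: RetryTracking, now: datetime) -> RetryTracking:
    # Hypothetical helper: same idea for processing attempts.
    return replace(
        entity,
        processing_attempt_count=entity.processing_attempt_count + 1,
        last_processing_attempt_at=now,
    )


t0 = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
t1 = datetime(2024, 1, 1, 0, 5, tzinfo=timezone.utc)

entity = RetryTracking("example_raw.fts")
entity = record_downloading_attempt(entity, t0)
entity = record_downloading_attempt(entity, t1)  # e.g. a retry after a failed download
entity = record_processing_attempt(entity, t1)

print(entity.downloading_attempt_count, entity.last_downloading_attempt_at.minute)  # 2 5
print(entity.processing_attempt_count)  # 1
```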

Terminal State Enforcement

Terminal states prevent further transitions once processing is complete or permanently abandoned.

Terminal states include:

  • PROCESSED
  • SKIPPED
  • IGNORE
  • ABANDONED

State Categories

  • Discovery: DISCOVERED
  • Downloading: DOWNLOADING, DOWNLOADED, DOWNLOADING_FAILED
  • Processing Preparation: READY
  • Processing: PROCESSING, PROCESSED, PROCESSING_FAILED
  • Terminal: SKIPPED, IGNORE, ABANDONED
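The grouping above, together with the terminal set from Terminal State Enforcement, can be expressed as a Python enum plus lookup tables. This is a sketch: the member names come from this page, but the string values and the helper names `category_of` and `is_terminal` (as a free function) are assumptions. Note that PROCESSED is terminal even though it is grouped under Processing.

```python
from enum import Enum


class FileStatus(Enum):
    # Member names follow this page; the string values are an assumption.
    DISCOVERED = "DISCOVERED"
    DOWNLOADING = "DOWNLOADING"
    DOWNLOADED = "DOWNLOADED"
    DOWNLOADING_FAILED = "DOWNLOADING_FAILED"
    READY = "READY"
    PROCESSING = "PROCESSING"
    PROCESSED = "PROCESSED"
    PROCESSING_FAILED = "PROCESSING_FAILED"
    SKIPPED = "SKIPPED"
    IGNORE = "IGNORE"
    ABANDONED = "ABANDONED"


# Category grouping as listed under State Categories.
CATEGORIES = {
    "Discovery": {FileStatus.DISCOVERED},
    "Downloading": {FileStatus.DOWNLOADING, FileStatus.DOWNLOADED, FileStatus.DOWNLOADING_FAILED},
    "Processing Preparation": {FileStatus.READY},
    "Processing": {FileStatus.PROCESSING, FileStatus.PROCESSED, FileStatus.PROCESSING_FAILED},
    "Terminal": {FileStatus.SKIPPED, FileStatus.IGNORE, FileStatus.ABANDONED},
}

# Terminal states as listed under Terminal State Enforcement.
TERMINAL_STATES = frozenset(
    {FileStatus.PROCESSED, FileStatus.SKIPPED, FileStatus.IGNORE, FileStatus.ABANDONED}
)


def category_of(status: FileStatus) -> str:
    for name, members in CATEGORIES.items():
        if status in members:
            return name
    raise ValueError(f"uncategorised status: {status}")


def is_terminal(status: FileStatus) -> bool:
    return status in TERMINAL_STATES


# Every status belongs to exactly one category: all are covered, none twice.
assert set().union(*CATEGORIES.values()) == set(FileStatus)
assert sum(len(m) for m in CATEGORIES.values()) == len(FileStatus)

print(category_of(FileStatus.PROCESSED), is_terminal(FileStatus.PROCESSED))  # Processing True
print(category_of(FileStatus.PROCESSING_FAILED), is_terminal(FileStatus.PROCESSING_FAILED))  # Processing False
```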

C3 Processing Relationship

For LASCO C3 processing, the entity also stores:

  • previous_file_name

This enables running-difference processing between sequential observations.
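Assigning predecessors can be sketched as sorting C3 files by `datetime_of_observation` and linking each to the file observed just before it. This is an assumed rule, not the pipeline's documented logic; `C3File` and `link_predecessors` are hypothetical. The difference image itself (current frame minus previous frame) would be computed later by the image-processing step.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class C3File:
    # Hypothetical subset of ProcessedFile fields relevant to predecessor linking.
    raw_file_name: str
    datetime_of_observation: datetime
    previous_file_name: Optional[str] = None


def link_predecessors(files: List[C3File]) -> List[C3File]:
    """Link each file to the observation immediately before it (assumed rule)."""
    ordered = sorted(files, key=lambda f: f.datetime_of_observation)
    linked = []
    previous: Optional[C3File] = None
    for f in ordered:
        # The earliest file has no predecessor and keeps previous_file_name=None.
        prev_name = previous.raw_file_name if previous else None
        linked.append(replace(f, previous_file_name=prev_name))
        previous = f
    return linked


files = [
    C3File("c3_b.fts", datetime(2024, 1, 1, 0, 12)),
    C3File("c3_a.fts", datetime(2024, 1, 1, 0, 0)),
    C3File("c3_c.fts", datetime(2024, 1, 1, 0, 24)),
]
for f in link_predecessors(files):
    print(f.raw_file_name, "<-", f.previous_file_name)
# c3_a.fts <- None
# c3_b.fts <- c3_a.fts
# c3_c.fts <- c3_b.fts
```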


Design Notes

The lifecycle model is intentionally deterministic and state-driven.

The processing pipeline never infers workflow state from filesystem conditions alone — all orchestration decisions are derived from persisted lifecycle state.

This provides:

  • resumability
  • idempotent execution
  • crash recovery
  • workflow observability
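The idea of deriving decisions only from persisted state can be illustrated with a pure function from stored statuses to next actions: calling it twice on the same database snapshot yields the same plan, which is what makes reruns idempotent. The `NEXT_ACTION` table and action names are hypothetical; the real orchestrator's rules, including how attempt limits are applied, are not documented on this page.

```python
from enum import Enum


class FileStatus(Enum):
    # Hypothetical subset of FileStatus used by this sketch.
    DISCOVERED = "DISCOVERED"
    DOWNLOADING_FAILED = "DOWNLOADING_FAILED"
    DOWNLOADED = "DOWNLOADED"
    READY = "READY"
    PROCESSING_FAILED = "PROCESSING_FAILED"
    PROCESSED = "PROCESSED"


# Hypothetical mapping from persisted status to the next pipeline action.
NEXT_ACTION = {
    FileStatus.DISCOVERED: "download",
    FileStatus.DOWNLOADING_FAILED: "retry_download",
    FileStatus.DOWNLOADED: "prepare",
    FileStatus.READY: "process",
    FileStatus.PROCESSING_FAILED: "retry_process",
    FileStatus.PROCESSED: "none",
}


def plan(persisted: dict) -> dict:
    """Derive actions purely from stored statuses; no filesystem checks."""
    return {name: NEXT_ACTION[status] for name, status in persisted.items()}


# Statuses as they might be loaded from the database after a crash.
persisted = {
    "a.fts": FileStatus.PROCESSED,
    "b.fts": FileStatus.READY,
    "c.fts": FileStatus.DOWNLOADING_FAILED,
}

first = plan(persisted)
second = plan(persisted)
assert first == second  # same stored state, same decisions
print(first)
# {'a.fts': 'none', 'b.fts': 'process', 'c.fts': 'retry_download'}
```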

API Reference

backend.database.domain.processed_file.ProcessedFile dataclass

Domain entity representing the processing state of a raw image file.

Tracks the transformation of a raw file into its processed output, including hash integrity, storage paths, retry attempts, and lifecycle status within the pipeline.

Attributes:

  • raw_file_name (str): Original raw file name; primary key.
  • raw_file_hash (Optional[str]): Content hash of the original raw file.
  • raw_file_path (Optional[Path]): Storage path of the raw file.
  • raw_file_size (Optional[int]): Size of the raw file in bytes.
  • processed_file_name (Optional[str]): Name of the processed file.
  • processed_file_hash (Optional[str]): Content hash of the processed file.
  • processed_file_path (Optional[Path]): Storage path of the processed file.
  • processed_file_size (Optional[int]): Size of the processed file in bytes.
  • datetime_of_observation (datetime): Date and time of the observation.
  • instrument (Instrument): Instrument used for the observation.
  • status (FileStatus): Current processing lifecycle state.
  • error_message (Optional[str]): Error details if processing failed.
  • downloaded_at (Optional[datetime]): UTC timestamp when the file was downloaded.
  • last_downloading_attempt_at (Optional[datetime]): UTC timestamp of the most recent download attempt.
  • downloading_attempt_count (int): Number of download attempts made.
  • processed_at (Optional[datetime]): UTC timestamp when processing completed.
  • last_processing_attempt_at (Optional[datetime]): UTC timestamp of the most recent processing attempt.
  • processing_attempt_count (int): Number of processing attempts made.
  • previous_file_name (Optional[str]): Name of the previous file in the observation sequence, used for LASCO C3 running-difference processing.

Invariants
  • status is always a member of the FileStatus enum.

can_retry_downloading

can_retry_downloading(max_downloading_attempts)

Determines whether downloading can be retried based on the attempt limit.

:param max_downloading_attempts: Maximum allowed download attempts.
:return: True if a download retry is allowed, False otherwise.

can_retry_processing

can_retry_processing(max_processing_attempts)

Determines whether processing can be retried based on the attempt limit.

:param max_processing_attempts: Maximum allowed processing attempts.
:returns: True if a processing retry is allowed, False otherwise.
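A minimal sketch of both checks, assuming a retry is allowed while the stored attempt count is strictly below the given maximum. The exact comparison, and whether the real methods also inspect the current status, are assumptions; `AttemptCounters` is a hypothetical stand-in for the entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttemptCounters:
    # Hypothetical stand-in holding only the attempt-count fields.
    downloading_attempt_count: int = 0
    processing_attempt_count: int = 0

    def can_retry_downloading(self, max_downloading_attempts: int) -> bool:
        # Assumed rule: another attempt is allowed while the count is below the limit.
        return self.downloading_attempt_count < max_downloading_attempts

    def can_retry_processing(self, max_processing_attempts: int) -> bool:
        # Same assumed rule, applied to processing attempts.
        return self.processing_attempt_count < max_processing_attempts


counters = AttemptCounters(downloading_attempt_count=2, processing_attempt_count=3)
print(counters.can_retry_downloading(3))  # True
print(counters.can_retry_processing(3))   # False
```

Passing the limit as an argument, rather than storing it on the entity, keeps retry policy configurable without changing persisted data.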

can_transition

can_transition(new_status)

Checks whether the file can legally transition from its current status to the given new status, according to the lifecycle state machine.

:param new_status: Target status to validate the transition against.
:return: True if the transition is allowed, False otherwise.

from_row classmethod

from_row(row)

Creates a ProcessedFile domain entity from a database row.

:param row: Database row containing processed_file table data.
:return: Domain entity populated from the database row.
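A sketch of `from_row` for a subset of columns, using an in-memory SQLite table as a stand-in. The actual database engine, the column encodings (ISO 8601 timestamps, status stored as the enum value), and the full field list are assumptions here; `ProcessedFileSketch` and the sample values are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from pathlib import Path
from typing import Optional


class FileStatus(Enum):
    # Hypothetical subset; the real enum has more members.
    DOWNLOADED = "DOWNLOADED"
    PROCESSED = "PROCESSED"


@dataclass(frozen=True)
class ProcessedFileSketch:
    # Hypothetical subset of ProcessedFile's attributes.
    raw_file_name: str
    raw_file_path: Optional[Path]
    datetime_of_observation: datetime
    status: FileStatus
    downloading_attempt_count: int
    previous_file_name: Optional[str]

    @classmethod
    def from_row(cls, row: sqlite3.Row) -> "ProcessedFileSketch":
        # Convert stored primitives back into domain types (assumed encodings).
        path = row["raw_file_path"]
        return cls(
            raw_file_name=row["raw_file_name"],
            raw_file_path=Path(path) if path is not None else None,
            datetime_of_observation=datetime.fromisoformat(row["datetime_of_observation"]),
            status=FileStatus(row["status"]),
            downloading_attempt_count=row["downloading_attempt_count"],
            previous_file_name=row["previous_file_name"],
        )


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows support access by column name
conn.execute(
    "CREATE TABLE processed_file (raw_file_name TEXT PRIMARY KEY, raw_file_path TEXT, "
    "datetime_of_observation TEXT, status TEXT, downloading_attempt_count INTEGER, "
    "previous_file_name TEXT)"
)
conn.execute(
    "INSERT INTO processed_file VALUES (?, ?, ?, ?, ?, ?)",
    ("c3_b.fts", "/data/raw/c3_b.fts", "2024-01-01T00:12:00", "DOWNLOADED", 1, "c3_a.fts"),
)
row = conn.execute("SELECT * FROM processed_file").fetchone()
entity = ProcessedFileSketch.from_row(row)
print(entity.status, entity.raw_file_path.name)
```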

identity

identity()

Returns the unique identity of the processed file domain entity.

The raw file name acts as the natural identity since it uniquely represents the source file across the pipeline.

:returns: Raw file name (primary identity).

is_download_complete

is_download_complete()

Checks whether the raw file has been successfully downloaded and is ready for further pipeline decisions.

A file is considered download complete if it has reached the DOWNLOADED state.

:return: True if file status is DOWNLOADED, False otherwise.

is_terminal

is_terminal()

Checks whether the file's current status is a terminal lifecycle status. Terminal states indicate that no further processing or retries will occur.

:return: True if the file is in a terminal state, False otherwise.

transition_to

transition_to(new_status)

Creates a new immutable ProcessedFile instance with updated status after validating that the transition is allowed.

:param new_status: Target lifecycle status.
:return: New instance with the updated status.
Raises:
    ValueError: If the transition is not allowed by the lifecycle rules.
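`can_transition` and `transition_to` can be sketched together with an explicit transition table. The `ALLOWED` map below is hypothetical: this page lists the states and the terminal set but not the permitted edges. The sketch follows the documented contract: validate first, then return a new instance via `dataclasses.replace`, or raise `ValueError` if the move is illegal.

```python
from dataclasses import dataclass, replace
from enum import Enum


class FileStatus(Enum):
    DISCOVERED = "DISCOVERED"
    DOWNLOADING = "DOWNLOADING"
    DOWNLOADED = "DOWNLOADED"
    DOWNLOADING_FAILED = "DOWNLOADING_FAILED"
    READY = "READY"
    PROCESSING = "PROCESSING"
    PROCESSED = "PROCESSED"
    PROCESSING_FAILED = "PROCESSING_FAILED"
    SKIPPED = "SKIPPED"
    IGNORE = "IGNORE"
    ABANDONED = "ABANDONED"


# Hypothetical transition table. Terminal states (PROCESSED, SKIPPED,
# IGNORE, ABANDONED) have no entry, so they have no outgoing edges.
ALLOWED = {
    FileStatus.DISCOVERED: {FileStatus.DOWNLOADING, FileStatus.SKIPPED, FileStatus.IGNORE},
    FileStatus.DOWNLOADING: {FileStatus.DOWNLOADED, FileStatus.DOWNLOADING_FAILED},
    FileStatus.DOWNLOADING_FAILED: {FileStatus.DOWNLOADING, FileStatus.ABANDONED},
    FileStatus.DOWNLOADED: {FileStatus.READY, FileStatus.SKIPPED},
    FileStatus.READY: {FileStatus.PROCESSING},
    FileStatus.PROCESSING: {FileStatus.PROCESSED, FileStatus.PROCESSING_FAILED},
    FileStatus.PROCESSING_FAILED: {FileStatus.PROCESSING, FileStatus.ABANDONED},
}


@dataclass(frozen=True)
class ProcessedFileSketch:
    # Hypothetical stand-in for ProcessedFile with only the fields needed here.
    raw_file_name: str
    status: FileStatus

    def can_transition(self, new_status: FileStatus) -> bool:
        return new_status in ALLOWED.get(self.status, set())

    def transition_to(self, new_status: FileStatus) -> "ProcessedFileSketch":
        if not self.can_transition(new_status):
            raise ValueError(f"Illegal transition {self.status.name} -> {new_status.name}")
        return replace(self, status=new_status)


f = ProcessedFileSketch("example_raw.fts", FileStatus.READY)
f2 = f.transition_to(FileStatus.PROCESSING).transition_to(FileStatus.PROCESSED)
print(f.status.name, f2.status.name)  # READY PROCESSED

try:
    f2.transition_to(FileStatus.PROCESSING)  # PROCESSED is terminal
except ValueError as exc:
    print(exc)  # Illegal transition PROCESSED -> PROCESSING
```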