Database Schema¶
This document defines the relational schema used in Comet Hunter.
The system is built on SQLite with a focus on deterministic behavior, idempotent ingestion, and efficient time-based querying.
Tables Overview¶
| Table Name | Purpose |
|---|---|
downlink_slot |
Tracks scheduled communication windows for data availability |
file_metadata |
Stores metadata of available LASCO files (from C2 and C3 coronagraphs) |
processed_file |
Tracks lifecycle of each file through ingestion and processing |
1. downlink_slot¶
Represents scheduled downlink windows during which image data is expected.
Schema¶
CREATE TABLE downlink_slot (
wk INTEGER NOT NULL,
doy INTEGER NOT NULL,
wdy TEXT NOT NULL,
bot_utc TEXT NOT NULL,
eot_utc TEXT NOT NULL,
ant TEXT,
status TEXT NOT NULL CHECK(status IN ('MISSED','PENDING','ACTIVE','DONE')),
PRIMARY KEY (wk, doy, wdy, bot_utc)
);
Indexes¶
CREATE INDEX idx_schedule_status_time
ON downlink_slot (status, bot_utc)
Notes¶
- Composite primary key ensures uniqueness per slot
bot_utcandeot_utcare stored in ISO 8601 UTC format- Status lifecycle (ideal): PENDING → ACTIVE → DONE
- Indexed for fast retrieval of active and upcoming slots
2. file_metadata¶
Stores metadata parsed from LASCO sources before file download
Schema¶
CREATE TABLE file_metadata (
raw_file_name TEXT PRIMARY KEY,
raw_file_hash TEXT,
datetime_of_observation TEXT NOT NULL,
last_modified_utc TEXT NOT NULL,
instrument TEXT NOT NULL,
exposure_time REAL NOT NULL,
width INTEGER NOT NULL,
height INTEGER NOT NULL,
roll REAL NOT NULL
);
Indexes¶
CREATE INDEX idx_file_metadata_observation_time
ON file_metadata (instrument, datetime_of_observation);
CREATE INDEX idx_file_metadata_modified_time
ON file_metadata (instrument, last_modified_utc);
CREATE INDEX idx_file_metadata_name
ON file_metadata (raw_file_name);
Notes¶
raw_file_nameis the identifierlast_modified_utcis used only for ingestiondatetime_of_observationis used for chronological playbackinstrumentvalues:C2,C3- Hash values are optional and maybe populated later
3. processed_file¶
Tracks full lifecycle of a file from discovery to processing
Schema¶
CREATE TABLE processed_file (
raw_file_name TEXT PRIMARY KEY,
raw_file_hash TEXT UNIQUE,
raw_file_path TEXT UNIQUE NOT NULL,
raw_file_size INTEGER,
processed_file_name TEXT UNIQUE,
processed_file_hash TEXT UNIQUE,
processed_file_path TEXT UNIQUE,
processed_file_size INTEGER,
datetime_of_observation TEXT NOT NULL,
instrument TEXT NOT NULL,
status TEXT NOT NULL,
error_message TEXT,
downloaded_at TEXT,
last_downloading_attempt_at TEXT,
downloading_attempt_count INTEGER NOT NULL DEFAULT 0,
processed_at TEXT,
last_processing_attempt_at TEXT,
processing_attempt_count INTEGER NOT NULL DEFAULT 0,
previous_file_name TEXT
);
Indexes¶
CREATE INDEX idx_processed_status_time
ON processed_file (status, datetime_of_observation);
Notes¶
raw_file_nameis the primary key- Separate tracking for download and processing retries
- Immutable state transitions enforced at domain level
- Supports idempotent re-execution
- Find detailed of State lifecycle here