Skip to content

Database Schema

This document defines the relational schema used in Comet Hunter.
The system is built on SQLite with a focus on deterministic behavior, idempotent ingestion, and efficient time-based querying.


Tables Overview

Table Name Purpose
downlink_slot Tracks scheduled communication windows for data availability
file_metadata Stores metadata of available LASCO files (from C2 and C3 coronagraphs)
processed_file Tracks lifecycle of each file through ingestion and processing

Represents scheduled downlink windows during which image data is expected.

Schema

CREATE TABLE downlink_slot (
    wk INTEGER NOT NULL,
    doy INTEGER NOT NULL,
    wdy TEXT NOT NULL,
    bot_utc TEXT NOT NULL,
    eot_utc TEXT NOT NULL,
    ant TEXT,
    status TEXT NOT NULL CHECK(status IN ('MISSED','PENDING','ACTIVE','DONE')),
    PRIMARY KEY (wk, doy, wdy, bot_utc)
);

Indexes

CREATE INDEX idx_schedule_status_time
ON downlink_slot (status, bot_utc)

Notes

  • Composite primary key ensures uniqueness per slot
  • bot_utc and eot_utc are stored in ISO 8601 UTC format
  • Status lifecycle (ideal): PENDING → ACTIVE → DONE
  • Indexed for fast retrieval of active and upcoming slots

2. file_metadata

Stores metadata parsed from LASCO sources before file download

Schema

CREATE TABLE file_metadata (
    raw_file_name TEXT PRIMARY KEY,
    raw_file_hash TEXT,
    datetime_of_observation TEXT NOT NULL,
    last_modified_utc TEXT NOT NULL,
    instrument TEXT NOT NULL,
    exposure_time REAL NOT NULL,
    width INTEGER NOT NULL,
    height INTEGER NOT NULL,
    roll REAL NOT NULL
);

Indexes

CREATE INDEX idx_file_metadata_observation_time
ON file_metadata (instrument, datetime_of_observation);

CREATE INDEX idx_file_metadata_modified_time
ON file_metadata (instrument, last_modified_utc);

CREATE INDEX idx_file_metadata_name
ON file_metadata (raw_file_name);

Notes

  • raw_file_name is the identifier
  • last_modified_utc is used only for ingestion
  • datetime_of_observation is used for chronological playback
  • instrument values: C2,C3
  • Hash values are optional and maybe populated later

3. processed_file

Tracks full lifecycle of a file from discovery to processing

Schema

CREATE TABLE processed_file (
    raw_file_name TEXT PRIMARY KEY,
    raw_file_hash TEXT UNIQUE,
    raw_file_path TEXT UNIQUE NOT NULL,
    raw_file_size INTEGER,

    processed_file_name TEXT UNIQUE,
    processed_file_hash TEXT UNIQUE,
    processed_file_path TEXT UNIQUE,
    processed_file_size INTEGER,

    datetime_of_observation TEXT NOT NULL,
    instrument TEXT NOT NULL,
    status TEXT NOT NULL,

    error_message TEXT,

    downloaded_at TEXT,
    last_downloading_attempt_at TEXT,
    downloading_attempt_count INTEGER NOT NULL DEFAULT 0,

    processed_at TEXT,
    last_processing_attempt_at TEXT,
    processing_attempt_count INTEGER NOT NULL DEFAULT 0,
    previous_file_name TEXT
);

Indexes

CREATE INDEX idx_processed_status_time
ON processed_file (status, datetime_of_observation);

Notes

  • raw_file_name is the primary key
  • Separate tracking for download and processing retries
  • Immutable state transitions enforced at domain level
  • Supports idempotent re-execution
  • Find detailed of State lifecycle here