Data Models

All models are Python dataclasses defined in ecgdatakit.models.

Import: from ecgdatakit import ECGRecord, Lead, PatientInfo, RecordingInfo, ...

ECGRecord

class ecgdatakit.models.ECGRecord

Bases: object

Unified ECG record returned by all parsers.

Every parser in ECGDataKit produces an ECGRecord. Use to_dict() or to_json() to obtain a format-agnostic, JSON-serialisable representation whose structure is the same regardless of the original file format.

Leads may hold raw ADC values (is_raw=True), for example when a file is parsed with auto_scale=False. Call to_physical() to convert all leads to physical voltage units, then convert_units() to switch between uV, mV, or V.

__init__(patient=<factory>, recording=<factory>, leads=<factory>, interpretation=<factory>, measurements=<factory>, median_beats=<factory>, annotations=<factory>, source_format='', raw_metadata=<factory>)

Parameters:

patient (PatientInfo)
recording (RecordingInfo)
leads (list[Lead])
interpretation (Interpretation)
measurements (GlobalMeasurements)
median_beats (list[Lead])
annotations (dict[str, str])
source_format (str)
raw_metadata (dict)

Return type:

None

patient: PatientInfo: Patient demographics.

recording: RecordingInfo: Recording session metadata (includes device and acquisition setup).

leads: list[Lead]: ECG lead waveforms.

interpretation: Interpretation: Machine or physician interpretation.

measurements: GlobalMeasurements: Global ECG interval/axis measurements.

median_beats: list[Lead]: Median/template beats, if available.

annotations: dict[str, str]: Additional key-value annotations.

source_format: str = '': Parser identifier (e.g. "hl7_aecg", "dicom").

raw_metadata: dict: Original format-specific metadata from the source file.

to_physical()

Convert all leads and median beats from raw ADC to physical units.

Returns a new ECGRecord in which every Lead has is_raw=False. Leads already in physical units are unchanged.

Return type: ECGRecord

convert_units(target)

Convert all leads and median beats to the specified voltage unit.

Parameters: target (str) – Target unit ("uV", "mV", "V").
Raises: RawSamplesError – If any lead is still raw ADC.
Return type: ECGRecord
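ECGRecord.to_physical() and convert_units() apply the lead-level operation to every entry in leads and median_beats, and they hand back a new record. The sketch below shows that pattern. It rests on assumptions: frozen stand-in dataclasses (SketchRecord, RawLead), a hypothetical map_leads helper and a local RawSamplesError stand-in. None of these is the library's own code.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from typing import Any, Callable


class RawSamplesError(Exception):
    """Hypothetical stand-in for ecgdatakit's RawSamplesError."""


@dataclass(frozen=True)
class SketchRecord:
    """Hypothetical record holding only the two lead lists."""
    leads: list[Any] = field(default_factory=list)
    median_beats: list[Any] = field(default_factory=list)


def map_leads(record: SketchRecord, fn: Callable[[Any], Any]) -> SketchRecord:
    """Return a new record with fn applied to every lead and median beat."""
    return replace(
        record,
        leads=[fn(lead) for lead in record.leads],
        median_beats=[fn(beat) for beat in record.median_beats],
    )


def ensure_physical(record: SketchRecord) -> None:
    """Raise if any lead or median beat is still raw, as convert_units() does."""
    for lead in [*record.leads, *record.median_beats]:
        if lead.is_raw:
            raise RawSamplesError(f"{lead.label!r} is still raw ADC")


@dataclass(frozen=True)
class RawLead:
    """Hypothetical minimal lead: just a label and the is_raw flag."""
    label: str
    is_raw: bool = True


record = SketchRecord(leads=[RawLead("I"), RawLead("II")], median_beats=[RawLead("II")])
physical = map_leads(record, lambda lead: replace(lead, is_raw=False))
ensure_physical(physical)  # passes: nothing is raw any more
```

Building a new record instead of mutating the input keeps the parsed original available. This matches the documented "returns a new ECGRecord" behaviour.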

plot(show=True, rows=None, cols=None, **kwargs)

Plot the ECG record with a patient/device header and all leads.

Parameters:

show (bool) – Display the plot immediately (default True).
rows (int | None) – Number of rows in the subplot grid.
cols (int | None) – Number of columns in the subplot grid.
**kwargs – Extra arguments forwarded to the underlying plot function (e.g. figsize, x_axis).

to_dict(include_samples=True)

Convert the record to the unified JSON schema.

Parameters: include_samples (bool) – If True (default), each lead contains its full sample array. Set to False for metadata-only export.
Return type: dict

to_json(include_samples=True, indent=2)

Serialise the record to a JSON string.

Parameters:

include_samples (bool) – Include full sample arrays (default True).
indent (int | None) – JSON indentation level. None for compact output.

Return type:

str
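The library defines the exact schema that to_dict() produces. The sketch below illustrates only two ideas from this page: the include_samples switch, and JSON output for datetime fields. It uses hypothetical dataclasses (MiniRecord and MiniLead, with an assumed recorded_at field) together with the standard dataclasses.asdict and json.dumps.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from datetime import datetime


@dataclass
class MiniLead:
    label: str
    sampling_rate: int
    samples: list[float] = field(default_factory=list)


@dataclass
class MiniRecord:
    source_format: str = ""
    recorded_at: datetime | None = None  # hypothetical field name
    leads: list[MiniLead] = field(default_factory=list)


def record_to_dict(record: MiniRecord, include_samples: bool = True) -> dict:
    data = asdict(record)  # recursive copy of nested dataclasses and lists
    if not include_samples:
        for lead in data["leads"]:
            del lead["samples"]  # metadata-only export
    return data


def record_to_json(record: MiniRecord, include_samples: bool = True, indent: int | None = 2) -> str:
    # json cannot encode datetime objects natively; emit ISO 8601 strings instead
    def encode(obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        raise TypeError(f"not JSON-serialisable: {type(obj).__name__}")

    return json.dumps(record_to_dict(record, include_samples), indent=indent, default=encode)
```

Because asdict works on a copy, dropping "samples" for a metadata-only export leaves the original record untouched.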

Lead

class ecgdatakit.models.Lead

Bases: object

Single ECG lead with signal data.

Resolution and scaling

ECG file formats store a raw ADC resolution value in format-specific units (e.g. nV/count for ISHNE and SCP-ECG, µV/count for Sierra XML). The parser converts this to a normalised scale factor stored in resolution, expressed in the unit given by resolution_unit:

physical_value = samples * resolution + offset   (in resolution_unit)

The original, unconverted value from the file is preserved in adc_resolution for reference.

Example: ISHNE file with ampl_res = 153 (nV/count):

adc_resolution = 153.0 (raw file value, nV/count)
resolution = 0.153 (converted: 153 / 1000, µV/count)
resolution_unit = "uV"
units = "" while the samples are raw; "uV" after to_physical()
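The ISHNE numbers above can be checked with a few lines of plain Python. This is a sketch only. The helper names are hypothetical, and lists stand in for the NumPy array that Lead.samples actually holds.

```python
from __future__ import annotations


def nv_to_uv_resolution(adc_resolution_nv: float) -> float:
    """Normalise an ADC resolution from nV/count to uV/count (divide by 1000)."""
    return adc_resolution_nv / 1000


def counts_to_physical(counts: list[float], resolution: float, offset: float = 0.0) -> list[float]:
    """Apply physical = samples * resolution + offset to every sample."""
    return [c * resolution + offset for c in counts]


resolution = nv_to_uv_resolution(153.0)  # 0.153 uV per count
physical_uv = counts_to_physical([1000, -2000, 0], resolution)  # about 153, -306 and 0 uV
```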

Auto-detection of is_raw

Parsers set is_raw automatically. When resolution == 1.0 and offset == 0.0, the samples are already in physical units (is_raw=False); otherwise they are raw ADC counts (is_raw=True) that need scaling via to_physical().

__init__(label, samples, sampling_rate, resolution=1.0, resolution_unit='', offset=0.0, units='', is_raw=True, adc_resolution=0.0, adc_resolution_unit='', quality=None, transducer='', prefiltering='', annotations=<factory>)

Parameters:

label (str)
samples (ndarray[tuple[Any, ...], dtype[float64]])
sampling_rate (int)
resolution (float)
resolution_unit (str)
offset (float)
units (str)
is_raw (bool)
adc_resolution (float)
adc_resolution_unit (str)
quality (int | None)
transducer (str)
prefiltering (str)
annotations (dict[str, str])

Return type:

None

label: str: Lead name (e.g. "I", "V1").

samples: ndarray[tuple[Any, ...], dtype[float64]]: Signal sample values (raw ADC or physical, depending on is_raw).

sampling_rate: int: Samples per second (Hz).

resolution: float = 1.0: Normalised scale factor for ADC-to-physical conversion, in the unit given by resolution_unit. Computed from adc_resolution by the parser (e.g. adc_resolution / 1000 for nV → µV). Used by to_physical(): physical = samples * resolution + offset.

resolution_unit: str = '': Unit of the resolution scale factor (e.g. "uV", "mV"). After to_physical(), the resulting samples are in this unit. Set by the parser based on the format specification.

offset: float = 0.0: Additive offset for ADC-to-physical conversion (default 0.0). Used by to_physical(): physical = samples * resolution + offset.

units: str = '': Current unit of samples. Empty when is_raw=True (samples are dimensionless ADC counts). Set to the physical unit after to_physical() or convert_units() is called (e.g. "uV", "mV").

is_raw: bool = True: True if samples are raw ADC counts needing scaling, False if samples are already in physical units. Parsers set this automatically: is_raw = not (resolution == 1.0 and offset == 0.0).

adc_resolution: float = 0.0: Original ADC resolution exactly as stored in the source file, before any unit conversion. For example, ISHNE stores nV/count and SCP-ECG stores nV/unit; this field preserves that raw value (e.g. 153.0 for 153 nV/count). The converted value used for scaling is in resolution.

adc_resolution_unit: str = '': Unit of adc_resolution as defined by the source format (e.g. "nV" for ISHNE and SCP-ECG).

quality: int | None = None: Signal quality indicator (format-specific).

transducer: str = '': Transducer type.

prefiltering: str = '': Pre-filtering description.

annotations: dict[str, str]: Per-lead measurements/annotations (format-specific key-value pairs).

to_physical()

Convert raw ADC samples to physical voltage units.

Applies physical = samples * resolution + offset and returns a new Lead with is_raw=False. If this lead is already in physical units, it returns self unchanged.

Raises: ValueError – If resolution is zero (conversion undefined).
Return type: Lead
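The sketch below follows the documented to_physical() rules:

- A lead already in physical units comes back unchanged.
- A zero resolution raises ValueError.
- Otherwise the samples are scaled and the lead adopts resolution_unit.

SketchLead is a hypothetical frozen dataclass with only the relevant fields, and a tuple stands in for the NumPy array. Setting units to resolution_unit follows the field descriptions above, but it is an assumption about the implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchLead:
    samples: tuple[float, ...]
    resolution: float = 1.0
    offset: float = 0.0
    resolution_unit: str = ""
    units: str = ""
    is_raw: bool = True


def to_physical(lead: SketchLead) -> SketchLead:
    if not lead.is_raw:
        return lead  # already physical: the same object comes back
    if lead.resolution == 0:
        raise ValueError("resolution is zero; conversion undefined")
    scaled = tuple(s * lead.resolution + lead.offset for s in lead.samples)
    # The scaled samples are expressed in resolution_unit, which becomes units
    return replace(lead, samples=scaled, units=lead.resolution_unit, is_raw=False)
```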

convert_units(target)

Convert between physical voltage units (uV, mV, V).

Parameters:

target (str) – Target unit string ("uV", "mV", "V" and common aliases like "µV").

Returns:

A new Lead with samples scaled to target.

Return type:

Lead

Raises:

RawSamplesError – If samples are still raw ADC (is_raw=True).
ValueError – If the current or target unit is not a recognised voltage unit.
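The unit arithmetic behind convert_units() is a power-of-ten rescaling. Below is a sketch on plain lists, assuming a hypothetical convert_samples helper and a local RawSamplesError stand-in. Which alias spellings the library accepts beyond "µV" is not documented here, so the alias table is also an assumption.

```python
from __future__ import annotations


class RawSamplesError(Exception):
    """Hypothetical stand-in for ecgdatakit's RawSamplesError."""


# Power-of-ten exponent of each voltage unit relative to volts.
# Accepting both the micro sign (U+00B5) and Greek mu (U+03BC) is an assumption.
_EXPONENT = {"V": 0, "mV": -3, "uV": -6, "\u00b5V": -6, "\u03bcV": -6}


def convert_samples(samples: list[float], units: str, target: str, is_raw: bool) -> list[float]:
    if is_raw:
        raise RawSamplesError("samples are raw ADC counts; call to_physical() first")
    if units not in _EXPONENT or target not in _EXPONENT:
        raise ValueError(f"unrecognised voltage unit: {units!r} -> {target!r}")
    shift = _EXPONENT[units] - _EXPONENT[target]
    if shift >= 0:
        factor = 10 ** shift  # exact integer, e.g. 1000 for mV -> uV
        return [s * factor for s in samples]
    divisor = 10 ** -shift  # exact integer, e.g. 1000 for uV -> mV
    return [s / divisor for s in samples]
```

The sketch divides by an exact integer power of ten rather than multiplying by the inexact constant 0.001. This avoids one extra rounding step, so 153 µV becomes the correctly rounded 0.153 mV.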

to_dict(include_samples=True)

Convert to a JSON-serialisable dictionary.

Parameters: include_samples (bool) – If True (default), include the full sample array. Set to False for a lightweight summary.
Return type: dict

PatientInfo

class ecgdatakit.models.PatientInfo

Bases: object

Patient demographic information.

patient_id: str = '': Patient identifier.

first_name: str = '': First name.

last_name: str = '': Last name.

birth_date: datetime | None = None: Date of birth.

sex: str = '': Sex ("M", "F", or "U").

race: str = '': Race/ethnicity.

age: int | None = None: Age in years.

weight: float | None = None: Weight in kg.

height: float | None = None: Height in cm.

medications: list[str]: Current medications.

clinical_history: str = '': Clinical history notes.

to_dict()

Convert to a JSON-serialisable dictionary.

Return type: dict

__init__(patient_id='', first_name='', last_name='', birth_date=None, sex='', race='', age=None, weight=None, height=None, medications=<factory>, clinical_history='')

Parameters:

patient_id (str)
first_name (str)
last_name (str)
birth_date (datetime | None)
sex (str)
race (str)
age (int | None)
weight (float | None)
height (float | None)
medications (list[str])
clinical_history (str)

Return type:

None

RecordingInfo

class ecgdatakit.models.RecordingInfo

Bases: object

Recording session metadata.

__init__(date=None, end_date=None, duration=None, technician='', referring_physician='', room='', location='', device=<factory>, acquisition=<factory>)

Parameters:

date (datetime | None)
end_date (datetime | None)
duration (timedelta | None)
technician (str)
referring_physician (str)
room (str)
location (str)
device (DeviceInfo)
acquisition (AcquisitionSetup)

Return type:

None

date: datetime | None = None: Recording start time.

end_date: datetime | None = None: Recording end time.

duration: timedelta | None = None: Recording duration.

technician: str = '': Technician name.

referring_physician: str = '': Referring physician name.

room: str = '': Room identifier.

location: str = '': Facility/location.

device: DeviceInfo: Acquisition device info.

acquisition: AcquisitionSetup: Signal acquisition setup (signal characteristics + filters).

to_dict()

Convert to a JSON-serialisable dictionary.

Return type: dict

DeviceInfo

class ecgdatakit.models.DeviceInfo

Bases: object

Acquisition device metadata.

manufacturer: str = '': Device manufacturer.

model: str = '': Device model name.

name: str = '': Device name (distinct from model, when available).

serial_number: str = '': Device serial number.

software_version: str = '': Software version.

institution: str = '': Institution name.

department: str = '': Department name.

acquisition_type: str = '': Acquisition type.

to_dict()

Convert to a JSON-serialisable dictionary.

Return type: dict

__init__(manufacturer='', model='', name='', serial_number='', software_version='', institution='', department='', acquisition_type='')

Parameters:

manufacturer (str)
model (str)
name (str)
serial_number (str)
software_version (str)
institution (str)
department (str)
acquisition_type (str)

Return type:

None

FilterSettings

class ecgdatakit.models.FilterSettings

Bases: object

Signal filtering applied during acquisition or processing.

highpass: float | None = None: Highpass cutoff frequency (Hz).

lowpass: float | None = None: Lowpass cutoff frequency (Hz).

notch: float | None = None: Notch filter frequency (Hz).

notch_active: bool | None = None: Whether the notch filter is active.

artifact_filter: bool | None = None: Whether the artifact filter is active.

to_dict()

Convert to a JSON-serialisable dictionary.

Return type: dict

__init__(highpass=None, lowpass=None, notch=None, notch_active=None, artifact_filter=None)

Parameters:

highpass (float | None)
lowpass (float | None)
notch (float | None)
notch_active (bool | None)
artifact_filter (bool | None)

Return type:

None

AcquisitionSetup

class ecgdatakit.models.AcquisitionSetup

Bases: object

Signal acquisition configuration: characteristics and filter settings.

__init__(signal=<factory>, filters=<factory>)

Parameters:

signal (SignalCharacteristics)
filters (FilterSettings)

Return type:

None

signal: SignalCharacteristics: Technical signal encoding and acquisition metadata.

filters: FilterSettings: Filter settings applied during acquisition.

to_dict()

Convert to a JSON-serialisable dictionary.

Return type: dict
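The <factory> defaults in the RecordingInfo and AcquisitionSetup signatures mark dataclass fields built with default_factory. Every instance therefore gets its own nested objects. Assigning rec.acquisition.signal.sampling_rate on one record, as in "Building an ECGRecord from scratch", does not affect any other record. The sketch below uses hypothetical mini classes that mirror the nesting, not the library's definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchSignal:
    sampling_rate: int = 0


@dataclass
class SketchFilters:
    highpass: float | None = None
    lowpass: float | None = None


@dataclass
class SketchAcquisition:
    # default_factory calls the class once per new instance
    signal: SketchSignal = field(default_factory=SketchSignal)
    filters: SketchFilters = field(default_factory=SketchFilters)


@dataclass
class SketchRecording:
    acquisition: SketchAcquisition = field(default_factory=SketchAcquisition)


first = SketchRecording()
second = SketchRecording()
first.acquisition.signal.sampling_rate = 500  # second keeps its own default of 0
```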

SignalCharacteristics

class ecgdatakit.models.SignalCharacteristics

Bases: object

Technical signal encoding and acquisition metadata.

sampling_rate: int = 0: Samples per second (Hz).

resolution: float = 0.0: ADC resolution factor (e.g. µV per count).

bits_per_sample: int | None = None: Bits per sample (e.g. 16, 12, 32).

signal_offset: int | None = None: ADC zero/offset value.

signal_signed: bool | None = None: Whether samples are signed.

number_channels_allocated: int | None = None: Total channels in the file.

number_channels_valid: int | None = None: Channels successfully parsed.

electrode_placement: str = '': Electrode placement code.

compression: str = '': Compression method (e.g. "none", "huffman").

data_encoding: str = '': Data encoding (e.g. "base64_int16le", "int16", "format_212").

acsetting: int | None = None: AC setting code.

filtered: bool | None = None: Whether data was pre-filtered.

downsampled: bool | None = None: Whether data was downsampled.

upsampled: bool | None = None: Whether data was upsampled.

waveform_modified: bool | None = None: Whether the waveform was modified.

__init__(sampling_rate=0, resolution=0.0, bits_per_sample=None, signal_offset=None, signal_signed=None, number_channels_allocated=None, number_channels_valid=None, electrode_placement='', compression='', data_encoding='', acsetting=None, filtered=None, downsampled=None, upsampled=None, waveform_modified=None, downsampling_method='', upsampling_method='')

Parameters:

sampling_rate (int)
resolution (float)
bits_per_sample (int | None)
signal_offset (int | None)
signal_signed (bool | None)
number_channels_allocated (int | None)
number_channels_valid (int | None)
electrode_placement (str)
compression (str)
data_encoding (str)
acsetting (int | None)
filtered (bool | None)
downsampled (bool | None)
upsampled (bool | None)
waveform_modified (bool | None)
downsampling_method (str)
upsampling_method (str)

Return type:

None

downsampling_method: str = '': Downsampling method description.

upsampling_method: str = '': Upsampling method description.

to_dict()

Convert to a JSON-serialisable dictionary.

Return type: dict
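The data_encoding field is a free-form label. The sketch below assumes, as the name suggests, that "base64_int16le" means base64 text wrapping little-endian signed 16-bit integers. Under that assumption, decoding needs only the standard library. The helper name is hypothetical, and this is not the parser's code.

```python
import base64
import struct


def decode_base64_int16le(text: str) -> list[int]:
    """Decode base64 text holding little-endian signed 16-bit samples."""
    raw = base64.b64decode(text)
    if len(raw) % 2:
        raise ValueError("byte length is not a multiple of 2")
    count = len(raw) // 2
    # '<' selects little-endian byte order, 'h' a signed 16-bit integer
    return list(struct.unpack(f"<{count}h", raw))


encoded = base64.b64encode(struct.pack("<3h", 1, -2, 300)).decode("ascii")
samples = decode_base64_int16le(encoded)  # [1, -2, 300]
```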

Interpretation

class ecgdatakit.models.Interpretation

Bases: object

Machine or physician ECG interpretation.

statements: list[tuple[str, str]]

Interpretation text statements as (left, right) tuples.

Each tuple contains a primary statement and an optional qualifier. For formats without a left/right distinction, the qualifier is "".

severity: str = '': Severity ("NORMAL", "ABNORMAL", "BORDERLINE").

source: str = '': Source ("machine", "overread", "confirmed").

interpreter: str = '': Physician name (if overread).

interpretation_date: datetime | None = None: When the interpretation was made.

to_dict()

Convert to a JSON-serialisable dictionary.

Return type: dict

__init__(statements=<factory>, severity='', source='', interpreter='', interpretation_date=None)

Parameters:

statements (list[tuple[str, str]])
severity (str)
source (str)
interpreter (str)
interpretation_date (datetime | None)

Return type:

None
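Because a missing qualifier is stored as "", code that renders statements must skip empty parts. Below is a sketch with a hypothetical format_statements helper. Joining with a single space is an arbitrary choice, and the statement strings in the usage line are made up for illustration.

```python
from __future__ import annotations


def format_statements(statements: list[tuple[str, str]], sep: str = " ") -> list[str]:
    """Render (primary, qualifier) tuples as display lines."""
    lines = []
    for primary, qualifier in statements:
        # Formats without a left/right split store "" as the qualifier, so skip it
        lines.append(sep.join(part for part in (primary, qualifier) if part))
    return lines


lines = format_statements([("Sinus rhythm", ""), ("Prolonged QT", "confirmed")])
```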

GlobalMeasurements

class ecgdatakit.models.GlobalMeasurements

Bases: object

Global ECG interval and axis measurements.

heart_rate: int | None = None: Heart rate (bpm).

rr_interval: int | None = None: RR interval (ms).

pr_interval: int | None = None: PR interval (ms).

qrs_duration: int | None = None: QRS duration (ms).

qt_interval: int | None = None: QT interval (ms).

qtc_bazett: int | None = None: QTc, Bazett correction (ms).

qtc_fridericia: int | None = None: QTc, Fridericia correction (ms).

p_axis: int | None = None: P-wave axis (degrees).

qrs_axis: int | None = None: QRS axis (degrees).

t_axis: int | None = None: T-wave axis (degrees).

qrs_count: int | None = None: Total QRS count.

to_dict()

Convert to a JSON-serialisable dictionary.

Return type: dict

__init__(heart_rate=None, rr_interval=None, pr_interval=None, qrs_duration=None, qt_interval=None, qtc_bazett=None, qtc_fridericia=None, p_axis=None, qrs_axis=None, t_axis=None, qrs_count=None)

Parameters:

heart_rate (int | None)
rr_interval (int | None)
pr_interval (int | None)
qrs_duration (int | None)
qt_interval (int | None)
qtc_bazett (int | None)
qtc_fridericia (int | None)
p_axis (int | None)
qrs_axis (int | None)
t_axis (int | None)
qrs_count (int | None)

Return type:

None
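GlobalMeasurements holds the values reported by the source file; devices measure them independently, so stored values need not match a recomputation exactly. For orientation, the standard relations between these fields are:

- heart rate = 60000 / RR (RR in ms),
- Bazett: QTc = QT / √RR (RR in seconds),
- Fridericia: QTc = QT / ∛RR (RR in seconds).

The helper names below are hypothetical. Rounding to whole numbers mirrors the int field types and is an assumption.

```python
import math


def heart_rate_from_rr(rr_ms: float) -> int:
    """Heart rate in bpm from an RR interval in milliseconds."""
    return round(60_000 / rr_ms)


def qtc_bazett(qt_ms: float, rr_ms: float) -> int:
    """Bazett correction: QTc = QT / sqrt(RR), RR in seconds."""
    return round(qt_ms / math.sqrt(rr_ms / 1000))


def qtc_fridericia(qt_ms: float, rr_ms: float) -> int:
    """Fridericia correction: QTc = QT / cbrt(RR), RR in seconds."""
    return round(qt_ms / (rr_ms / 1000) ** (1 / 3))
```

For example, QT = 400 ms at RR = 800 ms gives 75 bpm, a Bazett QTc of 447 ms and a Fridericia QTc of 431 ms.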

Resolution Pipeline (ADC → Physical Units)

ECG hardware digitises analogue signals into integer ADC counts. The Lead dataclass carries the metadata needed to convert those counts back to physical voltage values.

Fields involved

Field	Example	Meaning
`adc_resolution`	`153.0`	Raw value from the file (e.g. 153 nV/count for ISHNE)
`adc_resolution_unit`	`"nV"`	Unit of `adc_resolution` as defined by the source format
`resolution`	`0.153`	Scale factor normalised to `resolution_unit` (153 nV → 0.153 µV)
`resolution_unit`	`"uV"`	Unit of the `resolution` scale factor — the unit samples will be in after `to_physical()`
`offset`	`0.0`	Additive offset: `physical = samples × resolution + offset`
`units`	`""` or `"uV"`	Current unit of `samples`. Empty when `is_raw=True`; set after conversion
`is_raw`	`True` / `False`	`True` → samples are dimensionless ADC counts. `False` → already in `units`

Conversion formula

physical_value = samples × resolution + offset

Auto-detection by parsers

Parsers compute is_raw automatically:

is_raw = not (resolution == 1.0 and offset == 0.0)

If resolution is 1.0 and offset is 0.0, the data is already in physical units — no scaling is needed, and units is set directly. Otherwise, units stays empty until to_physical() is called.
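The parser rule above fits in a few lines. The sketch uses a hypothetical detect_raw helper. Setting units to resolution_unit for already-physical data is an assumption about which unit "is set directly".

```python
from __future__ import annotations


def detect_raw(resolution: float, offset: float, resolution_unit: str) -> tuple[bool, str]:
    """Return (is_raw, units) following the documented parser rule."""
    is_raw = not (resolution == 1.0 and offset == 0.0)
    # Physical data gets its unit immediately; raw data stays unit-less
    units = "" if is_raw else resolution_unit
    return is_raw, units
```

The rule uses exact float equality. A lead whose resolution is only approximately 1.0 (for example 0.9999999) is therefore still treated as raw, and to_physical() then applies a near-unity scale.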

Example: ISHNE Holter (153 nV/count)

from ecgdatakit import FileParser  # import path assumed; adjust if FileParser lives elsewhere

record = FileParser().parse("holter.ecg", auto_scale=False)  # keep raw ADC counts
lead = record.leads[0]
# lead.adc_resolution      → 153.0       (raw file value)
# lead.adc_resolution_unit → "nV"        (file stores nV/count)
# lead.resolution          → 0.153       (153 nV ÷ 1000 = 0.153 µV)
# lead.resolution_unit     → "uV"        (resolution is in µV/count)
# lead.units               → ""          (raw ADC, no unit yet)
# lead.is_raw              → True

physical = lead.to_physical()
# physical.samples → original × 0.153
# physical.units   → "uV"
# physical.is_raw  → False

in_mv = physical.convert_units("mV")
# in_mv.samples → physical.samples ÷ 1000
# in_mv.units   → "mV"

Using `auto_scale`

With auto_scale=True (the default), FileParser().parse(path) calls to_physical() and then convert_units("mV") on every lead that has scaling metadata, so those leads are returned in mV.

Working with Data Models

ECGDataKit's signal functions (for example those in ecgdatakit.processing and ecgdatakit.plotting) accept both Lead objects and raw numpy arrays. When passing a numpy array, provide the sample rate via fs.

Using numpy arrays

import numpy as np
from ecgdatakit.processing import diagnostic_filter, detect_r_peaks
from ecgdatakit.plotting import plot_lead

# "..." stands for the rest of your samples
signal = np.array([0.12, 0.15, 0.13, ...], dtype=np.float64)

# fs must be passed on every call that receives a bare array
filtered = diagnostic_filter(signal, fs=500)
peaks = detect_r_peaks(filtered, fs=500)
fig = plot_lead(filtered, fs=500, peaks=peaks)

Note: fs is required when passing a numpy array; omitting it raises a TypeError. When passing a Lead, fs is ignored and the lead's own sampling_rate is used.
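One way a function can accept both forms while following the rules in the note above: fs is required for arrays and ignored for leads. The names resolve_signal and SketchLead are hypothetical, and this is not ecgdatakit's internal code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass
class SketchLead:
    """Hypothetical stand-in for ecgdatakit.Lead."""
    samples: Sequence[float]
    sampling_rate: int


def resolve_signal(signal, fs: int | None = None) -> tuple[Sequence[float], int]:
    """Return (samples, sampling rate) from either a lead-like object or an array."""
    if isinstance(signal, SketchLead):
        # fs is ignored for leads: the lead carries its own sampling rate
        return signal.samples, signal.sampling_rate
    if fs is None:
        raise TypeError("fs is required when passing a bare array")
    return signal, fs
```

The sketch uses isinstance for clarity. Checking hasattr(signal, "sampling_rate") instead would accept any lead-like object.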

Using Lead objects

import numpy as np
from ecgdatakit import Lead
from ecgdatakit.processing import diagnostic_filter

samples = np.random.randn(5000)  # placeholder: 10 s of samples at 500 Hz

lead = Lead(
    label="II",
    samples=samples,
    sampling_rate=500,
    units="mV",
    is_raw=False,
)

# No need for fs= when using Lead objects; sampling_rate is taken from the lead
filtered = diagnostic_filter(lead)

Extracting numpy arrays

array = lead.samples      # NDArray[np.float64]
fs = lead.sampling_rate   # int (Hz)

Building a Lead from external data

import numpy as np
from ecgdatakit import Lead

# Synthetic sine wave (10 s at 500 Hz)
fs = 500
t = np.arange(fs * 10, dtype=np.float64) / fs
signal = np.sin(2 * np.pi * 1.2 * t)

lead = Lead(label="II", samples=signal, sampling_rate=fs, units="mV", is_raw=False)

# From a pandas DataFrame (assumes a "voltage" column in mV, sampled at 250 Hz)
import pandas as pd

df = pd.read_csv("ecg_data.csv")
lead = Lead(
    label="V1",
    samples=df["voltage"].to_numpy(dtype=np.float64),
    sampling_rate=250,
    units="mV",
    is_raw=False,
)

Building an ECGRecord from scratch

from ecgdatakit import ECGRecord, Lead, PatientInfo, RecordingInfo
import numpy as np

leads = [
    Lead(label=name, samples=np.random.randn(5000).astype(np.float64),
         sampling_rate=500, units="mV", is_raw=False)
    for name in ["I", "II", "III", "aVR", "aVL", "aVF",
                 "V1", "V2", "V3", "V4", "V5", "V6"]
]

rec = RecordingInfo()
rec.acquisition.signal.sampling_rate = 500

record = ECGRecord(
    patient=PatientInfo(patient_id="001", first_name="Jane", last_name="Doe"),
    recording=rec,
    leads=leads,
)

All ECGRecord fields are optional with sensible defaults. Lead, by contrast, requires label, samples, and sampling_rate.
