Data Models

All models are Python dataclasses defined in ecgdatakit.models.

Import: from ecgdatakit import ECGRecord, Lead, PatientInfo, RecordingInfo, ...

ECGRecord

class ecgdatakit.models.ECGRecord

Bases: object

Unified ECG record returned by all parsers.

Every parser in ECGDataKit produces an ECGRecord. Use to_dict() or to_json() to obtain a format-agnostic, JSON-serialisable representation that is identical regardless of the original file format.

Samples are stored as raw ADC values by default. Call to_physical() to convert all leads to physical voltage units, then convert_units() to switch between uV, mV, or V.

__init__(patient=<factory>, recording=<factory>, leads=<factory>, interpretation=<factory>, measurements=<factory>, median_beats=<factory>, annotations=<factory>, source_format='', raw_metadata=<factory>)
Return type:

None

patient: PatientInfo

Patient demographics.

recording: RecordingInfo

Recording session metadata (includes device and acquisition setup).

leads: list[Lead]

ECG lead waveforms.

interpretation: Interpretation

Machine or physician interpretation.

measurements: GlobalMeasurements

Global ECG interval/axis measurements.

median_beats: list[Lead]

Median/template beats if available.

annotations: dict[str, str]

Additional key-value annotations.

source_format: str = ''

Parser identifier (e.g. "hl7_aecg", "dicom").

raw_metadata: dict

Original format-specific metadata from the source file.

to_physical()

Convert all leads and median beats from raw ADC to physical units.

Returns a new ECGRecord where every Lead has is_raw=False. Leads already in physical units are unchanged.

Return type:

ECGRecord

convert_units(target)

Convert all leads and median beats to the specified voltage unit.

Parameters:

target (str) – Target unit ("uV", "mV", "V").

Raises:

RawSamplesError – If any lead is still raw ADC.

Return type:

ECGRecord

plot(show=True, rows=None, cols=None, **kwargs)

Plot the ECG record with patient/device header and all leads.

Parameters:
  • show (bool) – Display the plot immediately (default True).

  • rows (int | None) – Number of rows in the subplot grid.

  • cols (int | None) – Number of columns in the subplot grid.

  • **kwargs – Extra arguments forwarded to the underlying plot function (e.g. figsize, x_axis).

to_dict(include_samples=True)

Convert the record to the unified JSON schema.

Parameters:

include_samples (bool) – If True (default), each lead contains its full sample array. Set to False for metadata-only export.

Return type:

dict

to_json(include_samples=True, indent=2)

Serialise the record to a JSON string.

Parameters:
  • include_samples (bool) – Include full sample arrays (default True).

  • indent (int | None) – JSON indentation level. None for compact output.

Return type:

str
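
The effect of include_samples and indent can be illustrated with the standard json module alone. The dictionary below is a hypothetical stand-in and not the actual ECGDataKit schema; only the behaviour of json.dumps is shown.

```python
import json

# Hypothetical stand-in for the output of to_dict(); the real schema may differ.
record_dict = {
    "source_format": "dicom",
    "leads": [{"label": "II", "sampling_rate": 500, "samples": [0.12, 0.15, 0.13]}],
}

# Metadata-only view: drop the sample arrays, which is what
# include_samples=False is described as doing.
metadata_only = {
    **record_dict,
    "leads": [
        {key: value for key, value in lead.items() if key != "samples"}
        for lead in record_dict["leads"]
    ],
}

pretty = json.dumps(metadata_only, indent=2)      # multi-line, human-readable
compact = json.dumps(metadata_only, indent=None)  # single line

print(pretty)
print(compact)
```

Both strings decode to the same object; indent only changes the whitespace.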

Lead

class ecgdatakit.models.Lead

Bases: object

Single ECG lead with signal data.

Resolution and scaling

ECG file formats store a raw ADC resolution value in format-specific units (e.g. nV/count for ISHNE and SCP-ECG, µV/count for Sierra XML). The parser converts this to a normalised scale factor stored in resolution, expressed in the unit given by resolution_unit:

physical_value = samples * resolution + offset   (in resolution_unit)

The original, unconverted value from the file is preserved in adc_resolution for reference.

Example — ISHNE file with ampl_res = 153 (nV/count):

  • adc_resolution = 153.0 — raw file value (nV/count)

  • resolution = 0.153 — converted: 153 / 1000 (µV/count)

  • resolution_unit = "uV"
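
The nV-to-µV normalisation in this example can be sketched as plain arithmetic. The helper name normalise_resolution and the TO_VOLTS table are hypothetical and only illustrate the conversion; they are not part of ECGDataKit.

```python
# Conversion factors to volts for the units mentioned in this section.
TO_VOLTS = {"nV": 1e-9, "uV": 1e-6, "mV": 1e-3, "V": 1.0}


def normalise_resolution(adc_resolution: float, adc_unit: str, target_unit: str) -> float:
    """Hypothetical helper: re-express a per-count resolution in another unit."""
    return adc_resolution * TO_VOLTS[adc_unit] / TO_VOLTS[target_unit]


# ISHNE example from above: 153 nV/count expressed in µV/count.
resolution = normalise_resolution(153.0, "nV", "uV")
print(round(resolution, 6))  # 0.153
```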

Auto-detection of is_raw

Parsers set is_raw automatically. When resolution == 1.0 and offset == 0.0 the samples are already in physical units (is_raw=False); otherwise they are raw ADC counts (is_raw=True) that need scaling via to_physical().

__init__(label, samples, sampling_rate, resolution=1.0, resolution_unit='', offset=0.0, units='', is_raw=True, adc_resolution=0.0, adc_resolution_unit='', quality=None, transducer='', prefiltering='', annotations=<factory>)
Return type:

None

label: str

Lead name (e.g. "I", "V1").

samples: ndarray[tuple[Any, ...], dtype[float64]]

Signal sample values (raw ADC or physical, depending on is_raw).

sampling_rate: int

Samples per second (Hz).

resolution: float = 1.0

Normalised scale factor for ADC-to-physical conversion, in the unit given by resolution_unit. Computed from adc_resolution by the parser (e.g. adc_resolution / 1000 for nV → µV). Used by to_physical(): physical = samples * resolution + offset.

resolution_unit: str = ''

Unit of the resolution scale factor (e.g. "uV", "mV"). After to_physical(), the resulting samples are in this unit. Set by the parser based on the format specification.

offset: float = 0.0

Additive offset for ADC-to-physical conversion (default 0.0). Used by to_physical(): physical = samples * resolution + offset.

units: str = ''

Current unit of samples. Empty when is_raw=True (samples are dimensionless ADC counts). Set to the physical unit after to_physical() or convert_units() is called (e.g. "uV", "mV").

is_raw: bool = True

True if samples are raw ADC counts needing scaling, False if samples are already in physical units. Parsers set this automatically: is_raw = not (resolution == 1.0 and offset == 0.0).

adc_resolution: float = 0.0

Original ADC resolution exactly as stored in the source file, before any unit conversion. For example, ISHNE stores nV/count and SCP-ECG stores nV/unit — this field preserves that raw value (e.g. 153.0 for 153 nV/count). The converted value used for scaling is in resolution.

adc_resolution_unit: str = ''

Unit of adc_resolution as defined by the source format (e.g. "nV" for ISHNE and SCP-ECG).

quality: int | None = None

Signal quality indicator (format-specific).

transducer: str = ''

Transducer type.

prefiltering: str = ''

Pre-filtering description.

annotations: dict[str, str]

Per-lead measurements/annotations (format-specific key-value pairs).

to_physical()

Convert raw ADC samples to physical voltage units.

Applies physical = samples * resolution + offset and returns a new Lead with is_raw=False. If this lead is already in physical units, returns self unchanged.

Raises:

ValueError – If resolution is zero (conversion undefined).

Return type:

Lead

convert_units(target)

Convert between physical voltage units (uV, mV, V).

Parameters:

target (str) – Target unit string: "uV", "mV", or "V". Common aliases such as "µV" are also accepted.

Returns:

A new Lead with samples scaled to target.

Return type:

Lead

Raises:
  • RawSamplesError – If samples are still raw ADC (is_raw=True).

  • ValueError – If the current or target unit is not a recognised voltage unit.
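
The scaling between voltage units amounts to a ratio of two factors. The following is a minimal stand-alone sketch, not the library's implementation: convert_samples, FACTORS_TO_UV, and ALIASES are hypothetical names, and the alias table is an assumption about which spellings might be accepted.

```python
# Size of each unit expressed in microvolts.
FACTORS_TO_UV = {"uV": 1.0, "mV": 1_000.0, "V": 1_000_000.0}
# Assumed aliases: the micro sign (U+00B5) and the Greek letter mu (U+03BC).
ALIASES = {"\u00b5V": "uV", "\u03bcV": "uV"}


def convert_samples(samples, current, target):
    """Hypothetical helper: rescale samples from one voltage unit to another."""
    current = ALIASES.get(current, current)
    target = ALIASES.get(target, target)
    for unit in (current, target):
        if unit not in FACTORS_TO_UV:
            raise ValueError(f"unrecognised voltage unit: {unit!r}")
    # Multiply up to microvolts, then divide down to the target unit.
    return [s * FACTORS_TO_UV[current] / FACTORS_TO_UV[target] for s in samples]


in_mv = convert_samples([1000.0, -500.0], "uV", "mV")
print(in_mv)  # [1.0, -0.5]
```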

to_dict(include_samples=True)

Convert to a JSON-serialisable dictionary.

Parameters:

include_samples (bool) – If True (default), include the full sample array. Set to False for a lightweight summary.

Return type:

dict

PatientInfo

class ecgdatakit.models.PatientInfo

Bases: object

Patient demographic information.

patient_id: str = ''

Patient identifier.

first_name: str = ''

First name.

last_name: str = ''

Last name.

birth_date: datetime | None = None

Date of birth.

sex: str = ''

Sex ("M", "F", or "U").

race: str = ''

Race/ethnicity.

age: int | None = None

Age in years.

weight: float | None = None

Weight in kg.

height: float | None = None

Height in cm.

medications: list[str]

Current medications.

clinical_history: str = ''

Clinical history notes.

to_dict()

Convert to a JSON-serialisable dictionary.

Return type:

dict

__init__(patient_id='', first_name='', last_name='', birth_date=None, sex='', race='', age=None, weight=None, height=None, medications=<factory>, clinical_history='')
Return type:

None

RecordingInfo

class ecgdatakit.models.RecordingInfo

Bases: object

Recording session metadata.

__init__(date=None, end_date=None, duration=None, technician='', referring_physician='', room='', location='', device=<factory>, acquisition=<factory>)
Return type:

None

date: datetime | None = None

Recording start time.

end_date: datetime | None = None

Recording end time.

duration: timedelta | None = None

Recording duration.

technician: str = ''

Technician name.

referring_physician: str = ''

Referring physician name.

room: str = ''

Room identifier.

location: str = ''

Facility/location.

device: DeviceInfo

Acquisition device info.

acquisition: AcquisitionSetup

Signal acquisition setup (signal characteristics + filters).

to_dict()

Convert to a JSON-serialisable dictionary.

Return type:

dict

DeviceInfo

class ecgdatakit.models.DeviceInfo

Bases: object

Acquisition device metadata.

manufacturer: str = ''

Device manufacturer.

model: str = ''

Device model name.

name: str = ''

Device name (distinct from model, when available).

serial_number: str = ''

Device serial number.

software_version: str = ''

Software version.

institution: str = ''

Institution name.

department: str = ''

Department name.

acquisition_type: str = ''

Acquisition type.

to_dict()

Convert to a JSON-serialisable dictionary.

Return type:

dict

__init__(manufacturer='', model='', name='', serial_number='', software_version='', institution='', department='', acquisition_type='')
Parameters:
  • manufacturer (str)

  • model (str)

  • name (str)

  • serial_number (str)

  • software_version (str)

  • institution (str)

  • department (str)

  • acquisition_type (str)

Return type:

None

FilterSettings

class ecgdatakit.models.FilterSettings

Bases: object

Signal filtering applied during acquisition or processing.

highpass: float | None = None

Highpass cutoff frequency (Hz).

lowpass: float | None = None

Lowpass cutoff frequency (Hz).

notch: float | None = None

Notch filter frequency (Hz).

notch_active: bool | None = None

Whether notch filter is active.

artifact_filter: bool | None = None

Whether artifact filter is active.

to_dict()

Convert to a JSON-serialisable dictionary.

Return type:

dict

__init__(highpass=None, lowpass=None, notch=None, notch_active=None, artifact_filter=None)
Parameters:
  • highpass (float | None)

  • lowpass (float | None)

  • notch (float | None)

  • notch_active (bool | None)

  • artifact_filter (bool | None)

Return type:

None

AcquisitionSetup

class ecgdatakit.models.AcquisitionSetup

Bases: object

Signal acquisition configuration: characteristics and filter settings.

__init__(signal=<factory>, filters=<factory>)
Return type:

None

signal: SignalCharacteristics

Technical signal encoding and acquisition metadata.

filters: FilterSettings

Filter settings applied during acquisition.

to_dict()

Convert to a JSON-serialisable dictionary.

Return type:

dict

SignalCharacteristics

class ecgdatakit.models.SignalCharacteristics

Bases: object

Technical signal encoding and acquisition metadata.

sampling_rate: int = 0

Samples per second (Hz).

resolution: float = 0.0

ADC resolution factor (e.g. µV per count).

bits_per_sample: int | None = None

Bits per sample (e.g. 16, 12, 32).

signal_offset: int | None = None

ADC zero/offset value.

signal_signed: bool | None = None

Whether samples are signed.

number_channels_allocated: int | None = None

Total channels in the file.

number_channels_valid: int | None = None

Channels successfully parsed.

electrode_placement: str = ''

Electrode placement code.

compression: str = ''

Compression method (e.g. "none", "huffman").

data_encoding: str = ''

Data encoding (e.g. "base64_int16le", "int16", "format_212").

acsetting: int | None = None

AC setting code.

filtered: bool | None = None

Whether data was pre-filtered.

downsampled: bool | None = None

Whether data was downsampled.

upsampled: bool | None = None

Whether data was upsampled.

waveform_modified: bool | None = None

Whether waveform was modified.

__init__(sampling_rate=0, resolution=0.0, bits_per_sample=None, signal_offset=None, signal_signed=None, number_channels_allocated=None, number_channels_valid=None, electrode_placement='', compression='', data_encoding='', acsetting=None, filtered=None, downsampled=None, upsampled=None, waveform_modified=None, downsampling_method='', upsampling_method='')
Parameters:
  • sampling_rate (int)

  • resolution (float)

  • bits_per_sample (int | None)

  • signal_offset (int | None)

  • signal_signed (bool | None)

  • number_channels_allocated (int | None)

  • number_channels_valid (int | None)

  • electrode_placement (str)

  • compression (str)

  • data_encoding (str)

  • acsetting (int | None)

  • filtered (bool | None)

  • downsampled (bool | None)

  • upsampled (bool | None)

  • waveform_modified (bool | None)

  • downsampling_method (str)

  • upsampling_method (str)

Return type:

None

downsampling_method: str = ''

Downsampling method description.

upsampling_method: str = ''

Upsampling method description.

to_dict()

Convert to a JSON-serialisable dictionary.

Return type:

dict
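
The data_encoding example "base64_int16le" suggests base64 text wrapping little-endian signed 16-bit integers. Under that assumption, a payload could be decoded with the standard library as sketched below. The function decode_base64_int16le is hypothetical and does not reflect how ECGDataKit parsers are implemented.

```python
import base64
import struct


def decode_base64_int16le(payload: str) -> list:
    """Hypothetical helper: base64 text -> list of little-endian int16 values."""
    raw = base64.b64decode(payload)
    count = len(raw) // 2  # two bytes per sample; a trailing odd byte is ignored
    return list(struct.unpack(f"<{count}h", raw[: count * 2]))


# Round trip with made-up sample values covering the int16 range.
original = [0, 1, -1, 32767, -32768]
payload = base64.b64encode(struct.pack(f"<{len(original)}h", *original)).decode("ascii")
decoded = decode_base64_int16le(payload)
print(decoded)
```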

Interpretation

class ecgdatakit.models.Interpretation

Bases: object

Machine or physician ECG interpretation.

statements: list[tuple[str, str]]

Interpretation text statements as (left, right) tuples.

Each tuple contains a primary statement and an optional qualifier. For formats without a left/right distinction the qualifier is "".
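
As an illustration of the (left, right) layout, the sketch below turns a list of such tuples into display lines. The statement texts and the format_statements helper are made up for the example.

```python
# Illustrative values only; real statements come from the parsed file.
statements = [
    ("Sinus rhythm", ""),                    # no qualifier
    ("Left axis deviation", "borderline"),   # primary statement plus qualifier
]


def format_statements(statements):
    """Hypothetical helper: append the qualifier in parentheses when present."""
    lines = []
    for left, right in statements:
        lines.append(f"{left} ({right})" if right else left)
    return lines


print(format_statements(statements))
```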

severity: str = ''

Severity ("NORMAL", "ABNORMAL", "BORDERLINE").

source: str = ''

Source ("machine", "overread", "confirmed").

interpreter: str = ''

Physician name (if overread).

interpretation_date: datetime | None = None

When interpretation was made.

to_dict()

Convert to a JSON-serialisable dictionary.

Return type:

dict

__init__(statements=<factory>, severity='', source='', interpreter='', interpretation_date=None)
Return type:

None

GlobalMeasurements

class ecgdatakit.models.GlobalMeasurements

Bases: object

Global ECG interval and axis measurements.

heart_rate: int | None = None

Heart rate (bpm).

rr_interval: int | None = None

RR interval (ms).

pr_interval: int | None = None

PR interval (ms).

qrs_duration: int | None = None

QRS duration (ms).

qt_interval: int | None = None

QT interval (ms).

qtc_bazett: int | None = None

QTc Bazett (ms).

qtc_fridericia: int | None = None

QTc Fridericia (ms).

p_axis: int | None = None

P-wave axis (degrees).

qrs_axis: int | None = None

QRS axis (degrees).

t_axis: int | None = None

T-wave axis (degrees).

qrs_count: int | None = None

Total QRS count.

to_dict()

Convert to a JSON-serialisable dictionary.

Return type:

dict

__init__(heart_rate=None, rr_interval=None, pr_interval=None, qrs_duration=None, qt_interval=None, qtc_bazett=None, qtc_fridericia=None, p_axis=None, qrs_axis=None, t_axis=None, qrs_count=None)
Parameters:
  • heart_rate (int | None)

  • rr_interval (int | None)

  • pr_interval (int | None)

  • qrs_duration (int | None)

  • qt_interval (int | None)

  • qtc_bazett (int | None)

  • qtc_fridericia (int | None)

  • p_axis (int | None)

  • qrs_axis (int | None)

  • t_axis (int | None)

  • qrs_count (int | None)

Return type:

None
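
The qtc_bazett and qtc_fridericia fields hold rate-corrected QT values. As a worked illustration of the standard Bazett and Fridericia formulas (QT divided by the square root or cube root of the RR interval in seconds), the sketch below computes both from a QT interval and a heart rate. The function names are hypothetical. The values in a parsed record may simply be copied from the source file rather than computed this way.

```python
import math


def rr_from_heart_rate(heart_rate_bpm: float) -> float:
    """RR interval in ms for a given heart rate in beats per minute."""
    return 60_000.0 / heart_rate_bpm


def qtc_bazett_ms(qt_ms: float, rr_ms: float) -> float:
    """Bazett: QTc = QT / sqrt(RR), with RR in seconds."""
    return qt_ms / math.sqrt(rr_ms / 1000.0)


def qtc_fridericia_ms(qt_ms: float, rr_ms: float) -> float:
    """Fridericia: QTc = QT / cbrt(RR), with RR in seconds."""
    return qt_ms / (rr_ms / 1000.0) ** (1.0 / 3.0)


rr = rr_from_heart_rate(75)                # 800.0 ms
print(round(qtc_bazett_ms(360, rr)))       # 402
print(round(qtc_fridericia_ms(360, rr)))   # 388
```

At 60 bpm the RR interval is exactly one second, so both corrections leave QT unchanged.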

Resolution Pipeline (ADC → Physical Units)

ECG hardware digitises analogue signals into integer ADC counts. The Lead dataclass carries the metadata needed to convert those counts back to physical voltage values.

Fields involved

Field                 Example         Meaning
--------------------  --------------  ---------------------------------------------------------------
adc_resolution        153.0           Raw value from the file (e.g. 153 nV/count for ISHNE)
adc_resolution_unit   "nV"            Unit of adc_resolution as defined by the source format
resolution            0.153           Scale factor normalised to resolution_unit (153 nV → 0.153 µV)
resolution_unit       "uV"            Unit of the resolution scale factor; samples are in this unit after to_physical()
offset                0.0             Additive offset: physical = samples × resolution + offset
units                 "" or "uV"      Current unit of samples. Empty when is_raw=True; set after conversion
is_raw                True / False    True → samples are dimensionless ADC counts. False → already in units

Conversion formula

physical_value = samples × resolution + offset
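
As a quick numeric check of the formula, the sketch below applies it to a few made-up ADC counts using the ISHNE scale factor of 0.153 µV/count. It works on a plain list and is not the library's to_physical(). The zero-resolution check mirrors the ValueError documented for Lead.to_physical().

```python
def to_physical(samples, resolution, offset=0.0):
    """Stand-alone sketch of physical = samples * resolution + offset."""
    if resolution == 0:
        raise ValueError("resolution is zero; conversion is undefined")
    return [s * resolution + offset for s in samples]


counts = [0, 1000, -2000]                     # made-up raw ADC counts
physical_uv = to_physical(counts, resolution=0.153)
print([round(v, 3) for v in physical_uv])     # [0.0, 153.0, -306.0]
```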

Auto-detection by parsers

Parsers compute is_raw automatically:

is_raw = not (resolution == 1.0 and offset == 0.0)

If resolution is 1.0 and offset is 0.0, the data is already in physical units — no scaling is needed, and units is set directly. Otherwise, units stays empty until to_physical() is called.

Example: ISHNE Holter (153 nV/count)

from ecgdatakit import FileParser  # assumed top-level export; adjust if the import path differs

record = FileParser().parse("holter.ecg", auto_scale=False)
lead = record.leads[0]
# lead.adc_resolution      → 153.0        (raw file value)
# lead.adc_resolution_unit → "nV"         (file stores nV/count)
# lead.resolution           → 0.153       (153 nV ÷ 1000 = 0.153 µV)
# lead.resolution_unit      → "uV"        (resolution is in µV/count)
# lead.units                → ""          (raw ADC, no unit yet)
# lead.is_raw               → True

physical = lead.to_physical()
# physical.samples → original × 0.153
# physical.units   → "uV"
# physical.is_raw  → False

in_mv = physical.convert_units("mV")
# in_mv.units → "mV"

Using auto_scale

With auto_scale=True (the default), FileParser().parse(path) calls to_physical() and then convert_units("mV") on every lead that has scaling metadata.

Working with Data Models

ECGDataKit functions accept both Lead objects and raw numpy arrays. When passing a numpy array, provide the sample rate via fs.

Using numpy arrays

import numpy as np
from ecgdatakit.processing import diagnostic_filter, detect_r_peaks
from ecgdatakit.plotting import plot_lead

signal = np.array([0.12, 0.15, 0.13, ...], dtype=np.float64)

filtered = diagnostic_filter(signal, fs=500)
# A numpy array carries no sampling rate, so fs is passed on every call
peaks = detect_r_peaks(filtered, fs=500)
fig = plot_lead(filtered, peaks=peaks, fs=500)

Note: fs is required when passing a numpy array; omitting it raises a TypeError. When passing a Lead, fs is ignored.
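
The rule in this note can be pictured as a small dispatch step at the top of each function. The sketch below uses a hypothetical LeadLike stand-in and a hypothetical resolve_signal helper. It shows the pattern only and is not ECGDataKit's internal code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass
class LeadLike:
    """Hypothetical stand-in for Lead with only the fields needed here."""
    samples: Sequence[float]
    sampling_rate: int


def resolve_signal(signal: Union[LeadLike, Sequence[float]], fs: Optional[int] = None):
    """Return (samples, sampling_rate) for either kind of input."""
    if isinstance(signal, LeadLike):
        # The object knows its own rate, so any fs argument is ignored.
        return list(signal.samples), signal.sampling_rate
    if fs is None:
        raise TypeError("fs is required when passing a plain array")
    return list(signal), fs


print(resolve_signal(LeadLike([0.1, 0.2], 500), fs=250))  # ([0.1, 0.2], 500)
print(resolve_signal([0.1, 0.2], fs=250))                 # ([0.1, 0.2], 250)
```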

Using Lead objects

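import numpy as np
from ecgdatakit import Lead
from ecgdatakit.processing import diagnostic_filter

# Placeholder signal (10 s sine wave at 500 Hz); substitute real samples here
samples = np.sin(2 * np.pi * 1.2 * np.arange(5000, dtype=np.float64) / 500)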

lead = Lead(
    label="II",
    samples=samples,
    sampling_rate=500,
    units="mV",
    is_raw=False,
)

# No need for fs= when using Lead objects
filtered = diagnostic_filter(lead)

Extracting numpy arrays

raw_array = lead.samples     # NDArray[np.float64]
fs = lead.sampling_rate        # int (Hz)

Building a Lead from external data

import numpy as np
from ecgdatakit import Lead

# Synthetic sine wave (10 s at 500 Hz)
fs = 500
t = np.arange(fs * 10, dtype=np.float64) / fs
signal = np.sin(2 * np.pi * 1.2 * t)

lead = Lead(label="II", samples=signal, sampling_rate=fs, units="mV", is_raw=False)

# From a pandas DataFrame
import pandas as pd

df = pd.read_csv("ecg_data.csv")
lead = Lead(
    label="V1",
    samples=df["voltage"].to_numpy(dtype=np.float64),
    sampling_rate=250,
    units="mV",
    is_raw=False,
)

Building an ECGRecord from scratch

from ecgdatakit import ECGRecord, Lead, PatientInfo, RecordingInfo
import numpy as np

leads = [
    Lead(label=name, samples=np.random.randn(5000).astype(np.float64),
         sampling_rate=500, units="mV", is_raw=False)
    for name in ["I", "II", "III", "aVR", "aVL", "aVF",
                 "V1", "V2", "V3", "V4", "V5", "V6"]
]

rec = RecordingInfo()
rec.acquisition.signal.sampling_rate = 500

record = ECGRecord(
    patient=PatientInfo(patient_id="001", first_name="Jane", last_name="Doe"),
    recording=rec,
    leads=leads,
)

Every ECGRecord field is optional: omitted fields fall back to their defaults (empty objects and collections, and '' for source_format).