Parsing API Reference

Import: from ecgdatakit import FileParser, parse_batch

  • FileParser – Auto-detect format and parse any supported ECG file

  • Parser – Base class for all ECG format parsers

  • parse_batch() – Parse multiple ECG files in parallel

FileParser

class ecgdatakit.parsing.parser.FileParser[source]

Bases: object

Auto-discovers parsers and dispatches files to the right one.

__init__()[source]
Return type:

None

property parsers: list[type[Parser]]

List of discovered Parser subclasses.

static supported_formats()[source]

Return a description of every supported ECG format.

Can be called without instantiation:

FileParser.supported_formats()

Each entry contains:

  • name – short format name (e.g. "HL7 aECG")

  • description – one-line description

  • extensions – list of typical file extensions

Return type:

list[dict[str, str | list[str]]]

parse(file_path, auto_scale=True, units='mV')[source]

Parse an ECG file, auto-detecting the format.

Parameters:
  • file_path (str | Path) – Path to the ECG file.

  • auto_scale (bool) – When True (default), leads with scaling metadata are automatically converted to physical units (see units). Leads without sufficient metadata are left as raw ADC values and a warning is emitted. Set to False to always receive raw ADC samples.

  • units (str) – Target voltage unit when auto_scale is True. Accepted values: "uV" (microvolts), "mV" (millivolts, default), "V" (volts). Ignored when auto_scale is False.

Raises:

ValueError – If no parser can handle the file or units is not recognised.

Return type:

ECGRecord
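
As a minimal sketch of the error behaviour described above, the helper below wraps a parse call and turns the documented ValueError into None. The name try_parse is hypothetical and not part of ecgdatakit; parser is assumed to be any object whose parse method accepts file_path, auto_scale and units, such as a FileParser instance.

```python
from pathlib import Path


def try_parse(parser, file_path, units="mV"):
    """Return the parsed record, or None if the parser raises ValueError.

    `parser` is assumed to expose parse(file_path, auto_scale, units),
    matching the FileParser.parse signature documented above.
    """
    try:
        return parser.parse(Path(file_path), auto_scale=True, units=units)
    except ValueError as exc:
        # Documented causes: no parser handles the file,
        # or `units` is not one of the recognised values.
        print(f"Could not parse {file_path}: {exc}")
        return None
```

With the library installed, a call such as try_parse(FileParser(), "ecg_file.xml", units="uV") would be one way to use it. Returning None is only one possible policy; re-raising with extra context is equally reasonable.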

Parser

class ecgdatakit.parsing.parser.Parser[source]

Bases: ABC

Base class for all ECG format parsers.

FORMAT_NAME: str = ''
FORMAT_DESCRIPTION: str = ''
FILE_EXTENSIONS: list[str] = []

abstractmethod static can_parse(file_path, header)[source]

Check if this parser handles the given file.

Parameters:
  • file_path (Path) – Path to the ECG file.

  • header (bytes) – First 4096 bytes of the file for format sniffing.

Return type:

bool

abstractmethod parse(file_path)[source]

Parse the file and return a structured ECGRecord.

Parameters:
  • file_path (Path) – Path to the ECG file.

Return type:

ECGRecord
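
The interface above can be illustrated with a small subclass. This is a sketch under several assumptions: the Parser class below is a local stand-in that mirrors the documented attributes and methods so the example runs without the library (real code would import ecgdatakit.parsing.parser.Parser instead), and DemoParser, its magic bytes b"DEMOECG" and the ".demo" extension are invented for illustration. Whether FILE_EXTENSIONS entries carry a leading dot is also an assumption.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Parser(ABC):
    """Stand-in mirroring the documented interface (not the library class)."""

    FORMAT_NAME: str = ""
    FORMAT_DESCRIPTION: str = ""
    FILE_EXTENSIONS: list[str] = []

    @staticmethod
    @abstractmethod
    def can_parse(file_path: Path, header: bytes) -> bool: ...

    @abstractmethod
    def parse(self, file_path: Path): ...


class DemoParser(Parser):
    """Hypothetical parser for an invented format that starts with b'DEMOECG'."""

    FORMAT_NAME = "Demo ECG"
    FORMAT_DESCRIPTION = "Invented format used only for illustration"
    FILE_EXTENSIONS = [".demo"]

    @staticmethod
    def can_parse(file_path: Path, header: bytes) -> bool:
        # `header` holds the first bytes of the file (4096 per the docs);
        # this check looks only at the invented magic prefix.
        return header.startswith(b"DEMOECG")

    def parse(self, file_path: Path):
        # Building an ECGRecord is library-specific and not shown here.
        raise NotImplementedError
```

Checking the header rather than the extension keeps detection robust against renamed files, which appears to be the reason can_parse receives both arguments.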

parse_batch

ecgdatakit.parsing.batch.parse_batch(files, max_workers=None)[source]

Parse multiple ECG files in parallel.

Parameters:
  • files (list[Path | str]) – Paths to ECG files.

  • max_workers (int | None) – Maximum number of worker processes. Defaults to CPU count.

Yields:

ECGRecord – Parsed records in the same order as the input files.

Return type:

Iterator[ECGRecord]
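
Because records are yielded in input order, they can be paired back with their source paths using zip. The helper below is a sketch; parse_to_dict is a hypothetical name, and batch_fn is assumed to behave like parse_batch (one record per input file, in order). The reference above does not say what happens when a single file fails, so no error handling is shown.

```python
from pathlib import Path


def parse_to_dict(files, batch_fn, max_workers=None):
    """Map each input path (as a string) to its parsed record.

    `batch_fn` is assumed to follow the parse_batch contract:
    it yields one record per input file, in input order.
    """
    paths = [Path(f) for f in files]
    records = batch_fn(paths, max_workers=max_workers)
    # zip relies on the documented order guarantee to align paths and records.
    return {str(p): rec for p, rec in zip(paths, records)}
```

With the library installed, parse_to_dict(file_list, parse_batch, max_workers=4) would be the intended call.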

Examples

Basic usage

from ecgdatakit import FileParser

fp = FileParser()
record = fp.parse("ecg_file.xml")

Auto-scaling

auto_scale controls whether leads are converted from raw ADC integers to physical units. When True (default), leads with scaling metadata (resolution, offset, units) are converted to the unit selected by units (mV by default). Leads without sufficient metadata are left as raw ADC values and a warning is emitted:

UserWarning: Leads ['Ch1', 'Ch2'] contain raw ADC samples — no scaling
metadata available. Pass auto_scale=False to get raw values.

# Default: leads with scaling metadata are converted to mV
record = fp.parse("ecg_file.xml")

# Raw ADC values, no conversion
record = fp.parse("ecg_file.xml", auto_scale=False)

Set auto_scale=False to always receive raw ADC samples. See Signal Scaling for details on which formats provide scaling metadata and how to convert manually.
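
To find out programmatically which files triggered the raw-ADC warning, the standard warnings module can record it. The sketch below assumes the warning is raised as a UserWarning in the calling process, as the message above suggests; parse_with_warnings is a hypothetical helper, and parser is any object with a compatible parse method.

```python
import warnings


def parse_with_warnings(parser, file_path, **kwargs):
    """Return (record, messages), where messages lists UserWarning texts."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" ensures repeated warnings are not filtered out.
        warnings.simplefilter("always")
        record = parser.parse(file_path, **kwargs)
    messages = [
        str(w.message) for w in caught if issubclass(w.category, UserWarning)
    ]
    return record, messages
```

A non-empty messages list then indicates that at least one lead was left unscaled.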

Listing supported formats

for fmt in FileParser.supported_formats():
    print(fmt["name"], fmt["extensions"])
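
The same list can be filtered, for example to see which formats claim a given extension. This is a sketch: formats_for_extension is a hypothetical helper, and because the reference does not state whether extensions include a leading dot, both sides are normalised before comparing.

```python
def formats_for_extension(formats, extension):
    """Return names of formats whose extension list contains `extension`.

    `formats` is assumed to have the shape documented for
    FileParser.supported_formats(): dicts with "name" and "extensions".
    """
    # Compare case-insensitively and without a leading dot.
    wanted = extension.lower().lstrip(".")
    return [
        fmt["name"]
        for fmt in formats
        if wanted in {ext.lower().lstrip(".") for ext in fmt["extensions"]}
    ]
```

With the library installed, formats_for_extension(FileParser.supported_formats(), ".xml") would list every format that may use XML files.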

Batch parsing

from ecgdatakit import parse_batch

file_list = ["ecg_1.xml", "ecg_2.xml"]  # placeholder paths
records = list(parse_batch(file_list, max_workers=4))