io Module

Input/output functions for epidemiological data.

This module provides functions for reading and writing epidemiological data in CSV, Excel, Parquet, and standard surveillance formats, with automatic format detection and validation.

Functions

episia.data.io.read_csv(path, low_memory=True, **kwargs)

Read CSV file into Dataset.

Parameters:
  • path (str | Path) – Path to CSV file

  • low_memory (bool) – Optimize memory usage

  • **kwargs – Additional arguments for pd.read_csv

Returns:

Dataset object

Return type:

Dataset
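Because read_csv forwards **kwargs to pd.read_csv, standard pandas parsing options should apply. The self-contained sketch below calls pandas directly on an in-memory buffer (the CSV text is made up for illustration) to show two such options, parse_dates and dtype, and the column types they produce.

```python
import io

import pandas as pd

# Made-up in-memory stand-in for a surveillance CSV file
csv_text = "date,district,cases\n2023-01-01,North,10\n2023-01-02,North,15\n"

# Keyword arguments of the kind read_csv forwards to pd.read_csv
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["date"],            # parse the date column as datetime64
    dtype={"district": "category"},  # store district labels as a categorical
)
print(df.dtypes)
```

Assuming the keywords reach pd.read_csv unchanged, as the parameter list states, the episia equivalent would be read_csv("surveillance_data.csv", parse_dates=["date"], dtype={"district": "category"}).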

episia.data.io.read_excel(path, sheet_name=0, low_memory=True, **kwargs)

Read Excel file into Dataset.

Parameters:
  • path (str | Path) – Path to Excel file

  • sheet_name (str | int | List | None) – Sheet to read

  • low_memory (bool) – Optimize memory usage

  • **kwargs – Additional arguments for pd.read_excel

Returns:

Dataset object

Return type:

Dataset

episia.data.io.read_parquet(path, low_memory=True, **kwargs)

Read Parquet file into Dataset.

Parameters:
  • path (str | Path) – Path to Parquet file

  • low_memory (bool) – Optimize memory usage

  • **kwargs – Additional arguments for pd.read_parquet

Returns:

Dataset object

Return type:

Dataset

episia.data.io.from_pandas(df, low_memory=True)

Create Dataset from pandas DataFrame.

Parameters:
  • df (DataFrame) – pandas DataFrame

  • low_memory (bool) – Optimize memory usage

Returns:

Dataset object

Return type:

Dataset

episia.data.io.from_dict(data, low_memory=True, **kwargs)

Create Dataset from dictionary.

Parameters:
  • data (Dict) – Dictionary of data

  • low_memory (bool) – Optimize memory usage

  • **kwargs – Additional arguments for pd.DataFrame

Returns:

Dataset object

Return type:

Dataset

episia.data.io.from_records(records, low_memory=True, **kwargs)

Create Dataset from list of records.

Parameters:
  • records (List[Dict]) – List of dictionaries

  • low_memory (bool) – Optimize memory usage

  • **kwargs – Additional arguments for pd.DataFrame.from_records

Returns:

Dataset object

Return type:

Dataset
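from_dict and from_records take the same table in two shapes: from_dict expects a column-oriented mapping (one list per column, as in the from_dict usage example), while from_records expects a list of row dictionaries. The helper records_to_columns below is hypothetical and not part of episia; it converts rows to columns in plain Python, filling keys absent from a record with None, so either constructor can be fed from the same source.

```python
from typing import Any, Dict, List


def records_to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Convert row-oriented records into a column-oriented dict of lists.

    Keys missing from a record are filled with None so all columns stay aligned.
    """
    columns: Dict[str, List[Any]] = {}
    # First pass: collect every key, preserving first-seen order
    for record in records:
        for key in record:
            columns.setdefault(key, [])
    # Second pass: append one value per column for each record
    for record in records:
        for key, values in columns.items():
            values.append(record.get(key))
    return columns


records = [
    {"date": "2023-01-01", "cases": 10},
    {"date": "2023-01-02", "cases": 15},
]
print(records_to_columns(records))
# {'date': ['2023-01-01', '2023-01-02'], 'cases': [10, 15]}
```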

episia.data.io.read_surveillance_format(path, format_type='auto', low_memory=True, **kwargs)

Read surveillance data in standard formats.

Parameters:
  • path (str | Path) – Path to surveillance data file

  • format_type (str) – Format type (‘sidesp’, ‘who’, ‘ecdc’, ‘auto’)

  • low_memory (bool) – Optimize memory usage

  • **kwargs – Additional arguments

Returns:

Dataset object

Return type:

Dataset

episia.data.io.detect_format(path)

Detect file format from extension or content.

Parameters:
  • path (str | Path) – Path to file

Returns:

Detected format string

Return type:

str
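detect_format is documented as using the file extension or the file content. The sketch below covers only the extension half, under stated assumptions: the suffix table is hypothetical (only 'csv' for .csv and 'excel' for .xlsx are confirmed by the examples on this page, and 'parquet' is taken from the export formats), and content sniffing is omitted entirely.

```python
from pathlib import Path
from typing import Union

# Hypothetical extension table; episia's real mapping may differ or cover more formats
_EXTENSION_FORMATS = {
    ".csv": "csv",
    ".xlsx": "excel",
    ".xls": "excel",
    ".parquet": "parquet",
}


def detect_format_by_extension(path: Union[str, Path]) -> str:
    """Return a format string for path based only on its file extension."""
    suffix = Path(path).suffix.lower()  # lower-case so "DATA.XLSX" also matches
    try:
        return _EXTENSION_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"Cannot infer format from extension {suffix!r}") from None


print(detect_format_by_extension("data.csv"))   # csv
print(detect_format_by_extension("DATA.XLSX"))  # excel
```

Raising ValueError for an unknown extension is one reasonable choice; a fuller implementation could instead fall back to inspecting the file's leading bytes.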

episia.data.io.export_dataset(dataset, path, format='auto', **kwargs)

Export Dataset to file.

Parameters:
  • dataset (Dataset) – Dataset to export

  • path (str | Path) – Output path

  • format (str) – Output format (‘csv’, ‘excel’, ‘parquet’, ‘auto’)

  • **kwargs – Additional arguments for writer

Return type:

None
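export_dataset defaults to format='auto'. A plausible reading, assumed here rather than confirmed by the docs, is that 'auto' infers the format from the output path's extension, that an explicit value overrides the extension, and that **kwargs go to the matching pandas DataFrame writer. The hypothetical resolve_writer below sketches only that dispatch step by returning the writer method name; both lookup tables are assumptions, and fmt stands in for the format parameter.

```python
from pathlib import Path
from typing import Union

# Assumed tables; episia's actual mappings are not documented here
_SUFFIX_TO_FORMAT = {".csv": "csv", ".xlsx": "excel", ".parquet": "parquet"}
_FORMAT_TO_WRITER = {"csv": "to_csv", "excel": "to_excel", "parquet": "to_parquet"}


def resolve_writer(path: Union[str, Path], fmt: str = "auto") -> str:
    """Return the pandas DataFrame writer method name for an output path.

    fmt mirrors export_dataset's format parameter: 'auto' infers the format
    from the file extension; any other value is used as given.
    """
    if fmt == "auto":
        suffix = Path(path).suffix.lower()
        if suffix not in _SUFFIX_TO_FORMAT:
            raise ValueError(f"Cannot infer output format from extension {suffix!r}")
        fmt = _SUFFIX_TO_FORMAT[suffix]
    if fmt not in _FORMAT_TO_WRITER:
        raise ValueError(f"Unsupported output format {fmt!r}")
    return _FORMAT_TO_WRITER[fmt]


print(resolve_writer("output.xlsx"))            # to_excel
print(resolve_writer("output.txt", fmt="csv"))  # to_csv
```

If keyword arguments are forwarded this way, the sheet_name and index options in the Excel export example would land on DataFrame.to_excel, which accepts both.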

Examples

Reading data:

import pandas as pd
from episia.data.io import read_csv, read_excel, from_pandas, from_dict

# Read CSV
ds = read_csv("surveillance_data.csv")

# Read Excel
ds = read_excel("surveillance_data.xlsx", sheet_name="Weekly")

# Create from pandas DataFrame
df = pd.DataFrame({'cases': [10, 20, 30]})
ds = from_pandas(df)

# Create from dictionary
data = {'date': ['2023-01-01', '2023-01-02'], 'cases': [10, 15]}
ds = from_dict(data)

Exporting data:

from episia.data.io import export_dataset

# Export to CSV
export_dataset(ds, "output.csv")

# Export to Excel with options
export_dataset(ds, "output.xlsx", sheet_name="Results", index=False)

Format detection:

from episia.data.io import detect_format

fmt = detect_format("data.csv")  # Returns 'csv'
fmt = detect_format("data.xlsx")  # Returns 'excel'