descriptive Module

Descriptive statistics for epidemiological data.

This module provides functions for calculating confidence intervals for proportions, means, and other descriptive statistics commonly used in epidemiological analysis.

Classes

class episia.stats.descriptive.CI_Method(value)

Bases: Enum

Methods for calculating confidence intervals.

AGRESTI_COULL = 'agresti_coull'
CLOPPER_PEARSON = 'clopper_pearson'
DELTA = 'delta'
JEFFREYS = 'jeffreys'
WALD = 'wald'
WILSON = 'wilson'

class episia.stats.descriptive.ProportionResult(proportion, ci_lower, ci_upper, sample_size, numerator, denominator, method)

Bases: object

Rich result object for proportion calculations.

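Parameters:
  • proportion (float)

  • ci_lower (float)

  • ci_upper (float)

  • sample_size (int)

  • numerator (int)

  • denominator (int)

  • method (str)

__repr__()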

Return repr(self).

Return type:

str

ci_lower: float
ci_upper: float
denominator: int
method: str
numerator: int
proportion: float
sample_size: int

to_dict()

Convert result to dictionary.

Return type:

Dict

class episia.stats.descriptive.MeanResult(mean, ci_lower, ci_upper, sample_size, std_dev, method)

Bases: object

Rich result object for mean calculations.

Parameters:
  • mean (float)

  • ci_lower (float)

  • ci_upper (float)

  • sample_size (int)

  • std_dev (float)

  • method (str)

__repr__()

Return repr(self).

Return type:

str

ci_lower: float
ci_upper: float
mean: float
method: str
sample_size: int
std_dev: float

to_dict()

Convert result to dictionary.

Return type:

Dict

Functions

episia.stats.descriptive.proportion_ci(numerator=None, denominator=None, method=CI_Method.WILSON, confidence=0.95, *, k=None, n=None, **kwargs)

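Calculate a proportion with its confidence interval. The counts can be passed positionally, as in proportion_ci(45, 200), or through the keyword aliases, as in proportion_ci(k=45, n=200).

Parameters:
  • numerator – Number of events; may be passed as the keyword k instead.

  • denominator – Total number of observations; may be passed as the keyword n instead.

  • method (CI_Method) – CI method (default: CI_Method.WILSON).

  • confidence (float) – Confidence level (default: 0.95).
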
Return type:

ProportionResult
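
As an illustration of what the default method conventionally computes, here is a minimal standard-library sketch of the Wilson score interval without continuity correction. The function name wilson_interval is hypothetical and this is not the episia source; whether the library uses exactly this variant is an assumption.

```python
from statistics import NormalDist


def wilson_interval(k: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for k events in n trials (no continuity correction)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * ((p * (1 - p) / n + z**2 / (4 * n**2)) ** 0.5) / denom
    return centre - half_width, centre + half_width


lower, upper = wilson_interval(1056, 4800)
print(f"{1056 / 4800:.4f} ({lower:.4f}-{upper:.4f})")
```

For 1056 events out of 4800 the sketch gives bounds of about 0.2085 and 0.2319, which agrees with the prevalence docstring example on this page.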

episia.stats.descriptive.mean_ci(data, confidence=0.95, method='t_distribution', population_std=None)

Calculate mean with confidence interval.

Parameters:
  • data (ndarray) – Array-like numeric data

  • confidence (float) – Confidence level (default: 0.95)

  • method (str) – 't_distribution' (small samples) or 'normal' (large samples)

  • population_std (float | None) – Known population standard deviation (optional)

Returns:

MeanResult object

Return type:

MeanResult

Example

>>> import numpy as np
>>> data = np.array([1.2, 1.5, 1.8, 2.1, 1.9])
>>> result = mean_ci(data)
>>> print(result.mean)
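
The 'normal' method and the population_std parameter correspond to the textbook z-interval, which can be sketched with the standard library alone. The name normal_mean_ci and the value 0.35 for the known standard deviation are illustrative assumptions, and the library's own implementation may differ in detail.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def normal_mean_ci(data, confidence=0.95, population_std=None):
    """Textbook z-interval: mean +/- z * sd / sqrt(n)."""
    n = len(data)
    if n < 2 and population_std is None:
        raise ValueError("need at least two observations to estimate the SD")
    mean = fmean(data)
    # Use the known population SD when given, else the sample SD (n - 1 divisor).
    sd = population_std if population_std is not None else stdev(data)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sd / sqrt(n)
    return mean, mean - half_width, mean + half_width


# 0.35 is a made-up "known" population SD, used only for illustration.
mean, lower, upper = normal_mean_ci([1.2, 1.5, 1.8, 2.1, 1.9], population_std=0.35)
print(f"{mean:.3f} ({lower:.3f}-{upper:.3f})")
```

With only five observations the t-based default would normally be preferred; the z-interval is shown here because it needs no t quantile function.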

episia.stats.descriptive.incidence_rate(cases, person_time, confidence=0.95, multiplier=1)

Calculate person-time incidence rate with confidence interval.

Uses Byar’s approximation when cases >= 10 and the exact Poisson (chi-squared) interval otherwise, consistent with OpenEpi and Rothman & Greenland.

Parameters:
  • cases (int) – Number of incident cases.

  • person_time (float) – Total person-time at risk (any unit: years, months, days).

  • confidence (float) – Confidence level (default 0.95).

  • multiplier (int) – Scale factor for display, e.g. 100, 1_000, 100_000. It does not change the stored rate; it only scales the values shown by __repr__.

Returns:

IncidenceRateResult

Return type:

IncidenceRateResult

Example

>>> # HIV seroconversion cohort, 20 cases / 500 person-years
>>> result = incidence_rate(20, 500, multiplier=100)
>>> print(result)
Rate: 4.0000 (2.4423-6.1780) per 100 person-time
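
Byar's approximation, which the description above names for cases >= 10, can be sketched in a few lines. The function name byar_rate_ci is hypothetical and the code is an illustration of the published formula, not the episia implementation; the exact Poisson branch for small counts is not covered.

```python
from math import sqrt
from statistics import NormalDist


def byar_rate_ci(cases, person_time, confidence=0.95, multiplier=1):
    """Byar's approximation to the Poisson CI for a rate (intended for cases >= 10)."""
    if cases <= 0 or person_time <= 0:
        raise ValueError("cases and person_time must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Limits for the expected count; the upper limit uses cases + 1.
    lower_count = cases * (1 - 1 / (9 * cases) - z / (3 * sqrt(cases))) ** 3
    upper_count = (cases + 1) * (
        1 - 1 / (9 * (cases + 1)) + z / (3 * sqrt(cases + 1))
    ) ** 3
    scale = multiplier / person_time
    return cases * scale, lower_count * scale, upper_count * scale


rate, lower, upper = byar_rate_ci(20, 500, multiplier=100)
print(f"Rate: {rate:.4f} ({lower:.4f}-{upper:.4f})")
```

For 20 cases over 500 person-years with multiplier=100, the sketch gives the same estimate and bounds as the docstring example above.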

episia.stats.descriptive.attack_rate(cases, population, confidence=0.95)

Calculate attack rate (cumulative incidence) with CI.

Parameters:
  • cases (int) – Number of cases

  • population (int) – Population at risk

  • confidence (float) – Confidence level (default: 0.95)

Returns:

ProportionResult object

Return type:

ProportionResult

episia.stats.descriptive.prevalence(cases, population, confidence=0.95, method=CI_Method.WILSON)

Calculate point prevalence with confidence interval.

Parameters:
  • cases (int) – Number of prevalent cases (existing cases at time T).

  • population (int) – Total population examined.

  • confidence (float) – Confidence level (default 0.95).

  • method (CI_Method) – CI method (default Wilson).

Returns:

ProportionResult

Return type:

ProportionResult

Example

>>> # Hypertension (HTA) survey, Burkina Faso STEPS 2013
>>> result = prevalence(1056, 4800)
>>> print(result)
Proportion: 0.2200 (0.2085-0.2319)

episia.stats.descriptive.median_ci(data, confidence=0.95, method='exact')

Calculate median with confidence interval.

Parameters:
  • data (ndarray) – Array-like numeric data

  • confidence (float) – Confidence level (default: 0.95)

  • method (str) – 'exact' (default) or 'normal_approximation'

Returns:

Dictionary with the median and its confidence limits (keys 'median', 'ci_lower' and 'ci_upper', as used in the Examples section)

Return type:

Dict[str, float]
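
One common reading of an 'exact' median interval is the distribution-free construction from order statistics, based on the Binomial(n, 0.5) distribution. The sketch below follows that construction; the name exact_median_ci is hypothetical, and whether episia's 'exact' method applies precisely this rule is an assumption.

```python
from math import comb
from statistics import median


def exact_median_ci(data, confidence=0.95):
    """Distribution-free CI for the median from order statistics (binomial, p = 0.5)."""
    x = sorted(data)
    n = len(x)
    alpha = 1 - confidence
    # Find the largest r with P(Binomial(n, 0.5) <= r - 1) <= alpha / 2.
    cumulative, r = 0.0, 0
    for j in range(n):
        cumulative += comb(n, j) / 2**n
        if cumulative > alpha / 2:
            break
        r = j + 1
    if r == 0:
        raise ValueError("sample too small for this confidence level")
    # [x_(r), x_(n-r+1)] (1-based) covers the median with probability >= confidence.
    return {"median": median(x), "ci_lower": x[r - 1], "ci_upper": x[n - r]}


print(exact_median_ci([23, 25, 27, 22, 24, 26, 28, 21, 23, 25]))
```

For ten observations at 95% confidence this selects the 2nd and 9th order statistics. The interval is conservative: its actual coverage is at least the nominal level and usually somewhat higher.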

episia.stats.descriptive.interquartile_range(data, return_quartiles=False)

Calculate interquartile range (IQR).

Parameters:
  • data (ndarray) – Array-like numeric data

  • return_quartiles (bool) – If True, returns Q1, Q3, and IQR

Returns:

IQR value or dictionary with quartiles

Return type:

float | Dict[str, float]
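
The two return shapes can be illustrated with a small standard-library sketch. The function name iqr and the dictionary keys 'q1', 'q3' and 'iqr' are hypothetical, since the reference above does not list the keys. The quartile rule, linear interpolation between order statistics, is also an assumption about the library's behaviour.

```python
from statistics import quantiles


def iqr(data, return_quartiles=False):
    """IQR = Q3 - Q1, with quartiles from linear interpolation between order statistics."""
    # method="inclusive" treats the data as containing the minimum and maximum,
    # so Q1 and Q3 are interpolated at positions 0.25 * (n - 1) and 0.75 * (n - 1).
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    if return_quartiles:
        return {"q1": q1, "q3": q3, "iqr": q3 - q1}
    return q3 - q1


data = [23, 25, 27, 22, 24, 26, 28, 21, 23, 25]
print(iqr(data))
print(iqr(data, return_quartiles=True))
```

Different quartile conventions give slightly different values on small samples, so results should be compared only between tools that use the same rule.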

Examples

Proportion confidence intervals:

from episia.stats.descriptive import proportion_ci, CI_Method

# Wilson interval (recommended)
prop = proportion_ci(45, 100, method=CI_Method.WILSON)
print(prop)  # Proportion: 0.4500 (0.3561-0.5476)

# Wald interval (large samples only)
prop_wald = proportion_ci(45, 100, method=CI_Method.WALD)

# Exact Clopper-Pearson (conservative)
prop_exact = proportion_ci(5, 10, method=CI_Method.CLOPPER_PEARSON)
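
The "large samples only" caveat on the Wald interval can be seen from the formula itself. The sketch below is a plain implementation of the textbook Wald interval with no truncation. The name wald_interval is hypothetical, and whether episia clips the bounds to [0, 1] is not stated in this reference.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(k, n, confidence=0.95):
    """Wald interval: p +/- z * sqrt(p * (1 - p) / n), with no truncation to [0, 1]."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = k / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


print(wald_interval(1, 10))    # lower bound falls below 0
print(wald_interval(0, 10))    # collapses to (0.0, 0.0)
print(wald_interval(45, 100))  # well-behaved near p = 0.5 with n = 100
```

With few events the interval either leaves the valid range or shrinks to zero width, which is why score-based or exact methods are usually preferred for small counts.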

Mean confidence intervals:

import numpy as np
from episia.stats.descriptive import mean_ci

data = np.array([23, 25, 27, 22, 24, 26, 28, 21, 23, 25])
mean_result = mean_ci(data, confidence=0.95)
print(mean_result)  # Mean: 24.4000 (22.811-25.989), t interval with the sample SD

Incidence rates:

from episia.stats.descriptive import incidence_rate

# 10 cases over 1000 person-years
ir = incidence_rate(cases=10, person_time=1000)
print(ir)  # incidence_rate returns an IncidenceRateResult; printing shows the rate with its 95% CI

Median with confidence interval:

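import numpy as np
from episia.stats.descriptive import median_ci
data = np.array([23, 25, 27, 22, 24, 26, 28, 21, 23, 25])
median_result = median_ci(data, method='exact')
print(f"Median: {median_result['median']:.1f} "
      f"({median_result['ci_lower']:.1f}-{median_result['ci_upper']:.1f})")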