validator Module

Data validation functions for ensuring data quality in epidemiological analyses.

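The functions check scalar inputs (proportions, confidence levels, sample sizes, positive values), 2x2 tables, pandas and NumPy containers, and model parameter dictionaries. Each returns the validated value and raises ValidationError when the input is invalid.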

Exceptions

exception episia.core.validator.ValidationError

Bases: ValueError

Custom exception for validation errors.
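
Because ValidationError derives from ValueError, callers can catch it specifically or through a general `except ValueError`. The sketch below illustrates this with a local stand-in class that has the documented base, so it runs without the package installed. The `parse_rate` and `describe` helpers are hypothetical and not part of episia.

```python
# Local stand-in mirroring the documented base class. In real code you would
# import ValidationError from episia.core.validator instead.
class ValidationError(ValueError):
    """Stand-in for episia.core.validator.ValidationError."""


def parse_rate(text):
    """Hypothetical helper: convert text to a rate in [0, 1] or raise."""
    try:
        rate = float(text)
    except ValueError:
        raise ValidationError(f"rate must be numeric, got {text!r}") from None
    if not 0.0 <= rate <= 1.0:
        raise ValidationError(f"rate must be between 0 and 1, got {rate}")
    return rate


def describe(text):
    """Show that a plain `except ValueError` also catches ValidationError."""
    try:
        return f"ok: {parse_rate(text)}"
    except ValueError as exc:
        return f"rejected: {exc}"
```

Catching ValidationError by name is the narrower choice when other ValueError sources should propagate unchanged.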

Functions

episia.core.validator.validate_2x2_table(a, b, c, d, allow_zero=True)

Validate 2x2 contingency table values.

Parameters:
  • a, b, c, d (Any) – Table cell values

  • allow_zero (bool) – Whether zero values are allowed

Returns:

Validated integers

Raises:

ValidationError – If values are invalid

Return type:

Tuple[int, int, int, int]
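
The source does not show how the checks are implemented. As a rough illustration of the documented contract, the stand-in below assumes that cells must be whole, non-negative numbers and that zeros are rejected only when `allow_zero=False`. `validate_2x2_table_sketch` and `odds_ratio` are hypothetical names, not episia functions.

```python
class ValidationError(ValueError):
    """Stand-in for episia.core.validator.ValidationError."""


def validate_2x2_table_sketch(a, b, c, d, allow_zero=True):
    """Assumed checks: whole-number, non-negative, optionally non-zero cells."""
    cells = []
    for label, value in zip("abcd", (a, b, c, d)):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool):
            raise ValidationError(f"cell {label} must be a count, got {value!r}")
        try:
            as_int = int(value)
        except (TypeError, ValueError, OverflowError):
            raise ValidationError(f"cell {label} is not numeric: {value!r}") from None
        if as_int != value:
            raise ValidationError(f"cell {label} must be a whole number, got {value!r}")
        if as_int < 0:
            raise ValidationError(f"cell {label} must be non-negative, got {as_int}")
        if as_int == 0 and not allow_zero:
            raise ValidationError(f"cell {label} must be non-zero")
        cells.append(as_int)
    return tuple(cells)


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio; zero cells are rejected to avoid dividing by zero."""
    a, b, c, d = validate_2x2_table_sketch(a, b, c, d, allow_zero=False)
    return (a * d) / (b * c)
```

Passing `allow_zero=False` is useful for measures such as the cross-product odds ratio, where a zero in `b` or `c` would otherwise cause a division by zero.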

episia.core.validator.validate_proportion(value, name='proportion', allow_boundary=True)

Validate that a value is a proportion between 0 and 1.

Parameters:
  • value (Any) – Value to validate

  • name (str) – Name for error messages

  • allow_boundary (bool) – Whether 0 and 1 are allowed

Returns:

Validated proportion

Raises:

ValidationError – If value is invalid

Return type:

float

episia.core.validator.validate_confidence_level(confidence, name='confidence level')

Validate confidence level (0 < confidence < 1).

Parameters:
  • confidence (Any) – Confidence level to validate

  • name (str) – Name for error messages

Returns:

Validated confidence level

Raises:

ValidationError – If confidence is invalid

Return type:

float
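
One reason the bounds are strict is that a confidence level is typically converted to a critical value, which is undefined at 0 and 1. The sketch below shows that conversion using the standard library's `statistics.NormalDist`. The stand-in `validate_confidence_level_sketch` and the helper `z_critical` are hypothetical and only assume the documented rule 0 < confidence < 1.

```python
from statistics import NormalDist


class ValidationError(ValueError):
    """Stand-in for episia.core.validator.ValidationError."""


def validate_confidence_level_sketch(confidence, name="confidence level"):
    """Assumed check: a real number strictly between 0 and 1."""
    try:
        value = float(confidence)
    except (TypeError, ValueError):
        raise ValidationError(f"{name} must be numeric, got {confidence!r}") from None
    if not 0.0 < value < 1.0:
        raise ValidationError(f"{name} must satisfy 0 < value < 1, got {value}")
    return value


def z_critical(confidence):
    """Two-sided standard normal critical value for the given confidence level."""
    level = validate_confidence_level_sketch(confidence)
    alpha = 1.0 - level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)
```

With this rule, passing a percentage such as `95` instead of `0.95` is rejected up front rather than failing later inside the quantile computation.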

episia.core.validator.validate_sample_size(n, name='sample size', min_size=1)

Validate sample size.

Parameters:
  • n (Any) – Sample size to validate

  • name (str) – Name for error messages

  • min_size (int) – Minimum allowed sample size

Returns:

Validated sample size

Raises:

ValidationError – If sample size is invalid

Return type:

int

episia.core.validator.validate_dataframe(df, required_columns=None, min_rows=1, allow_nan=False)

Validate pandas DataFrame for epidemiological analysis.

Parameters:
  • df (Any) – DataFrame to validate

  • required_columns (List[str] | None) – Columns that must be present

  • min_rows (int) – Minimum number of rows

  • allow_nan (bool) – Whether NaN values are allowed

Returns:

Validated DataFrame

Raises:

ValidationError – If DataFrame is invalid

Return type:

DataFrame

episia.core.validator.validate_binary_variable(series, name='binary variable')

Validate that a series contains only binary values (0/1 or True/False).

Parameters:
  • series (Any) – Series to validate

  • name (str) – Name for error messages

Returns:

Validated series

Raises:

ValidationError – If series is invalid

Return type:

Series

episia.core.validator.validate_date_series(dates, name='date series')

Validate date series for time series analysis.

Parameters:
  • dates (Any) – Dates to validate

  • name (str) – Name for error messages

Returns:

Validated DatetimeIndex

Raises:

ValidationError – If dates are invalid

Return type:

DatetimeIndex

episia.core.validator.validate_numeric_array(array, name='numeric array', min_length=1, allow_nan=False, allow_inf=False)

Validate numeric array.

Parameters:
  • array (Any) – Array to validate

  • name (str) – Name for error messages

  • min_length (int) – Minimum array length

  • allow_nan (bool) – Whether NaN values are allowed

  • allow_inf (bool) – Whether infinite values are allowed

Returns:

Validated numpy array

Raises:

ValidationError – If array is invalid

Return type:

ndarray

episia.core.validator.validate_model_parameters(params, required_params, param_types)

Validate model parameters.

Parameters:
  • params (Dict[str, Any]) – Parameter dictionary

  • required_params (List[str]) – Required parameter names

  • param_types (Dict[str, type]) – Expected types for parameters

Returns:

Validated parameters

Raises:

ValidationError – If parameters are invalid

Return type:

Dict[str, Any]
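
The three arguments can be read as: the dictionary to check, the keys that must be present, and the expected type for each key. The stand-in below is a minimal sketch of that contract, assuming a missing required key or an `isinstance` mismatch raises ValidationError. The actual episia behavior (for example, whether an int is accepted where a float is expected) is not stated in the source, and the `sir_params` values are made up for illustration.

```python
class ValidationError(ValueError):
    """Stand-in for episia.core.validator.ValidationError."""


def validate_model_parameters_sketch(params, required_params, param_types):
    """Assumed checks: required keys present, typed keys match their type."""
    missing = [key for key in required_params if key not in params]
    if missing:
        raise ValidationError(f"missing required parameters: {missing}")
    for key, expected in param_types.items():
        if key in params and not isinstance(params[key], expected):
            raise ValidationError(
                f"parameter {key!r} should be {expected.__name__}, "
                f"got {type(params[key]).__name__}"
            )
    return params


# Hypothetical parameters for a compartmental model.
sir_params = {"beta": 0.3, "gamma": 0.1, "population": 10_000}
validated = validate_model_parameters_sketch(
    sir_params,
    required_params=["beta", "gamma", "population"],
    param_types={"beta": float, "gamma": float, "population": int},
)
```

Under the `isinstance` assumption used here, `{"beta": 1}` would be rejected for a `float` parameter, so callers would need to pass `1.0`.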

episia.core.validator.check_convergence(values, tolerance=1e-06, max_iterations=1000, iteration=0)

Check whether an iterative algorithm has converged.

Parameters:
  • values (ndarray) – Current values

  • tolerance (float) – Convergence tolerance

  • max_iterations (int) – Maximum allowed iterations

  • iteration (int) – Current iteration number

Returns:

True if converged

Raises:

ValidationError – If the maximum number of iterations is exceeded

Return type:

bool
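
The source does not say how convergence is measured from `values`. The sketch below shows one way such a check might drive an iterative loop, under the assumption that `values` holds successive estimates and that convergence means the last two differ by less than `tolerance`. It uses a plain list instead of an ndarray to stay dependency-free. `check_convergence_sketch` and `newton_sqrt` are hypothetical names.

```python
class ValidationError(ValueError):
    """Stand-in for episia.core.validator.ValidationError."""


def check_convergence_sketch(values, tolerance=1e-6, max_iterations=1000, iteration=0):
    """Assumed rule: converged when the last two estimates differ by < tolerance."""
    if iteration > max_iterations:
        raise ValidationError(f"no convergence after {max_iterations} iterations")
    if len(values) < 2:
        return False
    return abs(values[-1] - values[-2]) < tolerance


def newton_sqrt(target, start=1.0):
    """Newton's method for sqrt(target), stopped by the convergence check."""
    estimates = [start]
    iteration = 0
    while not check_convergence_sketch(estimates, iteration=iteration):
        current = estimates[-1]
        estimates.append(0.5 * (current + target / current))
        iteration += 1
    return estimates[-1], iteration
```

Raising on the iteration limit, as documented, means a loop like this cannot spin forever: it either returns a converged estimate or fails with ValidationError.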

episia.core.validator.validate_positive(value, name='value', strict=True)

Validate that a value is positive.

Parameters:
  • value (Any) – Value to validate

  • name (str) – Name for error messages

  • strict (bool) – Whether the check is strict, i.e. whether zero is rejected

Returns:

Validated positive value

Raises:

ValidationError – If value is not positive

Return type:

float

Examples

Validating a 2x2 contingency table:

from episia.core.validator import validate_2x2_table

# Valid table
a, b, c, d = validate_2x2_table(40, 10, 20, 30)

# This would raise ValidationError
# validate_2x2_table(-1, 10, 20, 30)  # Negative value

Validating a proportion:

from episia.core.validator import validate_proportion

p = validate_proportion(0.75, name="attack rate")
# p = validate_proportion(1.2)  # Would raise ValidationError (value > 1)

Validating a DataFrame:

import pandas as pd
from episia.core.validator import validate_dataframe

df = pd.DataFrame({'cases': [10, 20, 30], 'date': ['2023-01-01', '2023-01-02', '2023-01-03']})
df = validate_dataframe(df, required_columns=['cases', 'date'])