regression Module
Regression models for epidemiological analysis.
This module provides functions for fitting and interpreting regression models commonly used in epidemiology, including logistic regression for binary outcomes and Poisson regression for count data.
Classes
- class episia.stats.regression.RegressionType(value)
Bases: Enum
Types of regression models.
- LINEAR = 'linear'
- LOGISTIC = 'logistic'
- POISSON = 'poisson'
- class episia.stats.regression.ModelSelection(value)
Bases: Enum
Model selection criteria.
- AIC = 'aic'
- BIC = 'bic'
- LIKELIHOOD_RATIO = 'likelihood_ratio'
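AIC and BIC are conventionally computed from a fitted model's log-likelihood. The sketch below shows the standard formulas using only the standard library. The function names `aic` and `bic` and the example numbers are illustrative and are not part of episia.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_observations):
    """Bayesian information criterion: k*ln(n) - 2*lnL (lower is better)."""
    return n_params * math.log(n_observations) - 2 * log_likelihood

# Illustrative values only: a 3-parameter model fitted to 200 observations.
print(aic(-120.5, 3))
print(bic(-120.5, 3, 200))
```

BIC penalises each extra parameter by ln(n) rather than 2, so it tends to favour smaller models once n exceeds about 7.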
- class episia.stats.regression.RegressionResult(coefficients, odds_ratios, ci_lower, ci_upper, p_values, variable_names, model_type, n_observations, log_likelihood, aic, bic, convergence, iterations)
Bases: object
Result object for regression analysis.
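The `odds_ratios`, `ci_lower`, and `ci_upper` fields are typically obtained by exponentiating the coefficients and their interval bounds. The sketch below assumes Wald-type 95% intervals; the source does not state which interval episia uses, and `wald_odds_ratio` and its inputs are hypothetical.

```python
import math

def wald_odds_ratio(coefficient, std_error, z=1.959963984540054):
    """Return (OR, lower, upper) for a log-odds coefficient and its standard error."""
    odds_ratio = math.exp(coefficient)
    lower = math.exp(coefficient - z * std_error)
    upper = math.exp(coefficient + z * std_error)
    return odds_ratio, lower, upper

# Illustrative input: a coefficient of ln(2) with standard error 0.25.
or_, lo, hi = wald_odds_ratio(math.log(2.0), 0.25)
print(f"OR={or_:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```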
Functions
- episia.stats.regression.logistic_regression(X, y, variable_names=None, add_intercept=True, method='irls', max_iter=100, tol=1e-06)
Fit logistic regression model for binary outcomes.
- Parameters:
X (ndarray) – Design matrix (n_samples, n_features)
y (ndarray) – Binary outcome (0 or 1)
variable_names (List[str] | None) – Names of predictor variables
add_intercept (bool) – Whether to add intercept term
method (str) – Fitting method ('irls' or 'newton')
max_iter (int) – Maximum iterations
tol (float) – Convergence tolerance
- Returns:
RegressionResult object
- Return type:
RegressionResult
Example
>>> import numpy as np
>>> from episia.stats.regression import logistic_regression
>>> X = np.array([[1, 25], [1, 30], [1, 35], [0, 40]])
>>> y = np.array([1, 1, 0, 0])
>>> result = logistic_regression(X, y, ['exposed', 'age'])
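To show what the `method`, `max_iter`, and `tol` parameters control, here is a minimal sketch of the Newton / IRLS update for logistic regression. It assumes NumPy and is not episia's implementation. The name `irls_logistic` and the toy 2x2 data are hypothetical.

```python
import numpy as np

def irls_logistic(X, y, max_iter=100, tol=1e-6):
    """Fit a logistic model with an intercept; return (beta, converged, iterations)."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    beta = np.zeros(X.shape[1])
    for iteration in range(1, max_iter + 1):
        p = 1.0 / (1.0 + np.exp(-X @ beta))    # fitted probabilities
        w = p * (1.0 - p)                      # IRLS weights
        # Newton step: solve (X' W X) step = X' (y - p)
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            return beta, True, iteration
    return beta, False, max_iter

# Toy data: 10 exposed (6 cases) and 10 unexposed (3 cases).
exposed = np.array([1] * 10 + [0] * 10)
outcome = np.array([1] * 6 + [0] * 4 + [1] * 3 + [0] * 7)
beta, converged, iterations = irls_logistic(exposed, outcome)
print(converged, iterations, np.exp(beta[1]))
```

With a single binary predictor the fitted odds ratio `exp(beta[1])` matches the cross-product ratio of the 2x2 table, which makes the sketch easy to check by hand.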
- episia.stats.regression.poisson_regression(X, y, offset=None, variable_names=None, add_intercept=True, max_iter=100, tol=1e-06)
Fit Poisson regression model for count data.
- Parameters:
X (ndarray) – Design matrix
y (ndarray) – Count outcome (non-negative integers)
offset (ndarray | None) – Offset term (e.g., log(person-time))
variable_names (List[str] | None) – Names of predictor variables
add_intercept (bool) – Whether to add intercept term
max_iter (int) – Maximum iterations
tol (float) – Convergence tolerance
- Returns:
RegressionResult object
- Return type:
RegressionResult
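The role of the `offset` argument is easiest to see in the simplest case. With log(person-time) as the offset and one binary covariate, the model is log(mu) = log(PT) + b0 + b1 * exposed, and the maximum-likelihood estimate of exp(b1) is the ratio of the crude rates. The sketch below uses that closed form with a Wald interval. `rate_ratio` and the counts are hypothetical, and this is not how episia fits the model.

```python
import math

def rate_ratio(cases_exposed, pt_exposed, cases_unexposed, pt_unexposed):
    """Rate ratio and Wald 95% CI for one binary exposure with person-time denominators."""
    rr = (cases_exposed / pt_exposed) / (cases_unexposed / pt_unexposed)
    # Standard error of log(RR) for Poisson counts
    se_log_rr = math.sqrt(1 / cases_exposed + 1 / cases_unexposed)
    z = 1.959963984540054
    return rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)

# Made-up counts: 30 cases in 1000 person-years vs 10 cases in 2000 person-years.
rr, rr_lower, rr_upper = rate_ratio(30, 1000, 10, 2000)
print(f"RR={rr:.2f} (95% CI: {rr_lower:.2f}-{rr_upper:.2f})")
```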
- episia.stats.regression.likelihood_ratio_test(model_full, model_reduced)
Perform likelihood ratio test between nested models.
- Parameters:
model_full (RegressionResult) – Full model (more parameters)
model_reduced (RegressionResult) – Reduced model (fewer parameters)
- Returns:
Dictionary with test statistics
- Return type:
dict
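A likelihood ratio test compares twice the difference in log-likelihoods against a chi-square distribution whose degrees of freedom equal the difference in parameter counts. The sketch below uses only the standard library. The helper names and the input numbers are hypothetical. The keys `lr_statistic` and `p_value` follow the usage example in this page, while `df` is an assumed extra.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1, via the incomplete-gamma recurrence."""
    z = x / 2.0
    a = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(z)) if df % 2 else math.exp(-z)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def lr_test(loglik_full, k_full, loglik_reduced, k_reduced):
    """Likelihood ratio test for nested models given log-likelihoods and parameter counts."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    df = k_full - k_reduced
    return {"lr_statistic": statistic, "df": df, "p_value": chi2_sf(statistic, df)}

# Made-up log-likelihoods: the full model has one extra parameter.
lrt = lr_test(-45.2, 3, -48.1, 2)
print(f"LR test: chi2={lrt['lr_statistic']:.3f}, df={lrt['df']}, p={lrt['p_value']:.4f}")
```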
- episia.stats.regression.hosmer_lemeshow_test(y_true, y_pred, n_groups=10)
Hosmer-Lemeshow goodness-of-fit test for logistic regression.
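The Hosmer-Lemeshow statistic groups observations by predicted probability and compares observed with expected event counts in each group. The sketch below shows the usual form of the statistic, referred to a chi-square distribution with `n_groups - 2` degrees of freedom. The function name, the grouping rule, and the data are assumptions, and episia's return value is not documented here.

```python
def hosmer_lemeshow_statistic(y_true, y_pred, n_groups=10):
    """Return (statistic, df) using equal-size groups ordered by predicted probability."""
    pairs = sorted(zip(y_pred, y_true), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        size = len(group)
        observed = sum(y for _, y in group)   # observed events
        expected = sum(p for p, _ in group)   # expected events
        mean_p = expected / size
        variance = size * mean_p * (1.0 - mean_p)
        if variance > 0:                      # skip degenerate groups
            statistic += (observed - expected) ** 2 / variance
    return statistic, n_groups - 2

# Made-up predictions and outcomes, split into 4 groups of 2.
y_pred = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y_true = [0, 0, 1, 0, 1, 0, 1, 1]
statistic, df = hosmer_lemeshow_statistic(y_true, y_pred, n_groups=4)
print(statistic, df)
```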
- episia.stats.regression.calculate_vif(X)
Calculate Variance Inflation Factors for multicollinearity detection.
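For predictor j, the variance inflation factor is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on the others. These values equal the diagonal of the inverse correlation matrix, which gives a compact sketch, assuming NumPy and a design matrix without an intercept column. `vif_from_correlation` is a hypothetical name and it returns an array, whereas the usage example in this page suggests `calculate_vif` returns a mapping.

```python
import numpy as np

def vif_from_correlation(X):
    """VIF per column of X: the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Made-up data: two predictors with correlation 0.8.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
vif = vif_from_correlation(X)
for name, value in zip(["x1", "x2"], vif):
    print(f"{name}: VIF={value:.2f}")
```

A common rule of thumb treats values above 5 or 10 as a sign of problematic collinearity, although the threshold is a convention rather than a test.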
- episia.stats.regression.stepwise_selection(X, y, model_type=RegressionType.LOGISTIC, direction='both', criterion=ModelSelection.AIC, max_vars=None)
Perform stepwise variable selection.
- Parameters:
X (ndarray) – Design matrix
y (ndarray) – Outcome
model_type (RegressionType) – Type of regression model
direction (str) – 'forward', 'backward', or 'both'
criterion (ModelSelection) – Selection criterion
max_vars (int | None) – Maximum number of variables to include
- Returns:
Dictionary with selected model and steps
- Return type:
dict
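The forward direction of stepwise selection can be sketched as a greedy loop: at each step, add the variable that lowers the criterion most, and stop when no addition improves it or `max_vars` is reached. The sketch below takes the criterion as a callback so it runs without fitting real models. The function name, the keys of the returned dictionary, and the toy AIC table are all hypothetical.

```python
def forward_selection(candidates, score, max_vars=None):
    """Greedy forward selection minimising `score(list_of_variables)`."""
    selected = []
    remaining = list(candidates)
    best_score = score(selected)
    steps = []
    while remaining and (max_vars is None or len(selected) < max_vars):
        # Try adding each remaining variable and keep the best trial
        trial_score, variable = min((score(selected + [v]), v) for v in remaining)
        if trial_score >= best_score:
            break  # no candidate improves the criterion
        selected.append(variable)
        remaining.remove(variable)
        best_score = trial_score
        steps.append((variable, trial_score))
    return {"selected": selected, "score": best_score, "steps": steps}

# Toy AIC table standing in for real model fits (made-up numbers).
toy_aic = {
    frozenset(): 100.0,
    frozenset({"age"}): 90.0,
    frozenset({"exposed"}): 85.0,
    frozenset({"sex"}): 99.0,
    frozenset({"exposed", "age"}): 80.0,
    frozenset({"exposed", "sex"}): 86.0,
    frozenset({"exposed", "age", "sex"}): 81.5,
}
result = forward_selection(["exposed", "age", "sex"], lambda vs: toy_aic[frozenset(vs)])
print(result["selected"], result["score"])
```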
- episia.stats.regression.roc_auc_from_logistic(model, X, y)
Calculate ROC AUC from logistic regression model.
- Parameters:
model (RegressionResult) – Fitted logistic regression model
X (ndarray) – Design matrix (with intercept if model has it)
y (ndarray) – True outcomes
- Returns:
AUC value
- Return type:
float
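The AUC equals the probability that a randomly chosen case receives a higher predicted probability than a randomly chosen non-case, with ties counted as one half. The sketch below computes it directly from that definition using only the standard library. `roc_auc` is a hypothetical helper that takes predicted scores rather than a fitted model.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (case, non-case) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(y_score, y_true) if y == 1]
    negatives = [s for s, y in zip(y_score, y_true) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative outcome")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Made-up outcomes and predicted probabilities.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

The pairwise loop is quadratic in the sample size, which is fine for illustration; a rank-based formula is preferable for large data sets.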
Examples
Logistic regression:
import numpy as np
from episia.stats.regression import logistic_regression
# Data: exposure, age, outcome
X = np.array([[1, 25], [1, 30], [1, 35], [0, 40], [0, 45], [0, 50]])
y = np.array([1, 1, 0, 0, 0, 1])
result = logistic_regression(
    X, y,
    variable_names=['exposed', 'age'],
    add_intercept=True
)
print(result.summary())
# Extract odds ratios
for i, var in enumerate(result.variable_names):
    print(f"{var}: OR={result.odds_ratios[i]:.2f} "
          f"(95% CI: {result.ci_lower[i]:.2f}-{result.ci_upper[i]:.2f})")
Poisson regression:
import numpy as np
from episia.stats.regression import poisson_regression
# Count data with offset (log person-time)
X = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
y = np.array([5, 12, 3, 8])
offset = np.log([100, 100, 100, 100])  # log of person-time in each stratum
result = poisson_regression(
    X, y, offset=offset,
    variable_names=['exposed', 'age']
)
print(result.summary())
Likelihood ratio test:
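from episia.stats.regression import likelihood_ratio_test
# Full model vs reduced model: full_model and reduced_model are RegressionResult
# objects from two earlier fits, where the reduced model is nested in the full one
lrt = likelihood_ratio_test(full_model, reduced_model)
print(f"LR test: χ²={lrt['lr_statistic']:.3f}, p={lrt['p_value']:.4f}")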
Multicollinearity check:
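from episia.stats.regression import calculate_vif
# X is the design matrix of predictors used in the fits above
vif = calculate_vif(X)
for var, value in vif.items():
    print(f"{var}: VIF={value:.2f}")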