Automated Bayesian-frequentist statistics and publication-ready reports

From raw data to APA-formatted PDF, DOCX, and HTML — now with 15+ data formats, AI-powered chat, and autonomous robustness checks.

Install from PyPI · Read the Docs · GitHub

Terminal

$ pip install "statforge[full]"
Collecting statforge[full]...
✔ Successfully installed statforge-0.2.0

$ statforge run experiment_data.csv
▸ DataLoader done
▸ AssumptionChecker done
▸ MethodSelector done
▸ ModelFitter done
▸ ResultFormatter done
▸ ReportBuilder done
✔ report.pdf generated (APA7 format)

Architecture

The Six-Stage Asynchronous Pipeline

Each stage runs in a worker thread via asyncio.to_thread, so CPU-bound computation stays off the event loop while real-time progress events stream to the CLI.
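The pattern can be sketched as follows. This sketch assumes each stage is a plain blocking function that takes the previous stage's output. The stage bodies, the events queue, and names such as run_pipeline and stream_progress are hypothetical placeholders, not StatForge internals. Threads keep the event loop responsive. Pure-Python CPU work still shares the GIL, although libraries such as NumPy release it inside many routines.

```python
import asyncio
import time
from typing import Any, Callable, List

STAGE_NAMES = [
    "DataLoader", "AssumptionChecker", "MethodSelector",
    "ModelFitter", "ResultFormatter", "ReportBuilder",
]


def make_stage(name: str) -> Callable[[List[str]], List[str]]:
    """Build a placeholder blocking stage that appends its own name (hypothetical)."""
    def stage(payload: List[str]) -> List[str]:
        time.sleep(0.01)  # stand-in for CPU-bound work in a real stage
        return payload + [name]
    stage.__name__ = name
    return stage


async def run_pipeline(payload: List[str], events: "asyncio.Queue[Any]") -> List[str]:
    for stage in map(make_stage, STAGE_NAMES):
        # Run the blocking stage in a worker thread; the event loop stays free.
        payload = await asyncio.to_thread(stage, payload)
        await events.put(f"▸ {stage.__name__} done")
    await events.put(None)  # sentinel: no more progress events
    return payload


async def stream_progress(events: "asyncio.Queue[Any]") -> None:
    # Print each event as soon as it arrives, like a CLI progress view.
    while (event := await events.get()) is not None:
        print(event)


async def main() -> List[str]:
    events: "asyncio.Queue[Any]" = asyncio.Queue()
    result, _ = await asyncio.gather(run_pipeline([], events), stream_progress(events))
    return result


if __name__ == "__main__":
    asyncio.run(main())
```

Because the stages run sequentially and only progress reporting is concurrent, the queue provides a simple way to decouple "work done" from "work displayed".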

📂

DataLoader

15+ formats: CSV, TSV, JSON, Excel, Parquet, Feather, SPSS, Stata, SAS, HDF5, SQLite, and remote URLs. Automatic format detection with lazy dependency loading.
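The sketch below shows one way to implement extension-based detection with lazy loading, with pandas readers doing the parsing. The suffix table, detect_reader, and load are illustrative, not StatForge's actual API. Because pandas is imported only inside load, detection still works where pandas is absent. Not every pandas reader accepts a URL (read_hdf, for example, expects a local file). SQLite is left out because it needs a connection and a query rather than a file suffix.

```python
import importlib
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical suffix -> pandas reader table (a subset; SQLite needs a query).
READERS = {
    ".csv": "read_csv", ".tsv": "read_csv", ".json": "read_json",
    ".xlsx": "read_excel", ".xls": "read_excel",
    ".parquet": "read_parquet", ".feather": "read_feather",
    ".sav": "read_spss", ".dta": "read_stata",
    ".sas7bdat": "read_sas", ".xpt": "read_sas",
    ".h5": "read_hdf", ".hdf5": "read_hdf",
}


def suffix_of(source: str) -> str:
    """Return the lowercase file suffix, ignoring any URL query string."""
    path = urlparse(source).path if "://" in source else source
    return Path(path).suffix.lower()


def detect_reader(source: str) -> str:
    suffix = suffix_of(source)
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported format: {suffix or '(no suffix)'}") from None


def load(source: str, **kwargs):
    """Detect the format, then import pandas only when data is actually read."""
    reader_name = detect_reader(source)
    if suffix_of(source) == ".tsv":
        kwargs.setdefault("sep", "\t")  # TSV goes through read_csv with a tab separator
    pd = importlib.import_module("pandas")
    return getattr(pd, reader_name)(source, **kwargs)
```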

🔬

AssumptionChecker

Shapiro-Wilk normality and Levene homoscedasticity tests with borderline detection (0.04 < p < 0.06). Results cached via joblib.Memory keyed on SHA-256 data hashes.
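The following minimal sketch combines the checks with a SHA-256 cache key. It assumes SciPy's shapiro and levene supply the tests. For brevity, a module-level dict stands in for joblib.Memory, which persists results to disk. The names data_key, is_borderline, and check_assumptions are hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Sequence

BORDERLINE_BAND = (0.04, 0.06)  # band quoted in the description above
_cache: Dict[str, dict] = {}    # in-memory stand-in for joblib.Memory


def data_key(groups: Sequence[Sequence[float]]) -> str:
    """SHA-256 over a canonical JSON encoding of the numeric data."""
    canonical = json.dumps([[float(x) for x in g] for g in groups])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_borderline(p: float, band=BORDERLINE_BAND) -> bool:
    low, high = band
    return low < p < high


def check_assumptions(groups: List[List[float]]) -> dict:
    key = data_key(groups)
    if key in _cache:  # identical data: reuse the earlier results
        return _cache[key]
    from scipy import stats  # imported lazily; needs SciPy installed

    shapiro_p = [float(stats.shapiro(g).pvalue) for g in groups]
    levene_p = float(stats.levene(*groups).pvalue)
    result = {
        "shapiro_p": shapiro_p,
        "levene_p": levene_p,
        "borderline": any(is_borderline(p) for p in shapiro_p + [levene_p]),
    }
    _cache[key] = result
    return result
```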

🧭

MethodSelector

A decision tree that maps the number of groups and the assumption-check results to a ranked list of candidate tests. Routes to t_test, mann_whitney, anova, kruskal_wallis, or regression.
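The routing logic can be expressed as a small decision function. This sketch makes three assumptions. Parametric tests rank first only when both normality and equal variances hold. Continuous predictors go straight to regression. The route names come from the description above. The select_methods signature is hypothetical.

```python
from typing import List


def select_methods(n_groups: int, normal: bool, equal_var: bool,
                   continuous_predictor: bool = False) -> List[str]:
    """Return candidate tests, best-suited first (hypothetical routing rules)."""
    if continuous_predictor:
        return ["regression"]
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    parametric_ok = normal and equal_var
    # Pair each parametric test with its usual non-parametric counterpart.
    ranked = ["t_test", "mann_whitney"] if n_groups == 2 else ["anova", "kruskal_wallis"]
    return ranked if parametric_ok else ranked[::-1]
```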

⚙️

ModelFitter

Plugin registry that dispatches to models registered via the @register decorator. Ships with frequentist (ANOVA, t-test) and Bayesian (PyMC t-test, ANOVA, regression) models.
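The decorator-based registry pattern takes only a few lines of code. Everything here is illustrative. The register signature, MODEL_REGISTRY, fit_model, and the toy StudentTTest class are assumptions about how such a registry could look. StudentTTest is not one of the shipped models.

```python
import math
import statistics
from typing import Callable, Dict, Sequence

MODEL_REGISTRY: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that stores a model class under `name` (hypothetical signature)."""
    def decorator(cls: type) -> type:
        if name in MODEL_REGISTRY:
            raise KeyError(f"model {name!r} is already registered")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register("t_test")
class StudentTTest:
    """Toy frequentist model: pooled-variance two-sample t statistic."""
    def fit(self, a: Sequence[float], b: Sequence[float]) -> dict:
        na, nb = len(a), len(b)
        pooled_var = ((na - 1) * statistics.variance(a)
                      + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
        se = math.sqrt(pooled_var * (1 / na + 1 / nb))
        t = (statistics.fmean(a) - statistics.fmean(b)) / se
        return {"t": t, "df": na + nb - 2}


def fit_model(name: str, *args, **kwargs) -> dict:
    """Look up a registered model by name, instantiate it, and fit it."""
    try:
        model_cls = MODEL_REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown model {name!r}; available: {sorted(MODEL_REGISTRY)}") from None
    return model_cls().fit(*args, **kwargs)
```

New models become available simply by importing a module that applies the decorator, so the dispatcher never needs editing.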

📊

ResultFormatter

APA7-compliant table builder with effect sizes (η², Cohen's d, R²). Formats p-values per APA convention: p < .001 for very small values, otherwise exact to three decimal places with no leading zero.
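Two of these formatting rules are easy to sketch directly. This is an illustration rather than StatForge's implementation. format_p and cohen_d are hypothetical helper names, and cohen_d assumes two independent samples and a pooled standard deviation.

```python
import math
import statistics
from typing import Sequence


def format_p(p: float) -> str:
    """APA-style p-value: '< .001' below .001, else three decimals, no leading zero."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def cohen_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cohen's d using the pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    pooled_sd = math.sqrt(((na - 1) * statistics.variance(a)
                           + (nb - 1) * statistics.variance(b)) / (na + nb - 2))
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_sd
```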

📄

ReportBuilder

Jinja2 template orchestrator generating PDF, DOCX, or HTML output. Supports pluggable journal templates (APA7, Vancouver, IEEE, Nature).
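Pluggable templates map naturally onto a Jinja2 loader keyed by style name. The sketch below assumes Jinja2 is installed and renders HTML only. The template names and render_report are hypothetical. Producing PDF or DOCX would need an extra conversion step that is not shown here.

```python
from jinja2 import DictLoader, Environment, select_autoescape

# Hypothetical in-memory templates; a real setup would load one file per journal style.
TEMPLATES = {
    "apa7.html": "<h1>{{ title }}</h1>\n<p>{{ test }}, {{ p }}</p>",
    "ieee.html": "<h1>{{ title | upper }}</h1>\n<p>{{ test }} ({{ p }})</p>",
}

env = Environment(
    loader=DictLoader(TEMPLATES),
    autoescape=select_autoescape(["html"]),  # escape values rendered into .html templates
)


def render_report(style: str, **context: str) -> str:
    """Render the template registered for `style`; unknown styles raise TemplateNotFound."""
    return env.get_template(f"{style}.html").render(**context)
```

Adding a journal style then means adding one template entry, with no change to the rendering code.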

What's New

v0.2.0 — Major Update

Three headline features expand StatForge from a pipeline runner into a full research companion.

📂

15+ Data Formats

Load CSV, TSV, JSON, Excel, Parquet, Feather, SPSS, Stata, SAS, HDF5, SQLite databases, and remote URLs. Optional dependencies are imported lazily with clear install hints.
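Lazy optional imports with install hints could work as shown in the sketch below, which assumes that each optional format maps to one importable package. The mapping lists packages that pandas commonly uses for these readers: pyarrow; openpyxl for .xlsx; pyreadstat; and PyTables, installed as "tables". However, require and require_for are hypothetical helpers, not StatForge functions.

```python
import importlib
from types import ModuleType
from typing import Dict, Optional

# Illustrative map from a pandas reader to the optional package it typically uses.
OPTIONAL_DEPS: Dict[str, str] = {
    "read_parquet": "pyarrow",
    "read_feather": "pyarrow",
    "read_excel": "openpyxl",  # .xlsx files
    "read_spss": "pyreadstat",
    "read_hdf": "tables",
}


def require(module: str, pip_name: Optional[str] = None) -> ModuleType:
    """Import `module` on first use, or fail with a copy-pasteable install hint."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        hint = pip_name or module
        raise ImportError(
            f"{module!r} is needed for this format. Install it with: pip install {hint}"
        ) from exc


def require_for(reader: str) -> Optional[ModuleType]:
    """Import the optional dependency of a reader, if it has one."""
    dep = OPTIONAL_DEPS.get(reader)
    return require(dep) if dep else None
```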

💬

AI Data Analyst Chat

Run statforge chat data.csv to explore your dataset interactively. Each row becomes a searchable document (microgpt philosophy). Connects to the Claude API or uses a built-in rule engine.
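A plain keyword search can illustrate the row-as-document idea. This sketch uses simple token overlap in place of whatever retrieval StatForge actually performs. rows_to_documents and search are hypothetical names. A chat layer could then pass the top matches to the model or rule engine as context.

```python
import csv
import io
import re
from collections import Counter
from typing import List


def rows_to_documents(csv_text: str) -> List[str]:
    """Turn each CSV row into one 'column=value' text document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [", ".join(f"{col}={val}" for col, val in row.items()) for row in reader]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9_.]+", text.lower())


def search(docs: List[str], query: str, k: int = 3) -> List[str]:
    """Rank documents by query-token overlap; ties keep the original row order."""
    wanted = Counter(tokenize(query))
    scored = []
    for idx, doc in enumerate(docs):
        have = Counter(tokenize(doc))
        score = sum(min(n, have[tok]) for tok, n in wanted.items())
        if score:
            scored.append((score, idx))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [docs[idx] for _, idx in scored[:k]]
```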

🔄

Autonomous Robustness

Pass --auto to the run command. When an assumption test returns a borderline p-value (0.04 < p < 0.06), both the parametric and the non-parametric test run automatically and their results are compared.
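For the two-group case, the dual-run idea might look like the sketch below. It uses SciPy's ttest_ind and mannwhitneyu as the parametric and non-parametric tests. robustness_check and compare_conclusions are hypothetical names. Here, "agreement" simply means that both tests reach the same decision at α.

```python
from typing import Dict, Sequence


def compare_conclusions(p_values: Dict[str, float], alpha: float = 0.05) -> dict:
    """Report each test's decision at `alpha` and whether all tests agree."""
    decisions = {name: bool(p < alpha) for name, p in p_values.items()}
    return {"significant": decisions, "agree": len(set(decisions.values())) == 1}


def robustness_check(a: Sequence[float], b: Sequence[float], alpha: float = 0.05) -> dict:
    """Run a parametric and a non-parametric two-sample test on the same data."""
    from scipy import stats  # imported lazily; needs SciPy installed

    p_values = {
        "t_test": float(stats.ttest_ind(a, b).pvalue),
        "mann_whitney": float(stats.mannwhitneyu(a, b, alternative="two-sided").pvalue),
    }
    return compare_conclusions(p_values, alpha)
```

When the two tests disagree, the report can present both results instead of silently picking one.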

Core Capabilities

Built for Rigorous Research

Every feature maps directly to the implemented Python codebase. No aspirational claims.

🛡️

Interactive CLI Validation

The statforge validate command performs preliminary data-quality checks: it flags missing values, detects data-type anomalies, and screens for outliers with the interquartile-range (IQR) rule. These checks are entirely decoupled from the statistical models.
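The following minimal sketch runs these three checks on a single column. It uses the conventional Tukey fences, which flag values more than 1.5 × IQR beyond the quartiles. validate_column and its return shape are hypothetical. The quartiles come from statistics.quantiles with its default method, which can differ slightly from other quartile definitions.

```python
import math
import statistics
from typing import Any, Dict, List


def validate_column(values: List[Any], k: float = 1.5) -> Dict[str, List[int]]:
    """Return row indices that are missing, non-numeric, or IQR outliers."""
    missing: List[int] = []
    non_numeric: List[int] = []
    numeric: List[tuple] = []  # (row index, parsed value)
    for i, v in enumerate(values):
        if v is None or v == "" or (isinstance(v, float) and math.isnan(v)):
            missing.append(i)
            continue
        try:
            numeric.append((i, float(v)))
        except (TypeError, ValueError):
            non_numeric.append(i)  # e.g. text in a numeric column
    outliers: List[int] = []
    if len(numeric) >= 4:  # quartiles need a few points to be meaningful
        q1, _, q3 = statistics.quantiles([x for _, x in numeric], n=4)
        low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
        outliers = [i for i, x in numeric if x < low or x > high]
    return {"missing": missing, "non_numeric": non_numeric, "outliers": outliers}
```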

🎯

PriorAdvisor

Auto-suggests data-driven weakly informative priors (Normal with μ = observed mean, σ = 2× observed SD). Documents the rationale for peer review. Runs automated sensitivity analysis across uninformative, weakly informative, and informative prior variants.
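The prior-suggestion rule can be sketched as follows. Only the weakly informative variant (σ = 2 × observed SD) comes from the description above. The scale factors for the "uninformative" and "informative" variants are illustrative assumptions, and a very wide Normal is only approximately uninformative. suggest_priors is a hypothetical name.

```python
import statistics
from typing import Dict, Sequence

# Only the weakly informative factor (2 x SD) comes from the description;
# the other two scale factors are illustrative assumptions.
PRIOR_SCALES: Dict[str, float] = {
    "uninformative": 100.0,
    "weakly_informative": 2.0,
    "informative": 0.5,
}


def suggest_priors(values: Sequence[float]) -> Dict[str, dict]:
    """Normal priors centered on the observed mean, with SD-scaled widths."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return {
        name: {
            "dist": "Normal",
            "mu": mu,
            "sigma": scale * sd,
            "rationale": f"Normal(mu = observed mean, sigma = {scale:g} x observed SD)",
        }
        for name, scale in PRIOR_SCALES.items()
    }
```

Each entry could feed a PyMC prior such as pm.Normal("mu", mu=..., sigma=...). Refitting the model once per variant would then provide the sensitivity comparison.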

📝

Automated Methods Section

The MethodsBuilder synthesizes journal-ready prose specifying the exact test name, StatForge version, assumption rationale, significance threshold (α), and effect size metric with interpretation scale (per Cohen, 1988).
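A template-based sketch of this synthesis, restricted to Cohen's d, appears below. The benchmarks 0.2, 0.5, and 0.8 are Cohen's (1988) conventions for d; η² and R² use different scales. The sentence wording, methods_paragraph, and interpret_d are illustrative assumptions, not actual MethodsBuilder output.

```python
def interpret_d(d: float) -> str:
    """Cohen's (1988) benchmarks for |d|: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


def methods_paragraph(test_name: str, version: str, rationale: str,
                      alpha: float = 0.05) -> str:
    """Assemble a methods paragraph for a Cohen's d analysis (hypothetical template)."""
    alpha_text = f"{alpha:g}".lstrip("0")  # APA style: no leading zero for alpha
    return (
        f"Group differences were tested with {test_name} using StatForge "
        f"(version {version}). {rationale} The significance threshold was set at "
        f"α = {alpha_text}. Effect sizes are reported as Cohen's d and interpreted "
        f"using Cohen's (1988) benchmarks (0.2 = small, 0.5 = medium, 0.8 = large)."
    )
```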

Deployment

Deployment & Analysis Services

StatForge is free and open-source. Professional services are available for institutions requiring bespoke configurations.

Open Source

Free

Full CLI pipeline execution
Frequentist & Bayesian model suite
APA7 and other standardized report formats
Plugin registry for custom models
Community support via GitHub

Install via pip

Professional

Custom Solutions

Custom Quote

Bespoke PyMC model development
Institutional / lab deployment
Custom Jinja2 journal templates
Onboarding & training sessions
Priority engineering support

Contact for Details