Automated Bayesian-frequentist statistics and publication-ready reports

From raw data to APA-formatted PDF, DOCX, and HTML reports, now with 15+ data formats, AI-powered chat, and autonomous robustness checks.

The Six-Stage Asynchronous Pipeline

Each stage runs through asyncio.to_thread, which moves CPU-bound computation onto worker threads so the event loop stays free to stream real-time progress events to the CLI.
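As a rough illustration of this pattern, the sketch below runs two blocking stages through asyncio.to_thread and records a progress event before and after each one. The stage functions, the STAGES list, and the event strings are hypothetical stand-ins, not StatForge's actual classes or event format.

```python
import asyncio
from typing import Any, Callable, List, Tuple

# Hypothetical stand-ins for two pipeline stages (not StatForge's real classes).
def load_stage(raw: Any) -> List[float]:
    return [float(x) for x in raw]

def summarize_stage(values: List[float]) -> dict:
    return {"n": len(values), "mean": sum(values) / len(values)}

STAGES: List[Tuple[str, Callable[[Any], Any]]] = [
    ("load", load_stage),
    ("summarize", summarize_stage),
]

async def run_pipeline(raw: Any, events: List[str]) -> Any:
    """Run each blocking stage in a worker thread, recording progress events."""
    result = raw
    for name, stage in STAGES:
        events.append(f"start:{name}")
        # Awaiting to_thread yields control, so the event loop can keep serving
        # other tasks (for example a progress display) while the stage runs.
        result = await asyncio.to_thread(stage, result)
        events.append(f"done:{name}")
    return result

events: List[str] = []
summary = asyncio.run(run_pipeline(["1", "2", "3"], events))
```

Worker threads keep the event loop responsive, but pure-Python code in a thread still shares the interpreter lock, so any speedup depends on the numerical libraries doing the heavy lifting.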

📂
DataLoader

Reads 15+ formats, including CSV, TSV, JSON, Excel, Parquet, Feather, SPSS, Stata, SAS, HDF5, and SQLite, plus data at remote URLs. Formats are detected automatically, and optional dependencies are loaded lazily.
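One way to picture extension-based detection with lazy imports is the sketch below. The FORMATS table, the function names, and the choice of optional packages are assumptions made for illustration; the source does not say which libraries StatForge uses for each format.

```python
import importlib
from pathlib import Path

# Hypothetical mapping: extension -> (format name, optional dependency or None).
FORMATS = {
    ".csv": ("csv", None),
    ".tsv": ("tsv", None),
    ".json": ("json", None),
    ".xlsx": ("excel", "openpyxl"),
    ".parquet": ("parquet", "pyarrow"),
    ".sav": ("spss", "pyreadstat"),
}

def detect_format(path: str) -> str:
    """Infer the format name from the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"Unsupported format: {suffix or path}")
    return FORMATS[suffix][0]

def require(module: str):
    """Import an optional dependency only when a format actually needs it."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        raise ImportError(
            f"'{module}' is required for this format. "
            f"Install it with: pip install {module}"
        ) from exc
```

Because the import happens inside require, a user who only reads CSV files never pays for, or needs to install, the heavier optional packages.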

🔬
AssumptionChecker

Shapiro-Wilk normality and Levene homoscedasticity tests with borderline detection (0.04 < p < 0.06). Results cached via joblib.Memory keyed on SHA-256 data hashes.
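The borderline rule and the hash-based cache key can be sketched with the standard library alone. In this sketch the p-value is passed in directly (in practice it might come from scipy.stats.shapiro or scipy.stats.levene, though the source does not name the library), and classify_assumption and data_key are hypothetical names.

```python
import hashlib
import json
from typing import List

BORDERLINE_LOW, BORDERLINE_HIGH = 0.04, 0.06  # band quoted in the description

def classify_assumption(p_value: float, alpha: float = 0.05) -> str:
    """Label an assumption test result as met, violated, or borderline."""
    if BORDERLINE_LOW < p_value < BORDERLINE_HIGH:
        return "borderline"
    return "met" if p_value >= alpha else "violated"

def data_key(groups: List[List[float]]) -> str:
    """SHA-256 hex digest of the data, usable as a cache key."""
    payload = json.dumps(groups, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Identical data always yields the same key, so a cached result can be reused, while any change to the values produces a different digest and forces a fresh check.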

🧭
MethodSelector

A decision tree maps the group count and assumption results to a ranked list of tests, routing to t_test, mann_whitney, anova, kruskal_wallis, or regression.
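A minimal sketch of such a decision tree follows. The test identifiers come from the description above; the function name, the continuous_predictor flag used to route to regression, and the exact ranking rule are assumptions.

```python
from typing import List

def select_methods(n_groups: int, normal: bool, equal_variance: bool,
                   continuous_predictor: bool = False) -> List[str]:
    """Return candidate tests, most appropriate first."""
    if continuous_predictor:
        # Assumed routing: a continuous predictor goes to regression.
        return ["regression"]
    if n_groups < 2:
        raise ValueError("Group comparisons need at least two groups")
    if n_groups == 2:
        parametric, rank_based = "t_test", "mann_whitney"
    else:
        parametric, rank_based = "anova", "kruskal_wallis"
    # Prefer the parametric test only when both assumptions hold.
    if normal and equal_variance:
        return [parametric, rank_based]
    return [rank_based, parametric]
```

Returning a ranked list rather than a single name leaves room for a later stage to run the second choice as a robustness check.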

⚙️
ModelFitter

Plugin registry that dispatches to models registered with the @register decorator. Ships with frequentist (ANOVA, t-test) and Bayesian (PyMC t-test, ANOVA, regression) models.
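The registry idea can be sketched in a few lines. Only the decorator name register comes from the description; its signature (taking a model name), the get_model helper, and the toy MeanDifference model are hypothetical.

```python
from statistics import fmean
from typing import Callable, Dict, Sequence

_REGISTRY: Dict[str, type] = {}

def register(name: str) -> Callable[[type], type]:
    """Class decorator that files a model class under a lookup name."""
    def decorator(cls: type) -> type:
        if name in _REGISTRY:
            raise ValueError(f"Model '{name}' is already registered")
        _REGISTRY[name] = cls
        return cls
    return decorator

def get_model(name: str) -> type:
    """Look up a registered model class, failing loudly on unknown names."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise KeyError(f"No model registered as '{name}'") from None

@register("mean_difference")
class MeanDifference:
    """Toy model used only to exercise the registry."""
    def fit(self, a: Sequence[float], b: Sequence[float]) -> float:
        return fmean(a) - fmean(b)
```

A custom model then only needs the decorator line to become dispatchable, for example get_model("mean_difference")().fit(a, b).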

📊
ResultFormatter

APA7-compliant table builder with effect sizes (η², Cohen's d, R²). Formats p-values per APA convention (< .001 or exact to three decimals).
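The p-value convention and one of the listed effect sizes can be illustrated as follows. The function names are hypothetical, and the Cohen's d shown here is the standard pooled-standard-deviation form for two independent groups, which may differ from the variant StatForge computes.

```python
import math
import statistics
from typing import Sequence

def format_p(p: float) -> str:
    """Format a p-value APA style: no leading zero, three decimals, floor at .001."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p < 0.001:
        return "p < .001"
    text = f"{p:.3f}"
    if text.startswith("0"):
        text = text[1:]  # APA drops the leading zero for values bounded by 1
    return f"p = {text}"

def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cohen's d for two independent groups using the pooled sample SD."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)
```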

📄
ReportBuilder

Jinja2 template orchestrator generating PDF, DOCX, or HTML output. Supports pluggable journal templates (APA7, Vancouver, IEEE, Nature).

v0.2.0 — Major Update

Three headline features expand StatForge from a pipeline runner into a full research companion.

📂

15+ Data Formats

Load CSV, TSV, JSON, Excel, Parquet, Feather, SPSS, Stata, SAS, HDF5, SQLite databases, and remote URLs. Optional dependencies are imported lazily with clear install hints.

💬

AI Data Analyst Chat

Run statforge chat data.csv to explore your dataset interactively. Each row becomes a searchable document (microgpt philosophy). Connects to the Claude API or uses a built-in rule engine.

🔄

Autonomous Robustness

Pass --auto to the run command. When borderline assumptions are detected (0.04 < p < 0.06), both parametric and non-parametric tests run automatically and results are compared.
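A sketch of the control flow this implies: decide which tests to run from the assumption p-value, then check whether their conclusions agree. The function names and the agreement rule (same significance decision at α) are assumptions; the source does not specify how results are compared.

```python
from typing import Dict, List

def plan_tests(assumption_p: float, parametric: str, rank_based: str,
               alpha: float = 0.05) -> List[str]:
    """Run both tests in the borderline band, otherwise just one."""
    if 0.04 < assumption_p < 0.06:
        return [parametric, rank_based]
    return [parametric] if assumption_p >= alpha else [rank_based]

def conclusions_agree(p_values: Dict[str, float], alpha: float = 0.05) -> bool:
    """True when every test reaches the same significance decision."""
    decisions = {p < alpha for p in p_values.values()}
    return len(decisions) == 1
```

For example, plan_tests(0.05, "t_test", "mann_whitney") schedules both tests, and a disagreement flagged by conclusions_agree would be a natural thing to surface in the report.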

Built for Rigorous Research

Every feature maps directly to the implemented Python codebase. No aspirational claims.

🛡️

Interactive CLI Validation

The statforge validate command performs preliminary data quality checks: flagging missing values, detecting data type anomalies, and screening for outliers via IQR methods — entirely decoupled from the statistical models.
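The missing-value and IQR checks can be sketched for a single numeric column as below. The function name, the report layout, and the conventional 1.5 × IQR fence are assumptions; the source only states that IQR methods are used.

```python
import statistics
from typing import Dict, Optional, Sequence

def screen_column(values: Sequence[Optional[float]],
                  k: float = 1.5) -> Dict[str, object]:
    """Count missing entries and list values outside the IQR fences."""
    present = [v for v in values if v is not None]
    report: Dict[str, object] = {
        "missing": len(values) - len(present),
        "outliers": [],
    }
    if len(present) >= 4:  # too few points make quartiles uninformative
        q1, _median, q3 = statistics.quantiles(present, n=4, method="inclusive")
        spread = q3 - q1
        low, high = q1 - k * spread, q3 + k * spread
        report["outliers"] = [v for v in present if v < low or v > high]
    return report
```

Nothing here touches a statistical model, which mirrors the decoupling described above: the screen only inspects the raw values.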

🎯

PriorAdvisor

Auto-suggests data-driven weakly informative priors (Normal with μ = observed mean, σ = 2× observed SD). Documents the rationale for peer review. Runs automated sensitivity analysis across uninformative, weakly informative, and informative prior variants.
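The prior suggestion rule can be sketched directly from the formula above. The NormalPrior class and function name are hypothetical, and the σ multipliers for the uninformative (10×) and informative (0.5×) variants are placeholders chosen for illustration; only the 2× weakly informative setting comes from the description.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, Sequence

@dataclass(frozen=True)
class NormalPrior:
    mu: float
    sigma: float

# Hypothetical sigma multipliers; only the 2x value is stated in the text.
SIGMA_MULTIPLIERS = {
    "uninformative": 10.0,
    "weakly_informative": 2.0,
    "informative": 0.5,
}

def suggest_priors(values: Sequence[float]) -> Dict[str, NormalPrior]:
    """Center each prior on the observed mean and scale sigma by the observed SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD; needs at least two values
    return {label: NormalPrior(mu=mean, sigma=k * sd)
            for label, k in SIGMA_MULTIPLIERS.items()}

priors = suggest_priors([2.0, 4.0, 6.0])
```

Fitting the same model once per variant and comparing the posteriors is the usual shape of a prior sensitivity analysis.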

📝

Automated Methods Section

The MethodsBuilder synthesizes journal-ready prose specifying the exact test name, StatForge version, assumption rationale, significance threshold (α), and effect size metric with interpretation scale (per Cohen, 1988).
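As an illustration of how such prose might be assembled, the sketch below fills a sentence template and labels Cohen's d with the conventional 0.2 / 0.5 / 0.8 benchmarks from Cohen (1988). The function names, the wording of the template, and the "negligible" label for values below 0.2 are assumptions, not MethodsBuilder output.

```python
def interpret_cohens_d(d: float) -> str:
    """Label |d| using Cohen's (1988) small/medium/large benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # assumed label; Cohen names only the three benchmarks

def methods_sentence(test_name: str, version: str, alpha: float, d: float) -> str:
    """Assemble a short methods paragraph from the analysis settings."""
    alpha_text = f"{alpha:.2f}".lstrip("0")  # APA style: .05, not 0.05
    return (
        f"Data were analyzed with {test_name} in StatForge {version}. "
        f"The significance threshold was α = {alpha_text}. "
        f"The effect size was Cohen's d = {d:.2f}, "
        f"a {interpret_cohens_d(d)} effect by Cohen's (1988) benchmarks."
    )

text = methods_sentence("an independent-samples t-test", "0.2.0", 0.05, 0.5)
```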

Deployment & Analysis Services

StatForge is free and open-source. Professional services are available for institutions requiring bespoke configurations.

Open Source
Free
  • Full CLI pipeline execution
  • Frequentist & Bayesian model suite
  • APA and standardized report generation
  • Plugin registry for custom models
  • Community support via GitHub
Install via pip