Data Import
Drop file here or click to browse
CSV, Excel (.xlsx, .xls), or paste from Excel
Demonstrates multiple regression with real-world relationships among price, square footage, bedrooms, and age.
Showing the first 10 rows of the imported dataset
| Variable | Coefficient | Std Error | t-stat | p-value | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|
Variance Inflation Factor (VIF) is used to detect multicollinearity between predictor variables. VIF analysis requires at least two predictor variables. For simple regression (one predictor), multicollinearity is not a concern since there's only one independent variable.
Tip: Add another predictor variable to enable VIF analysis.
| Variable | VIF | R² | Interpretation |
|---|---|---|---|
VIF = Variance Inflation Factor. VIF > 5 suggests moderate multicollinearity; VIF > 10 indicates high multicollinearity.
Tests for regression assumptions (linearity, homoscedasticity, normality, etc.)
Run regression to see diagnostic test results
| # | Actual | Predicted | Residual | Std. Residual |
|---|---|---|---|---|
What is Linear Regression?
Linear regression is a statistical method used to model the relationship between a dependent variable (Y) and one or more independent variables (X). It finds the best-fitting straight line (or hyperplane for multiple predictors) that minimizes the sum of squared differences between observed and predicted values.
Simple regression: One predictor (y = mx + b)
Multiple regression: Multiple predictors (y = b + m₁x₁ + m₂x₂ + ... + mₖxₖ)
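For intuition, the sketch below fits a simple regression with ordinary least squares using NumPy. It is a conceptual illustration only (the toy data and variable names are made up here), not the app's linreg-core engine, which performs the same calculation in Rust/WASM.

```python
import numpy as np

# Toy data for one predictor (simple regression); values are illustrative only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix with an intercept column, so the model is y ≈ b + m·x
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: choose b and m to minimize the sum of squared residuals
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b, m = coef
print(f"intercept b = {b:.3f}, slope m = {m:.3f}")
```

Adding more columns to the design matrix (one per predictor) gives the multiple-regression case.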
Model Fit Statistics
- R² (R-squared)
- Proportion of variance in Y explained by the model. Range: 0 to 1. Higher = better fit.
- Adjusted R²
- R² adjusted for the number of predictors. Penalizes adding useless variables. Use this when comparing models with different numbers of predictors.
- F-statistic
- Tests whether the model as a whole is significant. Compares your model to a model with no predictors.
- p-value (Model)
- Probability of getting these results if no variables actually predict Y. < 0.05 means the model is statistically significant.
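The formulas behind these fit statistics are simple to state in code. The snippet below is an illustration with made-up predictions, not the app's implementation:

```python
import numpy as np
from scipy import stats

y     = np.array([2.1, 3.9, 6.2, 8.1, 9.8])          # observed values
y_hat = np.array([2.06, 4.01, 5.96, 7.91, 9.86])     # model predictions (illustrative)
n, k  = len(y), 1                                     # n observations, k predictors

ss_res = np.sum((y - y_hat) ** 2)                     # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)                  # total sum of squares

r2     = 1 - ss_res / ss_tot                          # R-squared
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)         # adjusted R-squared

# F-statistic: explained variance per predictor relative to residual variance
f_stat  = (r2 / k) / ((1 - r2) / (n - k - 1))
p_model = stats.f.sf(f_stat, k, n - k - 1)            # model p-value
print(r2, adj_r2, f_stat, p_model)
```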
Coefficient Statistics
- Coefficient (m or b)
- The estimated effect of each variable on Y. For continuous X: change in Y for a 1-unit increase in X.
- Std Error
- Precision of the coefficient estimate. Smaller = more precise.
- t-stat
- Coefficient divided by its standard error. Larger absolute values indicate more significant predictors.
- p-value (Coefficient)
- Tests if the coefficient is significantly different from zero. < 0.05 means the variable contributes to predicting Y.
- 95% CI
- 95% Confidence Interval. We're 95% confident the true coefficient lies within this range.
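These per-coefficient statistics all derive from the coefficient covariance matrix. The following sketch (illustrative toy data, not the app's code) shows the standard textbook calculation:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
X = np.column_stack([np.ones_like(x), x])            # intercept + one predictor
n, p = X.shape

beta   = np.linalg.solve(X.T @ X, X.T @ y)           # coefficients (b, m)
resid  = y - X @ beta
sigma2 = resid @ resid / (n - p)                     # residual variance
cov    = sigma2 * np.linalg.inv(X.T @ X)             # coefficient covariance matrix

se    = np.sqrt(np.diag(cov))                        # Std Error
t     = beta / se                                    # t-stat
pvals = 2 * stats.t.sf(np.abs(t), df=n - p)          # two-sided p-values

t_crit   = stats.t.ppf(0.975, df=n - p)              # 95% CI half-width multiplier
ci_lower = beta - t_crit * se
ci_upper = beta + t_crit * se
```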
Residuals Diagnostics
- Residual
- Difference between actual and predicted values (Y - Ŷ). Should be randomly scattered around zero.
- Standardized Residual
- Residual divided by its standard deviation. Values > 2 or < -2 may indicate outliers.
- MSE (Mean Squared Error)
- Average squared difference between observed and predicted values. Lower = better predictions.
- Standard Error
- Standard deviation of the residuals. Typical prediction error in the same units as Y.
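In code, these residual diagnostics look roughly like the sketch below (toy numbers, and the simple "residual divided by the residual standard error" definition used in the table rather than leverage-adjusted studentized residuals):

```python
import numpy as np

y     = np.array([2.1, 3.9, 6.2, 8.1, 9.8])          # actual values
y_hat = np.array([2.06, 4.01, 5.96, 7.91, 9.86])     # predicted values (illustrative)
n, p  = len(y), 2                                     # p = number of estimated parameters

resid     = y - y_hat                                 # Residual (Y - Ŷ)
mse       = np.mean(resid ** 2)                       # Mean Squared Error
std_err   = np.sqrt(np.sum(resid ** 2) / (n - p))     # Standard Error of the regression
std_resid = resid / std_err                           # standardized residuals
print(resid, mse, std_err, std_resid, sep="\n")
```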
Significance Codes
- *** p < 0.001
- Very strong evidence against null hypothesis
- ** p < 0.01
- Strong evidence against null hypothesis
- * p < 0.05
- Moderate evidence against null hypothesis
- (no stars)
- Not statistically significant at p < 0.05 level
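A minimal helper that mirrors this table (purely illustrative) could look like:

```python
def significance_stars(p: float) -> str:
    """Map a p-value to the significance codes shown in the results table."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""
```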
Confidence Band vs Prediction Interval
- Confidence Band (Mean Response)
- The shaded region on the scatter plot shows where the mean of Y is likely to fall for a given X. Narrower band = more precise estimate of the regression line. Use this for estimating average values.
- Prediction Interval (Individual Values)
- A wider interval that accounts for both uncertainty in the regression line and the natural variation of individual data points. Use this for predicting individual future observations. (Not shown on the chart; it would be roughly 2-3× wider.)
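For simple regression, the two intervals differ only by an extra "1 +" under the square root, which is why the prediction interval is so much wider. A sketch with toy data (illustrative only, not the app's code):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

m, b   = np.polyfit(x, y, 1)                          # simple OLS fit (illustrative)
resid  = y - (b + m * x)
s      = np.sqrt(np.sum(resid ** 2) / (n - 2))        # residual standard error
sxx    = np.sum((x - x.mean()) ** 2)
t_crit = stats.t.ppf(0.975, df=n - 2)

x0 = 3.5                                              # point at which to predict
y0 = b + m * x0
se_mean = s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / sxx)       # confidence band (mean)
se_pred = s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)   # prediction interval
print("CI for mean response:", y0 - t_crit * se_mean, y0 + t_crit * se_mean)
print("PI for an individual:", y0 - t_crit * se_pred, y0 + t_crit * se_pred)
```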
Assumptions of Linear Regression
- Linearity: Relationship between X and Y is linear
- Independence: Observations are independent of each other
- Homoscedasticity: Residuals have constant variance at all levels of X
- Normality: Residuals are approximately normally distributed
- No multicollinearity: Predictors are not highly correlated with each other
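Simple residual-based checks can flag violations of some of these assumptions. The snippet below is a rough, illustrative stand-in and not necessarily the diagnostic tests this tool runs; it assumes you already have residuals and fitted values from a regression:

```python
import numpy as np
from scipy import stats

# Illustrative residuals and fitted values from an OLS fit (made-up numbers)
fitted = np.array([2.06, 4.01, 5.96, 7.91, 9.86])
resid  = np.array([0.04, -0.11, 0.24, 0.19, -0.06])

# Homoscedasticity (crude check): |residuals| should not trend with fitted values
r, p_het = stats.pearsonr(fitted, np.abs(resid))
print(f"|residual| vs fitted correlation: r = {r:.2f}, p = {p_het:.3f}")

# Normality of residuals: Shapiro-Wilk test (suitable for small samples)
w, p_norm = stats.shapiro(resid)
print(f"Shapiro-Wilk: W = {w:.3f}, p = {p_norm:.3f}")
```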
Multicollinearity & VIF Explained
Multicollinearity occurs when predictor variables are highly correlated with each other. This can make it difficult to determine the individual effect of each predictor.
VIF (Variance Inflation Factor) measures how much the variance of a coefficient is inflated due to multicollinearity:
- VIF = 1: No correlation
- 1 < VIF ≤ 5: Low multicollinearity (acceptable)
- 5 < VIF ≤ 10: Moderate multicollinearity (review variables)
- VIF > 10: High multicollinearity (consider removing redundant variables)
- VIF = ∞: Perfect multicollinearity (one variable is a linear combination of others)
What to do: If VIF is high, consider removing one of the correlated variables or combining them into a single predictor.
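Concretely, the VIF for predictor j comes from regressing x_j on the remaining predictors: VIF_j = 1 / (1 - R_j²). The sketch below (an illustrative helper with toy data, not the app's code) makes the high-VIF case visible by constructing one predictor as nearly a multiple of another:

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """VIF for each column of X (n_samples x n_predictors, no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        xj = X[:, j]
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(n), others])     # regress x_j on the other predictors
        beta, *_ = np.linalg.lstsq(Z, xj, rcond=None)
        ss_res = np.sum((xj - Z @ beta) ** 2)
        ss_tot = np.sum((xj - xj.mean()) ** 2)
        r2 = 1 - ss_res / ss_tot
        out[j] = np.inf if np.isclose(r2, 1.0) else 1.0 / (1.0 - r2)
    return out

# Toy example: x3 is almost a multiple of x1, so both get a high VIF
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = 2 * x1 + rng.normal(scale=0.1, size=100)
print(vif(np.column_stack([x1, x2, x3])))
```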
Third-Party Licenses
- Chart.js — MIT License © Chart.js Contributors
- SheetJS (XLSX) — Apache 2.0 License © SheetJS LLC
- linreg-core — MIT OR Apache-2.0 © Jesse Anderson (custom Rust/WASM OLS regression engine)