Linear Regression Calculator

Slope, intercept, R², Pearson r, and a scatter plot with line of best fit — from your X,Y data.

Minimum 3 data points required. Empty rows are ignored.

Formulas Used

// Slope (b₁), least squares
b₁ = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / Σ[(xᵢ − x̄)²]

// Intercept (b₀)
b₀ = ȳ − b₁ × x̄

// Predicted value
ŷ = b₀ + b₁ × x

// Pearson correlation coefficient
r = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / √(Σ(xᵢ − x̄)² × Σ(yᵢ − ȳ)²)

// Coefficient of determination
R² = r²

// Residual
eᵢ = yᵢ − ŷᵢ
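The formulas above translate almost line for line into code. The following is a minimal sketch in plain Python, not the calculator's actual implementation; the function name `linear_regression` and the sample data are illustrative assumptions.

```python
from math import sqrt

def linear_regression(xs, ys):
    """Return (slope, intercept, r, r_squared) for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # Assumption of this sketch: reject constant X or constant Y,
        # because the slope or r would be undefined.
        raise ValueError("X and Y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r, r ** 2

# Illustrative data only (not the calculator's built-in example).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept, r, r_squared = linear_regression(xs, ys)
print(f"y_hat = {intercept:.2f} + {slope:.2f}x, r = {r:.3f}, R^2 = {r_squared:.3f}")
```

For this data the sums are Σ(xᵢ − x̄)(yᵢ − ȳ) = 6, Σ(xᵢ − x̄)² = 10, and Σ(yᵢ − ȳ)² = 6, so the slope is 0.6, the intercept is 2.2, and R² is 0.6.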

Interpreting R² and Pearson r

|r| Range      R² Range       Correlation Strength   Interpretation
0.00 – 0.19    0.00 – 0.04    Very Weak              Little to no linear relationship
0.20 – 0.39    0.04 – 0.15    Weak                   Some tendency but high scatter
0.40 – 0.59    0.16 – 0.35    Moderate               Noticeable linear trend
0.60 – 0.79    0.36 – 0.63    Strong                 Clear linear relationship
0.80 – 1.00    0.64 – 1.00    Very Strong            Points are close to the regression line
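The table can be expressed as a small lookup. This is a sketch only: the function name `correlation_strength` is hypothetical, and treating each band as a half-open interval (so that 0.195, for example, counts as Very Weak) is an assumption, since the table leaves small gaps between bands.

```python
def correlation_strength(r):
    """Map a Pearson r to the strength label used in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    magnitude = abs(r)  # strength ignores the sign of the correlation
    if magnitude < 0.20:
        return "Very Weak"
    if magnitude < 0.40:
        return "Weak"
    if magnitude < 0.60:
        return "Moderate"
    if magnitude < 0.80:
        return "Strong"
    return "Very Strong"

print(correlation_strength(-0.45))  # Moderate
print(correlation_strength(0.85))   # Very Strong
```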

What Is Linear Regression?

Linear regression is one of the most fundamental tools in statistics and data analysis. It models the relationship between a dependent variable Y and an independent variable X using a straight line: ŷ = b₀ + b₁x. The line is fitted by minimizing the sum of squared residuals — the vertical distances between each observed point and the predicted line.

Developed in the early 19th century by Carl Friedrich Gauss and Adrien-Marie Legendre, the least squares method remains the most widely used approach in regression analysis. It is the foundation of econometrics, machine learning, experimental science, and virtually every field that models quantitative relationships.

How to Interpret the Regression Equation

  • Slope (b₁): The expected change in Y for a one-unit increase in X. If b₁ = 2.5, Y increases by 2.5 for every unit increase in X.
  • Intercept (b₀): The predicted value of Y when X = 0. It has a meaningful interpretation only if X = 0 lies within, or close to, the observed data range.
  • R² (coefficient of determination): The proportion of variance in Y explained by X. R² = 0.72 means 72% of the variation in Y is explained by the linear relationship with X.
  • Pearson r: The correlation coefficient. r = +1 is perfect positive, r = −1 is perfect negative, r = 0 is no linear relationship. In simple linear regression, R² = r².
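To make "proportion of variance explained" concrete, the sketch below computes R² as 1 − SS_res / SS_tot and checks that it matches r² for a simple linear fit. The data and variable names are illustrative assumptions.

```python
from math import sqrt, isclose

# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

b1 = sxy / sxx            # slope
b0 = y_bar - b1 * x_bar   # intercept
predictions = [b0 + b1 * x for x in xs]

ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))  # unexplained variation
ss_tot = syy                                                 # total variation in Y
r_squared = 1 - ss_res / ss_tot
r = sxy / sqrt(sxx * syy)

print(f"R^2 = {r_squared:.3f}, r^2 = {r ** 2:.3f}")
print(isclose(r_squared, r ** 2))
```

Here SS_res = 2.4 and SS_tot = 6, so R² = 1 − 2.4/6 = 0.6: the line accounts for 60% of the variation in Y.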

Real-World Uses of Linear Regression

Economics and Finance

Economists use regression to model relationships like GDP vs. unemployment, or advertising spend vs. revenue. In finance, regression is used to estimate beta (the slope of a stock's returns regressed on market returns), the measure of market risk at the center of the CAPM.

Medicine and Epidemiology

Medical researchers use regression to model dose-response relationships, predict patient outcomes from clinical measurements, and study risk factors. For example: blood pressure vs. age, or drug dose vs. effect size.

Machine Learning

Linear regression is one of the simplest supervised learning algorithms. It is a common baseline for regression problems. Ridge regression and lasso extend it with regularization, and logistic regression adapts the same linear model to classification.

Assumptions of Linear Regression

  • Linearity: The relationship between X and Y is linear.
  • Independence: Observations are independent of each other.
  • Homoscedasticity: The variance of residuals is constant across all values of X.
  • Normality: Residuals are normally distributed (required for inference, not prediction).

Violating these assumptions does not invalidate regression entirely — it affects the reliability of inference. For prediction purposes, regression is often robust to mild violations.

Frequently Asked Questions

Linear regression finds the best-fitting straight line through a set of data points by minimizing the sum of squared residuals. The line equation ŷ = b₀ + b₁x describes how Y changes with X. It is used for prediction, estimating relationships, and testing whether X has a statistically significant effect on Y.

R² measures how well the regression line fits the data. R² = 0.85 means 85% of the variability in Y is explained by X. In social sciences, R² = 0.3 may be considered good; in physical sciences, R² below 0.95 may be considered poor. Always interpret R² relative to the field and problem context.

Pearson r measures the strength and direction of a linear relationship: +1 = perfect positive, −1 = perfect negative, 0 = no linear relationship. In simple linear regression, R² = r². Important: correlation does not imply causation — two variables can be correlated due to a third confounding variable or pure coincidence.

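The least squares slope is b₁ = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / Σ[(xᵢ − x̄)²]. The numerator is proportional to the covariance of X and Y, and the denominator is proportional to the variance of X; the common factor cancels, so b₁ equals the covariance divided by the variance of X. The intercept b₀ = ȳ − b₁ × x̄ ensures the line passes through the centroid (x̄, ȳ) of the data.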
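Both properties can be checked numerically. The sketch below, on illustrative data, computes the slope as sample covariance over sample variance (each sum divided by n − 1) and confirms that the fitted line passes through the centroid.

```python
from math import isclose
from statistics import mean, variance

# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = mean(xs), mean(ys)

# Sample covariance, computed by hand; statistics.variance uses the same n - 1 divisor.
sample_cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
b1 = sample_cov / variance(xs)
b0 = y_bar - b1 * x_bar

# Evaluating the line at x_bar should reproduce y_bar.
centroid_prediction = b0 + b1 * x_bar
print(f"slope = {b1:.2f}, intercept = {b0:.2f}")
print(isclose(centroid_prediction, y_bar))
```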

Correlation (Pearson r) measures association and is symmetric: swapping X and Y gives the same r. Regression produces a prediction equation and is directional: it matters which variable is X (the predictor) and which is Y (the outcome). Regression gives a slope and intercept that quantify the rate of change in the units of the data; correlation gives only a unitless, normalized measure of strength.
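A short sketch of the symmetry claim, using illustrative data and hypothetical helper names: r is unchanged when X and Y are swapped, while the regression slope is not. For simple linear regression, the product of the two slopes equals r².

```python
from math import sqrt, isclose

def sums_about_means(us, vs):
    """Return (S_uv, S_uu, S_vv): cross-product and squared sums about the means."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    s_uu = sum((u - u_bar) ** 2 for u in us)
    s_vv = sum((v - v_bar) ** 2 for v in vs)
    return s_uv, s_uu, s_vv

def pearson_r(us, vs):
    s_uv, s_uu, s_vv = sums_about_means(us, vs)
    return s_uv / sqrt(s_uu * s_vv)

def slope(us, vs):
    """Least squares slope for predicting vs from us."""
    s_uv, s_uu, _ = sums_about_means(us, vs)
    return s_uv / s_uu

# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(isclose(pearson_r(xs, ys), pearson_r(ys, xs)))  # same r either way
print(f"slope of Y on X = {slope(xs, ys):.2f}")
print(f"slope of X on Y = {slope(ys, xs):.2f}")
print(isclose(slope(xs, ys) * slope(ys, xs), pearson_r(xs, ys) ** 2))
```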

A residual is the difference between an observed Y and the predicted Ŷ: eᵢ = yᵢ − ŷᵢ. The regression line minimizes Σeᵢ² (least squares). Residuals should be randomly scattered around zero. Systematic patterns suggest the linear model is not appropriate — try a polynomial or log transformation.
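As an illustration of a systematic pattern, the sketch below fits a straight line to deliberately curved data (y = x², chosen for this example) and prints the residuals. The data and names are assumptions made for the demonstration.

```python
# Deliberately non-linear illustrative data: y = x squared.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# Residual = observed minus predicted.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(residuals)       # [2.0, -1.0, -2.0, -1.0, 2.0]
print(sum(residuals))  # 0.0
```

The residuals sum to zero, as they always do for a least squares line with an intercept, but they are positive at both ends and negative in the middle. That U shape, rather than random scatter, is the kind of pattern that signals a straight line is the wrong model.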

Technically, a regression needs at least 3 data points, because 2 points always produce R² = 1 trivially. In practice, use 10 or more for a meaningful regression and 20 or more for reliable inference. Too few points make it impossible to assess linearity or detect outliers. A common rule of thumb for multiple regression is at least 10 observations per predictor variable.
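The two-point case is easy to verify: a line can always be drawn exactly through two points with different X values, so the fit is perfect regardless of the data. A minimal sketch, with a hypothetical helper `r_squared` and arbitrary illustrative points:

```python
from math import isclose

def r_squared(xs, ys):
    """R² of a simple linear fit, computed as r² from sums about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Any two points with distinct X and distinct Y give a perfect fit.
print(isclose(r_squared([1, 4], [3, 10]), 1.0))
print(isclose(r_squared([0, 2], [5, -1]), 1.0))
```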

The fitted equation can be used for prediction: use the "Predict Y" section above and enter any X value to get ŷ = b₀ + b₁x. Be cautious with extrapolation (X outside your data range), because the linear relationship may not hold beyond the observed X values. Interpolation (within the data range) is generally more reliable.
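A prediction helper can flag extrapolation explicitly. The sketch below is not how the calculator's "Predict Y" section works internally; the function name `predict`, its parameters, and the coefficients are assumptions chosen for illustration.

```python
def predict(x, b0, b1, x_min, x_max):
    """Return (y_hat, is_extrapolation) for a fitted line y_hat = b0 + b1 * x."""
    y_hat = b0 + b1 * x
    # Outside the observed X range the linear relationship is untested.
    is_extrapolation = not (x_min <= x <= x_max)
    return y_hat, is_extrapolation

# Hypothetical fit: y_hat = 2.2 + 0.6x, estimated from X values between 1 and 5.
b0, b1 = 2.2, 0.6
for x in (3, 10):
    y_hat, extrapolated = predict(x, b0, b1, x_min=1, x_max=5)
    note = "extrapolation, treat with caution" if extrapolated else "interpolation"
    print(f"x = {x}: y_hat = {y_hat:.2f} ({note})")
```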
