What does R-squared mean?

R-squared (coefficient of determination) measures the proportion of variance in Y that is explained by X. R² = 0.85 means 85% of the variability in Y is explained by the linear relationship with X. R² ranges from 0 (no fit) to 1 (perfect fit).

What is the Pearson correlation coefficient?

Pearson r measures the strength and direction of the linear relationship between X and Y, ranging from −1 (perfect negative) to +1 (perfect positive). In simple linear regression, R² = r².

How is the slope calculated in linear regression?

The slope b₁ = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / Σ[(xᵢ − x̄)²]. The intercept b₀ = ȳ − b₁ × x̄.

What is the difference between correlation and regression?

Correlation (Pearson r) measures strength and direction of association and is symmetric. Regression produces a prediction equation and is directional — which variable is X (predictor) matters.

What are residuals in linear regression?

A residual is the difference between an observed Y value and the value predicted by the regression line: eᵢ = yᵢ − ŷᵢ. The least squares method minimizes Σeᵢ².

How many data points do I need for linear regression?

You need at least 3 points, because any 2 points fit a line exactly. In practice, aim for 10 or more points for a meaningful fit and 20 or more for reliable inference.

Can linear regression predict future values?

Yes — the equation ŷ = b₀ + b₁x predicts Y for new X values. Be cautious with extrapolation outside the data range.

Linear Regression Calculator — Slope, Intercept, R² & Scatter Plot

Data Points (X, Y)

#	X	Y

Minimum 3 data points required. Empty rows are ignored.

Regression Results

Slope (b₁), Intercept (b₀), R² (fit), Pearson r, Equation, Data Points (n), Correlation Strength

Predict Y for a Given X

Enter X value

Predicted Y (ŷ)

—

Scatter Plot with Regression Line

Residuals Table

#	X	Y (observed)	Ŷ (predicted)	Residual (e)

Formulas Used

// Slope (b₁), least squares
b₁ = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / Σ[(xᵢ − x̄)²]

// Intercept (b₀)
b₀ = ȳ − b₁ × x̄

// Predicted value
ŷ = b₀ + b₁ × x

// Pearson correlation coefficient
r = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / √(Σ(xᵢ − x̄)² × Σ(yᵢ − ȳ)²)

// Coefficient of determination
R² = r²

// Residual
eᵢ = yᵢ − ŷᵢ
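The formulas above translate almost line for line into code. The following is a minimal Python sketch, not the calculator's actual implementation; the function name `linear_regression` and the sample data are made up for illustration.

```python
from math import sqrt

def linear_regression(xs, ys):
    """Return (slope, intercept, r, r_squared) for paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    if syy == 0:
        raise ValueError("all Y values are identical; r is undefined")
    slope = sxy / sxx                  # b1
    intercept = y_bar - slope * x_bar  # b0
    r = sxy / sqrt(sxx * syy)          # Pearson r
    return slope, intercept, r, r * r  # R^2 = r^2 in simple regression

# Illustrative data, invented for this sketch
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept, r, r_squared = linear_regression(xs, ys)
print(f"y-hat = {intercept:.2f} + {slope:.2f}x, r = {r:.3f}, R^2 = {r_squared:.3f}")
```

For this sample the sums are Σ(xᵢ − x̄)(yᵢ − ȳ) = 6, Σ(xᵢ − x̄)² = 10 and Σ(yᵢ − ȳ)² = 6, giving b₁ = 0.6, b₀ = 2.2 and R² = 0.6.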

Interpreting R² and Pearson r

|r| Range	R² Range	Correlation Strength	Interpretation
0.00 – 0.19	0.00 – 0.04	Very Weak	Little to no linear relationship
0.20 – 0.39	0.04 – 0.15	Weak	Some tendency but high scatter
0.40 – 0.59	0.16 – 0.35	Moderate	Noticeable linear trend
0.60 – 0.79	0.36 – 0.63	Strong	Clear linear relationship
0.80 – 1.00	0.64 – 1.00	Very Strong	Points are close to the regression line
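The bands in the table can be applied mechanically. Here is a small Python sketch; the function name `correlation_strength` is hypothetical, and it assumes each band is half-open (a value such as 0.195, which falls between two printed bands, goes to the lower label).

```python
def correlation_strength(r):
    """Map a Pearson r to the strength labels used in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("Pearson r must lie between -1 and 1")
    magnitude = abs(r)  # the sign gives direction, not strength
    # Assumption: bands are half-open, so 0.20 itself counts as "Weak"
    if magnitude < 0.20:
        return "Very Weak"
    if magnitude < 0.40:
        return "Weak"
    if magnitude < 0.60:
        return "Moderate"
    if magnitude < 0.80:
        return "Strong"
    return "Very Strong"

for r in (0.10, -0.45, 0.85):
    print(f"r = {r:+.2f}, R^2 = {r * r:.2f}: {correlation_strength(r)}")
```

These labels are conventions rather than statistical thresholds, so the cut-offs are best treated as a rough guide.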

What Is Linear Regression?

Linear regression is one of the most fundamental tools in statistics and data analysis. It models the relationship between a dependent variable Y and an independent variable X using a straight line: ŷ = b₀ + b₁x. The line is fitted by minimizing the sum of squared residuals — the vertical distances between each observed point and the predicted line.
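To see what "minimizing the sum of squared residuals" means in practice, the sketch below fits a line to a small made-up data set and then nudges the coefficients. Every nudge should increase the sum of squared errors (SSE). The helper name `sse` and the nudge sizes are arbitrary choices for illustration.

```python
def sse(xs, ys, b0, b1):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Illustrative data, invented for this sketch
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Closed-form least-squares coefficients
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

best = sse(xs, ys, b0, b1)
# Small changes to the intercept (first value) and slope (second value)
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.1, -0.05)]
perturbed = [sse(xs, ys, b0 + d0, b1 + d1) for d0, d1 in nudges]

print(f"least-squares SSE: {best:.4f}")
for (d0, d1), value in zip(nudges, perturbed):
    print(f"  b0 {d0:+.2f}, b1 {d1:+.2f} -> SSE {value:.4f}")
```

This is a spot check, not a proof: it only samples a few nearby lines, but it illustrates why the fitted line is called the least squares line.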

Developed in the early 19th century by Carl Friedrich Gauss and Adrien-Marie Legendre, the least squares method remains the most widely used approach in regression analysis. It is the foundation of econometrics, machine learning, experimental science, and virtually every field that models quantitative relationships.

How to Interpret the Regression Equation

Slope (b₁): The expected change in Y for a one-unit increase in X. If b₁ = 2.5, Y increases by 2.5 for every unit increase in X.
Intercept (b₀): The predicted value of Y when X = 0. It has a practical interpretation only when X = 0 lies within or near the range of the observed data.
R² (coefficient of determination): The proportion of variance in Y explained by X. R² = 0.72 means 72% of the variation in Y is explained by the linear relationship with X.
Pearson r: The correlation coefficient. r = +1 is perfect positive, r = −1 is perfect negative, r = 0 is no linear relationship. In simple linear regression, R² = r².

Real-World Uses of Linear Regression

Economics and Finance

Economists use regression to model relationships like GDP vs. unemployment, or advertising spend vs. revenue. In finance, regression is used to estimate beta (the slope of a stock's returns against market returns), a central quantity in the CAPM.
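As a rough illustration of the beta calculation, the sketch below regresses a stock's returns on market returns. The return figures are hypothetical numbers invented for this example, not real market data, and the function name `slope` is an arbitrary choice.

```python
def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical periodic returns, invented for illustration only
market_returns = [0.01, -0.02, 0.03, 0.00, 0.02]
stock_returns = [0.02, -0.03, 0.04, 0.005, 0.025]

# Beta is the slope with market returns as X and stock returns as Y
beta = slope(market_returns, stock_returns)
print(f"beta = {beta:.2f}")
```

A beta above 1 suggests the stock tends to move more than the market. With only five observations, as here, the estimate would be far too noisy for real use.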

Medicine and Epidemiology

Medical researchers use regression to model dose-response relationships, predict patient outcomes from clinical measurements, and study risk factors. For example: blood pressure vs. age, or drug dose vs. effect size.

Machine Learning

Linear regression is one of the simplest supervised learning algorithms. It is a common baseline for regression problems. Ridge regression and lasso extend it with regularization, and logistic regression applies a similar linear model to classification.

Assumptions of Linear Regression

Linearity: The relationship between X and Y is linear.
Independence: Observations are independent of each other.
Homoscedasticity: The variance of residuals is constant across all values of X.
Normality: Residuals are normally distributed (needed for exact confidence intervals and p-values, not for the point predictions themselves).

Violating these assumptions does not invalidate regression entirely — it affects the reliability of inference. For prediction purposes, regression is often robust to mild violations.
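A quick way to probe the homoscedasticity assumption is to compare the size of the residuals at low and high X. The sketch below is a crude, informal check and not a formal statistical test; the data are made up so that the scatter grows with X, and the names `fit_line` and `spread_ratio` are arbitrary.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b1 * x_bar, b1

# Illustrative data, invented so that the scatter widens as X grows
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.9, 11.0, 15.5, 14.6]

b0, b1 = fit_line(xs, ys)
# Residuals ordered by X, so the list can be split into low-X and high-X halves
residuals = [y - (b0 + b1 * x) for x, y in sorted(zip(xs, ys))]

half = len(residuals) // 2
low_spread = sum(e * e for e in residuals[:half]) / half
high_spread = sum(e * e for e in residuals[half:]) / (len(residuals) - half)
spread_ratio = high_spread / low_spread

print(f"sum of residuals: {sum(residuals):.2e}")
print(f"mean squared residual, low X: {low_spread:.3f}, high X: {high_spread:.3f}")
print(f"ratio (high / low): {spread_ratio:.1f}")
```

A ratio far from 1 hints that the residual variance changes with X. A residual plot remains the more informative diagnostic.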

Frequently Asked Questions

What is linear regression?

Linear regression finds the best-fitting straight line through a set of data points by minimizing the sum of squared residuals. The line equation ŷ = b₀ + b₁x describes how Y changes with X. It is used for prediction, estimating relationships, and testing whether X has a statistically significant effect on Y.

What does R-squared mean?

R² measures how well the regression line fits the data. R² = 0.85 means 85% of the variability in Y is explained by the linear relationship with X. In social sciences, R² = 0.3 may be considered good; in physical sciences, R² below 0.95 may be considered poor. Always interpret R² relative to the field and problem context.
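The "proportion of variance explained" reading can be checked directly: R² equals 1 minus the ratio of the residual sum of squares (SSE) to the total sum of squares (SST), and in simple linear regression this matches r². The sketch below uses made-up data; the variable names are arbitrary.

```python
from math import sqrt

# Illustrative data, invented for this sketch
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
sst = syy                                                    # total variation in Y
r_squared = 1 - sse / sst
r = sxy / sqrt(sxx * syy)

print(f"SSE = {sse:.2f}, SST = {sst:.2f}")
print(f"R^2 = 1 - SSE/SST = {r_squared:.3f}, r^2 = {r * r:.3f}")
```

For this sample SSE = 2.4 and SST = 6, so R² = 0.6: the line accounts for 60% of the variation in Y.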

What is the Pearson correlation coefficient?

Pearson r measures the strength and direction of a linear relationship: +1 = perfect positive, −1 = perfect negative, 0 = no linear relationship. In simple linear regression, R² = r². Note that correlation does not imply causation: two variables can be correlated because of a third, confounding variable or by pure coincidence.

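How is the slope calculated in linear regression?

The least squares slope is b₁ = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / Σ[(xᵢ − x̄)²]. The numerator is proportional to the sample covariance of X and Y and the denominator to the sample variance of X; the common factor 1/(n − 1) cancels, so b₁ = cov(X, Y) / var(X). The intercept b₀ = ȳ − b₁ × x̄ ensures the line passes through the centroid (x̄, ȳ) of the data.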
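Both properties of the slope and intercept are easy to confirm numerically. The sketch below computes the slope as sample covariance divided by sample variance (each with the n − 1 divisor) and then checks that the fitted line passes through the centroid. The data are made up for illustration.

```python
# Illustrative data, invented for this sketch
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Sample covariance and variance; the (n - 1) divisors cancel in the ratio
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)

b1 = cov_xy / var_x
b0 = y_bar - b1 * x_bar

# Prediction at x-bar should reproduce y-bar
centroid_prediction = b0 + b1 * x_bar

print(f"cov(X, Y) = {cov_xy:.2f}, var(X) = {var_x:.2f}, slope = {b1:.2f}")
print(f"prediction at x-bar = {centroid_prediction:.2f}, y-bar = {y_bar:.2f}")
```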

What is the difference between correlation and regression?

Correlation (Pearson r) measures association and is symmetric: swapping X and Y gives the same r. Regression produces a prediction equation and is directional: it matters which variable is X (the predictor) and which is Y (the outcome). Regression gives a slope and intercept that quantify the rate of change; correlation gives only a unit-free measure of strength between −1 and +1.
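The symmetry of r and the asymmetry of regression can be seen by swapping the roles of the two variables. In the sketch below (made-up data, hypothetical function names), r is unchanged by the swap, while the slope of Y on X differs from the slope of X on Y. The product of the two slopes equals r².

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def slope(predictor, outcome):
    """Least-squares slope of outcome regressed on predictor."""
    n = len(predictor)
    p_bar, o_bar = sum(predictor) / n, sum(outcome) / n
    spo = sum((p - p_bar) * (o - o_bar) for p, o in zip(predictor, outcome))
    spp = sum((p - p_bar) ** 2 for p in predictor)
    return spo / spp

# Illustrative data, invented for this sketch
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_xy, r_yx = pearson_r(xs, ys), pearson_r(ys, xs)
slope_y_on_x = slope(xs, ys)
slope_x_on_y = slope(ys, xs)

print(f"r(X, Y) = {r_xy:.3f}, r(Y, X) = {r_yx:.3f}")
print(f"slope of Y on X = {slope_y_on_x:.2f}, slope of X on Y = {slope_x_on_y:.2f}")
print(f"product of slopes = {slope_y_on_x * slope_x_on_y:.3f}, r^2 = {r_xy ** 2:.3f}")
```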

What are residuals in linear regression?

A residual is the difference between an observed Y and the predicted Ŷ: eᵢ = yᵢ − ŷᵢ. The regression line minimizes Σeᵢ² (least squares). Residuals should be randomly scattered around zero. Systematic patterns, such as a curve in the residual plot, suggest that a straight line is not appropriate; consider a polynomial term or a log transformation.
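A residuals table like the one the calculator shows can be built in a few lines. This is an illustrative sketch on made-up data, not the page's own code. It also prints the sum of the residuals, which is zero (up to rounding) for any least-squares line with an intercept.

```python
# Illustrative data, invented for this sketch
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# One row per point: index, X, observed Y, predicted Y, residual
rows = []
for i, (x, y) in enumerate(zip(xs, ys), start=1):
    y_hat = b0 + b1 * x
    rows.append((i, x, y, y_hat, y - y_hat))

print("#\tX\tY (observed)\tY-hat (predicted)\tResidual (e)")
for i, x, y, y_hat, e in rows:
    print(f"{i}\t{x}\t{y}\t{y_hat:.2f}\t{e:+.2f}")

residual_sum = sum(row[4] for row in rows)
print(f"sum of residuals: {residual_sum:.2e}")
```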

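How many data points do I need for linear regression?

Technically 3: any 2 points lie exactly on a line, so they give R² = 1 trivially and say nothing about the fit. In practice, use 10 or more points for a meaningful regression and 20 or more for reliable inference. Too few points make it impossible to assess linearity or detect outliers. A rule of thumb for multiple regression is at least 10 observations per predictor variable.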
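The two-point case is easy to demonstrate. In the sketch below, two arbitrary made-up points produce zero residuals and r² = 1, no matter what relationship actually generated them.

```python
from math import sqrt

# Any two points with different X and different Y values (made up here)
xs = [1, 3]
ys = [2, 7]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
r_squared = (sxy / sqrt(sxx * syy)) ** 2
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(f"y-hat = {b0:.2f} + {b1:.2f}x")
print(f"R^2 = {r_squared:.3f}, residuals = {residuals}")
```

A perfect R² here reflects the lack of data, not the quality of the model.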

Can linear regression predict future values?

Yes. Use the "Predict Y" section above and enter any X value to get ŷ = b₀ + b₁x. Be cautious with extrapolation (X outside your data range), because the linear relationship may not hold beyond the observed X values. Interpolation (within the data range) is generally more reliable.
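One way to make the extrapolation caveat explicit in code is to return a flag alongside each prediction. The sketch below is an illustration on made-up data; the name `make_predictor` and the in-range flag are assumptions of this example, not features of the calculator.

```python
def make_predictor(xs, ys):
    """Fit a least-squares line and return a function x -> (y_hat, in_range)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    x_min, x_max = min(xs), max(xs)

    def predict(x):
        # in_range is False when x lies outside the observed X values
        return b0 + b1 * x, x_min <= x <= x_max

    return predict

# Illustrative data, invented for this sketch
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
predict = make_predictor(xs, ys)

for x_new in (3, 10):
    y_hat, in_range = predict(x_new)
    note = "interpolation" if in_range else "extrapolation, treat with caution"
    print(f"x = {x_new}: y-hat = {y_hat:.2f} ({note})")
```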
