R-Squared (Coefficient of Determination) Calculator

What is R-Squared?

R-squared, also known as the coefficient of determination, is a statistical measure that represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s). It provides a measure of how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model.
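The "proportion of total variation explained" can be written out explicitly. The notation here is an assumption, not taken from the page: \(\hat{y}_i\) stands for the model's predicted value for observation \(i\), and \(\bar{y}\) for the mean of the observed values.

\[R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}, \qquad SS_{\text{res}} = \sum_i (y_i - \hat{y}_i)^2, \qquad SS_{\text{tot}} = \sum_i (y_i - \bar{y})^2\]

When the predictions match the observations exactly, \(SS_{\text{res}} = 0\) and \(R^2 = 1\). When the model does no better than always predicting the mean, \(SS_{\text{res}} = SS_{\text{tot}}\) and \(R^2 = 0\).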

Formula and Its Meaning

For simple linear regression with a single independent variable, R-squared equals the square of the Pearson correlation coefficient (r) between the independent variable \(x\) and the dependent variable \(y\):

\[R^2 = r^2 = \left(\frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}\right)^2\]

Where:

  • \(n\) is the number of observations
  • \(x\) represents the independent variable
  • \(y\) represents the dependent variable
  • \(\sum\) denotes the sum

R-squared values range from 0 to 1, where:

  • 0 indicates that the model explains none of the variability of the response data around its mean
  • 1 indicates that the model explains all the variability of the response data around its mean

Calculation Steps

  1. Calculate the sums: \(\sum x\), \(\sum y\), \(\sum xy\), \(\sum x^2\), and \(\sum y^2\)
  2. Calculate the correlation coefficient (r) using the formula above
  3. Square the correlation coefficient to get R-squared
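The three steps above can be sketched in Python as follows. This is a minimal illustration and not the calculator's actual implementation: the function name `r_squared` and its error handling are assumptions made for this example.

```python
import math


def r_squared(xs, ys):
    """Return (r, R^2) for paired samples using the raw-sums formula.

    Hypothetical helper written for illustration only.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)

    # Step 1: the five sums
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    # Step 2: correlation coefficient r
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        # r is undefined when either variable has zero variance
        raise ValueError("r is undefined when x or y is constant")
    r = numerator / denominator

    # Step 3: square r to get R-squared
    return r, r * r


# Perfectly linear data (y = 2x) should give r = 1 and R^2 = 1.
r, r2 = r_squared([1, 2, 3], [2, 4, 6])
print(r, r2)
```

The zero-denominator check is a design choice for this sketch: when every x value or every y value is identical, the formula divides by zero, so the function raises an error instead of returning a misleading number.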

Example Calculation

Let's calculate R-squared for the dataset:

X: 1, 2, 3, 4, 5

Y: 2, 4, 5, 4, 5

  1. Calculate sums:
    • \(\sum x = 15\)
    • \(\sum y = 20\)
    • \(\sum xy = 66\)
    • \(\sum x^2 = 55\)
    • \(\sum y^2 = 86\)
  2. Calculate r: \[r = \frac{5(66) - (15)(20)}{\sqrt{[5(55) - 15^2][5(86) - 20^2]}} = \frac{30}{\sqrt{1500}} \approx 0.7746\]
  3. Calculate R-squared: \[R^2 = (0.7746)^2 \approx 0.6000\]

Therefore, approximately 60% of the variance in Y can be explained by its linear relationship with X.
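A hand calculation like this one can be cross-checked by fitting the least-squares line and measuring how much of the variation in Y it leaves unexplained. The sketch below assumes a hypothetical helper, `r_squared_from_fit`, and uses only the Python standard library. For a straight-line fit, its result should agree with the squared correlation coefficient.

```python
def r_squared_from_fit(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares and
    return (slope, intercept, R^2), where R^2 = 1 - SS_res / SS_tot.

    Hypothetical helper written for illustration only.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Least-squares slope and intercept
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual and total sums of squares
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# The example dataset from this section
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept, r2 = r_squared_from_fit(xs, ys)
print(f"slope={slope:.4f}, intercept={intercept:.4f}, R^2={r2:.4f}")
```

If the printed R-squared differs from the value obtained through the sums, the usual culprit is an arithmetic slip in one of the sums, most often \(\sum xy\).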

Visual Representation

This scatter plot represents the example dataset. The red line is the line of best fit: the closer the points lie to this line, the stronger the correlation and, consequently, the higher the R-squared value.