Correlation Coefficient Calculator

What is the Correlation Coefficient?

The correlation coefficient, often denoted as r, is a statistical measure that quantifies the strength and direction of the linear relationship between two variables. It ranges from -1 to +1, where:

  • +1 indicates a perfect positive correlation
  • 0 indicates no linear correlation
  • -1 indicates a perfect negative correlation

Formula and Its Components

The formula for the Pearson correlation coefficient is:

\[r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}\]

Where:

  • r = correlation coefficient
  • n = number of pairs of data
  • Σxy = sum of the products of paired data
  • Σx = sum of x values
  • Σy = sum of y values
  • Σx² = sum of squared x values
  • Σy² = sum of squared y values
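The same coefficient is often written in a mean-deviation form, which may be easier to interpret. Here \(\bar{x}\) and \(\bar{y}\) denote the means of the x and y values (symbols introduced for this note, not part of the formula above):

\[r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}\]

Since \(\sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}\) and \(\sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}\) (and likewise for y), multiplying the numerator and denominator by n gives the sums formula above.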

Calculation Steps

  1. Calculate Σx, Σy, Σxy, Σx², and Σy²
  2. Compute the numerator: n(Σxy) - (Σx)(Σy)
  3. Compute the denominator: \(\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}\)
  4. Divide the numerator by the denominator
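The steps above can be sketched as a short Python function. The name `pearson_r` and the error handling are illustrative choices, not part of the calculator itself.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from the raw sums."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)

    # Step 1: the five sums
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)

    # Step 2: numerator
    numerator = n * sum_xy - sum_x * sum_y

    # Step 3: denominator
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # Happens when either variable is constant (or the input is empty)
        raise ValueError("r is undefined when either variable has no variation")

    # Step 4: divide
    return numerator / denominator


# Hypothetical data: y grows with x, but not perfectly linearly
print(round(pearson_r([1, 2, 3], [2, 4, 7]), 2))  # 0.99
```

The explicit zero check matters because a constant variable makes the denominator 0, in which case r is undefined rather than 0.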

Example Calculation

Let's calculate the correlation coefficient for the dataset:

X: 1, 2, 3, 4, 5

Y: 2, 4, 5, 4, 5

  1. n = 5
  2. Σx = 1 + 2 + 3 + 4 + 5 = 15
  3. Σy = 2 + 4 + 5 + 4 + 5 = 20
  4. Σxy = (1)(2) + (2)(4) + (3)(5) + (4)(4) + (5)(5) = 2 + 8 + 15 + 16 + 25 = 66
  5. Σx² = 1² + 2² + 3² + 4² + 5² = 55
  6. Σy² = 2² + 4² + 5² + 4² + 5² = 86
  7. Numerator: 5(66) - (15)(20) = 330 - 300 = 30
  8. Denominator: \(\sqrt{[5(55) - 15^2][5(86) - 20^2]} = \sqrt{(50)(30)} = \sqrt{1500} \approx 38.73\)
  9. r = 30 / 38.73 ≈ 0.77
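Hand arithmetic of this kind is easy to get wrong, so it may help to recompute each quantity directly. The script below is a minimal check that uses only the dataset given in the example; the variable names are illustrative.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)           # 15 20 66 55 86
print(numerator, round(denominator, 2), round(r, 2))  # 30 38.73 0.77
```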

Visual Representation

[Figure: Correlation Coefficient Example Visualization. Scatter plot of the data points (1,2), (2,4), (3,5), (4,4), (5,5) with a trend line. Horizontal axis: X Values. Vertical axis: Y Values.]

This scatter plot shows the example dataset. The red trend line slopes upward, reflecting the positive correlation between the X and Y variables.
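The page does not say how the trend line is fitted. Assuming it is the usual ordinary least-squares line, its slope and intercept can be computed from the same sums used for r, as in this minimal sketch:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Assumption: the trend line is the ordinary least-squares fit of y on x.
# The slope shares its numerator with r, so the two always have the same sign.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

print(f"y = {slope:.1f}x + {intercept:.1f}")  # y = 0.6x + 2.2
```

The slope describes how steep the line is, while r describes how tightly the points cluster around it, so a steep line does not by itself imply a strong correlation.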