Free statistics calculators designed for data scientists. This Correlation Coefficient Calculator computes the Pearson correlation coefficient and the r-squared value for paired X and Y data.
To use the calculator, enter the X values into the left box and the associated Y values into the right box, separated by commas or new line characters. Hit calculate. It will calculate the correlation coefficient and report the r-squared value as a measure of goodness of fit.
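The input format described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the calculator's actual code: the function names `parse_values` and `parse_pairs` are hypothetical, and the rule that both boxes must hold the same number of values is an assumption that follows from the data being paired.

```python
import re


def parse_values(text):
    """Split text on commas and/or line breaks and convert each entry to float."""
    tokens = re.split(r"[,\r\n]+", text)
    # Skip empty tokens left by trailing commas or blank lines.
    return [float(t) for t in tokens if t.strip()]


def parse_pairs(x_text, y_text):
    """Parse the two entry boxes and check that the values pair up (assumed rule)."""
    xs = parse_values(x_text)
    ys = parse_values(y_text)
    if len(xs) != len(ys):
        raise ValueError("X and Y must contain the same number of values")
    if len(xs) < 2:
        raise ValueError("at least two data points are needed")
    return xs, ys


# Comma separated X values, one Y value per line.
xs, ys = parse_pairs("1, 2, 3", "2\n4\n6")
```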
For easy entry, you can copy and paste your data into the entry box from Excel. You can save your data for use with this calculator and other calculators on this site. Just hit the "save data" button. It will save the data in your browser (not on our server, it remains private). Saved data sets will appear on the list of saved datasets below the data entry panel. To retrieve it, click the "load data" button next to it.
Need to pass an answer to a friend? It's easy to link and share the results of this calculator. Hit calculate, then copy and paste the URL from your browser. The link retains the values you entered, so you can share them via email or social media.
The Pearson product-moment correlation coefficient measures the degree to which variation in one variable can be associated with variation in another. For purposes of this calculator, we refer to the first (our X values) as the independent variable and the associated Y values as the dependent variable. When we calculate the correlation between the two, we are evaluating the extent to which they change together.
A strong positive correlation coefficient indicates that the two tend to increase and decrease alongside each other, in similar proportion. Strongly negatively correlated variables will move in opposite directions. Note that, unlike the coefficients of a linear regression equation, which let you estimate the value of one variable given the other, the correlation coefficient does not depend on the scale of the underlying variables. We are analyzing the covariance of the two, divided by the product of their standard deviations. Thus, we analyze the relative change rather than the absolute magnitude of the change.
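As a rough illustration of that calculation, here is a minimal Python sketch of the Pearson coefficient as the covariance divided by the product of the two standard deviations. The function name `pearson_r` and the sample data are made up for this example, and the site's own implementation may differ in detail.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations. The 1/n (or 1/(n-1))
    # factors in the covariance and standard deviations cancel in the ratio.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)

# Rescaling X (say, from metres to millimetres) leaves r unchanged
# up to floating-point rounding.
r_rescaled = pearson_r([1000 * x for x in xs], ys)
```

The second call is there to show the scale independence described above: multiplying every X value by a positive constant changes the regression slope but not the correlation.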
We include a second statistic, the r-squared value, as a means of evaluating the consistency of this trend. A high r-squared value indicates the data points consistently move together in accordance with the trend denoted in their correlation coefficient. A low r-squared value indicates there's a lot of noise in the system and only a portion of the variation in the dependent variable can be explained by changes in the independent variable. Look at your r-squared value to assess your confidence in the correlation.
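To make "portion of the variation explained" concrete, the sketch below fits a least-squares straight line and computes r-squared as one minus the ratio of unexplained to total variation in Y. For a straight-line fit with an intercept, this value equals the square of the Pearson correlation coefficient. The function name `r_squared` and the sample data are hypothetical, and this is one common way to compute the statistic rather than a description of the calculator's internals.

```python
def r_squared(xs, ys):
    """Share of the variation in Y explained by a least-squares line on X."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)  # total variation in Y
    if sxx == 0 or ss_tot == 0:
        raise ValueError("r-squared is undefined when a variable is constant")
    # Ordinary least-squares line: y = intercept + slope * x
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left unexplained by the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot


# Hypothetical sample data: a noisy upward trend.
noisy = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

# Points lying exactly on a line leave nothing unexplained.
perfect = r_squared([1, 2, 3], [2, 4, 6])
```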
As always, use common sense in interpreting these results. Correlation in data doesn't necessarily indicate causation or predict replication in the real world. Given a large enough pile of data to go shopping in, you will inevitably find a few spurious correlations to capture your imagination. This is where holding back a validation sample can keep you honest - look skeptically at any trend that doesn't replicate in your holdout sample. Similarly, look for opportunities to A/B test your proposed changes on a small sample before ramping them to full volume.