Free statistics calculators designed for data scientists. This outlier calculator:

- Examines data for statistical outliers
- Generates a list of outlier datapoints
- Lets you reuse your data for other analysis

This calculator examines a set of numbers and identifies data points which fall meaningfully outside the typical range of the distribution. Enter each data point as a separate value, separated by commas. Then hit calculate. The calculator will generate a list of points which are significantly outside the observed distribution.

Want to do more analysis? We have tools that will allow you to plot the distribution and generate a histogram. Even better, you can save your data from this calculator and reuse it on that web page! Or come back and use it to check your work later. Simply hit the "save data" button. It will save the data in your browser (not on our server, so it remains private to you). It will appear on the list of saved datasets below the data entry panel. To retrieve it, click the "load data" button next to it.

The outliers tagged by the outlier calculator are observations that lie far from the core of the distribution. The calculator uses the interquartile range (the gap between the 25th and 75th percentiles) to measure the variation in the sample. An observation is tagged as an outlier if it lies more than 1.5 times the interquartile range above the 75th percentile or below the 25th percentile.

For example, assume you enter 20 data points as your observations and hit calculate. If the 25th percentile value was 5 and the 75th percentile was 15, the interquartile range would be calculated as 10. Tagging points that lie more than 1.5 x the interquartile range beyond its upper and lower bounds, we designate anything above 30 (15 + 10 x 1.5 = 30) or below -10 (5 - 10 x 1.5 = -10) as an outlier.
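
The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the calculator's actual code: the function names `iqr_fences` and `find_outliers` are made up for this example, and the quartile convention (Python's "inclusive" method) is an assumption, since different percentile conventions can give slightly different cutoffs on small samples.

```python
import statistics


def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) cutoffs: k * IQR beyond each quartile."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    # Assumption: "inclusive" quartiles; the calculator may use a
    # different percentile convention.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lower, upper = iqr_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# The worked example: 25th percentile = 5, 75th percentile = 15.
print(iqr_fences(5, 15))  # (-10.0, 30.0)
```

Calling `find_outliers` on a list of numbers returns only the points outside those cutoffs, in their original order.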

Data can be comma separated or entered one line per data point; you can also cut and paste from Excel.

Saved datasets are stored in your browser; you can retrieve them and use them in other calculators on this site.

Need to pass an answer to a friend? It's easy to link and share the results of this calculator. Hit calculate, then simply copy and paste the URL. It will retain the values you entered so you can share them via email or social media.

There are no hard and fast rules about what to do with outliers. Your handling of outliers should be driven by the goals of the analysis. Are you attempting to model "normal conditions" or are you looking for extreme behavior and notable "differences"? This will guide your efforts in analyzing outliers.

If you are using data to model the expected value of a process, you may want to exclude them. For example, when I was working on pricing models, I would frequently implement special rules for exceptionally high or low prices. The price paid for old close-out items or the price paid for "emergency expediting" is not representative of what the typical customer will pay. A few large outliers can easily introduce bias into a linear regression model. Thus, we usually excluded them from the model and calculated our statistics using a more representative sample.
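
To see how much a single extreme point can move a fitted line, here is a small sketch. The data are invented for illustration (a price that rises by exactly 2 per unit of quantity, plus one hypothetical "emergency expediting" order), and `ols_slope` is a hand-written least-squares helper rather than anything from the pricing models described above.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x for paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: the last order is the outlier.
quantity = [1, 2, 3, 4, 5, 6]
price = [2, 4, 6, 8, 10, 60]

slope_all = ols_slope(quantity, price)                # fitted with the outlier
slope_trimmed = ols_slope(quantity[:-1], price[:-1])  # outlier excluded
print(slope_all, slope_trimmed)
```

With the outlier excluded, the slope recovers the underlying rate of 2 per unit; with it included, the slope is more than four times larger.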

In contrast, when I was doing exploratory data analysis to identify credit card fraud or other relatively rare events, the outliers were of extreme interest. This is particularly true when you are building rules for real-time scanning: you want to tag events which are notably outside the typical range of behavior so that a human being can review and evaluate them. For example, there is no really good reason for someone to make seven in-store payments within the course of a single day; many years ago, this indicated a check kiting scheme. Similar approaches can be used to look at engineering failures and relationship attrition: the part (or person) may do fine under normal stress, but a specific event causes them to crack. The outliers often carry more useful information than a model of normal operations.

This approach uses the interquartile range to measure the spread of the underlying data. It is a non-parametric method, which means you do not need to specify an underlying distribution. You can therefore apply it to a very broad range of data. If you are confident that the data you are analyzing comes from a normal distribution, you can use other tools such as Grubbs' test for outliers.

Either way, outliers should not be ignored in serious analysis.