Using Excel for Statistical Analyses

Students who are already using Excel for other work often want to use Excel for statistics. Excel does have some statistical capabilities. However, numerous problems with this have been documented. Statisticians have been documenting these problems for at least nine years and subsequent versions of Excel have not addressed the problems adequately. One software designer has commented that a major part of the problem is that a spreadsheet is basically cell-based, where professional statistical software is column-based (where a column holds the observations on a single random variable.) That is a drawback which simply can't be fixed, since a spreadsheet must be cell-based in order to function appropriately for spreadsheet applications.

One major problem is that Excel does not handle missing data appropriately. Consider a dataset with observations on two variables, and some values are missing. The appropriate way to compute a correlation on this data is to omit the points that do not have observations on both variables. That is not what Excel does. Try it on some data with missing values on some of the variables. Then go through and delete the cases with missing data and do the analysis again. You'll find that you get different results.

In computing a standard deviation (and correlation coefficients), there are two algebraically equivalent formulas. One is easier to use when using a simple calculator and is sometimes called the "computation formula." However, it is numerically unstable, meaning that it can be quite inaccurate on some datasets because of round-off error. With appropriate training, a person could learn to recognize the kinds of datasets for which Excel is likely to compute the standard deviation or correlation coefficient incorrectly. However, good statistical software is more accurate than that and doesn't require the user to first determine whether this is the kind of dataset on which the software obtains correct answers.

Excel does have extensive graphing capabilities. However, some of the main graphs available are misleading as summaries of the data. If you use Excel to make the plots needed to analyze the data, it is important to make sure you are choosing an appropriate graph.


ACC MATH 1342 software home | ACC MATH 1342 home | ACC Math home | ACC home


Last updated August 10, 2003. Comments, questions, suggestions?