Describing Data: Averages, Spread, and Expected Values
Learning objectives
By the end of this chapter, you should be able to:
- Calculate and interpret measures of central tendency (mean, median, mode) for different types of data.
- Calculate and interpret measures of dispersion (range, variance, standard deviation) to assess variability.
- Use the coefficient of variation to compare relative variability between datasets.
- Construct and interpret simple frequency distributions to summarise data and identify patterns.
- Calculate expected values to support decisions under uncertainty, and explain what the result does (and does not) tell you.
Overview & key concepts
Managers and analysts often face large volumes of data—daily sales, processing times, defect rates, customer spend, or budget variances. Descriptive statistics convert raw numbers into a compact summary that helps you:
- understand what a “typical” value looks like,
- judge how consistent (or volatile) the data are,
- compare performance across products, sites, or periods,
- support decisions where outcomes are uncertain.
Two ideas run throughout this chapter:
- Central tendency answers: Where are the data typically located?
- Dispersion answers: How widely do the data vary around that typical value?
A separate but related tool, expected value, summarises uncertain outcomes using probabilities.
Measures of central tendency
Measures of central tendency describe a representative value for a dataset. The most appropriate measure depends on the shape of the data and what you are trying to communicate.
Mean (arithmetic average)
The mean is the total of all values divided by the number of values:
- Mean = (sum of values) ÷ (number of values)
The mean uses every data point, which makes it useful for budgeting, forecasting, and performance measurement. However, it can be pulled up or down by extreme values (outliers).
Exam tip: If a dataset contains an extreme value, comment on whether the mean still gives a representative “typical” figure, and consider the median as a comparison.
Median
The median is the middle value when the data are placed in order. If there is an even number of values, the median is the average of the two central values.
The median is often preferred when the dataset is skewed or contains outliers, because it depends on position rather than magnitude.
Exam tip: Always sort the data before selecting the median. State clearly whether you have an odd or even number of observations.
Mode
The mode is the most frequently occurring value. Some datasets have one mode, multiple modes, or no mode.
The mode is particularly useful where the most common outcome is more informative than an average (for example, the most common order size or the most common waiting time band).
Exam tip: If every value occurs once, state explicitly that there is no mode (rather than leaving it blank).
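The three measures above can be sketched with Python's standard library (the sales figures here are hypothetical illustration data):

```python
from statistics import mean, median, multimode

# Hypothetical daily sales figures (GBP), for illustration only
sales = [120, 150, 130, 180, 170, 160, 140]

print(mean(sales))       # 150 — total of all values ÷ number of values
print(median(sales))     # 150 — middle value; sorting is handled internally
print(multimode(sales))  # every value occurs once, so all are returned: no single mode
```

`multimode` returns a list of the most frequent values; when every value occurs exactly once, it returns the whole dataset, which is the coded equivalent of stating "no mode".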
Measures of dispersion
Dispersion shows how spread out the data are. Two datasets can share the same mean but have very different consistency—an important distinction in performance control and risk assessment.
Range
The range is:
- Range = maximum value − minimum value
It is quick to calculate, but it only uses two observations and can be distorted by outliers.
Exam tip: Use range as a quick first comment on spread, but support it with standard deviation when the question asks for a fuller measure of variability.
Variance and standard deviation
Variance and standard deviation measure spread around the mean.
- Calculate each deviation from the mean: (value − mean).
- Square each deviation.
- Sum the squared deviations.
- Divide to obtain the variance.
- Take the square root to obtain the standard deviation.
There are two common versions:
- Population variance (use when the dataset is the full set you are analysing):
- Variance = Σ(x − mean)² ÷ n
- Sample variance (use when the dataset is a sample used to estimate a wider population):
- Variance = Σ(x − mean)² ÷ (n − 1)
Exam tip: Unless the question indicates you are sampling or estimating a wider population, treat the dataset as the full period under review and divide by n. Use (n − 1) only when sampling/estimation is clearly intended.
Interpreting standard deviation: Standard deviation is a typical distance from the mean. It is not a guarantee that most values fall within one standard deviation of the mean unless you make additional distribution assumptions.
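The five steps can be followed line by line in plain Python (illustrative figures; the standard library's `pstdev`/`stdev` would give the same results):

```python
# Hypothetical dataset, for illustration only
data = [120, 150, 130, 180, 170, 160, 140]

m = sum(data) / len(data)                 # the mean the deviations are measured from
sq_devs = [(x - m) ** 2 for x in data]    # steps 1-2: deviation from mean, squared
total = sum(sq_devs)                      # step 3: sum of squared deviations
pop_var = total / len(data)               # step 4: population variance (÷ n)
samp_var = total / (len(data) - 1)        # step 4 alt: sample variance (÷ (n − 1))
pop_sd = pop_var ** 0.5                   # step 5: square root gives standard deviation
samp_sd = samp_var ** 0.5
```

Note that `pop_var` and `samp_var` differ only in the divisor, which is exactly the population-versus-sample distinction described above.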
Coefficient of variation
The coefficient of variation (CV) compares variability relative to the mean:
- CV = (standard deviation ÷ mean) × 100%
CV is helpful when comparing datasets with different average levels (for example, two products with different average demand).
Exam tip: CV is most meaningful when the mean is positive and not close to zero. If the mean is very small (or negative), explain why CV may be misleading.
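A minimal sketch of the CV comparison, using two hypothetical products with different average demand (the guard clause reflects the exam tip above about small or negative means):

```python
from statistics import mean, pstdev

def coefficient_of_variation(data):
    """CV = (standard deviation ÷ mean) × 100%; only meaningful for a clearly positive mean."""
    m = mean(data)
    if m <= 0:
        raise ValueError("CV is misleading when the mean is zero or negative")
    return pstdev(data) / m * 100

# Two hypothetical products with very different average demand levels
product_a = [100, 110, 90, 105, 95]
product_b = [1000, 1100, 900, 1050, 950]
```

Here product B's absolute spread is ten times larger than product A's, but both give a CV of roughly 7.1%, showing the same relative variability once the different average levels are taken into account.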
Frequency distributions
A frequency distribution groups data into intervals (classes) and counts how many observations fall into each interval. It helps you see clustering, gaps, and potential skewness.
Good class design is:
- complete (all observations included),
- non-overlapping (no value fits two classes),
- consistent (class widths usually equal unless there is a clear reason otherwise),
- clear at boundaries (so values are allocated unambiguously).
Exam tip: Where possible, use equal-width intervals. Open-ended classes (such as “£180+”) can be acceptable for quick summaries, but they are less useful for charts and comparisons because the class width is not defined.
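The class-design rules above (equal width, complete, non-overlapping, unambiguous boundaries) can be sketched as a small counting function; the sales figures in the usage line are hypothetical:

```python
def frequency_table(data, start, width, n_classes):
    """Count observations into equal-width, non-overlapping classes.
    Class i covers [start + i*width, start + (i+1)*width); values outside all classes are ignored."""
    counts = [0] * n_classes
    for x in data:
        i = int((x - start) // width)   # half-open intervals make boundary allocation unambiguous
        if 0 <= i < n_classes:
            counts[i] += 1
    return counts

# Classes £120–£139, £140–£159, £160–£179, £180–£199 (width £20)
frequency_table([120, 150, 130, 180, 170, 160, 140], start=120, width=20, n_classes=4)
```

Using half-open intervals in code is one way to guarantee the "non-overlapping" and "clear at boundaries" properties: every value maps to exactly one class index.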
Expected values
An expected value (EV) summarises uncertain outcomes as a probability-weighted average:
- Expected value = Σ (outcome × probability)
For an EV calculation to be valid, the probabilities used must represent all outcomes and must total 1.0.
EV is a useful decision aid, but it has important limits:
- EV is not the most likely outcome.
- The EV may be a value that never actually occurs (for example, if outcomes are £0 or £300, the EV could be £150).
- EV is most informative when the decision is repeatable over time or when using a risk-neutral decision rule. Otherwise, downside risk and risk appetite must also be discussed.
Exam tip: After calculating EV, add a short comment on risk: the downside outcome, its probability, and whether the decision maker might still reject the option despite a positive EV.
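A sketch of the EV calculation, including the completeness check on the probabilities that the validity condition above requires:

```python
def expected_value(outcomes):
    """Probability-weighted average of (value, probability) pairs.
    Raises if the probabilities do not represent a complete set (total 1)."""
    total_p = sum(p for _, p in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"probabilities total {total_p}, not 1")
    return sum(v * p for v, p in outcomes)

# A 50/50 chance of £200 or £100 profit gives an EV of £150 —
# a value that can itself never occur, which is the limitation noted above
expected_value([(200, 0.5), (100, 0.5)])
```

The guard clause mirrors the exam pitfall of probabilities not totalling 1: an EV computed from an incomplete set of outcomes is not a valid weighted average.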
Core theory and frameworks
Data summary workflow
When you are given a dataset, aim to build a short story from the numbers rather than listing calculations.
1) Sanity-check first
Confirm what each figure represents (units, time period, and whether any values look like errors or one-off events).
2) Pick a headline “typical” value
- Use the mean when you need an overall level that reflects all observations.
- Use the median when extremes could distort the picture.
- Use the mode when the most common outcome is the key point (especially for categories or standard order sizes).
3) Add spread to show consistency
Start with the range for a quick sense-check, then use standard deviation for a fuller measure of variability around the mean. State whether you are using the population or sample approach.
4) Compare datasets fairly (if required)
If average levels differ significantly, use CV to compare variability relative to the mean (and note any limitations if the mean is very small or negative).
5) Show the shape, not just the average
Use a frequency distribution to highlight clustering, gaps, and potential skewness—then explain what the shape suggests for control, forecasting, or capacity planning.
6) If outcomes are uncertain, separate “average outcome” from “risk”
Compute EV as a probability-weighted average (checking probabilities total 1), then discuss downside exposure rather than treating EV as a guaranteed result.
Worked example
Narrative scenario
A small retail company tracks its daily sales over a week to understand typical performance and variability. The sales figures (GBP) are:
£120, £150, £130, £180, £170, £160, £140.
The company wants to summarise the sales data using measures of central tendency and dispersion, build a simple frequency distribution, and calculate the expected value for a potential promotional offer.
Required
- Calculate the mean, median, and mode of the sales data.
- Determine the range, variance, and standard deviation.
- Build a simple frequency distribution.
- Calculate the expected value for a promotional offer with given probabilities.
Solution
1) Mean, median, and mode
Mean
Sum of sales = 120 + 150 + 130 + 180 + 170 + 160 + 140 = 1,050
Number of days, n = 7
Mean = 1,050 ÷ 7 = £150
Median
Sorted data: 120, 130, 140, 150, 160, 170, 180
Median (middle value) = £150
Mode
All values occur once, so there is no mode.
2) Range, variance, and standard deviation
Range
Maximum = 180, Minimum = 120
Range = 180 − 120 = £60
Variance and standard deviation
Mean = £150
| Day’s sales (x) | x − mean | (x − mean)² |
|---|---|---|
| 120 | −30 | 900 |
| 150 | 0 | 0 |
| 130 | −20 | 400 |
| 180 | 30 | 900 |
| 170 | 20 | 400 |
| 160 | 10 | 100 |
| 140 | −10 | 100 |
| Total | - | 2,800 |
Σ(x − mean)² = 2,800
Population variance (treating this week as the full period under review):
Variance = 2,800 ÷ 7 = 400
Standard deviation = √400 = £20.00
Sample variance (treating this week as a sample used to estimate a wider pattern):
Variance = 2,800 ÷ (7 − 1) = 2,800 ÷ 6 = 466.67
Standard deviation = √466.67 ≈ £21.60
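As a quick self-check, every figure calculated so far can be reproduced with the standard library:

```python
from statistics import mean, median, pstdev, stdev

sales = [120, 150, 130, 180, 170, 160, 140]

assert mean(sales) == 150                 # mean: 1,050 ÷ 7
assert median(sales) == 150               # middle of the sorted data
assert max(sales) - min(sales) == 60      # range: 180 − 120
assert pstdev(sales) == 20                # population method: √(2,800 ÷ 7)
assert round(stdev(sales), 2) == 21.6     # sample method: √(2,800 ÷ 6)
```

`pstdev` divides by n and `stdev` by (n − 1), matching the population and sample versions shown above.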
3) Frequency distribution
Use equal-width intervals and ensure all values fit exactly one class. With a minimum of £120 and maximum of £180, a convenient class width is £20:
- £120–£139
- £140–£159
- £160–£179
- £180–£199
Count the observations:
- £120–£139: 120, 130 →2
- £140–£159: 140, 150 →2
- £160–£179: 160, 170 →2
- £180–£199: 180 →1
Frequency table
| Sales interval | Frequency |
|---|---|
| £120–£139 | 2 |
| £140–£159 | 2 |
| £160–£179 | 2 |
| £180–£199 | 1 |
| Total | 7 |
4) Expected value
Promotional offer outcomes:
- 0.5 probability of £200 profit
- 0.5 probability of £100 profit
Probabilities total: 0.5 + 0.5 = 1.0 (complete set of outcomes)
EV = (0.5 × 200) + (0.5 × 100)
EV = 100 + 50 = £150
Interpretation of the results
- Typical daily sales: Mean and median are both £150, suggesting £150 is a sensible headline figure for this week’s “typical” day.
- Variability: A range of £60 indicates noticeable movement across the week. A standard deviation of about £20 (population method) indicates that daily sales are typically about £20 away from the mean. This does not guarantee that most days fall within ±£20 without further assumptions about the distribution.
- Pattern from the frequency distribution: Sales are spread fairly evenly across the middle bands, with one day in the highest band (£180–£199).
- Decision support under uncertainty: The promotional offer has an expected profit of £150. This is an average across outcomes, not the most likely outcome and not a guaranteed result. A risk-aware comment should note the downside outcome (£100) and its probability (0.5).
Common pitfalls and misunderstandings
- Mean vs median: Using the mean for a dataset with extreme values can produce a “typical” figure that few periods achieve.
- Forgetting to sort for the median: The median depends on ordered data; the middle of an unsorted list is meaningless.
- Assuming a mode must exist: Many datasets have no single most common value.
- Range over-reliance: Range can exaggerate variability if one extreme value is unusual; it ignores the rest of the data.
- Mixing population and sample formulas: Dividing by n in one step and by (n − 1) in another produces inconsistent results.
- Using CV mechanically: CV is not helpful when the mean is zero/near zero or where negative values occur.
- Weak class design in frequency tables: Overlapping classes, gaps, or unclear boundaries lead to incorrect counts and unreliable conclusions.
- Expected value without risk discussion: EV is a weighted average; it does not describe downside exposure or outcome volatility.
- Probabilities not totalling 1: EV calculations require a complete set of outcomes; probabilities must sum to 1.
- Units confusion: Variance is in squared units; standard deviation returns to the original units.
Summary and further reading
Descriptive statistics convert raw data into useful information. Measures of central tendency (mean, median, mode) describe typical values, while measures of dispersion (range, variance, standard deviation) quantify variability. The coefficient of variation supports comparisons of relative volatility across datasets with different average levels. Frequency distributions reveal patterns such as clustering and skewness. Expected value summarises uncertain outcomes using probabilities, but it should be interpreted alongside downside outcomes and risk appetite.
For further study, read broadly in business statistics and decision-making under uncertainty, focusing on practical interpretation as well as calculation.
FAQ
Why is the median often preferred over the mean in skewed datasets?
Because the median depends on position rather than the size of the values. A small number of unusually large or small observations can shift the mean substantially, while the median typically remains stable, giving a more representative “middle” for skewed data.
How does the coefficient of variation help in comparing datasets?
It scales variability to the level of the mean. Two datasets may have different average sizes; CV expresses spread as a percentage of the mean, supporting more meaningful comparisons of relative volatility.
What are the limitations of using range as a measure of dispersion?
Range uses only the maximum and minimum values and ignores the distribution of the rest of the data. It can be heavily influenced by a single unusual observation and provides no information about how values cluster around the mean.
How is expected value used in decision-making under uncertainty?
Expected value combines outcomes and probabilities into a single average figure, useful for comparing options in repeatable decisions. However, it is not the most likely outcome and it does not show risk. A complete answer should also comment on downside outcomes and their likelihood.
Why is standard deviation preferred over variance for interpreting spread?
Standard deviation is expressed in the original units (e.g., pounds, minutes, units), so it can be interpreted directly alongside the mean. Variance is in squared units and is therefore less intuitive.
Summary (Recap)
This chapter explained how to describe datasets using measures of central tendency and dispersion, how to compare relative variability using the coefficient of variation, how to construct and interpret simple frequency distributions, and how to calculate expected values for decisions under uncertainty. The worked example demonstrated each technique using weekly sales data and reinforced that interpretation should distinguish between “average outcome” and risk.
Glossary
Mean
An average calculated by dividing the total of all observations by the number of observations. It reflects every data point and can be affected by extreme values.
Median
The middle value after sorting the data (or the average of the two middle values if there is an even count). Often robust when data are skewed.
Mode
The most frequently occurring value. A dataset may have one mode, multiple modes, or none.
Range
Maximum minus minimum. A quick spread measure based only on the two extremes.
Variance
A spread measure based on squared deviations from the mean. It can be calculated using a population (÷ n) or sample (÷ n − 1) approach.
Standard deviation
The square root of variance, expressing spread in the same units as the original data.
Coefficient of variation
Standard deviation divided by mean (usually shown as a percentage). Compares variability relative to average level.
Expected value
A probability-weighted average of outcomes: Σ(outcome × probability). Probabilities must total 1 for a complete set of outcomes.
Frequency distribution
A summary that groups data into intervals and counts how many observations fall into each interval.
Probability
A measure of likelihood between 0 and 1. For a complete set of outcomes, probabilities should total 1.
Test your knowledge
Practice questions specifically for this topic.
Written by
AccountingBody Editorial Team