Estimating Costs: High–Low, Scattergraphs and Regression

AccountingBody Editorial Team

This chapter delves into estimating costs using the high–low method, scattergraphs, and regression analysis. It explains how to separate fixed and variable…

Learning objectives

By the end of this chapter you should be able to:

Use the high–low method to estimate the fixed and variable elements of a mixed cost from activity data.
Build and interpret a scattergraph to visualise cost behaviour and identify unusual observations.
Calculate and interpret correlation (r) and the coefficient of determination (R²) when analysing cost and activity.
Construct a simple linear regression cost model and use it to forecast costs at specified activity levels.
Explain common weaknesses in cost estimation (outliers, step changes, mixed drivers, and forecasts outside the relevant range) and suggest practical improvements.

Overview & key concepts

When planning and controlling operating performance, it is often helpful to understand how costs respond to changes in activity. Many costs are neither purely fixed nor purely variable; instead, they include a “standing” (fixed) element plus an additional amount that increases with a driver such as units produced, machine hours, or number of orders processed.

This chapter focuses on three practical tools used in cost estimation:

High–low (a quick two-point estimate)
Scattergraphs (a visual sense-check and outlier detector)
Linear regression (a best-fit model using all observations)

A useful way to think about them is:

Scattergraphs diagnose whether a simple model is suitable.
High–low and regression estimate the fixed and variable parameters once the data looks usable.

Cost function

A cost function links total cost to an activity level by splitting cost into:

a fixed element (incurred even at zero activity, within the relevant range), and
a variable element (changes with activity).

The standard linear cost function is:

Y = a + bX

Where:

Y = total cost
a = fixed cost (intercept)
b = variable cost per unit of activity (slope)
X = activity level (the cost driver)

Always sanity-check the driver. A model can look statistically strong but still be unsuitable if the driver does not make operational sense.
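
As a minimal sketch, the cost function can be written as a small Python helper. The class name LinearCostFunction and the figures in the usage line are hypothetical illustrations, not part of the method itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCostFunction:
    """Y = a + bX, intended for use only within the relevant range of the data."""

    fixed_cost: float     # a: intercept, the cost at zero activity
    variable_rate: float  # b: slope, the cost per unit of the driver

    def total_cost(self, activity: float) -> float:
        # Y = a + b * X
        return self.fixed_cost + self.variable_rate * activity


# Hypothetical example: £10,000 per month plus £2.50 per machine hour
machine_costs = LinearCostFunction(fixed_cost=10_000, variable_rate=2.5)
print(machine_costs.total_cost(4_000))  # 20000.0
```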

Fixed, variable and mixed costs

Fixed costs

Fixed costs stay constant in total within the relevant range. If output rises, the fixed cost per unit falls because the same total is spread over more units.

Variable costs

Variable costs increase as activity increases, typically at a broadly constant rate per unit within the relevant range.

Mixed (semi-variable) costs

Mixed costs contain both fixed and variable elements (for example, a service charge plus a usage-based charge). The aim of high–low and regression is to separate these two elements for forecasting and decision-making.

Relevant range

A cost model is reliable only within the activity band over which the business operated in a broadly consistent way. Outside that band, cost behaviour may change because of capacity limits, different working methods, or changes in input prices.

Examples of changes that can break a simple linear model include:

adding an extra supervisor or shift once output crosses a threshold
overtime premiums at high activity
bulk discounts or shortages changing variable cost per unit
hiring extra equipment above normal capacity

When forecasting, compare the target activity to the observed range and consider whether operations would remain similar.
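
A minimal Python sketch of the first half of that check, comparing the target with the lowest and highest activity in the data, is shown below. The function name and the machine-hour figures are hypothetical. Whether operations would remain similar still needs judgement that the code cannot supply.

```python
from typing import Sequence


def relevant_range_check(target: float, observed: Sequence[float]) -> str:
    """Say whether a forecast activity level lies inside the observed data range.

    Being inside the range is necessary but not sufficient: operations
    (capacity, staffing, prices) must also be broadly unchanged.
    """
    low, high = min(observed), max(observed)
    if target < low:
        return f"below the observed range ({low:,} to {high:,}): extrapolation"
    if target > high:
        return f"above the observed range ({low:,} to {high:,}): extrapolation"
    return f"within the observed range ({low:,} to {high:,})"


# Hypothetical machine-hour observations
observed_hours = [1_200, 1_500, 1_800, 2_100]
print(relevant_range_check(1_600, observed_hours))  # within the observed range (1,200 to 2,100)
print(relevant_range_check(2_400, observed_hours))  # above the observed range (1,200 to 2,100): extrapolation
```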

High–Low method

What it does

The high–low method estimates variable cost per unit from the difference between the highest and lowest activity observations, then derives fixed cost by substituting back into the cost equation.

Mechanics

Variable cost per unit:

b = (Cost at high activity − Cost at low activity) / (High activity − Low activity)

Fixed cost (using either high or low point):

a = Total cost − (b × Activity)

In exam answers, state the final cost equation explicitly in this form:

Total cost = Fixed cost + (Variable cost per unit × Activity)
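
The two formulas can be sketched in Python as follows, assuming observations are supplied as (activity, total cost) pairs. The function name high_low and the data are hypothetical. The data deliberately puts the highest cost in a period that is not the highest-activity period, because high–low selects its two points by activity.

```python
from typing import Sequence, Tuple


def high_low(observations: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Estimate (fixed cost a, variable rate b) from (activity, total cost) pairs.

    The high and low points are chosen by ACTIVITY, not by cost.
    """
    x_high, y_high = max(observations, key=lambda obs: obs[0])  # highest activity
    x_low, y_low = min(observations, key=lambda obs: obs[0])    # lowest activity
    if x_high == x_low:
        raise ValueError("need at least two different activity levels")
    b = (y_high - y_low) / (x_high - x_low)  # variable cost per unit
    a = y_high - b * x_high                   # fixed cost, backed out from the high point
    return a, b


# Hypothetical data: (units, total cost in £). The highest COST (£14,500) is at
# 2,600 units, but high-low must use the 3,000-unit observation.
data = [(1_000, 8_000), (2_000, 11_000), (2_600, 14_500), (3_000, 14_000)]
a, b = high_low(data)
print(f"Total cost = {a:,.0f} + ({b:.2f} × Units)")  # Total cost = 5,000 + (3.00 × Units)
```

Selecting by cost instead would pair the £14,500 and £8,000 observations and give b ≈ £4.06 per unit (6,500 / 1,600), which is why the choice of points matters.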

Strengths and weaknesses

Fast and easy to apply.
Uses only two observations, so it can be distorted if either point is unusual.
Best used as a quick estimate, supported by a scattergraph and (where available) regression.

Good practice on rounding

Use the unrounded b to compute a, then round the final equation.
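
To see why, the sketch below uses hypothetical high and low points where b does not divide exactly (£10,000 / 3,000 units). With the unrounded b (kept exact here with Python's fractions.Fraction), the high and low points imply the same fixed cost. Rounding b to 3.33 first makes the two points disagree and moves a away from its unrounded value.

```python
from fractions import Fraction

# Hypothetical high and low points: (units, total cost in £)
x_high, y_high = 7_000, 50_000
x_low, y_low = 4_000, 40_000

b_exact = Fraction(y_high - y_low, x_high - x_low)  # 10/3 = 3.333...
b_rounded = Fraction("3.33")                         # rounded too early

for label, b in [("unrounded b", b_exact), ("b rounded to 3.33", b_rounded)]:
    a_from_high = y_high - b * x_high
    a_from_low = y_low - b * x_low
    print(f"{label}: a = {float(a_from_high):,.2f} (high) vs {float(a_from_low):,.2f} (low)")
# unrounded b: a = 26,666.67 (high) vs 26,666.67 (low)
# b rounded to 3.33: a = 26,690.00 (high) vs 26,680.00 (low)
```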

Scattergraphs

A scattergraph plots total cost (vertical axis) against activity (horizontal axis). Its purpose is to diagnose the data before relying on calculations. Any conclusion should be evidence-based and linked to what is visible on the plot.

A practical conclusion framework is:

Linearity: Do points cluster around a straight line (a linear model may be reasonable)?
Outliers: Are any observations clearly away from the main pattern (possible errors or one-off events)?
Step changes: Are there flat bands followed by jumps (suggesting capacity thresholds or step costs)?
Changing spread at higher activity: If points fan out as activity rises, the straight-line estimate may be less reliable at higher activity levels.

A scattergraph helps you judge suitability, but it does not prove the relationship will remain stable over time.

If these checks suggest problems, do not rely on high–low or a single regression line without adjustment (for example, investigate the outlier, split the data into segments, or consider an additional driver).

Correlation and coefficient of determination

Correlation coefficient (r)

Correlation indicates the direction and strength of a linear relationship between two variables, ranging from −1 to +1.

r close to +1: strong positive linear relationship
r close to −1: strong negative linear relationship
r close to 0: weak linear relationship

Correlation does not prove cause-and-effect. Business logic must support the choice of driver.
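
A minimal Python sketch of the usual (Pearson) correlation coefficient is below. The last data set is hypothetical activity and cost figures. The function returns a value between −1 and +1 and is undefined if either series does not vary.

```python
import math
from typing import Sequence


def correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient r between activity (xs) and cost (ys).

    Raises ZeroDivisionError if either series is constant (r is undefined).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(correlation([1, 2, 3], [2, 4, 6]))  # 1.0  (perfect positive)
print(correlation([1, 2, 3], [6, 4, 2]))  # -1.0 (perfect negative)
# Hypothetical activity and cost data: strong but not perfect
print(round(correlation([100, 200, 300, 400], [1_150, 1_350, 1_500, 1_700]), 4))  # 0.9985
```

On Python 3.10 or later, the standard library's statistics.correlation(xs, ys) returns the same Pearson coefficient.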

Coefficient of determination (R²)

R² shows how much of the variation in cost is explained by the activity variable in the model.

R² close to 1: activity explains most of the observed movement in cost in this dataset
A high R² does not guarantee a good model if key drivers are missing, operations changed, or the model is used outside the relevant range

In a simple single-variable linear model with an intercept (the standard setup in cost estimation), the following relationship holds:

R² = r²
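
The identity can be checked numerically. In this Python sketch on hypothetical data, R² is computed as 1 − SSE/SST (the share of the variation in cost explained by the fitted line) and compared with the square of r from the same data.

```python
import math

# Hypothetical activity (X) and cost (Y) data
xs = [100, 200, 300, 400]
ys = [1_150, 1_350, 1_500, 1_700]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Least-squares line Y = a + bX (with an intercept)
b = sxy / sxx
a = mean_y - b * mean_x

# R² as "share of variation explained": 1 - SSE / SST
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
sst = syy                                                   # total variation in cost
r_squared = 1 - sse / sst

r = sxy / math.sqrt(sxx * syy)
print(round(r_squared, 4), round(r ** 2, 4))  # both 0.9969
```

If the intercept were forced to zero, the two measures would no longer be guaranteed to match, which is why the text limits the identity to models with an intercept.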

Regression analysis

Regression estimates the “best fit” straight-line relationship using all observations by minimising the sum of squared residuals.

Model form

Y = a + bX

Where:

b estimates variable cost per unit of the driver
a estimates fixed cost

In exam answers, state the final cost equation explicitly in this form:

Total cost = Fixed cost + (Variable cost per unit × Activity)
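
As a sketch of what spreadsheet and calculator functions do, the least-squares estimates can be computed directly. b is the sum of cross-products of deviations from the means divided by the sum of squared activity deviations, and a makes the line pass through the mean point. The function name fit_cost_line and the machine-hour data are hypothetical.

```python
from typing import Sequence, Tuple


def fit_cost_line(activity: Sequence[float], cost: Sequence[float]) -> Tuple[float, float]:
    """Least-squares estimates (a, b) for Y = a + bX.

    b = S_xy / S_xx, and the line passes through (mean X, mean Y),
    so a = mean Y - b * mean X.
    """
    n = len(activity)
    if n != len(cost) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(activity) / n
    mean_y = sum(cost) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(activity, cost))
    s_xx = sum((x - mean_x) ** 2 for x in activity)
    if s_xx == 0:
        raise ValueError("activity must vary to estimate a slope")
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: machine hours (X) and maintenance cost in £ (Y)
a, b = fit_cost_line([100, 200, 300, 400], [1_150, 1_350, 1_500, 1_700])
print(f"Total cost = {a:,.0f} + ({b:.2f} × Hours)")  # Total cost = 975 + (1.80 × Hours)
```

Spreadsheet functions such as Excel's SLOPE and INTERCEPT return the same least-squares estimates for the same data.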

Residuals

A residual is the prediction error for a data point:

Residual = Actual cost − Predicted cost

Residual patterns can warn that:

the relationship is not linear
the model is missing a second driver
a particular period is unusual and should be investigated
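
A short Python sketch (hypothetical data, with one deliberately unusual period) fits the line, computes each residual as actual minus predicted cost, and flags the largest for investigation. For a least-squares line with an intercept the residuals sum to zero, so it is their size and pattern that carry the warning.

```python
from typing import List, Sequence


def residuals(activity: Sequence[float], cost: Sequence[float], a: float, b: float) -> List[float]:
    """Residual = actual cost - predicted cost, for the line Y = a + bX."""
    return [y - (a + b * x) for x, y in zip(activity, cost)]


# Hypothetical data with one unusual period at X = 400
hours = [100, 200, 300, 400, 500]
costs = [1_000, 1_200, 1_400, 1_900, 1_800]

# Least-squares fit, computed inline so the example is self-contained
n = len(hours)
mean_x, mean_y = sum(hours) / n, sum(costs) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, costs))
s_xx = sum((x - mean_x) ** 2 for x in hours)
b = s_xy / s_xx
a = mean_y - b * mean_x  # line: Y = 770 + 2.3X

res = residuals(hours, costs, a, b)
print([round(r) for r in res])  # [0, -30, -60, 210, -120]

# Flag the observation furthest from the line for investigation
worst = max(range(n), key=lambda i: abs(res[i]))
print(f"Investigate X = {hours[worst]} (residual {res[worst]:+.0f})")  # Investigate X = 400 (residual +210)
```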

How regression is handled in questions

In practice, regression is typically obtained from spreadsheet or calculator functions that provide the slope and intercept. In exam questions, you may be given one of the following:

the regression equation directly, or
summary regression output (including slope, intercept, r, and/or R²), or
sufficient data to calculate the line using least-squares formulas.
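
For the last case, the least-squares formulas for Y = a + bX over n observations are commonly written as follows, using the same symbols as the cost function earlier in the chapter (formula sheets may present them in an equivalent form):

```latex
b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^{2} - \left(\sum X\right)^{2}}
\qquad
a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}

r = \frac{n\sum XY - \sum X \sum Y}
         {\sqrt{\left(n\sum X^{2} - \left(\sum X\right)^{2}\right)\left(n\sum Y^{2} - \left(\sum Y\right)^{2}\right)}}
\qquad
R^{2} = r^{2}
```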

Worked example

Narrative scenario

A fulfilment business tracks monthly warehouse handling cost against the number of customer orders dispatched. Management expects the cost to include a monthly “ready-to-operate” element (supervision, security, basic utilities) plus a variable element driven by order volume (pick/pack materials and handling time).

Eight months of comparable data have been recorded. Month 6 included a temporary weekend shift to clear a backlog, which may have affected cost behaviour.

Data:

Month 1: 4,000 orders, £45,200
Month 2: 4,400 orders, £47,100
Month 3: 5,000 orders, £50,600
Month 4: 5,400 orders, £52,400
Month 5: 6,000 orders, £55,500
Month 6: 6,400 orders, £60,800
Month 7: 7,000 orders, £60,400
Month 8: 7,600 orders, £63,200

Required

Estimate the variable cost per order and fixed monthly cost using the high–low method.
Explain what you would look for on a scattergraph and identify any observation that should be investigated.
Using regression, state the regression equation and interpret r and R².
Forecast the cost at 8,000 orders and comment on the relevant range.
Discuss potential pitfalls and how the analysis could be improved.

Solution

1) High–Low method

High activity: 7,600 orders, £63,200
Low activity: 4,000 orders, £45,200

Step 1: Variable cost per order:

b = (63,200 − 45,200) / (7,600 − 4,000)
b = 18,000 / 3,600 = £5.00 per order

Step 2: Fixed cost (use the high point):

a = 63,200 − (5.00 × 7,600)
a = 63,200 − 38,000 = £25,200

Step 3: Cost equation (exam format):

Total handling cost = 25,200 + (5.00 × Orders)

2) Scattergraph (interpretation)

On a scattergraph of cost against orders, you would look for:

whether the points broadly follow a straight-line pattern (linearity)
whether any month sits noticeably away from the general trend (outlier)
whether costs jump at a threshold (step costs)
whether points become more dispersed at higher volumes (less reliable estimates at the top end)

Based on the data, Month 6 would sit above the main trend and should be investigated: at 6,400 orders it cost £60,800, more than Month 7 (£60,400) despite having 600 fewer orders. This is consistent with the operational note about a temporary weekend shift.

A visual check supports judgement, but it does not guarantee the relationship will remain stable over time.
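
One rough numerical companion to the plot, offered here as a hypothetical supplementary check rather than a standard named technique, is to sort the months by activity and compute the cost change per extra order between neighbours. A point above the trend shows up as an unusually steep segment into it followed by an unusually flat or negative one out of it. This Python sketch applies it to the eight months in the example.

```python
# Worked-example data from the text: (month, orders, handling cost in £)
data = [
    (1, 4_000, 45_200), (2, 4_400, 47_100), (3, 5_000, 50_600), (4, 5_400, 52_400),
    (5, 6_000, 55_500), (6, 6_400, 60_800), (7, 7_000, 60_400), (8, 7_600, 63_200),
]


def segment_slopes(observations):
    """Cost change per extra unit of activity between neighbouring observations.

    Assumes (label, activity, cost) tuples with distinct activity levels.
    Returns (from_label, to_label, slope) tuples in order of rising activity.
    """
    ordered = sorted(observations, key=lambda row: row[1])
    return [
        (m0, m1, (y1 - y0) / (x1 - x0))
        for (m0, x0, y0), (m1, x1, y1) in zip(ordered, ordered[1:])
    ]


for m0, m1, slope in segment_slopes(data):
    print(f"Month {m0} to {m1}: {slope:+.2f} £ per extra order")
# The Month 5 to 6 segment (+13.25) and the Month 6 to 7 segment (-0.67)
# stand out against the others (roughly +4.50 to +5.83).
```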

3) Regression line and interpretation

Using all eight observations, the regression model is approximately:

Y = 24,455 + 5.231X

Interpretation:

Fixed cost (intercept) ≈ £24,455 per month
Variable cost per order (slope) ≈ £5.231 per order

Fit measures:

r ≈ 0.984
R² ≈ 0.968

Interpretation of the statistics:

The positive r (≈ 0.984) indicates a strong positive linear relationship: costs generally rise as orders increase.
In this dataset, R² ≈ 0.968 suggests that about 97% of the variation in monthly handling cost is explained by order volume, with the remainder likely due to other factors (for example, the Month 6 weekend shift).
Because this is a single-variable linear model with an intercept, R² = r² applies (0.984² ≈ 0.968).

Regression cost equation (exam format):

Total handling cost = 24,455 + (5.231 × Orders)

If Month 6 is treated as an outlier and excluded after investigation, the fit becomes much tighter (R² rises to about 0.999). Recalculating the line without Month 6 gives approximately:

Total handling cost = 25,230 + (5.020 × Orders)

This illustrates why diagnosing outliers before relying on a model is important.
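
Both lines can be reproduced with a short Python sketch using the eight months from the example. The fit helper is hypothetical and simply applies the least-squares formulas. It prints the coefficients to the precision used in the text, and r and R² to four decimal places.

```python
import math

# Worked-example data from the text: month -> (orders, handling cost in £)
data = {
    1: (4_000, 45_200), 2: (4_400, 47_100), 3: (5_000, 50_600), 4: (5_400, 52_400),
    5: (6_000, 55_500), 6: (6_400, 60_800), 7: (7_000, 60_400), 8: (7_600, 63_200),
}


def fit(points):
    """Least-squares a, b, r and R² for Y = a + bX from (X, Y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r, r ** 2


all_months = fit(list(data.values()))
without_m6 = fit([xy for month, xy in data.items() if month != 6])

for label, (a, b, r, r2) in [("All 8 months", all_months), ("Excluding Month 6", without_m6)]:
    print(f"{label}: Y = {a:,.0f} + {b:.3f}X, r = {r:.4f}, R² = {r2:.4f}")
# All 8 months: Y = 24,455 + 5.231X, r = 0.9837, R² = 0.9677
# Excluding Month 6: Y = 25,230 + 5.020X, r = 0.9996, R² = 0.9993
```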

4) Forecast at 8,000 orders (regression model)

Using the regression equation (all data):

Estimated cost = 24,455 + (5.231 × 8,000)
Estimated cost = 24,455 + 41,848 = £66,303 (approx.)

Relevant range comment:
The observed activity range is 4,000 to 7,600 orders. Forecasting at 8,000 orders is a small extrapolation beyond the dataset, so capacity assumptions (overtime, extra shifts, additional space) should be checked before relying on the figure.
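
The forecast and the range check can be sketched together in Python, using the three cost equations already stated in this example (coefficients as rounded in the text). Comparing them also shows how much the answer at 8,000 orders depends on the model chosen.

```python
# Cost equations stated in this worked example: name -> (a, b), as rounded in the text
models = {
    "Regression, all months": (24_455, 5.231),
    "Regression, excluding Month 6": (25_230, 5.020),
    "High-low": (25_200, 5.00),
}
observed_orders = [4_000, 4_400, 5_000, 5_400, 6_000, 6_400, 7_000, 7_600]
target = 8_000

for name, (a, b) in models.items():
    print(f"{name}: £{a + b * target:,.0f}")
# Regression, all months: £66,303
# Regression, excluding Month 6: £65,390
# High-low: £65,200

low, high = min(observed_orders), max(observed_orders)
if not low <= target <= high:
    # Extrapolation: check capacity (overtime, extra shifts, space) before relying on it
    print(f"Caution: {target:,} orders is outside the observed range "
          f"{low:,} to {high:,} ({target - high:,} orders above the maximum)")
```

The three forecasts differ by about £1,100, and all of them rely on the same straight-line assumption beyond the observed data.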

5) Potential pitfalls and improvements

Potential pitfalls:

Driver choice: Orders may be a good driver, but costs may also depend on delivery routes, product mix, or returns handling.
Outliers and one-offs: Abnormal staffing patterns (such as temporary shifts) can distort both high–low and regression results.
Step costs: If an extra supervisor, shift, or hired equipment is needed beyond a threshold, the relationship may jump rather than remain linear.
Forecasting outside the relevant range: Even small extrapolations should be checked against operational capacity.
Changing spread at higher activity: If costs become less predictable at high volumes, forecasts near the top end are less reliable.

Improvements:

investigate unusual months and decide whether adjustment or exclusion is justified
test alternative drivers (and justify the final choice with business logic)
consider a second driver if operations suggest mixed drivers (e.g., orders and number of delivery routes)
review residuals for patterns and split the model into capacity bands if a step change is suspected

Common pitfalls and misunderstandings

Confusing the highest cost with the highest activity: high–low is based on the highest and lowest activity observations.
Ignoring the scattergraph: a straight-line model should be supported visually before relying on calculations.
Treating a high R² as proof of causation: it indicates fit, not cause-and-effect.
Using the model outside the relevant range: cost behaviour can change once capacity limits are reached.
Rounding too early: premature rounding can distort the implied fixed cost.
Over-reliance on high–low: it is a quick estimate, not a robust model.

Summary

Cost estimation is used to forecast spending, support pricing decisions, and interpret performance. Mixed costs can often be modelled as a fixed element plus a variable rate linked to an activity driver. Scattergraphs help diagnose whether a linear model is reasonable and highlight outliers or step changes before calculations are applied. The high–low method provides a fast estimate but can be distorted because it relies on only two observations. Regression typically provides a stronger estimate because it uses all observations and provides fit statistics such as r and R², but results should still be interpreted in context and within the relevant range.

FAQ

How does the high–low method estimate costs?

It uses the highest and lowest activity observations to estimate the variable cost per unit from the change in cost divided by the change in activity. Fixed cost is then found by substituting the variable rate back into the cost equation.

What should a scattergraph conclusion cover?

A strong answer comments on (1) linearity, (2) outliers, (3) step changes, and (4) whether the spread increases at higher activity, which can make forecasts less reliable near the top end.

How do r and R² differ?

r indicates direction and strength of a linear relationship. R² indicates how much variation in cost is explained by the model. In a single-variable linear model with an intercept, R² = r².

What practical information might be provided for regression?

Questions may provide the regression equation directly, provide regression output (slope/intercept and fit measures), or provide enough data to calculate the best-fit line using least-squares formulas.

Why does the relevant range matter?

Because the cost relationship is based on how the business operated when the data was generated. If the forecast activity requires different capacity, staffing, or methods, the relationship may change and the model may mislead.

Glossary

Cost function
A model that links total cost to activity by splitting cost into a standing amount plus a rate that changes with the chosen driver.

Fixed cost
A cost that, over the short run, does not move in total as activity changes—so the cost per unit falls when output rises (within the relevant range).

Variable cost
A cost that increases as activity increases, typically at a broadly steady amount per unit of the chosen driver (within the relevant range).

Mixed cost
A cost with two parts: a base amount paid for access/capacity, plus an extra amount that depends on usage.

Relevant range
The activity band where operations are broadly comparable to the period that generated the data; outside it, the relationship may change because capacity, prices, or methods change.

High–low method
A quick estimation approach that uses only the highest-activity and lowest-activity observations to estimate a variable rate and then back out the fixed element.

Scattergraph
A plot used to judge whether a straight-line relationship looks sensible and to spot unusual periods before doing calculations.

Outlier
A data point that does not fit the general pattern—often caused by error, one-off events, or a change in operating conditions.

Correlation (r)
A score showing whether cost and activity tend to move together in a straight-line way, and how strongly they do so.

Coefficient of determination (R²)
A measure of how much of the movement in cost the model accounts for; useful for judging fit, but not proof that the driver causes the cost.

Regression line
The straight line that best fits the data overall by minimising total squared prediction errors.

Residual
The gap between actual cost and the model’s estimate for the same activity; residual patterns can indicate non-linearity, missing drivers, or unusual periods.

