Sampling and Measurement for Business Data
Learning objectives
By the end of this chapter, you should be able to:
- Explain why organisations use sampling instead of measuring every item, focusing on speed, cost and decision usefulness.
- Select a suitable sampling method for a given business question, considering population structure and the level of precision needed.
- Calculate and interpret key sampling measures for proportions, including standard error and confidence intervals.
- Identify common sources of bias and measurement error, and explain how they weaken conclusions.
- Present sample-based findings responsibly, making clear what the results do—and do not—support.
Overview & key concepts
Businesses often need answers faster than “measure everything” would allow. Checking every invoice, surveying every customer, or testing every product unit can be too slow and expensive. Sampling solves this by using a carefully chosen subset to estimate what is happening across the whole population.
Sampling is widely used in finance and operations, for example:
- checking compliance with internal controls (invoices, approvals, expense claims)
- estimating error rates in transaction processing
- quality checks for goods received
- customer service monitoring and complaint categorisation
Sampling does not replace good records. It works best when the underlying data is complete and the measurement rules are consistent.
Population, sampling frame and sample
Population: the full set of items you want to understand (for example, all supplier invoices posted in the last quarter).
Sampling frame: the practical list (or system extract) from which you can select items (for example, an invoice listing from the accounts payable system).
Sample: the subset you actually test (for example, 150 invoices selected from the listing).
To keep these roles clear, use this mental model:
Population (what you care about) → Frame (what you can list) → Sample (what you test)
A large sample is not automatically representative—how you select it matters as much as how many you select.
Sampling frame
A good frame:
- includes every population item exactly once (as far as practicable)
- is up to date
- contains identifiers needed for selection and follow-up (invoice number, supplier, date, value band, department)
If the frame misses items (for example, invoices processed outside the main system), results may be biased even if selection within the frame is random.
Sampling methods used in business
Random sampling
In simple random sampling, each item in the frame has an equal chance of selection. More generally, “random” means each item has a known chance of selection determined by a defined procedure.
Random sampling is a strong default when the population is fairly uniform and you want an overall estimate.
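As a minimal sketch, simple random selection from an invoice listing can be done with Python's standard library (the invoice numbers below are hypothetical):

```python
import random

# Hypothetical sampling frame: 1,000 invoice numbers from an AP listing.
frame = [f"INV-{i:04d}" for i in range(1, 1001)]

random.seed(42)  # fixed seed so the selection is reproducible and auditable
sample = random.sample(frame, k=25)  # equal chance for each item; no duplicates
```

Recording the seed (or the selection output itself) preserves evidence of how the sample was drawn.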
Stratified sampling
The population is split into meaningful subgroups (strata) and sampling is carried out within each subgroup. Common strata include:
- value bands (low/medium/high)
- region
- department
- product line
Stratification is useful when some subgroups are higher risk or behave differently. It improves insight because you can estimate results by stratum and for the population overall.
Systematic sampling
Select every kth item after a random start (for example, every 50th invoice). This can be efficient, but it is risky if the list order has a pattern linked to the outcome (e.g. invoices batched by supplier, department, or time period such as end-of-month processing spikes).
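A minimal sketch of systematic selection, assuming a hypothetical ordered invoice listing:

```python
import random

# Hypothetical ordered listing of 1,000 invoices.
frame = [f"INV-{i:04d}" for i in range(1, 1001)]
k = 50  # sampling interval: every 50th invoice

random.seed(7)
start = random.randrange(k)  # random start within the first interval
sample = frame[start::k]     # then every kth item to the end of the list
```

Before relying on a selection like this, check that the list order (posting date, supplier, batch) is not linked to the outcome being measured.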
Sampling error and bias
Sampling error
Sampling error is the gap between the sample estimate and the true population value that arises purely because you did not test everything. Even a well-designed sample will have sampling error.
Sampling error generally reduces when sample size increases (though with diminishing returns).
Bias
Bias is a consistent distortion that pushes results in one direction. Bias does not disappear just because the sample is large.
Common bias sources include:
- coverage bias (the frame excludes some items)
- selection bias (the selection method favours certain items)
- non-response bias (relevant when people can refuse to respond, e.g. surveys)
- survivorship bias (items that “drop out” are not captured, such as cancelled transactions omitted from the frame)
Measurement error
Measurement error arises when the compliance test (or other measurement) is applied inconsistently or is poorly defined. Typical causes include:
- unclear definitions (what counts as “approved”?)
- inconsistent evidence standards (what evidence is acceptable?)
- differences between reviewers
- system timestamps not reflecting the true sequence of actions
A useful control lens is to distinguish:
- control design: is the control actually built into the process (e.g. required approval step exists)?
- control operation: is the control applied consistently in practice (e.g. approvals happen on time, by the right person, with proper evidence)?
Measurement rules should be designed so that “not compliant” reflects process weakness, not reviewer inconsistency.
Confidence intervals
A sample gives a best estimate (for example, a compliance rate), but any estimate from a subset will vary from sample to sample. A confidence interval puts numbers around that uncertainty by giving a range produced by a defined method.
“95% confidence” describes the performance of the procedure over repeated use. If the same sampling plan were repeated many times and an interval were calculated each time, most of those intervals (around 95%) would include the true population proportion. For the single interval in your report, the right interpretation is: this range is what our method produces from this sample, given our assumptions and data quality.
Higher confidence levels widen intervals; larger samples narrow them.
Core theory and frameworks
Choosing a sampling approach
Choose the method that best matches the business question:
- Want one overall estimate quickly and the population is fairly uniform → simple random sampling.
- Want reliable coverage of high-risk groups or expect different behaviour across subgroups → stratified sampling.
- Need an efficient method from an ordered list and can check that ordering is harmless → systematic sampling.
When risk is not evenly distributed (for example, high-value invoices), stratified sampling is often more informative because it prevents the sample being dominated by low-risk items.
Sample size and precision
All else equal, larger samples produce narrower confidence intervals (more precision). However, the improvement is not linear: doubling sample size does not halve uncertainty.
For proportion estimates, uncertainty depends on:
- sample size (n)
- how close the proportion is to 50% (uncertainty is usually greatest near 50%)
Practical decisions about sample size should balance cost, time and the impact of getting the conclusion wrong.
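The diminishing-returns point can be checked directly. A short sketch, using a proportion of 0.5 as the worst case for uncertainty:

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of the approximate 95% confidence interval for a proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

m_100 = margin_of_error(0.5, 100)  # ~0.098
m_200 = margin_of_error(0.5, 200)  # ~0.069: doubling n shrinks the margin by ~1/sqrt(2), not by half
```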
Measurement consistency framework (Design–Calibrate–Assure)
Design: Write a short, evidence-based rule for how an item is classified (e.g. what counts as approved, what evidence is acceptable, and what to do when evidence is missing).
Calibrate: Test the rule on a small pilot set, compare how different reviewers apply it, and tighten wording where disagreements appear.
Assure: During the main review, use the same rule for every item, keep an audit trail of judgement calls, and perform an independent re-check on a subset to quantify reviewer agreement.
The goal is that differences in results reflect the process being tested—not differences in who did the testing.
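The Assure step's independent re-check can be quantified as a simple agreement rate. A sketch with hypothetical reviewer classifications ("C" for compliant, "N" for not compliant):

```python
# Hypothetical re-check: two reviewers independently classify the same 10 items.
reviewer_a = ["C", "C", "N", "C", "C", "N", "C", "C", "C", "N"]
reviewer_b = ["C", "C", "N", "C", "N", "N", "C", "C", "C", "N"]

agreed = sum(a == b for a, b in zip(reviewer_a, reviewer_b))
agreement_rate = agreed / len(reviewer_a)  # 9 of 10 items agree -> 0.9
```

Low agreement is a signal that the measurement rule needs tightening before results are reported.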
Bias and error mitigation
Common control actions include:
- reconcile the sampling frame back to system totals (counts and value)
- ensure all departments/sources are included before selection
- use a documented random selection method within each stratum
- document how missing items and exceptions are handled
- separate “not compliant” from “not testable” (missing evidence) to avoid mixing outcomes
Confidence interval calculation for a proportion
Simple random-sample approximation (often used as a baseline)
Standard error of a sample proportion:
SE = sqrt( p̂(1 − p̂) / n )
A commonly used 95% confidence interval:
CI = p̂ ± 1.96 × SE
Rule of thumb for the normal approximation: it is usually acceptable when both of these are at least about 5:
- n × p̂
- n × (1 − p̂)
For very small samples or very high/low proportions, alternative interval methods exist, but they are not required for most business applications at this level.
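The formulas above can be wrapped in a small helper. This is a sketch of the simple random-sample approximation, including the rule-of-thumb check:

```python
import math

def proportion_ci(x, n, z=1.96):
    """Approximate confidence interval for a proportion (normal approximation)."""
    p_hat = x / n
    # Rule of thumb: both n * p_hat and n * (1 - p_hat) should be at least about 5.
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("normal approximation may be unreliable here")
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

lower, upper = proportion_ci(x=40, n=50)  # 80% observed compliance in a sample of 50
```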
Finite population correction (FPC)
If the sample is a noticeable fraction of the population, uncertainty reduces because you have observed a large share of all items. A practical rule of thumb is that FPC becomes more relevant when the sample is more than about 5–10% of the population.
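As a sketch, the FPC multiplies the standard error by sqrt((N − n)/(N − 1)), where N is the population size; the figures below are illustrative assumptions:

```python
import math

def se_with_fpc(p_hat, n, N):
    """Standard error of a proportion with the finite population correction."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    fpc = math.sqrt((N - n) / (N - 1))
    return se * fpc

# 150 invoices sampled from a population of 600: a 25% sampling fraction.
se_adjusted = se_with_fpc(p_hat=0.80, n=150, N=600)  # smaller than the uncorrected SE
```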
Reporting and communication
A clear sample-based report should state:
- the question being answered
- the population and time period
- the sampling frame and how completeness was checked
- the sampling method and sample size
- the definition of the measure (what “compliant” means)
- the estimate and the confidence interval (or other uncertainty statement)
- limitations and potential bias risks
- practical next steps (e.g. targeted follow-up in high-risk strata)
Avoid presenting a confidence interval as a guarantee. It is a structured way of showing uncertainty, not a promise of correctness.
Practical applications in business
Sampling supports better decisions in areas such as:
- internal control monitoring (approvals, segregation of duties)
- supplier and expense policy compliance
- service quality monitoring (call handling standards, complaint resolution)
- operational performance checks (cycle times, error rates)
- inventory and goods-received checks
The same principles apply: define the measure clearly, build a complete frame, select a defensible sample, and report uncertainty and limitations.
Worked example
Narrative scenario
A retail company, ABC Retailers, wants to assess the compliance of its supplier invoices with internal approval policies. The company processes thousands of invoices each quarter, making it impractical to review each one individually. Instead, ABC Retailers decides to use sampling to estimate the compliance rate.
The population consists of all supplier invoices posted in the last quarter. The company is concerned that some invoices may bypass approval steps, leading to control failures. To address this, ABC Retailers chooses stratified sampling by invoice value bands (low, medium, high) to ensure that high-value items, which carry more risk, are adequately represented.
Sampling plan: A total sample of 150 invoices is selected proportionally across the three value bands (i.e. each band contributes to the sample roughly in line with its share of the population invoice count). This supports using the unweighted combined sample proportion as an overall estimate.
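Proportional allocation can be sketched as follows; the band counts are hypothetical, since the scenario does not state them:

```python
# Hypothetical population counts per value band (not given in the scenario).
band_counts = {"low": 6000, "medium": 3000, "high": 1000}
N = sum(band_counts.values())  # 10,000 invoices in the frame
n = 150                        # total sample size

# Each band's sample size mirrors its share of the population count.
allocation = {band: round(n * count / N) for band, count in band_counts.items()}
```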
The measurement rule defines compliance as meeting all of the following:
- the approver has the required authority level
- the approval timestamp is earlier than the posting timestamp
- supporting documents are attached in line with policy
ABC Retailers aims to estimate the compliance rate by value band and overall, and to highlight any concentration of non-compliance.
Required
- Calculate the sample proportion of compliant invoices.
- Determine the standard error and 95% confidence interval for the compliance rate.
- Identify potential biases in the sampling method.
- Suggest improvements to the sampling and measurement process.
- Report the findings with appropriate limitations.
Solution
1) Sample proportion
Across the combined sample of 150 invoices, 120 are found compliant.
p̂ = x / n
p̂ = 120 / 150 = 0.80
The estimated compliance rate from the sample is 80%.
2) Standard error and 95% confidence interval
Note on method (important): The confidence interval calculation below uses the simple random-sample approximation. It is reasonable here because the strata were sampled in proportion to their population counts, so the unweighted combined proportion is a reasonable estimate of the population proportion; the standard error formula remains an approximation. If high-value items were deliberately oversampled, the overall rate should be computed as a weighted proportion by stratum size, and uncertainty should be calculated using a stratified variance approach.
Standard error (simple approximation):
SE = sqrt( p̂(1 − p̂) / n )
SE = sqrt( 0.80 × 0.20 / 150 ) = sqrt( 0.16 / 150 ) = sqrt(0.0010667) ≈ 0.0327
95% confidence interval:
CI = p̂ ± 1.96 × SE
CI = 0.80 ± 1.96 × 0.0327 ≈ 0.80 ± 0.0641
So:
- lower bound: 0.80 − 0.0641 = 0.7359
- upper bound: 0.80 + 0.0641 = 0.8641
The 95% confidence interval is approximately 73.59% to 86.41%.
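If the bands had been oversampled rather than proportionally allocated, the overall rate would instead be a weighted combination of the band rates. A sketch with hypothetical band weights and compliance rates (chosen so the weighted result also comes to 80%):

```python
# Hypothetical population shares and per-band sample compliance rates.
strata = {
    "low":    {"weight": 0.60, "p_hat": 0.85},
    "medium": {"weight": 0.30, "p_hat": 0.75},
    "high":   {"weight": 0.10, "p_hat": 0.65},
}

# Overall estimate = sum of (population share x band compliance rate).
p_weighted = sum(s["weight"] * s["p_hat"] for s in strata.values())
```

A full stratified variance calculation would combine per-band standard errors using these same weights.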
3) Potential biases
Even with stratified sampling, bias can arise if:
- coverage is incomplete: invoices from certain sources (manual entries, certain departments, separate systems) are missing from the frame
- strata are misclassified: invoices are placed in the wrong value band due to data errors
- selection within strata is not properly random: for example, choosing invoices with clearer documentation
- measurement is inconsistent: reviewers interpret “supporting documents” differently, or apply different evidence standards
4) Improvements to sampling and measurement
Practical improvements include:
- Reconcile the frame to system totals (invoice count and total value) to confirm completeness before selection.
- Use a documented random selection method within each value band, retaining evidence of how selection was performed.
- For better diagnosis, calculate and report compliance rates per value band as well as overall.
- Tighten measurement reliability using the Design–Calibrate–Assure approach:
  - clarify evidence requirements and how to treat missing evidence
  - pilot the test and align reviewer judgements
  - perform an independent re-check on a subset to quantify agreement
5) Reporting the findings with limitations
A suitable report statement would be:
- Estimated compliance rate (combined sample): 80%.
- 95% confidence interval (approximation under proportional allocation): approximately 73.59% to 86.41%.
- The organisation should review results by value band to see whether non-compliance is concentrated in higher-risk invoices.
- Limitations:
  - the result is sample-based and subject to sampling error
  - any missing invoices from the sampling frame could bias results
  - inconsistent interpretation of evidence could distort the measured compliance rate
Treatment of “not testable” items (if they occur): If some invoices cannot be tested (for example, missing documentation), report “not testable” as a separate outcome and state the rule used for the compliance rate. Common approaches are (i) exclude “not testable” from the denominator, or (ii) treat “not testable” as non-compliant. Whichever approach is used, it should be applied consistently and explained.
Interpretation of the results
An 80% sample compliance rate indicates that most tested invoices met the policy definition. However, the confidence interval shows that the true population rate could plausibly be materially lower (around 73.59%) or higher (around 86.41%).
Because the sampling is stratified by value band, the most decision-useful follow-up is usually:
- compare compliance rates across bands
- investigate drivers of non-compliance in any weaker band
- check whether weaknesses reflect control design gaps (control missing) or operating failures (control inconsistently applied)
Common pitfalls and misunderstandings
- Mixing up population, frame and sample: the frame is the practical list you sample from, not the population itself.
- Assuming “random” always means “equal chance”: equal chance applies to simple random sampling; other designs may use unequal but known probabilities.
- Ignoring measurement ambiguity: if evidence rules are unclear, results can reflect reviewer differences rather than process performance.
- Overstating confidence intervals: an interval is produced by a method and assumptions; it is not a guarantee for one specific sample.
- Using systematic sampling without checking ordering risk: list patterns (including time-based batching) can create unrepresentative selection.
- Reporting only the overall rate: stratification is most valuable when you also report differences by subgroup.
- Failing to separate “not compliant” from “not testable”: missing evidence may need a separate category and a stated denominator rule.
Summary
Sampling enables organisations to reach timely, cost-effective conclusions when full measurement is too slow or costly. Strong sampling work depends on:
- a complete sampling frame
- an appropriate selection method (often stratified where risk is uneven)
- clear and consistently applied measurement rules
- transparent reporting of uncertainty and limitations
Confidence intervals help communicate the reliability of sample-based estimates, particularly when decisions depend on whether a rate is comfortably above or below an acceptable threshold.
FAQ
Why is sampling preferred over measuring the whole population?
Because it is often faster and cheaper while still producing useful evidence for decisions. Sampling allows an organisation to estimate population performance with an explicit statement of uncertainty, rather than delaying action until every item has been checked.
How do you choose the right sampling method?
Match the method to the risk and structure of the population. Use simple random sampling for broad estimates where items are similar. Use stratified sampling when different subgroups matter or when high-risk items must be properly represented. Use systematic sampling only when the list order is unlikely to be linked to the outcome.
What are common sources of bias, and how can they be reduced?
Typical sources include incomplete sampling frames, non-random selection, and inconsistent measurement rules. Reduce bias by reconciling the frame to system totals, using defensible selection methods within strata, and tightening measurement definitions through pilots, training and review.
How do confidence intervals help interpretation?
They show the plausible range of the population value given a sample, rather than presenting a single point estimate as if it were exact. This supports better judgement, especially where decisions depend on thresholds and risk tolerance.
What are the implications of measurement error?
Measurement error can misclassify items and produce the wrong conclusion (either overstating or understating the true rate). Clear definitions, evidence standards, training and consistency checks reduce this risk.
Why must limitations be reported?
Because sample-based findings are uncertain and can be affected by bias and measurement weaknesses. A clear statement of limitations helps decision-makers use the results appropriately and prevents overconfidence in a single estimate.
Summary (Recap)
This chapter explained why sampling is essential for practical business decisions when full measurement is too slow or costly. It set out key sampling approaches—random, stratified and systematic—and showed how sampling error, bias and measurement error affect reliability. It also demonstrated how to estimate a compliance rate and build an approximate 95% confidence interval under proportional stratified allocation, while explaining when a weighted stratified approach would be required. The central message is that sampling is only as good as the frame, the selection method and the consistency of measurement.
Glossary
Population
The full set of items you want to draw conclusions about (e.g. all invoices posted in the last quarter).
Sampling frame
The practical list or system extract from which the sample is selected.
Sample
A subset selected for testing and used to estimate population performance.
Random sampling
Selection using a defined random procedure where each item has a known chance of selection. In simple random sampling, each item in the frame has an equal chance.
Stratified sampling
Selection performed separately within defined subgroups (such as value bands) to ensure representation and improve insight.
Systematic sampling
Selection of every kth item after a random start; efficient but sensitive to patterns in list ordering.
Sampling error
The difference between a sample estimate and the true population value arising because only part of the population is tested.
Bias
A consistent distortion that pushes results away from the true population value (often caused by frame or selection weaknesses).
Measurement error
Misclassification or inaccuracy caused by unclear definitions, inconsistent evidence standards, reviewer differences or flawed data capture.
Confidence interval (CI)
A calculated range around an estimate that shows plausible values for the population parameter based on the sample.
Confidence level
The long-run rate at which intervals produced by the same method would contain the true population value (commonly 95%).
Precision (margin of error)
The half-width of the confidence interval; smaller margins indicate a tighter estimate.
Finite population correction (FPC)
An adjustment that can reduce estimated uncertainty when the sample is a noticeable fraction of the population (often relevant above about 5–10%).
Outlier
An observation that is unusually high or low compared with the rest and may reflect a genuine extreme or an error in data capture.
Test your knowledge
Practice questions specifically for this topic.
Written by
AccountingBody Editorial Team