Sampling and Data Collection for Business Questions
This chapter explores the essential concepts and techniques of sampling and data collection for business questions. It covers the selection of appropriate sampling methods, the calculation and interpretation of key sampling metrics, and practical controls to reduce bias and measurement error.
Learning objectives
By the end of this chapter you should be able to:
- Explain why sampling is used in business decisions, including estimating population values and supporting comparison questions (for example, comparing periods, products, or locations).
- Distinguish between probability sampling methods (simple random, systematic, stratified, cluster) and identify suitable applications.
- Calculate and interpret key sampling metrics using plain-text equations (sampling interval, response rate, point estimate, standard error, margin of error, confidence interval).
- Interpret sample results with appropriate caution, recognising sampling error, selection bias, non-response bias and measurement error, and propose practical controls to reduce these risks.
Overview & key concepts
Businesses often need reliable answers quickly, but testing every item in a population can be impractical. Sampling is a disciplined way to collect evidence from a subset of items (a sample) and use it to draw conclusions about the whole group (the population). When designed and executed well, sampling supports operational and financial decisions—for example:
- estimating defect rates (quality costs, rework, warranty claims),
- testing invoice accuracy (control effectiveness and compliance),
- assessing customer satisfaction (service improvements and marketing decisions),
- checking inventory condition (obsolescence and shrinkage indicators).
Sampling itself is not recorded through journal entries. However, sample-based findings can feed into accounting estimates and management action (for example, warranty provision reviews, inventory write-down investigations, or credit loss assessment inputs). The key is to use a representative sampling frame, an appropriate selection method, and consistent measurement rules.
Population and sampling frame
Population
The full set of items relevant to the question (for example, all units produced this week, all invoices raised in a quarter, or all customers who purchased a product in December).
Sampling frame
The source list from which you actually select the sample (for example, the production batch log, the invoice register, or the CRM customer list).
A good sampling frame should match the population as closely as possible. If the frame is incomplete or outdated, results may be biased even if selection within the frame is random.
Example: If you want to test invoice accuracy for all invoices raised in January, but the frame excludes invoices raised offline and uploaded later, your sample may under-represent higher-risk invoices and overstate accuracy.
Sample, parameter, and statistic
- A sample is the subset selected for observation or measurement.
- A parameter is the true value for the population (unknown), such as the actual defect rate across all units produced.
- A statistic is the value computed from the sample, such as the sample defect rate.
For a proportion-type question (defective vs not defective, accurate vs inaccurate), use:
- Population proportion (parameter): p (unknown)
- Sample proportion (statistic): p̂ = x/n
- where x is the count of items with the attribute of interest (for example, defects) and n is the sample size.
Sampling is about inference: using a sample statistic as a practical estimate of the population parameter, while acknowledging uncertainty.
Sampling error and bias
Sampling error
The difference between a sample statistic and the true population value that arises because only part of the population is observed. Sampling error exists even when the sample is properly selected.
Sampling error generally reduces as sample size increases, but improvements diminish: precision improves with √n, so the benefit from each additional observation reduces as n becomes larger.
Bias
A systematic distortion that pushes results in a particular direction. Bias is caused by how items are selected or measured, not by chance.
Common business sources include:
- sampling only one shift or one machine,
- surveying only online customers when the population includes in-store customers,
- a sampling frame that excludes certain invoice types,
- inconsistent definitions of “defect” or “error”.
Bias is often more harmful than sampling error because it can produce consistently misleading results even with a large sample.
Stratification and clustering
Stratified sampling
The population is divided into meaningful groups (strata) and a sample is taken within each stratum.
Benefits:
- ensures coverage across key segments,
- often improves precision when strata behave differently (for example, defect rates vary by product line or shift).
Cluster sampling
The population is divided into natural groups (clusters) such as stores, branches, or delivery routes. A selection of clusters is sampled, and data is collected within selected clusters.
Benefits:
- practical and cost-effective where travel or access costs are high.
Risks:
- results can be less precise than a simple random sample of the same size if observations within a cluster are similar,
- intervals based on the simple SE formula should be treated as too narrow unless many clusters are sampled.
Measurement error and confidence
Measurement error
Error arising because the data collected does not reflect the true attribute being measured. Causes include ambiguous definitions, inconsistent recording, poor question design, faulty instruments, or human error.
Measurement error can create bias, especially if it affects some groups more than others (for example, one inspector classifies borderline faults differently).
Confidence and uncertainty
Point estimates should be paired with an indication of uncertainty. For a proportion, a commonly used approximation is:
- Sample proportion: p̂ = x/n
- Standard error (SE) ≈ sqrt( p̂ * (1 − p̂) / n )
- Margin of error (ME) ≈ z * SE
- where z depends on the confidence level (approximately 2 for a 95% style interval)
- Approximate confidence interval ≈ p̂ ± ME
A confidence interval is a communication tool: it puts a plausible band around your estimate based on how sampling typically varies at this sample size. If you repeated the same approach many times, the bands you calculate would usually capture the true rate. In business terms, it helps you judge whether the estimate is precise enough to act on—rather than treating a single percentage as exact.
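The steps above can be sketched in a few lines of Python. This is a minimal illustration of the plain-text formulas in this section; the helper name `proportion_interval` is our own, not a standard library function.

```python
import math

def proportion_interval(x, n, z=2.0):
    """Point estimate, standard error, and approximate interval for a proportion.

    Uses the normal approximation from the text:
    SE = sqrt(p_hat * (1 - p_hat) / n), ME = z * SE (z of about 2 for a 95%-style interval).
    """
    p_hat = x / n                             # sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error
    me = z * se                               # margin of error
    return p_hat, se, (p_hat - me, p_hat + me)

# Example: 18 defective units in a sample of 200
p_hat, se, (lo, hi) = proportion_interval(18, 200)
print(f"estimate {p_hat:.1%}, interval {lo:.1%} to {hi:.1%}")
```

The interval returned is the "plausible band" described above; remember it reflects sampling variation only, not bias or measurement error.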
Assumptions behind the interval (important)
The simple normal-approximation interval for a proportion is most suitable when:
- The sample is large enough for the approximation to behave well.
- Rule-of-thumb: n·p̂ and n·(1−p̂) should not be small (often both ≥ around 5 to 10).
- Selection is close to random and observations are effectively independent.
- If the method introduces dependence (for example, heavy clustering), the simple SE can understate uncertainty.
- The sample is not a tiny or overwhelming fraction of the population.
- When n is a noticeable fraction of N (often >5–10%), a finite population correction (FPC) can be considered; it becomes more relevant as the fraction increases (for example, around 20%+).
- FPC factor ≈ sqrt( (N − n) / (N − 1) )
- This reduces the SE because sampling without replacement from a finite population adds less uncertainty than sampling from an effectively infinite one. Mentioning FPC is a sensible refinement; using it is optional unless specifically required.
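As a sketch of the FPC formula just given (the function name `fpc_factor` is our own label):

```python
import math

def fpc_factor(N, n):
    """Finite population correction factor: sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

# Sampling 200 units from a population of 1,000 (n/N = 20%)
factor = fpc_factor(1000, 200)
adjusted_se = 0.0202 * factor   # a simple SE of ~0.0202, tightened by the FPC
```

Because the factor is always below 1, applying it narrows the interval slightly; omitting it is conservative.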
Core theory and frameworks
Simple random sampling
Every item in the sampling frame has an equal chance of selection. This method is appropriate when:
- a complete list exists,
- the population is reasonably homogeneous for the purpose of the question.
Selection can be evidenced by retaining the random-number output (or an audit trail of the selection process) showing that item choice was not influenced by judgement.
Systematic sampling
Systematic sampling selects items using a fixed interval after a random start:
- k = N / n
- where N is the frame size and n is the intended sample size.
In practice the achieved sample size will be approximately N/k; it may differ slightly depending on rounding, list ordering and where the selection sequence ends.
Evidence/control points (what you should be able to show in working papers):
- how k was determined and whether it was rounded or n adjusted,
- how the random start was generated (between 1 and k),
- that items were selected consistently (start, start + k, start + 2k, …) until the end of the list.
If k is not an integer, decide whether to round k or adjust n, and document the choice. Rounding affects the achieved sample size and can create small deviations from the intended sampling rate.
Systematic sampling is efficient, but can be distorted by periodic patterns in the list. If the ordering is linked to risk (for example, grouped by branch, salesperson, or time blocks), consider stratification or randomising the list order.
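The selection rule above (random start between 1 and k, then every kth item) can be sketched as follows; `systematic_sample` is an illustrative helper, and in practice the seed or random output should be retained as audit evidence.

```python
import random

def systematic_sample(N, k, seed=None):
    """Return 1-based positions start, start + k, start + 2k, ... within a frame of size N.

    The random start lies between 1 and k; the achieved sample size is
    approximately N / k, as discussed in the text.
    """
    rng = random.Random(seed)
    start = rng.randint(1, k)   # document this start in working papers
    return list(range(start, N + 1, k))

# Every 30th invoice from a frame of 4,800
positions = systematic_sample(4800, 30, seed=1)
```

Note the selection is fully determined once the start is drawn, which is what makes the method quick but also vulnerable to periodic patterns in the list.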
Stratified sampling
Stratified sampling divides the population into strata and samples within each. Allocation can be:
- proportional to stratum size, or
- weighted toward higher-risk or more variable strata.
This is often the best approach where performance differs across segments and you need coverage across each segment for decision-making.
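Proportional allocation can be sketched as below. The helper name and the tie-breaking convention (leftover units to the largest strata) are our own choices, not the only reasonable ones.

```python
def proportional_allocation(stratum_sizes, total_n):
    """Split a total sample size across strata in proportion to stratum size."""
    N = sum(stratum_sizes.values())
    # Integer allocation, rounding down per stratum
    alloc = {s: (size * total_n) // N for s, size in stratum_sizes.items()}
    # Assign any leftover units to the largest strata (a simple convention)
    leftover = total_n - sum(alloc.values())
    for s in sorted(stratum_sizes, key=stratum_sizes.get, reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc

# Example: three shifts of different sizes, total sample of 200
alloc = proportional_allocation({"day": 500, "evening": 300, "night": 200}, 200)
```

For risk-weighted allocation, the same structure applies but the weights would reflect expected variability or risk rather than stratum size alone.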
Cluster sampling
Cluster sampling selects groups (clusters) and measures within them. It is used when access costs are high and individual random sampling is impractical.
Practical caution on uncertainty: results from clustered designs typically have wider uncertainty than a simple random sample of the same size. Unless many clusters are sampled, treat simple SE-based intervals as optimistic (too narrow).
Non-probability sampling
Convenience sampling, judgement sampling and voluntary response sampling can be useful for quick insights and early-stage exploration. However, selection probabilities are unknown, so results should not be treated as precise estimates of the population.
Avoid projecting results to the whole population without strong justification.
Reducing errors in data collection
To improve reliability:
- define the measure (for example, what counts as a defect; what counts as an invoice error),
- standardise procedures (timing, unit of analysis, classification rules),
- design questions carefully (avoid leading wording; keep questions unambiguous),
- train and calibrate collectors (especially where judgement is involved),
- use validation checks (range checks, reconciliations, spot reviews),
- record non-response and missing data (do not silently drop observations).
Comparing two groups using samples
Business questions often involve comparison, not just single estimates. Examples include:
- “Is this week’s defect rate higher than last week’s?”
- “Do two branches differ in invoice accuracy?”
- “Did a process change reduce average rework hours?”
For fair comparisons, keep sampling frames and measurement rules consistent and consider stratifying to control for mix changes (for example, product mix or customer mix changing between groups).
Comparing two proportions (high level)
If each item is classed into two outcomes (defective/not defective, accurate/inaccurate), calculate a sample proportion for each group:
- Group A: p̂A = xA / nA
- Group B: p̂B = xB / nB
- Difference in sample proportions: D = p̂A − p̂B
Practical interpretation:
- If D is small, the difference may be normal sampling variation.
- If D is large and consistent with operational knowledge (for example, a known machine issue or process change), it may indicate a genuine performance difference requiring action.
A quick sense-check (not a formal test) is to compute an uncertainty range for each group and see whether the ranges overlap materially. Use this only as screening: overlapping ranges do not prove “no difference”, and non-overlap does not by itself prove a “real” difference if the method or measurement rules differ.
In business terms, a difference is worth escalating when:
- it is unlikely to be explained purely by sampling variation at the observed sample sizes, and
- it is large enough to matter operationally or financially (decision relevance), not merely detectable.
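The overlap sense-check described above can be sketched as follows, under the same normal-approximation assumptions as earlier; the helper names are our own, and this remains screening, not a formal test.

```python
import math

def interval(x, n, z=2.0):
    """Approximate uncertainty range for one group's proportion."""
    p = x / n
    me = z * math.sqrt(p * (1 - p) / n)
    return p - me, p + me

def intervals_overlap(a, b):
    """Screening only: overlap does not prove 'no difference'."""
    return a[0] <= b[1] and b[0] <= a[1]

# Example: branch A with 18/200 errors vs branch B with 30/200
a = interval(18, 200)
b = interval(30, 200)
overlap = intervals_overlap(a, b)
```

In this illustrative example the ranges overlap, so the observed difference (9% vs 15%) would merit monitoring and operational follow-up rather than immediate escalation on statistical grounds alone.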
Comparing two means (high level)
When measuring a numeric variable (for example, time to process an invoice, rework hours per unit, customer waiting time):
- calculate the sample mean for each group, and
- focus on both the size of the difference and the consistency of measurement.
You can communicate uncertainty by:
- stating sample sizes and acknowledging variability,
- avoiding over-precision where data is noisy,
- using a practical threshold: “Is the difference large enough to change decisions?”
Common interpretation pitfalls in comparisons
- assuming any difference implies a real change (sampling variation can produce differences),
- ignoring changes in mix (product mix, customer mix, shift mix),
- comparing groups collected using different measurement rules (creates measurement bias),
- failing to check whether frame coverage differs between groups.
Worked example
Narrative scenario
A manufacturing company wants to estimate the defect rate of its weekly output. It produces 1,000 units per week and inspects a sample of 200 units. In the sample, 18 units are defective.
The company also runs a customer satisfaction survey sent to 500 customers and receives 140 usable responses.
Finally, the company plans to test invoice accuracy by selecting every 30th invoice from a list of 4,800 invoices.
Required
- Calculate the sample defect rate and an approximate 95% margin of error.
- Determine the response rate for the customer survey.
- Confirm the systematic sampling interval and the implied sample size for invoice testing.
- Interpret the results and suggest improvements.
Solution
1) Defect rate and margin of error
Sample defect rate (point estimate)
p̂ = x / n = 18 / 200 = 0.09 = 9%
Standard error (approximation for a proportion)
SE ≈ sqrt( p̂ * (1 − p̂) / n )
= sqrt( 0.09 * 0.91 / 200 )
0.09 * 0.91 = 0.0819
0.0819 / 200 = 0.0004095
SE ≈ sqrt(0.0004095) ≈ 0.0202
Approximate 95% margin of error (z ≈ 2)
ME ≈ 2 * 0.0202 ≈ 0.0404 = 4.04%
Approximate confidence interval
0.09 ± 0.0404 → 0.0496 to 0.1304
≈ 5.0% to 13.0%
Optional refinement: finite population correction (FPC)
Here n/N = 200/1,000 = 20%, so an FPC could be considered as a refinement:
FPC ≈ sqrt( (N − n) / (N − 1) )
= sqrt( (1,000 − 200) / (1,000 − 1) )
= sqrt( 800 / 999 ) ≈ sqrt(0.8008) ≈ 0.895
Adjusted SE ≈ 0.0202 * 0.895 ≈ 0.0181
Adjusted ME ≈ 2 * 0.0181 ≈ 0.0362 (3.62%)
Adjusted interval ≈ 9.0% ± 3.6% → about 5.4% to 12.6%
Interpretation
The best estimate of the defect rate is 9%. A reasonable 95%-style interval is roughly 5% to 13% (or slightly narrower with an FPC). The interval is wide enough to justify investigating process drivers (shift, machine, product type) and considering stratified sampling if defect risk is not uniform.
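The defect-rate arithmetic above can be reproduced with a short script using the same formulas as the text:

```python
import math

N, n, x = 1000, 200, 18
p_hat = x / n                             # 0.09
se = math.sqrt(p_hat * (1 - p_hat) / n)   # ~0.0202
me = 2 * se                               # ~0.0404
fpc = math.sqrt((N - n) / (N - 1))        # ~0.895
adj_me = 2 * se * fpc                     # ~0.0362
print(f"{p_hat:.0%} ± {me:.1%} (FPC-adjusted ± {adj_me:.1%})")
```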
2) Customer survey response rate
Response rate = responses / customers contacted
= 140 / 500 = 0.28 = 28%
Interpretation
A 28% response rate creates a material risk of non-response bias. The direction of bias is unknown without follow-up evidence.
3) Systematic sampling interval and implied sample size
Selecting every 30th invoice implies:
- Systematic interval: k = 30
- Approximate sample size: N/k = 4,800 / 30 = 160 invoices
In practice, the achieved sample size will be close to this figure, but it can differ slightly depending on the random start and where the sequence ends.
Implementation reminder
Use a random start between 1 and 30, then select every 30th item consistently.
4) Improvements and practical recommendations
For the defect estimate
- If defects differ by shift, machine or product type, stratify (for example, sample within each shift or each product line).
- Tighten and document the defect definition and train inspectors to reduce measurement error.
- If tighter precision is required, increase sample size, recognising diminishing gains: precision improves with √n.
For the survey
- Use follow-ups, mixed channels, or incentives to reduce non-response.
- Compare respondent vs non-respondent characteristics (where known) to assess representativeness.
- Ensure questions are unambiguous and linked to decisions (what would change if satisfaction is low?).
For invoice accuracy
- Check for periodic patterns in the invoice list (grouping by branch, salesperson, or time). If patterns exist, systematic sampling may distort results.
- Consider stratifying by branch or invoice type if error risk differs across segments.
- Define “error” categories (pricing, tax coding, customer details, authorisation) and record error types to support corrective action.
Common pitfalls and misunderstandings
- Confusing population with sampling frame: frame coverage risk can bias results even with random selection.
- Ignoring non-response bias: low response rates threaten representativeness; direction is unknown without follow-up.
- Overlooking measurement error: unclear definitions and inconsistent recording can create biased data.
- Misapplying systematic sampling: failing to use a random start or ignoring periodic patterns can distort results.
- Overstating precision: point estimates should be paired with uncertainty and limitations.
- Using convenience sampling as proof: when selection probabilities are unknown, results are indicative only.
- Poor documentation: lack of evidence about the method reduces reliability and defensibility.
- Insufficient training: inconsistent classification rules undermine comparability across data collectors or periods.
Summary and further reading
Sampling provides a structured way to answer business questions when full-population testing is impractical. Probability-based methods support stronger inference; non-probability methods are best treated as indicative. Reliable sampling depends on a representative sampling frame, an appropriate selection method, consistent measurement rules, and clear documentation. Results should be interpreted with attention to sampling error, bias risks, and the assumptions behind confidence intervals—especially when comparing groups.
For further development, consult introductory business statistics texts, quality control guidance on sampling design, and practical survey design references focused on reducing non-response and measurement error.
FAQ
What is the difference between a population and a sampling frame?
The population is the full group you want conclusions about. The sampling frame is the list you actually sample from. If the frame excludes items that belong to the population, results may be biased even if the selection method is random.
How does stratified sampling differ from cluster sampling?
Stratified sampling samples within each subgroup to ensure coverage across key segments. Cluster sampling selects a subset of groups and measures within those groups for practicality. Stratified sampling usually improves precision when subgroups differ. Cluster sampling can reduce cost but often needs more caution when expressing uncertainty.
What are common sources of bias in sampling?
Selection bias (incomplete frame), non-response bias (surveys), measurement error (inconsistent definitions), and timing bias (sampling at convenient but unrepresentative times). Mitigation includes improving frame completeness, follow-ups, standardised definitions, training, and documentation.
Why is systematic sampling efficient, and what are its risks?
It is efficient because selection is quick once an interval is set. Risks arise if the list has repeating patterns that align with the interval. Use a random start, check list ordering for periodicity, and document how k and the start point were chosen.
How can measurement error be reduced in data collection?
Use clear definitions, standard procedures, training, and validation checks. Where judgement is involved, calibrate between collectors. Record missing data and reasons for non-response so you can assess whether the data is systematically distorted.
Summary (Recap)
This chapter explained how sampling supports business decisions when full-population testing is impractical. It defined population, sampling frame, parameters and sample statistics, and distinguished sampling error from bias. It covered probability methods (simple random, systematic, stratified, cluster) and non-probability approaches, with practical controls to reduce measurement error and bias. It also introduced comparison questions (two groups or two periods) at a high level, focusing on practical interpretation and the need for consistent frames and measurement rules. The worked example demonstrated defect-rate estimation with uncertainty, survey response rates, and systematic sampling for invoice testing.
Glossary
Population
The full set of items relevant to a business question (for example, all invoices in a period).
Sampling frame
The list or source from which the sample is selected. If it differs from the population, results may be biased.
Sample
The subset of items selected from the frame for measurement or inspection.
Parameter (p)
The true population value being estimated (unknown), such as the population defect rate.
Statistic
A value calculated from the sample used to estimate the parameter.
Sample proportion (p̂)
The sample-based estimate of a population proportion: p̂ = x/n.
Sampling error
Random difference between a sample statistic and the population parameter due to observing only part of the population.
Bias
Systematic distortion in results due to selection or measurement issues (for example, incomplete frames, non-response, inconsistent classification).
Stratified sampling
Sampling method that divides the population into subgroups and samples within each to ensure representation and often improve precision.
Cluster sampling
Sampling method that selects groups (clusters) and measures within selected groups; practical but uncertainty is often understated by simple formulas unless many clusters are sampled.
Measurement error
Error caused by how data is collected or recorded, including unclear definitions, inconsistent methods, or faulty instruments.
Confidence interval
A range around a sample estimate that summarises typical sampling variation for the method and sample size, often expressed as estimate ± margin of error.
Written by
AccountingBody Editorial Team