Why We Sample And What Makes A Sample Useful
Population
The whole group you want information about (for example, all students in your year group).
Sample
A subset of the population from which data are collected.
- We take samples because measuring everyone can be too slow, expensive, or impractical.
- The goal is to use the sample to make an inference (a conclusion) about the population.
- Whether that inference is reasonable depends on how the sample was chosen.
Sampling frame
A complete list (or equivalent record) of all members of the population, from which a sample can be selected.
A sample is most convincing when it is representative, meaning that important features of the population (such as age ranges, classes, or product types) appear in the sample in roughly the same proportions as in the population.
Simple Random Sampling Gives Each Member An Equal Chance
- In a simple random sample, every member of the population has an equal chance of being selected, and the selection is genuinely random.
- This reduces selection bias because you are not choosing people based on convenience.
- Common ways to select a simple random sample include pulling names from a hat (when the group is small) or using a calculator/computer random number generator (RNG).
How To Take A Simple Random Sample (Method)
- Define the population clearly.
- Create a sampling frame (a complete list).
- Number the members from 1 to $N$.
- Use an RNG to generate $n$ distinct numbers.
- Select the individuals with those numbers.
- "Random" does not mean "haphazard."
- Choosing "whoever is nearby" or "the first 20 people I see" is usually convenience sampling, not random sampling.
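The method above can be sketched in Python. This is a minimal illustration, not a required procedure: the function name `simple_random_sample`, the labels `S1` to `S80`, and the seed are assumptions made up for the example. It uses the standard library's `random.sample`, which draws without replacement so the $n$ numbers are distinct.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Return n distinct members of the sampling frame, chosen at random."""
    rng = random.Random(seed)
    # Number the members 1..N and draw n distinct numbers without replacement.
    numbers = rng.sample(range(1, len(frame) + 1), n)
    # Member number i sits at index i - 1 in the list.
    return [frame[i - 1] for i in sorted(numbers)]

# Hypothetical sampling frame: 80 students labelled S1..S80.
frame = [f"S{i}" for i in range(1, 81)]
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

Fixing the seed only makes the example repeatable; in a real investigation you would normally leave it unset.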
Systematic Sampling Uses A Random Start And A Fixed Interval
- A systematic sample is formed from an ordered sampling frame by:
- choosing a random starting point, then
- selecting items at regular intervals.
- If the sampling frame has $N$ items and you want a sample of size $n$, a common interval is $$k \approx \frac{N}{n}$$
- Choose a random start from 1 to $k$, then take every $k$th item.
- A school has an ordered list of 240 students and you want a sample of 30.
- Compute $k = \frac{240}{30} = 8$.
- Pick a random start between 1 and 8, say 5.
- Then select students numbered 5, 13, 21, 29, and so on.
- Systematic sampling can be biased if the list has a repeating pattern that interacts with your interval.
- For example, if a list alternates boy, girl, boy, girl and you select every 2nd student, your sample will contain only one gender.
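A short Python sketch of the 240-student example and of the alternating-list problem. The function name `systematic_sample` and the choice to round the interval down with `N // n` are assumptions for illustration; other rounding conventions are possible when $N$ is not a multiple of $n$.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the item numbers (1-based) chosen by systematic sampling."""
    k = N // n  # fixed interval, rounded down (an assumption of this sketch)
    if start is None:
        # Random start between 1 and k inclusive.
        start = random.Random(seed).randint(1, k)
    # Take every k-th item from the start, keeping at most n items.
    return list(range(start, N + 1, k))[:n]

picked = systematic_sample(240, 30, start=5)
print(picked[:4])   # [5, 13, 21, 29]
print(len(picked))  # 30

# Pattern hazard: a list that alternates boy, girl sampled with interval 2.
genders = ["boy", "girl"] * 120
only = {genders[i - 1] for i in systematic_sample(240, 120, start=1)}
print(only)  # {'boy'}
```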
Stratified Sampling Preserves Proportions Of Key Groups
- Use a stratified sample when the population contains important subgroups (called strata) and you want them fairly represented.
- Here, "fairly represented" means the proportions in the sample match the proportions in the population.
- For example, if 60% of the population is in one group, about 60% of the sample should come from that group.
How To Build A Stratified Sample
- Choose the strata (for example, class, age band, or car type).
- Find each group size in the population.
- Decide the overall sample size $n$.
- Compute each group's sample size using $$n_i = \frac{N_i}{N}\times n$$ where $N_i$ is the size of group $i$ in the population and $N$ is the total population size.
- From each group, take a simple random sample of size $n_i$.
- You want 10 students from a year group of 80.
- They are taught in five classes of 16.
- Sampling fraction: $\frac{10}{80} = \frac{1}{8}$.
- From each class: $\frac{1}{8}\times 16 = 2$ students.
- So, take a simple random sample of 2 students from each class.
Car park inspection
- Check 20 cars out of 500. The lot has 100 compacts, 250 estates, 150 saloons.
- Sampling fraction: $\frac{20}{500} = \frac{1}{25}$.
- So sample:
- compacts: $\frac{1}{25}\times 100 = 4$
- estates: $\frac{1}{25}\times 250 = 10$
- saloons: $\frac{1}{25}\times 150 = 6$
- Then take a simple random sample within each car type.
Stratified sampling is especially useful when you expect different strata to behave differently (for example, different classes taught by different teachers). It often improves representativeness without needing a huge sample.
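The car park example can be reproduced with a short Python sketch of proportional allocation. The function names and the car labels (`C1`, `E1`, `S1`, and so on) are hypothetical. The sketch rounds each $n_i$ to the nearest whole number; when the exact values are not whole numbers the rounded sizes may not add up to $n$, and you would need to adjust one group by hand.

```python
import random

def proportional_allocation(strata_sizes, n):
    """Sample size for each stratum: n_i = N_i / N * n, rounded."""
    N = sum(strata_sizes.values())
    return {name: round(N_i * n / N) for name, N_i in strata_sizes.items()}

def stratified_sample(strata, n, seed=None):
    """Take a simple random sample of size n_i from each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportional_allocation(sizes, n)
    return {name: rng.sample(strata[name], allocation[name]) for name in strata}

# Hypothetical labels for the 500 cars in the example.
car_park = {
    "compact": [f"C{i}" for i in range(1, 101)],
    "estate": [f"E{i}" for i in range(1, 251)],
    "saloon": [f"S{i}" for i in range(1, 151)],
}

allocation = proportional_allocation({k: len(v) for k, v in car_park.items()}, 20)
print(allocation)  # {'compact': 4, 'estate': 10, 'saloon': 6}

chosen = stratified_sample(car_park, 20, seed=0)
print({name: len(cars) for name, cars in chosen.items()})
```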
Quota Sampling Is Practical When The Sampling Frame Is Unknown
- Sometimes a sampling frame is unknown or does not exist (for example, surveying "people in the downtown area").
- In these cases, fully random methods may be impossible.
- A quota sample is built by collecting responses until you have enough data in each chosen category (for example, 15 local men and 15 local women).
- You might stand somewhere and ask people who are willing to talk until each quota is reached.
- An interviewer needs a sample of 50 students to survey methods of transport.
- They go to a bus stop near a school and ask the first 50 students they see.
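- This is quota sampling with a single quota (50 students); because the interviewer simply takes whoever is available, it shares the weaknesses of convenience sampling.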
- It is quick and avoids problems of not finding enough participants, but it is not random and is likely to over-represent students from that school and students who travel by bus.
- Quota sampling can produce strong selection bias.
- Results may depend heavily on where you stand, what time you collect data, and who is willing to respond.
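The idea of filling quotas can be sketched as follows. Everything here is made up for illustration: the function name `quota_sample`, the stream of passers-by, and the rule that every third person is a woman. Note that nothing in the procedure is random, which is exactly why the result depends on who happens to walk past.

```python
def quota_sample(passers_by, quotas):
    """Accept people in arrival order until every category quota is filled."""
    remaining = dict(quotas)
    accepted = []
    for person, category in passers_by:
        # Accept this person only if their category still has room.
        if remaining.get(category, 0) > 0:
            accepted.append((person, category))
            remaining[category] -= 1
        # Stop as soon as all quotas are met.
        if all(count == 0 for count in remaining.values()):
            break
    return accepted

# Hypothetical stream of 100 passers-by; every third person is a woman.
stream = [(f"P{i}", "man" if i % 3 else "woman") for i in range(1, 101)]
sample = quota_sample(stream, {"man": 15, "woman": 15})

men = sum(1 for _, c in sample if c == "man")
women = sum(1 for _, c in sample if c == "woman")
print(len(sample), men, women)  # 30 15 15
```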
Response Rate Affects How Much You Can Trust A Survey
- Even if you choose a careful sampling method, not everyone selected will respond.
- This matters because non-response can distort results.
- If you invite $n$ people and receive $r$ responses, then $$\text{response rate} = \frac{r}{n}\times 100\%$$
- A low response rate can create non-response bias if the people who do not respond are systematically different from those who do.
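As a worked illustration with made-up numbers, suppose $n = 120$ people are invited and $r = 45$ respond:

$$\text{response rate} = \frac{45}{120}\times 100\% = 37.5\%$$

Here more than 60% of those invited did not reply, so it would be worth asking whether the non-responders differ from the responders in some way that matters to the survey question.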
When evaluating a survey, comment on:
- whether selection was random or non-random,
- whether important groups were represented (or should have been),
- likely bias (who is over-represented or under-represented), and
- response rate and possible non-response bias.
Sample Size And Reliability: Bigger Is Usually Better (But Not Always)
- In general, a larger sample size makes generalizations more reliable because it reduces sampling variability (the natural differences you would get if you repeated the sampling process).
- However, bias does not disappear just because the sample is large.
- A large biased sample can give a very precise answer to the wrong question.
- Sampling is like tasting soup: a bigger spoonful helps, but if you only scoop from the oily top, your taste will still be misleading.
- You need a well-chosen spoonful.
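A small simulation can make the soup analogy concrete. This is only a sketch under stated assumptions: the population is 10,000 invented values, the "oily top" is modelled by sampling only from the upper half of those values, and the names `spread_of_sample_means` and `biased_pool` are hypothetical. The spread of the sample means should shrink as $n$ grows in both cases, while the biased samples should stay centred well above the true mean.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 values centred near 50.
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Biased pool: only the upper half of the population is reachable.
biased_pool = sorted(population)[5_000:]

def spread_of_sample_means(pool, n, repeats=200):
    """Repeat the sampling and return (mean, std dev) of the sample means."""
    means = [statistics.mean(rng.sample(pool, n)) for _ in range(repeats)]
    return statistics.mean(means), statistics.stdev(means)

results = {}
for n in (10, 100, 1000):
    fair_centre, fair_spread = spread_of_sample_means(population, n)
    biased_centre, biased_spread = spread_of_sample_means(biased_pool, n)
    results[n] = (fair_centre, fair_spread, biased_centre, biased_spread)
    print(f"n={n}: fair {fair_centre:.1f} +/- {fair_spread:.2f}, "
          f"biased {biased_centre:.1f} +/- {biased_spread:.2f}")

print(f"true mean = {true_mean:.1f}")
```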
- In real investigations you often balance time and cost, access to a sampling frame, and accuracy.
- The best method is usually the one you can justify as most representative given constraints.
- The "best" sampling technique depends on what you know about the population and what could introduce bias.
- Leaf lengths on a type of tree: a simple random sample of leaves across many trees (or systematic sampling along branches), making sure you do not only measure easy-to-reach leaves.
- How students travel to school: simple random sample from the student list, or stratified by year level/class if you think travel differs by group.
- Average mark in a year group: simple random sample of scores, or stratified by class if classes have different teachers or different test conditions.
- Choosing 20 school employees from 150 teachers, 20 admin staff, and 30 facilities staff: stratified sampling by job role (proportional allocation).
- Average time in a 10 km race: systematic sampling of finishers (every $k$th finisher) or simple random sampling from the registration list.
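For the school employees case, the proportional allocation works out as follows, using the same sampling-fraction approach as the stratified examples. The total is $N = 150 + 20 + 30 = 200$ and the sampling fraction is

$$\frac{20}{200} = \frac{1}{10}$$

so the sample would contain

$$\text{teachers: } \frac{1}{10}\times 150 = 15, \qquad \text{admin: } \frac{1}{10}\times 20 = 2, \qquad \text{facilities: } \frac{1}{10}\times 30 = 3$$

and $15 + 2 + 3 = 20$, as required.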
- Explain the difference between a population and a sampling frame.
- Why is a random start essential in systematic sampling?
- What does "fairly represented" mean in stratified sampling?
- Give one strength and one limitation of quota sampling.
- A survey invited 200 people and got 74 responses. Calculate the response rate.