If the work does not reach a standard outlined by the performance level descriptors, 0 marks are awarded for this criterion.
Recording Data
- Data can be classified as quantitative or qualitative. Both types of data should be included in this section.
- The amount of data collected will depend on the type of investigation and the sampling rate.
- Some investigations will naturally produce more raw data than others.
If large amounts of raw data are collected, it is recommended that students only present a sample of the data in the body of the essay and include the whole table of raw data in the appendix.
- Whatever the amount of data collected, it should be presented in an appropriate titled data table (or results table) and labelled as Table 1, Table 2, etc.
- A data table must include the units of the measurement together with the absolute uncertainty.
- The data must be recorded to the correct number of decimal places.
- This is determined by the measuring device and should be consistent for the same device.
- Qualitative data may also be included in the results table, depending on the protocol used.
SI units should be used throughout the report.
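As a sketch of this convention, the snippet below formats a value so its decimal places match the instrument's absolute uncertainty. The readings are hypothetical and `format_reading` is an illustrative helper, not a standard function:

```python
from decimal import Decimal

def format_reading(value, uncertainty, unit):
    """Format a measurement so its precision matches the absolute uncertainty."""
    # Decimal places are set by the uncertainty, e.g. 0.01 -> 2 places
    places = max(0, -Decimal(str(uncertainty)).as_tuple().exponent)
    return f"{value:.{places}f} ± {uncertainty} {unit}"

print(format_reading(2.5, 0.01, "g"))    # 2.50 ± 0.01 g
print(format_reading(50, 0.5, "cm³"))    # 50.0 ± 0.5 cm³
```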
Example 1
Research question:
“How does altitude (100 m to 600 m) affect the pH level, colour, and width of Nebbiolo wine grapes cultivated in Argentina’s Mendoza region?”
- Note that the above table includes the uncertainty of each measurement as well as the units.
- Qualitative data are also included in the form of a paragraph (a table format is also allowed).
- Although this example includes calculated means for each sample at different altitudes, the candidate states that the complete raw data table from which these means were calculated is included in the appendix.
- This is a correct way to show the data collection in an EE.
Data Processing and Graphing
- Data processing involves transforming the raw data into different forms that allow the relationship between the variables to be determined and the research question to be answered.
- This could include finding the average when multiple trials have been conducted, calculating an enthalpy change or rate of reaction, plotting a graph and determining a best-fit line, and conducting appropriate statistical analysis.
The use of spreadsheets might be appropriate here as they allow for the easier processing of data.
- Because the scope of possible EE topics is so large, it is not possible to give explicit instructions for every form of data processing.
- Therefore, it is recommended that students do their own research to determine the type of data processing required for their investigation.
- Whatever the form of data processing used in the investigation, it is recommended to present an example calculation which is clear and easy to follow.
Pay particular attention to the use of significant figures and decimal places in the calculations.
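A minimal sketch of such an example calculation, assuming hypothetical gas-collection data and a simple mean-then-rate workflow:

```python
from statistics import mean

# Hypothetical trial data: volume of gas collected in three repeat trials
volumes_cm3 = [24.5, 25.0, 24.8]   # each ± 0.5 cm³
time_s = 30.0                      # ± 0.1 s

mean_volume = mean(volumes_cm3)    # average of the repeat trials
rate = mean_volume / time_s        # rate of reaction in cm³ s⁻¹

# Report to the precision of the raw data (1 d.p.) and 3 significant figures
print(f"mean volume = {mean_volume:.1f} cm³")   # mean volume = 24.8 cm³
print(f"rate = {rate:.3g} cm³/s")               # rate = 0.826 cm³/s
```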
- Graphing is an important part of data processing.
- A graph provides a visual representation of the processed data and makes it easier to determine any relationships or trends in said data.
An example graph is shown below for an investigation to determine how the concentration of substrate affects the rate of reaction of a specific enzyme:
- Note that the graph above has a title which starts with “Figure 1”, labelled axes with units, and a best-fit line.
- From the best-fit line, we can see that the rate of decomposition is directly proportional to the concentration of H₂O₂.
Example 2
Research question:
“How is the height of dough used to make bread with yeast affected by the addition of different volumes of homemade fungicide made from onion?”
- The graph above clearly presents the relevant variables in alignment with the stated Research Question.
- However, to enhance its analytical rigor, error bars representing the standard deviation should be included for each data point.
- Additionally, the coefficient of determination (R²) should be calculated and displayed to assess the strength of the correlation.
Example 3
Research question:
“How do different volumes of acetic acid affect the pH of the solutions where Phaseolus vulgaris Haricot beans are placed, producing a change of weight in the beans due to osmosis?”
Example 3 shows a correct application of statistical analysis, using a sufficiently large data set and well-justified scientific support.
Uncertainties and Errors
Random Errors
- Whenever a measurement is taken in the laboratory or field work, there is an uncertainty associated with that measurement.
- These are known as random uncertainties or random errors.
- Random errors are caused by the limit of precision of the apparatus used to take the measurement and will cause the measured value to be either higher or lower than the actual value.
- These uncertainties are an unavoidable part of the measuring process and cannot be completely eliminated.
- However, they can be reduced by conducting repeat trials and taking an average and by using more precise apparatus.
- They are usually expressed together with the measured value as a range using the ± sign and are known as absolute uncertainties.
- For instance, the mass of a substance measured on a mass balance could be expressed as: $$2.50 \pm 0.01 \text{ g}$$
- Note that this mass is recorded to the same precision as the absolute uncertainty (in this example, two decimal places).
- This tells us that the actual mass of the substance lies somewhere between 2.49 g and 2.51 g.
- Another example is measuring a volume of solution using a graduated cylinder, which could be expressed as: $$50.0 \pm 0.5 \text{ cm}^3$$
- Once again, we see that the volume is recorded to the same precision as the absolute uncertainty (one decimal place).
- The absolute uncertainty of a piece of apparatus will differ depending on the precision of the apparatus.
- More precise apparatus will have a lower absolute uncertainty, and less precise apparatus, a higher absolute uncertainty.
For instance, a mass recorded on a mass balance that can measure to four decimal places is more precise than one that can measure to two decimal places and therefore has a lower absolute uncertainty, as can be seen below.
$$2.50 \pm 0.01 \text{ g}$$
$$2.5000 \pm 0.0001 \text{ g}$$
- The second measurement is more precise and has a lower absolute uncertainty, which equates to a lower random error.
- The absolute uncertainty of a piece of apparatus can sometimes be found printed on the apparatus itself.
- If not, the absolute uncertainty can be determined as follows:
- For analogue apparatus, the absolute uncertainty can be taken as half the smallest scale division. If the smallest scale division is 1 cm³, the absolute uncertainty is ± 0.5 cm³.
- For digital apparatus, the absolute uncertainty can be taken as the smallest scale division. If the smallest scale division is 0.01 g, the absolute uncertainty is ± 0.01 g.
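These two rules of thumb can be sketched as a small helper (the function name and values are illustrative):

```python
def absolute_uncertainty(smallest_division, digital):
    """Half the smallest scale division for analogue apparatus,
    the full smallest division for digital apparatus."""
    return smallest_division if digital else smallest_division / 2

print(absolute_uncertainty(1.0, digital=False))   # analogue cylinder: 0.5 (cm³)
print(absolute_uncertainty(0.01, digital=True))   # digital balance: 0.01 (g)
```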
Systematic Errors
- Systematic errors are caused by problems or flaws with the experimental design.
- They cause the measured value to be consistently higher or consistently lower than the actual value.
- Unlike random errors, they cannot be reduced by conducting repeat trials.
- However, they can be reduced or eliminated by modifying the experimental design.
Some examples of systematic errors are:
- Heat loss in an investigation to determine an enthalpy change
- The loss of a product, such as a gas, from a leaking tube
- Forgetting to zero a mass balance or to calibrate a measuring device
- Reading from the top of the meniscus instead of the bottom when using a graduated cylinder
Accuracy and Precision
- Accuracy refers to the closeness of the measured value to the actual value.
- Measured values with high accuracy have smaller systematic errors and vice versa.
- Precision refers to the number of significant figures, or decimal places, in a measured value.
- Measured values with higher precision have lower random errors.
- Take, for instance, the data shown in the table below for three experiments to measure the enthalpy change of neutralization ($ΔH_n$).
- The literature value is −57.0 kJ mol⁻¹.
- Experiment 1 has high accuracy and high precision.
- Experiment 2 has low accuracy but high precision.
- Experiment 3 has the highest accuracy but the lowest precision.
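One way to quantify accuracy as closeness to the literature value is a percentage error. The sketch below uses the literature value given above, but the measured values are hypothetical (the original table is not reproduced here):

```python
LITERATURE_DH = -57.0  # kJ mol⁻¹, literature value given in the text

def percent_error(measured, literature=LITERATURE_DH):
    """Accuracy: how close the measured value is to the literature value."""
    return abs(measured - literature) / abs(literature) * 100

print(round(percent_error(-56.5), 2))   # close to literature -> high accuracy
print(round(percent_error(-48.0), 2))   # far from literature -> low accuracy
```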
Uncertainties of averaged values
Tip: It is recommended that students conduct repeat trials in their investigation, which will require repeat measurements of the dependent variable.
- The average of these values is then taken and the uncertainty of the averaged value must be considered.
- For averaged values, the uncertainty should be the same as for the individual values.
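A sketch of this convention, assuming hypothetical repeat temperature readings each with an uncertainty of ± 0.5 °C:

```python
from statistics import mean

# Hypothetical repeat readings of the dependent variable
readings_c = [21.5, 22.0, 21.0]   # each ± 0.5 °C
avg = mean(readings_c)
uncertainty = 0.5                 # kept the same for the averaged value

print(f"{avg:.1f} ± {uncertainty} °C")   # 21.5 ± 0.5 °C
```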
Representing uncertainties graphically
- Uncertainties can be represented graphically through the use of error bars.
- Error bars show the maximum and minimum range of the uncertainty of the plotted point.
- They are usually plotted above and below the plotted point (for the y-value), but can also be plotted from side to side (for the x-value).
- They are usually plotted using graphing software such as Excel or Google Sheets.
- An example of a graph with error bars is shown above.
- As can be observed, larger error bars show a larger uncertainty and vice versa.
- A second example of a graph with error bars is shown below.
- This graph has temperature on the y-axis and time on the x-axis.
- The error bars for the temperature show an uncertainty of ± 2.5 °C.
- In the above graph, the error bar for the temperature (on the y-axis) is larger than the one for the time (on the x-axis).
- In this case, it would be appropriate to take the larger uncertainty of the temperature as the overall uncertainty and give less significance to the smaller uncertainty of the time.
- The gradient of a best-fit line can be determined using error bars.
- To do this, two lines are drawn: one with the minimum gradient and one with the maximum gradient, with both lines passing through the error bars.
- The graph below shows the two lines drawn with the maximum and minimum gradients (also known as the worst-fit lines).
The gradient of the best-fit line is the average of the minimum gradient and the maximum gradient. $$m = \frac{m_{\text{maximum gradient}} + m_{\text{minimum gradient}}}{2}$$
The uncertainty of the final gradient is calculated as follows: $$\Delta m = \frac{m_{\text{maximum gradient}} - m_{\text{minimum gradient}}}{2}$$
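The two formulas above can be sketched directly (the gradient values here are hypothetical):

```python
def gradient_with_uncertainty(m_max, m_min):
    """Best-fit gradient and its uncertainty from the worst-fit lines:
    m = (m_max + m_min) / 2, Δm = (m_max - m_min) / 2."""
    m = (m_max + m_min) / 2
    dm = (m_max - m_min) / 2
    return m, dm

m, dm = gradient_with_uncertainty(2.5, 1.5)   # hypothetical worst-fit gradients
print(f"m = {m} ± {dm}")                      # m = 2.0 ± 0.5
```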
R² – the coefficient of determination
- The coefficient of determination (R²) is a measure of how close the data are to the best-fit line and, therefore, how well the model fits the data.
- It is a measure of how well the independent variable explains the variation in the dependent variable.
- Note that students do not have to understand how the R² is calculated, as this can be done by most graphing software.
- However, students should understand how to interpret the value of the R² specifically with respect to the strength of the relationship between the independent and dependent variables.
- R² values can range from 0.0 to 1.0.
- The higher the value of the R², the better the fit of the data points with the best-fit line.
- An R² value of 1.0 suggests a perfect fit between the data and the model used.
- In other words, all of the variance in the dependent variable is explained by the independent variable.
- Lower values of R² suggest that the independent variable cannot explain all the variance in the dependent variable.
- Very low values of R², such as 0, suggest that none of the variance in the dependent variable is explained by the independent variable.
- In this case, it is likely that the wrong model has been chosen to analyze the data.
Consider the two graphs shown and their R² values.
- The graph on the left, with an R² of 1.0, indicates that all (100%) of the variation in the dependent variable is explained by the independent variable.
- The linear model used perfectly predicts the dependent variable.
- The graph on the right, with an R² of 0.83, indicates that 83% of the variation in the dependent variable is explained by the independent variable.
- In other words, not all the variance in the data can be accounted for by the linear model.
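Although graphing software normally computes R², the definition can be sketched directly for a least-squares linear fit (function name and data are illustrative):

```python
def r_squared(xs, ys):
    """Coefficient of determination for a least-squares linear fit,
    computed from its definition: R² = 1 - SS_res / SS_tot."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly linear data -> 1.0
print(r_squared([1, 2, 3, 4], [2, 5, 5, 9]))   # scattered data -> below 1.0
```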
Dealing with Outliers
- An outlier is a data point that differs significantly from the other data points in a set of data.
- Outliers can be higher or lower than other data points.
- They usually occur as a result of flaws in the methodology, human error, or faulty measuring equipment.
- Outliers should not be removed from the calculations during data processing.
- The justification given for this is that outliers are measured values, and removing or ignoring them can be considered data manipulation.
- If a student has outliers in their collected data, it is recommended that they present their data processing both with and without the outlier(s).
- This will allow the impact of the outliers to be demonstrated.
- An alternative method is to identify the flaw or error in the methodology, take steps to remedy the flaw, and repeat the measurement.
In this case, the modification(s) made should be described in the report.
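A sketch of presenting the processing both with and without an outlier, using hypothetical trial values in which 9.8 is the suspected outlier:

```python
from statistics import mean

# Hypothetical repeat trials; 9.8 differs markedly from the others
trials = [5.0, 5.2, 5.4, 9.8]
suspect = 9.8

with_outlier = mean(trials)
without_outlier = mean(t for t in trials if t != suspect)

# Reporting both values demonstrates the outlier's impact
print(round(with_outlier, 2))      # 6.35
print(round(without_outlier, 2))   # 5.2
```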
Key points
- All measured data has an uncertainty or error associated with it, known as its random error or random uncertainty.
- Raw data must be presented with its absolute uncertainty using the symbol ±, such as 2.50 ± 0.01 g.
- The precision of the measured value and the absolute uncertainty should be the same.
- Random errors cause values to be either higher or lower than the actual value.
- Random errors cannot be eliminated, but can be reduced by conducting repeat trials (and taking an average) and by using more precise apparatus.
- Systematic errors are caused by flaws in the experimental design.
- They produce results which are consistently higher or lower than the actual value.
- They cannot be reduced or eliminated by taking repeat measurements.
- They can be reduced or eliminated by making changes or modifications to the design of the experiment.
- Graphs should include error bars and an R² value if the graph requires a trend line.
- Outliers should be dealt with appropriately and not ignored. If the candidate decides not to include them in the calculations, this decision should be clearly justified.