Rick Mathew
INTRODUCTION
BACKGROUND
In May of 2022 we (@Dan_P, @taricha and I) posted an article that looked at the quality of the data reported by ICP-OES measurements from several vendors. Our goal was to help bring some clarity to the community to help evaluate and interpret the results of an ICP test report. Here is a link to the article https://www.reef2reef.com/ams/how-we-use-icp-oes-results-of-unknown-accuracy-and-precision.862/
In the above-mentioned study we focused on whether or not the reported measurements made sense, that is - are they real? Second, we looked at the variability a vendor had when making multiple measurements from the same sample.
I don’t remember exactly where the statement came from but someone said “I can do that well with my HANNA Checkers!” ….Really?? Sounds like a challenge to me! So we took the challenge and embarked on an adventure to see if this could be true….Reef Myth Busters!
For some context it might be helpful to scan through this post on R2R where others express the same concerns and questions we are exploring here. https://www.reef2reef.com/threads/icp-test-results-vs-hobby-grade.733197/
HYPOTHESIS
"Based on commonly held beliefs that ICP is a more accurate and precise method for saltwater aquarium testing, it is hypothesized that commonly available Hobby Grade Testing Kits, when used correctly and with good laboratory practices, can provide equally accurate and precise results as compared to ICP Testing for selected elements."
So come along with us on this journey and see if Hobby Grade Test Kits can stand up to the power of ICP-OES and ICP-MS!
OVERVIEW
Obviously there are not Hobby Grade Test Kits available for every element on the ICP testing docket, so we selected 8 for which there were readily available test kits: Calcium, Copper, Iodine, Iron, Magnesium, Phosphorus, Potassium and Silica.
We selected 3 reefers to do the Hobby Grade Test…3 highly skilled and competent testers…or maybe NOT! ☺. We then selected 5 ICP vendors, 3 ICP-OES and 2 ICP-MS.
The experimental design is essentially a Round Robin Experiment (4) which consisted of creating two samples for each of the participants. One sample would be a freshly prepared batch of commercially available reef salt. We chose Red Sea Blue Bucket. The second sample would be this fresh salt mix spiked to a specific level with NIST traceable individual element standards. These Standards are the critical part of our experiment! While we can’t know exactly the values of the mixed salt, we can make additions to that mix with these Lab-Certified reference standards, and thus know with a high level of certainty the difference in the two samples. This gives us a solid basis to assess the accuracy of the ICP results and our own hobby test methods.
The primary performance criterion for evaluating each participant's measurements would be what percentage of the spiked element they detected. For example, let's say I measured the Calcium level in the "un-spiked" sample to be 350ppm and the spiked sample to be 480ppm. Then 480 - 350 = 130ppm would be the calculated spike level. The actual spike level is 150ppm. The percent detected would then be (130/150) x 100 ≈ 87%.
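For readers who want to apply the same yardstick to their own kits, the percent-detected calculation can be sketched in a few lines. The numbers are the worked example from the text (86.7% rounds to the 87% quoted):

```python
def percent_detected(unspiked, spiked, actual_spike):
    """Percentage of a known spike recovered by a test method."""
    measured_spike = spiked - unspiked
    return measured_spike / actual_spike * 100

# Worked example from the text: Ca measured at 350 ppm (un-spiked)
# and 480 ppm (spiked), against a known 150 ppm spike.
pct = percent_detected(350, 480, 150)
print(round(pct, 1))  # 86.7
```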
METHODS
EXPERIMENTAL SETUP
5000 mL of saltwater was mixed from RED SEA salt to 35ppt as measured by a Hanna HI98319 Salinity Tester calibrated with Hanna NIST traceable Calibration Solution. The water sample was stored for 24 hours at 26°C.
The sample was split into two 2000 mL samples using a volumetric flask.
Sample #1 was the UN-Spiked sample (SALINITY 34.8ppt pH 8.0)
Sample # 2 was spiked using NIST Traceable element standards from Inorganic Ventures (1) according to the following table: (SALINITY 34.8 ppt pH 7.8)
Samples #1 and #2 were divided into four 500 mL portions and distributed according to the following:
- 500mL to @Dan_P
- 500mL to @taricha
- 500mL to @Rick Mathew
- 500mL split between the following vendors:
- Triton ICP-OES
- ATI ICP-OES
- Fauna Marin ICP-OES
- ICP Analysis ICP-MS
- Oceamo ICP-MS
The testing protocol:
- Chemical testers
- Test their samples immediately upon arrival (≈ 3 days)
- Run 3 replicates for each element.
- Store the remaining samples for 7 additional days and retest with 3 replicates.
- Record data in the data template.
- Test Kits Used
ICP testers
- Tested samples according to their individual protocols
- Report results
- Time from sampling date to test results date averaged 9.5 days. The longest was 13 days and the shortest 7. One sample was incorrectly forwarded by the post office and took 33 days (Fauna Marin).
ROUND ROBIN (4) EXPERIMENTAL FLOW CHART
RESULTS
As you can imagine there is a massive amount of data that goes with this project. We will not burden you with all of that information but if you are interested we would be happy to provide you the details. Most of the results will be presented in the form of charts and graphs with a brief explanation following each. The charts will show the results of the percent level of detection for each of the testers. Each chart will be a different element. For the chemical test data we have a lot more data points to do some statistical analysis. So we will look at standard deviations, confidence intervals (Margin of Error) and Relative Accuracy for each of the measurements. These can be seen in Tables 1-3. However, with the ICP data we only have one measurement per sample per vendor which reduces the statistical tools available. For the ICP we will focus on the percentage of the known spiked amounts that were detected. So let’s have at it!
MEASUREMENT RESULTS BY ELEMENT
Each chart will indicate the percent of the spiked amount detected for each element. The charts show the results of each individual ICP Vendor as well as the average of all the vendors and 3 Hobby Grade Chemical Testers and the Average results of their tests. The error bar (I) on the Hobby Grade Test is a calculated error based on the average standard deviation of the measurements as a percent of the level of spiking. This effectively distinguishes between instances where the ICP vendors achieved results comparable to those obtained from a chemical kit, and instances where they faced significant challenges outside the range of variations observed in our chemical tests.
ICP testing detected between 80-85% of the spiked amount (except one vendor) with a standard deviation of about 41ppm. The hobby grade tests (except Tester #3) measured 90+% of the spiked amount, a slightly better performance, with an overall standard deviation of 13ppm.
A note on the Hobby Grade Tests used in this experiment:
There are hobby kits available, such as the Red Sea and Hach kits used in this experiment, that are equally accurate as, or possibly even better than, ICP. It should be noted, however, that the Hanna HI-758, also used in this experiment, under-measures for values above 400ppm. Its precision (consistency) is nonetheless very good, which allows for a correction using regression. Rick's post from March 2023 discusses the details of this issue. (3e)
The data used in this experiment is the corrected data.
There is a high degree of variability among the ICP vendors' reported measurements, which range from 22% to 82% detection; the standard deviation of the measurements is 0.064ppm. The two chemical tests, with a standard deviation of 0.021ppm, look to be the better performers at 99% and 109% detection levels.
For Iodine we again see a wide degree of variation in the ICP measurements, with a standard deviation of 0.042ppm, whereas the hobby grade chemical testers have much lower variability, with a standard deviation of 0.009ppm.
ICP measurements for the most part did not detect the iron - and what was detected did not reflect the actual spiked amount - whereas the hobby grade chemical test did a fair job of detecting the approximate scale of the spike, but with large variability. Does this across-the-board non-measurement reflect failure of ICP detection, or loss in storage? In our previous work we found it likely that ICP vendors can measure Fe in the 10 ppb range consistently, (3e) and it has been demonstrated by Rick's sample storage work that Fe is an element with a propensity for loss during storage, and that loss could be mitigated - but not entirely halted - by use of nitric acid. (3a) Thus we believe this discrepancy between chemical and ICP methods almost entirely represents loss in the sample, and that the ICP vendor methods do not preserve the Fe present in water samples. For this reason it is unlikely that their measurements would reflect the amounts actually in aquarium water.
For some, this magnesium result might be a confusing and puzzling outcome; however, in many past studies (3a - 3d) (see Appendix Charts 1 & 2) it was noted that ICP vendors varied widely in their measurement of magnesium, both compared with each other and compared with hobby grade tests (5). From this data, the hobby grade chemical test used here outperformed the ICP tests in detecting the spiked amount.
From the results of previous work (3a) it is known that the storage (transit) time of a sample will impact the results of the phosphorus measurement due to biological activity. It should be noted that this water sample was not tank water but freshly prepared salt mix, which should have a much lower level of bio-activity. The expectation would be that the ICP tests would be more reflective of the spiked amount. The chemical test results lost about 20 percentage points (100% - 80%) from the initial measurement over the 7-day storage time, with the initial measurements being very close to the spiked amount. Additionally, the one ICP vendor that adds a stabilizer to the phosphorus sample and uses a chemical test instead of an ICP measurement (Oceamo) got a result that was generally within the error bars of the hobby chemical test results. All these observations point toward the instability of phosphorus in the samples as a major factor in the difference between hobby chemical test kit and ICP results. (Separate from the instability issue, phosphorus is also simply a challenging element for the ICP-OES measurement protocol.)
Potassium should theoretically be one of the easiest elements to measure by ICP, and this turns out to be the case - the ICP vendors (with one exception) perform equally or better than the hobby grade chemical testers. The results indicate that both methods would yield reasonably good measurement of potassium, each doing a good job of capturing the majority of the 60 ppm spike.
From the above data, the ICP results and chemical tests both generally did a good job of measuring the Silicon addition. (Note: Chem. Tester #1's Silica kit performs close to the other chem. testers in all other cases, which suggests a likely sample contamination here.)
The chart below consolidates all the above data into the averages of both the ICP testers and the hobby grade testers.
On some of the elements, both the hobby test kits and the ICP vendor averages seem generally good at capturing the amount spiked, but overall the hobby tester average was closer on 5 of the 8 elements. This chart shouldn’t be interpreted to paint too rosy an impression. The hobbyist, of course, must operate without the benefit of consensus over multiple testers and methods, relying on individual results. Those individual results as we’ve seen in the earlier charts had larger variation that gets somewhat washed away in the big-picture average.
The results illustrated in the above charts clearly point to the hobby grade testing giving on average equivalent or better results than the ICP testing in this experiment.
HOBBY GRADE TESTER’S STATISTICS
Below are Tables 1-3 containing the individual Hobby Grade Testers' statistics by individual test method, for both the initial sample measurement and the stored sample measurement. (See Appendix for Term Definitions) (6)
DISCUSSION
We set out to test the idea that standard hobby tests kits could produce results that were comparable to that obtained by ICP test vendors. To evaluate this proposition we spiked a saltwater with a number of elements via certified single-element standards and evaluated the ICP and hobby testers on their ability to detect this known spike. The objective for each tester was to determine by how much the two samples differed in the eight spiked elements. The above charts and data tell a very interesting story about the relationship between ICP testing and the typical chemical hobby grade test kits for these eight analytes. The performance chart below provides a high-level summary of all these results.
“Commonly available Hobby Grade Testing Kits when used correctly and with good laboratory practices can provide equally accurate and precise results as compared to ICP testing for these selected elements.” In some cases in this experiment the Hobby Grade Kits did better!
In short, if you have a well-respected hobby chemical test kit that performs consistently for you, there is no reason for ICP measurements to be given precedence over your chemical test results. The data suggests that would be counterproductive more often than not.
FIT FOR USE---WHICH METHOD OR VENDOR DO I USE?
We often hear about trusting a method or vendor in discussions about which to choose. (SEE HERE (3g)) Trust is not a reliable way to determine which test kit or vendor fits your testing needs. A more objective approach is to figure out which test kit or vendor delivers the testing accuracy and precision that enables you to make decisions about dosing amounts or about whether there are contaminants in the water. A test that enables decision making is known as fit for use. For example, if one is measuring the weight of scrap iron, the required level of accuracy and precision would not be the same as for measuring gold. The cost to measure a ton of scrap iron to better-than-milligram accuracy would be far greater than the value of the scrap iron. The article "How We Use ICP-OES Results of Unknown Accuracy and Precision," under the section titled "Finding the Gage Variation of a Measurement System" (3c), explains the idea of fit for use in much more detail. This is really not a trivial point, because if the required outcome is not known or defined, then there is no way to know whether any test method will be satisfactory.
WHEN TEST ACCURACY AND PRECISION ARE UNKNOWN
Our experiment provides you with data to decide whether a hobby test kit or an ICP will be useful to you in making a dosing or remediation decision for Ca, Cu, I, Fe, P, Mg, K and Si. What about the elements where accuracy and precision are unknown? How do you respond when an ICP result indicates a depletion or increase in an element's concentration? Is it real, or a test method error?
There is no doubt that ICP measurements cover a much wider range of elements than hobby test kits, and they can provide valuable information to the reef enthusiast. The ability to measure elements for which there are no chemical test kits is an important service to the reef keeper. That being said, it would be nice to know the Margin of Error (6) (Confidence Interval) for each element measured. The correct way to report a measurement (8) is with a ± value or percentage attached to the measurement; for example, 450 ppm ± 5 ppm. This would mean the true value is somewhere between 445 ppm and 455 ppm.

Every measurement has variability (error); this is absolutely normal. The level of variability affects what you can conclude from the measurement. If the variability is large, then the confidence in a single measurement would be low. If the variability is low, then there is a higher level of confidence in the measurement. That does not mean it is accurate, but it is repeatable, which allows for data trending and helps with dosing decisions.

None of the ICP vendors report accuracy or precision for their measurements. ATI provides their limits of detection and quantification at the low and high ends of concentration, (9) and Triton once provided documentation of their limits of detection, (10) but that was from 2016 or earlier, has been removed from their site, and no updated version has been released. It should be noted that lower and upper limits say nothing about the accuracy of measured values. OCEAMO, in February of 2022 and July 2023, (7) posted on a thread their reported uncertainty values for a number of elements. This is a step in the right direction.
We echo the sentiment expressed by Sanjay Joshi in the recent Reefbuilders ICP article (3h) “I think each of the ICP providers should take the onus to convince us that their tests are accurate with low levels of uncertainty and expected variability.” It may be noted that our data, as well as Sanjay’s and others’ illustrates why this hope is unlikely to come to pass. The results of some ICP vendors do not hold up to the hobbyist’s expectations when compared to certified standards and so those ICP vendors are certainly unlikely to release such data. In light of this, and since Oceamo has demonstrated that such analysis can be done and released, perhaps hobbyists should take the perspective that in a hobby with multiple vendors offering ICP services with claims of accuracy - those vendors who do not demonstrate their accuracy with data, are likely unable or unwilling to do so.
Since such publicly released self-audits are currently not widely done, the only way to assess accuracy that we are aware of is to conduct a validation experiment in which a series of known standards are prepared and measured multiple times (replicates), which our previous experiment attempted to tackle in its Gage R&R section. (3c)
CONCLUSIONS & LEARNING
FITNESS FOR USE
An important fact to keep in mind is that the outcomes of the above experiment are not judgments of a "good" or "bad" test; the experiment only looks at the accuracy and precision of a test method. The question of good or bad is a "fitness for use" question, which is generally tied to the requirements of the parameter being tested and the range of values the tester considers acceptable, also called the control limits. (SEE ABOVE FIT FOR USE SECTION)
TRENDING TO DETERMINE DOSING LEVELS
The above chart shows the unspiked and spiked values for Mg as measured by the chemical tests (left) and ICP vendors (right). Note that within the chemical testers there was very good agreement as to the "trend" or the size of the spike (within 14ppm) even though there was a somewhat larger disagreement (55ppm) about what the actual value of Mg was.
Within the ICP vendors, the same holds true, though at a larger scale of variation. The ICP vendors disagreed about the actual value of Mg by 150-200ppm, and yet 4 out of 5 were still able to detect the "trend" of a spike of 100ppm. The practical application is that for a hobbyist, you can get better quality data for the question "is my Mg stable?" than you can for the question "is my Mg = 1300?"
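The trend-versus-absolute-value distinction can be illustrated with a small sketch. The readings below are hypothetical, chosen only to mimic the pattern described above: testers disagree on the absolute Mg value by tens of ppm, yet agree closely on the size of the spike.

```python
# Hypothetical Mg readings (ppm) from three testers; the absolute values
# disagree, but the measured spike (spiked minus unspiked) agrees closely.
readings = {
    "Tester A": {"unspiked": 1280, "spiked": 1378},
    "Tester B": {"unspiked": 1335, "spiked": 1437},
    "Tester C": {"unspiked": 1310, "spiked": 1405},
}

absolute = [r["unspiked"] for r in readings.values()]
spikes = [r["spiked"] - r["unspiked"] for r in readings.values()]

print("spread in absolute Mg:", max(absolute) - min(absolute), "ppm")  # 55 ppm
print("spread in detected spike:", max(spikes) - min(spikes), "ppm")   # 7 ppm
```

A tester-specific bias cancels out when the same tester measures both samples, which is why trend questions are answered more reliably than absolute-value questions.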
KEY FINDINGS
GENERAL COMMENTS
This project has taken considerable effort and resources by @Dan_P, @taricha and myself, but we think there is value in doing this type of experimentation to help better define relevant factors in water testing in the Reefing Community. It is our hope that you find this information valuable and useful in your efforts to maintain the health of your reef system. In the APPENDIX we have provided some additional information that you might find helpful. There is a lot more that could be said, as well as a more in-depth analysis of all of this data, but we will save that for another day. This is probably enough to digest for now.
SOME REFERENCE LINKS
1) Inorganic Ventures
https://www.inorganicventures.com/products/
https://www.txscientific.com/new-products-c34.aspx
2) AN ACCURATE AND PRECISE METHOD FOR MEASURING IODINE (Red Sea Special Method)
https://www.reef2reef.com/ams/an-accurate-and-precise-method-for-measuring-iodine.909/
3) REFERENCES TO PREVIOUS ICP COMPARISONS
a. STORAGE EXPERIMENT---SEE ICP REPLICATION EXPERIMENT OBSERVATION SECTION
https://www.reef2reef.com/ams/sample-storage-and-its-impact-on-measurement-results- part-3.800/
b. GETTING IT RIGHT #4 INSTRUMENTAL TESTING---SEE ICP TESTING SECTION
https://www.reef2reef.com/ams/part-4-getting-it-right-colorimetric-instrumental-testing-methods-digital.748/
c. HOW WE USE ICP-OES RESULTS ARTICLE
https://www.reef2reef.com/ams/how-we-use-icp-oes-results-of-unknown-accuracy-and-precision.862/
d. FRITZ WORK
https://fritzaquatics.com/assets/files/uploads/ICP_TES.pdf
WATER QUALITY MONITORING PAPER FROM WHO https://apps.who.int/iris/bitstream/handle/10665/41851/0419217304_eng.pdf?sequence=1&isAllowed=y
e. HI-758 CALCIUM CHECKER ACCURACY AND PRECISION POST
https://www.reef2reef.com/threads/eggs-over-easy-what-hanna-checkers-do-you-use.971288/page-4#post-11201128
f. RICHARD ROSS TRITON ICP ANALYSIS
https://reefs.com/magazine/skeptical-reefkeeping-12/
g. ICP TEST RESULTS VS. HOBBY GRADE R2R POST
https://www.reef2reef.com/threads/icp-test-results-vs-hobby-grade.733197/
h. Sanjay Joshi - The Reef Builders’ Big ICP Test Review
https://reefbuilders.com/2023/07/12/the-reef-builders-big-icp-test-review-by-sanjay-joshi/
4) OUTLINE OF ROUND ROBIN EXPERIMENT
If different methods are to be used in a round-robin experiment for a chemical test, the goal remains the same: to assess the agreement and variability among the results obtained from different laboratories or analysts. However, in this case, each participant would apply a different testing method to the same set of samples, allowing for a comparison of the performance and accuracy of the different methods.
Here's an adapted approach for a round-robin experiment involving different methods in a chemical test:
1. Selection of Participants: Similar to the previous scenario, multiple laboratories or analysts with expertise in the field are chosen to participate in the round-robin experiment. Each participant should have a specific testing method they will apply.
2. Sample Preparation: A set of identical samples is prepared, representing the target analyte or substance to be tested using qualified reference standards. The samples are appropriately labeled to ensure consistent identification throughout the experiment.
3. Sample Distribution: The samples are distributed among the participating laboratories or analysts. Each participant receives the same set of samples to analyze using their respective testing method.
4. Testing Procedure: Each participant performs the test or measurement on the assigned samples using their specific testing method. The methods should be followed precisely, adhering to the established protocols and procedures.
5. Data Collection: Each laboratory or analyst records and reports their results for the samples they analyzed, along with any relevant qualitative observations. The data should include the measurements or outcomes generated by each testing method.
6. Data Analysis: The collected data from all participants are compiled and analyzed to assess the agreement and variability among the results obtained using different methods. Statistical techniques, such as comparing means, evaluating standard deviations, or performing regression analyses, can be employed to quantify the differences and similarities between the methods.
7. Result Interpretation: The results obtained from the round-robin experiment are interpreted to compare the performance and accuracy of the different testing methods. Key parameters, such as precision, accuracy, sensitivity, specificity, or any other relevant metrics, can be evaluated to determine the strengths and limitations of each method.
By conducting a round-robin experiment with different methods, researchers can gain insights into the variability and reliability of different testing approaches for the same analyte or substance. This information can help guide decisions about which methods are most suitable for specific applications, provide opportunities for method improvement, and contribute to the development of standardized procedures within the field.
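As a concrete illustration of step 6 (Data Analysis), a minimal sketch might compare the mean and standard deviation of each participant's replicate results; the method names and numbers below are hypothetical:

```python
from statistics import mean, stdev

# Hypothetical round-robin results: each participant's replicate
# measurements (ppm) of the same spiked sample, by method.
results = {
    "Method A (kit)": [148, 151, 150],
    "Method B (kit)": [140, 144, 142],
    "Method C (ICP)": [131, 129, 133],
}

# Per-method mean shows agreement on the value; per-method standard
# deviation shows the repeatability of each method.
for method, reps in results.items():
    print(f"{method}: mean={mean(reps):.1f} ppm, sd={stdev(reps):.2f} ppm")
```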
5) MAGNESIUM TEST ADJUSTMENT TO COMPENSATE FOR HIGH LEVEL OF CALCIUM
For the spiked sample the level of calcium was above the recommended level for the chemical tests for magnesium (>500ppm). The sample was diluted by a specific amount to bring it within the acceptable calcium range, and the result was then back-calculated to get the measurement for magnesium.
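The back-calculation arithmetic is straightforward. The exact dilution used in the experiment is not specified above, so the 1:2 dilution and the readings in this sketch are only assumed examples:

```python
def back_calculate(diluted_reading, sample_ml, final_ml):
    """Scale a reading taken on a diluted sample back to the original concentration."""
    dilution_factor = final_ml / sample_ml
    return diluted_reading * dilution_factor

# Assumed 1:2 dilution (25 mL of sample brought to 50 mL with pure water):
# a 720 ppm Mg reading on the diluted sample implies 1440 ppm in the original.
print(back_calculate(720, 25, 50))  # 1440.0
```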
6) STATISTICAL TERM DEFINITIONS
Standard Deviation of the measurements
The standard deviation of a measurement is a statistical measure that quantifies the amount of variation or dispersion within a set of data points. It provides information about how spread out the values are from the mean or average value.
Margin of error of the measurement
The margin of error provides a range of values around an estimated parameter that reflects the likely variability in the population. It helps to assess the precision and reliability of the measurement and allows for a more accurate interpretation of the results. In this case it is calculated using the Confidence function in Excel at a 95% confidence level.
Relative Accuracy
The term "relative accuracy" does not have a universally standardized definition in statistics or measurement science. However, it can be understood as a measure of how close a measurement or estimate is to the true value or target value, relative to a given reference or benchmark. A relative accuracy of 98% would indicate the measurement is within 2% of the known value.
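These quantities can be computed directly from a set of replicates. The sketch below mirrors the Excel CONFIDENCE calculation mentioned above (a normal-distribution margin of error) and applies the relative-accuracy definition; the triplicate readings and the 450 ppm reference value are hypothetical:

```python
from statistics import NormalDist, mean, stdev

def margin_of_error(values, confidence=0.95):
    """Normal-distribution margin of error, as in Excel's CONFIDENCE(): z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * stdev(values) / len(values) ** 0.5

def relative_accuracy(measured, true_value):
    """100% minus the percent deviation from the known value."""
    return 100 * (1 - abs(measured - true_value) / true_value)

# Hypothetical triplicate Ca readings against a known 450 ppm value:
reps = [447, 452, 449]
m = mean(reps)
print(f"{m:.1f} ppm ± {margin_of_error(reps):.1f} ppm")
print(f"{relative_accuracy(m, 450):.1f}% relative accuracy")
```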
9) ATI LIMITS OF DETECTION & QUANTIFICATION
https://atiaquaristik.com/en/?page_id=1422
BACKGROUND
In May of 2022 we (@Dan_P, @taricha and I) posted an article that looked at the quality of the data reported by ICP-OES measurements from several vendors. Our goal was to help bring some clarity to the community to help evaluate and interpret the results of an ICP test report. Here is a link to the article https://www.reef2reef.com/ams/how-we-use-icp-oes-results-of-unknown-accuracy-and-precision.862/
In the above mentioned study we focused on whether or not the reported measurements made sense, that is - are they real? Secondly we looked at the variability that a vendor had when making multiple measurements from the same sample.
I don’t remember exactly where the statement came from but someone said “I can do that well with my HANNA Checkers!” ….Really?? Sounds like a challenge to me! So we took the challenge and embarked on an adventure to see if this could be true….Reef Myth Busters!
For some context it might be helpful to scan through this post on R2R where others express the same concerns and questions we are exploring here. https://www.reef2reef.com/threads/icp-test-results-vs-hobby-grade.733197/
HYPOTHESIS
"Based on commonly held beliefs that ICP is more accurate and precise method for saltwater aquarium testing, it is hypothesized that commonly available Hobby Grade Testing Kits when used correctly and with good laboratory practices, can provide equally accurate and precise results as compared to ICP Testing for selected elements.”
So come along with us on this journey and see if Hobby Grade Test Kits can stand up to the power of ICP-OES and ICP-MS!
OVERVIEW
Obviously there are not Hobby Grade Test Kits available for every element on the ICP testing docket, so we selected 8 for which there were readily available test kits. They are Calcium, Copper, Iodine, Iron, Magnesium, Phosphorous, Potassium and Silica.
We selected 3 reefers to do the Hobby Grade Test…3 highly skilled and competent testers…or maybe NOT! ☺. We then selected 5 ICP vendors, 3 ICP-OES and 2 ICP-MS.
The experimental design is essentially a Round Robin Experiment (4) which consisted of creating two samples for each of the participants. One sample would be a freshly prepared batch of commercially available reef salt. We chose Red Sea Blue Bucket. The second sample would be this fresh salt mix spiked to a specific level with NIST traceable individual element standards. These Standards are the critical part of our experiment! While we can’t know exactly the values of the mixed salt, we can make additions to that mix with these Lab-Certified reference standards, and thus know with a high level of certainty the difference in the two samples. This gives us a solid basis to assess the accuracy of the ICP results and our own hobby test methods.
The primary performance criterion for the evaluation of the participant’s measurements would be what percentage of the spiked element they detected. For example, let’s say I measured the Calcium level in the “un-spiked” sample to be 350ppm and the spiked sample to be 480. Then 480-350 would be the 130ppm calculated spiked level. The actual level is 150ppm. The percent detected would then be (130/150) x 100 = 87%
METHODS
EXPERIMENTAL SETUP
5000 mL of saltwater from RED SEA salt was mixed to 35ppt as measured by Hanna HI98319 Salinity Tester calibrated with Hanna NIST traceable Calibration Solution. Water sample was stored for 24 hours at 26⁰C.
Sample was split into two 2000 mL samples using a volumetric flask
Sample #1 was the UN-Spiked sample (SALINITY 34.8ppt pH 8.0)
Sample # 2 was spiked using NIST Traceable element standards from Inorganic Ventures (1) according to the following table: (SALINITY 34.8 ppt pH 7.8)
- 500mL to @Dan_P
- 500mL to @taricha
- 500mL to @Rick Mathew
- 500mL split between the following vendors:
- Triton ICP-OES
- ATI-ICP-OES
- Fauna Marin ICP-OES
- ICP Analysis ICP-MS
- Oceamo ICP-MS
- The testing protocol:
- Chemical testers
- Test their samples immediately upon arrival ≈ 3 days
- Run 3 replicas for each element.
- Store the remaining samples for 7 additional days and retest with 3 replicas.
- Record data in data template.
- Test Kits Used
ICP testers
- Tested samples according to their individual protocols
- Report results
- Time from sampling date to test results date averaged 9.5 days. Longest was 13 and shortest was 7. One sample got incorrectly forwarded by the post office and took 33 days. (Fauna Marin)
ROUND ROBIN (4) EXPERIMENTAL FLOW CHART
RESULTS
As you can imagine there is a massive amount of data that goes with this project. We will not burden you with all of that information but if you are interested we would be happy to provide you the details. Most of the results will be presented in the form of charts and graphs with a brief explanation following each. The charts will show the results of the percent level of detection for each of the testers. Each chart will be a different element. For the chemical test data we have a lot more data points to do some statistical analysis. So we will look at standard deviations, confidence intervals (Margin of Error) and Relative Accuracy for each of the measurements. These can be seen in Tables 1-3. However, with the ICP data we only have one measurement per sample per vendor which reduces the statistical tools available. For the ICP we will focus on the percentage of the known spiked amounts that were detected. So let’s have at it!
MEASUREMENT RESULTS BY ELEMENT
Each chart will indicate the percent of the spiked amount detected for each element. The charts show the results of each individual ICP Vendor as well as the average of all the vendors and 3 Hobby Grade Chemical Testers and the Average results of their tests. The error bar (I) on the Hobby Grade Test is a calculated error based on the average standard deviation of the measurements as a percent of the level of spiking. This effectively distinguishes between instances where the ICP vendors achieved results comparable to those obtained from a chemical kit, and instances where they faced significant challenges outside the range of variations observed in our chemical tests.
ICP testing detects between 80-85% of the spiked amount (except one vendor) with a standard deviation of about 41ppm The hobby grade tests (except Tester #3) measures 90+% of the spiked amount giving a slightly better performance with an overall standard deviation of 13ppm.
A note on the Hobby Grade Tests used in this experiment:
There are hobby kits available, such as the Red Sea and Hach used in this experiment, that are equally accurate or possibly even better than ICP. However, it needs to be noted that the Hanna HI-758 also used in this experiment under-measures for values above 400ppm. However its precision (consistency) is very good which allows for a correction using regression. Rick’s post in March 2023 discusses the details of this issue. (3e)
The data used in this experiment is the corrected data.
There is a high degree of variability between ICP vendors reported measurements from 22% to 82% the standard deviation of the measurements is .064ppm. The two chemical tests with a standard deviation of .021ppm look to be better performers at 99 & 109% detection levels.
For iodine we again see a wide degree of variation in the ICP measurements, with a standard deviation of 0.042 ppm, whereas the hobby grade chemical testers have much lower variability, with a standard deviation of 0.009 ppm.
The ICP measurements for the most part did not detect the iron, and what was detected did not reflect the actual spiked amount, whereas the hobby grade chemical test did a fair job of detecting the approximate scale of the spike, though with large variability. Does this across-the-board non-measurement reflect a failure of ICP detection, or loss in storage? In our previous work we found it likely that ICP vendors can measure Fe in the 10 ppb range consistently (3e), and Rick's sample storage work has demonstrated that Fe is an element with a propensity to loss during storage, a loss that could be mitigated, but not entirely halted, by use of nitric acid. (3a) Thus we believe this discrepancy between chemical and ICP methods almost entirely represents loss in the sample, and that the ICP vendors' methods do not preserve the Fe present in water samples. For this reason it is unlikely that their measurements would reflect the amounts actually in aquarium water.
For some, this magnesium result might be a confusing and puzzling outcome; however, in many past studies (3a-3d) (see Appendix Charts 1 & 2) it was noted that ICP vendors varied widely in their measurement of magnesium, both relative to each other and compared to hobby grade tests. (5) From this data, the hobby grade chemical test used here outperformed the ICP tests in detecting the spiked amount.
From previous work (3a) it is known that the storage (transit) time of a sample will impact the results of the phosphorus measurement due to biological activity. It should be noted that this water sample was not tank water but freshly prepared salt mix, which should have a much lower level of bio-activity, so the expectation would be that the ICP tests would be more reflective of the spiked amount. The chemical test results lost about 20 percentage points (100% to 80%) from the initial measurement over the 7 day storage time, with the initial measurements being very close to the spiked amount. Additionally, the one vendor that adds a stabilizer to the phosphorus sample and uses a chemical test instead of an ICP measurement (Oceamo) got a result generally within the error bars of the hobby chemical test results. All these observations point toward the instability of phosphorus in the samples as a major factor in the difference between hobby chemical test kit and ICP results. (Separate from the instability issue, phosphorus is also simply a challenging element for the ICP-OES measurement protocol.)
Potassium should theoretically be one of the easiest elements to measure by ICP, and this turns out to be the case: the ICP vendors (with one exception) perform as well as or better than the hobby grade chemical testers. The results indicate that both methods would yield reasonably good measurements of potassium, each doing a good job of capturing the majority of the 60 ppm spike.
From the above data, the ICP tests and chemical tests both generally did a good job of measuring the silicon addition. (Note: Chem. Tester #1's silica kit performs close to the other chemical testers in all other cases, so the outlier here suggests a likely sample contamination.)
The chart below consolidates all the above data into the averages of both the ICP testers and the hobby grade testers.
On some of the elements, both the hobby test kits and the ICP vendor averages seem generally good at capturing the amount spiked, but overall the hobby tester average was closer on 5 of the 8 elements. This chart shouldn’t be interpreted to paint too rosy an impression. The hobbyist, of course, must operate without the benefit of consensus over multiple testers and methods, relying on individual results. Those individual results as we’ve seen in the earlier charts had larger variation that gets somewhat washed away in the big-picture average.
The results illustrated in the above charts clearly point to the hobby grade testing giving on average equivalent or better results than the ICP testing in this experiment.
HOBBY GRADE TESTER’S STATISTICS
Below are Tables 1-3 containing the individual Hobby Grade Testers' statistics by test method for both the initial sample measurement and the stored sample measurement. (See Appendix for term definitions) (6)
DISCUSSION
We set out to test the idea that standard hobby test kits could produce results comparable to those obtained from ICP test vendors. To evaluate this proposition we spiked a saltwater sample with a number of elements via certified single-element standards and evaluated the ICP and hobby testers on their ability to detect this known spike. The objective for each tester was to determine by how much the two samples differed in the eight spiked elements. The above charts and data tell a very interesting story about the relationship between ICP testing and the typical hobby grade chemical test kits for these eight analytes. The performance chart below provides a high-level summary of all these results.
- A plus (+) indicates the results of the individual chemical tester was equivalent to or outperformed the majority of the individual ICP testers.
- A minus (-) indicates the individual chemical tester gave worse results than the majority of the ICP testers.
“Commonly available Hobby Grade Testing Kits when used correctly and with good laboratory practices can provide equally accurate and precise results as compared to ICP testing for these selected elements.” In some cases in this experiment the Hobby Grade Kits did better!
In short, if you have a well-respected hobby chemical test kit that performs consistently for you, there is no reason for ICP measurements to be given precedence over your chemical test results. The data suggests that would be counterproductive more often than not.
FIT FOR USE---WHICH METHOD OR VENDOR DO I USE?
We often hear about trusting a method or vendor in discussions about which to choose. (SEE HERE (3g)) Trust is not a reliable way to determine which test kit or vendor fits your testing needs. A more objective approach is to figure out which test kit or vendor delivers the testing accuracy and precision that enables you to make decisions about dosing amounts, or about whether there are contaminants in the water. A test that enables decision making is known as fit for use. For example, if one is measuring the weight of scrap iron, the required level of accuracy and precision would not be the same as when measuring gold: the cost to measure a ton of scrap iron to less-than-milligram accuracy would be far greater than the value of the scrap iron. The article "How We Use ICP-OES Results of Unknown Accuracy and Precision", under the section titled "Finding the Gage Variation of a Measurement System" (3c), explains the idea of fit for use in much more detail. This is really not a trivial point, because if the required outcome is not known or defined, then there is no way to know whether any test method will be satisfactory.
WHEN TEST ACCURACY AND PRECISION ARE UNKNOWN
Our experiment provides data to help you decide whether a hobby test kit or an ICP test will be useful in making a dosing or remediation decision for Ca, Cu, I, Fe, P, Mg, K and Si. But what about the elements for which accuracy and precision are unknown? How do you respond when an ICP result indicates a depletion or increase in an element's concentration? Is it real, or a test method error?
There is no doubt that ICP measurements cover a much wider range of elements than hobby test kits, and they can provide valuable information to the reef enthusiast; the ability to measure elements for which there are no chemical test kits is an important service to the reef keeper. That being said, it would be nice to know the Margin of Error (6) (confidence interval) for each element measured. The correct way to report a measurement (8) is with a ± value or percentage attached to it; for example, 450 ppm ± 5 ppm, meaning the true value is somewhere between 445 ppm and 455 ppm. Every measurement has variability (error); this is absolutely normal. The level of variability affects what you can conclude from the measurement: if the variability is large, the confidence in a single measurement is low; if the variability is low, there is a higher level of confidence in the measurement. That does not mean it is accurate, but it is repeatable, which allows for data trending and helps with dosing decisions. None of the ICP vendors report accuracy or precision for their measurements. ATI provides their limits of detection and quantification at the low and high ends of concentration, (9) and Triton once provided documentation of their limits of detection, (10) but that was from 2016 or earlier, has been removed from their site, and no updated version has been released. It should be noted that lower and upper limits say nothing about the accuracy of measured values. Oceamo, in February 2022 and July 2023, (7) posted their reported uncertainty values for a number of elements in a thread. This is a step in the right direction.
We echo the sentiment expressed by Sanjay Joshi in the recent Reef Builders ICP article (3h): "I think each of the ICP providers should take the onus to convince us that their tests are accurate with low levels of uncertainty and expected variability." It may be noted that our data, as well as Sanjay's and others', illustrates why this hope is unlikely to come to pass. The results of some ICP vendors do not hold up to hobbyists' expectations when compared to certified standards, so those ICP vendors are certainly unlikely to release such data. In light of this, and since Oceamo has demonstrated that such analysis can be done and released, perhaps hobbyists should take the perspective that in a hobby with multiple vendors offering ICP services with claims of accuracy, those vendors who do not demonstrate their accuracy with data are likely unable or unwilling to do so.
Since such publicly released self-audits are currently not widely done, the only way to assess accuracy that we are aware of is to conduct a validation experiment in which a series of known standards are prepared and measured multiple times (replicates), which our previous experiment attempted to tackle (3c) in the Gage R&R section.
CONCLUSIONS & LEARNING
FITNESS FOR USE
An important fact to keep in mind is that the outcomes of the above experiment are not judgments of a "good" or "bad" test; they only look at the accuracy and precision of a test method. The question of good or bad is a "fitness for use" question, which is generally related to the requirements of the parameter being tested and the range of values the tester considers acceptable, also called the control limits. (SEE THE FIT FOR USE SECTION ABOVE)
TRENDING TO DETERMINE DOSING LEVELS
The above chart shows the unspiked and spiked values for Mg as measured by the chemical tests (left) and ICP vendors (right). Note that within the chemical testers there was very good agreement as to the "trend" or the size of the spike (within 14ppm) even though there was a somewhat larger disagreement (55ppm) about what the actual value of Mg was.
Within the ICP vendors, the same holds true, though at a larger scale of variation. The ICP vendors disagreed about the actual value of Mg by 150-200ppm, and yet 4 out of 5 were still able to detect the "trend" of a spike of 100ppm. The practical application is that for a hobbyist, you can get better quality data for the question "is my Mg stable?" than you can for the question "is my Mg = 1300?"
KEY FINDINGS
- It appears from the outcomes of this experiment that a well-performed chemical test for the elements included in this work can yield equally accurate and precise measurements when compared to ICP.
- Magnesium appears to be a challenge for ICP measurements.
- The data supports "trending": determining dosing requirements by comparing a previous measurement to the current one, rather than shooting for an "absolute value".
- The method of "spiked vs. un-spiked" measurements is useful in evaluating testing methods.
- Measurement variability between vendors for some elements is high.
- Vendors in general are able to detect the presence of the spiked element, but are not quite as good at quantifying the spiked amount.
GENERAL COMMENTS
This project has taken considerable effort and resources by @Dan_P, @taricha and myself, but we think there is value in doing this type of experimentation to help better define relevant factors in water testing for the reefing community. It is our hope that you find this information valuable and useful in your efforts to maintain the health of your reef system. In the APPENDIX we have provided some additional information that you might find helpful. There is a lot more that could be said, as well as a more in-depth analysis of all of this data, but we will save that for another day. This is probably enough to digest for now.
APPENDIX
SOME REFERENCE LINKS
1) Inorganic Ventures
https://www.inorganicventures.com/products/
https://www.txscientific.com/new-products-c34.aspx
2) AN ACCURATE AND PRECISE METHOD FOR MEASURING IODINE (Red Sea Special Method)
https://www.reef2reef.com/ams/an-accurate-and-precise-method-for-measuring-iodine.909/
3) REFERENCES TO PREVIOUS ICP COMPARISONS
a. STORAGE EXPERIMENT---SEE ICP REPLICATION EXPERIMENT OBSERVATION SECTION
https://www.reef2reef.com/ams/sample-storage-and-its-impact-on-measurement-results-part-3.800/
b. GETTING IT RIGHT #4 INSTRUMENTAL TESTING---SEE ICP TESTING SECTION
https://www.reef2reef.com/ams/part-4-getting-it-right-colorimetric-instrumental-testing-methods-digital.748/
c. HOW WE USE ICP-OES RESULTS ARTICLE
https://www.reef2reef.com/ams/how-we-use-icp-oes-results-of-unknown-accuracy-and-precision.862/
d. FRITZ WORK
https://fritzaquatics.com/assets/files/uploads/ICP_TES.pdf
WATER QUALITY MONITORING PAPER FROM WHO https://apps.who.int/iris/bitstream/handle/10665/41851/0419217304_eng.pdf?sequence=1&isAllowed=y
e. HI-758 CALCIUM CHECKER ACCURACY AND PRECISION POST
https://www.reef2reef.com/threads/eggs-over-easy-what-hanna-checkers-do-you-use.971288/page-4#post-11201128
f. RICHARD ROSS TRITON ICP ANALYSIS
https://reefs.com/magazine/skeptical-reefkeeping-12/
g. ICP TEST RESULTS VS. HOBBY GRADE R2R POST
https://www.reef2reef.com/threads/icp-test-results-vs-hobby-grade.733197/
h. Sanjay Joshi - The Reef Builders’ Big ICP Test Review
https://reefbuilders.com/2023/07/12/the-reef-builders-big-icp-test-review-by-sanjay-joshi/
4) OUTLINE OF ROUND ROBIN EXPERIMENT
If different methods are to be used in a round-robin experiment for a chemical test, the goal remains the same: to assess the agreement and variability among the results obtained from different laboratories or analysts. However, in this case, each participant would apply a different testing method to the same set of samples, allowing for a comparison of the performance and accuracy of the different methods.
Here's an adapted approach for a round-robin experiment involving different methods in a chemical test:
1. Selection of Participants: Similar to the previous scenario, multiple laboratories or analysts with expertise in the field are chosen to participate in the round-robin experiment. Each participant should have a specific testing method they will apply.
2. Sample Preparation: A set of identical samples is prepared, representing the target analyte or substance to be tested using qualified reference standards. The samples are appropriately labeled to ensure consistent identification throughout the experiment.
3. Sample Distribution: The samples are distributed among the participating laboratories or analysts. Each participant receives the same set of samples to analyze using their respective testing method.
4. Testing Procedure: Each participant performs the test or measurement on the assigned samples using their specific testing method. The methods should be followed precisely, adhering to the established protocols and procedures.
5. Data Collection: Each laboratory or analyst records and reports their results for the samples they analyzed, along with any relevant qualitative observations. The data should include the measurements or outcomes generated by each testing method.
6. Data Analysis: The collected data from all participants are compiled and analyzed to assess the agreement and variability among the results obtained using different methods. Statistical techniques, such as comparing means, evaluating standard deviations, or performing regression analyses, can be employed to quantify the differences and similarities between the methods.
7. Result Interpretation: The results obtained from the round-robin experiment are interpreted to compare the performance and accuracy of the different testing methods. Key parameters, such as precision, accuracy, sensitivity, specificity, or any other relevant metrics, can be evaluated to determine the strengths and limitations of each method.
By conducting a round-robin experiment with different methods, researchers can gain insights into the variability and reliability of different testing approaches for the same analyte or substance. This information can help guide decisions about which methods are most suitable for specific applications, provide opportunities for method improvement, and contribute to the development of standardized procedures within the field.
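The data-analysis step (6) above can be sketched in a few lines of Python. The labs and readings below are hypothetical, purely to show the shape of the comparison:

```python
# Minimal sketch of round-robin data analysis: compare the mean and spread
# of each participant's results on the same sample set. All readings are
# HYPOTHETICAL illustrations.
import statistics

results_ppm = {
    "Lab A": [448, 452, 450],
    "Lab B": [455, 460, 458],
    "Lab C": [430, 425, 435],
}

for lab, readings in results_ppm.items():
    mean = statistics.mean(readings)
    sd = statistics.stdev(readings)
    print(f"{lab}: mean = {mean:.1f} ppm, sd = {sd:.1f} ppm")
```

Between-lab disagreement in the means versus within-lab standard deviations is exactly the agreement-vs-variability comparison the round-robin is designed to expose.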
5) MAGNESIUM TEST ADJUSTMENT TO COMPENSATE FOR HIGH LEVEL OF CALCIUM
For the spiked sample, the level of calcium was above the recommended limit for the chemical magnesium tests (>500 ppm calcium). The sample was diluted by a specific amount to bring calcium within the acceptable range, the test was run, and the result was back-calculated to get the magnesium measurement.
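As a sketch (with hypothetical volumes and readings), the dilute-and-back-calculate arithmetic looks like this:

```python
# Sketch of the dilution workaround: dilute until calcium is below the
# kit's ~500 ppm limit, test, then scale the magnesium reading back up
# by the dilution factor. Numbers are HYPOTHETICAL.

def back_calculate(measured_ppm: float, sample_ml: float, total_ml: float) -> float:
    """Scale a diluted-sample reading back to the original concentration."""
    dilution_factor = total_ml / sample_ml
    return measured_ppm * dilution_factor

# Example: 50 mL of sample diluted to 100 mL (2x); diluted Mg reads 690 ppm.
print(back_calculate(690, 50, 100))  # 1380.0 ppm in the original sample
```

Note that any error in the diluted reading is multiplied by the same factor, so dilution should use accurate volumetric measurement (and reagent-grade or RO/DI water).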
6) STATISTICAL TERM DEFINITIONS
Standard Deviation of the measurements
The standard deviation of a measurement is a statistical measure that quantifies the amount of variation or dispersion within a set of data points. It provides information about how spread out the values are from the mean, or average, value.
Margin of error of the measurement
The margin of error provides a range of values around an estimated parameter that reflects the likely variability in the population. It helps to assess the precision and reliability of the measurement and allows for a more accurate interpretation of the results. In this case it is calculated using the Confidence function in Excel at a 95% confidence level.
Relative Accuracy
The term "relative accuracy" does not have a universally standardized definition in statistics or measurement science. However, it can be understood as a measure of how close a measurement or estimate is to the true value or target value, relative to a given reference or benchmark. A relative accuracy of 98% would indicate the measurement is within 2% of the known value.
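As an illustration, here is a Python sketch computing all three statistics on a hypothetical set of replicate readings. The margin of error mirrors the normal-distribution formula behind Excel's CONFIDENCE function at 95%:

```python
# Sketch of the three table statistics on HYPOTHETICAL replicate readings.
import statistics
from math import sqrt

readings = [445, 452, 448, 450, 455]   # hypothetical replicate measurements (ppm)
true_value = 450.0                      # known spiked/standard value (ppm)

sd = statistics.stdev(readings)         # standard deviation of the measurements

# Margin of error at 95% confidence (normal distribution, like Excel's CONFIDENCE):
z = statistics.NormalDist().inv_cdf(0.975)   # ~1.96
margin = z * sd / sqrt(len(readings))

# Relative accuracy: closeness of the mean to the known value, as a percent.
mean = statistics.mean(readings)
relative_accuracy = 100.0 * (1 - abs(mean - true_value) / true_value)

print(f"mean = {mean} ppm ± {margin:.1f} ppm, relative accuracy = {relative_accuracy:.1f}%")
```

Reporting the result as "mean ± margin" is exactly the format recommended in reference (8) above.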
7) OCEAMO UNCERTAINTY THREADS
https://www.reef2reef.com/threads/i...seawater-certified-reference-material.891657/
https://www.reef2reef.com/threads/c...ility-of-icp-ms-seawater-measurements.997436/
8) CORRECT WAY TO REPORT MEASUREMENTS
https://www.sigmamagic.com/blogs/measurement-uncertainty/
9) ATI LIMITS OF DETECTION & QUANTIFICATION
https://atiaquaristik.com/en/?page_id=1422