Whenever we are doing analytical work, we usually want to use statistics based on normal distributions. This applies not only to the actual analysis, but also to the sampling protocols. We always hope to be in a situation where the value of interest (usually a mean of several analyses) comes with a relative standard deviation (again estimated from several independent analyses) that is small. What we would like to see is a plot of analytical results that looks something like Figure 1.
Figure 1 was created using Excel's NORM.INV function set for a mean of 100 and a standard deviation of 5. The RAND() function was used to create 20 random points, which were plotted as a histogram using the Excel Data Analysis add-in. When the mean and standard deviation of the "random data" were calculated, I obtained 101.7 and 6.15 respectively, giving a relative standard deviation of 6.04%. This would be typical of many analytical sampling and analysis protocols.
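For readers who prefer to reproduce this outside of Excel, here is a minimal Python sketch of the same simulation (numpy is my choice here; the original was done entirely in Excel):

```python
import numpy as np

# Mirror Excel's NORM.INV(RAND(), 100, 5): draw 20 values from a normal
# distribution with a mean of 100 and a standard deviation of 5.
rng = np.random.default_rng()
data = rng.normal(loc=100, scale=5, size=20)

mean = data.mean()
std = data.std(ddof=1)       # sample standard deviation, like Excel's STDEV.S
rsd = std / mean * 100       # relative standard deviation, in percent

print(f"mean = {mean:.1f}, std dev = {std:.2f}, RSD = {rsd:.2f}%")
# Results wander a bit with only 20 points; the Excel run described in the
# text happened to give 101.7, 6.15 and about 6%.
```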
Of course, things never seem to work out as well as the "example" above. Even when our sampling and analytical protocols are in control, we occasionally come across difficulties. This is especially true with trace analysis or the analysis of samples that contain small numbers of particles of interest. When this happens, we could be looking at Poisson processes. This paper is a simplified look at the topic, intended to help data users recognize the issues.
A Poisson process is any process that follows a Poisson distribution (wasn't that helpful!). A classic example is the number of customers arriving per minute at Starbucks. That is, it is a count of random events (like customers coming through the door per unit time) that are "rare" (as in "not too many"). If we are looking at a handful of arrivals per minute, then the Poisson distribution is a good model. If there are hundreds of thousands of independent arrivals per minute, then the Poisson distribution is usually not a good model. A graph of a Poisson process might look like Figure 2.
In Figure 2 we simulate the number of customers arriving at Starbucks using a Poisson distribution. We assume that the average over our experiment is 3.7 customers per minute. We would expect to count 3 arrivals per minute about 21% of the time (that is what is meant by "Frequency"). We would expect to count either 0 or 8 arrivals per minute about 2% of the time each. This "example" was generated in Excel using the POISSON.DIST function. We can estimate the mean by:

λ = Σ i × pᵢ (summing over i = 0, 1, 2, …)
Where:
λ = the mean
i = the number of arrivals
pᵢ = the Poisson probability of i arrivals
We can estimate the variance by:

σ² = Σ (i − λ)² × pᵢ (summing over i = 0, 1, 2, …)
When we do these calculations on our simulated data for i = 0 to 10, we find that the mean works out very close to 3.7 (actually 3.63) and the variance is very nearly the same number (also 3.63). This is typical of a Poisson distribution: the mean always equals the variance. Hence, the standard deviation is the square root of the mean. And finally, the relative standard deviation is:

RSD = σ/λ = √λ/λ = 1/√λ = λ^(-0.5)
Hence, when we have a limited number of counts, the relative standard deviation of the counts is simply the inverse of the square root of the mean. When the number of counts is low, the relative standard deviation is high. In our example the RSD is nearly 52%.
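As a cross-check of the Excel POISSON.DIST calculation, a short Python sketch (numpy and scipy are assumptions on my part, not part of the original spreadsheet) that rebuilds the Figure 2 probabilities and the truncated mean, variance and RSD for i = 0 to 10 might look like this:

```python
import numpy as np
from scipy.stats import poisson

lam = 3.7                  # average arrivals per minute
i = np.arange(0, 11)       # i = 0 to 10, as in the text
p = poisson.pmf(i, lam)    # same values as Excel's POISSON.DIST(i, 3.7, FALSE)

print(f"P(3 arrivals) = {p[3]:.3f}")   # about 0.21
print(f"P(0 arrivals) = {p[0]:.3f}")   # about 0.02
print(f"P(8 arrivals) = {p[8]:.3f}")   # about 0.02

# Mean, variance and RSD estimated from the truncated distribution
mean = np.sum(i * p)
var = np.sum((i - mean) ** 2 * p)
rsd = np.sqrt(var) / mean * 100

print(f"mean = {mean:.2f}, variance = {var:.2f}, RSD = {rsd:.1f}%")
# Mean and variance land a little below 3.7 because of the truncation at
# i = 10, and the RSD comes out close to 1/sqrt(3.7), i.e. about 52%.
```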
When we are taking samples that contain only a limited number of particles of interest, the variability contributed by the varying number of particles per sample is very high, and that variability often follows a Poisson distribution. Let's consider a thought experiment.
Suppose we have a large lot that contains random particles of interest scattered throughout. Let's also assume that those particles are very uniform, with each particle contributing 10 concentration units under our sampling protocol. An example might be particles of lead, each weighing 10 µg. If we pulled 1.00 gram samples and found that, on average, we got 2.5 particles per sample, we would naturally conclude that we have 25 ppm of lead in the large lot (25 µg per gram). Because the lead occurs as discrete particles, the distribution of sample results would probably be Poisson and look something like:
Figure 3
We would have a lot of data points well away from our mean value of 25. In fact, when we model this in Excel we obtain a mean of 24.99, a variance of 249.6, a standard deviation of 15.80 and a relative standard deviation of 63%. The astute reader will note that the mean is no longer numerically equal to the variance because each particle contributes 10 units to the answer. Hence, the Poisson mean is multiplied by 10 and the variance is multiplied by 10². Interestingly, the relative standard deviation is unaltered by this kind of unit transformation; it remains a reflection of the variability in the number of particles collected. If, of course, there were also variability in the size or lead content of the particles, the overall variability would be even higher.
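The particle example can be verified with a similar sketch (again Python with scipy as a stand-in for the Excel model), scaling a Poisson count of particles, λ = 2.5 per 1.00 g sample, by the 10 concentration units each particle contributes:

```python
import numpy as np
from scipy.stats import poisson

lam = 2.5          # average number of lead particles per 1.00 g sample
units = 10.0       # concentration units (ppm) contributed by each particle

n = np.arange(0, 31)          # particle counts with non-negligible probability
p = poisson.pmf(n, lam)
result = n * units            # reported lead result (ppm) for each particle count

mean = np.sum(result * p)                  # close to 25 ppm
var = np.sum((result - mean) ** 2 * p)     # close to 250, i.e. 2.5 x 10^2
std = np.sqrt(var)                         # close to 15.8
rsd = std / mean * 100                     # about 63%, i.e. 1/sqrt(2.5)

print(f"mean = {mean:.2f} ppm, variance = {var:.1f}, "
      f"std dev = {std:.2f}, RSD = {rsd:.0f}%")
```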
Now let us suppose that the average number of particles in our samples rose to 35. We would expect our average lead result for the lot to go up to 350 ppm. Our sampling histogram would look something like:
Notice now that the histogram is starting to look like a normal distribution curve. Nevertheless, when we calculate the mean, variance, standard deviation and relative standard deviation from our model we obtain 349.98, 3501, 59.17 and 16.9% respectively. Although we seem to have a "Gaussian curve," we still have a very high relative standard deviation. Furthermore, this large variability all comes from variability in the number of particles in the samples we have been taking from the lot. The relative standard deviation is still predicted well by the inverse square root of the average number of particles: λ^(-0.5) = 35^(-0.5) = 16.9%.
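The same sketch with the average particle count raised to 35 reproduces these numbers (Python and scipy again assumed in place of the Excel model):

```python
import numpy as np
from scipy.stats import poisson

lam, units = 35.0, 10.0        # 35 particles per sample, 10 ppm per particle
n = np.arange(0, 101)          # counts with non-negligible probability
p = poisson.pmf(n, lam)
result = n * units

mean = np.sum(result * p)                  # close to 350 ppm
var = np.sum((result - mean) ** 2 * p)     # close to 3500
rsd = np.sqrt(var) / mean * 100            # about 16.9%, i.e. 35 ** -0.5

print(f"mean = {mean:.1f}, variance = {var:.0f}, RSD = {rsd:.1f}%")
```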
We could continue this process, but it should be clear by now that it takes quite a few particles in each sample for the counting error to "disappear." It turns out that the "rule of thumb" used by radiochemists is pretty good: they like to have around 10,000 counts (i.e. detected radioactive disintegrations, also a Poisson process) for every radiometric analysis. This puts the relative standard deviation from counting error alone at 1% (10,000^(-0.5) = 0.01). It is not usually hard to have 10,000 or more particles of interest in a sample unless we are doing trace analysis (well under 1%) or using very small sample sizes (milligrams, microliters, etc.). Analysts and data users should be wary when low-level analysis is being done with techniques that require very small samples: the precision of the analysis can crater as Poisson counting error begins to dominate the analytical methodology. This can become significant long before the data begin to look "non-normal" in histograms (as in the 35-particle example above).
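Since the counting RSD is simply n^(-0.5), the number of counts (or particles, or photons) needed for a target precision is easy to tabulate; a small illustrative calculation:

```python
# Counting (Poisson) RSD depends only on the expected number of counts n:
# RSD = 1 / sqrt(n), so n = 1 / RSD**2 counts are needed for a target RSD.
for target_rsd in (0.15, 0.05, 0.01):
    needed = 1.0 / target_rsd ** 2
    print(f"target RSD {target_rsd:.0%}  ->  about {needed:,.0f} counts")
# about 44 counts for 15%, 400 for 5%, and 10,000 for 1%
```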
In designing your experiments please keep in mind:
- Where particulates are important, try to ensure that well over 10,000 particles of interest are likely to be present in all samples and sub-samples (a quick way to estimate this is sketched after this list).
- Where relative standard deviations begin to exceed 15% and there are no reasonable analytical explanations (e.g. detection or control limits reached), begin to suspect Poisson counting errors. That is, too few particles, counts, photons, etc., are being collected, generated or detected to ensure low counting error.
- Remember that many data analysis schemes (sampling protocols, confidence limits, control charting, etc.) assume normally distributed data. If Poisson processes are in play, many of these schemes cease to be reasonable methods for data generation and interpretation, and very large errors could be in the offing.
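As a rough way of checking the first bullet, here is a small sketch that estimates the expected particle count in a sample and the counting RSD it implies; the helper function and numbers are hypothetical, chosen to match the lead-particle example above rather than taken from the original:

```python
# Rough design check, following the lead-particle example in the text.
# The function and numbers are illustrative, not part of the original article.
def particles_per_sample(conc_ug_per_g, particle_mass_ug, sample_mass_g):
    """Expected number of analyte particles in one sample."""
    return conc_ug_per_g * sample_mass_g / particle_mass_ug

n = particles_per_sample(conc_ug_per_g=25, particle_mass_ug=10, sample_mass_g=1.0)
rsd = n ** -0.5 * 100
print(f"about {n:.1f} particles per sample -> counting RSD of about {rsd:.0f}%")
# 2.5 particles per sample -> about 63% RSD from counting alone. To reach the
# 10,000-particle guideline at 25 µg/g with 10 µg particles, the sample would
# have to be roughly 4 kg.
```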