How to evaluate the results of diagnostic studies: how do you prove that a new method is better than the gold standard?
I understand how to evaluate therapeutic studies: basically, you compare the results of two randomly assigned groups. One group gets the new drug, the other gets the placebo (or the drug against which the new drug is to be tested). Neither the physicians nor the patients know who is in which group (double-blind testing). You can then compare the outcomes and test for statistical significance (e.g. with a t-test, depending on the study design).
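To make that comparison concrete, here is a minimal sketch of such a two-group comparison; the group sizes, effect size, and outcome scale are all invented for illustration:

```python
# Sketch of a two-group comparison as described above: simulated
# outcomes for a treatment group and a placebo group, compared with
# an independent two-sample t-test. All numbers are made up.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
treatment = rng.normal(loc=5.0, scale=2.0, size=100)  # assumed treatment effect
placebo = rng.normal(loc=4.0, scale=2.0, size=100)    # assumed baseline outcome

t_stat, p_value = ttest_ind(treatment, placebo)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```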
I have a problem understanding diagnostic studies. My difficulty lies in the fact that you do not know "the truth", i.e. whether a patient really has the condition to be diagnosed. Therefore you use the so-called "gold standard", i.e. the best diagnostic method available so far. There is no problem when the new diagnostic method is worse than the gold standard. But I think there is a problem when the new diagnostic method is better than the gold standard! Why? Because wherever the two methods disagree, the new method will be scored as wrong, even in the cases where it is actually the gold standard that is mistaken.
Let me give you an example: assume a patient really has some form of cancer, but the gold standard is not able to detect it. The new method detects it, so the two tests disagree. Because the gold standard is taken as the reference, the new method gets a minus point here... although it was right in the first place!
My questions
Is my thinking correct? If yes, how do you handle this problem in practice? Is there a common term for this kind of problem? If no, where does my misunderstanding lie?
Edit
Perhaps another (more extreme) example is in order: say the gold standard has an accuracy of 50%, i.e. no better than a coin toss. If you had a new method with an accuracy of 100% (so its results and "the truth" are identical) and tested it against the gold standard, the new method would be measured at an accuracy of only 50%, which is just the agreement of the coin-toss gold standard with "the truth".
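A quick simulation makes this arithmetic visible; the disease prevalence and sample size below are arbitrary assumptions:

```python
# Simulation of the coin-toss example: a "gold standard" that matches
# the truth only 50% of the time, and a new test that always matches
# the truth. Measured against the gold standard, the perfect test
# appears to be only ~50% accurate.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
truth = rng.integers(0, 2, size=n)                       # true disease status (0/1)
gold = np.where(rng.random(n) < 0.5, truth, 1 - truth)   # 50% accurate reference
new_test = truth.copy()                                  # perfect new test

print("new test vs truth:", (new_test == truth).mean())  # -> 1.0
print("new test vs gold: ", (new_test == gold).mean())   # -> ~0.5
```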
2 Comments
The accuracy of a new diagnostic test can be determined by comparing its results with the results of a biopsy (taking a piece of tissue), performed at some point after the test, which should reliably confirm or exclude the pathology in question.
For example, when the accuracy of "multi-detector CT" in differentiating between gallbladder inflammation and cancer was tested, what the CT showed was compared with what the biopsy after gallbladder removal showed. So the new test was not compared with the "gold standard test" but with the cases of cancer proven later by biopsy.
The same process is used not only to evaluate the accuracy of new tests but also to re-evaluate existing ones. For example, it can help determine whether a certain "shadow" in an ultrasound image indicates gallstones or cancer (which can be seen after gallbladder removal).
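In such a study the biopsy result plays the role of ground truth, and the new test's performance is summarized with sensitivity and specificity rather than by agreement with the old test. A hedged sketch, with invented counts standing in for a real CT-vs-biopsy table:

```python
# Hypothetical 2x2 table: CT findings vs. biopsy-confirmed status.
# The counts below are invented for illustration only.
tp, fp = 45, 5    # CT positive: biopsy positive / biopsy negative
fn, tn = 10, 140  # CT negative: biopsy positive / biopsy negative

sensitivity = tp / (tp + fn)  # fraction of true cases the CT detects
specificity = tn / (tn + fp)  # fraction of disease-free cases correctly ruled out
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(f"sensitivity = {sensitivity:.2f}")  # ~0.82
print(f"specificity = {specificity:.2f}")  # ~0.97
print(f"accuracy    = {accuracy:.2f}")     # ~0.93
```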
Wikipedia says:
Before widespread acceptance of any new test, the former test retains its status as the "gold standard".
which means that a competitor must perform better than the current gold standard over repeated and diverse tests.
Single isolated cases like the examples you have given do contribute to the overall data, but they must also be treated as possible "needles in a haystack". Many trials are required to rule out anomalies, and a variety of statistical methods can be used to determine whether apparent anomalies are genuine exceptions to the rule.
In summary, no single test will suffice. It takes a multitude of tests over a variety of patients and conditions for a procedure to be validated.
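One standard statistical method for this situation is McNemar's test: when both tests are run on the same patients and scored against a definitive reference (e.g. biopsy), it asks whether the disagreements fall lopsidedly in favor of one test. A minimal sketch, with invented counts; the exact version reduces to a binomial test on the discordant pairs:

```python
# McNemar-style comparison of two diagnostic tests on the same
# patients, scored against a definitive reference. Counts are
# invented: only discordant pairs (where exactly one test is
# correct) carry information about which test is better.
from scipy.stats import binomtest

new_right_gold_wrong = 30  # new test correct, gold standard wrong
gold_right_new_wrong = 12  # gold standard correct, new test wrong

# Under H0 (equal accuracy), discordant pairs split 50/50,
# so test the observed split against a binomial(n, 0.5).
n_discordant = new_right_gold_wrong + gold_right_new_wrong
result = binomtest(new_right_gold_wrong, n_discordant, p=0.5)
print(f"p = {result.pvalue:.4f}")  # small p -> tests differ in accuracy
```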