The Center for Diagnostics and Telemedicine has introduced an innovative method for testing artificial intelligence (AI) systems in healthcare that enables faster and more accurate assessment of their reliability. The method is expected to streamline the integration of AI technologies into clinical practice across Moscow. Researchers at the Center have established the optimal number of studies required to objectively evaluate AI accuracy, significantly reducing the time and resources needed for validation. The approach has already proven effective in radiology and can be adapted to other medical fields.
“For years, Moscow has led the way in applying AI in medicine, and this new method marks a major step forward,” said Yuri Vasilev, Chief Consultant for Diagnostic Imaging of the Moscow Health Care Department. “Previously, there was no clear guideline on the number of studies needed for objective AI testing, so testing often required extensive samples and considerable resources. Now we can determine the precise number of medical studies needed to ensure accuracy, which allows AI tools to be adapted more quickly and used more effectively by clinicians. We are confident that this approach will enhance the accuracy and safety of artificial intelligence as a tool for healthcare professionals and patients, improving diagnostic quality and facilitating earlier detection of diseases.”
The methodology determines the minimum number of medical studies needed for a reliable evaluation of medical AI systems. After analyzing more than 2 million test sample variants, the researchers showed that an objective assessment of binary classification algorithms, such as those that detect pathologies in images, requires at least 400 studies, with each class (including the studies that show pathology) making up at least 10% of the sample. Increasing the sample size beyond this threshold does not change the results, which makes the technique highly efficient.
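The selection rule described above can be expressed as a simple pre-check on a labelled test set. The sketch below is a hypothetical helper, not code from the study: the function name `check_test_sample` and the convention that label 1 marks a study with pathology are assumptions, and only the two thresholds (at least 400 studies in total, at least 10% per class) come from the article.

```python
from collections import Counter

# Thresholds reported in the article for testing binary-classification AI.
MIN_TOTAL_STUDIES = 400
MIN_CLASS_PERCENT = 10  # each class must make up at least 10% of the sample


def check_test_sample(labels):
    """Return a list of problems with a binary test set (empty list = criteria met).

    labels: iterable of ground-truth labels, 1 = pathology present, 0 = absent.
    Hypothetical helper for illustration; not code from the study.
    """
    counts = Counter(labels)
    if not set(counts) <= {0, 1}:
        raise ValueError("labels must be 0 or 1")
    total = counts[0] + counts[1]
    problems = []
    if total < MIN_TOTAL_STUDIES:
        problems.append(f"only {total} studies; at least {MIN_TOTAL_STUDIES} needed")
    for cls, name in ((1, "with pathology"), (0, "without pathology")):
        # Integer comparison avoids floating-point rounding at the 10% boundary.
        if 100 * counts[cls] < MIN_CLASS_PERCENT * total:
            problems.append(
                f"{counts[cls]} studies {name} is below {MIN_CLASS_PERCENT}% of {total}"
            )
    return problems


# 360 normal studies plus 40 with pathology meets both thresholds exactly.
print(check_test_sample([0] * 360 + [1] * 40))  # []
# Only 30 pathology cases out of 400 falls short of the 10% requirement.
print(check_test_sample([0] * 370 + [1] * 30))  # ['30 studies with pathology is below 10% of 400']
```

The check only verifies sample composition; it says nothing about how the AI system performs on that sample.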
Although it was validated in radiology, the approach can be applied to other medical AI systems that perform binary classification. The findings will make it easier to test and implement artificial intelligence in medicine and will help ensure its accuracy and reliability.
“Traditional testing approaches couldn’t definitively determine the sample size needed to verify AI performance,” Vasilev added. “This new technique provides a stable, universal framework regardless of the image type or AI model used, and it will make the integration of AI into clinical practice faster and more reliable. As AI tackles increasingly complex medical tasks, scientists at the Center for Diagnostics and Telemedicine proposed an alternative approach. They analyzed more than 2 million combinations of test sample parameters and 25,000 medical images, examined diagnostic metrics, and proved that at least 400 studies are necessary for stable results. Each class should make up at least 10% of the sample (40 studies in a 400-study sample), and increasing the sample size further does not affect the final accuracy. The findings are independent of the medical image type or neural network, which makes the technique universal. Although it was validated in radiological diagnostics, the approach can be scaled to other medical AI systems with binary classification, which will be the next stage of research.”
The findings are detailed in the paper “Empirical Approach to Sample Size Estimation for Testing of AI Algorithms,” which has received positive reviews from the Russian Academy of Sciences and won the AI Journey competition. The methodology is based on extensive empirical data and offers a robust alternative to traditional sample size calculations, which are often unsuitable for binary classification AI models.
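For contrast, one widely used analytical approach sizes a diagnostic-accuracy test set in advance from an assumed sensitivity, a desired precision, and the disease prevalence (Buderer's normal-approximation formula). The article does not say which traditional calculations it compares against, so this is only an illustration of the kind of method meant; the function name `buderer_total_sample_size` and the example inputs are assumptions.

```python
import math


def buderer_total_sample_size(expected_sensitivity, margin, prevalence, z=1.96):
    """Total number of studies needed to estimate sensitivity within +/- margin.

    Normal-approximation formula (Buderer, 1996); z = 1.96 gives a 95% confidence level.
    Illustrative only: this is NOT the empirical method described in the article.
    """
    if not (0 < expected_sensitivity < 1 and 0 < margin < 1 and 0 < prevalence <= 1):
        raise ValueError("sensitivity, margin and prevalence must be proportions")
    # Number of studies with pathology needed for the requested precision.
    positives = z ** 2 * expected_sensitivity * (1 - expected_sensitivity) / margin ** 2
    # Scale up by prevalence to get the total sample, then round up.
    return math.ceil(positives / prevalence)


# Hypothetical inputs: expected sensitivity 0.90, precision +/-0.05, 10% prevalence.
print(buderer_total_sample_size(0.90, 0.05, 0.10))  # 1383
```

The result depends on values that must be guessed before testing: with the same sensitivity and precision but 50% prevalence, the formula gives 277 studies instead of 1383.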
Since 2020, the Center for Diagnostics and Telemedicine has led the world’s largest prospective clinical study on the implementation of computer vision for medical image analysis. The Center continues to develop original methodologies for comprehensive AI assessment and practical integration into Moscow’s health system.
Founded in 1996, the Center is a leading institution under the Moscow Healthcare Department, focusing on the advancement of AI in medicine, the development of radiology, research, and medical education. The new methodology aligns with the Moscow Healthcare Development Strategy 2030, which aims to improve the quality and accessibility of care for city residents.