Your new analysis is a mixed-method evaluation of AI in chest diagnostics. Why did you look at this subject and how did you carry out the evaluation?
Angus: AI is seen by the UK government and NHS leaders as one of the key solutions to many pressures and challenges faced by the NHS. There are several anticipated benefits, in terms of efficiency and effectiveness. At the same time, there's limited evidence on how AI is selected, put into action, and the impact and outcomes that it has. It’s also been unclear how it's experienced by key stakeholder groups, including NHS staff and patients.
RSET was commissioned to evaluate the AI Diagnostic Fund, a major NHS England programme, which aimed to introduce AI tools to support chest diagnostics in roughly half the NHS trusts in England. This represented a great opportunity for us to help address some of these gaps in knowledge. So we did a rapid project in two phases. The first of these comprised a review of the evidence base around AI in radiology diagnostics, as well as a mixed-methods empirical study focusing on the procurement of AI tools and preparation for their deployment.
We then moved on to the second phase, which was fully empirical and drew together a mixed-methods design: qualitative methods to look at implementation, usage and experiences; a quantitative study of the impact on service delivery from different perspectives, using national and local datasets; and a health economic analysis that drew on the same data and some bespoke local data from participating sites.
It’s important to note that no clinical decisions were being made without human input. There was some prioritising of cases based on AI, but there wasn't any autonomous AI decision-making.
Chris: This was an evaluation of a real-life deployment. As we found with our systematic review, there were hardly any real-life deployments being evaluated beforehand – a lot of it is done in controlled environments. To find out how it’s going in real-life applications is really important.
Kevin: It gives a real-world understanding of what could happen when a rollout of this kind occurs.
What were the main findings from the work?
Chris: The impact of the AI did depend on the purpose for which it was being used in the different hospitals. All of the hospitals we looked at were using it to support clinical decision-making. The AI would analyse the x-ray image after it was taken and rank it, assessing whether it could be considered potentially abnormal or not. If it was possibly abnormal, it would be prioritised to the front of the queue for the human readers, who would look at the images and then make a decision.
Where we were able to measure it, about 90% of images found to be of high suspicion of cancer were prioritised by the AI. We also found that prioritisation does mean suspected cancers are more likely to be turned around within 24 hours than less urgent images.
The results were, however, quite inconsistent across the trusts we were looking at. Some trusts noted an increase in the numbers of people being followed up, some showed a decrease. It wasn't clear why that might be, but it could relate to existing capacity backlogs, any changes being implemented to mitigate risks with the AI, and different resourcing between the trusts.
Angus: We found that a lot more work was involved in getting AI tools into use than perhaps had been anticipated. There was also a lot of variation in how AI was put into action. That variation included the people who were using the AI tools – some sites just had radiology or radiography teams working with it, while others had a wider range of clinicians using the tool. Given the work involved in implementing AI, we think it is important that service leaders start with the problem, rather than the solution – that is, ensuring that AI tools are well suited to addressing issues faced by their services.
Another important finding from our work is that generally both staff and patients were quite positive overall about the AI. They saw potential benefits in terms of efficiency. Staff found the tools valuable, particularly around prioritisation and clinical decision-making. It was seen as a source of reassurance and a second pair of eyes when going through their own checks.
But in the event of autonomous AI coming in, staff and patients felt there would be a need for strong monitoring and governance of AI tool performance – to make sure that it was not missing cases and to have the processes in place to manage any errors in the future.
Does using AI in this way save money?
Kevin: AI was found to be cost-effective, in that the overall costs were lower and effectiveness in terms of health outcomes was increased compared with the comparator (the period immediately prior to deployment). This represents an almost ideal situation. However, it is very important to note that although consistent in our analyses, the magnitude of health outcome gains in terms of quality-adjusted life years (QALYs) was negligible.
The implementation costs, in terms of the scale of costs for a radiology department in any given trust, were actually quite modest. But that finding is in the context of trusts or departments which may be under financial pressures or have limitations in terms of staffing and resourcing. We were also aware of notable gaps in reported implementation costs due to the retrospective collection of data.
In terms of who was involved in those activities, local staff were often seconded into project management roles (local data management or IT specialists, or even clinicians in some instances). They were taking that on in addition to their regular duties or even being completely pulled away from those. In departments under pressure, that can be an issue. Having non-specialists in a complex delivery programme like implementing AI brought its own challenges, and in some instances actually delayed the go-live of AI.
Which factors might impact on how effective it is for a trust to use this technology?
Chris: Some trusts are poorly equipped to monitor impact, which is a factor. We also found better outcomes in trusts where there didn't seem to be major problems with capacity for CT scans. If you do have big problems with your CT capacity, effectiveness might be different. If everything is quite efficient and images are being turned around pretty quickly anyway, you may not see such big benefits.
But there is also a preventative aspect. While things may be working for now and you may be thinking “I may not need the AI”, if increasing pressures mean that things get worse, it may be that the AI can mitigate some of those things ahead of time.
Resourcing is another factor. Who does the reading? How is it set up within the trust? These questions had different answers across the different trusts. For example, one trust outsourced much of its reading and reporting of the images, while some trusts did a lot of that in house. Another thing is the stability of the supplier. A supplier for one of the sites went bankrupt and the trust had to stop using the tool as a result.
AI is a subject that fascinates many people. Based on your evaluation, how ready is it to be rolled out and is it effective?
Angus: Readiness needs to be thought about not just in terms of the AI tools, but also the NHS services into which they are being introduced. There are challenges around procurement, selection, AI literacy and knowledge, infrastructure variations and governance variations – all of which make implementing AI at scale quite challenging. But we can see that people on all sides of the divide – the suppliers, NHS staff and leadership – seem really up for the challenge. It's an opportunity to start strengthening the infrastructure and the capabilities and the capacity in NHS services to manage this. But there is definitely still work to do.
Chris: On effectiveness, you can see some signs of it potentially improving the process, but there are long-term impacts that are as yet untested. And some of those are the effect that it might have on the culture, training and the skilling of people.
There also has to be a big push to make sure that the right data is there so that trusts can monitor how effective and safe these tools are, because that's a bit hit and miss at the moment.
What surprised you most in the findings?
Angus: How positively staff viewed the introduction of the AI tools. There's always a risk of people feeling encroached upon by innovations that might be treading on their toes. But in practice, people generally liked the AI tools and valued them as something that sat alongside their daily practice.
Chris: What surprised me most was just how poorly equipped many trusts are to monitor impact and to have the data available to do that. It also surprised me that this was seemingly not a factor in deciding whether or not a site was selected for funding.
Kevin: Humans are not infallible. Currently AI is good, but also not infallible. When it comes to autonomous reporting, the question that then arises is how good AI has to be to be broadly accepted, not just on a regulatory basis but also in terms of public perception. There were some findings that raised interesting conceptual questions about AI: what it actually is, how it's used, and how good it has to be.
What are the next steps for the project?
Angus: We're writing up a number of papers, bringing together our analyses to share more widely and hopefully develop both national and international impact. We're also excited to be presenting our findings at a number of national and international conferences over the coming months. We have also been engaging with national exercises to help shape policy and regulation around the use of AI in health care in future.
For many of us, this was the first time that we had really engaged with evaluating AI. It's like a lot of innovations in that it's fast moving and it keeps changing, but it's another degree up. It's been a hugely valuable learning experience in how you go about evaluating a fast-moving innovation of this kind. It's going to be a terrific basis for future work in different settings.