Conference proceeding

Home Sweet Hospital: Evaluating Evidence Gaps and Future Research Priorities for Hospital at Home

YHEC authors: Charlotte Graham, Robert Malcolm, Lavinia Ferrante di Ruffano, Hayden Holmes, Rachael MacDonald, Nick Hex, Rachael McCool
Publication date: November 2024
Conference: ISPOR EU, Barcelona
Type of conference proceeding: Poster

Abstract

OBJECTIVES: Healthcare systems face significant strain due to growing demand, with urgent and emergency care centers particularly affected. 'Hospital at Home' (HaH) has been identified as a potential solution, allowing patients to receive acute care at home or in community settings. HaH facilitates early hospital discharge (step-down care) or prevents hospital admission (step-up care). This research examines the challenges in generating evidence for technology-enabled HaH initiatives and highlights factors for future evaluation.

METHODS: A pragmatic literature review was conducted to assess the safety, clinical effectiveness, and cost-effectiveness of HaH initiatives. Gap analysis identified priority areas for future research and issues in evidence generation. The authors leveraged their experience from an early value assessment for NICE on virtual wards for acute respiratory infection to inform the review and gap analysis.

RESULTS: Evidence, though limited, suggests HaH is potentially safe and effective. Clinical effectiveness varies by patient cohort and virtual ward model (step-up, step-down, mixed). Most studies were non-comparative or underpowered. Case studies of HaH initiatives in the NHS lacked peer review, involved small samples, and were not transparent about costs. Key issues in evaluation include variability in features between technology-enabled HaH initiatives, population and subgroup differences, and potential distortions in comparison with standard care. Future research should focus on prospective cohort studies to understand clinical and resource outcomes, impacts of different technological features, and true resource use.

CONCLUSIONS: Technology-enabled HaH initiatives are complex interventions. Preliminary evidence suggests that they may benefit healthcare system resources, but comprehensive evaluations are crucial to understand their clinical efficacy, safety, risks and costs, particularly as HaH initiatives are rapidly implemented across global healthcare systems. Future studies should determine the effectiveness of HaH initiatives across different clinical areas, identify effective features, and inform the optimal implementation and management of HaH in different settings.

Conference proceeding

Inequity in Vaccine Access: Variation in Vaccine Decision-Making Processes Across Five Countries

YHEC authors: Emily Gregg, Charlotte Graham, Karina Watts, Karin Butler, Stuart Mealing
Publication date: November 2024
Conference: ISPOR EU, Barcelona
Type of conference proceeding: Poster

Abstract

OBJECTIVES: Quick and equitable vaccine access is a global priority. However, vaccine assessment is complex, meaning that vaccine schedules vary between countries and opportunities to develop a centralized framework are limited. This heterogeneity further contributes to inequity of vaccine access. This work aims to increase awareness of the key vaccine assessment elements in EU and non-EU countries.

METHODS: Pragmatic desk-based research was conducted in June 2024 to explore the key stages involved in vaccine market access and how these differ across England, Italy, Germany, France and the United States (US). Where available, data were extracted about the stakeholders involved in vaccine appraisal, the key assessment factors and value framework considered, and the number and type of vaccines included in the national vaccination schedule.

RESULTS: National Immunization Technical Advisory Groups (NITAGs) are key stakeholders in all five countries but play different roles. In some countries the NITAG is solely responsible for vaccine appraisal (England, Germany, US), whereas in others the appraisal is conducted by the health technology assessment (HTA) body (Italy) or by the NITAG and HTA body in parallel (France). There are key differences between the vaccine assessments conducted by NITAGs and HTA bodies; for example, NITAGs consider public health impact, which HTA bodies do not. There are also differences in the value frameworks used by NITAGs in different countries; for example, only England's NITAG formally considers the disease's impact on the quality of life of carers. Consequently, the key vaccine assessment factors differ between countries, resulting in a different number of vaccines in each country's vaccination schedule.

CONCLUSIONS: Several between-country differences in vaccine market access were identified, including the role of NITAGs, vaccine assessment factors, and value frameworks. Vaccine developers should consider these results when planning market access strategies to ensure rapid and equitable vaccine access across countries.

Conference proceeding

Inferiority Complex: Challenges in Clinical Equivalence and Non-Inferiority Trials in Health Technology Assessment

YHEC authors: Matthew Taylor, Joe Goldbacher, Charlotte Graham
Publication date: November 2024
Conference: ISPOR EU, Barcelona
Type of conference proceeding: Poster

Abstract

OBJECTIVES: Non-inferiority and clinical equivalence trials can be used to determine whether a health technology is, at least, no worse than an existing treatment. There is a large body of literature and guidance on this topic, with substantial variation in definitions and practice, which can make it challenging to robustly demonstrate or assess claims of non-inferiority or clinical equivalence. This study aimed to provide actionable recommendations for the appraisal of claims of non-inferiority and clinical equivalence.

METHODS: International guidelines and published literature were reviewed to identify approaches for the conduct and reporting of non-inferiority or clinical equivalence studies. Guidelines from health technology assessment (HTA) and regulatory bodies were considered, and literature reviews from 2010 to 2023 were identified. The results of the reviews were supplemented with findings from an expert panel and synthesized to form a series of recommendations, using case studies from the National Institute for Health and Care Excellence (NICE).

RESULTS: The majority of guidelines (13/14) discussed, to varying extents, methods to determine the non-inferiority margin and how the analysis should be conducted. Despite this, the rationale for the margin was not reported in over 50% of 273 blinded randomized controlled trials published between 1966 and 2015. Evidence of non-inferiority or clinical equivalence presented in NICE Medical Technologies Evaluation Programme appraisals (for health technologies) is often of lower quality than in Technology Appraisals (for pharmaceuticals), with appraisals increasingly concluding that further evidence generation is required.

CONCLUSIONS: Despite clear guidance, the quality of reporting in non-inferiority and clinical equivalence trials is consistently poor. Prior to the presentation of trial evidence, HTA submissions that claim non-inferiority or equivalence should present the technical, biological and/or pharmacokinetic reasoning that supports the claim. HTA bodies should introduce more precise definitions of non-inferiority and clinical equivalence so that evidence standards are more likely to be met.
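
Illustrative note (not part of the abstract): the fixed-margin analysis that most guidelines describe can be sketched in a few lines. The Python example below is a hypothetical sketch only; the trial counts and the margin are invented, and it simply shows the standard check that the whole confidence interval for the treatment difference must lie above the pre-specified margin.

```python
# Hypothetical non-inferiority check on a risk difference (all numbers invented).
import math

resp_new, n_new = 140, 200    # responders / randomized, new technology
resp_ref, n_ref = 146, 200    # responders / randomized, reference treatment
margin = -0.10                # pre-specified non-inferiority margin

p_new, p_ref = resp_new / n_new, resp_ref / n_ref
diff = p_new - p_ref

# Wald standard error for the difference in proportions
se = math.sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
lower, upper = diff - 1.96 * se, diff + 1.96 * se   # two-sided 95% CI

print(f"Risk difference {diff:.3f} (95% CI {lower:.3f} to {upper:.3f})")
# Non-inferiority is concluded only if the entire CI sits above the margin
print("Non-inferior" if lower > margin else "Non-inferiority not demonstrated")
```

With these invented numbers the lower confidence limit crosses the margin, so non-inferiority would not be concluded; without a transparent rationale for the margin itself, such a conclusion is uninterpretable either way.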

Conference proceeding

Integrating Large Language Models Into an Existing Review Process: Promises and Pitfalls

YHEC authors: Mary Edwards, Lavinia Ferrante di Ruffano
Publication date: November 2024
Conference: ISPOR EU, Barcelona
Type of conference proceeding: Poster

Abstract

OBJECTIVES: The recent development and rise in accessibility of large language models (LLMs) has generated excitement around their potential for reducing the resource burden of conducting reviews. Following testing, we assessed the cost, accuracy, and accessibility of LLMs to reviewers, and considered what types of reviews LLMs are currently best suited to assist with.

METHODS: We conducted internal testing of an LLM, Claude 3 Opus, via the chat interface. We used the tool to conduct high-level data extraction for a targeted review, highly granular extraction for a systematic review, and risk of bias assessment of randomized controlled trials (RCTs).

RESULTS: The LLM via a chat interface was highly accessible, inexpensive, and saved significant time in conducting high-level qualitative data extraction for a pragmatic review. Outputs were standardized and easy to manipulate and integrate into our existing work process. Extracting accurate granular data for a systematic review proved more difficult: the model failed to interpret complexities of patient flow, struggled to respond accurately to lengthy, detailed prompts, and the subsequent checking, correcting, and formatting outweighed any time saved. The model identified some relevant content for conducting risk of bias assessment with the Cochrane RoB 1 tool, although it lacked context, and human judgement was needed for final decision making.

CONCLUSIONS: LLM chat interfaces offer significant time savings for pragmatic reviews, although copyright issues exist in uploading published papers for synthesis. Optimal performance for systematic reviews is unlikely to be achieved without fine-tuning a version of the model with archive data. This process is currently costly, commercial confidentiality must be considered, and the skill set required is outside the scope of many review teams. Developers should ensure that any LLM-based tools for reviewing can be integrated into clients' existing processes with the use of standardized import and export formats such as CSV or RIS.
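
Illustrative note (not part of the abstract): the authors worked through the chat interface, so the Python sketch below is an assumption, not their method. It shows the kind of integration the conclusions recommend: extraction output requested in a standardized format (CSV) that downstream review tooling can ingest. The model name is a real Anthropic identifier; the prompt, fields, and file names are hypothetical.

```python
# Hypothetical sketch: programmatic LLM extraction written straight to CSV.
import csv
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

prompt = (
    "From the study text below, extract: study design, sample size, "
    "population, and intervention. Reply with a single CSV row in exactly "
    "that column order, with no header and no commentary.\n\n"
)

study_text = open("example_study.txt").read()   # placeholder input file

message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=500,
    messages=[{"role": "user", "content": prompt + study_text}],
)

row = next(csv.reader([message.content[0].text.strip()]))
with open("extraction.csv", "a", newline="") as f:
    csv.writer(f).writerow(row)   # append to a shared, standardized sheet
```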

Conference proceeding

Investigating Input Correlation in Probabilistic Sensitivity Analysis

YHEC authors: Matthew Taylor, Erin Barker, Harriet Fewster, Emily Gregg
Publication date: November 2024
Conference: ISPOR EU, Barcelona
Type of conference proceeding: Poster

Abstract

OBJECTIVES: Probabilistic sensitivity analysis (PSA) is used to characterize uncertainty in cost-effectiveness models. Inputs in PSA are often varied independently even when they may be correlated. This study investigated the effects of input correlation on PSA outputs.

METHODS: A Markov model was developed using R and Shiny to compare a hypothetical treatment and comparator. Three options were built into the model: no correlation (inputs varied independently); part correlation (correlation within, but not between, costs, utilities and transition matrices); and full correlation (correlation between all inputs). In the correlated options, inputs that improved the incremental cost-effectiveness ratio (ICER) were positively correlated with each other and negatively correlated with inputs that worsened the ICER. The treatment cost, the number of health states, and the health state costs were varied in scenario analyses to determine the circumstances in which correlation had the largest impact.

RESULTS: While the ICER was comparable across all correlation options, the likelihood of cost-effectiveness differed substantially, ranging from 61% to 93%. In all scenarios, the 'no correlation' option produced the most certain likelihood of cost-effectiveness (closest to either 0 or 1), while the least certain was produced by the full correlation option. The greater the complexity of the model (i.e. the greater the number of health states), the more pronounced the difference between correlating and not correlating inputs. Counterintuitively, correlating inputs increases uncertainty because it allows a greater number of 'extreme' scenarios to be generated, whereas independent generation of large numbers of inputs tends to produce a 'cancelling out' effect. This effect is most pronounced when the ICER is moderately close to the willingness-to-pay threshold.

CONCLUSIONS: This analysis demonstrates that input correlation can have a substantial impact on the level of certainty in model outputs; a model that ignores it may over- or under-state the true level of confidence in its results.
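
Illustrative note (not part of the abstract): the authors' model was built in R and Shiny; the Python sketch below uses invented input distributions purely to show the mechanism. Correlating inputs so that ICER-improving values move together (here, a negative correlation between incremental cost and incremental QALYs) widens the spread of net monetary benefit and pulls the probability of cost-effectiveness toward 0.5.

```python
# Hypothetical sketch of correlated vs independent PSA inputs (all values invented).
import numpy as np

rng = np.random.default_rng(42)
n_sims, wtp = 10_000, 20_000        # PSA iterations; willingness-to-pay per QALY

def sample_inputs(rho):
    """Draw (incremental cost, incremental QALY) pairs with correlation rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n_sims)
    inc_cost = 5_000 + 1_000 * z[:, 0]   # normal: mean 5000, sd 1000
    inc_qaly = 0.30 + 0.08 * z[:, 1]     # normal: mean 0.30, sd 0.08
    return inc_cost, inc_qaly

# rho = -0.8 makes ICER-improving inputs move together (low cost with high QALYs)
for rho in (0.0, -0.8):
    cost, qaly = sample_inputs(rho)
    nmb = qaly * wtp - cost              # net monetary benefit per iteration
    print(f"rho={rho:+.1f}: P(cost-effective)={np.mean(nmb > 0):.2f}, "
          f"sd(NMB)={np.std(nmb):,.0f}")
```

With these invented numbers the independent draws give a probability of cost-effectiveness further from 0.5 than the correlated draws, mirroring the 'cancelling out' effect described in the results.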

Conference proceeding

Large Language Models for Data Extraction in a Systematic Review: A Case Study

YHEC authors: Mary Edwards, Lavinia Ferrante di Ruffano
Publication date: November 2024
Conference: ISPOR EU, Barcelona
Type of conference proceeding: Poster

Abstract

OBJECTIVES: A typical systematic review includes extraction of highly granular data in a standardized format, a resource-intensive part of the review process. We investigated whether the chat interface to a large language model (Claude 3 Opus) could provide time savings in extracting such data while retaining the accuracy necessary for a systematic review.

METHODS: A data extraction sheet from a completed review of biologic treatments was selected. A set of prompts was designed to obtain details of the methods, interventions, and populations assessed by three of the included studies. Each paper was uploaded individually, and the results were copied into the original data sheet and compared with those produced and checked by two independent human reviewers. Testing of outcome extraction was also conducted.

RESULTS: Producing suitably formatted granular data required detailed, consistently structured prompts. Although the model successfully extracted details of the intervention (including dose, scheduling and duration of treatment) and population (including age, gender, duration of disease, and exon 10 variants) assessed in each arm, it struggled to interpret complex patient flow through the studies. Primary outcomes in the intention-to-treat (ITT) population were successfully extracted, but extraction of secondary outcomes, subgroups, and outcomes at different timepoints proved much less reliable.

CONCLUSIONS: While chat interfaces to LLMs may provide some time savings in extracting basic study data, such interfaces do not lend themselves to the detailed prompts required for successful extraction of more complex data. Accessing an LLM outside a chat interface can be costly and requires a skillset not possessed by the majority of reviewers; organizational investment may therefore be needed to facilitate productive access. Fine-tuning using archive data also raises issues of commercial confidentiality. A market is emerging for companies providing affordable access to a protected model, accessible only to the customer and fine-tuned to their needs.
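
Illustrative note (not part of the abstract): the authors' prompts are not published, but the results indicate that granular extraction demanded detailed, consistently structured prompts. The Python-embedded template below is a hypothetical example of that style; every field is invented for illustration.

```python
# Hypothetical extraction prompt template (fields invented; not the authors' prompt).
PROMPT_TEMPLATE = """\
For the attached paper, extract the following for EACH study arm separately.
Report 'NR' where a value is not reported. Give one line per field, in this
exact order and format:

Arm name:
Intervention (drug, dose, schedule, duration):
N randomized:
N analyzed (ITT):
Mean age, years (SD):
Female, n (%):
Duration of disease, years (SD):

Do not infer values that are not stated explicitly in the paper.
"""
```

Even prompts of this level of detail proved insufficient for complex patient flow and secondary outcomes, as the results above describe.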

Conference proceeding

Large Language Models for Data Extraction in a Targeted Review: A Case Study

YHEC authors: Mary Edwards, Lavinia Ferrante di Ruffano
Publication date: November 2024
Conference: ISPOR EU, Barcelona
Type of conference proceeding: Poster

Abstract

OBJECTIVES: Accurate, consistent extraction and presentation of data is a time-consuming process. Large language models (LLMs) accessed via a chat interface require minimal user training and perform tasks without any setup overheads beyond an initial phase of prompt engineering. We assessed the chat interface to Claude 3 Opus for accuracy, consistency, presentation of data, and time savings in the context of high-level extraction for a targeted review.

METHODS: A targeted review was conducted to investigate disparities in patient characteristics in the diagnosis and treatment of one specific indication. We used the chat interface to Claude to extract data from 30 papers, with a human reviewer checking all data points. Study and population details were extracted, plus brief details of any study results or discussion regarding disparities in diagnosis or treatment. Papers were uploaded in pairs to minimize prompts, with the model explicitly tasked with labeling each set of data points with the name of the relevant paper.

RESULTS: Data were consistently extracted in a suitably structured format. Eleven papers required no edits to the data, five required minimal edits, and nine contained minor errors or omissions. One paper was extracted correctly, but the answers reported by the model also contained additional data drawn from the second paper of the pair. Another pair of papers was extracted by the model but mislabeled, with the data for each paper labeled with the file name of the other. Following this error, PDFs were uploaded singly.

CONCLUSIONS: Even allowing time for human checking and minor correction of the extracted data, use of the model enabled extraction and checking of 30 papers in a single day. Access to LLMs via a chat interface is typically relatively inexpensive and can offer significant resource savings in the context of suitable reviews.
