Published: October 2016

Last updated: September 2025

Missing values

Missing values occur when no data value is stored for a variable of interest in a study observation. Most study datasets contain missing values, and the amount and type of missing values can have an important impact on the conclusions that can be drawn from the data. If the likelihood of data being missing is related to (a) the study outcome or (b) other explanatory variables, then the study results are likely to be biased. For example, suppose that a health condition becomes more common with age: (a) if older people are less likely to report whether they have the condition, then the study will under-report the prevalence of the condition; (b) if older people are less likely to provide their age, then the study may under-report the relationship between the health condition and age.
Missing data can be classified into three types:
• Missing completely at random (MCAR): The probability of a value being missing is unrelated to any other observed or unobserved data. If data are MCAR, analyses that exclude missing values will not be biased, but the results will have greater uncertainty.
• Missing at random (MAR): The probability of a value being missing is related to other observed variables in the dataset, but not to the unobserved missing value itself.
• Missing not at random (MNAR): The probability of a value being missing is directly related to the unobserved value itself.
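The three mechanisms can be illustrated with a small simulation. The sketch below is illustrative only: the prevalence figures (30% in older people, 10% in younger people) and all of the missingness probabilities are assumptions chosen for the example, not values from any study. It compares the true prevalence of a condition with the prevalence estimated after excluding people whose condition status is missing.

```python
import random

rng = random.Random(0)
N = 200_000

# Hypothetical population: half are "old"; the condition is more common in old people.
people = []
for _ in range(N):
    old = rng.random() < 0.5
    has_condition = rng.random() < (0.30 if old else 0.10)
    u = rng.random()  # independent draw used to decide whether the value goes missing
    people.append((old, has_condition, u))

# Probability that condition status is missing under each mechanism (assumed values).
def p_missing_mcar(old, has_condition):
    return 0.3                              # unrelated to anything

def p_missing_mar(old, has_condition):
    return 0.5 if old else 0.1              # depends only on observed age group

def p_missing_mnar(old, has_condition):
    return 0.5 if has_condition else 0.1    # depends on the unobserved value itself

def complete_case_prevalence(p_missing):
    """Prevalence among people whose condition status was recorded."""
    observed = [c for old, c, u in people if u >= p_missing(old, c)]
    return sum(observed) / len(observed)

true_prevalence = sum(c for _, c, _ in people) / N
mcar = complete_case_prevalence(p_missing_mcar)
mar = complete_case_prevalence(p_missing_mar)
mnar = complete_case_prevalence(p_missing_mnar)

print(f"true={true_prevalence:.3f} MCAR={mcar:.3f} MAR={mar:.3f} MNAR={mnar:.3f}")
```

Under these assumptions, the MCAR estimate stays close to the true prevalence, while the MAR and MNAR estimates fall below it. In the MAR case the overall prevalence is biased, but the prevalence within each age group is not, because missingness depends only on age.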
For MNAR and MAR, it is important to understand the patterns of missingness and to use appropriate statistical techniques to control for possible bias (in the case of MAR, some statistical analyses may be unbiased). Simply excluding study subjects with missing values will usually leave the results biased and may increase the bias. Typical methods used to adjust for missing values include imputation (various methods exist), partial deletion, inverse probability weighting (also called inverse propensity weighting) or more complicated maximum likelihood estimation. In economic modelling, sensitivity analysis may be used to explore the impact of ‘best case’ and ‘worst case’ assumptions about missing data in source studies.
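As a minimal sketch of two of these ideas, the example below applies inverse probability weighting to simulated MAR data and computes 'best case' and 'worst case' bounds for a prevalence estimate. The data, the probabilities and the function names are all hypothetical, and the weighting uses a single observed covariate (age group); a real analysis would normally model the probability of response using several covariates.

```python
import random

rng = random.Random(1)
N = 200_000

# Each record is (old, status); status is None when the value is missing.
# Missingness is MAR by construction: it depends only on the observed age group.
records = []
for _ in range(N):
    old = rng.random() < 0.5
    has_condition = rng.random() < (0.30 if old else 0.10)
    is_observed = rng.random() >= (0.5 if old else 0.1)
    records.append((old, has_condition if is_observed else None))

def complete_case(records):
    """Prevalence after excluding records with a missing status."""
    values = [y for _, y in records if y is not None]
    return sum(values) / len(values)

def ipw(records):
    """Weight each observed record by 1 / P(observed | age group)."""
    p_observed = {}
    for group in (False, True):
        statuses = [y for old, y in records if old == group]
        p_observed[group] = sum(y is not None for y in statuses) / len(statuses)
    numerator = sum(y / p_observed[old] for old, y in records if y is not None)
    denominator = sum(1 / p_observed[old] for old, y in records if y is not None)
    return numerator / denominator

def bounds(records):
    """Best case: no missing record has the condition. Worst case: all do."""
    n = len(records)
    positives = sum(1 for _, y in records if y is True)
    missing = sum(1 for _, y in records if y is None)
    return positives / n, (positives + missing) / n

cc_estimate = complete_case(records)
ipw_estimate = ipw(records)
best_case, worst_case = bounds(records)

print(f"complete case={cc_estimate:.3f} IPW={ipw_estimate:.3f}")
print(f"best case={best_case:.3f} worst case={worst_case:.3f}")
```

In this simulation the true prevalence is about 20%. The weighted estimate recovers it because missingness depends only on the variable used for weighting; under MNAR the same approach would not remove the bias, which is where the best-case and worst-case bounds become useful for sensitivity analysis.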
