What is most important to determine when understanding missing data?
A central concern in the treatment of missing data is the identification of the underlying mechanism responsible for the missingness. Graham (2009) underscores the necessity of distinguishing among data that are Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR), as this classification directly informs the selection of appropriate statistical techniques and the potential for bias in estimation. Similarly, Peng, Harwell, Liou, and Lee (2006) emphasize that a clear understanding of the missingness mechanism is foundational to choosing a valid analytic strategy and ensuring the integrity of statistical inference.
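To make the distinction between mechanisms concrete, the following is a minimal sketch (my own illustration, not code from the cited papers) that simulates MCAR and MAR missingness and applies an informal diagnostic: under MCAR, cases with and without missing values should look similar on the observed variables. The column names (age, score) and the 10% missingness rate are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "age": rng.normal(50, 10, n),
    "score": rng.normal(100, 15, n),
})

# MCAR: every value of `score` has the same 10% chance of being missing.
mcar = df.copy()
mcar.loc[rng.random(n) < 0.10, "score"] = np.nan

# MAR: the chance that `score` is missing depends on the observed `age`.
mar = df.copy()
p_miss = 1 / (1 + np.exp(-(df["age"] - 60) / 5))  # older cases more likely missing
mar.loc[rng.random(n) < p_miss, "score"] = np.nan

# Informal check: compare mean age for cases with vs. without a missing score.
for name, d in [("MCAR", mcar), ("MAR", mar)]:
    gap = (d.loc[d["score"].isna(), "age"].mean()
           - d.loc[d["score"].notna(), "age"].mean())
    print(f"{name}: mean-age gap (missing - observed) = {gap:.2f}")
```

Under the MCAR condition the gap hovers near zero, while under the MAR condition the cases with missing scores are systematically older, which is exactly the kind of pattern that would bias a complete-case analysis.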
What are the advantages and disadvantages of common missing data methods?
Traditional approaches to handling missing data, such as listwise deletion, remain prevalent due to their simplicity and ease of implementation. However, these methods frequently result in biased parameter estimates and diminished statistical power, particularly when the data are not Missing Completely at Random (MCAR) (Peng et al., 2006; Graham, 2009). Pairwise deletion, while capable of retaining more data points, may yield inconsistent results across analyses due to variations in sample sizes and covariance structures. In contrast, more sophisticated techniques such as Multiple Imputation (MI) and Full Information Maximum Likelihood (FIML) offer improved accuracy and efficiency when data are Missing at Random (MAR), though they demand greater computational resources and careful attention to model specification (Graham, 2009). Although Osborne and Overbay (2004) primarily address the issue of outliers, their work underscores a broader methodological concern: the failure to account for data irregularities, whether outliers or missing values, can significantly compromise the validity of statistical conclusions and should be approached with deliberate scrutiny.
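The sketch below contrasts listwise deletion with a simplified multiple-imputation workflow. It uses scikit-learn's IterativeImputer as one possible imputation engine, an assumption on my part; the cited authors do not prescribe a specific library, and the pooling step is reduced to averaging the point estimates across imputed datasets.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def compare_estimates(df: pd.DataFrame, target: str = "score", n_imputations: int = 5) -> None:
    """Compare a complete-case estimate with a pooled multiple-imputation estimate."""
    # Listwise deletion: drop every row with any missing value, then estimate the mean.
    listwise_mean = df.dropna()[target].mean()

    # Simplified MI: impute several times with different seeds, drawing from the
    # posterior (sample_posterior=True) so imputations vary, then pool the estimates.
    pooled = []
    for seed in range(n_imputations):
        imputer = IterativeImputer(random_state=seed, sample_posterior=True)
        completed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
        pooled.append(completed[target].mean())

    print(f"Listwise deletion mean: {listwise_mean:.2f}")
    print(f"Pooled MI mean:         {np.mean(pooled):.2f}")
```

Applied to the MAR data simulated above (e.g., compare_estimates(mar)), the listwise estimate drifts because the deleted cases differ systematically from the retained ones, while the pooled imputation estimate stays closer to the complete-data mean.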
When might one use a threshold or guideline in terms of when missing data should be estimated vs. deleted?
According to Graham (2009), when the proportion of missing data is relatively small, typically less than five percent, and the missingness mechanism is classified as Missing Completely at Random (MCAR), the use of deletion methods may be considered acceptable. However, as the extent of missingness increases or when the data exhibit patterns consistent with Missing at Random (MAR) or Missing Not at Random (MNAR), more advanced estimation procedures such as Multiple Imputation (MI) or Full Information Maximum Likelihood (FIML) are generally preferred due to their capacity to yield more accurate and less biased results. Peng, Harwell, Liou, and Lee (2006) caution against the application of rigid thresholds for determining acceptable levels of missingness. Instead, they advocate for a context-sensitive evaluation of how missing data may affect key variables and compromise the assumptions underlying statistical models. Their analysis underscores that even modest amounts of missing data can pose significant threats to validity when the missingness is systematically related to substantive predictors or outcomes.
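As a practical screening step, one might tabulate the extent of missingness before committing to deletion or estimation. The sketch below is my own illustration, not code from the cited papers: the five percent flag echoes Graham (2009), but, following Peng et al. (2006), it should be read as a rough prompt for closer inspection rather than a decision rule.

```python
import pandas as pd

def missingness_report(df: pd.DataFrame, flag_threshold: float = 0.05) -> pd.DataFrame:
    """Report per-variable missingness and flag variables above a rough threshold."""
    per_var = df.isna().mean().rename("prop_missing").to_frame()
    per_var["exceeds_threshold"] = per_var["prop_missing"] > flag_threshold

    # Share of cases that would survive listwise deletion.
    complete_cases = df.dropna().shape[0] / len(df)
    print(f"Proportion of complete cases: {complete_cases:.1%}")

    return per_var.sort_values("prop_missing", ascending=False)
```

Even when every variable falls under the flagged threshold, the report says nothing about why values are missing, so it complements rather than replaces an examination of the missingness mechanism itself.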
References
Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549–576. https://doi.org/10.1146/annurev.psych.58.110405.085530
Osborne, J. W., & Overbay, A. (2004). The power of outliers (and why researchers should always check for them). Practical Assessment, Research & Evaluation, 9(6), 1–8. https://doi.org/10.7275/qf69-7k43
Peng, C.-Y. J., Harwell, M., Liou, P.-Y., & Lee, Y. (2006). Advances in missing data methods and implications for educational research. In S. Sawilowsky (Ed.), Real data analysis (pp. 31–78). Information Age Publishing.