GEE for longitudinal ordinal data: comparing R-repolr, R-ordgee, SAS-GENMOD, SPSS-GENLIN

Nooraee, N; Molenberghs, Geert; van den Heuvel, ER

doi:10.1016/j.csda.2014.03.009

Keywords:

Science & Technology, Technology, Physical Sciences, Computer Science, Interdisciplinary Applications, Statistics & Probability, Computer Science, Mathematics, Correlated ordinal data, Generalized estimating equations, Copula, Multivariate logistic distribution, Bridge distribution, Statistical software packages, Ordered categorical data, Regression models, Maximum likelihood, Binary data, Score data, Association, 0104 Statistics, 0802 Computation Theory and Mathematics, 1403 Econometrics, 3802 Econometrics, 4905 Statistics

Abstract:

Studies in epidemiology and the social sciences are often longitudinal, and outcome measures are frequently obtained by questionnaires on ordinal scales. To understand the relationship between explanatory variables and outcome measures, generalized estimating equations (GEE) can be applied to provide a population-averaged interpretation and to address the correlation between outcome measures. GEE can be fitted with different software packages, but a motivating example showed differences in their output. This paper investigated the performance of GEE in R (version 3.0.2), SAS (version 9.4), and SPSS (version 22.0.0) using simulated data under default settings. Multivariate logistic distributions were used in the simulation to generate correlated ordinal data. The simulation study demonstrated substantial bias in the parameter estimates and numerical issues for data sets with a relatively small number of subjects. The unstructured working association matrix requires larger numbers of subjects than the independence and exchangeable working association matrices to reduce the bias and diminish numerical issues. The coverage probabilities of the confidence intervals for fixed parameters were satisfactory for the independence and exchangeable working association matrices, but they were frequently liberal for the unstructured option. Based on the performance and the available options, SPSS and the R packages multgee and repolr all perform quite well for relatively large sample sizes (e.g. 300 subjects), but multgee seems to do a little better than SPSS and repolr in most settings. © 2014 Elsevier B.V. All rights reserved.
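The kind of data the simulation study relies on can be illustrated with a minimal sketch. It assumes a Gaussian copula with exchangeable latent correlation and standard logistic margins, so that each observation marginally follows a cumulative logit model P(Y <= k | x) = expit(c_k - beta * x). This is only one way to obtain logistic margins with within-subject dependence and is not necessarily the multivariate logistic construction used in the paper. The function name, the single binary covariate, and all parameter values are hypothetical.

```python
import math
import random


def simulate_ordinal_panel(n_subjects, n_times, cutpoints, beta, rho, seed=0):
    """Return long-format rows (subject, time, x, y) of correlated ordinal data.

    Assumption: Gaussian copula with exchangeable correlation rho on the
    latent normal scale, transformed to standard logistic margins.
    """
    rng = random.Random(seed)
    rows = []
    for subject in range(n_subjects):
        x = rng.randint(0, 1)          # hypothetical subject-level binary covariate
        shared = rng.gauss(0.0, 1.0)   # component shared by all visits of a subject
        for t in range(n_times):
            # Exchangeable correlated standard normal draw.
            z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            # Normal CDF gives a uniform; the logit of it is standard logistic.
            u = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
            u = min(max(u, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            eps = math.log(u / (1.0 - u))
            # Latent-variable form of the cumulative logit model:
            # Y <= k exactly when beta * x + eps <= cutpoints[k - 1].
            latent = beta * x + eps
            y = 1 + sum(latent > c for c in cutpoints)
            rows.append((subject, t, x, y))
    return rows


rows = simulate_ordinal_panel(
    n_subjects=2000, n_times=4, cutpoints=[-1.0, 0.0, 1.0], beta=0.5, rho=0.4
)
```

Rows in this long format (one line per subject and visit) are the layout that GEE routines for repeated ordinal outcomes typically expect, with the subject identifier defining the clusters.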