Laenen, Alonso, and Molenberghs (2007) and Laenen, Alonso, Molenberghs, and Vangeneugden (2009) proposed a method to assess the reliability of rating scales in a longitudinal context. The methodology is based on hierarchical linear models, and reliability coefficients are derived from the corresponding covariance matrices. However, finding a good parsimonious model to describe complex longitudinal data is a challenging task. Frequently, several models fit the data equally well, raising the problem of model selection uncertainty. When model uncertainty is high one may resort to model averaging, where inferences are based not on one but on an entire set of models. We explored the use of different model building strategies, including model averaging, in reliability estimation. We found that the approach introduced by Laenen et al. (2007, 2009) combined with some of these strategies may yield meaningful results in the presence of high model selection uncertainty and when all models are misspecified, in so far as some of them manage to capture the most salient features of the data. Nonetheless, when all models omit prominent regularities in the data, misleading results may be obtained. The main ideas are further illustrated on a case study in which the reliability of the Hamilton Anxiety Rating Scale is estimated. Importantly, the ambit of model selection uncertainty and model averaging transcends the specific setting studied in the paper and may be of interest in other areas of psychometrics.