How to construct and implement script concordance tests: insights from a systematic review

Authors

  • Valérie Dory,

    1. Fonds de la Recherche Scientifique - FNRS
    2. Institute of Health and Society (IRSS), Université catholique de Louvain, Brussels, Belgium
  • Robert Gagnon,

    1. Centre de Pédagogie Appliquée aux Sciences de la Santé, Faculty of Medicine, University of Montreal, Montreal, Quebec, Canada
  • Dominique Vanpee,

    1. Institute of Health and Society (IRSS), Université catholique de Louvain, Brussels, Belgium
    2. Emergency Department, Centre Hospitalier Universitaire Mont-Godinne, Université catholique de Louvain, Yvoir, Belgium
  • Bernard Charlin

    1. Centre de Pédagogie Appliquée aux Sciences de la Santé, Faculty of Medicine, University of Montreal, Montreal, Quebec, Canada

Correspondence: Institute of Health and Society, Université catholique de Louvain, Clos Chapelle-aux-champs 30 boîte B1.30.15, 1200 Brussels, Belgium. Tel: 00 32 2 764 3471; Fax: 00 32 2 764 3470; E-mail: valerie.dory@uclouvain.be

Abstract

Medical Education 2012: 46: 552–563

Context  Programmes of assessment should measure the various components of clinical competence. Clinical reasoning has been traditionally assessed using written tests and performance-based tests. The script concordance test (SCT) was developed to assess clinical data interpretation skills. A recent review of the literature examined the validity argument concerning the SCT. Our aim was to provide potential users with evidence-based recommendations on how to construct and implement an SCT.

Methods  A systematic review of relevant databases (MEDLINE, ERIC [Education Resources Information Center], PsycINFO, the Research and Development Resource Base [RDRB, University of Toronto]) and Google Scholar, medical education journals and conference proceedings was conducted for references in English or French. It was supplemented by ancestry searching and by additional references provided by experts.

Results  The search yielded 848 references, of which 80 were analysed. Studies suggest that tests with around 100 items (25–30 cases), of which 25% are discarded after item analysis, should provide reliable scores. Panels with 10–20 members are needed to reach adequate precision in terms of estimated reliability. Panellists’ responses can be analysed by checking for moderate variability among responses. Studies of alternative scoring methods are inconclusive, but the traditional scoring method is satisfactory. There is little evidence on how best to determine a pass/fail threshold for high-stakes examinations.
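The "traditional scoring method" referred to above is the SCT's aggregate scoring approach, in which an examinee earns partial credit on each item in proportion to how many panellists chose the same response, with full credit for the panel's modal response. A minimal sketch of that idea follows (the function names are illustrative, not taken from the paper):

```python
from collections import Counter

def sct_item_credit(panel_answers, examinee_answer):
    """Aggregate-scoring credit for one SCT item: the number of
    panellists who chose the examinee's response, divided by the
    number who chose the modal (most popular) response."""
    counts = Counter(panel_answers)
    modal_count = max(counts.values())
    return counts.get(examinee_answer, 0) / modal_count

def sct_score(panel_answers_per_item, examinee_answers):
    """Total test score: mean of per-item credits, rescaled to 0-100."""
    credits = [sct_item_credit(panel, answer)
               for panel, answer in zip(panel_answers_per_item, examinee_answers)]
    return 100 * sum(credits) / len(credits)

# Example item: a 10-member panel answers on a -2..+2 Likert scale.
panel = [+1, +1, 0, +1, -1, 0, +1, 0, +1, +1]
print(sct_item_credit(panel, +1))  # modal response: full credit, 1.0
print(sct_item_credit(panel, 0))   # 3 of 10 panellists chose 0; modal count is 6: 0.5
print(sct_item_credit(panel, -2))  # no panellist chose -2: 0.0
```

Under this scheme, "moderate variability among responses" matters because an item on which the panel splits evenly awards substantial credit to several different responses, while an item with unanimous agreement behaves like a conventional single-best-answer question.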

Conclusions  Our literature search was broad and included references from medical education journals not indexed in the usual databases, conference abstracts and dissertations. There is good evidence on how to construct and implement an SCT for formative purposes or medium-stakes course evaluations. Further avenues for research include examining the impact of various aspects of SCT construction and implementation on issues such as educational impact, correlations with other assessments, and validity of pass/fail decisions, particularly for high-stakes examinations.
