Examining the reliability, correlation, and validity of commonly used assessment tools to measure balance

Abstract

Objectives: The Biodex SD Stability System has been shown to be a reliable assessment tool for postural stability. However, its ability to provide an accurate representation of balance has not been compared with functional performance measures such as the four-square step test (FSST) and timed-up-and-go test (TUG). The purpose of this study was to investigate the reliability, internal consistency, and construct validity of the FSST, TUG, and Biodex SD (limits of stability [LOS] and modified Clinical Test of Sensory Integration and Balance [m-CTSIB]).

Methods: An observational reliability and validity study was conducted. A convenience sample of 105 healthy adults, 77 females and 28 males, mean age 24.5 years old (± 4.66 SD), performed balance assessments including the FSST, TUG, Biodex SD LOS, and m-CTSIB. For LOS, the overall percentage and test duration were recorded. For m-CTSIB, the overall Sway Index was recorded. Condition 1 of the m-CTSIB represented simple postural stability.

Results: The Biodex SD LOS overall percentage, TUG, and FSST showed strong to excellent test-retest reliability (ICC [3, 1] = .83 [mean 1: 58.14, mean 2: 60.54], .88 [mean 1: 6.98 seconds, mean 2: 6.91 seconds], .92 [mean 1: 6.29 seconds, mean 2: 6.14 seconds], respectively), while the Biodex SD m-CTSIB overall Sway Index demonstrated strong test-retest reliability (ICC [3, 1] = .75 [mean 1: 1.18, mean 2: 1.18]). The LOS test duration showed moderate test-retest reliability (ICC [3, 1] = .58 [mean 1: 38.55 seconds, mean 2: 37.10 seconds]), while the m-CTSIB condition 1 showed poor test-retest reliability (ICC [3, 1] = .24 [mean 1: 0.63, mean 2: 0.66]). Weak construct validity was found between TUG, FSST, and Biodex SD measures of LOS and m-CTSIB (r values ranged from −0.15 to 0.22).

Conclusion: It is suggested that clinicians use more than one measure to assess different aspects of a patient's balance deficits to better guide treatment and intervention.


| INTRODUCTION
Balance and postural control are essential not only for safe activities of daily living but also for safe locomotion in general. These two components of human performance serve as a foundation of stability prior to achieving more complex controlled mobility and skilled activities such as independent standing and walking. 1 Postural control, or stability, represents an individual's capacity to maintain an upright position during both static and dynamic conditions, with or without the application of external perturbation or displacement of the support surface. [1][2][3][4] Posture is an angular measure from vertical describing the orientation of the body relative to the gravitational vector. 5 Static balance is often defined as a person's ability to maintain control of their center of mass (COM) over a fixed base of support (BOS) while on a firm, flat surface. Even for static or quiet stance, researchers differ on the most important variables (eg, center of pressure, COM, or the difference between these variables), 5 which complicates the selection of an assessment tool. Dynamic balance, on the other hand, refers to a person's ability to maintain postural control of their COM over a fixed BOS either when the surface is no longer firm or flat or while the individual is reaching or performing other extremity movements. 6 Additionally, functional balance is a person's ability to maintain control of their COM over a moving BOS, or while performing a more complex controlled mobility or skilled activity. 7 Pickerill and Harter 7 identify three key problems with the current methods for balance assessment: nomenclature, criterion standards, and technology. The terms "balance" and "postural stability" are often considered interchangeable in rehabilitation sciences due to the lack of standardized nomenclature and operationalization.
It is important that a clinician understands what aspect of balance is being assessed in order to make appropriate testing and treatment decisions. There is a lack of a single evaluative construct that defines good or normal dynamic balance.
Generally, in terms of balance and postural measures, there is a lack of reliability and validity data supporting the use of any one method as the best objective tool to capture a comprehensive balance component of a musculoskeletal and neuromuscular examination. This makes it difficult for clinicians and researchers interested in postural stability assessment not only to accurately identify and adequately describe balance deficits at an initial examination or at baseline in a research study, but also to be certain that the selected intervention or treatment adequately improved balance at the time of re-assessment or follow-up visit. Pickerill and Harter 7 compared the Biodex Stability System dynamic limits of stability (LOS) test (Biodex Medical Systems, Shirley, NY), which challenges patients to move and control their center of gravity within their BOS and is a good indicator of dynamic control within a normalized sway envelope, 8 with the NeuroCom Smart Balance Master dynamic LOS test (Natus Medical Incorporated, Pleasanton, CA). The authors found a low correlation between the two stability tests, revealing that they measure distinctly different constructs of postural stability. The concurrent and construct validity of either LOS test were not established by that study, and the authors recommended further research aimed at repeating the study with a clinical population. 7 Neither test, however, was compared with commonly used clinical or functional dynamic balance measures such as the timed-up-and-go (TUG) and four-square step test (FSST); therefore, it would be beneficial to examine the correlation between computerized posturography and clinical outcome measures, to identify the level of construct validity, if any, between common measures of dynamic balance.
This would give clinicians and researchers more information regarding the properties of these various assessment tools to assist with determination of the most appropriate tools to select during an examination or screening. Hinman (2000) describes the differences in test-retest reliability of balance measures produced by the Biodex Balance System in a summary of four studies. In each study, subjects had to perform two 30-second tests under varying conditions. Test-retest reliability of the subjects' LOS and overall stability index (SI) were both computed.
The intraclass correlation coefficients (ICC) for the overall SI ranged from .44 to .89 for the static balance tests. The ICCs for the LOS tests, on the other hand, ranged from .64 to .89, demonstrating less variability than static measures. As these ranges are rather large, further research must be done to better establish the reliability of the LOS tests with the Biodex Stability System (Hinman, 2000). 9 It should also be taken into consideration that these particular studies did not compare the Biodex measures with more clinical measures such as the TUG or FSST. Identifying the construct validity of these Biodex measures and determining their association with commonly used functional assessments is clinically relevant. Clinicians need to know and understand which construct a given tool is actually measuring. This information is instrumental in developing a plan of care and provides an accurate representation of a patient's baseline to guide treatment and intervention.
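The association between a Biodex measure and a functional test can be quantified with a correlation coefficient. The sketch below assumes Pearson's product-moment r, which is one common choice; this excerpt reports r values without naming the coefficient. The function name and the sample values are hypothetical and serve only as illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    # Sum of cross-products divided by the geometric mean of the sums of squares.
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(ss_x * ss_y)


# Made-up values for five participants (not study data).
tug_seconds = [6.5, 7.1, 6.9, 7.4, 6.2]
los_percent = [61.0, 55.0, 63.0, 58.0, 60.0]
r = pearson_r(tug_seconds, los_percent)
print(f"r = {r:.2f}")
```

An r close to zero between two tools would suggest that they capture different constructs, whereas a value near 1 or −1 would suggest substantial overlap.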
Balance is a "generic" term and serves as the foundation of stability prior to more controlled mobility, yet there is no "gold standard" for its measurement. 5,7,10 Balance may be difficult to capture in a single assessment, as it is a complex construct that relies on multiple afferent and efferent physiological systems, including the visual, somatosensory, and vestibular systems. 11 Because clinicians often attempt to use a single assessment tool during the examination, it is necessary to gain a more thorough understanding of which tools best provide the information required for clinical decision making. Functional performance measures such as the FSST 12 and the TUG 13

| Study design and subjects
This study used an observational, closed cohort design with a convenience sample of 105 healthy adults, 77 females and 28 males, mean age 24.5 years old (± 4.66 SD), who met the inclusion criteria: at least 18 years old and generally healthy. Individuals were excluded from the study if they had any current musculoskeletal injury, visual impairment that affected their daily living, vestibular disorder, neurological disorder, cognitive disorder, or any other medical condition that would prevent them from participating safely in the chosen balance measures. Subjects were recruited from the University of Central Florida, where the study was conducted between March 2016 and May 2016. The Institutional Review Board at the University of Central Florida approved this study (SBE-16-12078), and verbal informed consent was collected from all participants prior to the start of data collection.

| Materials
Timed-up-and-go (TUG) 13 : The TUG is a widely used clinical performance-based assessment tool used to measure an individual's lower extremity function, mobility, and fall risk. The TUG is able to correctly identify fallers and nonfallers with 87% sensitivity and specificity, and has a suggested cutoff point of 13.5 seconds. 13,14 The participant was asked to start seated, with their back against a standard height chair, without armrests. At the start of the timer and the investigator's "Go!" command, the participant stood up from the chair and walked at a normal, comfortable pace for 10 feet (3 meters) to a line on the floor, where they turned around and returned to a seated position in the chair (see Figure 2). The investigator stopped the timer when the participant's buttocks touched the chair. Two trials were performed for each participant, and both times were recorded.
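To make the meaning of a cutoff with a stated sensitivity and specificity concrete, the following minimal sketch classifies timed scores against a threshold and computes both quantities. The function name, the sample times, and the faller labels are hypothetical. Treating a time equal to the cutoff as "at risk" is also an assumption, since the text does not state how ties are handled.

```python
def sensitivity_specificity(times_s, is_faller, cutoff_s):
    """Return (sensitivity, specificity) for flagging times >= cutoff_s as at risk."""
    tp = fp = tn = fn = 0
    for time_s, faller in zip(times_s, is_faller):
        flagged = time_s >= cutoff_s  # assumption: a tie counts as at risk
        if flagged and faller:
            tp += 1
        elif flagged and not faller:
            fp += 1
        elif not flagged and faller:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one faller and one nonfaller")
    # Sensitivity: share of fallers flagged; specificity: share of nonfallers cleared.
    return tp / (tp + fn), tn / (tn + fp)


# Made-up TUG times in seconds with known faller status (not study data).
tug_times = [9.8, 14.2, 12.0, 16.5, 13.5, 11.1]
fallers = [False, True, False, True, True, True]
sens, spec = sensitivity_specificity(tug_times, fallers, cutoff_s=13.5)
print(sens, spec)  # 0.75 1.0
```

The same function could be applied to any timed test with a published cutoff by changing `cutoff_s`.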
Four-square step test (FSST) 12 : The FSST is a test of dynamic balance that assesses a person's ability to step over obstacles in three directions of motion: forwards, backwards, and sideways. Populations tested with this assessment tool include older adults and people with Parkinson's disease, stroke, transtibial amputation, and vestibular disorders. [15][16][17][18] A cut-off score of 15 seconds serves as the threshold for identifying older adults at risk for multiple falls, with a specificity of 88% and a sensitivity of 85%.
The test setup consisted of four canes of the same width in a cross formation (see Figure 3). The participant was instructed to step into each square, labeled 1 through 4, first clockwise and then counterclockwise, in the sequence 2, 3, 4, 1, 4, 3, 2, and 1 (see Figure 2). The participant was asked to complete the sequence as fast as possible without hitting the equipment.
Each subject was allotted one practice run if necessary, and then two trials were performed and recorded for each participant. A lower time recorded in seconds reflected better performance on this measure.

Biodex Balance System SD: [...] high-resolution color touch-screen LCD display, and a color printer with stand for printing results of testing assessments. Two different testing protocols were used in this study: the LOS test and the modified Clinical Test of Sensory Integration and Balance (m-CTSIB).

1) Limits of stability (LOS):
The LOS test challenges participants to move and control their COM while remaining within their BOS. It serves as an indicator of dynamic control within a normalized sway envelope. An individual's LOS for standing balance is the maximum angle that the body can achieve from vertical without losing balance. Once an individual exceeds their LOS, a fall, stumble, or step may ensue. The LOS in normal adults is defined as eight degrees anterior, four degrees posterior, and 16 degrees in the lateral direction. 19 In this study, the default setting for the LOS test, which is 75% LOS, was used. This reflects a moderate skill level.

[Figure: Limits of stability test (Biodex SD balance)]

[...] line to each target; a higher percentage thus reflected better performance. The overall percentage itself represents the amount of deviation from a straight pathway to the targets.
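One plausible way to express deviation from a straight pathway as a percentage is a path-efficiency ratio: the straight-line distance between start and target divided by the distance actually travelled. The sketch below illustrates that idea only. It is an assumption, not the scoring formula used by the Biodex software, and the function names and coordinates are hypothetical.

```python
import math


def path_length(points):
    """Total distance travelled along a sequence of (x, y) positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def path_efficiency_percent(points):
    """Straight-line distance from first to last point as a percentage of path length."""
    if len(points) < 2:
        raise ValueError("need at least a start and an end position")
    travelled = path_length(points)
    if travelled == 0:
        raise ValueError("no movement recorded")
    straight = math.dist(points[0], points[-1])
    return 100.0 * straight / travelled


# Hypothetical cursor traces toward a target at (3, 4).
direct = [(0, 0), (3, 4)]
detour = [(0, 0), (3, 0), (3, 4)]
print(path_efficiency_percent(direct))            # 100.0
print(round(path_efficiency_percent(detour), 1))  # 71.4
```

Under this assumed definition, a perfectly straight path scores 100%, and any wandering lowers the score, which matches the statement that a higher percentage reflects better performance.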

2) Biodex Balance System SD modified Clinical Test of Sensory Integration and Balance (m-CTSIB):
This test has been well documented in the literature as an effective test for identifying individuals with mild to severe balance deficits, as it also isolates which system is impaired. 20,21 The test protocol is meant to provide a generalized assessment of an individual's ability both to integrate various senses with respect to balance and to compensate when one or more of these senses has deficits. After each of the two trials, the results were printed and recorded.

| Procedure
Participants attended a single session for data collection, which consisted of a short demographic questionnaire (eg, age, highest education received) and a battery of balance assessments, including the FSST, TUG, Biodex SD LOS, and Biodex SD m-CTSIB.
Each of the participants performed the four balance assessments.
First, each subject performed two trials of the TUG. Next, each study participant was allotted one practice trial of the FSST prior to the performance of two timed trials. Next, the participant performed two separate balance assessments on the Biodex Balance System SD. Each participant's feet were positioned on the platform using the default values based on their individual height. The first test to be performed was the LOS. The default setting for this test is 75% LOS (moderate skill level). Limits of stability hold times were defaulted to 0.25 seconds, and each rest countdown in between trials lasted 3 seconds. Two trials were performed. Finally, each participant performed the m-CTSIB. The participant assumed the same foot position as they did in the LOS test.
Each of the four conditions lasted for 30 seconds, and two trials of all four conditions were performed.
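Because every assessment was performed twice, test-retest reliability can be estimated from the paired trials. The abstract reports ICC (3, 1); the sketch below assumes the usual two-way mixed, consistency, single-measure form, ICC(3,1) = (MS_subjects − MS_error) / (MS_subjects + (k − 1) MS_error), where k is the number of trials. The function name and the sample scores are hypothetical.

```python
def icc_3_1(scores):
    """ICC(3,1) for a table with one row per subject and one column per trial."""
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 subjects, each with the same number (>= 2) of trials")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA decomposition without replication.
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_trials = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_trials
    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_subjects + (k - 1) * ms_error
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_subjects - ms_error) / denominator


# Made-up trial 1 / trial 2 times in seconds for four participants (not study data).
trials = [[6.9, 6.8], [7.4, 7.1], [6.2, 6.3], [8.0, 7.7]]
print(f"ICC(3,1) = {icc_3_1(trials):.2f}")
```

This form ignores a systematic shift between trials, such as every participant being slightly faster on the second attempt, because trial effects are removed before the error term is computed.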

| DISCUSSION
Clinicians have used the terms "balance," "postural stability," and [...]

The results, however, left uncertainty regarding the particular construct being measured.

[...] between varying conditions to determine which system is affecting their balance ability (eg, proprioception, vision, vestibular), or to produce a computerized report to document improvements in a patient following rehabilitative interventions.

| Limitations and future research
While these findings contribute significantly to the understanding of postural stability and balance, the study included a fairly homogeneous sample of healthy adults; therefore, generalization of these results to clinical populations or samples of varying ages may not be appropriate. Additionally, the testing order was the same for all participants; therefore, there is a potential for practice effects between trials of the assessments.

| CONCLUSION
Balance is a complex construct, and clinicians are encouraged to use multiple balance assessment tools to capture the entire picture of an individual's balance. Based on the results of this study, it is suggested that clinicians use more than one balance test to assess different aspects of balance, chosen according to the patient's deficits, to better guide treatment and intervention. While all of these outcome measures examine components of balance, none of them can serve as a complete, single evaluative construct of balance itself.