The International Journal of Psychosocial Rehabilitation

The Validity of Occupational Performance Measurement in
 Psychosocial Occupational Therapy:
A Meta-Analysis using the Validity Generalization Method

 


Moses N. Ikiugu, PhD, OTR/L
Associate Professor and Director of Research


Lynne M. Anderson, MS, OTR/L
Assistant Professor and Clinical Fieldwork Coordinator
The University of South Dakota
School of Health Sciences
Occupational Therapy Department
414 E. Clark Street
Vermillion, SD 57069

 

 

Citation:
Ikiugu MN & Anderson LM (2009). The Validity of Occupational Performance Measurement in
 Psychosocial Occupational Therapy: A Meta-Analysis using the Validity Generalization
 Method. International Journal of Psychosocial Rehabilitation. Vol 13(2).   77-91

 

Acknowledgments: We would like to thank Angela (Dyczek) Wortman, MS, OTR/L for her assistance in data gathering during the initial stages of the study. 

This paper was presented at the 7th Annual Research Conference of the Society for the Study of Occupations (SSO):USA in Fort Lauderdale, Florida, in October 2008.


Abstract
The purpose of this study was to estimate the mean validity of occupational performance measurement scores, and the generalizability of that validity from research to clinical settings, through a meta-analysis of findings from 19 studies. We used the validity generalization (VG) method developed by Hunter and Schmidt (2004). Our analysis indicated that the mean weighted validity coefficients were small according to the interpretation guidelines outlined by Cohen (1988). Scores based on self-report assessments had the highest mean validity. When the variance of the coefficients was corrected for attenuation by sampling error and variability of the test criterion measurement reliability, less than 75% (the decision-rule threshold suggested by Hunter and Schmidt) was explained. This suggested that the validity of the instruments investigated in the 19 studies could not be transferred from research to clinical settings without further validation. Further meta-analysis is indicated before more definite conclusions can be reached.
Key Words:
Occupational Performance, Assessment, Validity


Introduction
In recent times, evidence-based practice has been advocated as a way of providing effective and efficient occupational therapy services (American Occupational Therapy Foundation [AOTF], 2007; Canadian Association of Occupational Therapists [CAOT], 2007; Coster, Gillette, Law et al., 2004; Law, Baum, & Dunn, 2005). The type of evidence most preferred in medical disciplines is that derived from a meta-analysis of randomized controlled trials (Coster et al.; Depoy & Gitlin, 2005; Trombly & Ma, 2002), because a synthesis of research decreases the demand on therapists to retrieve and evaluate individual studies (Bennet & Townsend, 2006).

Polatajko (2006) suggested that there was not enough research evidence available for synthesis in meta-analyses to support occupational therapy interventions. She argued that, for that reason, there was not enough data to support evidence-based practice in occupational therapy. This could be due to the profession's limited research history and infrastructure (Ilott, 2004). Before enough evidence can be accumulated for summary and synthesis in order to support practice, data have to be adduced using valid and reliable research instruments. Therefore, the first concern should be whether we have valid instruments to produce the needed evidence.

In this paper, it is argued that evidence relevant to occupational therapy must be related to occupational performance, as defined by occupational therapists and occupational scientists.  Therefore, the issue at hand is whether or not we have valid instruments to measure occupational performance.  Having such instruments would provide a starting point for accumulation of the appropriate type of evidence to support evidence-based occupational therapy practice.

The purpose of this study was to investigate: (1) the robustness of the mean validity coefficients of various types of occupational performance measurement scores after correction for sampling error and variability of the test criterion measurement reliability; and (2) whether the validity of occupational performance scores could be generalized from research to clinical settings. The following specific questions guided this inquiry: (1) What were the mean weighted validity coefficients of the various types of occupational performance measurement scores after correction for sampling error and variability in test criterion reliabilities (by test criterion we meant the occupational performance indicators that were observed during measurement)? (2) Could the validity coefficients of occupational performance scores be generalized from research to clinical settings without need for further validation? In other words, if a therapist chose an occupational performance measurement instrument with documented validity, could it be assumed that such validity would hold in the clinical setting where he or she intended to use the instrument, without further validation research? Answering these questions would give therapists confidence in defending their choice of occupational performance assessment instruments to clients, payers for occupational therapy services, the public, and other stakeholders.

Research Methods
The validity generalization (VG) method was developed by Schmidt and Hunter (1977) and recently updated by Hunter and Schmidt (2004). It is a type of meta-analysis aimed at determining the overall validity of measurement scores for a given phenomenon, as well as whether such validity can be generalized from the research setting to other situations. This type of study tests the situation-specificity hypothesis (Callender & Osburn, 1980; Hunter & Schmidt, 2004; Pearlman, Schmidt, & Hunter, 1980; Schmidt & Raju, 2007; Schmidt, Law, Hunter et al., 1993; Schmidt & Hunter, 1977). The hypothesis, originally proposed by Schmidt and Hunter, was re-stated for the purpose of this study as follows: Occupational performance scores observed in assessments vary from situation to situation due to local modifiers [also known as "random effects" (Brannick & Hall, 2001, p. 1)]. Therefore, they cannot be generalized to situations other than where the validation study was conducted without further research.

According to Schmidt and Hunter (1977), the situation-specificity hypothesis was in the past considered to be self-evident in personnel psychology (the field for which VG procedures were originally developed). It originated from the empirical observation that “considerable variability is observed from study to study in raw validity coefficients even when jobs (types of occupations in our study) and tests studied appear to be similar or essentially identical…” (Pearlman, Schmidt, & Hunter, 1980, p. 373). This meant that the validity of personnel selection methods could not be assumed for practical use even though past studies had indicated that they were valid. It also meant that general principles about personnel selection could not be developed because, “the inability to generalize validities makes it impossible to develop the general principles and theories that are necessary to take the field (personnel psychology) beyond a mere technology to the status of a science” (p. 374).

The same can be said of occupational performance. It can be argued that even when occupational performance measurement instruments have been proven valid, their validity only pertains to the research setting. A therapist cannot assume that such instruments are valid in a variety of clinical situations. The assumption that validity cannot be generalized would mean that general principles about occupational performance measurement cannot be developed. In order for therapists to have complete confidence in the validity of their occupational performance instruments, and in order to embark on the process of developing general principles about occupational performance measurement, the situation-specificity hypothesis of occupational performance measurement has to be proven null, hence, the need for VG research.  

In general, the VG method used in this study consisted of the following steps: (1) computation of an estimate of the observed variance of occupational performance score validity coefficients for the studies included in the analysis; (2) computation of an estimate of the variance of those coefficients attributable to statistical artifacts such as sampling error, differences in criterion reliability, differences in test reliability, criterion contamination, computational and typographical errors, range restriction, etc. (because of the limited information provided in the studies that constituted our sample, such as the lack of independent variable reliability coefficients and range restriction data, we could not do a complete meta-analysis; instead, we used Hunter and Schmidt's (2004) "bare-bones" (p. 134) procedures to correct observed variability only for sampling error and variability in test criterion reliability); (3) subtraction of the variance due to statistical artifacts from the observed variance; (4) division of the variance due to artifacts by the observed variance; (5) application of the 75% decision rule suggested by Schmidt and Hunter (1977) and Hunter and Schmidt (2004) to accept or reject the situation-specificity hypothesis, based on how much of the observed variance could be accounted for by statistical artifacts; and (6) determination of the robustness of the validity of occupational performance measurement by computing pre-attenuated (corrected) mean weighted validity coefficients and their standard deviations.

We used the following formulas provided in the procedures described by Hunter and Schmidt:

Vobs=∑[Ni(ri-rm)2]/∑Ni           (1)

where Vobs = the observed variance of occupational performance score validity coefficients; Ni = the sample size associated with the ith validity coefficient; ri = the ith validity coefficient; rm = the weighted mean of the validity coefficients; and ∑Ni = the overall sample size associated with the validity coefficients across all studies included in the analysis.

The uncorrected weighted mean validity coefficients were calculated using the following equation:

rm=∑[Niri]/∑Ni                  (2)

where rm = uncorrected weighted mean validity coefficient; Ni = the sample size associated with the ith validity coefficient; ri = the ith validity coefficient; and ∑Ni = the overall sample size associated with validity coefficients across studies included in the analysis.
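Equations (1) and (2) can be illustrated with a short Python sketch; the sample sizes and validity coefficients below are hypothetical and serve only to show the computation.

```python
# Equations (1) and (2): sample-size-weighted mean validity coefficient
# and the observed (weighted) variance of the coefficients.
# The Ni and ri values below are hypothetical illustrations.

def weighted_mean_r(N, r):
    """Equation (2): rm = sum(Ni * ri) / sum(Ni)."""
    return sum(n * x for n, x in zip(N, r)) / sum(N)

def observed_variance(N, r):
    """Equation (1): Vobs = sum(Ni * (ri - rm)^2) / sum(Ni)."""
    rm = weighted_mean_r(N, r)
    return sum(n * (x - rm) ** 2 for n, x in zip(N, r)) / sum(N)

N = [40, 120, 60]        # hypothetical study sample sizes
r = [0.10, 0.05, 0.20]   # hypothetical validity coefficients

rm = weighted_mean_r(N, r)       # 0.10
Vobs = observed_variance(N, r)   # about 0.0041
```

Weighting by sample size gives the larger studies, whose coefficients carry less sampling error, proportionately more influence on the mean.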

                The attenuating factor for criterion reliability measurement error for each reliability coefficient was calculated using the equation:

                a=√ri                                                       (3)

where a = the attenuating factor and ri = the ith reported test criterion reliability coefficient.

The mean attenuation factor for criterion reliability measurement errors was calculated using the equation:

                am=∑[√ri]/n                                           (4)                                          

where am = mean attenuating factor; ri = the ith reported test criterion reliability coefficient; and n = the number of reliability coefficients in the analysis.

                The pre-attenuated (corrected) mean weighted validity coefficients were calculated using the equation:

                P=rm/am                                 (5)

where P=corrected weighted mean validity coefficient (an estimate of the true population validity coefficient); rm=uncorrected mean weighted validity coefficient; and am=mean attenuating factor.
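The correction in equations (3) through (5) can be sketched in Python. The attenuation-factor function takes test criterion reliability coefficients (the two values passed to it below are hypothetical); the check at the end uses the overall rm and am values reported in Table 2.

```python
import math

def mean_attenuation_factor(reliabilities):
    """Equations (3)-(4): a_i = sqrt(r_i) for each criterion reliability
    coefficient, averaged over the n coefficients."""
    return sum(math.sqrt(x) for x in reliabilities) / len(reliabilities)

def corrected_mean_validity(rm, am):
    """Equation (5): P = rm / am, the pre-attenuated (corrected)
    weighted mean validity coefficient."""
    return rm / am

# Hypothetical reliabilities .64 and .81 give sqrt values .8 and .9:
am_demo = mean_attenuation_factor([0.64, 0.81])   # 0.85

# Check against the overall values reported in Table 2:
# rm = .065 and am = .676 give P = .065 / .676, which rounds to .096.
P = corrected_mean_validity(0.065, 0.676)
```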

                The variability of test criterion reliability attenuation factors was computed using the following equation:

                SDa2=∑(a-am)2/(n-1)                             (6)

where SDa2=Variance of attenuating factors.

                The squared coefficient of variation of the test criterion reliability attenuation factors was computed using the equation:

                V=SDa2/am2                                         (7)

                Variance of reported validity coefficients due to variability in test criterion reliability was computed using the equation:

                S2=P2am2V                                           (8)

The estimated variance due to sample size was calculated using the following equation:

Vs=(1-rm2)2/(Nm-1)                                (9)

where Vs = estimated variance due to sample size; rm = uncorrected weighted mean validity coefficient; and Nm = the mean sample size for all the 19 studies included in our analysis.

Residual variability was calculated by subtracting combined variance due to sampling error and variability in test criterion measurement from the observed variance thus:

VRes=Vobs-Vs-S2                          (10)

where VRes = true or residual variance; Vobs=Observed variance; Vs=Variance due to sampling error; S2=Variance due to study differences in test criterion measurement reliability. The percentage of variance accounted for by a combination of sampling error and variability in test criterion measurement reliability was obtained using the equation:

(Vs+S2)/VObs                                       (11)

                Finally, the pre-attenuated (corrected) residual variance was computed using the equation:

                V(P)=[Vobs-Vs-S2]/am2                      (12)

where V(P)=Corrected residual variance. The corrected standard deviation (SD(P)) was the square-root of the corrected residual variance (V(P)).
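Equations (10) through (12) can be combined into a short sketch. As a check, the overall figures reported in Table 2 (Vobs = .259, Vs + S2 = .011, am = .676) yield roughly 4% of observed variance accounted for and a corrected residual variance of about .54.

```python
def residual_variance(Vobs, artifact_var):
    """Equation (10): VRes = Vobs - (Vs + S2)."""
    return Vobs - artifact_var

def pct_variance_accounted(Vobs, artifact_var):
    """Equation (11): percentage of observed variance explained by
    sampling error and test criterion reliability differences."""
    return 100 * artifact_var / Vobs

def corrected_residual_variance(Vobs, artifact_var, am):
    """Equation (12): V(P) = (Vobs - Vs - S2) / am^2."""
    return (Vobs - artifact_var) / am ** 2

# Overall values from Table 2 (Vs + S2 reported as the single figure .011):
Vobs, artifacts, am = 0.259, 0.011, 0.676
VRes = residual_variance(Vobs, artifacts)              # about .248
pct = pct_variance_accounted(Vobs, artifacts)          # about 4%
VP = corrected_residual_variance(Vobs, artifacts, am)  # about .54
```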

The above description of the procedures and the results of our study were emailed to Dr. Schmidt, one of the developers of the VG method. His feedback indicated that our findings were generally correct (Schmidt, personal communication, January 17, 2008). In order to ensure generalizability of occupational performance score validity coefficients in the event that the situation-specificity hypothesis was rejected, we developed criteria for inclusion of studies in our analysis based on a consistent definition of occupational performance. The criteria required that the following categories of occupational performance be measured in the validation study under consideration:

Self-maintenance (Basic ADLs such as dressing, bathing, toileting, and obtaining nutrition; and Instrumental Activities of Daily Living such as community mobility, shopping for clothes and other items, and shopping for groceries)

Productivity (unpaid work such as home management and care for family members; paid work; volunteering, etc.)

Leisure (both quiet and active recreation)

Education

Play (sports and child play)

Social participation (including community participation, family related occupations, and peer related occupations) (American Occupational Therapy Association [AOTA], 2002; Baum & Christiansen, 2005; Law et al., 2002)

The above listed criteria were derived from the new occupational therapy paradigm, which emphasizes an occupation-based, client-centered, and collaborative approach to intervention as a central component of authentic occupational therapy practice (Ikiugu, 2007; Kielhofner, 2004; Law et al., 2002).

Occupational performance measurement scores reported in the studies included self-reported ratings of performance, observed performance (rated either by a therapist or by other people), interview-based assessment of performance, or a combination of any or all of the above (see the "Data Coding" section below). Also, Schmidt et al. (1993) demonstrated that non-Pearson validity coefficients tended to over-estimate artifactual variance and therefore to under-estimate true variance. They recommended exclusion of such coefficients from VG analysis. Therefore, Spearman rho, Cronbach's alpha, Intraclass Correlation Coefficients (ICC), etc. were not included in our sample. The studies retrieved for our analysis investigated convergent, divergent, criterion-referenced, and predictive validities of occupational performance measurement.

Study Sample
In VG research, individual studies constitute "the study subjects." Such studies were identified by reviewing a book, found in our literature search, that is devoted to a discussion of occupational performance measurement in occupational therapy and occupational science (Law, Baum, & Dunn, 2005). We identified studies discussed in the book in which the validity of occupational performance measurement scores had been investigated and requested the articles through interlibrary loan. One study was reported in a Master's degree thesis, which we also acquired through interlibrary loan.

In addition, we searched a variety of electronic databases for relevant studies published between 1977 and 2007 (a 30-year span). Later, we updated the search to include studies published in 2008. Key phrases used in the search included "occupational and performance"; "occupational, performance, and measurement"; and "validity, occupational performance, and measurement instruments". The databases searched included the EBSCO Mega FILE; Ovid (Cochrane database of systematic reviews, All EBM reviews); Cochrane, DSR, ACP, DARE & CCTR; Health and Psychosocial Instruments, Ovid Medline In-Process and other non-indexed citations, and Ovid Medline 1950 to present; CINAHL; PsycINFO; and OT Search. The history and outcome of the search are reported in Table 1.


Table 1
Obtaining the Data Sample: Search History and Outcome

Key Phrase                                      OT Search   EBSCO Mega FILE   Ovid   PsycINFO   Total
Occupational Performance                              940               134   1285        266    2625
Occupational Performance Measurement                   28                 2     21          2      53
Validity of Occupational Performance Measures          25                 0      2          0      27

As can be seen in Table 1, the key phrase "occupational and performance" resulted in 2625 hits. When the scope was narrowed using the key phrase "occupational, performance, and measurement", the number of hits was reduced to 53. The phrase "validity, occupational performance, and measurement instruments" yielded 27 studies. Thus, 80 studies were found to be closely related to the purpose of our study (i.e., measurement of occupational performance). The abstract for each of the identified studies was reviewed to determine its specific relevance to our objective of determining the robustness and generalizability of the validity of occupational performance measurement scores. In all, 75 studies were retrieved, but only 25 of them were found to be relevant to our investigation. The 75 studies were downloaded online, copied from the bound journal collection in the University of South Dakota (USD) library, or acquired through interlibrary loan.

When studies in which the validity estimate coefficients were non-Pearson were removed, only 14 studies remained. A later search for studies published between the years 2006 and 2008 revealed 5 more studies in which validity of occupational performance measurement using Pearson type validity estimates was investigated. Therefore, a total of 19 studies were included in our analysis. The studies are indicated in the reference list by asterisks (*). The instruments that were the subject of the studies were: Occupational Performance Calculation Guide (OPCG); All about me; Canadian Occupational Performance Measure (COPM); Functional Behavior Profile (FBP); World Health Organization Disability Assessment Schedule (WHO-DAS); Late Life Function and Disability Instrument (LLFDI); Barthel Index; Vineland Adaptive Behavior Scales (VABS); Assessment of Occupational Functioning (AOF); Functional Independence Measure (FIM); Assessment of Motor and Process Skills (AMPS); Occupational Abilities and Performance Scale (OAPS); Executive Function Performance Test (EFPT); 3-Day Physical Activity Recall Questionnaire (3dPAR); and the Activity Card Sort – Hong Kong version (ACS-HK). 

Data Coding
Data from the studies were categorized as follows:

Observation-based occupational performance – Data based on measurement of occupational performance by observation using instruments such as the Assessment of Motor and Process Skills (AMPS), Functional Independence Measure (FIM), etc.;

Interview-based occupational performance – Data based on measurement of occupational performance using interview based instruments such as the Canadian Occupational Performance Measure (COPM);

Self report-based occupational performance – Data based on clients' ratings of their perceived occupational performance using instruments such as the Role Checklist;

Combined Measures – Data based on measurement of occupational performance using instruments that combine observation, interview, and self-rating.

 Procedure
We read the articles retrieved from our search as explained above. We created a matrix in which we outlined the data as follows: study author; instruments that were the subject of the study; type of occupational performance scores gathered using the instrument (observation-based, interview-based, or self-report); type of occupational performance variable measured (e.g., productivity, leisure, play, etc.); type of validation study (convergent, divergent, criterion-referenced, or predictive); sample size (n), and validity estimate coefficient (r). We entered the validity data (study sample sizes and validity coefficients) as laid out in our data matrix into a Microsoft Office 2007 Excel Spreadsheet. Each of the researchers checked the entries at least twice in order to ensure accuracy.
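The coding matrix described above can be pictured as follows; the field names and the single example row are hypothetical illustrations, not data from the 19 studies.

```python
# Illustrative sketch of the coding matrix; the field names and the
# example row are hypothetical, not taken from the studies analyzed.
matrix = [
    {
        "author": "Example et al. (2005)",  # hypothetical study
        "instrument": "COPM",
        "score_type": "interview-based",    # observation/interview/self-report/combined
        "variable": "productivity",         # occupational performance variable measured
        "validation_type": "convergent",    # convergent/divergent/criterion-referenced/predictive
        "n": 58,                            # hypothetical sample size
        "r": 0.12,                          # hypothetical Pearson validity coefficient
    },
]

# The two columns that feed the VG computations:
Ns = [row["n"] for row in matrix]
rs = [row["r"] for row in matrix]
```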


Data Analysis

Based on equations 1 through 12 as explained above, we used the Excel spreadsheet to calculate the uncorrected weighted mean occupational performance validity coefficient for each occupational performance category, the weighted mean validity coefficients corrected for attenuation by variability in the test criterion reliability, the estimated observed variability of the coefficients, the variability attributable to sampling error and variance in test criterion measurement reliability, the residual variability, and the percentage of observed variability explained by the two statistical artifacts whose attenuation was corrected in the analysis. We also computed credibility intervals of the corrected weighted mean validity coefficients at the 95% level using the formula: CI = corrected weighted mean validity coefficient ± 1.96 x SD(P), where CI = Credibility Interval and 1.96 = the critical value (in standard deviation units) at the 95% confidence level (see Hunter & Schmidt, 2004, for explanation).
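The credibility interval computation can be sketched as follows; the P and SD(P) values in the example are hypothetical.

```python
def credibility_interval(P, SDP, z=1.96):
    """95% credibility interval: P +/- 1.96 * SD(P)."""
    return (P - z * SDP, P + z * SDP)

# Hypothetical corrected mean validity coefficient and corrected SD:
lo, hi = credibility_interval(0.10, 0.5)   # (-0.88, 1.08)
```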

Findings

The findings of our data analysis are shown in Table 2.

Table 2
Validity Generalization Meta-Analysis Results for Various Types of Occupational Performance Measurement Scores (n=19)

Type of OP Measure        Total N   Mean N   No. of rs    rm     am      p   Observed Variance   Vs+S2   % of Variance Acct.   V(p)   SD(p)   95% CI
All Studies Combined        40837    91.98         444  .065   .676   .096               .259    .011                     4   .544    .738   -1.40≤p≤1.49
Interview-Based (n=9)       36724   122.01         301  .060   .698   .086               .265    .008                     3   .526    .725   -1.38≤p≤1.46
Observation-Based (n=6)      2455    45.46          54  .021   .531   .040               .204    .023                    11   .643    .802   -1.56≤p≤1.59
Self Report-Based (n=4)      1658    18.63          89  .228   .637   .358               .193    .055                    29   .340    .583   -1.01≤p≤1.28

 

Key: n = number of studies in the analysis; N = sample size; r = reported occupational performance score validity coefficient; rm = uncorrected weighted mean occupational performance score validity coefficient; am = mean attenuation factor for test criterion reliability (test criterion = occupational performance); p = pre-attenuated (corrected) mean weighted validity coefficient; Vs+S2 = combined variance due to sampling error and variability in test criterion measurement reliability; % of Variance Acct. = percentage of occupational performance score validity coefficient variance accounted for by sampling error and test criterion reliability differences among studies; V(p) = corrected (pre-attenuated) residual variance of validity coefficients; SD(p) = corrected (pre-attenuated) standard deviation of validity coefficients; and CI = Credibility Interval (at 95% = p±1.96SD(p)).

 

Robustness of the Corrected Weighted Mean Validity Coefficients of Occupational Performance Scores

All studies combined
The mean weighted r for all the 19 studies was .065. The pre-attenuated mean validity coefficient (P) was .096. According to guidelines by Cohen (1988) and Kraemer, Morgan, Leech, Gliner et al. (2003), this constituted small (low) validity of occupational performance measurement assessments in comparison with typical instruments used in social science research.

Interview-based scores
As can be seen in Table 2, the mean weighted r for interview-based occupational performance measurement was .06 (P=.086). According to Cohen and Kraemer et al., this coefficient similarly constituted low validity.

Observation-based scores

The weighted mean r for observation-based occupational performance measurement scores was .021 (P=.04). Again, this coefficient indicated low validity for this category of occupational performance measurement scores in comparison to typical instruments used in social science research.

Self report-based scores

The mean weighted r for self report-based occupational performance scores was .23 (P=.36). This coefficient, according to Cohen and Kraemer et al., was still low but close to medium validity (r=.30) in comparison with typical instruments used in social science research. The pre-attenuated validity coefficient (P) was clearly in the medium validity range.

 

Generalizability of Validity Coefficients of Occupational Performance Scores

Overall generalizability
The residual overall variability of occupational performance validity coefficients after correction for attenuation by sampling error and variability of test criterion measurement reliability was .544, and only about 4% of observed variance was explained by the two statistical artifacts. Based on the 75% decision rule, the situation-specificity hypothesis could not be rejected. This wide variability of validity coefficients was apparent in the CI, whose pre-attenuated validity coefficient values ranged between P=-1.40 and 1.49 (a spread of more than 1.96 standard deviations on either side of the mean).
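The 75% decision rule applied above can be expressed as a simple check; the overall figure of about 4% comes from Table 2.

```python
def situation_specificity_rejected(pct_accounted, threshold=75.0):
    """Hunter and Schmidt's decision rule: reject the situation-specificity
    hypothesis only when statistical artifacts account for at least 75%
    of the observed variance in validity coefficients."""
    return pct_accounted >= threshold

# Overall result: artifacts explained only about 4% of observed variance,
# so the hypothesis could not be rejected.
rejected = situation_specificity_rejected(4.0)   # False
```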

Interview-based scores
The residual variability of interview based validity coefficients was .526. About 3 % of observed variance was explained by the artifacts. Again, based on the 75% decision rule, situation-specificity hypothesis could not be rejected for interview-based occupational performance measurement score validity. The CI of interview-based occupational performance pre-attenuated validity coefficients ranged between P=-1.38 and 1.46.

Observation-based scores
The corrected residual variability of the observation-based occupational performance scores was .643. About 11% of the observed variance was attributable to sampling error and variability in test criterion measurement reliability. Therefore, the situation-specificity hypothesis could not be rejected, and as can be seen in Table 2, the credibility interval was remarkably large (P=-1.56 to 1.59).

Self report-based Scores
The corrected residual variability for self report-based occupational performance score validity coefficients was .34 (the lowest variability among all occupational performance measurement instruments in our sample). About 29% of the observed variance was attributable to the sampling error and variability in test criterion measurement reliability, again leading to failure to reject the situation-specificity hypothesis. 

Discussion

Validity Generalization analysis results in two mean validity coefficients: one attenuated by statistical artifacts (in our case, sampling error and variability of test criterion reliability), and the other a pre-attenuated (corrected) coefficient. Our interpretation of those mean coefficients was based on Cohen's (1988) guidelines for determining effect sizes and their importance. In applying Cohen's notion of effect sizes, we can think of the effect of an assessment in terms of its ability to detect the indicators of occupational performance as defined in this study. In other words, it is a reference to the effectiveness of an assessment in detecting and measuring occupational performance as we have defined it. Cohen suggested that effect sizes in the social sciences were generally small in comparison to other disciplines because of attenuation of the validity of the measures used and the subtlety of the variables involved. Consequently, he defined effect sizes (d) as follows: small (d=.20; r=.10); medium (d=.50; r=.30); and large (d=.80; r=.50).

Based on the above criteria, our analysis indicated that the weighted mean occupational performance score validity coefficients for all the 19 studies combined, interview-based, and observation-based occupational performance scores were small (r=.065, .060, and .021 respectively). The weighted mean validity coefficient for self-report based scores was small but approached the medium range (r=.23). However, Cohen’s interpretation criteria do not take into account the confounding effect of the sample size or the statistical significance. Of course, we need to remember that as Valentine and Cooper (2003) argued, statistical significance is not the best measure of effect size because it does not provide information about “practical significance or relative impact of the effect size” (p. 1, emphasis original).

Pearson and Hartley (1962) provided guidelines for interpretation of the statistical significance of r based on a chosen power level and p value. Based on Pearson and Hartley's guidelines, the pre-attenuated validity coefficients (P) at the .80 power level and p=.05 can be interpreted as follows: Overall, the mean pre-attenuated validity coefficient (P=.096) with a mean sample size of N=91.98 was not statistically significant; P would need to be at least .196 (the critical value for that mean sample size) in order to be significant. Using the same criteria, the pre-attenuated interview-, observation-, and self-report-based validity coefficients were similarly not statistically significant [mean N=122.01, P=.086 (critical value=.196); mean N=45.46, P=.04 (critical value=.444); and mean N=18.63, P=.36 (critical value=.632), respectively]. Therefore, our analysis indicated that the mean weighted validity coefficients of occupational performance measurement instruments for the studies in our sample were not statistically significant. They were all small in comparison with the validity of assessments used in social science research.

Our search of literature did not reveal other occupational performance validity generalization studies with which we could compare our findings. Ottenbacher, Hsu, Granger, and Fielder (1996) completed a meta-analytic study in which they found rs ranging between .84 and .92. However, they did not use the VG method. Rather, they converted reliability coefficients into Fisher z scores for comparison across studies and then converted them back to reliability coefficients for interpretation. They did not weight their observed rs with sample sizes or correct the mean rs for attenuation by variability in the test criterion reliability. Therefore, their findings were not comparable to ours.

The mean weighted rs in the present study were comparable to those found in other Validity Generalization studies in social science research such as Hackett (1989) who found mean weighted coefficients ranging between -.04 and -.17; and Pearlman, Schmidt, and Hunter (1980) who found mean weighted rs ranging between .07 and .26. Our findings were therefore consistent with Cohen’s assertion alluded to earlier that: “Many effects sought in personality, social, and clinical-psychological research (to which occupational performance may be categorized) are likely to be small…because of the attenuation in validity of the measures employed and the subtlety of the issue frequently involved” (Cohen, 1988, p. 13). Therefore, occupational therapists and scientists should not be alarmed by the small mean validity coefficients of occupational performance measurement instruments found in our analysis. Our findings indicated that the mean validity of the instruments compared well with that of other instruments used in social science research to measure phenomena that are as elusive as occupational performance.

The mean weighted validity coefficient for self-report-based occupational performance scores was the highest in our analysis (rm=.23, P=.34). This finding suggested that among the occupational performance measurement assessments used in occupational therapy, those based on self-report of occupational performance, such as the Assessment of Occupational Functioning (AOF) and the 3-Day Physical Activity Recall Questionnaire (3dPAR), were the most valid. Given the few studies in this category in our sample (n=4), this finding was interesting. It denoted the possibility of high mean validity coefficients for such assessments in future VG research when more studies become available. This finding could be particularly important in light of the current occupational therapy paradigm, which emphasizes client-centeredness in therapeutic discourse. It means that use of instruments that require clients to identify their own occupational performance priorities, consistent with the client-centered focus of the professional paradigm, may be the most valid approach to occupational performance assessment.

In general, more VG research is indicated as more studies become available so that more conclusive findings may be reached. In addition, the reader should note that our findings are limited because we could include only Pearson-type validity coefficients in our analysis. If the entire range of coefficients were included, the mean validities could be different. One of the most important findings of our study was that none of the occupational performance score validity coefficients was found to be generalizable under the 75% decision rule proposed by Schmidt and Hunter (1977) and Hunter and Schmidt (2004) after correction for variability due to statistical artifacts. Self-report-based occupational performance scores were the most generalizable, with 29% of observed variance attributable to sampling error and variability in test criterion measurement reliability. Interview-based occupational performance validity coefficients were the least generalizable, with the two statistical artifacts accounting for only 3% of observed variance.
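The computation behind the 75% decision rule can be sketched as a "bare-bones" Hunter-Schmidt analysis: compute the sample-size weighted mean r, the weighted observed variance of the rs, and the variance expected from sampling error alone, then check what fraction of the observed variance the artifact explains. The coefficients and sample sizes below are illustrative only, not the study's data:

```python
def bare_bones_vg(rs, ns):
    """Bare-bones Hunter-Schmidt VG analysis with the 75% decision rule.

    A sketch using the standard formulas: sampling-error variance is
    estimated as (1 - r_bar**2)**2 / (n_bar - 1), where n_bar is the
    mean sample size across studies.
    """
    N = sum(ns)
    r_bar = sum(n * r for n, r in zip(ns, rs)) / N          # weighted mean r
    var_obs = sum(n * (r - r_bar) ** 2
                  for n, r in zip(ns, rs)) / N              # observed variance
    n_bar = N / len(ns)
    var_err = (1 - r_bar ** 2) ** 2 / (n_bar - 1)           # sampling-error variance
    pct = 100 * var_err / var_obs
    # 75% rule: if artifacts explain >= 75% of observed variance,
    # validity is treated as generalizable (situation-specificity rejected)
    return r_bar, var_obs, pct, pct >= 75

# Illustrative validity coefficients and sample sizes
rs = [0.02, 0.40, 0.05, 0.35]
ns = [100, 80, 120, 90]
r_bar, var_obs, pct, generalizable = bare_bones_vg(rs, ns)
print(f"mean weighted r={r_bar:.3f}, "
      f"% variance from sampling error={pct:.0f}%, "
      f"generalizable={generalizable}")
```

In this sketch sampling error explains only about a third of the observed variance, so the rule retains situation-specificity, just as it did for every category in our analysis. Note that with few studies, var_obs is itself a noisy quantity, which is the caution Hunter and Schmidt raise about mechanical use of the rule.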

This lack of generalizability suggests that therapists cannot automatically assume validity of occupational performance assessments (note that this statement applies only to Pearson validity coefficients), even if such instruments have been proven valid in prior research. However, it is important to remember that only variance due to sampling error and variability in test criterion measurement reliability was accounted for. The studies included in our analysis did not provide enough information to allow us to do a complete meta-analysis in which all attenuating factors would have been accounted for. Therefore, more validation studies are indicated in order to determine conclusively the generalizability of the validity of occupational performance measurement instruments in the clinical situations where therapists need to use them.

Furthermore, it may be clinically useful to bear in mind that, according to our findings, self-report-based occupational performance scores had the most robust mean validity coefficient and the highest generalizability from research to clinical settings, even in our limited meta-analysis in which many attenuating factors were not accounted for. This finding was encouraging because it indicated that the assessment instruments most consistent with the occupational therapy paradigm had the greatest promise of being found valid and transferable to clinical situations in future VG analyses involving more studies. However, our findings were not conclusive. Until further analysis is completed to reach more definitive conclusions about occupational performance score validity generalization, therapists and scientists need to review the available validity research for instruments that they want to use in practice and make their decisions based on clinical expertise and the specific circumstances of practice.

It has also been suggested in the literature that the percentage of observed variance attributable to statistical artifacts may not be a good basis for decisions about VG. Rather, it is "the actual amount of observed variance" that is important (McDaniel, Hirsh, Schmidt, Raju, & Hunter, 1986, p. 144). Even Hunter and Schmidt (2004) suggested that the 75% rule has been misinterpreted: it is not a means of statistically testing for chance fluctuations due to sampling error. In situations such as ours, where the number of studies in the meta-analysis was small, the observed variance could be significantly larger than can be accounted for by sampling error alone. In that sense, the relatively small variance (.193) of self-report-based validity coefficients is of interest, because it means that the validity of such instruments could conceivably be generalizable. However, there is no meaningful way, other than the 75% rule, of determining an absolute value of observed variance that would be considered small enough to allow generalizability.

Based on our findings, it may be beneficial to replicate this study including both published and unpublished studies in order to determine whether generalizability of occupational performance score validity coefficients is really a problem or whether our findings are due to limitations of the types of studies included in the analysis. In addition, an investigation of modifiers that increase the variability of validity coefficients from situation to situation may be beneficial.

Limitations and Recommendations
One of the limitations of this study may be the small number of studies included in the meta-analysis (only 19). However, further literature review indicated that this is typical of validity generalization research. In his review, Hackett (1989) found that the number of studies included in a variety of meta-analyses ranged between 20 and 31, and the number of correlations between 106 and 707. It is therefore typical for a meta-analysis to include a small number of studies, perhaps because of the stringent inclusion criteria necessary to ensure the commonality of characteristics that makes generalizability possible.

A more significant limitation was that only published studies were included in our analysis. This could have tended to exclude methodologically poor studies, such as those with inadequate power due to small sample sizes, thereby failing to take into account the broad spectrum of validation research and skewing the findings (Schmidt & Hunter, 1977). That is why many VG investigators seek to identify both published and unpublished studies for inclusion in their meta-analyses (Pearlman, Schmidt, & Hunter, 1980).

A strength of this study was that occupational performance measurement was stringently defined, consistent with the current occupational therapy paradigm. Therefore, rejection of the situation-specificity hypothesis would have meant that a therapist needed only to analyze the clinical situation in which he or she intended to use an instrument. If the circumstances of clinical practice were similar to those of the validation research, and if the instrument measured occupational performance as defined in this study, the therapist could have used it with assurance of validity, without need for further validation.

It is recommended that this study be replicated and that an attempt be made to include unpublished studies in the replication. Also, in future studies, attempts should be made to convert non-Pearson-type validity coefficients to forms analyzable using the VG method in order to make the meta-analysis more inclusive and complete. In addition, situation-specific modifiers that increase the variability of validity coefficients, and therefore make their generalizability difficult, should be investigated. One way to do that may be to identify the common characteristics of occupational performance measurement instruments and to complete separate meta-analyses of groups of assessments based on those characteristics. That would reveal groups in which the variances of validity coefficients differ significantly, allowing conclusions to be drawn about the attenuating factors that are most important in occupational performance measurement. Finally, the present meta-analysis should be regularly updated using the methods proposed by Schmidt and Raju (2007) as new validation studies become available.

Conclusion
In this study, a meta-analysis was completed using Hunter and Schmidt's (2004) methods in order to determine the robustness of the mean validity coefficients of occupational performance scores and the generalizability of those coefficients from research to clinical situations. Our analysis revealed that the mean weighted validity coefficients of self-report-based occupational performance measurement scores were the most robust, suggesting that self-report-based occupational performance measurement instruments were the most valid. After correction of observed variance by subtracting variability due to sampling error and variability of the test criterion measurement reliability, too much residual variance remained unexplained based on the 75% decision rule. Therefore, situation-specificity was not rejected, and generalizability of the validity of occupational performance measurement scores could not be justified. It is important to bear in mind that many attenuating factors that could have explained such variance were not accounted for because of the limited information provided in the studies included in our sample. Because of that limitation, definite conclusions could not be drawn. Future research should investigate situation modifiers that increase the variability of validity coefficients, making them ungeneralizable.


References

*Aitken, D., & Bohannon, R. W. (2001). Functional Independence Measure versus Short Form-36: relative responsiveness and validity. International Journal of Rehabilitation, 24, 65-68.

American Occupational Therapy Association. (2002). Occupational therapy practice framework: Domain and process. Bethesda, MD: AOTA Press.

American Occupational Therapy Foundation. (2007). Resource center - Evidence-based practice. Retrieved May 10, 2007, from http://www.aotf.org/html/evidence.html.

Baum, C. M., & Christiansen, C. H. (2005). Person-environment-occupation-performance: An occupation-based framework for practice. In C. H. Christiansen & C. M. Baum (Eds.), Occupational therapy: Performance, participation, and well-being (pp. 243-255). Thorofare, NJ: Slack.

*Baum, C. M., Connor, L. T., Morrison, T., Hahn, M., Dromerick, A. W., & Edwards, D. F. (2008). Reliability, validity, and clinical utility of the executive function performance test: A measure of executive function in a sample of people with stroke. American Journal of Occupational Therapy, 62, 446-455.

*Baum, C. M., Edwards, D. F., & Morrow-Howell, N. (1993). Identification and measurement of productive behaviors in senile dementia of the Alzheimer type. The Gerontologist, 33, 403-8.

Bennett, S., & Townsend, L. (2006). Evidence-based practice in occupational therapy: International initiatives. World Federation of Occupational Therapists Bulletin, 53, 6-11.

Brannick, M. T., & Hall, S. M. (2001, April). Reducing bias in the Schmidt-Hunter meta-analysis. Poster session presented at the 16th Annual Conference of the Society for Industrial and Organization Psychology, San Diego, CA.

Callender, J. C., & Osburn, H. G. (1980). Development and test of a new model for validity generalization. Journal of Applied Psychology, 65, 543-558.

Canadian Association of Occupational Therapists. (2007). Joint position statement on evidence-based occupational therapy 1999. Retrieved May 10, 2007, from http://www.caot.ca/default.asp?ChangeID=166&pageID=156.

*Carpenter, L., Baker, G. A., & Tydesley, B. (2001). The use of the Canadian Occupational Performance Measure as an outcome of a pain management program. Canadian Journal of Occupational Therapy, 68(1), 16-22.

*Chan, V. W., Chung, J. C., & Packer, T. L. (2006). Validity and reliability of the activity card sort – Hong Kong version. Occupational Therapy Journal of Research: Occupation, Participation and Health, 26(4), 152-158.

*Chwastiak, L. A., & von Korff, M. (2003). Disability in depression and back pain evaluation of the World Health Organization Disability Assessment Schedule (WHO DAS II) in a primary care setting. Journal of Clinical Epidemiology, 56, 507-14.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Coster, W. J., Gillette, N., Law, M., Lieberman, D., & Scheer, J. (2004). International conference on evidence-based occupational therapy. Retrieved May 10, 2007, from http://www.aotf.org/html/evidence.html.

DePoy, E., & Gitlin, L. N. (2005). Introduction to research: Understanding and applying multiple strategies (3rd ed.). St. Louis, MO: Elsevier Mosby.

*Dubuc, N., Haley, S. M., Ni, P., Kooyoomjian, J. T., & Jette, A. M. (2004). Function and disability in late life: Comparison of the Late-Life Function and Disability Instrument to the Short-Form-36 and the London Handicap Scale. Disability and Rehabilitation, 26(6), 362-70.

*Fricke, J., & Unsworth, C. A. (1996). Inter-rater reliability of the original and modified Barthel Index and a comparison with the Functional Independence Measure. Australian Occupational Therapy Journal, 43, 22-9.

Hackett, R. D. (1989). Work attitudes and employee absenteeism: A synthesis of the literature. Journal of Occupational Psychology, 62, 235-248.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Thousand Oaks, CA: Sage.

Ikiugu, M. N. (2007). Psychosocial conceptual practice models in occupational therapy: Building adaptive capability. St. Louis, MO: Elsevier/Mosby

*Ikiugu, M., & Ciaravino, E. A. (2006). Assisting adolescents experiencing emotional and behavioral difficulties (EBD) transition to adulthood. International Journal of Psychosocial Rehabilitation, 10(2), 57-78.

Ilott, L. (2004). Challenges and strategic solutions for a research emergent profession. American Journal of Occupational Therapy, 58, 347-352.

*Karidi, M. V., Papakonstantinou, K., Stefanis, N., Zografou, M., Karamouzi, G., Skaltsi, P., et al. (2005). Occupational abilities and performance scale: Reliability-validity assessment factor analysis. Social Psychiatry and Psychiatric Epidemiology, 40, 417-424.

Kielhofner, G. (2004). Conceptual foundations of occupational therapy (3rd ed.). Philadelphia: FA Davis.

Kraemer, H., Morgan, G. A., Leech, N. L., Gliner, J. A., Vaske, J. J., & Harmon, R. J. (2003). Measures of clinical significance. Journal of the American Academy of Child and Adolescent Psychiatry, 42(12), 1524-1529.

Law, M., Baum, C., & Dunn, W. (2005). Measuring occupational performance: Supporting best practice in occupational therapy. Thorofare, NJ: Slack.

Law, M., Polatajko, S., Baptiste, S., & Townsend, E. (2002). Core concepts of occupational therapy. In E. Townsend (Ed.), Enabling occupation: An occupational therapy perspective (pp. 29-56). Ottawa, ON: Canadian Association of Occupational Therapists.

*McColl, M. A., Paterson, M., Davies, D., Doubt, L., & Law, M. (2000). Validity and community utility of the Canadian Occupational Performance Measure. Canadian Journal of Occupational Therapy, 67(1), 94-100.

McDaniel, M. A., Hirsh, G. R., Schmidt, F. L., Raju, N., & Hunter, J. E. (1986). Interpreting the results of meta-analytic research: A comment on Schmitt, Gooding, Noe, and Kirsch (1984). Personnel Psychology, 39, 141-148.

*McNulty, M. C., & Fisher, A. G. (2001). Validity of using the Assessment of Motor and Process Skills to estimate overall home safety in persons with psychiatric conditions. American Journal of Occupational Therapy, 55, 649–55.

*Missiuna, C. (1998). Development of "All About Me", a scale that measures children's perceived motor competence. Occupational Therapy Journal of Research: Occupation, Participation, and Health, 18(2), 85-108.

*Mori, A., & Sugimura, K. (2007). Characteristics of Assessment of Motor and Process Skills and Rivermead Behavioral Memory Test in elderly women with dementia and community-dwelling women. Nagoya Journal of Medical Science, 69, 45-53.

Ottenbacher, K., Hsu, Y., Granger, C., & Fiedler, R. (1996). The reliability of the functional independence measure: A quantitative review. Archives of Physical Medicine and Rehabilitation, 77, 1226-1232.

Pearlman, K., Schmidt, F. L., & Hunter, J. E. (1980). Validity generalization results for tests used to predict job proficiency and training success in clerical occupations. Journal of Applied Psychology, 65, 373-406.

Pearson, E. S., & Hartley, H. O. (Eds.). (1962). Biometrika tables for statisticians (2nd ed.). Cambridge, England: Cambridge University Press.

Polatajko, H. J. (2006). In search of evidence: Strategies for an evidence-based practice process. Occupational Therapy Journal of Research, 26, 2-3.

*Ripat, J., Etcheverry, E., Cooper, J., & Tate, R. (2001). A comparison of the Canadian Occupational Performance Measure and the Health Assessment Questionnaire.  Canadian Journal of Occupational Therapy, 68(4), 247-53.

*Rochman, D. L., Ray, S. A., Kulich, R. J., Mehta, N. R., & Driscoll, S. (2008). Validity and utility of the Canadian Occupational Performance Measure as an outcome measure in a craniofacial pain center. Occupational Therapy Journal of Research: Occupation, Participation and Health, 28(1), 4-11.

*Rosenbaum, P., Saigal, S., Szatmari, P., & Hoult, L. (1995). Vineland Adaptive Behavior Scales as a summary of function outcome of extremely low birth weight children. Developmental Medicine and Childhood Neurology, 37, 577-586.

Schmidt, F. L., & Raju, N. S. (2007). Updating meta-analytic research findings: Bayesian approaches versus the medical model. Journal of Applied Psychology, 92, 297-308.

Schmidt, F. L., Law, K., Hunter, J. E., Rothstein, H. R., Pearlman, K., & McDaniel, M.  (1993). Refinements in validity generalization methods: Implications for the situational specificity hypothesis. Journal of Applied Psychology, 78(1), 3-12.

Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to the problem of validity generalization. Journal of Applied Psychology, 62, 529-540.

*Stanley, R., Boshoff, K., & Dollman, J. (2007). The concurrent validity of the 3-day physical activity recall questionnaire administered to female adolescents aged 12-14 years. Australian Occupational Therapy Journal, 54, 294-302.

Trombly, C. A., & Ma, H. (2002). A synthesis of the effects of occupational therapy for persons with stroke, part I: Restoration of roles, tasks, and activities. American Journal of Occupational Therapy, 56, 250–259.

Valentine, J. C., & Cooper, H. (2003). Effect size substantive interpretation guidelines: Issues in the interpretation of effect sizes. Washington, DC: What Works Clearinghouse.

*Watts, J. H., Kielhofner, G., Bauer, D., Gregory, M., & Valentine, D. (1986). The Assessment of Occupational Functioning: A screening tool for use in long-term care. American Journal of Occupational Therapy, 40, 231-240.

*Studies constituting the sample that we analyzed.

 





Copyright © 2009 Southern Development Group, SA. All Rights Reserved.  
A Private Non-Profit Agency for the good of all, 
published in the UK & Honduras