The International Journal of Psychosocial Rehabilitation
Methodological Approaches in Mental Health
Services Research and Program Evaluation

Arthur J. Anderson, Ph.D.

Consulting Clinical Psychologist - Honduras
Adjunct Faculty - Antioch University
Anderson, A. J. (1999). Methodological Approaches in Mental Health Services Research
and Program Evaluation. International Journal of Psychosocial Rehabilitation. 4, 73-92

Abstract: This paper reviews the key concepts in mental health program assessment, efficacy studies and assessment methodologies. It reviews historical developments in program evaluation methodologies, recent studies, the Journal of Consulting and Clinical Psychology's special section on methodological developments, NIMH's National Plan to Improve Research in Mental Health Services, and NIMH guidelines for future mental health service research. A comparison of the major methodological approaches and detailed discussion of fourth generation evaluation research methodology is presented. Evaluation research methodology is found to be the most effective approach in the study of overall program efficacy.

Introduction: Mental health programs, along with other health and social welfare services, are coming under increased scrutiny and evaluation. With consistently smaller budget allocations for mental health, many federally funded programs are being radically changed, some severely curtailed, and the relationship between government and private sector providers is being realigned (Inouye, 1983; Klerman, 1974). In addition, state and local government agencies that were expected to reimburse mental health programs for federal shortfalls in funding have not been universally successful in meeting this challenge (NIMH, 1991). Thus, new research into the efficacy of mental health services has been called for to meet these growing challenges. However, because traditional research methods have not been readily applicable to the study of applied health programs, a search for more appropriate methodological approaches is now underway (Newman, Howard, Windle, & Hohmann, 1994).

These developments are especially significant for mental health services, which have been more regulated and financially supported by government than any other service within the health sector in the past two hundred years (Rothman, 1972). Throughout this period there have been numerous cycles of mental health reform and innovation, followed by phases of criticism, dissension and retrenchment (Bockman, 1963; Caplan & Caplan, 1969; Deutsch, 1948; Grob, 1973). While the "reforms in mental health have coincided with periods of progressive social change in the larger American Society, phases of reactions, criticism and retrenchments have occurred with the aftermath of war and economic decline" (Klerman, 1974, p.783). The current economic slump of the nineties, following the prosperity and massive federal spending of the eighties, continues the cyclic pattern of change in mental health services.

After a decade and a half of growth in mental health and substance abuse services, a number of criticisms have been leveled at the mental health sector. Chief among these criticisms is that of program effectiveness (NIMH, 1989). To date, there are very few applied or experimental research studies that address program efficacy (PsycScan, 1995). Without knowing what programs and/or treatment models effectively work for a variety of patient populations in a variety of settings, legislators and grant funding sources have no way of planning where and how their limited resources should be spent.

Inadequate research strategies and methodologies have been cited as the primary reason why program efficacy has not been studied (Newman et al., 1994). Mental health services research cuts across the disciplines of economics, sociology, epidemiology, political science and psychology. One of the prime purposes of mental health service research is to provide empirical evidence and support to guide policy decisions at all levels of government and in nongovernmental organizations (NGOs). However, until recently, very few efficacy studies of mental health programs or their associated models have been reported in the professional psychological literature (PsycScan, 1994).

Newman (1994) writes that until recently, clinical psychologists have tended to ignore and not pursue mental health services research. This has been a direct result of the limitations in methodological training that psychologists receive and a bias against such research in the publication standards of the professional and academic literature. According to Newman, studies that assess success rates and program effectiveness in mental health service programs have not been generally deemed worthy of publication. In addition, most clinical psychologists and clinical researchers are trained in experimental and quasi experimental techniques that make it very difficult to adequately evaluate the global, multifaceted, molar effects found in applied treatment programs (Guba & Lincoln, 1989).

The methodological deficits and bias toward the scientific method in professional publications have made it difficult for psychologists to develop and utilize research methodologies to fully assess the efficacy of mental health service programs and program models in both the public sector and NGOs (Clarke, 1995; Newman & Howard, 1991).

Researchers who investigate programs and clinical factors related to improving the quality and impact of mental health services are often handicapped by the perceived legitimacy of their applied research and the methodologic approaches they utilize (Newman et al., 1994). Historically, traditional research methods and journal/grant review criteria have not taken into account the global questions and systemic points of view necessary to fully understand the therapeutic delivery systems under evaluation. Thus, there has not been a coordinated, sustained effort to determine program efficacy for the majority of human service project initiatives (NIMH, 1989).

Many program and project research evaluations attempt to present the factors and/or 'facts' uncovered in their program evaluations, utilizing traditional experimental, quasi experimental, causal comparative, correlational and other approaches. Such evaluations attempt to identify the most salient factors for good program performance on a molecular level. In the process, many of the characteristics and practices that define a successful program may be ignored or dismissed as inappropriate or unimportant to the objectives of the research. For example, in an investigation of a psychotherapy program, investigators may choose to examine only the mean or median number of therapeutic hours received in a voluntary outpatient program. Though this may or may not relate to overall patient satisfaction, motivation and progress, such process measures do not determine the overall performance or level of program effectiveness. Thus, almost no programmatic conclusions can be made on the basis of this information.

Despite significant clinical and basic research progress made in the diagnosis and treatment of mental disorders over the past two decades, many questions about how to provide high quality, effective treatment services still have not been answered. For people with severe, persistent, disabling mental disorders, this situation means that individual diagnoses may be inaccurate, treatment plans may be inadequate or ineffective, and essential services may not be available (Lalley et al., 1992; NIMH, 1991). As a consequence, such individuals are often forced not only to endure a lonely struggle against the despair and distress caused by their mental illness, but also to negotiate a confusing, fragmented maze of human services, created by a wide range of often well meaning public and private sector service providers.

Instead of concentrating on determining the individual facts and salient factors associated with successful treatment outcomes, human service researchers should be more concerned with one global question that allows for a more holistic examination of program worth: "What works, for whom, under what circumstances?" (NIMH, 1991, p.vii). The net effect of any treatment program or human service project is determined by the integrated use of multiple, interactive program components, delivered at a site conducive to recovery and/or rehabilitation to a population that will be receptive to such treatment (Breakey, 1987; Minkoff, 1991). Since successful treatment outcomes depend on the global interaction of all these factors, research methodologies used to investigate such programs must also mirror this global, molar intervention to accurately determine whether successful outcomes have indeed occurred.

This paper provides a systematic review of human service research methodologies and prevalent types of investigations. Its focus is on the central methodological issues in determining mental health program and program model efficacy. Though the research and methodologies noted throughout this paper do not cover the entire scope of mental health service research, they are presented as a cross section of the most prevalent types of program evaluation and exemplify the scope of methodological and conceptual issues in this area.

Critical Methodological Factors in Assessing Program Efficacy

Over the years there have been a number of review articles that have called attention to the need for an increased research effort into efficacy studies of mental health services (Inouye, 1983; Klerman, 1974; Newman et al., 1994). However, until recently, these articles only addressed the need to increase the research effort without recognizing the methodological developments that would be necessary to adequately assess program strengths.

Klerman (1974) gives the first comprehensive account of the state of mental health service research. In his descriptive article he not only identifies the major stakeholders that should be included in evaluation research, but outlines the major concerns voiced by each constituency. He notes that while the public at large, the courts, mental health professionals, and government agencies all have an active stake in the results of such research, each has a different focus and agenda for the outcomes of evaluation research studies, and requires varied types of data with which to formulate its concerns as to how, where and in what manner mental health programs should operate. The identification and recognition of the needs of all major stakeholders in any mental health service program is a critical step that is overlooked in most evaluation studies (Guba & Lincoln, 1989).

The public at large has an active stake in new mental health services evaluations (Klerman, 1974). In some cases an adversarial climate has developed between mental health program critics in the general public and mental health professionals and administrators. Because many critics feel that mental health has expanded too much into areas that had previously been regarded as social deviance or legal misdemeanor, such as substance abuse and treatment for the homeless, the public at large mirrors professional uncertainty about treatment adequacy, clinical training for paraprofessionals, and what does or does not constitute effective treatment for various patient populations. This reflects a lessening of public trust and confidence in mental health services that parallels the erosion of funding and governmental support (Klerman, 1972). In addition, community groups are now seeking a more active voice in the operation of mental health service programs within their neighborhoods and catchment areas. In general, these groups want to ensure that treatment programs maintain standards which will protect and enhance their community, and not place the public or patients at risk (NIMH, 1991).

Federal and State courts have become increasingly involved in mental health service programs over the past 20 years. Prison based substance abuse and mental health programs have markedly increased over the past decade and a half (NIMH, 1991). However, most of these programs have evolved due to court mandated levels of care and have mainly documented criminal recidivism as the sole measure of program efficacy (Robitscher, 1972). Since the courts have mandated this treatment and view success in treatment as a key factor in rehabilitating both the involuntary hospital patient and the mentally ill legal offender, they have become more interested in mental health services evaluation as well. Thus, the court systems are active stakeholders in any efficacy evaluation of mental health services and seek information as to the type and level of services that will be delivered to disordered, disabled and incarcerated individuals.

The court system has also been a major contributor to the development of program models and standards of practice for both hospital and community based mental health service programs. Due to general concern for patients' civil rights and for the possible infringement of their personal liberties in cases of involuntary hospitalization, a number of landmark court decisions mandated not only effective treatment but treatment at the least restrictive level (Klerman, 1972; Robitscher, 1972). Moreover, there has been increased concern about the depersonalization and institutional dependence fostered by large public hospitals. Within the mental health professions, there is a general awareness that large hospital based programs become professionally and therapeutically bankrupt and ineffective. This sentiment fostered the creation of community mental health centers, which are felt to provide alternatives to the low levels of institutional care previously provided to the poor and disabled in large public hospitals (Weston, 1972).

Such mental health program interest by the court and legal systems has been the most significant reason for reforms in mental health services including the community mental health center programs (Klerman, 1972). By concentrating on the difficulties and dissatisfaction encountered with the large public mental hospitals, particularly the county, state, and Veterans Administration hospitals, the courts have mandated improvements in treatment and service programming that have led to significant reforms in mental health services. These reforms have resulted in a decrease in program size, emphasis on community based treatment, and increased intensity of treatment in both community and hospital based programs. They have also been shown to increase the probability of more rapid discharges and reduced recidivism (Farkas & Anthony, 1991; Lamb, 1972; Ullman, 1964).

"At all levels of government, federal, state and local, evaluation efforts are frequently initiated by fiscal and budgetary agencies" (Klerman, 1972, p.784). Increasingly, political, fiscal and administrative decisions regarding mental health programs and their associated treatment models are being made on the basis of fiscal goals to deliver the most effective programs possible for the least amount of funding. In addition, state and local agencies charged with monitoring and promoting mental health service research and delivery have been increasing their efforts to determine what constitutes effective mental health programming for a variety of patient populations. Initiatives to reform mental health care have precipitated legislation to develop and expand state and local commissions to further investigate program utility and effectiveness (Scott & Ginsburg, 1994; Frank et al., 1994).

At the federal level, service research has become a high priority for agencies responsible for mental health funding and monitoring. In fact, research funding at this level has been increased from $90 million in 1992 to a projected $369 million budgeted for Fiscal Year 1997 (NIMH, 1991). With the reorganization of ADAMHA (Alcohol, Drug Abuse, and Mental Health Administration) in 1992, the National Institute of Mental Health (NIMH), the National Institute on Drug Abuse (NIDA), and the National Institute on Alcohol Abuse and Alcoholism (NIAAA) must devote no less than 15% of their total budget to health care services research, all coordinated by NIMH [ADAMHA Reorganization Act, 464R (f)(2)]. This increase in funding clearly demonstrates an increased level of commitment by the federal government to improving service research and overall program effectiveness for the mental health program consumer.

Mental health consumers and their families have a very active stake in the outcome of program research but, until recently, have had little influence on the methods or outcomes of the research process. Families of the severely disturbed and the mentally ill themselves have to shoulder tremendous financial and emotional burdens. Each year, 65% of discharged psychiatric patients (approximately 1.5 million) return home and live with their families (NIMH, 1991).

Due to the high cost of hospitalization, many of these patients return home earlier than they would have in the past, still disabled by psychiatric symptoms (NIMH, 1992). State and local governments are also beginning to recognize the stake that patients and their families have in mental health treatment by enacting legislation to give the consumer and his/her family a voice in the therapeutic process (NYSOHM, 1990).

The National Institute of Mental Health (1991) credits the National Alliance for the Mentally Ill (NAMI) with a successful lobbying effort for the inclusion of the mentally ill and their families into the evaluative research process. NAMI together with Family Alliance for the Mentally Ill (FAMI) were developed as grass roots organizations to serve as advocacy groups on behalf of mentally ill persons and their families. These organizations bring the problems and issues of the mentally ill and their families to the attention of local, state and federal governments. In evaluative research studies, NAMI has championed the cause of "research designed to identify ways to help patients readjust to the community within the least restrictive environment possible and to prevent relapse through early intervention" (NIMH, 1991, p.13). In "Clinical Services Research: Enhancing the Real World Applications of Clinical Science" (1991a), NIMH has outlined the critical points that NAMI and FAMI advocate for service evaluation research studies.

"Researchers must carefully assess the ramifications of family involvement in the care of a mentally ill member including the characteristics and conditions of caregiving families; the degree and varieties of family stress; the effectiveness of various coping and adaptation patterns; and timing and extent of caregiver burnout; and the impact of various kinds of respite care for family members.

Studies should be performed that are focused on family issues and produce results to assist families with mentally ill members in functioning more effectively and with less turmoil. Such studies would provide effective education in technique for dealing with mentally ill persons without succumbing to the overwhelming anxiety; motivating the patient to become more self sufficient; and understanding and communicating appropriate expectations. An essential need is for research on the long term effectiveness of such family education programs both in helping the patient and in reducing the family's burden.

For generations, families with mentally ill relatives have dealt with the problems of violence toward family members, exacerbated now as a result of substance abuse by the seriously mentally ill. Investigators must focus on the predictable, frightening and violent behavior that patients may exhibit toward family members, with the goal of developing more efficient criteria for predicting such behavior and more objective ways to manage and prevent it. Such studies must place a high priority on meeting the needs of families for early education, prevention, and intervention. In this connection, attention should be given to identifying techniques of family adaptation that have proved effective.

Dealing with mental illness is expensive. Families become frustrated and angry as their savings dwindle, often with meager results. A guide on how to obtain the most effective services even with limited personal resources would be a welcome aid. Such a guide, based on evaluative research, not opinion, could help families make informed decisions about more selective use of mental health services." (NIMH, 1991a, pp. 12-13)

Together, these points highlight the very active stake that the mentally ill and their families have in evaluative research. Without taking these points into consideration, research investigations to determine therapeutic efficacy cannot be complete.

Mental health professional groups also have an active stake in the success of programs and in the outcomes of evaluation studies of program services (Isaac, 1971; Weiss, 1972). Concern for effective program planning and therapeutic results can be seen in the various calls for evaluation research by a number of professional groups. The American Psychiatric Association, The National Association for Mental Health, and American Psychological Association (APA) have, at various times, all called for increased research efforts in program effectiveness (Newman 1994, 1991; Lalley, 1992; Klerman, 1971). Though these calls for more research demonstrate an interest and active stake on the part of professional groups, little if any comprehensive evaluation research has been performed to date.

In response to the NIMH (1991) call for increased service research, the APA published a special section on Mental Health Services Research in the Journal of Consulting and Clinical Psychology (Newman, Howard, Windle, & Hohmann, 1994). In this special section, the authors not only reviewed the U.S. "National Plan of Research to Improve Services", but presented a series of research articles that exemplified methodological developments in this area of psychological research. These studies were presented to demonstrate innovative methodological approaches in assessing program and service efficacy. However, each of the studies appears to be focused on a different aspect of overall program efficacy, and is consequently limited in its ability to demonstrate overall program effectiveness or meet the research goals set forth in the NIMH national agenda. As discussed below, each of these studies failed to demonstrate program or service efficacy due to the positivist reductionist approach utilized by the researchers.

McGrew, Bond, Dietzen and Salyers (1994) address a key issue in services research: how to measure the fidelity of a mental health service intervention. They assembled a panel of 20 experts to first develop an index and then use it to rate 18 mental health programs that follow the assertive community treatment model (Witheridge, 1991). This model stresses active psychosocial rehabilitation to improve overall functioning and promote independent community living for severely and persistently disturbed psychiatric patients. The experts assessed each program's staffing patterns, organization and service domains on the index and then compared the respective programs on the outcome measure of reduction in days of hospitalization.

Newman et al. (1994) claimed that this study not only identified key ingredients of the service model, but was also able to identify how each program might improve its provision of service. However, given the data provided by this study using a causal comparative methodological approach, it is not possible to make such claims of program efficacy. The only stakeholder viewpoint represented in this investigation was that of the 20 mental health professionals who comprised the panel of raters. The ratings of patients, families, funding agencies, associated courts, and other interested groups may have been very different from those of the raters. In addition, other program variables such as environment, patient population pool and catchment differences, as well as patient characteristics and discharge differences, may have influenced and skewed the results of this study. Thus, no efficacy claims can be made from the data that were presented. The methodology of this study reduced the rich set of program variables to those that were believed to be related to program success. Thus, it tried to determine a cause and effect relationship among a few predetermined variables without justification. Though this study followed accepted standards for research and publication, it failed to describe the interrelated, molar effects of all program variables because it reduced the scope of the study to a molecular level of program assessment.

Studies by Vessey, Howard, Lueger, Kachele and Mergenthaler (1994) and Yeaton (1994) addressed the quantity of psychotherapeutic interventions in this special section on methodological approaches. These studies assessed the dose response rates of therapeutic intervention in individual psychotherapy and self help groups. Vessey et al. (1994) attempted to demonstrate a relationship between the amount of psychotherapeutic time and effective treatment outcomes and innovatively adopted a causal comparative study design. On the other hand, Yeaton's (1994) investigation used a correlational approach to compare the amount of actual attendance in self help groups to positive therapeutic outcomes. Though they utilized two different methodological approaches, both studies concluded that time and 'dosage' of therapeutic intervention had an impact on psychotherapy outcomes.

However, this conclusion cannot be validated solely on the basis of the data collected and reported in these studies. Additional programmatic, demographic, and clinical indicators could be responsible for the positive correlations and other statistical evidence reported in their results. Both study approaches contained the same conceptual deficit as the McGrew et al. (1994) study: they reduce the complex interaction of many service related variables to a simple set of predetermined factors. Thus, even the positive outcomes with the greatest effect sizes become suspect because we cannot determine whether these effects are solely due to these variables or whether other variables, outside the scope of the study, are interacting and producing the observed phenomena. This problem is not restricted to these investigations but has been cited as a potential weakness in both causal comparative and correlational study designs (Borg & Gall, 1989; Wood, 1974).

Uehara, Smukler and Newman (1994) attempted to resolve this problem of constricted variables in their study of allocation of service resources to various patient populations. They provide a field test of a procedure to match the social, psychological, and physical functional needs of patients to specific types and amounts of treatment and rehabilitation services. This study made extensive use of correlational data and advanced statistics to match patients to the appropriate type and level of service. Though this expands the list of variables under investigation to virtually all program and patient variables that may impact treatment outcomes, it still only takes the point of view of the clinician into account. Patient satisfaction, family involvement, cost effectiveness and overall program efficacy from the point of view of the public at large and the funding source still cannot be accounted for within a correlational study of this type. Thus, this methodology also falls short as a vehicle for determining overall program efficacy.

Though the methodological approaches used in each of these studies are generally accepted as innovative scientific investigations suitable for publication in the psychological literature, each study has difficulty accounting for the full range of interaction among the program variables and for the points of view of each of the main stakeholders in mental health service programs. Thus, these studies have difficulty providing a comprehensive account of program effectiveness. Though these methodological approaches may be instrumental and valuable as part of an overall evaluation effort, they cannot be used as the sole basis for determining success in mental health service programs. A comparison and contrast of the main methodological approaches, presented in the following section, demonstrates the advantages of departing from the reductionism inherent in traditional approaches, in favor of a more comprehensive evaluation methodology.


Correlational, causal comparative, and evaluative research methodologies are the most common research approaches used to evaluate naturally occurring service programs (Borg & Gall, 1989). Though they have much in common, they differ in their utility, comprehensiveness and ability to establish cause and effect relationships among study variables with a strong degree of certainty. As a consequence, they also differ in their ability to strongly predict future effects and causal patterns that can be attributed to the study variables. This difference is primarily due to the limitations of the methodologies to attribute the full range of possible causes to effects observed in natural or artificial/experimental settings. Though each method has situational and experimental advantages over the others in program research, each varies in its situational utility as well.

Correlational Designs

While the correlational method is well suited to establishing relationships between variables, it cannot demonstrate cause and effect relationships by itself. The correlational method is restricted to the quantifiable data in the data set and is therefore limited in its utility. Though readily applicable to quasi experimental study situations, it is often difficult to apply in natural settings, where identifying and measuring the most important variables becomes difficult. This problem is illustrated by the results obtained in the Yeaton (1994) study, which investigated the relationship of patient attendance in self help group meetings to successful completion of an alcohol treatment program.

This study examined the relationship between rates of patient attendance and successful completion of programs. The relative rates of attendance in the service milieu of self help groups were compared to the rates of successful treatment outcomes as a measure of programmatic effectiveness in treating substance abuse. However, collateral treatments for substance abuse and/or other deficits were not discussed. In addition, the actual service components of the self help program were not discussed. As a consequence, any relationship between the study variables of attendance and outcome becomes inconclusive. The actual effects that were noted may be due to variables outside the scope of this investigation that were related to the study variables. Thus, neither a cause and effect relationship nor programmatic effectiveness can be inferred from this single correlational indicator.
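The point that an observed attendance/outcome relationship may be produced by a variable outside the study can be illustrated with a small simulation. The following is a minimal sketch on entirely synthetic data, not a reanalysis of the Yeaton (1994) study: the variable "motivation", the sample size, and the noise model are all assumptions introduced here for illustration. Attendance and outcome are each generated from motivation plus independent noise, so attendance has no direct effect on outcome by construction.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate(n=5000, seed=0):
    """Synthetic patients: a hypothetical 'motivation' score drives both
    attendance and outcome; attendance never enters the outcome equation."""
    rng = random.Random(seed)
    motivation = [rng.gauss(0, 1) for _ in range(n)]
    attendance = [m + rng.gauss(0, 1) for m in motivation]
    outcome = [m + rng.gauss(0, 1) for m in motivation]
    return motivation, attendance, outcome


motivation, attendance, outcome = simulate()

# Correlation a study would see if it measured only attendance and outcome.
raw_r = pearson(attendance, outcome)

# Correlation remaining once the unmeasured variable is statistically controlled.
adj_r = partial_corr(attendance, outcome, motivation)

print(f"attendance vs. outcome, uncontrolled: r = {raw_r:.2f}")
print(f"attendance vs. outcome, controlling for motivation: r = {adj_r:.2f}")
```

In this sketch the uncontrolled correlation is substantial while the partial correlation is close to zero, which is the pattern the paragraph above warns about: a study that records only attendance and outcome cannot distinguish this situation from one in which attendance itself improves outcomes.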

Correlational research studies are best suited to discovering relationships solely among study variables. As illustrated by the Yeaton investigation, it is very difficult to apply this approach to mental health service studies. Correlating the large number of variables that define program effectiveness, both within and between service programs, becomes almost impossible with this approach. When compounded by the various interests and focus of each stakeholder associated with a treatment program, the correlational method becomes almost useless in defining what works, for whom, under what circumstances.

Causal Comparative Designs

The causal comparative method is described as well suited to demonstrating significant relationships, group norms and traits in natural settings (Borg & Gall, 1989). This method can also be used in study situations where experimental manipulation is difficult or impossible, such as in mental health service studies. However, the causal comparative method can only demonstrate causality from the data presented within the narrow scope of the study variables and is therefore also limited in its ability to suggest causality in either experimental or natural settings. Alternative interpretations are often possible when this method is employed (Borg & Gall, 1989; Wood, 1974). Thus, this method is similarly limited in its utility and comprehensiveness to demonstrate a treatment program's level of efficacy.

This problem was demonstrated in the study described by Uehara et al. (1994). This investigation attempted to answer the question of "Who needs what services and what degree of care?" It was actually a field test of a procedure to identify and link the psychological, social and physical functioning needs of individuals with severe and persistent mental illness to specific levels and types of treatment programs and rehabilitative services. Using the LONCA scale to assess patients' level of functioning, the researchers attempted to match the level of patient need to the level of care in a clustering method. The LONCA scale is an instrument designed to measure patients' functional level across a wide range of psychosocial dimensions. It was speculated that the use of such a scale to place patients in programs that would specifically meet their needs would improve the treatment outcomes for this population. The degree of dysfunction would determine the level of appropriate care that should be provided.

Though 65 case managers carefully rated patients' level of need, and the resulting data set was factor analyzed to cluster patients into groups that might benefit from different levels and types of care in community based settings, no actual outcome data was provided to substantiate the underlying assumption that there is a causal connection between level of dysfunction and the various outcomes that patients experience in a variety of treatment programs. In addition, critical variables in patient recovery and program operations were not taken into account. Patients' level of motivation for treatment, demographic profiles, diagnostic groupings, level of patient satisfaction, program modeling, program milieu/environment, and other factors may also play an important role in determining whether a patient will respond to treatment within a given treatment program. Using their method, these variables could not be taken into account. Thus, the results of this study remain inconclusive.

The causal comparative method, like the correlational method and relational methods in general, is limited in its ability to establish cause and effect relationships between study variables. As noted in the Yeaton and Uehara et al. studies, both methods can be criticized because they attempt to break down complex behavior into very simple components. Understanding the causal variable or set of variables that are related to the complex activities or traits of a mental health service program is beyond the scope of the research study when these methodologies are utilized. Generally, the phenomena and behaviors associated with an operational service program are so poorly understood that incomplete sets of study variables are chosen for study and analysis. This appears to have been the case with all the studies profiled in the Journal of Consulting and Clinical Psychology special section on mental health service program efficacy studies. In addition, analysis of multiple variables from one setting will not expose the complex interaction of variables across multiple settings or subject groups in other programs. These and other problems of measurement and analysis contaminate or bias inferences and result in erroneous or misleading conclusions. This limits the comprehensiveness and utility of both methods in studies of natural events and phenomena, such as those that occur in mental health service programs.

Evaluative Designs

Of the principal research methodologies used to determine the effectiveness of mental health service programs, the evaluative method is the best suited to demonstrating descriptive relationships rather than analytic ones. Thus, it is the most applicable to the study of mental health program efficacy.

Evaluation studies usually point out cause and effect relationships in natural settings. Without the constraints of experimental study controls or sole use of quantifiable data sets, evaluative studies can identify the most salient relationships among all quantitative and qualitative variables in service programs. Because of this, the evaluative method is more comprehensive and has a higher degree of utility in natural settings than the other two methods. The evaluative method is an applied research method that focuses on determining the merits of educational, job training, health care and other institutional programs in health, education and welfare. This approach differs from the correlational and causal comparative methods in that it not only looks at the relationship of a few obvious variables to determine a cause, but examines all observed variables that may impact the goals of the program under study.

With evaluative methodology, the causes of the positive or negative program outcomes become the main focus of study. Using program goals and actual individual and group performance measures in meeting those goals, evaluation researchers attempt to locate factors related to the actual program outcomes. Traditionally, an evaluator will work directly with program leadership, staff, and consumers to determine the most salient factors that define program performance with regard to the goals of a program. In ideal evaluation study situations, mental health service program staff, directors, funding sources and all other groups that have a stake in the investigation are invited into the variable identification phase to identify the critical program variables to be used in the study, methods of data collection and subsequent data analysis techniques that will be used to determine program efficacy levels. Once determined and agreed upon by all the stakeholders, these factors and study procedures produce results that can be returned to the stakeholders of the program to implement program modifications and improvements. Thus, the results of an evaluation research study can be used to modify program operations to increase performance toward meeting those program goals more effectively.

Unlike other research methodologies, evaluation research is usually initiated by someone's need for a decision to be made about policy, program management, or strategic planning. By contrast, experimental methodologies initiate studies based on a hypothesis; the research is conducted to reach a conclusion about the relationship between the variables and whether to reject or accept the hypothesis. In evaluation research, where the focus is on making practical decisions that will impact the effectiveness of a program, the emphasis is on testing variables against program goals in a decision making process, rather than hypothesis testing.

This decision process examines the impact of a mental health treatment program's components and modes of service delivery in meeting the stated treatment and outcome goals of the program, then uses the evaluation data to redefine and modify the service program to more adequately meet the needs of the patients. In addition, the goals and objectives of the program and treatment components are reexamined to improve the relative worth of the program for all the stakeholders associated with it. These stakeholders include not only program leadership and staff, but consumers, community participants, funding sources, and other constituent groups that have a vested interest in the successful outcomes of the program as well.

DonGiovanni (1988) performed an evaluation research study of a program for mentally ill chemical abusers. As in the Yeaton study, patient participation rates in group therapy and other program components were compared with patient outcomes as measures of mental health program effectiveness. However, this study also included measures to determine the overall level of patient satisfaction in the program, attempted to measure recidivism, polled referring hospital staff as to their opinion of the program, and also surveyed community based mental health providers to obtain data on perceived program effectiveness.

These measures incorporated the opinions and views of all the major stakeholders associated with this program. Thus, the results of this study demonstrated not only the overall level of effectiveness and relative worth of the program, but underscored the need for additional program modifications to coordinate all the services of the program in a more cohesive, comprehensive fashion.

The DonGiovanni study illustrates the distinctive characteristic of evaluation research studies. This type of study examines the relative worth and merit of a program or program components. Thus, judgements of programmatic merit and worth that are not emphasized in other research methods are not only appropriate, but necessary in the evaluation of a program's effectiveness in its natural setting. Causal factors or variables that impact program effectiveness are also judged as to their worth, merit and value in meeting program goals.

Evaluation research draws heavily from other methodologies. Qualitative as well as quantitative data collection and analytical techniques are often used. In the DonGiovanni evaluation, correlational data was used along with the qualitative data from surveys to determine the program's relative worth. Because of this, cause and effect determinations arise from a richer, more comprehensive data set than with sole use of quantitative data and advanced statistics. Thus, the use of evaluation research methods in this mental health service program allowed for more comprehensive determinations of what works, for whom, under what conditions.

The only major limitation of the evaluation research method in mental health service studies lies in its generalizability. Due to the applied nature of this method, programmatic and situational variables tend to be specific to the program under study. It is therefore often difficult to generalize the results even to similar program types. Since each program under study is situated in a different physical environment, with different staff and other characteristics, the evaluation study becomes customized to that program's variables. Thus, generalizing the conclusions of one program evaluation to other programs may be difficult because many of the salient variables change from one program to another.

However, many of the 'lessons learned' from one program can be applied and tested in other, similar treatment settings and may serve as models to enhance program or program component effectiveness in those programs as well. When performed on a program by program, case by case basis, such evaluation research data may serve as valuable tools for program modification and improvements.

Finally, the evaluative method is not constrained to hypothesis testing, but seeks to functionally establish the most salient variables operating in the natural settings of mental health service programs. Evaluation studies attempt to determine the impact of complex variable interaction on the goals of the program. The primary advantage of using this method is to provide data to policy and decision makers that can be used to improve program performance to more successfully meet program goals. Thus, it is both comprehensive and readily applicable to studies in naturally occurring mental health program settings that require data for not only research purposes but for improvements in program or program component performance as well.

Fourth Generation Evaluation Research
and The National Plan of Research to Improve Services


Guba and Lincoln (1989) have traced the development and expansion of evaluation research and have refined the methodological approach to not only reflect state of the art enhancements in health and mental health program assessment, but provide a potent vehicle for program improvement as well. They note that the first generation of evaluation concentrated on the systematic collection of data and measurement of phenomena, while the second generation dealt with description of patterns of strengths and weaknesses with respect to certain stated objectives. The third generation of evaluation research focused on judgements of relative worth between programs and program components.

Though the various generations of evaluation built on the gains of preceding phases, each successive generation providing a foundation for more detailed and sophisticated assessments of programs and organizations, there remained significant limitations in evaluation methodologies. The main problems with the first three generations were a tendency toward managerialism, failure to accommodate value pluralism, and an overcommitment to an experimental paradigm of inquiry. A tendency toward managerialism means that the evaluator and the clinician/manager/administrator responsible for the program under study become either too close to remain objective and impartial or become adversarial. This may contaminate or shade the results of the study. This also occurs in traditional correlational and causal comparative studies. Failure to accommodate value pluralism refers to the inability of the investigator to incorporate the values and viewpoints of all those who have an active stake in the outcome of the study.

Finally, most first, second, and third generation evaluation research studies, as with other social scientific methodologies, tend to make sole use of the scientific method to determine the 'truth' or 'truths' underlying a phenomenon instead of focusing on the overall worth of the programs and services to the patients and the communities they serve. The recognition of these deficits in evaluation methodologies led to the development of what has been referred to as "Fourth Generation Evaluation Research" (Guba & Lincoln, 1989).

The fourth generation of evaluation research is responsive evaluation. It has been termed responsive because it seeks out different stakeholder views in determining the variables and instruments that will be used in the investigation and then responds to the needs of all those who have an active stake in the evaluation process and results.

Fourth generation, responsive evaluations have four main phases that may be reiterated or overlap. In the first phase, stakeholders are identified and solicited for those claims and issues they want to bring into the research study. Guba and Lincoln have identified three main classes of stakeholders who would have an active interest in a program investigation and its outcomes: "The agents, those persons involved in producing, using or implementing the (study results); the beneficiaries, those persons who profit in some way from the use of the (study) outcomes; and the victims; those persons who are negatively affected by the (study)" (1989, p. 40). In the second phase, all stakeholders are introduced to the others to begin the negotiating process through comments, agreements and/or disputes to determine what issues and topics will be assessed by what instrumentation. The third phase involves further information collection as unresolved disputes are investigated and further negotiated. Finally, in the fourth phase, negotiation among stakeholding groups, under the guidance of the evaluator, takes place to reach a consensus, and the information is collected, analyzed and disseminated to all the stakeholders for comment and publication.

Using the process oriented, fourth generation summative worth evaluation methodology would improve the current state of mental health services research and fulfill many of the goals set forth in the U.S. national plan of research to improve services. Instead of researchers and program directors choosing critical program variables and using current correlational, causal comparative or quasi experimental methods to establish a 'scientific truth', program staff, patients, funding sources, governmental agencies, and other interested stakeholders could collaboratively agree on the critical study variables and study methodologies that would be used to determine the relative value of the mental health service. The results could then be used not only to determine what works in one program or another for given patient populations, but also as a tool to improve services in the study programs. As each new set of data within a program is analyzed, remedial action plans could be collaboratively agreed upon and new evaluation data obtained and analyzed to ensure a process of continuous quality improvement. Thus, in using this responsive evaluation methodology, not only would overall levels of mental health program effectiveness be obtained, but a mechanism could be established to allow for continuous quality improvement in the program over time.

The DonGiovanni (1988) evaluation study identified all major stakeholders and included their participation in the study and in the program modification phase after the results were analyzed. Though these stakeholders did not have the degree of participation mandated by responsive evaluation methodologies at the onset of the investigation, they participated in the results and program modification phases after the initial levels of effectiveness were obtained. In addition, it was noted that ongoing program evaluation involving all the major stakeholders would continue into subsequent evaluations of that program. This goes far beyond the simple collection of data that traditional evaluation and experimentally based studies reported on, to include a program monitoring and improvement mechanism for future program improvements. Thus, in this and other responsive evaluation studies, the participation of all associated vested interest groups becomes not only a research tool but a programmatic problem solving mechanism as well.

Seligman (1995) refined this responsive evaluation approach to include all stakeholders in every step of the evaluation process. In addition, he demonstrated how this approach can be applied not only at the local program level but at a macro, national evaluation level as well. In this impact evaluation of rural health and mental health services in Panama, almost all the criteria for evaluation were determined by the stakeholders at the local level. Though selected criteria were included by national and regional health planners, the bulk of the program impact indicators were determined, measured, analyzed and reported by local stakeholders as a group. This data was then aggregated at the regional level to assist in future health and mental health planning at the regional and national levels.

This study demonstrates the advantages in using a responsive evaluation approach to evaluate and monitor program effectiveness. Continuous quality improvement mechanisms are incorporated into the basic study design to not only provide data on program effectiveness but a mechanism for future programmatic changes and innovations to improve the quality of patient care. After the initial data collection phase of the study, baseline data was available for decision making at the local level. All stakeholders then participated in developing action plans to improve the quality of care at the program level and time tables were established to measure the impact of those action plans. Though the results of this second data collection phase have not yet been reported, it appears likely that quality of care will have improved due to the implementation of the action plans, primarily due to the continuous quality improvement mechanisms that were built into the study design.

This research design improved on that of DonGiovanni by incorporating greater stakeholder participation in the decision making process at the local level. This produced greater interest in programmatic problem solving activities at the local level and provided a mechanism for long term, continuous quality improvement through action plans based on program and patient data indicators. In addition, the centralized data reporting approach provided regional and national health care planners with a rich source of quality of care and outcome data with which to plan future programs and allocate resources. NIMH (1991) has identified key areas and sub areas that could benefit from such mental health services evaluation research:

Table 1.

Target Program Areas for Evaluation Studies

Characteristics of Mentally Ill People
Epidemiology and Service Settings
Impairments in Physical and Psychosocial Functioning
Family Matters: Problems and Resources.
Assessment Research
Diagnosis and Measuring Severity and Disability
Assessing Physical Health
Measuring the Quality of Life
Understanding the Family's Burden
Determining Rehabilitation Status.
Independent Living Skills
Extended Clinical Research
Types of Treatment
Treatment Settings
Integration, Continuity, and Quality of Care
Special Populations, Special Treatment Issues.
The Road Back
Social Skills Training
Vocational Rehabilitation
Independent Living
Minority and Cross Cultural Issues.
Consumer and Family Perspectives.
Habilitation Services.
Outcome Research: The Effects of Caring In:
The clinical domain.
The rehabilitative domain.
The community domain.
The overall public welfare domain.



In each of the NIMH target research areas, responsive evaluation methodologies that incorporate a variety of methodologies to determine efficacy and program value to all stakeholders would provide more comprehensive results than more traditional methodologies, tied to the scientific method. In addition, use of this research approach would give a voice to all the stakeholders who have an interest in treatment efficacy and successful program outcomes in any given mental health program, without the disenfranchisement that often occurs in many traditional research investigations.


In an era of shrinking budgets, ongoing deinstitutionalization and administrative reorganizations, evaluation research methodologies provide not only a framework to assess what works for which patients under what circumstances, but a mechanism for incorporating the views of all stakeholders in the process. Because evaluative research methods are not constrained to the study of molecular events and relationships among only a handful of variables, a variety of qualitative and quantitative research techniques can be incorporated into evaluation studies, which produce molar, global outcomes that demonstrate efficacy more accurately than traditional approaches.

Of the most commonly used research methods, the evaluative approach is the most comprehensive and applicable method for understanding complex variable interactions and is well suited to not only determining cause and effect in natural settings, but determining the relative value of mental health programs as well. Without the constraints of experimental study controls and manipulations, evaluative research can identify the overall, global relationships among all important factors that operate in a mental health program's input, process, content and products. Though outcome studies of this type may be constrained by generalizability limitations, this method offers a more comprehensive approach to applied research problems than either causal comparative or correlational methodologies and is an extremely useful tool for many applied research projects that cannot be experimentally manipulated in a scientific paradigm. Thus, of the principal investigative approaches, evaluation research is the most useful in determining 'what works, for which patients, under what circumstances' in mental health treatment programs.

Finally, evaluation research methods can incorporate the views of allied professionals from a variety of disciplines, program administrators, directors, patients, families, and community action boards more readily than other methodological approaches. In addition, they can be more readily applied to the evaluation of a wider variety of mental health service programs than traditional research approaches (Guba & Lincoln, 1981; Windle & Lalley, 1992). Since the intent of this research is to determine the value or worth of services in a particular program or program model, evaluation research studies can provide efficacy data that fits within the worldview of all interested stakeholders. This leads to greater public acceptance and better direction in mental health program planning, funding, and clinician training. Most importantly, evaluation research promotes more effective treatment programs to enrich the lives of those patients and families who must endure the financial, social, and personal costs of mental illness on a daily basis.

References


ADAMHA Reorganization Act of 1992, Pub. L. No. 102-321, 464R (f)(2).

Bockman, J.S. (1963). Moral Treatment in American Psychiatry. New York: Springer Publishing Company.

Borg, W.R., and Gall, M.D. (1989). Educational Research. New York: Longman, 587 540.

Breakey, W.R. (1987). Treating the homeless. Alcohol and Research World, 11, 42-47.

Caplan, R., Caplan G. (1969). Psychiatry and the Community in Nineteenth Century America. New York: Basic Books.

Clarke, G. N. (1995). Improving the transition from basic efficacy research to effectiveness studies: methodological issues and procedures. Journal of Consulting and Clinical Psychology, 63 (5), 718-725.

Deutsch A. (1948). The Shame of the States. New York: Harcourt Brace.

DonGiovanni, V. J. (1988). An evaluation of an ongoing treatment program for the psychiatrically impaired substance abuser. Dissertation Archives, Indiana University of Pennsylvania.

Farkas, M.D., & Anthony, M.A. Psychiatric Rehabilitation Programs. Baltimore, MD: The Johns Hopkins University Press.

Frank, R.G., Sullivan, M.J., DeLeon, P.H. (1994). Health care reform in the states. American Psychologist, 49 (10), 855-867.

Grob, G.N. (1973). Mental Institutions in America: Social Policy to 1875. New York: Free Press.

Guba, E.G. & Lincoln, Y. (1981). Effective Evaluation. San Francisco: Jossey-Bass.

Guba, E.G. & Lincoln, Y. (1989). Fourth Generation Evaluation. Newbury Park, CA: Sage Publications.

Inouye, D. (1983). Access, stigma, and effectiveness. American Psychologist. August, 912-917.

Isaac, S. (1971). Handbook of research and evaluation. San Diego: Edits.

Klerman, G.L. (1974). Current evaluation research on mental health services. American Journal of Psychiatry, 131 (7), 783-787.

Klerman, G.L. (1972). Public trust and professional confidence. Smith College Studies in Social Work, 17, 115-124.

Klerman, G.L., Kellam, S., Leiderman, H. (1971). Research aspects of community mental health centers: report of the APA task force. American Journal of Psychiatry, 127, 993-998.

Lalley, T.L., Hohmann, A.A., Windle, C.D., Norquist, G.S., Keith, S.J., & Burke, J.D. (1992). Caring for people with severe mental disorders: A national plan to improve services. Schizophrenia Bulletin, 18, 559-700.

Lamb, H.R., & Goertzel V. (1972). The demise of the state hospital - a premature obituary? Archives of General Psychiatry, 26, 489-495.

McGrew, J.H., Bond, G.R., Dietzen, L., & Salyers, M. (1994). Measuring the fidelity of implementation of a mental health program model. Journal of Consulting and Clinical Psychology, 62, 670-679.

Minkoff, K. (1991). Program components of a comprehensive integrated care system for serious mentally ill patients with substance disorders. New Directions for Mental Health Services, 50, 95-106.

Minkoff, K. (1989). An Integrated Program Model for Dual Diagnosis of Psychosis and Addiction. Hospital and Community Psychiatry, 40 (10), 1031-1036.

National Institute of Mental Health. (1989). The future of mental health services research (DHHS Publication No. DM 89-1600). Washington, DC: U.S. Government Printing Office.

National Institute of Mental Health. (1991). Caring for People With Severe Mental Disorders: A National Plan of Research to Improve Services. In C. A. Traube, D. Mechanic, & A. A. Hohmann, (Eds.) (DHHS Publication No. ADM 91-1762). Washington, DC: U.S. Government Printing Office.

National Institute of Mental Health. (1991a). Clinical services research: enhancing the real world applications of clinical science. In Caring for People With Severe Mental Disorders: A National Plan of Research to Improve Services. (DHHS Publication No. ADM 91-1762). Washington, DC: U.S. Government Printing Office.

Newman, F.L., & Howard, K.I. (1991). Introduction to the special section on seeking new clinical research methods. Journal of Consulting and Clinical Psychology, 59, 8-11.

Newman, F.L., Howard, K.I., Windle, C.D., and Hohmann, A.A. (1994). Introduction to the special section on seeking new methods in mental health services research. Journal of Consulting and Clinical Psychology, 62, 667-669.

NYSOMH (1990). Part 585, mental health regulations for outpatient services. New York State Office of Mental Health. Albany, NY: NYSOMH.

Psycscan (1994). Program effectiveness index. Alexandria VA: American Psychological Association.

Psycscan (1995). Program effectiveness/efficacy index. Alexandria VA: American Psychological Association.

Robitscher J. (1972). The right to psychiatric treatment: a social legal approach to the plight of the state hospital patient. Villanova Law Review, 18:11-36.

Robitscher J. (1972). Courts, state hospitals, and the right to treatment. American Journal of Psychiatry, 127: 993-998.

Rothman D. (1971). The Discovery of the Asylum: Social Order and Disorder in the New Republic. Boston: Little, Brown and Co.

Scott, J.P., and Ginsburg, B.E. (1994). The seville statement on violence revisited. American Psychologist, 49, No. 10, 849-850.

Seligman, R. B. (1995). Reporte de supervision del proyecto de salud rural summario administrativo. World Bank Project CAM/92/009. World Bank. Washington DC.

Uehara, E.S., Smukler, M., & Newman, F.L. (1994). Linking resource use to consumer level of need: Field test of the level of need care assessment (LONCA) method. Journal of Consulting and Clinical Psychology, 62, 695-709.

Ullman, L.P., & Gurel L. (1964). Size, staffing, and psychiatric hospital effectiveness. Archives of General Psychiatry, 11: 360-367.

Vessey, J.T., Howard, K.I., Lueger, R.J., Kachele, H., & Mergenthaler, E. (1994). The clinician's illusion and the psychotherapy practice: An application of stochastic modeling. Journal of Consulting and Clinical Psychology, 62, 679-685.

Windle, C., & Lalley, T.L. (1992). Recent findings from NIMH's services research program. Administration and Policy in Mental Health, 19 (5).

Witheridge, T.F. (1991). The "active ingredients" of an in vivo community support program. New Directions in Mental Health Services, 52, 47-64.

Weiss, C. (1972). Evaluation Research. Englewood Cliffs NJ: Prentice Hall Inc., pp. 6-25.

Westin, D. (1972). cited in Drug Research Reports 15 (49), Dec. 2, p.RN3.

Wood, S. (1974). Fundamentals of psychological research. Englewood Cliffs NJ: Prentice Hall.

Yeaton, W. H. (1994). The development and assessment of valid measures of service delivery to enhance inference in outcome based research: Measuring attendance at self help group meetings. Journal of Consulting and Clinical Psychology, 62, 686-694.

Westin, D. (1972). cited in Drug Research Reports 15 (49), Dec. 2, p.RN3.

Wood, S. (1974). Fundamentals of psychological research. Englewood Cliffs NJ: Prentice Hall.

Yeaton, W. H. (1994) The development and assessment of valid measures of service delivery to enhance inference in outcome based research: Measuring attendance at self help group meetings. Journal of Consulting and Clinical Psychology, 62, 686 694.

Copyright © 2000, Southern Development Group, S.A.  All Rights Reserved.
A Private Non-Profit Agency for the good of all, published in the UK & Honduras