We distinguish between reverse discrimination and overcorrection, arguing that the former term should be used only to describe cases where well-qualified non-minority applicants are unjustifiably denied positions in organizations run by and/or staffed by minorities. Similarly, we argue that overcorrection should be used to describe cases where well-qualified non-minority applicants are unjustifiably denied positions in organizations run by non-minorities. Accordingly, reverse discrimination and overcorrection constitute two of five possible outcomes: fair hiring, discrimination, correction (or affirmative action), overcorrection, and reverse discrimination. We draw on Ricci v. DeStefano and a simulation based on recent litigation to show the benefits of making this critical distinction.
Social psychologists have a long history of interest in bias and stereotypes, a topic that has generated careful experimental work for more than nine decades. For example, one of the earliest studies on the impact of stereotypes examined the role of appearance in judging appropriate work roles (Rice, 1926). More recently, researchers have begun to conduct experiments on the conditions under which subjects adjust their initial impressions to either accommodate or compensate for a prevalent stereotype. For instance, one study (Petty & Wegener, 1993) highlighted the fact that, depending upon the context, subjects will make an effortful adjustment (a correction) of their initial impressions to either compensate for or accommodate a prevailing bias. This line of research has led Olson and Fazio to underscore the importance of what they have called overcorrection: an intentional adjustment to overcome a negative stereotype where individuals overshoot their goal of dispassionate objectivity and end up formulating an unrealistically favorable attitude (Olson & Fazio, 2004). In this paper I extend Olson and Fazio's concept into the legal domain, and show how it can be applied in lawsuits involving what is often called reverse discrimination, the most prominent example of which is Ricci v. DeStefano (2006). By examining the role of overcorrection in a data simulation built by blending features of several recent lawsuits where I served as an expert witness, and conducting the statistical tests that are suggested by that distinction, we arrive at an unexpected resolution that would not be readily apparent had we not invoked the concept of overcorrection. The heart of this effort rests on the central distinction between reverse discrimination and overcorrection, a distinction that is clarified directly below.
Reverse Discrimination and Overcorrection are Not Identical
In the common notion of reverse discrimination, well-qualified non-minority applicants are unjustifiably denied desirable positions in a company, school, or job category, in an effort to make room for less-qualified minority applicants who receive the sought-after position despite their disparity in skills, aptitude, qualifications, merit, experience, training, or potential. Arguably, the notion of reverse discrimination derives its power from the triple implications of ignoring the sufficient qualifications of the non-minority candidate, willfully disregarding the insufficient qualifications of the minority candidate, and overturning the notion of fairness upon which all meritocracies are based. Accordingly, vernacular use of the term often seems to imply a triple offence wherein genuine talent is ignored, sub-standard applicants are artificially elevated, and the principle of fairness is sacrificed for expediency.
Overcorrection is an altogether different matter. When a school, organization, or company overcorrects, it makes an effort to overcome the negative impact of a longstanding stereotype or bias, and in the process ascribes too much weight to the disadvantages (the "headwinds," as they were initially termed in some important decisions such as Griggs v. Duke Power Co., 1971) that were overcome by minority applicants. In overcorrection, the diminished qualifications of the minority applicants are understood to indicate the presence of an exceptional but nascent skill, aptitude, merit, or potential. Proponents of this view bring three implicit assumptions to bear: first, that positions will not be mistakenly given to minority applicants who are truly and utterly unqualified; second, that the number of applicants who should be considered qualified frequently surpasses the number of available positions; and third, that minorities who have long labored under unearned stereotypes deserve an opportunity to apply themselves and excel in an arena where they have been historically denied admission for trivial reasons.
At some level, the core of the distinction between reverse discrimination and overcorrection concerns the nature of the organization involved: If an organization is controlled by minorities who fill a preponderance of its positions, and policies are implemented to exclude all qualified non-minorities, then arguably appropriate charges of reverse discrimination are bound to follow, and it would be odd indeed to speak of overcorrection. For example, if an organization of female nurses or female librarians excluded all qualified male candidates, the rejected applicants might very well feel as if they were victims of reverse discrimination. Similarly, if an organization is controlled by non-minorities and policies are implemented to hire and promote all qualified minority candidates rather than any qualified non-minority candidate, regardless of the circumstances, then arguably appropriate charges of overcorrection would logically follow. But here is where the analogy breaks down, and the discontinuity is informative: One could easily imagine aggrieved non-minority employees complaining of reverse discrimination as they watched qualified minority applicants fill positions that they themselves had previously expected to obtain without having to overcome competition from a large (and growing) pool of qualified minority candidates. Just as minorities sometimes intentionally avoid concluding that racial prejudice is present where it exists (Crosby, 1984) and may, in uniquely provocative situations, mistakenly imagine racial discrimination and a malefactor where neither is present (Crocker, Voelkl, Testa & Major, 1991), non-minorities may do the same, underestimating the presence of genuine racism penalizing their minority colleagues and overestimating the role of pernicious "reverse discrimination" against themselves.
A Revised Definition
In light of the importance of the minority status of the organization's leadership, we argue that the term reverse discrimination should be used only to describe cases where well-qualified non-minority applicants are unjustifiably denied desirable positions in organizations run by and/or staffed by minority employees; similarly, we argue that the term overcorrection should be used for circumstances where well-qualified non-minority applicants are unjustifiably denied desirable positions in organizations run by and/or staffed by non-minority employees who are actively instituting policies to expand the hiring and promotion of minorities, typically in an affirmative action program. By this view, reverse discrimination and overcorrection constitute two components in a complete set of four conditions: discrimination, correction (or affirmative action), overcorrection, and reverse discrimination.
In the remainder of this paper we will show how statistical tests can be used to distinguish between overcorrection and the type of correction which is part of an effective affirmative action program. We'll close with a discussion concerning the importance of the distinction between overcorrection and correction, and the impact of this distinction on litigation trends in the US.
For the analysis in this study I built a simulation dataset that combined several typical features from about a dozen cases where I served as an expert witness in disparate impact lawsuits. Typically, the task in these cases was to determine whether the organization's affirmative action program was functioning effectively and had met court-mandated affirmative action targets, whether it had fallen short of those goals, or whether it had actually surpassed the court-mandated targets. The dataset for the current analysis contained 40 rows and 12 columns (as described below), with each row representing one of the 40 months in which the hiring and promotion practices of a municipality were tracked over the course of a multi-year affirmative action program. Our analysis method used conventional tests of statistical significance (Cohen, Cohen, Aiken & West, 2002), and followed standard practices for applied research in the behavioral sciences based on peer-reviewed research (Singleton & Straits, 1999); we also relied heavily on standards established in Daubert v. Merrell Dow Pharmaceuticals, Inc. (1993), Kumho Tire Co., Ltd. v. Carmichael (1999), the Federal Rules of Evidence, and Malave v. Potter (2003), where the court expressed its preference for data drawn from a flow.
In our dataset, each of the 40 rows summarizes data from one month of a municipality's affirmative action program. The columns hold a number of important parameters, including: (a) the actual proportion of minority employees in the workforce; (b) the benchmark goal, expressed in percentage points, for minorities in the organization, as required by a consent decree based on demographic patterns in the hiring area, and following the procedure used in Hazelwood School District v. United States (1977); (c) the total number of employees in the workforce; (d) the total number of minority employees in the workforce; (e) the total number of residents in the geographic catchment area; (f) the total number of minorities in the geographic catchment area; (g) the total number of residents in the geographic catchment area who had successfully passed the qualifying exam during the month in question; (h) the total number of minorities in the geographic catchment area who had successfully passed the qualifying exam during the month in question; and (i) several other identifying variables to track the name of the time period. This data structure allows the statistical expert to evaluate the difference between the benchmark goal and the municipality's hiring records in an objective manner that is consistent with best practices in all fields that rely on statistics. In particular, the data structure we describe can be used with a simple table to compute the z-score of the difference between the observed value and the benchmark (Cohen, Cohen, Aiken & West, 2002), a test that has been used in more than a score of disparate impact lawsuits, as a quick review of Lexis-Nexis shows.
By applying the z-score test in each row of the dataset we can obtain a p value indicating the probability of seeing the observed difference between the expected value and the actual value if nothing other than random chance were operating. This method adheres to what is usually called the Castaneda-Hazelwood Standard (Piette, 1992): "...if the difference between the expected value and the observed number is greater than two or three standard deviations" then the outcome of the selection procedure is deemed unlikely to have been determined by chance alone (Castaneda v. Partida, 1977). In other words, the p value of the z-score allows the statistical expert to determine whether the benchmark has been unmet, met, or exceeded by a statistically significant margin. Accordingly, the p value of the z-score enables the statistical expert to classify every row into one of three states: (1) the observed value is not significantly different from the benchmark goal; or (2) the observed value is lower than the benchmark goal by a statistically significant margin; or (3) the observed value is larger than the benchmark goal by a statistically significant margin.
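The row-level classification described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in any particular case: it assumes a one-sample z-test for a proportion with the standard error computed from the benchmark, a two-sided p value, and a 0.05 threshold. The function names and the example counts (a workforce of 1,000) are hypothetical and do not come from the simulation dataset.

```python
import math


def z_score(minority_count, workforce_total, benchmark):
    """z-score of the observed minority proportion against a benchmark proportion.

    Assumption: the standard error is taken from the benchmark proportion,
    as in a one-sample z-test for a proportion.
    """
    observed = minority_count / workforce_total
    std_error = math.sqrt(benchmark * (1 - benchmark) / workforce_total)
    return (observed - benchmark) / std_error


def two_sided_p(z):
    """Two-sided p value for a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def classify(minority_count, workforce_total, benchmark, alpha=0.05):
    """Place one row (one month) into one of the three states."""
    z = z_score(minority_count, workforce_total, benchmark)
    if two_sided_p(z) >= alpha:
        return "met"  # not significantly different from the benchmark
    return "exceeded" if z > 0 else "below"


# Hypothetical months with a 15 percent benchmark and 1,000 employees.
for count in (110, 150, 165, 190):
    print(count, classify(count, 1000, 0.15))
```

With these hypothetical inputs, 190 minority employees out of 1,000 lies roughly 3.5 standard deviations above a 15 percent benchmark, which is beyond the "two or three standard deviations" named in the standard, while 165 out of 1,000 falls within chance variation.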
The practical value of using p values from z-scores is that they provide a straightforward, widely recognized statistical method for evaluating the effectiveness of an organization's affirmative action efforts. If there is no significant difference between an observed value and its benchmark goal, then the affirmative action target has been met for that observation. If the observed value is less than the benchmark goal by a significant margin, then the affirmative action program has not yet hit its target for that observation. And if the observed value surpasses the benchmark goal by a significant margin, then there is suggestive evidence of overcorrection, at least in that row of the dataset.
The next step in the analysis is critical. Each of the three classifications described above allows the statistical expert to make a dichotomous determination suited to the litigation at hand. For example, if one litigant believes that the affirmative action program is meeting its goals and another litigant disagrees, then a simple "yes" or "no" will allow the statistical expert to create a new column in the database showing whether each row confirms or disconfirms the plaintiff's claim. Readers with experience in the behavioral sciences will notice that this new column is identical to the variable in a research project that would indicate whether the researcher's hypothesis was confirmed or disconfirmed by any given observation. However, as Reynolds points out in his classic text on the analysis of nominal data (Reynolds, 1984), it is imperative to analyze the entire dataset as a unit before any more detailed analyses are conducted. This approach, formalized by Iverson several decades ago (Iverson, 1979), logically requires us to run an appropriate test to examine whether the dataset as a whole can be said to confirm or disconfirm the plaintiff's claim. In our example, the effectiveness of the municipality's affirmative action program is being questioned throughout the course of its 40-month history; and our dataset allows us to track all hiring decisions on a month-to-month basis, with each of the 40 rows in the dataset generating a "yes" or a "no" outcome to any one of the three questions outlined above, namely: Does the observed value equal, or surpass, or fall below, the expected value by a statistically significant margin?
The courts have recognized a venerable and straightforward statistical tool called the binomial test for evaluating such data, and it has played a key role in landmark discrimination cases such as Hazelwood School District v. United States (1977), International Brotherhood of Teamsters v. United States (1977), Connecticut v. Teal (1982), Wards Cove Packing Co., Inc. v. Antonio (1989), and EEOC v. Sears, Roebuck & Co. (1986). The binomial test, formulated by Newton in 1665 (Ball, 1908), allows a statistician to determine the likelihood of seeing any given number of confirmations in any set of observations. To continue our example, 20 heads out of 40 coin tosses is entirely consistent with nothing other than pure chance operating. One head (or 39 tails) out of 40 tosses has a very low likelihood of being generated by a perfectly balanced coin. However, as the binomial test tells us, 26 "yes" classifications (but not 25) in 40 observations is just enough to be a statistically significant deviation from chance (at the 0.05 level, the internationally recognized standard for statistical significance). In fact, if chance alone were determining the outcome with a perfectly fair coin, we would expect to see 26 or more "yes" outcomes in 40 trials only about four times in every 100 sets of 40 coin tosses. For our work with the binomial test, we used a well-designed online calculator (http://stattrek.com) and checked those values against a conventional printed binomial table (Hays, 1988).
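The 26-versus-25 threshold in the coin-toss example can be checked directly. The sketch below assumes a one-tailed (upper-tail) binomial test with a chance probability of 0.5 per observation, which is the reading that matches the figures in the text; the function name is hypothetical, and the calculation uses only Python's standard library rather than the online calculator or printed table mentioned above.

```python
from math import comb


def upper_tail(successes, trials, chance=0.5):
    """Probability of seeing at least `successes` "yes" outcomes in `trials`
    independent observations when each has probability `chance`."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Compare the two candidate thresholds from the coin-toss example.
for k in (25, 26):
    p = upper_tail(k, 40)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{k} of 40: p = {p:.4f} ({verdict} at the 0.05 level)")
```

Under these assumptions the upper-tail probability for 26 of 40 is roughly 0.04, while for 25 of 40 it is roughly 0.08, which is why 26 is the smallest count that clears the 0.05 level.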
In our hypothetical example, a municipality has implemented an affirmative action program that spanned 40 months, and an expert witness was engaged to determine the effectiveness of this program. A benchmark goal was established that specified that the municipality's workforce should be 15 percent African American to match the composition of the population in the surrounding census tracts. Given the fact that the workforce was already averaging greater than 15 percent African American in a majority of the months, there were only two possibilities: either the program had adequately corrected for discrimination in the workplace, or it had overcorrected. The z-score analysis described above indicated that in 30 of the 40 available comparisons, the observed value was greater than the expected value (i.e., the benchmark goal of 15 percent) by a statistically significant margin. The remaining 10 rows in the dataset were not significantly different from the benchmark goal of 15 percent; that is, even in cases where the observed proportion of minority employees was below 15 percent, the difference between the benchmark target and the observed value was so small that it could be attributed exclusively to chance variation.
In the next step of the analysis we applied the binomial test to determine whether the overall result of the z-score analysis (30 confirmations out of 40 trials) could be explained by chance alone. The binomial test indicated that the likelihood of seeing 30 confirmations in 40 events if chance alone were operating is 0.001, or only one chance in 1,000. As specified by the Castaneda-Hazelwood Standard, this level of statistical significance constitutes adequate evidence that the observed numbers are not merely random events, but indicate a meaningful distribution caused by non-random factors. In this case, the significant binomial results suggest the presence of consistent overcorrection in the 40-month period of the affirmative action program: The municipality not only met the benchmark goal of a 15 percent minority workforce, it surpassed that goal.
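For readers who wish to check the reported probability by hand, the calculation can be written out as follows. It assumes a one-tailed test in which each of the 40 monthly classifications is treated as an independent event with a chance probability of one half, mirroring the coin-toss analogy; the symbol X (the number of confirmations) is introduced here only for illustration.

```latex
\[
P(X \ge 30)
  \;=\; \sum_{k=30}^{40} \binom{40}{k} \left(\tfrac{1}{2}\right)^{40}
  \;\approx\; 0.0011
\]
```

Under these assumptions the result rounds to the 0.001 figure reported above.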
In our hypothetical example the expert witness used a rigorous and conventional method for running tests of statistical significance in a dataset generated by a municipality's affirmative action program. The expert found that the municipality surpassed its target of having a 15 percent minority workforce during 30 of the 40 months of the program. Moreover, in these 30 months the observed proportion of minorities surpassed 15 percent by a statistically significant margin. That is, chance alone was not a plausible explanation for the disparity seen between the target of 15 percent and the higher percentage of minorities actually hired or retained. Furthermore, as the binomial test result shows, the overall pattern of results strongly suggests the presence of consistent overcorrection: If nothing more than chance variation were operating, we would see this pattern of results only one time in 1,000 trials.
As mentioned above, we view overcorrection as a symmetrical component matched by adequate correction, the latter being the outcome of effective affirmative action programs where efforts are made to correct a stereotype or bias. Similarly, we view conventional discrimination as a symmetrical component balanced against reverse discrimination. We propose that the term reverse discrimination be used only in circumstances where a non-minority applicant is unjustifiably excluded from a position in an organization that is controlled and/or staffed by minorities. Accordingly, there are four balanced components in our model: adequate correction, overcorrection, conventional discrimination, and reverse discrimination.
Naturally enough, whether plaintiffs claim they are victims of pernicious "reverse discrimination," or merely of well-intended overcorrection, the determination by the finders of fact in court will theoretically be identical because intention is not a relevant issue: In the landmark case Teamsters, the court determined that discriminatory intent is neither necessary for, nor a logical inference from, any disparate impact suit. Nevertheless, the distinction between overcorrection and what has been called reverse discrimination is not a superficial semantic detail.
Our distinction sheds light on an important aspect of the plaintiffs' contention in Ricci just as it does in our constructed example. In my work on the amicus brief for the NAACP's Legal Defense Fund in that lawsuit, I pointed out that New Haven's skill assessment might very well have sufficient validity to predict subsequent on-the-job performance (as required by the Uniform Guidelines on Employee Selection Procedures of 1978). However, it does not necessarily follow that, because average scores predict subsequent average performance, each individual employee's test score will be equally predictive. Moreover, and here is the critical issue, even though the test's average scores from employees as a whole may indeed predict subsequent performance, the selection test in Ricci used rank scores, and rank scores from individual minority employees are even less likely to be reliable than raw scores arrayed on a smooth continuum (Sudman, Bradburn & Schwarz, 1996), as has been known for more than half a century (Stevens, 1958). Indeed, if the selection test contained subtle design features in the format, content, or wording that put minorities at an unfair disadvantage, their artificially depressed scores might not be apparent in the aggregate at all, and would only be manifest when raw scores were converted into rankings. Accordingly, the selection procedure in Ricci will almost certainly generate more litigation in the near future, non-minority claims of discrimination notwithstanding.
Why does the distinction between overcorrection and reverse discrimination matter? Because it's entirely reasonable to argue that plaintiffs who feel they are the victims of discrimination behave differently, think differently, and perceive events differently, than other plaintiffs.
Accordingly, the benefit of differentiating between overcorrection and reverse discrimination lies in its ability to reduce the strong negative psychological impact that the latter term entails for plaintiffs and defendants alike. As a recent analysis of litigation trends shows (Clermont & Schwab, 2008), employee litigants are already functioning under difficult circumstances: They lose their trials and pre-trial hearings more often than non-employee litigants, tend to settle out of court less frequently than other litigants, and are more likely to initiate an appeal, which (as one might guess) they lose in remarkably disproportionate numbers (Clermont & Schwab, 2004). Surely, this is a group of litigants who might benefit from understanding that well-intended overcorrection elevating a relatively small number of qualified minorities does not quite reach the level of depravity we associate with the notion of discrimination.
Ball, W. W. R. (1908). A short account of the history of mathematics (4th ed.). London: Macmillan.
Castaneda v. Partida, 430 U.S. 482, 97 S.Ct. 1272, 51 L.Ed.2d 498 (1977).
Clermont, K., & Schwab, S. J. (2004). How employment discrimination plaintiffs fare in federal court. Journal of Empirical Legal Studies, 1, 429-458.
Clermont, K., & Schwab, S. J. (2008). Employment discrimination plaintiffs in federal court: From bad to worse? Harvard Law and Policy Review, 3(1), 104-123.
Cohen, J., Cohen, P., Aiken, L. S., & West, S. G. (2002). Applied multiple regression - correlation analysis for the behavioral sciences. Hillsdale, NJ, USA: Erlbaum Associates.
Connecticut v. Teal, 457 U.S. 440, 102 S.Ct. 2525, 73 L.Ed.2d 130 (1982).
Crocker, J., Voelkl, K., Testa, M., & Major, B. (1991). Social stigma: The affective consequences of attributional ambiguity. Journal of Personality and Social Psychology, 60(2), 218.
Crosby, F. (1984). The denial of personal discrimination. American Behavioral Scientist, 27(3), 371.
Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993).
EEOC v. Sears, Roebuck & Co., 628 F. Supp. 1264 (N.D. Ill. 1986).
Griggs v. Duke Power Co., 401 U.S. 424 (1971).
Hays, W. L. (1988). Statistics (3rd. ed.). New York: Holt, Rinehart and Winston.
Hazelwood School District v. United States, 433 U.S. 299, 97 S.Ct. 2736, 53 L.Ed.2d 768 (1977).
International Brotherhood of Teamsters v. United States, 431 U.S. 324, 97 S.Ct. 1843, 52 L.Ed.2d 396 (1977).
Iverson, G. J. (1979). Decomposing chi square. Sociological Methods and Research, 8, 143-157.
Kumho Tire Co., Ltd. v. Carmichael, 526 U.S. 137, 119 S.Ct. 1167, 143 L.Ed.2d 238 (1999).
Malave v. Potter, 320 F.3d 321 (2d Cir. 2003).
Olson, M. A., & Fazio, R. H. (2004). Trait inferences as a function of automatically activated racial attitudes and motivation to control prejudiced reactions. Basic and Applied Social Psychology, 26(1), 1-11.
Petty, R. E., & Wegener, D. T. (1993). Flexible correction processes in social judgment: Correcting for context-induced contrast. Journal of Experimental Social Psychology, 29(2), 137-165.
Piette, M. J. (1992). Methodological issues when using simple models to investigate employment discrimination: Reply. Journal of Forensic Economics, 6(1), 43-50.
Reynolds, H. T. (1984). Analysis of nominal data (2nd. ed.). Beverly Hills: Sage.
Ricci v. DeStefano, 554 F.Supp.2d 142 (D. Conn. 2006).
Rice, S. A. (1926). 'Stereotypes': A source of error in judging human character. Journal of Personnel Research, 5, 267-276.
Singleton, R., & Straits, B. C. (1999). Approaches to social research (4th ed.). New York: Oxford University Press.
Stevens, S. S. (1958). Measurement and man. Science, 127(3295), 383-389.
Sudman, S., Bradburn, N. M., & Schwarz, N. (1996). Thinking about answers; the application of cognitive processes to survey methodology. San Francisco: Jossey-Bass.
Wards Cove Packing Co., Inc. v. Antonio, 490 U.S. 642, 109 S.Ct. 2115, 104 L.Ed.2d 733 (1989).
Author's Note: The author would like to thank Mary Heumann for her dedicated help preparing this manuscript.