05/30/2009 05:12 am ET Updated Dec 06, 2017

Opening the Black Box of Peer Review

An essay excerpted from How Professors Think: Inside the Curious World of Academic Judgment (Harvard University Press)

Excellence is the holy grail of academic life. Scholars strive to produce research that will influence the direction of their field. Universities compete to improve their relative rankings. Students seek inspiring mentors. But if excellence is ubiquitously invoked, there is little cross-disciplinary consensus about what it means and how it is achieved, especially in the world of research. "The cream of the crop" in an English or anthropology department has little in common with "the best and the brightest" in an economics department. This disparity does not occur because the academic enterprise is bankrupt or meaningless. It happens because disciplines shine under varying lights and because their members define quality in various ways. Moreover, criteria for assessing quality or excellence can be differently weighted and are the object of intense conflicts. Making sense of standards and the meanings given to them is the object of this book.

The Latin word academia refers to a community dedicated to higher learning. At its center are colleagues who are defined as "peers" or "equals," and whose opinions shape shared definitions of quality. In the omnipresent academic evaluation system known as peer review, peers pass judgment, usually confidentially, on the quality of the work of other community members. Thus they determine the allocation of scarce resources, whether these be prestige and honors, fellowships and grants to support research, tenured positions that provide identifiable status and job security, or access to high-status publications. Peers monitor the flow of people and ideas through the various gates of the academic community. But because academia is not democratic, some peers are given more of a voice than others and serve as gatekeepers more often than others. Still, different people guard different gates, so gatekeepers are themselves subject to evaluation at various times.1

Peer review is secretive. Only those present in the deliberative chambers know exactly what happens there. In this book I report what I have learned about this peculiar world. I studied humanists and social scientists serving on multidisciplinary panels that had been charged with distributing prestigious fellowships and grants in support of scholarly research. I conducted in-depth interviews with these experts and also observed their deliberations. During their face-to-face discussions, panelists make their criteria of evaluation explicit to one another as they weigh the merits of individual proposals and try to align their own standards with those of the applicants' disciplines. Hence, grant review panels offer an ideal setting for observing competing academic definitions of excellence. That peer evaluation consumes what for many academics seems like an ever-growing portion of their time is an additional reason to give it a close look.

Academic excellence is produced and defined in a multitude of sites and by an array of actors. It may look different when observed through the lenses of editorial peer review, books that are read by generations of students, current articles published by top journals, elections at national academies, or appointments at elite institutions. American higher education also has in place elaborate processes for hiring, promoting, and firing academics. Systematically examining parts of this machinery is an essential step in assessing the extent to which this system is harmonized by a shared evaluative culture.

Evaluations of fellowship programs typically seek answers to such questions as: Are these programs successful in identifying talent? Do awardees live up to their promise? The tacit assumption is that these programs give awards to talented people with the hope that the fellowship will help them become "all they can be."2 Examining how the worth of academic work is ascertained is a more counterintuitive, but I think ultimately more intriguing, undertaking. Rather than focusing on the trajectory of the brilliant individual or the outstanding oeuvre, I approach the riddle of success by analyzing the context of evaluation -- including which standards define and constrain what we see as excellent.3

By way of introduction, I pose and answer the same kinds of questions that are typically asked in peer review of completed or proposed research.

What do you study? I study evaluative cultures.4 This broad term includes many components: cultural scripts that panelists employ when discussing their assessments (is the process meritocratic?);5 the meaning that panelists give to criteria (for instance, how do you recognize originality?); the weight they attribute to various standards (for example, "quality" versus "diversity"); and how they understand excellence. Do they believe excellence has an objective reality? If so, where is it located--in the proposal (as economists generally believe) or in the eye of the beholder (as English scholars claim)?

Evaluative cultures also include how reviewers conceive of the relationship between evaluation and power dynamics, their ability to judge and reach consensus, and their views on disciplinary boundaries and the worth and fate of various academic fields. Finally, evaluative cultures include whether panelists think that subjectivity has a corrupting influence on evaluation (the caricatured view of those in the harder social sciences) or is intrinsic to appreciation and connoisseurship (the view of those in the humanities and more interpretive social sciences).6

I study shared standards and evaluative disciplinary cultures in six disciplines. Each presents its own characteristics and challenges. In philosophy, members claim a monopoly on the assessment of their disciplinary knowledge. In history, a relatively strong consensus is based on a shared sense of craftsmanship. Anthropologists are preoccupied with defining and maintaining the boundaries of their discipline. English literary scholars experience their field as undergoing a "legitimation crisis," while political scientists experience theirs as divided. In contrast, economists view their own field as consensual and unified by mathematical formalism.

Whom do you study? I study panelists, who are, in principle, highly regarded experts known for their good "people skills" and sound judgments. They have agreed to serve on grant peer review panels for a host of reasons having to do with influence, curiosity, or pleasure. Some say that they are "tremendously delighted" to spend a day or two witnessing brilliant minds at work. Others serve on panels to participate in a context where they can be appreciated, that is, where they can sustain -- and ideally, enhance -- their identities as highly respected experts whose opinions matter.

Why study peer review? As I write, debates are raging about the relative significance of excellence and diversity in the allocation of resources in American higher education. Are we sacrificing one for the other? Analyzing the wider culture of evaluation helps us understand their relationship. For all but the top winners, decisions to fund are based generally on a delicate combination of considerations that involve both excellence and diversity. Often panelists compete to determine which of several different types of diversity will push an A− or B+ proposal above the line for funding -- and very few proposals are pure As. Evaluators are most concerned with disciplinary and institutional diversity, that is, ensuring that funding not be restricted to scholars in only a few fields or at top universities. A few are also concerned with ethnoracial diversity, gender, and geographic diversity. Contra popular debates, in the real world of grant peer review, excellence and diversity are not alternatives; they are additive considerations. The analysis I provide makes clear that having a degree from, say, a Midwestern state university, instead of from Yale, is not weighed in a predictable direction in decision-making processes. Similarly, while being a woman or person of color may help in some contexts, it hurts in others.7

What kind of approach do you take? I think that as social actors seeking to make sense of our everyday lives, we are guided primarily by pragmatic, problem-solving sorts of concerns. Accordingly, my analysis shows that panelists adopt a pragmatic approach to evaluation. They need to reach a consensus about a certain number of proposals by a predetermined time, a practical concern that shapes what they do as well as how they understand the fairness of the process. They develop a sense of shared criteria as the deliberations proceed, and they self-correct in dialogue with one another, as they "learn by monitoring."8 Moreover, while the language of excellence presumes a neat hierarchy from the best to the worst proposals, panelists adopt a nonlinear approach to evaluation. They compare proposals according to shared characteristics as varied as topic, method, geographical area, or even alphabetical order. Evaluators are often aware of the inconsistencies imposed by the conditions under which they carry out their task.

What do you find? The actions of panelists are constrained by the mechanics of peer review, with specific procedures (concerning the rules of deliberation, for instance) guiding their work. Their evaluations are shaped by their respective disciplinary evaluative cultures, and by formal criteria (such as originality, significance, feasibility) provided by the funding competition. Reviewers also bring into the mix diversity considerations and more evanescent criteria--elegance, for example. Yet despite this wide array of disciplinary differences, they develop together shared rules of deliberation that facilitate agreement. These rules include respecting the sovereignty of other disciplines and deferring to the expertise of colleagues. They entail bracketing self-interest, idiosyncratic taste, and disciplinary prejudices, and promoting methodological pluralism and cognitive contextualization (that is, the use of discipline-relevant criteria of evaluation). Respect for these rules leads panelists to believe that the peer review process works, because panelists judge each other's standards and behavior just as much as they judge proposals.9

Peer review has come under a considerable amount of criticism and scrutiny.10 Various means--ranging from double-blind reviewing to training and rating--are available to enforce consistency, ensure replicability and stability, and reduce ambiguity. Grant peer review still favors the face-to-face meeting, unlike editorial peer review, where evaluators assess papers and book manuscripts in isolation and make recommendations, usually in writing, to physically distant editors.11 Debating plays a crucial role in creating trust: fair decisions emerge from a dialogue among various types of experts, a dialogue that leaves room for discretion, uncertainty, and the weighing of a range of factors and competing forms of excellence. It also leaves room for flexibility and for groups to develop their own shared sense of what defines excellence--that is, their own group style, including speech norms and implicit group boundaries.12 Personal authority does not necessarily corrupt the process: it is constructed by the group as a medium for expertise and as a ground for trust in the quality of decisions made.13 These are some of the reasons that deliberation is viewed as a better tool for detecting quality than quantitative techniques such as citation counts.

It may be possible to determine the fairness of particular decisions, but it is impossible to reach a definitive, evidence-based conclusion concerning the system as a whole. Participants' faith in the system, however, has a tremendous influence on how well it works. Belief in the legitimacy of the system affects individual actions (for instance, the countless hours spent reading applications) as well as evaluators' understanding of what is acceptable behavior (such as whether and how to signal the disregard of personal interest in making awards). Thus embracing the system has important, positive effects on the panelists' behavior.14

What is the significance of the study? The literature on peer review has focused almost exclusively on the cognitive dimensions of evaluation and conceives of extracognitive dimensions as corrupting influences.15 In my view, however, evaluation is a process that is deeply emotional and interactional. It is culturally embedded and influenced by the "social identity" of panelists--that is, their self-concept and how others define them.16 Reviewers' very real desire to have their opinion respected by their colleagues also plays an important role in deliberations. Consensus formation is fragile and requires considerable emotional work.17 Maintaining collegiality is crucial. It is also challenging, because the distinctive features of American higher education (spatial dispersion, social and geographic mobility, the sheer size of the field, and so on) increase uncertainty in interaction.

Is higher education really meritocratic? Are academics a self-reproducing elite?18 These and similar questions are closely tied to issues of biases in evaluation and the trustworthiness of evaluators. Expertise and connoisseurship (or ability to discriminate) can easily slide into homophily (an appreciation for work that most resembles one's own). Evaluators, who are generally senior and established academics, often define excellence as "what speaks most to me," which is often akin to "what is most like me," with the result that the "haves"--anyone associated with a top institution or a dominant paradigm--may receive a disproportionate amount of resources.19 The tendency toward homophily may explain the perceived conservative bias in funding: it is widely believed that particularly creative and original projects must clear higher hurdles in order to get funded.20 It would also help explain the Matthew effect (that is, the tendency for resources to go to those who already have them).21

But I find a more complex pattern. Evaluators often favor their own type of research while also being firmly committed to rewarding the strongest proposal. Panelists are necessarily situated in particular cognitive and social networks. They all have students, colleagues, and friends with whom they share what is often a fairly small cognitive universe (subfield or subspecialty) and they are frequently asked to adjudicate the work of individuals with whom they have only a few degrees of separation. While their understanding of what defines excellence is contingent on the cultural environment in which they are located, when scholars are called on to act as judges, they are encouraged to step out of their normal milieus to assess quality as defined through absolute and decontextualized standards. Indeed, their own identity is often tied to their self-concept as experts who are able to stand above their personal interest. Thus, evaluators experience contradictory pushes and pulls as they strive to adjudicate quality.22

What are the epistemological implications of the study? Much like the nineteenth-century French social scientist Auguste Comte, some contemporary academics believe that disciplines can be neatly ranked in a single hierarchy (although few follow Comte's lead and place sociology at the top). The matrix of choice is disciplinary "maturity," as measured by consensus and growth, but some also favor scientificity and objectivity.23 Others firmly believe that the hard sciences should not serve as the aspirational model, especially given that there are multiple models for doing science, including many that do not fit the prevailing archetypical representations.24 In the social sciences and the humanities, the more scientific and more interpretive disciplines favor very different forms of originality (with a focus on new approaches, new data, or new methods).25 From a normative standpoint, one leitmotif of my analysis is that disciplines shine under different lights, are good at different things, and are best located on different matrixes of evaluation, precisely because their objects and concerns differ so dramatically. For instance, in some fields knowledge is best approached through questions having to do with "how much"; other fields raise "how" and "why" questions that require the use of alternative approaches, interpretive tools, methods, and data-gathering techniques. These fundamental differences imply that excellence and epistemological diversity are not dichotomous choices. Instead, diversity supports the existence of various types of excellence.

Is the study timely? At the start of the twenty-first century, as I was conducting interviews, market forces had increasingly come to favor the more professional and preprofessional fields, as well as research tied to profit-making.26 Moreover, the technology of peer review has long been embedded in a vast academic culture that values science. In the public sphere, the social sciences continue to be the terrain for a tug-of-war between neoliberal market explanations for societal or human behavior and other, more institutional and cultural accounts.27 By illuminating how pluralism factors into evaluation processes, I hope to help maintain a sense of multiple possibilities.

Many factors in American higher education work against disciplinary and epistemological pluralism. Going against the tide in any endeavor is often difficult; it may be even more so in scholarly research, because independence of thinking is not easily maintained in systems where mentorship and sponsored mobility loom large.28 Innovators are often penalized if they go too far in breaking boundaries, even if by doing so they redefine conventions and pave the way for future changes.29 In the context of academic evaluation, there does not appear to be a clear alternative to the system of peer review.30 Moreover, there seems to be agreement among the study's respondents that despite its flaws, overall this system "works." Whether academics who are never asked to evaluate proposals and those who never apply for funds share this opinion remains an open question.

Despite all the uncertainties about academic judgment, I aim to combat intellectual cynicism. Post-structuralism has led large numbers of academics to view notions of truth and reality as highly arbitrary. Yet many still care deeply about "excellence" and remain strongly committed to identifying and rewarding it, though they may not define it the same way.

I also aim to provide a deeper understanding, grounded in solid research, of the competing criteria of evaluation at stake in academic debates. Empirically grounded disciplines, such as political science and sociology, have experienced important conflicts regarding the place of formal theory and quantitative research techniques in disciplinary standards of excellence. In political science, strong tensions have accompanied the growing influence of rational choice theory.31 In the 1990s, disagreements surrounding the American Sociological Association's choice of an editor for its flagship journal, the American Sociological Review, generated lively discussion about the place of qualitative and quantitative research in the field.32 In both disciplines, diversity and academic distinction are often perceived as mutually exclusive criteria for selecting leaders of professional associations. My analysis may help move the discussion beyond polemics.

Also, the book examines at the micro level the coproduction of the social and the academic.33 Beginning in the late 1960s, and based on their understanding of the standards of evaluation used in government organizations, sociologists seeking support from government funding agencies began to incorporate more quantitative techniques in part as a way of legitimizing their work as "scientific."34 At the same time, government organizations became increasingly dependent on social science knowledge (for example, census information, data pertaining to school achievement, unemployment rates among various groups) as a foundation for social engineering. Thus, knowledge networks and networks of resource distribution have grown in parallel--and this alignment has sustained disciplinary hierarchies. The more a researcher depends on external sources of funding, the less autonomous he or she is when choosing a problem to study.35 Standards of evaluation that are salient, or that researchers perceive as salient, shape the kind of work that they undertake. These standards also affect the likelihood that scholars will obtain funding and gain status, since receiving fellowships is central to the acquisition of academic prestige.36 Thus are put in place the conditions for the broader hierarchy of the academic world.

Most of all, I want to open the black box of peer review and make the process of evaluation more transparent, especially for younger academics looking in from the outside.37 I also want to make the older, established scholars--the gatekeepers--think hard and think again about the limits of what they are doing, particularly when they define "what is exciting" as "what most looks like me (or my work)." Providing a wider perspective may help broaden the disciplinary tunnel vision that afflicts so many. A greater understanding of the differences and similarities across disciplinary cultures may lead academics toward a greater tolerance of, or even an appreciation for, fields outside their own. And coming to see the process as moved by customary rules may help all evaluators view the system in a different and broader perspective as well as develop greater humility and a more realistic sense of their cosmic significance, or lack thereof, in the great contest over excellence.

--------------------------------------------------------------------
Notes

1. A general analysis of the system of peer review and of other means
of allocating resources within academia can be found in Chubin and
Hackett (2003). On various reward systems and gatekeepers, see also Crane
(1976).
2. Cognitive psychologists and organizational behavior experts also focus
on the identification of success, intelligence, creativity, and the development
of excellent individuals. See, for example, Csikszentmihalyi (1996); Gardner
(1999); Goleman, Boyatzis, and McKee (2002); and Ericsson (1996).
3. This approach is akin to that described in Latour (1988), Hennion
(2004), Heinich (1996), and Rosental (2003) on the recognition of intellectual
and cultural outputs. See also Frickel and Gross (2005) and Lamont (1987).
On conventions, see Becker (1982).
4. My approach to evaluative cultures builds on Fleck's classic book Genesis
and Development of a Scientific Fact (1979), which brought attention to the
importance of "thought style" produced by "thought collectives." He also
wrote about the "disciplined shared mood of scientific thought" (144).
5. Social scientists use the term "cultural scripts" to refer to widely available
notions that individuals draw on to make sense of reality. On "scripts" in
higher education, I draw on the work of John Meyer and his associates (2006),
which emphasizes the role of individual rationalist models diffused by Western
higher education.
6. These evaluative cultures are embedded in epistemic cultures, such as
peer review, which are not simply modes of evaluating work, but also technologies
or mechanisms for producing and determining truth claims. The concept
of epistemic culture is borrowed from Knorr-Cetina (1999).
7. The literature on gender discrimination and evaluation tends to downplay
such variations to emphasize consistencies. See especially Schiebinger
(1999).
8. On learning by monitoring in organizations, see Helper, MacDuffie,
and Sabel (2000).
9. Deliberations, rather than abstract formulations, both produce and uncover
common standards of justice in real situations. More specifically, as
students of jury deliberation put it, "temporary situated recourse, common
sense, lively, and contingent determinations" of justice occur through deliberation,
as actors attempt to convince one another. See Maynard and Manzo
(1993, 174).
10. For instance, the Journal of the American Medical Association (JAMA)
has sponsored a conference on peer review every four years since 1989 to
study it and monitor its reliability.
11. This preference for face-to-face meetings speaks volumes about the
value that academics place on the role of debate in fostering fairness and reducing
bias. Deliberations contrast with more mechanistic techniques of
evaluation, such as quantitative rating, that have built-in protections against
the vagaries of connoisseurship and subjectivity. On the difference made
by quantification, see for instance Porter (1999) and Espeland and Sauder
(2007). Quantification has also been applied to anticipate and avoid insolvency
and credit failure, and to regularize trust; see Carruthers and Cohen
(2008). On the management of information and uncertainty in organizations,
see Stinchcombe (1990). Many believe that when it comes to grant peer review,
instituting rigid, technical decision-making rules of evaluation would
generate only the illusion of objectivity.
12. The concept of group style is developed by Eliasoph and Lichterman
(2003, 738): "We define group style as recurrent patterns of interaction that
arise from a group's shared assumptions about what constitutes good or adequate
participation in the group setting . . . Everyday experience makes the
concept of group style intuitively plausible. When people walk into a group
setting, they usually recognize the style in play. They know whether the setting
calls for participants to act like upstanding citizens or iconoclasts. They know
some settings call for joking irreverence, while others demand high-minded
seriousness. Settings usually sustain a group style; different settings do this
differently."
13. My thinking on this subject is influenced by recent writings on the
place of the self in evaluation and objectivity, especially the work of Daston
and Galison (2007) and Shapin (1994).
14. In science studies and economic sociology, these effects are described
as "performative effects." As Michel Callon writes (1998, 30), the economy "is
embedded not in society but in economics," because economics brings the
market into being and creates the phenomena it describes. Thus the discipline
creates the rational actor it posits. Donald MacKenzie and Yuval Millo have
refined this approach by analyzing performativity as a "stabilizing" self-fulfilling
prophecy that results from conflictual and embedded processes; see MacKenzie
and Millo (2003).
15. For a critique of the classical dichotomy between the cognitive and the
social, see Longino (2002). A concern for how the social corrupts the cognitive
is typical of the institutional approach to peer review developed by Robert
K. Merton, Jonathan Cole and Stephen Cole, Harriet Zuckerman, and others--
see, for example, Cole and Cole (1981); Cole, Rubin, and Cole (1978);
and Zuckerman and Merton (1971). Others, such as Mulkay (1976), have
been concerned with the noncognitive aspects of evaluation. For their part,
Pierre Bourdieu and Bruno Latour have analyzed how criteria of evaluation
reflect social embeddedness; see Bourdieu (1988) and Latour (1987). My critique
of the literature on peer review is developed more fully in Chapters 4
and 5.
16. See Jenkins (1996), a study of social identity as a pragmatic individual
achievement that considers both group identification and social categorization.
17. Hochschild (1979).
18. See Stevens, Armstrong, and Arum (2008) for a probing analysis of the
current state of the literature on American higher education.
19. Kanter (1977) uses the concept of homophily to refer to recruiters who
"seek to reproduce themselves in their own image"; see also Rivera (2009).
Homophily often affects the candidate pool when informal networks are used
for recruitment and job searches, which results in more men being hired; see
Ibarra (1992) and Reskin and McBrier (2000). For an analysis of claims based
on arguments about cultural descent--particularly sacred properties of tradition--
see Mukerji (2007). For a measure of homophily in the panels discussed
in this book, see Guetzkow et al. (2003).
20. On the conservative bias, see Eisenhart (2002).
21. In proposing this concept, Merton drew on the Gospel according to
Matthew: "For unto everyone that hath shall be given and he shall have abundance:
but from him that hath not shall be taken away even that which he
hath"; see Merton (1968).
22. While Bourdieu (1988) suggests that the habitus of academics promotes
criteria of evaluation that favor their own work due to the competitive logic
of fields, I suggest that this tendency results from their necessary cultural and
institutional embeddedness. Because he leaves very little room for identity,
Bourdieu ignores the types of pushes and pulls that I discuss here.
23. See for example Ben-David (1991); Fuchs and Turner (1986); Collins
(1994); Braxton and Hargens (1996); and Hargens (1988).
24. Galison and Stump (1996); Knorr-Cetina (1999).
25. On this topic, see Guetzkow, Lamont, and Mallard (2004) and Chapter
5.
26. Many authors have noted this. See, for instance, Brint (2002); Slaughter
and Rhoades (2004); Kirp (2003). For a theoretically sophisticated account of
the relationship between science and society, see also Jasanoff (2004).
27. Hall and Lamont (2009) is an attempt to intervene in this tug-of-war
around the question of what may define "successful societies."
28. Hargens (1988).
29. Hayagreeva, Monin, and Durand (2005).
30. Some advocate the use of citation counts as a means for measuring
quality while avoiding biases. A large literature criticizes bibliometric techniques.
For a discussion, see Feller et al. (2007).
31. Lustick (1997).
32. Feagin (1999).
33. On conditions that sustain coproduction, see Jasanoff (2004), particularly
pp. 1-12.
34. McCartney (1970).
35. Shenhav (1986).
36. Cole and Cole (1973).
37. In this, I add to the work of Daryl Chubin, Edward Hackett, and many
others. See in particular Chubin and Hackett (2003).