29 The problems with peer review

There is already a lot covering peer review in this book. I have placed this chapter last not because it is the least significant (potentially it is the most significant), but because I think that it is important that you appreciate exactly what peer review is, and experience it from both sides, before you begin to consider the problems with the peer review system.

At the heart of the problems with peer review is that individual humans are themselves biased. Because peer review relies on a small number of individuals providing their assessment of a manuscript, it is quite likely that these biases might align, and that the manuscript is rejected along those lines, rather than being considered along purely objective lines. This likelihood of aligned prejudices comes about because the pool of people that conduct peer review in biological sciences, and in many other disciplines, is mostly white, western (i.e. Europe and North America) and male. These people hold a very similar cultural set of biases.

Some people have argued that peer review is untested and that the effects are uncertain (Jefferson et al., 2002). Perhaps more worryingly, studies designed to test peer review (by deliberately sending out manuscripts with errors) have shown that most reviewers are unable to find all errors and some find none (Rothwell & Martyn, 2000).

For example, if peer review were effective, then reviews of grant applications should closely align with the productivity of grants given. Fang et al. (2016) found that percentile scores awarded by peer review of NIH grant applications were poor at predicting the productivity of >100 000 grants awarded.

Essentially, the major problem with peer review is that it is conducted by humans, and that like humans in societies everywhere, reviewers tend to have their own set of biases. The above sections should have given you some idea about the frailties of the peer review system.

29.1 Upsetting comments

I think that the reason why we find review comments so harsh is usually because we put so much effort into the writing process that it feels very personal whenever we receive criticism. Indeed, I think that there might be a correlation between how much effort you put in and how harsh the reviewers’ comments seem. One study suggests that authors consider the competence of their reviewers to be closely aligned to the editorial decision (Drvenica et al., 2019). Just be aware that this is normal. Remember that the reviewers are humans, and they have sat down and given freely of their own time to read your work. The most important thing to be aware of is that all they had was what you had written. No background information, and possibly no information about the species or the system involved. They will be experts at some level, but perhaps not the type you might expect. Importantly, the editor asked them because they thought that their opinion would be of importance in helping them make their decision on your paper. This means that you also need to respect their opinion and comments, even if you don’t agree with them or find them to be offensive, arrogant or even rude. Remember also that some apparent rudeness may just be a reviewer who has a sense of humour that you don’t understand. There are lots of examples of this at ShitMyReviewersSay. So no matter what you think of each comment, you should respond to it in a professional and courteous manner that shows that you are a professional scientist.

Why do scientists make disparaging or unprofessional remarks to their colleagues in peer review? Whenever two or three scientists get together, you hear tales of recent woes associated with peer review. The retelling of such stories is all part of the collective, cathartic unburdening of what can be a traumatic experience, especially when we put so much effort into each piece of work (see Hyland & Jiang, 2020). Reading through a lot of these reviewers’ comments, I can see that there is an attempt at humour. This humour is not appreciated by those who receive the reviews. Perhaps I understand the humour because I also come from that same culture that dominates STEM, but it is not understood, or even recognised as humour, by others. Writing humorous reviews is unprofessional, especially if the humour is used to accentuate negative aspects. Needless to say, we could all do without unprofessional reviews.

29.1.1 Why do academics make all of these terrible comments?

I can’t pretend to know the answer for all of the cases, but I can speak from personal experience. Time is at a premium, and time spent reading and reviewing manuscripts tends to be quality time - best when it is quiet and uninterrupted. If these manuscripts are not of a quality that will pass peer review (i.e. will be rejected), then this feels like an abuse of professional time - especially when editors should have spotted the same mistake in their first reading. Editors who fail to spot manuscripts that should be rejected do the reviewers a disservice by increasing the amount of work for everyone (more people and more time are involved). Resentment and frustration may follow on the part of reviewers, and this can manifest itself in the form of ad hominem attacks.

Another source of abuse in peer review appears to be the recycling of abuse received. Just as those who are bullied at work are more likely to perpetrate bullying on others, there is also an abuse cycle in peer review from those who have received abusive comments in the past. Perhaps because criticism of our writing feels so personal, continually receiving abusive comments can result in the abused author becoming an abusive reviewer. When peer review is anonymous, it provides abusers with a platform from which to give back some of the pain that they have received in the past. Victim-offender abuse cycles are a human trait that we should all be aware of in our professional, as well as our personal, lives.

29.1.2 Ad hominem attacks

One of the shocking results of a very large study of peer review of PLOS ONE articles is the large number of comments that are written directly attacking the authors as a group or personally (i.e. ad hominem attacks, see Eve et al., 2021). This should not happen. Reviewers should confine their objective comments to the work and its presentation. However, this is an aspect of peer review where authors (especially the corresponding and leading authors) will need to acquire a thick skin, because unprofessional comments are made to people across gender and racial groupings, but especially toward traditionally underrepresented groups (Silbiger & Stubler, 2019). Sadly, these same groups feel that such comments disproportionately impact their productivity and career advancement (Silbiger & Stubler, 2019). Reading comments that are sent to other authors can be cathartic as these allow you to see that everyone receives such negative comments. ShitMyReviewersSay is a good source of these, or see Eve et al. (2021), or Silbiger and Stubler (2019). When ad hominem attacks are made, it would be good if editors openly and explicitly identified these as bad reviewer behaviour. It would certainly improve the understanding of authors if editors intervened when such ad hominem attacks are made. This would not necessarily involve deleting these comments, but directing authors to ignore them.

29.2 Sweetheart reviews

Many academics are quick to point out the problems of exaggerated negative reviews. Clearly, these take a toll on those who receive them. But the opposite problem also occurs, and while it might not be at all upsetting (indeed it is often very flattering), uncritical reviews (also known as sweetheart reviews) are also problematic.

In general, sweetheart reviews are very easy for editors to spot. They are overtly flattering, often leading with ad hominem praise and offering little critique of the contents. As with upsetting reviews, editors are the arbiters of sweetheart reviews, and can choose either to solicit another review or to ignore their contents. Once again, it would be good if editors openly and explicitly identified these reviews as bad reviewer behaviour.

I have received such reviews. It is flattering, but unhelpful. I can’t say that I’d rather have a review that attacks me, but when a reviewer does not provide constructive criticism it does not improve the work in the way that a critical review might. In this way, I’d rather have a more critical review that improves the manuscript.

Sweetheart reviews can be the source of particular problems when editors conspire with authors. There are numerous examples of such practices. In some journals (like PNAS), editors promote articles for submission and then solicit reviews. This can be, and has been, used to promote some work that would not have met with acceptance in other publications (Fainra & Gibbons, 2022). Similarly, authors can conspire with potential and influential reviewers before submitting a manuscript (see Fainra & Gibbons, 2022). Lastly, authors have been known to write their own reviews. This includes authors who suggest reviewers but provide fake email addresses that they register to themselves but that look like those of well known scientists (Brainard & You, 2018).

Attempts to subvert the course of peer review in an overly positive way might be more pernicious in the literature than those that are clearly damning. Phenomena such as sweetheart reviews might explain why increasing numbers of published papers are later retracted.

29.3 Demonstrated biases in peer review

Although Table 29.1 shows that many kinds of bias have been explicitly demonstrated, that’s certainly not their limit. Given that over 280 biases have already been catalogued (I encourage you to look through the online catalogue), many more different types of bias are likely to exist in peer review. Let’s not forget that our biases have evolved because they are very useful. They exist as a way of shortcutting exhaustive decision making based on random variables. But maybe peer review needs some more of this. And perhaps that means that I should be tolerant when I’m asked to review an economics journal, as these folk clearly weren’t exhibiting any biases associated with economists when they picked me (see section on editors).

TABLE 29.1: There are as many biases in peer review as there are humans that conduct them. This table demonstrates some of the biases that have been proven in studies.
| Bias for which there is evidence | Study demonstrating bias |
|---|---|
| Against female authors | Tregenza (2002); Manlove & Belou (2018); Fox et al. (2019); Budden et al. (2008); Morgan, Hawkins & Lundine (2018); Hagan et al. (2020) |
| Against female reviewers | Helmer et al. (2017); Fox & Paine (2019) |
| Towards author reputation, favouring acceptance of manuscripts despite poor reviews | Bravo et al. (2018); Okike et al. (2016) |
| Towards authors from more prestigious institutions, also called prestige bias | Ceci & Peters (1982); Travis & Collins (1991); Garfunkel et al. (1994); Tomkins, Zhang & Heavlin (2017); Manlove & Belou (2018); Lee et al. (2013) |
| Nationality and language bias | Song et al. (2000); Lee et al. (2013); Manlove & Belou (2018); Nunez & Amano (2021); Link (1998) |
| Confirmation bias (the tendency for journals and reviewers to favour significant results) | Mahoney (1977); Fanelli (2010a); Fanelli (2012); see Part I |
| Publication bias (the literature contains a bias in published results) | Jennions & Møller (2002); Munafò, Matheson & Flint (2007); Van Dongen (2011); Franco, Malhotra & Simonovits (2014); Fanelli, Costas & Ioannidis (2017); Sánchez-Tójar et al. (2018a); see Part IV |

Perhaps the biggest problem facing those who wish to reform the peer review system is that it all starts with editors who are choosing reviewers. Those editors themselves have their own inherent biases. When they look for reviewers, they are likely to sample from within their own group of peers who have the same biases. Interestingly, bias (in general) is more easily perceived by early career scientists (Zvereva & Kozlov, 2021). My experience is that requests for reviews from people whom I don’t know and have no connection with (or who are outside of my field) are more likely to fail - they will say no, or they won’t reply to the request (see Perry et al., 2012). This holds even for academics who are publishing within the same area.

Editors are the people who select reviewers, and inspection of most editorial boards will reveal that they reflect the same biases found in peer review. That is, editorial boards are mostly made up of white men from Europe and North America. Rectifying this bias will take time, the acknowledgement that there is a problem, and the willingness to do something about it. In 2020, I saw a big movement to redress the imbalance in science at all levels. I hope that this will continue into the future so that at least some of the biases in peer review will fall away.

29.4 All reviews are not equal

If you are an editor and you receive three reviews from three researchers each suggesting something different, I have argued (in another section) that the editor should make their own decision on what action to take. But what if one of the reviewers is very negative and is a leader in their field? Should their review count equally with the others? Should their opinion be given more weight than the others? Of course, they could be using their position to influence their field, to make sure that opinions they hold are reinforced. Lee et al. (2013) provide a good overview of the potential way in which influential reviewers could bias the peer review system. But the power sits with the editor to make this decision. Interestingly, Thurner and Hanel (2011) use an agent-based model (much as you might use in biological sciences) to show that only a small number of biased (for whatever reason) reviewers are needed to seriously degrade the quality of peer review, and thus the science system as a whole.
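The agent-based argument can be illustrated with a toy simulation. To be clear, this is not the model of Thurner and Hanel (2011); it is a minimal sketch under assumptions invented here for illustration: paper quality is drawn from a standard normal distribution, a fair reviewer recommends acceptance when quality exceeds 0.5, a biased reviewer ignores quality and recommends at random, and the editor accepts only on unanimous recommendations. The function name `simulate` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate(frac_biased, n_papers=20000, n_reviewers=2, seed=1):
    """Return the mean true quality of accepted papers in a toy model.

    Assumptions (all hypothetical, chosen only for illustration):
    - paper quality ~ Normal(0, 1)
    - a fair reviewer recommends acceptance when quality > 0.5
    - a biased reviewer ignores quality and flips a coin
    - the editor accepts only if every reviewer recommends acceptance
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_papers):
        quality = rng.gauss(0.0, 1.0)
        votes = []
        for _ in range(n_reviewers):
            if rng.random() < frac_biased:
                votes.append(rng.random() < 0.5)  # biased: coin flip
            else:
                votes.append(quality > 0.5)       # fair: judges quality
        if all(votes):
            accepted.append(quality)
    return statistics.mean(accepted)


# Compare the accepted literature as the share of biased reviewers grows.
results = {frac: simulate(frac) for frac in (0.0, 0.3, 0.5)}
for frac, mean_quality in results.items():
    print(f"biased fraction {frac:.1f}: "
          f"mean quality of accepted papers {mean_quality:.2f}")
```

In this toy setting the mean quality of accepted papers falls as the share of biased reviewers rises, because weak papers occasionally collect two lucky recommendations while strong papers are sometimes blocked. How steep that fall is depends entirely on the assumed threshold and decision rule, so the numbers should not be read as estimates for any real journal.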

The truth is that all reviews are not equal because some reviewers will put in more effort than others. Some will know the literature better. Some will be experts in the field who should be better placed to comment. These people are actually more likely to be less senior: PhD students, post docs or early career researchers. However, what matters is that the editor does not take account of the names of these people, their rank, their institution, or other demographics such as their gender, race or nationality. There are great editors out there who can do this, but my impression is that the majority fail. In this case, the only way to do this is by the triple blind method. Here the editors will invite the reviewers (by name) but the reviews that result will not be marked with the reviewers’ names. This will make forgetting who they are easier, especially for busy editors.

29.5 Decisions rest with editors

A good editor will look at the reasoning in the reviews and make a decision in an unbiased way. A poor editor may be swayed by the perceived influence of an important reviewer irrespective of their argument. An increasing trend that I’ve noticed is that editors will simply take a decision that follows the consensus of all reviewers: that is, they rate all reviewers equally (see also Rothwell & Martyn, 2000). However, I would argue that this is also bad editing. Irrespective of the bias from reviewers, guarding the integrity of the process of peer review lies with editors.

Today, editors are so busy with the other duties in their jobs as academics that their decisions are hurried, and expecting them to take the time and space to overcome their personal biases might be a lot to ask. Instead, I think that it is time for the Open Evaluation concept to move into the mainstream so that everyone can see how editors came by their decision, and that they were not led by potential biases of their reviewers but were instead swayed by the quality of the review and their own reading of the manuscript. This is especially important for rejected manuscripts, which is why we need the effort of this peer review recorded on preprint sites - such as happens in overlay journals.

Another important problem with peer review comes when editors are not independent of authors. This can happen when an editor is known well by the authors. They could be in the same department or even in the same research group. Similarly, there could be a group of editors for different journals that have some quid pro quo arrangement, that might even be unstated, whereby their manuscripts do not undergo equal scrutiny to other manuscripts that are submitted. One could argue that whenever editors know the names of the authors, there is a conflict of interest that should be declared or the possibility for the system to be corrupted.

Despite all of the problems with peer review that are acknowledged above, we stick with it as the majority system in science. It could be that peer review favours exactly the same people who uphold the system and prevent it from moving into something more transparent, equal, just and fair. These are the editors and reviewers who have, for the most part, managed to make their careers inside the system, and have therefore mastered it to some degree.

To you, dear reader, I can only suggest that you be aware of all the potential pitfalls with peer review, and never stop striving for something better.

29.6 The social side of peer review

There is so much more to peer review than peer review. Being selected by an editor to review a manuscript represents an important standing amongst your peers. Literally it means that your opinion is valued. But there’s much more to it. Doing a good job at peer review means that you improve other people’s work. This help can be valued to the point where those colleagues get in touch and want to work with you. That this can happen has now been shown in a study, and has been termed the ‘invisible hand’ of peer review.

29.6.1 The ‘invisible hand’ of peer review

Dondio et al. (2019) found that reviewers were more likely to provide positive review comments to authors who were close (less than or equal to three steps) to their collaborative networks (see Adams, 2012). In this case, the closeness of a reviewer to an author was calculated from a social network in which a distance of one meant that they had co-authored together (one step), co-authors of the reviewer may have collaborated with these authors (two steps), or co-authors of reviewers and authors had collaborated (three steps). Surprisingly, they obtained this result even though the journal practised a strict double-blind review system (reviewers didn’t know who the authors were, and vice versa). Referees that were not close (i.e. greater than or equal to four steps) were more likely to provide more negative review comments. Those who helped authors more during peer review (i.e. asked for major revisions) were more likely to cite the manuscript, once published, and eventually more likely, than random, to publish with those same authors, even if manuscripts were eventually rejected. The authors concluded therefore that peer review may accelerate the potential for collaboration in science (Dondio et al., 2019).
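The step counts described above can be read as shortest-path distances in a co-authorship network. The following is a minimal sketch of that idea, not the procedure used by Dondio et al. (2019): the network, the names in it, and the helper functions `collaboration_distance` and `is_close` are all hypothetical, and the threshold of three steps is taken from the text.

```python
from collections import deque


def collaboration_distance(coauthors, reviewer, author):
    """Shortest number of co-authorship steps between two people.

    Uses breadth-first search over an undirected co-authorship network
    given as {person: [co-authors]}. Returns None if no path exists.
    """
    if reviewer == author:
        return 0
    seen = {reviewer}
    queue = deque([(reviewer, 0)])
    while queue:
        person, dist = queue.popleft()
        for neighbour in coauthors.get(person, ()):
            if neighbour == author:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def is_close(distance, threshold=3):
    """A reviewer counts as 'close' at three steps or fewer."""
    return distance is not None and distance <= threshold


# Hypothetical network: each co-authorship is listed in both directions.
coauthors = {
    "reviewer": ["A"],
    "A": ["reviewer", "B"],
    "B": ["A", "author"],
    "author": ["B", "C"],
    "C": ["author"],
}

d_author = collaboration_distance(coauthors, "reviewer", "author")
d_c = collaboration_distance(coauthors, "reviewer", "C")
print(d_author, is_close(d_author))  # three steps: close
print(d_c, is_close(d_c))            # four steps: not close
```

Breadth-first search is used here because every co-authorship link counts as exactly one step, so the first time the author is reached is guaranteed to be along a shortest path.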

This finding appears to be based on the fact that peer review can, and should, be constructive. Authors and peer reviewers are in fact collaborating to improve the quality of a manuscript. The process is orchestrated by an editor who can and should join in to improve the manuscript. Dondio et al. (2019) make the point that this interaction is inherently social, and that peer review therefore has a function that develops relationships within and between networks of researchers.

This evidence that peer review is a collaborative system towards the betterment of science is, to me, a sign that peer review is acting as it should. However, with any social network come the fragilities of human bias. This means that while peer review may function well for some, for others it may more often than not fail. The bigger problem is that it might depend on your sex, the colour of your skin, the name of your institution, or your country as to whether you are selected as a potential reviewer (i.e. to join the club; see Table 29.1), or, having submitted your manuscript, whether you find that peer review is going to work for you (see Davies et al., 2021). In addition, if you are never asked to review then you will never benefit from this network.

Casnici et al. (2017) tracked the fate of rejected manuscripts and showed that if the manuscripts had been through several rounds of peer review before rejection, they benefited later by being accepted to journals with a higher Impact Factor, and/or obtaining greater numbers of citations, even if the reviewers were instrumental in rejecting the manuscript. This suggests that in working collaboratively on a manuscript, reviewers are more likely to promote, cite and help authors. The alternative is that reviewers agree to re-review an article because they see merit in it, even if they also see flaws. And having spent considerable effort reading the manuscript, they are more likely to remember and cite it. But this doesn’t take away from the idea that reviewers and authors are collaborating in a social way.

29.7 Fixing peer review

Fixing peer review will rest with the community of biological scientists, at the level of the gatekeepers: editors and the scholarly societies that they represent. To me, it is clear that we won’t fix peer review by asking our peers to be less biased, or by asking them to be more rational. We should know by now that we can’t fix people in this way. For example, Khoo (2018) found that there was little improvement in reviews after reviewer training courses, even when these included feedback on previous reviews submitted. Instead, we have to plot a course for peer review whereby we accept that reviews will contain bias and irrational content, and train those in editorial positions to try to spot these, instead of falling victim to them.

There are lots of ways by which editorial oversight can be improved. My intuition is that the crux is to find a way that makes it more efficient and objective for the editors. For example, to try to pin down reviewers on where they find fault and exactly what that fault is. There is a difference between:

  • insufficient information to decide whether the experimental design was faulty and needing this clarified before a decision can be made
  • finding a fundamental error in the experimental design such that the manuscript can be rejected
  • insufficient power (in replicates or sampling) to reach the conclusion generated in the manuscript

A manuscript having each of these outcomes should have different fates: The first is Reject & Resubmit, the second is Reject, while the third may warrant either major/minor revisions (depending what else is problematic), or movement to another journal.
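The mapping from the type of fault to the fate of the manuscript can be written down explicitly, which is one way an editorial system might pin reviewers down. The sketch below is illustrative only: the class `Fault`, the table `SUGGESTED_FATE` and the function `suggested_fate` are hypothetical names, and the three categories and their outcomes are simply those given in the text above.

```python
from enum import Enum


class Fault(Enum):
    """The three kinds of fault distinguished in the text."""
    UNCLEAR_DESIGN = "insufficient information to judge the experimental design"
    FATAL_DESIGN_ERROR = "fundamental error in the experimental design"
    INSUFFICIENT_POWER = "insufficient power to support the conclusion"


# Suggested fate for each kind of fault, as proposed in the text.
SUGGESTED_FATE = {
    Fault.UNCLEAR_DESIGN: "Reject & Resubmit",
    Fault.FATAL_DESIGN_ERROR: "Reject",
    Fault.INSUFFICIENT_POWER: "Major/minor revisions, or move to another journal",
}


def suggested_fate(fault):
    """Look up the suggested editorial outcome for a reported fault."""
    return SUGGESTED_FATE[fault]


for fault in Fault:
    print(f"{fault.value} -> {suggested_fate(fault)}")
```

Asking reviewers to choose one of a small number of named categories, rather than describing the fault in free text, would give the editor a comparable starting point across reviews while leaving the final decision with the editor.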

However, because peer reviewers represent a minority, their prejudices and biases will at times align. A more inclusive world might mean that they diverge, prompting more differences in opinion about what should happen to manuscripts. Given that it is already quite difficult to find enough reviewers, simply asking more reviewers won’t fix this. Instead, we need ways in which editors can more easily come to decisions on manuscripts, taking into consideration the potential faults. This really entails journals being more transparent about what flaws in manuscripts will be considered fatal. For journals where methodological competency is all that’s required (see here for commitment to publish), this is simple, but for many more journals (particularly those that are important for Early Career Researchers because advancement in their careers will depend on publishing there), these will be more ill-defined, editor-centric choices that are more about fashion in Biological Sciences than good science per se. Removing the systematic biases is important, and something that is well worth fixing.

Hence, fixing peer review comes back to fixing problems associated with the publishing culture (and all that that entails), rather than any simple fix-all for the myriad of existing publishing options. Preprints combined with Overlay journals offer one solution that keeps reviews with the original submission. This stops the practice of authors resubmitting their manuscripts over and over to countless journals until a pair of reviewers fail to spot a fundamental flaw. Keeping these manuscripts with their reviews on bioRxiv (or any other preprint server) represents one solution, but it brings another problem: good articles with biased reviews still need to overcome those reviews. I feel that this is more likely to come right when authors resubmit to another Overlay journal with a valid rebuttal.

29.7.1 Peer review is here to stay

Lastly, but perhaps most importantly, we have to be more realistic about the limitations of peer review. We would be better to think of peer review as a ‘silver standard’ - something that some scientists agree has merit. Our problems come because our expectations of peer review are too great - a ‘gold standard’ it is not. An invaluable filter that improves the quality of manuscripts through a spirit of professional camaraderie - it is (for the most part). We can keep the ‘gold standard’ for those papers that have withstood the test of time, the repeatability of the community, and acceptance into the mainstream.