This book already covers a lot about peer review. I have placed this chapter last not because it is the least significant (potentially it is the most significant), but because I think that it is important that you appreciate exactly what peer review is, and experience it from both sides, before you begin to consider the problems with the peer review system.
At the heart of the problems with peer review is that individual humans are themselves biased. Because peer review relies on a small number of individuals providing their assessment of a manuscript, it is quite likely that these biases might align, and that the manuscript is rejected along those lines, rather than being considered along purely objective lines. This likelihood of aligned prejudices comes about because the pool of people that conduct peer review in biological sciences, and in many other disciplines, is mostly white, western (i.e. Europe and North America) and male. These people hold a very similar cultural set of biases.
Some people have argued that peer review is untested and that the effects are uncertain (Jefferson et al., 2002). Perhaps more worryingly, studies designed to test peer review (by deliberately sending out manuscripts with errors) have shown that most reviewers are unable to find all errors and some find none (Rothwell & Martyn, 2000).
For example, if peer review were effective, then reviews of grant applications should closely align with the productivity of grants given. Fang et al. (2016) found that percentile scores awarded by peer review of NIH grant applications were poor at predicting the productivity of >100 000 grants awarded.
Essentially, the major problem with peer review is that it is conducted by humans, and that like humans in societies everywhere, reviewers tend to have their own set of biases. The above sections should have given you some idea about the frailties of the peer review system.
I think that the reason why we find review comments so harsh is usually because we put so much effort into the writing process that it feels very personal whenever we receive criticism. Indeed, I think that there might be a correlation between how much effort you put in and how harsh the reviewers’ comments seem. Another study suggests that authors consider the competence of their reviewers to be closely aligned to the editorial decision (Drvenica et al., 2019). Just be aware that this is normal. Remember that the reviewers are humans, and they have sat down and given freely of their own time to read your work. The most important thing to be aware of is that all they had was what you had written. No background information, and possibly no information about the species or the system involved. They will be experts at some level, but perhaps not the type you might expect. Importantly, the editor asked them because they thought that their opinion would be of importance in helping them make their decision on your paper. This means that you also need to respect their opinion and comments, even if you don’t agree with them or find them to be offensive, arrogant or even rude. Remember also that some apparent rudeness may just be a reviewer who has a sense of humour that you don’t understand. There are lots of examples of this at ShitMyReviewersSay. So no matter what you think of each comment, you should respond to it in a professional and courteous manner that shows that you are a professional scientist.
Why do scientists make disparaging or unprofessional remarks to their colleagues in peer review? Whenever two or three scientists get together, you hear tales of recent woes associated with peer review. The retelling of such stories is all part of the collective, cathartic unburdening of what can be a traumatic experience especially when we put so much effort into each piece of work (see Hyland & Jiang, 2020). Reading through a lot of these reviewers’ comments, I can see that there is an attempt at humour. This humour is not appreciated by those who receive the reviews. Perhaps I understand the humour, because I also come from that same culture that dominates STEM, but that is not understood or even recognised as humour by others. Writing humorous reviews is unprofessional, especially if it is used to accentuate negative aspects. Needless to say, we could all do without unprofessional reviews.
I can’t pretend to know the answer for all of the cases, but I can speak from personal experience. Time is at a premium, and time spent reading and reviewing manuscripts tends to be quality time - best when it is quiet and uninterrupted. If these manuscripts are not of a quality that will pass peer review (i.e. will be rejected), then this feels like an abuse of professional time - especially when editors should have spotted the same mistake in their first reading. Editors that fail to see manuscripts that should be rejected do the reviewers a disservice by increasing the amount of work for everyone (more people and more time are involved). Resentment and frustration may follow on the part of reviewers, and manifest themselves in the form of ad hominem attacks.
Another source of abuse in peer review appears to be the recycling of abuse received. Just as those who are bullied at work are more likely to perpetrate bullying on others, there is also an abuse cycle in peer review from those who have received abusive comments in the past. Perhaps because criticism of our writing feels so personal, continually receiving abusive comments can result in the abused author becoming an abusive reviewer. When peer review is anonymous, it provides abusers with a platform from which to give back some of the pain that they have received in the past. Victim-offender abuse cycles are a human trait that we should all be aware of in our professional, as well as our personal, lives.
One of the shocking results of a very large study of peer review of PLOS ONE articles is the large number of comments that are written directly attacking the authors as a group or personally (i.e. ad hominem attacks, see Eve et al., 2021). This should not happen. Reviewers should confine their objective comments to the work and its presentation. However, this is an aspect of peer review where authors (especially the corresponding and leading authors) will need to acquire a thick skin, because unprofessional comments are made to people across gender and racial groupings, but especially toward traditionally underrepresented groups (Silbiger & Stubler, 2019). Sadly, these same groups feel that such comments disproportionately impact their productivity and career advancement (Silbiger & Stubler, 2019). Reading comments that are sent to other authors can be cathartic as these allow you to see that everyone receives such negative comments. ShitMyReviewersSay is a good source of these, or see Eve et al. (2021), or Silbiger and Stubler (2019). When ad hominem attacks are made, it would be good if editors openly and explicitly identified these as bad reviewer behaviour. It would certainly improve the understanding of authors if editors intervened when such ad hominem attacks are made. This would not necessarily involve deleting these comments, but directing authors to ignore them.
Many academics are quick to point out the problems of exaggerated negative reviews. Clearly, these have a toll on those who receive them who get very upset. But the opposite problem also occurs, and while it might not be at all upsetting (indeed it is often very flattering), uncritical reviews (also known as sweetheart reviews) are also problematic.
In general, sweetheart reviews are very easy to spot for editors. They are overtly flattering often leading with ad hominem praise and little critique of the contents. Like the upsetting reviews, editors are the arbitrators of sweetheart reviews, and can either choose to solicit another review or ignore their contents. Once again, it would be good if editors openly and explicitly identified these reviews as bad reviewer behaviour.
I have received such reviews. It is flattering, but unhelpful. I can’t say that I’d rather have a review that attacks me, but when a reviewer does not provide constructive criticism it does not improve the work in the way that a critical review might. In this way, I’d rather have a more critical review that improves the manuscript.
Sweetheart reviews can be the source of particular problems when editors conspire with authors. There are numerous examples of such practices. In some journals (like PNAS), editors promote articles for submission and then solicit reviews. This can be, and has been, used to promote some work that would not have met with acceptance in other publications (Fainra & Gibbons, 2022). Similarly, authors can conspire with potential and influential reviewers before submitting a manuscript (see Fainra & Gibbons, 2022). Lastly, authors have been known to write their own reviews. This includes authors who suggest reviewers but provide fake email addresses that they register to themselves but that look like those of well known scientists (Brainard & You, 2018).
Attempts to subvert the course of peer review in an overly positive way might be more pernicious in the literature than those that are clearly damning. Phenomena such as sweetheart reviews might explain why increasing numbers of published papers are later retracted.
Although Table 29.1 shows that many kinds of bias have been explicitly demonstrated, that’s certainly not their limit. Given that over 280 biases have already been catalogued (I encourage you to look through the online catalogue), many more different types of bias are likely to exist in peer review. Let’s not forget that our biases have evolved because they are very useful. They exist as a way of shortcutting exhaustive decision making based on random variables. But maybe peer review needs some more of this. And perhaps that means that I should be tolerant when I’m asked to review an economics journal, as these folk clearly weren’t exhibiting any biases associated with economists when they picked me (see section on editors).
|Bias for which there is evidence|Study demonstrating bias|
|---|---|
|Against female authors|Tregenza (2002); Manlove & Belou (2018); Fox & Paine (2019); Budden et al. (2008); Morgan, Hawkins & Lundine (2018); Hagan et al. (2020)|
|Against female reviewers|Helmer et al. (2017); Fox et al. (2019)|
|Towards author reputation, favouring acceptance of manuscripts despite poor reviews|Bravo et al. (2018); Okike et al. (2016)|
|Towards authors from more prestigious institutions, also called prestige bias|Ceci & Peters (1982); Travis & Collins (1991); Garfunkel et al. (1994); Tomkins, Zhang & Heavlin (2017); Manlove & Belou (2018); Lee et al. (2013)|
|Nationality and language bias|Song et al. (2000); Lee et al. (2013); Manlove & Belou (2018); Nuñez & Amano (2021); Link (1998)|
|Confirmation bias (the tendency for journals and reviewers to favour significant results)|Mahoney (1977); Fanelli (2010); Fanelli (2012); see Part I|
|Publication bias (the literature contains a bias in published results)|Jennions & Møller (2002); Munafò, Matheson & Flint (2007); Van Dongen (2011); Franco, Malhotra & Simonovits (2014); Fanelli, Costas & Ioannidis (2017); Sánchez-Tójar et al. (2018); see Part IV|
Perhaps the biggest problem facing those who wish to reform the peer review system is that it all starts with editors who are choosing reviewers. Those editors themselves have their own inherent biases. When they look for reviewers, they are likely to sample from within their own group of peers who have the same biases. Interestingly, bias (in general) is more easily perceived by early career scientists (Zvereva & Kozlov, 2021). My experience is that requests for reviews sent to people that I don’t know and have no connection with (who are outside of my field) are more likely to fail - they will say no, or they won’t reply to the request (see Perry et al., 2012). This is true even for academics that are publishing within the same area.
Editors are the people who select reviewers, and inspection of most editorial boards will reveal that they reflect the same biases found in peer review. That is, editorial boards are mostly made up of white men from Europe and North America. It is no surprise then that ~20% of all reviewers conduct 69 to 94% of reviews (Kovanis et al., 2016). Rectifying this bias will take time and the acknowledgement that there is a problem, together with the willingness to do something about it. In 2020, I saw a big movement to redress the imbalance in science at all levels. I hope that this will continue into the future so that at least some of the biases in peer review will fall away.
One common discussion point among associate editors is the seemingly increasing difficulty of finding reviewers to conduct reviews. In order to get two or three people to agree to conduct a review, it used to be that you would need to write to five or six people. These days I, and many others (Perry et al., 2012), find ourselves inviting 15 plus people and not getting even two people to agree. What is happening?
We know that the number of journals and articles published in Biological Sciences and elsewhere has increased over the years, and that this places an increasing burden on peer reviewers. But surely there are more potential peer reviewers entering the system all the time? Does the demand for peer review really exceed the capacity of researchers capable of conducting it?
Recent analysis of demand for peer review versus the capacity of peer reviewers suggests that demand is always lower than supply (Kovanis et al., 2016). This is assuming that reviewers are pulled from a pool of authors, with different scenarios including those that are pared down to first and last authors only. The real problem, it seems, is that review invitations are not sent out evenly across all potential reviewers. Instead, a small group of reviewers are being asked repeatedly to conduct ever increasing numbers of reviews. The figures are that 20% of potential reviewers are conducting 69 to 94% of all peer reviews, depending on the scenario. With the increasing demand for peer review, it is no wonder then that these ‘super-reviewers’ are fatiguing and getting tired of the ever increasing requests that come into their inboxes.
In the last five years, many editorial management systems have instigated novel systems for suggesting potential reviewers to editors. These algorithms are driven, in part, by these individuals having conducted peer review on the system in the past. If the problem is that ‘super-reviewers’ are burdened with the lion’s share of reviews, then these algorithms are likely to increase their burden, skewing the system even further.
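The feedback loop described here can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not a model of any real editorial management system: the function names (`simulate`, `top_share`), the pool of 200 reviewers and the 2000 invitations are all hypothetical. In one scenario every reviewer is equally likely to be picked; in the other, each completed review adds one extra 'ticket' for that reviewer, so that past reviewing makes future invitations more likely.

```python
import random


def top_share(counts, fraction=0.2):
    """Share of all reviews done by the busiest `fraction` of reviewers."""
    ranked = sorted(counts, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


def simulate(n_reviewers=200, n_invites=2000, use_history=False, seed=1):
    """Return the number of reviews conducted by each reviewer.

    Assumption (hypothetical): every invitation is accepted.
    With use_history=True, each past review adds one more ticket
    for that reviewer, mimicking a history-driven suggestion tool.
    """
    rng = random.Random(seed)
    counts = [0] * n_reviewers
    tickets = list(range(n_reviewers))  # everyone starts with one ticket
    for _ in range(n_invites):
        pick = rng.choice(tickets)
        counts[pick] += 1
        if use_history:
            tickets.append(pick)  # past reviewing raises future chances
    return counts


uniform_share = top_share(simulate(use_history=False))
history_share = top_share(simulate(use_history=True))
print(f"Top 20% share, even invitations:   {uniform_share:.2f}")
print(f"Top 20% share, history-driven:     {history_share:.2f}")
```

In this sketch the busiest fifth of reviewers ends up with a larger share of the work when invitations follow reviewing history. The size of the effect depends entirely on the assumed weighting, so the output should be read as an illustration of the direction of the skew and not as an estimate of the figures reported by Kovanis et al. (2016).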
One problem with some Gold OA publishers (particularly Frontiers In and MDPI) is that I receive so many inappropriate requests for reviews. These come from journals and subject areas where I have no experience (although somewhere I guess that I must have used a keyword that triggered them). In correspondence on this matter, I was assured that part of the exorbitant APC that Frontiers In demands is put into their new AI system that is helping them save time and find new reviewers (a fact confirmed by their website). I was told that it really wasn’t possible that I’d receive review requests from areas that were not my speciality because every request was double checked before being sent out. It was very appropriate then that within 24 hours of the meeting, I received a request from Frontiers In Psychology and another from Frontiers In Public Health. I flagged these with Frontiers In, and have been told (very nicely) to ‘unsubscribe’ (who knew that I was subscribed?). I would contend that the evidence is that the AI bots at Frontiers In are not as good as they think they are, and that there are probably lots more researchers who receive inappropriate and unsolicited review requests from their ‘pest bots’. When you multiply this across the many new and old publishers, it’s really no wonder that potential reviewers are no longer responding to requests.
There are many researchers out there who are willing to conduct peer review and who simply don’t get asked. This means that editors need to send more requests to authors who are not the laboratory PI (i.e. a familiar or known name). Similarly, the ‘super-reviewers’ that are ignoring requests need to learn how to quickly shift their burden onto researchers in less demand. I understand that this effectively puts them into a quasi-editorial role, but spreading the load also takes specialist knowledge of people.
Another revelation from the analysis by Kovanis et al. (2016), is that the geographic distribution of ‘super-reviewers’ is heavily skewed to the USA, just as the editors themselves have a similar geographic bias. Generating a more balanced peer review system is going to need more editors outside the USA (particularly from China), but I would contend that it also needs potential reviewers to be more findable. Perhaps the bots are in their infancy and one day they’ll
If you are an editor and you receive three reviews from three researchers each suggesting something different, I have argued (in another section) that the editor should make their own decision on what action to take. But what if one of the reviewers is very negative and is a leader in their field? Should their review count equally with the others? Should their opinion be given more weight than the others? Of course, they could be using their position to influence their field, to make sure that opinions they hold are reinforced. Lee et al. (2013) provide a good overview of the potential ways in which influential reviewers could bias the peer review system. But the power sits with the editor to make this decision. Interestingly, Thurner and Hanel (2011) use an agent-based model (much as you might use in biological sciences) to show that only a small number of biased (for whatever reason) reviewers are needed to seriously degrade the quality of peer review, and thus the science system as a whole.
The truth is that all reviews are not equal because some reviewers will put in more effort than others. Some will know the literature better. Some will be experts in the field that should be better placed to comment. These people are actually more likely to be less senior, PhD students, post docs or early career researchers. However, the importance for the editor is not to take account of the names of these people, their rank, their institution, or other demographics such as their gender, race or nationality. There are great editors out there who can do this, but my impression is that the majority fail. In this case, the only way to do this is by the triple blind method. Here the editors will invite the reviewers (by name) but the reviews that result will not be marked with the reviewers’ names. This will make forgetting who they are easier, especially for busy editors.
A good editor will look at the reasoning in the reviews and make a decision in an unbiased way. A poor editor may be swayed by the perceived influence of an important reviewer irrespective of their argument. An increasing trend that I’ve noticed is that editors will simply take a decision that follows the consensus of all reviewers: that is, they rate all reviewers equally (see also Rothwell & Martyn, 2000). However, I would argue that this is also bad editing. Irrespective of the bias from reviewers, guarding the integrity of the process of peer review lies with editors.
Today, editors are so busy with the other duties in their jobs as academics that their decisions are hurried, and expecting them to take the time and space to overcome their personal biases might be a lot to ask. Instead, I think that it is time for the Open Evaluation concept to move into the mainstream so that everyone can see how editors came by their decision: that they were not led by potential biases of their reviewers, but were instead swayed by the quality of the review and their own reading of the manuscript. This is especially important for rejected manuscripts, which is why we need the effort of this peer review recorded on preprint sites - such as happens in overlay journals.
Another important problem with peer review comes when editors are not independent of authors. This can happen when an editor is known well by the authors. They could be in the same department or even in the same research group. Similarly, there could be a group of editors for different journals that have some quid pro quo arrangement, that might even be unstated, whereby their manuscripts do not undergo equal scrutiny to other manuscripts that are submitted. One could argue that whenever editors know the names of the authors, there is a conflict of interest that should be declared or the possibility for the system to be corrupted.
Despite all of the problems with peer review that are acknowledged above, we stick with it as the majority system in science. It could be that peer review favours exactly the same people who uphold the system and prevent it from moving into something more transparent, equal, just and fair. These are the editors and reviewers who have, for the most part, managed to make their careers inside the system, and have therefore mastered it to some degree.
To you, dear reader, I can only suggest that you be aware of all the potential pitfalls with peer review, and never stop striving for something better.
Fixing peer review will rest with the community of biological scientists, at the level of the gatekeepers: editors and the scholarly societies that they represent. To me, it is clear that we won’t fix peer review by asking our peers to be less biased, or by asking them to be more rational. We should know by now that we can’t fix people in this way. For example, Khoo (2018) found that there was little improvement in reviews after reviewer training courses, even when these included feedback on previous reviews submitted. Instead, we have to plot a course for peer review whereby we accept that reviews will contain bias and irrational content, and train those in editorial positions to try to spot these, instead of falling victim to them.
There are lots of ways by which editorial oversight can be improved. My intuition is that the crux is to find a way that makes it more efficient and objective for the editors. For example, to try to pin down reviewers on where they find fault and exactly what that fault is. There is a difference between:
- insufficient information to decide whether the experimental design was faulty and needing this clarified before a decision can be made
- finding a fundamental error in the experimental design such that the manuscript can be rejected
- insufficient power (in replicates or sampling) to reach the conclusion generated in the manuscript
A manuscript having each of these outcomes should have different fates: The first is Reject & Resubmit, the second is Reject, while the third may warrant either major/minor revisions (depending what else is problematic), or movement to another journal.
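The distinction drawn above could be built into a structured review form, so that reviewers must say which kind of fault they have found. The following is a minimal sketch of that idea only; the names `Fault`, `SUGGESTED_FATE` and `suggested_fate` are hypothetical and do not correspond to any existing editorial system. The mapping simply restates the three outcomes given in the text.

```python
from enum import Enum


class Fault(Enum):
    """The three kinds of fault distinguished in the text."""
    UNCLEAR_DESIGN = "insufficient information to judge the experimental design"
    FATAL_DESIGN = "fundamental error in the experimental design"
    LOW_POWER = "insufficient power (replicates or sampling) for the conclusion"


# Suggested fate for each fault, as described in the text.
SUGGESTED_FATE = {
    Fault.UNCLEAR_DESIGN: "Reject & Resubmit",
    Fault.FATAL_DESIGN: "Reject",
    Fault.LOW_POWER: "Major/minor revisions, or movement to another journal",
}


def suggested_fate(fault):
    """Look up the suggested editorial outcome for a reported fault."""
    return SUGGESTED_FATE[fault]


for fault in Fault:
    print(f"{fault.value}: {suggested_fate(fault)}")
```

The point of such a form is not to automate the decision, which remains with the editor, but to make each reviewer state explicitly which category their criticism falls into.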
However, because peer reviewers represent a minority, at times their prejudices and biases will align. A more inclusive world might mean that they diverge, prompting more differences in opinion about what should happen to manuscripts. Given that it is already quite difficult to find enough reviewers, simply asking more reviewers won’t fix this. Instead, we need ways in which editors can more easily come to decisions on manuscripts, taking into consideration the potential faults. This really entails journals being more transparent about what flaws in manuscripts will be considered fatal. For journals where methodological competency is all that’s required (see here for commitment to publish), this is simple, but for many more journals (particularly those that are important for Early Career Researchers because advancement in their careers will depend on publishing there), these will be more ill defined, editor-centric choices that are more about fashion in Biological Sciences than good science per se. Removing the systematic biases is important, and something that is well worth fixing.
Hence, fixing peer review comes back to fixing problems associated with the publishing culture (and all that that entails), rather than any simple fix-all for the myriad of existing publishing options. Preprints combined with Overlay journals offer one solution that keeps reviews with the original submission. This stops the practice of authors resubmitting their manuscripts over and over to countless journals until a pair of reviewers fails to spot a fundamental flaw. Keeping these manuscripts with their reviews on bioRxiv (or any other preprint server) represents one solution, but brings with it another problem: good articles with biased reviews still need to overcome them. I feel that this is more likely to come right when authors resubmit to another Overlay journal with a valid rebuttal.
Lastly, but perhaps most importantly, we have to be more realistic about the limitations of peer review. We would be better to think of peer review as a ‘silver standard’ - something that some scientists agree has merit. Our problems come because our expectations of peer review are too great - a ‘gold standard’ it is not. An invaluable filter that improves the quality of manuscripts through a spirit of professional camaraderie - it is (for the most part). We can keep the ‘gold standard’ for those papers that have withstood the test of time, the repeatability of the community, and acceptance into the mainstream.