I was asked to review my own papers three times during 2018. Or, more precisely, I was asked to review papers by other people that contained the same content as some of my most well-cited papers. The review requests didn’t come from IEEE journals but from less reputable journals. However, the papers were still written in such a way that they would likely pass through the automatic plagiarism detection systems that IEEE, EDAS, and others are using. How is that possible? Here is an example of what it could look like.
[Figures: an excerpt from the original paper, followed by the corresponding excerpt from the plagiarized version.]
As you can see, the authors use the same equations and images, but the sentences are slightly paraphrased and the inline math is messed up. The meanings of the sentences are the same, but the different wording might be enough to pass through a plagiarism detection system that compares the words in different documents without being able to understand the context. (I have better examples of this than the one shown above, but I didn’t want to reveal myself as a reviewer of those papers.)
This approach to plagiarism is known as rogeting and basically means that you replace words in the original text with synonyms from a thesaurus with the purpose of fooling plagiarism detection systems. There are already online tools that can do this, often resulting in unnatural sentence structures, but advances in deep learning and natural language processing will probably refine these tools in the near future.
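To make the weakness concrete, here is a minimal sketch of how a naive word-overlap check can be fooled by synonym substitution. The two example sentences, the word-shingle representation, and the Jaccard measure are my own illustrative choices, not how any particular detection system actually works:

```python
# Minimal illustration of why word-overlap plagiarism checks are fragile.
# The texts, the shingle size, and the Jaccard measure are illustrative
# assumptions; real detection systems are more elaborate, but the
# underlying principle of comparing word sequences is similar.

def shingles(text, n=3):
    """Return the set of n-word sequences (shingles) in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

original = ("the proposed scheme achieves a substantial gain "
            "in spectral efficiency over the baseline")

# A rogeted version: same meaning, words swapped for near-synonyms.
rogeted = ("the suggested method attains a considerable improvement "
           "in spectral efficiency compared to the reference")

print(f"verbatim copy:   {jaccard(shingles(original), shingles(original)):.2f}")
print(f"rogeted version: {jaccard(shingles(original), shingles(rogeted)):.2f}")
# Output: 1.00 for the verbatim copy, but only about 0.05 for the
# rogeted version, since almost no 3-word sequences survive intact.
```

A detector that thresholds on such a score would catch a verbatim copy (similarity 1.00) but miss the rogeted version (about 0.05), even though any human reader immediately recognizes the two sentences as the same claim.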
Is this an increasing problem?
This is hard to tell, but there are definitely indications in that direction. One reason might be that digital technology has made it easier to plagiarize. If you want to plagiarize a scientific paper, you don’t need to retype every word by hand. You can simply download the LaTeX source of the paper from arXiv.org (everything that an author uploads can be downloaded by others), change the author names, and then hide your misconduct by rogeting.
On the other hand, plagiarism detection systems are also becoming more sophisticated over time. My point is that we should never treat these systems as reliable, because people will always find ways to fool them. The three plagiarized papers that I detected in 2018 were all submitted to less reputable journals, but those journals apparently had a functioning peer-review system where researchers like me could spot the similarities despite the rogeting. Unfortunately, there are plenty of predatory journals and conferences that might not have any peer review whatsoever and will publish anything if you just pay them to do so.
Does anyone benefit from plagiarism?
I am certainly annoyed by the fact that some people are dishonest enough to steal other people’s research and pretend that it is their own. At the same time, I wonder whether anyone really benefits from doing that. The predatory journals make money from it, but what is in it for the authors? Whenever I review the CV of someone who applies for a position in my group, I have a close look at their list of publications. If it only contains papers published in unknown journals and conferences, I treat it as if the person has no real publications. I might even regard having such publications in the CV as more negative than having no publications at all! I suppose that many other professors do the same thing, and I truly hope that recruiters at companies also have the skills to evaluate publication lists. Having published in a predatory journal must be viewed as a big red flag!