GLTR
GLTR (Giant Language model Test Room) is an AI tool developed by the MIT-IBM Watson AI Lab and HarvardNLP that applies forensic analysis to detect automatically generated text. It identifies artificially generated passages by analyzing how likely a language model would be to produce each word of the text.
To do this, GLTR runs the text through OpenAI's GPT-2 117M language model and ranks each word by the probability the model assigns to it at that position. The ranking is shown through color highlighting: words among the model's top 10 predictions appear in green, those in the top 100 in yellow, those in the top 1,000 in red, and all remaining words in purple.
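The per-word ranking can be sketched as follows. This is a minimal, self-contained illustration using a hypothetical toy next-word distribution rather than the real GPT-2 117M model; the function name `rank_and_color` and the example probabilities are assumptions for demonstration only.

```python
# Simplified sketch of GLTR-style per-word ranking.
# `toy_probs` is a hypothetical next-word distribution, not GPT-2 output.

def rank_and_color(probs, observed_word):
    """Rank the observed word by model probability and map the
    rank to a GLTR-style color bucket."""
    # Sort vocabulary words from most to least likely under the model.
    ranked = sorted(probs, key=probs.get, reverse=True)
    rank = ranked.index(observed_word) + 1  # 1-based rank
    if rank <= 10:
        return rank, "green"    # among the model's top predictions
    if rank <= 100:
        return rank, "yellow"
    if rank <= 1000:
        return rank, "red"
    return rank, "purple"       # very unlikely under the model

# Toy distribution for the next word after "the cat sat on the ...":
toy_probs = {"mat": 0.40, "floor": 0.25, "sofa": 0.15, "roof": 0.10, "moon": 0.10}
print(rank_and_color(toy_probs, "mat"))  # (1, 'green') -- the model's top guess
```

In the real tool this ranking is computed for every word in the input and rendered as the colored overlay described above.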
This visual indication makes computer-generated text easy to spot: generated text tends to be dominated by green and yellow words, while human writing mixes in words the model considers unlikely. Additionally, GLTR provides three histograms that aggregate information over the entire text. The first displays the number of words in each color category, the second illustrates the ratio between the probability of the top predicted word and that of the word actually used, and the third shows the distribution of the predictions' entropies. These histograms provide further evidence for deciding whether a text has been artificially generated.
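The three aggregate statistics can be sketched roughly as below. This is a hedged, simplified illustration: the record format, the function names `bucket`, `entropy`, and `gltr_stats`, and the sample numbers are all assumptions, not GLTR's actual data structures.

```python
import math
from collections import Counter

def bucket(rank):
    """Map a word's rank to a GLTR-style color bucket."""
    if rank <= 10:
        return "green"
    if rank <= 100:
        return "yellow"
    if rank <= 1000:
        return "red"
    return "purple"

def entropy(dist):
    """Shannon entropy (in nats) of a next-word probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def gltr_stats(records):
    """records: list of (rank, p_actual, p_top, distribution) per word."""
    buckets = Counter(bucket(r) for r, _, _, _ in records)       # histogram 1
    fracs = [p_act / p_top for _, p_act, p_top, _ in records]    # histogram 2
    ents = [entropy(d) for _, _, _, d in records]                # histogram 3
    return buckets, fracs, ents

# Two hypothetical word positions in a text:
records = [
    (1, 0.40, 0.40, [0.40, 0.25, 0.15, 0.10, 0.10]),   # top prediction was used
    (42, 0.01, 0.30, [0.30, 0.20, 0.20, 0.15, 0.15]),  # an unlikely word was used
]
buckets, fracs, ents = gltr_stats(records)
print(buckets)  # one green word, one yellow word
```

A ratio near 1 in the second histogram means the text consistently uses the model's own top guesses, which is further evidence of machine generation.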
GLTR is particularly useful for detecting fake reviews, comments, or news articles generated by large language models, whose output can be nearly indistinguishable from human-written text to non-expert readers. The tool is accessible through a live demo, and its source code is available on GitHub. Researchers can also refer to the ACL 2019 demo-track paper, which was nominated for best demo.