Generative Artificial Intelligence: Models, Benefits, Dangers and Detection of AI-Generated Text on Specialized Domains
Ιωάννης Ν. Μήτρου
Friday 22/3/2024, 12:00
https://uoa.webex.com/uoa/j.php?MTID=m3b237012ce5fcb1d257eafc606ff1f14
Abstract
Artificial Intelligence, and more specifically Machine Learning, is currently undergoing rapid and unprecedented development. At the center of Machine Learning, the fastest-growing field of science and one that dominates public discourse with almost innumerable applications, lies Generative Artificial Intelligence. From art and text generation to speech synthesis, Generative AI has become extremely popular in a very short time.
The thesis first examines Generative Artificial Intelligence and its applications. After defining Generative AI, it classifies the technology into its most prominent categories based on input and output type, and evaluates the models most commonly used to implement each category. Emphasis is also placed on the risks and dangers that this emerging technology entails.
Next, and as the main focus of the thesis, a model that distinguishes real (human-written) essays from AI-generated ones is designed and evaluated. First, a comprehensive review of the state of the art in AI-text detection is conducted. While popular AI detectors demonstrate satisfactory results on text produced with ChatGPT-3.5, inconsistencies arise when ChatGPT-4 is used or when the text is formal.
To substantially increase accuracy and make pattern detection easier, a customized model can be built for a specific dataset. To validate this hypothesis, we use a specialized dataset from a Kaggle competition. The model uses Byte Pair Encoding for tokenization, TF-IDF for vectorization, and an ensemble of sub-classifiers for classification. After evaluating the results and performance of the model, a scenario in which no real essays are available is examined. In that scenario the task becomes an anomaly detection problem rather than binary classification, and a one-class SVM model is trained, which outperforms generic AI-text detectors, particularly within the confines of a highly specific dataset.
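To illustrate the feature-extraction stage mentioned above (Byte Pair Encoding followed by TF-IDF), here is a minimal pure-Python sketch. It is not the implementation used in the thesis: the function names (`learn_bpe`, `bpe_tokenize`, `tfidf_vectors`), the toy essays, the number of merges, and the smoothed IDF formula are all assumptions chosen for illustration, and the classifier stage is left out.

```python
import math
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of strings."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter()
    for text in corpus:
        for word in text.lower().split():
            vocab[tuple(word) + ("</w>",)] += 1
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def bpe_tokenize(text, merges):
    """Tokenize `text` by applying the learned merges in order."""
    tokens = []
    for word in text.lower().split():
        symbols = tuple(word) + ("</w>",)
        for pair in merges:
            symbols = merge_pair(symbols, pair)
        tokens.extend(symbols)
    return tokens


def tfidf_vectors(token_lists):
    """Return L2-normalized sparse TF-IDF vectors (as dicts) and the IDF table."""
    n_docs = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    # Assumed smoothed IDF variant: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vec = {t: c * idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors, idf


# Hypothetical toy essays, used only to exercise the functions.
essays = [
    "the model detects generated essays",
    "students wrote the essays by hand",
]
merges = learn_bpe(essays, num_merges=20)
vectors, idf = tfidf_vectors([bpe_tokenize(e, merges) for e in essays])
```

In a full pipeline, vectors of this kind would be passed to the ensemble classifier when both real and AI-generated essays are available, or to a one-class model when only one class is present.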
Examiners: Μ. Κουμπαράκης, Σ. Χατζηευθυμιάδης, Π. Σταματόπουλος