Title Teksto autorystės modeliavimas ir identifikavimas
Translation of Title Text authorship modeling and identification.
Authors Tinteris, Daumantas
Pages 52
Keywords [eng] Authorship identification ; artificial intelligence methods ; n-grams ; analytical research review ; Lithuanian texts.
Abstract [eng] This master's thesis addresses the identification of text authorship. The deep learning networks chosen for authorship identification of English language texts are the Multilayer Perceptron (MLP), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) and autoencoders. The study compares the selected methods with other machine learning methods: the support vector machine (SVM), the k-nearest neighbours algorithm (kNN) and the Bayesian probabilistic classifier (Bayes). The data used are Lithuanian language texts: 147 parliamentary speeches with a total number of more than 110,000. The n-gram model was chosen for the metrics. The highest accuracy obtained in the study was 74%. Conclusions and recommendations are presented based on the results. The thesis consists of an introduction, text authorship identification, an analysis of artificial intelligence methods for text authorship identification, results of the experimental study, conclusions, recommendations and a reference list. The thesis comprises 46 pages of text excluding appendices, 12 figures, 5 tables and 37 bibliographical entries. Appendices are included separately.
Dissertation Institution Vilniaus Gedimino technikos universitetas.
Type Master's thesis
Language Lithuanian
Publication date 2022