Title Investigation of methods for tracker and ad detection on web pages
Translation of Title Reklamos ir sekiklių internetiniuose puslapiuose aptikimo metodų tyrimas.
Authors Šliachina, Margarita
Pages 52
Keywords [eng] web tracking ; machine learning ; cybersecurity ; AdGraph ; AdblockPlus
Abstract [eng] This thesis explores techniques and tools for detecting trackers and advertisements on the web. The research tasks were to conduct a comprehensive literature review, investigate data collection techniques, review model evaluation methods, train machine learning models, evaluate their effectiveness, investigate the impact of different features, and perform a rolling forecast test. The literature review identified three approaches: URL analysis, combined URL and HTTP data analysis, and AdGraph. Data were collected using proxies, crawlers, and Adblock Plus lists, and graphs were constructed from HTML pages for AdGraph. Models were evaluated with traditional tools such as the confusion matrix, learning curves, and ROC curves, supplemented by a rolling forecast test to assess long-term reliability. The model trained on the optimal feature set outperformed the other models in accuracy and the other metrics, which underscores the importance of identifying and using the most relevant features for effective tracker and ad detection. The document is structured as follows: introduction, review of online trackers and advertisements, review of existing solutions, methods and materials, experiment and results, model improvements, and conclusions. It comprises 50 pages of text, 22 figures, 7 tables, and 16 sources.
Dissertation Institution Vilniaus Gedimino technikos universitetas.
Type Master thesis
Language English
Publication date 2023