DEFENSE committee: YURI THOMAS PINHEIRO NUNES

A DOCTORAL DEFENSE committee has been registered by the program.
STUDENT: YURI THOMAS PINHEIRO NUNES
DATE: 16/04/2024
TIME: 15:00
LOCATION: Google Meet virtual room: meet.google.com/gjx-jxbd-atk
TITLE:

Concept Drift Detection Heuristic Based on TEDA


KEY WORDS:

Classification on Data Stream, Concept Drift Detector, Data Stream, Unsupervised Learning, TEDA


PAGES: 50
BIG AREA: Exact and Earth Sciences
AREA: Computer Science
SUBAREA: Mathematics of Computing
SPECIALTY: Analytical and Simulation Models
SUMMARY:

The enormous number of machine learning applications and the volume of data produced present several challenges today. In many contexts, data has temporal structure, such as seasonality and trends, resulting in non-stationary behavior. This characteristic, present in many systems, makes it difficult to apply machine learning models, which generally assume that the data is stationary. In this scenario, data sources can be treated as data streams: ordered, unbounded sources of non-stationary data. Because they violate the stationarity assumption, these sources are unreliable inputs for machine learning applications. When a data stream changes significantly enough to degrade model performance, a concept drift is said to have occurred. A data stream that exhibits concept drift is considered to represent an evolving system, that is, a system whose internal concepts change over time, for example through the emergence of new concepts, the extinction of concepts, and the division or fusion of concepts. Machine learning techniques must therefore be adapted to data streams. One example is a data stream classifier, a classifier for data stream samples. This type of model needs to handle real-time retraining, robustness to non-stationarity, data unavailability, and limited data sets, among other requirements. To meet these requirements, it is essential to use concept drift detectors (CDDs). CDDs are models capable of identifying when one or more concepts in the data stream have changed significantly. The literature is rich in works on concept drift detection, which fall into three groups: supervised, semi-supervised, and unsupervised. Supervised methods have access to the true classes of data stream samples at the time of detection, while semi-supervised methods have limited access. 
Semi-supervised methods may have access to the true classes during training, during offline steps, or only for a subset of samples at the time of detection. Unsupervised methods do not access the true classes of the samples and are therefore theoretically more limited than the other approaches. However, unsupervised methods allow for shorter detection delays in real applications, where the true class is often unavailable at the time of detection. Examples of unsupervised methods are ADWIN, KSWIN, and PageHinkley. This work presents a new concept drift detection method, TEDA-CDD. The detector is composed of two TEDA-based models that represent concepts: the reference model and the evolutionary model. The reference model represents the concept known to the machine learning model, while the evolutionary model is free to adapt to any new concept that emerges from the data stream. The two models are compared heuristically using the Jaccard index as a similarity measure. When the index indicates low similarity between the models, the detector signals a concept drift. To compare the proposed method with other methods in the literature, a realistic approach for data stream classifiers is first proposed. This approach makes it possible to apply several classifiers and detectors to the data stream classification task and to estimate performance metrics specific to the data stream context. In the experiments, the proposed method is compared with other methods from the literature on synthetic and real benchmarks. The proposed method achieves accuracy comparable to that of methods consolidated in the literature, while being the most efficient in terms of memory consumption.
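
The two-model comparison described in the summary can be illustrated with a minimal sketch. This is not the TEDA-CDD algorithm: the summary does not specify how the TEDA models are built or how the Jaccard index is computed over them, so the sketch makes its own assumptions. Each concept is stood in for by a one-dimensional interval (mean plus or minus `m` standard deviations), the Jaccard index is taken over those intervals, and the names `region`, `jaccard`, `TwoModelDriftDetector`, `warmup`, `window`, and `threshold` are all hypothetical.

```python
from collections import deque
from statistics import fmean, pstdev


def region(samples, m=3.0):
    """Assumed stand-in for a concept model: [mean - m*std, mean + m*std]."""
    mu, sd = fmean(samples), pstdev(samples)
    return (mu - m * sd, mu + m * sd)


def jaccard(a, b):
    """Jaccard index of two closed intervals: |intersection| / |union|."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    if union == 0.0:
        # Both intervals are single points: identical or disjoint.
        return 1.0 if a == b else 0.0
    return inter / union


class TwoModelDriftDetector:
    """Hypothetical detector: frozen reference samples vs. a sliding window."""

    def __init__(self, warmup=50, window=50, threshold=0.3):
        self.warmup = warmup
        self.threshold = threshold
        self.reference = []                   # frozen once warm-up completes
        self.evolving = deque(maxlen=window)  # always follows recent samples

    def update(self, x):
        """Feed one sample; return True when low similarity signals a drift."""
        self.evolving.append(x)
        if len(self.reference) < self.warmup:
            self.reference.append(x)
            return False
        if len(self.evolving) < self.evolving.maxlen:
            return False
        similarity = jaccard(region(self.reference), region(self.evolving))
        if similarity < self.threshold:
            # Assumed recovery policy: relearn the reference from scratch.
            self.reference = []
            self.evolving.clear()
            return True
        return False
```

Calling `update` once per incoming sample returns `True` only when the evolving window no longer overlaps enough with the reference region. The threshold and window sizes here are arbitrary illustration values, not settings taken from the thesis.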


COMMITTEE MEMBERS:
Chair - 1153006 - LUIZ AFFONSO HENDERSON GUEDES DE OLIVEIRA
Internal - 2885532 - IVANOVITCH MEDEIROS DANTAS DA SILVA
Internal - 1837240 - MARCELO AUGUSTO COSTA FERNANDES
External to the Institution - IGNACIO SANCHEZ GENDRIZ
External to the Institution - JUAN MOISES MAURICIO VILLANUEVA - UFPB
Notice registered on: 23/03/2024 11:21