Qualifying Exam Committee: ELISEU JAYRO DE SOUZA MEDEIROS

A MASTER'S QUALIFYING EXAM committee has been registered by the program.
STUDENT: ELISEU JAYRO DE SOUZA MEDEIROS
DATE: 28/01/2021
TIME: 14:00
LOCAL: Google Meet
TITLE:

USING MUTUAL INFORMATION TO IDENTIFY RELEVANT REGIONS ON 16S rRNA FOR TAXONOMIC CLASSIFICATION


KEY WORDS:

Clustering method; Distance model; Metagenomics; Unclassified sequences


PAGES: 35
MAJOR AREA: Biological Sciences
AREA: General Biology
SUMMARY:

Metagenomics, the analysis of genetic material recovered from environmental samples, has become one of the major subjects of bioinformatics because of its capacity to provide information both for developing new biological concepts and for practical applications in human health, agriculture, animals and the environment. However, one of the biggest challenges in this field is linking sequencing reads to their taxonomic identity. For bacterial samples, classification strategies rely on amplification of the 16S rRNA gene: the resulting sequences are grouped by binning techniques and identified through similarity analysis against reference sequences. This strategy, however, can identify only species whose representative 16S rRNA sequences have been deposited in these databases, which prevents a broader taxonomic characterization, since most bacterial diversity remains unknown. Clustering methods are therefore a good strategy for characterizing the diversity of an environmental sample that contains unknown organisms. The objective of this project was thus to evaluate different distance-matrix metrics, clustering models and regions of the 16S rRNA in order to find the hierarchical clustering that best reflects the taxonomic classification. To this end, one representative 16S rRNA sequence of each bacterial family was selected from the RDP 16S bacterial database, totaling 350 sequences. The full alignment of these sequences was used to compute distance matrices with 17 metrics, which were then clustered with 7 hierarchical clustering models. All resulting hierarchical trees were ranked by the adjusted Mutual Information (MI) score between the expected and the observed clusters.
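The ranking step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's pipeline: it takes an already computed square distance matrix (from any metric), builds a UPGMA or WPGMA tree by agglomerative clustering, cuts the tree at the number of expected taxa (an assumption about how observed clusters are derived from each tree), and scores the cut with adjusted mutual information using the arithmetic-mean normalisation, which is also the default of scikit-learn's `adjusted_mutual_info_score`. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def hierarchical_clusters(dist, k, weighted=False):
    """Agglomerative clustering of a square distance matrix, stopped at k clusters.

    weighted=False gives UPGMA (merged distance averaged by cluster size);
    weighted=True gives WPGMA (plain mean of the two merged clusters' distances).
    Returns one flat cluster label per input sequence.
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > k:
        (a, b), _ = min(d.items(), key=lambda item: item[1])  # closest pair
        size_a, size_b = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        del d[(a, b)]
        for c in clusters:  # distance from every remaining cluster to the merged one
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            if weighted:
                d[(c, next_id)] = (dac + dbc) / 2
            else:
                d[(c, next_id)] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        clusters[next_id] = merged
        next_id += 1
    labels = [0] * n
    for label, members in enumerate(clusters.values()):
        for m in members:
            labels[m] = label
    return labels


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_info(u, v):
    n = len(u)
    a, b = Counter(u), Counter(v)
    return sum((nij / n) * math.log(n * nij / (a[i] * b[j]))
               for (i, j), nij in Counter(zip(u, v)).items())


def _log_factorial(x):
    return math.lgamma(x + 1)


def expected_mutual_info(u, v):
    """Expected MI of two labelings under the hypergeometric (permutation) model."""
    n = len(u)
    lf = _log_factorial
    b_sizes = list(Counter(v).values())
    emi = 0.0
    for ai in Counter(u).values():
        for bj in b_sizes:
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                log_prob = (lf(ai) + lf(bj) + lf(n - ai) + lf(n - bj)
                            - lf(n) - lf(nij) - lf(ai - nij) - lf(bj - nij)
                            - lf(n - ai - bj + nij))
                emi += (nij / n) * math.log(n * nij / (ai * bj)) * math.exp(log_prob)
    return emi


def adjusted_mutual_info(expected, observed):
    """AMI = (MI - E[MI]) / (mean(H(U), H(V)) - E[MI]), arithmetic mean."""
    mi = mutual_info(expected, observed)
    emi = expected_mutual_info(expected, observed)
    denom = (entropy(expected) + entropy(observed)) / 2 - emi
    if abs(denom) < 1e-15:
        return 1.0  # degenerate case: treated as perfect agreement
    return (mi - emi) / denom


if __name__ == "__main__":
    # Hypothetical toy data: four sequences from two families.
    dist = [[0, 1, 8, 9], [1, 0, 9, 8], [8, 9, 0, 2], [9, 8, 2, 0]]
    expected = ["Bacillaceae", "Bacillaceae", "Vibrionaceae", "Vibrionaceae"]
    for name, weighted in [("UPGMA", False), ("WPGMA", True)]:
        observed = hierarchical_clusters(dist, k=len(set(expected)), weighted=weighted)
        print(name, adjusted_mutual_info(expected, observed))
```

Ranking every metric/model combination then amounts to computing this score for each tree and sorting. Cutting at a fixed number of clusters is only one way to compare a tree with a reference classification.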
The analysis was also carried out on subsets of the alignment obtained with sliding windows of 100 and 200 sites and a step of 10 sites. For the full alignment, the hierarchical trees generated with TN93/WPGMA and K81/UPGMA (distance metric/clustering model) achieved the highest MI scores at the phylum (0.649) and class (0.705) levels, respectively. Moreover, the alignment subset encompassing the V4 and C4 regions of the 16S rRNA produced the clusters that agreed best with the taxonomic classification at the class and phylum levels. These results could improve the characterization of bacterial diversity in environmental samples by grouping taxonomically unclassified sequences and suggesting new taxonomic groups for them.
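The window analysis can be sketched in a similar way. This sketch assumes the alignment is a list of equal-length strings, implements the Kimura three-parameter (K81) distance named above as d = -1/4 ln[(1 - 2P - 2Q)(1 - 2P - 2R)(1 - 2Q - 2R)], skips sites where either sequence has a gap or ambiguous base (pairwise deletion, an assumption), keeps only full-length windows, and takes a caller-supplied `score_fn` that turns a window's distance matrix into a score. All names are hypothetical, and TN93 is not shown.

```python
import math

# K81 splits substitutions into transitions and two transversion classes.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}
TRANSVERSIONS_AC_GT = {frozenset("AC"), frozenset("GT")}


def k81_distance(s1, s2):
    """Kimura three-parameter (K81) distance between two aligned sequences."""
    p = q = r = sites = 0
    for x, y in zip(s1.upper().replace("U", "T"), s2.upper().replace("U", "T")):
        if x not in "ACGT" or y not in "ACGT":
            continue  # gap or ambiguous base: skip this site
        sites += 1
        if x == y:
            continue
        pair = frozenset((x, y))
        if pair in TRANSITIONS:
            p += 1  # A<->G or C<->T
        elif pair in TRANSVERSIONS_AC_GT:
            q += 1  # A<->C or G<->T
        else:
            r += 1  # A<->T or C<->G
    if sites == 0:
        return float("nan")  # no comparable sites in this window
    P, Q, R = p / sites, q / sites, r / sites
    factors = (1 - 2 * P - 2 * Q, 1 - 2 * P - 2 * R, 1 - 2 * Q - 2 * R)
    if min(factors) <= 0:
        return float("inf")  # too divergent: K81 distance undefined
    return -0.25 * sum(math.log(f) for f in factors)


def sliding_windows(length, size, step=10):
    """Yield half-open (start, end) column ranges of `size` sites, advancing by `step`."""
    for start in range(0, length - size + 1, step):
        yield start, start + size


def window_distance_matrix(alignment, start, end):
    """Pairwise K81 distances restricted to alignment columns [start, end)."""
    cols = [seq[start:end] for seq in alignment]
    n = len(cols)
    return [[0.0 if i == j else k81_distance(cols[i], cols[j]) for j in range(n)]
            for i in range(n)]


def score_windows(alignment, size, score_fn, step=10):
    """Score every window and return (start, end, score) tuples, best first."""
    results = []
    for start, end in sliding_windows(len(alignment[0]), size, step):
        dist = window_distance_matrix(alignment, start, end)
        results.append((start, end, score_fn(dist)))
    return sorted(results, key=lambda item: item[2], reverse=True)
```

For the setup described, `score_windows(alignment, 100, score_fn)` and `score_windows(alignment, 200, score_fn)` would be run with the default step of 10, where `score_fn` might cluster the window's matrix and compute adjusted MI against the expected taxon labels; the top-ranked windows then point to the most informative 16S rRNA regions.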


COMMITTEE MEMBERS:
Chair - 3063244 - TETSU SAKAMOTO
Internal member - 347628 - ADRIAO DUARTE DORIA NETO
Internal member - 1149647 - LUCYMARA FASSARELLA AGNEZ LIMA
News item posted on: 27/01/2021 10:18
SIGAA | Superintendência de Tecnologia da Informação - (84) 3342 2210 | Copyright © 2006-2024 - UFRN