Defense Committee: THAÍS DE ALMEIDA RATIS RAMOS

A MASTER'S DEFENSE committee has been registered by the program.
STUDENT: THAÍS DE ALMEIDA RATIS RAMOS
DATE: 11/05/2018
TIME: 14:00
VENUE: BioME
TITLE:

Development and application of CORAZON: a normalization and clustering tool for genomic data


KEYWORDS:

Gene expression. Machine learning. Clustering.


PAGES: 163
MAJOR AREA: Biological Sciences
AREA: General Biology
ABSTRACT:

The creation of gene expression encyclopedias makes it possible to identify groups of genes that are co-expressed across different tissues and to understand gene clusters according to their functions and origins. Because large-scale transcriptomics projects generate huge amounts of data, techniques from artificial intelligence have come into wide use in bioinformatics. Unsupervised learning is the machine learning task that analyzes the provided data and tries to determine whether some objects can be grouped together, forming clusters. We developed an online tool called CORAZON (Correlation Analyses Zipper Online), which implements three unsupervised machine learning algorithms (mean shift, k-means and hierarchical clustering) to cluster gene expression datasets, six normalization methods (Fragments Per Kilobase Million (FPKM), Transcripts Per Million (TPM), Counts Per Million (CPM), base-2 log, normalization by the sum of the instance's values, and normalization by the highest attribute value of each instance), and a strategy to observe the influence of attributes, all in a user-friendly environment. The algorithms' performance was evaluated with five models commonly used to validate clustering methods, each composed of fifty randomly generated datasets. The algorithms achieved accuracies between 92% and 100%. Next, we applied our tool to cluster tissues, to obtain evolutionary knowledge about genes and functional insights based on Gene Ontology enrichment, and to connect the clusters with transcription factors. To select the best number of clusters for the k-means and hierarchical algorithms, we used the Bayesian information criterion (BIC), followed by the derivative of the discrete function and the silhouette. For hierarchical clustering, we adopted Ward's method.
In total, we analyzed three databases (Uhlen, ENCODE and FANTOM). Regarding tissues, we observed groups related to glands, cardiac tissues, muscle tissues and the reproductive system, and in all three databases some groups contained a single tissue, such as testis, brain and bone marrow. Regarding gene clusters, we obtained several clusters with specific functions: detection of stimuli involved in sensory perception, reproduction, synaptic signaling, nervous system, immune system, system development, and metabolism. We also observed that in clusters with more than 80% noncoding genes, more than 40% of the coding genes appeared recently, in the mammalian class, and a minority date back to the Eukaryota. In contrast, in clusters with more than 90% coding genes, more than 40% of those genes appeared in the Eukaryota and a minority in the mammalian class. These results illustrate the potential of the methods in the CORAZON tool, which can help analyze large quantities of genomic data, enabling the analysis of potential associations between noncoding RNAs and the biological processes of the coding genes clustered with them, as well as the study of evolutionary history. CORAZON is freely available at http://biodados.icb.ufmg.br/corazon or http://corazon.integrativebioinformatics.me.
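The six normalization methods listed in the abstract above can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not CORAZON's actual implementation: the function names are hypothetical; FPKM, TPM and CPM are computed per sample from raw counts and transcript lengths in base pairs using their standard definitions; the base-2 log adds a pseudocount of 1 (an assumption, to avoid log(0)); and an "instance" is taken to mean one gene's expression vector across attributes (for example, tissues).

```python
import math
from typing import List, Sequence

# Hypothetical helper names for illustration only; CORAZON's real code may differ.
# All per-sample functions assume a nonzero total count (library size).


def cpm(counts: Sequence[float]) -> List[float]:
    """Counts Per Million for one sample: count / total count * 1e6."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def fpkm(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """FPKM for one sample: count * 1e9 / (transcript length in bp * total count)."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """TPM for one sample: length-normalize first, then rescale so values sum to 1e6."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]  # reads per kilobase
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


def log2_transform(values: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """Base-2 log; the pseudocount is an assumption so that zero counts stay finite."""
    return [math.log2(v + pseudocount) for v in values]


def normalize_by_sum(instance: Sequence[float]) -> List[float]:
    """Divide each attribute value by the sum of the instance's values."""
    total = sum(instance)
    return [v / total for v in instance]


def normalize_by_max(instance: Sequence[float]) -> List[float]:
    """Divide each attribute value by the instance's highest attribute value."""
    top = max(instance)
    return [v / top for v in instance]
```

For example, `tpm([10, 20, 70], [1000, 2000, 500])` returns `[62500.0, 62500.0, 875000.0]`: the short third transcript gets a larger share once counts are corrected for length. Unlike FPKM, TPM values always sum to one million within a sample, which makes proportions easier to compare across samples.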


COMMITTEE MEMBERS:
External to the Institution - GUSTAVO HENRIQUE ESTEVES - UEPB
Chair - 059.501.268-07 - JOSÉ MIGUEL ORTEGA - UFMG
Internal - 1507794 - RODRIGO JULIANI SIQUEIRA DALMOLIN
External to the Institution - THAIS GAUDENCIO DO REGO - UFPB
External to the Institution - VINICIUS RAMOS HENRIQUES MARACAJA COUTINHO - UDCHI
News item posted on: 18/04/2018 20:53