Qualifying Examination Committee: THAÍS DE ALMEIDA RATIS RAMOS

A MASTER'S QUALIFYING EXAMINATION committee has been registered by the graduate program.
STUDENT: THAÍS DE ALMEIDA RATIS RAMOS
DATE: 04/04/2018
TIME: 14:00
LOCATION: BioME
TITLE:

Development and application of CORAZON, a normalization and clustering tool for genomic data, to investigate transcripts functionally and evolutionarily


KEYWORDS:

Gene expression. Machine learning. Clustering. CORAZON.


PAGES: 172
MAJOR AREA: Biological Sciences
AREA: General Biology
ABSTRACT:

The creation of gene expression encyclopedias makes it possible to identify groups of genes that are co-expressed across different tissues and to characterize gene clusters according to their function and origin. Because large-scale transcriptomics projects generate huge amounts of data, techniques from artificial intelligence have become widely used in bioinformatics. Unsupervised learning is the machine learning task that analyzes unlabeled data and tries to determine whether objects can be grouped into clusters. We developed an online tool called CORAZON (Correlation Analyses Zipper Online), which implements three unsupervised machine learning algorithms for clustering gene expression datasets (mean shift, k-means and hierarchical clustering), six normalization methods (Fragments Per Kilobase of transcript per Million mapped reads (FPKM), Transcripts Per Million (TPM), Counts Per Million (CPM), base-2 logarithm, normalization by the sum of each instance's values, and normalization by the highest attribute value of each instance), and a strategy for assessing the influence of each attribute, all in a user-friendly environment. The performance of the algorithms was evaluated on five models commonly used to validate clustering methods, each composed of fifty randomly generated datasets; the algorithms achieved accuracies between 92% and 100%. Next, we applied the tool to cluster tissues, to obtain evolutionary and functional insights into genes (the latter based on Gene Ontology enrichment), and to relate the clusters to transcription factors. To select the best number of clusters for the k-means and hierarchical algorithms, we used the Bayesian information criterion (BIC), followed by the derivative of the discrete function and the Silhouette coefficient. For hierarchical clustering, we adopted Ward's method.
In total, we analyzed three databases (Uhlen, ENCODE and FANTOM). Regarding tissues, we observed clusters of glands, cardiac tissues, muscle tissues and tissues of the reproductive system; in all three databases, some clusters contained a single tissue, such as testis, brain and bone marrow. Regarding gene clusters, we obtained several clusters with specific functions: detection of stimulus involved in sensory perception, reproduction, synaptic signaling, nervous system, immune system, system development and metabolism. We also observed that in clusters where more than 80% of the genes are non-coding, more than 40% of the coding genes appeared recently, at the mammalian level, and only a minority trace back to eukaryotes. Conversely, in clusters where more than 90% of the genes are coding, more than 40% of these genes appeared in eukaryotes and only a minority in mammals. These results illustrate the potential of the methods in CORAZON, which can help analyze large quantities of genomic data, enabling the analysis of potential associations between non-coding RNAs and the biological processes of the coding genes clustered with them, as well as the study of their evolutionary history. CORAZON is freely available at http://biodados.icb.ufmg.br/corazon or http://corazon.integrativebioinformatics.me.
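The six normalizations listed in the abstract are standard transformations of an expression matrix. The sketch below is not CORAZON's code; it is a minimal pure-Python illustration that assumes rows are instances (genes) and columns are attributes (samples or tissues), that FPKM, TPM and CPM receive raw read counts plus gene lengths in base pairs, and that the base-2 log uses a pseudocount of 1 to handle zeros. All function names are hypothetical.

```python
import math

# Assumed layout (hypothetical, not CORAZON's internal format):
# matrix[g][s] = value of instance g (a gene) for attribute s (a sample/tissue).


def _column_totals(matrix):
    """Sum of each column (one total per sample)."""
    return [sum(col) for col in zip(*matrix)]


def cpm(counts):
    """Counts per million: scale each sample by its total read count.
    Assumes every sample has at least one read."""
    totals = _column_totals(counts)
    return [[c / t * 1e6 for c, t in zip(row, totals)] for row in counts]


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped reads."""
    totals = _column_totals(counts)
    return [
        [c * 1e9 / (length * t) for c, t in zip(row, totals)]
        for row, length in zip(counts, lengths_bp)
    ]


def tpm(counts, lengths_bp):
    """Transcripts per million: divide by length in kb first, then rescale
    each sample so that its values sum to one million."""
    rates = [[c / (length / 1e3) for c in row] for row, length in zip(counts, lengths_bp)]
    totals = _column_totals(rates)
    return [[r / t * 1e6 for r, t in zip(row, totals)] for row in rates]


def log2_transform(matrix, pseudocount=1.0):
    """Base-2 logarithm; the pseudocount (an assumption) avoids log2(0)."""
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]


def sum_normalize(matrix):
    """Divide each instance (row) by the sum of its own values."""
    out = []
    for row in matrix:
        total = sum(row) or 1.0  # all-zero rows stay all-zero
        out.append([v / total for v in row])
    return out


def max_normalize(matrix):
    """Divide each instance (row) by its highest attribute value."""
    out = []
    for row in matrix:
        peak = max(row) or 1.0  # all-zero rows stay all-zero
        out.append([v / peak for v in row])
    return out
```

With counts `[[10, 20], [30, 60]]` and lengths `[1000, 2000]` bp, each column returned by `tpm` sums to one million, whereas FPKM columns carry no such guarantee in general; this is the usual argument for preferring TPM when comparing samples.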
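The abstract names the Silhouette coefficient as one of the criteria used to choose the number of clusters. As a hedged illustration, not the tool's implementation, the mean silhouette of a partition can be computed from scratch as below, assuming Euclidean distance. For each point, a is its mean distance to the rest of its own cluster, b is its lowest mean distance to any other cluster, and s = (b - a) / max(a, b); a higher mean indicates more compact, better-separated clusters, so candidate partitions (for example, k-means runs with different k) can be compared by this score. How the score is combined with BIC is not described in the abstract, so the sketch covers only the silhouette step; `mean_silhouette` is a hypothetical name.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def mean_silhouette(points, labels):
    """Mean silhouette coefficient of a partition (Euclidean distance assumed).

    For each point i: a = mean distance to the other members of its cluster,
    b = lowest mean distance to the members of any other cluster,
    s(i) = (b - a) / max(a, b). Singleton clusters get s(i) = 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Usage: two well-separated pairs of points, labelled two different ways.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(mean_silhouette(points, [0, 0, 1, 1]))  # about 0.90: good split
print(mean_silhouette(points, [0, 1, 0, 1]))  # about -0.45: poor split
```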


COMMITTEE MEMBERS:
Internal - 1513597 - JOAO PAULO MATOS SANTOS LIMA
Chair - 059.501.268-07 - JOSÉ MIGUEL ORTEGA - UFMG
External to the Institution - THAIS GAUDENCIO DO REGO - UFPB
External to the Program - 052.739.204-93 - VINICIUS RAMOS HENRIQUES MARACAJA COUTINHO - USP
News item posted on: 09/03/2018 09:06
SIGAA | Superintendência de Tecnologia da Informação - (84) 3342 2210 | Copyright © 2006-2024 - UFRN - sigaa11-producao.info.ufrn.br.sigaa11-producao