QUALIFICATION Committee: THAYNA NHAARA OLIVEIRA DAMASCENO

A MASTER'S QUALIFICATION committee has been registered by the program.
STUDENT: THAYNA NHAARA OLIVEIRA DAMASCENO
DATE: 19/06/2018
TIME: 14:00
LOCATION: BioME
TITLE:

All-purpose word pairing tool: Easy interaction networks for clinical data.


KEYWORDS:

Text Mining. Bioinformatics. Biomedical Text Mining. Graphs.


PAGES: 52
MAJOR AREA: Biological Sciences
AREA: General Biology
ABSTRACT:

One of the major problems faced by researchers, in the health sciences and elsewhere, is the growing volume of data on the most diverse subjects. This massive volume of data is called Big Data. Because enormous amounts of biological and biomedical data are generated every day, analyzing these data has become one of the main barriers. Computational tools that support this analysis through techniques such as Text Mining are increasingly being developed and used. Text Mining, a branch of Data Mining, can be defined as a method for extracting relevant information from text. To allow a differentiated analysis of data, clinical or otherwise, a simple algorithm was developed. It analyzes the data without requiring correlation with existing databases or the creation of new ones. Based on this algorithm, a web tool was developed so that anyone can use the algorithm to analyze their own data, even without knowledge of computational techniques. The algorithm was written as an R script in RStudio. The tool was built with JavaScript, HTML5, CSS, and PHP.

The tool, named Integrate Paired Tool (IPT), uses Text Mining techniques to analyze data from a .csv file provided by the user. The algorithm reads the .csv file and traverses it, pairing its terms two by two until all columns have been paired, even when the columns have different sizes or are incomplete. After all groupings are made, identical pairs are added together, so each pair is assigned its frequency. A new .csv file is then generated containing the existing interactions and their respective frequencies. Once the relations and their frequencies are computed, an interaction graph (generated in R) is displayed on the web tool's screen for the user to analyze. The .csv file with all interactions and frequencies is also provided.
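The pairing and counting steps described above can be sketched as follows. This is a minimal Python illustration, not IPT's R implementation. It rests on several assumptions:

- each row of the .csv file is one record (for example, one patient);
- every two non-empty cells of the same row form a pair;
- (a, b) and (b, a) count as the same pair;
- the first row is a header.

The names `pair_terms`, `pairs_from_csv`, and `counts_to_csv` are hypothetical.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def pair_terms(rows):
    """Count how often each pair of terms co-occurs in the same row.

    Assumptions (hypothetical, not taken from IPT itself): every two
    non-empty cells of a row form one pair, and (a, b) equals (b, a).
    """
    counts = Counter()
    for row in rows:
        # Drop empty cells, so incomplete or shorter columns add no pairs.
        terms = [cell.strip() for cell in row if cell.strip()]
        for a, b in combinations(terms, 2):
            if a != b:  # hypothetical choice: ignore a term paired with itself
                counts[tuple(sorted((a, b)))] += 1
    return counts


def pairs_from_csv(text):
    """Read CSV text, skip the header row, and count term pairs."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # assumed header row
    return pair_terms(reader)


def counts_to_csv(counts):
    """Write the pairs and their frequencies as CSV, most frequent first."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["term_a", "term_b", "frequency"])
    for (a, b), freq in counts.most_common():
        writer.writerow([a, b, freq])
    return out.getvalue()


# Hypothetical toy input: rows 2 and 3 each have one missing cell.
sample = """diagnosis,drug,outcome
sepsis,ampicillin,discharge
sepsis,ampicillin,
jaundice,,discharge
"""

pair_counts = pairs_from_csv(sample)
print(counts_to_csv(pair_counts))
```

Under this reading, an empty cell simply contributes no pairs. That is one way to tolerate columns of different sizes or incomplete columns without any external database.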
The contents of the graph and the table vary according to the percentage the user selects in IPT. The .csv file of interactions and frequencies can also be imported into other network visualization tools, such as Gephi. To test the tool, data from a neonatal ICU and a survey of PubMed abstracts were used. The abstracts were retrieved with a script, which is available in this work. IPT performed well and met the objectives of the research. Future goals include hosting the tool on the website of the Graduate Program in Bioinformatics at UFRN, analyzing other datasets, and possibly integrating the pre-processing of PubMed data into IPT itself.
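The percentage filter and the export to Gephi can be sketched in the same way. The abstract does not specify how the percentage is applied. The rule below keeps the given share of pairs, ranked by frequency; this is a hypothetical choice. `filter_top_percent` and `to_gephi_edges_csv` are illustrative names, not part of IPT. The output uses the `Source`, `Target`, `Type`, and `Weight` headers that Gephi's spreadsheet import recognizes for edge tables, so each pair's frequency becomes the edge weight.

```python
import csv
import io
import math


def filter_top_percent(pairs, percent):
    """Keep the most frequent `percent`% of pairs.

    Hypothetical rule: the abstract does not say how IPT's percentage is
    applied; here it is read as "keep this share of pairs, ranked by frequency".
    `pairs` is a list of (term_a, term_b, frequency) tuples.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Stable sort: pairs with equal frequency keep their input order.
    ranked = sorted(pairs, key=lambda p: p[2], reverse=True)
    if not ranked:
        return []
    keep = max(1, math.ceil(len(ranked) * percent / 100))
    return ranked[:keep]


def to_gephi_edges_csv(pairs):
    """Write an edge table with the Source/Target/Type/Weight column names
    that Gephi's spreadsheet import recognizes."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for a, b, freq in pairs:
        writer.writerow([a, b, "Undirected", freq])
    return out.getvalue()


# Hypothetical pair frequencies, e.g. read back from the pairing step's CSV.
pairs = [
    ("ampicillin", "sepsis", 2),
    ("discharge", "sepsis", 1),
    ("ampicillin", "discharge", 1),
    ("discharge", "jaundice", 1),
]

top_half = filter_top_percent(pairs, 50)
print(to_gephi_edges_csv(top_half))
```

Ranking before cutting keeps the strongest interactions. A threshold on the raw frequency (for example, pairs seen at least N times) would be an equally plausible reading of the abstract.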


COMMITTEE MEMBERS:
Chair - 1893445 - EUZEBIO GUIMARAES BARBOSA
Internal - 1507794 - RODRIGO JULIANI SIQUEIRA DALMOLIN
External to the Program - 2432313 - RAND RANDALL MARTINS
Notice registered on: 11/06/2018 09:51