DEFENSE Committee: THIAGO HENRIQUE FREIRE DE OLIVEIRA

A DOCTORAL DEFENSE committee has been registered by the program.
STUDENT: THIAGO HENRIQUE FREIRE DE OLIVEIRA
DATE: 11/01/2021
TIME: 09:00
VENUE: Virtual, via Google Meet
TITLE:

Reinforcement Learning Algorithms for Multiobjective Optimization Problems


KEYWORDS:

Multiobjective reinforcement learning, Q-Learning, ε-constraint, Pareto front, Hypervolume, Single-policy approach.


PAGES: 90
MAJOR AREA: Engineering
AREA: Electrical Engineering
SUMMARY:

Multi-objective optimization problems model many real-world situations, which makes this class of problems extremely important. Although it has been studied for decades, it continues to pose challenging situations, especially because effective techniques are still lacking. Among the difficulties that arise when optimizing multiple objectives simultaneously, whether conflicting or not, one of the main limitations of existing algorithms and approaches is the need for a priori knowledge of the problem, which forces a predefined importance to be assigned to each objective. When this class of problems is addressed through reinforcement learning, two approaches predominate: single-policy and multi-policy. Algorithms and techniques that use the first approach suffer from the need for prior knowledge of the problem, an inherent characteristic of multi-objective problems. The second approach has other difficulties, such as limiting the set of solutions and high computational cost. Given this context, this work proposes two hybrid algorithms, called Q-Managed with reset and Q-Managed without reset. Both are hybridizations of the Q-Learning algorithm and the ε-constraint approach, which belong to reinforcement learning and multi-objective optimization, respectively. In summary, the proposed algorithms work as follows: Q-Learning is used to explore the environment, while the ε-constraint approach dynamically delimits the environment, keeping the essence of how Q-Learning works intact. The purpose of this delimitation is to force the learning agent to learn other solutions by blocking actions that lead to solutions it has already learned and can no longer improve, that is, solutions to which the agent has already converged. This action-blocking feature is carried out by a manager, which is responsible for observing everything that occurs in the environment. The difference between the proposed algorithms is essentially the choice of whether or not to reuse the knowledge already acquired about the environment after a solution is considered learned, that is, after the agent has converged to a particular solution. To test the effectiveness of both versions of Q-Managed, traditional benchmarks that have also been adopted in other works were used, allowing a fairer comparison. Two comparative approaches were adopted: the first was the implementation of third-party algorithms for direct comparison, while the second used a metric common to all works that employed the same benchmarks. In all tests, the proposed algorithms proved effective, always finding the entire Pareto front.
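To make the mechanism described above more concrete, the following is a minimal Python sketch of the general idea: tabular Q-Learning for exploration plus a "manager" that blocks actions leading to solutions the agent has already converged to, with an optional reset of the learned Q-values. All names and details here (QManagedSketch, the blocked set, the reset flag) are hypothetical illustrations, not the thesis's actual implementation of Q-Managed or of the ε-constraint delimitation.

```python
import random
from collections import defaultdict


class QManagedSketch:
    """Illustrative sketch: Q-Learning whose action choices are filtered by a manager."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, eps=0.1, reset_on_convergence=False):
        self.q = defaultdict(lambda: [0.0] * n_actions)  # tabular Q-values
        self.n_actions = n_actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.reset_on_convergence = reset_on_convergence  # "with reset" vs "without reset"
        self.blocked = set()  # (state, action) pairs vetoed by the manager

    def allowed_actions(self, state):
        # The manager's dynamic delimitation: actions leading to already
        # converged solutions are removed from the agent's choices.
        acts = [a for a in range(self.n_actions) if (state, a) not in self.blocked]
        return acts or list(range(self.n_actions))  # never leave the agent with no action

    def act(self, state):
        # Standard epsilon-greedy choice, restricted to the allowed actions.
        acts = self.allowed_actions(state)
        if random.random() < self.eps:
            return random.choice(acts)
        return max(acts, key=lambda a: self.q[state][a])

    def update(self, s, a, r, s_next):
        # One-step Q-Learning update; the learning rule itself is left intact.
        best_next = max(self.q[s_next][a2] for a2 in self.allowed_actions(s_next))
        self.q[s][a] += self.alpha * (r + self.gamma * best_next - self.q[s][a])

    def register_converged_solution(self, path):
        # Once the agent has converged to one solution, the manager blocks the
        # (state, action) pairs along it so that other solutions can be learned.
        self.blocked.update(path)
        if self.reset_on_convergence:
            self.q.clear()  # "with reset": discard the acquired knowledge
```

The sketch only illustrates the division of roles suggested by the abstract: exploration and value updates stay pure Q-Learning, while the manager's only intervention is to shrink the set of admissible actions after each convergence, which is what lets the agent move on to the next Pareto-optimal solution.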


COMMITTEE MEMBERS:
President - 347628 - ADRIAO DUARTE DORIA NETO
Internal - 1837240 - MARCELO AUGUSTO COSTA FERNANDES
External to the Program - 1669545 - DANIEL SABINO AMORIM DE ARAUJO
External to the Institution - ALUIZIO FAUSTO RIBEIRO ARAÚJO - UFPE
External to the Institution - FRANCISCO CHAGAS DE LIMA JUNIOR - UERN
External to the Institution - JORGE DANTAS DE MELO - UFRN
Notice registered on: 05/12/2020 11:14