About: Optimisation des chaînes de production dans l'industrie sidérurgique, une approche statistique de l'apprentissage par renforcement

An Entity of Type : rdac:C10001, within Data Space : data.idref.fr associated with source document(s)

Attributes	Values
type	frbr:Work rdac:C10001
Thesis advisor	École doctorale IAEM Lorraine - Informatique, Automatique, Électronique - Électrotechnique, Mathématiques de Lorraine (1992-....); LMAM - Laboratoire de Mathématiques et Applications de Metz - UMR 7122 (....-2012); Vivalda, Jean-Claude
Praeses	Florchinger, Patrick (19..-....)
Author	Geist, Matthieu (1982-....)
alternative label	Optimal control of production line in the iron and steel industry, a statistical approach of reinforcement learning
dc:subject	Filtrage de Kalman; Apprentissage automatique; Thèses et écrits académiques; Approximation, Théorie de l'; Kalman, Filtrage de; Approximation; Apprentissage par renforcement; Fonction de valeur
preferred label	Optimisation des chaînes de production dans l'industrie sidérurgique, une approche statistique de l'apprentissage par renforcement
Language	http://lexvo.org/id/iso639-3/fra
Subject	http://www.idref.fr/027940373/id http://www.idref.fr/031741657/id http://www.idref.fr/027253139/id http://www.idref.fr/027282716/id
dc:title	Optimisation des chaînes de production dans l'industrie sidérurgique, une approche statistique de l'apprentissage par renforcement
Degree granting institution	Université de Metz (1969-2012)
Opponent	Munos, Rémi (19..-....) Pietquin, Olivier (19..-....) Fricout, Gabriel Yrieix (1978-....) Sigaud, Olivier (1968-.... ; enseignant-chercheur en informatique)
note	Reinforcement learning is machine learning's answer to the problem of optimal control. In this paradigm, an agent learns to control an environment by interacting with it. It regularly receives a numeric reward (or reinforcement signal), which gives local information about the quality of the control. The agent's objective is to maximize a cumulative function of these rewards over the long term, generally modelled by a so-called value function. A policy specifies the action to be chosen in each configuration of the environment, so the value function quantifies the quality of this policy. This paradigm is very general and covers a wide range of applications, such as the management of gas flows in an iron and steel complex, which we address in this manuscript. However, applying it in practice can be difficult. Notably, if the description of the environment is too large, an exact representation of the value function (or of the policy) is not possible. This raises the problem of generalization (or value function approximation): on the one hand, one has to design algorithms with low computational complexity, and on the other hand, one has to infer the behaviour the agent should follow in an unknown configuration of the environment when nearby configurations have already been experienced. This is the main problem we address in this manuscript, by introducing a family of algorithms inspired by Kalman filtering.
dc:type	Text
http://iflastandar...bd/elements/P1001	http://iflastandards.info/ns/isbd/terms/contentform/T1009
rdaw:P10219	2009
has content type	http://rdaregistry.info/termList/RDAContentType/1020
is primary topic of	http://www.idref.fr/218561024
is rdam:P30135 of	http://www.sudoc.fr/14282206X/id http://www.sudoc.fr/142410667/id
