Video: https://youtu.be/71nH1BUjhNw

Lecture slides: https://www.davidsilver.uk/wp-content/uploads/2020/03/FA.pdf


Large-Scale Reinforcement Learning

Reinforcement learning can be used to solve large problems, e.g.

Backgammon: 10²⁰ states
Computer Go: 10¹⁷⁰ states
Helicopter: continuous state space → the states are continuous, so a lookup table cannot be built

→ How can we scale up the model-free methods for prediction and control?

Value Function Approximation

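So far we used a lookup table with one entry per state, or one entry per state-action pair
v̂ is an approximation that mimics v_π → we estimate the true value v_π through v̂
w is the parameter vector inside v̂
Generalize: return a reasonable output for states that have not been seen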

→ Value function approximation can also be used when the MDP is known, but this lecture covers only the model-free case

Types of Value Function Approximation

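The box in each diagram is a black box: the function approximator governed by the parameters w
For Q the black box can take two forms (the 2nd and 3rd diagrams): "action in" (state and action as input, a single q value as output) or "action out" (state as input, one q value per action as output)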
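The three input/output shapes can be sketched in a few lines. This is only an illustration under assumed names: `v_hat`, `q_hat_action_in`, `q_hat_action_out`, the feature sizes and the linear form of the black box are hypothetical choices, not something the lecture prescribes.

```python
import numpy as np

N_FEATURES, N_ACTIONS = 4, 3   # hypothetical sizes, for illustration only

def v_hat(x_s, w):
    """State in, one scalar value out: v_hat(s, w)."""
    return float(x_s @ w)                      # w has shape (N_FEATURES,)

def q_hat_action_in(x_s, a, w):
    """State and action in, one scalar out: q_hat(s, a, w)."""
    one_hot_a = np.zeros(N_ACTIONS)
    one_hot_a[a] = 1.0
    x_sa = np.outer(one_hot_a, x_s).ravel()    # joint state-action features
    return float(x_sa @ w)                     # w has shape (N_ACTIONS * N_FEATURES,)

def q_hat_action_out(x_s, W):
    """State in, one q value per action out."""
    return W @ x_s                             # W has shape (N_ACTIONS, N_FEATURES)

rng = np.random.default_rng(0)
x_s = rng.normal(size=N_FEATURES)              # made-up state features
W = rng.normal(size=(N_ACTIONS, N_FEATURES))   # made-up parameters

q_all = q_hat_action_out(x_s, W)               # vector with one entry per action
q_one = q_hat_action_in(x_s, 1, W.ravel())     # same parameters, queried for action 1
```

With the same parameters the two Q forms agree (`q_one` equals `q_all[1]`). The "action out" form gives every action value in one evaluation, which is convenient when taking a max over actions.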

Which Function Approximator?

There are many function approximators, e.g.

Linear combinations of features
Neural network
Decision tree
Nearest neighbour
Fourier / wavelet bases

Of these, we consider differentiable function approximators (linear combinations of features and neural networks), because w is updated using the gradient.

Furthermore, we require a training method that is suitable for non-stationary, non-iid data

(the data distribution keeps changing, and the samples are not independent of each other)

Incremental Methods

Gradient Descent

Value Function Approx. By Stochastic Gradient Descent

Stochastic gradient descent: following the policy produces samples one at a time, and each sample is fed in as the input for one gradient step
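For reference, the objective and update behind this section, written in the lecture's notation (the slide with the formula is not reproduced in these notes, so this is a reconstruction from the standard derivation):

```latex
% Objective: mean-squared error between the true value and the approximation
J(\mathbf{w}) = \mathbb{E}_\pi\!\left[ \big( v_\pi(S) - \hat{v}(S, \mathbf{w}) \big)^2 \right]

% Gradient descent step on J
\Delta \mathbf{w} = -\tfrac{1}{2}\, \alpha\, \nabla_{\mathbf{w}} J(\mathbf{w})
                  = \alpha\, \mathbb{E}_\pi\!\left[ \big( v_\pi(S) - \hat{v}(S, \mathbf{w}) \big)\, \nabla_{\mathbf{w}} \hat{v}(S, \mathbf{w}) \right]

% Stochastic gradient descent: drop the expectation and use one sampled state
\Delta \mathbf{w} = \alpha\, \big( v_\pi(S) - \hat{v}(S, \mathbf{w}) \big)\, \nabla_{\mathbf{w}} \hat{v}(S, \mathbf{w})
```

Averaged over the samples, the stochastic update equals the full gradient update.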

Linear Function Approximation > Feature Vector

For a state S we can build a vector of n features

Linear Function Approximation > Linear Value Function Approximation

The value is a weighted sum of the features, with weights w
w is updated so that the estimate moves closer to the true value
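A minimal sketch of the linear case: the estimate is the dot product x(S)ᵀw, its gradient with respect to w is simply x(S), so the update is step size × prediction error × feature vector. The feature matrix, the targets and the helper name `sgd_step` are made up for illustration; here the targets are given directly, whereas in RL they would have to be estimated.

```python
import numpy as np

def v_hat(x, w):
    """Linear value estimate: weighted sum of the features."""
    return x @ w

def sgd_step(w, x, target, alpha):
    """One SGD step; for a linear approximator the gradient of v_hat is x."""
    return w + alpha * (target - v_hat(x, w)) * x

# Hypothetical toy data: 3 states described by 2 features each.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
w_true = np.array([2.0, -1.0])     # assumed "true" weights used to make targets
targets = X @ w_true               # stand-in for the true values v_pi(s)

rng = np.random.default_rng(0)
w = np.zeros(2)
for _ in range(2000):
    i = rng.integers(len(X))       # sample one state at a time
    w = sgd_step(w, X[i], targets[i], alpha=0.1)

print(w)                           # approaches w_true
```

Because the targets here are exactly representable by the features and noise-free, w converges to the generating weights. With real returns it would only approach the best linear fit.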

Linear Function Approximation > Table Lookup Features

Table lookup is a special case of a linear value function: use one indicator (one-hot) feature per state
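To see why, a small sketch with one-hot features (the helper name `table_lookup_features` and the numbers are hypothetical): the weight vector plays the role of the table, and a gradient step changes only the entry of the visited state.

```python
import numpy as np

def table_lookup_features(state, n_states):
    """One-hot feature vector: 1 for the given state, 0 elsewhere."""
    x = np.zeros(n_states)
    x[state] = 1.0
    return x

n_states = 5
w = np.zeros(n_states)             # w[s] is the table entry for state s
alpha = 0.5

state, target = 2, 10.0            # made-up sample: state 2 with target value 10
x = table_lookup_features(state, n_states)
w += alpha * (target - x @ w) * x  # linear SGD update

print(w)                           # only w[2] moved, halfway towards the target
```

With these features x(s)ᵀw equals w[s], so the linear update reduces to the familiar tabular update V(s) ← V(s) + α (target − V(s)).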

Incremental Prediction Algorithms

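Apply MC and TD to the update rule above: the true value v_π(S) is unknown, so it is replaced by a target, the return G_t for MC and R + γ v̂(S', w) for TD(0)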
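The two targets can be compared on a toy problem. The sketch below assumes a 5-state random walk (start in the middle, reward 1 only when exiting on the right, γ = 1) with one-hot features. The environment, function names and step sizes are illustrative choices, not taken from the lecture.

```python
import numpy as np

N_STATES = 5                        # non-terminal states 0..4 (assumed toy chain)
TRUE_V = np.arange(1, 6) / 6.0      # known values of this chain: 1/6 ... 5/6

def features(s):
    """One-hot features; any other feature map could be swapped in here."""
    x = np.zeros(N_STATES)
    x[s] = 1.0
    return x

def run_episode(rng):
    """Random walk from the middle; returns (state, reward, next_state) steps.
    next_state is None when the episode terminates."""
    s, traj = 2, []
    while True:
        s2 = s + (1 if rng.random() < 0.5 else -1)
        if s2 < 0:
            traj.append((s, 0.0, None))
            return traj
        if s2 >= N_STATES:
            traj.append((s, 1.0, None))
            return traj
        traj.append((s, 0.0, s2))
        s = s2

def mc_prediction(episodes, alpha, rng):
    """Every-visit MC: the target is the return G_t (gamma = 1)."""
    w = np.zeros(N_STATES)
    for _ in range(episodes):
        G = 0.0
        for s, r, _s2 in reversed(run_episode(rng)):
            G += r
            x = features(s)
            w += alpha * (G - x @ w) * x
    return w

def td0_prediction(episodes, alpha, rng):
    """TD(0): the target is R + v_hat(S', w), with v_hat = 0 at termination."""
    w = np.zeros(N_STATES)
    for _ in range(episodes):
        for s, r, s2 in run_episode(rng):
            x = features(s)
            v_next = 0.0 if s2 is None else features(s2) @ w
            w += alpha * (r + v_next - x @ w) * x
    return w

rng = np.random.default_rng(0)
w_mc = mc_prediction(10_000, 0.003, rng)
w_td = td0_prediction(10_000, 0.003, rng)
print(np.round(w_mc, 2), np.round(w_td, 2), np.round(TRUE_V, 2))
```

Both estimates should end up near the known values. The only difference between the two functions is the target plugged into the same update.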

Monte-Carlo with Value Function Approximation

TD Learning with Value Function Approximation

TD(λ) with Value Function Approximation

Control with Value Function Approximation

Action-Value Function Approximation

Linear Action-Value Function Approximation

Incremental Control Algorithms

Linear Sarsa with Coarse Coding in Mountain Car

Study of λ: Should We Bootstrap?

y-axis: error

Convergence of Prediction Algorithms

Gradient Temporal-Difference Learning

Convergence of Control Algorithms

Batch Methods

Batch Reinforcement Learning

Gradient descent is simple and appealing
But it is not sample efficient
Batch methods seek to find the best fitting value function
Given the agent’s experience ("training data")
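"Best fitting" can be made concrete for a linear approximator: given a fixed batch of (state features, target value) pairs, choose w to minimise the sum of squared errors. The sketch below does this with NumPy's least-squares solver and, for comparison, by repeatedly sampling from the same batch and applying the SGD update (experience replay). The data, sizes and step size are synthetic assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of experience: feature vectors and noisy value targets.
n_samples, n_features = 200, 3
X = rng.normal(size=(n_samples, n_features))
w_gen = np.array([1.0, -2.0, 0.5])                 # weights used to generate targets
v_targets = X @ w_gen + 0.1 * rng.normal(size=n_samples)

# Batch solution: minimise sum_i (v_i - x_i^T w)^2 in one shot.
w_ls, *_ = np.linalg.lstsq(X, v_targets, rcond=None)

# Experience replay: sample (x, v) from the stored batch and apply SGD.
w_replay = np.zeros(n_features)
alpha = 0.01
for _ in range(20_000):
    i = rng.integers(n_samples)
    w_replay += alpha * (v_targets[i] - X[i] @ w_replay) * X[i]

print(np.round(w_ls, 3), np.round(w_replay, 3))
```

The replayed SGD estimate settles close to the least-squares solution. Reusing the stored samples many times is what makes the batch approach more sample efficient than a single incremental pass.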

Least Squares Prediction

LSP > Stochastic Gradient Descent with Experience Replay

LSP > Experience Replay in Deep Q-Networks (DQN)


커피콩

[RL]Lecture #6 - Value Function Approximation
