The Markov Decision Process (MDP) provides a mathematical framework for the RL problem: almost all RL problems can be modeled as MDPs, and MDPs are widely used for sequential optimization problems more broadly. In this section, we will look at what an MDP is and how it is used in RL.
To understand an MDP, we first need to learn about the Markov property and Markov chains.
State Transition (Probability) Matrix
- For a state s and successor state s', the state transition probability is defined by
  P_ss' = ℙ[S_{t+1} = s' | S_t = s]
- The state transition matrix P defines the transition probabilities from all states s to all successor states s': entry (i, j) of P is the probability of moving from state i to state j, where each row of the matrix sums to 1.
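A minimal sketch of this in code (the three states and their probabilities are made up for illustration): the matrix is a NumPy array whose rows are probability distributions over successor states.

```python
import numpy as np

# Hypothetical 3-state chain: 0 = Sunny, 1 = Cloudy, 2 = Rainy.
# Row s holds P[S_{t+1} = s' | S_t = s] for each successor s'.
P = np.array([
    [0.8, 0.15, 0.05],
    [0.3, 0.40, 0.30],
    [0.2, 0.30, 0.50],
])

# Each row must be a valid probability distribution.
assert np.allclose(P.sum(axis=1), 1.0)
```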
Markov Property
- A state S_t is Markov (i.e. has the Markov property) if and only if
  ℙ[S_{t+1} | S_t] = ℙ[S_{t+1} | S_1, ..., S_t]
- This is the memoryless property of a stochastic process.
- The future is independent of the past given the present: once the current state is known, the history may be thrown away.
Markov Process
- A Markov process is a memoryless random process, i.e. a sequence of random states S_1, S_2, ... with the Markov property.
- Definition: a Markov process (or Markov chain) is a tuple ⟨S, P⟩, where S is a (finite) set of states and P is a state transition probability matrix with P_ss' = ℙ[S_{t+1} = s' | S_t = s].
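Sampling from a Markov chain only ever looks at the current state. A short sketch, continuing the hypothetical weather chain above:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_chain(P, start, steps):
    """Sample a state sequence S_1, S_2, ... from a Markov chain.

    The successor is drawn from the current state's row of P, so the
    next state depends only on the present state (the Markov property).
    """
    states = [start]
    for _ in range(steps):
        states.append(int(rng.choice(len(P), p=P[states[-1]])))
    return states

print(sample_chain(P, start=0, steps=10))
```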
Markov Reward Process
- A Markov reward process is a Markov chain with values (rewards) attached to it.
- Definition: a Markov reward process is a tuple ⟨S, P, R, γ⟩, where S is a (finite) set of states, P is a state transition probability matrix, R is a reward function with R_s = 𝔼[R_{t+1} | S_t = s], and γ ∈ [0, 1] is a discount factor.
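As data, an MRP is just the chain from before plus a reward vector and a discount factor; the numbers below are again made up for illustration:

```python
import numpy as np

# Hypothetical MRP <S, P, R, gamma> built on the 3-state chain above.
R = np.array([1.0, 0.0, -1.0])  # R_s = E[R_{t+1} | S_t = s]
gamma = 0.9                     # discount factor in [0, 1]
```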
Episode
- Episode and sampling
- An episode is one run of the process, from the start state to a terminal state.
- Sampling means generating example episodes from the process.
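A sketch of sampling one episode from the hypothetical MRP above; `terminal` is an assumed set of absorbing state indices, and `rng` is the generator from the earlier sketch:

```python
def sample_episode(P, R, start, terminal, rng, max_steps=100):
    """Sample one episode as a list of (state, reward) pairs."""
    s, episode = start, []
    for _ in range(max_steps):
        episode.append((s, R[s]))
        if s in terminal:
            break  # reached a terminal state: the episode ends
        s = int(rng.choice(len(P), p=P[s]))
    return episode

# e.g. treat state 2 as terminal for this illustration
print(sample_episode(P, R, start=0, terminal={2}, rng=rng))
```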
Return
- The purpose of reinforcement learning is to maximize the return G_t, not the individual rewards.
- Definition: the return G_t is the total discounted reward from time-step t,
  G_t = R_{t+1} + γR_{t+2} + γ²R_{t+3} + ... = Σ_{k=0}^∞ γ^k R_{t+k+1}
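A minimal sketch of computing the return for a finite, sampled reward sequence:

```python
def discounted_return(rewards, gamma):
    """Compute G_t = sum_k gamma^k * R_{t+k+1} for a finite reward list."""
    g = 0.0
    # Fold from the back so each earlier step applies one more factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 1.0, 1.0], 0.9))  # 1 + 0.9 + 0.81 = 2.71
```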
Why discount?
- Mathematical convenience
  - Avoids infinite returns in cyclic Markov processes
  - It is sometimes possible to use undiscounted Markov reward processes (i.e. γ = 1), e.g. if all sequences terminate
- Human preference
  - Animal/human behavior shows a preference for immediate reward
  - If the reward is financial, immediate rewards may earn more interest than delayed rewards
- Future uncertainty
  - Uncertainty about the future may not be fully represented
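A quick worked example of the first point: with a constant reward of 1 at every step, G_t = 1 + γ + γ² + ... = 1/(1 − γ), which is 10 for γ = 0.9, whereas with γ = 1 the same sum diverges.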
(State) Value Function
- The value function v(s) gives the long-term value of state s
- Definition: v(s) = 𝔼[G_t | S_t = s], the expected return starting from state s
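For an MRP the value function satisfies v(s) = R_s + γ Σ_{s'} P_ss' v(s'), i.e. v = R + γPv in matrix form, so for small state spaces it can be solved directly as a linear system. A sketch using the hypothetical P, R, and gamma from above:

```python
import numpy as np

# (I - gamma * P) v = R  =>  solve directly for v
v = np.linalg.solve(np.eye(len(P)) - gamma * P, R)
print(v)  # long-term value of each state
```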
Markov Decision Process
- A Markov decision process (MDP) is a Markov reward process with decisions (made by an agent in a sequential decision-making problem)
- It is an environment in which all states are Markov.
- Definition: an MDP is a tuple ⟨S, A, P, R, γ⟩, where S is a (finite) set of states, A is a (finite) set of actions, P is a state transition probability matrix with P^a_ss' = ℙ[S_{t+1} = s' | S_t = s, A_t = a], R is a reward function with R^a_s = 𝔼[R_{t+1} | S_t = s, A_t = a], and γ ∈ [0, 1] is a discount factor.
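One simple way to hold such an MDP in code is a pair of arrays indexed by action and state; the 2-state, 2-action numbers below are an assumption for illustration:

```python
import numpy as np

# P[a, s, s'] = P[S_{t+1} = s' | S_t = s, A_t = a]
P = np.array([
    [[0.9, 0.1],   # action 0, from state 0
     [0.2, 0.8]],  # action 0, from state 1
    [[0.5, 0.5],   # action 1, from state 0
     [0.0, 1.0]],  # action 1, from state 1
])
# R[a, s] = E[R_{t+1} | S_t = s, A_t = a]
R = np.array([[1.0,  0.0],
              [0.5, -1.0]])
gamma = 0.9
```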
History and State
- The history is the sequence of observations, actions, and rewards: H_t = O_1, R_1, A_1, ..., A_{t−1}, O_t, R_t
- The state is the information used to determine what happens next
- Formally, the state is a function of the history: S_t = f(H_t)
What is an MDP (Markov Decision Process)?
- Markov decision processes formally describe an environment for reinforcement learning
- Where the environment is fully observable
- The current state completely characterizes the process
- Almost all RL problems can be formalized as MDPs
Model
- A model predicts what the environment will do next
- P predicts the next state
- R predicts the next (immediate) reward
Policy
- A policy is the agent's behavior
- Deterministic policy: π(s) = a
- Stochastic policy: π(a|s) = ℙ[A_t = a | S_t = s]
- Definition: a policy π is a distribution over actions given states (see the sketch after this list)
- A policy fully defines the behavior of an agent
- MDP policies depend on the current state (not the history)
- i.e. policies are stationary (time-independent)
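A minimal sketch of both policy types for the hypothetical 2-state, 2-action MDP above:

```python
import numpy as np

rng = np.random.default_rng(0)

def deterministic_policy(s):
    """pi(s) = a: a fixed action per state (here: action 0 in state 0, 1 in state 1)."""
    return [0, 1][s]

# Stochastic policy as a table: pi[s, a] = P[A_t = a | S_t = s]
pi = np.array([[0.7, 0.3],
               [0.4, 0.6]])

def sample_action(s):
    """Draw an action from pi(.|s)."""
    return int(rng.choice(pi.shape[1], p=pi[s]))
```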
Value Function
- Value function is a prediction of future reward
- Used to evaluate the goodness/badness of states (and actions)
State-value function
- Definition: v_π(s) = 𝔼_π[G_t | S_t = s], the expected return starting from state s and then following policy π
Action-value function
- Definition: q_π(s, a) = 𝔼_π[G_t | S_t = s, A_t = a], the expected return starting from state s, taking action a, and then following policy π
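As a sketch, v_π(s) can be estimated by Monte Carlo: sample many episodes under the stochastic policy pi above and average the returns (episodes are truncated at a fixed length for simplicity):

```python
def mc_state_value(P, R, pi, gamma, s0, rng, episodes=1000, steps=50):
    """Monte Carlo estimate of v_pi(s0) as the average of sampled returns."""
    total = 0.0
    for _ in range(episodes):
        s, g, discount = s0, 0.0, 1.0
        for _ in range(steps):
            a = int(rng.choice(pi.shape[1], p=pi[s]))   # a ~ pi(.|s)
            g += discount * R[a, s]                     # accumulate gamma^k * R
            discount *= gamma
            s = int(rng.choice(P.shape[2], p=P[a, s]))  # s' ~ P[a, s, .]
        total += g
    return total / episodes

print(mc_state_value(P, R, pi, gamma, s0=0, rng=rng))
```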
Prediction and Control Tasks in RL Problem
- Prediction: given a policy π, evaluate the value of each state
- Control: find the optimal policy