\( W_{new}(t-1) = W_{old}(t-1) + \alpha \delta(t) \) (Temporal Difference Learning Update Rule; \( \alpha \) is the learning rate)
\( \delta(t) = r(t) + W_{old}(t) - W_{old}(t-1) \) (Prediction Error)
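The sketch below applies the two equations above to a single time step, in Python. The function name `td_step`, the list layout (`W[i-1]` holds \( W(i) \)) and the learning rate \( \alpha = 0.5 \) are illustrative assumptions, not values given here. The rewards and starting predictions come from the Trial 0 setup.

```python
def td_step(W, r, t, alpha):
    """Apply one TD update at time step t (1-indexed, t >= 2).

    Hypothetical helper. W[i-1] holds W(i) and r[i-1] holds r(i).
    Returns the prediction error delta(t); W(t-1) is updated in place.
    """
    delta = r[t - 1] + W[t - 1] - W[t - 2]   # delta(t) = r(t) + W_old(t) - W_old(t-1)
    W[t - 2] += alpha * delta                # W_new(t-1) = W_old(t-1) + alpha * delta(t)
    return delta


# Trial 1, time step 4: the doughnut arrives while every prediction is still 0.
W = [0.0, 0.0, 0.0, 0.0]   # W(1)..W(4), the Trial 0 row
r = [0, 0, 0, 1]           # r(1)..r(4)
alpha = 0.5                # assumed learning rate; the text does not fix one
delta = td_step(W, r, t=4, alpha=alpha)
print(delta, W)            # 1.0 [0.0, 0.0, 0.5, 0.0]
```

The reward at \( t = 4 \) updates \( W(3) \), the prediction made one step earlier, not \( W(4) \).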
State / Time Step \((t)\): 1 | 2 | 3 | 4
Stimulus (Reward): 🚪 \( r(1) = 0 \) | 🔔 \( r(2) = 0 \) | 🕑 \( r(3) = 0 \) | 🍩 \( r(4) = 1 \)
Trial 0: \( W(1) = 0 \) | \( W(2) = 0 \) | \( W(3) = 0 \) | \( W(4) = 0 \)
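Trials 1 to 5 can be simulated by applying the update rule at every time step of each trial. This is a sketch under stated assumptions, not the demo's own implementation. \( \alpha = 0.5 \) is a hypothetical learning rate. Time steps are visited in order \( t = 2, 3, 4 \), because there is no \( W(0) \) to update at \( t = 1 \). Nothing follows \( t = 4 \), so \( W(4) \) stays at 0. The helper name `run_trials` is also hypothetical.

```python
def run_trials(rewards, alpha, n_trials):
    """Simulate TD learning over repeated trials of the same stimulus sequence.

    Hypothetical helper. Assumes W(1)..W(T) start at 0, steps t = 2..T are
    visited in order, and nothing follows step T, so W(T) is never updated.
    Returns one snapshot of W per trial, starting with Trial 0.
    """
    T = len(rewards)
    W = [0.0] * T                      # W[i-1] holds W(i); the Trial 0 row
    history = [list(W)]
    for _ in range(n_trials):
        for t in range(2, T + 1):
            delta = rewards[t - 1] + W[t - 1] - W[t - 2]   # delta(t)
            W[t - 2] += alpha * delta                      # update W(t-1)
        history.append(list(W))
    return history


rewards = [0, 0, 0, 1]   # r(1)..r(4): door, bell, clock, doughnut
for trial, W in enumerate(run_trials(rewards, alpha=0.5, n_trials=5)):
    print(f"Trial {trial}: " + " | ".join(f"W({i + 1}) = {w:.4f}" for i, w in enumerate(W)))
```

With \( \alpha = 0.5 \), \( W(3) \) becomes positive in Trial 1, \( W(2) \) in Trial 2 and \( W(1) \) in Trial 3. The prediction of the doughnut moves backward toward the earliest stimulus, the door. Updating in place within a trial gives the same result as using \( W_{old} \) throughout, because \( W(t) \) is only changed after \( \delta(t+1) \) has been computed.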