\( W_{new}(t-1) = W_{old}(t-1) + \alpha \delta(t) \) (Temporal Difference Learning Update Rule)
\( \delta(t) = r(t) + W_{old}(t) - W_{old}(t-1) \) (Prediction Error)
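As a quick check of the two equations, here is the first trial worked out symbolically. It uses the starting values from the exercise table (all weights 0 on trial 0, and the only reward is \( r(4) = 1 \)) and leaves the learning rate \( \alpha \) symbolic. It also assumes that state 1 has no predecessor, so the updates run over \( t = 2, 3, 4 \):

\[
\begin{aligned}
\delta(2) &= r(2) + W_{old}(2) - W_{old}(1) = 0 + 0 - 0 = 0 &&\Rightarrow W_{new}(1) = 0 \\
\delta(3) &= r(3) + W_{old}(3) - W_{old}(2) = 0 + 0 - 0 = 0 &&\Rightarrow W_{new}(2) = 0 \\
\delta(4) &= r(4) + W_{old}(4) - W_{old}(3) = 1 + 0 - 0 = 1 &&\Rightarrow W_{new}(3) = 0 + \alpha \cdot 1 = \alpha
\end{aligned}
\]

On trial 2 the prediction error moves one step earlier. The error \( \delta(2) = 0 + 0 - 0 \) is still 0, but \( \delta(3) = 0 + \alpha - 0 = \alpha \), so \( W(2) = \alpha^2 \). Meanwhile \( \delta(4) = 1 + 0 - \alpha \), so \( W(3) = \alpha + \alpha(1 - \alpha) = 2\alpha - \alpha^2 \).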
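| State / time step \(t\) | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Stimulus (reward) | 🚪 \( r(1) = 0 \) | 🔔 \( r(2) = 0 \) | 🕑 \( r(3) = 0 \) | 🍩 \( r(4) = 1 \) |
| Trial 0 | \( W(1) = 0 \) | \( W(2) = 0 \) | \( W(3) = 0 \) | \( W(4) = 0 \) |
| Trial 1 | | | | |
| Trial 2 | | | | |
| Trial 3 | | | | |
| Trial 4 | | | | |
| Trial 5 | | | | |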
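The trial rows can be filled in by applying the update rule repeatedly. Below is a minimal Python sketch, not the exercise's official solution. It rests on several assumptions:

- `td_trial` is a hypothetical helper.
- The learning rate \( \alpha = 0.5 \) is assumed, because the text does not give one.
- Each trial updates \( W(1) \) to \( W(3) \) through \( t = 2, 3, 4 \) using the pre-trial weights \( W_{old} \).
- \( W(4) \) is left unchanged because the table has no \( t = 5 \).

```python
from typing import List, Tuple


def td_trial(weights: List[float], rewards: List[float],
             alpha: float) -> Tuple[List[float], List[float]]:
    """Run one trial of the TD update over states t = 1..N (hypothetical helper).

    Python index i holds time step t = i + 1. Every prediction error is computed
    from the pre-trial snapshot w_old, matching W_old in the update rule.
    """
    w_old = list(weights)
    w_new = list(weights)
    deltas = []
    for i in range(1, len(w_old)):  # t = 2..N; state 1 has no W(0) to update
        # delta(t) = r(t) + W_old(t) - W_old(t-1)
        delta = rewards[i] + w_old[i] - w_old[i - 1]
        # W_new(t-1) = W_old(t-1) + alpha * delta(t)
        w_new[i - 1] = w_old[i - 1] + alpha * delta
        deltas.append(delta)
    # Assumption: the trial ends at t = N, so W(N) is never updated here.
    return w_new, deltas


ALPHA = 0.5                      # assumed learning rate; not specified in the text
REWARDS = [0.0, 0.0, 0.0, 1.0]   # r(1)..r(4): door, bell, clock, donut

weights = [0.0, 0.0, 0.0, 0.0]   # Trial 0: W(1)..W(4) = 0
history = []
for trial in range(1, 6):
    weights, deltas = td_trial(weights, REWARDS, ALPHA)
    history.append(weights)
    print(f"Trial {trial}: W = {weights}, delta(2..4) = {deltas}")
```

Under these assumptions, the weights after trial 5 are \( W = (0.5, 0.8125, 0.96875, 0) \). The prediction first appears at the clock (state 3) on trial 1, reaches the bell on trial 2, and reaches the door on trial 3. A different \( \alpha \) changes the numbers but not this one-state-per-trial backward spread.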