Consider the grid-world given below and an agent (yellow) moving using these actions: N-North, W-West, E-East, S-South, and a special action D-Depart in terminal states (Exit). Rewards are only
awarded for taking the Exit action from one of the terminal states (green and red). Assume discount
factor γ = 1 for all calculations. The agent starts from the top-left corner and you are given the following episodes from runs of the
agent through this grid-world. Each line in an Episode is a tuple (s, a, s', r): the state, the action taken, the resulting next state, and the reward received.
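As a rough illustration of the episode format, the sketch below stores each line of an episode as a (state, action, next state, reward) tuple and sums the rewards with a discount factor. The state names, the reward value, and the helper name `episode_return` are hypothetical placeholders, not taken from the grid-world or the episodes in this problem.

```python
from typing import List, Tuple

# One line of an episode: (state, action, next state, reward).
Transition = Tuple[str, str, str, float]


def episode_return(episode: List[Transition], gamma: float = 1.0) -> float:
    """Discounted sum of the rewards along one episode."""
    total = 0.0
    discount = 1.0
    for _state, _action, _next_state, reward in episode:
        total += discount * reward
        discount *= gamma
    return total


# Hypothetical episode: the state labels and the reward of 10 are
# made-up placeholders, NOT data from the problem.
example_episode: List[Transition] = [
    ("A", "E", "B", 0.0),     # move East, no reward
    ("B", "S", "C", 0.0),     # move South, no reward
    ("C", "D", "END", 10.0),  # Depart from a terminal state, reward on exit
]

print(episode_return(example_episode))  # gamma = 1, so this is the plain sum
```

With γ = 1 the return of an episode is simply the reward collected on the final Depart action, since every other transition yields zero reward.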