Which algorithm extends Q-learning by using a neural network to approximate the Q-values rather than storing them in a table?

Prepare for the GARP Risk and AI (RAI) Exam. Master concepts with flashcards and multiple-choice questions, each with hints and clarifications. Get exam-ready with extensive practice!

Multiple Choice

Which algorithm extends Q-learning by using a neural network to approximate the Q-values rather than storing them in a table?

Explanation:
The algorithm is Deep Q-Learning, commonly implemented as a Deep Q-Network (DQN): it replaces the Q-table with a neural network that approximates the Q-values. This lets the algorithm handle large or continuous state spaces by learning a function Q(s, a; θ) that generalizes across states and actions, rather than storing a separate value for every state-action pair.
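To make the idea of a parametric Q-function concrete, here is a minimal sketch in plain Python. The class name TinyQNetwork, the layer sizes, and the tanh activation are illustrative assumptions, not part of any specific DQN implementation; a real agent would typically use a deep learning library and a larger network.

```python
import math
import random


class TinyQNetwork:
    """Hypothetical one-hidden-layer network: state vector in, one Q-value per action out."""

    def __init__(self, state_dim, hidden_dim, num_actions, seed=0):
        rng = random.Random(seed)
        # theta = (w1, b1, w2, b2): small random weights, zero biases.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(state_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
                   for _ in range(num_actions)]
        self.b2 = [0.0] * num_actions

    def q_values(self, state):
        """Return Q(s, a; theta) for every action a, given a real-valued state."""
        hidden = [math.tanh(sum(w * x for w, x in zip(row, state)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return [sum(w * h for w, h in zip(row, hidden)) + b
                for row, b in zip(self.w2, self.b2)]

    def num_parameters(self):
        """Count weights and biases; this does not grow with the number of states."""
        return (sum(len(row) for row in self.w1) + len(self.b1)
                + sum(len(row) for row in self.w2) + len(self.b2))


net = TinyQNetwork(state_dim=4, hidden_dim=8, num_actions=2)

# Any real-valued state gets Q-value estimates, including states never seen before.
q_a = net.q_values([0.10, -0.30, 0.02, 0.70])
q_b = net.q_values([0.11, -0.30, 0.02, 0.70])  # a nearby state

# Greedy action: the index of the largest estimated Q-value.
greedy_action = max(range(len(q_a)), key=lambda a: q_a[a])
```

The network has a fixed number of parameters however many distinct states exist, and nearby states produce similar Q-values, which is the generalization a table cannot provide.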

Deep Q-Learning trains the network to minimize the temporal-difference error between its prediction Q(s, a; θ) and the target r + γ max_a' Q(s', a'; θ−). The target comes from a separate target network whose parameters θ− are held fixed between periodic updates from θ, which keeps the targets stable. Experience replay stores past transitions and trains on random samples of them, which reuses past experiences and breaks correlations in the data, further stabilizing learning.
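The target computation and the replay mechanism can be sketched in a few lines of plain Python. The names ReplayBuffer, td_target, and squared_td_error are hypothetical, and the done flag (which drops the bootstrap term at the end of an episode) is an assumption added here; the explanation above does not discuss terminal states.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_target, gamma, done):
    """r + gamma * max_a' Q(s', a'; theta-), using the target network's outputs."""
    if done:  # assumption: no bootstrapping from a terminal state
        return reward
    return reward + gamma * max(next_q_target)


def squared_td_error(q_online, target):
    """Per-sample loss that the online network is trained to minimize."""
    return (target - q_online) ** 2


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(state=step, action=0, reward=1.0, next_state=step + 1, done=False)
batch = buffer.sample(batch_size=2)

# Target network outputs for s' are [0.5, 2.0]; the online estimate Q(s, a; theta) is 2.0.
target = td_target(reward=1.0, next_q_target=[0.5, 2.0], gamma=0.9, done=False)
loss = squared_td_error(q_online=2.0, target=target)
```

In a full training loop the gradient of this loss would be taken with respect to θ only, with θ− treated as a constant.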

The other options don’t fit this description: standard Q-learning is the original tabular method that stores Q-values in a table; Policy Gradient methods learn a policy directly rather than a Q-value function; Batch Gradient Descent is a general optimization technique, not a reinforcement learning algorithm.
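For contrast, a tabular Q-learning update might look like the sketch below. The function name, the string-valued states and actions, and the alpha and gamma defaults are illustrative assumptions.

```python
from collections import defaultdict


def tabular_q_update(q_table, state, action, reward, next_state, actions,
                     alpha=0.1, gamma=0.9):
    """One Q-learning step on an explicit table keyed by (state, action)."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q_table[(state, action)]
    q_table[(state, action)] += alpha * td_error
    return td_error


q_table = defaultdict(float)  # every unseen (state, action) entry starts at 0.0
td_error = tabular_q_update(q_table, "s0", "right", 1.0, "s1", ["left", "right"])
```

Each state-action pair needs its own table entry here, which is exactly what becomes impractical when the state space is large or continuous.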
