Which optimization method updates weights after seeing each individual, randomly selected data point?

Prepare for the GARP Risk and AI (RAI) Exam. Master concepts with flashcards and multiple-choice questions, each with hints and clarifications. Get exam-ready with extensive practice!

Multiple Choice

Which optimization method updates weights after seeing each individual, randomly selected data point?

Explanation:
Updates after every single training example, chosen at random, define stochastic gradient descent. This approach estimates the gradient from one data point at a time, so updates happen quickly and scales well to large datasets. The randomness makes the gradient estimate noisy, which can help the optimizer move through the loss landscape more freely and can lead to faster total training time in practice. Because only one example is used per update, the learning process is more “online” and can adapt as new data arrives. However, the trade-off is less stability in the updates, so a careful learning-rate schedule or additional techniques (like momentum) are often used to smooth convergence. In contrast, batch gradient descent uses the entire dataset to compute one update, which is stable but slow on large datasets, and mini-batch gradient descent uses small chunks of data per update as a middle ground. A dynamic learning rate, meanwhile, refers to adjusting the step size itself and is not by itself the method of updating weights per data point.

Updates after every single training example, chosen at random, define stochastic gradient descent. This approach estimates the gradient from one data point at a time, so updates happen quickly and scales well to large datasets. The randomness makes the gradient estimate noisy, which can help the optimizer move through the loss landscape more freely and can lead to faster total training time in practice. Because only one example is used per update, the learning process is more “online” and can adapt as new data arrives. However, the trade-off is less stability in the updates, so a careful learning-rate schedule or additional techniques (like momentum) are often used to smooth convergence. In contrast, batch gradient descent uses the entire dataset to compute one update, which is stable but slow on large datasets, and mini-batch gradient descent uses small chunks of data per update as a middle ground. A dynamic learning rate, meanwhile, refers to adjusting the step size itself and is not by itself the method of updating weights per data point.

Subscribe

Get the latest from Examzify

You can unsubscribe at any time. Read our privacy policy