Extreme data points exert a disproportionate influence on the regression line, often distorting results.

Prepare for the GARP Risk and AI (RAI) Exam. Master concepts with flashcards and multiple-choice questions, each with hints and clarifications. Get exam-ready with extensive practice!

Multiple Choice

Extreme data points exert a disproportionate influence on the regression line, often distorting results.

Explanation:
Outliers exert disproportionate influence on a regression line because ordinary least squares (OLS), the standard fitting method, minimizes the sum of squared vertical distances from the line to all data points. An extreme point would have a large residual relative to the trend of the other data, and because residuals are squared, that point contributes heavily to the sum of squared errors (SSE), so the fitted line shifts toward it to reduce the total. If the outlier is also far from the center of the x-values (high leverage), its pull on both the slope and the intercept can be even stronger, distorting the relationship you’re trying to model and producing misleading coefficient estimates and predictions.
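
To make the pull of a single extreme point concrete, here is a minimal sketch in plain Python. The data values and the helper name `fit_line` are made up for illustration; the fit is the textbook closed-form solution for a one-predictor least-squares line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
clean = fit_line(xs, ys)

# Add one extreme point: far from the other x-values (high leverage)
# and far below the trend of the rest of the data.
xs_out = xs + [20.0]
ys_out = ys + [0.0]
contaminated = fit_line(xs_out, ys_out)

# Residual of the extreme point measured against each line.
resid_vs_clean = ys_out[-1] - (clean[0] + clean[1] * xs_out[-1])
resid_vs_refit = ys_out[-1] - (contaminated[0] + contaminated[1] * xs_out[-1])

print("clean (intercept, slope):       ", clean)
print("contaminated (intercept, slope):", contaminated)
print("outlier residual vs clean line: ", resid_vs_clean)
print("outlier residual after refit:   ", resid_vs_refit)
```

In this toy data set one added point is enough to turn a slope of 2 into a negative slope, and the outlier's residual shrinks sharply after refitting because the line has moved toward it.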

This is why you should check for influential points with diagnostic tools such as Cook’s distance, investigate whether flagged outliers are data-entry errors or legitimate rare observations, and consider robust alternatives to least squares when they are genuine.
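
As a rough illustration of such a diagnostic, the sketch below computes Cook’s distance by hand for a one-predictor regression, using the standard formula D_i = (e_i^2 / (p * s^2)) * h_i / (1 - h_i)^2, where e_i is the residual, h_i the leverage, p the number of fitted parameters, and s^2 the residual mean square. The function name `cooks_distances` and the data are hypothetical; in practice a statistics package would normally do this for you.

```python
def cooks_distances(xs, ys):
    """Cook's distance for each point in a simple (one-predictor) OLS fit."""
    n = len(xs)
    p = 2  # fitted parameters: intercept and slope
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in residuals) / (n - p)  # residual mean square
    # Leverage in simple regression: 1/n + (x_i - x_bar)^2 / Sxx
    leverages = [1.0 / n + (x - x_bar) ** 2 / sxx for x in xs]

    return [
        (e ** 2 / (p * mse)) * h / (1.0 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]


# Hypothetical data: five points on y = 2x plus one extreme point.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 0.0]

distances = cooks_distances(xs, ys)
most_influential = max(range(len(distances)), key=distances.__getitem__)

for i, d in enumerate(distances):
    print(f"point {i}: x={xs[i]}, Cook's D={d:.3f}")
print("most influential index:", most_influential)
```

A common rule of thumb treats values above 1 as worth a closer look, though thresholds vary by source; here the extreme point stands out by a wide margin.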

Other regression concepts matter too, but they describe different things: multicollinearity makes coefficient estimates unstable when predictors are highly correlated with one another; residual plots are a diagnostic tool for spotting patterns in the residuals (nonlinearity, heteroskedasticity, etc.); and heteroskedasticity means the error variance changes with the level of an independent variable, which affects the reliability of standard errors rather than pulling the line itself.
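
To show how multicollinearity differs from the outlier problem, here is a small sketch of the variance inflation factor (VIF) for the special case of exactly two predictors, where VIF = 1 / (1 - r^2) and r is the correlation between them. The helper names and the numbers are invented for illustration only.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(a, b):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_related = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]    # nearly a copy of x1
x2_unrelated = [2.0, 5.0, 1.0, 6.0, 3.0, 4.0]  # weakly related to x1

vif_related = vif_two_predictors(x1, x2_related)
vif_unrelated = vif_two_predictors(x1, x2_unrelated)

print("VIF with a near-duplicate predictor:", round(vif_related, 1))
print("VIF with a weakly related predictor:", round(vif_unrelated, 2))
```

A large VIF signals that the coefficient's variance is inflated by overlap among predictors, which is a stability problem across the whole data set and not the effect of any single extreme observation.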
