A Step towards Stepwise Regression

Deep Patel
Jul 3, 2021

A brief introduction to Stepwise Regression.

So, you saw the name, and it says Stepwise. As the name suggests, stepwise regression selects variables in a step-by-step manner: at each step it either adds the most significant variable or removes the least significant one. It does not consider all possible models, and it produces a single regression model when the algorithm stops.

It is widely used to fit regression models for prediction when variable selection needs to be carried out automatically. At every step, a variable is added to or removed from the set of explanatory variables.

How Stepwise Regression Works

  • The Backward Method: Whenever a model is fully saturated, we think of removing some parameters to make it more generalized. Backward stepwise regression does the same for us. At each step, it gradually eliminates variables from the regression model to find a reduced model that best explains the data.
  • An example of a stepwise regression using the backward elimination method would be an attempt to understand energy usage at a factory using variables such as equipment run time, equipment age, staff size, outside temperature, and time of year. The model starts with all of the variables; then each is removed, one at a time, to determine which is least statistically significant. In the end, the model might show that time of year and temperature are most significant, possibly suggesting that peak energy consumption at the factory occurs when air conditioner usage is at its highest (a code sketch of this backward pass appears after this list).
  • The Forward Method: Now that you know the backward method, the forward method does just the opposite. Initially the model has no variables, and it adds them one at a time, testing each candidate as it goes. If you have a large set of predictor variables, this method is usually the more practical choice.
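
To make the backward pass concrete, here is a minimal sketch in Python. The data are simulated, and the column names (run_time, equipment_age, staff_size, outside_temp) are just stand-ins for the factory example above. Dropping the variable with the largest p-value from a statsmodels OLS fit until everything left clears a 0.05 threshold is one common way to implement backward elimination, not the only one.

```python
# Backward elimination on simulated "factory energy" data (illustrative only).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "run_time": rng.normal(8, 2, n),
    "equipment_age": rng.normal(5, 1, n),
    "staff_size": rng.integers(10, 50, n),
    "outside_temp": rng.normal(20, 8, n),
})
# Energy usage is driven only by run time and outside temperature (plus noise).
y = 3.0 * X["run_time"] + 1.5 * X["outside_temp"] + rng.normal(0, 5, n)

def backward_eliminate(X, y, threshold=0.05):
    """Repeatedly drop the least significant predictor until all p-values < threshold."""
    features = list(X.columns)
    while features:
        model = sm.OLS(y, sm.add_constant(X[features])).fit()
        pvals = model.pvalues.drop("const")
        worst = pvals.idxmax()          # least significant remaining variable
        if pvals[worst] < threshold:
            break                       # everything left is significant; stop
        features.remove(worst)          # eliminate it and refit
    return features

print(backward_eliminate(X, y))         # typically ['run_time', 'outside_temp']
```

The forward method is the mirror image of this loop: start with an empty feature list and, at each step, add the candidate with the smallest p-value as long as it clears the threshold.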

Problems with Stepwise

Research has found that real explanatory variables that have causal effects on the dependent variable may turn out not to be statistically significant, while nuisance variables may be coincidentally significant. As a result, the model may fit the data well in-sample but perform poorly out-of-sample.
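
Here is a small simulated illustration of that failure mode; the 0.05 cutoff and forward-selection-by-p-value rule are arbitrary choices for the sketch. Even when every predictor is pure noise, stepwise selection will quite often declare a few of them "significant," and those variables carry no predictive value out-of-sample.

```python
# Forward selection applied to pure-noise predictors: spurious "significance".
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, p = 100, 50
X = pd.DataFrame(rng.normal(size=(n, p)), columns=[f"x{i}" for i in range(p)])
y = rng.normal(size=n)                  # y is pure noise: no predictor matters

selected, remaining = [], list(X.columns)
while remaining:
    # Try adding each remaining variable; keep the one with the smallest p-value.
    pvals = {}
    for var in remaining:
        fit = sm.OLS(y, sm.add_constant(X[selected + [var]])).fit()
        pvals[var] = fit.pvalues[var]
    best = min(pvals, key=pvals.get)
    if pvals[best] > 0.05:
        break                           # nothing left looks "significant"
    selected.append(best)
    remaining.remove(best)

print(selected)                         # often non-empty: nuisance variables slipped in
```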

Conclusion

Many Big-Data researchers believe that the larger the number of possible explanatory variables, the more useful stepwise regression is for selecting among them. The reality is the opposite: stepwise regression becomes less effective as the number of potential explanatory variables grows. It does not solve the Big-Data problem of too many explanatory variables; if anything, Big Data exacerbates its failings, because the approach amounts to fitting the data to whichever model looks best rather than testing a model chosen in advance.
