Ensemble Methods in Machine Learning: Boosting, Bagging, and Stacking
In our last article, we discussed multi-modal learning, where models are trained on different types of data such as text, images, audio, and video. A related theme in machine learning is combining multiple models themselves, which is done with a family of techniques called ensemble methods. In this post, we'll explore three popular ensemble methods - boosting, bagging, and stacking - with the help of some analogies.
What are boosting, bagging, and stacking?
These are ways to combine multiple machine learning models to get better results. The core idea is that by strategically combining several models, even ones that aren't great individually, you can create an ensemble that performs much better!
Boosting is one ensemble method. In boosting, you train several models one after another. Each new model learns from the mistakes of the previous models. This helps create a strong final model. Popular boosting algorithms include AdaBoost, Gradient Boosting, and XGBoost. Analogy - Imagine you and your friends take turns reading a book. The first friend reads a page, but maybe they get some words wrong. Then the next friend looks at what the first one missed and tries to read those words correctly. Each friend helps the next one get better and better until you have all learned how to read the book perfectly.
Boosting techniques are used in face detection and recognition tasks, where they combine multiple weak classifiers into a strong, accurate detector; the classic Viola-Jones face detector, for example, is built on AdaBoost.
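To make this concrete, here's a minimal boosting sketch using scikit-learn's AdaBoostClassifier. The synthetic dataset and parameter values are purely illustrative; by default, each weak learner is a depth-1 decision tree (a "decision stump").

```python
# A minimal boosting sketch with scikit-learn's AdaBoostClassifier.
# The synthetic dataset is only for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# AdaBoost trains its weak learners (depth-1 decision stumps by default)
# sequentially, giving more weight to examples the previous ones got wrong.
boosted = AdaBoostClassifier(n_estimators=100, random_state=42)
boosted.fit(X_train, y_train)
print("boosting accuracy:", boosted.score(X_test, y_test))
```

Gradient Boosting and XGBoost follow the same sequential idea, just with a different way of correcting earlier mistakes (fitting new models to residual errors rather than re-weighting examples).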
Bagging is another method. With bagging (short for "bootstrap aggregating"), you train multiple models in parallel, each on a different random subset of the training data, and then combine the predictions from all the models. Analogy - It's like having different groups of friends, and each group gets a different book to read. Then, when you're all done, you come together and share what you've learned from your different books. That way, you all learn way more than if you had just read one book alone!
Bagging techniques are used in finance for fraud detection, stock price prediction, and credit risk modeling.
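Here's a similarly minimal bagging sketch with scikit-learn's BaggingClassifier; again, the dataset and parameters are just illustrative. By default it bags decision trees, each trained on its own bootstrap sample.

```python
# A minimal bagging sketch with scikit-learn's BaggingClassifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 decision trees (the default base model), each fit on a different
# bootstrap sample of the training data; n_jobs=-1 trains them in parallel.
bagged = BaggingClassifier(n_estimators=50, n_jobs=-1, random_state=0)
bagged.fit(X_train, y_train)
print("bagging accuracy:", bagged.score(X_test, y_test))
```

A random forest is essentially this recipe specialized to decision trees, with extra randomness in which features each tree is allowed to consider.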
Stacking is the third method. First, you train several base models. Then you train a new model, called a meta-model, on the base models' predictions; the meta-model learns how to best combine them. Analogy - It's like each of your friends reading a few pages from a book. Then you, being the super smart kid that you are, look at what all your friends have read and figure out the best way to put it all together so you can read the entire book!
Stacking techniques are used in speech recognition systems, where different models trained on various speech features are combined to improve the overall accuracy of speech-to-text transcription. They are also used in drug discovery to combine different machine learning models trained on different molecular representations (e.g., fingerprints, graphs, 3D structures) to predict molecular properties more accurately.
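Here's a minimal stacking sketch with scikit-learn's StackingClassifier. The base models (a random forest and an SVM) and the logistic-regression meta-model are just one illustrative combination.

```python
# A minimal stacking sketch with scikit-learn's StackingClassifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# The meta-model (final_estimator) is trained on cross-validated
# predictions from the base models and learns how best to combine them.
stacked = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=1)),
        ("svm", SVC(random_state=1)),
    ],
    final_estimator=LogisticRegression(),
)
stacked.fit(X_train, y_train)
print("stacking accuracy:", stacked.score(X_test, y_test))
```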
Among these methods, bagging is especially widely used because it improves accuracy and can be parallelized for faster training [1]. Bagging models also tend to have lower variance (variance: how inconsistent the predictions are across different data points). In machine learning, high variance means the model's guesses are all over the place when tested on new data. This is undesirable, so techniques like bagging help reduce variance without increasing bias (bias: how far off the model's predictions are from the correct values on average).
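One rough way to see this variance reduction for yourself is to compare the spread of cross-validation scores for a single decision tree against a bagged ensemble of trees. (The spread of fold scores is only a loose proxy for variance in the strict bias-variance sense, but the pattern usually shows up.)

```python
# An illustrative comparison: cross-validation score spread for one
# deep decision tree vs. a bagged ensemble of trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=7)

models = [
    ("single tree", DecisionTreeClassifier(random_state=7)),
    ("bagged trees", BaggingClassifier(n_estimators=50, random_state=7)),
]
for name, model in models:
    scores = cross_val_score(model, X, y, cv=10)
    # Bagging typically yields a higher mean and a smaller std (less spread).
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```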
By combining multiple models through ensemble methods like boosting, bagging, and stacking, we can leverage the strengths of individual models and mitigate their weaknesses, ultimately achieving better overall performance and more accurate predictions.
References:
[1] https://stackoverflow.com/questions/57660093/in-what-scenario-bagging-can-be-used-over-boosting
[2] https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205
[3] https://medium.com/analytics-matters/taking-large-language-models-to-the-next-level-54d67cb0fab5