Transfer learning is an area of machine learning research that focuses on retaining the knowledge gained while solving one problem and applying it to a different but related problem. In other words, transfer learning is not a particular kind of machine learning algorithm; it is a strategy or method used when training models. In transfer learning, a model trained on a previous task is reused for a new task, and the new task is in some way related to the one that was previously mastered. The previously trained model usually needs a high level of generalization to adapt to the new, unseen data.
Transfer learning therefore means adapting a network that has been trained to solve one problem so that it solves another. We may not always have enough data to train a machine learning model from scratch. In that case, we can still reach our goal by starting from a parent model that has already performed well on a comparable dataset and applying transfer learning techniques. We have two options: use the parent model exactly as it is, or fine-tune it with our small dataset.
This is similar to how we behave in the final moments before an exam. Suppose we didn't have enough time to prepare properly for today's exam. What do we typically do? We get in touch with a friend and ask about the topics likely to appear. Our friend, who has prepared well, transfers his or her knowledge to us, and in a short time, without any other resources, we gain enough confidence in the subject. So why are you still standing there? Go to the exam hall and do your best on the answer sheet. I wish you all the very best✌️.
As the example above shows, many of our real-world daily situations map onto the idea of transfer learning. That is one reason it has gained so much popularity over the past three to four years, and many AI-driven organizations now look for candidates with solid transfer learning knowledge and experience.
In this blog, I will therefore present and discuss the top 10 questions you can use to gauge your transfer learning proficiency. I hope this article helps beginners understand transfer learning; if you already have some background in the subject, use it as an opportunity to assess your skills.
Q: What are the different transfer learning strategies?
Ans: A variety of transfer learning strategies are widely used in data science. The main ones are:
1. Training a model and reusing it
Suppose you want to solve task A but cannot train a deep neural network because you lack data. One way around this is to find a related task B for which plenty of data is available. You train the deep neural network on task B and then reuse it to solve task A. Whether you need the complete model or only a few layers depends on the problem you are trying to solve.
2. Using a pre-trained model
The second method is to use a model that has already been trained. Do some preliminary research, since many such pre-trained models are available. Depending on the problem, different layers can be reused or retrained.
3. Feature Extraction
The objective of this approach is to use deep learning to discover the most useful representation of your problem. This method, also known as representation learning, typically outperforms manually crafted representations. The learned representation can subsequently be used to address other problems. A minimal code sketch combining approaches 2 and 3 follows this list.
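To make approaches 2 and 3 concrete, here is a minimal sketch, assuming TensorFlow/Keras and an ImageNet-pre-trained VGG16: the pre-trained network is frozen and used as a feature extractor, and only a small task-specific head is trained. The input shape and number of classes are placeholders for your own problem.

```python
import tensorflow as tf

# A minimal sketch: reuse a pre-trained network as a frozen feature extractor
# and train only a small task-specific head. NUM_CLASSES and the input shape
# are placeholders for your own problem.
NUM_CLASSES = 10

base_model = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3)
)
base_model.trainable = False  # keep the reused layers fixed

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_images, train_labels, epochs=5)  # trains only the new head
```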
Q: What is fine-tuning?
Ans: Applying transfer learning usually requires some changes to the original model. Fine-tuning is the process of adjusting a model that has already been trained on one task so that it performs a second, related task. In NLP, fine-tuning means retraining a pre-trained language model on your own specific data. During fine-tuning, the weights of the original model are updated to account for the characteristics of the domain data and the task you are interested in.
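As a rough illustration, here is what this might look like for a text-classification task using the Hugging Face transformers and datasets libraries; the checkpoint, the toy two-example dataset, and the hyperparameters are only placeholders for your own setup.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Load a pre-trained checkpoint and give it a fresh two-class classification head.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# A toy stand-in for your own labelled domain data.
raw = Dataset.from_dict({"text": ["great movie", "terrible movie"], "label": [1, 0]})
train_dataset = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=32),
    batched=True,
)

args = TrainingArguments(
    output_dir="finetuned-bert",
    num_train_epochs=3,
    learning_rate=2e-5,               # small LR: nudge the weights, don't relearn them
    per_device_train_batch_size=8,
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()  # updates the pre-trained weights to reflect the domain data
```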
Q: What changes are typically made when adapting a pre-trained network to a new task?
Ans: The pre-trained network's softmax layer is usually removed and replaced with a new softmax layer sized for our particular task.
A common strategy is to reduce the initial learning rate to one-tenth of the learning rate used for training from scratch.
The weights of the pre-trained network's first few layers are frozen. The early layers capture generic features, such as edges and curves, that are also relevant to our problem, so those weights should stay the same. Instead, we focus on training the network's later layers on features specific to our dataset, as sketched below.
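Putting those three adjustments together, a minimal Keras sketch might look like the following; it assumes an ImageNet-pre-trained ResNet50, and the number of frozen layers and the number of classes are purely illustrative.

```python
import tensorflow as tf

# A sketch of the three adjustments: freeze early layers, replace the softmax
# head, and use a learning rate roughly 10x smaller than a from-scratch run.
NUM_CLASSES = 5

base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, pooling="avg", input_shape=(224, 224, 3)
)

# 1. Freeze the early layers that capture generic edges and curves.
for layer in base.layers[:100]:
    layer.trainable = False

# 2. Replace the original 1000-way softmax with one sized for our task.
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(base.output)
model = tf.keras.Model(inputs=base.input, outputs=outputs)

# 3. Reduced learning rate (e.g. 1e-4 instead of 1e-3).
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```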
Q: What is the difference between pre-training and fine-tuning?
Ans: Pre-training involves teaching the model on a large, generic corpus; adapting it to a particular task or dataset is called fine-tuning. Fine-tuning can be done with modest hardware, whereas pre-training requires a great deal of computational power and data. In both cases, the network layers update their weights.
Q: What are the main categories of pre-trained transformer models?
Ans: Pre-trained transformer models generally fall into the following categories:
1 – Autoregressive models
Autoregressive models are trained on the classic language-modeling task: predicting the next token after reading all the preceding ones. They correspond to the decoder of the original transformer, and a mask is applied over the sentence so that the attention heads can only see what came before, not what comes after. Eg: GPT, CTRL
2 – Autoencoding models
These models rely only on the encoder part of the original transformer and use no mask, so the attention heads can see every token in the sentence. For pretraining, the inputs are corrupted versions of the sentences and the targets are the original sentences. Eg: BERT, ALBERT
3 – Sequence-to-sequence models
These models use both the encoder and the decoder of the original transformer. Eg: T5, BART
4 – Multimodal models
These models combine text inputs with other modalities, such as images. Unlike the other categories, the model in this family has not been pretrained in a self-supervised fashion. Eg: MMBT
5 – Retrieval-based models
These models use document retrieval during (pre)training and inference, for example for open-domain question answering. Eg: RAG, DPR
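For a sense of how these families map onto concrete checkpoints, here is a small sketch using the Hugging Face transformers Auto classes; the particular checkpoints are just common examples.

```python
from transformers import (AutoModelForCausalLM, AutoModelForMaskedLM,
                          AutoModelForSeq2SeqLM)

# Autoregressive (decoder-only): predicts the next token left to right.
autoregressive = AutoModelForCausalLM.from_pretrained("gpt2")

# Autoencoding (encoder-only): sees the whole sentence, trained to undo corruption.
autoencoding = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Sequence-to-sequence (encoder-decoder): maps an input sequence to an output sequence.
seq_to_seq = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
```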
Q: How do BERT base and BERT large differ?
Ans: BERT is built from a stack of encoders, and BERT base and BERT large differ in the number of encoder layers. The BERT base model contains 12 encoder layers, whereas BERT large stacks 24 encoder layers on top of one another. As the number of layers grows, so do the number of parameters (weights) and attention heads: BERT base has 110 million parameters and 12 attention heads, while BERT large has 340 million parameters and 16 attention heads. BERT large also uses a hidden size of 1024, compared to 768 for BERT base.
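If you want to check those numbers yourself, the configuration objects in the Hugging Face transformers library expose them directly:

```python
from transformers import BertConfig

# Compare the architecture hyperparameters of the two BERT checkpoints.
for name in ["bert-base-uncased", "bert-large-uncased"]:
    cfg = BertConfig.from_pretrained(name)
    print(f"{name}: layers={cfg.num_hidden_layers}, "
          f"heads={cfg.num_attention_heads}, hidden size={cfg.hidden_size}")
# bert-base-uncased: layers=12, heads=12, hidden size=768
# bert-large-uncased: layers=24, heads=16, hidden size=1024
```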
Transfer learning has become one of the most important skills for data scientists today. Beyond its technical advantages, it also benefits the environment. According to research covered in the MIT Technology Review, training a large neural network with 200M+ parameters on cloud TPUs can emit as much carbon dioxide over its lifecycle as five cars. Transfer learning cuts down on the time these power-hungry processors need to run.
Major points to remember
- Transfer learning reuses knowledge from a model trained on one task to solve a related task, which is especially useful when data is scarce.
- The main strategies are reusing a model trained on a related task, starting from an existing pre-trained model, and feature extraction (representation learning).
- Fine-tuning adapts a pre-trained model to a new task, typically by replacing the final layer, freezing the early layers, and using a smaller learning rate.
- Pre-training needs large amounts of data and compute, while fine-tuning can be done with modest resources.
- Besides saving training time, transfer learning reduces the energy and carbon cost of training large models.
I hope this article helped you better understand and assess your transfer learning knowledge. Feel free to leave a comment below if you have any questions, concerns, or suggestions.