Deep learning has revolutionized the field of data science, powering advancements in natural language processing, computer vision, and autonomous systems. As data scientists delve into this transformative area, understanding its nuances and intricacies becomes crucial. This article compiles 40 deep learning questions for data scientists that probe the depths of deep learning, offering insights into its fundamental concepts, practical applications, and emerging trends. Whether you’re a novice eager to explore or an expert seeking to refine your knowledge, these questions provide a comprehensive guide to mastering deep learning.
A) TRUE
B) FALSE
Solution: (B)
Deep learning performs feature engineering on its own, whereas traditional machine learning requires manual feature engineering.
A) Neural network
B) Random Forest
C) k-Nearest neighbor
D) None of the above
Solution: (A)
A neural network transforms the data into a form that makes the desired problem easier to solve. This is called representation learning.
B) 3 and 4
C) 1 and 4
D) 2 and 3
Solution: (A)
Option A is correct.
A) TRUE
B) FALSE
Solution: (B)
Kernel size is a hyperparameter, so changing it can increase or decrease performance.
Question Context
Suppose we have a deep neural network model which was trained on a vehicle detection problem. The dataset consisted of images of cars and trucks, and the aim was to identify the name of the vehicle (there are 10 vehicle classes).
Now you want to use this model on a different dataset which has images of only Ford Mustangs (i.e., cars), and the task is to locate the car in an image.
A) Fine tune only the last couple of layers and change the last layer (classification layer) to regression layer
B) Freeze all the layers except the last, re-train the last layer
C) Re-train the model for the new dataset
D) None of these
Solution: (A)
A) 217 x 217 x 3
B) 217 x 217 x 8
C) 218 x 218 x 5
D) 220 x 220 x 7
Solution: (C)
Note: The neural network was able to approximate the XNOR function with the ReLU activation function.
A) Yes
B) No
Solution: (B)
If the ReLU activation is replaced by a linear activation, the neural network loses its power to approximate non-linear functions.
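As a quick illustration (a minimal sketch with made-up scalar weights, not from the article): composing layers that use a linear activation yields another linear function, so stacking them adds no non-linear modeling power.

```python
# Sketch with assumed scalar weights: two layers with linear activation
# collapse into a single linear layer, so depth adds no expressive power.
def linear_layer(w, b, x):
    return w * x + b  # "activation" f(z) = z

w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def two_layers(x):
    return linear_layer(w2, b2, linear_layer(w1, b1, x))

# The composition is itself linear with w = w2 * w1 and b = w2 * b1 + b2.
w, b = w2 * w1, w2 * b1 + b2

def one_layer(x):
    return w * x + b
```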
B) Exactly 2 secs
C) Greater than 2 secs
D) Can’t Say
Solution: (B)
The change in architecture when we add dropout applies only during training, not at test time, so inference time is unchanged.
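A minimal sketch (inverted dropout, hypothetical helper, not from the article) of why test time is unaffected: the random mask is applied only when training, and the test-time forward pass is a plain identity.

```python
import random

def dropout_forward(x, keep_prob, train):
    # Inverted dropout: the random mask (and the 1/keep_prob scaling)
    # is applied only during training; at test time the layer is an
    # identity, so inference cost does not change.
    if not train:
        return x
    return [v / keep_prob if random.random() < keep_prob else 0.0 for v in x]

activations = [1.0, 2.0, 3.0]
test_out = dropout_forward(activations, keep_prob=0.5, train=False)
```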
A) 1, 2, 3
B) 1, 4, 5
C) 1, 3, 4, 5
D) All of these
Solution: (D)
All of the above techniques can be used to reduce overfitting.
A) Higher the perplexity the better
B) Lower the perplexity the better
Solution: (B)
What would be the output of this Pooling layer?
B) 5
C) 5.5
D) 7
Solution: (D)
Max pooling works as follows: it takes a window of the input defined by the pooling size and outputs the highest activation within that window.
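A minimal 1-D sketch of this (hypothetical helper; the pool size and stride are arbitrary demo values):

```python
def max_pool_1d(x, pool_size, stride):
    # Slide a window of length `pool_size` across the input and keep
    # only the highest activation inside each window.
    return [max(x[i:i + pool_size])
            for i in range(0, len(x) - pool_size + 1, stride)]

pooled = max_pool_1d([1, 3, 2, 7, 5, 6], pool_size=2, stride=2)
```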
If we remove the ReLU layers, we can still use this neural network to model non-linear functions.
B) FALSE
Solution: (B)
A) Machine translation
B) Sentiment analysis
C) Question Answering system
D) All of the above
Solution: (D)
Deep learning can be applied to all of the above-mentioned NLP tasks.
Deep learning can be applied to Scenario 1 but not Scenario 2.
A) TRUE
B) FALSE
Solution: (B)
Scenario 1 involves Euclidean data and Scenario 2 involves graph data. Deep learning can be applied to both types of data.
B) 2, 3, 4, 5, 6
C) 1, 3, 5, 6
D) All of these
Solution: (D)
Which neural network architecture would be suitable to complete this task?
B) Convolutional Neural Network
C) Recurrent Neural Network
D) Restricted Boltzmann Machine
Solution: (C)
Recurrent neural networks work best for sequential data. Therefore, an RNN would be best suited for the task.
B) Deconvolutional network on input and convolutional network on output
Solution: (A)
That is why the ReLU function was proposed: it keeps the gradient unchanged for positive inputs.
A ReLU unit in a neural network never gets saturated.
B) FALSE
Solution: (B)
ReLU can get saturated too, on the negative side of the x-axis, where both its output and its gradient are zero.
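A quick sketch of negative-side saturation (hypothetical helpers): for any negative input the output is pinned at zero and the gradient is zero, so the unit passes back no learning signal.

```python
def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # For z < 0 the unit outputs 0 with zero gradient: it is
    # saturated ("dead") on the negative side of the x-axis.
    return 1.0 if z > 0 else 0.0
```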
Note: we have defined the dropout rate as the probability of keeping a neuron active.
B) Higher the dropout rate, lower is the regularization
Solution: (B)
A higher dropout rate (as defined here) means more neurons stay active, so there is less regularization.
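Using the question's definition (dropout rate = probability of keeping a neuron active), a small sketch shows that a higher rate keeps more neurons active and therefore injects less noise, i.e. regularizes less; the seed and layer size are arbitrary demo choices.

```python
import random

random.seed(0)

def dropout_mask(n, keep_prob):
    # keep_prob is the probability a neuron stays active (the
    # definition used in the question); 1 marks an active neuron.
    return [1 if random.random() < keep_prob else 0 for _ in range(n)]

active_low_rate = sum(dropout_mask(10000, 0.2))   # strong regularization
active_high_rate = sum(dropout_mask(10000, 0.9))  # weak regularization
```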
B) Unlike backprop, in BPTT we subtract gradients for corresponding weight for each time step
Solution: (A)
BPTT is used in the context of recurrent neural networks. It works by summing up the gradients for each time step.
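A toy illustration (a scalar linear RNN, hypothetical and not from the article) of how BPTT sums the shared weight's gradient contributions over time steps; for this tiny model the summed result matches the analytic derivative of the loss.

```python
def bptt_weight_grad(w, xs):
    # Tiny linear RNN: h_t = w * h_{t-1} + x_t with h_0 = 0 and
    # loss L = h_T.  BPTT SUMS the gradient contribution of w from
    # every time step (it does not subtract per-step gradients).
    hs = [0.0]
    for x in xs:
        hs.append(w * hs[-1] + x)
    T = len(xs)
    grad = 0.0
    for t in range(1, T + 1):
        grad += (w ** (T - t)) * hs[t - 1]  # step t's contribution
    return grad
```

For w = 0.5 and inputs [1, 2, 3], the loss is L(w) = w^2 + 2w + 3, so dL/dw = 2w + 2 = 3, which the summed per-step contributions reproduce.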
What is the probable approach when dealing with “Exploding Gradient” problem in RNNs?
B) Gradient clipping
C) Dropout
D) None of these
Solution: (B)
To deal with the exploding gradient problem, it’s best to threshold the gradient values at a specific point. This is called gradient clipping.
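A minimal sketch of clipping by global norm (hypothetical helper; deep learning frameworks provide equivalents, e.g. `torch.nn.utils.clip_grad_norm_`):

```python
import math

def clip_by_norm(grads, max_norm):
    # Threshold the gradient: if its norm exceeds max_norm, rescale
    # the whole vector so its norm equals max_norm; otherwise leave it.
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return grads
    scale = max_norm / norm
    return [g * scale for g in grads]

clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)  # original norm is 5.0
```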
In which of the following scenarios would you prefer L-BFGS over SGD?
B) Only 1
C) Only 2
D) None of these
Solution: (A)
L-BFGS works best in both of these scenarios.
B) Skip-gram model
C) PCA
D) Convolutional neural network
Solution: (C)
A) L-BFGS
B) SGD
C) AdaGrad
D) Subgradient method
Solution: (D)
Other optimization algorithms might fail on non-differentiable objectives, but the subgradient method would not.
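A small sketch on |x| (hypothetical helpers), which is non-differentiable at 0: the subgradient method still makes steady progress toward the minimum.

```python
def abs_subgradient(x):
    # |x| has no gradient at 0, but any value in [-1, 1] is a valid
    # subgradient there; elsewhere it matches the ordinary gradient.
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0  # one convenient choice from [-1, 1]

def subgradient_descent(x0, lr, steps):
    x = x0
    for _ in range(steps):
        x -= lr * abs_subgradient(x)
    return x

x_final = subgradient_descent(3.0, lr=0.1, steps=50)
```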
B) 1 is False and 2 is True
C) Both 1 and 2 are True
D) Both 1 and 2 are False
Solution: (D)
In dropout, neurons are dropped, whereas in DropConnect, connections are dropped. Dropping a neuron renders both its input and output weights useless, i.e., both are effectively removed. In DropConnect, only a single weight is dropped at a time.
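A schematic comparison (hypothetical helpers operating on one hidden neuron's weight lists):

```python
def dropout_drop_neuron(w_in, w_out, idx):
    # Dropout removes neuron `idx` entirely, so BOTH its incoming
    # and outgoing weights are zeroed for this forward pass.
    w_in, w_out = list(w_in), list(w_out)
    w_in[idx] = 0.0
    w_out[idx] = 0.0
    return w_in, w_out

def dropconnect_drop_weight(w, idx):
    # DropConnect removes only a single connection; the neuron stays
    # active through its remaining weights.
    w = list(w)
    w[idx] = 0.0
    return w
```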
What is the best place in the graph for early stopping?
B) B
C) C
D) D
Solution: (C)
You would “early stop” where the model is most generalized. Therefore option C is correct.
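A minimal sketch of picking the early-stopping point from a validation-loss curve (hypothetical helper; the loss values are made-up numbers):

```python
def early_stop_epoch(val_losses, patience=2):
    # Track the epoch with the lowest validation loss (the most
    # generalized model) and stop once the loss has not improved
    # for `patience` consecutive epochs.
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break
    return best_epoch

best = early_stop_epoch([1.0, 0.8, 0.6, 0.7, 0.9])
```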
Image inpainting is one of those problems that requires human expertise to solve. It is particularly useful for repairing damaged photos or videos. Below is an example input and output of an image inpainting task.
B) Negative-log Likelihood loss
C) Any of the above
Solution: (C)
Both A and B can be used as a loss function for image inpainting problem.
A) Sum of squared error with respect to inputs
B) Sum of squared error with respect to weights
C) Sum of squared error with respect to outputs
D) None of the above
Solution: (C)
A) Gradient descent optimizes best when you use an even number
B) Parallelization of neural network is best when the memory is used optimally
C) Losses are erratic when you don’t use an even number
D) None of these
Solution: (B)
Xavier’s init helps reduce the vanishing gradient problem.
Xavier’s init helps the input signals reach deep into the network. Which of the following statements is true?
B) 2, 3, 4
C) 1, 3, 4
D) 1, 2, 3
E) 1, 2, 3, 4
Solution: (D)
Statements 1, 2, and 3 are true.
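A minimal sketch of the idea (assuming the Glorot/Bengio normal formulation; the layer sizes are arbitrary demo values): weights are drawn with variance 2 / (fan_in + fan_out), so signal variance is roughly preserved from layer to layer and input signals can reach deep into the network.

```python
import math
import random

random.seed(0)

def xavier_init(fan_in, fan_out):
    # Xavier/Glorot initialization: weight variance 2 / (fan_in + fan_out)
    # keeps activations from shrinking or blowing up with depth.
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [random.gauss(0.0, std) for _ in range(fan_in * fan_out)]

weights = xavier_init(300, 100)
sample_var = sum(w * w for w in weights) / len(weights)  # ~ 2/400 = 0.005
```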
A) Use recursive units instead of recurrent
B) Use attention mechanism
C) Use character level translation
D) None of these
Solution: (B)
A) TRUE
B) FALSE
Solution: (A)
A recurrent neuron can be thought of as a sequence of neurons unrolled over an arbitrarily long series of time steps.
A) Data related to the problem
B) CPU to GPU communication
C) GPU memory
D) All of the above
Solution: (D)
Along with knowing how to apply deep learning algorithms, you should also know the implementation details. All of the above-mentioned factors can be bottlenecks for deep learning algorithms.
Dropout technique is not an advantageous technique for which of the following layers?
B) Convolutional layer
C) RNN layer
D) None of these
Solution: (C)
Dropout does not work well with recurrent layers. You would have to modify the dropout technique a bit to get good results.
For example:
The input given to you is an image depicting the music symbols as given below,
Your required output is an image of succeeding symbols.
Which architecture of neural network would be better suited to solve the problem?
B) Convolutional neural network followed by recurrent units
C) Neural Turing Machine
D) None of these
Solution: (B)
CNNs work best on image recognition problems, whereas RNNs work best on sequence prediction. Here you would have to use the best of both worlds!
B) Location-based addressing
Solution: (A)
A) Affine layer
B) Strided convolutional layer
C) Fractional strided convolutional layer
D) ReLU layer
Solution: (C)
Option C is correct.
Question Context 38-40
GRU is a special type of Recurrent Neural Network proposed to overcome the difficulties of classical RNNs. It was proposed in the paper “On the Properties of Neural Machine Translation: Encoder–Decoder Approaches”.
B) Only 2
C) None of them
D) Both 1 and 2
Solution: (D)
B) Previous hidden state would not be ignored
Solution: (A)
B) Copies the information through many time steps
Solution: (B)
Navigating the complex landscape of deep learning requires not just theoretical understanding but also practical insight and foresight into future developments. The 40 deep learning questions for data scientists addressed in this article aim to deepen your comprehension of deep learning’s capabilities, challenges, and evolving nature. Armed with this knowledge, data scientists can leverage deep learning more effectively to innovate and solve real-world problems, positioning themselves at the forefront of technological advancement in data science.
A. Yes, deep learning is highly useful for data scientists. It enables the development of complex models that can automatically learn features and patterns from large datasets, leading to breakthroughs in fields like image and speech recognition, natural language processing, and predictive analytics.
A. A data scientist can be asked questions related to deep learning, machine learning, AI, LLMs, NLP, etc.
A. Deep learning techniques include:
Convolutional Neural Networks (CNNs) for image analysis.
Recurrent Neural Networks (RNNs) for sequential data.
Generative Adversarial Networks (GANs) for generating data.
Autoencoders for unsupervised learning.
Transformers for natural language processing tasks.
A. Python is the most important language for data scientists due to its extensive libraries (like TensorFlow, PyTorch, Pandas) for data manipulation, analysis, and machine learning. It’s also known for its readability, simplicity, and supportive community, making it ideal for rapid development and deployment.