This article was published as a part of the Data Science Blogathon
Colaboratory, or “Colab” for short, is a Jupyter notebook service hosted by Google that lets you write and execute Python code in your browser. Colab is easy to use and is linked to your Google account. It provides free access to GPUs and TPUs, requires zero configuration, and makes it easy to share your code with the community.
Colab has a fascinating history. It began as an internal tool for data analysis at Google. It was later released publicly, and since then many people have used it for their machine learning tasks. Many students and other users who do not have a GPU rely on Colab's free resources to run their data science experiments.
This article collects some helpful tips and hacks that I use to make my work easier in Colab. I have tried to credit the sources where I first came across them. These tips should help you get the most out of your Colab notebooks.
Colab normally gives you free GPU resources. However, if you already have your own GPUs and still want to use the Colab UI, there is a workaround. You can connect the Colab UI to a local runtime as follows:
You can use this method to run code on your local hardware and access your local file system without leaving the Colab notebook. The documentation goes into more depth on how this works; check it for more details.
Do you end up creating multiple notebooks with names like “untitled.ipynb” and “untitled1.ipynb”? I think a few of us are in the same boat. If so, the Cloud scratchpad notebook might be for you.
The Cloud scratchpad is a special notebook, available at https://colab.research.google.com/notebooks/empty.ipynb, that is not saved automatically to your Drive account. It is helpful for experimentation or nontrivial work and doesn’t take up space in Google Drive.
You can be notified when an execution completes, even if you have switched to another tab, window, or application. Enable this through Tools > Settings > Site > Show desktop notifications, and allow browser notifications when prompted.
Here is a demo of the notification that appears even if you navigate to another tab.
Image By Author
Colab notebooks are designed to integrate easily with GitHub, which means you can load and save Colab notebooks to GitHub directly. There is an easy way to do it, thanks to Seungjae Ryan Lee.
When you are viewing a notebook on GitHub that you want to open in Colab, replace github with githubtocolab in the URL, leaving everything else unchanged.
Image By Author
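The URL substitution described above can be sketched in Python. This is a minimal illustration, not part of Colab itself: the function name `github_to_colab` and the example URL are hypothetical, and the sketch assumes the notebook lives on github.com.

```python
from urllib.parse import urlsplit, urlunsplit


def github_to_colab(url: str) -> str:
    """Return the githubtocolab.com equivalent of a github.com notebook URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host not in ("github.com", "www.github.com"):
        raise ValueError(f"not a github.com URL: {url}")
    # Swap only the host; path, query and fragment are left unchanged.
    return urlunsplit(parts._replace(netloc="githubtocolab.com"))


# Hypothetical notebook URL, used only for illustration.
example = "https://github.com/some-user/some-repo/blob/main/demo.ipynb"
print(github_to_colab(example))
```

Replacing only the host, rather than doing a plain string replace, avoids touching a repository or file name that happens to contain the word "github".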
If you are on a low budget and have used up your GPU quota on Kaggle, this hack might be a relief. You can download any dataset from Kaggle straight into your Colab workspace. Here is what you should do:
After you click the ‘Create New API Token’ tab, a kaggle.json file is generated that contains your API token. Create a folder named Kaggle in your Google Drive and store the kaggle.json file in it.
Mount your Drive in the Colab notebook.
Point the Kaggle config path to the folder that contains kaggle.json and change the current working directory to it:
import os
os.environ['KAGGLE_CONFIG_DIR'] = "/content/drive/MyDrive/Kaggle"
%cd /content/drive/MyDrive/Kaggle
For datasets linked to competitions, the API command is listed under the ‘Data’ tab.
Image By Author
Finally, run one of the following commands to download the dataset:
!kaggle datasets download -d alexanderbader/forbes-billionaires-2021-30
!kaggle competitions download -c google-smartphone-decimeter-challenge
Image By Author
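Before running a download, it can help to confirm that the token file is where the Kaggle CLI will look for it. The sketch below is one way to do that; the helper name `configure_kaggle` is hypothetical, and it assumes the kaggle.json file contains the `username` and `key` fields that Kaggle writes when it generates an API token.

```python
import json
import os
from pathlib import Path


def configure_kaggle(config_dir: str) -> str:
    """Point the Kaggle CLI at config_dir and return the username in kaggle.json."""
    token_path = Path(config_dir) / "kaggle.json"
    if not token_path.is_file():
        raise FileNotFoundError(f"kaggle.json not found in {config_dir}")
    token = json.loads(token_path.read_text())
    # Assumption: the generated token file holds "username" and "key" fields.
    missing = {"username", "key"} - token.keys()
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")
    # Set the variable only after the file has been validated.
    os.environ["KAGGLE_CONFIG_DIR"] = str(config_dir)
    return token["username"]
```

In a notebook with Drive mounted, calling `configure_kaggle("/content/drive/MyDrive/Kaggle")` would fail early with a clear message if the file was stored in the wrong folder, instead of failing later inside the download command.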
Want to search for a specific Colab notebook in Drive? Go to the Drive search box and add:
application/vnd.google.colaboratory
It will list all the notebooks in your Google Drive. In addition, you can specify the title and ownership of the notebook. For example, if I need to find a notebook created a long time ago that has ‘Transfer’ in its title, the following helps to get it:
Colab includes an extension for loading pandas DataFrames into interactive displays that can be dynamically sorted, filtered, and examined. Run the code below in a notebook cell to enable the data table display for pandas DataFrames.
%load_ext google.colab.data_table
# To disable the display
%unload_ext google.colab.data_table
Here’s a quick demo of it:
Image By Author
Colab makes it easy to compare two notebooks. Use View > Diff notebooks from the Colab menu, or navigate to https://colab.research.google.com/diff and paste the URLs of the notebooks into the input boxes to see the differences.
Disconnected due to idleness:
This is a significant disadvantage of Google Colab, and I’m sure many of you have experienced it at least once. You decide to take a break, but when you return, your notebook is disconnected!
In fact, if we leave the notebook idle for more than 30 minutes, Google Colab automatically disconnects it.
To work around this, open Chrome DevTools by pressing F12 on Windows or Ctrl+Shift+I on Linux, and then type the following JavaScript code into the console:
function KeepClicking() {
  console.log("Clicking");
  document.querySelector("colab-connect-button").click();
}
setInterval(KeepClicking, 60000);
Every 60 seconds, this function clicks the connect button. As a result, Colab believes that the notebook is not idle, and you should not be concerned about being disconnected!
Disconnection while a task is running:
To begin, keep in mind that when you connect to a GPU, you are only allowed to use the cloud machine for a maximum of 12 hours at a time.
You may also be disconnected at some point during these 12 hours. According to the Colab FAQ, “Colaboratory is meant for interactive use,” and long-running background computations, particularly on GPUs, may be stopped.
TensorBoard is a tool for displaying metrics and visualizations throughout a Deep Learning workflow. It is immediately usable within Colab.
Load the TensorBoard notebook extension first:
%load_ext tensorboard
Once your model is complete, launch TensorBoard within the notebook by typing:
%tensorboard --logdir logs
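TensorBoard reads event files from the directory passed to --logdir and treats each subdirectory as a separate run. How the logs get written depends on your framework and is not shown here. As a small sketch, a helper like the hypothetical `new_run_dir` below gives every training run its own timestamped folder under `logs`, so that runs appear side by side instead of overwriting each other.

```python
from datetime import datetime
from pathlib import Path


def new_run_dir(base: str = "logs") -> Path:
    """Create and return a timestamped subdirectory of `base` for one run."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    run_dir = Path(base) / stamp
    # exist_ok=True keeps this safe if two calls land in the same second.
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```

You would pass the returned path to whatever logging callback or writer your framework provides, and keep pointing `%tensorboard --logdir logs` at the parent folder.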
These are a few tricks that I have found very useful, particularly when it comes to training ML models on GPUs. Even though Colab notebooks can only run for a maximum of 12 hours, with the hacks shared above you should be able to get the most out of your session.
I hope you have found this article useful. Have a wonderful day, and thank you.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.