NeurIPS (Neural Information Processing Systems, formerly abbreviated NIPS) is one of the premier machine learning conferences in the world. Researchers from across the globe present their latest work in the field, but getting past the review screening? Not so easy. Thousands of papers are submitted every year, of which only a fraction make it into the final conference.
The audience tickets for NeurIPS 2018 sold out within 12 minutes of the portal being opened! That might give you an inkling of how popular this annual conference is. For those who couldn’t be there – we are thrilled to present a quick summary of the best tutorials from NeurIPS 2018!
This year’s edition was held in Montreal, Canada from 2nd to 8th December. A variety of topics were showcased – from fairness and transparency in AI to visualizing deep learning models. You can check out the full schedule here.
There are hours and hours of videos, so our team went through all of them to bring you the best in the form of this article.
Note: We have embedded the videos for most sessions as well. A couple of videos are not being embedded due to some technical issue with FB’s video platform, and we have provided their direct links. While the summary is a good starting point, we encourage everyone to watch the videos as well – this is a great chance to learn from the top minds in this field.
Speakers: Frank Hutter and Joaquin Vanschoren
Building an end-to-end machine learning model involves a number of steps, such as preprocessing the data, creating features, selecting a model, and tuning the hyperparameters. Automatic Machine Learning, or AutoML, aims to automate these processes – this tutorial covers the methods underlying the state-of-the-art in AutoML. Quite a relevant topic in today’s environment.
Frank Hutter kicked off the tutorial by discussing the various applications of deep learning and an expert’s role in building a successful model. This role can potentially be filled by an AutoML service that tries to learn the features, architecture and parameters to use from the raw data we provide. Following this basic introduction to AutoML, Frank spoke about the types of hyperparameters and modern approaches to hyperparameter optimization. This is broadly divided into three sub-topics:
The next topic Frank covered was Neural Architecture Search, which is again divided into three parts – search space design, blackbox optimization, and beyond-blackbox optimization.
After a short Q&A session with Frank, Joaquin Vanschoren took over for the second half of the tutorial. His focus was mainly on Meta-Learning. He spoke about various approaches, configuration space design, surrogate model transfer and warm-started multi-task learning. Joaquin further discussed the Learning Pipeline followed by transfer learning and transfer features. He also spent some time discussing topics like gradient descent and LSTM meta-learner.
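Modern hyperparameter optimization treats the whole model-building pipeline as a blackbox function to be optimized. As a minimal illustration of the simplest such approach (not code from the tutorial – the two hyperparameters and the validation loss are toy stand-ins), here is a pure-Python random search:

```python
import random

def validation_loss(lr, depth):
    # Toy stand-in for "train a model, measure validation loss";
    # a real AutoML system would fit and evaluate an actual model here.
    return (lr - 0.1) ** 2 + (depth - 6) ** 2 * 0.01

def random_search(n_trials, seed=0):
    """Blackbox hyperparameter optimization by uniform random sampling."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {"lr": rng.uniform(1e-4, 1.0), "depth": rng.randint(1, 12)}
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best

best_loss, best_params = random_search(200)
print(best_params, round(best_loss, 4))
```

Blackbox methods like this only see (hyperparameters, loss) pairs; the beyond-blackbox methods Frank discussed exploit more structure, such as learning curves or cheap low-fidelity evaluations.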
Speakers: Deirdre Mulligan, Nitin Kohli, Joshua A. Kroll
Tutorial summary:
Machine learning is being used in almost every industry, and researchers all over the world are examining how it affects people and society. Ethics, essentially, should be at the heart of every ML project. The main aim of this tutorial was to clear up some common misconceptions machine learning researchers and practitioners hold when thinking about these topics.
This is a video we implore everyone to watch!
Terms like fairness, accountability, transparency and interpretability are often reused with different meanings, which can cause unnecessary misunderstanding. This tutorial examined how the same words can be used to refer to different ideas. The presenters also showcased a few case studies where these lessons are being applied to ML problems.
The session started by focusing on the need for clear definitions of the terms we use, and how these terms carry different meanings for different people. We were introduced to the term ‘sociotechnical’. To explain the concept better, Nitin Kohli took the term ‘Fairness’ and showed how it can come across differently to statisticians, computer scientists, and lawyers.
The next example, quite naturally, was the word ‘Transparency’. The presenters picked some very common examples to contrast what the word means to someone working in the government sector versus a machine learning engineer. After this, Nitin described the word ‘Explanation’ and its various types with suitable instances.
They also spoke about the terms ‘accountability’ and ‘interpretability’ during the session and the Q&A that followed was pretty informative as well.
Speakers: John Shawe-Taylor, Omar Rivasplata
Tutorial summary:
John Shawe-Taylor initiated the tutorial by giving an introduction to statistical learning theory (SLT) followed by some basic definitions and notations for terms used quite frequently, such as generalization gap, upper bound, etc.
Both speakers provided a broad outline of the session, listing some important topics:
After familiarizing the audience with the important terminology, Omar Rivasplata took over the baton by discussing first-generation SLT. He started by talking about the building blocks of a single function, then explained the finite and infinite function classes. In the next few slides, Omar discussed the VC bounds along with the limitations of the VC framework.
Once the audience had been familiarized with the first generation of SLT in the first half of the talk, John gave an overview of what comes after that – the second generation of SLT. He elaborated on the different ways to make the bound function dependent and the techniques that can be used for detecting a benign distribution.
We also saw the Three Proof Techniques – Covering numbers, Rademacher Complexity, and PAC Bayes Analysis. He compared the PAC-Bayes bounds with Bayesian Learning. This part of the talk is really interesting – do watch the video to gain a deeper insight into it.
The last section of the tutorial is a discussion of next-generation SLT, where the speakers talk about the performance and stability of neural networks. Throughout the tutorial, the speakers explain all the concepts using plots and mathematical equations, which makes the topics crystal clear.
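To give a flavour of the first-generation results discussed (a sketch in standard notation, not the speakers’ exact slides): for a finite function class F and an i.i.d. sample of size n, Hoeffding’s inequality plus a union bound give, with probability at least 1 − δ,

```latex
R(f) \;\le\; \widehat{R}(f) \;+\; \sqrt{\frac{\ln|F| + \ln(1/\delta)}{2n}}
\qquad \text{for all } f \in F,
```

where R(f) is the true risk and R̂(f) the empirical risk – so the generalization gap shrinks at rate O(√(ln|F|/n)). The second- and next-generation results in the talk refine this by making the bound depend on the data distribution and the learned function.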
Speakers: Alex Graves and Marc’Aurelio Ranzato
A topic most of you will be curious to explore more! This tutorial is divided into two parts:
In Part 1, Alex explained why we need unsupervised learning in the first place. Why can’t we just provide the true labels for training the model? There are mainly three reasons for that:
Targets in supervised learning contain far less information than the input data. With supervised learning, we constrain the model to learning only a few bits of information per example. Unsupervised learning, on the other hand, gives us an essentially unlimited supply of information to learn from. So instead of learning the data points, the model learns the dataset.
Unsupervised learning gives us more signal to learn from, but the learning objective is not entirely clear. Autoregressive neural networks can be used for density modelling, which helps the model learn the structure of the data. Methods such as auto-encoding and predictive coding can yield useful latent representations.
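To make the autoregressive density-modelling idea concrete – an illustrative sketch, not code from the tutorial – here is a tiny character-level model in pure Python. It factorizes the probability of a sequence into a product of next-character probabilities, using one step of context and add-one smoothing:

```python
from collections import Counter, defaultdict
import math

def train_bigram(text):
    """Estimate p(next_char | char) from counts, with add-one smoothing."""
    vocab = sorted(set(text))
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    V = len(vocab)
    def prob(a, b):
        return (counts[a][b] + 1) / (sum(counts[a].values()) + V)
    return prob, vocab

def log_likelihood(text, prob):
    """Autoregressive factorization: log p(x) = sum_t log p(x_t | x_{t-1})."""
    return sum(math.log(prob(a, b)) for a, b in zip(text, text[1:]))

prob, vocab = train_bigram("abracadabra")
print(log_likelihood("abra", prob))
```

A neural autoregressive model replaces the count table with a network conditioned on the full history, but the training objective – maximize the log-likelihood of the data under this factorization – is the same.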
In Part 2, Marc discussed various applications of unsupervised learning which are based on other frameworks and principles. He explained how to learn representations and samples and how to map between two domains. Some of the tips for learning representations are:
He also mentioned how to extract features in NLP using unsupervised learning. Some of the applications of learning how to map between two domains are:
Unsupervised learning has tons of sub-areas like feature learning, learning to align domains, learning to generate samples, etc. The biggest challenges with unsupervised learning are:
Speakers: Zico Kolter and Aleksander Mądry
In the tutorial, both researchers spoke about how machine learning predictions are mostly accurate, but brittle at the same time. Intrigued? Adding just a little noise to the data can change the predictions drastically, resulting in a sharp drop in performance. Plain data augmentation does not help much either. Some of the problems that the brittleness of machine learning can cause are:
Zico and Aleksander proposed three commandments in order to make our machine learning model more secure:
They talked about adversarial examples and verification, and how to train adversarially robust models. Zico further examined whether robust deep networks overfit. Training adversarially robust models requires more data – this is a known fact. Data augmentation can be used to make a model more robust, and adversarial training can be seen as the ultimate version of data augmentation, since we train on the most confusing version of the given training set.
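The core trick behind many adversarial examples is the fast gradient sign method (FGSM): nudge each input feature by ε in the direction that increases the loss. As a hedged sketch on a toy logistic-regression model (not the deep networks from the tutorial; the weights and inputs here are made up):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(x, y, w, b, eps):
    """Fast Gradient Sign Method on logistic regression.
    For cross-entropy loss, the gradient w.r.t. the input is (p - y) * w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1            # correctly classified: score = 1.5 > 0
x_adv = fgsm(x, y, w, b, eps=1.0)
score_adv = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
print(x, "->", x_adv, "adversarial score:", score_adv)
```

A bounded perturbation of each coordinate is enough to flip the prediction. Adversarial training feeds such perturbed points back into the training loop, which is why it behaves like an extreme form of data augmentation.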
Some of the key points for making a model robust are:
Finally, to summarize adversarial robustness:
Apart from this, the advantages of using adversarial robust models are massive. The model becomes more semantically meaningful. We will be able to rely on it far more. And it leads to machine learning that is not only safe and secure, but also better. Sounds like a good bet to us!
Speakers: Fernanda Viégas and Martin Wattenberg
Visualization is a topic all of us can relate to at some level. Who among us hasn’t done a thorough EDA before?
Fernanda Viégas and Martin Wattenberg covered one of the most interesting and fundamental topics in machine learning – visualization. They first spoke about what data visualization is, how it works, and some of its best practices. The talk then focused on how visualization has been applied to machine learning to date. A special case of high-dimensional data is also covered in this tutorial.
Data visualization is good for almost every field and some of its applications include:
Using colors in a visualization makes it more interpretable and faster to read. Visualization also makes calculations easier and less tedious (and who doesn’t appreciate that?!). Some examples where it helps with calculation are:
The tutorial also dove into the interpretability and model inspection facets of an ML project. Visualizing the different layers of a convolutional neural network (CNN) helps us understand how it classifies images (which is also useful when the model is not performing well).
We can interpret the model layer by layer and pinpoint where it is going wrong. The speakers recommend using Jupyter notebooks for visualization, together with libraries like matplotlib and plotly that provide ready-made functions for most common plots.
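As a dependency-free stand-in for a notebook plot (an illustrative sketch, not from the talk), even a tiny text histogram shows how visualization makes a distribution readable at a glance:

```python
def ascii_hist(values, bins=5, width=30):
    """Print a text histogram -- a dependency-free stand-in for a
    matplotlib histogram, useful for a quick first look at a column."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        bar = "#" * round(width * c / peak)
        print(f"[{lo + i * step:6.2f}, {lo + (i + 1) * step:6.2f}) {bar} {c}")
    return counts

ascii_hist([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 8])
```

In a notebook you would reach for matplotlib or plotly instead, but the principle is the same: the shape of the data jumps out in a way a table of numbers never does.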
Speaker: David Dunson
Bayesian learning as a topic has fascinated us for a long time.
The objective of this session was to motivate people to work more on Bayesian methods as these methods offer an attractive general approach for modeling complex data. David Dunson gave an overview of the state-of-the-art approaches for analyzing huge datasets using Bayesian statistical methods.
David explained how the Markov Chain Monte Carlo (MCMC) algorithm is becoming more and more scalable and faster thanks to the emerging rich and practical literature on the subject. Apart from that, he put quite an emphasis on tweaking the Bayesian paradigm to be more robust with respect to Big Data and scaling of Bayes to high-dimensional data (no. of features > no. of samples), which in itself is quite a hot topic.
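To make the MCMC idea concrete – an illustrative sketch of the classic algorithm, not the scalable variants David discussed – here is random-walk Metropolis sampling from a standard normal in pure Python. Note that MCMC only needs the target density up to a normalizing constant:

```python
import math
import random

def metropolis(log_target, n_samples, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis: propose x' = x + Normal(0, step),
    accept with probability min(1, target(x') / target(x))."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0, step)
        log_ratio = log_target(proposal) - log_target(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)
    return samples

# Target: standard normal (an unnormalized log-density is enough).
samples = metropolis(lambda x: -0.5 * x * x, n_samples=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))
```

The sample mean and variance approach 0 and 1, the moments of the target. The scalability work David surveyed tackles exactly what this naive loop does badly: each step here touches the full log-density, which is expensive when that density involves a huge dataset.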
If you are interested in Bayesian statistics, then this is a must-watch video, and has the following key takeaways:
Speakers: Suvrit Sra and Stefanie Jegelka
This tutorial gives an introduction to the theory of negative dependence, which can impact all aspects of machine learning, both supervised and unsupervised. It is a rich mathematical toolbox that aids in tasks like anomaly detection, information maximization, experimental design, validation of black-box systems, architecture learning, fast MCMC sampling, and much more.
The speakers have highlighted the rich variety of mathematical ideas behind the theory of negative dependence. In addition, the following topics have been covered:
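The simplest example of negative dependence is sampling without replacement: picking one item makes every other item less likely to be picked, so the inclusion indicators are negatively correlated. A quick empirical check (an illustrative sketch, not from the tutorial):

```python
import random

def inclusion_covariance(n, k, trials, seed=0):
    """Empirical covariance of the indicators "item 0 chosen" and
    "item 1 chosen" when drawing k of n items without replacement.
    Theory: Cov = k(k-1)/(n(n-1)) - (k/n)^2, which is negative."""
    rng = random.Random(seed)
    a = b = ab = 0
    for _ in range(trials):
        chosen = set(rng.sample(range(n), k))
        a += 0 in chosen
        b += 1 in chosen
        ab += (0 in chosen) and (1 in chosen)
    t = trials
    return ab / t - (a / t) * (b / t)

cov = inclusion_covariance(n=10, k=3, trials=50000)
print(cov)   # theory: 3*2/(10*9) - (3/10)**2 = -7/300
```

Tools like determinantal point processes generalize this repulsion effect, which is what makes them useful for selecting diverse subsets in the applications the speakers listed.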
That was quite an intense collection! This year’s conference was bigger than ever before, with more papers, a bigger venue and a far more widespread audience from around the world. We ended up learning quite a lot while reviewing these videos – so we recommend you do the same. 🙂
Which was your favorite session from NeurIPS 2018? Connect with us in the comments section below and feel free to ask any questions you might have on the topics that were covered.