Top Highlights from the Amazing Machine Learning Tutorials Presented at NeurIPS (NIPS) 2018

Aishwarya Singh Last Updated : 19 Jul, 2020

Introduction

NeurIPS (formerly called NIPS – Neural Information Processing Systems) is one of the premier machine learning conferences in the world. Researchers from across the globe present their latest projects in this field, but getting past the review screening? Not so easy. Thousands of papers are submitted every year, and only a fraction of them make it into the final conference.

The audience tickets for NeurIPS 2018 sold out within 12 minutes of the portal being opened! That might give you an inkling of how popular this annual conference is. For those who couldn’t be there – we are thrilled to present a quick summary of the best tutorials from NeurIPS 2018!

This year’s edition was held in Montreal, Canada, from 2nd to 8th December. A variety of topics were showcased – from fairness and transparency in AI to visualizing deep learning models. You can check out the full schedule here.

There are hours and hours of videos, so our team went through all of them to bring you the best in the form of this article.

Note: We have embedded the videos for most sessions as well. A couple of videos are not being embedded due to some technical issue with FB’s video platform, and we have provided their direct links. While the summary is a good starting point, we encourage everyone to watch the videos as well – this is a great chance to learn from the top minds in this field.

Automatic Machine Learning
Common Pitfalls for Studying the Human Side of Machine Learning
Statistical Learning Theory: a Hitchhiker’s Guide
Unsupervised Deep Learning
Adversarial Robustness, Theory and Practice
Visualization for Machine Learning
Counterfactual Inference
Scalable Bayesian Inference
Negative Dependence, Stable Polynomials and All That

Automatic Machine Learning

Speakers: Frank Hutter and Joaquin Vanschoren

Tutorial summary

Building an end-to-end machine learning model involves a number of steps, such as preprocessing data, creating features, selecting a model, and tuning the hyperparameters. Automatic Machine Learning, or AutoML, aims to automate these processes – this tutorial covers methods underlying the state-of-the-art in AutoML. Quite a relevant topic in today’s environment.

Frank Hutter kicked off the tutorial by discussing the various applications of deep learning and an expert’s role in building a successful model. That role can potentially be taken over by an AutoML service that learns which features, architecture and parameters to use from the raw data we provide. Following this basic introduction to AutoML, Frank spoke about the types of hyperparameters and modern approaches to hyperparameter optimization. This part is broadly divided into three sub-topics:

AutoML as hyperparameter optimization
Blackbox optimization: Discusses approaches for blackbox optimization like grid search, random search and Bayesian optimization.
Beyond Blackbox optimization: Covers three main approaches – hyperparameter gradient descent, extrapolation of learning curves and multi-fidelity optimization. Meta-learning is also covered under this heading.
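
As a rough illustration of the blackbox view described above, here is a minimal sketch of random search over two hyperparameters. The objective `validation_loss`, the hyperparameter names and their ranges are hypothetical stand-ins and are not taken from the tutorial; in a real setting the objective would train a model and return its validation loss.

```python
import math
import random

def validation_loss(learning_rate, num_layers):
    """Hypothetical stand-in for 'train a model, return its validation loss'.
    By construction the minimum is at learning_rate=0.01, num_layers=3."""
    return (math.log10(learning_rate) + 2.0) ** 2 + 0.1 * (num_layers - 3) ** 2

def random_search(objective, num_trials, seed=0):
    """Blackbox optimization: sample configurations at random, keep the best."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(num_trials):
        config = {
            # Sample the learning rate log-uniformly between 1e-5 and 1.
            "learning_rate": 10 ** rng.uniform(-5, 0),
            "num_layers": rng.randint(1, 8),
        }
        loss = objective(**config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss

best_config, best_loss = random_search(validation_loss, num_trials=50)
print(best_config, best_loss)
```

The optimizer only ever calls the objective and reads back a number, which is what makes it "blackbox". Grid search would replace the random sampling with a fixed lattice of values, and Bayesian optimization would use the losses seen so far to choose the next configuration.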

The next topic Frank covered was Neural Architecture Search (NAS), which is again divided into three parts – Search Space Design, Blackbox optimization and Beyond Blackbox optimization.

Search Space Design: Includes basic neural architecture search spaces such as chain structured search spaces and cell structured search spaces.
Blackbox optimization for neural architecture search (NAS): Frank covers NAS with reinforcement learning and Bayesian optimization as a part of this topic.
Beyond Blackbox optimization: Discussion of the four main approaches: weight inheritance and network morphisms, weight sharing and one-shot models, multi-fidelity optimization, and meta-learning.

After a short Q&A session with Frank, Joaquin Vanschoren took over for the second half of the tutorial. His focus was mainly on Meta-Learning. He spoke about various approaches, configuration space design, surrogate model transfer and warm-started multi-task learning. Joaquin further discussed the Learning Pipeline followed by transfer learning and transfer features. He also spent some time discussing topics like gradient descent and LSTM meta-learner.

Common Pitfalls for Studying the Human Side of Machine Learning

Speakers: Deirdre Mulligan, Nitin Kohli, Joshua A. Kroll

Tutorial summary:

Machine learning is being used in almost every domain in the industry and researchers all over the world are examining how it can affect people and society. Ethics, essentially, should be at the heart of every ML project. The main idea behind this tutorial was to put forward some common misconceptions machine learning researchers and practitioners hold when thinking about certain topics.

This is a video we implore everyone to watch!

Terms like fairness, accountability, transparency and interpretability are often used with different meanings, which can cause unnecessary misunderstandings. This tutorial examined how the same words can be used to refer to different ideas. The presenters also showcased a few case studies where these learnings are being applied to ML problems.

The session started with focusing on the necessity of having certain definitions for the terms we use and how these terms carry different meanings for different people. We were introduced to the term ‘sociotechnical’. To explain the concept better, Nitin Kohli took the term ‘Fairness’ and showcased how it can come across differently for statisticians, computer scientists, or lawyers.

The next example, quite naturally, was the word ‘Transparency’. The presenters picked some very common examples to show how the meaning of the word for someone working in the government sector differs from its meaning for a machine learning engineer. After this, Nitin described the word ‘Explanation’ and its various types with suitable instances.

They also spoke about the terms ‘accountability’ and ‘interpretability’ during the session and the Q&A that followed was pretty informative as well.

Statistical Learning Theory – a Hitchhiker’s Guide

Speakers: John Shawe-Taylor, Omar Rivasplata

Watch the video for this tutorial here.

Tutorial summary:

John Shawe-Taylor initiated the tutorial by giving an introduction to statistical learning theory (SLT) followed by some basic definitions and notations for terms used quite frequently, such as generalization gap, upper bound, etc.

Both speakers provided a broad outline of the session where they listed down some important topics:

First generation of SLT
- Worst-case uniform bounds
- Vapnik-Chervonenkis characterization
Second generation SLT
- Hypothesis dependent complexity
- SRM, Margin, PAC-Bayes framework
Next generation SLT?

After the audience had been familiarized with the important terminology, Omar Rivasplata took over the baton to discuss First Generation SLT. He started by talking about the building blocks for a single function and then explained the finite function class and the infinite function class. In the next few slides, Omar discussed the VC (Vapnik-Chervonenkis) bounds along with the limitations of the VC framework.
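
For reference, the textbook form of a worst-case uniform bound for a finite function class is shown below. It is the standard result obtained from Hoeffding's inequality and a union bound, not a transcription of the speakers' slides. The notation is our own assumption: H is the finite function class, R(h) the true risk, \hat{R}_m(h) the empirical risk on m i.i.d. samples, the loss takes values in [0, 1], and δ is the allowed failure probability.

```latex
\Pr\left[\; \forall h \in \mathcal{H}:\quad
  R(h) \;\le\; \hat{R}_m(h) + \sqrt{\frac{\ln|\mathcal{H}| + \ln(1/\delta)}{2m}}
\;\right] \;\ge\; 1 - \delta
```

The generalization gap R(h) - \hat{R}_m(h) is controlled for every function in the class at once, which is why the bound is called uniform, and it depends on the class only through its size |H|.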

Once the audience had been familiarized with the first generation of SLT in the first half of the talk, John gave an overview of what comes after that – the second generation of SLT. He elaborated on the different ways to make the bound function dependent and the techniques that can be used for detecting a benign distribution.

We also saw the Three Proof Techniques – Covering numbers, Rademacher Complexity, and PAC Bayes Analysis. He compared the PAC-Bayes bounds with Bayesian Learning. This part of the talk is really interesting – do watch the video to gain a deeper insight into it.

The last section of the tutorial is a discussion of Next Generation SLT, where the speakers talk about the performance of neural networks and stability. Throughout the tutorial, the speakers explain all the concepts using plots and mathematical equations, which makes the topics crystal clear.

Unsupervised Deep Learning

Speakers: Alex Graves and Marc’Aurelio Ranzato

Tutorial summary

A topic most of you will be curious to explore more! This tutorial is divided into two parts:

Part 1 is taught by Alex Graves
Part 2 by Marc’Aurelio Ranzato.

In Part 1, Alex explained why we need unsupervised learning in the first place. Why can’t we just provide the true labels for training the model? There are mainly three reasons for that:

Targets can be difficult to obtain
Unsupervised learning feels more human
We want rapid generalisation on new tasks and situations

Targets in supervised learning contain far less information than the input data. With supervised learning, we limit the model to learning only a few bits of information. Unsupervised learning, on the other hand, gives us an essentially unlimited supply of information to learn from. So instead of learning the data points, the model learns the dataset.

Unsupervised learning gives us more of a signal to learn from, but the learning objective is not entirely clear. Autoregressive neural networks can be used for density modelling, which helps the model learn information from the data. Methods such as auto-encoding and predictive coding can yield useful latent representations.
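
To make the idea of autoregressive density modelling concrete, the standard factorization is written out below. It is the chain rule of probability applied to a sequence x = (x_1, ..., x_T); the symbols and the parameter vector θ are our notation rather than the speaker's.

```latex
p_\theta(x) = \prod_{t=1}^{T} p_\theta\left(x_t \mid x_1, \dots, x_{t-1}\right),
\qquad
\log p_\theta(x) = \sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_1, \dots, x_{t-1}\right)
```

An autoregressive network outputs each conditional distribution in turn, and training maximizes the summed log-likelihood over the dataset, so every element of every input contributes a learning signal.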

In Part 2, Marc discussed various applications of unsupervised learning which are based on other frameworks and principles. He explained how to learn representations and samples and how to map between two domains. Some of the tips for learning representations are:

Always look at the data before designing the model
PCA and k-means are very often a strong baseline
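
To show how little code such a baseline needs, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It covers only the k-means half of the tip, and the toy data, the function name `kmeans` and its parameters are made up for illustration; in practice a library implementation would normally be used.

```python
import random

def kmeans(points, k, num_iters=20, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    clusters = []
    for _ in range(num_iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
    return centroids, clusters

# Toy data (made up): two well-separated blobs in 2-D.
data_rng = random.Random(1)
blob_a = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(50)]
blob_b = [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(50)]
centroids, clusters = kmeans(blob_a + blob_b, k=2)
print(sorted(centroids))
```

On data this clean the two centroids land close to the blob centres, which is the kind of sanity check a simple baseline provides before a deeper model is tried.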

He also mentioned how to extract features in NLP using unsupervised learning. Some of the applications of learning how to map between two domains are:

Making analogies in vision
Leverage lots of unlabeled data in machine translation
An AI agent has to be able to perform analogies to quickly adapt to a new environment so learning this mapping is helpful

Unsupervised learning has tons of sub-areas like feature learning, learning to align domains, learning to generate samples, etc. The biggest challenges with unsupervised learning are:

Which metric to choose and what should be the defined task?
Generality and efficiency of current algorithms
Integrating unsupervised learning with other learning components

Adversarial Robustness, Theory and Practice

Speakers: Zico Kolter and Aleksander Mądry

Tutorial summary

In the tutorial, both researchers spoke about how machine learning predictions are mostly accurate, but at the same time, brittle as well. Intrigued? Adding just a little noise to the data can change the predictions drastically, resulting in a drop in performance. Trying data augmentation also does not help much in improving the performance. Some of the problems that the brittleness of machine learning can cause are:

Security
Safety
ML alignment

Zico and Aleksander proposed three commandments in order to make our machine learning model more secure:

Don’t train on the data which you don’t trust as it may lead to data poisoning
Never let anyone use your model or observe its output unless you completely trust them
Don’t fully trust the predictions of your model because of adversarial examples

They talked about adversarial examples and verification, and how to train adversarially robust models. Zico further expounded on whether robust deep networks overfit. Training adversarially robust models requires more data than standard training. Data augmentation can be used to make the model more robust, and adversarial training can be seen as an extreme form of data augmentation, since we train on the most confusing version of the given training set.
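
The brittleness described in this section can be reproduced on a toy linear classifier. The sketch below is our own illustration, with hypothetical weights and input, and is not code from the tutorial. For a linear model, shifting every feature by at most epsilon against the sign of its weight is the worst-case perturbation under that budget; gradient-sign attacks on deep networks follow the same idea using the gradient of the loss.

```python
def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def predict(weights, bias, x):
    """Linear classifier: returns +1 or -1."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1

def worst_case_perturbation(weights, x, label, epsilon):
    """Move every feature by at most epsilon in the direction that lowers
    the margin label * (w . x + b) the most (an L-infinity bounded attack)."""
    return [xi - epsilon * label * sign(w) for w, xi in zip(weights, x)]

# Hypothetical model and input, chosen only for illustration.
weights = [0.5, -0.5, 0.5, -0.5]
bias = 0.0
x = [0.2, -0.1, 0.1, -0.2]  # score is 0.3, so the clean prediction is +1
label = 1

x_adv = worst_case_perturbation(weights, x, label, epsilon=0.2)
print(predict(weights, bias, x), predict(weights, bias, x_adv))
```

No feature moves by more than 0.2, yet the score drops by epsilon times the sum of the absolute weights, which is enough to flip the prediction. Adversarial training would add such perturbed inputs, with their original labels, to the training set.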

Some of the key points for making a model robust are:

In standard training, all correlation is a good correlation
But, if we want robustness, weakly correlated features must be avoided

Finally, to summarize adversarial robustness:

Optimization during training is more difficult and the model needs to be larger
More training data is required
The standard accuracy also might decrease while using adversarially robust models

Apart from this, the advantages of using adversarial robust models are massive. The model becomes more semantically meaningful. We will be able to rely on it far more. And it leads to machine learning that is not only safe and secure, but also better. Sounds like a good bet to us!

Visualization for Machine Learning

Speakers: Fernanda Viégas and Martin Wattenberg

Watch the full video for the session here

Tutorial summary

Visualization is a topic all of us can relate to at some level. Who among us hasn’t done a thorough EDA before?

Fernanda Viégas and Martin Wattenberg covered one of the most interesting and fundamental topics of machine learning – visualization. They first spoke about what data visualization is, how it works and what some of the best practices for it are. The talk then focused on how visualization has been applied to machine learning to date. The special case of high-dimensional data was also covered in this tutorial.

Data visualization is good for almost every field and some of its applications include:

Data exploration
Scientific insights
Communication
Education

Using colors for visualization makes it more interpretable and even faster. Visualization makes calculations easier and less tedious (and who doesn’t appreciate that?!). Some of the examples where it helps in calculation are:

Average calculation and comparison
Weighted average

The tutorial also dove into the interpretability and model inspection facets of an ML project. Visualizing the different layers of a convolutional neural network (CNN) helps us understand how the network classifies images (this also helps when the model is not performing well).

We can interpret the model layer by layer and finally conclude where it is going wrong. They recommend using Jupyter notebooks for visualization, together with libraries like matplotlib and plotly, which provide ready-made functions for most visualizations.

Scalable Bayesian Inference

Speaker: David Dunson

Tutorial summary

Bayesian learning as a topic has fascinated us for a long time.

The objective of this session was to motivate people to work more on Bayesian methods as these methods offer an attractive general approach for modeling complex data. David Dunson gave an overview of the state-of-the-art approaches for analyzing huge datasets using Bayesian statistical methods.

David explained how the Markov Chain Monte Carlo (MCMC) algorithm is becoming more and more scalable and faster thanks to the emerging rich and practical literature on the subject. Apart from that, he put quite an emphasis on tweaking the Bayesian paradigm to be more robust with respect to Big Data and scaling of Bayes to high-dimensional data (no. of features > no. of samples), which in itself is quite a hot topic.
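
For readers new to MCMC, the sketch below shows a plain random-walk Metropolis sampler for the mean of a Gaussian. It is a textbook baseline under our own assumptions (unit-variance likelihood, a wide Gaussian prior, synthetic data), and it is not one of the scalable algorithms from the talk. It does show where the cost comes from: every step evaluates the likelihood on the full dataset.

```python
import math
import random

def log_posterior(mu, data, prior_sd=10.0):
    """Unnormalized log posterior for the mean of a unit-variance Gaussian
    with a zero-mean Gaussian prior on mu."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_likelihood = -0.5 * sum((x - mu) ** 2 for x in data)
    return log_prior + log_likelihood

def metropolis(data, num_samples, step_sd=0.3, seed=0):
    """Random-walk Metropolis: propose a nearby value, accept or reject it."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    samples = []
    for _ in range(num_samples):
        proposal = mu + rng.gauss(0.0, step_sd)
        candidate = log_posterior(proposal, data)  # full pass over the data
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            mu, current = proposal, candidate
        samples.append(mu)
    return samples

# Synthetic data (made up): 100 draws from a Gaussian with mean 2.
data_rng = random.Random(42)
data = [data_rng.gauss(2.0, 1.0) for _ in range(100)]

samples = metropolis(data, num_samples=5000)
kept = samples[1000:]  # discard the first 1000 draws as burn-in
posterior_mean = sum(kept) / len(kept)
print(posterior_mean, sum(data) / len(data))
```

With a prior this wide the posterior mean sits very close to the sample mean, so the two printed numbers should nearly agree. The per-step pass over all observations is the bottleneck that grows with dataset size.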

If you are interested in Bayesian statistics, then this is a must-watch video, and has the following key takeaways:

Bayes is scalable
In big data and high-dimensionality problems, we have to tweak and modify Bayesian algorithms
We have to carefully explore how to exploit parallel processing for Bayes methods & accurate approximations to reduce bottlenecks
These Bayes methods can have improved computational performance & robustness

Negative Dependence, Stable Polynomials and All That

Speakers: Suvrit Sra and Stefanie Jegelka

Tutorial summary

This tutorial gives an introduction to the theory of negative dependence. This can impact all aspects of machine learning, including both supervised and unsupervised learning. It is a rich mathematical toolbox which aids in tasks like anomaly detection, information maximization, experimental design, validation of black-box systems, architecture learning, fast MCMC sampling, and much more.

The speakers have highlighted the rich variety of mathematical ideas behind the theory of negative dependence. In addition, the following topics have been covered:

Strongly Rayleigh (SR) Measures: Determinantal Point Processes, Volume Sampling, Dual Volume Sampling
Applications in ML that benefit from Negative Dependence: Active learning, Interactive learning, Recommender Systems, Adversarial models, etc.
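
As a concrete instance of negative dependence, the standard definition of a determinantal point process (in its L-ensemble form) is given below. The notation is our own assumption: L is a symmetric positive semidefinite kernel matrix over the ground set, L_S its submatrix indexed by the subset S, and K = L(L + I)^{-1} the marginal kernel.

```latex
\Pr(Y = S) = \frac{\det(L_S)}{\det(L + I)},
\qquad
\Pr(i \in Y,\, j \in Y) = K_{ii} K_{jj} - K_{ij}^{2}
\;\le\; \Pr(i \in Y)\,\Pr(j \in Y)
```

The inequality on the right says that including item i makes item j less likely to be included, and the effect is stronger the more similar the two items are (the larger K_ij is in magnitude). This repulsion is what makes such measures useful for selecting diverse subsets.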

End Notes

That was quite an intense collection! This year’s conference was bigger than ever before, with more papers, a bigger venue and a far more widespread audience from around the world. We ended up learning quite a lot while reviewing these videos – so we recommend you do the same. 🙂

Which was your favorite session from NeurIPS 2018? Connect with us in the comments section below and feel free to ask any questions you might have on the topics that were covered.
