Stanford has long been considered one of the best universities in terms of teaching, quality of faculty and course content. With the recent boom in the machine learning field, Stanford’s ML courses have generated a lot of interest (you can find the lecture videos on YouTube if you haven’t watched them already).
Each year, Stanford releases a list of projects that its students have worked on, and it has recently released the list of course projects for its Natural Language Processing (NLP) course. And wow, is it impressive.
Students were given two options for the project – either choose their own topic (called the ‘Custom Project’) or take part in the ‘Default Project’, which was building Question Answering models based on the SQuAD challenge. “Stanford Question Answering Dataset (SQuAD) is a new reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage.” Remember we covered this on AVBytes – Alibaba and Microsoft’s models had the best scores on this dataset, though they have since been eclipsed.
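To make the idea of a "span" answer concrete, here is a minimal Python sketch. The passage, question and helper function are made up for illustration and are not taken from the course projects; only the `text` and `answer_start` field names follow the public SQuAD JSON format, where an answer is stored as its text plus a character offset into the passage.

```python
# Illustrative only: a hand-written passage and question, not real SQuAD data.
context = "Stanford University was founded in 1885 by Leland and Jane Stanford."
question = "When was Stanford University founded?"

# SQuAD stores each answer as its text plus a character offset into the passage.
answer_text = "1885"
answer = {"text": answer_text, "answer_start": context.find(answer_text)}


def span_from_answer(passage: str, ans: dict) -> str:
    """Recover the answer by slicing the passage (hypothetical helper)."""
    start = ans["answer_start"]
    end = start + len(ans["text"])
    return passage[start:end]


recovered = span_from_answer(context, answer)
print(question, "->", recovered)
```

A question answering model for this task therefore does not generate free text. It predicts where the answer starts and ends inside the passage.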
Some of the papers submitted will undoubtedly impress you. The winner of the ‘Custom Project’ did a project on Speech Synthesis in which the text-to-speech synthesis model produces audio from a sequence of input characters. They also demonstrated how to build a convolutional sequence to sequence model with a natural voice and pronunciation from scratch in well under $75!
There was a project on machine translation of the Eskimo language which also won a prize. The students built a sequence to sequence neural machine translation model that translates the Eskimo language into English. Incredible!
Another project that caught the eye was ‘Generating SQL queries from Natural Language’. The students developed a model that generates a SQL query from a natural language question.
There was even a project on creating memes – ‘Dank Learning: Generating Memes using Deep Neural Networks’, which produces a humorous caption given any image.
Check out Stanford’s page here which lists each project.
I like that Stanford opens these projects up to the community so all aspiring data scientists and practitioners can read about the different (and sometimes unique) approaches taken. I highly recommend going through this list and choosing the papers you find interesting. This will be a great learning experience for you in the NLP field.
One of my favourites, and this will undoubtedly interest data scientists, was based on generating SQL queries from natural language. Check it out!
Which project caught your eye? Let me know in the comments section below.
What a great find!
I think NL to SPARQL would be the much more interesting approach, since the latter can easily be translated into SQL or any other data retrieval method.