In the world of Deep Learning, applications are constrained by computing resources (GPUs and CPUs), datasets, and algorithms. These problems are even more pronounced in the language translation domain due to the non-availability of parallel corpora between common and rare languages. In this hack session, we will look at the current state of the art in Unsupervised Language Translation, which does not require a parallel corpus of translations and builds on previous work in Machine Translation, Statistical Machine Translation, and unsupervised embeddings.
We will also see how the performance of Unsupervised Machine Translation can be significantly improved and brought close to that of current state-of-the-art Supervised Machine Translation with only a few thousand parallel translations.
Structure of the Hack Session
- Then and Now – History of Machine Translation from SMT to NMT
- Types of Machine Translation – Supervised and Unsupervised
- Brief on GNMT – State-of-the-Art in Supervised MT
- Detailed discussion on how Unsupervised Machine Translation works
- Brief on Evaluation Metrics for Machine Translation
- Implementing Unsupervised MT for Language Translation without a Parallel Corpus
- Implementing Unsupervised MT for Language Translation with a small Parallel Corpus
- Conclusion – Next possible steps for research in Unsupervised MT
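As a small taste of the evaluation-metrics discussion, here is a minimal from-scratch sketch of sentence-level BLEU, the most common MT metric. The `bleu` helper, its add-one smoothing, and the toy sentences are illustrative assumptions for this blurb, not the session's actual implementation:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of smoothed n-gram
    precisions, scaled by a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # clipped overlap: each candidate n-gram counts at most as
        # often as it appears in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # add-one smoothing so one missing n-gram order doesn't zero BLEU
        precisions.append((overlap + 1) / (total + 1))
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(candidate) >= len(reference) else \
        math.exp(1 - len(reference) / len(candidate))
    return bp * geo_mean

ref = "the cat sat on the mat".split()
cand = "the cat is on the mat".split()
print(round(bleu(cand, ref), 3))  # → 0.489
```

A perfect translation scores 1.0; substituting one word already costs roughly half the score here, which is why BLEU is usually reported at the corpus level rather than per sentence.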
Tools Used
- Python 3
- PyTorch
HACKERS
Neeraj Singh Sarwan
He is a perpetual, quick learner, keen to explore the realm of data analytics and science. He is deeply excited about the times we live in and the rate at which data is being generated and transformed into an asset. He is well versed in several tools for dealing with data and is in the process of learning the other tools and knowledge required to exploit it.
Duration of Hack-Session: 1 hour