Companies have invested heavily in Big Data over the last few years. This rise in the use of big data analytics has created high demand for skilled big data professionals. While the usefulness of this spending is still debated, there is a clear increase in Big Data jobs, as a quick search on Indeed shows.
Given the sharp increase in demand, big data has become a lucrative area in which to upskill. However, if you are someone like me, who needs a good overview and an understanding of the practical benefits before learning something more formally, you will probably struggle to find structured resources, as I did some time back.
There are a lot of technologies and terms associated with Big Data, which can be an additional roadblock when you are getting started. Names like Hadoop, MapReduce, Spark, MongoDB and Hive take some time to get used to! After the overwhelming response to my previous article on Top YouTube Videos on Machine Learning, Deep Learning, Neural Network, and given the lack of structured resources, the answer was simple! Here’s another YouTube special for Big Data aspirants: basically my take on immersing yourself in Big Data!
Disclaimer: We DO NOT intend to promote any brand or service through this article. The videos listed in this article were chosen solely for their relevance and usefulness to the audience.
I have written this article with Big Data beginners in mind. Hence, it is best suited for candidates keen to start a career in big data analytics. If you are already an experienced big data professional, this article might not be what you are looking for! However, you can still draw ‘inspiration’ from the TED Talks listed below.
The article is structured to give a complete overview of the various technologies used in Big Data Analytics. The TED Talks at the beginning are meant to add a pinch of inspiration to your learning path. These talks invite you to imagine an exciting world driven by numbers, analytics and big data technologies.
Duration: 11:30 mins
Summary: In this short video, Hilary talks about the rise of big data and how it is going to affect our work environment. She also highlights the small but significant changes that have come with big data, in CPUs, data and algorithms. Later, she examines the profile of a data scientist in her own style and highlights the applications of big data in our day-to-day lives.
Duration: 22:00 mins
Summary: Dr. Kirk Borne begins by talking about his journey to becoming a data scientist. Later, he covers some of the best ideas behind data mining and how they can be applied to our daily lives. He also talks about the ‘small world phenomenon’ and ‘six degrees of separation’. Finally, he reveals some surprising statistics about big data which suggest that the future world will be driven by data.
Duration: 16:00 mins
Summary: Kenneth places great emphasis on using data available at the granular level. Every byte of data has something to reveal; all it requires is an engineer to discover it. He believes that, with the amount of information now available, we can answer questions that were difficult even to think of earlier. Data has made us more powerful, and it can be our greatest power if we use it intuitively.
Duration: 12:29 mins
Summary: Susan believes ‘We are not the passive consumers of data and technology. Rather, we shape the data and make meaning from it’. In this short video, she shares her perspective on the rise of big data and the different ways of putting data to its best use. Data doesn’t create meaning, we do. Data offers us a vast ocean of information which has to be churned to extract useful insights.
Duration: 11:53 mins
Summary: The title says it all. The speaker uses statistics and visualization to infer the worst place to park in NYC. He makes sure not to miss out on any important information by capturing all the important variables in his graphical representations. If you have ever wanted to see data put to real-world use, you shouldn’t miss it.
I would highly recommend these YouTube videos to people who are new to big data analytics. Watching these quick videos (~3 mins each) will give you a clear overview of the different big data technologies and the relations between them.
1. What is HBase? Duration – 3 mins
2. What is Hadoop? Duration – 3:12 mins
3. What is MapReduce? Duration – 2:39 mins
4. What is HDFS? Duration – 2:51 mins
5. What is Flume? Duration – 2:59 mins
6. What is PIG? Duration – 3:01 mins
7. What is Hive? Duration – 2:52 mins
8. What is Avro? Duration – 3:00 mins
9. What is Oozie? Duration – 2:28 mins
10. What is Zookeeper? Duration – 3:26 mins
Duration – 55:32 mins
Summary: As the name suggests, this video covers Hadoop and related concepts in less than an hour. The speaker begins with a quick introduction to Hadoop, then explains the Hadoop ecosystem and distributions, and HDFS in detail. Later, multiple components of Hadoop such as MapReduce, YARN and Tez are explained using some interesting stories. Finally, he winds up this crash course by revealing some of the less popular but very useful ways of accessing data.
Duration – 32:03 mins
Summary: This is a complete tutorial for learning the basics of MapReduce. The tutorial series is divided into 5 parts, each of which covers a specific module of MapReduce. This introductory video provides a detailed overview of its importance, related job opportunities, applications and usage. As you work through the following parts, you will cover the essential fundamentals of MapReduce. Do check the Up Next section while you are there!
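If you would like a feel for the idea before watching, here is a minimal sketch of the map, shuffle and reduce steps as a word count in plain Python. It is not taken from the video and does not use Hadoop at all; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample sentences are my own illustrative assumptions.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key (a real framework does this between map and reduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence on an in-memory list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, chosen only for illustration.
counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

In a real cluster the map and reduce functions run in parallel on many machines and the framework handles the grouping; this sketch only shows the shape of the computation on a single machine.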
Duration – 40:25 mins
Summary: This tutorial teaches you how to integrate Hadoop with R. The speaker follows a step-by-step process for setting up Hadoop with R. Concepts like RHadoop, RHive and various related R libraries are discussed. He also discusses the varied uses of R and how R programming has evolved over the years.
Duration – 41:14 mins
Summary: The speaker beautifully explains the concept of deep learning using Hadoop. Deep learning is one of the most talked about topics in the data science community. Scientists and researchers are working hard to discover new patterns using deep learning. The concept is explained in a simple manner in this video. Topics like deep belief networks and implementation on Hadoop / YARN are also discussed.
Duration – 1:15:06 hours
Summary: I have found very few videos on Apache Cassandra, but this one makes up for all of them. Here’s a complete introduction to Apache Cassandra from scratch. The rise of Apache Cassandra is catching the eye of companies and professionals across the world. In this video, the speaker explains the algorithms used, its essential features and benefits, and the concept / cause behind launching Apache Cassandra ~6 years back.
Duration – 30:56 mins
Summary: This is a complete tutorial to learn about Pig. The instructor begins with an overview of Pig, followed by a comparison between Pig and SQL. Since the two are very similar, it makes for an interesting comparison. He also explains how to use Pig Latin. On top of all this, the basic steps of Pig installation are illustrated.
Duration – 1:06:14 hours
Summary: This tutorial aptly justifies its title by teaching Spark and how it can help in shaping the world. It begins with a quick refresher on MapReduce, followed by Spark and the advantages of using this technology. The speaker has beautifully explained these concepts.
Duration – 1:06:19 hours
Summary: Hive is built on top of Hadoop to provide data management, querying and analysis. This tutorial discusses Hive architecture, Hive operations and other related functions. It not only enriches you with theoretical knowledge, but also shows the practical side by demonstrating the same on a terminal.
Duration – 54:51 mins
Summary: This is one of the best videos I have come across on NoSQL databases. You’ll find an introduction to NoSQL databases along with all the other essential knowledge of this concept that you should have. This tutorial covers applications, advantages, disadvantages, compatibility, usage, characteristics and various other essential features of NoSQL. I’d recommend this video to everyone.
Duration – 4:34:47 hours
Summary: If you have ever longed to learn MongoDB, here is the complete resource for you. This tutorial comprehensively covers all aspects of MongoDB and NoSQL databases. Though it appears quite long (> 4 hours), you can watch it in breaks. Prior knowledge of JavaScript would be an advantage for learning MongoDB through this tutorial. It begins with an introduction to NoSQL databases, followed by MongoDB, how to run MongoDB queries, Node.js, advanced data processing and a method to learn MongoDB on cloud-hosted Ubuntu.
Alternate resource: MongoDB course on Udacity
If you have watched the videos listed above, you should be equipped with the essentials of Big Data by now. In this article, I have highlighted the most helpful YouTube videos and TED talks I found on the internet. The videos listed are intended to build your big data basics and make your learning path easier.
If you wish to reap maximum benefit from these videos, I’d strongly suggest making notes and getting your hands dirty while watching them. In case I have missed any important video, feel free to mention it in the comments section below.
Wow! This is very close to what I was looking for. I just need a little clarification, is big data analytics different from business analytics, if yes then how? I'm currently going through statistics videos on khan academy. I hope that helps too?
I'm a learner too. As far as I understand, Business Analytics need not necessarily involve huge amounts of data: you could simply be using the data from your factory or sales figures of a (datawise) small- or medium-sized company, to increase productivity or project profits and things like that. And all of this data could potentially be stored in an ordinary MySQL DB, for example, and would not require the use of techniques like MapReduce to decrease computational resources. "BigData" refers to the phenomenon of companies like Google, Amazon and Facebook which have access to Petabytes of new data every day, from which they want to extract patterns (ostensibly to serve their customers better ;) -- in reality, to push more ads into their faces). The sizes of these datasets are extremely large, and the structure (or a lack of one) doesn't resemble simpler ones like factory output. With more companies tapping into the smartphone/tablet boom every day, even smaller players now have access to large data sets (but still relatively small when compared to Google, say), so they look for what are typically called "BigData Technologies", like the ones covered by the videos listed in this blog post.
Plz add a button "Add to reading list" to all your videos.
sorry....add button to articles not to videos. ;)
AV Team, AV is a kaizen tool for me. Wow, what a great collection. It's a dictionary. Hats off to AV team. Warm Regards, Kumar Chinnakali