6 Open Source Data Science Projects to Try at Home!

Pranav Dar Last Updated : 04 May, 2020
7 min read

Overview

  • Work on your data science skills using these open source projects
  • These open-source data science projects cover a broad range of topics, from computer vision to web analytics

 

Introduction

Have you found learning at home difficult? Most of us are in the same boat – there are too many things to juggle during these tumultuous times and learning has, contrary to our initial expectations, taken a back seat.

So how can we get back on track? How can we combine our data science learning with practical experience?

One key thing that has helped me immensely is picking an open-source data science project and running with it. This not only helps me understand the key areas I need to improve on but also shows me the way forward.

And these projects aren’t your run-of-the-mill data science projects. These are specific projects that tackle a certain data science sub-field, such as computer vision, web analytics, and so on. The project could be a dataset, a state-of-the-art library that has brought the data science field forward, or even an open-source analytics tool.

So, pick a project that intrigues you and start working on it today!

You can check out our entire archive of open source data science projects here.

 

6 Open-Source Data Science Projects to Try During this Lockdown Period

 

Open Source Computer Vision Projects

Thanks to the power of PyTorch, we’re seeing a slew of awesome use cases in the computer vision space this year. Here, I have picked out a few outstanding computer vision projects you’ll love exploring and diving into.

And if you’re new to this field and are looking to get started, then check out these resources:

 

Convert Any Image into a 3D Photo

This is an exquisite use case of computer vision. Converting an image into a 3-dimensional photo once required sophisticated, in-depth knowledge of tools such as Photoshop. Now, thanks to advances in deep learning and computer vision, we can perform this transformation in just a few lines of code!

This project, open-sourced on GitHub, does exactly that. It takes a single RGB-D input image and converts it into a 3D photo. If you prefer deep learning terms, then this is “a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view”.

Check out an example of what you can do using this framework:

Pretty awesome, right? This project, as you might have guessed already, has been built using PyTorch, a framework you should really start getting familiar with.

 

Transform an Image into a Cartoon Illustration

This is a sweet side project to work on if you don’t have a lot of time on your hands. It does what it says on the box – you give the model an input image, and it’ll transform that into a cartoon version:

[Image: photo-to-cartoon example]

Can you guess which computer vision concept is behind this project? Yes – Generative Adversarial Networks (GANs). I am truly amazed at the rapid advancements we’ve seen in GANs since they were introduced to the community in 2014. From CycleGANs to StarGANs, there’s no shortage of frameworks you can pick up and work on.

The developers behind this photo-to-cartoon project have open-sourced a pretrained model to help you quickly load and execute this on your machine. I have seen a few attempts at this before but this is the most realistic transformation I’ve come across.

Here are a few resources to help you understand GANs:

 

One-Shot Multi-Object Tracking

Object detection frameworks have seen remarkable progress in recent years. We have gone from generating simple bounding boxes on static images to tracking dynamic objects in videos. That’s the power of computer vision.

However, progress in uniting the concepts of object detection and re-identification has been slow (to say the least!). In this fascinating study, the researchers present a simple baseline to address this gap using one-shot multi-object tracking.

Check out their model in action:

[Image: multi-object tracking example]

The baseline model they have open-sourced outperforms the state of the art on public datasets while running at 30 fps. You can find both the code and the research paper at the link I have mentioned above.

I recommend going through the below tutorials if you’re looking to learn object detection:

 

Other Awesome Open Source Data Science Projects

I have curated a list of miscellaneous open source data science projects here, from audio generation to sports analytics. Have a crack at your favorite and enjoy the learning experience!

 

OpenAI’s Jukebox: A Generative Model for Music

I clicked on this project as soon as I saw OpenAI in the headline. I’m a big fan of their work, and I appreciate their stance on open-sourcing the major developments to the general data science community. Who doesn’t love GPT-2?

Jukebox, as music fans will intuitively understand, is a neural network model that generates music with singing in the raw audio domain. OpenAI has open-sourced the model weights and code, along with a tool to explore the generated samples.

Here’s how Jukebox works – we provide the genre, artist, and lyrics as input, and the neural network gives us a new music sample produced from scratch. The range of music Jukebox can generate is staggering in its scope. This is a fascinating project to work on!

You can see (and hear) Jukebox in action on OpenAI’s site. And you can also check out Analytics Vidhya’s articles on working with audio data:

 

ShyNet – Privacy-Friendly and Cookie-Free Web Analytics

Do you use web analytics tools like Google Analytics to track your site’s performance? One issue with these tools is that your analytics data sits with a third party, which leaves little privacy for your organization. Additionally, you might need to fork out some money if you want the premium features. Not ideal for everyone, then.

These are the key gaps ShyNet aims to bridge. Here’s how the developers put it:

“You host it yourself, so the data is yours. It works without cookies, so you don’t need any intrusive cookie notices. It collects just enough data to be useful, but not enough to be creepy. It’s open source and intended to be self-hosted. And you may even find the interface easy to use.”

Here’s a sample screenshot of ShyNet’s default homepage:

[Image: ShyNet default homepage screenshot]

And if you’re wondering what key metrics ShyNet can give you, your wait is over:

  • Hits
  • Sessions
  • Page load time
  • Bounce rate
  • Duration
  • Referrers
  • Locations
  • Operating system
  • Browser
  • Geographic location & network
  • Device type
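To get a feel for what a few of these metrics mean, here is a minimal Python sketch that derives hits, sessions, bounce rate, and average duration from raw page hits. The record layout, the sample data, and the 30-minute inactivity cutoff are all assumptions made for illustration; this is not ShyNet's actual schema or implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical page-hit records: (visitor_id, timestamp, page).
# This layout is an assumption for illustration, not ShyNet's real schema.
HITS = [
    ("a", datetime(2020, 5, 4, 10, 0), "/"),
    ("a", datetime(2020, 5, 4, 10, 2), "/blog"),
    ("b", datetime(2020, 5, 4, 10, 5), "/"),
    ("a", datetime(2020, 5, 4, 12, 0), "/"),
]

# Assumed inactivity cutoff that ends a session.
SESSION_GAP = timedelta(minutes=30)


def build_sessions(hits, gap=SESSION_GAP):
    """Group each visitor's hits into sessions split by inactivity gaps."""
    by_visitor = defaultdict(list)
    for visitor, timestamp, page in hits:
        by_visitor[visitor].append((timestamp, page))

    sessions = []
    for events in by_visitor.values():
        events.sort()  # chronological order per visitor
        current = [events[0]]
        for event in events[1:]:
            if event[0] - current[-1][0] > gap:
                sessions.append(current)  # gap too long: close the session
                current = [event]
            else:
                current.append(event)
        sessions.append(current)
    return sessions


def summarize(hits):
    """Compute hits, sessions, bounce rate, and average session duration."""
    sessions = build_sessions(hits)
    # A "bounce" here means a session with exactly one page hit.
    bounces = sum(1 for s in sessions if len(s) == 1)
    durations = [(s[-1][0] - s[0][0]).total_seconds() for s in sessions]
    return {
        "hits": len(hits),
        "sessions": len(sessions),
        "bounce_rate": bounces / len(sessions) if sessions else 0.0,
        "avg_duration_s": sum(durations) / len(durations) if durations else 0.0,
    }


print(summarize(HITS))
```

In this toy data, visitor "a" returns two hours after their first visit, so that later hit counts as a separate single-page session.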

Keep in mind that ShyNet in its current form is great if you have a small or medium-sized business. It might not be ideal if you’re in a big firm. The GitHub repository I have linked above contains a comprehensive run-through of how ShyNet works and how you can start using it.

I recommend going through the below in-depth guide to learn about the world of digital marketing (of which web analytics is a part):

 

Soccer Analytics Handbook

This is a personal favorite. I’m a huge football fan and have been delving into the world of sports analytics for quite some time now. Progress in this field has been far slower than in other industries, but in the last couple of years teams and franchises have been waking up to the power of analytics and data science.

American sports are way ahead of the rest of the world in terms of progress and adaptability, but European football clubs are finally starting to play ball. Liverpool, for example, relies heavily on a data-driven approach from top to bottom, including in planning its recruitment strategy.

So, if you’re a sports fan and want to dabble in the world of analytics, this is the perfect open source project for you.

The GitHub repository contains a plethora of resources to get you started, including:

  • Resources and suggestions for technical skills worth having for work in football analytics
  • A collection of Python tutorials that showcase how to work with football datasets
  • Research papers and articles about state-of-the-art developments in football analytics
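As a taste of the kind of analysis such tutorials tend to cover, here is a minimal Python sketch that computes pass completion rates from match events. The event fields and the sample data are hypothetical and made up for illustration; real football datasets use much richer schemas, and this is not taken from the handbook itself.

```python
from collections import Counter

# Hypothetical match events, invented for illustration only.
EVENTS = [
    {"player": "Player A", "type": "pass", "success": True},
    {"player": "Player A", "type": "pass", "success": False},
    {"player": "Player A", "type": "shot", "success": False},
    {"player": "Player A", "type": "pass", "success": True},
    {"player": "Player B", "type": "pass", "success": True},
    {"player": "Player B", "type": "pass", "success": True},
]


def pass_completion(events):
    """Return each player's share of attempted passes that were completed."""
    attempted = Counter()
    completed = Counter()
    for event in events:
        if event["type"] != "pass":
            continue  # ignore shots and any other event types
        attempted[event["player"]] += 1
        if event["success"]:
            completed[event["player"]] += 1
    return {player: completed[player] / attempted[player] for player in attempted}


for player, rate in pass_completion(EVENTS).items():
    print(f"{player}: {rate:.0%} pass completion")
```

The same counting pattern extends to other per-player rates, such as shots on target, once you know which fields your dataset provides.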

 

End Notes

So, which open-source data science project will you work on in May? I tried to cover a broad range of domains here to give you plenty of choice. I’m personally very excited to dive into the football analytics handbook project and see how I can further my knowledge of the subject.

If you have any other open-source projects to share with us, feel free to drop the name and link in the comments section below. Let’s make this a super productive learning month!

Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.
