A list of the best data science and machine learning projects on GitHub

Data Science | Published January 26, 2019

In this post, we discuss the leading data science and machine learning projects on GitHub. Let us start exploring.
What is the best platform for hosting your code, collaborating with colleagues, and doubling as an online resume to showcase your coding skills? Ask any data scientist and they will point you towards GitHub. It has been a truly revolutionary platform in recent years and has changed the landscape of how we host and even write code. Here are some of the best data science and machine learning projects on GitHub.


Scikit-learn is a Python module for machine learning built on top of SciPy. It features various classification, regression and clustering algorithms including support vector machines, logistic regression, naive Bayes, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
The official source code is hosted in its GitHub repository.
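To give a feel for what a clustering algorithm such as k-means does, here is a minimal pure-Python sketch of the classic assign-and-update loop. It is not scikit-learn's implementation or API; the function names `nearest` and `kmeans`, the sample points, and the fixed seed are assumptions made for illustration only.

```python
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    distances = [(point[0] - cx) ** 2 + (point[1] - cy) ** 2 for cx, cy in centroids]
    return distances.index(min(distances))


def kmeans(points, k, iterations=20, seed=0):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for index in range(k):
            members = [p for p, label in zip(points, labels) if label == index]
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:
                # Keep an empty cluster's centroid where it was.
                updated.append(centroids[index])
        if updated == centroids:
            break  # Converged: the centroids stopped moving.
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical toy data: two well-separated groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

In practice you would reach for the library's own estimator, which handles initialization, multiple restarts and higher-dimensional data far more carefully than this sketch.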


The Numenta Platform for Intelligent Computing (NuPIC) is a machine intelligence platform that implements the HTM learning algorithms. HTM is a detailed computational theory of the neocortex. At the core of HTM are time-based continuous learning algorithms that store and recall spatial and temporal patterns. NuPIC is suited to a variety of problems, particularly anomaly detection and prediction of streaming data sources.
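To make the idea of anomaly detection on a stream concrete, here is a small sketch that flags values far from a rolling average. This is a plain rolling z-score, a much simpler stand-in technique, and not HTM or anything from NuPIC's API; the function name `detect_anomalies`, the window size, the threshold and the sample stream are all assumptions.

```python
from collections import deque
import statistics


def detect_anomalies(stream, window=5, threshold=3.0):
    """Return indices of values that deviate strongly from the recent past."""
    history = deque(maxlen=window)  # Only the last `window` values are kept.
    anomalies = []
    for index, value in enumerate(stream):
        if len(history) == window:
            mean = statistics.fmean(history)
            std = statistics.pstdev(history)
            # Flag the value when it lies more than `threshold` standard
            # deviations away from the mean of the recent window.
            if std > 0 and abs(value - mean) / std > threshold:
                anomalies.append(index)
        history.append(value)
    return anomalies


# Hypothetical sensor readings with one obvious spike at index 6.
readings = [10, 11, 10, 9, 10, 11, 50, 10, 9, 10]
print(detect_anomalies(readings))  # -> [6]
```

The point of the sketch is the streaming shape of the problem: each value is judged as it arrives, using only what was seen before it.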


Pattern is a web mining module for Python. It has tools for data mining, natural language processing, network analysis, and machine learning. It supports vector space models, clustering, and classification using KNN, SVM, and Perceptron.
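For readers new to KNN, the idea fits in a few lines: find the k labelled examples closest to a new point and let them vote. The sketch below is generic pure Python, not Pattern's API; `knn_predict` and the toy training set are hypothetical names and data chosen for illustration.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training examples.

    `train` is a list of (features, label) pairs, where features is a tuple of numbers.
    """
    # Sort the training examples by Euclidean distance to the query point.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels of the k closest examples and return the most common one.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two groups labelled "a" and "b".
train = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((9, 8), "b"), ((8, 9), "b"),
]
print(knn_predict(train, (1.5, 1.5)))  # -> a
print(knn_predict(train, (8.5, 8.5)))  # -> b
```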


Ramp is a Python library for rapid prototyping of machine learning solutions. It is a lightweight, pandas-based machine learning framework that plugs into existing Python machine learning and statistics tools (scikit-learn, rpy2, and so on). Ramp provides a simple, declarative syntax for exploring features, algorithms, and transformations quickly and efficiently.


Milk is a machine learning toolkit in Python. Its focus is on supervised classification, with several classifiers available: SVMs, k-NN, random forests, and decision trees. It also performs feature selection. These classifiers can be combined in many ways to form different classification systems. For unsupervised learning, milk supports k-means clustering and affinity propagation.


Skdata is a library of data sets for machine learning and statistics. This module provides standardized Python access to toy problems as well as popular computer vision and natural language processing data sets.


This is a library of useful tools and extensions for everyday data science tasks.


A collection of sample applications built using Amazon Machine Learning.


REP is an environment for conducting data-driven research in a consistent and reproducible way. It has a unified classifier wrapper for a variety of implementations such as TMVA, Sklearn, XGBoost, and uBoost. It can train classifiers in parallel on a cluster. It also supports interactive plots.

NVIDIA’s vid2vid Technique

There has been tremendous progress in the image-to-image translation field. However, the video processing field has rarely seen many breakthroughs in recent times. Until now.
NVIDIA, already leading the way in using deep learning for image and video processing, has open sourced a technique that does video-to-video translation, with impressive results. They have released their code on GitHub so you can get started with this technique right away. The code is a PyTorch implementation of vid2vid and you can use it for:

  • Converting semantic labels into realistic real-world videos
  • Generating multiple outputs for synthesizing people talking from edge maps
  • Generating a human body from a given pose (not just the structure, but the whole body!)

Dopamine by Google

If you have worked or studied in the field of reinforcement learning, you will have an idea of how difficult (if not impossible) it is to reproduce existing approaches. Dopamine is a TensorFlow framework that has been created and open sourced in the hope of accelerating progress in this field and making it more flexible and reproducible.
If you have been wanting to learn reinforcement learning but were scared off by how complex it is, this repository comes as a golden opportunity. Available in just 15 Python files, the code comes with detailed documentation and a free dataset!


This one is for all the R users out there. We usually download R packages from CRAN, so I personally have not felt the need to go to GitHub, but this package is one that I found very interesting. chorrrds helps you extract, analyze, and organize music chords. It even comes pre-loaded with several music datasets. You can install it directly from CRAN, or use the devtools package to download it from GitHub.


PredictionIO is a general-purpose framework. It includes several template engines for well-known tasks, such as classification and recommendation, which can be customized. It connects with existing applications through REST APIs or SDKs and includes support for Spark MLlib. Since it is built on top of Spark and uses its ecosystem, it should come as no surprise that PredictionIO is developed mainly in Scala.

Mask R-CNN

Mask R-CNN is for object detection and segmentation. This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It is based on Feature Pyramid Network (FPN) and a ResNet101 backbone.
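If the two kinds of output are unfamiliar, the following sketch shows how they relate: a segmentation mask marks every pixel of an object, while a bounding box is just the tightest rectangle around it. This is only an illustration of the data shapes, not the model or the repository's code; `mask_to_bbox` and the tiny mask are hypothetical.

```python
def mask_to_bbox(mask):
    """Return (top, left, bottom, right), inclusive, for a binary mask.

    `mask` is a list of rows, each a list of 0/1 values. Returns None
    when the mask contains no foreground pixels.
    """
    # Rows and columns that contain at least one foreground pixel.
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


# Hypothetical 4x4 mask for a single object instance.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(mask_to_bbox(mask))  # -> (1, 1, 2, 2)
```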

Face Recognition

Recognize and manipulate faces from Python or from the command line with the world's simplest face recognition library. It also provides a simple face_recognition command line tool that lets you do face recognition on a folder of images from the command line!


Flask is a lightweight WSGI web application framework. It is designed to make getting started quick and easy, with the ability to scale up to complex applications. It began as a simple wrapper around Werkzeug and Jinja and has become one of the most popular Python web application frameworks.
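Since WSGI may be an unfamiliar term, here is a rough sketch of what a bare WSGI application looks like using only the Python standard library. It is not Flask code; it shows the low-level callable interface that a WSGI framework wraps for you. The names `app` and `call_app`, the routes, and the response text are assumptions for illustration.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A WSGI application: a callable taking the request environ and a start_response callback."""
    path = environ.get("PATH_INFO", "/")
    if path == "/":
        status, body = "200 OK", b"Hello from a WSGI app"
    else:
        status, body = "404 Not Found", b"Not found"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    # The response body is an iterable of byte strings.
    return [body]


def call_app(application, path):
    """Invoke a WSGI app in-process with a fake request; returns (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # Fill in the required WSGI keys.
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], body


print(call_app(app, "/"))
print(call_app(app, "/missing"))
```

The same `app` callable could be served over HTTP with the standard library's `wsgiref.simple_server.make_server`. A framework like Flask saves you from writing the routing and header handling by hand.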


Zulip is a powerful, open source group chat application that combines the immediacy of real-time chat with the productivity benefits of threaded conversations. Zulip is used by open source projects, Fortune 500 companies, large standards bodies, and others who need a real-time chat system that allows users to easily process hundreds or thousands of messages a day. With over 300 contributors merging over 500 commits a month, Zulip is also the largest and fastest growing open source group chat project.


GitHub is a constantly evolving platform, and more and more companies are adopting it to improve how they build and share software. With the help of the projects above, you can do more than ever with GitHub.