The Udacity Data Scientist Nanodegree is an adventurous, interesting and industry-relevant data science program, and so are its capstone projects. I decided to take on the Spark-based project called “Sparkify”. Sparkify is a fictional music streaming service, but the provided dataset is very realistic.
Completing this project can help us learn how to use Spark MLlib to build machine learning models with large datasets, far beyond what could be done with non-distributed technologies like scikit-learn.
Udacity provides 2 versions of the dataset. The full dataset (12GB) is intended for trying out a Spark cluster in the cloud using AWS or IBM Cloud. …
In this blog post we will look at some basic approaches that can give us ideas and inspiration for using data science and machine learning techniques to improve an existing AirBnB business or to start a profitable one.
The Seattle AirBnB homes dataset, which we decided to use to demonstrate research and analytical techniques, can be found at the link below.
The following 3 questions are not the crucial questions to be answered in order to employ data science to increase our revenues. …
Data Scientist and Software Engineer