Useful Machine Learning and HDInsight / Hadoop Links, Posts, and Information


  • Initial Post: 2014-11-17

As many people ramp up on Microsoft Azure Machine Learning, I wanted to start keeping a succinct list of the articles, blogs, videos, and posts that have proven helpful in conveying both the general practice of Machine Learning and its implementation within Microsoft Azure.

Machine Learning Center

R Programming

R for Beginners by Emmanuel Paradis

Introductory Statistics with R (Statistics and Computing), Peter Dalgaard 

R Succinctly, Barton Poulson, Syncfusion

An Introduction to Statistical Learning, with Applications in R, Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani


Analyzing Customer Churn using Microsoft Azure Machine Learning


Develop a predictive solution with Azure Machine Learning

Create a simple experiment in Azure Machine Learning Studio


Instructional Azure Machine Learning videos

Tools / Scripts

Creates a cluster with the specified configuration.
Creates an HDInsight cluster configured with one storage account and default metastores. If the storage account or container is not specified, it is created
automatically under the same name as the cluster. If ClusterSize is not specified, the script defaults to a small cluster with 2 nodes.
The user is prompted for the credentials to use to provision the cluster.

During the provisioning operation, which usually takes around 15 minutes, the script monitors the status and reports as the cluster transitions through the
provisioning states.
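To make the defaulting and monitoring behavior described above concrete, here is a minimal Python sketch. It is not the script itself: the names `ClusterRequest`, `resolve_request`, and `watch_provisioning`, the state strings, and the polling interval are all assumptions made for illustration, and the state lookup is passed in as a plain function rather than calling any real Azure API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class ClusterRequest:
    """Resolved cluster settings. All field names are hypothetical."""
    cluster_name: str
    storage_account: str
    container: str
    cluster_size: int


def resolve_request(
    cluster_name: str,
    storage_account: Optional[str] = None,
    container: Optional[str] = None,
    cluster_size: Optional[int] = None,
) -> ClusterRequest:
    """Apply the defaults described above: reuse the cluster name, 2 nodes."""
    return ClusterRequest(
        cluster_name=cluster_name,
        storage_account=storage_account or cluster_name,
        container=container or cluster_name,
        cluster_size=cluster_size if cluster_size is not None else 2,
    )


def watch_provisioning(
    get_state: Callable[[], str],
    report: Callable[[str], None] = print,
    terminal: Sequence[str] = ("Running", "Error"),  # assumed state names
    poll_seconds: float = 30.0,                      # assumed interval
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state and report each transition until a terminal state."""
    last = None
    while True:
        state = get_state()
        if state != last:
            # Report only when the state actually changes.
            report(f"Cluster state: {state}")
            last = state
        if state in terminal:
            return state
        sleep(poll_seconds)
```

Passing `get_state` and `sleep` in as parameters keeps the sketch independent of any particular SDK and makes it easy to exercise with canned states.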

Blog Posts

Benjamin Guinebertière (from Microsoft France) has a great blog that covers quite a few scenarios that many encounter when ramping up on and using Microsoft Azure Machine Learning.

Azure Automation: What is running on my subscriptions - Benjamin Guinebertière

Remember, you pay for what you use, so keep track of any clusters that are still running. In fact, the goal is to provision only when needed. Take a look at Kerrb for a commercial option to help you manage your spend.

Sample code: create an HDInsight cluster, run job, remove the cluster - Benjamin Guinebertière

Again, we want to keep our data in Blobs (or other persistent storage), then hydrate the cluster, process the data, save off our results, and kill the cluster.
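The create, process, save, and delete pattern can be sketched in Python as a context manager that always removes the cluster, even when the job or the save step fails. This is only an illustration and not the code from the linked post: `FakeClusterService`, `ephemeral_cluster`, and `process` are hypothetical names, and the service is a stand-in that records calls instead of talking to Azure.

```python
from contextlib import contextmanager


class FakeClusterService:
    """Hypothetical stand-in for a provisioning client; it only records calls."""

    def __init__(self):
        self.calls = []

    def create(self, name):
        self.calls.append(("create", name))

    def run_job(self, name, job):
        self.calls.append(("run_job", name, job))
        return f"results-of-{job}"

    def delete(self, name):
        self.calls.append(("delete", name))


@contextmanager
def ephemeral_cluster(service, name):
    """Create a cluster on entry and delete it on exit, even on error."""
    service.create(name)
    try:
        yield name
    finally:
        # The cluster is removed whether or not the work succeeded,
        # so nothing is left running and accruing charges.
        service.delete(name)


def process(service, name, job, save_results):
    """Run one job on a short-lived cluster and persist its results."""
    with ephemeral_cluster(service, name) as cluster:
        results = service.run_job(cluster, job)
        # Save results to durable storage before the cluster goes away.
        save_results(results)
    return results
```

Because the data and results live outside the cluster, deleting it in the `finally` clause loses nothing.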

How to upload an R package to Azure Machine Learning - Benjamin Guinebertière

R scripts and packages can be added using the method described in this post.

How to retrieve R data visualization from Azure Machine Learning - Benjamin Guinebertière

R is a great point of extensibility. Here we see how to retrieve and view the output images that your R script generates.

Carl Nolan’s blog is also a great resource – much more than just ramblings:

Managing Your HDInsight Cluster using PowerShell – Update - Carl Nolan

Managing Your HDInsight Cluster and .Net Job Submissions using PowerShell - Carl Nolan

Hadoop .Net HDFS File Access - Carl Nolan


There is a book on Azure ML due out this week (2014-11-19):

Predictive Analytics with Microsoft Azure Machine Learning: Build and Deploy Actionable Solutions in Minutes, Valentine Fontama, Roger Barga, Wee Hyong Tok, ISBN-13: 978-1484204467, ISBN-10: 1484204468, Edition: 1st


Microsoft Azure Machine Learning Frequently Asked Questions (FAQ)


Machine Learning Preview Pricing Details

Data Factory