Useful Machine Learning and HDInsight / Hadoop Links, Posts, and Information
Updates:
- Initial Post: 2014-11-17
As many people ramp up on Microsoft Azure Machine Learning, I wanted to start keeping a succinct list of the articles, blogs, videos, posts, etc. that have proven helpful in conveying both the general practice of Machine Learning and its implementation within Microsoft Azure.
Machine Learning Center
http://azure.microsoft.com/en-us/documentation/services/machine-learning/
R Programming
R for Beginners by Emmanuel Paradis
http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
Introductory Statistics with R (Statistics and Computing), Peter Dalgaard
R Succinctly, Barton Poulson, Syncfusion
http://bit.ly/1pzxbJi
An Introduction to Statistical Learning, with Applications in R, Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
http://www-bcf.usc.edu/~gareth/ISL/
Papers
Analyzing Customer Churn using Microsoft Azure Machine Learning
Tutorials
Develop a predictive solution with Azure Machine Learning
Create a simple experiment in Azure Machine Learning Studio
http://azure.microsoft.com/en-us/documentation/articles/machine-learning-create-experiment/
Videos
Instructional Azure Machine Learning videos
http://azure.microsoft.com/en-us/documentation/videos/index/?services=machine-learning
Tools / Scripts
Creates a cluster with the specified configuration.
DESCRIPTION
Creates an HDInsight cluster configured with one storage account and the default metastores. If the storage account or container is not specified, it is created
automatically under the same name as the one provided for the cluster. If ClusterSize is not specified, it defaults to a small cluster with 2 nodes.
The user is prompted for the credentials used to provision the cluster.
During the provisioning operation, which usually takes around 15 minutes, the script monitors status and reports each provisioning state as the cluster
transitions through it.
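The defaulting and monitoring behavior described above can be sketched in Python. This is only an illustration under stated assumptions, not the script itself: `ClusterConfig`, `resolve_config`, and `wait_until_running` are hypothetical names, the state strings are made up for the example, and `get_state` stands in for whatever call reports the cluster's current provisioning state.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ClusterConfig:
    """Hypothetical container for the settings the description mentions."""
    cluster_name: str
    storage_account: str
    container: str
    cluster_size: int


def resolve_config(
    cluster_name: str,
    storage_account: Optional[str] = None,
    container: Optional[str] = None,
    cluster_size: Optional[int] = None,
) -> ClusterConfig:
    # Unspecified storage account / container fall back to the cluster name;
    # an unspecified size falls back to a small 2-node cluster.
    return ClusterConfig(
        cluster_name=cluster_name,
        storage_account=storage_account or cluster_name,
        container=container or cluster_name,
        cluster_size=cluster_size if cluster_size is not None else 2,
    )


def wait_until_running(
    get_state: Callable[[], str],
    final_state: str = "Running",       # assumed name of the terminal state
    poll_seconds: float = 30.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll get_state, report each transition once, return the states seen."""
    seen: List[str] = []
    waited = 0.0
    while True:
        state = get_state()
        if not seen or seen[-1] != state:
            print(f"Cluster state: {state}")
            seen.append(state)
        if state == final_state:
            return seen
        if waited >= timeout_seconds:
            raise TimeoutError(f"Cluster still in state {state!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

Passing `sleep` as a parameter keeps the polling loop easy to exercise without actually waiting; in real use the default `time.sleep` applies.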
Blog Posts
Benjamin Guinebertière (from Microsoft France) has a great blog that covers quite a few scenarios that many encounter when ramping up on and using Microsoft Azure Machine Learning
http://blogs.msdn.com/b/benjguin/
Azure Automation: What is running on my subscriptions - Benjamin Guinebertière
Remember, you pay for what you use, so keep track of your in-use clusters. In fact, the goal is to provision only when needed. Take a look at Kerrb for a commercial option to help you manage your spend: http://www.kerrb.com/
Sample code: create an HDInsight cluster, run job, remove the cluster - Benjamin Guinebertière
Again, we want to keep our data in Blobs (or other persistence) then hydrate the cluster, process, save off our results, then kill the cluster.
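The create, run, remove pattern can be sketched in a few lines of Python. This is a minimal sketch, not the code from the linked post: `run_on_ephemeral_cluster` is a hypothetical helper, and the `create`, `run_job`, and `delete` callables are placeholders for whatever provisioning and job-submission calls you actually use. The point is the `try`/`finally`, which removes the cluster even when the job fails, so you do not keep paying for it.

```python
from typing import Callable, TypeVar

C = TypeVar("C")  # whatever handle identifies the cluster
R = TypeVar("R")  # whatever the job returns


def run_on_ephemeral_cluster(
    create: Callable[[], C],
    run_job: Callable[[C], R],
    delete: Callable[[C], None],
) -> R:
    """Provision a cluster, run one job on it, and always tear it down."""
    cluster = create()           # hydrate: data already lives in blob storage
    try:
        return run_job(cluster)  # process and save results back to storage
    finally:
        delete(cluster)          # kill the cluster whether or not the job succeeded
```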
How to upload an R package to Azure Machine Learning - Benjamin Guinebertière
Adding R scripts and packages can be achieved through this method.
How to retrieve R data visualization from Azure Machine Learning - Benjamin Guinebertière
R is a great point of extensibility. Here we see how to visualize the R output (images) that could be run as part of your R script.
Carl Nolan’s blog is also a great resource – much more than just ramblings: http://blogs.msdn.com/b/carlnol/
Managing Your HDInsight Cluster using PowerShell – Update - Carl Nolan
Managing Your HDInsight Cluster and .Net Job Submissions using PowerShell - Carl Nolan
Hadoop .Net HDFS File Access - Carl Nolan
http://blogs.msdn.com/b/carlnol/archive/2013/02/08/hdinsight-net-hdfs-file-access.aspx
Books
There is a book on Azure ML due out this week (2014-11-19)
Predictive Analytics with Microsoft Azure Machine Learning: Build and Deploy Actionable Solutions in Minutes, Valentine Fontama, Roger Barga, Wee Hyong Tok, 1st Edition, ISBN-13: 978-1484204467, ISBN-10: 1484204468
FAQ
Microsoft Azure Machine Learning Frequently Asked Questions (FAQ)
http://azure.microsoft.com/en-us/documentation/articles/machine-learning-faq/
Pricing
Machine Learning Preview Pricing Details
http://azure.microsoft.com/en-us/pricing/details/machine-learning/
Data Factory
http://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/