Yesterday, at the Strata+Hadop World conference, we announced the first Microsoft Azure managed service, HDInsight to be available on Linux. This post will give you the ins and outs of this new offering.
What is Azure HDInsight?
Azure HDInsight is Microsoft’s Hadoop-as-a-service offering. It is built on top of the Hortonworks Data Platform (HDP) and architected for the Azure cloud. HDInsight provides the following benefits to customers:
- Crunch large non-relational datasets that come in real-time: A core value for Hadoop is the ability to process large volumes of data whether it be relational or non-relational in near real-time.
- Fast Time To Value: As a Hadoop-as-a-service, anyone can spin up Hadoop on any number of nodes with just a few clicks and within minutes.
- Minimal deployment expertise with no hardware planning or maintenance required: Because you can spin up clusters on-demand, you will not have to do hardware maintenance, tuning, patching, upgrades and capacity planning.
- Backed by Microsoft’s reliability SLA: Backed by Azure’s 99.9% reliability guarantee, Microsoft will be monitoring your clusters 24×7 to provide you peace of mind.
What is Azure HDInsight on Linux?
With this announcement, customers will have a choice to select either Windows or Linux when deploying HDInsight from the Azure portal. Both options are first class citizens, offering simple deployment, SLA, technical support for the entire stack, ranging from Hadoop to the operating system.
For customers with prior experience deploying Hadoop in Linux on-premise, they will be able to leverage their existing skillsets and tooling (documentation, samples, and templates) to work with HDInsight. Familiar tools like SSH and Ambari are now available to use.
- SSH connectivity to your HDInsight clusters is enabled by default for all HDInsight on Linux clusters. You can use an SSH client of your choice to connect to the cluster. Additionally, SSH tunneling can be leveraged for forwarding traffic from your browser to all of the Hadoop web applications.
- Ambari is a project for the management and monitoring of Big Data clusters. You’ll find it deployed to the cluster when you browse to it from the Azure portal. Ambari provides a single view of the performance and state of your Hadoop cluster, as well as providing the ability to customize configuration setting, and deploy additional Hadoop apps onto the cluster. Ambari provides monitoring and alerting within the HDInsight cluster, configurable from a web app hosted on the cluster.
How do I get Started?
To get started on Azure HDInsight on Linux, you need to sign up for the preview on the preview feature page. To do this, go to the Azure portal and select the option under HDInsight to sign up.
After you are signed up for the preview, head over to watch the Channel 9 video to get more details.
For more information on the announcements we made at Strata, head over to the Official Microsoft Blog and the Azure Blog. Finally, for more information about Azure HDInsight on Linux, read below:
- Channel 9 Video: HDInsight on Linux
- Introduction to HDInsight which gives recommendations when to use Windows or Linux
- Getting started using HDInsight on Linux
- Tips for HDInsight on Linux
- Develop Python MapReduce jobs for Azure HDInsight on Linux
- Use SSH keys with Linux-based Hadoop on HDInsight
- Blog: HDInsight is the first Azure Service To Run on Linux
General information on HDInsight:
- Read documentation
- Learning Map
- Microsoft Virtual Academy
- Try 30 day trial