Tag Archives: “Big Data”

Bees and Elephants In The Clouds – Using Hive with Azure HDInsight


For today’s blog entry I will combine both Cloud and Big Data to show you how easy it is to do Big Data on the Cloud.  Microsoft Azure is Microsoft’s cloud platform, and one of the services it offers is HDInsight, an Apache Hadoop-based distribution running in the cloud on Azure. Another service […]

https://gennadny.wordpress.com/2015/08/17/bees-and-elephants-in-the-clouds-using-hive-with-azure-hdinsight/

Announcing the Azure SDK 2.7.1 for .NET


via Announcing the Azure SDK 2.7.1 for .NET | Microsoft Azure Blog.

Today, we’re excited to announce Azure SDK 2.7.1 for Visual Studio 2013 and Visual Studio 2015. This release includes improvements to the Visual Studio Tools for Azure and enhances the experience for customers who are enrolled in either the DreamSpark or Cloud Solution Provider programs. This post summarizes the updates and how the new tools will improve your Azure development experience.

Azure SDK 2.7.1 for .NET [download for VS 2015 | VS 2013]

  • Improved support for Visual Studio 2013: Azure SDK 2.7.1 brings Visual Studio 2013 on par with the capabilities available in the Azure SDK 2.7 release for Visual Studio 2015. Visual Studio 2013 now supports DreamSpark and Cloud Solution Provider customer accounts. The new Cloud Explorer tool window is now also available in Visual Studio 2013.
  • Improvements to the HDInsight tools for Visual Studio: Big data developers will benefit from improvements to the Hive Job Operator view, the Hive Error Marker, and the Storm Topology Graph. We’ve also made some improvements to IntelliSense suggestions for HDInsight.
  • The Azure Resource Manager Tools have been updated to include better IntelliSense and to make automated deployment of virtual machines with in-guest customization easier.
  • AZCopy 3.2.0 RTM: This update includes support for the new Append Blob storage blob type and a FIPS-compliant MD5 setting. You can find more details on the Azure Storage blog.

Actionable Architecture: Secure Hybrid Data Warehouse on Bluemix (Big Data Workshop 1)


via Actionable Architecture: Secure Hybrid Data Warehouse on Bluemix (Big Data Workshop 1) – Bluemix.

This workshop series walks you through an overview of the Big Data & Analytics reference architecture, how Bluemix supports that architecture, and how you can build your very own native Big Data application on Bluemix today. Getting hands on, you will create the System of Record (SoR) database in a Virtual Machine on the Bluemix VM service (simulating an on-premises SoR), deploy & configure a Secure Gateway to connect the SoR to Bluemix, deploy a Big Data sample application to Bluemix, and configure the application to connect to the on-premises SoR via the Secure Gateway connection. This first workshop is a prerequisite for all the other workshops in the Big Data series, as it provides the necessary application components for future workshop interaction.

Ordered Sets and Logs in Cassandra vs SQL


I’ve written before that Cassandra’s Achilles’ heel is DevOps: storage, redundancy, and performance are expanded by adding more nodes. This can happen during normal business hours as long as consistency parameters are met. The same applies to node replacements. As the number of servers grows, be prepared to hire a DevOps army or look for a […]

http://thedulinreport.com/2015/04/08/ordered-sets-and-logs-in-cassandra-vs-sql/

Tuning Java Garbage Collection for Spark Applications


via Tuning Java Garbage Collection for Spark Applications.

This is a guest post from our friends in the SSG STO Big Data Technology group at Intel.

Join us at the Spark Summit to hear from Intel and other companies deploying Spark in production.  Use the code Databricks20 to receive a 20% discount!


Spark is gaining wide industry adoption due to its superior performance, simple interfaces, and a rich library for analysis and calculation. Like many projects in the big data ecosystem, Spark runs on the Java Virtual Machine (JVM). Because Spark can store large amounts of data in memory, it has a major reliance on Java’s memory management and garbage collection (GC). New initiatives like Project Tungsten will simplify and optimize memory management in future Spark versions. But today, users who understand Java’s GC options and parameters can tune them to eke out the best performance of their Spark applications. This article describes how to configure the JVM’s garbage collector for Spark, and gives actual use cases that explain how to tune GC in order to improve Spark’s performance. We look at key considerations when tuning GC, such as collection throughput and latency.
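As a taste of what such tuning looks like, here is a minimal Java sketch that assembles a typical set of G1 collector flags for Spark executors. The specific flag values (pause target, heap occupancy threshold) are illustrative starting points assumed for this example, not recommendations from the article; on a real cluster they would be passed via `spark.executor.extraJavaOptions` and refined by reading the GC logs.

```java
import java.util.ArrayList;
import java.util.List;

public class SparkGcFlags {
    // Builds an example set of executor JVM options for GC tuning.
    // The values below are illustrative defaults, not prescriptions.
    static String buildExecutorJavaOptions() {
        List<String> flags = new ArrayList<>();
        flags.add("-XX:+UseG1GC");                          // use the G1 collector
        flags.add("-XX:MaxGCPauseMillis=200");              // soft pause-time target
        flags.add("-XX:InitiatingHeapOccupancyPercent=35"); // start concurrent marking earlier
        flags.add("-XX:+PrintGCDetails");                   // emit GC logs for analysis
        flags.add("-XX:+PrintGCDateStamps");                // timestamp each GC event
        return String.join(" ", flags);
    }

    public static void main(String[] args) {
        // Would be supplied to spark-submit as:
        //   --conf "spark.executor.extraJavaOptions=<options>"
        System.out.println(buildExecutorJavaOptions());
    }
}
```

Watching the resulting GC logs for long pauses or frequent full collections is what guides the next round of adjustments.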

How to Bulk Load Data from Text File to Big Data Hadoop HBase Table?


via How to Bulk Load Data from Text File to Big Data Hadoop HBase Table?.

Here we introduce the process of bulk loading data from a text file using the HBase Java client API. In this post, the worldwide Hadoop development community will learn about bulk loading, when to use it, and what the process looks like.
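To make the first step concrete, here is a minimal Java sketch of the parsing stage of a bulk load: turning one tab-delimited line of the input text file into a row key plus column values. In a real bulk load these values would then be written as HBase `Put` objects via the Java client API (or as HFiles for `completebulkload`); the column names used here (`cf:name`, `cf:city`) are assumptions made for illustration, not part of the original post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BulkLoadLineParser {
    // Hypothetical target columns in column-family "cf"; adjust per table schema.
    static final String[] COLUMNS = {"cf:name", "cf:city"};

    // Parses one tab-delimited line: first field is the row key,
    // remaining fields map positionally onto COLUMNS.
    static Map<String, String> parseLine(String line) {
        String[] fields = line.split("\t");
        Map<String, String> row = new LinkedHashMap<>();
        row.put("rowkey", fields[0]);
        for (int i = 1; i < fields.length && i <= COLUMNS.length; i++) {
            row.put(COLUMNS[i - 1], fields[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        // Example input line from a tab-separated text file
        System.out.println(parseLine("row1\tAlice\tBoston"));
    }
}
```

Each parsed map would then back one `Put(rowkey)` with one `addColumn` call per column entry inside the loading job.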