Tag Archives: Cloudera

Installing Cloudera Manager and CDH on Amazon EC2: Part-1

Log in to the AWS console and go to EC2. Please refer to my previous post “Setting up infrastructure with Amazon EC2: Part-1” for more info. I have chosen CentOS 6.4 as the underlying operating system.


Bayesian Machine Learning on Apache Spark

Markov Chain Monte Carlo methods are another example of useful statistical computation for Big Data that Apache Spark handles capably.

During my internship at Cloudera, I have been working on integrating PyMC with Apache Spark. PyMC is an open source Python package that allows users to easily apply Bayesian machine learning methods to their data, while Spark is a new, general framework for distributed computing on Hadoop. Together, they provide a scalable framework for Markov Chain Monte Carlo (MCMC) methods. In this blog post, I am going to describe my work on distributing large-scale graphical models and MCMC computation.