via Learn Apache Camel – Indexing Tweets in Real-time | Building scalable enterprise applications.
There’s a point in most software development projects when the application needs to start communicating with other applications or third-party components.
Whether it’s sending an email notification, calling an external API, writing to a file, or migrating data from one place to another, you either roll your own solution or leverage an existing framework.
As for existing frameworks in the Java ecosystem, on one end of the spectrum we find Tibco BusinessWorks and Mule ESB, and on the other end there’s Spring Integration and Apache Camel.
In this tutorial I’m going to introduce you to Apache Camel through a sample application that reads tweets from Twitter’s sample feed and indexes those tweets in real time using Elasticsearch.
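The heart of such an application can be expressed as a single Camel route. Below is a rough sketch of what that route might look like using Camel's Twitter and Elasticsearch components; the endpoint option names may differ between Camel versions, and the credential placeholders are illustrative, not taken from the article:

```java
import org.apache.camel.builder.RouteBuilder;

// Sketch: consume Twitter's streaming sample feed and index each tweet
// into Elasticsearch. Option names are illustrative and version-dependent;
// the {{...}} values are property placeholders for your own credentials.
public class TweetIndexRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("twitter://streaming/sample?type=event"
                + "&consumerKey={{twitter.consumerKey}}"
                + "&consumerSecret={{twitter.consumerSecret}}"
                + "&accessToken={{twitter.accessToken}}"
                + "&accessTokenSecret={{twitter.accessTokenSecret}}")
            .transform(body().convertToString())   // tweet -> string payload
            .to("elasticsearch://local?operation=INDEX"
                + "&indexName=tweets&indexType=tweet");
    }
}
```

The appeal of Camel here is exactly this declarative style: the consuming, transforming, and producing endpoints are wired together in a few lines, and swapping Elasticsearch for, say, a file or JMS endpoint changes only one URI.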
via Running PageRank Hadoop job on AWS Elastic MapReduce | The Pragmatic Integrator.
In a previous post I described an example that performs a PageRank calculation with Apache Hadoop, as part of the Mining Massive Datasets course. In that post I took an existing Hadoop job in Java and modified it somewhat (adding unit tests and making the file paths configurable via parameters). This post shows how to run that job on a real-life Hadoop cluster: an AWS EMR cluster of one master node and five core nodes, each backed by an m3.xlarge instance.
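As a refresher on what the job actually computes, the core of PageRank is a simple power iteration. The following is a minimal, self-contained sketch in plain Java (independent of the Hadoop implementation; the damping factor 0.85 is the conventional choice, and the tiny three-page graph is made up for illustration):

```java
import java.util.*;

/** Minimal power-iteration PageRank over an adjacency-list graph. */
public class PageRankSketch {

    static double[] pageRank(int[][] outLinks, int n, double beta, int iterations) {
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);                  // start uniform
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - beta) / n);     // teleport mass
            for (int src = 0; src < n; src++) {
                int[] targets = outLinks[src];
                if (targets.length == 0) {
                    // Dead end: spread this page's rank evenly over all pages
                    for (int t = 0; t < n; t++) next[t] += beta * rank[src] / n;
                } else {
                    for (int t : targets) next[t] += beta * rank[src] / targets.length;
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Tiny example web: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}
        int[][] links = { {1, 2}, {2}, {0} };
        double[] r = pageRank(links, 3, 0.85, 50);
        System.out.printf("%.4f %.4f %.4f%n", r[0], r[1], r[2]);
    }
}
```

The Hadoop job spreads exactly this per-iteration rank redistribution over mappers and reducers, which is what makes it worth running on a cluster once the input grows.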
The first step is to prepare the input for the cluster. I make use of AWS S3, since it is a convenient way to exchange data with EMR. I created a new bucket, ‘emr-pagerank-demo’, with the following subfolders:
- in: the folder containing the input files for the job
- job: the folder containing my executable Hadoop jar file
- log: the folder where EMR will put its log files
In the ‘in’ folder I copied the data that I want to have ranked; I used this file as input. Unzipped, it becomes a 5 GB file with XML content. Although not really massive, it is sufficient for this demo. When you take the sources of the previous post and run ‘mvn clean install’, you get the jar file ‘hadoop-wiki-pageranking-0.2-SNAPSHOT.jar’. I uploaded this jar file to the ‘job’ folder.
That is it for the preparation. Now we can fire up the cluster. For this demo I used the AWS Management Console:
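For reference, the same cluster can also be described with the AWS CLI. The following is a hypothetical sketch of the equivalent command (the bucket paths mirror the folders above; the cluster name and step arguments are assumptions, and an AMI version or release label matching your Hadoop build must also be supplied):

```shell
# Sketch of an AWS CLI equivalent of the console setup (not from the post).
# Instance counts/types mirror the cluster described above; the step Args
# are placeholders for the job's input/output parameters.
aws emr create-cluster \
  --name "pagerank-demo" \
  --log-uri s3://emr-pagerank-demo/log/ \
  --instance-groups \
      InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
      InstanceGroupType=CORE,InstanceCount=5,InstanceType=m3.xlarge \
  --steps Type=CUSTOM_JAR,Name=PageRank,ActionOnFailure=TERMINATE_CLUSTER,\
Jar=s3://emr-pagerank-demo/job/hadoop-wiki-pageranking-0.2-SNAPSHOT.jar,\
Args=[s3://emr-pagerank-demo/in/,s3://emr-pagerank-demo/out/] \
  --auto-terminate
```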
via caught Somewhere In Time = true;: head first elastic search on java with spring boot and data features.
In this article I’ll try to give you an easy introduction to using Elasticsearch in a Java project. As Spring Boot is the easiest and fastest way to begin our project, I chose to use it. Furthermore, we will heavily use the repository goodies of the beloved Spring Data.
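To give a flavour of those repository goodies: with Spring Data Elasticsearch, persistence typically boils down to an annotated document class plus a repository interface whose query methods are derived from their names. The sketch below is illustrative only; the `Tweet` entity, field names, and index name are assumptions, not taken from the article:

```java
import java.util.List;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

// Illustrative sketch: Spring Data derives the query implementation from
// the method name, so no hand-written DAO code is needed.
@Document(indexName = "tweets")
class Tweet {
    @Id
    private String id;
    private String userName;
    private String text;
    // getters/setters omitted for brevity
}

interface TweetRepository extends ElasticsearchRepository<Tweet, String> {
    // Derived query over the 'userName' field of the indexed documents
    List<Tweet> findByUserName(String userName);
}
```

Spring Boot then wires the repository into the application context automatically, so a service class can simply inject `TweetRepository` and call its methods.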
The Cross Datacenter Replication (XDCR) feature in Couchbase Server lets us synchronize data between different database clusters. One of the most interesting uses of XDCR is replicating data from Couchbase to Elasticsearch in near real-time. On its own, Couchbase is awesome; combining it with the search and analytics capabilities of Elasticsearch turns the awesomeness up to eleven!
In this session, we’ll learn how to tie Couchbase and Elasticsearch together into a seamless data storage and analysis platform. We’ll examine the benefits that each brings to the table and how we can use the former for efficient storage and retrieval, and the latter for fast indexing, full-text searches, and geo-spatial querying. But mainly, we’ll explore the most interesting use case of combining Elasticsearch with Couchbase: using Kibana to create dynamic, real-time data analytics dashboards on top of the data stored in Couchbase Server.