Tag Archives: Cassandra

Ordered Sets and Logs in Cassandra vs SQL


I’ve written before that Cassandra’s Achilles’ heel is devops: Storage, redundancy and performance are expanded by adding more nodes. This can happen during normal business hours as long as consistency parameters are met. The same applies to node replacements. As the number of servers grows, be prepared to hire a devops army or look for a […]

http://thedulinreport.com/2015/04/08/ordered-sets-and-logs-in-cassandra-vs-sql/

DataStax Java Driver: 2.2.0-rc1 released!


via DataStax Java Driver: 2.2.0-rc1 released! : DataStax.

The Java driver team is pleased to announce the release of version 2.2.0-rc1, which brings parity with Cassandra 2.2. Here’s what the new Cassandra features change for driver users:

Tuning Hadoop & Cassandra : Beware of vNodes, Splits and Pages


via Brian ONeill’s Random Thoughts: Tuning Hadoop & Cassandra : Beware of vNodes, Splits and Pages.

When running Hadoop jobs against Cassandra, you will want to be careful about a few parameters.

Specifically, pay special attention to vNodes, Splits and Page Sizes.

vNodes were introduced in Cassandra 1.2. vNodes allow a host to own multiple portions of the token range. This distributes data more evenly, which means the burden of a node rebuild is shared across a number of nodes instead of falling on just one. (way cool feature)

BUT, vNodes made Hadoop jobs a little trickier…

The first time I went to run a Hadoop job using CQL, I was running a local Hadoop cluster (only one container) against a Cassandra with lots of vNodes (~250).  I was running a simple job (probably a flavor of word count) over a few thousand records.  I expected the job to complete in seconds.  To my dismay, it was taking *forever*.
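A rough back-of-the-envelope sketch of why a tiny job can crawl on a cluster with many vNodes. It assumes the Hadoop input format produces at least one split (and therefore one map task) per token range, and subdivides a range only when it holds more rows than the configured split size. That behavior, the function name, and the default split size used here are assumptions for illustration, not details taken from the post.

```python
import math


def estimate_min_splits(nodes, vnodes_per_node, rows_per_range=0, split_size=65536):
    """Estimate a lower bound on input splits for a Hadoop job over Cassandra.

    Assumption: each token range yields at least one split, and a range is
    subdivided only when it holds more rows than split_size.
    """
    token_ranges = nodes * vnodes_per_node
    splits_per_range = max(1, math.ceil(rows_per_range / split_size))
    return token_ranges * splits_per_range


# One local Cassandra node with ~250 vNodes and only a handful of rows per range:
# the job still needs roughly 250 map tasks, each paying its own startup cost.
print(estimate_min_splits(nodes=1, vnodes_per_node=250, rows_per_range=10))
```

Under these assumptions the task count is driven by the number of vNodes rather than by the amount of data, which is consistent with a few thousand records taking far longer than expected on a single container.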

Video: Microservices with NodeJS and Cassandra


Stop writing monolithic web applications that are complex and inflexible! It’s time to break everything apart into independently deployable services that make it easier to scale and pivot in the face of rapidly changing requirements. Learn why Apache Cassandra and Node.js are great choices to build fault-tolerant systems.

In this webinar, Jorge Bay Gondra, Software Engineer at DataStax, will review the benefits of deploying a microservices architecture with Cassandra as your backbone in order to ensure your applications become incredibly reliable. He will discuss in detail:

• How to create microservices in Node.js with ExpressJs and Seneca
• Tuning the Cassandra Driver: error handling, load balancing and degrees of parallelism
• Additional best practices to ensure your systems are highly performant and available

Building a Spark / SciPy / Cassandra “SparkLab” on AWS


via Building a Spark / SciPy / Cassandra “SparkLab” on AWS | Code Trips & Tips.

I have just completed, for a client, a full setup of a “SparkLab” on a cluster of AWS machines. The setup is fully automated via a Bash script, which I have published in this public GitHub gist.

The following is a copy of the README file there; the script can also be used on any standalone Ubuntu Server (I recently used it on a VirtualBox VM to build a local development instance).

As usual, comments and suggestions welcome.

Use Ansible to Automate AWS Cassandra Cluster Restart


via Use Ansible to Automate AWS Cassandra Cluster Restart | jj tech blog.

I’ve set up a small test cluster on AWS to deploy Apache Cassandra 2.1.3. Now I face a few problems related to managing the cluster’s EC2 nodes:

  1. The cluster is used sparingly, but it is a reasonably-sized one to allow performance testing. It wastes a lot of money sitting idle.
  2. If I shut down the nodes after usage, the next time I start them up again, all the IP addresses have changed, and I’ll need to modify the Cassandra configurations all over again.
  3. I could have used a VPC and assigned private IP addresses; however, that makes accessing these nodes from outside the VPC impossible unless I set up private/public subnets and use a jump host or NAT. I don’t have the privileges to do so in this company-wide account.

Given these constraints, I decided to deal with shutdown and start-up cycles in a more automated way. I could script something…
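To make the second problem concrete, here is a minimal Python sketch of the kind of rewrite a restart script has to perform once the nodes come back with new IP addresses. It is not the Ansible approach named in the post title; it only illustrates the idea in plain Python. It assumes the addresses live in the `listen_address` and `seeds` entries of `cassandra.yaml`, and the function name and sample addresses are hypothetical.

```python
import re


def update_cassandra_yaml(yaml_text, listen_address, seeds):
    """Return cassandra.yaml text with listen_address and seeds replaced.

    Assumption: listen_address is a top-level key on its own line, and the
    seed list appears on a single '- seeds: "..."' line.
    """
    seed_list = ",".join(seeds)
    # Replace the node's own address.
    text = re.sub(
        r"(?m)^listen_address:.*$",
        lambda m: f"listen_address: {listen_address}",
        yaml_text,
    )
    # Replace the seed list, keeping the original indentation.
    text = re.sub(
        r"(?m)^([ \t]*- seeds:).*$",
        lambda m: f'{m.group(1)} "{seed_list}"',
        text,
    )
    return text


sample = (
    "listen_address: 10.0.0.1\n"
    "seed_provider:\n"
    "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider\n"
    "      parameters:\n"
    '          - seeds: "10.0.0.1,10.0.0.2"\n'
)
print(update_cassandra_yaml(sample, "172.31.5.9", ["172.31.5.9", "172.31.5.10"]))
```

A real restart procedure would also need to discover the new addresses and push the file to every node, which is where a tool such as Ansible comes in.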
