Archive for March, 2010

A great presentation on using the two best-known NoSql document databases, CouchDB and MongoDB. It also includes an excellent explanation of map/reduce.
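The presentation isn't reproduced here. As a rough illustration of the map/reduce idea, here is a minimal single-process word count in plain Python (3.7+). It is a toy that shows the map, shuffle and reduce phases. It is not how CouchDB or MongoDB actually run map/reduce (both execute user-supplied JavaScript functions inside the database), and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc: str) -> Iterator[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one document.
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group emitted values by key, as the framework does between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> int:
    # Reduce: collapse all values for one key into a single result.
    return sum(values)


def map_reduce(docs: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return {key: reduce_phase(key, values) for key, values in shuffle(pairs).items()}


if __name__ == "__main__":
    print(map_reduce(["CouchDB and MongoDB", "MongoDB and map reduce"]))
    # {'couchdb': 1, 'and': 2, 'mongodb': 2, 'map': 1, 'reduce': 1}
```

Because map and reduce only see one document or one key at a time, a real system can spread both phases across many machines; the shuffle is the only step that needs to see everything.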

Project Voldemort is a scalable, highly available, NoSql, distributed key/value store developed by LinkedIn and inspired by Amazon’s Dynamo paper. The source code for Voldemort is currently hosted at GitHub.

Voldemort is written in Java and will run on any platform where Java can run. Getting Voldemort up and running on Linux, OS X or any other POSIX-based system is seamless. Running Voldemort on Windows is a little more challenging, but not impossible.

Voldemort Prerequisites

  • Recent Java JDK with Server HotSpot VM
  • Apache Ant
  • Optionally Python 2.6

Download and install the Java JDK and Apache Ant. Make sure the Apache Ant bin directory is on your PATH.

To verify that you have the Server HotSpot VM installed, open a command window and type the following:

java -server -version

You should see something like this:

java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode)

If you get an error about no ‘server’ JVM at C:\Program Files\Java\jreXXX\bin\server\jvm.dll, you’ll need to monkey-patch your Java installation before continuing.
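If you want to script this check (for example, as part of a setup script), the sketch below runs the same `java -server -version` command and looks for "Server VM" in the banner. It assumes Python 3.5+ (not the optional Python 2.6 listed above) and that `java` is on your PATH; the function names are hypothetical.

```python
import subprocess


def is_server_vm(version_output: str) -> bool:
    # The HotSpot banner names the VM flavour, e.g.
    # "Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode)".
    return "Server VM" in version_output


def check_server_vm(java: str = "java") -> bool:
    try:
        result = subprocess.run(
            [java, "-server", "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except FileNotFoundError:
        # No `java` executable on the PATH at all.
        return False
    # `java -version` writes its banner to stderr; a missing server JVM
    # makes the launcher print an error and exit with a non-zero status.
    return result.returncode == 0 and is_server_vm(result.stderr)


if __name__ == "__main__":
    print("Server VM available:", check_server_vm())
```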

Java Monkey Patch

  • Locate the directory where your Java JDK was installed (e.g. C:\Program Files\Java\jdk1.6.0_06)
  • Copy the jre\bin\server folder from the JDK directory into the JRE’s bin folder (e.g. C:\Program Files\Java\jre6\bin)
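The same copy can be scripted. This is a minimal sketch assuming Python 3, the example paths from the steps above, and a shell with write access to Program Files (usually an administrator prompt). The helper names are hypothetical, and it deliberately leaves an existing server folder alone.

```python
import os
import shutil
from typing import Tuple


def server_vm_paths(jdk_dir: str, jre_dir: str) -> Tuple[str, str]:
    # Source: the server VM bundled with the JDK's private JRE.
    src = os.path.join(jdk_dir, "jre", "bin", "server")
    # Destination: a "server" folder inside the public JRE's bin folder.
    dst = os.path.join(jre_dir, "bin", "server")
    return src, dst


def patch_jre(jdk_dir: str, jre_dir: str) -> str:
    src, dst = server_vm_paths(jdk_dir, jre_dir)
    if not os.path.isdir(src):
        raise FileNotFoundError("no server VM found at " + src)
    if not os.path.exists(dst):
        # copytree creates dst and copies the folder's contents (including jvm.dll).
        shutil.copytree(src, dst)
    return dst


if __name__ == "__main__":
    # Example paths from this tutorial; adjust them for your installation.
    patch_jre(r"C:\Program Files\Java\jdk1.6.0_06", r"C:\Program Files\Java\jre6")
```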

Verify your Java installation again by entering the following in a command window:

java -server -version

You should now get output similar to the example above.

Installing Voldemort

Download the latest stable release from the Voldemort repository at GitHub. For this tutorial, I used release 0.80.1.

After the download is complete, extract the files to a directory of your choosing.

If you navigate to the bin directory, you’ll notice that most of the scripts are shell scripts. Well, that’s not going to work on Windows. Time for some more monkey patching.

Shell Script Monkey Patching

For this tutorial I will convert the three scripts you need to get Voldemort up and running:

  • run-class.sh
  • voldemort-server.sh
  • voldemort-shell.sh

Copy or download the 3 files below and place them in your Voldemort \bin directory.

run-class.bat [Download from GitHub]

voldemort-shell.bat [Download from GitHub]
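The batch files themselves aren't reproduced here. As a rough sketch of what a run-class wrapper has to do, assuming its main job is to put the jars under lib/ and dist/ on the classpath and then launch a Java class, here is a portable Python 3.5+ equivalent. The function names and the lib/dist layout are assumptions, the main class is passed in rather than named, and the real scripts may also set JVM options that are not shown.

```python
import glob
import os
import subprocess
import sys
from typing import List


def join_classpath(entries: List[str]) -> str:
    # os.pathsep is ";" on Windows and ":" on POSIX systems; a Unix script
    # that glues entries together with ":" produces a broken Windows classpath.
    return os.pathsep.join(entries)


def build_classpath(voldemort_home: str) -> str:
    # Assumption: the jars Voldemort needs live under lib/ and dist/.
    jars: List[str] = []
    for sub in ("lib", "dist"):
        jars.extend(sorted(glob.glob(os.path.join(voldemort_home, sub, "*.jar"))))
    return join_classpath(jars)


def run_class(voldemort_home: str, main_class: str, *args: str) -> int:
    # Launch `main_class` on the assembled classpath and return its exit code.
    cmd = ["java", "-server", "-cp", build_classpath(voldemort_home), main_class, *args]
    return subprocess.call(cmd)


if __name__ == "__main__":
    # Usage: python run_class.py <voldemort home> <main class> [args...]
    sys.exit(run_class(sys.argv[1], sys.argv[2], *sys.argv[3:]))
```

The classpath separator is the main reason a straight port of the shell script fails: the same logic works on both platforms once the separator comes from `os.pathsep`.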


MongoDB

Today 10Gen, the creators of MongoDB, announced in a blog post that MongoDB 1.4 is production ready.

Some of the new features and enhancements include:

Core server enhancements

  • indexing memory improvements
  • background index creation
  • better detection of regular expressions so the index can be used in more cases

Replication and Sharding

  • better handling of restarting slaves that have been offline for a while
  • fast new slaves from snapshot
  • $inc replication fixes
  • two-phase commit on config servers

Deployment and production

  • ability to do fsync + lock for backing up raw files
  • option for separate directory per database
  • REST interface is off by default
  • rotate logs with the db command logRotate
  • new mongostat tool and db.serverStatus enhancements

Query language improvements

  • $all with regex
  • $not
  • $elemMatch for partial matching of array elements
  • positional $ operator for updating arrays
  • $addToSet
  • $unset
  • $pull supports matching objects
  • $set with array indices
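Several of the new update operators are easiest to understand by what they do to a document. The toy model below applies $set, $unset, $addToSet and $pull to a plain Python dict. It illustrates the semantics only and is not MongoDB code: it handles top-level fields only, its $pull matches by equality rather than by query (the real $pull also supports matching objects), and real updates are executed by the server through a driver. The `apply_update` name is hypothetical.

```python
from typing import Any, Dict


def apply_update(doc: Dict[str, Any], update: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Apply a tiny subset of MongoDB update operators to a copy of a plain dict."""
    result = dict(doc)
    for op, fields in update.items():
        for field, value in fields.items():
            if op == "$set":
                result[field] = value
            elif op == "$unset":
                # Remove the field entirely; the value given is ignored.
                result.pop(field, None)
            elif op == "$addToSet":
                # Append only if the value is not already in the array.
                current = list(result.get(field, []))
                if value not in current:
                    current.append(value)
                result[field] = current
            elif op == "$pull":
                # Remove every array element equal to the value.
                result[field] = [v for v in result.get(field, []) if v != value]
            else:
                raise ValueError("unsupported operator: " + op)
    return result


if __name__ == "__main__":
    doc = {"title": "MongoDB 1.4", "tags": ["nosql"], "draft": True}
    doc = apply_update(doc, {"$addToSet": {"tags": "mongodb"}, "$unset": {"draft": 1}})
    print(doc)  # {'title': 'MongoDB 1.4', 'tags': ['nosql', 'mongodb']}
```

Running the same $addToSet twice leaves the array unchanged, which is what distinguishes it from a plain append.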

Geo

  • 2d geospatial search
  • geo $center and $box searches
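For readers new to geo queries, here is a sketch of what the two search shapes select, assuming flat (planar) 2d coordinates. It is a model of the semantics only; the server answers these queries with its 2d index rather than by testing every point, and the helper names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def within_box(p: Point, lower_left: Point, upper_right: Point) -> bool:
    # $box: the point lies inside the axis-aligned rectangle (edges included).
    return lower_left[0] <= p[0] <= upper_right[0] and lower_left[1] <= p[1] <= upper_right[1]


def within_center(p: Point, center: Point, radius: float) -> bool:
    # $center: the point lies within `radius` of `center`, using flat distance.
    return math.hypot(p[0] - center[0], p[1] - center[1]) <= radius


if __name__ == "__main__":
    places: List[Point] = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
    print([p for p in places if within_box(p, (-1.0, -1.0), (5.0, 5.0))])  # [(0.0, 0.0), (3.0, 4.0)]
    print([p for p in places if within_center(p, (0.0, 0.0), 5.0)])  # [(0.0, 0.0), (3.0, 4.0)]
```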

For more information, read the full release notes here.

NoSql Hadoop

If you have a legacy system and you’re thinking about migrating to Hadoop, you need to read BackType Technology’s post entitled “Migrating data from a SQL database to Hadoop”.


Try MongoDB Now

Want to try out MongoDB but have no time to install and configure it? Well, head over to http://try.mongodb.org/ and play around with MongoDB in your browser.


10Gen MongoDB

A full-day event on April 30th in San Francisco to explore development with the NoSql document database MongoDB.

The conference features sessions on database internals, schema design, GridFS, replication, sharding, and more. In addition to these topics, attendees can learn about MongoDB in the real world through a series of presentations about production deployments at Justin.tv, Boxed Ice, Punchbowl Software, and Harmony App.

Small, hands-on, language-specific development workshops will take place alongside the presentation tracks. The Conference Center Lounge will also be available for the duration of the conference for hacking, networking, and discussion.

DATE: Friday April 30, 2010

TIME: 8:30am – 5:30pm

LOCATION: Bently Reserve, 400 Sansome Street, San Francisco, CA 94111

BUY TICKETS: Register

Like Twitter, Digg is moving from MySQL to Cassandra.

Digg is moving away from MySQL to a “NoSql” solution because of difficulties in building a “high performance, write intensive” application, said John Quinn, Digg’s VP of engineering, in a recent blog post.

Growth has forced us into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead

Most of Digg’s functionality has been reimplemented using Cassandra as its NoSql datastore. Digg previously ran experiments on its live site in which it replaced a high-scale MySQL system with a Cassandra alternative; you can read more about them here.

Digg has also made its own enhancements to Cassandra. “We’ve made massive performance improvements: increased comparator speed, added better compaction threading, reduced logging overhead, added row-level caching and implemented multi-get capability. We’ve also implemented native atomic counters using Zookeeper,” he says.

Quinn also says, “We’ve tested and improved the operational capabilities of Cassandra, upgrading its Rackaware capability, added slow query logging, improved the bulk import functionality and implemented Scribe support for improved logging.”


In a press release, VMware announced that it had hired key Redis NoSql database developer Salvatore Sanfilippo.

The press release goes on to say that as cloud computing continues to push boundaries, many new and exciting technologies will join the relational database as a means to store and retrieve data. It cites Google App Engine’s use of Bigtable as an example.

Redis will join VMware’s other open source efforts, which currently include Spring and Zimbra. VMware pledged to let Salvatore Sanfilippo continue his valuable work on Redis.

© Copyright Kommunicate Inc. All Rights Reserved.