The current Hadoop ecosystem is very reminiscent of the early J2EE days. EJB was cumbersome: only with the stamina of a stower could one pack a decent application. I never managed, to be honest. Around 2003 Spring came along and alternative O/R mappers started appearing, saving Java’s ass in a big way.
Of course, having a choice is great, but it is easy to get lost in the forest. Sometimes I get the impression that programming is like being a boy scout: finding trails in the dark forest of JARs. The downside is that, as a practitioner, you can get sucked into the darkness instead of focusing on what your client’s business is about.
M/R seems to be the EJB of 201x. The difference, and the good news, is of course that M/R allows for abstractions: Pig and Hive, to name a few. Two days ago I saw a presentation on Cascading at a pre-Hadoop Summit BOF. Cascading is an impressive piece of work: it abstracts out M/R jobs completely, allowing Hadoop to be used as a means, not an end.
The framework is focused on creating functional data flows. It also allows you to connect R to HDFS using a JDBC driver. I’m going to check that out.
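To make "functional data flow" a bit more concrete for myself, here is a minimal sketch in plain Python. To be clear, this is not Cascading's API and it runs no M/R jobs: the `pipe` helper and the stage names (`split_words`, `drop_short`, `count`) are made up for illustration. It only shows the idea of describing a job as a chain of small functions rather than as hand-written mappers and reducers.

```python
from collections import Counter
from functools import reduce


def pipe(*stages):
    """Compose stages left to right into one flow (hypothetical helper)."""
    return lambda records: reduce(lambda data, stage: stage(data), stages, records)


def split_words(lines):
    # Stage 1: turn each line into lowercase words.
    for line in lines:
        yield from line.lower().split()


def drop_short(words, min_len=3):
    # Stage 2: filter out words shorter than min_len characters.
    return (w for w in words if len(w) >= min_len)


def count(words):
    # Stage 3: aggregate into a word -> occurrences mapping.
    return dict(Counter(words))


# The whole job is just the composition of its stages.
word_count = pipe(split_words, drop_short, count)

result = word_count(["Pig and Hive", "Hive on Hadoop", "pig"])
print(result)
```

The point of the sketch is that the flow reads like the business question ("count the words that matter"), while how the stages get executed is somebody else's problem. As I understand it, that is the role Hadoop plays underneath a framework like Cascading.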