Category Archives: Data science

Back to the Fortran

A lot of numerical programs are still written in Fortran. When I programmed in Fortran 18 years ago, it felt too much like Basic. Soon I switched to C and later to C++, only to forget all about those again later. Thinking about it now, Fortran manages to get the performance of C with the simplicity of Basic. That is not a bad deal.

I have wanted to do some Fortran or C programming for R for quite a while. A simple starting point is given on this page: http://www.r-bloggers.com/fortran-and-r-speed-things-up/ I want to add a little Fortran help to it.

Diving into C/C++ or Fortran can be a bit awkward. A fast way to get started turns out to be NetBeans. NetBeans supports C/C++ and Fortran (there is even a special C/C++ version at netbeans.org) and uses make to drive the build process. Download and install NetBeans C/C++ and install gfortran. Choose New Project -> C/C++ Dynamic Library, name it ‘facto’ and choose Fortran as the language in the next step. Paste the source from the article mentioned above as facto.f in the Source Files section. Switch to ‘Release’ in the toolbar and right-click the ‘facto’ project node to create a build. Presto, you can see the build running in the bottom right. A dynamically linked library is built.
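The Fortran source from the linked article is not reproduced here. For anyone who would rather take the C route, a rough counterpart of such a factorial routine could look like the sketch below. The name facto, the argument names and the loop are my own assumptions, not the article's code; the only thing taken from R itself is that its .C interface hands every argument over as a pointer, much like a Fortran subroutine receives its arguments by reference.

```c
/* facto.c: hypothetical C counterpart of a factorial routine.
 * Both arguments are pointers because R's .C() interface passes
 * everything by reference; the result is written into *answer. */
void facto(int *n, int *answer)
{
    int i;

    *answer = 1;                 /* 0! and 1! are both 1 */
    for (i = 2; i <= *n; i++) {
        *answer *= i;            /* a 32-bit int overflows beyond 12! */
    }
}
```

Built as a dynamic library, a routine like this would be loaded in R with dyn.load and called through .C, with the arguments coerced using as.integer.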


Open RStudio, create a function that calls the library, and factorial 5 is …


Unlimited possibilities!

M/R, the EJB of 201x?

The current Hadoop ecosystem is very reminiscent of the early J2EE days. EJB was cumbersome: only with the stamina of a stower could one pack a decent application. To be honest, I never managed. Around 2003 Spring came along and alternative O/R mappers started appearing, saving Java’s ass in a big way.

Of course having a choice is great, but it is easy to get lost in the forest. Sometimes I get the impression that programming is like being a boy scout: finding trails in the dark forest of JARs. The downside is that as a practitioner you can get sucked into the darkness instead of focussing on what your client’s business is about.

M/R seems to be the EJB of 201x. The difference, and the good news, is of course that M/R allows for abstractions: Pig and Hive, to name a few. Two days ago I saw a presentation on Cascading at a pre-Hadoop Summit BOF. Cascading is an impressive piece of work: it abstracts away M/R jobs completely, allowing Hadoop to be used as a means, not an end.

The framework is focussed on creating functional data flows. It also allows connecting R to HDFS using a JDBC driver. I’m going to check that out.