Rolling down the hill

Credit for the title of this post goes to dancinghands.com. I’m learning how to play the bongos, and rolling down the hill describes one of the many Latin rhythms. It also applies to data science.

In the quest for real-time processes, some unexpected help came from an enterprise architecture model. These days everything is agile, and architecture is seen as the work of charlatans. I do agree that architecture can go terribly wrong. At the same time though, not all quality stems from weekly iterations. The model I am referring to is depicted below. It gave me insight into what I saw happening.

[Figure: Layers of the enterprise architecture]

Once you decide that real-time processes are important (also look at how real-time is real-time, below), requirements start to trickle down. It is likely that the information architecture needs adjusting in the direction of events and micro-services (more on this later). This in turn will impact the systems and component designs. Real-time systems require high uptime, so application management and infrastructure are likely to change too. This is in essence what I have seen happening in the last year. Rolling down the hill.

Summarizing, the following qualities are required in the layers:

  • Business processes – real-time or as fast as needed (might be weekly)
  • Information architecture – avoid database hotspots and frequent schema changes; focus on integrating around key identifiers such as contract number, address, postal code, etc.
  • Application layer – systems sit on highways of events, keep data streaming, state becomes localized, resilience, idempotency (a sketch follows below this list)
  • Infrastructure layer – scalability and fault-tolerance; don’t let success become your future problem

I will leave details to your imagination.
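That said, to make the application-layer qualities a little more concrete, here is a minimal Python sketch of an event consumer that keeps its state localized and applies each event idempotently. The event shape and the names (Event, ContractProjection, event_id, contract_number) are my own assumptions for illustration, not an existing schema or library.

```python
import json
from dataclasses import dataclass


# Hypothetical event shape -- field names are assumptions for this sketch.
@dataclass
class Event:
    event_id: str
    contract_number: str
    payload: dict


class ContractProjection:
    """Keeps state local to the consumer and applies each event at most once."""

    def __init__(self):
        self.state = {}              # localized state: contract_number -> latest view
        self.seen_event_ids = set()  # naive dedupe store, enough to show idempotency

    def handle(self, event: Event) -> None:
        # Idempotency: replaying the same event (e.g. after a retry) is a no-op.
        if event.event_id in self.seen_event_ids:
            return
        current = self.state.get(event.contract_number, {})
        current.update(event.payload)
        self.state[event.contract_number] = current
        self.seen_event_ids.add(event.event_id)


# The same event delivered twice leaves the state unchanged.
projection = ContractProjection()
e = Event("evt-1", "C-1001", {"postal_code": "1234AB"})
projection.handle(e)
projection.handle(e)  # duplicate delivery, safely ignored
print(json.dumps(projection.state))
```

In a real system the dedupe store and the projection would live in durable storage next to the consumer, which is exactly what makes the state local and the component resilient to replays.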

So how real-time is real-time?

Currently I am involved in transitioning a large corporation to more real-time processes. One of the questions that keeps recurring is the following: How real-time is real-time? Are micro-second response times really adding to the bottom line? After all, consumers have contracts for a year or so. So why do it?

I have to admit, this question confused me quite a lot for some time. Gradually the mist has lifted and a clear answer has formed. It goes like this.

Listen, batch is really great. Most algorithms can be optimized in batch mode and then put online. That works, and it fulfills quite a few needs. But still, you are going to lose out. These batches are produced once a month. By working hard, for quite a long time, you could probably run the batches every three or even two weeks. But here it comes.

The cost and complexity of speeding up batching are just going to be really high. Basically you are going to move more data, faster, between databases. It will break. For every company, there is a breaking point for batch. Beyond this breaking point the only paradigm that is going to save you is real-time. It might be that your algorithms only need updating once a week. Beyond your breaking point, once a week is real-time for you. Simple.
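A toy illustration of why the paradigms scale so differently: a batch job recomputes over the full history on every run, while a streaming update folds each new observation into a small piece of running state, so the cost per update stays constant. The running-mean example below is purely illustrative, not anything from the project I describe.

```python
# Batch: recompute the statistic over the full history on every run.
def batch_mean(all_observations: list[float]) -> float:
    return sum(all_observations) / len(all_observations)


# Streaming: keep a small running state and update it per event.
class RunningMean:
    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x: float) -> float:
        self.count += 1
        self.mean += (x - self.mean) / self.count  # constant work per event
        return self.mean


stream = RunningMean()
history = []
for x in [3.0, 5.0, 4.0]:
    history.append(x)
    # Both paradigms agree on the answer; only the cost profile differs.
    assert abs(stream.update(x) - batch_mean(history)) < 1e-9
```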

Did I mention that a lot of batch-style algorithms find it really hard to model sequences through time? And that deep learning lets you combine convolutional and LSTM networks? Trust me, the future is streaming; real-time.
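For the curious, a minimal sketch of what such a combined convolutional/LSTM model can look like in Keras. The layer sizes, the binary target, and the dummy data are assumptions chosen only to show the shapes fitting together.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Toy dimensions -- purely illustrative.
timesteps, features = 100, 8

model = tf.keras.Sequential([
    # The convolution extracts local patterns from short windows of the sequence.
    layers.Conv1D(filters=32, kernel_size=5, activation="relu",
                  input_shape=(timesteps, features)),
    layers.MaxPooling1D(pool_size=2),
    # The LSTM models the longer-range order of those patterns through time.
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Dummy data, just to confirm the shapes line up.
x = np.random.rand(16, timesteps, features).astype("float32")
y = np.random.randint(0, 2, size=(16, 1))
model.fit(x, y, epochs=1, verbose=0)
```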