Thursday, March 30, 2017

Stream processing of time series and the uncertainty principle of Heisenberg

Recently at work, we were processing time-based event series.

The collected events could enter the system out of order, and we needed to sort and process them based on their time of arrival in the system.
This was done using time windows. Time windows are buckets of events: each event arriving in the system is placed in the corresponding time bucket. If an event arrives later than a predefined maximum out-of-order delay, it is discarded.
Once the system decides that a time window is ready for processing, it releases the window's events and they are processed.
For example, consider the following time windows:
  • Time window 1: 0sec-10sec
  • Time window 2: 10sec-20sec
  • Time window 3: 20sec-30sec

And consider a maximum out-of-order delay of 2sec.
  • Event 1 arrives at t=1sec and is placed in time window 1
  • Event 2 arrives at t=5sec and is placed in time window 1
  • Event 3 arrives at t=11sec and is placed in time window 2
  • Event 4 arrives at t=9sec and is placed in time window 1 (the out-of-order delay is still smaller than or equal to 2 seconds)
  • Event 5 arrives at t=15sec and is placed in time window 2. Since t=15sec, and t minus the end of time window 1 is greater than the out-of-order delay, window 1, with events 1, 2 and 4, is released for processing
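
The walkthrough above can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions, not the code we used at work: the class name `WindowBuffer`, its parameters and the "largest timestamp seen so far" bookkeeping (called `watermark` here) are all hypothetical.

```python
from collections import defaultdict


class WindowBuffer:
    """Hypothetical sketch: buckets events into fixed-size time windows."""

    def __init__(self, window_size, max_delay):
        self.window_size = window_size    # e.g. 10 seconds
        self.max_delay = max_delay        # maximum out-of-order delay, e.g. 2 seconds
        self.watermark = float("-inf")    # largest event time seen so far (assumption)
        self.windows = defaultdict(list)  # window start -> [(time, event), ...]

    def add(self, time, event):
        """Store one event; return the windows released by its arrival."""
        if self.watermark - time > self.max_delay:
            return []  # too far behind the newest event: discarded
        self.watermark = max(self.watermark, time)
        start = (time // self.window_size) * self.window_size
        self.windows[start].append((time, event))

        released = []
        for s in sorted(self.windows):
            window_end = s + self.window_size
            # A window is ready once no acceptable event can still land in it.
            if self.watermark - window_end > self.max_delay:
                released.append((s, sorted(self.windows.pop(s))))
        return released


buf = WindowBuffer(window_size=10, max_delay=2)
for time, name in [(1, "event 1"), (5, "event 2"), (11, "event 3"),
                   (9, "event 4"), (15, "event 5")]:
    for start, events in buf.add(time, name):
        print("released window starting at", start, events)
```

With the five events from the example, only the last call releases anything: the window starting at 0sec, holding events 1, 2 and 4 sorted by time.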


The question arises: what is the optimal size of the window?
If you have small time windows, you fill them fast and release them fast for further processing, but the ability to sort the events is limited.
On the other hand, using large time windows provides the ability to arrange a large number of events in order of arrival in the system, but the delay to process the ordered events increases proportionally.
If you want to order an infinite stream of time-based events perfectly, you will have to wait an eternity before processing them.
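
A small simulation makes the trade-off visible. It is a sketch with made-up numbers (one event per second, delivered with up to 5 seconds of random jitter), and it turns the out-of-order tolerance knob rather than the window size itself: the longer we are willing to wait, the fewer events we have to throw away, but every second of tolerance is a second added before a window can be released. The function name `dropped_fraction` is my own.

```python
import random


def dropped_fraction(times_in_arrival_order, max_delay):
    """Fraction of events that lag more than max_delay behind the newest event seen."""
    watermark = float("-inf")  # largest event time seen so far
    dropped = 0
    for t in times_in_arrival_order:
        if watermark - t > max_delay:
            dropped += 1       # too late: would be discarded
        else:
            watermark = max(watermark, t)
    return dropped / len(times_in_arrival_order)


rng = random.Random(42)
# Hypothetical stream: event i is produced at second i and delivered with 0 to 5 s of jitter.
deliveries = [(t + rng.uniform(0, 5), t) for t in range(1000)]
arrival_order = [t for _, t in sorted(deliveries)]

for max_delay in (0, 1, 2, 4, 8):
    fraction = dropped_fraction(arrival_order, max_delay)
    print(f"tolerance {max_delay}s -> {fraction:.1%} of events discarded")
```

The discarded fraction can only go down as the tolerance goes up, and it reaches zero once the tolerance covers the worst possible jitter. Waiting longer buys order; it never comes for free.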

This duality is an inherent impossibility to reconcile two aspects: reducing latency to a minimum (releasing events as fast as possible for processing: small time windows) and having a well-ordered set of events (large time windows). The same duality exists in computer science, to a certain extent, with latency vs performance.

This is exactly what the uncertainty principle in physics is about.

The uncertainty principle of Heisenberg is an important idea in the field of quantum mechanics.
It was formulated by Werner Heisenberg in the 1920s, when the basics of quantum physics were conceived.

It basically states that there is a limit to the precision with which we can measure the velocity (strictly speaking, the momentum) and the position of an object at the same time. Either we measure the speed with extreme precision and compromise on the exactitude of the object's position, or the other way around.
For objects at human scale the lack of precision is negligible. However, for atomic particles the limit becomes really apparent. This is a fundamental limitation. It does not result from our lack of technological ingenuity. It is a limit to the way we represent physical reality.

This fundamental uncertainty is what Einstein's famous quote is about: God doesn't play dice.
There is another equivalent formulation of the uncertainty principle which uses energy and time, or frequency and time, instead of velocity and position:
It is impossible to measure the exact frequency of a phenomenon quickly. If you want zero uncertainty on the frequency, you have to measure it for an infinite time.

People working on signal processing always have to trade off the precision of their frequency measurements against the length of their time windows.
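
Here is a small sketch of that compromise, using only the Python standard library. The numbers are my own assumptions: two tones at 10 Hz and 10.5 Hz, sampled 64 times per second, and a naive discrete Fourier transform. The frequency resolution of such a measurement is roughly one divided by the observation time, so a 1 second observation (1 Hz resolution) should not be able to separate the two tones, while an 8 second observation (0.125 Hz resolution) should.

```python
import cmath
import math


def observe(duration_s, sample_rate=64, freqs=(10.0, 10.5)):
    """Sample a sum of sine waves (hypothetical test signal) for duration_s seconds."""
    n = int(duration_s * sample_rate)
    return [sum(math.sin(2 * math.pi * f * i / sample_rate) for f in freqs)
            for i in range(n)]


def spectrum(samples):
    """Magnitudes of a naive discrete Fourier transform (positive frequencies only)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples)))
            for k in range(n // 2)]


def count_peaks(mags):
    """Count local maxima that reach at least half of the largest magnitude."""
    threshold = 0.5 * max(mags)
    return sum(1 for k in range(1, len(mags) - 1)
               if mags[k] > threshold
               and mags[k] > mags[k - 1]
               and mags[k] > mags[k + 1])


for duration in (1, 8):
    peaks = count_peaks(spectrum(observe(duration)))
    print(f"{duration}s of observation: {peaks} distinct frequency peak(s), "
          f"resolution about {1 / duration} Hz")
```

The short observation gives its answer quickly but blurs the two tones into a single peak; the long one separates them, at the price of waiting eight times longer.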

Let's say you want to investigate the migration pattern of birds. You want to know the frequency of their migrations. 
You go outside once in October and you see birds flying southwards. Can you deduce anything regarding the frequency of their migration? No, you cannot.
Let's say you go outside a couple of times during the year, and observe that once during this time frame they were flying southwards and once to the north. Can you extrapolate with certainty that this pattern is applicable every year and at all times? No, you cannot.

To be sure, you will have to observe for eternity.

What I wanted to illustrate here is that sometimes seemingly unrelated ideas are somehow connected.
And that the way we perceive and interpret reality sometimes keeps us from seeing the underlying concepts.





