
Blog draft: streaming vs. batch

Not quite as big a distinction as you might think.

Page 395: Business processes have long imposed artificial bounds on data by cutting discrete batches. Keep in mind the true unboundedness of your data; streaming ingestion systems are simply a tool for preserving the unbounded nature of data so that subsequent steps in the lifecycle can also process it continuously. ^artificial-bounds-cutting-discrete-batches

In reality, nearly all streaming systems batch data at some level (micro-batches, buffered network transfers), because handling blocks of data amortizes per-record overhead and improves throughput.
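A minimal sketch of the micro-batching idea in Python, assuming a simple size-or-time flush policy. The function name `micro_batches` and its parameters are hypothetical and not taken from any particular engine. Records accumulate into a block that is handed downstream once it is full or has waited long enough.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(records: Iterable[T], max_size: int, max_wait_s: float,
                  clock: Callable[[], float] = time.monotonic) -> Iterator[List[T]]:
    """Group a (possibly unbounded) record stream into blocks.

    A block is emitted when it holds max_size records or when max_wait_s has
    elapsed since its first record arrived, whichever comes first.
    Simplification: time is only checked when a record arrives; a real engine
    would also flush on a timer during idle periods.
    """
    batch: List[T] = []
    started = 0.0
    for record in records:
        if not batch:
            started = clock()  # the block's age is measured from its first record
        batch.append(record)
        if len(batch) >= max_size or clock() - started >= max_wait_s:
            yield batch
            batch = []
    if batch:  # a bounded input has ended: flush whatever is left
        yield batch
```

For example, `list(micro_batches(range(7), max_size=3, max_wait_s=1.0))` groups the records into blocks of at most three, with a final partial block for the remainder. Larger blocks raise throughput at the cost of per-record latency, which is the knob the time limit bounds.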

Streaming joins allow for interesting quality-of-service (QoS) trade-offs.
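One place where a streaming join exposes such trade-offs is state retention. To join two unbounded streams, each side has to buffer records for a while so that a matching record from the other side can still find them. The sketch below is a hypothetical interval join in plain Python, not any engine's API. It assumes events arrive in timestamp order. The `bound` parameter trades latency and memory against completeness, because matches further apart than `bound` are silently dropped.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (side, key, timestamp, payload); side is "L" or "R".
Event = Tuple[str, str, int, str]


def interval_join(events: Iterable[Event], bound: int) -> List[Tuple[str, str, str]]:
    """Symmetric hash join: emit (key, left_payload, right_payload) whenever a
    left and a right event share a key and are at most `bound` time units apart.

    Assumes timestamp-ordered input, so buffered events older than
    (current timestamp - bound) can never match again and are evicted.
    """
    state: Dict[str, Dict[str, List[Tuple[int, str]]]] = {
        "L": defaultdict(list), "R": defaultdict(list)}
    out: List[Tuple[str, str, str]] = []
    for side, key, ts, payload in events:
        other = "R" if side == "L" else "L"
        # Evict state that is too old to join with this or any later event.
        for buf in state.values():
            for k in list(buf):
                buf[k] = [(t, p) for t, p in buf[k] if ts - t <= bound]
        # Probe the other side's buffer, then buffer this event for future matches.
        for _, p in state[other][key]:
            out.append((key, payload, p) if side == "L" else (key, p, payload))
        state[side][key].append((ts, payload))
    return out
```

A larger `bound` finds more matches but holds more state and delays the point at which a record is known to have no partner; a smaller one does the opposite.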

In Flink, the DataStream API unifies streaming and batch, so the execution mode becomes a runtime decision (the `execution.runtime-mode` option: `STREAMING`, `BATCH`, or `AUTOMATIC`).
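As a rough analogy in plain Python (this is not Flink's API), transformation logic written against an iterator does not need to know whether its input ever ends. The caller decides whether to feed it a bounded or an unbounded source. The `pipeline` and `sensor` names and the Celsius-to-Fahrenheit logic are hypothetical.

```python
import itertools
from typing import Iterable, Iterator


def pipeline(readings: Iterable[float]) -> Iterator[float]:
    """Logic written once, with no knowledge of whether the input is bounded:
    drop physically impossible Celsius readings and convert the rest to Fahrenheit."""
    for c in readings:
        if c >= -273.15:
            yield c * 9 / 5 + 32


# Bounded source: a finite list, so the pipeline runs to completion.
bounded = list(pipeline([0.0, 100.0, -300.0]))


def sensor() -> Iterator[float]:
    """Unbounded source: yields readings forever."""
    for i in itertools.count():
        yield float(i)


# With an unbounded source, the caller decides how much output to consume.
first_three = list(itertools.islice(pipeline(sensor()), 3))
```

Here `bounded` is `[32.0, 212.0]`, while `pipeline(sensor())` would run forever if fully consumed. The transformation code is identical in both cases.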

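So-called batch processing is really just processing of bounded data sets, where each phase of processing can complete before the next one starts. This allows for certain scheduling optimizations, such as running stages one after another instead of all at once.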
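A minimal sketch of what completing a phase buys, using a simple word count as the assumed example (function names hypothetical). With a bounded input, the counting phase can finish and emit one final answer. With an unbounded input, the same aggregation can only emit running updates, because there is never a point where the counts are known to be final.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def batch_counts(words: Iterable[str]) -> Dict[str, int]:
    """Bounded input: consume everything, then emit a single final result.
    Nothing downstream needs to run until this phase has completed."""
    return dict(Counter(words))


def streaming_counts(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Unbounded input: emit the updated count for a word every time it arrives."""
    counts: Counter = Counter()
    for w in words:
        counts[w] += 1
        yield w, counts[w]
```

On `["a", "b", "a"]`, `batch_counts` returns `{"a": 2, "b": 1}`, while `streaming_counts` yields `("a", 1)`, `("b", 1)`, `("a", 2)`. Downstream consumers of the streaming version must cope with superseded values.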

Sometimes thinking in batches is helpful, for example when deciding whether a given set of data is valid. These batches might be better thought of as windows, though.
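A minimal sketch of window-based validation, assuming tumbling (fixed-size, non-overlapping) event-time windows and two example checks: a minimum record count and no missing values. The record shape, thresholds, and the `validate_windows` name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (event_time_seconds, value); None marks a missing value.
Record = Tuple[int, object]


def validate_windows(records: Iterable[Record], window_s: int,
                     min_records: int) -> Dict[int, List[str]]:
    """Assign records to tumbling windows of `window_s` seconds and check each
    window as a unit. Returns {window_start: [problems]} for failing windows only.
    Windows that received no records at all are not reported."""
    windows: Dict[int, List[object]] = defaultdict(list)
    for ts, value in records:
        windows[ts - ts % window_s].append(value)  # window start = ts rounded down
    problems: Dict[int, List[str]] = {}
    for start, values in sorted(windows.items()):
        issues: List[str] = []
        if len(values) < min_records:
            issues.append(f"{len(values)} record(s), expected at least {min_records}")
        if any(v is None for v in values):
            issues.append("contains missing values")
        if issues:
            problems[start] = issues
    return problems
```

Here the input is bounded, so every window is checked at the end. A streaming engine would instead check each window once it is considered complete, for example when a watermark passes the window's end.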
