Data Lake for Enterprises

Sink as an event destination

From a Flume perspective, a sink represents the connector to a target system. Flume has built-in sink connectors for connecting to various enterprise systems over various protocols, in much the same way as it does for sources.
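As a rough illustration of how a sink is wired up, the following is a minimal sketch of a Flume agent configuration that delivers events to HDFS. The agent and component names (a1, r1, c1, k1), the netcat source, the port, and the HDFS path are all assumptions chosen for the example, not values taken from this book.

```properties
# Hypothetical agent "a1" with one source, one channel, and one sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listens on a local TCP port (assumed here only to feed the sink).
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Channel: in-memory buffer between the source and the sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: the event destination. The built-in HDFS sink is used here;
# the path below is an assumed example location.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
# Use the agent's local time to resolve the date escapes in the path.
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```

Note that a source can write to several channels (the plural `channels` property), whereas a sink reads from exactly one channel (the singular `channel` property). Pointing the agent at a different target system is largely a matter of changing the sink `type` and its type-specific properties.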

While Flume was the initial approach to near-real-time stream processing, it fell short of being a true near-real-time framework: it could not readily accommodate custom processing of events/messages, and custom components had to be developed for that purpose. The setup and deployment of Flume were also static at any given point in time, tied to a given configuration. For any change, the configuration had to be edited and the Flume process had to be restarted. This posed a limitation for near-real-time use cases.

These limitations were soon addressed by frameworks such as Storm and Spark Streaming. In the context of this book, and in order to apply the Lambda architecture to Data Lakes, we will primarily consider Spark Streaming and Flink.