Data Lake for Enterprises

DStreams

DStreams (discretized streams) represent a sequence of RDDs (Resilient Distributed Datasets) for both input and output data streams. Spark Streaming provides many DStream implementations as part of its framework, while various frameworks that support Spark Streaming provide their own implementations of RDDs that can be used for DStreams.

These DStreams are divided into micro-batches before being submitted to the core Spark engine for processing:

Figure 09: Spark streaming streams
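
The micro-batching idea can be illustrated with a small conceptual sketch in plain Python. This is not Spark's API: the names `micro_batches`, `process_batch`, and the sample `events` are hypothetical, and each inner list merely stands in for the RDD that would hold one batch interval's worth of records. It assumes every record carries a timestamp in seconds and that the batch interval is fixed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def micro_batches(
    events: Iterable[Tuple[float, str]], batch_interval: float
) -> List[List[str]]:
    """Group (timestamp, value) records into consecutive fixed-width batches.

    Each returned list plays the role of one RDD in a DStream (an
    illustrative assumption, not how Spark stores data internally).
    """
    buckets: Dict[int, List[str]] = defaultdict(list)
    for timestamp, value in events:
        # The interval index decides which micro-batch the record joins.
        buckets[int(timestamp // batch_interval)].append(value)

    if not buckets:
        return []

    first, last = min(buckets), max(buckets)
    # In this sketch, intervals with no records still yield an empty batch.
    return [buckets.get(index, []) for index in range(first, last + 1)]


def process_batch(batch: List[str]) -> int:
    """Stand-in for the work the engine would run on one batch: count records."""
    return len(batch)


# Hypothetical input: four records arriving over roughly four seconds.
events = [(0.2, "a"), (0.7, "b"), (1.1, "c"), (3.5, "d")]
batches = micro_batches(events, batch_interval=1.0)
counts = [process_batch(batch) for batch in batches]
```

With a one-second interval, the four sample records fall into four consecutive batches, one of which is empty, and each batch is handed to the processing function separately. That is the essence of treating a continuous stream as a series of small, independently processed datasets.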