Sampling/filtering RDDs to pick out relevant data points