Coordination and communication
We need to make sure that when the find process has nothing more to report, which means that it has found all the qualifying filenames, egrep should also stop!
Similarly, if any of the processes in the pipeline quits for any reason, then the entire pipeline should stop.
For example, here is a pipeline that computes the factorial of 1,000:
% seq 1000 | paste -s -d '*' | bc
40238726007709377354370243392300398571937486421071463254379991042993\
85123986290205920442084869694048004799886101971960586316668729948085\
.... rest of the output truncated
The pipeline has three filters: seq, paste, and bc. The seq command just prints numbers from 1 to 1,000 and puts them in the console. The shell arranges things so that the output gets fed into the pipe that is consumed by the paste filter.
The paste filter now joins all the lines with the * delimiter. It just does that little bit, and outputs the line to standard output, as shown in the following screenshot:
The paste command writes to the console; the shell has arranged the output to go into a pipe again. At the other end, the consumer is bc. The bc command or filter is capable of arbitrary precision arithmetic; in simpler terms, it can perform very large computations.
When the seq command exits normally, this triggers an EOF (end of file) on the pipe. This tells paste that the input stream has nothing more to read, so it does the joining, writes the output on the console (which is going to a pipe really), and quits in turn.
This quitting results in an EOF for the bc process, so it computes the multiplication, prints the result to the standard output, which is really a console, and finally quits. This is an ordered shutdown; no more work needs to be done, so exit and relinquish the computing resources for other concurrent processes, if there are any. The melodramatic term for this marker is poison pill. See https://dzone.com/articles/producers-and-consumers-part-3 for more information.
At this point, the pipeline processing is done and we get back to the shell prompt again, as shown in the following diagram:
Unbeknownst to all the filters participating in the pipeline, the parent shell has arranged for this coordination. This ability of the framework to be composed of smaller parts without the parts themselves being aware of the composition is a great design pattern, called pipes and filters. We will see how composition is one central theme, yielding robust concurrent programs.
What happens when the seq process produces numbers way too fast? Would the consumer (paste in this case) get overwhelmed? Aha, no! The pipeline also has an implicit flow control built into it. This is yet another central theme, called back-pressure, where the faster producer (or consumer) is forced to wait so the slower filter catches up.
Let's next look at this flow control mechanism.