One task is to merge (join) two large-scale distributed datasets. One dataset contains all of the particles with their positions and velocities; the other contains only clusters, each listed with the unique ids of its particles.

An example query is:
“given a specific cluster, extract all of the positions of the particles in that cluster.”
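
As a rough illustration of this query, the sketch below joins two small in-memory tables on the particle id. The table layouts, the sample values, and the function name `positions_in_cluster` are assumptions made for the example; a real run would operate on distributed datasets rather than Python dictionaries.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical particle table: particle id -> (position, velocity).
particles: Dict[int, Tuple[Vec3, Vec3]] = {
    1: ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    2: ((1.0, 2.0, 3.0), (0.0, 1.0, 0.0)),
    3: ((4.0, 5.0, 6.0), (0.0, 0.0, 1.0)),
}

# Hypothetical cluster table: cluster id -> unique ids of its particles.
clusters: Dict[int, List[int]] = {10: [1, 3], 20: [2]}


def positions_in_cluster(cluster_id: int) -> List[Vec3]:
    """Join the cluster table with the particle table on particle id."""
    member_ids = clusters.get(cluster_id, [])
    # Look up each member in the particle table and keep only its position.
    return [particles[pid][0] for pid in member_ids if pid in particles]


print(positions_in_cluster(10))
```

Here the particle table plays the role of the hash-indexed side of the join, so each member id costs a single lookup.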

A second task (multi-stage merge/join) is:
“given a third dataset that contains the merging and splitting of clusters over time, extract the positions, over time, of all of the particles that merge into a specific cluster.”
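
One possible way to picture the multi-stage join is sketched below. It assumes a layout the task description does not specify: positions and cluster membership are keyed by timestep, and the third dataset is a list of hypothetical `(time, source_cluster, destination_cluster)` links, meaning the source cluster at `time` becomes part of the destination cluster at `time + 1`. All names and values are illustrative.

```python
from collections import defaultdict

# Hypothetical per-timestep positions: (time, particle id) -> position.
positions = {
    (0, 1): (0.0, 0.0, 0.0), (0, 2): (1.0, 0.0, 0.0),
    (0, 3): (5.0, 0.0, 0.0), (0, 4): (9.0, 9.0, 9.0),
    (1, 1): (0.5, 0.0, 0.0), (1, 2): (1.5, 0.0, 0.0),
    (1, 3): (4.0, 0.0, 0.0), (1, 4): (9.0, 9.0, 8.0),
}

# Hypothetical membership: (time, cluster id) -> particle ids.
membership = {
    (0, 100): [1, 2], (0, 101): [3], (0, 102): [4],
    (1, 200): [1, 2, 3], (1, 201): [4],
}

# Hypothetical merge/split links: (time, source cluster, destination cluster).
links = [(0, 100, 200), (0, 101, 200), (0, 102, 201)]


def positions_merging_into(target_cluster, target_time):
    """Walk the links backward in time and join each stage with the positions."""
    # Index the links by destination so each backward step is a lookup.
    parents = defaultdict(set)
    for t, src, dst in links:
        parents[(t + 1, dst)].add(src)

    result = {}
    frontier = {target_cluster}
    t = target_time
    while frontier:
        # Stage 1: clusters -> particle ids at this timestep.
        pids = sorted({pid for c in frontier for pid in membership.get((t, c), [])})
        # Stage 2: particle ids -> positions at this timestep.
        result[t] = [(pid, positions[(t, pid)]) for pid in pids if (t, pid) in positions]
        # Step back to the clusters that merged into the current ones.
        frontier = {src for c in frontier for src in parents.get((t, c), ())}
        t -= 1
    return result


history = positions_merging_into(200, 1)
for time in sorted(history):
    print(time, history[time])
```

Particle 4 never appears in the result because its cluster lineage (102 to 201) does not lead into cluster 200.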

Finally, to exercise segmented operations (group by and reduce):
“given all three datasets, calculate average statistics, such as the mean velocity or position, for every cluster over time.”
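
The group-by-and-reduce step can be sketched as follows, under the assumption that velocities and cluster membership are keyed by timestep. For brevity the sketch averages velocity per `(time, cluster)` pair and does not follow clusters through the merge and split dataset; the data layout and the function name `mean_velocity_per_cluster` are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-timestep velocities: (time, particle id) -> velocity.
velocities = {
    (0, 1): (1.0, 0.0, 0.0),
    (0, 2): (3.0, 0.0, 0.0),
    (0, 3): (0.0, 2.0, 0.0),
}

# Hypothetical membership: (time, cluster id) -> particle ids.
membership = {(0, 100): [1, 2], (0, 101): [3]}


def mean_velocity_per_cluster():
    """Group particles by (time, cluster) and reduce to a mean velocity."""
    # Each group keeps a running [sum_x, sum_y, sum_z, count].
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for key, pids in membership.items():
        for pid in pids:
            v = velocities.get((key[0], pid))
            if v is None:
                continue  # skip particles with no recorded velocity
            acc = sums[key]
            acc[0] += v[0]
            acc[1] += v[1]
            acc[2] += v[2]
            acc[3] += 1
    # Final reduce: divide each component sum by the group size.
    return {k: (sx / n, sy / n, sz / n) for k, (sx, sy, sz, n) in sums.items() if n}


print(mean_velocity_per_cluster())
```

Keeping a sum and a count per group, rather than a running mean, means partial results from different partitions can be combined by simple addition before the final division.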


Note: This page is currently under construction and will be updated soon.