Mark Litwintschik took a large open-source data set (1.1 billion taxi rides, occupying hundreds of gigabytes of storage) and ran benchmark queries against it on a variety of systems. Perhaps the most humble of these systems is a cluster of three Raspberry Pi computers. This webpage describes how he set up the software on that cluster.
Mark Litwintschik. 1.1 Billion Taxi Rides with Spark 2.2 & 3 Raspberry Pi 3 Model Bs. September 17, 2017. Available in HTML format.