At CloudCamp St. Louis, Michael Groner, my colleague at Appistry, gave a talk on the MapReduce framework. The 5-minute Lightning Talk was a condensed version of the 45-minute presentation he originally delivered at the St. Louis Lambda Lounge (a local group focused on dynamic and functional programming). You can check out his slides below.
Michael received great feedback on his presentation and generated a lot of interest in MapReduce. As a follow-up, I thought it would be nice to feature a brief Q&A with him, which I present below:
Sam: What is MapReduce?
Michael: MapReduce is a software framework for processing massive data sets across distributed computers. It has its origins in functional programming, where the “map” and “reduce” operations help developers process data sets easily. For applications that must translate gigabytes, terabytes, or even petabytes of data into usable data structures and then collect those into end results, MapReduce is a good candidate.
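To make the functional-programming roots concrete, here is a minimal Python sketch (my own illustration, not taken from Michael's talk). It uses the built-in `map` and `functools.reduce`; the word list and variable names are assumptions chosen only for the example.

```python
from functools import reduce

# Hypothetical input: a tiny "data set" of words.
words = ["cloud", "map", "reduce", "cloud"]

# "map": apply a function to each element independently.
lengths = list(map(len, words))

# "reduce": fold the mapped values into a single result.
total = reduce(lambda acc, n: acc + n, lengths, 0)

print(lengths, total)
```

Because each element is mapped independently, the map step is the part that can be spread across many machines; the reduce step then combines the partial results.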
Sam: Where did it come from? Is it new?
Michael: MapReduce’s popularity surge started in 2004 with the publication of “MapReduce: Simplified Data Processing on Large Clusters” by Google. In this paper Google illustrated the framework they use to construct an index of the internet through a simple programming paradigm.
Sam: Why should people care? What does MapReduce have to do with cloud computing?
Michael: This framework contains many of the key architecture principles promoted for cloud computing:
- Scale – The framework is able to grow and expand in direct proportion to the number of machines applied to the system.
- Reliable – The framework is able to handle the loss of a node and restart the work somewhere else.
- Affordable – As reliability is accounted for in the framework, commodity hardware can be used. As scale is accounted for, a user can start small and add additional hardware as necessary.
- Simple – A user simply provides the implementation of a map and reduce function in order to process large quantities of data.
Due to these principles, the MapReduce framework has become one of the most popular platform tools for building cloud applications.
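The "Simple" point above can be sketched with the classic word-count example. This is a single-process stand-in written for illustration, not the API of Hadoop or of Google's system: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and a real framework would distribute the map and reduce calls across machines and rerun them on failure.

```python
from collections import defaultdict


def map_fn(document):
    """User-supplied map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """User-supplied reduce: combine all counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(documents, mapper, reducer):
    """Toy stand-in for the framework: map, group values by key, then reduce."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


# Hypothetical input documents.
docs = ["the cloud scales", "the cloud is simple"]
result = run_mapreduce(docs, map_fn, reduce_fn)
print(result)
```

In this sketch the user writes only `map_fn` and `reduce_fn`; everything in `run_mapreduce` is the part a framework would take over.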
Sam: Can you give us some examples of who is using it?
Michael: MapReduce applications are used by companies like Google, Yahoo, and Visa, and by the intelligence community, among others, to extract results from the large data sets they produce.
Sam: How should someone go about learning more?
Michael: To learn more I suggest reading the original Google paper and looking into a MapReduce framework such as the Apache Hadoop project.
Thanks, Michael!
Are you using MapReduce? We would love to hear about your experiences with it!
