Map Reducer Design Pattern

A Powerful Tool for Efficient Data Processing in Software Development.

Ignacio Chiazzo
Dev Genius

--

The Map Reducer Pattern is a popular design pattern used in software development to process large amounts of data quickly and efficiently. By breaking down a large dataset into smaller, more manageable chunks, the Map Reducer Pattern can significantly reduce processing time and improve performance.

In this article, we will dive deep into the Map Reducer Pattern, exploring how it works and why it’s such a powerful tool for data processing.

Imagine we want to build a website called Flight Finder that automates searching for the best flight from many different websites. Our solution must use other systems to find the best tickets, collect them, and return the best options.

Flight Finder system user flow.

Each airline provides APIs to communicate with its system. Given some parameters, a user will ask our system to search for a flight. Our system will call each API, combine them, and return the user the best options. We will do that in two phases.

Phase I: Map phase

Given the user input, the first step is to call all the APIs to ask for the best flights available. Since the APIs use different modelling, we must create a standard class Flight with our own structure (dates, origin, destination, price and seat type). The second step is to iterate on the output of each API response and create Flight objects.

Map Reduce Pattern Map Phase.

After we have the set of Flight objects, we must select the best among them.

Phase II: Reduce phase

We will sort and filter the Flights’ set considering some parameters such as price, number of stops and total trip time and return the best options to the user.

Map Reduce Pattern used for the Flight Finder solution. Map and Reduce phases together.

Note: In an ideal world, we would use Caching and Batching, but I omitted them in this example to show the pattern.

Map Reduce Pattern

In the example above, we have used the popular pattern Map-Reduce. It’s mainly used for processing large amounts of data in a distributed and parallel manner.

It was first introduced by Google in 2004 and has since become widely adopted in big data.

The MapReduce pattern involves two main stages: Map and Reduce. In the map stage, the input data is divided into chunks and processed in parallel by multiple nodes in a cluster. Each node applies a mapping function to the input data, producing a standard set of objects.

In the reduce stage, the shared object is sorted and filtered following specific criteria and then processed by multiple nodes in parallel. Each node applies a reducing function to the common object, producing a set of final objects.

Map Reduce Pattern

The MapReduce pattern is often used to perform complex data processing tasks, such as distributed sorting, searching, and data analysis. It provides a scalable, fault-tolerant and efficient way to process large amounts of data in parallel across a distributed system.

If you didn’t know about this pattern, you have likely used it before without knowing its name.

Conclusion

The Map Reduce pattern is an intuitive and powerful pattern widely used for data processing, and it can also be used in many other situations where data needs to be mapped and reduced.

--

--