Can we use MongoDB's map-reduce without Hadoop?

Can map-reduce algorithms written for MongoDB later be ported to Hadoop?


In our company, we have a MongoDB database containing a lot of unstructured data, on which we need to run map-reduce algorithms to generate reports and other analyses. There are two approaches to implementing the required analyses:

  1. One approach is to extract the data from MongoDB into a Hadoop cluster and perform the analysis entirely on the Hadoop platform. However, this requires significant investment in preparing the platform (software and hardware) and in training the team to work with Hadoop and write map-reduce jobs.

  2. Another approach is to focus only on designing the map-reduce algorithms and to run them using MongoDB's built-in map-reduce functionality. This would let us build an initial prototype of the final reporting system. I know that MongoDB's map-reduce is much slower than Hadoop's, but the data is not yet large enough for that to be a bottleneck, at least not for the next six months.

The question is whether, with the second approach, algorithms written for MongoDB can later be ported to Hadoop with few changes and little redesign. MongoDB only supports JavaScript, but programming-language differences are easy to deal with. Are there, however, fundamental differences between MongoDB's and Hadoop's map-reduce models that could force us to substantially redesign the algorithms when porting to Hadoop?
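To make the question concrete, here is a minimal Python sketch of the shared model (the field names and sample documents are made up for illustration; this is not a real MongoDB or Hadoop API). The one extra constraint MongoDB documents for its reduce function is that it may be re-invoked on its own partial outputs, so it must be associative and commutative; Hadoop places the same constraint on combiners, though a plain Hadoop reducer sees all values for a key at once:

```python
def map_doc(doc):
    # Emit (key, value) pairs, as a MongoDB map function does via emit().
    yield doc["category"], doc["amount"]

def reduce_values(key, values):
    # Must tolerate re-reduction: a sum of partial sums
    # equals the sum over all values.
    return sum(values)

docs = [
    {"category": "a", "amount": 1},
    {"category": "a", "amount": 2},
    {"category": "b", "amount": 5},
]

# Group emitted pairs by key, as both frameworks do between map and reduce.
groups = {}
for doc in docs:
    for key, value in map_doc(doc):
        groups.setdefault(key, []).append(value)

# One-shot reduce: what a Hadoop reducer sees.
one_shot = {k: reduce_values(k, v) for k, v in groups.items()}

# Incremental re-reduce: what MongoDB may do with partial results.
partials = [reduce_values("a", [1]), reduce_values("a", [2])]
rereduced = reduce_values("a", partials)

assert one_shot["a"] == rereduced == 3
```

An algorithm whose reduce already satisfies this property should port between the two models with mostly mechanical changes.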


Reply:


If you prototype with Mongo, there will definitely be a translation task in the end.

When you run a MapReduce job on MongoDB, the data source and its structure are a given. When you eventually convert to Hadoop, your data structures may not look the same. You could use the MongoDB Hadoop connector to access Mongo data directly from Hadoop, but that is not as straightforward as you might think. The time to figure out exactly how best to do the conversion is easier to justify once you have a prototype in place, IMO.
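If the conversion ends up being a one-off export rather than the connector, a minimal sketch could look like the following (newline-delimited JSON as the interchange format is an assumption, as is driving it with pymongo; any iterable of dicts works):

```python
import json

def export_ndjson(docs, fileobj):
    """Write documents as newline-delimited JSON, a format Hadoop can
    split and read line by line with its plain text input handling."""
    count = 0
    for doc in docs:
        doc = dict(doc)
        doc.pop("_id", None)  # ObjectId is not JSON-serializable by default
        fileobj.write(json.dumps(doc) + "\n")
        count += 1
    return count

# With a live database this would be fed by pymongo (illustrative names):
#   from pymongo import MongoClient
#   docs = MongoClient()["mydb"]["events"].find()
# Here we use plain dicts instead:
import io
buf = io.StringIO()
n = export_ndjson([{"_id": 1, "user": "a"}, {"_id": 2, "user": "b"}], buf)
assert n == 2
```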

You will need to translate the map-reduce functions themselves, but the basic pseudocode should apply well to both systems. You won't find anything in MongoDB that cannot be done in Java, or that is much more complex to do in Java.


You can use map-reduce algorithms in Hadoop without programming them in Java. It's called streaming and it works like Linux piping. If you believe you can port your mappers and reducers to read from standard input and write to standard output, this should work fine. Here is an example blog post showing how to use map-reduce functions written in Python with Hadoop.
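A minimal streaming-style word count can be sketched like this (the sample input is made up; in streaming, the mapper and reducer are plain processes that read stdin and write stdout, wired together by Hadoop roughly like `mapper | sort | reducer`):

```python
from itertools import groupby

def mapper(lines):
    # Emit one "word<TAB>1" line per word
    # (tab is Hadoop streaming's default key/value separator).
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    # Hadoop hands the reducer mapper output sorted by key,
    # so equal keys arrive on adjacent lines.
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Simulate the mapper | sort | reducer pipeline on sample input.
counts = list(reducer(sorted(mapper(["to be or not to be"]))))
assert counts == ["be\t2", "not\t1", "or\t1", "to\t2"]
```

On a real cluster you would put the two functions into standalone scripts and submit them with the hadoop-streaming jar's `-mapper`, `-reducer`, `-input`, and `-output` options.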



You can also use the MongoDB Hadoop connector.

