

MapReduce specifications


Unique identifier OMICS_15542
Name MapReduce
Software type Package/Module
Interface Command line interface
Restrictions to use None
Operating system Unix/Linux
Computer skills Advanced
Stability Stable
Source code URL
Maintained Yes


No version available


  • Jeffrey Dean

Publication for MapReduce

MapReduce citations


Efficient iterative virtual screening with Apache Spark and conformal prediction

J Cheminform
PMCID: 5833896
PMID: 29492726
DOI: 10.1186/s13321-018-0265-z

[…] Apache Spark [] is a parallel programming and execution framework for cluster computing that is fast and easy to use. In terms of speed, it's much faster than the well-known Google MapReduce [] and its open source implementation, Apache Hadoop. One reason for its agility is keeping the data in-memory with support for iterative processing. A detailed discussion is provided in our […]


Scalable non-negative matrix tri-factorization

BioData Min
PMCID: 5746986
PMID: 29299064
DOI: 10.1186/s13040-017-0160-6

[…] n to latent factor learning for non-negative matrix tri-factorization. While there exists an initial solution to speed up the latent factor learning procedure using accelerated matrix operations on a MapReduce cluster [], this approach is not optimal because it requires a specialized architecture []. Even more importantly, in the case of two-factor non-negative matrix factorization, it was shown t […]


Automating Construction of Machine Learning Models With Clinical Big Data: Proposal Rationale and Methods

PMCID: 5596298
PMID: 28851678
DOI: 10.2196/resprot.7757

[…] nt. 4. We will systematically compile the first list of regularly used operators for temporally aggregating clinical attributes. The list can be reused for future clinical data analysis studies. Using MapReduce [] for distributed computing, we will provide the first implementation of many aggregation operators not offered by current big data software such as Hadoop [] and Spark []. 5. We will estima […]


Efficient Retrieval of Massive Ocean Remote Sensing Images via a Cloud Based Mean Shift Algorithm

PMCID: 5539728
PMID: 28737699
DOI: 10.3390/s17071693

[…] Firstly, the construction algorithm is respectively applied into Hadoop stand-alone mode and Hadoop distributed mode (using MapReduce programming model). Typhoon data size in the experiment is 100 MB, 1 GB, 10 GB, 100 GB, and comparison results are shown in and . From and , when the data size of typhoon RS images is small […]


Benchmarking distributed data warehouse solutions for storing genomic variant information

PMCID: 5504537
DOI: 10.1093/database/bax049

[…] The benchmark measures the performance of four distributed query engines described above, i.e. Apache Hive (MapReduce), Apache Spark (versions 1.x and 2.x), Presto, and Apache Impala. First three engines were tested using two different file formats: ORC and Parquet. Apache Impala, which does not support ORC […]


BIDS apps: Improving ease of use, accessibility, and reproducibility of neuroimaging data analysis methods

PLoS Comput Biol
PMCID: 5363996
PMID: 28278228
DOI: 10.1371/journal.pcbi.1005209

[…] e to the BIDS standard used for the organization of input data, errors caused by manually provided metadata are minimized. Finally, the unified command-line interface structure combined with flexible MapReduce-style execution schemes lends BIDS Apps to easy integration into data analysis platforms as well as efficient execution on computational clusters independently of the particular scheduling s […]
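Several of the citing works above rely on the MapReduce programming model (map, then group by key, then reduce). As a rough illustration only, the following single-process Python sketch walks through those three phases on the word-count task commonly used to introduce the model. It is not Google's implementation or Hadoop's API; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `map_reduce` are hypothetical, and a real framework would distribute each phase across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Map: emit an intermediate (key, value) pair, here (word, 1), per token.
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    # Shuffle: group all intermediate values that share the same key.
    # In a distributed framework this step moves data between workers.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine the grouped values for one key into a final result.
    return key, sum(values)


def map_reduce(documents):
    # Run map over every input record, then shuffle, then reduce per key.
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(map_reduce(docs))
```

Because each map call sees only one record and each reduce call sees only one key, both phases can in principle run in parallel on independent workers; only the shuffle requires moving data between them.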



MapReduce institution(s)
Google, Mountain View, CA, USA

MapReduce reviews


Be the first to review MapReduce