mCUDA-MEME is an ultrafast, scalable motif discovery algorithm for multiple GPUs. It is based on the MEME (version 4.4.0) algorithm and uses a hybrid combination of the CUDA, MPI and OpenMP parallel programming models. The algorithm extends CUDA-MEME in both accuracy and speed. It has been tested on a GPU cluster with eight compute nodes and two Fermi-based Tesla S2050 (and Tesla-architecture Tesla S1070) quad-GPU computing systems, running Linux with the MPICH2 library. The experimental results showed that the algorithm scales well with both dataset size and the number of GPUs. At present, the OOPS and ZOOPS models are supported, which are sufficient for most motif discovery applications. The algorithm has been used in CompeteMOTIFs, a motif discovery platform developed to help biologists find novel as well as known motifs in peak datasets from transcription factor binding experiments such as ChIP-seq and ChIP-chip.
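In MEME's terminology, OOPS assumes exactly one motif occurrence per sequence, while ZOOPS allows zero or one occurrence per sequence. The difference shows up in how the posterior probability of a site at each start position is normalized. The sketch below illustrates that per-sequence step for a single motif; it is not code from mCUDA-MEME. The function names are hypothetical, and it assumes a zero-order background model, a motif probability matrix without pseudocounts, and a prior `gamma` that a sequence contains a site.

```python
def site_likelihood_ratios(seq, pwm, bg):
    """For each start j, return P(bases | motif at j) / P(bases | background)."""
    w = len(pwm)
    if len(seq) < w:
        raise ValueError("sequence shorter than motif width")
    ratios = []
    for j in range(len(seq) - w + 1):
        lr = 1.0
        for k in range(w):
            base = seq[j + k]
            lr *= pwm[k][base] / bg[base]  # per-column motif vs. background odds
        ratios.append(lr)
    return ratios


def oops_posteriors(seq, pwm, bg):
    """OOPS: exactly one site per sequence, so posteriors sum to 1."""
    lrs = site_likelihood_ratios(seq, pwm, bg)
    total = sum(lrs)
    return [lr / total for lr in lrs]


def zoops_posteriors(seq, pwm, bg, gamma):
    """ZOOPS: a site is present with prior gamma, spread uniformly over starts.

    The extra (1 - gamma) term accounts for the 'no site' case, so the
    posteriors sum to the probability that the sequence contains a site.
    """
    lrs = site_likelihood_ratios(seq, pwm, bg)
    lam = gamma / len(lrs)  # prior probability of a site at any one start
    denom = (1.0 - gamma) + lam * sum(lrs)
    return [lam * lr / denom for lr in lrs]
```

With `gamma = 1.0` the ZOOPS posteriors reduce to the OOPS ones, which is one way to see that OOPS is the special case in which every sequence is assumed to contain a site.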