Efficient implementation of the overlap operator on multi-GPUs