An Optimized FFT-based Direct Poisson Solver on CUDA GPUs