TY - GEN
T1 - Fast and parallel computation of the discrete periodic Radon transform on GPUs, Multicore CPUs and FPGAs
AU - Carranza, Cesar
AU - Pattichis, Marios
AU - Llamocca, Daniel
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/8/29
Y1 - 2018/8/29
N2 - The Discrete Periodic Radon Transform (DPRT) has many important applications in reconstructing images from their projections and has recently been used in fast and scalable architectures for computing 2D convolutions. Unfortunately, the direct computation of the DPRT involves O(N^3) additions and memory accesses that can be very costly in single-core architectures. The current paper presents new and efficient algorithms for computing the DPRT and its inverse on multi-core CPUs and GPUs. The results are compared against specialized hardware implementations (FPGAs/ASICs). The results provide significant evidence of the success of the new algorithms. On an 8-core CPU (Intel Xeon), with support for two threads per core, FastDirDPRT and FastDirInvDPRT achieve a speedup of approximately 10× (up to 12.83×) over the single-core CPU implementation. On a 2048-core GPU (GTX 980), FastRayDPRT and FastRayInvDPRT achieve speedups in the range of 526 (for 127 × 127) to 873 (for 1021 × 1021), which approximate ideal speedups of what can be achieved. The DPRT can be computed exactly and in real-time (30 frames per second) for 1471 × 1471 images using FastRayDPRT on the GPU. Furthermore, the GPU algorithms approximate the performance of an efficient FPGA implementation using 2N parallel cores at 100 MHz.
KW - Discrete Periodic Radon Transform
KW - FPGA
KW - GPU
KW - Multi-core CPU
KW - Parallel Architecture
UR - http://www.scopus.com/inward/record.url?scp=85062905970&partnerID=8YFLogxK
U2 - 10.1109/ICIP.2018.8451751
DO - 10.1109/ICIP.2018.8451751
M3 - Conference contribution
AN - SCOPUS:85062905970
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 4158
EP - 4162
BT - 2018 IEEE International Conference on Image Processing, ICIP 2018 - Proceedings
PB - IEEE Computer Society
T2 - 25th IEEE International Conference on Image Processing, ICIP 2018
Y2 - 7 October 2018 through 10 October 2018
ER -