Heterogeneous acceleration of volumetric JPEG 2000 using OpenCL

Research Article. First published online May 10, 2016.

Authors

Affiliations

1 Vrije Universiteit Brussel (VUB), Electronics and Informatics (ETRO) Dept., Belgium
2 Vrije Universiteit Brussel (VUB), Dept. of Industrial Sciences (INDI), Belgium
3 iMinds, Multimedia Technologies Dept., Belgium

This paper presents an OpenCL implementation of a volumetric JPEG 2000 codec that runs on GPUs, multi-core processors, or a combination of both. The performance-critical part of the codec consists of a fine-grained algorithm (the discrete wavelet transform) and a coarse-grained algorithm (Tier-1), so the best performance is obtained with hybrid execution, in which the discrete wavelet transform runs on a GPU and Tier-1 runs on a multi-core processor. Combining an Intel i7 multi-core processor with a modest NVIDIA Quadro K620 GPU yields speedups greater than 10 over the original sequential code. The paper discusses the performance bottlenecks that arise on GPUs when parallelizing inherently coarse-grained algorithms, as well as the optimizations that are possible. A performance analysis reveals the inefficiencies and explains the deviations from GPU peak performance.

