Compressing atmospheric data into its real information content.
Journal
Nature computational science
ISSN: 2662-8457
Abbreviated title: Nat Comput Sci
Country: United States
NLM ID: 101775476
Publication information
Publication date:
Nov 2021
History:
received: 2021-06-04
accepted: 2021-10-12
medline: 2021-11-01
pubmed: 2021-11-01
entrez: 2024-01-13
Status:
ppublish
Abstract
Hundreds of petabytes are produced annually at weather and climate forecast centers worldwide. Compression is essential to reduce storage and to facilitate data sharing. Current techniques do not distinguish the real from the false information in data, leaving the level of meaningful precision unassessed. Here we define the bitwise real information content from information theory for the Copernicus Atmospheric Monitoring Service (CAMS). Most variables contain fewer than 7 bits of real information per value and are highly compressible due to spatio-temporal correlation. Rounding bits without real information to zero facilitates lossless compression algorithms and encodes the uncertainty within the data itself. All CAMS data are 17× compressed relative to 64-bit floats, while preserving 99% of real information. Combined with four-dimensional compression, factors beyond 60× are achieved. A data compression Turing test is proposed to optimize compressibility while minimizing information loss for the end use of weather and climate forecast data.
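The rounding step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `round_mantissa` and `compressed_size`, the synthetic temperature-like field, and the choice of `keepbits=7` are assumptions made for illustration only. In practice the number of bits to keep would come from the bitwise real information analysis, which is not shown here. The sketch rounds 32-bit floats to a fixed number of mantissa bits, leaving zeros in the discarded bits, and then compares how well a general-purpose lossless compressor (zlib) handles the data before and after rounding.

```python
import zlib

import numpy as np


def round_mantissa(values, keepbits):
    """Round float32 values to `keepbits` explicit mantissa bits.

    Uses round-to-nearest with ties to even; the discarded trailing
    bits end up as zeros. Assumes finite input values.
    """
    if not 1 <= keepbits <= 22:
        raise ValueError("keepbits must be in 1..22 for float32")
    bits = np.ascontiguousarray(values, dtype=np.float32).view(np.uint32)
    drop = 23 - keepbits  # number of trailing mantissa bits to discard
    keep_mask = np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)
    half_minus_one = np.uint32((1 << (drop - 1)) - 1)
    # The lowest kept bit decides how exact ties are broken (ties to even).
    tie_bit = (bits >> np.uint32(drop)) & np.uint32(1)
    rounded = (bits + half_minus_one + tie_bit) & keep_mask
    return rounded.view(np.float32)


def compressed_size(array):
    """Size in bytes of the zlib-compressed raw buffer of `array`."""
    return len(zlib.compress(array.tobytes(), 9))


# Synthetic, spatially smooth field with small noise (illustrative data only).
rng = np.random.default_rng(0)
field = (280 + 10 * np.sin(np.linspace(0, 20, 100_000))
         + rng.normal(0, 0.1, 100_000)).astype(np.float32)

rounded = round_mantissa(field, keepbits=7)  # keepbits=7 is an assumed value
print("bytes before rounding:", compressed_size(field))
print("bytes after rounding: ", compressed_size(rounded))
```

With `keepbits` mantissa bits retained, the relative rounding error of each value is bounded by 2^-(keepbits+1), so the rounding itself records the precision that is considered meaningful.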
Identifiers
pubmed: 38217145
doi: 10.1038/s43588-021-00156-2
pii: 10.1038/s43588-021-00156-2
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
713-724
Grants
Agency: RCUK | Natural Environment Research Council (NERC)
ID: NE/L002612/1
Agency: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
ID: 823988
Copyright information
© 2021. The Author(s).
References
Bauer, P., Thorpe, A. & Brunet, G. The quiet revolution of numerical weather prediction. Nature 525, 47–55 (2015).
doi: 10.1038/nature14956
Bauer, P. et al. The ECMWF Scalability Programme: Progress and Plans (ECMWF, 2020); https://doi.org/10.21957/gdit22ulm
Voosen, P. Europe is building a ‘digital twin’ of Earth to revolutionize climate forecasts. Science https://doi.org/10.1126/science.abf0687 (2020).
Schär, C. et al. Kilometer-scale climate models: prospects and challenges. Bull. Am. Meteorol. Soc. 101, E567–E587 (2020).
doi: 10.1175/BAMS-D-18-0167.1
Bauer, P., Stevens, B. & Hazeleger, W. A digital twin of Earth for the green transition. Nat. Clim. Change 11, 80–83 (2021).
doi: 10.1038/s41558-021-00986-y
Stevens, B. et al. DYAMOND: the DYnamics of the Atmospheric general circulation Modeled On Non-hydrostatic Domains. Prog. Earth Planet. Sci. 6, 61 (2019).
doi: 10.1186/s40645-019-0304-z
Molteni, F., Buizza, R., Palmer, T. N. & Petroliagis, T. The ECMWF ensemble prediction system: methodology and validation. Q. J. R. Meteorol. Soc. 122, 73–119 (1996).
doi: 10.1002/qj.49712252905
Palmer, T. The ECMWF ensemble prediction system: looking back (more than) 25 years and projecting forward 25 years. Q. J. R. Meteorol. Soc. 145, 12–24 (2019).
doi: 10.1002/qj.3383
Ballester-Ripoll, R., Lindstrom, P. & Pajarola, R. TTHRESH: tensor compression for multidimensional visual data. IEEE Trans. Vis. Comput. Graph. 26, 2891–2903 (2020).
doi: 10.1109/TVCG.2019.2904063
Lindstrom, P. Fixed-rate compressed floating-point arrays. IEEE Trans. Vis. Comput. Graph. 20, 2674–2683 (2014).
doi: 10.1109/TVCG.2014.2346458
von Larcher, T. & Klein, R. On identification of self-similar characteristics using the tensor train decomposition method with application to channel turbulence flow. Theor. Comput. Fluid Dyn. 33, 141–159 (2019).
doi: 10.1007/s00162-019-00485-z
Zhao, K. et al. Significantly improving lossy compression for HPC datasets with second-order prediction and parameter optimization. In Proc. 29th International Symposium on High-Performance Parallel and Distributed Computing 89–100 (ACM, 2020); https://doi.org/10.1145/3369583.3392688
IEEE Standard for Binary Floating-Point Arithmetic ANSI/IEEE Std 754-1985 1–20 (IEEE, 1985); https://doi.org/10.1109/IEEESTD.1985.82928
Váňa, F. et al. Single precision in weather forecasting models: an evaluation with the IFS. Mon. Weather Rev. 145, 495–502 (2017).
doi: 10.1175/MWR-D-16-0228.1
Tintó Prims, O. et al. How to use mixed precision in ocean models: exploring a potential reduction of numerical precision in NEMO 4.0 and ROMS 3.6. Geosci. Model Dev. 12, 3135–3148 (2019).
doi: 10.5194/gmd-12-3135-2019
Hatfield, S., Chantry, M., Düben, P. & Palmer, T. Accelerating high-resolution weather models with deep-learning hardware. In Proc. Platform for Advanced Scientific Computing Conference 1–11 (ACM, 2019); https://doi.org/10.1145/3324989.3325711
Klöwer, M., Düben, P. D. & Palmer, T. N. Number formats, error mitigation and scope for 16-bit arithmetics in weather and climate modelling analysed with a shallow water model. J. Adv. Model. Earth Syst. 12, e2020MS002246 (2020).
doi: 10.1029/2020MS002246
Ackmann, J., Düben, P. D., Palmer, T. N. & Smolarkiewicz, P. K. Mixed-precision for linear solvers in global geophysical flows. Preprint at https://arxiv.org/abs/2103.16120 (2021).
Dawson, A., Düben, P. D., MacLeod, D. A. & Palmer, T. N. Reliable low precision simulations in land surface models. Clim. Dyn. 51, 2657–2666 (2018).
doi: 10.1007/s00382-017-4034-x
Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 623–656 (1948).
doi: 10.1002/j.1538-7305.1948.tb00917.x
Kleeman, R. Information theory and dynamical system predictability. Entropy 13, 612–649 (2011).
doi: 10.3390/e13030612
Jeffress, S., Düben, P. & Palmer, T. Bitwise efficiency in chaotic models. Proc. R. Soc. Math. Phys. Eng. Sci. 473, 20170144 (2017).
Palmer, T. Modelling: build imprecise supercomputers. Nature 526, 32–33 (2015).
doi: 10.1038/526032a
Palmer, T. Climate forecasting: build high-resolution global climate models. Nature 515, 338–339 (2014).
doi: 10.1038/515338a
Lang, S. T. K. et al. More accuracy with less precision. Q. J. R. Meteorol. Soc. https://doi.org/10.1002/qj.4181 (2021).
Silver, J. D. & Zender, C. S. The compression-error trade-off for large gridded data sets. Geosci. Model Dev. 10, 413–423 (2017).
doi: 10.5194/gmd-10-413-2017
Kuhn, M., Kunkel, J. M. & Ludwig, T. Data compression for climate data. Supercomput. Front. Innov. 3, 75–94 (2016).
Hübbe, N., Wegener, A., Kunkel, J. M., Ling, Y. & Ludwig, T. in Supercomputing (eds Kunkel, J. M. et al.) 343–356 (Springer, 2013).
Zender, C. S. Bit Grooming: statistically accurate precision-preserving quantization with compression, evaluated in the netCDF Operators (NCO, v4.4.8+). Geosci. Model Dev. 9, 3199–3211 (2016).
doi: 10.5194/gmd-9-3199-2016
Kouznetsov, R. A note on precision-preserving compression of scientific data. Geosci. Model Dev. 14, 377–389 (2021).
doi: 10.5194/gmd-14-377-2021
Di, S. & Cappello, F. Fast error-bounded lossy HPC data compression with SZ. In Proc. 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 730–739 (IEEE, 2016); https://doi.org/10.1109/IPDPS.2016.11
Lindstrom, P. & Isenburg, M. Fast and efficient compression of floating-point data. IEEE Trans. Vis. Comput. Graph. 12, 1245–1250 (2006).
doi: 10.1109/TVCG.2006.143
Fan, Q., Lilja, D. J. & Sapatnekar, S. S. Using DCT-based approximate communication to improve MPI performance in parallel clusters. In Proc. 2019 IEEE 38th International Performance Computing and Communications Conference (IPCCC) 1–10 (IEEE, 2019); https://doi.org/10.1109/IPCCC47392.2019.8958720
Baker, A. H. et al. Evaluating lossy data compression on climate simulation data within a large ensemble. Geosci. Model Dev. 9, 4381–4403 (2016).
doi: 10.5194/gmd-9-4381-2016
Woodring, J., Mniszewski, S., Brislawn, C., DeMarle, D. & Ahrens, J. Revisiting wavelet compression for large-scale climate data using JPEG 2000 and ensuring data precision. In Proc. 2011 IEEE Symposium on Large Data Analysis and Visualization 31–38 (IEEE, 2011); https://doi.org/10.1109/LDAV.2011.6092314
Inness, A. et al. The CAMS reanalysis of atmospheric composition. Atmos. Chem. Phys. 19, 3515–3556 (2019).
doi: 10.5194/acp-19-3515-2019
Guide to the WMO Table Driven Code Form Used for the Representation and Exchange of Regularly Spaced Data In Binary Form: FM 92 GRIB Edition 2 (WMO, 2003).
MacKay, D. Information Theory, Inference and Learning Algorithms (Cambridge Univ. Press, 2003).
Ziv, J. & Lempel, A. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 23, 337–343 (1977).
doi: 10.1109/TIT.1977.1055714
Huffman, D. A. A method for the construction of minimum-redundancy codes. Proc. IRE 40, 1098–1101 (1952).
doi: 10.1109/JRPROC.1952.273898
Schreiber, T. Measuring information transfer. Phys. Rev. Lett. 85, 461–464 (2000).
doi: 10.1103/PhysRevLett.85.461
Kraskov, A., Stögbauer, H. & Grassberger, P. Estimating mutual information. Phys. Rev. E 69, 066138 (2004).
doi: 10.1103/PhysRevE.69.066138
Pothapakula, P. K., Primo, C. & Ahrens, B. Quantification of information exchange in idealized and climate system applications. Entropy 21, 1094 (2019).
doi: 10.3390/e21111094
DelSole, T. Predictability and information theory. Part I: measures of predictability. J. Atmos. Sci. 61, 2425–2440 (2004).
doi: 10.1175/1520-0469(2004)061<2425:PAITPI>2.0.CO;2
Delaunay, X., Courtois, A. & Gouillon, F. Evaluation of lossless and lossy algorithms for the compression of scientific datasets in netCDF-4 or HDF5 files. Geosci. Model Dev. 12, 4099–4113 (2019).
doi: 10.5194/gmd-12-4099-2019
Ziv, J. & Lempel, A. Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 24, 530–536 (1978).
doi: 10.1109/TIT.1978.1055934
Skibinski, P. inikep/lzbench. GitHub https://github.com/inikep/lzbench (2020).
Alted, F. Why modern CPUs are starving and what can be done about it. Comput. Sci. Eng. 12, 68–71 (2010).
doi: 10.1109/MCSE.2010.51
Deutsch, L. P. DEFLATE Compressed Data Format Specification Version 1.3 (IETF, 1996); https://datatracker.ietf.org/doc/rfc1951
Collet, Y. & Kucherawy, M. Zstandard Compression and the Application/zstd Media Type (IETF, 2018); https://datatracker.ietf.org/doc/rfc8478
Matheson, J. E. & Winkler, R. L. Scoring rules for continuous probability distributions. Manag. Sci. 22, 1087–1096 (1976).
doi: 10.1287/mnsc.22.10.1087
Hersbach, H. Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15, 559–570 (2000).
doi: 10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2
Zamo, M. & Naveau, P. Estimation of the continuous ranked probability score with limited information and applications to ensemble weather forecasts. Math. Geosci. 50, 209–234 (2018).
doi: 10.1007/s11004-017-9709-7
Baker, A. H., Hammerling, D. M. & Turton, T. L. Evaluating image quality measures to assess the impact of lossy data compression applied to climate simulation data. Comput. Graph. Forum 38, 517–528 (2019).
doi: 10.1111/cgf.13707
Turing, A. M. Computing machinery and intelligence. Mind LIX, 433–460 (1950).
doi: 10.1093/mind/LIX.236.433
Malardel, S. et al. A new grid for the IFS. ECMWF Newsletter (January 2016); https://www.ecmwf.int/node/15041
Pinard, A., Hammerling, D. M. & Baker, A. H. Assessing differences in large spatio-temporal climate datasets with a new Python package. In Proc. 2020 IEEE International Conference on Big Data (Big Data) 2699–2707 (IEEE, 2020); https://doi.org/10.1109/BigData50022.2020.9378100
Poppick, A. et al. A statistical analysis of lossily compressed climate model data. Comput. Geosci. 145, 104599 (2020).
doi: 10.1016/j.cageo.2020.104599
Klöwer, M., Düben, P. D. & Palmer, T. N. Posits as an alternative to floats for weather and climate models. In Proc. Conference for Next Generation Arithmetic 2019, CoNGA’19 1–8 (ACM, 2019); https://doi.org/10.1145/3316279.3316281
Wang, Z., Bovik, A. C., Sheikh, H. R. & Simoncelli, E. P. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612 (2004).
doi: 10.1109/TIP.2003.819861
Pelkonen, T. et al. Gorilla: a fast, scalable, in-memory time series database. Proc. VLDB Endow. 8, 1816–1827 (2015).
doi: 10.14778/2824032.2824078
CAMS Forecast Experiment using GRIB IEEE Data Encoding (CAMS, 2021); https://doi.org/10.21957/56GH-9Y86
Ensemble Temperature Forecast Experiment using GRIB IEEE Data Encoding (ECMWF, 2021); https://doi.org/10.21957/PHGF-BV34
Klöwer, M. Elefridge.jl (source code for accepted manuscript). Zenodo https://doi.org/10.5281/zenodo.5557138 (2021).
Klöwer, M. Compressing atmospheric data into its real information content (source code). Code Ocean https://doi.org/10.24433/CO.8682392.v1 (2021).