Compositional data analysis in geochemistry: Are we sure to see what really occurs during natural processes?

Geochemical data are typically reported as compositions, that is, as proportions such as weight percent or parts per million. The statistical analysis of compositional data has been a major issue for more than 100 years, because such data carry only relative information and are constrained to a constant sum. The log-ratio transform was introduced by John Aitchison to overcome these constraints by opening the data into real number space, within which standard statistical methods can be applied. However, many statisticians and users of statistics in the field of geochemistry are unaware of the problems affecting compositional data, as well as of the solutions that overcome these problems. A look into the ISI Web of Science and Scopus databases shows that most papers in which compositional data are the core of geochemical research continue to ignore methods for correctly managing constrained data. A key question is how we can demonstrate that the interpretation of the behaviour of chemical species in natural environments and in geochemical processes is improved when the compositional constraint of geochemical data is taken into account through the use of new methods. To this end, this special issue of the Journal of Geochemical Exploration focuses on the correct statistical analysis of compositional data.
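The log-ratio idea can be illustrated with a minimal sketch of the centred log-ratio (clr) transform, one of the transforms proposed by Aitchison. The function names `closure` and `clr` and the three-part sample are hypothetical choices made for illustration, and the sketch assumes that all parts are strictly positive (zeros need separate treatment).

```python
import math

def closure(parts, total=1.0):
    """Rescale strictly positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]

def clr(parts):
    """Centred log-ratio: log of each part over the geometric mean of all parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical three-part composition, closed to 100 weight percent.
sample = closure([70.0, 20.0, 10.0], total=100.0)
coords = clr(sample)

# The clr coordinates are unconstrained real numbers that sum to zero,
# and they do not depend on the closure constant (percent, ppm, ...).
print(coords)
```

Because only ratios between parts enter the transform, the same coordinates are obtained whether the sample is expressed in proportions, percent or parts per million.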

Applications in exploration, monitoring and environmental studies, covering several geological matrices, are presented and discussed, illustrating that several paths can be followed to understand how geochemical processes work. Regionalised compositions treated as raw data are prone to spurious correlation. Regionalised compositions can be analysed using isometric logratio transformations. Cross-variograms can be modelled through the variation-variogram. Like the statistical analysis of compositional data in general, spatial analysis of compositional data requires specific tools.
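As a rough illustration of an isometric logratio (ilr) transformation, the sketch below computes one common choice of ilr coordinates, often called pivot coordinates. The function name `ilr_pivot` and the sample values are hypothetical, the choice of basis is an assumption (the text does not specify one), and all parts are assumed to be strictly positive.

```python
import math

def ilr_pivot(parts):
    """Pivot-coordinate ilr: D strictly positive parts -> D - 1 real coordinates.

    Coordinate i contrasts part i with the geometric mean of the parts
    that follow it, scaled so that the mapping is an isometry.
    """
    logs = [math.log(p) for p in parts]
    coords = []
    for i in range(len(logs) - 1):
        rest = logs[i + 1:]               # logs of the remaining parts
        k = len(rest)
        mean_rest = sum(rest) / k         # log of their geometric mean
        coords.append(math.sqrt(k / (k + 1)) * (logs[i] - mean_rest))
    return coords

# Hypothetical three-part composition in weight percent.
z = ilr_pivot([70.0, 20.0, 10.0])
print(z)  # two unconstrained coordinates
```

The resulting coordinates can be handled with standard multivariate or geostatistical tools, and results can be back-transformed to compositions afterwards. Other orthonormal bases give different coordinates but the same distances between samples.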

Also mentioned are the use of matrix-valued variation-variograms as a tool to model cross-variograms, and the simplicial approach to indicator kriging, which resolves inconsistencies in the standard approach to indicator kriging.

SNOMED CT is considered to be the most comprehensive, multilingual clinical healthcare terminology in the world. The primary purpose of SNOMED CT is to encode the meanings that are used in health information and to support the effective clinical recording of data with the aim of improving patient care. SNOMED CT's comprehensive coverage includes clinical findings, symptoms, diagnoses, procedures, body structures, organisms and other etiologies, substances, pharmaceuticals, devices and specimens.

SNOMED CT is maintained and distributed by SNOMED International, an international non-profit standards development organization located in London, UK. SNOMED CT provides for consistent information interchange and is fundamental to an interoperable electronic health record. It provides a consistent means to index, store, retrieve, and aggregate clinical data across specialties and sites of care. It also helps to organize the content of electronic health record systems by reducing the variability in the way data are captured, encoded and used for clinical care of patients and for research.

