Quality control
Quality control (QC) is an essential step in data assessment: it allows each data set to be checked, calibrated and validated. Moreover, documenting how the data were treated and how accurate they are allows the available data to be used in the most appropriate ways.
QC is even more crucial for data collected automatically by “remote” instruments. Several QC schemes already exist for oceanographic data acquired by remote platforms such as satellites, moorings and gliders. Within the Argo program, an efficient QC protocol has been established (and is continuously updated) for the physical data, i.e. the sea temperature and salinity collected by profiling floats.
Argo data undergo a QC that consists of two major phases: (1) real-time QC and (2) delayed-mode QC. Each phase corresponds to a specific level of data quality and to a specific delay before the data become available to end-users.
(1) Real-time QC targets data that must be available within approximately 24 hours of their transmission by the profiling float. In this “Real time” (RT) mode, QC is performed automatically: flags are assigned to the data on the basis of 19 successive tests.
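As an illustration only, the sketch below shows how such a chain of automatic tests might assign flags to a profile. The test names, thresholds and flag values are assumptions chosen for the example and do not reproduce the official Argo real-time test specification.

```python
# Illustrative sketch only: a simplified real-time QC chain that assigns
# quality flags to a profile. Flag values loosely follow the Argo convention
# (1 = good, 3 = probably bad, 4 = bad); thresholds are examples.

GOOD, PROBABLY_BAD, BAD = 1, 3, 4

def global_range_test(temperature_c, salinity_psu):
    """Flag values outside plausible oceanic ranges (illustrative bounds)."""
    flags = []
    for t, s in zip(temperature_c, salinity_psu):
        if not (-2.5 <= t <= 40.0) or not (2.0 <= s <= 41.0):
            flags.append(BAD)
        else:
            flags.append(GOOD)
    return flags

def spike_test(values, threshold):
    """Flag points that differ sharply from the mean of their two neighbours."""
    flags = [GOOD] * len(values)
    for i in range(1, len(values) - 1):
        spike = abs(values[i] - 0.5 * (values[i - 1] + values[i + 1]))
        if spike > threshold:
            flags[i] = PROBABLY_BAD
    return flags

def combine_flags(*flag_lists):
    """Keep, for each level, the worst (highest) flag assigned by any test."""
    return [max(flags) for flags in zip(*flag_lists)]

# Example depth-ordered profile with one implausible temperature value
temperature = [14.2, 14.1, 13.9, 55.0, 13.5]
salinity    = [38.1, 38.1, 38.0, 38.0, 37.9]

flags = combine_flags(
    global_range_test(temperature, salinity),
    spike_test(temperature, threshold=2.0),
)
print(flags)  # [1, 1, 3, 4, 1]: the bad value and the point next to it are flagged
```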
The RT stream can also include the so-called “Adjusted” mode, in which corrections determined as part of the “Delayed” Mode processing are retrospectively applied to the RT data.
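For illustration, such a retrospective correction could take the form of a simple slope and offset applied to the real-time values; the linear form and the numbers below are assumptions for the example, not the official Argo adjustment scheme.

```python
# Illustrative sketch only: applying a correction determined in delayed mode
# to real-time salinity values, in the spirit of the "Adjusted" mode.

def adjust(raw_values, slope=1.0, offset=0.0):
    """Return adjusted values; slope and offset would come from the
    delayed-mode analysis (assumed linear correction for this example)."""
    return [slope * v + offset for v in raw_values]

raw_salinity = [38.10, 38.08, 38.05]
adjusted_salinity = adjust(raw_salinity, slope=1.0003, offset=-0.012)
print(adjusted_salinity)
```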
(2) The “Delayed” Mode (DM) uses longer records and is generally available with a delay of 0.5-1 year. It constitutes the final product delivered by Argo. To produce the DM data, a set of semi-automatic algorithms checks the acquired data for sensor drift and bias, and a dedicated scientist visually examines the data sets.
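A minimal sketch of one such check is given below: it estimates a slow sensor drift by fitting a linear trend to the difference between the float data and a reference series. The reference values and the simple least-squares fit are assumptions for the example and do not represent the actual delayed-mode algorithms.

```python
# Illustrative sketch only: estimating sensor bias and drift in delayed mode
# by fitting a linear trend to float-minus-reference residuals over time.
# The reference (e.g. a climatology or nearby ship data) is assumed here.

def estimate_drift(days, float_values, reference_values):
    """Return (offset, drift_per_day) of float minus reference over time."""
    residuals = [f - r for f, r in zip(float_values, reference_values)]
    n = len(days)
    mean_t = sum(days) / n
    mean_r = sum(residuals) / n
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(days, residuals))
    var = sum((t - mean_t) ** 2 for t in days)
    drift_per_day = cov / var
    offset = mean_r - drift_per_day * mean_t
    return offset, drift_per_day

days = [0, 90, 180, 270, 360]
float_salinity = [38.10, 38.13, 38.17, 38.20, 38.24]
reference_salinity = [38.10, 38.10, 38.11, 38.10, 38.11]

offset, drift = estimate_drift(days, float_salinity, reference_salinity)
print(f"bias = {offset:.3f}, drift = {drift * 365:.3f} per year")
```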
Building on this established Argo protocol and inspired by other existing ones, the OAO team is developing a data processing procedure (http://www.oao.obs-vlfr.fr/datasm) for Bio-Argo floats (and their related “clones”). For biogeochemical profiling floats, such a QC procedure ideally:
- relies on different processing levels
- defines and assigns a system of flags to indicate the quality of the data (see the sketch after this list)
- runs under a single processor, i.e. the ensemble of algorithms and software procedures that take the data from acquisition to user delivery
- is thoroughly documented, periodically revisited and updated
- is, as much as possible, similar to the procedures used for other remote platforms and the variables they acquire, so that integrating the data into larger databases is easier and data set interoperability is increased.
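As a sketch of what the flag system and processing levels listed above could look like in practice, the example below encodes them as simple enumerations attached to a profile. The level names, flag meanings and data structures are assumptions chosen for the example; the actual scheme is defined by the documented procedure.

```python
# Illustrative sketch only: one possible encoding of processing levels and
# quality flags for a biogeochemical profile. Names and values are assumed.

from dataclasses import dataclass
from enum import IntEnum

class Flag(IntEnum):
    GOOD = 1
    PROBABLY_GOOD = 2
    PROBABLY_BAD = 3
    BAD = 4
    MISSING = 9

class Level(IntEnum):
    REAL_TIME = 0     # automatic tests only
    ADJUSTED = 1      # real-time data corrected with delayed-mode factors
    DELAYED_MODE = 2  # final, scientifically validated product

@dataclass
class ProfilePoint:
    pressure_dbar: float
    value: float
    flag: Flag = Flag.GOOD

@dataclass
class Profile:
    variable: str   # e.g. "CHLA" or "BBP700"
    level: Level
    points: list

profile = Profile(
    variable="CHLA",
    level=Level.REAL_TIME,
    points=[ProfilePoint(5.0, 0.42), ProfilePoint(10.0, 0.40, Flag.PROBABLY_BAD)],
)
print(profile.level.name, [p.flag.name for p in profile.points])
```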