Working with Processed Data
This chapter is intended to help FAAM’s data users access and use archived data collected via the facility. If you need any additional information or support, please contact us at faam-data@ncas.ac.uk. There are many ways we can provide extra help - we look forward to hearing from you. Training sessions are also run periodically; see https://ncas.ac.uk/study-with-us/ for details.
As part of any science flight, measurements will be made using a range of instruments, some of which are maintained by FAAM and some by external organisations. Some instruments will always be operated on every flight and others are optional extras, requested during the project planning process. A particularly useful data product is the core file, produced by FAAM for each flight in two versions: full data rate and 1Hz resampled. These files contain aircraft position and movement, meteorology and some basic aerosol and chemistry parameters.
Quality Controlled Data
Quality controlled (QCed) data have been checked manually by instrument or data scientists. Variables may be manually flagged for periods if the person performing quality control (QC) believes there is an issue, or may be removed completely if the variable comes from an instrument which is clearly non-functioning for the whole flight. Data which have been manually flagged in QC are identified with the flagged_in_qc field in the flag_meanings or flag_masks attribute (depending on the type of flag being used).
The CEDA Archive
FAAM data are stored within the Centre for Environmental Data Analysis (CEDA), the UK’s national data centre for atmospheric and Earth observation research which forms part of NERC’s Environmental Data Service.
Data are stored on a one flight per dataset basis and can be accessed via the CEDA catalogue at https://catalogue.ceda.ac.uk/ once you have registered for access. To search for a particular flight or project, type a keyword (e.g. flight number “C100”, or project name “AEOG”) into the search bar and select the required dataset from the returned list. Clicking on the “Download” button will take you to a folder structure (e.g. https://data.ceda.ac.uk/badc/faam/data/2018/c100-apr-26) containing raw and processed data from different data providers for that flight.
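As a convenience, the folder URL for a flight can be guessed from the flight number and date. The following is only a sketch: the path pattern is inferred from the single example URL above, the zero-padded day is an assumption, and the function name ceda_flight_url is hypothetical. If the guessed URL does not resolve, use the catalogue search instead.

```python
from datetime import date

# Root taken from the example URL in the text; the rest of the pattern
# (year/flight-month-day) is an assumption based on that one example.
CEDA_FAAM_ROOT = "https://data.ceda.ac.uk/badc/faam/data"

# Fixed abbreviations avoid any dependence on the system locale.
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


def ceda_flight_url(flight_number: str, flight_date: date) -> str:
    """Guess the CEDA folder URL for a flight (hypothetical helper)."""
    folder = "{}-{}-{:02d}".format(
        flight_number.lower(),
        MONTHS[flight_date.month - 1],
        flight_date.day,  # assumed to be zero-padded to two digits
    )
    return "{}/{}/{}".format(CEDA_FAAM_ROOT, flight_date.year, folder)


url = ceda_flight_url("C100", date(2018, 4, 26))
print(url)
```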
What Data Files to Expect on CEDA
For each flight, there will be some raw and some processed data, and files and folders containing the words ‘core’ and ‘non-core’. Raw data are typically not very useful to the data user - they are processed by an instrument or data scientist to produce the much more usable processed data. The words ‘core’ and ‘non-core’ are historical terms with diverse interpretations that can be somewhat confusing. In this context, ‘core’ generally refers to data from FAAM-maintained instruments that are fitted to the aircraft as standard (with the exception of some ‘core cloud physics’ instruments which are not always fitted), and ‘non-core’ refers to data from optional-extra instruments maintained either by FAAM or other organisations. Some file and folder names contain the acronym ‘mo’, which refers to the Met Office. Information about file names can be found at https://help.ceda.ac.uk/article/3796-faam-flight-data-file-names.
For each science flight, there will always be a core_processed subfolder. This will always contain a core_faam file (referred to as the “core file”), named for example core_faam_20180426_v004_r0_c100.nc (full data rate) or core_faam_20180426_v004_r0_c100_1hz.nc (1Hz resampled data). These files contain aircraft position and dynamics, meteorology and some basic aerosol and chemistry parameters.
The full data rate file includes parameters at different frequencies, packed into 2-D variables using sps dimensions.
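To work with a 2-D variable as an ordinary time series, it can be flattened into one value per sample. The sketch below uses plain Python lists and made-up numbers. It assumes that each row holds the samples recorded during the one-second interval starting at that row's time stamp, evenly spaced. That layout is an assumption and should be checked against the core data documentation. The function name flatten_sps is hypothetical.

```python
from datetime import datetime, timedelta


def flatten_sps(times, values_2d):
    """Flatten a (time, sps) variable into (timestamp, value) pairs.

    Assumes row i holds evenly spaced samples for the second that
    starts at times[i].
    """
    flat = []
    for t, row in zip(times, values_2d):
        sps = len(row)  # samples per second for this variable
        for i, value in enumerate(row):
            flat.append((t + timedelta(seconds=i / sps), value))
    return flat


# Made-up 4 Hz variable covering two seconds of flight.
start = datetime(2018, 4, 26, 10, 0, 0)
times = [start, start + timedelta(seconds=1)]
values = [[1.0, 1.1, 1.2, 1.3],
          [1.4, 1.5, 1.6, 1.7]]

series = flatten_sps(times, values)
for stamp, value in series:
    print(stamp.isoformat(), value)
```

Variables recorded at 1 Hz need no such treatment, which is one reason the 1Hz resampled file is often the easier starting point.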
There will also usually be data from FAAM-maintained cloud physics instruments, with file names beginning core-cloud-phy_faam_.
Another subfolder “non-core” may contain data from other FAAM-maintained instruments, as well as non-FAAM-maintained instruments.
The following files may also be available:
- Flight Summary. This is created by the Flight Manager during a flight, and named for example flight-sum_faam_20180426_r0_c100.txt. It is a timeline of events during the flight, where take-off and landing times, runs, profiles and notes are entered.
- (For flights before C122) QA Report. This is a collection of plots that can be used to assess how well the instruments whose data goes into the core_faam file have performed during a flight, named for example qa-report_faam_20180426_c100.pdf.
- (For flights since C234) Flight Report. Named for example flight-report_faam_20240112_r0_c365.pdf. This contains the sortie brief, the image contents of the in-flight Flight Folder, the crew list, planned timings, the Flight Summary, a log of the IRC chat, and QC plots for FAAM-maintained instruments.
- (For flights since C149) Instrument Report. Named for example instrument-report_faam_20240112_r0_c365.txt. This is intended to be used by the archiver and not by individual users. This file contains a list of racks and/or instruments that were active during the flight. It is not a list of all instruments that were operated during a flight: some instruments are amalgamated and reported as a whole rack (e.g. AERACK01 is a combination of several instruments not individually listed in the instrument report).
- Airborne Science Mission Metadata (ASMM). Named for example asmm_faam_20240112_c365_fm1.xml. Contains metadata information about the flight.
Version and Revision Numbers
Occasionally, revised data will need to be uploaded (when, for example, a calibration is found to have changed and a new one needs to be applied). These are indicated by a non-zero revision number following the ‘r’ in the file name, e.g. core_faam_20180426_v004_r1_c100.nc. CEDA does not hold records of who downloads which file, so it is currently not possible to notify data users when new revisions are uploaded.
The version number (a number following the ‘v’ in the file name, e.g. core_faam_20180426_v004_r1_c100.nc) indicates changes to the file structure or processing software. Where possible the newest version should be used, though older versions may be provided for backwards compatibility.
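When a folder holds several versions or revisions of the core file, the newest can be picked out by parsing the file names. The following is a minimal sketch. The regular expression is inferred from the example core file names in this chapter and may not cover other products. The helper names parse_core_name and latest_core_file are hypothetical.

```python
import re

# Pattern inferred from the example core file names only.
CORE_NAME = re.compile(
    r"^core_faam_(?P<date>\d{8})_v(?P<version>\d+)_r(?P<revision>\d+)"
    r"_(?P<flight>[a-z]\d+)(?P<one_hz>_1hz)?\.nc$"
)


def parse_core_name(name):
    """Split a core file name into its parts, or return None if it does not match."""
    match = CORE_NAME.match(name)
    if match is None:
        return None
    return {
        "date": match["date"],
        "version": int(match["version"]),
        "revision": int(match["revision"]),
        "flight": match["flight"],
        "one_hz": match["one_hz"] is not None,
    }


def latest_core_file(names, one_hz=False):
    """Return the name with the highest (version, revision), or None."""
    candidates = []
    for name in names:
        parts = parse_core_name(name)
        if parts is not None and parts["one_hz"] == one_hz:
            candidates.append((parts["version"], parts["revision"], name))
    return max(candidates)[2] if candidates else None


names = [
    "core_faam_20180426_v004_r0_c100.nc",
    "core_faam_20180426_v004_r1_c100.nc",
    "core_faam_20180426_v004_r0_c100_1hz.nc",
]
print(latest_core_file(names))
print(latest_core_file(names, one_hz=True))
```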
NetCDF File Format
Many of FAAM’s data products are provided in the netCDF format. NetCDF (Network Common Data Form) is a set of software libraries and platform independent data formats which are designed to support the creation, access, and sharing of array-oriented scientific data. NetCDF is designed to be:
- Self-describing. A netCDF file includes information about the data it contains (i.e. metadata).
- Portable. A netCDF file can be accessed by computers with different ways of storing integers, characters, and floating-point numbers.
- Scalable. A small subset of a large dataset may be accessed efficiently.
- Appendable. Data may be appended to a properly structured netCDF file without copying the dataset or redefining its structure.
- Shareable. One writer and many readers may simultaneously access the same netCDF file.
- Archivable. Access to all earlier forms of netCDF data will be supported by current and future versions of the software.
NetCDF is extremely commonly used, and is almost ubiquitous within the Earth sciences. Unidata (https://unidata.ucar.edu) provides and maintains software libraries for accessing netCDF data using C, C++, Java, and FORTRAN. Third-party libraries (which are generally bindings or wrappers to the Unidata libraries) are available for Python, IDL, MATLAB, R, Ruby, and Perl, among others. Further information on FAAM’s use of the netCDF format can be found in the FAAM core dataset guide.
Documentation
Documentation can be found on the FAAM website as follows:
- https://www.faam.ac.uk/sphinx/coredata/ - a description of the FAAM core data product, including a summary of the processing that takes place and a standard which describes the format of, and metadata associated with, the core data product.
- https://www.faam.ac.uk/sphinx/data/ - a description of the FAAM data standard, and the data products known to be compliant with it.
- https://www.faam.ac.uk/sphinx/met-handbook/ - a description of the calibrations and analyses performed to produce the temperature and humidity parameters in the FAAM core netCDF.
- https://www.faam.ac.uk/sphinx/fdat/ - a Python toolkit to aid in accessing, visualising, and analysing data recorded on the FAAM aircraft.
Referenceable documentation with DOIs, as well as calibration certificates, can be found on the FAAM Zenodo community page, at https://zenodo.org/communities/faam-146/.
Flagging
Many of FAAM’s data products contain flagging information, and it is important that these are used to understand and interpret measurements made using the aircraft.
In the FAAM core data, all parameters have an associated flag variable, indicated by the _FLAG suffix to the variable short name. The flag variable should also be named in the ancillary_variables attribute of the variable. In other datasets, a single flag variable may apply to multiple variables: see the corresponding data product documentation for more information in this case.
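The lookup described above can be sketched in a few lines. This example works on a plain dictionary of attributes rather than a real netCDF file. It assumes that ancillary_variables is a space-separated list of names, as in the CF conventions, and falls back to the _FLAG suffix when the attribute is absent. The function name flag_variable_name is hypothetical.

```python
def flag_variable_name(var_name, var_attrs):
    """Return the name of the flag variable for var_name.

    Prefers a *_FLAG entry in ancillary_variables; otherwise falls back
    to the core data naming convention of appending _FLAG.
    """
    ancillary = var_attrs.get("ancillary_variables", "")
    for candidate in ancillary.split():
        if candidate.endswith("_FLAG"):
            return candidate
    return var_name + "_FLAG"


# Attribute dictionaries standing in for netCDF variable attributes.
with_attr = {"ancillary_variables": "TAT_DI_R_FLAG"}
without_attr = {"units": "K"}

print(flag_variable_name("TAT_DI_R", with_attr))
print(flag_variable_name("TAT_DI_R", without_attr))
```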
Two flagging strategies are used in FAAM data: value based flags and bitmask based flags.
- Value based flags can be identified by the presence of the flag_values attribute on the flag variable. Value based flags give a single value for each data point which corresponds to a quality mode of the corresponding data, identified by the matching entry in the flag_meanings attribute. For example, if a flag variable has the attribute flag_values = 0 1 2 and the attribute flag_meanings = data_good possible_issues data_bad, then wherever the flag variable takes the value 0 the data are considered to be good, wherever the flag variable takes the value 1 the data may have issues, and wherever the flag variable takes the value 2 the data are considered to be bad. A fuller description of each of the flag_meanings should be available in the corresponding data product documentation.
- Bitmask based flags can be identified by the presence of the flag_masks attribute on the flag variable. Bitmask based flags use values of incrementing powers of two for each flag_meaning, which can then be decomposed into separate binary flags corresponding to each of the flag_meanings. For example, if the flag variable has the attribute flag_masks = 1 2 4 and the attribute flag_meanings = roll_limit_exceeded pitch_limit_exceeded instrument_error, then the flag variable could take a value of between 1 and 7 (the sum of the flag_masks), where each value would have the following meaning:
  1. roll_limit_exceeded
  2. pitch_limit_exceeded
  3. [=1+2] roll_limit_exceeded and pitch_limit_exceeded
  4. instrument_error
  5. [=1+4] roll_limit_exceeded and instrument_error
  6. [=2+4] pitch_limit_exceeded and instrument_error
  7. [=1+2+4] roll_limit_exceeded, pitch_limit_exceeded, and instrument_error
Both value based and bitmask based flags may contain flagged_in_qc as one of the flag_meanings. This indicates that a member of FAAM staff has flagged the variable as having an issue during the QC process. In bitmask based flags this will be a new mask; in value based flags it will be a new value which takes precedence over the other flag values. The reason for the flagging may be included in the data file or the flight constants file. If in doubt, please get in touch with FAAM.
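Both flagging strategies can be decoded with a few lines of Python. The sketch below uses the example attributes from this section and assumes that flag_meanings is stored as a single space-separated string, as in the CF conventions. The function names are hypothetical, and the mask value 8 used for flagged_in_qc is invented for illustration; the real value should be read from the flag variable's attributes.

```python
def decode_value_flag(value, flag_values, flag_meanings):
    """Return the single meaning that matches a value based flag."""
    lookup = dict(zip(flag_values, flag_meanings.split()))
    return lookup[value]


def decode_bitmask_flag(value, flag_masks, flag_meanings):
    """Return every meaning whose mask bit is set in a bitmask based flag."""
    return [
        meaning
        for mask, meaning in zip(flag_masks, flag_meanings.split())
        if value & mask
    ]


# Value based example from the text: 1 means possible issues.
value_meaning = decode_value_flag(1, [0, 1, 2], "data_good possible_issues data_bad")

# Bitmask example from the text: 5 = 1 + 4.
masks = [1, 2, 4]
meanings = "roll_limit_exceeded pitch_limit_exceeded instrument_error"
set_flags = decode_bitmask_flag(5, masks, meanings)

# Hypothetical: flagged_in_qc added as a fourth mask with value 8.
qc_flags = decode_bitmask_flag(9, masks + [8], meanings + " flagged_in_qc")
flagged_in_qc = "flagged_in_qc" in qc_flags

print(value_meaning)
print(set_flags)
print(flagged_in_qc)
```

Reading the masks and meanings from the attributes at run time, rather than hard-coding them, keeps the same code usable across variables whose flags differ.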
Metadata
FAAM provides data in the netCDF file format, which allows extensive metadata to be stored together with the data, either as variable attributes, which provide information about particular variables, or group or global attributes which provide information relevant to all variables in the group or dataset, respectively.
Variable attributes may include information such as
- Long descriptive names
- Standard names
- Variable units
- Instrument manufacturer, model, and serial number information
- Calibration information
- Comments
- …and much more
FAAM data and metadata are compliant with the CF conventions and the Attribute Conventions for Dataset Discovery. Metadata in FAAM products are standardised and loosely controlled; a full list of metadata should be included in the documentation for each dataset, and an overarching list of metadata is maintained in the FAAM dataset documentation.
FAAM maintains machine readable releases of dataset and metadata specifications in the FAAM Data GitHub repository. These releases are in JSON format, and are intended to be used with the vocal tool.
Data Tools
There is a wide array of open source tools available which may aid with the visualisation and analysis of FAAM data. FAAM makes use of, and recommends, the following tools:
- Panoply is a cross platform, Java based GUI application developed by NASA/GISS which can be used to quickly create timeseries or geographic plots of FAAM data, or to export data from FAAM netCDF files.
[Figure: Examples of timeseries and geographic plots created with Panoply using FAAM 1 Hz core data.]
- FDAT (FAAM Data Abstraction Toolkit) is a Python library designed to make accessing and using flight data as simple as possible. It does this by abstracting away the data access layer of a workflow and binding dataset specific methods to the resulting objects. For example, to produce a quick timeseries plot of both deiced and non-deiced air temperatures without knowing the variable names beforehand, the following code could be used:
>>> import fdat
>>> from fdat.utils import Search
>>> flights = fdat.load('/path/to/data/directory')
>>> temps = flights.c350.core.search('air temperature', pprint=True)
Variable  Long Name                      Standard Name
--------  ----------                     --------------
TAT_ND_R  True air temperature from the  air_temperature
          Rosemount non-deiced
          temperature sensor
TAT_DI_R  True air temperature from the  air_temperature
          Rosemount deiced temperature
          sensor
>>> flights.c350.core[temps].fdat.plot()
FDAT uses the standard Python data stack (numpy, pandas, matplotlib), so it should easily integrate with existing Python projects. It is currently considered pre-release; feedback and suggestions are welcome.