Data Quality Assessment Process
Gemini Data Quality Assessment
Gemini science and calibration data go through a quality assessment (QA) procedure. This is a two-step process: real-time QA is performed by the observer at the summit, followed by off-line QA done at the relevant base facility by the Science Operations Specialist (SOS) on duty, usually on the working day after the observations are taken. The QA determines whether the data pass or the observations need to be repeated.
For all queue observations, the PI will have specified four observing condition constraints: image quality, cloud cover, water vapor content and sky background. Each constraint is specified as one of several percentile bins. By comparing the requested conditions to the actual conditions at the time of the observation, the QA process identifies whether the requested criteria were met.
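The comparison of requested and actual conditions can be sketched in a few lines of Python. This is only an illustration, not the software Gemini uses: the function names, the dictionary layout and the choice to treat "Any" as the 100th percentile are assumptions made for the example. It relies only on the convention that a lower percentile bin means better conditions.

```python
def percentile_rank(bin_label: str) -> int:
    """Convert a bin label such as '70-percentile' or 'Any' to a number.

    Lower numbers mean better conditions. 'Any' is treated as 100,
    which is an assumption made for this sketch.
    """
    label = bin_label.strip().lower()
    if label == "any":
        return 100
    return int(label.split("-")[0])


def constraints_met(requested: dict, actual: dict) -> bool:
    """Return True if every actual condition is at least as good as requested."""
    return all(
        percentile_rank(actual[name]) <= percentile_rank(requested[name])
        for name in requested
    )


# Illustrative values only: the seeing was better than requested,
# the other conditions matched the request.
requested = {"IQ": "85-percentile", "CC": "50-percentile", "WV": "Any", "BG": "Any"}
actual = {"IQ": "70-percentile", "CC": "50-percentile", "WV": "Any", "BG": "Any"}
print(constraints_met(requested, actual))
```

A single condition that is worse than requested is enough for the function to return False, which mirrors the idea that all four constraints must be satisfied.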
Unchecked Data
As of semester 2013A, we are no longer able to review all the data. All band 1 data will still be checked, as will any programs deemed to be high priority by the queue coordinator (QC), up to 30% of the night's data in total. Other programs, including band 4 and classical programs, are not checked, and may be left with their QA state set to UNDEFINED if the night-time observer was unable to review them in real time. The SOS will, however, monitor the automatic ingestion of files into the archive.
If you are a Gemini PI and you find issues with your data (e.g. your data were left unchecked but did not meet requirements, or they were checked but you do not agree with the quality assessment), please follow the usual route for requesting a repeat observation (see below), provided we are still in the same semester and the target is still reachable:
- Contact the head of science operations for the Gemini telescope in question.
- Please supply information on what does not meet the requirements, and be sure to include the program and observation ID.
- CC your email to the contact scientists.
Real-time Data QA
The observer continuously monitors the observing conditions and chooses observations from the various observing plans provided by the queue coordinator (QC). For example, if the observer starts the night with a program requiring IQ20 conditions (the best seeing), they will keep monitoring the seeing and, if conditions deteriorate, stop the ongoing program and switch to the next available observation in one of the IQ70 (or other appropriate) plans. The same is done for cloud cover and water vapor. The sky background is calculated by the queue planning software and taken into account when the plans are prepared, so it is only checked by the observer when deviating from the plan.
The observer will also check night-time data, including calibrations such as flats, arcs and standard stars, for saturation, obvious program setup errors, telescope/instrument problems and other possible issues. Other "sanity checks" are also performed, and what counts as a problem depends on the target. For instance, if the IR spectrum of a high-redshift galaxy requiring a blind offset acquisition is not detected in individual sky-subtracted pairs, this will not necessarily raise any red flags. If only a faint spectrum is seen for a telluric standard, though, the observer may choose to troubleshoot, leave a note in the nightlog requesting daytime follow-up, or abandon the observation and move to one using a different instrument, depending on the circumstances.
At the end of the night the observer will queue the requested daytime calibrations as defined in the observing tool (OT).
Off-line Data Processing
While the observer will have made every effort to ensure that good data were taken during the night, this is done in parallel with acquiring targets, anticipating and reacting to the weather, and sometimes dealing with faults, in the middle of the night and (on Mauna Kea) at >4000 m above sea level. The daytime SOS on duty is specifically focused on data checking for a portion of the day and has the benefit of knowing the evolution of the observing conditions during the night.
The off-line data QA is split into three parts:
- Checking that the observing condition constraints were met
- Looking for issues such as telescope/instrument problems, incorrect observational setups, standard stars with low counts, etc.
- Checking that the necessary calibrations were obtained, with the correct setup
Some of the tools used can be seen below. Figure 1 shows seqplot, a quick-look tool that lets the observer or daytime SOS rapidly view a sequence of data frames, as well as check the most important header keywords, saturation levels, etc.
Figure 1: The "seqplot" quick-look tool displaying GMOS-N longslit data.
Gemini's QAP (Quality Assessment Pipeline) can be set to run automatically at night, reducing new images as soon as they are written to disk. It outputs its results to a web-based GUI (as shown in Figure 2), enabling the user to quickly access the measured seeing, cloud cover and sky brightness values for suitable data.
Figure 2: The Gemini Quality Assessment Pipeline (QAP)'s GUI, showing measured seeing, extinction, and sky brightness values.
view_wfs is another tool to help assess the data quality. It displays the guide counts as well as the wavefront sensor's seeing estimates over any desired range of frames. It is useful for extrapolating seeing values (since we don't always take imaging data) and for checking for clouds. Figure 3 shows an example from a rather cloudy night, using GMOS and its on-instrument wavefront sensor (OIWFS) for longslit spectroscopy. The gradual drop in counts was caused by thin, and subsequently thick, cirrus moving through the telescope's field of view. The various colored lines represent the extinction corresponding to the CC bins.
Figure 3: The view_wfs tool displays the guide counts and seeing estimates for a specified range of files.
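To illustrate how a drop in guide counts translates into extinction, the sketch below applies the standard magnitude relation, extinction = -2.5 log10(counts / clear-sky counts). The function names, the clear-sky reference level, the bin labels and the bin limits are all assumptions made for this example. They are not taken from view_wfs, and the limits are placeholders rather than Gemini's CC bin definitions.

```python
import math


def extinction_mag(counts: float, clear_sky_counts: float) -> float:
    """Extinction in magnitudes implied by a drop in guide-star counts."""
    if counts <= 0 or clear_sky_counts <= 0:
        raise ValueError("counts must be positive")
    return -2.5 * math.log10(counts / clear_sky_counts)


def classify_extinction(extinction: float, bin_limits: list) -> str:
    """Return the first bin whose extinction limit is not exceeded.

    bin_limits is a list of (label, max_extinction_mag) pairs sorted
    from the best bin to the worst.
    """
    for label, limit in bin_limits:
        if extinction <= limit:
            return label
    return "worse than all bins"


# Placeholder limits in magnitudes (hypothetical, for illustration only).
EXAMPLE_LIMITS = [("bin A", 0.1), ("bin B", 0.3), ("bin C", 1.0)]

# Hypothetical guide counts falling as cloud moves in.
clear_sky = 50000.0
for counts in (50000.0, 42000.0, 25000.0, 9000.0):
    ext = extinction_mag(counts, clear_sky)
    print(f"{counts:8.0f} counts -> {ext:5.2f} mag -> {classify_extinction(ext, EXAMPLE_LIMITS)}")
```

In practice the clear-sky reference depends on the guide star and the instrument setup, so a real tool would need that value from elsewhere; here it is simply supplied by hand.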
The daytime SOS will leave notes in the Observing Log section in the OT about any unusual issues and include their actual measurements of the IQ if applicable. They will then set the QA flag in the OT if it is not already set or needs to be changed. The flag can be PASS, USABLE or FAIL. FAIL is quite unusual and is only applied if the file is not at all useful, as might be the case for a badly saturated flat field.
Data flagged as USABLE generally do not meet the PI's requirements. Time for these files is not charged to the program or partner country in the OT. For this reason, the USABLE flag is sometimes used for time accounting purposes. For instance, if an observer repeated a standard star observation with a longer exposure time to increase the counts, but this was not in fact necessary, the "extra" files might be given a QA state of USABLE. In these cases an explanatory note will usually be left in the OT.
The daytime SOS will report any issues found during data checking to the QC. The QC may work together with the daytime SOS and the program's Contact Scientist(s) to make a decision. They will sometimes contact the PI to ask them about data taken in borderline conditions tentatively set to PASS, or for advice on other issues with the data, instrument configuration, instructions, etc.
Aside from the entries in the OT, the QA state is also reflected in the FITS headers by the following keywords:
REQIQ = '85-percentile' / Requested Image Quality
REQCC = '50-percentile' / Requested Cloud Cover
REQBG = 'Any ' / Requested Background
REQWV = 'Any ' / Requested Water Vapour
RAWIQ = '70-percentile' / Raw Image Quality
RAWCC = '50-percentile' / Raw Cloud Cover
RAWWV = 'Any ' / Raw Water Vapour/Transparency
RAWBG = 'Any ' / Raw Background
RAWPIREQ= 'YES ' / PI Requirements Met
RAWGEMQA= 'USABLE ' / Gemini Quality Assessment
The RAWIQ, RAWCC, RAWWV, and RAWBG keywords reflect the actual observing conditions as set by the observer and checked, where possible, by the daytime SOS. The RAWPIREQ keyword, with values of YES|NO|UNKNOWN, shows whether the PI-specified observing conditions were all met (YES) or at least one condition was worse than requested (NO). This keyword is set to UNKNOWN by default, and is left at this value for unchecked programs.
RAWGEMQA, with a value of BAD|USABLE|UNKNOWN, is a more globally applicable parameter. It is set to USABLE if the data are useful in any way. For example, a field observed in poorer seeing than requested by the PI may still be useful for future users of the archive. A completely saturated flat field, on the other hand, would be set to BAD.
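As a rough illustration of how these keywords can be read, the sketch below parses the header cards shown above into a dictionary and prints the requested and actual conditions side by side. The parse_cards helper is hypothetical and deliberately minimal: it assumes each card has the form KEY = 'value' / comment and that the value itself contains no "/" character. For real files, a FITS library such as astropy.io.fits is the more robust choice.

```python
HEADER_TEXT = """\
REQIQ = '85-percentile' / Requested Image Quality
REQCC = '50-percentile' / Requested Cloud Cover
REQBG = 'Any ' / Requested Background
REQWV = 'Any ' / Requested Water Vapour
RAWIQ = '70-percentile' / Raw Image Quality
RAWCC = '50-percentile' / Raw Cloud Cover
RAWWV = 'Any ' / Raw Water Vapour/Transparency
RAWBG = 'Any ' / Raw Background
RAWPIREQ= 'YES ' / PI Requirements Met
RAWGEMQA= 'USABLE ' / Gemini Quality Assessment
"""


def parse_cards(text: str) -> dict:
    """Parse simple KEY = 'value' / comment cards into a dict of strings."""
    header = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, rest = line.partition("=")
        # Split on the first "/" only, so a "/" inside the comment is harmless.
        value, _, _comment = rest.partition("/")
        header[key.strip()] = value.strip().strip("'").strip()
    return header


header = parse_cards(HEADER_TEXT)
for cond in ("IQ", "CC", "WV", "BG"):
    print(cond, "requested:", header["REQ" + cond], "| actual:", header["RAW" + cond])
print("PI requirements met:", header["RAWPIREQ"])
print("Gemini QA:", header["RAWGEMQA"])
```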
This table shows how the QA-related keywords in the headers relate to the QA states in the OT:
| QA status in OT | RAWGEMQA | RAWPIREQ |
|---|---|---|
| Undefined | UNKNOWN | UNKNOWN |
| Pass | USABLE | YES |
| Usable | USABLE | NO |
| Fail | BAD | NO |
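For archive users who only have the FITS headers, the table can be turned into a small lookup that recovers the OT state from the two keywords. This is a sketch based solely on the four rows above; the names OT_TO_HEADER, HEADER_TO_OT and ot_state are made up for the example, and any keyword combination not listed in the table is reported as None rather than guessed.

```python
# OT QA state -> (RAWGEMQA, RAWPIREQ), copied from the table above.
OT_TO_HEADER = {
    "Undefined": ("UNKNOWN", "UNKNOWN"),
    "Pass": ("USABLE", "YES"),
    "Usable": ("USABLE", "NO"),
    "Fail": ("BAD", "NO"),
}

# Reverse lookup: (RAWGEMQA, RAWPIREQ) -> OT QA state.
HEADER_TO_OT = {keywords: state for state, keywords in OT_TO_HEADER.items()}


def ot_state(rawgemqa: str, rawpireq: str):
    """Return the OT QA state for a pair of header values, or None if unlisted."""
    key = (rawgemqa.strip().upper(), rawpireq.strip().upper())
    return HEADER_TO_OT.get(key)


# Header string values are often padded with spaces, so they are stripped first.
print(ot_state("USABLE ", "YES "))
print(ot_state("USABLE", "NO"))
```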
Repeating Observations
Observations will be scheduled to be repeated if there was a technical problem, if the observing conditions did not meet the PI's constraints, or if a significant error was made by Gemini staff. When an observation is scheduled to be repeated, the data from the original observation will be distributed to the PI and to the science archive with the normal proprietary period. The original observation does not count towards the PI's time allocation, and the re-scheduled observation will have the same weighting in the queue as the original observation. An observation that was defined incorrectly by the PI, or that included insufficient information to execute the program as the PI intended (inadequate finding charts, for example), may be repeated, but the time will usually be charged to the program.