How the Pipeline Works: in detail


The spectroscopic pipeline is a multi-purpose, highly automated pipeline for processing the roughly $10^6$ galaxy, $10^5$ QSO, and $10^5$ stellar spectra from the SDSS spectrographs. The pipeline is designed to extract, calibrate, and process all spectra taken in the course of the Survey, and specifically to:
In addition to these science goals, the pipeline has the following roles in survey operations:
The spectroscopic pipeline is split operationally into two parts, 2d and 1d.

The 2d pipeline reduces the raw data and calibration images from the red and blue CCD cameras from each spectrograph and outputs merged, co-added, flux-calibrated spectra and noise for analysis by the 1d pipeline.

The 1d pipeline determines emission and absorption redshifts, classifies spectra by object type, and outputs spectral information about each object.

Spectroscopic Observations

Each spectroscopic plug plate, with 640 fibers, typically has 3-5 spectroscopic exposures of 15 minutes each, with the exact number determined by observing conditions (weather, moon). This set of `science' exposures is preceded and followed by a series of shorter calibration exposures: arc lamp exposures, flat-fields, and a 4-minute `smear' exposure on the sky for spectrophotometric calibration, during which the telescope is moved so that the 3" fiber on each object effectively covers an 8" aperture. The `smear' exposures are meant to account for object light excluded from the 3" fibers: the smear frames are assumed to give an accurate measure of the true spectral shape of the objects and are used for spectrophotometric correction. The calibration and science exposures are immediately processed through a quick version of the 2d pipeline run at the telescope (APO2d), which tells the observers whether the calibrations were successful and provides S/N diagnostics on the science exposures.
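The role of the smear exposure can be illustrated with a short sketch. It assumes, for illustration only, that the correction is a smooth low-order polynomial fitted to the ratio of the smear spectrum to the science spectrum, normalized so that it changes only the spectral shape; the function name `smear_correction` and the `degree` parameter are hypothetical, not taken from Spectro2d.

```python
import numpy as np

def smear_correction(wave, science_flux, smear_flux, degree=3):
    """Smooth multiplicative correction mapping the spectral shape of a
    3-arcsec science spectrum onto that of the 8-arcsec smear spectrum.

    Assumption: the correction is a low-order polynomial in wavelength fitted
    to the smear/science ratio; `degree` is a hypothetical choice.
    """
    wave = np.asarray(wave, dtype=float)
    ratio = np.asarray(smear_flux, dtype=float) / np.asarray(science_flux, dtype=float)
    # Normalize to the median so only the shape, not the overall level, changes.
    ratio = ratio / np.median(ratio)
    # Center and scale wavelengths to keep the polynomial fit well conditioned.
    x = (wave - wave.mean()) / (wave.max() - wave.min())
    coeffs = np.polyfit(x, ratio, degree)
    return np.polyval(coeffs, x)
```

A corrected spectrum would then be `science_flux * smear_correction(wave, science_flux, smear_flux)`.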

For each science exposure, APO2d measures the $(S/N)^2$ through the SDSS imaging passbands and fits it as a function of fiber magnitude for each spectrograph camera. The SDSS observers take repeated 15-minute exposures until the cumulative median $(S/N)^2 > 15$ at $g'=20.2$ and $i'=19.9$ in all 4 cameras. Although these fiber magnitudes are fainter than most of the spectroscopic targets, measurement at these magnitudes provides a robust measure of $S/N$ across the range of moon conditions we encounter. These APO2d $S/N$ values correspond to $(S/N)^2 > 20$ for the full Spectro2d pipeline at the same fiber magnitudes, because the latter uses optimal extraction. In clear, moonless conditions, the $(S/N)^2$ threshold is easily reached in 3 exposures; in partly cloudy or moony conditions, more exposures may be required. Currently, the science exposure time is kept fixed at 15 minutes, and a minimum of 3 science exposures is taken to ensure adequate cosmic-ray rejection.
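The stopping rule for science exposures can be sketched as follows. The linear fit of $\log_{10}(S/N)^2$ against fiber magnitude, the camera labels `b1`, `b2`, `r1`, `r2`, and the assignment of $g'$ to the blue cameras and $i'$ to the red cameras are assumptions made for this example. Summing $(S/N)^2$ over exposures reflects how $S/N$ accumulates when independent exposures are optimally co-added.

```python
import math

# Reference fiber magnitudes from the text; mapping g' to blue and i' to red
# cameras (and these camera labels) is an assumption for this sketch.
REFERENCE_MAG = {"b1": 20.2, "b2": 20.2, "r1": 19.9, "r2": 19.9}
SN2_THRESHOLD = 15.0
MIN_EXPOSURES = 3

def sn2_at_mag(mags, sn2, ref_mag):
    """Fit log10((S/N)^2) linearly against fiber magnitude (an assumed model)
    and evaluate the fit at ref_mag."""
    n = len(mags)
    ys = [math.log10(s) for s in sn2]
    mx = sum(mags) / n
    my = sum(ys) / n
    sxx = sum((m - mx) ** 2 for m in mags)
    sxy = sum((m - mx) * (y - my) for m, y in zip(mags, ys))
    slope = sxy / sxx
    return 10 ** (my + slope * (ref_mag - mx))

def plate_is_done(exposures):
    """exposures: list of dicts mapping camera -> (fiber_mags, per_fiber_sn2).
    Returns True once at least MIN_EXPOSURES are taken and the cumulative
    (S/N)^2 at the reference magnitude exceeds the threshold in every camera."""
    if len(exposures) < MIN_EXPOSURES:
        return False
    for cam, ref in REFERENCE_MAG.items():
        # (S/N)^2 adds across independent exposures.
        total = sum(sn2_at_mag(*exp[cam], ref) for exp in exposures)
        if total <= SN2_THRESHOLD:
            return False
    return True
```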

Spectro2d: Extraction and Calibration of Spectra

The Spectro2d pipeline, known as idlspec2d, comprises a series of IDL processing routines and associated `utility' routines. The version of the code used for the Early Data Release (EDR) plates is v4.6.2.

The inputs for the 2d pipeline are the raw data files (fiber flats, arcs, and science frames), a two-dimensional pixel flat-field image for each camera, the plPlugMap file (information on the spectroscopic target for each fiber), a file describing the arc lamp lines, and 3 files describing the spectrograph hardware: opBC.par provides information on bad pixels/columns for a given spectrograph camera and observing date; opECalib.par is the spectrograph CCD calibration file, specifying the electronic characteristics of each chip (e.g., read noise, gain, bias level, and linearity corrections); and opConfig.par provides information on the CCD dimensions (data, bias, and overscan regions) and the amplifier configuration.
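As an illustration of how calibration values of the kind listed in opECalib.par and opBC.par enter the reduction, the sketch below converts a raw frame to electrons and builds an inverse-variance image. It uses the standard CCD noise model (read noise plus Poisson noise) and ignores linearity corrections and overscan handling; the function and argument names are hypothetical and do not reflect the idlspec2d interface.

```python
import numpy as np

def calibrate_frame(raw_adu, bias_adu, gain, read_noise_e, bad_pixel_mask):
    """Convert a raw CCD image to electrons plus an inverse-variance image.

    bias_adu, gain (e-/ADU) and read_noise_e (e-) stand in for opECalib.par-style
    values; bad_pixel_mask (True = bad) stands in for opBC.par. The
    read-noise-plus-Poisson variance model is assumed, not taken from idlspec2d.
    """
    electrons = (np.asarray(raw_adu, dtype=float) - bias_adu) * gain
    # Negative counts (noise below bias) contribute no Poisson variance.
    variance = read_noise_e ** 2 + np.clip(electrons, 0.0, None)
    invvar = 1.0 / variance
    # Bad pixels get zero weight rather than being removed, so the image
    # keeps its shape for later extraction.
    invvar[np.asarray(bad_pixel_mask, dtype=bool)] = 0.0
    return electrons, invvar
```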

The outputs of the 2d pipeline currently passed to the database include, for each fiber: the flat-fielded, sky-subtracted, red-blue merged, exposure-combined, flux-corrected spectrum, binned in pixels of constant velocity width (i.e., uniform in $\log \lambda$); the noise (inverse variance) in each pixel; mask information (e.g., bad pixels and rejected outlier pixels such as those due to cosmic rays or strong sky lines); the wavelength dispersion; and the target information from the plugmap file.
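Pixels of constant velocity width correspond to a constant step in $\log_{10}\lambda$, since $dv = c\,d\ln\lambda = c \ln 10 \, d\log_{10}\lambda$. The sketch below builds such a grid and computes the velocity width of one pixel; the step of $10^{-4}$ in $\log_{10}\lambda$ used in the example is an assumed value, not one stated in this section.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def loglam_grid(loglam0, dloglam, npix):
    """Wavelengths (Angstrom) of npix pixels spaced uniformly in log10(lambda)."""
    return [10 ** (loglam0 + i * dloglam) for i in range(npix)]

def pixel_velocity_width(dloglam):
    """Velocity width (km/s) of one pixel: d(ln lambda) = ln(10) * d(log10 lambda),
    and dv = c * d(ln lambda) for small steps."""
    return C_KM_S * math.log(10) * dloglam
```

With `dloglam = 1e-4`, `pixel_velocity_width` gives about 69 km/s per pixel, and every pair of adjacent wavelengths in the grid has the same ratio.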
These outputs are passed to the 1d pipeline for processing. It is anticipated that intermediate outputs from 2d (i.e., the spectrum for each exposure, with associated sky, error, and flux-correction information) will soon be available in the database as well. Spectro2d also outputs a number of diagnostic plots for QA and QC.

Periodically (for example, when the camera electronics are changed), a pixel-to-pixel flat-field image is created from a stack of flat-field images using Spectro2d routines. A number of flat-field images are taken with the collimator moved between exposures, so that the images of the fibers on each camera are shifted between readouts. Each flat-field image is fit by a model along each column. A number of such images are then stacked, so that the camera chip is essentially covered by the co-added illumination, and the combined model is used to measure and, when applied, remove pixel-to-pixel variations in the instrument response. These variations are expected to be reasonably stable over time.
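The stacking step can be sketched as follows, assuming each flat-field image and its smooth column model are available as arrays. Dividing an image by its model leaves only the pixel-scale response, and weighting each ratio by the model illumination means pixels between fibers contribute little; this weighting, the `min_model` cutoff, and the function name are illustrative choices, not the Spectro2d algorithm.

```python
import numpy as np

def pixel_flat(images, models, min_model=1e-3):
    """Combine shifted flat-field exposures into a pixel-to-pixel flat.

    images, models: sequences of same-shape 2-D arrays; each model is the smooth
    fit to its image. Ratios image/model are averaged with weight = model, which
    reduces to sum(image) / sum(model) over illuminated pixels.
    """
    num = np.zeros_like(np.asarray(images[0], dtype=float))
    den = np.zeros_like(num)
    for img, mod in zip(images, models):
        img = np.asarray(img, dtype=float)
        mod = np.asarray(mod, dtype=float)
        good = mod > min_model  # skip essentially unilluminated pixels
        num[good] += img[good]
        den[good] += mod[good]
    # Pixels never illuminated in any exposure keep a neutral response of 1.
    flat = np.ones_like(num)
    covered = den > 0
    flat[covered] = num[covered] / den[covered]
    return flat
```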

In normal operations, the Spectro2d pipeline carries out the sequence of tasks below. In the first sequence, the pipeline reduces the individual images (frames) from each camera of each spectrograph separately. The subsequent procedures are carried out using multiple frames.

Spectro1d

The 1d pipeline, which analyzes the combined spectra output by Spectro2d, is written in C and TCL. The version of the code used for the Early Data Release plates is v5.3.2. The code outputs a FITS image for each fiber; it includes the 1d spectrum, noise, and mask arrays passed from the 2d pipeline, basic information about the target from the photo and Target Selection pipelines, as well as line measurements, redshift determinations, and warning flags.

The code attempts to measure an emission-line redshift and an absorption-line redshift independently for every targeted (non-sky) object. To avoid biases, the absorption and emission codes operate independently of each other, and both are independent of any target selection information.

The Spectro1d pipeline performs the following sequence of tasks for each object spectrum on a plate:

Emission line finding, fitting, and redshifts

A separate routine searches for high-redshift ($z>2.3$) QSOs by identifying spectra that contain a Lyman-alpha forest signature: a broad emission line with more fluctuation on the blue side than on the red side of the line. The routine outputs the wavelength of the Ly$\alpha$ emission line; while this allows a determination of the redshift, it is not a high-precision estimate, because the Ly$\alpha$ line is intrinsically broad. Spectro1d effectively treats this as an additional emission-line redshift.
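The Ly$\alpha$-forest test can be sketched as a comparison of pixel-to-pixel flux scatter on either side of a broad emission peak, followed by $z = \lambda_{\rm obs}/\lambda_{\rm Ly\alpha} - 1$ with $\lambda_{\rm Ly\alpha} = 1215.67$ Å. The window size, the scatter-ratio threshold, and the function names are hypothetical; the actual Spectro1d routine may use different statistics.

```python
LYA_REST = 1215.67  # rest-frame Lyman-alpha wavelength, Angstrom
Z_MIN = 2.3         # the routine targets QSOs at z > 2.3

def _scatter(values):
    """RMS of pixel-to-pixel flux differences about their mean."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    mean = sum(diffs) / len(diffs)
    return (sum((d - mean) ** 2 for d in diffs) / len(diffs)) ** 0.5

def lya_forest_candidate(wave, flux, peak_index, window=50, ratio_threshold=2.0):
    """Return (is_candidate, z) for a broad emission peak at peak_index.

    A forest makes the blue side of Ly-alpha much noisier than the red side,
    so the blue-window scatter must exceed ratio_threshold times the red one.
    """
    blue = flux[max(0, peak_index - window):peak_index]
    red = flux[peak_index + 1:peak_index + 1 + window]
    if len(blue) < 3 or len(red) < 3:
        return False, None
    # Low-precision redshift: the Ly-alpha line is intrinsically broad.
    z = wave[peak_index] / LYA_REST - 1.0
    forest_like = _scatter(blue) > ratio_threshold * _scatter(red)
    return forest_like and z > Z_MIN, z
```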

If the highest-confidence-level (CL) emission-line redshift uses lines expected only for QSOs (e.g., Ly$\alpha$, CIV, CIII), the object is provisionally classified as a QSO.
If any of the identified lines is broader than 500 km/s (FWHM), the object is also provisionally classified as a QSO. These provisional classifications are retained if the final redshift assigned to the object agrees with its emission redshift.
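The two provisional QSO rules can be written as a small predicate. Line names are hypothetical labels, the QSO-only set contains just the examples given in the text, and a line counts as broad when the FWHM of its fitted Gaussian, $2\sqrt{2\ln 2}\,\sigma$, exceeds 500 km/s after conversion to velocity. The sketch reads "uses lines expected only for QSOs" as "uses at least one such line", which is an interpretation.

```python
import math

C_KM_S = 299792.458                      # speed of light, km/s
QSO_ONLY_LINES = {"Lya", "CIV", "CIII"}  # examples from the text; labels hypothetical
BROAD_FWHM_KM_S = 500.0

def fwhm_km_s(sigma_angstrom, center_angstrom):
    """Gaussian FWHM (2 sqrt(2 ln 2) sigma) expressed as a velocity in km/s."""
    fwhm_angstrom = 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma_angstrom
    return fwhm_angstrom / center_angstrom * C_KM_S

def provisional_qso(best_redshift_lines, fitted_lines):
    """best_redshift_lines: names of lines used by the highest-CL emission redshift.
    fitted_lines: iterable of (name, center_angstrom, sigma_angstrom)."""
    # Rule 1: the best emission redshift relies on a QSO-only line.
    if set(best_redshift_lines) & QSO_ONLY_LINES:
        return True
    # Rule 2: any identified line is broader than 500 km/s FWHM.
    return any(fwhm_km_s(sigma, center) > BROAD_FWHM_KM_S
               for _, center, sigma in fitted_lines)
```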

Cross-correlation redshifts

The spectra are cross-correlated with stellar, emission-line galaxy, and QSO template spectra to determine a cross-correlation redshift and error. When an object spectrum is cross-correlated with the stellar templates, its emission lines are masked out, i.e., the redshift is derived from the absorption features. The cross-correlation routine follows the Tonry-Davis technique: the continuum-subtracted spectrum is Fourier-transformed and multiplied by the conjugate transform of each template, giving the cross-correlation function (CCF), whose highest peaks are output with their associated CLs. The corresponding redshift errors are given by the widths of the CCF peaks. The cross-correlation CLs as a function of peak level are empirically calibrated based on manual inspection of a large number of EDR spectra (see figure on CLs vs. success).
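A minimal sketch of a Tonry-Davis-style cross-correlation follows, assuming the object and template spectra are already continuum-subtracted and sampled on the same $\log_{10}\lambda$ grid. On such a grid a redshift is a uniform shift, so a lag of $k$ pixels corresponds to $1+z = 10^{k\,\Delta\log_{10}\lambda}$. The code omits the filtering, sub-pixel peak fitting, and CL calibration a production code needs; `ccf_redshift` is a hypothetical name.

```python
import numpy as np

def ccf_redshift(flux, template, dloglam):
    """Cross-correlate a spectrum with a template via FFTs and convert the
    best pixel lag on a log10(lambda) grid into a redshift.

    Returns (z, lag_in_pixels, normalized_peak_height).
    """
    f = np.asarray(flux, dtype=float) - np.mean(flux)
    t = np.asarray(template, dtype=float) - np.mean(template)
    n = len(f)
    # Zero-pad to 2n so the circular correlation equals the linear one.
    F = np.fft.rfft(f, 2 * n)
    T = np.fft.rfft(t, 2 * n)
    # ccf[k] = sum_m f[m + k] * t[m]  (cross-correlation theorem)
    ccf = np.fft.irfft(F * np.conj(T), 2 * n)
    # Normalize so a perfect match gives a peak close to 1.
    ccf /= np.sqrt(np.sum(f ** 2) * np.sum(t ** 2))
    # Indices 0..n-1 are lags 0..n-1; indices n..2n-1 are lags -n..-1.
    lags = np.concatenate([np.arange(0, n), np.arange(-n, 0)])
    best = int(np.argmax(ccf))
    lag = lags[best]
    # A shift of `lag` pixels in log10(lambda) means 1 + z = 10**(lag * dloglam).
    z = 10 ** (lag * dloglam) - 1.0
    return z, lag, ccf[best]
```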

The cross-correlation templates are obtained from high-$S/N$ SDSS commissioning spectra and comprise roughly one for each stellar spectral type from B to almost L, two late M-type templates, non-magnetic and magnetic white dwarf (WD) templates, an emission-line galaxy, a composite Luminous Red Galaxy (LRG) spectrum (from Eisenstein et al.), and a composite QSO spectrum (from Vanden Berk et al.). The composites are each based on co-additions of about 2000 spectra. The template redshifts are determined by cross-correlation with a large number of stellar spectra from SDSS observations of the M67 star cluster, whose radial velocity is precisely known.

The cross-correlation redshift is chosen as the one with the highest CL from among all the templates.

If there are discrepant high-CL cross-correlation peaks, i.e., if the highest peak has $CL < 0.99$ and the next-highest peak has a CL greater than 70% of that of the highest peak, the spectrum is given a warning flag (see below). In this case, the code extends the cross-correlation analysis for the corresponding templates to lower wavenumber and includes the continuum in the analysis, i.e., it chooses the redshift based on which template better matches the continuum shape of the object. These flagged spectra are also manually inspected as a back-up.
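The template selection and the discrepant-peak test amount to a few lines of logic. This sketch assumes the CCF peaks from all templates have already been reduced to `(template, z, CL)` tuples; template names in any call are placeholders, and the continuum-based tie-break applied to flagged spectra is not modeled.

```python
def choose_redshift(peaks):
    """peaks: list of (template_name, z, cl) for CCF peaks from all templates.

    Returns (template, z, cl, warning). The highest-CL peak wins; the warning
    is set when that CL is below 0.99 and the next-highest CL exceeds 70% of it.
    """
    ranked = sorted(peaks, key=lambda p: p[2], reverse=True)
    best = ranked[0]
    warning = False
    if len(ranked) > 1:
        runner_up = ranked[1]
        warning = best[2] < 0.99 and runner_up[2] > 0.7 * best[2]
    return best[0], best[1], best[2], warning
```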

Final Redshift and Classification

Spectral Information

Once the final redshift has been determined, the pipeline computes additional spectral information about the object:

For galaxies, we compute the following absorption-line strengths:

Gaussians are fit at the positions of all expected emission lines in the reference list (not just the common lines).
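A simplified version of the line fitting is sketched below: each Gaussian is centered at $\lambda_{\rm rest}(1+z)$ and given a fixed width, so only its amplitude is solved for, by inverse-variance-weighted linear least squares. Holding center and width fixed is a simplification (a full fit would also adjust them); the width, fitting window, and function name are hypothetical, and the flux is assumed to be continuum-subtracted.

```python
import numpy as np

def fit_line_amplitudes(wave, flux, ivar, z, rest_lines, sigma_angstrom=2.0, half_width=5.0):
    """Fit a fixed-center, fixed-width Gaussian at each expected line position.

    rest_lines: dict name -> rest wavelength (Angstrom).
    Returns dict name -> (amplitude, amplitude_error); NaN if no usable pixels.
    """
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)
    ivar = np.asarray(ivar, dtype=float)
    results = {}
    for name, rest in rest_lines.items():
        center = rest * (1.0 + z)  # expected observed-frame position
        near = np.abs(wave - center) < half_width * sigma_angstrom
        if not np.any(near & (ivar > 0)):
            results[name] = (np.nan, np.nan)  # line falls off the spectrum or is masked
            continue
        profile = np.exp(-0.5 * ((wave[near] - center) / sigma_angstrom) ** 2)
        w = ivar[near]
        # Weighted least squares for a single linear parameter (the amplitude).
        denom = np.sum(w * profile ** 2)
        amp = np.sum(w * profile * flux[near]) / denom
        results[name] = (amp, 1.0 / np.sqrt(denom))
    return results
```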

Galaxies are classified by principal component analysis (PCA), using cross-correlation with eigentemplates (see Connolly et al.). The code outputs 5 eigencoefficients and a classification number.
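Projecting a spectrum onto eigentemplates is equivalent to a zero-lag cross-correlation with each eigenspectrum. The sketch below obtains the first 5 eigencoefficients by inverse-variance-weighted least squares, assuming the spectrum is already shifted to the rest frame and resampled onto the eigenspectra's grid; the unit-length normalization is an assumed convention, and the pipeline's classification number is not reproduced.

```python
import numpy as np

def eigencoefficients(flux, ivar, eigenspectra, ncoeff=5):
    """Weighted least-squares projection onto the first ncoeff eigenspectra.

    eigenspectra: array of shape (n_eigen, n_pix) on the same grid as flux.
    Returns a unit-length coefficient vector of length ncoeff.
    """
    E = np.asarray(eigenspectra, dtype=float)[:ncoeff]  # (k, npix)
    w = np.asarray(ivar, dtype=float)                   # (npix,)
    # Normal equations of the weighted fit: (E W E^T) c = E W f.
    A = (E * w) @ E.T
    b = (E * w) @ np.asarray(flux, dtype=float)
    coeffs = np.linalg.solve(A, b)
    return coeffs / np.linalg.norm(coeffs)
```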

Redshift Warning flags

Spectro1d outputs a series of warning flags. These provide additional compact information about the spectra for end users and are used in certain combinations to trigger manual inspection of a subset of spectra on every plate.

Manual Inspection of Spectra

A small percentage of spectra on every plate are inspected manually and, if necessary, their redshifts and classifications are corrected. Currently, the algorithm used to trigger manual inspection is the following.

Spectroscopic Pipeline Testing and Performance

In order to assess the performance of the spectroscopic pipeline, we carry out a number of internal and external checks. A subset of 39 EDR plates (comprising about 23,000 spectra) is used extensively for validation of the pipeline. Every spectrum on the validation plates has been manually checked, and a `truth' table with the manually determined (or manually confirmed) object classification and redshift has been constructed for each of these plates. Whenever a new version of the 2d or 1d pipeline is `tagged', the updated version of the entire pipeline is used to re-process the validation plates and the results compared with the truth tables. This validation procedure allows us to assess the performance of the hardware, of the automated pipeline, and of the manually-corrected pipeline, to identify systematic problems in the pipeline and the data, to check that the redshift confidence levels are empirically accurate, etc.
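The comparison against a truth table can be sketched as a per-class tally of correct classifications and correct redshifts. The redshift tolerance $|\Delta z|/(1+z) < 0.01$ is a hypothetical criterion chosen for illustration, since this section does not state the one used.

```python
def validation_summary(pipeline, truth, dz_tol=0.01):
    """pipeline, truth: dicts mapping spectrum id -> (class, z).

    Returns {true_class: (fraction_correctly_classified, fraction_z_correct)}.
    A redshift counts as correct when |z - z_true| / (1 + z_true) < dz_tol.
    """
    stats = {}
    for sid, (true_class, true_z) in truth.items():
        cls, z = pipeline[sid]
        s = stats.setdefault(true_class, {"n": 0, "class_ok": 0, "z_ok": 0})
        s["n"] += 1
        s["class_ok"] += cls == true_class
        s["z_ok"] += abs(z - true_z) / (1.0 + true_z) < dz_tol
    return {c: (s["class_ok"] / s["n"], s["z_ok"] / s["n"]) for c, s in stats.items()}
```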

Based on the validation plates, the version of the pipeline used for the EDR (with manual inspection triggered as above) has the following estimated performance statistics:

Classification:
99.7 $\%$ Galaxies correctly classified,
97.9 $\%$ QSOs correctly classified,
99.1 $\%$ Stars correctly classified

Redshifts:
99.7 $\%$ Galaxy redshifts correct,
98.0 $\%$ QSO redshifts correct,
99.6 $\%$ Star redshifts correct.