Guidelines for annotating FCS files

FCS files contain raw flow cytometry data (often referred to as "listmode data") and can be accessed by most flow cytometry software. This document describes some basic guidelines to use for annotating your data at the time of sample acquisition that will prevent confusion when you or your colleagues wish to examine the data at some future date. This document was written with the program CellQuest in mind (because that was all that we had at the time), but I assume that the principles will apply to other software packages as well.


The Goal of FACS Data Annotation

The goal of FACS data annotation should be to include enough information within the FCS file to allow someone else to make sense of the data without requiring cross-references to your notebook.

Filenames and Directories

The name of an FCS file can contain significant information that will give someone an idea about what is in the file without even opening it. Therefore, you are strongly urged to eschew default filenames such as "Data.001", which are completely uninformative. Although filenames often cannot provide a complete description of the contents of the file, there are a number of different systems that can be used for naming files.

  • Filenames should include your initials. I strongly urge you to develop a system in which the filenames for your FACS data begin with your initials. This immediately indicates who created the file (in fact, as far as I know, there is no parameter in the standard FCS specification to store the "creator" of the file; I consider this to be an oversight).

    Here are a small number of instances where you might consider breaking this rule: 1) you are collecting data for a core laboratory; 2) you are collecting data for a large study that has developed its own file naming conventions; 3) you have other significant information that you'd like to indicate in the file name, and you do not have enough characters available due to limitations of the operating system.

  • Filenames that reference notebook pages. You might consider establishing a system for naming your FCS files that includes a reference to your laboratory notebook. For example, you might use a name such as "JDA-IV-07-M3.001", which refers to data collected for "mouse #3" of experiment 07 contained in JDA's notebook IV.
  • Filenames that contain dates. You might consider a file naming system that includes the date in the filename. For example, you might use a name such as "JDA_010826", where the first two digits indicate the year (01 indicates 2001), the second two indicate the month (08 indicates August), and the last two indicate the day of the month (26). I strongly recommend this format for the date because a computer will properly sort filenames or directories in chronological order if they are named this way (note that this won't happen if you put the year last, or if you use alphabetic abbreviations for the month). I also recommend that if you use filenames that contain dates, you include additional information in the filename that tells you something about the experiment, distinguishing the data from data that you collected on other dates.
  • Filenames that contain subject identifiers. This scheme is particularly useful and highly recommended for longitudinal studies. For example, we have carried out a study of T cell responses in a number of rhesus macaques for over a year, and the filenames always included the name of the monkey (e.g. "MH.000814.RLc5.001", where the name of the monkey is "RLc5").
  • Filenames that contain "study names". Suppose you are collecting data as part of a clinical trial for the HVTN. Most of these trials are assigned a study number (e.g. HVTN203). You might consider a file naming system that includes the study name.
  • Filenames that contain other information. Anything significant is fair game for inclusion in a file name. You might find it informative to include strain names (e.g. B6, C3H, or Balb), or tissue source (e.g. PBMC, "WB" for "whole blood", "spl" for spleen, etc.)
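
The naming schemes above can be combined mechanically. The following Python sketch is only an illustration: the helper name `make_fcs_basename` and its arguments are hypothetical, and it assumes a period-separated layout like the "MH.000814.RLc5.001" example. The numeric suffix (".001") is left out, on the assumption that the acquisition software appends it. The sketch also shows why a year-first date sorts chronologically while a day-first date does not.

```python
from datetime import date

def make_fcs_basename(initials, acquired, *parts):
    """Join initials, a YYMMDD date, and any extra descriptors with periods.

    Hypothetical helper; the layout mirrors the "MH.000814.RLc5" example.
    """
    stamp = acquired.strftime("%y%m%d")  # year first, so names sort by date
    return ".".join([initials, stamp, *parts])

names = [
    make_fcs_basename("MH", date(2001, 1, 15), "RLc5"),
    make_fcs_basename("MH", date(2000, 8, 14), "RLc5"),
    make_fcs_basename("MH", date(2000, 12, 2), "RLc5"),
]

# A plain alphabetical sort is already chronological.
print(sorted(names))
# ['MH.000814.RLc5', 'MH.001202.RLc5', 'MH.010115.RLc5']

# The same three dates written day-first (DDMMYY) sort out of order:
# December 2000 comes before August 2000.
day_first = ["150101", "140800", "021200"]
print(sorted(day_first))
# ['021200', '140800', '150101']
```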

Remember the following additional guidelines.

  • Make sure that your filenames are unique, even over time. For example, in a longitudinal study, do not give a data file a name that carries only a subject identifier (e.g. "RLc5.001") since you are likely to collect data files for that animal on another date. While it is true that in most cases these files will be stored in different directories, why risk inviting confusion?
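
One way to check for this kind of collision is to compare filenames while ignoring the directories they sit in. The sketch below assumes you already have a list of relative paths to your data files; the function name `find_duplicate_names` and the example paths are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def find_duplicate_names(paths):
    """Return {filename: [paths...]} for filenames that occur more than once."""
    by_name = defaultdict(list)
    for p in paths:
        # Compare only the final path component, not the directory.
        by_name[PurePosixPath(p).name].append(p)
    return {name: found for name, found in by_name.items() if len(found) > 1}

# Hypothetical directory layout: two acquisition dates for the same animal.
paths = [
    "MH_000814/RLc5.001",
    "MH_000911/RLc5.001",
    "MH_000911/MH.000911.RLc5.001",
]
print(find_duplicate_names(paths))
# {'RLc5.001': ['MH_000814/RLc5.001', 'MH_000911/RLc5.001']}
```

The file whose name carries both the date and the subject identifier never collides, which is the point of the guideline.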

Finally, directory names should be informative as well. While it is unlikely that your directory names will include subject identifiers, tissue sources, or even strain names, they should include your initials and either the date or a reference to your notebook.

Parameter Descriptions

The parameter descriptions that are entered in the Parameter Description dialog of CellQuest are stored in the FCS file and are available to any program designed to process the data in those files. The more careful you are in entering informative descriptions, the easier it will be for collaborators, colleagues, advisors, or your future self to use your data. The following guidelines for annotating parameters should be followed.

  • Include the fluorophore name as well as the antigen. The name of the fluorophore will then be carried through to the output. This is important because it often allows people examining the data to determine whether the data was well compensated, or if there were other problems.
  • To include or not to include "anti"? Your reagents are usually antibodies against something (e.g. CD4), but they are measuring the density of something (e.g. CD4) on the cell surface. In most circumstances, I'd recommend not including "anti".
  • On inclusion of clone names. This is a bit of a conundrum. Obviously, inclusion of the antibody clone names in the parameter description (e.g. "CD45RA (UCHL-1) FITC") provides significant and sometimes crucial information. I will never discourage this practice, but I recognize that it often makes things unwieldy, and in most cases it is ok if the clone names are omitted. However, if the properties of certain clones are critical to the experiment (e.g. the difference between the anti-mouse CD8 antibodies 53-6.7 and CT-CD8a), then the clone name must be included in the parameter description.
  • On inclusion of titers. In most cases, reagents should have been carefully titered beforehand and it should be safe to assume that the reagents have been used at the optimal concentration (oh, if this were only the case!). Therefore, in most cases it is permissible to omit the quantity of the reagent used to stain the cells. However, when performing the original titration experiments, the quantity of the reagent used should be included in the parameter description (e.g. "CD45RA (UCHL-1) 1:100 FITC").
  • Indirect stains. When using indirect stains, it is probably best to include both the primary and secondary reagent in the parameter description. For example, "anti-Qa1b / GAM-PE" is preferable to the less accurate "anti-Qa1b-PE" (GAM is a standard abbreviation for "goat anti-mouse").
  • Antigens defined by antibody names. Perhaps the best known example of this is the antibody Ki-67, which bound to an antigen that was not identified at the time the antibody was isolated. Subsequently, the protein was identified, but is still referred to as the Ki-67 protein.
  • MHC Tetramers. These are most often referred to by including the name of the MHC allele and the name of the peptide. It would be more correct to include some indication that the reagent is a tetramer (e.g. "(A2/HIV-pol)4"), but this is unwieldy and probably not necessary to make clear what reagent was used.
  • Reagents which are ligands for a receptor. A good example of this is the MIP-3b "chemotetramer". Simply referring to this as "MIP-3beta PE" risks confusing it with an antibody directed against MIP-3b. One way to avoid the confusion would be to refer to the parameter as "CCR7 (MIP-3beta) PE".
  • Stains specific for cytokines. Naming of stains that are specific for cytokines produced in response to specific stimulation presents a unique set of "problems" that merit an entire section of their own within this document. This is included below.
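
If you generate parameter descriptions from a staining panel rather than typing them by hand, the conventions above can be applied consistently. The following Python sketch is one possible way to do this; the function `describe_parameter` and its argument names are hypothetical and are not part of CellQuest or of the FCS specification. The ordering of the pieces simply follows the examples given in this section.

```python
def describe_parameter(antigen, fluorophore, clone=None, dilution=None, secondary=None):
    """Build a parameter description such as "CD45RA (UCHL-1) 1:100 FITC".

    Hypothetical helper: clone and dilution are optional, and `secondary`
    names the second-step reagent of an indirect stain (e.g. "GAM").
    """
    parts = [antigen]
    if clone:
        parts.append(f"({clone})")
    if dilution:
        parts.append(dilution)
    if secondary:
        # Indirect stain: the fluorophore is carried by the secondary reagent.
        return " ".join(parts) + f" / {secondary}-{fluorophore}"
    parts.append(fluorophore)
    return " ".join(parts)

print(describe_parameter("CD4", "FITC"))
# CD4 FITC
print(describe_parameter("CD45RA", "FITC", clone="UCHL-1", dilution="1:100"))
# CD45RA (UCHL-1) 1:100 FITC
print(describe_parameter("anti-Qa1b", "PE", secondary="GAM"))
# anti-Qa1b / GAM-PE
```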

Proper naming of stains specific for cytokines

  • What is wrong with "IFNg FITC"? According to the guidelines above, many investigators might find it appropriate to simply include labels such as "IFNg FITC". While this is not strictly incorrect, it is possible to include significant information that would permit someone examining the data to better interpret its meaning.
  • Include stimulus in parameter name. The data contained in the FCS files will be much more informative if you include the stimulus in the parameter name. For example "SEA / IFNg FITC" or "LCMV.GP33 / IFNg FITC" are much preferable to "IFNg FITC".
  • Include concentrations in parameter name. In all cases, the responses that you are measuring are dose-dependent, whether it is explicitly acknowledged or not. Therefore, the best names for cytokine parameters include the concentration of the stimulus as well. For example "CM9 (10 µg/ml) / IFNg FITC" is the preferred name.
  • What about names for CD69 or other activation antigens? This is a bit of a problem. Obviously the comments in the preceding paragraph would appear to apply to CD69 as well as IFNg. However, there is often more non-specific activation of CD69, so I recommend that you do not include antigen-specific information in the parameter description for CD69.
  • Why not include time of stimulation as well? While this carries significant information, in most cases it is too unwieldy. However, if you are doing a time course experiment, the time of stimulation should be included in the parameter description (e.g. "CM9 (10 µg/ml; 2 hr) / IFNg FITC").
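
The cytokine naming pattern above ("stimulus (concentration; time) / cytokine fluorophore") is regular enough to assemble programmatically. The sketch below is illustrative only; `describe_cytokine_parameter` and its arguments are hypothetical names, and the separators are taken from the examples in this section.

```python
def describe_cytokine_parameter(cytokine, fluorophore, stimulus=None,
                                concentration=None, duration=None):
    """Build a label such as "CM9 (10 µg/ml; 2 hr) / IFNg FITC".

    Hypothetical helper: with no stimulus it falls back to the bare
    "cytokine fluorophore" form, as recommended above for CD69.
    """
    stain = f"{cytokine} {fluorophore}"
    if stimulus is None:
        return stain
    # Concentration and stimulation time, when given, go in parentheses.
    details = "; ".join(d for d in (concentration, duration) if d)
    prefix = f"{stimulus} ({details})" if details else stimulus
    return f"{prefix} / {stain}"

print(describe_cytokine_parameter("IFNg", "FITC", stimulus="SEA"))
# SEA / IFNg FITC
print(describe_cytokine_parameter("IFNg", "FITC", "CM9", "10 µg/ml"))
# CM9 (10 µg/ml) / IFNg FITC
print(describe_cytokine_parameter("IFNg", "FITC", "CM9", "10 µg/ml", "2 hr"))
# CM9 (10 µg/ml; 2 hr) / IFNg FITC
```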

Patient ID

For the most part, I don't use these because I've included most of the important information elsewhere. It may be the case that I'm missing something and these can really be an important part of optimal annotation of FCS files. Please let me know your thoughts about this in the context of the comments in the document above.

Sample ID

For the most part, I don't use these because I've included most of the important information elsewhere. It may be the case that I'm missing something and these can really be an important part of optimal annotation of FCS files. Please let me know your thoughts about this in the context of the comments in the document above.

 
Site by John Altman

Last Modified: Wednesday, December 24, 2003