TRANSLATIONAL PHYSIOLOGY
1Cardiology Division and Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, Boston, Massachusetts; 2Department of Pediatrics, University of Colorado Denver and Health Sciences, Denver, Colorado; 3Division of Allergy, Pulmonary, and Critical Care Medicine, Department of Medicine, Vanderbilt University, Nashville, Tennessee; 4Mass Spectrometry Research Center, Vanderbilt University Medical Center, Nashville, Tennessee; 5Department of Laboratory Medicine and Pathology, Mayo Clinic and Mayo Foundation, Rochester, Minnesota; 6Division of Cardiovascular Diseases, Mayo Clinic, Rochester, Minnesota; 7Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts; 8Cardiovascular Division, Brigham and Women's Hospital and Department of Medicine, Harvard Medical School, Boston, Massachusetts; and 9National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland
Submitted 24 January 2008 ; accepted in final form 27 April 2008
| ABSTRACT |
|---|
protein content; proteome
To directly address some of these challenges, the National Heart, Lung, and Blood Institute's (NHLBI) Clinical Proteomics Programs were established in 2005. The overall goal of the NHLBI Clinical Proteomics Programs is to promote systematic, comprehensive, large-scale validation of existing and new candidate protein markers that are appropriate for routine use in the diagnosis and management of heart, lung, blood, and sleep diseases. Specific goals include 1) to design panels of candidate proteins for unmet clinical disease areas; 2) to develop high-throughput analytical methods to simultaneously assay multiple putative markers; 3) to assess the predictive value of these proteomic measurements using biological specimens and clinical data from existing study populations; and 4) to establish procedures and standards for quality control.
In October 2007, the steering committee of the NHLBI Clinical Proteomics Programs met in Rochester, Minnesota. The meeting focused on current perceptions of the barriers to achieving rapid and effective translation of plasma protein biomarkers from discovery to clinical use, and importantly, research directions aimed at overcoming existing limitations. Here, we summarize highlights from this meeting. Table 1 presents an overview of the key challenges that were identified and strategies for overcoming these challenges.
Because it is difficult to demonstrate improvements in the AUC, some investigators have advocated use of other metrics to evaluate new biomarkers, such as model calibration (10) and reclassification percentage (31). The former refers to the correspondence between predicted event rates and observed event rates and is assessed quantitatively using goodness-of-fit measures such as the Hosmer-Lemeshow statistic. Reclassification refers to the ability of new biomarkers to move people between discrete risk categories, so that some low-risk individuals may be reclassified as high risk and vice versa. The utility of these alternative metrics remains under evaluation. For instance, reclassification relies on the use of cutpoints to separate patients into risk categories. Movement between risk categories may not be meaningful if the cutpoints are not well accepted, if there is no management strategy explicitly linked to the categorization, or if most of the movements involve small shifts in absolute risk from just below to just above the cutpoint (45).
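The two alternative metrics can be made concrete with a short sketch. The Python below is illustrative only: the function names, the grouping into equal-sized bins, and the 10% and 20% risk cutpoints are assumptions chosen for the example and are not taken from the studies cited above. It computes the Hosmer-Lemeshow chi-square statistic from predicted and observed outcomes, and tabulates how many subjects change risk category when a new marker is added to a model.

```python
from collections import Counter


def hosmer_lemeshow(pred, obs, groups=10):
    """Hosmer-Lemeshow chi-square statistic (larger = poorer calibration).

    pred: predicted event probabilities; obs: observed outcomes coded 0/1.
    Subjects are sorted by predicted risk and split into `groups` bins.
    """
    pairs = sorted(zip(pred, obs))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)   # predicted number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        denom = expected * (1 - expected / len(chunk))
        if denom > 0:                         # skip degenerate bins
            stat += (observed - expected) ** 2 / denom
    return stat


def risk_category(p, cutpoints=(0.10, 0.20)):
    """Index of the risk category for probability p (0 = lowest).

    The default cutpoints are hypothetical.
    """
    return sum(p >= c for c in cutpoints)


def reclassification_table(old_pred, new_pred, cutpoints=(0.10, 0.20)):
    """Count subjects by (category under old model, category under new model)."""
    return Counter(
        (risk_category(o, cutpoints), risk_category(n, cutpoints))
        for o, n in zip(old_pred, new_pred)
    )


def percent_reclassified(table):
    """Percentage of subjects whose risk category changed."""
    total = sum(table.values())
    moved = sum(c for (old, new), c in table.items() if old != new)
    return 100.0 * moved / total


# Hypothetical predicted risks without and with a new biomarker.
old_risk = [0.05, 0.15, 0.25, 0.09]
new_risk = [0.05, 0.22, 0.25, 0.11]
table = reclassification_table(old_risk, new_risk)
print(percent_reclassified(table))
```

In this toy example the fourth subject moves from 9% to 11% predicted risk and is counted as reclassified, which illustrates the concern that movement across a cutpoint may reflect only a small shift in absolute risk.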
Challenges inherent to plasma proteomics. Of the many sources available for identifying new biomarkers of clinical disease diagnosis and severity, the proteome offers the most promise for identifying previously unknown biomarkers that may arise from novel pathways and complement previously identified biomarkers. The promise of proteomics may exceed that of genomics approaches because proteins and their biological and enzymatic activities are the major determinants of the diversity of phenotypes that can manifest from a common set of genes. The complement of expressed proteins changes rapidly in response to environmental cues. Thus, the proteome is highly suited to represent the state of a cell, tissue, or organism at a given time, in the context of a specific stimulus.
One of the barriers to successful plasma biomarker identification and translation from discovery platforms is the high level of complexity of the proteome. This complexity presents unique analytical challenges that are further magnified with the use of clinical plasma samples to search for novel biomarkers of clinical disease. The plasma proteome is composed of tens of thousands of unique proteins. The plasma proteome does not result from expression of a particular cellular genome; rather, it reflects contributions from the collective expression of many cellular genomes. In fact, it has been hypothesized that the estimated complement of over 300,000 human polypeptide species arising from variable splicing and posttranslational modifications could be present in the plasma proteome (3). Proteins from all functional classes and cellular localizations are found in the plasma, and a majority of the lower abundance proteins are intracellular or membrane proteins, presumably found in the plasma as a result of cellular turnover.
One of the challenges inherent to plasma or serum studies is the issue of high abundance proteins. Greater than 95% of the serum proteome is composed of 20 high abundance proteins including albumin and the immunoglobulins (3). These high abundance proteins hinder the ability to detect low abundance proteins. However, it is the low abundance proteins that are most likely to be biologically relevant as markers of a disease state. Concentrations of low abundance proteins may differ from those of high abundance proteins by as much as 10 orders of magnitude. For example, plasma levels of markers of myocardial injury such as the troponins are in the nanomolar range, levels of insulin are in the picomolar range, and levels of the proinflammatory cytokine TNF-α may be in the femtomolar range. To address the issue of high abundance proteins, mass spectrometers with wider dynamic ranges have been developed. In addition, there have been advances in technologies that allow depletion of high abundance proteins. New immunodepletion strategies efficiently remove as many as 20 of the high abundance constituents (7). Techniques for immunoextraction and concentration of targeted biomarker fragments may be more reliable (2, 11, 24, 28, 38, 49). However, the degree to which relevant low abundance proteins are lost during processing to remove high abundance proteins is unclear and may be highly variable. A recent study reported that albumin depletion also removed 58% of IL-6, 60% of TNF-α, and 74% of IL-8 (14).
Challenges in biostatistical analysis of proteomic and biomarker datasets. Although a high-throughput proteomic approach to plasma biomarker discovery has many advantages, it also brings a danger of generating false positive associations due to multiple testing and overfitting of data. Application of traditional statistical approaches (e.g., Bonferroni correction) in this setting tends to levy an insurmountable statistical penalty that can obscure biologically relevant associations. Even newer statistical techniques, such as advanced resampling methods or control of the false discovery rate (40), do not adequately address the fundamental problem of how to detect subtle but important changes in multiple variables identified with high-throughput proteomic approaches.
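The size of the multiple-testing penalty can be illustrated with a minimal sketch. The Python below compares Bonferroni correction with the Benjamini-Hochberg step-up procedure, one common way to control the false discovery rate; it is offered as an example under stated assumptions and is not necessarily the method of Ref. 40. The function names and the p-values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Flag tests that pass the Bonferroni threshold alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Flag tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose ordered p-value is <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:   # reject every test up to that rank
        rejected[i] = True
    return rejected


# Hypothetical p-values for ten candidate proteins.
pvals = [0.001, 0.008, 0.012, 0.019, 0.3, 0.6, 0.8, 0.9, 0.95, 0.99]
print(sum(bonferroni(pvals)), sum(benjamini_hochberg(pvals)))
```

With these ten hypothetical p-values, Bonferroni correction retains one candidate and the Benjamini-Hochberg procedure retains four. With thousands of measured features, both thresholds become far more stringent, which is the penalty described above.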
In contrast to traditional statistical approaches, a bioinformatics approach using pathways analysis harnesses the vast information gathered in proteomics experiments and turns it into a strength. Specifically, although measurement error in the marker discovery phase often prevents high confidence in any one particular protein's correlation, the observation that multiple proteins in a particular biological pathway are moving in tandem brings confidence that a particular pathway, and hence any biomarkers in that pathway, truly are correlated with the perturbation. By utilizing a more principled selection process for candidate marker triage, this approach increases the likelihood that candidate biomarkers will be validated in subsequent prospective validation studies. This approach also enhances the ability to use the proteomics data collected in the biomarker discovery phase to gain insight into disease biology. Identification of relevant pathways facilitates focus on other biomarkers in a perturbed pathway that may not have been identified in traditional screens as well as exploration of these pathways as possible targets for therapeutic intervention.
Although not yet widely used in proteomics, systematic analysis of functional trends has become widespread and important in the analysis of DNA microarray data from model organisms. An early use of this approach was an analysis by Tavazoie et al. in 1999 (41), in which clusters of genes with mutually similar expression in a synchronized Saccharomyces cerevisiae time-course experiment were examined. In this study, each cluster of genes was examined for overrepresented functional annotation trends (41). This study not only rigorously demonstrated the intuitive notion that coexpressed genes often share a function, but also objectively highlighted specific functional trends, e.g., that budding and cell polarity genes are overrepresented among genes expressed in the M-phase of the cell cycle.
The value of this approach in human studies was illustrated in a recent analysis of high-throughput differential mRNA expression (27). Expression of mRNA was assessed on more than 22,000 genes comparing patients with type 2 diabetes mellitus and unaffected controls (patients with normal glucose tolerance). A group of genes with depressed expression in diabetes vs. controls was identified and tested for association with a collection of other gene characteristics. It was found that this gene set was enriched for genes involved in oxidative phosphorylation. Although individual oxidative phosphorylation genes were not dramatically reduced in expression, as a group the trend was highly significant. Furthermore, the effect was attributable to a subset of oxidative phosphorylation genes regulated by peroxisome proliferator-activated receptor γ coactivator 1α, a cold-inducible regulator of mitochondrial biogenesis. Thus, the analysis of trends among differentially expressed genes led directly to insight into altered metabolism in diabetes patients and hinted at therapeutic hypotheses involving the modulation of oxidative phosphorylation pathways.
Emerging software tools, including FuncAssociate (5), recently described by Berriz et al., may be used in conjunction with essentially any high-throughput experimental approach for identifying or ranking genes or proteins. Furthermore, although this approach has generally been used in conjunction with controlled vocabulary functional annotation, e.g., Gene Ontology (GO) annotation, it can be used in conjunction with many different sources of gene/protein/metabolite annotation, e.g., expression pattern in other studies, phenotype, protein complex membership, disease association, or phylogenetic profile.
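The idea of testing a candidate list for overrepresented annotation can be sketched with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The Python below is a minimal illustration, not the implementation used by any particular tool; the function name and all counts are hypothetical, and a real analysis would also need to correct for the many annotation terms tested.

```python
from math import comb


def enrichment_pvalue(population, annotated, selected, overlap):
    """One-sided hypergeometric p-value for annotation enrichment.

    population: total number of proteins measured
    annotated:  number of those carrying the annotation (e.g., one pathway)
    selected:   number of candidate proteins chosen in the discovery screen
    overlap:    number of candidates that carry the annotation
    Returns P(X >= overlap) when `selected` proteins are drawn at random.
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical screen: 1,000 proteins measured, 40 in the pathway of interest,
# 50 candidates selected, 8 of which belong to the pathway.
p = enrichment_pvalue(population=1000, annotated=40, selected=50, overlap=8)
print(f"{p:.2e}")
```

Under random selection only about 2 of the 50 candidates would be expected to fall in the pathway, so an overlap of 8 yields a small p-value even if no single protein in the pathway reached significance on its own.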
Strengths and limitations of current multiplexing platforms for biomarker validation. Once novel plasma biomarkers of sufficient interest for validation have been identified, emerging technologies allow multiple markers to be assayed at once. Below we discuss the strengths and limitations of several multiplex platforms. Strengths and limitations are also summarized in Table 2.
ELISA, a "workhorse" for protein measurement for decades, has been adapted for multiplex biomarker assessment. Multiplex immunoassay formats may include suspension arrays or planar arrays. For planar arrays, analytes are traditionally detected using fluorescent or chemiluminescent sandwich immunoassay principles in which immobilized "capture" antibodies complex with protein in a biological sample and "detection" antibodies linked to reporter molecules bind the captured protein to create a "sandwich." The signal generated by the reporter molecules is directly proportional to protein concentration in the unknown sample.
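Because signal is proportional to concentration, an unknown can be quantified by reading its signal against a standard curve. The Python below is a minimal sketch that assumes a linear response over the working range; the function names and the standard values are hypothetical, and assays with a nonlinear response would require a different curve model.

```python
def fit_line(conc, signal):
    """Least-squares slope and intercept for signal versus concentration."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, signal))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(sample_signal, slope, intercept):
    """Invert the standard curve to estimate the sample concentration."""
    return (sample_signal - intercept) / slope


# Hypothetical standards (concentration, arbitrary units) and reporter signal.
standards = [0.0, 10.0, 20.0, 40.0]
signals = [5.0, 25.0, 45.0, 85.0]
slope, intercept = fit_line(standards, signals)
print(estimate_concentration(65.0, slope, intercept))
```

In a multiplex format, one such curve would be fitted per analyte, and a sample signal outside the range of the standards should not be extrapolated.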
In bead-based suspension arrays, capture antibodies are immobilized on polystyrene microspheres (beads) suspended in buffer. Biological sample is added to mixtures of the beads, and a detection antibody-fluorophore conjugate binds the captured protein. Flow cytometric systems allow simultaneous discrimination of bead types and quantification of captured sample proteins. As beads pass through laser beams housed in the flow cytometer, the reporter fluorophore is excited and emits light that is converted to a numeric signal by internal digital processors. Simultaneous excitation of internal bead dyes allows measurement of bead fluorescent intensity that is unique to each assay and used to assign fluorophore values to the correct assay. Several bead-based suspension arrays are commercially available (4, 6, 23, 25, 34). Advantages include in-house assay development, automated nature, and avoidance of the need to spot antibody material (19). A disadvantage is that nonspecific binding of serum proteins directly to the microspheres may result in bead aggregation and nonspecific fluorescent emission, thereby limiting assay sensitivity and accuracy (48).
For planar arrays, capture antibodies are discretely immobilized on a rigid microplate surface using robotic arrayers (13). Antibodies can be spotted directly onto the plate's surface using tiny pins (51) or by noncontact arrayers that use piezoelectric elements to transfer capture material to the microplate surface (51). Planar array protocols are comparable to traditional ELISAs and typically use a camera to detect chemiluminescent signal. Numeric values are generated based on the density of light image spots, and data are assigned to a specific assay based on the intra-well location of the light spot. Planar multiplex platforms that can assay 9–16 analytes concurrently are commercially available (12, 26). Technical limitations include the possibility of damage to the capture antibodies by mechanical forces during spotting, and the large dynamic range of serum protein levels, a factor that limits combination of analyte proteins within an array.
For validation of panels of biomarkers by either suspension or planar antibody array technologies, well-characterized multiplex assay components are needed to ensure that the data derived from multiplex assays are useful in the clinical setting. Capture and detection antibody materials should be well characterized and exhibit minimal interlot variability. A sustainable source of antibodies with adequate specificity is key, and lack of specific antibodies can be a major impediment to both singleplex and multiplex assays. Ongoing large-scale efforts to generate antibodies against human epitopes will undoubtedly provide new reagents for multiplex assay development (30). Also, there is a need for validated reference standards that allow accurate and consistent quantification of proteins. Multiplex arrays are classified as in vitro diagnostic multivariate index assays by the Food and Drug Administration (FDA). Although formal regulatory guidance for clinical validation of multiplex assays is lacking, regulatory requirements are under review (43).
Multiplex biomarker measurements with mass spectrometry. Currently, the core technology for identification of novel plasma protein biomarkers is tandem mass spectrometry (MS/MS). Tandem mass spectrometry, coupled with upfront liquid chromatography, is applicable to the readily accessible biological fluids (serum, plasma, urine, etc.) and is highly sensitive for peptides and other small molecules. Recent advances in MS/MS now enable researchers to determine masses of analytes with high precision and accuracy such that many peptides and metabolites can be identified unambiguously even in complex fluids. With a wealth of novel proteins being found in discovery efforts, the emerging field of clinical proteomics focuses on the triage and validation of newly identified protein biomarkers.
Beyond its utility as a platform for biomarker discovery, mass spectrometry with selected reaction monitoring is a powerful tool for identification of small molecules such as drugs and steroids (15, 22) and is increasingly being applied to peptide analyses (20). This method has exquisite specificity due to the unique mass-to-charge ratios of these molecules and the corresponding daughter ion fragments of the selected target peptides. High sensitivity also can be achieved if mass spectrometry is coupled with immunogenic extraction and enrichment methods to increase low concentration target selection (2). Multiplex combinations of these peptide molecules can be quantitated by comparison to isotopically labeled internal standards that are similar to the endogenous substances but are shifted by a small number of mass units (15). The wider application of this technique to measure low concentrations of protein biomarkers requires procedures to cleave the proteins into smaller segments (2, 49) and methods to extract or concentrate the biomarker segments. The basic processes for mass spectrometry measurement of peptide digestion fragments of biomarkers are illustrated in Fig. 2.
Once the key peptide sequences have been selected, both the natural peptide (with a conjugation site such as cysteine added) and an isotopically labeled form of the peptide (often 13C) can be readily synthesized using automated platforms. The natural peptides can be conjugated to a carrier protein and used to make polyclonal and/or monoclonal antibodies. The isotopically labeled peptides are used as internal standards for the mass spectrometry (2).
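Quantitation against an isotopically labeled internal standard reduces to a ratio of peak areas. The Python below is a minimal sketch of single-point isotope dilution; it assumes that the endogenous and labeled peptides ionize and fragment with equal efficiency and that a known amount of labeled standard is added to each sample. The function names, peptide labels, and peak areas are hypothetical.

```python
def isotope_dilution_conc(area_endogenous, area_standard, standard_conc):
    """Endogenous peptide concentration from the light/heavy peak-area ratio."""
    if area_standard <= 0:
        raise ValueError("internal standard peak area must be positive")
    return area_endogenous / area_standard * standard_conc


def quantify_panel(peaks, standard_conc):
    """Quantify several peptides measured in one multiplexed run.

    peaks:         {peptide: (endogenous_area, labeled_standard_area)}
    standard_conc: {peptide: concentration of labeled standard added}
    """
    return {
        peptide: isotope_dilution_conc(light, heavy, standard_conc[peptide])
        for peptide, (light, heavy) in peaks.items()
    }


# Hypothetical peak areas for two target peptides and their labeled standards.
peaks = {"peptide_A": (2000.0, 4000.0), "peptide_B": (300.0, 100.0)}
spiked = {"peptide_A": 10.0, "peptide_B": 2.0}   # same units as the result
print(quantify_panel(peaks, spiked))
```

Because each target peptide is paired with its own labeled standard, losses during digestion cleanup and enrichment that affect both forms equally cancel in the ratio.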
Clinical specimen considerations. Validation of potential biomarkers of diagnosis or disease severity requires the use of biological fluids from substantial numbers of patients with the relevant disease and appropriate controls. Banked specimens that have been collected in conjunction with prior clinical trials or observational studies have the advantage of immediate availability in conjunction with well-phenotyped patient populations. However, the use of banked specimens also has significant limitations. Foremost, a clinical trial is usually not designed with the goal of validating a diagnostic or disease severity biomarker. Strict inclusion and exclusion criteria may limit the generalizability of findings. Appropriate controls may not be included in the study population. For studies of disease states such as myocardial ischemia, the inherent unpredictability of the onset complicates the timing of blood sampling. The effects of the clinical intervention may also affect the biomarker that is being studied or may confound any association with disease severity (21, 29). Finally, the sample collection and storage procedures may not be sufficiently uniform for reliable biomarker assays.
There are a number of preanalytic variables that can affect the validity of biomarker assays, and these variables need to be considered when designing validation studies and assessing the potential utility of banked specimens (35). Preanalytic variables that can affect assay validity include the method of sample collection, the type of anticoagulants or preservatives that are used, the procedure used to process the sample, the time between collection and assay, and the storage conditions used during this interval. Freezing and thawing, especially repetitive freeze-thaw cycles, may be particularly harmful to some protein analytes (44). Protein degradation can occur at any time from sample collection to time of assay. Investigators (36) in the Vanderbilt Clinical Proteomics program found that significant protein degradation was ongoing in plasma samples that were collected with EDTA and allowed to remain on ice for 7–8 h (Fig. 3) before reverse phase purification for a MALDI-TOF-based discovery platform. Furthermore, archival plasma samples that had been collected as part of a randomized clinical trial of two ventilator strategies in patients with ARDS (42) also showed significant evidence of protein degradation, suggesting that sample collection procedures in that study may not have been optimal for discovery proteomic studies. These findings, in conjunction with a growing body of published literature (reviewed in Ref. 35) on the importance of standardization of sample collection and processing for discovery proteomic studies, suggest that the potential limitations of archival samples must be carefully assessed before using these samples for discovery or validation of protein biomarkers. For some applications, prospective sample collection may be required, although this approach, by necessity, is more time-consuming and expensive. 
Biomarkers that are selected from smaller, carefully phenotyped cohorts that are prospectively collected can be subsequently validated in larger, more heterogeneous populations.
Biomarker discovery and validation in pediatrics also occur on a background of growth and development. Since we do not have a good understanding of the normal proteome in children, it can be difficult to determine whether protein products are related to disease or to normal growth and development. For example, Winfield et al. (50) found that urinary levels of desmosine, a breakdown product of elastin, were much higher in normal infants less than 2 yr old than in older children. A possible interpretation of this is that the lung is undergoing remodeling as part of normal development in infants. In any case, it illustrates that before elastin breakdown can be studied as a biomarker of lung injury in pediatrics, studies of normal children of all ages are necessary.
Conclusions. With the explosion of genetic and genomic studies of human disease, including the growing number of genome-wide association studies, there is a critical need for complementary proteomic technologies. The potential for plasma proteomic analysis to identify and quantify novel proteins that can function as plasma biomarkers of the presence or severity of clinical disease continues to hold great promise for clinical use. Standardized approaches to sample collection and preparation, new analytical techniques, and novel algorithms for biostatistical and bioinformatics analysis will facilitate the translation of plasma proteomics from the bench to the bedside and allow the great potential of clinical proteomics to be realized.
| FOOTNOTES |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.