## Abstract

The growing awareness of the importance of accurate morphometry in lung research has recently motivated the publication of guidelines set forth by a combined task force of the American Thoracic Society and the European Respiratory Society (20). This official ATS/ERS Research Policy Statement provides general recommendations on which stereological methods are to be used in quantitative microscopy of the lung. However, investigators who want to integrate stereology into a particular experimental study design are left with the problem of how to implement it in practice. Specifically, different animal models of human lung disease require the use of different stereological techniques and may determine the mode of lung fixation, tissue processing, preparation of sections, and other steps. Therefore, the present companion articles were designed to provide a short, practically oriented introduction to the concepts of design-based stereology (Part 1) and to provide recommendations for choosing the most appropriate methods to investigate a number of important disease models (Part 2). Worked examples with illustrative images will facilitate the practical performance of equivalent analyses. Study algorithms provide comprehensive overviews to ensure that no essential step gets lost during the multistage workflow. Thus, with this review, we hope to close the gap between theory and practice and enhance the use of stereological techniques in pulmonary research.

- design-based stereology
- quantitative morphology
- microscopy

### The Challenges of Measuring Lung Structure by Microscopy and How to Handle Them by Stereology

What reasons do we have to measure lung structure? There are many possible and good reasons (43, 45), but of particular relevance in biomedical research is quantitative phenotype analysis of animal models of lung disease. Quantitative information about lung architecture in these models is necessary for assessing pathological alterations as well as treatment effects and to allow statistical comparisons among experimental groups. The basic parameters that describe the internal lung structure in quantitative terms are characterized by their dimension: volume (dimension 3), surface (dimension 2), length (dimension 1), and number (dimension 0). Regarding the lung, such parameters may be the volume of alveolar septal tissue, the surface of the alveolar epithelium, the length of nerve fibers innervating the conducting airways, or the number of alveoli. All of these parameters reflect particular aspects of normal lung function, and they may change in the course of disease.

#### Problems of quantitative microscopy.

One has to be aware of particular problems that occur when lung structure is measured by microscopy. First, the amount of tissue that is investigated under the microscope is often extremely small in relation to the whole organ. In qualitative disease studies, this makes it necessary to sample the lung in a way that guarantees that many pathological lesions are seen and analyzed. In a quantitative study, such targeted sampling would introduce a strong bias toward overestimation of pathological lesions; instead, the size reduction requires that the chosen samples represent the whole organ. Hence, they have to be distributed randomly over the whole organ so that each part has an equal chance of being selected and analyzed. The second problem results from the more or less two-dimensional nature of a microscopic section, which changes the appearance of the sectioned structures both qualitatively and quantitatively. This raises the question of how the three-dimensional quantitative features of these structures are represented in a two-dimensional section. The short answer is: every feature loses one dimension. In other words, a volume is represented by an area (the larger a structure, the larger the area it occupies on a section); the surface area is represented by a line (the larger the surface area of a structure, the longer its boundary line on a section); the length is represented by the number of transects (the longer a structure, the higher its chance of being seen as a transect in a section); and the number of a structure is simply not represented within one two-dimensional section (see Table 1). Thus three-dimensional information about the internal fine structure of the lung is lost in nearly two-dimensional histological lung sections. The lost information cannot be regained by exhaustive image analysis of each pixel in digital micrographs but requires a proper scientific approach (see the later discussion about accuracy and precision).

#### Stereology is the gold standard for lung morphometry.

The solution to these problems lies in the application of stereology. It can be defined as the science of sampling structures with geometric probes (44). The mathematical foundations of stereology are derived from stochastic geometry (2, 30). The term stereology was coined in the early 1960s (for review, see Refs. 6 and 42). Global activities in the field of stereology are coordinated by the International Society for Stereology (www.stereologysociety.org). Stereological principles and their application to the lung have been reviewed (3, 8, 22, 31, 33, 37, 44, 45). An official research policy statement of the American Thoracic Society and the European Respiratory Society defined the standards for quantitative assessment of lung structure by stereology (20). The purpose of the present companion articles is to provide specific recommendations for the application of stereology to particular animal models of lung disease such as acute lung injury, lung fibrosis, emphysema, pulmonary hypertension, and asthma. Part 1 covers the basic concepts that serve as a framework for Part 2, which, in a problem-based approach, provides recommendations of useful parameters and worked examples to facilitate the implementation of stereology in practice.

### How Does Stereology Work?

#### 3D structures require 3D analysis.

Practical stereology is the application of unbiased sampling and measurement principles to obtain quantitative data about structures in 3D based on nearly 2D (physical or virtual) sections through the structures by using 3D (or geometric) probes. Thus stereology provides biologically meaningful 3D data. In most cases, the datasets that are analyzed by stereology are generated by light or electron microscopy. It should be noted, however, that stereology is not restricted to conventional microscopy. It can be applied to any type of imaging dataset. Recent applications to the lung include immunoelectron microscopy (27, 34, 36), micro-computed tomography (28, 40), and scanning laser optical tomography (21).

#### Sampling of location and orientation.

Practical stereology consists of two main parts: sampling and measurement. The principles of unbiased sampling have to be applied at all levels of the microscopic analysis (i.e., from the selection of animals in a study group via the selection of lung tissue blocks from one animal, to the selection of fields of view on a microscopic section, and finally the measurements in these fields of view). Sampling has to take into consideration randomization for location (i.e., giving each part of the lung an equal chance for being sampled) and, for certain parameters such as surface area/length of airways and blood vessels, randomization of orientation (i.e., giving each spatial orientation an equal chance for being selected by the sectioning process). There are several ways of unbiased sampling of location that mainly differ in their efficiency (15, 25). An easy and efficient way of sampling is systematic uniform random sampling, where the first item is chosen randomly and the subsequent items are selected by a predefined step length (the sampling interval). The fractionator (11) and a variant thereof, the smooth fractionator (13), form an even more efficient way of sampling. It is based on keeping track of the sampling fraction. If a certain number of objects is counted in a known fraction of an organ, it can easily be multiplied by the reciprocal of the sampling fraction. Although this is most useful and applicable to number estimation (where it carries the advantage that tissue shrinkage during processing does not affect the results), it may be combined with more traditional approaches of relating an estimation (e.g., of the surface area of alveolar septa in the lung) to the volume of the whole organ (which, however, is usually determined before embedding and, hence, carries the problem of tissue shrinkage between volume measurement and section analysis). 
A new method, the proportionator (10), deliberately biases sampling toward those areas where the structures of interest occur with higher probability and assigns weights to the sampled areas, thus keeping track of the degree of sampling bias. These weights are then used to recover an unbiased estimate. This approach is particularly efficient for the analysis of rare events. Although the proportionator is a very promising technique in theory, its practical application in lung stereology has been limited. For certain parameters (particularly those that involve lines for surface area and planes for length), spatial orientation also has to be randomized. This is achieved by specific slicing or embedding procedures during sampling and processing of tissue blocks. As a result, tissue orientation is either made fully isotropic in 3D (orientator or isector method, see Refs. 24 and 32) or made isotropic in 2D, thus keeping the orientation of layered structures (vertical sections, Ref. 1). The latter method requires a sine-weighted distribution of measurements (e.g., cycloid test lines) for the randomization in the third dimension.
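As an illustration only, systematic uniform random sampling and the fractionator calculation described above can be sketched in a few lines of Python. The function names (`surs_indices`, `fractionator_total`) and all numbers in the usage lines are hypothetical and not taken from the cited references; the sketch assumes that the items to be sampled (e.g., tissue slabs or fields of view) are simply numbered consecutively.

```python
import random


def surs_indices(n_items, interval, rng=random):
    """Systematic uniform random sampling of item indices.

    The first item is chosen at random within the first sampling interval;
    every subsequent item follows at a fixed step (the sampling interval).
    """
    if interval < 1:
        raise ValueError("sampling interval must be at least 1")
    start = rng.randrange(interval)  # random start in [0, interval)
    return list(range(start, n_items, interval))


def fractionator_total(count, sampling_fractions):
    """Estimate a total number from a count made in a known fraction.

    The count is multiplied by the reciprocal of the overall sampling
    fraction, i.e., the product of the fractions used at each level.
    """
    overall_fraction = 1.0
    for fraction in sampling_fractions:
        overall_fraction *= fraction
    return count / overall_fraction


# Hypothetical usage: every 4th of 20 slabs, with a random start.
selected_slabs = surs_indices(n_items=20, interval=4, rng=random.Random(1))

# Hypothetical usage: 150 particles counted after sampling 1/4 of the slabs,
# 1/10 of the blocks, and 1/50 of the section area (overall fraction 1/2000).
estimated_total = fractionator_total(150, [1 / 4, 1 / 10, 1 / 50])
```

Because only the sampling fractions enter the calculation, no tissue dimension appears in the estimate, which is why shrinkage does not affect a fractionator-based number estimate.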

#### Which test system for which parameter?

The measurements at the final level of sampling are performed by using geometric probes. The basic rule here is that the dimension of the parameter one wants to measure plus the dimension of the geometric probe used for measurement has to equal at least 3 (see Table 1 and Fig. 1). These probes consist of sets of test points (dimension 0; used for volume which has dimension 3), test lines (dimension 1; used for surface which has dimension 2), test planes (dimension 2; used for length which has dimension 1), or test volumes (dimension 3; used for number which has dimension 0). The application of these probes to a microscopic section creates interaction events with the sectioned structures that can be counted digitally. Thus the measurements are reduced to simple counts. For example, test points “hit” the area that represents the sectioned volume (point counting for volume), test lines “hit” (intersect) the boundary line that represents the sectioned surface (intersection counting for surface), and test planes “hit” the transects that represent the sectioned length (transect counting for length). All these measurements can be done on single thin sections. An exception is number. The number of particles in 3D cannot be determined without bias from particle profiles in nearly 2D single sections because the basic rule that the dimension of the parameter (number has dimension 0) plus the dimension of the test system (on single thin sections, not more than dimension 2 is possible) has to equal at least 3 is violated. Number estimation requires a 3D test system. Therefore, either pairs of sections with known *z* distance (physical disector) or thick sections through which a focal plane is sweeping in *z* direction (optical disector) are used to create test volumes. 
These test volumes (disectors) “hit” the particles in the sense that they contain a unique point of the particle (e.g., the particle top or the nucleolus of a cell) that defines the number of particles in 3D irrespective of their size (top counting in disectors for number).
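The dimension rule can be written down as a small lookup, shown here as a minimal Python sketch. The dictionary and function names are hypothetical; the dimensions themselves are those given in the text.

```python
# Dimensions of the structural parameters and of the geometric probes.
PARAMETER_DIM = {"volume": 3, "surface": 2, "length": 1, "number": 0}
PROBE_DIM = {"points": 0, "lines": 1, "planes": 2, "volumes": 3}


def probe_is_valid(parameter, probe):
    """Check the basic rule: parameter dimension + probe dimension >= 3."""
    return PARAMETER_DIM[parameter] + PROBE_DIM[probe] >= 3


def minimal_probe(parameter):
    """Return the lowest-dimensional probe that satisfies the rule."""
    needed = 3 - PARAMETER_DIM[parameter]
    return next(name for name, dim in PROBE_DIM.items() if dim == needed)


# A single thin section offers probes of dimension 2 at most, so number
# (dimension 0) cannot be estimated on it: 0 + 2 < 3.
number_on_single_section = probe_is_valid("number", "planes")  # False
```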

#### How to design a test system.

Although the parameter under investigation determines the probe, the actual design of the test grids is left to the investigator and should follow principles of usefulness and efficiency. For example, it is useful to combine point and line grids by using a grid consisting of line segments, where the end points of the line segments are used for point counting. Double test systems combining a coarse and a fine lattice (or coarse and fine points) are also very efficient because they allow the simultaneous counting of structures with frequent (coarse lattice or points) and rare (fine lattice or points) counting events. In these coherent test systems, the combined probes are quantitatively related to each other, so that the test line length per test point can easily be determined. A discussion of the generation of test systems in general and also a free method to generate flexible test systems for both light and electron microscopy have been provided (39).
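How the counts from such a coherent test system turn into estimates can be sketched as follows. The sketch assumes a grid of line segments whose two end points serve as test points, so that the test line length per point is half the segment length. It uses the standard stereological relations V_V = P(structure)/P(reference) and S_V = 2·I/L_T, which are not spelled out in this article. All function names and numbers are hypothetical, and the surface estimate is only valid if the test lines are isotropic relative to the surface (see the sampling of orientation above).

```python
def line_length_per_point(segment_length, points_per_segment=2):
    """Test line length associated with one test point (l/p)."""
    return segment_length / points_per_segment


def volume_fraction(points_on_structure, points_on_reference):
    """Point counting: V_V = P(structure) / P(reference)."""
    return points_on_structure / points_on_reference


def surface_density(intersections, points_on_reference, length_per_point):
    """Intersection counting: S_V = 2 * I / L_T.

    L_T, the total test line length within the reference space, is
    obtained from the point count: L_T = (l/p) * P(reference).
    """
    total_line_length = length_per_point * points_on_reference
    return 2.0 * intersections / total_line_length


# Hypothetical usage: segments of 20 um (at tissue scale) with 2 end points.
l_p = line_length_per_point(20.0)          # 10 um of test line per point
v_v = volume_fraction(30, 200)             # 0.15
s_v = surface_density(120, 200, l_p)       # 0.12 per um
```

Counting points and intersections with the same grid is what makes the system coherent: the reference point count supplies both the denominator of the volume fraction and the total test line length.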

#### The reference space.

Because the counting of interaction events between probe and biological structure is performed at the section level, it is always referred to a reference volume [e.g., the whole lung determined by Archimedes' principle (35) or by the Cavalieri estimator (20), the lung parenchyma, or the wall components of the airways]. The result is therefore a ratio, most often termed a density, between the volume, the surface area, the length, or the number of an object and the volume of the structure to which it is related. It lies in the nature of a ratio that it may be changed by variations in the numerator and/or the denominator. Ratios are therefore prone to misinterpretation and should always be converted to total values, e.g., the total number of alveoli in the lung instead of the number of alveoli per unit of lung tissue. Cautionary examples of misinterpretations based on ratios (the so-called reference trap) are frequent and have a high adverse impact on science (see Ref. 4). Obviously, it would not be efficient to relate the volume of a cellular organelle (e.g., lamellar bodies) directly to the whole lung because the counting procedure would then involve analyzing fields of view over the whole lung. The vast majority of test fields would contain no alveolar epithelial type II (AE2) cell, and an enormously large number of fields of view would have to be sampled. The result would not be biased, but the effort needed to obtain a precise result would be huge. In such cases, multicascade sampling (see examples in Part 2) is used, which means that estimations of the reference volume are performed at various hierarchical levels. In the example of lamellar bodies, we could start by estimating the volume fraction of the parenchyma in relation to the lung [V_{V}(par/lung)]. In the next step, test fields would only be distributed over parenchyma to estimate the volume fraction of alveolar septal tissue [V_{V}(sept/par)]. 
Now, the volume fraction of AE2 cells would be estimated on fields of view only from alveolar septa [V_{V}(AE2/sept)]. In the final step, only AE2 cells would be sampled to estimate the volume fraction of lamellar bodies in AE2 cells [V_{V}(lb/AE2)]. The total volume of lamellar bodies in the lung could then be calculated as V(lb,lung) = V(lung) * V_{V}(par/lung) * V_{V}(sept/par) * V_{V}(AE2/sept) * V_{V}(lb/AE2).
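The cascade calculation is a simple product, as the following minimal Python sketch shows. The function name is hypothetical, and the lung volume and volume fractions in the usage lines are placeholders chosen for illustration, not measured values.

```python
from math import prod


def cascade_total_volume(reference_volume, volume_fractions):
    """Total volume of a structure from a multicascade sampling design.

    reference_volume: volume of the top-level reference space, V(lung).
    volume_fractions: V_V estimates from the top level down to the
        structure of interest, each relative to the level above it.
    """
    return reference_volume * prod(volume_fractions)


# Placeholder values for illustration only (not measured data).
v_lung = 10.0  # cm^3
fractions = [
    0.85,  # V_V(par/lung)
    0.12,  # V_V(sept/par)
    0.10,  # V_V(AE2/sept)
    0.18,  # V_V(lb/AE2)
]
v_lb_lung = cascade_total_volume(v_lung, fractions)  # about 0.018 cm^3
```

Each intermediate reference space cancels between adjacent factors, so the product again has the unit of the top-level reference volume.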

In animal experiments, there is usually no reason not to measure the total lung volume and use it as reference space. In humans, apart from necropsies, the material is naturally limited to biopsy specimens. In such cases, it is important to choose a different reference structure than lung volume. One such parameter may be the surface area of the epithelial basement membrane.

#### Particle size.

The way particles are sampled with the disector method to obtain their number also allows the use of this unbiased selection of particles for estimation of their mean size. The volume of each sampled particle is estimated by linear measurements from one unique point (e.g., the nucleolus of a cell) to the boundary of the particle (nucleator or rotator method, see Refs. 12, 41). Because the particles were selected by disectors (which “feel” number), this mean size parameter is termed number-weighted mean volume. Alternatively, particles can be selected on single sections by test points (which “feel” volume) for linear measurements from boundary to boundary passing through these test points (point-sampled intercepts method, see Ref. 14). The resulting parameter, termed volume-weighted mean volume, is less intuitive than the ordinary mean size because it puts more weight on bigger particles but can be very useful in addition to the number-weighted mean volume because it contains information on both mean particle size and size distribution. Actually, volume-weighted mean volume equals number-weighted mean volume amplified by the relative variance, thus reflecting size heterogeneity.
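The relationship between the two mean volumes can be checked numerically. The sketch below assumes that "amplified by the relative variance" refers to the standard relation v̄_V = v̄_N · (1 + CV_N²), where CV_N² is the variance of the number-weighted size distribution divided by the squared mean. It operates on a hypothetical list of known particle volumes to show the identity; it is not an implementation of the nucleator, rotator, or point-sampled intercepts estimators.

```python
from statistics import mean, pvariance


def number_weighted_mean_volume(volumes):
    """Ordinary mean: every particle has the same weight."""
    return mean(volumes)


def volume_weighted_mean_volume(volumes):
    """Mean in which each particle is weighted by its own volume."""
    return sum(v * v for v in volumes) / sum(volumes)


def relative_variance(volumes):
    """Squared coefficient of variation of the particle volumes."""
    return pvariance(volumes) / mean(volumes) ** 2


# Hypothetical particle volumes (arbitrary units).
volumes = [1.0, 2.0, 3.0, 6.0]
v_n = number_weighted_mean_volume(volumes)
v_v = volume_weighted_mean_volume(volumes)
cv2 = relative_variance(volumes)

# The two sides agree up to floating-point rounding.
identity_holds = abs(v_v - v_n * (1.0 + cv2)) < 1e-12
```

If all particles have the same size, the relative variance is zero and both mean volumes coincide; the more heterogeneous the sizes, the further the volume-weighted mean exceeds the number-weighted mean.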

#### Accuracy and precision.

Due to the stochastic nature of the analysis, all results obtained by stereology are estimates. These estimates are characterized by their accuracy and their precision. Accuracy describes the absence of bias (systematic error) in the data. Bias can be neither detected in the data nor decreased by increasing the sampling or measurement effort. It has to be avoided a priori by using unbiased methods throughout the study. An instructive illustration of this was provided by a study of Mendis-Handagama (29). The Leydig cell number was estimated in control and atrophied rat testis using unbiased as well as biased (assumption-based) stereological methods. Whereas in control subjects all methods provided similar results, in the atrophy group the biased method delivered a significant overestimation of cell number simply because in this group the assumptions the model was based on did not meet the true biological situation. This means that it can never be assumed that the bias of a method influences the results in the same way in different experimental conditions. Biased methods may or may not give the correct results; the difference, however, cannot be seen in the data (26). In contrast to accuracy, precision can be adjusted as needed in the context of a particular study. This is done via the sampling effort, which determines the number of samples and the number of counts. In general, more counts increase the precision of the estimate. The degree to which precision is improved depends on the level of the sampling cascade at which the effort is spent. The higher levels of the sampling cascade (number of animals per group, number of tissue blocks per animal) contribute significantly more to the overall variation of the data than the lower levels (number of sections per tissue block, number of fields of view per section, number of counts per field of view), so the higher levels are where the effort should be invested to be efficient (18).

#### How much counting is enough counting?

How much does one have to count? There is no general answer that works in all cases; the answer is determined by the specific conditions of each individual study. In general, the interindividual variation between the subjects of an experimental group can be observed in the data from the standard deviation (SD) or the coefficient of variation (CV_{obs} with CV_{obs} = SD/mean). The latter receives contributions from two sources: first, the true biological variation between subjects, which is unknown (biological coefficient of variation, CV_{biol}), and second, the (im)precision or the error of the method (coefficient of error, CE_{meth}). The relationship between these components is described by CV_{obs}^{2} = CV_{biol}^{2} + CE_{meth}^{2}. Naturally, we do not want the interindividual variation of our data to be dominated by the estimation procedure. For various stereological estimators, there is a formula that allows one to predict the coefficient of error and thus to calculate the CE_{meth} (5, 7, 15–17, 23). With only one unknown variable (CV_{biol}), it is therefore possible to see how large the contributions of the true biological variation and of the imprecision of the method to the total variation are. As a rule of thumb, we aim at a relationship between CE_{meth} and CV_{obs} of ∼0.2 < CE_{meth}^{2}/CV_{obs}^{2} < 0.5. Sufficient precision is usually achieved by 100–200 well-distributed counting events per individual for each parameter of interest. How does one distribute these counts efficiently? Use the “do more less well” principle (see Ref. 18), i.e., invest the effort into the higher levels of the sampling cascade. Do a pilot study in two animals to determine an appropriate sampling design (number of samples at each level). As a minimum, plan the study with five animals per group because then the probability that changes in one direction occur by chance in five individuals is smaller than 5%, thus making the experiment conclusive (8). If the biological variation between animals is very high for a particular parameter, the number of animals per group may have to be increased to detect differences between groups. A reasonable sampling design within one animal may be to analyze six tissue blocks per lung, one section from each tissue block, ten fields of view per section, and a mean of around three counts on the structure of interest per field of view. This would lead to about 180 counts per animal. If alterations within one animal are very rare or very heterogeneously distributed, the number of tissue blocks per lung may have to be increased.
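The variance relationship and the count arithmetic above can be turned into a small planning aid, sketched here in Python. The function names are hypothetical, and the CV_{obs} and CE_{meth} values in the usage lines are placeholders for illustration; in a real study, CE_{meth} would come from the estimator-specific formulas in the cited references.

```python
from math import sqrt


def biological_cv(cv_obs, ce_meth):
    """Solve CV_obs^2 = CV_biol^2 + CE_meth^2 for CV_biol."""
    difference = cv_obs ** 2 - ce_meth ** 2
    if difference < 0:
        raise ValueError("CE_meth exceeds CV_obs; check the inputs")
    return sqrt(difference)


def method_share(cv_obs, ce_meth):
    """Share of the observed variance due to the method: CE^2 / CV_obs^2."""
    return ce_meth ** 2 / cv_obs ** 2


def counts_per_animal(blocks, sections_per_block, fields_per_section,
                      counts_per_field):
    """Expected number of counting events per animal for a sampling design."""
    return blocks * sections_per_block * fields_per_section * counts_per_field


# Placeholder values for illustration only.
share = method_share(cv_obs=0.15, ce_meth=0.09)     # about 0.36
within_rule_of_thumb = 0.2 < share < 0.5            # True
cv_biol = biological_cv(cv_obs=0.15, ce_meth=0.09)  # about 0.12

# Design from the text: 6 blocks, 1 section each, 10 fields, ~3 counts each.
planned_counts = counts_per_animal(6, 1, 10, 3)     # 180
```

A share well above 0.5 suggests that the estimation procedure dominates the observed variation; a share far below 0.2 suggests that counting effort could be reduced or moved to higher levels of the sampling cascade.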

### Prerequisites for Proper Application of Stereology to the Lung

The lung structures that are measured should reflect the “real” in vivo dimensions as closely as possible. However, a microscopic section of a fixed and embedded lung sample is by definition an artifact. What one needs is an understanding of how this artifact is produced and what can be done to minimize changes in tissue dimensions. Optimization and standardization of tissue fixation are essential and involve the composition of the fixative, the time intervals between fixation, estimation of the reference volume, and embedding, and, of course, the instillation/perfusion pressure (see Ref. 20 for details and references). During the dehydration and embedding procedures, the lung may shrink, and, unfortunately, the degree of shrinkage may depend on the study group. If the ratios estimated from the shrunken tissue sections are referred to the unshrunken reference volume, this may introduce severe bias and misinterpretation (19). Therefore, the embedding protocols need to be tested for tissue shrinkage in each study group, and, if shrinkage occurs to a degree that severely affects the estimates, it should be corrected (9).

It is important to point out that all of the considerations above, i.e., lung fixation and embedding, reference volume estimation, sampling of location and orientation, etc., have to take place in advance. Once the samples are embedded or sectioned, there is no possibility to influence any of these important preceding steps using the existing sections. Thus, if bias has already been introduced here, there is no way to rescue the data, which makes any further efforts at the measurement level useless.

### Dissecting Lung Structure by Stereology

The planning of a stereological analysis should start with a thorough qualitative analysis. This should give an impression of which structural components are affected and in what way. It also provides a first impression of the frequency, the distribution (relatively homogeneous or very heterogeneous?), and the potential restriction of the alterations to a lung compartment. Figure 2 shows the hierarchy of lung compartments that can be used to define a cascade sampling approach tailored for a specific study.

At the end of the qualitative analysis, disease-specific target parameters should be defined as endpoints, e.g., hyperplasia of airway smooth muscle cells, hypertrophy of alveolar epithelial type II cells, loss of gas-exchange area, and angiogenesis of peribronchial blood vessels. In the next step, it should be defined how these changes can be expressed by simple quantitative parameters, such as cell number for hyperplasia, mean cell volume for hypertrophy, surface area for loss of gas-exchange area, and length and number of blood vessels for angiogenesis. This information defines the stereological strategy, the sampling technique (e.g., fractionator for efficient number estimation, isector/orientator/vertical sections for surface area and length), the sampling effort (more heterogeneous lesions require a larger sample of the organ to be analyzed), and, of course, the choice of the test system that has to be applied. Figure 3 demonstrates an algorithm for planning a stereological study. In the second part of this review, we recommend such parameters for a number of important pulmonary diseases and illustrate them with worked examples.

### Summary: The 10 Basic Rules of Lung Stereology

The basic points can be summarized in what might be termed the 10 basic rules of lung stereology:

#### 1. 3D structures require 3D analysis.

Planimetric measurements on microscopic sections are not meaningful for obtaining quantitative information about lung structure because information about the 3D structure is lost due to the sectioning process; this lost information is gained back by stereology.

#### 2. Plan ahead wisely.

Measurement of lung structure by microscopy does not start at the level of microscopic images but when the experimental design is planned.

#### 3. Quality of the material determines quality of the data.

The way the lung is prepared for microscopy critically influences the final dimensions that can be measured by stereology.

#### 4. “Never ever not measure the reference space” (Hans Jørgen Gundersen).

A biologically meaningful reference parameter (usually total lung volume) must be defined as the starting point for analysis and the end point for reporting of data.

#### 5. Sampling, sampling, sampling.

Whatever is measured at the microscope can only be representative if the samples that are analyzed are representative of the whole lung in a statistical sense.

#### 6. Simple parameters first.

Start with parameters that only require single-section analysis (volume, surface area, length); then, if necessary, move on to parameters requiring disector-based analysis (particle number and particle size).

#### 7. Measure lung tissue, not air.

Use parameters related to lung tissue compartments (e.g., total volume of alveolar septal tissue or total surface area of alveolar epithelium) instead of “airspace size” to characterize lung architecture.

#### 8. Be as accurate as possible and as precise as necessary.

Accuracy (unbiasedness or validity) is more important than precision (reliability) because inaccuracy leads to uncontrollable systematic error, whereas imprecision is easily detectable and controllable. Precision can easily be improved by analyzing more samples, but this does not remove bias.

#### 9. “Do more less well” (Ewald Weibel).

Precision is adjusted via the sampling effort that is spent most effectively at the higher levels of the sampling cascade, whereas extensive analysis at the level of fields of view only leads to pseudo-precision.

#### 10. Don't be afraid of stereology.

Stereology is built on solid mathematical ground. The algorithms are simple, the workflow is transparent, and resources for further information are easily accessible.

## GRANTS

This work was supported by the German Research Federation (DFG SFB 587/TP B18, OC 23/9-3, OC 23/10-1, MU 3118/1-3, and MU 3118/2-1; Cluster of Excellence REBIRTH) as well as the German Ministry for Education and Research (BMBF) via the German Center for Lung Research (DZL).

## DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the authors.

## AUTHOR CONTRIBUTIONS

Author contributions: M.O. and C.M. prepared figures; M.O. and C.M. drafted manuscript; M.O. and C.M. edited and revised manuscript; M.O. and C.M. approved final version of manuscript.

- Copyright © 2013 the American Physiological Society