Industry criticisms of Swan et al. 2005 (Decrease in anogenital distance among male infants with prenatal phthalate exposure, EHP May 27, 2005) and my responses
Shanna H. Swan, Ph.D.
On May 27th our paper, “Decrease in anogenital distance among male infants with prenatal phthalate exposure,” was published in Environmental Health Perspectives On-Line. On the same day, a critique written by Rebecca Goldin, with the headline “Media Claims Phthalates (Might) Cause Genital Defects,” appeared on a website affiliated with George Mason University called Stats.org. Industry representatives, including the American Chemistry Council (ACC) and the Phthalate Esters Panel, have cited this critique. Here I first address specifics raised by the ACC and then address Goldin’s arguments point by point. None of the points raised in these documents offers a significant challenge to the scientific validity of our study.
Summary of our paper:
Here is what we state in our paper:
- Anogenital distance (AGD), a sensitive indicator of masculinization in rodents, was inversely associated in baby boys with the concentrations, in their mothers’ prenatal urine, of metabolites of four commonly used phthalates (chemicals used widely in products such as cosmetics, fragrances, and medical tubing). The associations were statistically significant for mono-ethyl phthalate [MEP], mono-n-butyl phthalate [MBP], mono-benzyl phthalate [MBzP], and mono-isobutyl phthalate [MiBP].
- For example, if the mother’s MBP was high (e.g., in the upper quartile), the odds that her son had a shorter-than-expected AGD (for his age and weight) were increased 10-fold compared with boys whose mothers’ MBP was low (e.g., in the lower quartile), and this association was highly significant.
- The strongest associations were seen when the mother’s levels of several of these four phthalate metabolites (MEP, MBP, MBzP, and MiBP) were high. For example, almost all (9 of 10) boys with the highest combined phthalate exposure scores had a short AGD (below the 25th percentile for age and weight).
- Boys with short AGD had a higher proportion of incompletely descended testicles than boys with longer AGD, and AGD was significantly correlated with the size (volume) of the penis.
- The pattern of genital changes seen in these baby boys is consistent with the “phthalate syndrome” previously identified in rodents exposed in utero to some phthalates.
- These changes are seen at phthalate levels below those measured in one-quarter of the female population of the United States, based on a nation-wide sample.
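The odds-ratio arithmetic and a combined exposure score of the kind described in the summary above can be sketched in Python. This is a minimal illustration, not the study's analysis code: the 2×2 counts and reference concentrations are invented purely to show the arithmetic, and the scoring rule (each of the four metabolites coded 1 to 4 by quartile, then summed) is a hypothetical construction assumed here.

```python
from statistics import quantiles

# The four metabolites named in the summary above.
METABOLITES = ("MEP", "MBP", "MBzP", "MiBP")

def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: (a / b) / (c / d).

    a, b: boys with / without a short AGD among high-exposure mothers
    c, d: boys with / without a short AGD among low-exposure mothers
    """
    return (a / b) / (c / d)

def quartile_code(value, cuts):
    """Map a concentration to 1-4 using the three quartile cut points."""
    return 1 + sum(value > c for c in cuts)

def combined_score(sample, reference):
    """Hypothetical summary score: sum of per-metabolite quartile codes (4-16).

    `reference` maps each metabolite to the cohort's concentrations, from
    which that metabolite's quartile cut points are derived.
    """
    return sum(
        quartile_code(sample[m], quantiles(reference[m], n=4)) for m in METABOLITES
    )

# Invented counts, NOT study data: 10 of 20 boys short vs 2 of 22 boys short.
print(odds_ratio(10, 10, 2, 20))  # 10.0

# Invented reference concentrations (the same 1..100 for every metabolite).
reference = {m: list(range(1, 101)) for m in METABOLITES}
print(combined_score({"MEP": 90, "MBP": 80, "MBzP": 95, "MiBP": 77}, reference))  # 16
```

A boy scoring the maximum of 16 here is in the top quartile for all four metabolites; summing quartile codes is only one simple way to reward high exposure to several phthalates at once.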
Industry critiques and responses
Industry criticisms are quoted below, each followed by my response.
I. Interim Statement from the Phthalate Esters Panel (American Chemistry Council, June 1, 2005)
These organizations, which represent corporations that manufacture phthalates or use them in plastics and other products, have attempted to cast doubt on our work: “A recently published study has been widely reported to claim a statistical correlation between exposure to some phthalates and structural changes in the genitalia of male infants. The report itself did not actually make that claim - its only claim is a correlation between phthalate exposure and a small anatomical change of unknown significance. The study will be thoroughly analyzed by the Phthalate Esters Panel's Toxicology Research Task Group, and a report will be posted.”
We did not claim a “correlation” (a term that is appropriate only for assessing the linear relationship between two variables). The outcomes with which we found significant associations were, in fact, “structural changes in the genitalia of male infants.” At least one affected endpoint, incomplete testicular descent, is known to have adverse health consequences in both humans and animals. Shortened AGD may be “a change of unknown significance” in humans (its significance will become apparent as the children age), but rodent data suggest that these alterations are markers of other changes of considerable significance.
“Initial analyses indicate that the study has many weaknesses, suggesting that the study may not stand up under rigid scientific scrutiny.”
The study underwent rigorous peer review by experts in the environmental health sciences.
“The authors themselves stated that the results need to be validated.”
Any scientific study needs to be validated.
“They concede that the clinical relevance of the key index used in the study, called anogenital index (AGI), is not known. And they state clearly that none of the boys in the study had any genital defects or malformations.”
We stated that: “No frank genital malformations or disease were detected and no parameters appeared grossly abnormal.” However, it is well established in rodents that reduced AGI is a sensitive marker for other endpoints of demasculinization that may appear later. Our study therefore provides grounds for concern and further study.
“A key question concerns the way the data in the study were analyzed. The statistical methods that are the basis for the paper's thesis have already been challenged. In a paper from the Statistical Assessment Service at George Mason University, mathematician Dr. Rebecca Goldin challenges whether statistically significant correlations between phthalates were indeed found by the authors, or how much "data fiddling was required to find a result." Dr. Goldin raises many questions about the study that need to be answered, and concludes that when appropriate statistical standards are applied, ‘not one phthalate passed the test of a statistically significant correlation.’”
The arguments of Dr. Goldin, which were not published in a scientific journal, do not, as noted below, provide any reason to question the conclusions we published.
“Questions have also been raised about the study because it disagrees so dramatically with many studies developed over decades. First, the phthalate levels found in the mothers of the measured babies were thousands of times lower than the levels that caused similar changes in rodents - and most research indicates that primates are less sensitive than rodents to phthalates, not more sensitive. For example, recent laboratory research (Kessler et al.) shows that when fed equivalent doses of one phthalate, rats and marmosets (which are primates as are humans) take up the phthalate much differently. The marmosets took up between 7.5 and 16 times less phthalate than the rats, suggesting that the chemical would be expected to have less, not more impact on humans than on rodents. In addition, the report turns some of the scientifically validated information previously gathered about phthalates on its head.”
There are many examples of greater sensitivity in humans than in rodents. For example, a recent study by Fraser et al. of genistein and the ability of sperm to fertilize an egg found humans to be 10-100 times more sensitive than rodents.
“One phthalate that shows few effects in rats and is generally considered very benign is said to show a correlation with AGI in this study. But the phthalate most widely known for its effects on rodents showed no correlation with AGI in this study.”
It is not the case that metabolites of DEHP, “the phthalate most widely known for its effects on rodents,” showed no association with study endpoints. The associations for MEOHP and MEHHP were of similar magnitude to those seen for MBP and MBzP, although they did not achieve statistical significance. The DEHP metabolite that showed no association in our study (MEHP) also shows no reproductive toxicity in rodent studies.
“Finally, none of the studies purporting to show a human health effect from phthalate exposure has been validated.”
The Harvard group (Hauser, Duty, et al.) has now published four studies showing effects of phthalates on reproductive health in human males.
"The harder the look this paper gets, the more questions it raises," said Marian Stanley, manager of the Phthalate Esters Panel. "It has also been considerably overinterpreted in the media, thanks apparently to comments made in the teleconference promoting the paper by other than the paper's authors. In all, publication of a preliminary, small study of data whose significance is not known, in a journal that is not independently peer-reviewed, needs to be treated with extreme caution until it can be properly evaluated."
Environmental Health Perspectives is independently peer-reviewed.
II. Critique by Dr. Rebecca Goldin
The comments of Professor Goldin, a mathematician, reflect misconceptions about both the biology and the epidemiological and statistical methods used in this kind of study.
Summary of Goldin arguments, with my responses
- “In fact, the study did not examine the impact of phthalates on genitals.”
The endpoints we examined (anogenital distance, testicular descent, penile volume, and scrotal size and description) are all genital.
- “Notably, the study did not claim there was a correlation between the level of metabolites and penile length or volume, or the size of the scrotum, despite having measured them. Perhaps they found no correlation to report.”
Anogenital distance (AGD), or the anogenital index (AGI), which is AGD adjusted for weight, was selected as the primary endpoint in this analysis. This decision was made for two reasons. First, AGD is the endpoint most strongly related to phthalates in toxicological studies; it has been recognized as the outcome most sensitive to anti-androgen action. Second, it was the measurement that our study pediatricians felt to be most reliable; it had the smallest coefficient of variation among the genital measurements. We subsequently examined correlations between AGI and both penile size and testicular descent. Shorter AGI was found to be associated with reduced penile size and incomplete testicular descent, and higher phthalate levels were found to be associated with shorter AGI.
- “The article claims there were 134 baby boys in the study, when in fact only 85 of the mothers had contributed urine.”
We state in our manuscript that we obtained genital measurements on 134 boys (all included in the analyses that generated expected values for AGD), but had maternal urine samples for only 85 of them, a consequence of adding urine collection to the prenatal protocol midway through the study. We also state that the study’s statistical analyses, and the conclusions drawn from them, are based on the sample of 85.
- “In fact, they tested eight phthalates and found no correlation for four of them.”
In fact, we analyzed urine for nine phthalate monoesters that are the metabolites of seven phthalates. These monoesters were selected because they are included in the standard panel of phthalate metabolites that can be tested at the Centers for Disease Control and Prevention. Four of these metabolites were significantly associated with AGI. Three others are metabolites of a single phthalate (DEHP); two of these were inversely associated with AGI, a relationship of borderline statistical significance, and the third DEHP metabolite was found at comparatively low levels. The remaining two monoesters (MCPP and MMP) were not associated with AGI in our study; these are metabolites of phthalates that are not considered reproductive toxicants in experimental animals. In summary, we tested metabolites of seven phthalates and found inverse associations for metabolites of five; four of these were statistically significant.
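The tally in this response (nine monoesters, seven parent phthalates, inverse associations for five, statistically significant for four) can be checked with a few lines of Python. The groupings into significant and borderline come from the text; the parent-compound abbreviations other than DEHP are standard attributions added here for illustration and are not part of the original response.

```python
# Each monoester measured, mapped to its parent phthalate. Only DEHP is named
# in the text; the other parent names are standard attributions added for
# illustration (MCPP in particular is mainly attributed to DnOP).
PARENT = {
    "MEP": "DEP", "MBP": "DBP", "MBzP": "BBzP", "MiBP": "DiBP",
    "MEHP": "DEHP", "MEOHP": "DEHP", "MEHHP": "DEHP",
    "MCPP": "DnOP", "MMP": "DMP",
}
SIGNIFICANT = {"MEP", "MBP", "MBzP", "MiBP"}  # significant inverse association with AGI
BORDERLINE = {"MEOHP", "MEHHP"}               # inverse, borderline significance

parents_tested = set(PARENT.values())
parents_inverse = {PARENT[m] for m in SIGNIFICANT | BORDERLINE}
parents_significant = {PARENT[m] for m in SIGNIFICANT}

print(len(PARENT), len(parents_tested), len(parents_inverse), len(parents_significant))
# prints: 9 7 5 4
```

Counting by parent compound rather than by metabolite is what turns "eight phthalates, four with no association" into "seven phthalates, five with inverse associations."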
- “The authors of the study draw parallels with studies on rats that are not necessarily reflective of the human response to phthalates. According to Swan and coauthors, “The associations between male genital development and phthalate exposure seen here are consistent with the phthalate-related syndrome of incomplete virilization that has been reported in prenatally exposed rodents.” In other words, since rats exposed to phthalates before birth have fertility problems as well as lower AGI, human boys with lower AGI will also have fertility problems. This reasoning is faulty. First of all, the studies on rats involve high doses of phthalates, many times higher than the human exposure rate. It is entirely plausible that a low dose of phthalates has an effect on AGI and not on fertility, while a high dose has an effect on both AGI and fertility. Without a measured effect on human male fertility, the study cannot make any conclusions at all about phthalates and fertility.”
We did not say that boys would have fertility problems. The genital measurements we report in these boys are consistent with the phthalate syndrome. Moreover, a major conclusion from the study is that rodent studies are relevant to several human endpoints. The next question should be: what other endpoints respond similarly to prenatal phthalate exposure in rodents and humans?
- “The most blaring problem with the study is its limited scope. While 85 boys can show a trend, these children were not nationally representative; they were drawn from three localities only.”
We did not claim that these boys are nationally representative. We hope to examine this question in a range of study populations in the future.
- “Why does having a homogeneous sample matter? Suppose that there were a genetic disposition among a certain group of people toward a low AGI. Suppose also that this same group tends to live in areas with higher pollution and higher concentrations of phthalates in their environment. Then we might measure higher than usual phthalate concentrations, and lower than usual AGI, yet the AGI result may be entirely a result of genetics.”
Unmeasured confounding is always a possibility that can never be completely eliminated from epidemiological studies, because exposures in such studies are not randomly assigned. This is why scientists seek to examine the effects of exposures in both human and animal studies. It is also why concordance between human and animal studies, as is the case for prenatal phthalate exposure and genital changes, is particularly convincing: animal studies are randomized, so unmeasured confounding cannot explain their results.
- “You can’t tell whether the correlation found by the authors is a cause and effect, or just a coincidence.”
It is highly unlikely that these results are “a coincidence” as shown by the high level of statistical significance of the results.
- “In this case, the magic p-value should have been p=.05/8=.006.”
This criticism suggests using a technique called a “Bonferroni correction” to guard against an inflated error rate when more than one hypothesis is tested. Such a “correction”, however, is tightest when the comparisons are independent and becomes increasingly conservative when, as here, they are highly correlated because the metabolites themselves are correlated. Further, the Bonferroni correction has been frequently criticized for being too conservative and, more fundamentally, because such adjustments are statistically suspect (see Rothman, Epidemiology, 1990, 1: 43-46). In addition, we observed four metabolites significantly associated, not one; the probability that four metabolites would be associated with AGI by chance alone is vanishingly small.
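The multiple-comparison arithmetic can be sketched in Python. These are hedged illustrations, not a reanalysis of the study: the Bonferroni cut-off Goldin proposes (0.05/8); the chance that four or more of nine tests reach p < 0.05 when every null hypothesis is true, computed with a binomial model that assumes independent tests (an idealization: positively correlated metabolites would make several simultaneous chance findings more likely than this figure); and a small simulation, with an invented common correlation of 0.8 among nine null test statistics, showing that Bonferroni's family-wise error rate falls below its nominal 0.05 as correlation rises, which is the sense in which the correction becomes overly conservative.

```python
import random
from math import comb, sqrt
from statistics import NormalDist

ALPHA = 0.05

def bonferroni_threshold(alpha, m):
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / m

def prob_at_least_k(k, m, alpha):
    """P(at least k of m tests significant by chance).

    Assumes every null hypothesis is true AND the m tests are independent,
    so the count of significant results is Binomial(m, alpha).
    """
    return sum(comb(m, j) * alpha**j * (1 - alpha) ** (m - j) for j in range(k, m + 1))

def bonferroni_fwer(m, rho, alpha=ALPHA, n_sim=20000, seed=0):
    """Simulated family-wise error rate of Bonferroni for m null z-tests.

    Every pair of test statistics has correlation rho (an equicorrelated
    model, assumed for illustration). Tests are two-sided.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - bonferroni_threshold(alpha, m) / 2)
    false_alarms = 0
    for _ in range(n_sim):
        shared = rng.gauss(0, 1)  # component common to all m statistics
        zs = (sqrt(rho) * shared + sqrt(1 - rho) * rng.gauss(0, 1) for _ in range(m))
        if any(abs(z) > z_crit for z in zs):
            false_alarms += 1
    return false_alarms / n_sim

p_four = prob_at_least_k(4, 9, ALPHA)
fwer_indep = bonferroni_fwer(9, rho=0.0)
fwer_corr = bonferroni_fwer(9, rho=0.8)
print(round(bonferroni_threshold(ALPHA, 8), 5))  # 0.00625, Goldin's cut-off
print(f"{p_four:.5f}")                           # 0.00064
print(f"FWER: independent {fwer_indep:.3f}, rho=0.8 {fwer_corr:.3f}")
```

The equicorrelated model is the simplest way to show the effect; the actual correlation structure among the nine metabolites is not given in the text.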
- “Another disturbing aspect of the study is the extent to which certain measurements were chosen over others. When the authors used anoscrotal distance instead of anogenital distance, they found only one of the eight phthalates had a statistically significant correlation. “
As we stated in our paper, all choices of models, indexes and cut-points were made prior to examining phthalate metabolite concentrations. We consulted with toxicologists working in this area and were informed that AGD was the measure most analogous to that used in rodents. It is also the most reliable measurement.
- “The babies’ penile volume was measured and no mention of correlation of volume to metabolite levels is mentioned. While the authors contend that small AGI was associated to low penile volume, they did not associate it with high metabolite levels.”
As stated in the paper, “AGD was significantly associated with penile volume; R² = 0.24 (p = 0.006) and penile volume divided by weight was correlated with AGI (R² = 0.41, p = 0.001)”. We did not examine penile volume in relation to metabolite levels separately. For this paper AGD/AGI was our primary study endpoint.
- “The study gives the overall impression that the authors looked hard to find something in the male genital region that correlated with high metabolite levels (of at least some of the phthalates) in order to make a preconceived point.”
Both the androgen-sensitive endpoints and the study exposures were selected in advance precisely to evaluate, in human infants, the associations seen in prior animal studies. The only exceptions were a few phthalate metabolites that the testing laboratory included as part of a multi-metabolite screen. We reported these for completeness, because they were run, although we had not requested them.
None of the arguments put forth by Dr. Goldin or the Phthalate Esters Panel weakens the conclusions of the study, namely that: “These changes in male infants, associated with prenatal exposure to some of the same phthalate metabolites that cause similar alterations in male rodents, suggest that commonly used phthalates may undervirilize humans as well as rodents.” This conclusion has been supported by numerous scientists who are currently working in this research area (which Dr. Goldin is not). For example, Professor Richard Sharpe said of this study, "It is the first paper that bridges the gap between animal research and human exposures."