Misleading studies are being taken as gospel
Do people with severe depression have a right to accurate information about
antidepressants? I suspect
most people would answer “yes”. There is a general understanding that
individuals who suffer from medical conditions are in a vulnerable position,
making them susceptible to misinformation. There is also increased awareness of
the influence that the profit motive can have on how medical research is
funded, undertaken and communicated to the public.
But for some reason, this basic
principle doesn’t seem to apply to the hyper-politicised subject of gender
medicine. On one side, Republican states are
attempting to ban youth gender medicine — and, in some cases, to dial back access to adult gender medicine. On the other,
liberals maintain that there is solid evidence for these treatments, and that
only an ignorant person could suggest otherwise.
Whether or not you agree with the
GOP’s stance (I do not), the latter view is simply false. The trajectory of
youth gender medicine in nations with nationalised healthcare systems has been
relatively straightforward: these countries keep conducting careful reviews of
the evidence for puberty blockers and hormones, and they keep finding that
there is very little such evidence to speak of. That was the conclusion
in Sweden, Finland, the
UK, and, most recently, Norway. As a recent headline in The Economist had
it: “The evidence to support medicalised gender transitions in adolescents is
worryingly weak.”
Yet despite this evidentiary crisis in Europe, and despite multiple scandals vividly
demonstrating the downside of administering these treatments in a careless way,
liberal institutions in the US have only become more enthusiastic about them.
In recent years, everyone from Jon Stewart and John Oliver to reporters and
pundits at the New York Times, The Washington Post and NPR has exaggerated
the evidence for these interventions.
The logic seems to be that if activists, doctors and journalists repeat
“The evidence is great!” enough times, regardless of whether the evidence
actually is great, the controversy will go away — as though
the state of Arkansas could be shamed into reversing its policy on trans youth
because Jon Stewart made fun of it. Meanwhile, as I can tell you from
experience, if you openly question these treatments or highlight just how
little we know about them, you’re going to have a bad time.
But look a little closer, and it swiftly becomes clear that the evidence
for both adult and youth gender medicine is frequently drawn from alarmingly
low-quality studies. Almost invariably, when you examine the latest study to go
viral, there’s much less there than meets the eye — whether because of serious
overhyping, questionable statistical choices on the part of the researchers, outright missing data, flawed survey instruments, or just generally broken methods.
Since any individual study or group of studies can suffer from these
issues, serious researchers know that you can’t just take a few that point in
the right direction and herald them as evidence. Rather, you need to sum up the
available evidence while also accounting for its quality. This is what European
countries have done, and they have all come to roughly the same conclusion: the
evidence supporting these treatments isn’t there.
But even at the level of sweeping summaries, America’s conclusions are
often distorted. A prime example came in a recent New York Times column by Marci
Bowers, a leading gender surgeon and the president of the World
Professional Association for Transgender Health (WPATH). Bowers
paints a very rosy picture of the evidence base:
“Decades of medical experience and research since has found that when
patients are treated for gender dysphoria, their self-esteem grows and their
stress, anxiety, substance use and suicidality decrease. In 2018, Cornell
University’s Center for the Study of Inequality released a comprehensive literature review finding that gender transition, including hormones and
surgery, ‘improves the well-being of transgender people’. Nathaniel Frank, the
project’s director, said that ‘a consensus like this is rare in social
science’.
“The Cornell review also found that regret… became even less common as surgical quality and social
support improved. All procedures in medicine and surgery inspire some
percentage of regret. But a study
published in 2021 found that fewer than 1% of those who have received
gender-affirming surgery say they regret their decision to do so… A separate
analysis of a survey of more than 27,000 transgender and
gender-diverse adults found that the vast majority of those who detransition
from medical affirming treatment said they did so because of external factors
(such as family pressure, financial reasons or a loss of access to care), not
because they had been misdiagnosed or their gender identities had changed.”
Here we have a leading expert (Bowers) citing a leading institution
(Cornell) and relating astonishing claims (what medical procedure has a 1%
regret rate?). The case appears to be closed — until you actually click the
links and read Bowers’s sources. (Bowers and WPATH did not return emailed
interview requests.)
Let’s start with Cornell’s data. According to a summary at its “What We Know Project”:
“We conducted a systematic literature review of all peer-reviewed
articles published in English between 1991 and June 2017 that assess the effect
of gender transition on transgender well-being. We identified 55 studies that
consist of primary research on this topic, of which 51 (93%) found that gender
transition improves the overall well-being of transgender people, while 4 (7%)
report mixed or null findings. We found no studies concluding that gender
transition causes overall harm.”
If you are familiar with systematic literature reviews, you will find
the above unusual. Researchers don’t generally ask whether a procedure works or
not in such a vague manner, then tally up the results. To usefully gauge the
level of evidence, a review has to carefully define its research questions, and
factor in the potential biases of the existing studies. The Cornell project
does none of this.
I emailed Gordon Guyatt, one of the godfathers of the so-called evidence-based
medicine movement, to ask him whether he thought the Cornell project
qualified as a systematic literature review. His response was: “It meets
criteria for a profoundly flawed systematic review!” When we later spoke, he
explained why he didn’t trust it. “Presumably, they are trying to make a causal
connection between what the patients received and their outcomes,” he said.
“That is not possible unless one has a comparator.” In other words, if you’re
only tracking people who received a treatment, and don’t compare their outcomes
to another group not receiving the treatment, you simply can’t
learn that much. Guyatt offered the example of someone taking hormones and
saying afterwards that they feel better. “That does not mean that the
hormones have anything to do with your feeling good.”
This is a very basic, very well-understood problem in both medical and
social-scientific research. If all you have is before-and-after measurements of
how someone who received a treatment changed over time, there
are all sorts of potential confounds, from the placebo effect to regression
towards the mean to the possibility that receiving the treatment
coincided with some other salutary intervention, such as therapy, that wasn’t
accounted for.
Because the Cornell team made no effort to evaluate the risk of
bias in the individual studies it included, the final product tells us very
little. It’s roughly analogous to coming upon a pile of coins and trying to
determine its worth simply by counting how many coins there are, rather than
sorting the pile by denomination. When I raised this with Nathaniel Frank, the
head of the Cornell project, he said via email that “we don’t publish
traditional systematic reviews”, but rather web summaries of important research
questions. So the first words of its overview might confuse readers: “We
conducted a systematic literature review.”
If Bowers had wanted to cite a carefully conducted, peer-reviewed
systematic review of the gender medicine literature, she actually had one at
her fingertips: her own organisation, WPATH, funded one a few years ago. The
results, published in the Journal
of the Endocrine Society in 2021, revealed that there is almost no
high-quality evidence in this field of medicine. After summarising every
study they could find that met certain criteria, and applying Cochrane
guidelines to evaluate those studies’ quality, the authors could find
only low-strength evidence to support the idea that hormones
improve quality of life, depression, and anxiety for trans people. “Low”
here means that the authors “have limited confidence that the estimate of effect lies
close to the true effect for this outcome. The body of evidence has major or
numerous deficiencies (or both).” Meanwhile, there wasn’t enough evidence even
to render a verdict on the exceptionally common claim that hormones reduce the
risk of death by suicide.
Oddly, though, the authors of this systematic review conclude by writing
that the benefits of these treatments “make hormone therapy an essential
component of care that promotes the health and well-being of transgender
people”. That claim completely clashes with their substantive findings about
the quality of the evidence. So, when Bowers cited the Cornell project, she was
citing a review that is of very limited evidentiary value — while also ignoring
a much more professionally conducted, and much more pessimistic, though
strangely concluded, review that her own organisation paid for.
But what about the study which,
she claims, “found that fewer than 1% of those who have received
gender-affirming surgery say they regret their decision to do so”? Here’s where
things get downright weird.
The study in question, published in 2021 in the journal Plastic
and Reconstructive Surgery Global Open, has dozens of
errors that its nine authors and editors have refused to correct. Indeed, it
appears to have been executed and published to such an unprofessional standard
that one might ask why it hasn’t been retracted entirely.
Before we get into all that, though, it’s worth pointing out that even
if it had been competently conducted, the review could not have provided
us with a reliable estimate of the regret rate following gender-affirming
surgery: the studies it meta-analyses are just too weak. Many of those included
did not actually contact people who had undergone surgery to ask them
if they regretted it; rather, the authors searched medical records for mentions
of regret and/or for other evidence of surgical reversals. Yet this method
is inevitably going to underestimate the number of regretters, because plenty
of people regret a procedure without going through the trouble of either
reversing it or informing the doctor who performed it. In one study of
detransitioners — albeit one focusing on a fairly small and non-random
online sample — three quarters of them said they did not inform their
clinicians that they had detransitioned.
The studies included in this review also failed to follow up with a very
large number of patients. The meta-analysis had a total sample size of about
5,600; the largest study, with a sample size of 2,627 — so a little
under half the entire sample — had a loss-to-follow-up
rate of 36%. If you’re losing track of more than a third of
your patients, you obviously don’t really know how they’re doing and
can’t make any strong claims about their regret rates. And yet, the authors
don’t mention the loss-to-follow-up issue anywhere in their paper. No
version of this meta-analysis, then, was likely to provide a reliable estimate
of the regret rate for gender-affirming surgery.
Even so, the version that was published was particularly disastrous.
Independent researcher J.L. Cederblom summed it up: “What are these numbers? These are all wrong…
And these weren’t even simple one-off errors — instead different tables
disagreed with each other. The metaphor that comes to mind is drunk driving.”
To take one example, the authors initially reported that the
aforementioned largest paper in their meta-analysis had a sample size of 4,863.
But they misread it — the true figure was actually only 2,627. They also
misstated other aspects of that report, such as how regret was investigated
(they said it was via questionnaire but it was via medical records search) and
the age of the sample (they said it included some juveniles, but it did not).
Not all the errors were significant, but they were remarkably numerous.
And because of the abundance of issues, the paper attracted the attention of
other researchers. “In light of these numerous issues affecting study quality
and data analysis, [the authors’] conclusion that ‘our study has shown a very
low percentage of regret in TGNB population after GAS’ is, in our opinion,
unsupported and potentially inaccurate,” wrote two critics, Pablo
ExpĆ³sito-Campos and Roberto D’Angelo, in a letter to the editor that the
journal subsequently published. In her own letter, the researcher
Susan Bewley highlighted what
appears to be an absence of vital information about the authors’ method of
putting together the meta-analysis.
The authors and the editors decided to simply not correct any of this.
They did publish an erratum, in which they republished seven tables that
still contained errors, while maintaining that all those errors had no impact
on the paper’s takeaway findings. But the paper itself remains published, in
its original form, complete with those 2,200 ghost patients (the gap between
the misreported 4,863 and the true 2,627) inflating the sample size.
Bewley and Cederblom have continued to ask the journal to reveal the
process that led to the paper getting published, and to address why so many of
the errors remain uncorrected. In an email in January to Bewley, Aaron
Weinstein, its editorial director, claimed that because critical letters to the
editor had been published, and because the corrected data was reanalysed by a
statistical expert, “the Publisher and the ASPS [American Society of Plastic
Surgeons] feel that PRS Global Open has done due diligence on this article and
this case is closed”. He also claimed, curiously, that he had no power to force
the authors to address the many serious remaining questions raised by the
paper’s critics, saying “there is no precedent for an editorial office to do
so”. Neither Weinstein nor the paper’s corresponding author, Oscar
Manrique, responded to my emailed requests for comments.
Finally, there is Bowers’s claim that “a separate analysis of a
survey of more than 27,000 transgender and gender-diverse adults found
that the vast majority of those who detransition from medical affirming
treatment said they did so because of external factors”. This is technically
true, but is also rather misleading because the survey in question — the 2015
United States Transgender Survey (which has profound sampling issues) — was of currently
transgender people. It says so in the first sentence of the executive summary. Research
based on this survey obviously can’t provide us with any reliable information
about why people detransition, because it is not a survey of
detransitioners. If you want to know how often people detransition, you
need to follow large groups of trans people over time and check in to see if
they still identify that way later on — and we don’t have high-quality
research on that front.
It’s also worth bearing in mind that the vast majority of studies being
discussed here concern adults, while the legislative discussion mostly centres
on adolescents. The most recent version of WPATH’s Standards of Care is very open about the lack
of evidence when it comes to the latter: “Despite the slowly growing body of
evidence supporting the effectiveness of early medical intervention, the number
of studies is still low, and there are few outcome studies that follow youth
into adulthood. Therefore, a systematic review regarding outcomes of treatment
in adolescents is not possible.” Again, WPATH is Bowers’s own organisation —
surely she is familiar with its output?
Despite the backbreaking errors of that nine-author paper, the severe
limitations of the Cornell review, and the near-total irrelevance of the United
States Transgender Survey, all three are chronically trotted out as evidence
that we know transgender medicine is profoundly helpful, or that detransition
or regret are rare — or both. It’s frustrating enough that these lacklustre
arguments are constantly made on social media, where all too many people get
their scientific information. But what’s worse is that many journalists have
perpetuated this sad state of affairs. A cursory Google search will reveal that
these three works have been treated as solid evidence by the Associated Press, Slate, Slate again, The Daily Beast, Scientific American and other outlets. The NYT,
meanwhile, further publicised Cornell’s half-baked systematic review by
giving Nathaniel Frank a whole column to
tout its misleading findings back in 2018.
Why does such low-quality work slip through? The answer is
straightforward: because it appears, if you don’t read it too
closely, or if you are unfamiliar with the basic concepts of evidence-based
medicine, to support the liberal view that these treatments are wonderful and
shouldn’t be questioned, let alone banned. That’s enough for most people, who
are less concerned with whether what they are sharing is accurate than with
whether it can help with ongoing, high-stakes political fights.
But you’re not being a good ally to trans people if you disseminate
shoddy evidence about medicine they might seek. Whatever happens in the red
states seeking to ban these treatments, transgender people need to make
difficult healthcare choices, many of which can be ruinously
expensive. And yet, if you call for the same standards to be applied to
gender medicine that are applied to antidepressants, you’ll likely be told you
don’t care about trans people.
As Gordon Guyatt, who has done an enormous amount to increase the
evidentiary standards of the medical establishment, told me: “You’re doing harm
to transgender people if you don’t question the evidence. I
believe that people making any health decisions should know about what the best
evidence is, and what the quality of evidence is. So by pretending things are
not the way they are — I don’t see how you’re not harming people.”