Little is known about the long-term benefits of medications used for sleep in typical practice.
We compared reductions in sleep difficulties across a large cohort of women reporting sleep difficulties who did and did not start prescription medications used for sleep.
Some of these medications may not have been prescribed for sleep difficulties and some medications were likely used intermittently.
Sleep disturbances are common, and an estimated 9 million adults in the USA report prescription medication use for this indication.1 The frequency of sleep medication use has increased since the 1990s and first decade of the 2000s.2 3 Sleep disorders are associated with many important chronic conditions, including diabetes, hypertension, pain and depression.4 Due to the prevalence of sleep disturbances and their interplay with important comorbidities, many pharmacological treatment options have been developed for sleep.
Prescription sleep medications consist of benzodiazepines (BZDs), Z-drugs (selective BZD receptor agonists that include zolpidem, zaleplon and eszopiclone) and other agents mostly used off-label to promote sleep through a variety of other mechanisms. Randomised controlled trials (RCTs) demonstrate the short-term sleep benefits of many agents in these categories, with typical trials for these agents lasting only 12–24 weeks and often including fewer than 100 patients.5 6 One 8-month study of zolpidem found improved polysomnographic sleep parameters and subject assessments on two nights in month 8.7 While sleep medications are recommended for short courses,8 sleep disturbances may be chronic and many patients use these agents for long periods, sometimes intermittently and other times nightly.9 Thus, data from typical practice would be useful for patients and clinicians if they included sleep medications used over several months in populations of patients with sleep disturbances; we found no such studies in the literature.
There has been increased interest in using non-randomised designs to test the benefits of drugs.10 We assessed the potential benefits of sleep medications among a large and diverse cohort of midlife women not reporting prevalent sleep medication use at baseline who self-reported sleep disturbances during observation in a longitudinal cohort. Women who subsequently started sleep medications were matched on a propensity score with women who did not and followed for 1–2 years with annual assessment of sleep disturbances.
The design of this study was based on the ‘target trial emulation’ concept as proposed by Hernán and Robins.11 In this study paradigm, a target RCT is designed and then an observational study is constructed to emulate the target trial. We specified all relevant aspects of the target trial and the observational corollary as noted in online supplemental table 1. The observational study focused on new users of sleep medications, never previously reporting sleep medication use during the period of observation and primarily used an intention-to-treat design to most closely emulate the target trial. Furthermore, we described the study design using standardised illustrations as suggested by Schneeweiss et al (see online supplemental figure 1).12
All potentially eligible women were drawn from the Study of Women’s Health Across the Nation (SWAN). SWAN is an ongoing multicentre, multiethnic/multiracial longitudinal study examining the biological and psychosocial changes that occur during the menopausal transition. Between 1995 and 1997, a screening survey assessed the eligibility of women at each of seven participating sites; sampling used either community-based or population-based frames.13 Major cohort entry criteria included: age 42–52 years; intact uterus and at least one ovary, not using sex steroid hormones or pregnant, breast feeding or lactating at enrolment or within the previous 3 months; at least one menstrual period in the 3 months prior to screening and self-identified as either white, African-American, Hispanic, Chinese or Japanese. Each site recruited at least 450 eligible women, including white women and a minority group sample, into the cohort in 1995–1997, resulting in an inception cohort of 3302 women.14 15 For the current analyses, we used follow-up data through 2016.
Since we were interested in the long-term effects of prescription sleep medications on sleep disturbances, we required all women to have reported during SWAN follow-up a sleep disturbance on at least three nights per week during a 2-week interval. On almost all annual visits, women were asked to self-report on three aspects of sleep: difficulty initiating, frequent awakening and early morning awakening. If women reported any of these disturbances at least once, they were eligible for the study cohort. We also required women to have sleep data at the visit after first reporting a sleep disturbance; some visits did not include the brief sleep inventory and thus follow-up information would be missing. Finally, we excluded women who reported use of prescription sleep medications at the baseline visit in SWAN, to eliminate prevalent users of these drugs.
There was no patient or public involvement in this research. Participants in SWAN receive updates on the conduct and results of the study. Data from SWAN are available for qualified researchers. The current analyses were funded by the US National Institutes of Health. All participants gave written informed consent after being educated about the nature of the study, potential risks and how their data may be used.
Many different medications are used for sleep. We focused on several groups of medications: BZDs, selective BZD receptor agonists and other hypnotics. The full list of medications considered included the following BZDs: estazolam, flurazepam, lorazepam, temazepam and triazolam; selective BZD receptor agonists: zaleplon, zolpidem and eszopiclone; and agents with other mechanisms: doxepin (a tertiary amine tricyclic), mirtazapine (noradrenergic and specific serotonergic), ramelteon (selective melatonin receptor agonist) and trazodone (serotonin antagonist and reuptake inhibitor). The primary analyses grouped all sleep medications together. In secondary analyses, groups of medications were considered separately. Lorazepam users (n=65) and their matched non-users (n=125) were dropped in a secondary analysis because lorazepam is used for many indications.
Drug information was collected at each study visit by asking women to bring in their medication bottles or a pharmacy-generated list of medications that they had used in the last month. Interviewers recorded the medications used, which were coded using the Iowa Drug Information Service system.16 Women were not prompted specifically about sleep medications. Dosages and drug frequency were not reliably recorded and were not used for these analyses. Furthermore, over-the-counter medication use information was considered incomplete and not included in these analyses. Non-users were never users. They entered the study (index date) at visits matched in frequency distribution with those of the sleep medication users.
As noted, we only included new use of sleep medications. The first visit with a mention of a sleep medication was considered the index visit. Since there are no between visit medication updates, we considered women who reported starting a sleep medication as users until their next annual SWAN visit. This design mimics an intention-to-treat analysis.
Three domains of sleep disturbances were self-reported at almost all annual SWAN visits. Women were asked to pick the answer that best described their difficulty initiating sleep, remaining asleep and early morning awakenings during the previous 2 weeks. They used a 5-point Likert scale to report on each type of disturbance, where 1=no difficulties on any nights, 2=difficulties on less than one night per week, 3=one to two nights per week, 4=three to four nights per week and 5=five to seven nights per week.17–19 We considered the results at 1 year to be the primary outcome and 2 years to be the secondary outcome. For the 2-year outcome, only women who had both year 1 and year 2 results were analysed.
SWAN collects a broad range of variables at cohort entry and at each subsequent annual visit. We considered a wide range of potential covariates including demographics, comorbidities, menopausal status, body mass index (BMI), tobacco use and alcohol use. The variables unlikely to change over time (race/ethnicity and educational attainment) were collected at cohort entry and others were collected at the visit prior to the index visit. Variables were not updated after the index date. Depression was measured with the Center for Epidemiologic Studies Depression Scale,20 anxiety with the Generalized Anxiety Disorder-7 scale,21 and pain, mental function and physical function with the 36-Item Short Form Survey (SF-36) scales.21
After assembling the analytic cohort, covariates were defined and compared across women who initiated a sleep medication and those who did not. To improve the baseline balance in characteristics, we estimated a propensity score using a logistic regression model.22 A propensity score estimates the likelihood that women would start a sleep medication, with values ranging from 0 to 1. All covariates shown in table 1 were included in the propensity score model. We then matched women who started a medication for sleep with women who did not based on their propensity score.23 We attempted to match two non-users for each user using a ‘greedy matching’ algorithm, with a maximum calliper of 0.2 of an SD of the logit of the propensity score.24
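The matching step described above can be illustrated with a minimal Python sketch. It is not the SAS code used for the analyses. It assumes propensity scores have already been estimated, that the calliper is 0.2 of the SD of the logit pooled over users and non-users, and that users are processed in input order; the function name, the participant identifiers and the scores are hypothetical.

```python
import math
from statistics import stdev


def logit(p):
    """Log-odds of a propensity score strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def greedy_match(user_ps, nonuser_ps, ratio=2, calliper_sd=0.2):
    """Greedy nearest-neighbour matching on the logit of the propensity score.

    user_ps and nonuser_ps map a (hypothetical) participant id to an
    already-estimated propensity score. Each non-user is matched at most
    once. Returns {user_id: [matched non-user ids]}.
    """
    user_logit = {u: logit(p) for u, p in user_ps.items()}
    pool = {n: logit(p) for n, p in nonuser_ps.items()}

    # Assumption: calliper width uses the SD of the logit over everyone.
    all_logits = list(user_logit.values()) + list(pool.values())
    calliper = calliper_sd * stdev(all_logits)

    matches = {}
    for u, lu in user_logit.items():
        matches[u] = []
        for _ in range(ratio):
            if not pool:
                break
            # Closest remaining non-user on the logit scale.
            best = min(pool, key=lambda n: abs(pool[n] - lu))
            if abs(pool[best] - lu) > calliper:
                break  # no acceptable match left for this user
            matches[u].append(best)
            del pool[best]  # match without replacement
    return matches


# Hypothetical scores, for illustration only.
users = {"u1": 0.30, "u2": 0.60}
nonusers = {"n1": 0.31, "n2": 0.29, "n3": 0.61, "n4": 0.95}
matches = greedy_match(users, nonusers)
```

In this toy example the second user receives only one match, because the only remaining non-user lies outside the calliper. This mirrors how an attempted 1:2 match can yield fewer than two non-users per user.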
After matching, we examined baseline characteristics for balance using standardised mean differences (see table 1). With evidence of good balance across measured baseline characteristics, we next examined sleep disturbances at baseline and found these to be well balanced. We then examined sleep disturbance reports at 1 and 2 years, estimating means and SD, and the changes in sleep disturbance from baseline to 1 year and 1 year to 2 years. These changes were estimated and compared across medication exposure groups, using a mixed regression model. No adjustments were made, as the baseline characteristics were well balanced as noted in table 1.
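As an illustration of the balance check, a standardised mean difference for a continuous covariate can be computed as the difference in group means divided by the pooled SD. The Python sketch below uses this common definition, which is an assumption here because the exact formula applied in the analyses is not stated; the function name and the ages are hypothetical.

```python
import math
from statistics import mean, variance


def standardised_mean_difference(users, non_users):
    """Difference in means divided by the pooled SD of the two groups.

    Assumes a continuous covariate with at least two values per group
    and some variation in at least one group.
    """
    pooled_sd = math.sqrt((variance(users) + variance(non_users)) / 2.0)
    return (mean(users) - mean(non_users)) / pooled_sd


# Hypothetical ages in matched users and non-users, for illustration only.
age_users = [47.0, 49.0, 50.0, 52.0]
age_non_users = [47.5, 49.0, 50.5, 51.5]
smd_age = standardised_mean_difference(age_users, age_non_users)

# An absolute value below 0.1 is the threshold used to indicate balance.
balanced = abs(smd_age) < 0.1
```

Because the measure is expressed in SD units, it does not depend on sample size, which is why it is often preferred to p values when assessing balance after matching.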
Secondary analyses compared the distribution of scores on the Likert scale across medication exposures, specifically assessing for the per cent of women who reported less frequent sleep disturbance; this analysis has the benefit of not assuming a continuous or linear distribution across the five categories of the Likert scale. We also conducted a proportional odds analysis to determine if exposure to sleep medications was associated with a significant reduction in the Likert scale. Other secondary analyses used the visit before sleep medication initiation to define the baseline patient characteristics to calculate the propensity score; this analysis allows us to assess the sensitivity of the results to the timing of variable measurement. We also restricted the analyses to women who reported more severe sleep disturbances at baseline, defined as a score of 4 or 5 on at least one sleep domain. This definition is consistent with the frequency criterion for clinically significant sleep difficulty (eg, insomnia disorder).25 26 We compared no medication use to specific sleep medications, BZDs and selective BZD receptor agonists. Finally, we ran models adjusted for SWAN site and oestrogen replacement therapy. Such analyses retained the propensity score match.
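Two of the definitions used in these secondary analyses can be made concrete with a short Python sketch: the restriction to more severe baseline disturbance (a score of 4 or 5 on at least one domain) and the per cent of women reporting less frequent disturbance at follow-up. The function names, domain labels and scores are hypothetical and serve only to illustrate the definitions.

```python
def is_severe(domain_scores):
    """True if any sleep domain is rated 4 or 5 on the 5-point Likert scale.

    domain_scores maps a (hypothetical) domain label to a score from 1 to 5.
    """
    return any(score >= 4 for score in domain_scores.values())


def percent_improved(baseline, follow_up):
    """Per cent of women whose follow-up score is lower than at baseline.

    baseline and follow_up are equal-length lists of Likert scores for one
    sleep domain, in the same participant order.
    """
    improved = sum(1 for b, f in zip(baseline, follow_up) if f < b)
    return 100.0 * improved / len(baseline)


# Hypothetical data, for illustration only.
woman = {"initiating": 2, "waking": 4, "early_awakening": 1}
severe = is_severe(woman)

baseline_waking = [4, 5, 3, 2]
year1_waking = [3, 5, 2, 2]
pct = percent_improved(baseline_waking, year1_waking)
```

Comparing the per cent improved treats the scale as ordered categories, so it avoids assuming equal spacing between the five Likert levels.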
All analyses were conducted using SAS V9.1 (Cary, North Carolina, USA). All p values were nominal and not adjusted for multiple comparisons, as these were post hoc exploratory analyses.
We identified 2531 potentially eligible women in SWAN who reported the severity of a sleep disturbance at some point during the 21 years of follow-up, 1995–2016 (see figure 1). We applied the exclusion criteria and found 1528 women who were analysed in the propensity score to identify potential matches. From this group, the 238 women who initiated a prescription sleep medication were significantly different from the overall group of women who did not (see online supplemental table 2). Thus, we propensity matched the 238, attempting to find two non-users for each user; we were able to match 447 women who never initiated a sleep medication during study follow-up. These 685 women were similar in characteristics to the 1846 potentially eligible women not included in the analysis (see online supplemental table 3). All of the women included reported a sleep disturbance at some point during follow-up. At baseline, 72%–77% reported sleep disturbance.
The baseline characteristics of the women in the study cohort are shown in table 1. After propensity score matching, the women who initiated a sleep medication and those who did not were similar; all standardised mean differences were <0.1, indicating successful propensity score matching. The mean age for this analytic sample was 49.5 years (SD 8.5) and their BMI was 29.1 kg/m2 (SD 7.4). Approximately 80% had some education beyond high school. Approximately one-quarter were African-American and 57.5% were white; Hispanic, Chinese and Japanese women made up the rest of the sample. Almost all women had some medical insurance. Approximately half were current or past tobacco users and 20% were moderate to heavy alcohol users. Mean depression, anxiety and pain scores were similar across the groups, as were SF-36 mental and physical function scores. Menopausal status was very similar across the groups with about 36% being in the perimenopause. The range of comorbidities was typical for this population and similar across exposure groups.
At baseline, women who did and did not start a sleep medication reported very similar levels of sleep disturbance (see table 2). In both groups, women reported difficulty initiating sleep on approximately one-third of nights, waking frequently on approximately two-thirds of nights and early morning awakenings on approximately one-third of nights of the week. More than 70% of both groups reported any sleep disturbance at least 3 times weekly.
After 1 year, there were slight reductions noted in women’s reports of all types of sleep disturbances, but none of the differences from baseline in either exposure group (medication users or non-users) was statistically significant (see figure 2). One-year reports of early morning awakenings appeared to be slightly lower on the Likert scale among women not using sleep medications (mean 2.5, 95% CI 2.3 to 2.6) compared with those who did (mean 2.8, 95% CI 2.6 to 3.0; p=0.02). The secondary 2-year outcomes were similar to the 1-year results; none demonstrated statistically significant reductions in sleep disturbances among sleep medication users.
Several secondary analyses were pursued. First, we examined the distribution of Likert scores at baseline and 1 year of follow-up in the two groups (see table 3). The distributions among medication users and non-users were similar at baseline and follow-up (all p values >0.10). We also examined whether the results differed by type of sleep medication, BZD versus selective BZD receptor agonists and other hypnotics (see table 4); no differences were observed in the change from baseline to 1 year for either sleep medication group compared with medication non-users. The BZD group was further examined after removing lorazepam, and we found similar results for all types of sleep disturbances. We also re-ran the analyses with the baseline characteristics defined at the visit prior to the start of medications to assess how sensitive the results were to possible imprecision in the timing of variable measurement. The results showed small improvements in early morning awakenings among the sleep medication group (see online supplemental table 4). Additional sensitivity analyses retained the five-level categorical Likert scale as the primary outcome and proportional odds analyses gave similar negative results (see table 3 and online supplemental table 5); all proportional odds assumptions were met. In analyses that only included the women reporting clinically significant weekly frequency of sleep disturbances at baseline (4 or 5 on the Likert scale), no differences were found between sleep medication users and non-users (see online supplemental figure 2). Finally, analyses that also included site and oestrogen use gave similar results (see online supplemental table 6).
Sleep difficulties are common.1 27 Not surprisingly, the use of sleep medications has also grown over the last two decades.2 These agents have a range of safety concerns5 and recent reports describe substantial driving impairments.28 Most data regarding their efficacy derive from short-term studies (ie, 2–12 weeks), but these agents appear to be used over the long-term by many patients. In this analysis of the long-term impact of sleep medications in a large longitudinal cohort of well-characterised middle-aged community-dwelling women with sleep disturbances, sleep medication use was not associated with reduced sleep disturbances.
When physicians or other clinicians prescribe these medicines, they often begin with short-term prescriptions, but many patients receiving these prescriptions become long-term users.9 In the SWAN cohort, 37% of women starting a medication for sleep reported using a sleep medication 1 year later. While there are good data from RCTs that these medications improve sleep disturbances in the short term,8 the results we present here represent some of the only data on these medications’ long-term impact on sleep. The lack of benefit observed in the current study suggests that when physicians begin prescribing these medicines they should discuss with patients that many patients continue them long-term, and that there is scant evidence demonstrating benefit to using these medicines beyond several months.6 7 In the study cohort, approximately half of the women were current or past tobacco users and 20% were moderate to heavy alcohol users. This was higher than expected and may reflect the demographic of women who endorse having a sleep disturbance.
A broader issue raised by this example is how clinicians should consider prescribing medications when their expected use differs substantially from the RCT evidence. Without evidence from RCTs demonstrating the benefit of a given type of drug in a given patient population using the drug for a similar duration, clinicians lack the necessary information to prescribe appropriately. Real-world data, or data from observational cohorts such as what we present here, provide important opportunities for looking at the way drugs may actually be used in typical practice. There has been an increasing appreciation for the use of observational data analysed appropriately to complement RCTs.10 The Food and Drug Administration has published a framework for generating evidence from real-world observational data sets,29 with the hope that such analyses will allow clinicians to better understand the benefits and risks of drugs in typical practice.
We used rigorous epidemiological methods and analysed a well-characterised cohort of women, but as with all observational studies there are limitations to recognise. The use of sleep medications was not randomised. Thus, even though the propensity score matched cohorts were very similar, there may be unmeasured confounding not accounted for in the analyses. These analyses were not predefined prior to establishing the SWAN cohort and should be considered post hoc and exploratory. Medication use was collected only at annual or biennial study visits, and there may have been intermittent use or non-adherence between visits. This is a limitation of many retrospective cohort medication analyses and limits the inferences that can be drawn. In the primary 1-year analysis, women were required to report use of a sleep medication at the subsequent annual visit in the new initiator group and to not report a sleep medication in the non-user group. In the secondary 2-year analysis, women who remained on drug accrued no benefit compared with women who never used a sleep medication. We did not update covariates in the 2-year analysis.
Sleep disturbances were self-reported, without any objective measures of sleep. This may have introduced misclassification; however, the outcomes were self-reported among both groups of women, limiting any potential bias. The outcome measure we used for sleep disturbances has been validated in prior studies,17 18 but never in SWAN participants. The five-level categorical Likert scale was primarily analysed as a continuous variable in the mixed regression models; however, analyses that retained the five categories gave similar negative results (see table 3 and online supplemental table 5). We do not have measures of daytime consequences in this dataset. It is also possible that sleep medications may have helped in the short-term, that is, at 8 or 12 weeks. Women only reported medication use and sleep disturbances at annual visits and thus interim outcomes (ie, at 6-month intervals) and intermittent medication use are not available for analysis. We did not include over-the-counter medication use and thus some non-users may actually have been using an over-the-counter hypnotic. We know that 11% of the women in this study reported use of an over-the-counter hypnotic at the baseline visit; slightly more women in the user group reported such use compared with the non-user group. Finally, some prescription sleep medications can be used for multiple indications, and some may have been prescribed for reasons other than sleep difficulties.
In addition to these limitations, several strengths of this study should be described. We examined a well-characterised cohort of women during a high-risk period for sleep disturbance. It is known that women in midlife often note sleep disturbances.30 We also studied women of several races and ethnicities, enhancing the generalisability of the results. The study design also allowed us to examine a well-balanced cohort with very similar baseline features after propensity score matching. However, unmeasured or residual confounding cannot be ruled out.
In conclusion, sleep disturbances are common and increasing in prevalence. The use of sleep medications has grown, and they are often used over a long period, despite the relative lack of evidence from RCTs. The current observational study does not support use of sleep medications over the long term, as there were no self-reported differences at 1 or 2 years of follow-up comparing sleep medication users with non-users. While we used rigorous epidemiological methods, the findings reported herein are based on a non-randomised observational dataset and must be seen in that light. It is also important to note that neither group reported more severe sleep disturbances over the study follow-up. Most patients, if not all, should have received cognitive behavioural therapy.31 While some small percentage of patients with sleep disturbances may receive benefit from using these medications over several years, the lack of benefit associated with use of sleep medications in the population studied after 1 and 2 years should help inform clinicians and patients considering initiating pharmacological treatment for midlife women who have sleep complaints.
Data are available on reasonable request. This is an NIH-funded study and data are accessible through appropriate channels.
This protocol was reviewed and approved at each participating SWAN site: University of Pittsburgh—REN15070236/IRB0709006; Massachusetts General Hospital—1999P006353; University of Michigan—00000245; Albert Einstein College of Medicine—2005-012; Rush University Medical Centre—13021201-IRB01-AM04; University of California, Davis—260339-17; UCLA—11-002274-AM-00009.