Updating SNP heritability results from 4,236 phenotypes in UK Biobank

We’re excited to announce that updated UK Biobank SNP-heritability results are now available for the current Neale Lab GWAS release. Read on below for a quick summary of what has changed since the last release, and see the links at the bottom of this post for accessing the results. If you’re looking for a refresher on what SNP heritability is and why we’re estimating it, you can find our previous blog series on heritability on the Neale Lab UK Biobank landing page.

Background

Almost two years ago, we released estimates of SNP-heritability for 2,419 phenotypes in UK Biobank to accompany the GWAS results from the Neale Lab. Last year, the Neale Lab released a second round of GWAS results, expanding both the sample size and the number of phenotypes, as well as refining QC and the association model. An additional 31 biomarker phenotypes and meta-data fixes were added to that GWAS release this summer. It’s our pleasure to now release a corresponding update of the SNP-heritability results to cover those new GWAS results and refine the LD score regression (LDSR) analysis.

New Phenotypes, Refreshed GWAS

With the inclusion of the Round 2 GWAS results, including the additional biomarker data, the total number of available phenotypes has nearly doubled from 2,419 to 4,236 (4,178 of which are unique). In addition to biomarkers, these added phenotypes include a large number of items related to diet, cognition, mental health, and occupations, which were all part of follow-up surveys conducted by UK Biobank.

The Round 2 GWAS also increased sample size by using a more inclusive definition of European ancestry when selecting which participants to include in the GWAS. This change yielded a GWAS sample size of 361,194, compared to 337,199 in the Round 1 GWAS, increasing power for the SNP-heritability analyses.

The phenotypes previously included in the Round 1 release of GWAS and SNP-heritability results were also updated with new GWAS in the larger sample and with an updated association model. Most notably for the LDSR analysis, the new association model includes additional principal component (PC) covariates and uses PCs estimated within the GWAS sample. We find that these changes are indeed beneficial for controlling stratification (see details here).

Lastly, in addition to the primary GWAS of each phenotype, the Round 2 GWAS results include GWAS split by sex and GWAS of both raw and rank-normalized versions of continuous phenotypes. Although we’re mostly focused on SNP-heritability of the primary GWAS of each phenotype in this release (i.e. the combined sex analysis unless the phenotype is sex-specific), LDSR results are available for all 11,685 GWAS covered by these variations.

Standardization of continuous phenotypes

The Round 2 GWAS of both the raw and rank-normalized versions of 305 continuous phenotypes has allowed us to evaluate the impact of this transformation on SNP-heritability results. (By comparison, Round 1 results used rank-normalization for all continuous phenotypes.) As we detail on the heritability results site, we find that the impact of this transformation on the LDSR results is generally limited, but that on average we observe slightly stronger SNP-heritability results when the phenotype is rank-normalized for the GWAS rather than analyzed on the raw scale of the phenotype recorded by UK Biobank. On that basis, our updated SNP-heritability results browser continues to report the results for the rank-normalized (IRNT, inverse rank-normal transformed) version of continuous phenotypes as the primary LDSR result.

We observe slightly stronger SNP-heritability results when the phenotype is rank-normalized for the GWAS rather than analyzed on the raw scale of the phenotype recorded by UK Biobank. Screenshot from: https://nealelab.github.io/UKBB_ldsc/select_topline.html#continuous_phenotype_normalization

Assessing confidence in LDSR results

In the Round 1 release of SNP-heritability results we provided some general warnings about the stability of LDSR results at low sample sizes (in part due to concerns about the possibility of bias in SNP-heritability estimates when the effective sample size was small), but otherwise made no indication of how confident we were of a given result or whether it should be viewed as significant. Perhaps predictably, we then received multiple requests to provide a list of the “significantly heritable” phenotypes to be used for downstream analyses. Therefore, for this release we have spent more time evaluating both 1) which features lend us confidence in LDSR results and 2) which significance thresholds to use for reporting the full set of results.

As detailed on the LDSR SNP-heritability results site, we evaluate confidence in the LDSR results for the primary GWAS of each of the 4,178 unique UK Biobank phenotypes. Our confidence ratings depend on the effective sample size of the GWAS, the size of the standard error on the SNP-heritability estimate, phenotypic sex bias in the GWAS sample, and the interpretability of the PHESANT encoding of levels for ordinal phenotypes. Based on these factors, we rate results as high (805 phenotypes), medium (372 phenotypes), low (703 phenotypes), or no confidence (2,298 phenotypes).

The primary factor affecting our confidence rating is effective sample size, with low confidence requiring an effective N above 4,500, and high confidence requiring an effective N above 40,000. The Round 2 results suggest that the risk of downward bias in LDSR SNP-heritability estimates at low effective N is more nuanced than expected from Round 1, including some dependence on the LDSR intercept, but nevertheless results at low sample sizes are unstable. The floor of effective N > 4,500 for low confidence is based on an approximate estimate of the minimum sample size required to have meaningful statistical power for LDSR to detect realistic SNP-heritability values.
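The sample-size part of this rating can be sketched in a few lines. This is a minimal illustration, not the code behind the released ratings. It assumes the common case/control definition of effective sample size, 4 / (1/N_cases + 1/N_controls), and the names `effective_n` and `sample_size_tier` are hypothetical. Only the 4,500 and 40,000 cutoffs come from the text above; the tier labels are placeholders, since the cutoff separating low from medium confidence is not stated here.

```python
LOW_CONFIDENCE_FLOOR = 4_500    # effective N required for at least low confidence
HIGH_CONFIDENCE_FLOOR = 40_000  # effective N required for high confidence


def effective_n(n_cases: int, n_controls: int) -> float:
    """Effective sample size of a case/control GWAS (assumed definition).

    For a continuous phenotype the effective N is simply the sample size.
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("case and control counts must be positive")
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def sample_size_tier(n_eff: float) -> str:
    """Classify effective N against the two cutoffs quoted in the post."""
    if n_eff > HIGH_CONFIDENCE_FLOOR:
        return "meets high-confidence cutoff"
    if n_eff > LOW_CONFIDENCE_FLOOR:
        return "meets low-confidence floor"
    return "below floor"


# Example: a binary phenotype with 2,000 cases in the Round 2 sample of 361,194.
n_eff = effective_n(2_000, 361_194 - 2_000)
print(round(n_eff), sample_size_tier(n_eff))
```

Note how quickly effective N falls below the total sample size for rare binary phenotypes: it is driven almost entirely by the smaller group. The other factors described in this section (standard error, sex bias, ordinal coding) can still lower the final rating.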

Second, we reduce our confidence rating for phenotypes when the jackknife standard error of the SNP-heritability estimate is substantially higher than would be expected based on sample size. This tends to identify phenotypes with a few loci of extremely strong effect (e.g. SNPs with p-values on the order of 1e-5000 or smaller in UK Biobank) that don’t fit well with LDSR’s preference for highly polygenic traits and thus destabilize estimation.
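To make the idea of a jackknife standard error concrete, here is a generic delete-one-block jackknife. It is a sketch of the general technique, not the LDSC implementation: the function name `block_jackknife_se` is hypothetical, and a simple mean stands in for the regression that LDSR actually fits. The estimate is recomputed with each block left out in turn, and the spread of those leave-one-out estimates gives the standard error.

```python
import math
from statistics import mean


def block_jackknife_se(blocks, estimator):
    """Delete-one-block jackknife standard error.

    blocks:    list of lists of observations (e.g. contiguous chunks of data)
    estimator: function mapping a flat list of observations to a number
    """
    g = len(blocks)
    if g < 2:
        raise ValueError("need at least two blocks")

    # Re-estimate with each block removed in turn.
    leave_one_out = []
    for i in range(g):
        kept = [x for j, block in enumerate(blocks) if j != i for x in block]
        leave_one_out.append(estimator(kept))

    # Jackknife variance: (g - 1) / g * sum of squared deviations.
    center = mean(leave_one_out)
    variance = (g - 1) / g * sum((t - center) ** 2 for t in leave_one_out)
    return math.sqrt(variance)


# Toy data: three similar blocks and one block with much larger values.
blocks = [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0], [10.0, 12.0]]
print(block_jackknife_se(blocks, mean))
print(block_jackknife_se(blocks[:3], mean))  # much smaller without the outlying block
```

In the toy data a single outlying block dominates the standard error. That is the same intuition as above: when a few blocks contribute disproportionately to the estimate, leaving them out changes the result a lot and the jackknife standard error grows.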

Lastly, we lower our confidence rating for phenotypes with strong sex bias or with ordinal response codings that may not fully reflect an underlying numeric scale, reflecting the potential of those phenotypic features to influence the interpretability of the SNP heritability results. Although the LDSR results for those phenotypes may be statistically stable, they have the potential to be misleading if the impact of phenotyping isn’t carefully considered. (Note this is true for all UK Biobank phenotypes, but the concerns in this case exist beyond the face value of the item in UK Biobank.)

Defining a set of significant SNP-heritability results

For evaluating the significance of the SNP-heritability results, we consider a range of possible significance thresholds that could be appropriate based on whether hypothesis testing is done in all phenotypes vs. only high confidence phenotypes and whether multiple testing should be accounted for via Bonferroni correction or by trying to estimate the effective number of tests for correlated phenotypes. (Details are on the heritability result site here.)
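As a rough illustration of the Bonferroni option, the snippet below computes the corrected p-value threshold for two of the test sets mentioned above and converts each to a one-sided z-score. It is a sketch under assumptions: the test counts (4,178 unique phenotypes, 805 high confidence phenotypes) are taken from this post, but whether they match the exact counts used on the results site is assumed, and the helper names are hypothetical. The effective-number-of-tests approach is not shown because it depends on the phenotypic correlation structure.

```python
from statistics import NormalDist


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value threshold controlling family-wise error at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def one_sided_z(p: float) -> float:
    """z-score whose upper-tail probability under a standard normal is p."""
    return -NormalDist().inv_cdf(p)


for label, n_tests in [("all unique phenotypes", 4178),
                       ("high confidence only", 805)]:
    p = bonferroni_threshold(n_tests)
    print(f"{label}: p < {p:.3g}, z > {one_sided_z(p):.2f}")
```

Expressing each threshold as a z-score makes it easy to compare against the z-score of a given SNP-heritability estimate (the estimate divided by its standard error).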

Screenshot from: https://nealelab.github.io/UKBB_ldsc/significance.html#distribution_of_(h^2_g)_results

Comparing the resulting thresholds, we find that these choices have a limited impact on how many phenotypes reach significance. We find that for most choices of hypothesis test and multiple testing correction the resulting significance threshold is well approximated (slightly conservatively) by z > 4 (p < 3.17e-5), the threshold recommended previously as the rule of thumb for whether follow-up estimation of genetic correlation in LDSR is viable. We therefore report phenotypes at this threshold, as well as at nominal significance (p < 0.05) and at the stricter z > 7 (p < 1.28e-12) recommended as a rule of thumb for considering partitioned heritability results from LDSR.
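The p-values quoted for these z-score thresholds are one-sided tail probabilities of the standard normal distribution. A minimal check using only the Python standard library (the function name `one_sided_p` is just for illustration):

```python
import math


def one_sided_p(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for z in (4, 7):
    print(f"z > {z}: p < {one_sided_p(z):.3g}")
```

This reproduces the 3.17e-5 and 1.28e-12 values given above. A two-sided test at the same z-scores would double both p-values.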

Based on these thresholds, among the primary GWAS for each of the 4,178 unique phenotypes we find 703 phenotypes with significant (z > 4) SNP-heritability estimates and at least medium confidence in the results, with 453 of those reaching the higher z > 7 threshold.

Results availability

As with the LDSR results from Round 1, we’re making these results fully available for browsing and download, including:

A browser with primary SNP-heritability results for each of the 4,178 unique phenotypes
More details on our processing of these results, including deduping phenotypes, assessing confidence, and evaluating significance
Pages with more detailed results for each phenotype, including enrichments from partitioned heritability for phenotypes with SNP-heritability z > 7 (e.g. Height)
Downloads of the full partitioned heritability results for all 11,685 GWAS, including sex specific and raw vs. IRNT versions of the analyses
LDSC sumstats files, extracted from the full GWAS results and formatted for easy use in your own LDSR analyses
Our code for running the LDSR analyses (and for creating the browser site)

We’ve also released estimates of the genetic correlation between all of the phenotypes identified as significantly heritable by this analysis, and you should now see heritability results accompanying the Manhattan plots from @sbotgwa on Twitter each day!

If you have any questions about those results, feel free to reach out to us at nealelab.ukb@gmail.com and we’ll get back to you as soon as we can. Thanks for your interest, and happy browsing!

Authored by Raymond Walters, with support from Daniel Howrigan, and thanks to the whole Neale Lab UK Biobank heritability and GWAS team.