Analysis of rare coding variants in 470,000 exome-sequenced subjects characterises contributions to risk of type 2 diabetes

Abstract

Aims

To follow up results from an earlier study using an extended sample of 470,000 exome-sequenced subjects to identify genes associated with type 2 diabetes (T2D) and to characterise the distribution of rare variants in these genes.

Materials and methods

Exome sequence data for 470,000 UK Biobank participants was analysed using a combined phenotype for T2D obtained from diagnostic and prescription data. Gene-wise weighted burden analysis of rare coding variants in the new cohort of 270,000 samples was carried out for the 32 genes previously significant with uncorrected p < 0.001 along with 7 other genes previously implicated in T2D. Follow-up studies of GCK, GIGYF1, HNF1A and HNF4A used the full sample of 470,000 to investigate the effects of different categories of variant.

Results

No novel genes were identified as exome wide significant. Rare loss of function (LOF) variants in GCK exerted a very large effect on T2D risk but more common (though still very rare) nonsynonymous variants classified as probably damaging by PolyPhen on average approximately doubled risk. Rare variants in the other three genes also had large effects on risk.

Conclusions

In spite of the very large sample size, no novel genes are implicated. Coding variants with an identifiable effect are collectively too rare to be generally useful for guiding treatment choices for most patients. The finding that some nonsynonymous variants in GCK affect T2D risk is novel but not unexpected and does not have obvious practical implications. This research has been conducted using the UK Biobank Resource.

Introduction

A previous study carrying out gene-based weighted burden analysis of rare coding variants using 200,000 exome-sequenced UK Biobank participants identified three genes associated with type 2 diabetes (T2D) at exome-wide significance: GCK, HNF4A and GIGYF1 [1]. While GCK and HNF4A were already well-recognised as causes of maturity onset diabetes of the young (MODY), the implication of GIGYF1 was novel, though it was quickly confirmed in another study which had access to sequence data from 379,000 UK Biobank participants [2–4]. Although these three were the only genes which achieved exome-wide significance, a total of 32 genes were significant with an uncorrected p value < 0.001 whereas, given that there were 20,384 informative genes, only 20 would be expected by chance. Additionally, a number of these genes appeared to be of potential interest from a biological point of view. Of note, a number of other genes with well-established roles in T2D failed to produce strong evidence of association using the weighted burden analysis, namely HNF1A, HNF1B, ABCC8, INSR, MC4R, SLC30A8 and PAM.

Subsequently, rare variant analyses using multiple different phenotypes were carried out in larger numbers of exome sequenced participants from the same UK Biobank cohort and some of the phenotypes studied included T2D and related conditions [5, 6].

Exome sequence data for a full set of 470,000 participants has now been made more widely available and the current study carried out weighted burden analysis in the new samples of the genes significant at p < 0.001 in the previous study, along with the other T2D implicated genes mentioned above. This study aimed to test for evidence of association and to compare results with those obtained from the multiple phenotype studies referred to above, as well as to characterise the effects of different categories of coding variant on risk in implicated genes. The purpose of this study was to use the 270,000 newly available exomes to test whether some of the genes which had produced results which were not significant after correction for multiple testing in the earlier study might yield evidence for association with the new sample. Additionally, having the larger sample of 470,000 would mean that it would be possible to more accurately model the effects on disease risk of different categories of variant in the associated genes.

Materials and methods

The methods used were essentially the same as those described previously and are briefly repeated here for the reader’s convenience.

UK Biobank participants are volunteers intended to be broadly representative of the UK population and are not selected on the basis of having any health condition. UK Biobank had obtained ethics approval from the North West Multi-centre Research Ethics Committee which covers the UK (approval number: 11/NW/0382) and had obtained written informed consent from all participants. The UK Biobank approved an application for use of the data (ID 51119) and ethics approval for the analyses was obtained from the UCL Research Ethics Committee (11527/001). The data was accessed most recently on October 12 2023. There was no information which could be used to identify individual subjects. No subjects were minors. The UK Biobank Research Analysis Platform was used to access the Final Release Population level exome OQFE variants in PLINK format for 469,818 exomes which had been produced at the Regeneron Genetics Center using the protocols described here: https://dnanexus.gitbook.io/uk-biobank-rap/science-corner/whole-exome-sequencing-oqfe-protocol/protocol-for-processing-ukb-whole-exome-sequencing-data-sets [6]. All variants were then annotated using the standard software packages VEP, PolyPhen and SIFT [7–9]. To obtain population principal components reflecting ancestry, version 2.0 of plink (https://www.cog-genomics.org/plink/2.0/) was run with the options --maf 0.1 --pca 20 approx [10, 11].

The T2D phenotype was defined in the same way as previously and was determined from three sources in the dataset: self-reported diabetes or type 2 diabetes (but not type 1 or gestational diabetes); reporting taking any of a list of named medications commonly used to treat T2D in the UK (https://www.diabetes.co.uk/Diabetes-drugs.html); having an ICD10 code for non-insulin-dependent diabetes mellitus in hospital records or as a cause of death [1]. Subjects in any of these categories were deemed to be cases while all other subjects were taken to be controls. In the primary analyses to implicate specific genes, attention was restricted to participants not included in the earlier study, consisting of 19,701 cases and 249,581 controls. For the subsequent analyses using the whole sample there were 33,629 cases and 436,136 controls.
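The case definition above amounts to a logical OR across the three sources. The following is a minimal sketch of that rule, not the code used in the study. The `Participant` fields, the label strings and the two-item medication set are hypothetical placeholders chosen for illustration (the study used a longer list of named medications), and matching on the prefix E11 assumes that this ICD10 block corresponds to the non-insulin-dependent diabetes mellitus codes referred to in the text.

```python
from dataclasses import dataclass, field

# Illustrative placeholder only: the study used a longer list of named medications.
T2D_MEDICATIONS = {"metformin", "gliclazide"}

# Self-report labels counted as T2D; type 1 and gestational diabetes are not included.
SELF_REPORT_T2D = {"diabetes", "type 2 diabetes"}


@dataclass
class Participant:
    # Hypothetical field names, not UK Biobank field identifiers.
    self_reported: set = field(default_factory=set)
    medications: set = field(default_factory=set)
    icd10_codes: set = field(default_factory=set)  # hospital records or cause of death


def is_t2d_case(p: Participant) -> bool:
    """Return True if any one of the three sources indicates T2D."""
    by_self_report = bool(p.self_reported & SELF_REPORT_T2D)
    by_medication = bool(p.medications & T2D_MEDICATIONS)
    # Assumption: E11 is the ICD10 block for non-insulin-dependent diabetes mellitus.
    by_icd10 = any(code.startswith("E11") for code in p.icd10_codes)
    return by_self_report or by_medication or by_icd10
```

Under this rule every participant who matches none of the three sources is treated as a control, which mirrors the statement that all other subjects were taken to be controls.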

The SCOREASSOC program was used to carry out a weighted burden analysis to test whether, in each gene, sequence variants which were rarer and/or predicted to have more severe functional effects occurred more commonly in cases than controls [12–14]. Attention was restricted to rare variants with minor allele frequency (MAF) ≤ 0.01 in cases or controls or both. As previously described, variants were weighted by overall MAF so that variants with MAF ≥ 0.01 were given a weight of 1 while very rare variants with MAF close to zero were given a weight of 10. Variants were also weighted according to their functional annotation using the GENEVARASSOC program, which was used to generate the input files for weighted burden analysis by SCOREASSOC. Variants predicted to cause complete loss of function (LOF) of the gene were assigned a weight of 100. Nonsynonymous variants were assigned a weight of 5 but if PolyPhen annotated them as possibly or probably damaging then 5 or 10, respectively, was added to this and if SIFT annotated them as deleterious then 20 was added. The full set of weights and categories is displayed in Table 1 of the previous study [1]. The weighting scheme had been devised to be broadly concordant with the observed effects of variants of different annotations and allele frequencies, as detailed in an earlier report [15]. As described previously, the weight due to MAF and the weight due to functional annotation were multiplied together to provide an overall weight for each variant. Variants were excluded if there were more than 10% of genotypes missing in the controls and cases or if the heterozygote count was smaller than both homozygote counts in controls and cases. If a subject was not genotyped for a variant then they were assigned the subject-wise average score for that variant. For each subject a gene-wise weighted burden score was derived as the sum of the variant-wise weights, each multiplied by the number of alleles of the variant which the given subject possessed.
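The weighting scheme described above can be sketched as follows. This is an illustration only and not the SCOREASSOC or GENEVARASSOC implementation. The text states only the two end points of the MAF weight (1 at MAF 0.01, 10 as MAF approaches zero), so the linear interpolation between them is an assumption made here for simplicity. Only the LOF and nonsynonymous categories are covered, the weights for other categories being given in Table 1 of the previous study [1]. All function and argument names are hypothetical.

```python
MAF_CUTOFF = 0.01


def maf_weight(maf: float) -> float:
    """Weight of 10 as MAF approaches 0, falling to 1 at MAF >= 0.01.

    ASSUMPTION: linear interpolation between the two stated end points;
    the functional form actually used by SCOREASSOC may differ.
    """
    if maf >= MAF_CUTOFF:
        return 1.0
    return 10.0 - 9.0 * (maf / MAF_CUTOFF)


def functional_weight(kind: str, polyphen: str = "", sift_deleterious: bool = False) -> float:
    """Annotation weight for the two categories described in the text."""
    if kind == "lof":
        return 100.0
    if kind == "nonsynonymous":
        weight = 5.0
        # PolyPhen adds 5 for possibly damaging and 10 for probably damaging.
        weight += {"possibly_damaging": 5.0, "probably_damaging": 10.0}.get(polyphen, 0.0)
        if sift_deleterious:
            weight += 20.0  # SIFT deleterious adds 20
        return weight
    raise ValueError("category not covered by this sketch: " + kind)


def burden_score(variants, allele_counts) -> float:
    """Sum over variants of (MAF weight x functional weight x alleles carried).

    variants: list of (maf, functional_weight) pairs for one gene.
    allele_counts: number of alleles (0, 1 or 2) the subject carries per variant.
    """
    return sum(
        maf_weight(maf) * fw * n
        for (maf, fw), n in zip(variants, allele_counts)
    )
```

For example, under these assumptions a subject heterozygous for a singleton-like LOF variant (MAF near zero) would receive a contribution close to 10 × 100 = 1,000 from that variant, far more than from a nonsynonymous variant near the MAF cutoff.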

Table 1. Results of gene-wise weighted burden analysis of rare variants in the original sample of 200,000 participants, the new sample of 270,000 and, for genes of interest, in the combined sample of 470,000.

https://doi.org/10.1371/journal.pone.0311827.t001

Analyses were restricted to the 32 genes significant at p < 0.001 in the previous study along with the other 7 listed above as being previously implicated in T2D. For each gene, logistic regression analysis was carried out with T2D as the dependent variable including the first 20 population principal components and sex as covariates and a likelihood ratio test was performed comparing the likelihoods of the models with and without the gene-wise burden score. This is a test for association between the gene-wise burden score and caseness and the statistical significance was summarised as a signed log p value (SLP), which is the negative log base 10 of the p value, given a positive sign if the score is higher in cases and a negative sign if it is higher in controls. Since only 39 genes were analysed and each gene was subjected to a single test, in total only 39 tests were performed in the new samples. This means that after a Bonferroni correction for multiple testing a gene could be declared statistically significant if it achieved an SLP with absolute value greater than -log10(0.05/39) = 2.89 using the new samples.
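The conversion from a likelihood ratio test to an SLP, and the Bonferroni threshold on the SLP scale, can be sketched as below. This is a minimal illustration rather than the SCOREASSOC code. It assumes the two models differ by a single parameter (the burden score), so that the test statistic is referred to a chi-squared distribution with 1 degree of freedom, and the function names are hypothetical.

```python
import math


def lrt_p_value(loglik_without: float, loglik_with: float) -> float:
    """p value for a likelihood ratio test with 1 degree of freedom."""
    statistic = max(0.0, 2.0 * (loglik_with - loglik_without))
    # For 1 df the chi-squared upper tail probability equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


def signed_log_p(p_value: float, higher_in_cases: bool) -> float:
    """SLP: -log10(p), positive if the score is higher in cases, else negative."""
    magnitude = -math.log10(p_value)
    return magnitude if higher_in_cases else -magnitude


def bonferroni_slp_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Absolute SLP needed for significance after Bonferroni correction."""
    return -math.log10(alpha / n_tests)


print(round(bonferroni_slp_threshold(39), 2))  # 39 tests in the new samples
```

Calling `bonferroni_slp_threshold` with the number of genes tested gives the critical value directly, so the same function can be reused for any other number of tests.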

Follow-up analyses were performed on all genes individually achieving this significance level and also GIGYF1, because this gene had reached conventional levels of exome-wide statistical significance in the earlier study of this dataset. For this subset of genes the weighted burden analysis described above was repeated using the whole sample of 33,629 cases and 436,136 controls. Additionally, for each subject a count was obtained of the number of variants they carried falling into particular broad annotation categories, such as LOF, protein altering, etc. The full list of these categories is shown in S1 Table. These counts were entered into a multiple logistic regression analysis with T2D as the dependent variable and again including sex and 20 principal components as covariates in order to elucidate the contribution of different types of variant to the overall evidence for association. The odds ratios (ORs) associated with each category were estimated along with their standard errors and the Wald statistic was used to obtain a p value. This p value was converted to an SLP, again with the sign being positive if the OR was greater than 1, indicating that variants in that category tended to increase risk.
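The summary reported for each variant category can be derived from a fitted logistic regression coefficient and its standard error, as in the sketch below. It assumes the usual two-sided Wald test in which the coefficient divided by its standard error is referred to a standard normal distribution. The function name is hypothetical and any inputs passed to it here are made up for illustration; none are results from this study.

```python
import math


def wald_summary(beta: float, se: float) -> dict:
    """Summarise a logistic regression coefficient as OR, p value and SLP."""
    z = beta / se
    # Two-sided tail probability of a standard normal: P(|Z| > |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Guard against underflow of the p value for extremely large |z|.
    slp = math.inf if p_value == 0.0 else -math.log10(p_value)
    if beta < 0:
        slp = -slp  # negative sign when the category tends to reduce risk (OR < 1)
    return {"odds_ratio": math.exp(beta), "z": z, "p_value": p_value, "slp": slp}


# Hypothetical coefficient and standard error, for illustration only.
print(wald_summary(math.log(2.0), 0.2))
```

The sign convention matches the text: a positive coefficient corresponds to an OR greater than 1 and hence a positive SLP.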

Data manipulation and statistical analyses were performed using GENEVARASSOC, SCOREASSOC and R [13, 14, 16].

Results

Table 1 shows the results of the primary analysis, presenting the SLPs obtained in the previous study along with those obtained in the new sample. Of the genes showing evidence for association in the previous study, only GCK (SLP = 12.09) and HNF4A (SLP = 3.81) are formally significant after correction for multiple testing, while GIGYF1 yields SLP = 2.42 and none of the other genes previously with p < 0.001 shows evidence for association in the new sample. Of the 7 genes implicated in T2D in earlier studies, only HNF1A (SLP = 7.17) is formally statistically significant.

The four genes named above were carried forward for secondary analyses. The original study considered 20,384 genes, meaning that for a gene-wise result to be considered exome-wide significant the magnitude of the SLP obtained should exceed -log10(0.05/20384) = 5.61. For the four genes carried forward, the results of weighted burden analysis in the entire sample of 33,629 cases and 436,136 controls are also shown in Table 1 and it can be seen that all four of these genes produce results which would be regarded as exome-wide significant in the full sample.

In order to gain insights into the effects of different categories of variant within these four genes of interest, counts for variants of each category in each subject were entered into multiple logistic regression analysis along with sex and 20 principal components as covariates. These results are shown in Table 2 and are summarised briefly as follows.

Table 2. Results from logistic regression analysis including principal components and sex as covariates showing the contribution different categories of variant within each gene make to risk of T2D.

Odds ratios for each category are estimated and the strength of evidence for an effect is expressed as the SLP.

https://doi.org/10.1371/journal.pone.0311827.t002

Table 2A shows that LOF variants in GCK exert a substantial risk of T2D, with OR over 20, but that nonsynonymous variants classified as probably damaging by PolyPhen also increase risk, with OR estimated as 2.45. Variants in this latter category are observed 363 times in the sample of 470,000 participants, so occur in less than 1 in 1,000 people, whereas the LOF variants are rarer still, being seen only 43 times.

Table 2B shows that LOF variants in GIGYF1 are slightly commoner than in GCK, being seen 174 times, although they remain extremely rare. They have a more moderate effect on risk, with OR estimated as only 3.44. No other categories of variant have a clear effect on risk, though it is possible that variants classified as probably damaging by PolyPhen (SLP = 1.97) have a small effect (OR = 1.21).

Table 2C shows that LOF variants in HNF1A increase risk with OR = 4.88, but there may also be a modest effect of 5-prime UTR variants (SLP = 2.31, OR = 1.30) and/or variants classified as probably damaging by PolyPhen (SLP = 1.52, OR = 1.33).

Table 2D shows that LOF variants in HNF4A are extremely rare and do not have a detectable effect. Rather, the signal implicating this gene seems to come from variants classified as probably damaging by PolyPhen (SLP = 2.79, OR = 1.87) and two indel variants. These two variants consisted of 20:44428418GCCAACACAATGC>G (rs1349603952), observed in 4 controls and 2 cases, and 20:44424132A>AGCT (rs776489992), observed in 4 controls and 3 cases. Malacards lists 4 entries for rs776489992, with phenotypes MODY, MODY Type 1, T2D and Fanconi Renotubular Syndrome 4 with MODY (https://www.malacards.org/search/results?query=rs776489992). However there are no previous reports for rs1349603952.

Discussion

These analyses provide very strong support for GCK as a risk gene for T2D while three other previously identified genes also achieve conventional levels of significance: GIGYF1, HNF1A and HNF4A. However, no novel genes are implicated. As mentioned previously, this dataset has been used for analyses of multiple phenotypes including some relating to T2D, in what we can refer to as the Regeneron and AstraZeneca studies [5, 6]. The Regeneron study carried out a variety of single variant and gene-wise burden tests on 3,994 health-related traits to produce a total of about 2.3 billion tests, yielding a critical p value of 2.18e-11 (corresponding to SLP = 10.66), and reported 8,865 significant associations which are presented in their Supplementary Data 2 [6]. A total of 64 associations were reported between GCK and diabetes or related phenotypes, with the most significant being with glycated haemoglobin HbA1c at p = 4.98e-22, equivalent to SLP = 21.30, whereas in the current study GCK yields SLP = 32.11. GIGYF1 was associated with T2D at SLP = 12.34 and HNF1A with T2D at SLP = 12.58. However, no association with a diabetes-related phenotype was reported for HNF4A, although it was associated with levels of sex hormone binding globulin (SHBG) at SLP = 41.85. For the AstraZeneca study, all gene-wise and variant-wise associations with 17,361 binary and 1,419 quantitative phenotypes are reported on the AstraZeneca PheWAS Portal at https://azphewas.com [5]. This was accessed to find the most significant p value for any analysis of each of these genes with the phenotype "Union#E11#Type 2 diabetes mellitus" and Table 3 shows the results obtained compared with those for the current study. It can be seen that the current study again produces stronger evidence for association with GCK, with SLP = 32.11 versus SLP = 23.10 for the AstraZeneca study, whereas for the other three genes the strength of evidence for association is fairly similar between the two studies.
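Comparing results across studies relies on expressing published p values on the SLP scale. The short sketch below shows that conversion and its inverse for the two p values quoted above from the Regeneron study. The helper names are hypothetical, and only the magnitude is computed because the direction of effect has to be taken from the original reports.

```python
import math


def p_to_slp_magnitude(p_value: float) -> float:
    """Magnitude of the SLP corresponding to a p value: -log10(p)."""
    return -math.log10(p_value)


def slp_magnitude_to_p(slp: float) -> float:
    """Inverse conversion: p value corresponding to an absolute SLP."""
    return 10.0 ** (-abs(slp))


# p values quoted in the text from the Regeneron study.
quoted = {
    "critical p value": 2.18e-11,
    "GCK with HbA1c": 4.98e-22,
}
for label, p in quoted.items():
    print(f"{label}: p = {p:.2e}, SLP magnitude = {p_to_slp_magnitude(p):.2f}")
```

Working on this scale makes the differences easy to read: each unit of SLP corresponds to a tenfold change in the p value.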

Table 3. Comparison of results from current study to those reported for the AstraZeneca study.

The results for the AstraZeneca study are displayed as the equivalent SLP for the most significant result reported for that gene with the phenotype "Union#E11#Type 2 diabetes mellitus".

https://doi.org/10.1371/journal.pone.0311827.t003

The fact that the current study finds stronger evidence for association of GCK relative to the other analyses may reflect the fact that, for this gene, the pattern of effects due to different variant types does resemble the model which is assumed for the weighted burden analysis, with strong effects due to LOF variants and more moderate effects due to some nonsynonymous variants. However, for the other three genes this pattern is not seen and hence for them the weighted burden analysis does not have advantages over more conventional variant pooling analyses. In a subsequent study which used a wide variety of different methods to predict the effects of nonsynonymous variants, it was observed that other predictors would produce stronger evidence for effects of nonsynonymous variants in these genes [17]. However, using a variety of different predictors would require correction for multiple testing and so was not thought appropriate for the current study, which aimed simply to obtain evidence for association at the level of the gene.

It is of interest to note that the evidence in favour of the association with T2D risk is considerably higher for GCK than for the other genes, and likewise the effect size of implicated variants is larger. It is tempting to speculate that this relates to the molecular mechanisms underlying the observed association. The product of GCK, glucokinase, is a low-affinity hexose kinase which acts as the rate limiting enzyme for glycolysis in pancreatic islet cells, as well as in some hepatocytes and neurons, meaning that it can be used by these cells as an indicator of blood glucose levels [18, 19]. Thus, impaired functioning of glucokinase is expected to lead to reduced sensitivity to higher glucose levels and hence inadequate glycaemic control. By contrast, GIGYF1, HNF1A and HNF4A are involved with lower level cellular processes which have a less immediate impact in terms of producing diabetes as a phenotype. The product of GIGYF1 binds to Grb10, a protein which regulates the response to insulin-like growth factor receptor signalling, and it is associated with a number of different phenotypes in addition to T2D, including lipid-related phenotypes, education score, cognitive function and cystatin C levels [6, 20–22]. The products of HNF1A and HNF4A are transcription factors affecting the expression of large numbers of other genes and influencing development of the liver and pancreas [23]. Biallelic variants in HNF1A can cause hepatocellular adenomas, while variants in HNF4A can cause Fanconi renotubular syndrome and are associated with SHBG levels [6, 24, 25]. The fact that GCK contributes so directly to the control of glucose levels may explain in part why LOF variants in it have a larger effect on the T2D phenotype than for other genes.

The emphasis of the current study is to detect and characterise association at the level of the gene and of categories of variant within the gene, even though many of the variants concerned are too rare to be tested individually. However, it is recognised that within an associated category there will be some variants having an effect on risk and others which do not. When the same variant is observed in multiple individuals, it would be possible to attempt to model the individual effect of such a variant in terms of its estimated odds ratio or penetrance, as has been carried out using variants designated as pathogenic in these genes in a subsample of the UK Biobank dataset [26]. The availability of the AstraZeneca PheWAS Portal at https://azphewas.com means that for any such variant and any studied phenotype one can obtain the variant counts in controls and cases in order to estimate the odds ratio and/or penetrance.

It could be argued that the work presented here highlights some of the limitations as well as strengths of analysing rare coding variants identified in exome-sequencing studies of large population cohorts. Because of the high prevalence of T2D, many thousands of cases are available for study but, as the results show, only a small fraction of these cases carry a variant in a category which can be identified as impacting risk. A number of genes which had previously been implicated in targeted studies do not in the current study yield evidence at conventional levels of statistical significance after correction for multiple testing. Although T2D has a high prevalence, many other clinically important phenotypes have a substantial genetic contribution to risk but with a lower prevalence and there would be insufficient case numbers present in an unselected cohort for similar approaches to be likely to yield any convincing novel rare variant associations. In order to identify genes involved in such conditions it would be necessary to carry out studies involving specifically recruited cases, perhaps also focusing on those from densely affected families where large-effect variants may be active. To support such initiatives, it would be helpful to strengthen methods to incorporate existing samples as controls rather than requiring that a matching set of controls be recruited and sequenced for each new set of cases. Using existing samples as controls has been helpful in other sequencing studies but requires careful alignment of methodologies to minimise artefacts [27].

If adequately sized samples are used, exome sequencing studies can identify genes in which damaging variants have large effects on risk of particular phenotypes. The main value of such studies is to implicate specific genes, and hence their protein products, as impacting the phenotype. This may ultimately lead to a better understanding of the molecular pathways involved in pathogenesis. However, because for non-Mendelian diseases identifiable variants are only seen in a very small proportion of cases, typically fewer than 1%, such approaches seem unlikely to be helpful to guide individual treatment interventions in most situations. The vast majority of patients would not carry a variant which could be clearly identified as causal, and even if such a variant were encountered this might not automatically have clear implications for treatment choices. Taking the results obtained from the current study as an example, fewer than 1% of cases have a variant in one of the four identified genes which would be classified as having a pathogenic effect. A recent review provides an account of the variations in clinical course and responses to treatment in individuals carrying pathogenic variants in GCK, HNF1A or HNF4A and hence identifying these cases might provide some therapeutic benefit [28]. However for most patients with T2D, genetic screening would not be expected to produce actionable results.

Exome-sequencing studies to date, including the current one, now fairly consistently show that the category of variant having the highest identifiable impact on phenotype consists of those variants which are predicted to cause loss of function of the gene, or haploinsufficiency. This is not to say that individual variants in other categories might not have larger effects, and of course the literature is replete with examples of these. However, in a situation where individual variants are extremely rare, as is expected for those with large effect, it becomes necessary to pool variants together in some form of burden analysis and currently available methods for prediction of impact of non-LOF variants on the function of the gene and/or protein product are not able to reliably discriminate those which are pathogenic from those without major effect. If for a given gene-phenotype pair we can discover that LOF variants have a particular effect on increasing or decreasing risk then this may provide an important endpoint in terms of improved insight into the molecular pathways involved in pathogenesis. For example, this might be sufficient to flag up the protein as a possible drug target. However, it is possible to argue that additional useful information could be gained from more intensive investigations to elucidate the effects of other types of variant. For example, if one can find that non-synonymous variants affecting particular protein domains tend to show evidence of association this might yield a more sophisticated understanding of disease mechanisms which again might potentially be exploited therapeutically.

The present study confirms the role of four previously implicated genes in risk of T2D. It also demonstrates that nonsynonymous variants in GCK which PolyPhen annotates as probably damaging on average approximately double risk of T2D, although as these variants are still very rare this finding may not have much in the way of practical applications. The results show the distributions of different categories of variant in these genes in the general population. Overall, the study provides some insights into what can be achieved from the analysis of exome sequence data and into some limitations of such approaches.

Supporting information

S1 Table. The table shows the broad categories used for variant category specific analyses along with the annotations produced by VEP which were grouped into each category.

https://doi.org/10.1371/journal.pone.0311827.s001

(DOCX)

Acknowledgments

This research has been conducted using the UK Biobank Resource. The author wishes to acknowledge the staff supporting the High Performance Computing Cluster, Computer Science Department, University College London. The author wishes to thank the participants who volunteered for the UK Biobank project.

References

  1. Curtis D. Analysis of rare coding variants in 200,000 exome-sequenced subjects reveals novel genetic risk factors for type 2 diabetes. Diabetes Metab Res Rev [Internet]. 2021 [cited 2021 Sep 24]; Available from: https://pubmed.ncbi.nlm.nih.gov/34216101/. pmid:34216101
  2. Deaton AM, Parker MM, Ward LD, Flynn-Carroll AO, BonDurant L, Hinkle G, et al. Gene-level analysis of rare variants in 379,066 whole exome sequences identifies an association of GIGYF1 loss of function with type 2 diabetes. Sci Rep [Internet]. 2021 Nov 3 [cited 2022 May 25];11(1):21565. Available from: https://pubmed.ncbi.nlm.nih.gov/34732801/. pmid:34732801
  3. Bishay RH, Greenfield JR. A review of maturity onset diabetes of the young (MODY) and challenges in the management of glucokinase‐MODY. Medical Journal of Australia [Internet]. 2016 Nov 21 [cited 2021 Jan 6];205(10):480–5. Available from: https://onlinelibrary.wiley.com/doi/abs/10.5694/mja16.00458. pmid:27852188
  4. Naylor R, Johnson AK, del Gaudio D. Maturity-Onset Diabetes of the Young Overview. In: Adam M, Ardinger H, Pagon R, Wallace S, Bean L, Stephens K, et al., editors. GeneReviews [Internet]. Seattle (WA): University of Washington, Seattle; 2018 [cited 2021 Jan 6]. Available from: https://www.ncbi.nlm.nih.gov/books/NBK500456/.
  5. Wang Q, Dhindsa RS, Carss K, Harper AR, Nag A, Tachmazidou I, et al. Rare variant contribution to human disease in 281,104 UK Biobank exomes. Nature 2021 597:7877 [Internet]. 2021 Aug 10 [cited 2022 Mar 18];597(7877):527–32. Available from: https://www.nature.com/articles/s41586-021-03855-y.
  6. Backman JD, Li AH, Marcketta A, Sun D, Mbatchou J, Kessler MD, et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature [Internet]. 2021 Nov 25 [cited 2023 Aug 30];599(7886):628–34. Available from: https://pubmed.ncbi.nlm.nih.gov/34662886/. pmid:34662886
  7. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol [Internet]. 2016 Jun 6 [cited 2017 May 9];17(1):122. Available from: http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0974-4 pmid:27268795
  8. Adzhubei I, Jordan DM, Sunyaev SR. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet [Internet]. 2013 Jan [cited 2017 May 17];7 Unit7.20. Available from: http://www.ncbi.nlm.nih.gov/pubmed/23315928. pmid:23315928
  9. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc [Internet]. 2009 Jun 25 [cited 2017 May 17];4(8):1073–81. Available from: http://www.ncbi.nlm.nih.gov/pubmed/19561590. pmid:19561590
  10. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience [Internet]. 2015 Dec 25 [cited 2017 Sep 19];4(1):7. Available from: https://academic.oup.com/gigascience/article-lookup/doi/10.1186/s13742-015-0047-8.
  11. Galinsky KJ, Bhatia G, Loh PR, Georgiev S, Mukherjee S, Patterson NJ, et al. Fast Principal-Component Analysis Reveals Convergent Evolution of ADH1B in Europe and East Asia. Am J Hum Genet [Internet]. 2016 Mar 3 [cited 2020 Dec 14];98(3):456–72. Available from: https://pubmed.ncbi.nlm.nih.gov/26924531/. pmid:26924531
  12. Curtis D. A rapid method for combined analysis of common and rare variants at the level of a region, gene, or pathway. Adv Appl Bioinform Chem. 2012;5:1–9. pmid:22888262
  13. Curtis D. Pathway analysis of whole exome sequence data provides further support for the involvement of histone modification in the aetiology of schizophrenia. Psychiatr Genet [Internet]. 2016;26:223–7. Available from: http://content.wkhealth.com/linkback/openurl?sid=WKPTLP:landingpage&an=00041444-900000000-99634. pmid:26981879
  14. Curtis D. Multiple Linear Regression Allows Weighted Burden Analysis of Rare Coding Variants in an Ethnically Heterogeneous Population. Hum Hered [Internet]. 2020 Jan 7 [cited 2021 Jan 8];1–10. Available from: https://www.karger.com/Article/FullText/512576.
  15. Curtis D. Exploration of weighting schemes based on allele frequency and annotation for weighted burden association analysis of complex phenotypes. Gene [Internet]. 2022 Jan 30 [cited 2023 Aug 23];809. Available from: https://pubmed.ncbi.nlm.nih.gov/34688815/. pmid:34688815
  16. R Core Team. R: A language and environment for statistical computing. [Internet]. Vienna, Austria.: R Foundation for Statistical Computing; 2014. Available from: http://www.r-project.org.
  17. Curtis D. Assessment of ability of AlphaMissense to identify variants affecting susceptibility to common disease. European Journal of Human Genetics 2024 [Internet]. 2024 Aug 3 [cited 2024 Aug 22];1–9. Available from: https://www.nature.com/articles/s41431-024-01675-y. pmid:39097650
  18. Ogunnowo-Bada EO, Heeley N, Brochard L, Evans ML. Brain glucose sensing, glucokinase and neural control of metabolism and islet function. Diabetes Obes Metab [Internet]. 2014 [cited 2023 Oct 20];16 Suppl 1(Suppl 1):26–32. Available from: https://pubmed.ncbi.nlm.nih.gov/25200293/. pmid:25200293
  19. McCrimmon RJ. Remembrance of things past: The consequences of recurrent hypoglycaemia in diabetes. Diabet Med [Internet]. 2022 Dec 1 [cited 2023 Oct 20];39(12). Available from: https://pubmed.ncbi.nlm.nih.gov/36251572/. pmid:36251572
  20. Jurgens SJ, Choi SH, Morrill VN, Chaffin M, Pirruccello JP, Halford JL, et al. Analysis of rare genetic variation underlying cardiometabolic diseases and traits among 200,000 individuals in the UK Biobank. Nat Genet [Internet]. 2022 Mar 1 [cited 2023 Oct 20];54(3):240–50. Available from: https://pubmed.ncbi.nlm.nih.gov/35177841/. pmid:35177841
  21. Giovannone B, Lee E, Laviola L, Giorgino F, Cleveland KA, Smith RJ. Two novel proteins that are linked to insulin-like growth factor (IGF-I) receptors by the Grb10 adapter and modulate IGF-I signaling. Journal of Biological Chemistry [Internet]. 2003 Aug 22 [cited 2021 Jan 6];278(34):31564–73. Available from: https://pubmed.ncbi.nlm.nih.gov/12771153/. pmid:12771153
  22. Chen CY, Tian R, Ge T, Lam M, Sanchez-Andrade G, Singh T, et al. The impact of rare protein coding genetic variation on adult cognitive function. Nat Genet. 2023 Jun 1;55(6):927–38. pmid:37231097
  23. Xue D, Narisu N, Taylor DL, Zhang M, Grenko C, Taylor HJ, et al. Functional interrogation of twenty type 2 diabetes-associated genes using isogenic human embryonic stem cell-derived β-like cells. Cell Metab [Internet]. 2023 Oct [cited 2023 Oct 20]; Available from: https://pubmed.ncbi.nlm.nih.gov/37858332/.
  24. Bioulac-Sage P, Sempoux C, Balabaud C. Hepatocellular adenoma: Classification, variants and clinical relevance. Semin Diagn Pathol. 2017 Mar 1;34(2):112–25. pmid:28131467
  25. Lemaire M. Novel Fanconi renotubular syndromes provide insights in proximal tubule pathophysiology. Am J Physiol Renal Physiol [Internet]. 2021 Feb 1 [cited 2023 Oct 20];320(2):F145–60. Available from: https://pubmed.ncbi.nlm.nih.gov/33283647/. pmid:33283647
  26. Mirshahi UL, Colclough K, Wright CF, Wood AR, Beaumont RN, Tyrrell J, et al. Reduced penetrance of MODY-associated HNF1A/HNF4A variants but not GCK variants in clinically unselected cohorts. Am J Hum Genet [Internet]. 2022 Nov 3 [cited 2024 Aug 22];109(11):2018–28. Available from: https://pubmed.ncbi.nlm.nih.gov/36257325/. pmid:36257325
  27. Singh T, The Schizophrenia Exome Meta-Analysis (SCHEMA) Consortium. Exome sequencing identifies rare coding variants in 10 genes which confer substantial risk for schizophrenia. Nature [Internet]. 2022; Available from: https://doi.org/10.1038/s41586-022-04556-w.
  28. Sharma M, Maurya K, Nautiyal A, Chitme HR. Monogenic Diabetes: A Comprehensive Overview and Therapeutic Management of Subtypes of Mody. Endocr Res [Internet]. 2024 [cited 2024 Aug 21]; Available from: https://pubmed.ncbi.nlm.nih.gov/39106207/. pmid:39106207