
Determination of the Variability and Associated Epigenetic Signature of Tandem Repeats by Single Molecule Sequencing

Publication date: 2018-12-21

Author:

Ardui, S
Vermeesch, J ; Matthijs, G

Abstract:

Around 1.5 million short tandem repeats (STRs) are spread across the entire human genome. STRs are functionally important elements that are able to modulate the phenotype of an individual. They can modify cellular biology by influencing the genome, transcriptome and proteome of a cell. The most extreme examples of the functional impact of STRs are the more than 40 repeat expansion disorders, such as fragile X syndrome (FXS) and myotonic dystrophy type 1 (DM1). Up to now, many aspects of STRs remain elusive. Due to a historical underestimation of their importance and the lack of adequate technologies, they remain understudied. The rise of Single Molecule Real-Time (SMRT) sequencing from Pacific Biosciences changes this paradigm and arms researchers with better tools to investigate STRs. In this thesis we studied different aspects of STRs, thereby exploring different assets of SMRT sequencing. SMRT sequencing can span long, GC-rich repeats, whilst simultaneously revealing DNA modifications in the sequenced region. Unfortunately, the throughput of the technology is limited, making it economically unfeasible to sequence an entire genome if one is only interested in a single locus or a subset of the genome. Therefore, enrichment strategies like PCR are commonly used. These strategies are nevertheless very error-prone, especially when amplifying repeats. Furthermore, they remove all epigenetic marks. Thus, amplification impedes the complete genetic and epigenetic characterization of STRs. To tackle this, we developed a CRISPR-Cas9 approach to excise the FMR1 CGG repeat in combination with restriction enzymes to remove off-target genomic DNA. This generated a very accurate picture of the FMR1 CGG repeat variability of the BAC molecule and made it possible to identify DNA methylation. Indeed, besides avoiding amplification biases, this method permits native DNA capture and, hence, allows for direct detection of base modifications.
On human DNA, enrichment factors of over 100-fold were achieved, while up to 5 reads covering the FMR1 CGG repeat could be retrieved from one SMRT cell. Although further improvements are necessary to allow widespread implementation, this method has the potential to significantly further unravel the complex genotype-phenotype correlations in FXS. In addition, it could be used to screen for long, methylated CGG alleles in diagnostics, where it could replace complicated and laborious Southern blots. FXS arises when an FMR1 CGG premutation (55-200 repeats) expands to a full mutation allele (>200 repeats) in females. This type of expansion is the most frequent cause of inherited X-linked intellectual disability. The risk for a premutation to expand to a full mutation allele depends on the repeat length and on the AGG triplets interrupting this repeat. Therefore, it is necessary to map these AGG interruptions in order to study the stability of the FMR1 allele. Additionally, easy access to accurate size estimates and AGG information is of great importance in genetic counseling, since they allow women carrying a premutation allele to estimate the risk of expansion. Unfortunately, the detection of AGG interruptions is hampered by technical difficulties. We demonstrated that single-molecule sequencing enables the determination of not only the repeat size, but also of the complete repeat sequence, including AGG interruptions, in male and female alleles. This approach outperforms current strategies because it allows for an unambiguous separation of the normal allele from the expanded one. This permits the determination of the repeat structure for each allele in every male or female. Hence, we implemented SMRT sequencing as a diagnostic tool to identify AGG interruptions in females with an FMR1 premutation. By doing so, we improved the risk assessments for genetic counseling and positively impacted the management of the disorder.
Beyond diagnostic use, single-molecule sequencing will also facilitate large-scale studies assessing the influence of AGG interruptions on the stability of the CGG repeat. We have already performed a proof-of-principle study to investigate the influence of AGGs on the stability of intermediate FMR1 CGG alleles (45-54 repeat units). SMRT sequencing was also explored to study the DMPK CTG repeat underlying DM1. Firstly, the variability of long CTG repeats was determined by small-pool PCR followed by long-read sequencing. This approach resulted in a higher accuracy, higher throughput and less hands-on time compared to Southern blots. Therefore, this methodology is now used to study the influence of the mismatch repair system on DM1 repeat instability. Undoubtedly, this approach will be implemented more broadly in the future. In addition, long-read sequencing was used to assess the efficiency of CRISPR-Cas9 excision of the DMPK CTG repeat region. To our knowledge, this is the first study tackling the DMPK CTG repeat by single-molecule long-read sequencing. Ultimately, a targeted amplification-free enrichment method for the DMPK CTG repeat would remove the need for PCR completely and could further improve the analysis of this repeat. To conclude, SMRT sequencing is a powerful tool that advances STR research and diagnostics. In this thesis, novel methodologies were developed to make maximal use of the advantages of SMRT sequencing (high accuracy, long reads and detection of base modifications). It will be interesting to see how novel methodologies employing long-read sequencing, developed in this thesis and by other research groups, will move the STR field ahead in the future.
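
The allele size classes and AGG interruptions mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the analysis pipeline used in the thesis. The function names `parse_repeat` and `classify_allele` are hypothetical, and the sketch assumes an error-free, in-frame repeat tract that has already been extracted from a read. It also assumes that AGG triplets count toward the total repeat length and that alleles below 45 repeats are labelled "normal"; the abstract states neither explicitly.

```python
def parse_repeat(tract):
    """Return (total triplet count, 1-based positions of AGG interruptions).

    Assumes an error-free tract made only of CGG and AGG triplets,
    already trimmed of flanking sequence (an idealisation of real reads).
    """
    tract = tract.upper()
    if len(tract) % 3 != 0:
        raise ValueError("tract length is not a multiple of 3")
    triplets = [tract[i:i + 3] for i in range(0, len(tract), 3)]
    for triplet in triplets:
        if triplet not in ("CGG", "AGG"):
            raise ValueError(f"unexpected triplet: {triplet}")
    agg_positions = [i + 1 for i, t in enumerate(triplets) if t == "AGG"]
    return len(triplets), agg_positions


def classify_allele(total_repeats):
    """Map a repeat count to the size classes given in the abstract."""
    if total_repeats > 200:
        return "full mutation"   # >200 repeats
    if total_repeats >= 55:
        return "premutation"     # 55-200 repeats
    if total_repeats >= 45:
        return "intermediate"    # 45-54 repeats
    return "normal"              # assumed label for <45 repeats


# Hypothetical allele: 60 triplets with AGG at positions 10 and 20.
example = "CGG" * 9 + "AGG" + "CGG" * 9 + "AGG" + "CGG" * 40
total, aggs = parse_repeat(example)
print(total, aggs, classify_allele(total))
```

Real single-molecule reads contain sequencing errors, so a practical implementation would need consensus calling or error-tolerant matching before a step like this.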