An analysis of GC-content in vertebrate protein coding genes and its implications on nuclear export, recombination, transcription, and the birth of new genes
dc.contributor.advisor | Palazzo, Alexander F | |
dc.contributor.author | Kang, Yoon Mo | |
dc.contributor.department | Biochemistry | |
dc.date | 2024-11 | |
dc.date.accepted | 2024-11 | |
dc.date.accessioned | 2024-11-13T19:26:46Z | |
dc.date.available | 2024-11-13T19:26:46Z | |
dc.date.convocation | 2024-11 | |
dc.date.issued | 2024-11 | |
dc.description.abstract | The nucleus plays a critical role in segregating transcription from translation, ensuring the quality control of gene expression. This compartmentalization allows mature mRNAs to be exported from the nucleus while undesired transcripts and misprocessed RNAs are retained and degraded. While intron splicing is known to recruit nuclear export factors, we and others have observed that high GC-content at the 5' ends of transcripts (~62%) also seems to direct efficient nuclear export. While the selective advantage of efficient nuclear export likely contributes to the elevation of GC-content at the 5' ends of protein-coding genes, our investigation suggests the involvement of non-adaptive forces as well. One of these forces is meiotic recombination. Due to the inherent GC-bias of the mismatch repair pathway, GC-alleles in heteroduplexes formed during meiotic recombination are more likely to spread than AT-alleles in a process called GC-biased gene conversion. When I compared the GC-contents of genes undergoing frequent recombination to genes with low rates of recombination, I observed that high GC-content positively correlates with higher rates of recombination. Furthermore, genes that do not undergo recombination on the hemizygous sex chromosomes (Y or W chromosomes) have a significant reduction in GC-content. However, the 5' peak in GC-content in these genes is not completely abolished, implicating the influence of evolutionary forces other than selection and current recombination. Unexpectedly, measurements of substitution rates in primates and rodents revealed that the 5' GC-peaks are currently in a state of decay. Comparisons with changes in GC-content in GC-matched intergenic sequences revealed that this decay is comparable to the decay rate expected by neutral evolution.
My data, along with recently published work from other groups, suggest that the 5' GC-peaks are decaying back to the genomic equilibrium of ~41% because recombination is currently directed away from transcription start sites in primates and rodents by the protein PRDM9. In support of this theory, in the canid lineage, where PRDM9 became a pseudogene and recombination still occurs at transcription start sites, GC-content continues to increase at the 5' end. Other processes such as transcription may also influence the way GC-content evolves. We observe positive correlations among the expression levels of human protein-coding genes (obtained by ordering FPKM values from ProteinAtlas), their RNA polymerase II occupancy (obtained through peak calling on RNA polymerase II ChIP-seq data), and their GC-content. I suspect high GC-content at the 5' ends of protein-coding genes evolved largely due to recombination and transcription, and that the cell was able to co-opt this feature for functional purposes. High GC-content may be used as an mRNA identity feature for nuclear export. This is supported by our observation that, when we depleted TPR, a component of the splicing-independent nuclear export pathway, mRNA transcripts whose nuclear export was affected by this depletion tended to be more GC-rich than unaffected mRNAs. GC-content may also be used to define exonic and intronic boundaries. When we examined introns that frequently experience premature cleavage and intronic polyadenylation, we noticed that these frequently misprocessed introns had, on average, higher GC-content than introns that are generally properly processed. Additionally, we noticed that low GC-content may act as a signal for transcription termination. | |
dc.description.degree | M.Sc. | |
dc.identifier.uri | http://hdl.handle.net/1807/141310 | |
dc.subject | GC-content | |
dc.subject | Nuclear Export | |
dc.subject | Recombination | |
dc.subject | Splicing | |
dc.subject | Transcription | |
dc.subject.classification | 0487 | |
dc.title | An analysis of GC-content in vertebrate protein coding genes and its implications on nuclear export, recombination, transcription, and the birth of new genes | |
dc.type | Thesis |
Files
Original bundle
- Name: Kang_Yoon_Mo_202411_MSc_thesis.pdf
- Size: 3.45 MB
- Format: Adobe Portable Document Format