Relevance of Non-Coding DNA: Junk DNA and Gene Deserts

The idea that the amount of DNA per chromosome set might be constant for all cells within an individual appeared more than a century ago. In 1948, Vendrely and Vendrely confirmed this assumption and defined the "C-value", the nuclear DNA content per cell, for all the individuals within a given species. These observations provided the first clue that DNA rather than protein is the heritable material.

[Figure 1 labels: Chromosome Territory (CT); Interchromatin Domain Compartment (ICD)]

Fig. 1 S/MARs in the framework of the CT-ICD model. The eukaryotic genome is organized into chromatin domains, each of which is delimited by an extended "constitutive" S/MAR, i.e. an element that is permanently attached to components of the nuclear matrix; the matrix itself fills major parts of the ICD compartment. The assembly of activating/remodeling factors at short domain-internal scaffold/matrix attachment regions accompanies gene activation. Thereby these "facultative" S/MARs mediate the factor-induced (reversible) association with the matrix. SIDD profiles can efficiently assist the classification of these elements (reviewed by Winkelmann et al. 2006): while a constitutive S/MAR consists of a series of evenly spaced "unpairing elements" (UEs; minima in the SIDD profile), which together form a "base-unpairing region" (BUR), the "facultative" class mostly consists of 200-300-bp-long strongly destabilized individual UEs that are separated by >500 bp (see text and Bode et al. 2006). The insert exemplifies the latter situation: for the human interferon-β gene domain, all UEs (i.e. the four pronounced minima) coincide with DNase I hypersensitive sites and do have regulatory potential. Two of these elements associate, each with a molecule of YY1/YY2 (small elliptic bodies), which in turn recruit a histone-acetyltransferase molecule (extended ellipse) to support activation of the inducible promoter (Klar and Bode 2005). The outline of this figure follows discussions with Thomas Werner (Genomatix Munich) and comprises the concepts by Bode and colleagues (2003a,b).

Soon it was found, however, that genome sizes vary enormously among eukaryotes and that size bears no relationship to the presumed number of genes (the so-called "C-value paradox"; Thomas 1971): while one copy of the human genome contains about 3.5 pg of DNA packaged into 23 chromosomes, the 5.8-pg equivalent of an aardvark genome is contained in only ten chromosomes, and the 140 pg in the genome of some salamanders in only 12 chromosomes. Triggered by the question of whether eukaryotes evolved large genomes simply because they can tolerate useless DNA or because they need it for organization or function, the view that transcriptional regulation operates at the level of individual genes had to be continuously extended.
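The mismatch between DNA content and chromosome number can be made concrete with a little arithmetic. The sketch below uses only the three C-values and chromosome counts quoted above; the conversion factor of roughly 978 Mb per picogram is a commonly used approximation that is not part of the original text, and all names are illustrative.

```python
# C-values (pg per genome copy) and chromosome counts as quoted in the text.
GENOMES = {
    "human": (3.5, 23),
    "aardvark": (5.8, 10),
    "salamander": (140.0, 12),
}

# Assumption (not from the text): 1 pg of double-stranded DNA is roughly 978 Mb.
MB_PER_PG = 978.0


def dna_per_chromosome(c_value_pg, n_chromosomes):
    """Average DNA content per chromosome, in pg."""
    return c_value_pg / n_chromosomes


def summarize(genomes):
    """Return per-species average pg per chromosome and approximate genome size in Mb."""
    rows = {}
    for name, (c_value, n_chrom) in genomes.items():
        rows[name] = {
            "pg_per_chromosome": dna_per_chromosome(c_value, n_chrom),
            "approx_genome_mb": c_value * MB_PER_PG,
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize(GENOMES).items():
        print(
            f"{name:<11} {row['pg_per_chromosome']:.2f} pg/chromosome, "
            f"~{row['approx_genome_mb']:.0f} Mb"
        )
```

With these figures, the average DNA content per chromosome differs more than 70-fold between human and salamander, although the chromosome counts differ by less than a factor of two.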

[Figure 2 labels: Lamins. (Transient) PARP activation by BUR-association; localized modification of histones; PARP inactivation (dissociation) by automodification. Break recognition by PARP: release from lamins; auto-modification; withdrawal of histones; capture/inactivation of DNA methyltransferase; assembly of repair complex. PARP hyperactivation by DNA strand breaks: auto-poly(ADP-ribosyl)ation; release from lamins; recruitment of repair complex to BUR; no BUR-association of automodified PARP.]

Fig. 2 Functional states of PARP-1. Inactive forms have been drawn in black, and increasing activity is indicated in grey or light-grey colours, respectively. The insert (SIDD profile, a) analyses the PARP promoter for the presence of S/MAR-like elements, mediating the gene's autoregulation (Soldatenkov, 2002). Under certain circumstances, PARP association with an S/MAR may induce activity and enable PARP to transactivate certain genes due to a variety of domain-opening functions (Lonskaya et al. 2006); the figure (b) comprises both constitutive elements and a facultative S/MAR element as defined in Fig. 1. PARP-1 has also been shown to be a component of the multiprotein DNA replication complex (MRC); it poly(ADP-ribosyl)ates 15 of the ~40 MRC proteins, including DNA pol α, topo I and PCNA. Note the DNase I hypersensitive site (HS), which is a frequent concomitant of constitutive domain borders (Sect. 3).


To account for these developments, we tend not to talk about "genes" anymore, but prefer the terms "chromatin domains" or, more generally, "transcriptional units" for any autonomous regulatory entity in the genome.

For decades, geneticists focused on just the 2% of mammalian DNA that contains blueprints for proteins, while the remainder was sometimes dismissed as "junk". Recent scans of the mouse genome led to estimates of between 70,000 and 100,000 transcriptional units, half of which are non-coding. The discovery of these "hidden genes", which work through RNA rather than protein, has initiated a re-thinking, the more so as active forms of RNA are now known to provide an additional level of regulation. Nowadays, long genomic regions without any obvious biological function are referred to as "gene deserts" (Venter et al. 2001). For some of these deserts, however, regulatory sequences have been localized that exert control functions over large distances (Nobrega et al. 2003). Many of these units have particular evolutionary histories and sequence signatures that make them distinct from the rest of the genome. Other gene-sparse regions may in fact be nonessential to genome function, since they could be deleted without significant phenotypic effects (Nobrega et al. 2004). Information of this kind will be essential for researchers looking for disease-causing mutations, because it highlights large areas of the genome that are unlikely to be involved in such a process.

In the context of such a classification, it is rewarding to consider the genomic distribution of retroelements. Retroelements are involved in shaping the genome and have guided its evolution, extension and organization. Besides endogenous retroviruses, there are populations of truncated retroelements, such as the long interspersed nuclear elements (LINEs), which constitute about 5% of the total human genome and may encode a functional reverse transcriptase. The short interspersed nuclear elements (SINEs) represent an even larger proportion, but have many deletions. SINEs can modulate gene expression by movement, amplification and re-insertion into genes and regulatory sequences, but to do so they depend on reverse transcriptase from other sources. The human prototype SINE, the Alu repeat, is roughly 300 nucleotides in length. Some repeats contain an RNA polymerase III start site that can direct transcription in response to viral infections or exposure to carcinogens. This expression may facilitate recombination with other Alus or their flanking regions, which may be one reason why Alu sequences are frequent concomitants of chromosomal breakpoints. While LINEs tend to be found in AT-rich DNA, characteristic of intergenic regions, SINEs, and Alus in particular, are more often located in GC-rich regions, where genes tend to reside. This location does not seem to be a function of insertion site preference, but rather appears to be due to differential retention. In this respect, it is of note that SINEs participate in the transcriptional regulation of certain genes, suggesting a continuous selection against their random accumulation.

During evolution, retrotransposons have steadily screened mammalian genomes for the most attractive integration sites. For a deeper understanding, we have to consider the nature of these preferred sites. Our studies have clearly demonstrated that, without exception, all provirus integration sites are associated with an S/MAR (Goetze et al. 2003b; Johnson and Levy 2005 and references therein). The integration of retrotransposons may obey the same rules that govern retroviral integration, and therefore the location of these elements may simply represent a marker for the presence of S/MARs. How then are S/MARs distributed over the genome? A study by Glazko et al. (2003) has localized numerous homologous intergenic tracts (HITs) of largely unknown function in orthologous human-mouse genomic regions. Fifty percent of the HITs could be correlated with predicted S/MARs, which suggests that these conserved elements have probably retained their function during the 80-100 million years since the radiation from their common ancestor. The other half of the predicted S/MARs turned out to be non-conserved. This group is hence likely to be species-specific and might mediate unique functions. Interestingly, an excess of orthologous S/MARs was observed in spacers between divergently transcribed genes, while there were no conserved S/MARs located between convergent genes. This distribution suggests that the conserved elements are primarily involved in the regulation (augmentation) of transcription initiation, which would be in accord with a study from this laboratory (Schuebeler et al. 1996).
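The comparison between HITs and predicted S/MARs amounts to an interval-overlap count. The following sketch shows one way such a fraction could be computed; the coordinates are made-up toy values, and the function names and the any-overlap criterion are assumptions for illustration, not the procedure used by Glazko et al. (2003).

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_overlapping(hits, smars):
    """Fraction of HIT intervals that overlap at least one predicted S/MAR."""
    if not hits:
        return 0.0
    matched = sum(1 for h in hits if any(overlaps(h, s) for s in smars))
    return matched / len(hits)


# Toy coordinates (hypothetical, for illustration only).
toy_hits = [(100, 400), (1_000, 1_300), (5_000, 5_250), (9_000, 9_400)]
toy_smars = [(350, 900), (5_100, 5_600)]

print(fraction_overlapping(toy_hits, toy_smars))  # 0.5
```

For genome-scale interval sets, the quadratic scan used here would normally be replaced by sorted intervals or an interval tree; the simple version is kept for readability.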
