Given the range of functions and fates that different cells in any organism must adopt over its lifetime, it is apparent that not all genes in the genome can be actively expressed in every cell at all times. As important as completion of the Human Genome Project has been for contributing to our understanding of human biology and disease, identifying the genomic sequences and features that direct developmental, spatial, and temporal aspects of gene expression remains a formidable challenge. Several decades of work in molecular biology have defined critical regulatory elements for many individual genes, as we saw in the previous section, and more recent attention has been directed toward performing such studies on a genome-wide scale.
In Chapter 2 we introduced general aspects of chromatin that package the genome and its genes in all cells. Here, we explore the specific characteristics of chromatin that are associated with active or repressed genes as a step toward identifying the regulatory code for expression of the human genome. Such studies focus on reversible changes in the chromatin landscape as determinants of gene function rather than on changes to the genome sequence itself and are thus called epigenetic or, when considered in the context of the entire genome, epigenomic (Greek epi-, over or upon).
Epigenetics, a rapidly growing field, is the study of heritable changes in cellular function or gene expression that can be transmitted from cell to cell (and even from generation to generation) as a result of chromatin-based molecular signals (Fig. 1). Complex epigenetic states can be established, maintained, and transmitted by a variety of mechanisms: modifications to the DNA, such as DNA methylation; numerous histone modifications that alter chromatin packaging or access; and substitution of specialized histone variants that mark chromatin associated with particular sequences or regions in the genome. These chromatin changes can be highly dynamic and transient, capable of responding rapidly and sensitively to changing needs in the cell, or they can be long-lasting, capable of being transmitted through multiple cell divisions or even to subsequent generations. In either instance, the key concept is that epigenetic mechanisms do not alter the underlying DNA sequence, and this distinguishes them from genetic mechanisms, which are sequence-based. Together, the epigenetic marks and the DNA sequence make up the set of signals that guide the genome to express its genes at the right time, in the right place, and in the right amounts.

Fig. 1. Schematic representation of chromatin and three major epigenetic mechanisms: DNA methylation at CpG dinucleotides, associated with gene repression; various modifications (indicated by different colors) on histone tails, associated with either gene expression or repression; and various histone variants that mark specific regions of the genome, associated with specific functions required for chromosome stability or genome integrity. Not to scale.
DNA Methylation
DNA methylation involves the modification of cytosine bases by methylation of the carbon at the fifth position in the pyrimidine ring (Fig. 2). Extensive DNA methylation is a mark of repressed genes and is a widespread mechanism associated with the establishment of specific programs of gene expression during cell differentiation and development. Typically, DNA methylation occurs on the C of CpG dinucleotides (see Fig. 1) and inhibits gene expression by recruitment of specific methyl-CpG–binding proteins that, in turn, recruit chromatin-modifying enzymes to silence transcription. The presence of 5-methylcytosine (5-mC) is considered to be a stable epigenetic mark that can be faithfully transmitted through cell division; however, altered methylation states are frequently observed in cancer, with hypomethylation of large genomic segments in some cases or regional hypermethylation (particularly at CpG islands) in others.

Fig. 2. The modified DNA bases, 5-methylcytosine and 5-hydroxymethylcytosine. The added methyl and hydroxymethyl groups are boxed in purple. The atoms in the pyrimidine rings are numbered 1–6 to indicate the 5-carbon.
Extensive demethylation occurs during germ cell development and in the early stages of embryonic development, consistent with the need to reset the chromatin environment and restore totipotency or pluripotency of the zygote and of various stem cell populations. Although the details are still incompletely understood, these reprogramming steps appear to involve the enzymatic conversion of 5-mC to 5-hydroxymethylcytosine (5-hmC) (see Fig. 2), as a likely intermediate in the demethylation of DNA. Overall, 5-mC levels are stable across adult tissues (~5% of all cytosines), whereas 5-hmC levels are much lower and much more variable (0.1–1% of all cytosines). Interestingly, although 5-hmC is widespread in the genome, its highest levels are found in known regulatory regions, suggesting a possible role in the regulation of specific promoters and enhancers.
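Because DNA methylation typically occurs on the C of CpG dinucleotides, a first computational step in many methylation analyses is simply locating those dinucleotides in a sequence. The following is a minimal Python sketch under stated assumptions: the function names and the example sequence are hypothetical, and the observed/expected CpG ratio is a commonly used heuristic for describing CpG-rich regions that is not defined in this chapter.

```python
def cpg_positions(seq: str) -> list[int]:
    """Return 0-based indices of every C that is immediately followed by G."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio: (number of CpG * length) / (number of C * number of G).

    Returns 0.0 when the sequence contains no C or no G.
    """
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return len(cpg_positions(seq)) * len(seq) / (c_count * g_count)


# Made-up sequence, for illustration only.
example = "ATCGCGTTACGGA"
print(cpg_positions(example))          # [2, 4, 9]
print(cpg_observed_expected(example))  # 3.25
```

The sketch only finds candidate sites; whether a given cytosine is actually methylated has to be measured experimentally and cannot be read from the sequence alone.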
Histone Modifications
A second class of epigenetic signals consists of an extensive inventory of modifications to any of the core histone types: H2A, H2B, H3, and H4. Such modifications include histone methylation, phosphorylation, acetylation, and others at specific amino acid residues, mostly located on the N-terminal tails of histones that extend out from the core nucleosome itself (see Fig. 1). These epigenetic modifications are believed to influence gene expression by affecting chromatin compaction or accessibility and by signaling protein complexes that, depending on the nature of the signal, activate or silence gene expression at that site.
There are dozens of modified sites that can be experimentally queried genome-wide by using antibodies that recognize specifically modified sites, such as histone H3 methylated at lysine position 9 (H3K9 methylation, using the one-letter abbreviation K for lysine; see Table 1) or histone H3 acetylated at lysine position 27 (H3K27 acetylation). The former is a repressive mark associated with silent regions of the genome, whereas the latter is a mark for activating regulatory regions.

Table 1. The Genetic Code
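The naming scheme used above (histone type, one-letter amino acid code, residue position, modification) is regular enough to parse mechanically. The sketch below is illustrative only: the compact suffixes `me`, `ac`, and `ph` are an assumed shorthand for methylation, acetylation, and phosphorylation that this chapter does not itself define, and `HistoneMark` and `parse_mark` are hypothetical names.

```python
import re
from typing import NamedTuple

# One-letter amino acid codes for a few residues that can carry modifications.
RESIDUES = {"K": "lysine", "R": "arginine", "S": "serine", "T": "threonine"}
# Assumed shorthand suffixes (not defined in the text).
MODIFICATIONS = {"me": "methylation", "ac": "acetylation", "ph": "phosphorylation"}


class HistoneMark(NamedTuple):
    histone: str
    residue: str
    position: int
    modification: str


# Core histone type, residue letter, position, modification suffix.
_PATTERN = re.compile(r"^(H2A|H2B|H3|H4)([KRST])(\d+)(me|ac|ph)$")


def parse_mark(name: str) -> HistoneMark:
    """Split a mark name such as 'H3K9me' into its components."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized histone mark: {name!r}")
    histone, residue, position, modification = match.groups()
    return HistoneMark(histone, RESIDUES[residue], int(position), MODIFICATIONS[modification])


print(parse_mark("H3K9me"))   # the repressive mark described above
print(parse_mark("H3K27ac"))  # the activating mark described above
```

The parser deliberately rejects anything outside the four core histone types, so names of histone variants would need their own handling.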
Histone Variants
The histone modifications just discussed involve modification of the core histones themselves, which are all encoded by multigene clusters in a few locations in the genome. In contrast, the many dozens of histone variants are products of entirely different genes located elsewhere in the genome, and their amino acid sequences are distinct from, although related to, those of the canonical histones.
Different histone variants are associated with different functions, and they replace, in whole or in part, the related member of the core histones found in typical nucleosomes to generate specialized chromatin structures (see Fig. 1). Some variants mark specific regions or loci in the genome with highly specialized functions (e.g., the CENP-A histone is a histone H3-related variant that is found exclusively at functional centromeres in the genome and contributes to essential features of centromeric chromatin that mark the location of kinetochores along the chromosome fiber). Other variants are more transient and mark regions of the genome with particular attributes (e.g., H2A.X is a histone H2A variant involved in the response to DNA damage to mark regions of the genome that require DNA repair).
Chromatin Architecture
In contrast to the impression one gets from viewing the genome as a linear string of sequence, the genome adopts a highly ordered and dynamic arrangement within the space of the nucleus, correlated with and likely guided by the epigenetic and epigenomic signals just discussed. This three-dimensional (3D) landscape is highly predictive of the map of all expressed sequences in any given cell type (the transcriptome) and reflects dynamic changes in chromatin architecture at different levels (Fig. 3). First, large chromosomal domains (up to millions of base pairs in size) can exhibit coordinated patterns of gene expression at the chromosome level, involving dynamic interactions between different intrachromosomal and interchromosomal points of contact within the nucleus. At a finer level, technical advances to map and sequence points of contact around the genome in the context of 3D space have pointed to ordered loops of chromatin that position and orient genes precisely, exposing or blocking critical regulatory regions for access by RNA pol II, transcription factors, and other regulators. Lastly, specific and dynamic patterns of nucleosome positioning differ among cell types and tissues in the face of changing environmental and developmental cues (see Fig. 3). The biophysical, epigenomic, and/or genomic properties that facilitate or specify the orderly and dynamic packaging of each chromosome during each cell cycle, without reducing the genome to a disordered tangle within the nucleus, remain a marvel of landscape engineering.

Fig. 3. Three-dimensional architecture and dynamic packaging of the genome, viewed at increasing levels of resolution. (A) Within interphase nuclei, each chromosome occupies a particular territory, represented by the different colors. (B) Chromatin is organized into large subchromosomal domains within each territory, with loops that bring certain sequences and genes into proximity with each other, with detectable intrachromosomal and interchromosomal interactions. (C) Loops bring long-range regulatory elements (e.g., enhancers or locus-control regions) into association with promoters, leading to active transcription and gene expression. (D) Positioning of nucleosomes along the chromatin fiber provides access to specific DNA sequences for binding by transcription factors and other regulatory proteins.
Topologically Associating Domains
Philipp Maass
Chromosomes are organized in the 3D space of the nucleus in a way that supports gene regulation. Within this 3D genome organization, chromosomes are spatially segregated into A- and B-type genomic compartments that represent active (euchromatin = open chromatin) and inactive (heterochromatin = repressive chromatin) domains, respectively. Euchromatin is typically associated with higher gene density and early replication, whereas repressive chromatin, which tends to be transcriptionally silent, is densely packed and protects chromosome integrity. Genomic compartmentalization can comprise multiple subcompartments with specific histone marks, which contribute to the general organization of the genome in the nucleus by establishing subnuclear domains with various functions. For example, the nucleolus is the largest subnuclear compartment and forms at the site of hundreds of ribosomal genes of the five acrocentric chromosomes (13, 14, 15, 21, and 22) for the preassembly of the ribosomal subunits. Splicing or nuclear speckles are nuclear domains enriched for splicing machinery; Cajal bodies relate to mRNA processing; and promyelocytic leukemia (PML) bodies are involved in cell cycle processes and DNA repair. Collectively, these 3D organizational hubs in the nuclei regulate gene expression and posttranscriptional processing of RNAs, and facilitate tissue-specific gene regulation.
At the molecular level, genomic compartments are subdivided into clusters of genomic interactions, termed topologically associating domains (TADs). TADs are genomic regions, ranging from several hundred kilobases to megabases in length, that interact with themselves. The organization of TADs is consistent between cell types and species. TADs are separated from one another by regions with less frequent interactions, termed TAD boundaries. A model called loop extrusion can explain the formation of TADs as organizational structures at the subchromosomal level. Loop extrusion brings otherwise distal gene-regulatory elements (i.e., enhancers and silencers) into 3D proximity of target genes to regulate their expression. In this model, the majority of TADs form as two cohesin/condensin molecules slide toward each other while extruding the DNA between them, until two convergent CTCF (CCCTC-binding factor) sites are recognized. CTCF is a highly conserved zinc finger protein, considered the master regulator of the genome because it acts as a transcriptional repressor to regulate the communication between gene-regulatory elements and genes. TAD boundaries often show enrichment of CTCF, cohesin, and Mediator proteins, which contribute to the 3D topology of the genome by establishing chromatin loops and the TAD structure. Weaker TAD boundaries may allow for interactions between different TADs (inter-TAD) to regulate genes at larger genomic distances; however, this occurs less frequently than interactions within the same TAD (intra-TAD). Most gene-regulatory processes occur by intra-TAD chromatin loops: an enhancer with bound transcription factors reaches spatial proximity with the target gene’s promoter site and its core transcriptional machinery (i.e., RNA polymerase II), just upstream of its transcription start site.
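The idea that extrusion proceeds until two convergent CTCF sites are recognized can be made concrete with a toy one-dimensional model. The sketch below is a deliberate simplification and rests on assumptions that go beyond the text: the chromatin fiber is a line of integer positions, a single extruding complex loads at one position and its two anchors move outward independently, and "convergent" is taken to mean that the left anchor halts only at a forward-oriented site (`'>'`) while the right anchor halts only at a reverse-oriented site (`'<'`). The function name and the site map are hypothetical.

```python
def extrude_loop(length: int, load_site: int, ctcf_sites: dict[int, str]) -> tuple[int, int]:
    """Toy loop extrusion on a fiber of `length` positions (0 .. length - 1).

    ctcf_sites maps a position to an orientation: '>' (forward) or '<' (reverse).
    Returns the (left, right) anchor positions of the resulting loop.
    """
    if not 0 <= load_site < length:
        raise ValueError("load_site must lie within the fiber")

    # The left anchor slides toward position 0 and stops at the first
    # forward-oriented site, or at the end of the fiber.
    left = load_site
    while left > 0 and ctcf_sites.get(left) != ">":
        left -= 1

    # The right anchor slides toward the far end and stops at the first
    # reverse-oriented site, or at the end of the fiber.
    right = load_site
    while right < length - 1 and ctcf_sites.get(right) != "<":
        right += 1

    return left, right


# Hypothetical site map: two convergent pairs, (20, 40) and (60, 80).
sites = {20: ">", 40: "<", 60: ">", 80: "<"}
print(extrude_loop(100, 30, sites))  # (20, 40)
print(extrude_loop(100, 70, sites))  # (60, 80)
print(extrude_loop(100, 50, sites))  # (20, 80)
```

In the third call the complex loads between the two pairs, passes the sites at 40 and 60 because they face the wrong way for the approaching anchor, and ends up with a larger loop spanning both domains. Real extrusion is stochastic and CTCF sites are not perfect barriers, so this should be read as an illustration of the orientation rule only.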
The analysis of intra-TAD interactions in different cell types and in single-cell studies showed high variability, indicating that tissue-specific gene expression can be partially explained by cell-specific genomic contacts within TADs.
The reorganization of the TAD architecture by chromosomal rearrangements can alter gene expression and may cause clinically apparent disease phenotypes (see Chapter 6). Disrupting higher-order chromatin features and gene-regulatory elements may affect transcriptional programs and development. For example, genomic deletions can lead to TADs fusing; duplications may form neo-TADs; inversions reshuffle TADs; and translocations could alter interchromosomal contacts between nonhomologous chromosomes (Fig. 4).

Fig. 4. Topologically associating domains (TADs). The genome is organized in the three-dimensional architecture of the nucleus, with A and B compartments representing open chromatin and heterochromatin, respectively. Further functional nuclear subdomains are the nucleolus, splicing speckles, and promyelocytic leukemia (PML) bodies. The next organization level of chromatin involves TADs that are formed by the loop-extrusion model, with CCCTC-binding factor (CTCF) and cohesin facilitating spatial proximity of gene-regulatory elements (enhancers) and gene promoters to regulate gene expression. (Courtesy Philipp Maass.)
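The statement that genomic deletions can lead to TADs fusing can be illustrated by representing TADs as the intervals between boundary positions and then removing a stretch that contains a boundary. This is a minimal sketch with hypothetical function names and made-up coordinates; it assumes that boundaries are single positions and that a deletion simply removes any boundary inside it, which ignores everything about how boundaries actually function.

```python
def tads_from_boundaries(length: int, boundaries: list[int]) -> list[tuple[int, int]]:
    """Split the region [0, length) into TAD intervals at the boundary positions."""
    inner = sorted(b for b in set(boundaries) if 0 < b < length)
    edges = [0] + inner + [length]
    return [(edges[i], edges[i + 1]) for i in range(len(edges) - 1)]


def delete_region(length: int, boundaries: list[int], start: int, end: int) -> tuple[int, list[int]]:
    """Delete [start, end) and return the new length and surviving boundaries.

    Boundaries inside the deleted stretch are lost; boundaries downstream of it
    shift left by the size of the deletion.
    """
    if not 0 <= start < end <= length:
        raise ValueError("deletion must lie within the region")
    size = end - start
    kept = []
    for b in boundaries:
        if b < start:
            kept.append(b)
        elif b >= end:
            kept.append(b - size)
    return length - size, kept


# Made-up coordinates (for example, in kilobases).
length, boundaries = 1000, [300, 600]
print(tads_from_boundaries(length, boundaries))  # [(0, 300), (300, 600), (600, 1000)]

new_length, new_boundaries = delete_region(length, boundaries, 550, 650)
print(tads_from_boundaries(new_length, new_boundaries))  # [(0, 300), (300, 900)]
```

After the deletion removes the boundary at 600, the second and third TADs become one interval, so regulatory elements that were previously separated now share a domain. Duplications, inversions, and translocations would need their own coordinate transformations and are not modeled here.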