A
gene is a locatable region of
genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions. The physical
development and
phenotype of organisms can be thought of as a product of genes interacting with each other and with the environment, and genes can be considered as units of
inheritance. A concise definition of a gene that takes into account complex patterns of regulation and transcription, genic conservation and non-coding RNA genes has been proposed by Gerstein et al.: "A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products".
In cells, genes consist of a long strand of
DNA that contains a
promoter, which controls the activity of a gene, and a coding sequence, which determines what the gene produces. When a gene is active, the coding sequence is copied in a process called
transcription, producing an RNA copy of the gene's information. This RNA can then direct the synthesis of proteins via the
genetic code. However, RNAs can also be used directly, for example as part of the
ribosome. These molecules resulting from gene expression, whether
RNA or
protein, are known as
gene products.
Most genes contain non-coding regions that don't code for the gene products, but
regulate gene expression. The genes of
eukaryotic organisms can contain non-coding regions called
introns that are removed from the messenger RNA in a process known as
splicing. The regions that actually encode the gene product, which can be much smaller than the
introns, are known as
exons. A single gene can lead to the synthesis of multiple proteins through the different arrangements of exons produced by alternative splicing.
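The way one gene can yield several transcripts can be illustrated with a small sketch. It assumes a deliberately simplified model in which the first and last exons are always kept and each internal exon is either kept or skipped, with genomic order preserved. The function name and the exon sequences are hypothetical, and real splicing is regulated, so only some of these combinations would occur in a cell.

```python
from itertools import combinations

def candidate_isoforms(exons):
    """Return every transcript that keeps the first and last exon and
    any subset of the internal exons, preserving their original order.
    Expects at least two exons."""
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal) + 1):
        # combinations() yields subsets in the order the exons appear
        for kept in combinations(internal, k):
            isoforms.append("".join([first, *kept, last]))
    return isoforms

# Hypothetical exon sequences, for illustration only
exons = ["AUGGCU", "GAAUUC", "CCGGAU", "UGGUAA"]
for transcript in candidate_isoforms(exons):
    print(transcript)
```

With n internal exons this model allows up to 2^n transcripts, which shows why exon arrangement alone can multiply the number of products from a single gene.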
The total complement of genes in an organism or cell is known as its
genome.
Prokaryotes such as
bacteria and
archaea generally have smaller genomes, both in number of
base pairs and number of genes, than even single-celled
eukaryotes, although there's no clear relationship between genome sizes and perceived complexity of eukaryotic organisms. One of the largest known genomes belongs to the single-celled
amoeba Amoeba dubia, with over 670 billion base pairs, some 200 times larger than the human genome. The estimated number of genes in the
human genome has been repeatedly revised downward since the completion of the
Human Genome Project; current estimates place the human genome at just under 3 billion base pairs and about 20,000–25,000 genes. A recent
Science article gives a final number of 20,488, with perhaps 100 more yet to be discovered. The gene density of a genome is a measure of the number of genes per million base pairs (called a megabase, Mb); prokaryotic genomes have much higher gene densities than eukaryotes. Using the estimates above, the gene density of the human genome is roughly 7–8 genes/Mb.
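Gene density as defined above is a simple ratio, and the following sketch computes it from the human estimates quoted in the text. The function name is hypothetical, and the genome size is rounded to 3 billion base pairs for illustration.

```python
def gene_density_per_mb(gene_count, genome_size_bp):
    """Genes per megabase, where 1 Mb = 1,000,000 base pairs."""
    return gene_count / (genome_size_bp / 1_000_000)

# "Just under 3 billion base pairs", rounded up for illustration
HUMAN_GENOME_BP = 3_000_000_000

for genes in (20_000, 25_000):
    density = gene_density_per_mb(genes, HUMAN_GENOME_BP)
    print(f"{genes} genes: {density:.1f} genes/Mb")
```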
History
The existence of genes was first suggested by
Gregor Mendel (1822-1884), who, in the
1860s, studied inheritance in
pea plants and
hypothesized a factor that conveys traits from parent to offspring. He spent over 10 years of his life on one experiment. Although he didn't use the term
gene, he explained his results in terms of inherited characteristics. Mendel was also the first to hypothesize
independent assortment, the distinction between
dominant and
recessive traits, the distinction between a
heterozygote and
homozygote, and the difference between what would later be described as
genotype and
phenotype. Mendel's concept was given a name by
Hugo de Vries, who, probably unaware of Mendel's work at the time, coined the term "pangen" in his 1889 book
Intracellular Pangenesis for "the smallest particle
[representing] one hereditary characteristic", a term itself derived from the word
pangenesis coined by
Darwin (1868). The word pangenesis is made from the
Greek words
pan (a prefix meaning "whole", "encompassing") and
genesis ("birth") or
genos ("origin"). The related word
genetics was first used by
William Bateson in
1905.
Richard J. Roberts and
Phillip Sharp discovered in 1977 that genes can be split into segments, which led to the idea that one gene can make several proteins. More recently (as of
2003-2006),
biological results have made the notion of a gene appear more slippery. In particular, genes do not seem to sit side by side on
DNA like discrete beads. Instead,
regions of the DNA producing distinct proteins may overlap, so that the idea emerges that "genes are one long
continuum".
According to the theory of Mendelian inheritance, variations in
phenotype - the observable physical and behavioral characteristics of an organism - are due to variations in
genotype, or the organism's particular set of genes, each of which specifies a particular trait. Different genes for the same trait, which give rise to different phenotypes, are known as
alleles. Organisms such as the pea plants Mendel worked on, along with many plants and animals, have two alleles for each trait, one inherited from each parent. Alleles may be
dominant or
recessive; dominant alleles give rise to their corresponding phenotypes when paired with any other allele for the same trait, while recessive alleles give rise to their corresponding phenotype only when paired with another copy of the same allele. For example, if the allele specifying tall stems in pea plants is dominant over the allele specifying short stems, then pea plants that inherit one tall allele from one parent and one short allele from the other parent will also have tall stems. Mendel's work found that alleles assort independently in the production of
gametes, or
germ cells, ensuring variation in the next generation.
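The tall and short pea example can be worked through as a Punnett square. The sketch below is a minimal illustration for a single gene with two alleles. The letters "T" (tall, assumed dominant) and "t" (short) and the function names are assumptions made for the example.

```python
from collections import Counter
from itertools import product

def cross(parent1, parent2):
    """Count offspring genotypes for a one-gene cross (Punnett square).
    Each parent is a two-letter genotype such as "Tt"."""
    return Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))

def phenotype(genotype, dominant="T"):
    """A single dominant allele is enough to show the dominant trait."""
    return "tall" if dominant in genotype else "short"

offspring = cross("Tt", "Tt")
print(dict(offspring))

traits = Counter()
for genotype, count in offspring.items():
    traits[phenotype(genotype)] += count
print(dict(traits))
```

Crossing two heterozygous parents gives genotypes in a 1:2:1 ratio and phenotypes in a 3:1 ratio, because the short phenotype appears only when both inherited alleles are recessive.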
Prior to Mendel's work, the dominant theory of heredity was one of
blending inheritance, which proposes that the traits of the parents blend or mix in a smooth, continuous gradient in the offspring. Although Mendel's work was largely unrecognized after its first publication in 1866, it was rediscovered in 1900 by three European scientists,
Hugo de Vries,
Carl Correns, and
Erich von Tschermak, who had reached similar conclusions from their own research. However, these scientists were not yet aware of the identity of the 'discrete units' on which genetic material resides.
A series of subsequent discoveries led to the realization decades later that
chromosomes within
cells are the carriers of genetic material, and that they're made of
DNA (deoxyribonucleic acid), a
polymeric molecule found in all cells on which the 'discrete units' of Mendelian inheritance are encoded. The modern study of
genetics at the level of DNA is known as
molecular genetics and the synthesis of molecular genetics with traditional
Darwinian evolution is known as the
modern evolutionary synthesis.
Physical definitions
The vast majority of living organisms encode their genes in long strands of
DNA. DNA consists of a chain made from four types of
nucleotide subunits:
adenosine,
cytidine,
guanosine, and
thymidine. Each nucleotide subunit consists of three components: a
phosphate group, a
deoxyribose sugar ring, and a
nucleobase. Because the nucleobase is the part that differs between them, nucleotides in DNA or RNA are typically called 'bases' and are commonly referred to simply by their
purine or
pyrimidine base components: adenine, cytosine, guanine, and thymine. Adenine and guanine are purines, and cytosine and thymine are pyrimidines. The most common form of DNA in a cell is in a
double helix structure, in which two individual DNA strands twist around each other in a right-handed spiral. In this structure, the
base pairing rules specify that
guanine pairs with
cytosine and
adenine pairs with
thymine (each pair contains one purine and one pyrimidine). The base pairing between guanine and cytosine forms three hydrogen bonds, while the base pairing between adenine and thymine forms two hydrogen bonds. The two strands in a double helix must therefore be
complementary, that is, their bases must align such that the adenines of one strand are paired with the thymines of the other strand, and so on.
Due to the chemical composition of the pentose residues of the bases, DNA strands have directionality. One end of a DNA polymer contains an exposed
hydroxyl group on the
deoxyribose; this is known as the
3' end of the molecule. The other end contains an exposed
phosphate group; this is the
5' end. The directionality of DNA is vitally important to many cellular processes, since double helices are necessarily directional (a strand running 5'-3' pairs with a complementary strand running 3'-5') and processes such as
DNA replication occur in only one direction. All nucleic acid synthesis in a cell occurs in the 5'-3' direction, because new monomers are added via a
dehydration reaction that uses the exposed 3' hydroxyl as a
nucleophile.
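The base pairing rules and the antiparallel arrangement of the two strands can be combined in a short sketch. Given one strand written 5' to 3', the partner strand is obtained by complementing each base and reversing the order, so that it is also written 5' to 3'. The function names are hypothetical, and the input is assumed to contain only the letters A, C, G and T.

```python
# Watson-Crick pairing: A with T, G with C
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    """Return the complementary strand, also written 5'->3'."""
    return "".join(PAIRS[base] for base in reversed(strand))

def hydrogen_bonds(strand):
    """G-C pairs contribute three hydrogen bonds, A-T pairs two."""
    return sum(3 if base in "GC" else 2 for base in strand)

strand = "ATGCCG"
print(reverse_complement(strand))
print(hydrogen_bonds(strand))
```

Applying `reverse_complement` twice returns the original strand, which reflects the symmetry of the pairing rules.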
The
expression of genes encoded in DNA begins by
transcribing the gene into
RNA, a second type of
nucleic acid that's very similar to DNA, but whose monomers contain the sugar
ribose rather than
deoxyribose. RNA also contains the base
uracil in place of
thymine. RNA molecules are less stable than DNA and are typically single-stranded. Genes that encode
proteins are composed of a series of three-
nucleotide sequences called
codons, which serve as the "words" in the genetic "language". The
genetic code specifies the correspondence during
protein translation between codons and
amino acids. The genetic code is nearly the same for all known organisms.
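Transcription and the reading of codons can be sketched in a few lines. The example assumes the standard genetic code, one-letter amino acid abbreviations, and an input that is the coding strand of the gene, so that the RNA copy has the same sequence with uracil in place of thymine. The function names and the short test sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_strand):
    """RNA copy of a DNA coding strand: uracil replaces thymine."""
    return coding_strand.replace("T", "U")

def translate(mrna):
    """Read the mRNA three bases at a time until a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = transcribe("ATGGCCTAA")
print(mrna)
print(translate(mrna))
```

Reading starts at the first base here for simplicity. In a cell, translation begins at a start codon, and the choice of reading frame determines which codons are read.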
RNA genes
In most cases,
RNA is an intermediate product in the process of manufacturing proteins from genes. However, for some gene sequences, the RNA molecules are the actual functional products. For example, RNAs known as
ribozymes are capable of
enzymatic function, and
miRNAs have a regulatory role. The
DNA sequences from which such RNAs are transcribed are known as
non-coding DNA, or
RNA genes.
Some
viruses store their entire genomes in the form of RNA, and contain no DNA at all. Because they use RNA to store genes, their
cellular hosts may synthesize their proteins as soon as they're
infected and without the delay in waiting for transcription. On the other hand, RNA
retroviruses, such as
HIV, require the
reverse transcription of their
genome from RNA into DNA before their proteins can be synthesized.
In 2006, French researchers came across a puzzling example of RNA-mediated inheritance in mice. Mice with a
loss-of-function mutation in the gene Kit have white tails. Offspring of these mutants can have white tails despite having only normal Kit genes. The research team traced this effect back to mutated Kit RNA. While RNA is common as genetic storage material in viruses, in mammals in particular RNA inheritance has been observed very rarely.
Functional structure of a gene
All genes have regulatory regions in addition to regions that explicitly code for a protein or RNA product. A universal regulatory region shared by all genes is known as the
promoter, which provides a position that's recognized by the transcription machinery when a gene is about to be transcribed and expressed. Although promoter regions have a
consensus sequence that's the most common sequence at this position, some genes have "strong" promoters that bind the transcription machinery well, and others have "weak" promoters that bind poorly. These weak promoters usually permit a lower rate of transcription than the strong promoters, because the transcription machinery binds to them and initiates transcription less frequently. Other possible regulatory regions include
enhancers, which can compensate for a weak promoter. Most regulatory regions are "upstream" — that is, before or toward the 5' end of the transcription initiation site.
Eukaryotic promoter regions are much more complex and difficult to identify than
prokaryotic promoters.
Many prokaryotic genes are organized into
operons, or groups of genes whose products have related functions and which are transcribed as a unit. By contrast,
eukaryotic genes are transcribed only one at a time, but may include long stretches of DNA called
introns which are transcribed but never translated into protein (they are spliced out before translation).
Chromosomes
The total complement of genes in an organism or cell is known as its
genome, which may be stored on one or more
chromosomes; the region of the chromosome at which a particular gene is located is called its
locus. A chromosome consists of a single, very long DNA helix on which thousands of genes are encoded.
Prokaryotes -
bacteria and
archaea - typically store their genomes on a single large, circular chromosome, sometimes supplemented by additional small circles of DNA called
plasmids, which usually encode only a few genes and are easily transferable between individuals. For example, the genes for
antibiotic resistance are usually encoded on bacterial plasmids and can be passed between individual cells, even those of different species, via
horizontal gene transfer.
Although some simple eukaryotes also possess plasmids with small numbers of genes, the majority of eukaryotic genes are stored on multiple linear chromosomes, which are packed within the
nucleus in complex with storage proteins called
histones. The manner in which DNA is stored on the histone, as well as chemical modifications of the histone itself, are regulatory mechanisms governing whether a particular region of DNA is accessible for
gene expression. The ends of eukaryotic chromosomes are capped by long stretches of repetitive sequences called
telomeres, which don't code for any gene product but are present to prevent degradation of coding and regulatory regions during
DNA replication. The length of the telomeres tends to decrease each time the genome is replicated in preparation for cell division; the loss of telomeres has been proposed as an explanation for cellular
senescence, or the loss of the ability to divide, and by extension for the
aging process in organisms.
While the chromosomes of prokaryotes are relatively gene-dense, those of eukaryotes often contain so-called "
junk DNA", or regions of DNA that serve no obvious function. Simple single-celled eukaryotes have relatively small amounts of such DNA, while the genomes of complex multicellular organisms, including humans, contain an absolute majority of DNA without an identified function. Computational gene finding methods are still significantly more reliable than earlier techniques that required mapping the locations of specific mutations that gave rise to distinguishable alleles. Moreover, the genes are often fragmented internally by non-coding sequences called
introns, which can be many times longer than the coding sequence but are
spliced out during
post-transcriptional modification of pre-
mRNA.
Genetic and genomic nomenclature
Gene nomenclature has been established by the
HUGO Gene Nomenclature Committee (HGNC) for each known human gene in the form of an approved gene name and
symbol (short-form
abbreviation). All approved symbols are stored in the
HGNC Database. Each symbol is unique, and each gene is given only one approved gene symbol. A unique symbol for each gene is necessary so that genes can be referred to unambiguously. This also facilitates
electronic data retrieval from publications. Where possible, each symbol maintains parallel construction in different members of a
gene family and can be used in other
species, especially the
mouse.
Evolutionary concept of a gene
George C. Williams first explicitly advocated the
gene-centric view of evolution in his 1966 book
Adaptation and Natural Selection. He proposed an evolutionary concept of gene to be used when we're talking about
natural selection favoring some genes. The definition is: "that which segregates and recombines with appreciable frequency." According to this definition, even an
asexual genome could be considered a gene, insofar as it has an appreciable permanency through many generations.
The difference is that the molecular gene
is transcribed as a unit, whereas the evolutionary gene
is inherited as a unit.
Richard Dawkins'
The Selfish Gene and
The Extended Phenotype defended the idea that the gene is the only
replicator in living systems. This means that only genes transmit their structure largely intact and are potentially immortal in the form of copies. So, genes should be the
unit of selection. In
The Selfish Gene Dawkins attempts to redefine the word 'gene' to mean "an inheritable unit" instead of the generally accepted definition of "a section of DNA coding for a particular protein". In
River Out of Eden, Dawkins further refined the idea of gene-centric selection by describing life as a river of compatible genes flowing through
geological time. Scoop up a bucket of genes from the river, and we have an
organism serving as a temporary body or
survival machine. A river of genes may fork into two branches representing two non-
interbreeding species as a result of geographical separation.
The gene concept is still changing
The concept of the gene has changed considerably (see
history section), from what was originally considered a "unit of inheritance" to a usually
DNA-based unit that can exert its effects on the organism through
RNA or
protein products. It was also previously believed that one gene makes one protein; this concept has been overthrown by the discovery of
alternative splicing and
trans-splicing. In plants, cases of traits reappearing after several generations of absence have led researchers to hypothesise RNA-directed overwriting of genomic DNA. Evidence is also accumulating that the
control regions of a gene don't necessarily have to be close to the
coding sequence on the linear molecule or even on the same chromosome. Spilianakis and colleagues discovered that the
promoter region of the
interferon-gamma gene on chromosome 10 and the regulatory regions of the T(H)2
cytokine locus on chromosome 11 come into close proximity in the
nucleus, possibly to be jointly regulated.
The concept that genes are clearly limited is also being eroded. There is evidence for fused proteins stemming from two adjacent genes that can otherwise produce two separate protein products. While it isn't clear whether these fusion proteins are functional, the phenomenon is more frequent than previously thought. Even more ground-breaking than the discovery of fused genes is the observation that some proteins can be composed of
exons from far away regions and even different chromosomes. The definition proposed by Gerstein et al. categorizes genes by functional products, whether they be proteins or RNA, rather than specific DNA loci; all regulatory elements of DNA are therefore classified as
gene-associated regions.