D3.2: Inheritance

Master IB Biology D3.2: Inheritance with notes created by examiners and strictly aligned with the syllabus.

IB Syllabus Requirements for Inheritance

D3.2.1 Production of haploid gametes in parents and their fusion to form a diploid zygote as the means of inheritance

D3.2.2 Methods for conducting genetic crosses in flowering plants

D3.2.3 Genotype as the combination of alleles inherited by an organism

D3.2.4 Phenotype as the observable traits of an organism resulting from genotype and environmental factors

Sexual inheritance depends on halving, then restoring, chromosome number

A gamete is a reproductive cell that carries one set of chromosomes and can fuse with another gamete during fertilization. In animals, sperm and eggs are gametes. In flowering plants, male gametes are carried in pollen, while female gametes are in ovules within the ovary.

A haploid cell has one chromosome of each homologous type. A diploid cell has two chromosomes of each homologous type. In a sexual life cycle, meiosis makes haploid gametes, and fertilization fuses two haploid nuclei to form a diploid zygote, which is a cell produced by fusion of gametes that can develop into a new organism.

You see this pattern across eukaryotes with a sexual life cycle. Mosses, mammals and flowering plants differ in the details, but the logic stays the same: meiosis halves chromosome number, fertilization restores it. In humans, for example, body cells are diploid and gametes are haploid.

For autosomal genes, a diploid cell has two copies because it has two homologous autosomes: one inherited from each parent. An autosomal gene is a gene located on a non-sex chromosome. The two copies may be identical versions of the gene, or they may be different versions; that distinction matters once we start using alleles.

Carrying out a controlled cross

A genetic cross is a planned mating between organisms chosen for particular traits, used to investigate inheritance patterns or to breed useful combinations of traits. Flowering plants work especially well for this because the investigator can control where the pollen goes.

In a flowering plant, pollen is a structure containing the male gametes. The female gametes are found inside ovules in the ovary. To carry out a cross, pollen from the chosen male parent is placed on the stigma of the chosen female parent. The pollen germinates, a pollen tube grows down the style, and male gametes are delivered to the ovary, where fertilization can occur.

For the cross to stay controlled, unwanted pollen has to be kept out. Usually, this involves removing immature anthers from the flower used as the female parent before they release pollen, then covering the flower with a bag so insects or wind cannot bring in pollen from another source. The chosen pollen can then be applied to the stigma using a small brush or anther.

Generations and Punnett grids

The P generation is the parental generation used at the start of a genetic cross. The F1 generation is the first generation of offspring from the P generation. The F2 generation is the generation produced when F1 individuals reproduce with each other or self-fertilize.

A Punnett grid is a table used to predict possible offspring genotypes by combining the gametes from each parent. It isn’t a magic square; it’s just a neat way to keep track of which gametes can meet.

Plants such as peas produce both male and female gametes on the same plant, so they can self-pollinate and therefore self-fertilize. That makes them useful for producing true-breeding lines or F2 offspring. Controlled genetic crosses are not just classroom history: they are widely used to develop crop varieties and ornamental plants with desirable traits, such as flower colour, disease resistance or growth form.

Genes, alleles and genotype

A gene is a length of DNA whose base sequence helps produce a functional product, usually a polypeptide or RNA. An allele is one version of a gene, different from another version at the same locus. Don’t treat the two words as interchangeable: the gene is the DNA region; alleles are the alternative versions of that region.

A genotype is the combination of alleles an organism inherits for one or more genes. For a gene with alleles D and d, a diploid individual could have genotype DD, Dd or dd. Each parent contributes one allele through their gametes.

A homozygous organism is an organism with two identical alleles of a gene, such as DD or dd. A heterozygous organism is an organism with two different alleles of a gene, such as Dd. Since gametes are haploid, a homozygous parent makes gametes carrying only one allele type for that gene, while a heterozygous parent makes gametes carrying one allele or the other.

Allele symbols are just shorthand, not the trait itself. A capital letter often represents a dominant allele and a lower-case letter the corresponding recessive allele, but the letters are chosen by convention. Biologically, what matters is the DNA sequence and how it affects the product of the gene.

Phenotype is what can be detected

A phenotype is any observable or measurable trait of an organism, produced by its genotype and environmental influences. “Observable” doesn’t just mean visible to the eye. Blood group, enzyme activity and ability to distinguish colours are phenotypes because tests can detect them.

Some human traits depend mainly on genotype. ABO blood group is a clear example: your ABO phenotype is determined by alleles at one gene and does not change because you practise or because the weather changes.

Other traits come from environment only. A scar from an injury, a tattoo, or the particular language a person learns as a child is not inherited as a DNA sequence. These traits can matter, but they are not passed on through gametes.

Many traits involve genotype interacting with environment. Human height is influenced by many genes, but also by nutrition and health during growth. Human skin pigmentation has a genetic basis, but exposure to sunlight can increase melanin production. This interaction is where many real biological traits sit, so don’t use the lazy answer “it is genetic” unless the evidence really supports that.

Human trait examples grouped by how genotype and environment contribute to phenotype.

| Phenotype category | Human examples | Genotype contribution | Environmental contribution | Inheritance note |
| --- | --- | --- | --- | --- |
| Mainly genotype | ABO blood group | Alleles at one gene determine the phenotype | Little to none for normal ABO type | Alleles are inherited through gametes |
| Mainly environment | Scar, tattoo, learned language | Does not require a specific inherited allele | Injury, choice, or upbringing produces the trait | Trait itself is not inherited as DNA |
| Genotype + environment | Height | Many genes influence growth potential | Nutrition and health affect final height | Genetic potential is inherited; exact height is not fixed |
| Genotype + environment | Skin pigmentation | Genes influence baseline melanin level | Sunlight can increase melanin production | Baseline tendency is inherited; tanning depends on exposure |

Why heterozygotes can resemble homozygous dominants

A dominant allele is an allele that determines the phenotype in a heterozygote. A recessive allele is an allele whose phenotypic effect is masked in a heterozygote by a dominant allele of the same gene.

With a simple dominant-recessive pattern, the homozygous dominant genotype and the heterozygous genotype give the same phenotype. For example, if T is dominant to t, both TT and Tt show the dominant phenotype, while tt shows the recessive phenotype.

At the molecular level, this often comes down to the gene product. Many genes code for polypeptides. If a mutation in a recessive allele makes a non-functional enzyme, a heterozygote may still produce enough functional enzyme from the dominant allele to show the normal phenotype. So TT and Tt look the same: one working copy is enough. The recessive phenotype appears only when both alleles fail to provide enough functional product.

This explanation is common, but it isn't a universal law. Some alleles are dominant because they make a harmful active product, or because half the normal amount of product is not enough. For many school-level examples, though, “one functional copy makes enough product” explains why carriers do not show the recessive phenotype.

A monohybrid cross is a genetic cross that follows one gene. In a typical cross between two heterozygotes, each parent produces two gamete types, and a Punnett grid predicts a 1:2:1 genotypic ratio and a 3:1 phenotypic ratio when one allele is completely dominant.
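A Punnett grid is easy to mimic in code, which can help you check a ratio you have worked out by hand. The sketch below is only an illustration: the function names `monohybrid_cross` and `phenotype_counts` are made up for this example, and it assumes one gene with two alleles written T and t, with T completely dominant.

```python
from collections import Counter
from itertools import product


def monohybrid_cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes for one gene, as a Punnett grid would.

    Each parent is a two-letter genotype such as "Tt". Each letter is one
    allele, so the letters are also the gamete types that parent can make.
    """
    offspring = Counter()
    for gamete1, gamete2 in product(parent1, parent2):
        # Sort so the capital letter comes first: "tT" is written "Tt".
        genotype = "".join(sorted(gamete1 + gamete2))
        offspring[genotype] += 1
    return offspring


def phenotype_counts(genotypes: Counter, dominant: str = "T") -> Counter:
    """Group genotypes by phenotype, assuming complete dominance."""
    phenotypes = Counter()
    for genotype, count in genotypes.items():
        label = "dominant" if dominant in genotype else "recessive"
        phenotypes[label] += count
    return phenotypes


genotypes = monohybrid_cross("Tt", "Tt")
phenotypes = phenotype_counts(genotypes)
print(dict(genotypes))   # {'TT': 1, 'Tt': 2, 'tt': 1}
print(dict(phenotypes))  # {'dominant': 3, 'recessive': 1}
```

The four loop iterations correspond to the four cells of the grid, so the counts are the 1:2:1 and 3:1 ratios directly.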

Same genotype, different expressed traits

Phenotypic plasticity means that an organism with a given genotype can develop different traits in response to its environment by changing patterns of gene expression. The genotype itself does not change. Instead, genes are switched on, switched off or expressed at different levels.

Tanning is a useful human example. Increased sunlight exposure can increase expression of genes involved in melanin production in skin cells. If sunlight exposure later decreases, melanin production can fall again, so the phenotype may reverse. That reversibility shows why plasticity is not the same as mutation.

Phenotypic plasticity helps in variable environments, since an organism can adjust its phenotype to the conditions it actually experiences. In plants, seedlings grown in darkness often develop differently from seedlings grown in light, even though they have the same genotype. Some plastic changes are reversible during life; others, especially changes made during development, may be difficult or impossible to reverse.

PKU and recessive inheritance

A genetic disease is a disease caused by one or more alleles that alter normal biological function. Phenylketonuria is a recessive genetic condition caused by mutation in an autosomal gene coding for the enzyme needed to convert phenylalanine to tyrosine.

That enzyme is phenylalanine hydroxylase. Someone with one normal allele and one PKU allele usually still makes enough functional enzyme. They are a carrier: a heterozygous individual that can pass on a recessive disease allele without showing the disease phenotype.

With two recessive PKU alleles, a person cannot make enough functional phenylalanine hydroxylase. Phenylalanine builds up, and tyrosine production falls. If untreated, high phenylalanine concentrations can impair brain development. Newborn screening and a diet low in phenylalanine are useful for that reason. The genotype cannot be changed, but the environment can be managed to reduce the harmful phenotype.

The PKU gene is autosomal, so boys and girls inherit it in the same pattern. If both parents are carriers, each child has a 1 in 4 probability of inheriting both recessive alleles, a 1 in 2 probability of being a carrier, and a 1 in 4 probability of inheriting no PKU allele for that gene.

Gene pools contain more alleles than any one individual can carry

A gene pool means the complete set of alleles present in all individuals of an interbreeding population. One diploid individual can inherit only two alleles of an autosomal gene, but the population can hold many more than two versions.

A single-nucleotide polymorphism is a position in DNA where individuals in a population differ by one nucleotide base. The abbreviation is SNP, pronounced “snip”. One person might have A at a particular position in a gene; another might have G at that same position. Across a long gene, several SNPs can occur, so many allele versions can build up in the gene pool.

Multiple alleles are three or more alleles of the same gene present in a population. That doesn’t mean one individual has many alleles of the gene. It means the population’s gene pool contains many possible versions, from which each diploid individual inherits at most two.

This helps explain why inheritance is richer than the simple T and t examples suggest. Those simple examples are useful for learning the logic, but real gene pools often contain many sequence variants.

Three alleles, four phenotypes

The ABO blood group system is the standard human example of multiple alleles. Use the allele symbols Iᴬ, Iᴮ and i. Only one gene is involved, but three common alleles occur in the population.

Each person inherits two alleles, giving the possible genotypes IᴬIᴬ, Iᴬi, IᴮIᴮ, Iᴮi, IᴬIᴮ and ii. These give four phenotypes: blood group A, B, AB or O.

ABO blood groups produced by the three alleles Iᴬ, Iᴮ and i.

| Blood group phenotype | Possible genotype(s) | Red blood cell antigen(s) |
| --- | --- | --- |
| A | IᴬIᴬ or Iᴬi | A antigen |
| B | IᴮIᴮ or Iᴮi | B antigen |
| AB | IᴬIᴮ | A and B antigens |
| O | ii | No A or B antigens |

The Iᴬ allele produces the A antigen on red blood cells. The Iᴮ allele produces the B antigen. The i allele produces neither A nor B antigen. So Iᴬ is dominant over i, and Iᴮ is dominant over i. Blood group AB comes from the genotype IᴬIᴮ because both A and B antigens are present.
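If you want to convince yourself that three alleles give exactly six genotypes and four phenotypes, you can enumerate them. This is a small sketch, not a standard tool: the strings "IA", "IB" and "i" are plain-text stand-ins for Iᴬ, Iᴮ and i, and `abo_phenotype` is a name chosen for this example.

```python
from itertools import combinations_with_replacement

ALLELES = ["IA", "IB", "i"]  # stand-ins for the three ABO alleles


def abo_phenotype(genotype: tuple) -> str:
    """Return the blood group for a pair of ABO alleles."""
    has_a = "IA" in genotype  # at least one allele producing A antigen
    has_b = "IB" in genotype  # at least one allele producing B antigen
    if has_a and has_b:
        return "AB"
    if has_a:
        return "A"
    if has_b:
        return "B"
    return "O"  # ii: neither antigen is produced


# Every unordered pair of alleles a diploid individual could inherit.
genotypes = list(combinations_with_replacement(ALLELES, 2))

by_phenotype = {}
for genotype in genotypes:
    by_phenotype.setdefault(abo_phenotype(genotype), []).append("".join(genotype))

print(len(genotypes))  # 6
print(by_phenotype)
```

The grouping matches the table: two genotypes each for A and B, one each for AB and O.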

This has medical importance because incompatible transfusions can make red blood cells clump when antibodies bind to unfamiliar antigens. For this topic, keep the focus on inheritance pattern and allele notation; transfusion compatibility fits better with immunity and blood physiology.

Two ways heterozygotes can differ from both homozygotes

Codominance is an inheritance pattern where both alleles in a heterozygote are expressed, producing a dual phenotype. The required example is the AB blood type: genotype IᴬIᴮ produces both A and B antigens, so the phenotype is not “halfway” between A and B; it is both.

Incomplete dominance is an inheritance pattern where the heterozygote shows an intermediate phenotype between the two homozygotes. In the four o’clock flower (Mirabilis jalapa), also called marvel of Peru, a cross between red-flowered and white-flowered homozygotes produces pink-flowered heterozygotes.

At the phenotypic level, the distinction is straightforward. Codominance gives a dual phenotype: both effects show. Incomplete dominance gives an intermediate phenotype: the heterozygote falls between the homozygotes. For the plant example, either the common name or Mirabilis jalapa is acceptable.

X and Y chromosomes determine typical human sex development

A sex chromosome is a chromosome involved in determining sex and carrying genes with sex-linked inheritance patterns. An autosome is any chromosome that is not a sex chromosome.

Most human females have two X chromosomes. Most human males have one X chromosome and one Y chromosome. Eggs normally carry an X chromosome, while sperm carry either an X or a Y chromosome, so the sperm determines whether the zygote is typically XX or XY.

A gene on the Y chromosome initiates development of testes in the embryo. The testes then secrete hormones that lead to development of many male-typical physical characteristics. Without a Y chromosome, development usually follows a female-typical pathway.

The X chromosome is much larger than the Y chromosome and carries far more genes. That difference explains why X-linked inheritance is so much more common than Y-linked inheritance. Many X-linked genes have nothing directly to do with sex development; they are simply located on the X chromosome.

A sex-linked gene is a gene located on a sex chromosome. In many school genetics examples, that usually means a gene on the X chromosome. Males usually have only one X chromosome, so a recessive allele on that X can be expressed in males even when females would need two copies to show the same recessive phenotype.

X-linked recessive inheritance

Haemophilia is a sex-linked genetic disorder where blood clotting is impaired because a clotting factor is absent or defective. In the required notation, alleles are written as superscript letters on an uppercase X: for example, Xᴴ for an X chromosome carrying the normal clotting allele and Xʰ for an X chromosome carrying the haemophilia allele. The Y chromosome is written as Y because it does not carry that allele.

A male with genotype XʰY has haemophilia, since his only X chromosome carries the recessive allele. A female with genotype XᴴXʰ is usually a carrier; the normal allele on one X chromosome is enough for normal clotting. A female with genotype XʰXʰ would be affected, but this is much rarer because she must inherit the haemophilia allele from both parents.

This gives the usual pedigree pattern: more affected males than females, no father-to-son transmission for an X-linked allele, and carrier mothers who can have affected sons. Keep the allele symbols attached to X, not floating on their own, because the point is that the allele is carried on the sex chromosome.
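The pedigree pattern above can be checked by listing every egg and sperm combination. The sketch below is illustrative only: "XH" and "Xh" are plain-text stand-ins for Xᴴ and Xʰ, and the helper names `describe` and `x_linked_cross` are invented for this example.

```python
from collections import Counter
from itertools import product


def describe(child: tuple) -> str:
    """Label one child, given the (egg, sperm) chromosomes it inherited."""
    egg, sperm = child
    sex = "male" if sperm == "Y" else "female"
    x_chromosomes = [c for c in child if c != "Y"]
    recessive = x_chromosomes.count("Xh")
    if recessive == len(x_chromosomes):
        status = "affected"    # no normal allele present
    elif recessive > 0:
        status = "carrier"     # one normal allele masks the recessive one
    else:
        status = "unaffected"
    return f"{status} {sex}"


def x_linked_cross(mother: tuple, father: tuple) -> dict:
    """Return the probability of each outcome for mother x father."""
    outcomes = Counter(describe(child) for child in product(mother, father))
    total = sum(outcomes.values())
    return {label: count / total for label, count in outcomes.items()}


# Carrier mother x unaffected father
print(x_linked_cross(("XH", "Xh"), ("XH", "Y")))
# Homozygous normal mother x affected father
print(x_linked_cross(("XH", "XH"), ("Xh", "Y")))
```

In the second cross no son is affected, because a son receives the father's Y chromosome, not his X. All daughters are carriers.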

Reading family patterns instead of doing human crosses

A pedigree chart is a family-tree diagram that shows how a trait appears across generations. Human geneticists use pedigrees because controlled genetic crosses in people would not be ethical.

Standard conventions matter. Squares show males; circles show females. Shaded symbols mark affected individuals, horizontal lines join parents, and vertical lines lead to offspring. Roman numerals label generations, while Arabic numerals identify individuals within a generation.

To work out an inheritance pattern, start by looking for families that rule out one of the possibilities. Two unaffected parents with an affected child strongly suggest a recessive allele, because a dominant allele would normally have shown in at least one parent. If an affected father has all affected daughters but no affected sons, you may suspect an X-linked dominant pattern. When mostly males are affected and affected fathers do not pass the trait to sons, X-linked recessive inheritance is possible. If males and females are affected in similar numbers, autosomal inheritance is more likely.

A close relative is someone who shares a recent common ancestor, such as a sibling or first cousin. Many societies prohibit marriage between close relatives partly because it raises the probability that both parents carry the same rare recessive allele inherited from a shared ancestor. Their child is then more likely to inherit two copies and show the genetic disorder.

Inductive and deductive reasoning

Inductive reasoning forms a general conclusion from observations of some cases. In a pedigree, several affected children born to unaffected parents may lead you to the general hypothesis that the condition is recessive.

Deductive reasoning applies a general rule or hypothesis to predict or explain a specific case. Once you hypothesize “this disorder is autosomal recessive”, you can deduce that unaffected parents of an affected child must both be carriers.

A typical workflow is simple: observe part of the pedigree, induce a likely inheritance pattern, then use that pattern deductively to assign possible genotypes to individuals.

Continuous and discrete variation

Continuous variation is variation where phenotypes spread across a range, with many possible values in between. The required example is human skin colour. Discrete variation is variation where phenotypes fit into separate categories with no intermediate values, such as ABO blood group.

Compares continuous and discrete variation in inheritance examples.

| Aspect | Continuous variation | Discrete variation |
| --- | --- | --- |
| Phenotype pattern | Range of values with many intermediates | Separate categories with no intermediates |
| Typical graph | Histogram or frequency curve | Bar chart of category counts |
| Genetic influence | Often polygenic: several genes add small effects | Often determined by distinct alleles; ABO has A, B, AB or O categories |
| Environmental influence | Can modify phenotype, e.g. sunlight affects skin pigmentation | Usually does not change the category, e.g. ABO group |
| Examples | Skin colour; height; mass | ABO blood group |

Several genes affect melanin production and distribution, so they influence skin colour. Environmental exposure to sunlight also affects it. Polygenic inheritance is inheritance of a trait influenced by two or more genes. When several genes each add small effects, the pattern often looks like a continuous distribution, not a set of neat categories.

ABO blood group works differently. It is a discrete variable: a person is A, B, AB or O. You do not measure someone as 37% blood group A. Skin pigmentation, height and mass, by contrast, can vary along a scale and can be measured with units or ordered continuously.

Measures of central tendency

A measure of central tendency is a statistic that shows the centre or typical value of a data set. The mean is the sum of all values divided by the number of values. The median is the middle value once the values have been placed in order. The mode is the most frequent value.

For many continuous data sets that are roughly symmetrical, the mean is useful. The median often works better when the data are skewed or contain outliers. The mode is useful for categorical data such as ABO blood group, because “average blood group” is meaningless.

For a data set, the mean is calculated as x̄ = Σx / n, where x̄ is the mean value of the variable (same unit as the measured variable), Σx is the sum of all measured values (same unit as the measured variable), and n is the number of values (dimensionless).
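The three measures are quick to compute with Python's built-in `statistics` module. The heights and blood groups below are invented values for illustration, not real survey data.

```python
from statistics import mean, median, mode

# Hypothetical student heights in cm (continuous variable)
heights_cm = [158, 162, 165, 165, 170, 172, 177]

# Mean by the formula: sum of values divided by number of values
mean_by_formula = sum(heights_cm) / len(heights_cm)

print(mean_by_formula)     # 167.0
print(mean(heights_cm))    # same value from the library function
print(median(heights_cm))  # 165, the middle value of the ordered list
print(mode(heights_cm))    # 165, the most frequent value

# Hypothetical blood groups (categorical variable): only the mode makes sense
blood_groups = ["A", "O", "O", "B", "A", "O"]
print(mode(blood_groups))  # O
```

Calling `mean` or `median` on the blood group list would fail or be meaningless, which is the point made above about “average blood group”.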

What a box-and-whisker plot shows

A box-and-whisker plot is a graph showing the spread and centre of a continuous data set, using the median, quartiles, minimum, maximum and outliers. It works well for variables such as student height, where you often want to compare spread and skew at a glance.

A good box-and-whisker plot has six required features: outliers, minimum, first quartile, median, third quartile and maximum. The box goes from the first quartile to the third quartile. Inside the box, a line marks the median. The whiskers reach out to the minimum and maximum values that are not outliers.

The interquartile range is the spread of the middle 50% of the data. IQR = Q₃ − Q₁, where IQR is the interquartile range (same unit as the measured variable), Q₃ is the third quartile (same unit as the measured variable), and Q₁ is the first quartile (same unit as the measured variable).

A data point counts as an outlier if it is more than 1.5 × IQR above the third quartile or more than 1.5 × IQR below the first quartile. So the upper outlier boundary is Q₃ + 1.5 × IQR, and the lower outlier boundary is Q₁ − 1.5 × IQR.

When constructing a box-and-whisker plot, order the data first. Then find the median, first quartile and third quartile before adding the whiskers. That order prevents most errors.
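The construction steps can be sketched in code as a way to check hand calculations. Two assumptions to note: the heights are invented, and the quartiles are found as the medians of the lower and upper halves of the ordered data (leaving out the median itself when the number of values is odd). Calculators and software sometimes use slightly different quartile conventions, so small differences are possible. The function name `box_plot_summary` is made up for this example.

```python
from statistics import median


def box_plot_summary(values):
    """Return the values needed to draw a box-and-whisker plot."""
    data = sorted(values)  # order the data first
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # With an odd number of values, leave the median out of both halves.
    upper_half = data[half + 1:] if n % 2 else data[half:]

    q1, q2, q3 = median(lower_half), median(data), median(upper_half)
    iqr = q3 - q1
    lower_boundary = q1 - 1.5 * iqr
    upper_boundary = q3 + 1.5 * iqr

    outliers = [x for x in data if x < lower_boundary or x > upper_boundary]
    kept = [x for x in data if lower_boundary <= x <= upper_boundary]
    return {
        "minimum": kept[0],   # smallest value that is not an outlier
        "Q1": q1,
        "median": q2,
        "Q3": q3,
        "maximum": kept[-1],  # largest value that is not an outlier
        "IQR": iqr,
        "outliers": outliers,
    }


# Hypothetical student heights in cm
heights_cm = [150, 158, 160, 162, 165, 166, 168, 170, 172, 174, 198]
print(box_plot_summary(heights_cm))
```

For these values Q₁ = 160, Q₃ = 172 and IQR = 12, so the outlier boundaries are 142 and 190. The value 198 lies above the upper boundary and is plotted as an outlier, so the upper whisker stops at 174.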

Segregation: alleles separate into gametes

Segregation means the two alleles of a gene separate into different gametes during meiosis. A diploid cell carries two alleles of most autosomal genes; a haploid gamete gets only one. That’s why a heterozygote can make two gamete types for a gene.

Independent assortment: unlinked genes separate independently

Independent assortment means alleles of different genes separate into gametes independently of one another. It applies to unlinked genes: genes on different chromosomes, or far enough apart on the same chromosome that crossing over makes their inheritance effectively independent.

Chromosome movement explains it. In metaphase I of meiosis, homologous chromosome pairs line up at the equator, and each pair’s orientation is random. In anaphase I, homologous chromosomes move to opposite poles. Since each pair orients independently, the allele inherited for one gene does not determine the allele inherited for another unlinked gene.

This connects directly to dihybrid crosses. A double heterozygote for two unlinked genes, such as AaBb, can produce AB, Ab, aB and ab gametes in equal proportions. That equal production gives the familiar dihybrid ratios in the next section.

The broader “doubling and halving” idea appears throughout biology. Meiosis halves chromosome number to make gametes, and fertilization restores the diploid number. DNA replication in S phase doubles DNA before division. Glycolysis even splits a six-carbon glucose molecule into two three-carbon molecules, so halving can happen at the molecular level as well as at the chromosome level.

Dihybrid crosses and the 9:3:3:1 ratio

A dihybrid cross follows the inheritance of two genes at once. With two unlinked autosomal genes showing complete dominance, crossing two double heterozygotes, AaBb × AaBb, produces four gamete types from each parent: AB, Ab, aB and ab.

A 4 × 4 Punnett grid brings those gametes together. You get a 9:3:3:1 phenotypic ratio when both genes show complete dominance and both parents are heterozygous for both genes. Out of the offspring, 9 show both dominant phenotypes, 3 show the first dominant and second recessive phenotype, 3 show the first recessive and second dominant phenotype, and 1 shows both recessive phenotypes.
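A 4 × 4 grid has sixteen cells, and counting them by hand is where slips happen. The sketch below counts them for you. It assumes two unlinked genes with alleles A/a and B/b and complete dominance of the capital-letter allele; the function names are invented for this example.

```python
from collections import Counter
from itertools import product


def gametes(genotype: str) -> list:
    """Gametes for two unlinked genes, e.g. "AaBb" gives AB, Ab, aB, ab."""
    gene1, gene2 = genotype[:2], genotype[2:]
    return [allele1 + allele2 for allele1, allele2 in product(gene1, gene2)]


def phenotype(gamete1: str, gamete2: str) -> str:
    """Phenotype label for the offspring formed by two gametes.

    A capital letter in the label means the dominant phenotype is shown
    for that gene; a lower-case letter means the recessive phenotype.
    """
    trait1 = "A" if "A" in (gamete1[0], gamete2[0]) else "a"
    trait2 = "B" if "B" in (gamete1[1], gamete2[1]) else "b"
    return trait1 + trait2


def dihybrid_phenotypes(parent1: str, parent2: str) -> Counter:
    """Count phenotypes over every cell of the Punnett grid."""
    grid = product(gametes(parent1), gametes(parent2))
    return Counter(phenotype(g1, g2) for g1, g2 in grid)


counts = dihybrid_phenotypes("AaBb", "AaBb")
print(counts["AB"], counts["Ab"], counts["aB"], counts["ab"])  # 9 3 3 1
```

The 9:3:3:1 result depends on the equal gamete proportions built into `gametes`, which is exactly the assumption that fails for linked genes.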

Test crosses and the 1:1:1:1 ratio

Crossing a double heterozygote with a double homozygous recessive, AaBb × aabb, gives a dihybrid test cross. The recessive parent can produce only ab gametes, so the offspring phenotypes show directly which gametes the double heterozygote made. If the genes are unlinked, AB, Ab, aB and ab gametes have equal probability, producing a 1:1:1:1 phenotypic ratio.
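A short sketch shows why the test cross “reads out” the heterozygote's gametes. It assumes unlinked genes with alleles A/a and B/b, so that the four gamete types are equally likely.

```python
from collections import Counter
from itertools import product

# Gametes of the double heterozygote AaBb: AB, Ab, aB, ab
heterozygote_gametes = ["".join(pair) for pair in product("Aa", "Bb")]
# The double homozygous recessive parent aabb can make only one gamete type
tester_gametes = ["ab"]

offspring = Counter()
for het, tester in product(heterozygote_gametes, tester_gametes):
    # Write gene 1 alleles first, then gene 2 alleles, e.g. "AaBb"
    genotype = het[0] + tester[0] + het[1] + tester[1]
    offspring[genotype] += 1

print(dict(offspring))  # {'AaBb': 1, 'Aabb': 1, 'aaBb': 1, 'aabb': 1}
```

Each offspring genotype differs only in the gamete it received from the heterozygote, so the four offspring classes appear in the same proportions as those gametes.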

Mendel’s second law has conditions

The 9:3:3:1 and 1:1:1:1 ratios come from what is often called Mendel’s second law: alleles of one gene assort into gametes independently of alleles of another gene. Be careful with the word “law”. In biology, a law is a reliable prediction under specified conditions, not an exception-proof commandment.

This law works when genes are on different chromosomes, or when they are far enough apart on the same chromosome that recombination reaches about 50%. Linked genes, selection, small sample sizes, epistasis and other factors can all cause observed ratios to differ from the simple prediction.

Genes have positions and products

A locus is the exact position of a gene on a chromosome. In humans, protein-coding genes occur on autosomes 1–22 and on the sex chromosomes X and Y. The base sequence of a gene determines the amino acid sequence of its polypeptide product, though different alleles may have small sequence differences.

A polypeptide product is the chain of amino acids made when a coding gene is expressed and translated. For example, a typical database entry for a human gene gives the chromosome, locus, gene name, transcript information and protein product.

Using databases properly

When working with databases, you should be able to locate genes on different chromosomes, as well as genes close together on the same chromosome. Ensembl, NCBI Gene and OMIM are examples of databases used for this.

A sensible workflow is:

  1. Search the gene name or disease-associated variant.
  2. Record the chromosome and locus.
  3. Identify the polypeptide product.
  4. Compare two genes on different chromosomes with two genes near each other on the same chromosome.

Genes on different chromosomes are unlinked. Genes close together on the same chromosome may be linked and may fail to assort independently. That is why database skills connect directly to inheritance patterns: the physical location of genes helps explain the ratios seen in crosses.

Linked genes travel together more often than expected

Gene linkage is the tendency of genes located close together on the same chromosome to be inherited together. Autosomal gene linkage is gene linkage involving genes on autosomes rather than sex chromosomes.

Linked genes may not assort independently because they sit on the same physical DNA molecule. During meiosis, a chromosome carrying one allele of the first gene will probably carry the nearby allele of the second gene into the same gamete. Crossing over can split them apart, but when the genes are close together, it happens relatively rarely.

In crosses involving linkage, write allele symbols beside vertical lines that represent homologous chromosomes. This isn’t just decoration; it shows which allele combinations are on the same chromosome. For example, AB on one vertical chromosome line and ab on the homologous line is a different arrangement from Ab and aB.

Suspect linked genes when offspring ratios differ significantly from the ratios expected for independent assortment. Parent-like combinations are usually more common, while recombinant combinations are less common. A chi-squared test is used later to decide whether the difference is large enough to count as statistically significant rather than ordinary sampling variation.

What counts as recombinant?

A recombinant is a gamete, genotype or phenotype that has a new combination of alleles when you compare it with the parental combinations. The term can apply at three levels, so say exactly what you mean: recombinant gametes, recombinant offspring genotypes, or recombinant offspring phenotypes.

For unlinked genes, recombinants come from the random orientation of homologous chromosome pairs during meiosis I. A double heterozygote can produce all four gamete combinations in equal proportions. In a test cross with a double homozygous recessive, the offspring phenotypes directly show those gametes, so you expect a 1:1:1:1 ratio.

For linked genes, crossing over between the two loci produces recombinants. If the parental arrangement is AB/ab, then AB and ab are parental gametes, while Ab and aB are recombinant gametes. In a test cross AB/ab × ab/ab, offspring with genotypes AaBb and aabb are parental types, while Aabb and aaBb are recombinants.

Start by working out the allele combinations in the original parents. Only after that can you decide which offspring combinations are new. A common mistake is to look only at dominance, but recombination is about allele combinations compared with the parents, not about whether a phenotype is dominant or recessive.
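The sketch below sorts test-cross offspring into parental and recombinant types once the parental arrangement is known. The offspring counts are invented for illustration, and `classify_offspring` is a name made up for this example.

```python
def classify_offspring(counts: dict, parental: set) -> dict:
    """Split test-cross offspring into parental and recombinant types.

    counts   : offspring genotype -> number observed
    parental : genotypes produced by the parental (non-crossover) gametes
    """
    parental_total = sum(n for g, n in counts.items() if g in parental)
    recombinant_total = sum(n for g, n in counts.items() if g not in parental)
    total = parental_total + recombinant_total
    return {
        "parental": parental_total,
        "recombinant": recombinant_total,
        "recombinant_fraction": recombinant_total / total,
    }


# Hypothetical offspring counts from a test cross with aabb
counts = {"AaBb": 42, "aabb": 38, "Aabb": 11, "aaBb": 9}

# Heterozygous parent with arrangement AB/ab: AaBb and aabb are parental
print(classify_offspring(counts, parental={"AaBb", "aabb"}))

# Same counts, but arrangement Ab/aB: now Aabb and aaBb are parental
print(classify_offspring(counts, parental={"Aabb", "aaBb"}))
```

With the AB/ab arrangement, 20 of 100 offspring are recombinant, well below the half expected for unlinked genes. With Ab/aB the very same counts would give 80 recombinants, which is why the parental arrangement has to be settled first.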

Why chi-squared is used

A chi-squared test compares observed frequencies with expected frequencies to judge goodness of fit. In genetics, it’s used to decide whether the numbers of offspring observed match an expected ratio, such as 9:3:3:1 or 1:1:1:1.

The null hypothesis is a testable statement that there is no significant difference between observed and expected results. The alternative hypothesis is a testable statement that there is a significant difference between observed and expected results.

For a dihybrid cross, a typical null hypothesis might be: “The observed phenotypic frequencies fit a 9:3:3:1 ratio.” The alternative would be: “The observed phenotypic frequencies do not fit a 9:3:3:1 ratio.”

Observed, expected and significance

Observed results are the actual counts collected. Expected results are the counts predicted by the genetic model. To calculate expected numbers, multiply the total offspring by the expected fraction for each phenotype.

Calculate the chi-squared statistic as χ² = Σ((O − E)² / E), where χ² is the chi-squared test statistic (dimensionless), Σ means sum over all categories (dimensionless), O is the observed frequency in a category (individuals), and E is the expected frequency in that category (individuals).

The degrees of freedom for a goodness-of-fit test is usually the number of phenotypic categories minus one. For four dihybrid phenotype classes, degrees of freedom = 3.

A p-value is the probability of obtaining results at least as different from expected as the observed results, assuming the null hypothesis is true. At the p = 0.05 level, where p is the probability value (dimensionless), results are treated as statistically significant if a deviation at least as large as the one observed would arise from chance sampling alone less than 5% of the time when the null hypothesis is true.

Worked chi-squared goodness-of-fit test for a dihybrid 9:3:3:1 ratio.

| Phenotype class | Observed O / offspring | Expected E / offspring | O − E / offspring | (O − E)² / E |
| --- | --- | --- | --- | --- |
| Both dominant | 95 | 90 | 5 | 0.28 |
| Trait 1 dominant only | 25 | 30 | −5 | 0.83 |
| Trait 2 dominant only | 32 | 30 | 2 | 0.13 |
| Both recessive | 8 | 10 | −2 | 0.40 |

χ² total = 1.64; df = 3; decision at p = 0.05: 1.64 < 7.82, so do not reject H₀.

If calculated χ² is greater than the critical value for the correct degrees of freedom at p = 0.05, reject the null hypothesis. If calculated χ² is equal to or less than the critical value, do not reject the null hypothesis. Use “do not reject” rather than “prove”, because statistics does not prove the model true; it only shows whether the data give enough evidence against it.
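The worked example can be reproduced in a few lines, which is a useful check on your arithmetic. This sketch uses the observed counts from the worked table and takes the critical value 7.82 (df = 3, p = 0.05) from the same table instead of calculating it; `chi_squared` is a name chosen for this example.

```python
def chi_squared(observed, ratio):
    """Goodness-of-fit chi-squared for observed counts against a ratio."""
    total = sum(observed)
    parts = sum(ratio)
    # Expected count = total offspring x expected fraction for that class
    expected = [total * r / parts for r in ratio]
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return expected, contributions, sum(contributions)


observed = [95, 25, 32, 8]  # counts from the worked table
expected, contributions, chi2 = chi_squared(observed, ratio=[9, 3, 3, 1])

degrees_of_freedom = len(observed) - 1  # number of categories minus one
CRITICAL_VALUE = 7.82                   # df = 3, p = 0.05, from the table

print(expected)                              # [90.0, 30.0, 30.0, 10.0]
print([round(c, 2) for c in contributions])  # [0.28, 0.83, 0.13, 0.4]
print(round(chi2, 2))                        # 1.64

if chi2 > CRITICAL_VALUE:
    print("Reject H0: the data do not fit the expected ratio")
else:
    print("Do not reject H0: the data are consistent with the expected ratio")
```

For other numbers of categories you would need the critical value for the matching degrees of freedom from a chi-squared table.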

Samples, populations and effective sampling

Statistical tests usually use a sample to represent a population. In a genetics cross, the F2 generation is the sample; in many experiments, repeated measurements or replicates are the sample. Good sampling needs a representative sample, chosen without bias, and large enough to reduce random error. Random sampling means every member of the population has an equal chance of selection, as when quadrat positions are chosen randomly in ecology. The same idea appears in mark-release-recapture methods such as the Lincoln index: the sampled individuals must represent the wider population.

This is the link between genetics and ecology statistics: a p-value is only meaningful if the data come from a sampling method that justifies treating the sample as a model of the population.
