DNA sequencing: Decoding the genetic blueprint

DNA sequencing is a technique that determines the precise order of nucleotides - adenine (A), thymine (T), cytosine (C), and guanine (G) - in a DNA strand. This sequence constitutes the genetic blueprint of an organism, encoding instructions for the development, function, and reproduction of all living beings.

The specific sequence of the nucleotide bases encodes genetic information, which directs the production of proteins in each cell. These proteins determine cell identity, organismal traits, and overall biological function. Aberrations in this process may lead to disease. Variations in the arrangement of the bases give rise to genetic diversity both within a species and between different species. By decoding this blueprint, scientists can gain insights into genetic variations, evolutionary relationships, and the molecular basis of diseases.

Advances in DNA sequencing have underpinned and revolutionized fields such as genomics, personalized medicine, forensic science, and biotechnology, enabling more accurate diagnoses, targeted therapies, and a deeper understanding of life’s complexity.

Understanding the basics of DNA

DNA is a double-stranded polynucleotide chain that carries the genetic information necessary for the development and functioning of an organism. It is composed of nucleotides, which are considered the building blocks of DNA.

Chemically, DNA consists of two strands, each made up of nucleotides. Each nucleotide comprises a deoxyribose sugar, a phosphate group, and a nitrogenous base. Each sugar molecule is attached to one of four nitrogenous bases: adenine, cytosine, guanine, or thymine. Hydrogen bonds between the bases hold the two strands of DNA together: cytosine pairs with guanine through three hydrogen bonds, while adenine pairs with thymine through two.
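The pairing rules can be illustrated with a short Python sketch. The function names are illustrative only, and the reverse-complement helper assumes the standard convention that the two strands run in opposite directions, which this article does not discuss in detail.

```python
# Base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of each nucleotide in the strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own direction (assumes antiparallel strands)."""
    return complement(strand)[::-1]


def hydrogen_bonds(strand: str) -> int:
    """Count hydrogen bonds in the duplex: 3 per C-G pair, 2 per A-T pair."""
    return sum(3 if base in "CG" else 2 for base in strand.upper())


if __name__ == "__main__":
    seq = "ATGCCG"
    print(complement(seq))          # TACGGC
    print(reverse_complement(seq))  # CGGCAT
    print(hydrogen_bonds(seq))      # 16
```

Because C-G pairs carry three hydrogen bonds and A-T pairs carry two, a sequence richer in C and G has more hydrogen bonds per base pair.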

The DNA strand is formed as nucleotides link together into chains. This linkage occurs between the phosphate group of one nucleotide and the sugar molecule of the next, creating an alternating sugar-phosphate backbone. These connections ensure the stability and structure of the DNA double helix.

DNA passes hereditary information from parents to offspring. It carries the genetic instructions that determine the traits and characteristics of living organisms, ensuring the continuity of genetic information across generations.

The "DNA-RNA-protein" pathway, which entails the transcription of genetic information from DNA into RNA and subsequent translation into proteins, is a fundamental tenet of molecular biology. An organism’s structure, function, and, ultimately, its cellular organization are all determined by these proteins, which act as its building blocks. Beyond heredity, DNA plays a vital role in protein synthesis. It contains specific sequences, or genes, responsible for the expression of proteins. The process of converting DNA into protein occurs in two main steps: transcription (DNA to RNA) and translation (RNA to protein).

These processes are essential for gene expression and the functioning of all living cells, making DNA the cornerstone of biological life.
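As a rough illustration of transcription and translation, the sketch below converts a DNA coding sequence into mRNA and then into a protein string using the standard genetic code. It assumes the input is the coding (sense) strand starting at the first codon, and it ignores real-world details such as promoters, splicing, and start-codon search; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
# '*' marks a stop codon; letters are one-letter amino acid codes.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Transcription: the mRNA matches the coding strand with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translation: read codons (triplets) until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    dna = "ATGGCCTTTTAA"
    mrna = transcribe(dna)
    print(mrna)             # AUGGCCUUUUAA
    print(translate(mrna))  # MAF
```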

Genome sequencing

The genome is the complete set of genetic material present in an organism (DNA, or RNA in the case of some viruses). Genome sequencing involves determining the entire DNA (or RNA) sequence of an organism, providing a comprehensive view of its genetic information.

Genome sequencing is now performed using automated DNA sequencing techniques and computer software. The process involves multiple stages, including:

Whole genome sequencing (WGS) allows the mapping of all the genes, regulatory regions, and other elements of DNA that contribute to the different characteristics of organisms. The Human Genome Project was initiated in 1990, and in 2003, a genome sequence encompassing more than 90% of the human genome was generated. The first complete sequence of a human genome was reported in 2022.

Key methods of DNA sequencing

DNA sequencing technologies have evolved through three generations, each bringing significant advancements in throughput, cost-efficiency, and technological capabilities.

First-generation sequencing: Sanger sequencing

Initially, DNA sequencing was achieved by methods like Sanger sequencing and focused on sequencing short fragments of DNA. Sanger sequencing, which utilizes the chain termination method, was developed by Frederick Sanger in 1977. This technique is based on the principle that DNA synthesis terminates at specific points when synthetic dideoxynucleotides (ddNTPs: ddATP, ddCTP, ddGTP, and ddTTP) are incorporated into the growing DNA strand.

As ddNTPs lack a 3’ hydroxyl group in the deoxyribose sugar, further elongation of the DNA is prevented, and fragments of varying lengths are produced. These fragments are then separated by size using gel electrophoresis or capillary electrophoresis. The sequence is then read base by base from the fragment lengths at which the chain was terminated.
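The read-out logic can be sketched in a few lines of Python. This is an idealized toy model, not a description of any instrument's software: it assumes exactly one terminated fragment per length and that each fragment's final base (the incorporated ddNTP) can be identified. The function names are hypothetical.

```python
import random


def chain_terminated_fragments(synthesized: str) -> list[tuple[int, str]]:
    """Model each terminated fragment as (length, terminal base).

    A fragment of length n ends with the ddNTP that stopped synthesis
    at position n of the newly synthesized strand.
    """
    return [(length, synthesized[length - 1])
            for length in range(1, len(synthesized) + 1)]


def read_by_size(fragments: list[tuple[int, str]]) -> str:
    """Order fragments from shortest to longest, as electrophoresis does,
    and read off the terminal base of each one."""
    return "".join(base for _, base in sorted(fragments))


if __name__ == "__main__":
    strand = "GATTACA"
    fragments = chain_terminated_fragments(strand)
    random.Random(0).shuffle(fragments)  # fragments start out as an unordered mixture
    print(read_by_size(fragments))       # GATTACA
```

Sorting by length stands in for the size separation step: once the fragments are ordered, the terminal bases spell out the synthesized strand.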

Sanger sequencing is extremely precise and remains the gold standard for smaller-scale sequencing operations, such as sequencing individual genes. However, it is labor-intensive, time-consuming, and more expensive than modern approaches such as next-generation sequencing (NGS), which makes it less feasible for sequencing entire genomes or large sets of genetic data.

Capillary electrophoresis (CE) is a technique used to separate components of a chemical mixture within a narrow capillary tube under the influence of an electric field. CE separates DNA fragments based on size. When DNA is loaded into a capillary tube and an electric field is applied, the fragments migrate at different speeds. Smaller fragments move through the gel-like matrix inside the capillary faster than larger ones, so the fragments can be precisely separated by size.

CE is commonly used for Sanger sequencing, which determines the sequence of DNA fragments. It is also widely employed to measure fragment length, with applications such as short tandem repeat profiling, which is used in forensic science and paternity testing [1].

Second and third-generation sequencing: Next-generation sequencing (NGS)

The second generation of DNA sequencing brought significant advances, marked by increased throughput and drastically reduced costs and turnaround times. This generation enabled the sequencing of whole genomes and transcriptomes, which are collections of all RNA transcripts transcribed by a single cell or a population of cells at a given point in time. These advances made large-scale sequencing projects more feasible and efficient, allowing researchers to explore genetic data on a much larger scale.

The third generation of DNA sequencing introduced single-molecule sequencing, which does not require prior amplification of DNA. This generation continues to push technological boundaries, offering real-time sequencing with greater precision and speed. The ability to sequence DNA molecules directly and in real-time has opened novel possibilities for genetic research, providing more detailed and accurate insights into the genetic makeup of organisms.

The advancements in the second and third generations are collectively referred to as next-generation sequencing (NGS) and have revolutionized genomics research by making complex sequencing tasks faster, more affordable, and more accurate.

NGS is a set of high-throughput technologies used to rapidly sequence either long reads or short reads, allowing sequencing of as much or as little of the genome as desired. Unlike Sanger sequencing, which processes one DNA fragment at a time, NGS sequences millions of DNA fragments at once.

Sanger sequencing yields a read depth of just 1 (or 2 with bidirectional sequencing). Its individual reads are highly accurate and typically about 800 base pairs long, but the depth of coverage is low, and a run takes about 7 hours. By comparison, NGS technologies such as sequencing by synthesis take anywhere from 56 hours to 14 days, depending on the platform and sequencing depth, while a nanopore sequencing run takes roughly 0.5 to 4 hours.

NGS sequences millions of DNA fragments in a massively parallel fashion, and coverage depends on the chosen sequencing depth, which can be high. NGS reads are aligned into a consensus sequence, and errors, including those from PCR amplification, are filtered out through statistical analysis. As a result, the accuracy of current NGS technologies is over 99%.
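To make depth and consensus concrete, here is a minimal Python sketch. It uses the common rule of thumb that mean depth is roughly (number of reads × read length) / genome size, which is an assumption added here rather than a formula from this article. The consensus step is a simple majority vote over reads that are assumed to be already aligned and padded with "-" where they do not cover a position; real pipelines use more sophisticated statistical models.

```python
from collections import Counter


def mean_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Approximate average number of reads covering each base."""
    return num_reads * read_length / genome_size


def consensus(aligned_reads: list[str]) -> str:
    """Majority vote per column; '-' means the read does not cover that position.

    Columns with no coverage are reported as 'N'.
    """
    result = []
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base != "-")
        result.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(result)


if __name__ == "__main__":
    # Hypothetical run: one million 150-base reads over a 5-megabase genome.
    print(mean_depth(1_000_000, 150, 5_000_000))  # 30.0

    reads = [
        "ACGTAC--",
        "ACGAACGT",  # contains one sequencing error at the fourth base
        "--GTACGT",
    ]
    print(consensus(reads))  # ACGTACGT
```

With three reads covering the erroneous position, the two correct reads outvote the error, which is why higher depth generally supports a more reliable consensus.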

Sanger sequencing is primarily used for targeted sequencing of small regions of DNA, whereas NGS is widely used for genomic analyses such as whole genome sequencing, exome sequencing, and RNA sequencing due to its ability to sequence large amounts of data quickly and cost-effectively.

The NGS technique consists of multiple steps:

Data analysis (aligning sequences to a reference genome and finding differences).
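The data analysis step can be illustrated with a deliberately simple Python sketch: it slides a read along a reference, keeps the offset with the fewest mismatches, and reports the differing positions. This is a toy, ungapped approach chosen for clarity; production aligners use indexed, gap-aware algorithms, and the function names and sequences below are made up for illustration.

```python
def best_ungapped_alignment(reference: str, read: str) -> tuple[int, int]:
    """Return (offset, mismatches) for the best placement of the read.

    Tries every offset and keeps the first one with the fewest mismatches.
    Assumes the read is no longer than the reference and contains no indels.
    """
    best_offset, best_mismatches = 0, len(read) + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(window, read))
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches


def find_differences(reference: str, read: str, offset: int) -> list[tuple[int, str, str]]:
    """List (1-based reference position, reference base, read base) for each mismatch."""
    return [
        (offset + i + 1, reference[offset + i], base)
        for i, base in enumerate(read)
        if reference[offset + i] != base
    ]


if __name__ == "__main__":
    reference = "TTGACCGTAAGC"
    read = "CCGTTA"
    offset, mismatches = best_ungapped_alignment(reference, read)
    print(offset, mismatches)                         # 4 1
    print(find_differences(reference, read, offset))  # [(9, 'A', 'T')]
```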

Long-read sequencing techniques: Third-generation long-read sequencing technologies (such as nanopore sequencing, which involves threading DNA molecules through nanopores and detecting changes in electrical current to determine the sequence) enable reading longer stretches of DNA in a single pass. This method is beneficial for sequencing complex portions of the genome, particularly repeated sequences.

Data analysis and interpretation in bioinformatics

After sequencing is complete, the data can be processed and evaluated using bioinformatics techniques. Bioinformatics analysis comprises quality control, genomic alignment, variant calling, and functional annotation. The goal of interpretation is to detect genetic alterations, understand their significance in the organism’s body, and associate them with diseases or phenotypic features.

The sequence information can also be applied to fields like species identification (comparing the DNA of unknown organisms with sequences of known species and determining the species based on distinctive variances in their DNA sequence), pharmacogenomics (the effect of a patient’s genome on their response to medicines), ancestry analysis (tracing genetic lineage and population history), and forensic investigations (analyzing DNA for criminal or identity verification).
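As a hedged illustration of species identification, the sketch below compares an unknown sequence against known references using k-mer Jaccard similarity. This is one simple alignment-free measure chosen for brevity, not necessarily what any particular identification tool uses. The sequences and species names are invented for the example.

```python
def kmers(sequence: str, k: int) -> set[str]:
    """Return the set of all overlapping substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_similarity(seq_a: str, seq_b: str, k: int = 4) -> float:
    """Shared k-mers divided by total distinct k-mers (0.0 if both sets are empty)."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def identify(unknown: str, references: dict[str, str], k: int = 4) -> str:
    """Return the name of the reference most similar to the unknown sequence."""
    return max(references, key=lambda name: jaccard_similarity(unknown, references[name], k))


if __name__ == "__main__":
    # Made-up reference sequences, for illustration only.
    references = {
        "species_A": "ACGTACGTGGCCTTAA",
        "species_B": "TTTTGGGGCCCCAAAA",
    }
    unknown = "ACGTACGTGGCCTTAT"
    print(identify(unknown, references))  # species_A
```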

Importance and applications of DNA sequencing

DNA sequencing has become indispensable in science and medicine, providing critical insights into genetic information and driving advancements in research, diagnostics, and therapeutic development.

Genomics and population genetics

Personalized medicine and pharmacogenomics

Evolutionary biology and phylogenetics

Forensics and criminal investigations

Agricultural and environmental applications

Challenges in DNA sequencing

New techniques, such as CRISPR-based sequencing, are enhancing DNA sequencing by enabling precise genome editing and analysis of specific regions. Additionally, advances in NGS have improved accuracy, throughput, cost-effectiveness, and scalability, along with more efficient data analysis.

Third-generation sequencing technologies, such as nanopore sequencing, offer long-read capabilities, shorter turnaround times, and portability. These features have the potential to transform clinical diagnostics, microbiome research, and real-time disease monitoring. Artificial intelligence (AI) and machine learning (ML) are also advancing the interpretation of sequencing data: they can improve the accuracy of variant detection, speed up data processing, and reveal previously unavailable insights. Collectively, these technologies will shape the future of personalized medicine and genetic research.

FAQs

What are the advantages of de novo sequencing over other methods?

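De novo sequencing reconstructs a sequence directly from experimental data without relying on a reference. In proteomics, for example, its main advantage is that it infers full-length or partial tag-based peptide sequences from tandem mass spectrometry spectra without the need for a reference database. In genomics, de novo assembly likewise builds a genome from sequencing reads without a reference genome.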

DNA sequencing poses serious privacy concerns since it may expose sensitive medical information. Genetic databases can be used to reveal identities or connect personal health information to public profiles. Individuals can be identified using bioinformatics systems, which raises the risk of data breaches. Third parties may use genomic data to derive health information, influencing insurance, employment, or legal results.

How can individuals access DNA sequencing services?

Individuals can obtain DNA sequencing services from direct-to-consumer companies that provide at-home kits. Additionally, healthcare providers or genetic counselors may order sequencing for medical purposes, which is often done through specialist clinics or laboratories. Some research facilities offer sequencing services for specialized investigations.

References

  1. Karger, B. L., & Guttman, A. (2009). DNA sequencing by CE. Electrophoresis, 30(Suppl 1), S196–S202. https://doi.org/10.1002/elps.200900218