BGGN-213: Dot Plot Comparison of Two Sequences


Dot plots are a simple graphical approach for the visual comparison of two sequences. They have a long history (see Maizel and Lenk 1981 and references therein) and entail placing one sequence on the vertical axis of a 2D grid (or matrix) and the other on the horizontal axis. In the simplest form, a dot is placed wherever the horizontal and vertical sequence characters match; that is, a dot is produced at position (i,j) if character number i in the first sequence is the same as character number j in the second sequence. More elaborate forms use 'sliding windows' of multiple characters together with a threshold value, or 'match stringency', that sets how many characters must match for two windows to be considered a match.
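The simplest form described above can be sketched in a few lines of Python. This is an illustrative sketch, not the code behind the interactive plots on this page; the function names `dot_plot` and `render` are hypothetical. Note that Python counts from 0, so row i and column j here correspond to character numbers i+1 and j+1.

```python
def dot_plot(seq1: str, seq2: str) -> list[list[bool]]:
    """Simplest dot plot: cell (i, j) is True when seq1[i] == seq2[j].

    Rows index seq1 (vertical axis); columns index seq2 (horizontal axis).
    """
    return [[a == b for b in seq2] for a in seq1]


def render(matrix: list[list[bool]]) -> str:
    """Render the matrix as text, using '*' for a dot and '.' for no dot."""
    return "\n".join("".join("*" if hit else "." for hit in row) for row in matrix)


if __name__ == "__main__":
    # Comparing a sequence with itself always fills the main diagonal.
    print(render(dot_plot("GATTACA", "GATTACA")))
```

Comparing a sequence with itself is a useful first check: every identical pair of positions lies on the main diagonal, and any other dots come from repeated characters elsewhere in the sequence.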

Dot Plot Parameters

Alter the parameters below to change the displayed protein and DNA dot plots. A good feel for these parameters will be important later, when we get to heuristic alignment approaches.

Match stringency specifies the number of matching characters required per window. It should not be larger than your window size!
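The two parameters can be combined into a sliding-window dot plot, sketched below under a few assumptions: matches are counted position by position between two equal-length windows, and the dot for a window pair is recorded at the windows' starting positions (i, j). Some tools place the dot at the window centre instead. The function name `windowed_dot_plot` and its parameter names are hypothetical and are not taken from the interactive plots on this page.

```python
def windowed_dot_plot(
    seq1: str, seq2: str, window_size: int, stringency: int
) -> list[list[bool]]:
    """Sliding-window dot plot (illustrative sketch).

    Cell (i, j) is True when the window of length `window_size` starting at
    seq1[i] and the window starting at seq2[j] have at least `stringency`
    identical characters at corresponding positions.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    # Number of complete windows that fit in each sequence.
    rows = max(len(seq1) - window_size + 1, 0)
    cols = max(len(seq2) - window_size + 1, 0)
    matrix = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Count position-by-position identities between the two windows.
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window_size))
            row.append(matches >= stringency)
        matrix.append(row)
    return matrix
```

With `window_size=1` and `stringency=1` this reduces to the simplest single-character dot plot. A call such as `windowed_dot_plot(a, b, window_size=10, stringency=7)` keeps a dot only where at least 7 of the 10 aligned positions in the two windows are identical.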

Questions for discussion:

  • Why does the DNA dot plot have more dots than the protein dot plot?
  • How can we increase the signal-to-noise ratio?
  • What does a 'Match stringency' larger than 'Window size' yield and why?
  • What would off-diagonal runs of dots represent?
  • What are the major weaknesses of this approach?