BGGN-213: Dot Plot Comparison of Two Sequences

Dot plots are a simple graphical approach for the visual comparison of two sequences. They have a long history (see Maizel and Lenk 1981 and references therein) and entail placing one sequence on the vertical axis of a 2D grid (or matrix) and the other on the horizontal axis. In the simplest form, a dot is placed wherever the horizontal and vertical sequence values match. That is, a dot is produced at position (i,j) if character number i in the first sequence is the same as character number j in the second sequence. More elaborate forms compare 'sliding windows' composed of multiple characters and use a threshold value, or 'match stringency', to decide whether two windows count as matched.
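The matching rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind any particular dot plot tool: the names `dot_plot`, `render`, `window` and `stringency` are chosen here for the example, indices are 0-based (the text counts from 1), and a window is assumed to match when at least `stringency` of its `window` positions agree.

```python
def dot_plot(seq1, seq2, window=1, stringency=1):
    """Return a grid of booleans for a dot plot of seq1 against seq2.

    grid[i][j] is True when the windows of length `window` starting at
    seq1[i] and seq2[j] agree at `stringency` or more positions.
    With window=1 and stringency=1 this is the simplest form: a dot
    wherever the two single characters are identical.
    """
    if window < 1:
        raise ValueError("window must be at least 1")

    # Number of complete windows that fit along each sequence.
    rows = len(seq1) - window + 1
    cols = len(seq2) - window + 1

    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Count agreeing positions inside the two windows.
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            row.append(matches >= stringency)
        grid.append(row)
    return grid


def render(grid):
    """Draw the grid as text: '*' for a dot, '.' for no dot."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)


# seq1 runs down the vertical axis, seq2 along the horizontal axis.
print(render(dot_plot("GATTACA", "GATTACA")))
print()
print(render(dot_plot("GATTACA", "GATTACA", window=3, stringency=2)))
```

Varying `window` and `stringency` in the two calls at the bottom is a quick way to explore the discussion questions below on small test sequences.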

Questions for discussion:

Why does the DNA sequence plot have more dots than the protein sequence plot?
How can we increase the signal-to-noise ratio?
What does a 'Match stringency' larger than 'Window size' yield and why?
What would off-diagonal runs of dots represent?
What are the major weaknesses of this approach?