BGGN-213: Dot Plot Comparison of Two Sequences
Dot plots are a simple graphical approach for the visual comparison of two sequences. They have a long history (see Maizel and Lenk 1981 and references therein) and entail placing one sequence on the vertical axis of a 2D grid (or matrix) and the other on the horizontal axis. In the simplest form, a dot is placed wherever the horizontal and vertical sequence values match. That is, a dot is produced at position (i, j) if character number i in the first sequence is the same as character number j in the second sequence. More elaborate forms use 'sliding windows' composed of multiple characters, together with a threshold value, or 'match stringency', that sets how many positions must agree for two windows to be considered matched.
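The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind any particular dot plot tool: the function names `dot_matrix` and `render` and the parameter names `window` and `stringency` are assumptions made for this example. It also assumes that the threshold counts the matching positions within a window, which is one common reading of 'match stringency'. With the default `window=1` and `stringency=1` it reduces to the simplest form, where a dot at (i, j) means character i of the first sequence equals character j of the second.

```python
def dot_matrix(seq1, seq2, window=1, stringency=1):
    """Build a dot plot as a list of rows of booleans.

    Rows index window start positions in seq1 (vertical axis) and
    columns index window start positions in seq2 (horizontal axis).
    A cell is True when at least `stringency` of the `window` aligned
    characters match. Parameter names are illustrative assumptions.
    """
    n_rows = len(seq1) - window + 1
    n_cols = len(seq2) - window + 1
    matrix = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            # Count matching positions between the two windows.
            matches = sum(
                a == b
                for a, b in zip(seq1[i:i + window], seq2[j:j + window])
            )
            row.append(matches >= stringency)
        matrix.append(row)
    return matrix


def render(matrix):
    """Return a text rendering: '*' for a dot, '.' for an empty cell."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )


# Simplest form: single-character comparison of a sequence with itself.
simple = dot_matrix("GATTACA", "GATTACA")
print(render(simple))
print()

# Sliding-window form: windows of 3 characters, all 3 must match.
windowed = dot_matrix("GATTACA", "GATTACA", window=3, stringency=3)
print(render(windowed))
```

Printing both renderings side by side gives a quick way to explore how changing `window` and `stringency` affects the number of dots, which may be useful when working through the questions below.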
Questions for discussion:
- Why does the DNA sequence plot have more dots than the protein sequence plot?
- How can we increase the signal-to-noise ratio?
- What does a 'Match stringency' larger than 'Window size' yield and why?
- What would off-diagonal runs of dots represent?
- What are the major weaknesses of this approach?