Sequence Analysis - Companion Site: Alternative metrics to align sequences

Click here to get instructions…

Please download and unzip the replication files for Chapter 3 (Chapter03.zip).
Read readme.html and run 3-0_ChapterSetup.R. This will create 3-0_ChapterSetup.RData in the subfolder data/R. This file contains the data required to produce the plots shown below.
You also have to add the function legend_large_box to your environment in order to render the tweaked version of the legend described below. You can find this file in the source folder of the unzipped Chapter 3 archive.
We also recommend loading the libraries listed in Chapter 3’s LoadInstallPackages.R:

# assuming you are working within an .Rproj environment
library(here)

# install (if necessary) and load other required packages
source(here("source", "LoadInstallPackages.R"))

# load environment generated in "3-0_ChapterSetup.R"
load(here("data", "R", "3-0_ChapterSetup.RData"))

In Chapter 3.4, we consider so-called nonalignment techniques, that is, techniques based not on optimal matching (OM) but on the identification of subsequences that occur in the same order along the sequences. The data come from a subsample of the German Family Panel (pairfam). For further information on the study and on how to access the full scientific use file, see here.
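To make the notion of a subsequence concrete, the following base R sketch checks whether a set of states occurs in a given order within a sequence; the states need not be adjacent. The helper is_subsequence is a hypothetical name introduced here for illustration only and is not part of TraMineR or the replication files.

```r
# Hypothetical helper (illustration only): TRUE if the states in `sub`
# occur in `x` in the same order; they do not have to be adjacent.
is_subsequence <- function(sub, x) {
  pos <- 0  # position in x of the most recently matched state
  for (s in sub) {
    hits <- which(x == s)
    hits <- hits[hits > pos]          # only matches after the previous one
    if (length(hits) == 0) return(FALSE)
    pos <- hits[1]                    # take the earliest possible match
  }
  TRUE
}

x <- c("A", "B", "B", "C", "C", "C")

is_subsequence(c("A", "C"), x)  # A is followed by a C later in x
is_subsequence(c("C", "A"), x)  # no A occurs after any C in x
```

Taking the earliest available match at each step is sufficient here, because an earlier match never rules out a later one.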

Longest common subsequence (LCS)

For illustrative purposes, we use three example sequences (6 time points, 3 states: A, B, C):

ch3.ex2 <- c("A-B-B-C-C-C", "A-B-B-B-B-B", "B-C-C-C-B-B")

ch3.ex2.seq <- seqdef(ch3.ex2)

We compute the dissimilarity matrix between these three example sequences using the longest common subsequence method:

lcs.diss <- seqdist(ch3.ex2.seq, method = "LCS")

…and display the LCS-based dissimilarity matrix for the three example sequences:

lcs.diss

    [1] [2] [3]
[1]   0   6   4
[2]   6   0   6
[3]   4   6   0
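The values in this matrix can be reproduced by hand. The sketch below is written in base R and assumes the usual definition of the LCS distance: the sum of the two sequence lengths minus twice the length of their longest common subsequence. The helpers lcs_length and lcs_dist are hypothetical names introduced here for illustration; they are not TraMineR functions and are not meant to mirror how seqdist is implemented.

```r
# Hypothetical helper (illustration only): length of the longest common
# subsequence of two state vectors, computed by dynamic programming.
lcs_length <- function(x, y) {
  # tab[i + 1, j + 1] holds the LCS length of x[1:i] and y[1:j]
  tab <- matrix(0, nrow = length(x) + 1, ncol = length(y) + 1)
  for (i in seq_along(x)) {
    for (j in seq_along(y)) {
      tab[i + 1, j + 1] <- if (x[i] == y[j]) {
        tab[i, j] + 1                      # states match: extend the LCS
      } else {
        max(tab[i, j + 1], tab[i + 1, j])  # otherwise keep the best so far
      }
    }
  }
  tab[length(x) + 1, length(y) + 1]
}

# Assumed distance definition: |x| + |y| - 2 * LCS(x, y)
lcs_dist <- function(x, y) length(x) + length(y) - 2 * lcs_length(x, y)

# The same three example sequences, split into their states
ex <- strsplit(c("A-B-B-C-C-C", "A-B-B-B-B-B", "B-C-C-C-B-B"),
               "-", fixed = TRUE)

# Pairwise distances between all three sequences
manual.diss <- outer(seq_along(ex), seq_along(ex),
                     Vectorize(function(i, j) lcs_dist(ex[[i]], ex[[j]])))
manual.diss
```

For example, the first and third sequences share the subsequence B-C-C-C of length 4, which gives 6 + 6 - 2 * 4 = 4, the value reported in the matrix above.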
