Alternative metrics to align sequences

Chapter 3.4 Nonalignment techniques

To reproduce the examples in this chapter, first set up the R environment:
# assuming you are working within .Rproj environment
library(here)

# install (if necessary) and load other required packages
source(here("source", "LoadInstallPackages.R"))

# load environment generated in "3-0_ChapterSetup.R"
load(here("data", "R", "3-0_ChapterSetup.RData"))


In chapter 3.4, we consider the so-called nonalignment techniques, that is, techniques based not on optimal matching (OM) but on the identification of subsequences that occur in the same order in the sequences being compared. The data come from a sub-sample of the German Family Panel (pairfam). For further information on the study and on how to access the full scientific use file, see here.

Longest common subsequence (LCS)

For illustrative purposes, we use three example sequences (6 time points, 3 states: A, B, C):

ch3.ex2 <- c("A-B-B-C-C-C", "A-B-B-B-B-B", "B-C-C-C-B-B")

ch3.ex2.seq <- seqdef(ch3.ex2)

We compute the dissimilarity matrix between these three example sequences using the longest common subsequence method:

lcs.diss <- seqdist(ch3.ex2.seq, method = "LCS")

We then display the LCS-based dissimilarity matrix for the three example sequences:

lcs.diss
    [1] [2] [3]
[1]   0   6   4
[2]   6   0   6
[3]   4   6   0
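
To see where these values come from, the matrix can be reproduced by hand. The following is a minimal base-R sketch, assuming that the LCS dissimilarity between two sequences x and y equals |x| + |y| - 2 * LCS(x, y), where LCS(x, y) is the length of their longest common subsequence. The helper names `lcs_length` and `lcs_distance` and the object `manual.lcs.diss` are hypothetical and introduced here for illustration only; they are not part of TraMineR.

```r
# Length of the longest common subsequence of two state vectors,
# computed with the standard dynamic-programming recursion.
# (Hypothetical helper, not a TraMineR function.)
lcs_length <- function(x, y) {
  n <- length(x)
  m <- length(y)
  # tab[i + 1, j + 1] holds the LCS length of x[1:i] and y[1:j]
  tab <- matrix(0L, nrow = n + 1, ncol = m + 1)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      if (x[i] == y[j]) {
        tab[i + 1, j + 1] <- tab[i, j] + 1L
      } else {
        tab[i + 1, j + 1] <- max(tab[i, j + 1], tab[i + 1, j])
      }
    }
  }
  tab[n + 1, m + 1]
}

# Assumed definition of the LCS dissimilarity:
# |x| + |y| - 2 * LCS(x, y)
lcs_distance <- function(x, y) {
  length(x) + length(y) - 2L * lcs_length(x, y)
}

# The same three example sequences, split into vectors of states
ex <- c("A-B-B-C-C-C", "A-B-B-B-B-B", "B-C-C-C-B-B")
states <- strsplit(ex, "-", fixed = TRUE)

# Pairwise dissimilarities between all example sequences
manual.lcs.diss <- outer(
  seq_along(states), seq_along(states),
  Vectorize(function(i, j) lcs_distance(states[[i]], states[[j]]))
)
manual.lcs.diss
```

For example, the first and third sequences share the common subsequence B-C-C-C of length 4, which gives 6 + 6 - 2 * 4 = 4. The first and second sequences share only A-B-B (length 3), which gives 6 + 6 - 2 * 3 = 6.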

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. Source code is available at https://github.com/sa-book/sa-book.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".