Sequence Analysis - Companion Site: Sequence length and granularity


Please download and unzip the replication files for Chapter 2 (Chapter02.zip).
Read readme.html and run 2-0_ChapterSetup.R. This creates 2-0_ChapterSetup.RData in the subfolder data/R. The file contains the data required to produce the table shown at the bottom of this page.
We also recommend loading the libraries listed in Chapter 2’s LoadInstallPackages.R.

# assuming you are working within an .Rproj environment
library(here)

# install (if necessary) and load other required packages
source(here("source", "load_libraries.R"))

# load environment generated in "2-0_ChapterSetup.R"
load(here("data", "R", "2-0_ChapterSetup.RData"))

Table 2.2 in Chapter 2.2 compares different approaches to defining sequence data.

The sequences consist of monthly information on respondents’ partnership status between the ages of 18 and 40. They are stored in the object partner.month.seq and distinguish four states:

State	Short Label
Single	S
LAT	LAT
Cohabiting	COH
Married	MAR

In addition to the original data, two alternative ways of defining the sequences are discussed:

The first alternative reduces the complexity of the original data by imposing a threshold rule that defines a minimum length for partnership spells. If a spell falls below this threshold, it is discounted and the respective months are coded as single rather than as being in a relationship. The sequence object based on this recoded data set is stored in partner.month.seq2, which was created in the same way as partner.month.seq (see Chapter 2-1 or 2-0_ChapterSetup.R).
The second strategy changes the granularity of the sequence data from monthly to yearly. That is, twelve states from the original sequence are condensed into one state in the new sequence by applying the seqgranularity function from the {TraMineRextras} package. The resulting object (already stored in 2-0_ChapterSetup.RData) was created with the following code:

# change granularity --> years instead of months (using modal values)
partner.year.seq <- seqgranularity(partner.month.seq, 
                               tspan=12, method="mostfreq")
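
To make the two strategies more concrete, the following base R sketch applies both to the first sequence shown in Table 2.2 below. It is only an illustration under stated assumptions, not the code used for the book: the helpers recode_short_spells, aggregate_modal and to_sps are hypothetical names, the 12-month threshold is taken from the code comment for column 2, and ties within a year are resolved alphabetically here, which may differ from how seqgranularity handles them.

```r
# First sequence of Table 2.2, expanded to one state per month
states    <- c("S", "LAT", "COH", "LAT", "S", "LAT", "MAR")
durations <- c(89, 26, 14, 6, 34, 4, 91)
monthly   <- rep(states, times = durations)

# Hypothetical helper (strategy 1): recode partnership spells shorter
# than `min_len` months to the single state
recode_short_spells <- function(x, min_len = 12, single = "S") {
  runs <- rle(x)
  too_short <- runs$values != single & runs$lengths < min_len
  runs$values[too_short] <- single
  inverse.rle(runs)
}

# Hypothetical helper (strategy 2): keep the most frequent state within
# each block of `tspan` positions (assumption: ties resolved alphabetically)
aggregate_modal <- function(x, tspan = 12) {
  blocks <- split(x, ceiling(seq_along(x) / tspan))
  unname(vapply(blocks, function(b) names(which.max(table(b))), character(1)))
}

# Hypothetical helper: collapse a state vector into an SPS-like string
to_sps <- function(x) {
  runs <- rle(x)
  paste0("(", runs$values, ",", runs$lengths, ")", collapse = "-")
}

sps_original <- to_sps(monthly)
sps_recoded  <- to_sps(recode_short_spells(monthly))
sps_yearly   <- to_sps(aggregate_modal(monthly))

print(c(sps_original, sps_recoded, sps_yearly))
```

For this sequence, the two short LAT spells (6 and 4 months) are absorbed into the surrounding single spell under strategy 1, while strategy 2 reduces the 264 months to 22 yearly states.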

Now we can produce Table 2.2 from the book, which shows a small selection of four sequences under the three different specifications.

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Table 2.2 - Different alternatives of defining sequences ----
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# Using original sequence data (monthly granularity) (column 1)
col1 <- print(partner.month.seq[c(4, 8, 16, 21), ], format = "SPS")

# Using recoded monthly data only considering
# spells lasting at least 12 months (column 2)
col2 <- print(partner.month.seq2[c(4, 8, 16, 21), ], format = "SPS")

# Using sequences with yearly granularity (column 3)
col3 <- print(partner.year.seq[c(4, 8, 16, 21), ], format = "SPS")

# Print selection of sequences from differently specified sequence data
tibble(col1, col2, col3) %>%
  kable(col.names = c("Original sequence", 
                      "Strategy 1 – recode", 
                      "Strategy 2 – aggregate")) %>%
  kable_styling(bootstrap_options = 
                  c("striped", "hover", "condensed", "responsive"))

Original sequence	Strategy 1 – recode	Strategy 2 – aggregate
(S,89)-(LAT,26)-(COH,14)-(LAT,6)-(S,34)-(LAT,4)-(MAR,91)	(S,89)-(LAT,26)-(COH,14)-(S,44)-(MAR,91)	(S,7)-(LAT,3)-(COH,1)-(S,3)-(MAR,8)
(LAT,13)-(S,6)-(LAT,33)-(S,24)-(LAT,41)-(S,35)-(LAT,10)-(COH,14)-(MAR,88)	(LAT,13)-(S,6)-(LAT,33)-(S,24)-(LAT,41)-(S,45)-(COH,14)-(MAR,88)	(LAT,1)-(S,1)-(LAT,2)-(S,2)-(LAT,4)-(S,3)-(LAT,1)-(COH,1)-(MAR,7)
(S,56)-(LAT,69)-(COH,47)-(MAR,92)	(S,56)-(LAT,69)-(COH,47)-(MAR,92)	(S,5)-(LAT,5)-(COH,4)-(MAR,8)
(LAT,4)-(S,134)-(LAT,9)-(COH,3)-(MAR,52)-(LAT,5)-(COH,25)-(MAR,32)	(S,150)-(MAR,52)-(S,5)-(COH,25)-(MAR,32)	(S,12)-(MAR,5)-(COH,2)-(MAR,3)
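
A quick plausibility check on the table is that all three specifications cover the same observation window: the spell durations in columns 1 and 2 should sum to the same number of months, and column 3 should sum to that number divided by twelve. The sketch below does this for the first row using base R only; sps_total_length is a hypothetical helper that works on the printed SPS strings, not on the sequence objects themselves.

```r
# Hypothetical helper: sum the spell durations in an SPS-formatted string
sps_total_length <- function(sps) {
  # extract every run of digits that is directly followed by ")"
  durations <- regmatches(sps, gregexpr("[0-9]+(?=\\))", sps, perl = TRUE))[[1]]
  sum(as.integer(durations))
}

# First row of the table above
original <- "(S,89)-(LAT,26)-(COH,14)-(LAT,6)-(S,34)-(LAT,4)-(MAR,91)"
recoded  <- "(S,89)-(LAT,26)-(COH,14)-(S,44)-(MAR,91)"
yearly   <- "(S,7)-(LAT,3)-(COH,1)-(S,3)-(MAR,8)"

months_original <- sps_total_length(original)  # 264 months
months_recoded  <- sps_total_length(recoded)   # 264 months
years_yearly    <- sps_total_length(yearly)    # 22 years

print(c(months_original, months_recoded, years_yearly))
```

The 264 months correspond to the 22 years between ages 18 and 40, so recoding and aggregation change the composition of the sequence but not the period it covers.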
