Sequence Analysis - Companion Site: Exploring group-specific discrepancies


Please download and unzip the replication files for Chapter 6 (Chapter06.zip).
Read readme.html and run 6-0_ChapterSetup.R. This will create 6-0_ChapterSetup.RData in the subfolder data/R. This file contains the data required to produce the table shown at the bottom of this page.
We also recommend loading the libraries listed in Chapter 6’s LoadInstallPackages.R.

# assuming you are working within .Rproj environment
library(here)

# install (if necessary) and load other required packages
source(here("source", "LoadInstallPackages.R"))

# load environment generated in "6-0_ChapterSetup.R"
load(here("data", "R", "6-0_ChapterSetup.RData"))
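
Before moving on, it can help to confirm that the setup file really provided every object used on this page. The following is a minimal sketch and not part of the replication files: the helper name missing_objects is hypothetical, and the object names are simply those mentioned in the text below.

```r
# Hypothetical helper (not part of the replication files): report which of the
# objects used on this page are absent from an environment.
missing_objects <- function(object_names, env = globalenv()) {
  found <- vapply(object_names, exists, logical(1), envir = env, inherits = FALSE)
  object_names[!found]
}

required <- c("family", "activity",
              "partner.child.year.seq", "activity.year.seq",
              "partner.child.year.om", "activity.year.om")

# After load(...), an empty character vector means everything is available
missing_objects(required)
```

If the result is not empty, re-running 6-0_ChapterSetup.R is the first thing to try.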

Table 6.1 in Chapter 6.1 presents discrepancies and complexity scores for different subgroups in our example datasets on family and labor market trajectories (yearly granularity) between ages 18 and 40 (monthly data). The sequences are stored in the objects partner.child.year.seq and activity.year.seq. The data frames family and activity include information on the grouping variables east (living in East vs. West Germany), sex (male vs. female), and highschool (at least a high school degree: yes vs. no).

The computation of the discrepancies requires a dissimilarity matrix as input. The two dissimilarity matrices partner.child.year.om and activity.year.om come from an OM analysis using constant substitution costs of 2 and indel costs of 1. All objects required for the analysis are stored in 6-0_ChapterSetup.RData (see instructions above). Discrepancies are obtained with {TraMineR}’s dissassoc function.
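
To make the cost setting concrete, the sketch below computes an OM distance between two short sequences by dynamic programming in base R. It is only an illustration: the function name om_distance and the state labels are hypothetical, and the matrices used on this page were not produced with this code.

```r
# Optimal matching distance between two state vectors via dynamic programming.
# Defaults mirror the cost setting described above (substitution 2, indel 1).
om_distance <- function(a, b, sub_cost = 2, indel_cost = 1) {
  n <- length(a)
  m <- length(b)
  cost <- matrix(0, nrow = n + 1, ncol = m + 1)
  cost[, 1] <- (0:n) * indel_cost   # delete all remaining states of a
  cost[1, ] <- (0:m) * indel_cost   # insert all remaining states of b
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      match_or_sub <- cost[i, j] + if (a[i] == b[j]) 0 else sub_cost
      deletion <- cost[i, j + 1] + indel_cost
      insertion <- cost[i + 1, j] + indel_cost
      cost[i + 1, j + 1] <- min(match_or_sub, deletion, insertion)
    }
  }
  cost[n + 1, m + 1]
}

# Hypothetical four-year sequences (S = school, E = employed)
x <- c("S", "S", "E", "E")
y <- c("S", "E", "E", "E")
z <- c("E", "E", "E", "E")

om_distance(x, y)   # 2
om_distance(x, z)   # 4
```

With these costs a substitution is exactly as expensive as one deletion plus one insertion. For real data one would use a dedicated implementation such as {TraMineR}’s seqdist rather than this loop.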

The following code snippet illustrates the function by comparing the discrepancies in women’s and men’s labor market biographies.

# pseudo-ANOVA on the OM dissimilarities, grouped by sex (0 = men, 1 = women)
discr_sex <- dissassoc(activity.year.om, activity$sex)
discr_sex

Pseudo ANOVA table:
              SS   df        MSE
Exp     981.5653    1 981.565295
Res   10027.9985 1025   9.783413
Total 11009.5638 1026  10.730569

Test values  (p-values based on 1000 permutation):
                     t0 p.value
Pseudo F   100.32953529   0.001
Pseudo Fbf 101.24714023   0.001
Pseudo R2    0.08915569   0.001
Bartlett    31.59488891   0.001
Levene     172.45072755   0.001

Inconclusive intervals: 
0.00383  <  0.01  <  0.0162
0.03649  <  0.05  <  0.0635 

Discrepancy per level:
         n discrepancy
0      506    7.346404
1      521   12.112702
Total 1027   10.720121

The output indicates a statistically significant difference (Levene test, p = 0.001) between the discrepancies of women (12.1, level 1) and men (7.3, level 0). The pairwise dissimilarities among women’s sequences are, on average, greater than those among men’s sequences.
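
To see how the sums of squares, discrepancies, pseudo R2 and pseudo F relate to a dissimilarity matrix, here is a minimal base R sketch on made-up values. It assumes unsquared dissimilarities and leaves out the permutation tests and the Bartlett and Levene statistics; the helper pseudo_ss and the toy matrix are hypothetical.

```r
# Sum of squares implied by a symmetric dissimilarity matrix:
# the sum over all pairs i < j divided by the number of objects.
pseudo_ss <- function(d) sum(d) / (2 * nrow(d))

# Toy dissimilarities for four objects (hypothetical values)
d <- matrix(4, nrow = 4, ncol = 4)
d[1, 2] <- d[2, 1] <- 1   # group "a" is homogeneous
d[3, 4] <- d[4, 3] <- 3   # group "b" is more heterogeneous
diag(d) <- 0
group <- factor(c("a", "a", "b", "b"))

n <- nrow(d)
n_group <- as.vector(table(group))
ss_total <- pseudo_ss(d)
ss_within <- vapply(levels(group), function(g) {
  idx <- which(group == g)
  pseudo_ss(d[idx, idx, drop = FALSE])
}, numeric(1))
ss_res <- sum(ss_within)        # residual (within-group) sum of squares
ss_exp <- ss_total - ss_res     # explained (between-group) sum of squares

# Discrepancy = sum of squares divided by the number of objects
discrepancy <- c(ss_within / n_group, Total = ss_total / n)
pseudo_r2 <- ss_exp / ss_total
pseudo_f <- (ss_exp / (nlevels(group) - 1)) / (ss_res / (n - nlevels(group)))

discrepancy   # a = 0.25, b = 0.75, Total = 1.25
pseudo_r2     # 0.6
pseudo_f      # 3
```

The printed table above obeys the same identities: 981.5653 + 10027.9985 = 11009.5638, the total discrepancy is 11009.5638 / 1027 ≈ 10.72, and the pseudo R2 is 981.5653 / 11009.5638 ≈ 0.089.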

In addition to discrepancies, Table 6.1 also displays group-specific complexity values, which are obtained with seqici. With t.test we test for the equality of the average complexities of men and women.

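# complexity index of each sequence, computed separately for men (0) and women (1)
complex_men <- seqici(activity.year.seq[activity$sex == 0,])
complex_women <- seqici(activity.year.seq[activity$sex == 1,])

# Welch two-sample t-test for equal mean complexity
complex_sex  <- t.test(complex_men, complex_women)
complex_sex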

The test indicates that the mean complexity of men (0.17) differs from that of women (0.27). Adding the argument alternative = "less" to t.test would turn this into a one-sided test of whether (the life courses of) men are less complex than (those of) women. Hence, both the within-sequence and the between-sequence variation are larger for women than for men.
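
As a rough illustration of what a complexity value expresses and how the one-sided test is called, the base R sketch below computes a complexity index for a few made-up sequences and compares two groups. It follows the published definition as we understand it (geometric mean of the share of transitions and the normalized within-sequence entropy) and is not a substitute for seqici; the function name complexity_index, the states and the sequences are all hypothetical.

```r
# Complexity of one sequence: geometric mean of the share of transitions
# and the entropy of the state distribution, normalized by its maximum.
complexity_index <- function(states, alphabet) {
  len <- length(states)
  n_transitions <- sum(states[-1] != states[-len])
  shares <- table(factor(states, levels = alphabet)) / len
  shares <- shares[shares > 0]            # unvisited states contribute 0
  entropy <- -sum(shares * log(shares))
  sqrt((n_transitions / (len - 1)) * (entropy / log(length(alphabet))))
}

alphabet <- c("E", "U", "O")              # hypothetical states
to_states <- function(x) strsplit(x, "")[[1]]

group_1 <- c("EEEEEE", "EEEEEU", "UEEEEE", "EEEUUU")   # mostly stable
group_2 <- c("EUEUEU", "EUOEUO", "EEUUOO", "EUUOEE")   # more turbulent

complex_1 <- vapply(group_1, function(s) complexity_index(to_states(s), alphabet), numeric(1))
complex_2 <- vapply(group_2, function(s) complexity_index(to_states(s), alphabet), numeric(1))

round(complex_1, 2)
round(complex_2, 2)

# One-sided Welch test: the alternative is mean(complex_1) < mean(complex_2)
one_sided <- t.test(complex_1, complex_2, alternative = "less")
one_sided
```

Note that alternative = "less" refers to the first sample passed to t.test, so the order of the two arguments matters.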

Table 6.1 in the book presents several group differences at once. Instead of repeating the code for all group comparisons and two different data sets, we wrote a function named DiscrCompTest (defined in 6-1_Table_6-1_Step1_Source.R in Chapter06.zip) that can be reused to compute the same group comparisons (group indicators: east, sex, highschool) using two different sets of input arguments (set 1: family, partner.child.year.seq, partner.child.year.om; set 2: activity, activity.year.seq, activity.year.om). The function’s output is a nicely formatted table produced with {knitr}’s kable and the {kableExtra} package. The function is sourced and called in 6-1_Table_6-1_Step2_Print.R.

On this page we do not elaborate further on the code defining the function because this website focuses on introducing the functions required to reproduce the results shown in the book. Reusing and adjusting the two simple code chunks shown above is sufficient to achieve this goal. Admittedly, such a verbose coding strategy is not very efficient and is quite error-prone, but done correctly it produces all the bits and pieces required to “manually” rebuild Table 6.1.

# Discrepancy & Complexity I - Labor Market 
dissassoc(activity.year.om,activity$sex)
complex_men <- seqici(activity.year.seq[activity$sex == 0,])
complex_women <- seqici(activity.year.seq[activity$sex == 1,])
t.test(complex_men, complex_women)

dissassoc(activity.year.om,activity$east)
complex_west <- seqici(activity.year.seq[activity$east == 0,])
complex_east <- seqici(activity.year.seq[activity$east == 1,])
t.test(complex_west, complex_east)

dissassoc(activity.year.om,activity$highschool)
complex_no_highschool <- seqici(activity.year.seq[activity$highschool == 0,])
complex_highschool <- seqici(activity.year.seq[activity$highschool == 1,])
t.test(complex_no_highschool, complex_highschool)


# Discrepancy & Complexity II - Family 
dissassoc(partner.child.year.om,family$sex)
complex_men <- seqici(partner.child.year.seq[family$sex == 0,])
complex_women <- seqici(partner.child.year.seq[family$sex == 1,])
t.test(complex_men, complex_women)

dissassoc(partner.child.year.om,family$east)
complex_west <- seqici(partner.child.year.seq[family$east == 0,])
complex_east <- seqici(partner.child.year.seq[family$east == 1,])
t.test(complex_west, complex_east)

dissassoc(partner.child.year.om,family$highschool)
complex_no_highschool <- seqici(partner.child.year.seq[family$highschool == 0,])
complex_highschool <- seqici(partner.child.year.seq[family$highschool == 1,])
t.test(complex_no_highschool, complex_highschool)
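
The repeated calls above could be condensed into a small helper and a loop. The sketch below is one possible way to do so and is explicitly not the authors’ DiscrCompTest: the function name compare_groups is hypothetical, the example runs on {TraMineR}’s mvad data as a stand-in, and it assumes that the object returned by dissassoc has the components groups and stat laid out as in the printed output above.

```r
library(TraMineR)

# Hypothetical helper, NOT the authors' DiscrCompTest: one two-group comparison
compare_groups <- function(diss, seqdata, group, permutations = 1000) {
  group <- factor(group)
  stopifnot(nlevels(group) == 2)
  assoc <- dissassoc(diss, group = group, R = permutations)
  complexity <- as.numeric(seqici(seqdata))
  welch <- t.test(complexity ~ group)
  data.frame(
    level = levels(group),
    n = as.vector(table(group)),
    # assumption: rows of assoc$groups are the group levels followed by "Total"
    discrepancy = assoc$groups$discrepancy[seq_len(2)],
    # assumption: assoc$stat has a row "Levene" and a column "p.value"
    levene_p = assoc$stat["Levene", "p.value"],
    mean_complexity = as.vector(tapply(complexity, group, mean)),
    ttest_p = welch$p.value
  )
}

# Stand-in data: the first 100 sequences of TraMineR's mvad example data
data(mvad)
mvad_sub <- mvad[1:100, ]
mvad_seq <- seqdef(mvad_sub, var = 17:86)
costs <- seqsubm(mvad_seq, method = "CONSTANT", cval = 2)
mvad_om <- seqdist(mvad_seq, method = "OM", indel = 1, sm = costs)

# Loop over grouping variables instead of copying the code block
grouping_vars <- c("male", "gcse5eq")
results <- lapply(grouping_vars, function(v) {
  cbind(variable = v,
        compare_groups(mvad_om, mvad_seq, mvad_sub[[v]], permutations = 100))
})
summary_table <- do.call(rbind, results)
summary_table
```

Applied to the objects on this page, a call would look like compare_groups(activity.year.om, activity.year.seq, activity$sex). The p-values from dissassoc are based on random permutations, so they vary slightly between runs unless a seed is set.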
