Chapter 6.1 Comparing Within-Group Discrepancies
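Follow the instructions in readme.html and run 6-0_ChapterSetup.R. This will create 6-0_ChapterSetup.RData in the subfolder data/R. This file contains the data required to produce the table shown at the bottom of this page. The code below also sources LoadInstallPackages.R, which installs (if necessary) and loads the required packages.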
# assuming you are working within .Rproj environment
library(here)
# install (if necessary) and load other required packages
source(here("source", "LoadInstallPackages.R"))
# load environment generated in "6-0_ChapterSetup.R"
load(here("data", "R", "6-0_ChapterSetup.RData"))
Table 6.1 in Chapter 6.1 presents discrepancies and complexity scores for different subgroups in our example datasets on family and labor market trajectories between ages 18 and 40 (yearly granularity, aggregated from monthly data). The sequences are stored in the objects partner.child.year.seq and activity.year.seq.
The data frames family and activity include information on the grouping variables east (living in East vs. West Germany), sex (male vs. female), and highschool (at least a high school degree: yes vs. no).
The computation of the discrepancies requires a dissimilarity matrix as an input. The two dissimilarity matrices partner.child.year.om and activity.year.om come from an optimal matching (OM) analysis using constant substitution costs of 2 and indel costs of 1. All objects required for the analysis are stored in 6-0_ChapterSetup.RData (see instructions above).
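To make the cost setting more concrete, the following base R sketch computes OM distances for a few toy sequences by dynamic programming, with a substitution cost of 2 and an indel cost of 1. It is only an illustration: the function name om_distance, the toy sequences, and their state labels are hypothetical, and this is not the implementation used by {TraMineR}.

```r
# Toy optimal matching (OM) distance between two state sequences,
# using one constant substitution cost and one indel cost.
# Hypothetical helper for illustration only.
om_distance <- function(a, b, sub_cost = 2, indel = 1) {
  n <- length(a)
  m <- length(b)
  # d[i + 1, j + 1]: cheapest way to turn the first i states of a
  # into the first j states of b
  d <- matrix(0, nrow = n + 1, ncol = m + 1)
  d[, 1] <- (0:n) * indel
  d[1, ] <- (0:m) * indel
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      substitution <- d[i, j] + if (a[i] == b[j]) 0 else sub_cost
      deletion <- d[i, j + 1] + indel
      insertion <- d[i + 1, j] + indel
      d[i + 1, j + 1] <- min(substitution, deletion, insertion)
    }
  }
  d[n + 1, m + 1]
}

# Hypothetical toy sequences (not taken from the example data)
toy_sequences <- list(
  s1 = c("edu", "edu", "work", "work"),
  s2 = c("edu", "work", "work", "work"),
  s3 = c("home", "home", "home", "home")
)

# Pairwise dissimilarity matrix for the toy sequences
toy_om <- sapply(toy_sequences, function(x) {
  sapply(toy_sequences, function(y) om_distance(x, y))
})
toy_om
```

With these costs a substitution is exactly as expensive as one deletion plus one insertion, so s1 and s2, which differ in a single position, end up at a distance of 2.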
Discrepancies are obtained with {TraMineR}’s dissassoc function. The following code snippet illustrates the function by comparing the discrepancies in women’s and men’s labor market biographies.
discr_sex <- dissassoc(activity.year.om, activity$sex)
discr_sex
Pseudo ANOVA table:
SS df MSE
Exp 981.5653 1 981.565295
Res 10027.9985 1025 9.783413
Total 11009.5638 1026 10.730569
Test values (p-values based on 1000 permutation):
t0 p.value
Pseudo F 100.32953529 0.001
Pseudo Fbf 101.24714023 0.001
Pseudo R2 0.08915569 0.001
Bartlett 31.59488891 0.001
Levene 172.45072755 0.001
Inconclusive intervals:
0.00383 < 0.01 < 0.0162
0.03649 < 0.05 < 0.0635
Discrepancy per level:
n discrepancy
0 506 7.346404
1 521 12.112702
Total 1027 10.720121
The output indicates a statistically significant difference (Levene test, p = 0.001) between the discrepancies of women (level 1: 12.1) and men (level 0: 7.3). On average, the pairwise dissimilarities among women's sequences are greater than those among men's sequences.
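The numbers in the output are linked by simple arithmetic, which the following base R sketch retraces. It assumes that the discrepancy of a group equals the sum of all entries of its symmetric dissimilarity matrix divided by 2n², which agrees with the ratio of the total sum of squares to the sample size in the output. The function name discrepancy and the toy matrix are hypothetical; the remaining values are copied from the output above.

```r
# Discrepancy from a symmetric dissimilarity matrix (hypothetical helper):
# sum of all pairwise dissimilarities divided by 2 * n^2
discrepancy <- function(d) {
  d <- as.matrix(d)
  sum(d) / (2 * nrow(d)^2)
}

# Hypothetical toy dissimilarities for four sequences in two groups
toy_d <- matrix(c(0, 2, 8, 8,
                  2, 0, 8, 8,
                  8, 8, 0, 4,
                  8, 8, 4, 0), nrow = 4, byrow = TRUE)
toy_group <- c(0, 0, 1, 1)
toy_discrepancies <- sapply(
  split(seq_along(toy_group), toy_group),
  function(idx) discrepancy(toy_d[idx, idx, drop = FALSE])
)
toy_discrepancies

# Values copied from the dissassoc output shown above
ss_exp <- 981.5653
ss_res <- 10027.9985
ss_total <- 11009.5638
n <- c(506, 521)                          # group sizes (levels 0 and 1)
group_discrepancy <- c(7.346404, 12.112702)

# Share of the total discrepancy explained by the grouping variable
pseudo_r2 <- ss_exp / ss_total
# Ratio of explained to residual mean squares
pseudo_f <- (ss_exp / (length(n) - 1)) / (ss_res / (sum(n) - length(n)))
# Total discrepancy: total sum of squares divided by the sample size
total_discrepancy <- ss_total / sum(n)
# Residual sum of squares rebuilt from the group-specific discrepancies
ss_res_check <- sum(n * group_discrepancy)

round(c(pseudo_r2 = pseudo_r2, pseudo_f = pseudo_f,
        total_discrepancy = total_discrepancy, ss_res_check = ss_res_check), 4)
```

The recomputed values match the Pseudo R2, Pseudo F, total discrepancy, and residual sum of squares in the output up to rounding.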
In addition to discrepancies, Table 6.1 also displays group-specific complexity values, which are obtained with seqici. With t.test we test for the equality of the average complexities of men and women.
complex_men <- seqici(activity.year.seq[activity$sex == 0,])
complex_women <- seqici(activity.year.seq[activity$sex == 1,])
complex_sex <- t.test(complex_men, complex_women)
The test indicates that the mean complexity of men (0.17) differs from that of women (0.27). By adding the argument alternative = "less" to t.test, we could even test the one-sided hypothesis that (the life courses of) men are less complex than (those of) women. Hence, both the within-sequence variation (complexity) and the between-sequence variation (discrepancy) are larger for women than for men.
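The effect of the alternative argument can be sketched with a few made-up numbers. The two vectors below are hypothetical toy complexity values, not results from the example data. The sketch only illustrates that the one-sided test asks whether the first group's mean is smaller than the second's.

```r
# Hypothetical toy complexity values (NOT taken from the example data)
toy_men <- c(0.10, 0.15, 0.20, 0.18, 0.22, 0.12)
toy_women <- c(0.25, 0.30, 0.22, 0.28, 0.35, 0.24)

# Two-sided Welch test: are the two means different?
two_sided <- t.test(toy_men, toy_women)

# One-sided Welch test: is the mean of the first group smaller
# than the mean of the second group?
one_sided <- t.test(toy_men, toy_women, alternative = "less")

c(two_sided = two_sided$p.value, one_sided = one_sided$p.value)
```

Because the first toy group has the lower mean, the one-sided p-value is half of the two-sided one. Had the difference pointed in the other direction, the one-sided p-value would be large instead.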
Table 6.1 in the book presents several group differences at once. Instead of repeating the code for all group comparisons and two different data sets, we wrote a function named DiscrCompTest (defined in 6-1_Table_6-1_Step1_Source.R in Chapter06.zip) that can be reused to compute the same group comparisons (group indicators: east, sex, highschool) using two different sets of input arguments (set 1: family, partner.child.year.seq, partner.child.year.om; set 2: activity, activity.year.seq, activity.year.om). The function’s output is a nicely formatted table using {knitr}’s kable and the {kableExtra} package. The function is sourced and called in 6-1_Table_6-1_Step2_Print.R.
On this page we do not further elaborate on the code defining the function because this website’s focus is on introducing the functions required to reproduce the results shown in the book. Reusing and adjusting the two simple code chunks shown above would be sufficient to achieve this goal. Admittedly, such a verbose coding strategy wouldn’t be very efficient and is also quite error-prone, but done correctly it would produce all the bits and pieces required to “manually” rebuild Table 6.1.
# Discrepancy & Complexity I - Labor Market
dissassoc(activity.year.om, activity$sex)
complex_men <- seqici(activity.year.seq[activity$sex == 0,])
complex_women <- seqici(activity.year.seq[activity$sex == 1,])
t.test(complex_men, complex_women)

dissassoc(activity.year.om, activity$east)
complex_west <- seqici(activity.year.seq[activity$east == 0,])
complex_east <- seqici(activity.year.seq[activity$east == 1,])
t.test(complex_west, complex_east)

dissassoc(activity.year.om, activity$highschool)
complex_no_highschool <- seqici(activity.year.seq[activity$highschool == 0,])
complex_highschool <- seqici(activity.year.seq[activity$highschool == 1,])
t.test(complex_no_highschool, complex_highschool)

# Discrepancy & Complexity II - Family
dissassoc(partner.child.year.om, family$sex)
complex_men <- seqici(partner.child.year.seq[family$sex == 0,])
complex_women <- seqici(partner.child.year.seq[family$sex == 1,])
t.test(complex_men, complex_women)

dissassoc(partner.child.year.om, family$east)
complex_west <- seqici(partner.child.year.seq[family$east == 0,])
complex_east <- seqici(partner.child.year.seq[family$east == 1,])
t.test(complex_west, complex_east)

dissassoc(partner.child.year.om, family$highschool)
complex_no_highschool <- seqici(partner.child.year.seq[family$highschool == 0,])
complex_highschool <- seqici(partner.child.year.seq[family$highschool == 1,])
t.test(complex_no_highschool, complex_highschool)
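The repetition in the verbose code can be reduced with a small wrapper. The following is a minimal sketch and not the authors' DiscrCompTest function: the name compare_groups and its arguments are hypothetical, it assumes that all grouping variables are coded 0/1 as in the code above, and it returns the raw test objects instead of a formatted table.

```r
# Hypothetical helper (NOT the authors' DiscrCompTest): run the discrepancy
# analysis and the complexity t-test for several 0/1 grouping variables.
compare_groups <- function(data, seqdata, diss,
                           groups = c("sex", "east", "highschool")) {
  results <- lapply(groups, function(g) {
    # Complexity index for the sequences in each of the two groups
    complexity_0 <- TraMineR::seqici(seqdata[data[[g]] == 0, ])
    complexity_1 <- TraMineR::seqici(seqdata[data[[g]] == 1, ])
    list(
      discrepancy = TraMineR::dissassoc(diss, data[[g]]),
      complexity = t.test(complexity_0, complexity_1)
    )
  })
  names(results) <- groups
  results
}

# Usage with the chapter objects; guarded so that the block can also be
# run in a session where 6-0_ChapterSetup.RData has not been loaded.
if (exists("activity.year.om")) {
  labor_market <- compare_groups(activity, activity.year.seq, activity.year.om)
  print(labor_market$sex$discrepancy)
  print(labor_market$sex$complexity)
}
```

Calling the same function with family, partner.child.year.seq, and partner.child.year.om would cover the second set of comparisons.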
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. Source code is available at https://github.com/sa-book/sa-book.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".