Chapter 6.2.2 Bayesian Information Criterion and the Likelihood Ratio Test
readme.html
and run
6-0_ChapterSetup.R
. This will create
6-0_ChapterSetup.RData
in the sub folder
data/R
. This file contains the data required to produce the
table shown at the bottom of this page.LoadInstallPackages.R
# assuming you are working within .Rproj environment
library(here)
# install (if necessary) and load other required packages
source(here("source", "LoadInstallPackages.R"))
# load environment generated in "6-0_ChapterSetup.R"
load(here("data", "R", "6-0_ChapterSetup.RData"))
Table 6.4 in Chapter 6.2.2 presents group
comparisons based on a discrepancy analysis and an alternative approach
recently proposed by Liao and Fasang (LiaoFasang2021?). In
this example we examine the labor market trajectories stored in
activity.year.seq
. The group comparisons are using dummy
variables stored in activity
: east
(living in
East vs West Germany), sex
(male vs. female),
highschool
(at least highschool degree: yes vs. no).
The analysis are done with {TraMineR}
’s dissassoc
function and the seqCompare
function from the {TraMineRextras}
package.
In the book we present the following table (note that we highlighted one line we are going to elborate on in the following paragraphs):
BIC diff. | p-value | pseudo-R2 | p-value | N | |
---|---|---|---|---|---|
West vs East | |||||
Overall | 0.01 | 0.028 | 0.01 | 0.001 | 1027 |
Men | 3.69 | 0.013 | 0.00 | 0.019 | 506 |
Women | 0.68 | 0.018 | 0.02 | 0.001 | 521 |
Men vs Women | |||||
Overall | 24.85 | 0.000 | 0.09 | 0.001 | 1027 |
West | 26.72 | 0.000 | 0.11 | 0.001 | 629 |
East | 14.41 | 0.000 | 0.07 | 0.001 | 398 |
No highschool degree | 50.67 | 0.000 | 0.12 | 0.001 | 602 |
West | 62.98 | 0.000 | 0.17 | 0.001 | 338 |
East | 19.91 | 0.000 | 0.09 | 0.001 | 264 |
Highschool degree | 17.40 | 0.000 | 0.06 | 0.001 | 425 |
West | 17.80 | 0.000 | 0.08 | 0.001 | 291 |
East | 10.09 | 0.000 | 0.04 | 0.001 | 134 |
Obviously, the table includes a lot of group comparisons and it would
require a lot of repetitive coding if each single comparisons would be
explicitly spelled out. For that reason, we were using {purrr}
’s map
function to
to do the comparisons within loops. We also wrote several lines of code
for extracting and formatting the results to be displayed in a nice
table using {knitr}
’s kable
and the
{kableExtra}
package. In the accompanying script file
6-2_Table_6-4_Discrepancies-BIC.R
stored in
Chapter06.zip) you can find the full code required to produce the
table shown above.
On this page, however, we do not further elaborate on all these
technicalities and rather focus on briefly showcasing the plain {TraMineR}
commands required to obtain
the results shown in the table. As you see, this does not require a lot
of coding:
We illustrate the procedure by calling the functions required to
obtain the figures shown in the highlighted row of table displayed
above. The functions dissassoc
and
seqCompare
are easy to use. The most “tricky” part for R
novices is probably to apply the functions to the correct subset of the
data. In our example we are interested in the question of whether the
sequences of East German women are systematically different from the
sequences of their West German counterparts. Hence, our analysis have to
be restricted to women. While our sequences are stored in the object
activity.year.seq
any further information characterizing
the individual who experience these sequences is stored in the dataframe
activity
. We start our analysis by inspecting how many
women reside in East and West Germany.
(table(activity$east[activity$sex==1])
).
Women | |
---|---|
West Germany | 325 |
East Germany | 196 |
We then proceed by generating a logical vector (women
)
indicating which of the 1027 rows of activity
is referring
to a female respondent.
<- activity$sex==1
women
# visual inspection
%>%
activity select(sex) %>%
mutate (women = women)
# A tibble: 1,027 × 2
sex women
<dbl+lbl> <lgl>
1 1 [Female] TRUE
2 0 [Male] FALSE
3 1 [Female] TRUE
4 0 [Male] FALSE
5 0 [Male] FALSE
6 1 [Female] TRUE
7 0 [Male] FALSE
8 0 [Male] FALSE
9 1 [Female] TRUE
10 1 [Female] TRUE
# … with 1,017 more rows
# ℹ Use `print(n = ...)` to see more rows
We use this vector to subset women from the dissimilarity matrix
activity.year.om
, the sequence object
activity.year.seq
and the vector activity$east
when we call dissassoc
and seqCompare
. Note
that subsetting the pairwise distance matrix requires to extract the
correct rows and columns [women,women]
,
while subsetting activity.year.seq
only requires to extract
the desired rows [women, ]
.
Below we compute the values shown in the table. While the discrepancy
analysis requires a dissimilarity matrix
(activity.year.om[women,women]
) as an input,
seqCompare
computes the dissimilarity matrix on its own and
requires a sequence object (activity.year.seq[women,]
) and
instructions on how to compute the distances
(method="OM", sm="CONSTANT"
) as an input. If you want to
compare the results of the two approaches, the distances should be
computed with the same method. Both functions require a vector
(activity$east[women]
)specifying the groups to be compared.
Note that seqCompare
currently only accepts a dichotomous
grouping indicator.
<- dissassoc(activity.year.om[women,women],
discr.region group = activity$east[women])
<- seqCompare(activity.year.seq[women,],
bic.region group = activity$east[women],
method="OM", sm="CONSTANT")
The output of seqCompare
is quite parsimonious. For
Table 6.4 we extracted the p-value
(bic.region[,2]
) and the BIC difference
(bic.region[,3]
)
%>%
bic.region kable(digits = c(2,3,2,2,2)) %>%
column_spec(2:3, background = "#8DCEA4") %>%
kable_styling(full_width =FALSE)
LRT | p-value | BIC diff. | Bayes Factor (Avg) | Bayes Factor (From Avg BIC) |
---|---|---|---|---|
5.98 | 0.018 | 0.68 | 1.67 | 1.41 |
dissassoc
, on the other hand, produces quite a lot of
output which can be stored as a list object. The results shown in the
table are the total number of cases
(discr.region$groups[3,1]
), the pseudo-\(R^2\)
(discr.region$stat[3,1]
), and the corresponding p-value
(discr.region$stat[3,2]
).
We close this section double checking if the subsetting worked as
intended by inspecting the case numbers stored in
discr.region$groups
. The first row refers to East German
women and the second one to West German women. The case numbers are as
expected (see frequency table above). Hence, we can be confident that we
achieved what we wanted.
kable(discr.region$groups) %>%
kable_styling(full_width =FALSE)
n | discrepancy | |
---|---|---|
0 | 325 | 11.87963 |
1 | 196 | 11.94612 |
Total | 521 | 12.11270 |
For the more complex subsetting including all groups shown in
Table 6.4 we refer you to the script
6-2_Table_6-4_Discrepancies-BIC.R
. There we also illustrate
how to extract, arrange, and format the outputs from
dissassoc
and seqCompare
to produce the table
shown at the beginning of this page.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. Source code is available at https://github.com/sa-book/sa-book.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".