Chapter 5.3 Cross-tabulation of groups from different dissimilarity matrices
As described in readme.html, run 5-0_ChapterSetup.R. This will create 5-0_ChapterSetup.RData in the sub folder data/R. This file contains the data required to produce the plots shown below. You also need to add legend_large_box to your environment in order to render the tweaked version of the legend described below. You can find the corresponding file in the source folder of the unzipped Chapter 5 archive.
# assuming you are working within .Rproj environment
library(here)
# install (if necessary) and load other required packages
source(here("source", "load_libraries.R"))
# load environment generated in "5-0_ChapterSetup.R"
load(here("data", "R", "5-0_ChapterSetup.RData"))
In Chapter 5.3, we introduce one of the options to account for the parallel unfolding of temporal processes: the cross-tabulation of cluster solutions extracted separately from two (or more) pools of sequences representing the trajectories in different domains. We now use the data.frame multidim, which contains both family formation and labour market sequences. The data come from a sub-sample of the German Family Panel (pairfam). For further information on the study and on how to access the full scientific use file, see the pairfam website.
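The logic of the approach can be illustrated in miniature with base R alone. The sketch below is not the chapter's code. It assumes two made-up domains for the same 12 individuals, uses Euclidean distances between random points as a stand-in for the OM dissimilarity matrices, uses hypothetical case weights in place of weight40, and cuts a weighted Ward tree with cutree() instead of running the PAM step used below. One property does carry over directly: both partitions must refer to the same individuals in the same row order, otherwise the cross-tabulation is meaningless.

```r
set.seed(42)
n <- 12
w <- runif(n, min = 0.5, max = 2)   # hypothetical case weights

# Stand-ins for two domains observed on the same 12 individuals
fam_coords <- matrix(rnorm(n * 2), ncol = 2)
act_coords <- matrix(rnorm(n * 2), ncol = 2)

# Full dissimilarity matrices, playing the role of mc.fam.year.om / mc.act.year.om
d_fam <- as.matrix(dist(fam_coords))
d_act <- as.matrix(dist(act_coords))

# Cluster each domain separately: weighted Ward tree, cut into 3 groups
fam_cl <- cutree(hclust(as.dist(d_fam), method = "ward.D", members = w), k = 3)
act_cl <- cutree(hclust(as.dist(d_act), method = "ward.D", members = w), k = 3)

# Cross-tabulate the two partitions (rows: labour market, columns: family)
crosstab_toy <- table(act = act_cl, fam = fam_cl)
crosstab_toy
```

Each cell counts the individuals who belong to a given labour market cluster and a given family cluster at the same time. This joint membership is what the chapter's crosstab captures for the real data.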
We start with the family formation sequences and run a Ward cluster analysis based on the dissimilarity matrix mc.fam.year.om:
fam.ward <- hclust(as.dist(mc.fam.year.om),
                   method = "ward.D",
                   members = multidim$weight40)
… to be used as initialization of the PAM clustering, which we run for 2 to 10 clusters:
fam.pam <- wcKMedRange(mc.fam.year.om,
                       weights = multidim$weight40,
                       kvals = 2:10,
                       initialclust = fam.ward)
We now extract 5 clusters…
fam.pam.5cl <- fam.pam$clustering$cluster5
…attach the cluster info to the main data.frame multidim…
multidim$fam.pam.5cl <- fam.pam.5cl
… and re-label clusters from 1 to 5 instead of medoid identifiers…
fam.pam.5cl.factor <- factor(fam.pam.5cl,
                             levels = c(16, 460, 479, 892, 898),
                             labels = c("1", "2", "3", "4", "5"))
…to finally attach the factor info to the main data.frame multidim:
multidim$fam.pam.5cl.factor <- fam.pam.5cl.factor
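The PAM solution codes each sequence by the identifier of its cluster's medoid (hence levels such as 16 or 460 above) rather than by 1 to 5. The base-R sketch below illustrates both ideas on toy data. A hypothetical helper, weighted_medoids(), picks within each cluster the member with the smallest weighted sum of dissimilarities to the other members. A second hypothetical helper, relabel_medoids(), maps medoid identifiers to labels 1 to k in ascending order of the identifiers. This illustrates the concept under these assumptions. It does not describe how wcKMedRange() derives its starting medoids from the Ward tree internally.

```r
# Hypothetical helper: weighted medoid of each cluster, i.e. the member j
# minimising sum_i w[i] * d[i, j] over the members i of its cluster.
weighted_medoids <- function(d, groups, w) {
  sapply(sort(unique(groups)), function(g) {
    idx <- which(groups == g)
    # Multiplying by w[idx] scales row i of the sub-matrix by w[i]
    cost <- colSums(d[idx, idx, drop = FALSE] * w[idx])
    idx[which.min(cost)]
  })
}

# Hypothetical helper: map medoid identifiers to labels "1", ..., "k",
# ordered by ascending identifier
relabel_medoids <- function(medoid_ids) {
  ids <- sort(unique(medoid_ids))
  factor(medoid_ids, levels = ids, labels = as.character(seq_along(ids)))
}

# Toy data: five cases on a line, already split into two groups
pos <- c(0, 1, 2, 10, 11)
d <- abs(outer(pos, pos, "-"))
groups <- c(1, 1, 1, 2, 2)

weighted_medoids(d, groups, w = rep(1, 5))   # 2 and 4 (tie broken by which.min)
meds <- weighted_medoids(d, groups, w = c(1, 1, 1, 1, 5))
meds                                         # 2 and 5: the heavy case pulls the medoid

# Code every case by its medoid's row index, then relabel to 1..k
medoid_coded <- meds[groups]
relabel_medoids(medoid_coded)
```

The chapter hard-codes its medoid identifiers: c(16, 460, 479, 892, 898) for the family clusters and c(6, 25, 78, 539, 709) for the labour market clusters. Both vectors are already in ascending order. If the cluster5 column holds exactly these five values, relabel_medoids() would therefore produce the same labels without typing the identifiers by hand.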
We then repeat the same steps for the labour market sequences, starting with a Ward cluster analysis based on the dissimilarity matrix mc.act.year.om:
act.ward <- hclust(as.dist(mc.act.year.om),
                   method = "ward.D",
                   members = multidim$weight40)
… to be used as initialization of the PAM clustering, which we again run for 2 to 10 clusters:
act.pam <- wcKMedRange(mc.act.year.om,
                       weights = multidim$weight40,
                       kvals = 2:10,
                       initialclust = act.ward)
We now extract 5 clusters…
act.pam.5cl <- act.pam$clustering$cluster5
…attach the cluster info to the main data.frame multidim…
multidim$act.pam.5cl <- act.pam.5cl
… and re-label clusters from 1 to 5 instead of medoid identifiers…
act.pam.5cl.factor <- factor(act.pam.5cl,
                             levels = c(6, 25, 78, 539, 709),
                             labels = c("1", "2", "3", "4", "5"))
…to finally attach the factor info to the main data.frame multidim:
multidim$act.pam.5cl.factor <- act.pam.5cl.factor
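Both cluster factors are now attached to multidim. The clustering steps used multidim$weight40, whereas table() counts every sequence once. If weighted counts are preferred, base R's xtabs() sums a weight variable within each cell. The sketch assumes a small made-up data.frame (toy, with columns act, fam and weight40) rather than the chapter data.

```r
# Toy stand-in for multidim: two cluster factors plus a weight column
toy <- data.frame(
  act = factor(c(1, 2, 1, 3, 2, 1)),
  fam = factor(c(2, 2, 1, 1, 3, 2)),
  weight40 = c(0.8, 1.2, 1.0, 0.5, 1.5, 2.0)   # hypothetical weights
)

# Unweighted: each row counts once (what table() does)
unweighted <- table(toy$act, toy$fam)
unweighted

# Weighted: xtabs() sums weight40 within each act x fam cell
weighted <- xtabs(weight40 ~ act + fam, data = toy)
weighted
```

Applied to the chapter data, the weighted version would presumably read xtabs(weight40 ~ act.pam.5cl.factor + fam.pam.5cl.factor, data = multidim). The chapter itself reports the unweighted table().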
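We now cross-tabulate the two cluster factors, with the labour market clusters in the rows and the family formation clusters in the columns, and store the result in an object that we name crosstab…
crosstab <- table(multidim$act.pam.5cl.factor, multidim$fam.pam.5cl.factor)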
…to print it at our convenience:
crosstab
      1   2   3   4   5
  1 118  40  58  76  42
  2  74  32  44  43  42
  3  69  32  21  32  12
  4  38  13   7  19   5
  5 131   8   9  29  33
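Raw counts are hard to compare across clusters of different size. The sketch below copies the counts printed above into a table (rows: labour market clusters, columns: family formation clusters). It then adds margins and row percentages with base R's addmargins() and prop.table(). This is one common way to read such a table, not an analysis reported in the chapter.

```r
# Rebuild the printed cross-tabulation (rows: labour market clusters,
# columns: family formation clusters)
crosstab <- as.table(matrix(
  c(118, 40, 58, 76, 42,
     74, 32, 44, 43, 42,
     69, 32, 21, 32, 12,
     38, 13,  7, 19,  5,
    131,  8,  9, 29, 33),
  nrow = 5, byrow = TRUE,
  dimnames = list(act = as.character(1:5), fam = as.character(1:5))
))

# Row and column totals
addmargins(crosstab)

# Row percentages: family cluster composition within each labour market cluster
row_pct <- round(100 * prop.table(crosstab, margin = 1), 1)
row_pct
```

For example, 131 of the 210 sequences in labour market cluster 5 (about 62.4%) fall into family cluster 1. Across all 1027 tabulated sequences, the share of family cluster 1 is 430 (about 41.9%).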
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. Source code is available at https://github.com/sa-book/sa-book.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".