Chapter 5.3 Cross-tabulation of groups from different dissimilarity matrices
Read readme.html and run
5-0_ChapterSetup.R. This will create
5-0_ChapterSetup.RData in the subfolder
data/R. This file contains the data required to produce the
plots shown below. Add legend_large_box to
your environment in order to render the tweaked version of the legend
described below. You can find this file in the source folder of
the unzipped Chapter 5 archive.
# assuming you are working within .Rproj environment
library(here)
# install (if necessary) and load other required packages
source(here("source", "load_libraries.R"))
# load environment generated in "5-0_ChapterSetup.R"
load(here("data", "R", "5-0_ChapterSetup.RData"))
In chapter 5.3, we introduce one of the options to account for the
parallel unfolding of temporal processes: the cross-tabulation of
cluster solutions extracted separately from two (or more) pools of
sequences representing the trajectories in different domains. We now
use the data.frame multidim, which contains
both family formation and labour market sequences. The data come from a
sub-sample of the German Family Panel - pairfam. For further information
on the study and on how to access the full scientific use file see here.
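The idea of cross-tabulating two separately extracted cluster solutions can be sketched on a toy example before turning to the real data. The vectors fam.toy and act.toy and the helper relabel() are hypothetical and only serve as illustration; they are not part of the replication files. The sketch assumes that cluster memberships are stored as medoid identifiers and that numbering the clusters by sorted identifier is acceptable.

```r
# Hypothetical toy data: cluster memberships of the same eight individuals
# in two domains, stored as medoid identifiers (not real pairfam results)
fam.toy <- c(3, 3, 7, 7, 7, 12, 12, 3)
act.toy <- c(2, 2, 2, 9, 9, 9, 2, 9)

# Re-label the identifiers as consecutive cluster numbers; sorting the
# unique identifiers avoids typing them by hand
relabel <- function(x) {
  ids <- sort(unique(x))
  factor(x, levels = ids, labels = seq_along(ids))
}

# Each cell counts the individuals with a given combination of clusters
toy.crosstab <- table(act = relabel(act.toy), fam = relabel(fam.toy))
toy.crosstab
```

Deriving the levels from the data instead of hard-coding them keeps the re-labelling step valid if the medoids change, for instance after re-running the clustering with a different number of clusters.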
First, we run a Ward cluster analysis based on the dissimilarity
matrix mc.fam.year.om:
fam.ward <- hclust(as.dist(mc.fam.year.om),
                   method = "ward.D",
                   members = multidim$weight40)
… to be used as initialization of the PAM clustering
fam.pam <- wcKMedRange(mc.fam.year.om,
                       weights = multidim$weight40,
                       kvals = 2:10,
                       initialclust = fam.ward)
We now extract 5 clusters…
fam.pam.5cl <- fam.pam$clustering$cluster5
… attach the cluster info to the main data.frame multidim…
multidim$fam.pam.5cl <- fam.pam.5cl
… and re-label clusters from 1 to 5 instead of medoid identifiers…
fam.pam.5cl.factor <- factor(fam.pam.5cl,
                             levels = c(16, 460, 479, 892, 898),
                             labels = c("1", "2", "3", "4", "5"))
… to finally attach the factor info to the main data.frame multidim:
multidim$fam.pam.5cl.factor <- fam.pam.5cl.factor
We then repeat the same steps for the second set of sequences,
starting with a Ward cluster analysis based on the dissimilarity
matrix mc.act.year.om:
act.ward <- hclust(as.dist(mc.act.year.om),
                   method = "ward.D",
                   members = multidim$weight40)
… to be used as initialization of the PAM clustering
act.pam <- wcKMedRange(mc.act.year.om,
                       weights = multidim$weight40,
                       kvals = 2:10,
                       initialclust = act.ward)
We now extract 5 clusters…
act.pam.5cl <- act.pam$clustering$cluster5
… attach the cluster info to the main data.frame multidim…
multidim$act.pam.5cl <- act.pam.5cl
… and re-label clusters from 1 to 5 instead of medoid identifiers…
act.pam.5cl.factor <- factor(act.pam.5cl,
                             levels = c(6, 25, 78, 539, 709),
                             labels = c("1", "2", "3", "4", "5"))
… to finally attach the factor info to the main data.frame multidim:
multidim$act.pam.5cl.factor <- act.pam.5cl.factor
Tabulate the two vectors and store the results in an object that we
name crosstab…
crosstab <- table(multidim$act.pam.5cl.factor, multidim$fam.pam.5cl.factor)
… to print it at our convenience:
crosstab
      1   2   3   4   5
  1 118  40  58  76  42
  2  74  32  44  43  42
  3  69  32  21  32  12
  4  38  13   7  19   5
  5 131   8   9  29  33
Rows refer to the clusters in act.pam.5cl.factor, columns to those in
fam.pam.5cl.factor.
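Raw counts are hard to compare across clusters of different size, so row percentages and marginal totals may help when reading the table. The following is a minimal sketch using base R only. To keep it self-contained, it rebuilds the table from the counts printed above under the hypothetical name crosstab.rebuilt; with the chapter data loaded, the same calls could be applied to crosstab directly.

```r
# Counts copied from the printed crosstab
# (rows: act.pam.5cl.factor, columns: fam.pam.5cl.factor)
counts <- matrix(c(118, 40, 58, 76, 42,
                    74, 32, 44, 43, 42,
                    69, 32, 21, 32, 12,
                    38, 13,  7, 19,  5,
                   131,  8,  9, 29, 33),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(act = 1:5, fam = 1:5))
crosstab.rebuilt <- as.table(counts)

# Row percentages: distribution of the fam clusters within each act cluster
row.pct <- round(100 * prop.table(crosstab.rebuilt, margin = 1), 1)
row.pct

# Counts with row and column totals added
addmargins(crosstab.rebuilt)
```

Note that table() counts unweighted cases, whereas the clustering above used multidim$weight40. If weighted counts are of interest, one option is xtabs(weight40 ~ act.pam.5cl.factor + fam.pam.5cl.factor, data = multidim); whether weighting is appropriate here is a choice left to the analyst.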
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. Source code is available at https://github.com/sa-book/sa-book.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".