Cross-tabulation of clusters

Chapter 5.3 Cross-tabulation of groups from different dissimilarity matrices

To follow along, first load the required packages and the chapter environment:
# assuming you are working within .Rproj environment
library(here)

# install (if necessary) and load other required packages
source(here("source", "load_libraries.R"))

# load environment generated in "5-0_ChapterSetup.R"
load(here("data", "R", "5-0_ChapterSetup.RData"))


Chapter 5.3 introduces one option for accounting for the parallel unfolding of temporal processes: cross-tabulating cluster solutions extracted separately from two (or more) pools of sequences that represent trajectories in different domains. We use the data.frame multidim, which contains both family formation and labour market sequences. The data come from a sub-sample of the German Family Panel (pairfam). For further information on the study and on how to access the full scientific use file, see here.

Preparatory work for family formation trajectories

First, we run a Ward cluster analysis based on the dissimilarity matrix mc.fam.year.om:

fam.ward <- hclust(as.dist(mc.fam.year.om),
                   method = "ward.D",
                   members = multidim$weight40)

… to be used as initialization of the PAM clustering

fam.pam <- wcKMedRange(mc.fam.year.om,
                       weights = multidim$weight40,
                       kvals = 2:10,
                       initialclust = fam.ward)
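The two steps above can be tried on toy data before running them on the full dissimilarity matrix. The following is a minimal sketch, assuming the WeightedCluster package is installed; the objects x, toy.diss, toy.weight, toy.ward, toy.pam and toy.cl3 are made up for illustration and have nothing to do with the pairfam sequences.

```r
library(WeightedCluster)

set.seed(1)

# Toy data: 30 points in three well-separated groups (illustrative only)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2),
           matrix(rnorm(20, mean = 10), ncol = 2))
toy.diss <- as.matrix(dist(x))

# Hypothetical case weights, standing in for multidim$weight40
toy.weight <- runif(nrow(x), min = 0.5, max = 1.5)

# Step 1: weighted Ward clustering
toy.ward <- hclust(as.dist(toy.diss),
                   method = "ward.D",
                   members = toy.weight)

# Step 2: PAM for several numbers of clusters, initialized with the Ward tree
toy.pam <- wcKMedRange(toy.diss,
                       weights = toy.weight,
                       kvals = 2:4,
                       initialclust = toy.ward)

# Cluster quality statistics, one row per value of k
toy.pam$stats

# Cluster membership for k = 3; the values are row indices of the medoids
toy.cl3 <- toy.pam$clustering$cluster3
table(toy.cl3)
```

The last table shows why a re-labelling step is useful: the cluster codes are medoid positions in the dissimilarity matrix, not consecutive integers.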

We now extract 5 clusters…

fam.pam.5cl <- fam.pam$clustering$cluster5

…attach the cluster info to the main data.frame multidim

multidim$fam.pam.5cl<-fam.pam.5cl

… and re-label clusters from 1 to 5 instead of medoid identifiers…

fam.pam.5cl.factor <- factor(fam.pam.5cl,
                             levels = c(16, 460, 479, 892, 898),
                             labels = c("1", "2", "3", "4", "5"))

…to finally attach the factor info to the main data.frame multidim:

multidim$fam.pam.5cl.factor<-fam.pam.5cl.factor
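Hard-coded medoid identifiers stop matching as soon as the data or the clustering change, and factor() silently turns unmatched values into NA. A possible alternative, sketched here on a made-up vector cl, is to derive the levels from the cluster vector itself. Because the identifiers used above are listed in ascending order, sorting the unique values should reproduce the same numbering; this is an assumption to verify on your own data.

```r
# Toy cluster vector whose values are medoid identifiers (made-up numbers)
cl <- c(16, 460, 16, 892, 460, 16)

# Derive the levels from the data instead of typing them by hand
medoids <- sort(unique(cl))

# Re-label the clusters as "1", "2", ... in ascending order of medoid identifier
cl.factor <- factor(cl,
                    levels = medoids,
                    labels = as.character(seq_along(medoids)))

# Check the mapping between medoid identifiers and new labels
table(medoid = cl, label = cl.factor)
```

The closing table is a quick sanity check: each medoid identifier should map to exactly one label, and no case should be NA.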

Preparatory work for labour market trajectories

First, we run a Ward cluster analysis based on the dissimilarity matrix mc.act.year.om:

act.ward <- hclust(as.dist(mc.act.year.om),
                   method = "ward.D",
                   members = multidim$weight40)

… to be used as initialization of the PAM clustering

act.pam <- wcKMedRange(mc.act.year.om, 
                       weights = multidim$weight40, 
                       kvals = 2:10,
                       initialclust = act.ward)

We now extract 5 clusters…

act.pam.5cl <- act.pam$clustering$cluster5

…attach the cluster info to the main data.frame multidim

multidim$act.pam.5cl<-act.pam.5cl

… and re-label clusters from 1 to 5 instead of medoid identifiers…

act.pam.5cl.factor <- factor(act.pam.5cl,
                             levels = c(6, 25, 78, 539, 709),
                             labels = c("1", "2", "3", "4", "5"))

…to finally attach the factor info to the main data.frame multidim:

multidim$act.pam.5cl.factor<-act.pam.5cl.factor

Cross-tabulation for a 5-cluster solution on both channels

Tabulate the two cluster vectors (labour market clusters in the rows, family formation clusters in the columns) and store the result in an object that we name crosstab…

crosstab<-table(multidim$act.pam.5cl.factor, multidim$fam.pam.5cl.factor)

…to print it at our convenience:

crosstab
   
      1   2   3   4   5
  1 118  40  58  76  42
  2  74  32  44  43  42
  3  69  32  21  32  12
  4  38  13   7  19   5
  5 131   8   9  29  33
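Raw counts are hard to compare across rows of different size. One way to read the table, shown here as a sketch, is to add margins and convert the counts to row percentages. The block re-enters the counts printed above so that it runs on its own; the object names counts, crosstab.toy and row.pct are illustrative, and with the chapter environment loaded the same calls can be applied to crosstab directly.

```r
# Counts copied from the printed table (rows: labour market clusters,
# columns: family formation clusters)
counts <- matrix(c(118, 40, 58, 76, 42,
                    74, 32, 44, 43, 42,
                    69, 32, 21, 32, 12,
                    38, 13,  7, 19,  5,
                   131,  8,  9, 29, 33),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(act = as.character(1:5),
                                 fam = as.character(1:5)))
crosstab.toy <- as.table(counts)

# Add row and column totals
addmargins(crosstab.toy)

# Row percentages: distribution of family clusters within each
# labour market cluster
row.pct <- 100 * prop.table(crosstab.toy, margin = 1)
round(row.pct, 1)
```

For example, 131 of the 210 cases in labour market cluster 5 (about 62%) belong to family formation cluster 1.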

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. Source code is available at https://github.com/sa-book/sa-book.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".