Chapter 5.4 Combining domain-specific dissimilarities
readme.html and run 5-0_ChapterSetup.R. This will create 5-0_ChapterSetup.RData in the sub folder data/R. This file contains the data required to produce the plots shown below.
legend_large_box to your environment in order to render the tweaked version of the legend described below. You find this file in the source folder of the unzipped Chapter 5 archive.
LoadInstallPackages.R
# assuming you are working within .Rproj environment
library(here)
# install (if necessary) and load other required packages
source(here("source", "load_libraries.R"))
# load environment generated in "5-0_ChapterSetup.R"
load(here("data", "R", "5-0_ChapterSetup.RData"))
In chapter 5.3, we introduce another option to account for the
parallel unfolding of temporal processes: clustering on a
dissimilarity matrix that results from summing or averaging the
pairwise dissimilarity matrices computed separately on two (or more)
pools of sequences representing the trajectories in different domains.
We are now using the data.frame multidim, which contains both family
formation and labour market sequences. The data come from a sub-sample
of the German Family Panel (pairfam). For further information on the
study and on how to access the full scientific use file see here.
We first have to construct an object X that contains the two
dissimilarity matrices as elements of a list
X <- list(mc.fam.year.om, mc.act.year.om)
We then use the ?cbind command to bind the matrices stored in X side by
side (column-wise) and generate the object Y
Y <- do.call(cbind, X)
We overwrite Y by using the ?array command to give the object the right
dimensions (rows, columns, number of matrices) so that the next steps
can be performed
Y <- array(Y, dim = c(dim(X[[1]]), length(X)))
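The effect of the cbind and array steps is easier to see on a small example. The following is a minimal sketch, not part of the original chapter code: the objects fam, act, X_toy and Y_toy are hypothetical names, and the values are the dissimilarities between the first three sequences displayed further below.

```r
# Toy stand-ins for the two dissimilarity matrices (hypothetical names)
fam <- matrix(c( 0, 40, 34,
                40,  0, 22,
                34, 22,  0), nrow = 3, byrow = TRUE)
act <- matrix(c( 0, 28, 26,
                28,  0, 10,
                26, 10,  0), nrow = 3, byrow = TRUE)

X_toy <- list(fam, act)

# cbind places the matrices side by side: 3 rows, 6 columns
Y_toy <- do.call(cbind, X_toy)
dim(Y_toy)

# array() refills the same values column by column into a 3 x 3 x 2 array,
# so each slice along the third dimension is one of the input matrices
Y_toy <- array(Y_toy, dim = c(dim(X_toy[[1]]), length(X_toy)))
dim(Y_toy)

identical(Y_toy[, , 1], fam)  # first slice is the family matrix
identical(Y_toy[, , 2], act)  # second slice is the activity matrix
```

The reshaping relies on all matrices in the list having the same dimensions and the same ordering of cases.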
We finally use the ?apply command to apply a summation function (option
sum) over the first two dimensions of the object Y (1 and 2), that is,
for each pair of sequences the values are summed across the stacked
matrices. We store the resulting dissimilarity matrix in an object
called mc.summation
mc.summation <- apply(Y, c(1, 2), sum, na.rm = TRUE)
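As a cross-check of the summation step, the sketch below compares the apply-based result with a plain elementwise addition of the matrices. It is an illustration under stated assumptions: the toy objects (fam, act, X_toy, Y_toy, sum_apply, sum_reduce) are hypothetical, and the values are those of the first three sequences displayed further below.

```r
# Toy stand-ins for the two dissimilarity matrices (hypothetical names)
fam <- matrix(c( 0, 40, 34,
                40,  0, 22,
                34, 22,  0), nrow = 3, byrow = TRUE)
act <- matrix(c( 0, 28, 26,
                28,  0, 10,
                26, 10,  0), nrow = 3, byrow = TRUE)

X_toy <- list(fam, act)
Y_toy <- array(do.call(cbind, X_toy),
               dim = c(dim(X_toy[[1]]), length(X_toy)))

# Sum across the third dimension for every cell (i, j)
sum_apply <- apply(Y_toy, c(1, 2), sum, na.rm = TRUE)

# Elementwise addition of all matrices in the list
sum_reduce <- Reduce(`+`, X_toy)

all.equal(sum_apply, sum_reduce)
sum_apply
```

The two approaches agree when there are no missing values. They differ if a cell is NA in one matrix: with na.rm = TRUE the apply version ignores the missing value, whereas the elementwise addition returns NA for that cell.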
For the sake of clarity, we construct another object Z that contains
the dissimilarity matrices as elements of a list, as in the case of
summation above
Z <- list(mc.act.year.om, mc.fam.year.om)
We then use the ?cbind command to bind the matrices stored in Z side by
side (column-wise) and generate the object W
W <- do.call(cbind, Z)
We overwrite W by using the ?array command to give the object the right
dimensions (rows, columns, number of matrices) so that the next steps
can be performed
W <- array(W, dim = c(dim(Z[[1]]), length(Z)))
We finally use the ?apply command to apply an averaging function
(option mean) over the first two dimensions of the object W (1 and 2),
that is, for each pair of sequences the values are averaged across the
stacked matrices. We store the resulting dissimilarity matrix in an
object called mc.average
mc.average <- apply(W, c(1, 2), mean, na.rm = TRUE)
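The list used for averaging holds the two matrices in the reverse order of the list used for summation. The sketch below suggests that this ordering does not affect the result. It is a toy illustration only: fam, act, Z_toy, stack_and_average and the other names are hypothetical, and the values are those of the first three sequences displayed further below.

```r
# Toy stand-ins for the two dissimilarity matrices (hypothetical names)
fam <- matrix(c( 0, 40, 34,
                40,  0, 22,
                34, 22,  0), nrow = 3, byrow = TRUE)
act <- matrix(c( 0, 28, 26,
                28,  0, 10,
                26, 10,  0), nrow = 3, byrow = TRUE)

# Hypothetical helper: stack a list of equally sized matrices and average them
stack_and_average <- function(mats) {
  stacked <- array(do.call(cbind, mats),
                   dim = c(dim(mats[[1]]), length(mats)))
  apply(stacked, c(1, 2), mean, na.rm = TRUE)
}

Z_toy   <- list(act, fam)
avg     <- stack_and_average(Z_toy)
avg_rev <- stack_and_average(rev(Z_toy))

all.equal(avg, avg_rev)          # order of the matrices is irrelevant
all.equal(avg, (fam + act) / 2)  # same as the elementwise mean
avg
```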
Let’s first display the two domain-specific dissimilarity matrices for the first three sequences in the sample
mc.fam.year.om[1:3, 1:3]
1 2 3
1 0 40 34
2 40 0 22
3 34 22 0
mc.act.year.om[1:3, 1:3]
1 2 3
1 0 28 26
2 28 0 10
3 26 10 0
We now inspect the dissimilarity matrix between the first three sequences in the sample after summation...
mc.summation[1:3, 1:3]
[,1] [,2] [,3]
[1,] 0 68 60
[2,] 68 0 32
[3,] 60 32 0
... and averaging
mc.average[1:3, 1:3]
[,1] [,2] [,3]
[1,] 0 34 30
[2,] 34 0 16
[3,] 30 16 0
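With two matrices and no missing values, the averaged dissimilarities are the summed ones divided by two (for example 68 / 2 = 34), so the ranking of pairs is the same in both matrices. The sketch below illustrates this on the values printed above. The object names are hypothetical, and average-linkage hierarchical clustering via hclust is an assumption chosen only for illustration, not necessarily the clustering procedure used elsewhere in the chapter.

```r
# Toy stand-ins built from the values printed above (hypothetical names)
fam <- matrix(c( 0, 40, 34,
                40,  0, 22,
                34, 22,  0), nrow = 3, byrow = TRUE)
act <- matrix(c( 0, 28, 26,
                28,  0, 10,
                26, 10,  0), nrow = 3, byrow = TRUE)

toy_sum <- fam + act
toy_avg <- (fam + act) / 2

# The two combined matrices differ only by a constant factor
all.equal(toy_sum, 2 * toy_avg)

# Illustrative assumption: average-linkage hierarchical clustering
hc_sum <- hclust(as.dist(toy_sum), method = "average")
hc_avg <- hclust(as.dist(toy_avg), method = "average")

# Same merge order; only the merge heights are rescaled
identical(hc_sum$merge, hc_avg$merge)
all.equal(hc_sum$height, 2 * hc_avg$height)
```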
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. Source code is available at https://github.com/sa-book/sa-book.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".