Chapter 4.1 Clustering sequences to uncover typologies
Please read readme.html and run 4-0_ChapterSetup.R. This will create 4-0_ChapterSetup.RData in the subfolder data/R. This file contains the data required to produce the plots shown below.

You also need to add legend_large_box to your environment in order to render the tweaked version of the legend described below. You can find this file in the source folder of the unzipped Chapter 4 archive.

LoadInstallPackages.R
# assuming you are working within the .Rproj environment
library(here)
# install (if necessary) and load other required packages
source(here("source", "load_libraries.R"))
# load environment generated in "4-0_ChapterSetup.R"
load(here("data", "R", "4-0_ChapterSetup.RData"))
In Chapter 4.1, we introduce crisp (hard) clustering algorithms and the cluster quality indices to consider when deciding how many clusters to extract from the initial sample. The data come from a subsample of the German Family Panel (pairfam). For further information on the study and on how to access the full scientific use file, see here.
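To make the idea of a cluster quality index concrete, the following is a minimal, unweighted sketch of the average silhouette width (ASW), one widely used index. For each case, s(i) = (b - a) / max(a, b). Here a is the mean dissimilarity to the other members of its own cluster, and b is the mean dissimilarity to the closest other cluster. The toy data, the helper name `asw`, and the convention that singletons get s(i) = 0 are illustrative assumptions. The sketch does not reproduce the weighted computation used for the pairfam data.

```r
# Hypothetical toy data: six points on a line forming two obvious groups
x <- c(1, 2, 3, 10, 11, 12)
d <- dist(x)                        # Euclidean dissimilarities
hc <- hclust(d, method = "ward.D")  # crisp hierarchical clustering

# Hypothetical helper: unweighted average silhouette width of a partition
asw <- function(d, cl) {
  m <- as.matrix(d)
  s <- sapply(seq_along(cl), function(i) {
    own <- cl == cl[i]
    if (sum(own) == 1) return(0)    # convention: singleton clusters get s(i) = 0
    # mean dissimilarity to the other members of the own cluster (m[i, i] is 0)
    a <- sum(m[i, own]) / (sum(own) - 1)
    # mean dissimilarity to the nearest other cluster
    b <- min(sapply(setdiff(unique(cl), cl[i]),
                    function(k) mean(m[i, cl == k])))
    (b - a) / max(a, b)
  })
  mean(s)
}

# Compare candidate numbers of clusters: higher ASW = better separated partition
ks <- 2:4
res <- sapply(ks, function(k) asw(d, cutree(hc, k = k)))
res
```

For these toy data, k = 2 yields the highest ASW. This matches the two groups that are visible by eye. With real sequence data, the index rarely points to a single answer, so it is one input among several.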
We apply a hierarchical cluster analysis with the command hclust (see ?hclust) to the dissimilarity matrix partner.child.year.om for the family formation sequences. This matrix was computed with OM using indel = 1 and sm = 2. We use Ward's method on non-squared dissimilarities (method = "ward.D"; "ward.D2" would square them before updating). We also use weights (see the members option, where we have to specify the data.frame to which the weight vector belongs).
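The OM dissimilarities used above are weighted edit distances. The minimal sketch below computes OM for two sequences by dynamic programming, assuming a single constant substitution cost (sm = 2) and indel cost (indel = 1). The function name `om_distance` and the state labels S, C, M, MC are hypothetical. Dedicated packages such as TraMineR (seqdist()) compute full dissimilarity matrices efficiently; this sketch only illustrates the principle.

```r
# Hypothetical helper: OM distance with constant substitution and indel costs
om_distance <- function(x, y, indel = 1, sm = 2) {
  n <- length(x); m <- length(y)
  D <- matrix(0, n + 1, m + 1)
  D[, 1] <- (0:n) * indel            # cost of deleting the first i elements of x
  D[1, ] <- (0:m) * indel            # cost of inserting the first j elements of y
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      sub <- if (x[i] == y[j]) 0 else sm
      D[i + 1, j + 1] <- min(D[i, j] + sub,        # match or substitute
                             D[i, j + 1] + indel,  # delete x[i]
                             D[i + 1, j] + indel)  # insert y[j]
    }
  }
  D[n + 1, m + 1]
}

# Hypothetical yearly family formation states
a <- c("S", "S", "C", "M", "MC")
b <- c("S", "C", "C", "M", "M")
om_distance(a, b)
```

Because sm = 2 equals the cost of one deletion plus one insertion, a substitution is never cheaper than an indel pair with these costs. The result therefore equals length(a) + length(b) - 2 times the length of the longest common subsequence, which is 4 for this pair.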
# Ward clustering on non-squared OM dissimilarities, weighted by weight40
fam.ward1 <- hclust(as.dist(partner.child.year.om),
                    method = "ward.D",
                    members = family$weight40)
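To see what the members argument does, here is a toy sketch with three hypothetical observations and made-up weights. According to the hclust documentation, members gives the number of observations represented by each row of the dissimilarity matrix. Ward's update then uses these sizes, so a heavily weighted case pulls merge heights in the dendrogram.

```r
# Three hypothetical observations at positions 0, 1 and 3 on a line
d <- dist(c(0, 1, 3))

# Unweighted: each observation counts once
hc_unw <- hclust(d, method = "ward.D")

# Weighted: the third observation stands for 5 cases (hypothetical weights)
hc_w <- hclust(d, method = "ward.D", members = c(1, 1, 5))

# Merge heights: the first merge is identical, the second depends on the weights
hc_unw$height
hc_w$height
```

The first two observations merge at height 1 in both trees. The final merge height is 3 without weights but 25/7 with weights. This follows from the Lance-Williams update for Ward's method, which multiplies each dissimilarity by the cluster sizes supplied through members.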
The nested structure emerging from the hierarchical clustering algorithm can be displayed using a dendrogram:
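# Widen the left margin to leave room for the large y-axis label
par(mar = c(3, 10, 3, 3))
# Draw the dendrogram without leaf labels or default titles
plot(fam.ward1, labels = FALSE,
     main = "", ylab = "", xlab = "", sub = "",
     cex.axis = 2.5, cex.lab = 2.5)
# Add the y-axis label manually, placed further away from the axis
mtext("Dissimilarity threshold", side = 2, line = 5, cex = 3)
# Close the current graphics device (relevant when plotting to a file device)
dev.off()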
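The y-axis of the dendrogram is the dissimilarity threshold. Cutting the tree horizontally at a given height yields a crisp partition. A minimal sketch on hypothetical toy data shows that cutting at a height (h) or requesting a number of clusters (k) with cutree() leads to the same kind of partition. The threshold of 5 is arbitrary and chosen only for these toy data.

```r
# Hypothetical toy data with two well-separated groups
x <- c(1, 2, 3, 10, 11, 12)
hc <- hclust(dist(x), method = "ward.D")

# Cut the tree at a dissimilarity threshold ...
cl_h <- cutree(hc, h = 5)
# ... or ask directly for a given number of clusters
cl_k <- cutree(hc, k = 2)

# Cluster sizes of the partition obtained with the threshold
table(cl_h)
```

On an existing dendrogram plot, rect.hclust() from the stats package can draw boxes around the clusters obtained at a given k or h. This makes it easy to check a candidate cut visually.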
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. Source code is available at https://github.com/sa-book/sa-book.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".