Crisp clustering algorithms and cluster quality indices

Chapter 4.1 Clustering sequences to uncover typologies

Click here to get instructions…
# assuming you are working within an .Rproj environment
library(here)

# install (if necessary) and load other required packages
source(here("source", "load_libraries.R"))

# load environment generated in "4-0_ChapterSetup.R"
load(here("data", "R", "4-0_ChapterSetup.RData"))


Chapter 4.1 introduces crisp (hard) clustering algorithms and the cluster quality indices to consider when deciding how many clusters to extract from the initial sample. The data come from a sub-sample of the German Family Panel (pairfam). For further information on the study and on how to access the full scientific use file see here.

Crisp (or hard) clustering algorithms

We run a hierarchical cluster analysis with hclust (see ?hclust) on partner.child.year.om, the dissimilarity matrix for the family formation sequences, computed with OM using indel = 1 and sm = 2. We use non-squared dissimilarities (method = "ward.D"; see the method option) and weights (see the members option, where we have to name the data.frame that holds the weight vector, here family$weight40).

fam.ward1 <- hclust(as.dist(partner.child.year.om), 
                    method = "ward.D", 
                    members = family$weight40)
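To make the role of the method option more concrete, the following is a minimal sketch on toy data, not on the pairfam sample. It assumes the behaviour described in the hclust documentation: "ward.D" applies the Ward update to the dissimilarities as supplied, while "ward.D2" squares them first. All object names (toy.points, toy.diss, toy.ward.d, toy.ward.d2, toy.ward.sq) are hypothetical, and the members option is left at its default here.

```r
# Toy data (hypothetical, not the pairfam sample): 12 random points in the plane
set.seed(123)
toy.points <- matrix(rnorm(24), ncol = 2)
toy.diss <- dist(toy.points)   # Euclidean dissimilarities, class "dist"

# "ward.D": Ward update applied to the dissimilarities as supplied
toy.ward.d <- hclust(toy.diss, method = "ward.D")

# "ward.D2": dissimilarities are squared internally before the update
toy.ward.d2 <- hclust(toy.diss, method = "ward.D2")

# Squaring by hand and using "ward.D" should reproduce the "ward.D2" tree,
# with merge heights on the squared scale
toy.ward.sq <- hclust(toy.diss^2, method = "ward.D")

# Compare the merge sequences and the (rescaled) merge heights
same.merges <- identical(toy.ward.d2$merge, toy.ward.sq$merge)
same.heights <- isTRUE(all.equal(toy.ward.d2$height, sqrt(toy.ward.sq$height)))
```

If both checks hold, the only difference between the two options is whether the input is squared, so the choice of method determines which scale the merge heights are expressed on.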

The nested structure emerging from the hierarchical clustering algorithm can be displayed using a dendrogram:

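# widen the left margin to make room for the y-axis label
par(mar = c(3, 10, 3, 3))
# draw the dendrogram without case labels, title, or default axis labels
plot(fam.ward1, labels = FALSE,
     main = "",
     ylab = "",
     xlab = "", sub = "",
     cex.axis = 2.5,
     cex.lab = 2.5)
# add the y-axis label in the left margin
mtext("Dissimilarity threshold", side = 2, line = 5, cex = 3)

# close the graphics device; only needed when plotting to a file device such as pdf() or png()
dev.off()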
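The vertical axis of the dendrogram can also be read numerically. The sketch below, again on hypothetical toy data rather than the family formation sequences, shows one way to inspect the merge heights stored in an hclust object and to check that cutting the tree at a dissimilarity threshold between the two largest heights gives the same partition as asking cutree for two clusters. The names toy.points, toy.ward, top.heights, cut.height, by.height and by.k are assumptions made for illustration.

```r
# Toy data (hypothetical): two well-separated groups of 10 points each
set.seed(1)
toy.points <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
                    matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2))
toy.ward <- hclust(dist(toy.points), method = "ward.D")

# Merge heights, largest first: the top entries are the last merges
top.heights <- rev(toy.ward$height)[1:3]

# A threshold between the two largest heights splits the tree into two clusters
cut.height <- mean(top.heights[1:2])
by.height <- cutree(toy.ward, h = cut.height)

# The same partition, requested by number of clusters instead
by.k <- cutree(toy.ward, k = 2)

# Cross-tabulate the two partitions to compare them
table(by.height, by.k)
```

On a plotted dendrogram, calling abline(h = cut.height, lty = 2) after plot() would mark this threshold as a horizontal line.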

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. Source code is available at https://github.com/sa-book/sa-book.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".