Contents

1 Introduction

This document is a tutorial for the Pedixplorer package, with examples of creating Pedigree objects and kinship matrices and other Pedigree utilities.

The Pedixplorer package is an updated version of the Kinship2 package, featuring a change in maintainer and repository from CRAN to Bioconductor for continued development and support.

It contains the routines to handle family data with a Pedigree object. The initial purpose was to create correlation structures that describe family relationships such as kinship and identity-by-descent, which can be used to model family data in mixed effects models, such as in the coxme function. It also includes tools for Pedigree drawing and filtering which is focused on producing compact layouts without intervention. Recent additions include utilities to trim the Pedigree object with various criteria, and kinship for the X chromosome.

Supplementary vignettes are available to explain:

2 Installation

The \(Pedixplorer\) package is available on Bioconductor and can be installed with the following command:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("Pedixplorer")

The package can then be loaded with the following command:

library(Pedixplorer)

3 The Pedigree S4 object

The \(Pedigree\) object is a list of dataframes that describe the family structure. It contains the following components:

4 Basic Usage

4.1 Example Data

Two datasets are provided within the \(Pedixplorer\) package: + minnbreast: 17 families from a breast cancer study + sampleped: two sample pedigrees, with 41 and 14 subjects

This vignette uses the two pedigrees in \(sampleped\). For more information on these datasets, see help(minnbreast) and help(sampleped).

4.2 Pedigree

First, we load \(sampleped\) and look at some of the values in the dataset, and create a \(Pedigree\) object using the Pedigree() function. This function automaticaly detect the necessary columns in the dataframe. If necessary you can modify the columns names with cols_ren. To create a \(Pedigree\) object, with multiple families, the dataframe just need a family column in the ped_df dataframe. When this is the case, the famid column will be pasted to the id of each individuals separated by an underscore to create a unique id for each individual in the \(Pedigree\) object.

data("sampleped")
print(sampleped[1:10, ])
##    famid  id dadid momid sex affection avail
## 1      1 101  <NA>  <NA>   1         0     0
## 2      1 102  <NA>  <NA>   2         1     0
## 3      1 103   135   136   1         1     0
## 4      1 104  <NA>  <NA>   2         0     0
## 5      1 105  <NA>  <NA>   1        NA     0
## 6      1 106  <NA>  <NA>   2        NA     0
## 7      1 107  <NA>  <NA>   1         1     0
## 8      1 108  <NA>  <NA>   2         0     0
## 9      1 109   101   102   2         0     1
## 10     1 110   103   104   1         1     1
ped <- Pedigree(sampleped[c(3, 4, 10, 35, 36), ])
print(ped)
## Pedigree object with: 
## Ped object with 5 individuals and 12 metadata columns:
##                id       dadid       momid                      sex       famid    steril    status
##       <character> <character> <character> <c("ordered", "factor")> <character> <logical> <logical>
## 1_103       1_103       1_135       1_136                     male           1      <NA>      <NA>
## 1_104       1_104        <NA>        <NA>                   female           1      <NA>      <NA>
## 1_110       1_110       1_103       1_104                     male           1      <NA>      <NA>
## 1_135       1_135        <NA>        <NA>                     male           1      <NA>      <NA>
## 1_136       1_136        <NA>        <NA>                   female           1      <NA>      <NA>
##           avail  affected    useful       kin     isinf num_child_tot num_child_dir num_child_ind |
##       <logical> <logical> <logical> <numeric> <logical>     <numeric>     <numeric>     <numeric> |
## 1_103     FALSE      TRUE      <NA>      <NA>      <NA>             1             1             0 |
## 1_104     FALSE     FALSE      <NA>      <NA>      <NA>             1             1             0 |
## 1_110      TRUE      TRUE      <NA>      <NA>      <NA>             0             0             0 |
## 1_135     FALSE      <NA>      <NA>      <NA>      <NA>             1             1             0 |
## 1_136     FALSE      <NA>      <NA>      <NA>      <NA>             1             1             0 |
##            family       indId    fatherId    motherId    gender affection available       error
##       <character> <character> <character> <character> <integer> <integer> <integer> <character>
## 1_103           1         103         135         136         1         1         0        <NA>
## 1_104           1         104        <NA>        <NA>         2         0         0        <NA>
## 1_110           1         110         103         104         1         1         1        <NA>
## 1_135           1         135        <NA>        <NA>         1      <NA>         0        <NA>
## 1_136           1         136        <NA>        <NA>         2      <NA>         0        <NA>
##       sterilisation vitalStatus affection_mods avail_mods
##           <logical>   <logical>      <numeric>  <numeric>
## 1_103          <NA>        <NA>              1          0
## 1_104          <NA>        <NA>              0          0
## 1_110          <NA>        <NA>              1          1
## 1_135          <NA>        <NA>             NA          0
## 1_136          <NA>        <NA>             NA          0
## Rel object with 0 relationshipswith 0 MZ twin, 0 DZ twin, 0 UZ twin, 0 Spouse:
##          id1         id2                     code       famid
##  <character> <character> <c("ordered", "factor")> <character>

For more information on the Pedigree() function, see help(Pedigree).

The \(Pedigree\) object can be subset to individual pedigrees by their family id. The \(Pedigree\) object has a print, summary and plot method, which we show below. The print method prints the \(Ped\) and \(Rel\) object of the pedigree. The summary method prints a short summary of the pedigree. Finally the plot method displays the pedigree.

ped <- Pedigree(sampleped)
print(famid(ped))
##  [1] "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1"
## [25] "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "2" "2" "2" "2" "2" "2" "2"
## [49] "2" "2" "2" "2" "2" "2" "2"
ped1 <- ped[famid(ped) == "1"]
summary(ped1)
## Pedigree object with 
## [1] "Ped object with 41 individuals and 12 metadata columns"
## [1] "Rel object with 0 relationshipswith 0 MZ twin, 0 DZ twin, 0 UZ twin, 0 Spouse"
plot(ped1)

You can add a title and a legend to the plot with the following command:

plot(ped1, title = "Pedigree 1", legend = TRUE, leg_loc = c(5, 15, 4.5, 5))

5 Fixing Pedigree Issues

To “break” the pedigree, we can manipulate the sex value to not match the parent value (in this example, we change \(203\) from a male to a female, even though \(203\) is a father). To do this, we first subset \(datped2\), locate the id column, and match it to a specific id (in this case, \(203\)). Within id \(203\), then locate in the sex column. Assign this subset to the incorrect value of 2 (female) to change the original/correct value of 1 (male).

To further break the pedigree, we can delete subjects who seem irrelevant to the pedigree (in this example, we delete \(209\) because he is a married-in father). To do this, we subset \(datped2\) and use the -which() function to locate and delete the specified subject (in this case, \(209\)). Reassign this code to \(datped22\) to drop the specified subject entirely.

datped2 <- sampleped[sampleped$famid == 2, ]
datped2[datped2$id %in% 203, "sex"] <- 2
datped2 <- datped2[-which(datped2$id %in% 209), ]

An error occurs when the Pedigree() function notices that id \(203\) is not coded to be male (1) but is a father. To correct this, we simply employ the fix_parents() function to adjust the sex value to match either momid or dadid. fix_parents() will also add back in any deleted subjects, further fixing the Pedigree.

tryout <- try({
    ped2 <- Pedigree(datped2)
})
## Error in validObject(.Object) : 
##   invalid class "Ped" object: dadid values '2_209' should be in '2_201', '2_202', '2_203', '2_204', '2_205'...
fixped2 <- with(datped2, fix_parents(id, dadid, momid, sex))
fixped2
##     id momid dadid sex
## 1  201  <NA>  <NA>   1
## 2  202  <NA>  <NA>   2
## 3  203  <NA>  <NA>   1
## 4  204   202   201   2
## 5  205   202   201   1
## 6  206   202   201   2
## 7  207   202   201   2
## 8  208   202   201   2
## 9  210   204   203   1
## 10 211   204   203   1
## 11 212   208   209   2
## 12 213   208   209   1
## 13 214   208   209   1
## 14 209  <NA>  <NA>   1
ped2 <- Pedigree(fixped2)
plot(ped2)

If the fix is straightforward (changing one sex value based on either being a mother or father), fix_parents() will resolve the issue. If the issue is more complicated, say if \(203\) is coded to be both a father and a mother, fix_parents() will not know which one is correct and therefore the issue will not be resolved.

6 Kinship

A common use for pedigrees is to make a matrix of kinship coefficients that can be used in mixed effect models. A kinship coefficient is the probability that a randomly selected allele from two people at a given locus will be identical by descent (IBD), assuming all founder alleles are independent. For example, we each have two alleles per autosomal marker, so sampling two alleles with replacement from our own DNA has only \(p=0.50\) probability of getting the same allele twice.

6.1 Kinship for Pedigree object

We use kinship() to calculate the kinship matrix for \(ped2\). The result is a special symmetrix matrix class from the Matrix R package, which is stored efficiently to avoid repeating elements.

kin2 <- kinship(ped2)
kin2[1:9, 1:9]
## 9 x 9 sparse Matrix of class "dsCMatrix"
##       201   202  203  204   205   206   207   208   210
## 201 0.500 .     .    0.25 0.250 0.250 0.250 0.250 0.125
## 202 .     0.500 .    0.25 0.250 0.250 0.250 0.250 0.125
## 203 .     .     0.50 .    .     .     .     .     0.250
## 204 0.250 0.250 .    0.50 0.250 0.250 0.250 0.250 0.250
## 205 0.250 0.250 .    0.25 0.500 0.250 0.250 0.250 0.125
## 206 0.250 0.250 .    0.25 0.250 0.500 0.250 0.250 0.125
## 207 0.250 0.250 .    0.25 0.250 0.250 0.500 0.250 0.125
## 208 0.250 0.250 .    0.25 0.250 0.250 0.250 0.500 0.125
## 210 0.125 0.125 0.25 0.25 0.125 0.125 0.125 0.125 0.500

For family 2, see that the row and column names match the id in the figure below, and see that each kinship coefficient with themselves is \(0.50\), siblings are \(0.25\) (e.g. \(204-205\)), and pedigree marry-ins only share alleles IBD with their children with coefficient \(0.25\) (e.g. \(203-210\)). The plot can be used to verify other kinship coefficients.

6.2 Kinship for Pedigree with multiple families

The kinship() function also works on a \(Pedigree\) object with multiple families. We show how to create the kinship matrix, then show a snapshot of them for the two families, where the row and columns names are the ids of the subject.

ped <- Pedigree(sampleped)
kin_all <- kinship(ped)
kin_all[1:9, 1:9]
## 9 x 9 sparse Matrix of class "dsCMatrix"
##       1_101 1_102 1_103 1_104 1_105 1_106 1_107 1_108 1_109
## 1_101  0.50  .      .     .     .     .     .     .    0.25
## 1_102  .     0.50   .     .     .     .     .     .    0.25
## 1_103  .     .      0.5   .     .     .     .     .    .   
## 1_104  .     .      .     0.5   .     .     .     .    .   
## 1_105  .     .      .     .     0.5   .     .     .    .   
## 1_106  .     .      .     .     .     0.5   .     .    .   
## 1_107  .     .      .     .     .     .     0.5   .    .   
## 1_108  .     .      .     .     .     .     .     0.5  .   
## 1_109  0.25  0.25   .     .     .     .     .     .    0.50
kin_all[40:43, 40:43]
## 4 x 4 sparse Matrix of class "dsCMatrix"
##       1_140 1_141 2_201 2_202
## 1_140  0.50  0.25   .     .  
## 1_141  0.25  0.50   .     .  
## 2_201  .     .      0.5   .  
## 2_202  .     .      .     0.5
kin_all[42:46, 42:46]
## 5 x 5 sparse Matrix of class "dsCMatrix"
##       2_201 2_202 2_203 2_204 2_205
## 2_201  0.50  .      .    0.25  0.25
## 2_202  .     0.50   .    0.25  0.25
## 2_203  .     .      0.5  .     .   
## 2_204  0.25  0.25   .    0.50  0.25
## 2_205  0.25  0.25   .    0.25  0.50

6.3 Kinship for twins in Pedigree with multiple families

Specifying twin relationships in a Pedigree with multiple families object is complicated by the fact that the user must specify the family id to which the id1 and id2 belong. We show below the relation matrix requires the family id to be in the last column, with the column names as done below, to make the plotting and kinship matrices to show up with the monozygotic twins correctly. We show how to specify monozygosity for subjects \(206\) and \(207\) in \(ped2\), and subjects \(125\) and \(126\) in \(ped1\). We check it by looking at the kinship matrix for these pairs, which are correctly at \(0.5\).

reltwins <- as.data.frame(rbind(c(206, 207, 1, 2), c(125, 126, 1, 1)))
colnames(reltwins) <- c("indId1", "indId2", "code", "family")
ped <- Pedigree(sampleped, reltwins)

kin_all <- kinship(ped)
kin_all[24:27, 24:27]
## 4 x 4 sparse Matrix of class "dsCMatrix"
##        1_124  1_125  1_126  1_127
## 1_124 0.5000 0.0625 0.0625 0.0625
## 1_125 0.0625 0.5000 0.5000 0.1250
## 1_126 0.0625 0.5000 0.5000 0.1250
## 1_127 0.0625 0.1250 0.1250 0.5000
kin_all[46:50, 46:50]
## 5 x 5 sparse Matrix of class "dsCMatrix"
##       2_205 2_206 2_207 2_208 2_209
## 2_205  0.50  0.25  0.25  0.25   .  
## 2_206  0.25  0.50  0.50  0.25   .  
## 2_207  0.25  0.50  0.50  0.25   .  
## 2_208  0.25  0.25  0.25  0.50   .  
## 2_209  .     .     .     .      0.5

Note that subject \(113\) is not in \(ped1\) because they are a marry-in without children in the \(Pedigree\). Subject \(113\) is in their own \(Pedigree\) of size 1 in the \(kin_all\) matrix at index \(41\). We later show how to handle such marry-ins for plotting.

7 Optional Pedigree Informations

We use \(ped2\) from \(sampleped\) to sequentially add optional information to the \(Pedigree\) object.

7.1 Status

The example below shows how to specify a \(status\) indicator, such as vital status. The \(sampleped\) data does not include such an
indicator, so we create one to indicate that the first generation of \(ped2\), subjects \(1\) and \(2\), are deceased. The \(status\) indicator is used to cross out the individuals in the Pedigree plot.

df2 <- sampleped[sampleped$famid == 2, ]
names(df2)
## [1] "famid"     "id"        "dadid"     "momid"     "sex"       "affection" "avail"
df2$status <- c(1, 1, rep(0, 12))
ped2 <- Pedigree(df2)
summary(status(ped(ped2)))
##    Mode   FALSE    TRUE 
## logical      12       2
plot(ped2)

7.2 Labels

Here we show how to use the \(label\) argument in the plot method to add additional information under each subject. In the example below, we add names to the existing plot by adding a new column to the \(elementMetadata\) of the \(Ped\) object of the \(Pedigree\).

As space permits, more lines and characters per line can be made using the a {/em } character to indicate a new line.

mcols(ped2)$Names <- c(
    "John\nDalton", "Linda", "Jack", "Rachel", "Joe", "Deb",
    "Lucy", "Ken", "Barb", "Mike", "Matt",
    "Mindy", "Mark", "Marie\nCurie"
)
plot(ped2, label = "Names")

7.3 Affected Indicators

We show how to specify affected status with a single indicator and multiple indicators. First, we use the affected indicator from \(sampleped\), which contains \(0\)/\(1\) indicators and \(NA\) as missing, and let it it indicate blue eyes. Next, we create a vector as an indicator for baldness. And add it as a second filling scale for the plot with generate_colors(add_to_scale = TRUE). The plot shapes for each subject is therefore divided into two equal parts and shaded differently to indicate the two affected indicators.

mcols(ped2)$bald <- as.factor(c(0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1))
ped2 <- generate_colors(ped2, col_aff = "bald", add_to_scale = TRUE)
plot(ped2, legend = TRUE)