This document is a tutorial for the Pedixplorer
package, with examples
of creating Pedigree
objects and kinship matrices and other pedigree
utilities.
The Pedixplorer
package is an updated version of the
Kinship2
package, featuring a
change in maintainer and repository from CRAN to Bioconductor for
continued development and support.
It contains the routines to handle family data with a Pedigree
object.
The initial purpose was to create correlation structures that describe
family relationships such as kinship and identity-by-descent, which can
be used to model family data in mixed effects models, such as in the
coxme
function. It also includes tools for pedigree drawing and
filtering which is focused on producing compact layouts without
intervention. Recent additions include utilities to trim the Pedigree
object with various criteria, and kinship for the X chromosome.
Supplementary vignettes are available to explain:
Pedigree
object
vignette("pedigree_object", package = "Pedixplorer")
vignette("pedigree_alignment", package = "Pedixplorer")
vignette("pedigree_kinship", package = "Pedixplorer")
vignette("pedigree_plot", package = "Pedixplorer")
The Pedixplorer
package is available on
Bioconductor
and can be installed with the following command:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("Pedixplorer")
The package can then be loaded with the following command:
library(Pedixplorer)
Pedigree
S4 objectThe Pedigree
object is a list of dataframes that describe the family
structure. It contains the following components:
Ped
object with the pedigree information help(Ped)
.Rel
object with the relationship information help(Rel)
.Scales
object of 2 dataframe with the filling and borders
informations for the plot help(Scales)
.Hints
objects with 2 slots indicating the horder and the
spouse to organise the pedigree structure help(Hints)
.Two datasets are provided within the Pedixplorer
package: + minnbreast
:
17 families from a breast cancer study + sampleped
: two sample pedigrees,
with 41 and 14 subjects and the special relationship of these two pedigrees
in relped
.
This vignette uses the two pedigrees in sampleped
. For more
information on these datasets, see help(minnbreast)
and
help(sampleped)
.
First, we load sampleped
and look at some of the values in the dataset,
and create a Pedigree
object using the Pedigree()
function. This
function automaticaly detect the necessary columns in the dataframe. If
necessary you can modify the columns names with cols_ren. To create a
Pedigree
object, with multiple families, the dataframe just need a
family column in the ped_df dataframe. When this is the case, the
famid column will be pasted to the id of each individuals separated by
an underscore to create a unique id for each individual in the Pedigree
object.
data("sampleped")
print(sampleped[1:10, ])
## famid id dadid momid sex affection avail num
## 1 1 101 <NA> <NA> 1 0 0 2
## 2 1 102 <NA> <NA> 2 1 0 3
## 3 1 103 135 136 1 1 0 2
## 4 1 104 <NA> <NA> 2 0 0 4
## 5 1 105 <NA> <NA> 1 NA 0 6
## 6 1 106 <NA> <NA> 2 NA 0 1
## 7 1 107 <NA> <NA> 1 1 0 NA
## 8 1 108 <NA> <NA> 2 0 0 0
## 9 1 109 101 102 2 0 1 3
## 10 1 110 103 104 1 1 1 2
ped <- Pedigree(sampleped[c(3, 4, 10, 35, 36), ])
print(ped)
## Pedigree object with:
## Ped object with 5 individuals and 13 metadata columns:
## id dadid momid sex famid steril status avail
## col_class <character> <character> <character> <ordered> <character> <logical> <logical> <logical>
## 1_103 1_103 1_135 1_136 male 1 <NA> <NA> FALSE
## 1_104 1_104 <NA> <NA> female 1 <NA> <NA> FALSE
## 1_110 1_110 1_103 1_104 male 1 <NA> <NA> TRUE
## 1_135 1_135 <NA> <NA> male 1 <NA> <NA> FALSE
## 1_136 1_136 <NA> <NA> female 1 <NA> <NA> FALSE
## affected useful kin isinf num_child_tot num_child_dir num_child_ind |
## col_class <logical> <logical> <numeric> <logical> <numeric> <numeric> <numeric>
## 1_103 TRUE <NA> <NA> <NA> 1 1 0
## 1_104 FALSE <NA> <NA> <NA> 1 1 0
## 1_110 TRUE <NA> <NA> <NA> 0 0 0
## 1_135 <NA> <NA> <NA> <NA> 1 1 0
## 1_136 <NA> <NA> <NA> <NA> 1 1 0
## family indId fatherId motherId gender affection available
## col_class <character> <character> <character> <character> <character> <character> <character>
## 1_103 1 103 135 136 1 1 0
## 1_104 1 104 <NA> <NA> 2 0 0
## 1_110 1 110 103 104 1 1 1
## 1_135 1 135 <NA> <NA> 1 <NA> 0
## 1_136 1 136 <NA> <NA> 2 <NA> 0
## num error sterilisation vitalStatus affection_mods avail_mods
## col_class <character> <character> <character> <character> <character> <character>
## 1_103 2 <NA> <NA> <NA> 1 0
## 1_104 4 <NA> <NA> <NA> 0 0
## 1_110 2 <NA> <NA> <NA> 1 1
## 1_135 5 <NA> <NA> <NA> NA 0
## 1_136 6 <NA> <NA> <NA> NA 0
## Rel object with 0 relationshipswith 0 MZ twin, 0 DZ twin, 0 UZ twin, 0 Spouse:
## id1 id2 code famid
## <character> <character> <c("ordered", "factor")> <character>
For more information on the Pedigree()
function, see help(Pedigree)
.
The Pedigree
object can be subset to individual pedigrees by their
family id. The Pedigree
object has a print, summary and plot method,
which we show below. The print method prints the Ped
and Rel
object of
the pedigree. The summary method prints a short summary of the pedigree.
Finally the plot method displays the pedigree.
ped <- Pedigree(sampleped)
print(famid(ped(ped)))
## [1] "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1"
## [25] "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "2" "2" "2" "2" "2" "2" "2"
## [49] "2" "2" "2" "2" "2" "2" "2"
ped1 <- ped[famid(ped(ped)) == "1"]
summary(ped1)
## Pedigree object with
## [1] "Ped object with 41 individuals and 13 metadata columns"
## [1] "Rel object with 0 relationshipswith 0 MZ twin, 0 DZ twin, 0 UZ twin, 0 Spouse"
plot(ped1, cex = 0.7)
You can add a title and a legend to the plot with the following command:
plot(
ped1, title = "Pedigree 1",
legend = TRUE, leg_loc = c(7, 16, 1.05, 1.9),
cex = 0.7
)
A shiny application is available to create, interact and plot pedigrees. To launch the application, use the following command:
ped_shiny()
The use is simple:
To “break” the pedigree, we can manipulate the sex value to not match
the parent value (in this example, we change 203 from a male to a
female, even though 203 is a father). To do this, we first subset
datped2, locate the id
column, and match it to a specific id (in
this case, 203). Within id 203, then locate in the sex
column.
Assign this subset to the incorrect value of 2
(female) to change the
original/correct value of 1
(male).
To further break the pedigree, we can delete subjects who seem
irrelevant to the pedigree (in this example, we delete 209 because he
is a married-in father). To do this, we subset datped2 and use the
which()
function to locate and delete the specified subject (in this
case, 209). Reassign this code to datped22 to drop the specified
subject entirely.
datped2 <- sampleped[sampleped$famid == 2, ]
datped2[datped2$id %in% 203, "sex"] <- 2
datped2 <- datped2[-which(datped2$id %in% 209), ]
An error occurs when the Pedigree()
function notices that id 203 is
not coded to be male (1
) but is a father. To correct this, we simply
employ the fix_parents()
function to adjust the sex
value to match
either momid
or dadid
. fix_parents()
will also add back in any
deleted subjects, further fixing the Pedigree.
tryout <- try({
ped2 <- Pedigree(datped2)
})
## Error in validObject(.Object) :
## invalid class "Ped" object: dadid values '2_209' should be in '2_201', '2_202', '2_203', '2_204', '2_205'...
fixped2 <- with(datped2, fix_parents(id, dadid, momid, sex))
fixped2
## id momid dadid sex famid
## 1 201 <NA> <NA> 1 1
## 2 202 <NA> <NA> 2 1
## 3 203 <NA> <NA> 1 1
## 4 204 202 201 2 1
## 5 205 202 201 1 1
## 6 206 202 201 2 1
## 7 207 202 201 2 1
## 8 208 202 201 2 1
## 9 210 204 203 1 1
## 10 211 204 203 1 1
## 11 212 208 209 2 1
## 12 213 208 209 1 1
## 13 214 208 209 1 1
## 14 209 <NA> <NA> 1 1
ped2 <- Pedigree(fixped2)
plot(ped2)
If the fix is straightforward (changing one sex value based on either
being a mother or father), fix_parents()
will resolve the issue. If
the issue is more complicated, say if 203 is coded to be both a father
and a mother, fix_parents()
will not know which one is correct and
therefore the issue will not be resolved.
A common use for pedigrees is to make a matrix of kinship coefficients that can be used in mixed effect models. A kinship coefficient is the probability that a randomly selected allele from two people at a given locus will be identical by descent (IBD), assuming all founder alleles are independent. For example, we each have two alleles per autosomal marker, so sampling two alleles with replacement from our own DNA has only \(p=0.50\) probability of getting the same allele twice.
We use kinship()
to calculate the kinship matrix for ped2. The
result is a special symmetrix matrix class from the Matrix R
package, which is stored
efficiently to avoid repeating elements.
kin2 <- kinship(ped2)
kin2[1:9, 1:9]
## 9 x 9 sparse Matrix of class "dsCMatrix"
## 1_201 1_202 1_203 1_204 1_205 1_206 1_207 1_208 1_209
## 1_201 0.50 . . 0.25 0.25 0.25 0.25 0.25 .
## 1_202 . 0.50 . 0.25 0.25 0.25 0.25 0.25 .
## 1_203 . . 0.5 . . . . . .
## 1_204 0.25 0.25 . 0.50 0.25 0.25 0.25 0.25 .
## 1_205 0.25 0.25 . 0.25 0.50 0.25 0.25 0.25 .
## 1_206 0.25 0.25 . 0.25 0.25 0.50 0.25 0.25 .
## 1_207 0.25 0.25 . 0.25 0.25 0.25 0.50 0.25 .
## 1_208 0.25 0.25 . 0.25 0.25 0.25 0.25 0.50 .
## 1_209 . . . . . . . . 0.5
For family 2, see that the row and column names match the id in the figure below, and see that each kinship coefficient with themselves is 0.50, siblings are 0.25 (e.g. 204-205), and pedigree marry-ins only share alleles IBD with their children with coefficient 0.25 (e.g. 203-210). The plot can be used to verify other kinship coefficients.
The kinship()
function also works on a Pedigree
object with multiple
families. We show how to create the kinship matrix, then show a snapshot
of them for the two families, where the row and columns names are the
ids of the subject.
ped <- Pedigree(sampleped)
kin_all <- kinship(ped)
kin_all[1:9, 1:9]
## 9 x 9 sparse Matrix of class "dsCMatrix"
## 1_101 1_102 1_103 1_104 1_105 1_106 1_107 1_108 1_109
## 1_101 0.50 . . . . . . . 0.25
## 1_102 . 0.50 . . . . . . 0.25
## 1_103 . . 0.5 . . . . . .
## 1_104 . . . 0.5 . . . . .
## 1_105 . . . . 0.5 . . . .
## 1_106 . . . . . 0.5 . . .
## 1_107 . . . . . . 0.5 . .
## 1_108 . . . . . . . 0.5 .
## 1_109 0.25 0.25 . . . . . . 0.50
kin_all[40:43, 40:43]
## 4 x 4 sparse Matrix of class "dsCMatrix"
## 1_140 1_141 2_201 2_202
## 1_140 0.50 0.25 . .
## 1_141 0.25 0.50 . .
## 2_201 . . 0.5 .
## 2_202 . . . 0.5
kin_all[42:46, 42:46]
## 5 x 5 sparse Matrix of class "dsCMatrix"
## 2_201 2_202 2_203 2_204 2_205
## 2_201 0.50 . . 0.25 0.25
## 2_202 . 0.50 . 0.25 0.25
## 2_203 . . 0.5 . .
## 2_204 0.25 0.25 . 0.50 0.25
## 2_205 0.25 0.25 . 0.25 0.50
Specifying twin relationships in a Pedigree with multiple families
object is complicated by the fact that the user must specify the family
id to which the id1
and id2
belong. We show below the relation
matrix requires the family id to be in the last column, with the column
names as done below, to make the plotting and kinship matrices to show
up with the monozygotic twins correctly. We show how to specify
monozygosity for subjects 206 and 207 in ped2, and subjects
125 and 126 in ped1. We check it by looking at the kinship matrix
for these pairs, which are correctly at 0.5.
data("relped")
relped
## famid id1 id2 code
## 1 1 140 141 1
## 2 1 139 140 2
## 3 1 121 123 2
## 4 1 129 126 4
## 5 1 130 133 3
## 6 2 210 211 1
## 7 2 208 204 2
## 8 2 212 213 3
ped <- Pedigree(sampleped, relped)
kin_all <- kinship(ped)
kin_all[24:27, 24:27]
## 4 x 4 sparse Matrix of class "dsCMatrix"
## 1_124 1_125 1_126 1_127
## 1_124 0.5000 0.0625 0.0625 0.0625
## 1_125 0.0625 0.5000 0.2500 0.1250
## 1_126 0.0625 0.2500 0.5000 0.1250
## 1_127 0.0625 0.1250 0.1250 0.5000
kin_all[46:50, 46:50]
## 5 x 5 sparse Matrix of class "dsCMatrix"
## 2_205 2_206 2_207 2_208 2_209
## 2_205 0.50 0.25 0.25 0.25 .
## 2_206 0.25 0.50 0.25 0.25 .
## 2_207 0.25 0.25 0.50 0.25 .
## 2_208 0.25 0.25 0.25 0.50 .
## 2_209 . . . . 0.5
Note that subject 113 is not in ped1 because they are a marry-in
without children in the Pedigree
. Subject 113 is in their own Pedigree
of size 1 in the kin_all matrix at index 41. We later show how to
handle such marry-ins for plotting.
We use ped2 from sampleped
to sequentially add optional
information to the Pedigree
object.
The example below shows how to specify a status
indicator, such as
vital status. The sampleped
data does not include such an
indicator, so we create one to indicate that the first generation of
ped2, subjects 1 and 2, are deceased. The status
indicator is
used to cross out the individuals in the Pedigree plot.
df2 <- sampleped[sampleped$famid == 2, ]
names(df2)
## [1] "famid" "id" "dadid" "momid" "sex" "affection" "avail" "num"
df2$status <- c(1, 1, rep(0, 12))
ped2 <- Pedigree(df2)
summary(status(ped(ped2)))
## Mode FALSE TRUE
## logical 12 2
plot(ped2)
Here we show how to use the label
argument in the plot method to add
additional information under each subject. In the example below, we add
names to the existing plot by adding a new column to the elementMetadata
of the Ped
object of the Pedigree
.
As space permits, more lines and characters per line can be made using the a {/em } character to indicate a new line.
mcols(ped2)$Names <- c(
"John\nDalton", "Linda", "Jack", "Rachel", "Joe", "Deb",
"Lucy", "Ken", "Barb", "Mike", "Matt",
"Mindy", "Mark", "Marie\nCurie"
)
plot(ped2, label = "Names", cex = 0.7)
We show how to specify affected status with a single indicator and
multiple indicators. First, we use the affected indicator from
sampleped
, which contains 0/1 indicators and NA as missing, and let it
it indicate blue eyes. Next, we create a vector as an indicator for
baldness. And add it as a second filling scale for the plot with
generate_colors(add_to_scale = TRUE)
. The plot shapes for each subject
is therefore divided into two equal parts and shaded differently to
indicate the two affected indicators.
mcols(ped2)$bald <- as.factor(c(0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1))
ped2 <- generate_colors(ped2, col_aff = "bald", add_to_scale = TRUE)
# Increase down margin for the legend
op <- par(mai = c(1.5, 0.2, 0.2, 0.2))
plot(
ped2, legend = TRUE,
leg_loc = c(0.5, 6, 3.5, 4)
)