FHIR stands for Fast Healthcare Interoperability Resources.
The Wikipedia article is a useful overview. The official website is fhir.org.
This R package addresses very basic tasks of parsing FHIR R4 documents in JSON format. The overall information model of FHIR documents is complex, so the package makes a number of decisions about which fields to extract and annotate, favoring those presumed to have high value. Submit GitHub issues if important fields are not being propagated.
Install this package using
BiocManager::install("BiocFHIR")
We use jsonlite::fromJSON to import a randomly selected FHIR document from a collection simulated by the MITRE Corporation.
See the associated site for details.
We’ll drill down through the hierarchy of elements collected in a FHIR document with some base R commands, after importing the JSON.
testf = dir(system.file("json", package="BiocFHIR"), full=TRUE)
tt = fromJSON(testf)
names(tt)
## [1] "resourceType" "type" "entry"
tt[1:2]
## $resourceType
## [1] "Bundle"
##
## $type
## [1] "transaction"
tte = tt$entry
class(tte)
## [1] "data.frame"
dim(tte)
## [1] 301 3
head(names(tte))
## [1] "fullUrl" "resource" "request"
tter = tte$resource
dim(tter)
## [1] 301 72
head(names(tter))
## [1] "resourceType" "id" "text" "extension" "identifier"
## [6] "name"
table(tter$resourceType)
##
## AllergyIntolerance CarePlan CareTeam
## 8 3 3
## Claim Condition DiagnosticReport
## 46 15 3
## Encounter ExplanationOfBenefit Immunization
## 37 37 10
## MedicationRequest Observation Organization
## 9 114 3
## Patient Practitioner Procedure
## 1 3 9
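The same drill-down can be tried without the package data. The following is a minimal sketch on a toy Bundle written inline; the JSON content, the `urn:uuid:` values and the names `toy_json`, `toy` and `toyr` are made up for illustration and are not part of BiocFHIR or the Synthea collection.

```r
library(jsonlite)

# Hypothetical three-entry Bundle, far smaller than a real Synthea document
toy_json <- '{
  "resourceType": "Bundle",
  "type": "transaction",
  "entry": [
    {"fullUrl": "urn:uuid:1",
     "resource": {"resourceType": "Patient", "id": "1", "gender": "female"}},
    {"fullUrl": "urn:uuid:2",
     "resource": {"resourceType": "Condition", "id": "2",
                  "code": {"text": "Hypertension"}}},
    {"fullUrl": "urn:uuid:3",
     "resource": {"resourceType": "Condition", "id": "3",
                  "code": {"text": "Asthma"}}}
  ]
}'

toy <- fromJSON(toy_json)
names(toy)                 # top-level elements of the Bundle

# With the default simplification, "entry" becomes a data.frame and
# "resource" is a nested data.frame column inside it
toyr <- toy$entry$resource
class(toyr)
table(toyr$resourceType)

# Fields absent from a record are filled with NA
toyr$gender
```

Even in this small case the resource table has columns that apply to only some rows, which is the same pattern seen at larger scale in the 301 x 72 data frame above.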
It is by filtering the data frame tter that we acquire information that may be useful in data analysis. The data frame is sparse: many fields are not used in many records. Code in this package attempts to produce useful tables from the sparse information.
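To make the idea of filtering a sparse table concrete, here is a minimal base R sketch. It is not the package's own code: the data frame `sparse_toy` and its values are hypothetical stand-ins for the resource table, and the approach shown (subset rows by resource type, then drop columns that are entirely NA) is only one simple way to proceed.

```r
# Hypothetical stand-in for a sparse resource table
sparse_toy <- data.frame(
  resourceType  = c("Patient", "Condition", "Condition", "Claim"),
  id            = c("p1", "c1", "c2", "k1"),
  gender        = c("female", NA, NA, NA),
  onsetDateTime = c(NA, "2011-03-01", "2015-07-19", NA),
  total         = c(NA, NA, NA, 129.16),
  stringsAsFactors = FALSE
)

# Keep only the rows for one resource type
cond_toy <- sparse_toy[sparse_toy$resourceType == "Condition", , drop = FALSE]

# Drop columns that hold nothing but NA for that type
used <- vapply(cond_toy, function(col) !all(is.na(col)), logical(1))
cond_toy <- cond_toy[, used, drop = FALSE]

names(cond_toy)
```

Real FHIR tables also contain list-valued and data.frame-valued columns, for which the all-NA test above may need adjusting.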
As a prologue to table extraction, we do some basic decomposition of tter using process_fhir_bundle.
bu1 = process_fhir_bundle(testf) # just give file path
bu1
## BiocFHIR FHIR.bundle instance.
## resource types are:
## AllergyIntolerance CarePlan ... Patient Procedure
bu1 is just a list of data.frames, but with considerable nesting of data.frames and lists within the basic data.frames corresponding to the major FHIR concepts. “Flattening” of such structures is not fully automatic.
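As an illustration of what flattening involves, the sketch below uses `jsonlite::flatten` on a toy table with nested data.frame columns. This is an assumption-laden example, not a description of how BiocFHIR flattens its tables; the JSON content and the names `nested_json`, `nested` and `flat` are hypothetical.

```r
library(jsonlite)

# Two hypothetical Condition-like records with nested objects
nested_json <- '[
  {"resourceType": "Condition", "id": "c1",
   "code": {"text": "Hypertension"},
   "subject": {"reference": "urn:uuid:p1"}},
  {"resourceType": "Condition", "id": "c2",
   "code": {"text": "Asthma"},
   "subject": {"reference": "urn:uuid:p1"}}
]'

nested <- fromJSON(nested_json)

# "code" and "subject" are themselves data.frames inside the data.frame
vapply(nested, is.data.frame, logical(1))

# jsonlite::flatten promotes nested data.frame columns to top-level
# columns, joining the names with "." (for example code.text)
flat <- jsonlite::flatten(nested)
names(flat)
```

`jsonlite::flatten` only handles nested data.frames. Columns that hold lists (for instance arrays of codings, where one record can carry several values) are left as list columns, which is one reason flattening cannot be fully automatic.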
We use process_Condition to extract information.
cond1 = process_Condition(bu1$Condition)
datatable(cond1)
We have collected 50 documents from the Synthea resource. These were obtained using random draws from the 1180 records provided. A temporary folder holding them can be produced as follows:
tset = make_test_json_set()
tset[1]
## [1] "/tmp/RtmpzTExJt/jsontest/Angel97_Swift555_c072e6ad-b03f-4eee-abe0-2dbc93bbadfe.json"
We import ten documents into a list.
myl = lapply(tset[1:10], process_fhir_bundle)
myl[1:2]
## [[1]]
## BiocFHIR FHIR.bundle instance.
## resource types are:
## AllergyIntolerance CarePlan ... Patient Procedure
##
## [[2]]
## BiocFHIR FHIR.bundle instance.
## resource types are:
## CarePlan Claim ... Patient Procedure
sapply(myl,length)
## [1] 10 9 7 9 9 9 9 9 9 10
We see with the last command that documents can have different numbers of components present.
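Beyond counting components, it can help to see which resource types each document contains. The following base R sketch assumes only that each bundle behaves like a named list with one element per resource type, as described above; `toy_bundles` and its contents are hypothetical stand-ins, not objects produced by the package.

```r
# Hypothetical stand-ins for processed bundles: named lists of data.frames
toy_bundles <- list(
  a = list(Patient   = data.frame(id = "p1"),
           Condition = data.frame(id = c("c1", "c2")),
           Claim     = data.frame(id = "k1")),
  b = list(Patient   = data.frame(id = "p2"),
           Claim     = data.frame(id = "k2"))
)

# Number of components per bundle, as with sapply(myl, length)
n_components <- sapply(toy_bundles, length)

# Presence/absence matrix: one row per resource type, one column per bundle
all_types <- sort(unique(unlist(lapply(toy_bundles, names))))
presence <- sapply(toy_bundles, function(b) all_types %in% names(b))
rownames(presence) <- all_types

n_components
presence
```

A matrix like this makes it easy to decide which resource types are common enough across documents to be worth combining into a single table.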
sessionInfo()
## R version 4.3.0 RC (2023-04-13 r84269)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rjsoncons_1.0.0 jsonlite_1.8.4 DT_0.27 BiocFHIR_1.2.0
## [5] BiocStyle_2.28.0
##
## loaded via a namespace (and not attached):
## [1] dplyr_1.1.2 compiler_4.3.0 BiocManager_1.30.20
## [4] BiocBaseUtils_1.2.0 promises_1.2.0.1 tidyselect_1.2.0
## [7] Rcpp_1.0.10 tidyr_1.3.0 later_1.3.0
## [10] jquerylib_0.1.4 yaml_2.3.7 fastmap_1.1.1
## [13] mime_0.12 R6_2.5.1 generics_0.1.3
## [16] knitr_1.42 htmlwidgets_1.6.2 visNetwork_2.1.2
## [19] BiocGenerics_0.46.0 graph_1.78.0 tibble_3.2.1
## [22] bookdown_0.33 shiny_1.7.4 bslib_0.4.2
## [25] pillar_1.9.0 rlang_1.1.0 utf8_1.2.3
## [28] cachem_1.0.7 httpuv_1.6.9 xfun_0.39
## [31] sass_0.4.5 cli_3.6.1 magrittr_2.0.3
## [34] crosstalk_1.2.0 digest_0.6.31 xtable_1.8-4
## [37] lifecycle_1.0.3 vctrs_0.6.2 evaluate_0.20
## [40] glue_1.6.2 stats4_4.3.0 fansi_1.0.4
## [43] purrr_1.0.1 rmarkdown_2.21 ellipsis_0.3.2
## [46] tools_4.3.0 pkgconfig_2.0.3 htmltools_0.5.5