1 Introduction
2 Jumpstart: How to build queries using InterMineR
- 2.1 Select a database
- 2.2 Obtain a prebuilt query
  - 2.2.1 Look at the data model
8 Modify a Query
- 10.1 Change constraint logic
11 Recipes
12 System info
13 Appendix
- 13.1 Visual way to derive the column name of a query view or the path name in a query constraint from the database webpage

1 Introduction

InterMine is a powerful open source data warehouse system integrating diverse biological data sets (e.g. genomic, expression and protein data) for various organisms. Integrating data makes it possible to run sophisticated data mining queries that span domains of biological knowledge. A selected list of databases powered by InterMine is shown in Table 1:

Database	Organism	Data
FlyMine	Drosophila	Genes, homology, proteins, interactions, gene ontology, expression, regulation, phenotypes, pathways, diseases, resources, publications
HumanMine	H. sapiens	Genomics, SNPs, GWAS, proteins, gene ontology, pathways, gene expression, interactions, publications, disease, orthologues, alleles
MouseMine	M. musculus	Genomics, proteins, gene ontology, expression, interactions, pathways, phenotypes, diseases, homology, publications
RatMine	R. norvegicus	Disease, gene ontology, genomics, interactions, phenotype, pathway, proteins, publication, QTL, SNP
WormMine	C. elegans	Genes, alleles, homology, GO annotation, phenotypes, strains
YeastMine	S. cerevisiae	Genomics, proteins, gene ontology, comparative genomics, phenotypes, interactions, literature, pathways, gene expression
ZebrafishMine	D. rerio	Genes, constructs, disease, gene ontology, genotypes, homology, morpholinos, phenotypes
TargetMine	H. sapiens, M. musculus	Genes, protein structures, chemical compounds, protein domains, gene function, pathways, interactions, disease, drug targets
MitoMiner	H. sapiens, M. musculus, R. norvegicus, D. rerio, S. cerevisiae, S. pombe	Genes, homology, localisation evidence, mitochondrial reference gene lists, phenotypes, diseases, expression, interactions, pathways, exome
IndigoMine	Archaea	Genomics
ThaleMine	A. thaliana	Genomics, proteins, domains, homology, gene ontology, interactions, expression, publications, pathways, GeneRIF, stocks, phenotypes, alleles, insertions, TAIR
MedicMine	Medicago truncatula	Genomics, pathways, gene ontology, publications, proteins, homology
PhytoMine	over 50 plant genomes	Genes, proteins, expression, transcripts, homology

Please see the InterMine home page for a full list of available InterMines.

InterMine includes an attractive, user-friendly web interface that works ‘out of the box’ and a powerful, scriptable web-service API to allow programmatic access to your data. This R package provides an interface with the InterMine-powered databases through Web services.

2 Jumpstart: How to build queries using InterMineR

Let’s start with a simple task - find the pathways of the gene ABO.

2.1 Select a database

First, we look at what databases are available.

library(InterMineR)
listMines()

##                                                BMAP 
##                "https://bmap.jgi.doe.gov/bmapmine/" 
##                                            BeanMine 
##             "https://mines.legumeinfo.org/beanmine" 
##                                          BovineMine 
##            "http://genomes.missouri.edu/bovinemine" 
##                                             CHOmine 
##                "https://chomine.boku.ac.at/chomine" 
##                                        ChickpeaMine 
##         "https://mines.legumeinfo.org/chickpeamine" 
##                                           CovidMine 
##             "https://test.intermine.org/covidmine/" 
##                                          CowpeaMine 
##           "https://mines.legumeinfo.org/cowpeamine" 
##                                             FawMine 
##                "http://insectmine.org:8080/fawmine" 
##                                             FlyMine 
##                   "https://www.flymine.org/flymine" 
##                                           GrapeMine 
##          "http://urgi.versailles.inra.fr/GrapeMine" 
##                                           HumanMine 
##               "https://www.humanmine.org/humanmine" 
##                                     HymenopteraMine 
##         "http://128.206.116.3:8080/hymenopteramine" 
##                                      JointvetchMine 
##       "https://mines.legumeinfo.org/jointvetchmine" 
##                                          LegumeMine 
##           "https://mines.legumeinfo.org/legumemine" 
##                                          LocustMine 
##             "http://locustmine.org:8080/locustmine" 
##                                           LupinMine 
##            "https://mines.legumeinfo.org/lupinmine" 
##                                           MaizeMine 
## "http://maizemine.rnet.missouri.edu:8080/maizemine" 
##                                           MedicMine 
##            "https://mines.legumeinfo.org/medicmine" 
##                                             ModMine 
##         "http://intermine.modencode.org/release-33" 
##                                           MouseMine 
##                "http://www.mousemine.org/mousemine" 
##                                             OakMine 
##      "https://urgi.versailles.inra.fr/OakMine_PM1N" 
##                                          PeanutMine 
##           "https://mines.legumeinfo.org/peanutmine" 
##                                           PhytoMine 
##          "https://phytozome.jgi.doe.gov/phytomine/" 
##                                            PlanMine 
##               "http://planmine.mpi-cbg.de/planmine" 
##                                             RatMine 
##                    "http://ratmine.mcw.edu/ratmine" 
##                                             RepetDB 
##            "http://urgi.versailles.inra.fr/repetdb" 
##                                             SoyMine 
##              "https://mines.legumeinfo.org/soymine" 
##                                          TargetMine 
##    "https://targetmine.mizuguchilab.org/targetmine" 
##                                           ThaleMine 
##                 "https://bar.utoronto.ca/thalemine" 
##                                           WheatMine 
##         "https://urgi.versailles.inra.fr/WheatMine" 
##                                            WormMine 
##     "http://intermine.wormbase.org/tools/wormmine/" 
##                                             XenMine 
##                    "http://www.xenmine.org/xenmine" 
##                                           YeastMine 
##       "https://yeastmine.yeastgenome.org/yeastmine" 
##                                       ZebrafishMine 
##                          "http://zebrafishmine.org"

Since we would like to query human genes, we select HumanMine.

# load HumanMine
im <- initInterMine(mine=listMines()["HumanMine"])
im

## An object of class "Service"
## Slot "mine":
##                             HumanMine 
## "https://www.humanmine.org/humanmine" 
## 
## Slot "token":
## [1] ""

2.2 Obtain a prebuilt query

You can build custom queries both on the website of an InterMine database and in InterMineR. To make it easier to retrieve information, InterMine databases also provide a variety of pre-built queries, called templates. A template is a query that has already been created with a fixed set of output columns and one or more constraints.

# Get template (collection of pre-defined queries)
template = getTemplates(im)
head(template)

##                         name
## 1 Tissue_Expression_illumina
## 2          humDisGeneOrthol2
## 3              PhenotypeGene
## 4    Protein_complex_details
## 5                disExprGene
## 6       protein_interactions
##                                                  title
## 1       Tissue --> Gene Expression (Illumina body map)
## 2    Human Disease --> Human Gene + Orthologue Gene(s)
## 3 Mouse Phenotype -->  Mouse Genes + Orthologous genes
## 4                          Protein --> Protein Complex
## 5                         Disease Expression --> Genes
## 6                             Protein --> Interactions

We would like to find templates involving genes.

# Get gene-related templates
template[grep("gene", template$name, ignore.case=TRUE),]

##                               name
## 2                humDisGeneOrthol2
## 3                    PhenotypeGene
## 5                      disExprGene
## 7               Gene_Interactions2
## 8               Protein_Gene_Ortho
## 9                      GOterm_Gene
## 12             Disease_gene_RNAseq
## 13           Gene_Alleles_Disease2
## 15      ChromRegion_GenesTransExon
## 18                     GeneExpress
## 19                  Disease_Genes2
## 20                   Gene_Location
## 21    Protein_GeneChromosomeLength
## 22                Gene_Identifiers
## 24                    Gene_Pathway
## 25            gene_complex_details
## 28           Gene_protein_sequence
## 29                    PathwayGenes
## 31                    Gene_Protein
## 32          Gene_OverlapppingGenes
## 35            Gene_To_Publications
## 36 Gene_Interactions_forReportPage
## 38                Gene_Disease_HPO
## 39                testPathwayGenes
## 40                         Gene_GO
## 42       GeneInteractorsExpression
## 43     Gene_particularGoannotation
## 44   Gene_TissueExpressionIllumina
## 46             Gene_HPOphenotype_2
## 47             domain_protein_gene
## 48            Gene_Expression_GTex
## 49     Gene_ExpressionProteinAtlas
## 50             Pathway_ProteinGene
## 51                Gene_description
## 55           Gene_Interact_disease
## 56           GeneHPOparent_Genes_2
## 57              Gene_proteindomain
## 58                        HPO_Gene
## 59                     Gene_SigSNP
## 60                     Gene_inGWAS
## 61               geneGWAS_reportPg
## 62             geneInteractiongene
## 63                   Gene_Disease2
## 64         Term_inGWASoptionalGene
## 65    Gene_proteinAtlasExpression2
## 66                  GeneOrthAllele
## 68                      Gene_Orth2
## 71                       Gene_Orth
## 72               ChromRegion_Genes
## 74       GenePathway_interactions2
## 75                 Gene_AllelePhen
##                                                                          title
## 2                            Human Disease --> Human Gene + Orthologue Gene(s)
## 3                         Mouse Phenotype -->  Mouse Genes + Orthologous genes
## 5                                                 Disease Expression --> Genes
## 7                                                        Gene --> Interactions
## 8                                             Protein --> Gene and Orthologues
## 9                                                            GO term --> Genes
## 12                                       Disease -> Genes + RNA-seq Expression
## 13                                 Gene --> Alleles and Disease (clinVar data)
## 15                    Chromosomal Location --> All Genes + Transcripts + Exons
## 18       Gene --> Gene Expression  (Tissue, Disease; Array Express, E-MTAB-62)
## 19                                                         Disease --> Gene(s)
## 20                                              Gene --> Chromosomal location.
## 21                                                           Protein --> Gene.
## 22                                                   Gene --> All identifiers.
## 24                                                            Gene --> Pathway
## 25                                                    Gene --> protein complex
## 28                                          Gene -> Protein + protein sequence
## 29                                                           Pathway --> Genes
## 31                                                          Gene --> Proteins.
## 32                                                 Gene --> Overlapping genes.
## 35                                                      Gene --> Publications.
## 36                                  Gene --> Physical and Genetic Interactions
## 38                Gene --> Disease + HPO annotations (Human Phenotype Ontology
## 39                                                       testPathway --> Genes
## 40                                                          Gene --> GO terms.
## 42 Gene + Tissue Expression  --> Interactors that are expressed in that tissue
## 43                                         Gene + GO term --> Genes by GO term
## 44                              Gene --> Tissue Expression (Illumina body map)
## 46                           Gene -> HPO annotation (Human Phenotype Ontology)
## 47                                        Protein Domain --> Protein and Genes
## 48                                      Gene --> Tissue Expression (GTex data)
## 49                       Gene(s) --> Tissue Expression (Protein Atlas RNA-seq)
## 50                                                Pathway --> Protein and Gene
## 51                                                         Gene -> Description
## 55                                             Gene -> Interactions + diseases
## 56                                   Gene + HPO Phenotype parent term -> Genes
## 57                                                  Gene --> Protein + Domains
## 58                                                          HPO term --> Genes
## 59                                    Gene(s) --> Significant SNPs (GTex data)
## 60                                                           Gene --> GWAS hit
## 61                                                    Gene Report --> GWAS hit
## 62                                           Gene A --> Interaction <-- Gene B
## 63                                                     Gene --> Disease (OMIM)
## 64                            GWAS term --> SNP + Associated gene if available
## 65                                        Gene --> Protein tissue Localisation
## 66                              Gene (Hum OR Rat) --> Mouse Allele (Phenotype)
## 68                                                       Gene --> Orthologues2
## 71                                                        Gene --> Orthologues
## 72                                                            Region --> Genes
## 74                                             Gene + Pathway --> Interactions
## 75                                           Mouse Gene --> Allele [Phenotype]
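Template names are terse, so it can help to search the titles as well. The sketch below is only an illustration: `template_demo` is a hand-copied stand-in for the first six rows of the template table printed earlier, so it runs without a connection to any mine.

```r
# Stand-in for head(getTemplates(im)); rows copied from the output above.
template_demo <- data.frame(
  name = c("Tissue_Expression_illumina", "humDisGeneOrthol2", "PhenotypeGene",
           "Protein_complex_details", "disExprGene", "protein_interactions"),
  title = c("Tissue --> Gene Expression (Illumina body map)",
            "Human Disease --> Human Gene + Orthologue Gene(s)",
            "Mouse Phenotype -->  Mouse Genes + Orthologous genes",
            "Protein --> Protein Complex",
            "Disease Expression --> Genes",
            "Protein --> Interactions"),
  stringsAsFactors = FALSE
)

# Match the pattern against each column, ignoring case.
in_name  <- grepl("gene", template_demo$name,  ignore.case = TRUE)
in_title <- grepl("gene", template_demo$title, ignore.case = TRUE)

by_name   <- template_demo$name[in_name]
by_either <- template_demo$name[in_name | in_title]

# Templates that are found only through their title
setdiff(by_either, by_name)
```

With the real table, the same two `grepl` calls can be applied to `template$name` and `template$title`.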

The template Gene_Pathway seems to be what we want. Let’s look at this template in more detail.

# Query for gene pathways
queryGenePath = getTemplateQuery(
  im = im, 
  name = "Gene_Pathway"
)
queryGenePath

## $model
##      name 
## "genomic" 
## 
## $title
## [1] "Gene --> Pathway"
## 
## $description
## [1] "For a given Gene (or List of Genes) show any associated Pathway(s) (Data Source: KEGG or REACTOME). Keywords: pathways, metabolism, cascade "
## 
## $select
## [1] "Gene.primaryIdentifier"      "Gene.symbol"                
## [3] "Gene.name"                   "Gene.pathways.name"         
## [5] "Gene.pathways.dataSets.name" "Gene.pathways.identifier"   
## [7] "Gene.organism.shortName"    
## 
## $constraintLogic
## [1] "B and A"
## 
## $name
## [1] "Gene_Pathway"
## 
## $comment
## [1] "Added 26OCT2010: ML"
## 
## $tags
## [1] "im:public"    "im:report"    "im:favourite"
## 
## $rank
## [1] "1"
## 
## $orderBy
## $orderBy[[1]]
## Gene.primaryIdentifier 
##                  "ASC" 
## 
## $orderBy[[2]]
## Gene.primaryIdentifier 
##                  "ASC" 
## 
## $orderBy[[3]]
## Gene.primaryIdentifier 
##                  "ASC" 
## 
## 
## $where
## $where[[1]]
## $where[[1]]$path
## [1] "Gene"
## 
## $where[[1]]$op
## [1] "LOOKUP"
## 
## $where[[1]]$code
## [1] "A"
## 
## $where[[1]]$editable
## [1] TRUE
## 
## $where[[1]]$switchable
## [1] FALSE
## 
## $where[[1]]$switched
## [1] "LOCKED"
## 
## $where[[1]]$value
## [1] "pparg"
## 
## 
## $where[[2]]
## $where[[2]]$path
## [1] "Gene.organism.name"
## 
## $where[[2]]$op
## [1] "="
## 
## $where[[2]]$code
## [1] "B"
## 
## $where[[2]]$editable
## [1] TRUE
## 
## $where[[2]]$switchable
## [1] FALSE
## 
## $where[[2]]$switched
## [1] "LOCKED"
## 
## $where[[2]]$value
## [1] "Homo sapiens"

There are three essential members in a query - SELECT, WHERE and constraintLogic.

SELECT
1. The SELECT (or view) represents the output columns in the query output.
2. Columns of a view are usually of the form “A.B”, where B is the child of A. For example in the column Gene.symbol, symbol is the child of Gene. Columns could also be in cascade form “A.B.C”. For example, in the column Gene.locations.start, locations is the child of Gene and start is the child of locations.
WHERE
1. The WHERE statement is a collection of constraints.
2. Each query constraint is a list with the following fields:
  1. path
    1. in the same format as view columns
  2. op
    1. the constraint operator
    2. Valid values: “=”, “!=”, “LOOKUP”, “ONE OF”, “NONE OF”, “>”, “<”, “>=”, “<=”, “LIKE”
  3. value
    1. the constraint value
  4. code
    1. The logic code for the constraint (e.g. A, B or C).
    2. Only used in the constraintLogic (discussed below).
  5. extraValue
    1. optional; only applies to LOOKUP constraints
    2. Short name of organism, e.g. H. sapiens
  6. editable
    1. Ignore
    2. Used to determine if user is allowed to edit this constraint. Only for the UI.
  7. switchable
    1. Ignore
    2. Used to determine if user is allowed to disable this constraint. Only for the UI.
  8. switched
    1. Ignore
    2. Used to determine if user has enabled this constraint. Only for the UI.
constraintLogic
1. If the constraint logic is not given explicitly, all constraints are combined with “AND”, e.g. “A and B”, where A and B are the codes of the constraints.
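To make the three members concrete, here is a minimal sketch of a query written by hand as a plain R list. The field names follow the template printed above; `query_demo` and the helper `split_path` are hypothetical names used only for illustration. Nothing is sent to a mine, and the list is not claimed to be complete enough to pass to runQuery.

```r
# A hand-written list with the three essential members of a query.
query_demo <- list(
  select = c("Gene.symbol", "Gene.pathways.name"),
  where = list(
    list(path = "Gene", op = "LOOKUP", value = "ABO", code = "A"),
    list(path = "Gene.organism.name", op = "=", value = "Homo sapiens", code = "B")
  ),
  constraintLogic = "A and B"
)

# Hypothetical helper: split a dotted path into its components,
# e.g. the cascade form "A.B.C" becomes c("A", "B", "C").
split_path <- function(path) strsplit(path, ".", fixed = TRUE)[[1]]

path_parts <- split_path("Gene.pathways.name")
path_parts

# Codes used by the constraints, in order of appearance
constraint_codes <- vapply(query_demo$where, function(w) w$code, character(1))
constraint_codes
```

Every code that appears in `constraintLogic` should also appear in `constraint_codes`.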

2.2.1 Look at the data model

What does ‘Gene.symbol’ mean? What is ‘Gene.pathways.identifier’?

Let’s take a look at the data model. NOTE: the code in this section is temporarily disabled due to errors, so it is shown commented out and without output.

# model <- getModel(im)
# head(model)

Let’s look at the children of the Gene data type.

# model[which(model$type=="Gene"),]

Gene has a field called ‘symbol’ (hence the column Gene.symbol). Gene also has a ‘pathways’ reference, which is of the Pathway data type.

# model[which(model$type=="Pathway"),]




2.3 Run a Query

Let’s now run our template.


resGenePath <- runQuery(im, queryGenePath)
head(resGenePath)

##   Gene.primaryIdentifier Gene.symbol
## 1                   5468       PPARG
## 2                   5468       PPARG
## 3                   5468       PPARG
## 4                   5468       PPARG
## 5                   5468       PPARG
## 6                   5468       PPARG
##                                          Gene.name
## 1 peroxisome proliferator activated receptor gamma
## 2 peroxisome proliferator activated receptor gamma
## 3 peroxisome proliferator activated receptor gamma
## 4 peroxisome proliferator activated receptor gamma
## 5 peroxisome proliferator activated receptor gamma
## 6 peroxisome proliferator activated receptor gamma
##                             Gene.pathways.name Gene.pathways.dataSets.name
## 1                        Developmental Biology  Reactome pathways data set
## 2              Gene expression (Transcription)  Reactome pathways data set
## 3                Generic Transcription Pathway  Reactome pathways data set
## 4 Intracellular signaling by second messengers  Reactome pathways data set
## 5        MECP2 regulates transcription factors  Reactome pathways data set
## 6                                   Metabolism  Reactome pathways data set
##   Gene.pathways.identifier Gene.organism.shortName
## 1            R-HSA-1266738              H. sapiens
## 2              R-HSA-74160              H. sapiens
## 3             R-HSA-212436              H. sapiens
## 4            R-HSA-9006925              H. sapiens
## 5            R-HSA-9022707              H. sapiens
## 6            R-HSA-1430728              H. sapiens

8 Modify a Query

8.1 Edit a constraint

Let’s modify the query to find the pathways of the gene ABO. We want to change the ‘value’ attribute from PPARG to ABO.

There are two ways to build a query in InterMineR.

We can either build a query as a list object with the newQuery function, assigning all input values (selection of retrieved data type, constraints, etc.) as items of that list,
or we can build the query as an InterMineR-class object with the functions setConstraints, which generates a new list of constraints or modifies an existing one, and setQuery, which generates the query as an InterMineR-class object.

The setConstraints and setQuery functions are designed to make it easier to generate queries for InterMine instances and to avoid multiple iterative loops, especially when a query needs multiple constraints or constraint values (e.g. genes, organisms).

# modify directly the value of the first constraint from the list query
queryGenePath$where[[1]][["value"]] <- "ABO"

# or modify the value of the first constraint from the list query with setConstraints
queryGenePath$where = setConstraints(
  modifyQueryConstraints = queryGenePath,
  m.index = 1,
  values = list("ABO")
)

queryGenePath$where

## [[1]]
## [[1]]$path
## [1] "Gene"
## 
## [[1]]$op
## [1] "LOOKUP"
## 
## [[1]]$code
## [1] "A"
## 
## [[1]]$editable
## [1] TRUE
## 
## [[1]]$switchable
## [1] FALSE
## 
## [[1]]$switched
## [1] "LOCKED"
## 
## [[1]]$value
## [1] "ABO"
## 
## 
## [[2]]
## [[2]]$path
## [1] "Gene.organism.name"
## 
## [[2]]$op
## [1] "="
## 
## [[2]]$code
## [1] "B"
## 
## [[2]]$editable
## [1] TRUE
## 
## [[2]]$switchable
## [1] FALSE
## 
## [[2]]$switched
## [1] "LOCKED"
## 
## [[2]]$value
## [1] "Homo sapiens"

Note the value is now equal to ‘ABO’. Let’s rerun our query with the new constraint.

resGenePath <- runQuery(im, queryGenePath)
head(resGenePath)

##   Gene.primaryIdentifier Gene.symbol
## 1                     28         ABO
## 2                     28         ABO
## 3                     28         ABO
## 4                     28         ABO
##                                                                              Gene.name
## 1 ABO, alpha 1-3-N-acetylgalactosaminyltransferase and alpha 1-3-galactosyltransferase
## 2 ABO, alpha 1-3-N-acetylgalactosaminyltransferase and alpha 1-3-galactosyltransferase
## 3 ABO, alpha 1-3-N-acetylgalactosaminyltransferase and alpha 1-3-galactosyltransferase
## 4 ABO, alpha 1-3-N-acetylgalactosaminyltransferase and alpha 1-3-galactosyltransferase
##                 Gene.pathways.name Gene.pathways.dataSets.name
## 1     ABO blood group biosynthesis  Reactome pathways data set
## 2 Blood group systems biosynthesis  Reactome pathways data set
## 3                       Metabolism  Reactome pathways data set
## 4      Metabolism of carbohydrates  Reactome pathways data set
##   Gene.pathways.identifier Gene.organism.shortName
## 1            R-HSA-9033807              H. sapiens
## 2            R-HSA-9033658              H. sapiens
## 3            R-HSA-1430728              H. sapiens
## 4              R-HSA-71387              H. sapiens

Now we are seeing pathways for the ABO gene.
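Editing a constraint by position assumes we know where it sits in the list. As a hedged alternative, the sketch below looks the constraint up by its code instead. `where_demo` is a reduced stand-in for queryGenePath$where, and `set_value_by_code` is a hypothetical helper, not part of InterMineR.

```r
# Stand-in for queryGenePath$where, reduced to the fields used here.
where_demo <- list(
  list(path = "Gene", op = "LOOKUP", code = "A", value = "pparg"),
  list(path = "Gene.organism.name", op = "=", code = "B", value = "Homo sapiens")
)

# Hypothetical helper: set the value of the constraint with the given code.
set_value_by_code <- function(where, code, value) {
  codes <- vapply(where, function(w) w$code, character(1))
  idx <- which(codes == code)
  if (length(idx) != 1) {
    stop("expected exactly one constraint with code ", code)
  }
  where[[idx]]$value <- value
  where
}

where_demo <- set_value_by_code(where_demo, "A", "ABO")
where_demo[[1]]$value
```

The helper stops with an error when the code is missing or duplicated, which guards against silently editing the wrong constraint.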

8.2 Add a new constraint

You can also add additional filters. Let’s look for a specific pathway.

There are four parts of a constraint to add:

path
1. I got the path from the output columns, but I could also have figured it out from the data model.
op
1. Valid values: “=”, “!=”, “LOOKUP”, “ONE OF”, “NONE OF”, “>”, “<”, “>=”, “<=”, “LIKE”
value
1. What value I am filtering on.
code
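1. Must be a letter that identifies the constraint in the constraintLogic. The query above already has two constraints, labelled ‘A’ and ‘B’. The code below assigns the new constraint to position 2, so it replaces the existing organism constraint and reuses its code ‘B’.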

newConstraint <- list(
  path=c("Gene.pathways.name"),
  op=c("="), 
  value=c("ABO blood group biosynthesis"), 
  code=c("B")
)

queryGenePath$where[[2]] <- newConstraint
queryGenePath$where

## [[1]]
## [[1]]$path
## [1] "Gene"
## 
## [[1]]$op
## [1] "LOOKUP"
## 
## [[1]]$code
## [1] "A"
## 
## [[1]]$editable
## [1] TRUE
## 
## [[1]]$switchable
## [1] FALSE
## 
## [[1]]$switched
## [1] "LOCKED"
## 
## [[1]]$value
## [1] "ABO"
## 
## 
## [[2]]
## [[2]]$path
## [1] "Gene.pathways.name"
## 
## [[2]]$op
## [1] "="
## 
## [[2]]$value
## [1] "ABO blood group biosynthesis"
## 
## [[2]]$code
## [1] "B"

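Our new filter has been added successfully. Rerun the query and you will see that only one pathway, ABO blood group biosynthesis, is returned. Because the new constraint replaced the organism constraint, the mouse gene Abo is listed as well.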

resGenePath <- runQuery(im, queryGenePath)
resGenePath

##   Gene.primaryIdentifier Gene.symbol
## 1                     28         ABO
## 2            MGI:2135738         Abo
##                                                                                                                      Gene.name
## 1                                         ABO, alpha 1-3-N-acetylgalactosaminyltransferase and alpha 1-3-galactosyltransferase
## 2 ABO blood group (transferase A, alpha 1-3-N-acetylgalactosaminyltransferase, transferase B, alpha 1-3-galactosyltransferase)
##             Gene.pathways.name Gene.pathways.dataSets.name
## 1 ABO blood group biosynthesis  Reactome pathways data set
## 2 ABO blood group biosynthesis  Reactome pathways data set
##   Gene.pathways.identifier Gene.organism.shortName
## 1            R-HSA-9033807              H. sapiens
## 2            R-MMU-9033807             M. musculus

8.3 Add a column

You can also add additional columns to the output. For instance, is the Gene also involved in any disease? Let’s add this information.

Let’s see what we know about diseases.

# model[which(model$type=="Gene"),]

The Gene data type has a ‘diseases’ reference of type ‘Disease’.

# model[which(model$type=="Disease"),]

Disease has an attribute called “name”. Add Gene.diseases.name to the view. The query already has 7 columns, so we add it as the last column, #8:

# use setQuery function which will create an InterMineR-class query
queryGenePath.InterMineR = setQuery(
  inheritQuery = queryGenePath,
  select = c(queryGenePath$select, 
             "Gene.diseases.name")
  )

getSelect(queryGenePath.InterMineR)

## [1] "Gene.primaryIdentifier"      "Gene.symbol"                
## [3] "Gene.name"                   "Gene.pathways.name"         
## [5] "Gene.pathways.dataSets.name" "Gene.pathways.identifier"   
## [7] "Gene.organism.shortName"     "Gene.diseases.name"

#queryGenePath.InterMineR@select

# or assign new column directly to the existing list query
queryGenePath$select[[8]] <- "Gene.diseases.name"
queryGenePath$select

## [1] "Gene.primaryIdentifier"      "Gene.symbol"                
## [3] "Gene.name"                   "Gene.pathways.name"         
## [5] "Gene.pathways.dataSets.name" "Gene.pathways.identifier"   
## [7] "Gene.organism.shortName"     "Gene.diseases.name"

# run queries
resGenePath.InterMineR <- runQuery(im, queryGenePath.InterMineR)
resGenePath <- runQuery(im, queryGenePath)

all(resGenePath == resGenePath.InterMineR)

## [1] TRUE

head(resGenePath, 3)

##   Gene.primaryIdentifier Gene.symbol
## 1                     28         ABO
##                                                                              Gene.name
## 1 ABO, alpha 1-3-N-acetylgalactosaminyltransferase and alpha 1-3-galactosyltransferase
##             Gene.pathways.name Gene.pathways.dataSets.name
## 1 ABO blood group biosynthesis  Reactome pathways data set
##   Gene.pathways.identifier Gene.organism.shortName      Gene.diseases.name
## 1            R-HSA-9033807              H. sapiens BLOOD GROUP, ABO SYSTEM

NB: adding columns can result in changing the row count.
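One way to picture why the row count changes is to assume that an added column behaves like an inner join on the gene. This is an assumption, but it matches the outputs above, where the mouse row disappears once Gene.diseases.name is added. The toy tables below are built from values shown in those outputs and use base R only.

```r
# Toy tables built from values shown in the outputs above.
pathways <- data.frame(
  Gene.symbol = c("ABO", "Abo"),
  Gene.organism.shortName = c("H. sapiens", "M. musculus"),
  Gene.pathways.name = "ABO blood group biosynthesis",
  stringsAsFactors = FALSE
)
diseases <- data.frame(
  Gene.symbol = "ABO",
  Gene.diseases.name = "BLOOD GROUP, ABO SYSTEM",
  stringsAsFactors = FALSE
)

# Inner join: genes without a disease record drop out.
inner <- merge(pathways, diseases, by = "Gene.symbol")

# Left join: every gene is kept, with NA where no disease is known.
left <- merge(pathways, diseases, by = "Gene.symbol", all.x = TRUE)

nrow(inner)
nrow(left)
```

In the other direction, a gene with several diseases would appear once per disease, so the row count can also grow.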

10.1 Change constraint logic

If no constraintLogic is given, the constraints are combined with ‘and’, i.e. ‘A and B’. Here we specify the constraintLogic explicitly. A and B correspond to the ‘code’ of each constraint.

queryGenePath$constraintLogic <- "A and B"
queryGenePath$constraintLogic

## [1] "A and B"

Run the query again and see no change:

resGenePath <- runQuery(im, queryGenePath)
resGenePath

##   Gene.primaryIdentifier Gene.symbol
## 1                     28         ABO
##                                                                              Gene.name
## 1 ABO, alpha 1-3-N-acetylgalactosaminyltransferase and alpha 1-3-galactosyltransferase
##             Gene.pathways.name Gene.pathways.dataSets.name
## 1 ABO blood group biosynthesis  Reactome pathways data set
##   Gene.pathways.identifier Gene.organism.shortName      Gene.diseases.name
## 1            R-HSA-9033807              H. sapiens BLOOD GROUP, ABO SYSTEM

Change the logic to ‘A or B’ and see how the results change.
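As a sketch of that change, the snippet below switches the logic on a reduced stand-in query and checks that every code named in the logic string belongs to a constraint. `query_demo` and `logic_codes_exist` are hypothetical names; the check is a local sanity test, not something InterMineR performs for us as far as this document shows.

```r
# Stand-in for the query, reduced to the fields used here.
query_demo <- list(
  where = list(
    list(path = "Gene", op = "LOOKUP", value = "ABO", code = "A"),
    list(path = "Gene.pathways.name", op = "=",
         value = "ABO blood group biosynthesis", code = "B")
  ),
  constraintLogic = "A and B"
)

# Hypothetical helper: TRUE if every single-letter code in the logic
# string is the code of some constraint.
logic_codes_exist <- function(query) {
  logic <- query$constraintLogic
  tokens <- regmatches(logic, gregexpr("\\b[A-Z]\\b", logic, perl = TRUE))[[1]]
  codes <- vapply(query$where, function(w) w$code, character(1))
  all(tokens %in% codes)
}

query_demo$constraintLogic <- "A or B"
logic_ok <- logic_codes_exist(query_demo)
logic_ok
```

On the real query, the corresponding assignment is `queryGenePath$constraintLogic <- "A or B"`, followed by another call to runQuery.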

11 Recipes

11.1 Obtain the gene ontology (GO) terms associated with gene ABO

Start with the template Gene_GO.

queryGeneGO <- getTemplateQuery(im, "Gene_GO")
queryGeneGO

## $model
##      name 
## "genomic" 
## 
## $title
## [1] "Gene --> GO terms."
## 
## $description
## [1] "Search for GO annotations for a particular gene (or List of Genes)."
## 
## $select
## [1] "Gene.primaryIdentifier"                           
## [2] "Gene.symbol"                                      
## [3] "Gene.goAnnotation.ontologyTerm.identifier"        
## [4] "Gene.goAnnotation.ontologyTerm.name"              
## [5] "Gene.goAnnotation.ontologyTerm.namespace"         
## [6] "Gene.goAnnotation.evidence.code.code"             
## [7] "Gene.goAnnotation.ontologyTerm.parents.identifier"
## [8] "Gene.goAnnotation.ontologyTerm.parents.name"      
## [9] "Gene.goAnnotation.qualifier"                      
## 
## $name
## [1] "Gene_GO"
## 
## $comment
## [1] "Added 15NOV2010: ML"
## 
## $tags
## [1] "im:aspect:Function"      "im:aspect:Gene Ontology"
## [3] "im:aspect:Genomics"      "im:frontpage"           
## [5] "im:public"               "im:report"              
## 
## $rank
## [1] "4"
## 
## $orderBy
## $orderBy[[1]]
## Gene.primaryIdentifier 
##                  "ASC" 
## 
## 
## $where
## $where[[1]]
## $where[[1]]$path
## [1] "Gene"
## 
## $where[[1]]$op
## [1] "LOOKUP"
## 
## $where[[1]]$code
## [1] "A"
## 
## $where[[1]]$editable
## [1] TRUE
## 
## $where[[1]]$switchable
## [1] FALSE
## 
## $where[[1]]$switched
## [1] "LOCKED"
## 
## $where[[1]]$value
## [1] "PPARG"
## 
## $where[[1]]$extraValue
## [1] "H. sapiens"

Modify the view to display a compact view

queryGeneGO$select <- queryGeneGO$select[2:5]
queryGeneGO$select

## [1] "Gene.symbol"                              
## [2] "Gene.goAnnotation.ontologyTerm.identifier"
## [3] "Gene.goAnnotation.ontologyTerm.name"      
## [4] "Gene.goAnnotation.ontologyTerm.namespace"
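Selecting columns by position breaks if the template's view is ever reordered. A small sketch of selecting by name instead is shown below; `select_demo` is copied from the view printed above, and `wanted` is a hypothetical choice of columns.

```r
# The view of the Gene_GO template, copied from the output above.
select_demo <- c(
  "Gene.primaryIdentifier",
  "Gene.symbol",
  "Gene.goAnnotation.ontologyTerm.identifier",
  "Gene.goAnnotation.ontologyTerm.name",
  "Gene.goAnnotation.ontologyTerm.namespace",
  "Gene.goAnnotation.evidence.code.code",
  "Gene.goAnnotation.ontologyTerm.parents.identifier",
  "Gene.goAnnotation.ontologyTerm.parents.name",
  "Gene.goAnnotation.qualifier"
)

# Columns we want to keep, named explicitly.
wanted <- c(
  "Gene.symbol",
  "Gene.goAnnotation.ontologyTerm.identifier",
  "Gene.goAnnotation.ontologyTerm.name",
  "Gene.goAnnotation.ontologyTerm.namespace"
)

# Keep the template's own ordering and report anything not found.
compact <- select_demo[select_demo %in% wanted]
not_found <- setdiff(wanted, select_demo)

compact
not_found
```

An empty `not_found` confirms that every requested column exists in the template's view.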

Modify the constraints to look for gene ABO.

queryGeneGO$where[[1]][["value"]] <- "ABO"
queryGeneGO$where

## [[1]]
## [[1]]$path
## [1] "Gene"
## 
## [[1]]$op
## [1] "LOOKUP"
## 
## [[1]]$code
## [1] "A"
## 
## [[1]]$editable
## [1] TRUE
## 
## [[1]]$switchable
## [1] FALSE
## 
## [[1]]$switched
## [1] "LOCKED"
## 
## [[1]]$value
## [1] "ABO"
## 
## [[1]]$extraValue
## [1] "H. sapiens"

Run the query

resGeneGO <- runQuery(im, queryGeneGO)
head(resGeneGO)

##   Gene.symbol Gene.goAnnotation.ontologyTerm.identifier
## 1         ABO                                GO:0000166
## 2         ABO                                GO:0003823
## 3         ABO                                GO:0004380
## 4         ABO                                GO:0004381
## 5         ABO                                GO:0005576
## 6         ABO                                GO:0005794
##                                                Gene.goAnnotation.ontologyTerm.name
## 1                                                               nucleotide binding
## 2                                                                  antigen binding
## 3 glycoprotein-fucosylgalactoside alpha-N-acetylgalactosaminyltransferase activity
## 4                        fucosylgalactoside 3-alpha-galactosyltransferase activity
## 5                                                             extracellular region
## 6                                                                  Golgi apparatus
##   Gene.goAnnotation.ontologyTerm.namespace
## 1                       molecular_function
## 2                       molecular_function
## 3                       molecular_function
## 4                       molecular_function
## 5                       cellular_component
## 6                       cellular_component

11.2 Obtain the genes associated with gene ontology (GO) term ‘metal ion binding’

Start with the template GOterm_Gene

queryGOGene <- getTemplateQuery(im, "GOterm_Gene")
queryGOGene

## $model
##      name 
## "genomic" 
## 
## $title
## [1] "GO term --> Genes"
## 
## $description
## [1] "Search for Genes in a specified organism that are associated with a particular Gene Ontology (GO) annotation."
## 
## $select
## [1] "Gene.primaryIdentifier"                   
## [2] "Gene.symbol"                              
## [3] "Gene.name"                                
## [4] "Gene.goAnnotation.ontologyTerm.identifier"
## [5] "Gene.goAnnotation.ontologyTerm.name"      
## [6] "Gene.organism.shortName"                  
## 
## $constraintLogic
## [1] "A and B"
## 
## $name
## [1] "GOterm_Gene"
## 
## $comment
## [1] "Added 26OCT2010: ML"
## 
## $tags
## [1] "im:aspect:Function"      "im:aspect:Gene Ontology"
## [3] "im:aspect:Genomics"      "im:public"              
## [5] "im:report"              
## 
## $rank
## [1] "2"
## 
## $orderBy
## $orderBy[[1]]
## Gene.symbol 
##       "ASC" 
## 
## 
## $where
## $where[[1]]
## $where[[1]]$path
## [1] "Gene.goAnnotation.ontologyTerm.name"
## 
## $where[[1]]$op
## [1] "LIKE"
## 
## $where[[1]]$code
## [1] "A"
## 
## $where[[1]]$editable
## [1] TRUE
## 
## $where[[1]]$switchable
## [1] FALSE
## 
## $where[[1]]$switched
## [1] "LOCKED"
## 
## $where[[1]]$value
## [1] "DNA binding"
## 
## 
## $where[[2]]
## $where[[2]]$path
## [1] "Gene.organism.shortName"
## 
## $where[[2]]$op
## [1] "="
## 
## $where[[2]]$code
## [1] "B"
## 
## $where[[2]]$editable
## [1] TRUE
## 
## $where[[2]]$switchable
## [1] FALSE
## 
## $where[[2]]$switched
## [1] "LOCKED"
## 
## $where[[2]]$value
## [1] "H. sapiens"

Trim the view to a compact set of columns

queryGOGene$select <- queryGOGene$select[2:5]
queryGOGene$select

## [1] "Gene.symbol"                              
## [2] "Gene.name"                                
## [3] "Gene.goAnnotation.ontologyTerm.identifier"
## [4] "Gene.goAnnotation.ontologyTerm.name"

Modify the constraints to look for GO term ‘metal ion binding’

queryGOGene$where[[1]]$value <- "metal ion binding"
queryGOGene$where

## [[1]]
## [[1]]$path
## [1] "Gene.goAnnotation.ontologyTerm.name"
## 
## [[1]]$op
## [1] "LIKE"
## 
## [[1]]$code
## [1] "A"
## 
## [[1]]$editable
## [1] TRUE
## 
## [[1]]$switchable
## [1] FALSE
## 
## [[1]]$switched
## [1] "LOCKED"
## 
## [[1]]$value
## [1] "metal ion binding"
## 
## 
## [[2]]
## [[2]]$path
## [1] "Gene.organism.shortName"
## 
## [[2]]$op
## [1] "="
## 
## [[2]]$code
## [1] "B"
## 
## [[2]]$editable
## [1] TRUE
## 
## [[2]]$switchable
## [1] FALSE
## 
## [[2]]$switched
## [1] "LOCKED"
## 
## [[2]]$value
## [1] "H. sapiens"

Run the query

resGOGene <- runQuery(im, queryGOGene)
head(resGOGene)

##   Gene.symbol                                  Gene.name
## 1     A3GALT2          alpha 1,3-galactosyltransferase 2
## 2      AARSD1 alanyl-tRNA synthetase domain containing 1
## 3        ABAT           4-aminobutyrate aminotransferase
## 4       ABCG5  ATP binding cassette subfamily G member 5
## 5       ABCG8  ATP binding cassette subfamily G member 8
## 6      ABLIM1                actin binding LIM protein 1
##   Gene.goAnnotation.ontologyTerm.identifier Gene.goAnnotation.ontologyTerm.name
## 1                                GO:0046872                   metal ion binding
## 2                                GO:0046872                   metal ion binding
## 3                                GO:0046872                   metal ion binding
## 4                                GO:0046872                   metal ion binding
## 5                                GO:0046872                   metal ion binding
## 6                                GO:0046872                   metal ion binding
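Because the first constraint uses the `LIKE` operator rather than `=`, it may be worth confirming which GO terms the result actually contains and how many distinct genes it covers. The sketch below does this with base R on `mockRes`, a hypothetical data frame rebuilt by hand from the six rows printed above. It does not query a mine.

```r
# Hypothetical data frame copied from the head() output shown above.
mockRes <- data.frame(
  Gene.symbol = c("A3GALT2", "AARSD1", "ABAT", "ABCG5", "ABCG8", "ABLIM1"),
  Gene.goAnnotation.ontologyTerm.identifier = rep("GO:0046872", 6),
  Gene.goAnnotation.ontologyTerm.name = rep("metal ion binding", 6),
  stringsAsFactors = FALSE
)

# Which GO terms did the constraint match, and how many rows each?
termCounts <- table(mockRes$Gene.goAnnotation.ontologyTerm.name)
termCounts

# Number of distinct genes (a gene may occupy more than one row).
nGenes <- length(unique(mockRes$Gene.symbol))
nGenes
```

Applied to the full `resGOGene` data frame, the same two calls would summarise the whole result rather than the first six rows.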

11.3 Find and plot the genes within 50000 base pairs of gene ABCA6

Start with the Gene_Location template, update to search for ABCA6 gene.

queryGeneLoc <- getTemplateQuery(im, "Gene_Location")
queryGeneLoc$where[[2]][["value"]] <- "ABCA6"
resGeneLoc <- runQuery(im, queryGeneLoc)

resGeneLoc

##   Gene.primaryIdentifier Gene.secondaryIdentifier Gene.symbol
## 1                  23460          ENSG00000154262       ABCA6
##                                   Gene.name Gene.chromosome.primaryIdentifier
## 1 ATP binding cassette subfamily A member 6                                17
##   Gene.locations.start Gene.locations.end Gene.locations.strand
## 1             69062044           69141927                    -1

We’re going to use the output (gene location) as input for the next query.

Define a new query

# set constraints
constraints = setConstraints(
  paths = c(
    "Gene.chromosome.primaryIdentifier",
    "Gene.locations.start",
    "Gene.locations.end",
    "Gene.organism.name"
  ),
  operators = c(
    "=",
    ">=",
    "<=",
    "="
  ),
  values = list(
    resGeneLoc[1, "Gene.chromosome.primaryIdentifier"],
    as.character(as.numeric(resGeneLoc[1, "Gene.locations.start"])-50000),
    as.character(as.numeric(resGeneLoc[1, "Gene.locations.end"])+50000),
    "Homo sapiens"
  )
)

# set InterMineR-class query
queryNeighborGene = setQuery(
  select = c("Gene.primaryIdentifier", 
             "Gene.symbol",
             "Gene.chromosome.primaryIdentifier",
             "Gene.locations.start", 
             "Gene.locations.end", 
             "Gene.locations.strand"),
  where = constraints
)

summary(queryNeighborGene)

##                                path op        value code
## 1 Gene.chromosome.primaryIdentifier  =           17    A
## 2              Gene.locations.start >=     69012044    B
## 3                Gene.locations.end <=     69191927    C
## 4                Gene.organism.name  = Homo sapiens    D
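The window bounds in rows 2 and 3 follow from the ABCA6 coordinates: 69062044 - 50000 = 69012044 and 69141927 + 50000 = 69191927. The sketch below wraps that arithmetic in a small helper, `flankWindow` (a hypothetical name, not an InterMineR function). It clamps the lower bound at 1 and converts the result with `format()`, because `as.character()` can switch to scientific notation for round values such as 100000.

```r
# Compute a flanking window around a feature. Coordinates are treated as
# 1-based, so the lower bound is clamped at 1.
flankWindow <- function(start, end, flank) {
  start <- as.numeric(start)
  end <- as.numeric(end)
  stopifnot(start <= end, flank >= 0)
  c(start = max(1, start - flank), end = end + flank)
}

# ABCA6 coordinates as returned (as text) by the Gene_Location query above.
win <- flankWindow("69062044", "69141927", 50000)

# format(scientific = FALSE) guarantees plain digits in the constraint value.
winChr <- format(win, scientific = FALSE, trim = TRUE)
winChr
```

The two elements of `winChr` could then be passed as the second and third entries of `values` in `setConstraints`.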

Run the query

resNeighborGene <- runQuery(im, queryNeighborGene)
resNeighborGene

##   Gene.primaryIdentifier Gene.symbol Gene.chromosome.primaryIdentifier
## 1              100616316    MIR4524A                                17
## 2              100847008    MIR4524B                                17
## 3                  23460       ABCA6                                17
##   Gene.locations.start Gene.locations.end Gene.locations.strand
## 1             69099564           69099632                    -1
## 2             69099542           69099656                     1
## 3             69062044           69141927                    -1
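The constraints as written (`Gene.locations.start >= ...` and `Gene.locations.end <= ...`) keep only genes lying entirely inside the window. A gene that straddles a window boundary is excluded. The sketch below contrasts containment with overlap on a small hypothetical table. The first two rows reuse coordinates from the output above, and the last two are invented for illustration.

```r
# Mock features on one chromosome; "straddlesLeft" and "outside" are invented.
features <- data.frame(
  symbol = c("ABCA6", "MIR4524A", "straddlesLeft", "outside"),
  start  = c(69062044, 69099564, 69000000, 69300000),
  end    = c(69141927, 69099632, 69020000, 69310000),
  stringsAsFactors = FALSE
)

winStart <- 69012044
winEnd   <- 69191927

# Containment: what start >= winStart AND end <= winEnd selects.
contained <- features$start >= winStart & features$end <= winEnd

# Overlap: any shared base, i.e. start <= winEnd AND end >= winStart.
overlapping <- features$start <= winEnd & features$end >= winStart

features$symbol[contained]
features$symbol[overlapping]
```

If overlapping genes are also of interest, the operators in `setConstraints` would be swapped accordingly: `<=` against the window end for `Gene.locations.start`, and `>=` against the window start for `Gene.locations.end`.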

Plot the genes

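# Recode strand from 1 / -1 to the "+" / "-" symbols used by Gviz
resNeighborGene$Gene.locations.strand[which(resNeighborGene$Gene.locations.strand == 1)] <- "+"
resNeighborGene$Gene.locations.strand[which(resNeighborGene$Gene.locations.strand == -1)] <- "-"

# Fall back to the primary identifier for genes without a symbol
gene.idx <- which(nchar(resNeighborGene$Gene.symbol) == 0)
resNeighborGene$Gene.symbol[gene.idx] <- resNeighborGene$Gene.primaryIdentifier[gene.idx]

# library() stops with an error if Gviz is missing; require() only warns
library(Gviz)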

annTrack = AnnotationTrack(
  start=resNeighborGene$Gene.locations.start,
  end=resNeighborGene$Gene.locations.end,
  strand=resNeighborGene$Gene.locations.strand,
  chromosome=resNeighborGene$Gene.chromosome.primaryIdentifier[1],
  genome="GRCh38", 
  name="around ABCA6",
  id=resNeighborGene$Gene.symbol)

gtr <- GenomeAxisTrack()
itr <- IdeogramTrack(genome="hg38", chromosome="chr17")

plotTracks(list(gtr, itr, annTrack), shape="box", showFeatureId=TRUE, fontcolor="black")

12 System info

sessionInfo()

## R version 4.0.5 (2021-03-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=C                  LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] grid      parallel  stats4    stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
## [1] Gviz_1.34.1          GenomicRanges_1.42.0 GenomeInfoDb_1.26.7 
## [4] IRanges_2.24.1       S4Vectors_0.28.1     BiocGenerics_0.36.1 
## [7] InterMineR_1.12.3    BiocStyle_2.18.1    
## 
## loaded via a namespace (and not attached):
##   [1] colorspace_2.0-0            ellipsis_0.3.1             
##   [3] biovizBase_1.38.0           htmlTable_2.1.0            
##   [5] XVector_0.30.0              base64enc_0.1-3            
##   [7] dichromat_2.0-0             rstudioapi_0.13            
##   [9] bit64_4.0.5                 AnnotationDbi_1.52.0       
##  [11] fansi_0.4.2                 sqldf_0.4-11               
##  [13] xml2_1.3.2                  splines_4.0.5              
##  [15] cachem_1.0.4                knitr_1.33                 
##  [17] Formula_1.2-4               jsonlite_1.7.2             
##  [19] Rsamtools_2.6.0             cluster_2.1.2              
##  [21] dbplyr_2.1.1                png_0.1-7                  
##  [23] BiocManager_1.30.12         compiler_4.0.5             
##  [25] httr_1.4.2                  backports_1.2.1            
##  [27] lazyeval_0.2.2              assertthat_0.2.1           
##  [29] Matrix_1.3-2                fastmap_1.1.0              
##  [31] htmltools_0.5.1.1           prettyunits_1.1.1          
##  [33] tools_4.0.5                 igraph_1.2.6               
##  [35] gtable_0.3.0                glue_1.4.2                 
##  [37] GenomeInfoDbData_1.2.4      dplyr_1.0.5                
##  [39] rappdirs_0.3.3              Rcpp_1.0.6                 
##  [41] Biobase_2.50.0              jquerylib_0.1.4            
##  [43] vctrs_0.3.7                 Biostrings_2.58.0          
##  [45] RJSONIO_1.3-1.4             rtracklayer_1.50.0         
##  [47] xfun_0.22                   stringr_1.4.0              
##  [49] proto_1.0.0                 lifecycle_1.0.0            
##  [51] ensembldb_2.14.1            XML_3.99-0.6               
##  [53] zlibbioc_1.36.0             scales_1.1.1               
##  [55] BSgenome_1.58.0             VariantAnnotation_1.36.0   
##  [57] ProtGenerics_1.22.0         hms_1.0.0                  
##  [59] MatrixGenerics_1.2.1        SummarizedExperiment_1.20.0
##  [61] AnnotationFilter_1.14.0     RColorBrewer_1.1-2         
##  [63] yaml_2.2.1                  curl_4.3                   
##  [65] memoise_2.0.0               gridExtra_2.3              
##  [67] ggplot2_3.3.3               sass_0.3.1                 
##  [69] biomaRt_2.46.3              rpart_4.1-15               
##  [71] latticeExtra_0.6-29         stringi_1.5.3              
##  [73] RSQLite_2.2.7               highr_0.9                  
##  [75] checkmate_2.0.0             GenomicFeatures_1.42.3     
##  [77] BiocParallel_1.24.1         chron_2.3-56               
##  [79] rlang_0.4.10                pkgconfig_2.0.3            
##  [81] matrixStats_0.58.0          bitops_1.0-7               
##  [83] evaluate_0.14               lattice_0.20-41            
##  [85] purrr_0.3.4                 GenomicAlignments_1.26.0   
##  [87] htmlwidgets_1.5.3           bit_4.0.4                  
##  [89] tidyselect_1.1.0            magrittr_2.0.1             
##  [91] bookdown_0.22               R6_2.5.0                   
##  [93] magick_2.7.1                generics_0.1.0             
##  [95] Hmisc_4.5-0                 DelayedArray_0.16.3        
##  [97] DBI_1.1.1                   gsubfn_0.7                 
##  [99] pillar_1.6.0                foreign_0.8-81             
## [101] survival_3.2-11             RCurl_1.98-1.3             
## [103] nnet_7.3-15                 tibble_3.1.1               
## [105] crayon_1.4.1                utf8_1.2.1                 
## [107] BiocFileCache_1.14.0        rmarkdown_2.7              
## [109] jpeg_0.1-8.1                progress_1.2.2             
## [111] data.table_1.14.0           blob_1.2.1                 
## [113] digest_0.6.27               openssl_1.4.3              
## [115] munsell_0.5.0               bslib_0.2.4                
## [117] askpass_1.1

13 Appendix

13.1 Visual way to derive the column name of a query view or the path name in a query constraint from the database webpage

The InterMine model can be accessed from the mine homepage by clicking the “QueryBuilder” tab and selecting the appropriate data type under “Select a Data Type to Begin a Query”.

Here we select Gene as the data type.

Select Symbol and Chromosome->Primary Identifier by clicking Show to the right of each, then click “Export XML” at the bottom right corner of the webpage.

The column names Gene.symbol and Gene.chromosome.primaryIdentifier are contained in the XML output.
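As a sketch of how the column names can be read out of the exported XML, the snippet below extracts the `view` attribute with base R string functions. The string `xmlText` is illustrative only: it is assumed to resemble a QueryBuilder export, and the attributes a given mine actually emits may differ.

```r
# Illustrative XML, assumed to be shaped like a QueryBuilder export.
xmlText <- '<query model="genomic" view="Gene.symbol Gene.chromosome.primaryIdentifier"></query>'

# Pull out the view="..." attribute, strip the wrapper, split on whitespace.
viewAttr <- regmatches(xmlText, regexpr('view="[^"]*"', xmlText))
viewAttr <- sub('^view="', "", sub('"$', "", viewAttr))
columns <- strsplit(viewAttr, "[[:space:]]+")[[1]]
columns
```

The resulting character vector has the same form as the `select` element of the queries used in the recipes above.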

InterMineR Vignette

28 April 2021

