aggregate_duplicates {tidybulk}R Documentation

Aggregates multiple counts from the same samples (e.g., from isoforms), concatenates other character columns, and averages other numeric columns

Description

aggregate_duplicates() takes as imput a 'tbl' formatted as | <SAMPLE> | <TRANSCRIPT> | <COUNT> | <...> | and returns a 'tbl' with aggregated transcripts that were duplicated.

Usage

aggregate_duplicates(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  aggregation_function = sum,
  keep_integer = TRUE
)

## S4 method for signature 'spec_tbl_df'
aggregate_duplicates(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  aggregation_function = sum,
  keep_integer = TRUE
)

## S4 method for signature 'tbl_df'
aggregate_duplicates(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  aggregation_function = sum,
  keep_integer = TRUE
)

## S4 method for signature 'tidybulk'
aggregate_duplicates(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  aggregation_function = sum,
  keep_integer = TRUE
)

## S4 method for signature 'SummarizedExperiment'
aggregate_duplicates(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  aggregation_function = sum,
  keep_integer = TRUE
)

## S4 method for signature 'RangedSummarizedExperiment'
aggregate_duplicates(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  aggregation_function = sum,
  keep_integer = TRUE
)

Arguments

.data

A 'tbl' formatted as | <SAMPLE> | <TRANSCRIPT> | <COUNT> | <...> |

.sample

The name of the sample column

.transcript

The name of the transcript/gene column

.abundance

The name of the transcript/gene abundance column

aggregation_function

A function for counts aggregation (e.g., sum, median, or mean)

keep_integer

A boolean. Whether to force the aggregated counts to integer

Details

Maturing lifecycle

This function aggregates duplicated transcripts (e.g., isoforms, ensembl). For example, we often have to convert ensembl symbols to gene/transcript symbol, but in doing so we have to deal with duplicates. 'aggregate_duplicates' takes a tibble and column names (as symbols; for 'sample', 'transcript' and 'count') as arguments and returns a tibble with aggregate transcript with the same name. All the rest of the column are appended, and factors and boolean are appended as characters.

Value

A 'tbl' object with aggregated transcript abundance and annotation

A 'tbl' object with aggregated transcript abundance and annotation

A 'tbl' object with aggregated transcript abundance and annotation

A 'tbl' object with aggregated transcript abundance and annotation

A 'SummarizedExperiment' object

A 'SummarizedExperiment' object

Examples


    aggregate_duplicates(
    tidybulk::counts_mini,
    sample,
    transcript,
    `count`,
    aggregation_function = sum
    )



[Package tidybulk version 1.0.2 Index]