v2.9.3
  Bugfix:
    - Fixed a CHECK error that was introduced by an addition to .Rbuildignore.

v2.9.2
    - Version bump to let BioC nightly build grab commit.

v2.9.1
    - Version bump for BioC devel release 3.1.

v2.8.2
  Bugfixes:
    - Removed reference to sqliteCloseConnection() (not exported by RSQLite 1.0.0) in vignette.

v2.8.1
  Bugfixes:
    - Made minimal changes for compatibility with RSQLite 1.0.0.

v2.7.3
  Bugfixes:
    - Fixed sigMatrix legend argument to comply with ggplot2 deprecations. No longer throws an error.
  New Features:
  Notes:
    - Trying out a few more indices to speed up queries using sampleIdList.

v2.7.1
  Bugfixes:
    - Fixed 'fullnames' argument to cuffData::*Matrix() methods so that it does what it's supposed to do.
    - Added 'showPool' argument to fpkmSCVPlot. When TRUE, empirical mean and standard deviation are determined across all conditions as opposed to cross-replicate. This is set to TRUE anytime you have n<2 replicates per condition.
    - Added stat="identity" to expressionBarplot to comply with ggplot 0.9.3 enforcement.
    - 'labels' argument to csScatter is now working as it's supposed to. You can pass a vector of 'gene_short_name' identifiers to labels and these will be specifically called out in red text on the scatterplot.
    - Added repFpkmMatrix() and replicates() methods to CuffFeature objects.
    - Removed unnecessary joins to optimize retrieval speed for several key queries.
    - Fixed bug in csVolcano matrix that forced ylimits to be c(0,15).
  New Features:
    - Added csNMF() method for CuffData and CuffFeatureSet objects to perform non-negative matrix factorization. As of now, it's merely a wrapper around the default settings for NMFN::nnmf(), but we hope to expand it in the future.
      * Does not adjust sparsity of matrices after output; this must be done by the user as needed.
    - Added csPie() method for CuffGene objects. Allows for visualization of relative isoform, CDS, and promoter usage proportions as a pie chart by condition (or optionally as stacked bar charts by adding + coord_cartesian()).
    - Added 'method' argument to csCluster and csHeatmap to allow custom distance functions for clustering. Default = "none" = JSdist(). You can now provide a function that returns a 'dist' object on rows of a matrix.
    - Added varModel.info tracking for compatibility with cuffdiff >=2.1. Will now find the varModel.info file if it exists, and incorporate it into the database.
    - dispersionPlot() method added for CuffSet objects. This now appropriately draws from varModel.info and is the preferred visualization for dispersion of RNA-Seq data with cummeRbund.
    - Added diffTable() method to CuffData and CuffFeatureSet objects to allow a 'one-table' snapshot of results for all Features (CuffData) or a set of Features (CuffFeatureSet). This table outputs key values including gene name, gene short name, expression estimates and per-comparison fold-change, p-value, q-value, and significance values (yes/no). A convenient 'data-dump' function to merge across several tables.
    - Added coercion methods for CuffGene objects to create GRanges and GRangesList objects (more BioC friendly!). Will work on making this possible on CuffFeatureSet and CuffFeature objects as well.
    - Added pass-through to select p.adjust method for getSig (method argument to getSig).
    - Added ability to revert to cuffdiff q-values for specific pairwise interrogations with getSig as opposed to re-calculating new ones (useCuffMTC; default=FALSE).
  Notes:
    - Removed generic for 'featureNames'. Now appropriately uses featureNames generic from Biobase. As a consequence, Biobase is now a dependency.
    - Added pass-through to as.dist(...) in JSdist(...).
    - Added 'logMode' argument to csClusterPlot.
    - Added 'showPoints' argument to PCAplot to allow disabling of gene values in PCA plot. If FALSE, only sample projections are plotted.
    - Added 'facet' argument to expressionPlot to disable faceting by feature_id.
    - shannon.entropy now uses log2 instead of log10 to constrain specificity scores between 0 and 1.
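The effect of the log2 change noted above can be illustrated with a short sketch. This is not cummeRbund code: it is a standalone Python illustration, assuming that the Jensen-Shannon distance is the square root of the Jensen-Shannon divergence and that the specificity score is 1 - JSdist(p, q), with p the normalized log(FPKM+1) profile and q the unit vector for a condition (the definition given in the v1.1.3 notes). The function names shannon_entropy, js_distance and specificity are hypothetical.

```python
import math


def shannon_entropy(p, base=2.0):
    """Shannon entropy of a probability vector; zero entries contribute nothing."""
    return -sum(x * math.log(x, base) for x in p if x > 0)


def js_distance(p, q, base=2.0):
    """Square root of the Jensen-Shannon divergence between p and q."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    jsd = shannon_entropy(m, base) - (shannon_entropy(p, base) + shannon_entropy(q, base)) / 2.0
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding errors


def specificity(fpkm):
    """Assumed form of the score: 1 - JS distance to each condition's unit vector."""
    logged = [math.log(x + 1.0) for x in fpkm]  # log base cancels on normalization
    total = sum(logged)
    if total == 0:
        raise ValueError("feature is not expressed in any condition")
    p = [x / total for x in logged]
    scores = []
    for i in range(len(p)):
        unit = [1.0 if j == i else 0.0 for j in range(len(p))]
        scores.append(1.0 - js_distance(p, unit))
    return scores


# Two completely disjoint distributions: the largest possible distance.
p = [1.0, 0.0]
q = [0.0, 1.0]
d2 = js_distance(p, q, base=2.0)    # base 2: the maximum distance is 1
d10 = js_distance(p, q, base=10.0)  # base 10: the maximum is sqrt(log10(2)), about 0.549

# A feature expressed in only the second of three conditions.
scores = specificity([0.0, 10.0, 0.0])
```

With base 2 the distance spans the full range from 0 to 1, so 1 - distance does too. With base 10 the distance tops out near 0.549, so the score could never drop below roughly 0.45.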
v1.99.6
  Notes:
    - 'annotation' and "annotation<-" generics were moved to BiocGenerics 0.3.2. Now using the appropriate generic function, but requiring BiocGenerics >= 0.3.2.

v1.99.5
  Bugfixes:
    - Added replicates argument to csDistHeat to view distances between individual replicate samples.
    - Now appropriately distinguishes between 'annotation' (external attributes) and features (gene-level sub-features).
    - csHeatmap now has 'method' argument to pass a function for any dissimilarity metric you desire. You must pass a function that returns a 'dist' object applied to rows of a matrix. Default is still JS-distance.

v1.99.3
  New Features:
    - Added diffTable() method to return a table of differential results broken out by pairwise comparison (more human-readable).
    - Added sigMatrix() method to CuffSet objects to draw a heatmap showing the number of significant genes by pairwise comparison at a given FDR.
    - A call to fpkm() now emits the calculated (model-derived) standard deviation field as well.
    - Can now pass a GTF file as argument to readCufflinks() to integrate transcript model information into the database backend.
      * Added requirement for rtracklayer and GenomicFeatures packages.
      * You must also indicate which genome build the .gtf was created against by using the 'genome' argument to readCufflinks.
    - Integration with Gviz:
      * CuffGene objects now have a makeGeneRegionTrack() method to create a GeneRegionTrack() from transcript model information.
      * Can also make a GRanges object.
      * ONLY WORKS IF YOU READ .gtf FILE IN WITH readCufflinks()
    - Added csScatterMatrix() and csVolcanoMatrix() methods to CuffData objects.
    - Added fpkmSCVPlot() as a CuffData method to visualize replicate-level coefficient of variation across fpkm range per condition.
    - Added PCAplot() and MDSplot() for dimensionality reduction visualizations (principal components and multi-dimensional scaling, respectively).
    - Added csDistHeat() to create a heatmap of JS-distances between conditions.
  Bugfixes:
    - Fixed diffData 'features' argument so that it now does what it's supposed to do.
    - Added DB() with signature(object="CuffSet") to NAMESPACE.
  Notes:
    - Once again, there have been modifications to the underlying database schema so you will have to re-run readCufflinks(rebuild=T) to re-analyze existing datasets.
    - Importing 'defaults' from plyr instead of requiring the entire package (keeps namespace cleaner).
    - Set pseudocount=0.0 as default for csDensity() and csScatter() methods (this prevents a visual bias for genes with FPKM <1, and ggplot2 handles removing true zero values).

v1.99.2
  Bugfixes:
    - Fixed bug in replicate table that did not apply make.db.names to match samples table.
    - Fixed bug for missing values in *.count_tracking files.
    - Now correctly applying make.db.names to *.read_group_tracking files.
    - Now correctly allows for empty *.count_tracking and *.read_group_tracking files.

v1.99.1
    - This represents a major set of improvements and feature additions to cummeRbund.
    - cummeRbund now incorporates additional information emitted from cuffdiff 2.0 including:
      - run parameters and information.
      - sample-level information such as mass and scaling factors.
      - individual replicate fpkms and associated statistics for all features.
      - raw and normalized count tables and associated statistics for all features.
  New Features:
    - Please see updated vignette for overview of new features.
    - New dispersionPlot() to visualize model fit (mean count vs dispersion) at all feature levels.
    - New runInfo() method returns cuffdiff run parameters.
    - New replicates() method returns a data.frame of replicate-level parameters and information.
    - getGene() and getGenes() can now take a list of any tracking_id or gene_short_name (not just gene_ids) to retrieve a gene or geneset.
    - Added getFeatures() method to retrieve a CuffFeatureSet independent of gene-level attributes. This is ideal for looking at sets of features outside of the context of all other gene-related information (i.e.
      facilitates feature-level analysis)
    - Replicate-level fpkm data now available.
    - Condition-level raw and normalized count data now available.
    - repFpkm(), repFpkmMatrix(), count(), and countMatrix() are new accessor methods to CuffData, CuffFeatureSet, and CuffFeature objects.
    - All relevant plots now have a logical 'replicates' argument (default = F) that when set to TRUE will expose replicate FPKM values in appropriate ways.
    - MAplot() now has 'useCount' argument to draw MA plots using count data as opposed to fpkm estimates.
  Notes:
    - Changed default csHeatmap colorscheme to the much more pleasing 'lightyellow' to 'darkred' through 'orange'.
    - SQLite journaling is no longer disabled by default (the benefits outweigh the moderate reduction in load times).
  Bugfixes:
    - Numerous miscellaneous bug fixes to improve consistency and performance for large datasets.

v1.2.1
  Bugfixes:
    - Fixed bug in CuffFeatureSet::expressionBarplot to make compatible with ggplot2 v0.9.
  New Features:
    - Added 'distThresh' argument to findSimilar. This allows you to retrieve all similar genes within a given JS distance as specified by distThresh.
    - Added 'returnGeneSet' argument to findSimilar [default = T]. If TRUE, findSimilar returns a CuffGeneSet of genes matching criteria (default). If FALSE, a rank-ordered data frame of JS distance values is returned.
    - findSimilar can now take a 'sampleIdList' argument. This should be a vector of sample names across which the distance between genes should be evaluated. This should be a subset of the output of samples(genes(cuff)).
  Notes:
    - Added requirement for 'fastcluster' package. There is very little footprint, and it makes a significant improvement in speed for the clustering analyses.

v1.1.5 / 1.2.0
  Bugfixes:
    - Fixed minor bug in database setup that caused instability with cuffdiff --no-diff argument.
    - Fixed bug in csDendro method for CuffData objects.

v1.1.4
  New Features:
    - Added MAplot() method to CuffData objects.
  Bugfixes:
    - Finished abrupt migration to reshape2. As a result, fixed a bug in which 'cast' was still required for several functions and could not be found. Now appropriately using 'dcast' or 'acast'.
    - Fixed minor bug in CuffFeature::fpkmMatrix.

v1.1.3
  New Features:
    - getSig() has been split into two functions: getSig() now returns a vector of ids (no longer a list of vectors), and getSigTable() returns a 'testTable' of binary values indicating whether or not a gene was significant in a particular comparison.
    - Added ability in getSig() to limit retrieval of significant genes to two provided conditions (arguments x & y). (Reduces time for function call if you have a specific comparison in mind a priori.)
      * When you specify x & y with getSig(), q-values are recalculated from just those selected tests to reduce impact of multiple testing correction.
      * If you do not specify x & y, getSig() will return a vector of tracking_ids for all comparisons (with appropriate MTC).
    - You can now specify an 'alpha' for getSig() and getSigTable() [0.05 by default to match cuffdiff default] by which to filter the resulting significance calls.
    - Added csSpecificity() method: This method returns a feature-X-condition matrix (same shape as fpkmMatrix) that provides a 'condition-specificity' score
      * defined as 1-(JSdist(p,q)) where p is the density of expression (probability vector of log(FPKM+1)) of a given gene across all conditions, and q is the unit vector for that condition (i.e. perfect expression in that particular condition)
      * specificity = 1.0 if the feature is expressed exclusively in that condition
    - Created csDendro() method: This method returns an object of class 'dendrogram' (and plots using grid) of JS distances between conditions for all genes in a CuffData, CuffGeneSet, or CuffFeatureSet object.
      * Useful for identifying relationships between conditions for subsets of features
    - New visual cues in several plot types that indicate the quantification status ('quant_stat' field) of a particular gene:condition. This information is useful to indicate whether or not to trust the expression values for a given gene under a specific condition, and may provide insight into outlier expression values.
      * This feature can be disabled by setting showStatus=F.
    - csDensity() is now available for CuffFeatureSet and CuffGeneSet objects.
  Bugfixes:
    - Fixed bug in getGenes that may have resulted in long query lag for retrieving promoter diffData. As a result, all calls to getGenes should be significantly faster.
    - CuffData fpkm argument 'features' now returns appropriate data.frame (includes previously un-reported data fields).
    - Replaced all instances of 'ln_fold_change' with the actual 'log2_fold_change'. Values were previously log2 fold change but database headers were not updated to reflect this.
    - Fixed bug that could cause readCufflinks() to die with error when using reshape2::melt instead of reshape::melt.
  Notes:
    - ***The structure of the underlying database has changed in this version. As a consequence, you must rebuild your cuffData.db file to use the new version. readCufflinks(rebuild=T)***
    - Updated vignette.
    - A 'fullnames' logical argument was added to fpkmMatrix. If TRUE, rownames for fpkmMatrix will be a concatenation of gene_short_name and tracking_id. This has the added benefit of making row labels in csHeatmap easier to read, as well as preserving uniqueness.
    - Slight speed improvements to JSdist (noticeable when using csCluster on large feature sets).
    - 'testTable' argument to getSig() has been dropped in favor of the new getSigTable() method.

v1.1.1
  Bugfixes:
    - Fixed issue in which there was no graceful error handling of missing CDS or TSS data in cuffdiff output.
    - Fixed issue in which distribution test data (promoters, splicing, relCDS) were not appropriately added to objects on creation.
    - Fixed bug that would sometimes cause csBoxplot() to throw an error when log-transforming fpkm data. Also added pseudocount argument.
    - Fixed bug that would cause diffData() to return a filtered subset of results by default.
    - Adjusted indexing of tables to improve performance on large datasets.
    - Fixed bug that caused diffData method to not be registered with CuffFeature and CuffGene objects.
    - Fixed bug that sometimes caused over-plotting of axis labels in csBarplots.
  New Features:
    - Added getSig method to CuffSet class for rapid retrieval of significant features from all pairwise tests (as a list of IDs). By default the level is 'genes' but any feature level can be queried.
    - csCluster now uses Jensen-Shannon distance by default (as opposed to Euclidean).
    - Added 'xlimits' argument to csVolcano to constrain plot dimensions.
    - Enforced requirement in csVolcano for x and y arguments (as sample names).
  Notes:
    - Changed dependency 'reshape' to 'reshape2'.
    - Changed the default orientation of expressionBarplot() for CuffFeatureSet objects.
    - Changed output of csCluster to a list format that includes clustering information. As a result, I created the function csClusterPlot to replace the previous default drawing behavior of csCluster. This allows for stable cluster analysis.
    - For consistency, the 'testId' slot for CuffDist objects was renamed to 'idField'. This brings the CuffDist class in line with the CuffData class.
    - CuffGene and CuffGeneSet now include slots for promoter, splicing, and relCDS distribution test results.

v1.0.0
    - Official public release. No changes from v0.99.5.

v0.99.5
    - Significant speed improvements to readCufflinks() for large cuffdiff datasets.
    - Tables written first then indexed.
    - Added slot accessor methods to avoid using slots directly.

v0.99.4
    - Second beta release and submission to Bioconductor.

v0.1.3
  Release 2011-08-18:
    - First Beta release of cummeRbund and submission to Bioconductor for review and hosting.