Compare differential expression results against CMap perturbations.

rankSimilarPerturbations(
  input,
  perturbations,
  method = c("spearman", "pearson", "gsea"),
  geneSize = 150,
  cellLineMean = "auto",
  rankPerCellLine = FALSE,
  threads = 1,
  chunkGiB = 1,
  verbose = FALSE
)

Arguments

input

Named numeric vector of differentially expressed genes, whose names are gene identifiers and whose values are a statistic representing both the significance and the magnitude of differential expression (e.g. t-statistics); or a character vector of gene symbols composing a gene set to be tested for enrichment in the reference data (only used if method includes gsea)

perturbations

perturbationChanges object: CMap perturbations (see prepareCMapPerturbations())

method

Character: comparison method (spearman, pearson or gsea; multiple methods may be selected at once)

geneSize

Numeric: number of top up-/down-regulated genes to use as gene sets to test for enrichment in reference data; if a 2-length numeric vector, the first element is the number of top up-regulated genes and the second element is the number of top down-regulated genes used to create gene sets; only used if method includes gsea and if input is not a gene set

cellLineMean

Boolean or "auto": add rows with the mean of each method's scores across cell lines? If cellLineMean = "auto" (default), rows will be added when data for more than one cell line is available.

rankPerCellLine

Boolean: rank results based on both individual cell lines and mean scores across cell lines (TRUE) or based on mean scores alone (FALSE)? If cellLineMean = FALSE, individual cell line conditions are always ranked.

threads

Integer: number of parallel threads

chunkGiB

Numeric: if the second argument (perturbations) is a path to an HDF5 file (.h5 extension), that file is loaded and processed in chunks of the given size in gibibytes (GiB); lower values decrease peak RAM usage (see "Process data by chunks" below)

verbose

Boolean: print additional details?
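
To make the geneSize argument more concrete, the following base R sketch shows one plausible way of turning a named numeric vector into top up- and down-regulated gene sets. The helper topGeneSets() and the toy gene names are hypothetical and only illustrate the idea; the package's own implementation may differ.

```r
# Hypothetical helper (not part of the package): split a named numeric
# vector of statistics into top up- and down-regulated gene sets
topGeneSets <- function(stat, geneSize = 150) {
    # A single number is reused for both directions
    geneSize <- rep(geneSize, length.out = 2)
    ordered  <- sort(stat, decreasing = TRUE)
    nUp      <- min(geneSize[1], length(ordered))
    nDown    <- min(geneSize[2], length(ordered))
    list(up   = names(head(ordered, nUp)),
         down = names(tail(ordered, nDown)))
}

# Toy statistics for six made-up genes
stat <- c(geneA = 3.2, geneB = -1.5, geneC = 0.4,
          geneD = -4.1, geneE = 2.0, geneF = -0.2)

# Two top up-regulated genes and one top down-regulated gene
sets <- topGeneSets(stat, geneSize = c(2, 1))
sets$up    # "geneA" "geneE"
sets$down  # "geneD"
```

Passing a single number, e.g. topGeneSets(stat, 2), uses the same size for both gene sets.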

Value

Data table with correlation and/or GSEA score results

Process data by chunks

If a file path to a valid HDF5 (.h5) file is provided instead of a data matrix, that file can be loaded and processed in chunks of size chunkGiB, resulting in decreased peak memory usage.

The default value of 1 GiB (1 GiB = 1024^3 bytes) allows loading chunks of ~10000 columns and 14000 rows (10000 * 14000 * 8 bytes / 1024^3 = 1.04 GiB).
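
The arithmetic above can be reproduced with a short base R sketch. The function names chunkSizeGiB() and columnsPerChunk() are hypothetical and assume that values are stored as 8-byte doubles; how the package actually splits a file into chunks may differ.

```r
# Memory needed for a matrix chunk of double-precision values (8 bytes each)
chunkSizeGiB <- function(nrow, ncol, bytesPerValue = 8) {
    nrow * ncol * bytesPerValue / 1024^3
}
chunkSizeGiB(14000, 10000)  # about 1.04

# Hypothetical inverse: how many whole columns fit in a chunk of chunkGiB?
columnsPerChunk <- function(chunkGiB, nrow, bytesPerValue = 8) {
    floor(chunkGiB * 1024^3 / (nrow * bytesPerValue))
}
columnsPerChunk(1, 14000)    # 9586
columnsPerChunk(0.5, 14000)  # 4793
```

Halving chunkGiB roughly halves the number of columns held in memory at once, which is why lower values reduce peak RAM usage.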

GSEA score

When method = "gsea", weighted connectivity scores (WTCS) are calculated (https://clue.io/connectopedia/cmap_algorithms).
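
As a rough illustration, the WTCS described in the CMap documentation linked above combines the enrichment scores of the up- and down-regulated gene sets: it is half their difference when the two scores have opposite signs, and 0 otherwise. The sketch below assumes the two enrichment scores (esUp and esDown, hypothetical names) have already been computed; it is not taken from the package's source code.

```r
# Sketch of the weighted connectivity score (WTCS) given precomputed
# enrichment scores for the up- and down-regulated gene sets
wtcs <- function(esUp, esDown) {
    # Non-zero only when the two enrichment scores have opposite signs
    ifelse(sign(esUp) != sign(esDown), (esUp - esDown) / 2, 0)
}

wtcs(0.6, -0.4)   # opposite signs: (0.6 - (-0.4)) / 2 = 0.5
wtcs(0.6, 0.2)    # same sign: 0
wtcs(-0.3, 0.5)   # opposite signs: (-0.3 - 0.5) / 2 = -0.4
```

Under this definition, a score of exactly 0 indicates that both gene sets were enriched in the same direction, which would be consistent with the zeros in the GSEA column of the example output.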

Examples

# Example of a differential expression profile
data("diffExprStat")

if (FALSE) {
# Download and load CMap perturbations to compare with
cellLine <- c("HepG2", "HUH7")
cmapMetadataCompounds <- filterCMapMetadata(
    "cmapMetadata.txt", cellLine=cellLine, timepoint="24 h",
    dosage="5 \u00B5M", perturbationType="Compound")

cmapPerturbationsCompounds <- prepareCMapPerturbations(
    cmapMetadataCompounds, "cmapZscores.gctx", "cmapGeneInfo.txt",
    "cmapCompoundInfo_drugs.txt", loadZscores=TRUE)
}
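# The block above is not run, so cmapPerturbationsCompounds must already be
# available in the session
perturbations <- cmapPerturbationsCompounds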

# Rank similar CMap perturbations (by default, Spearman's and Pearson's
# correlations are used, as well as GSEA with the top and bottom 150 genes of
# the differential expression profile used as gene sets)
rankSimilarPerturbations(diffExprStat, perturbations)
#> Subsetting data based on 8790 intersecting genes (65% of the 13451 input genes)...
#> Comparing against 22 CMap perturbations (2 cell lines) using 'spearman, pearson, gsea' (gene size of 150)...
#> Comparison performed in 0.85 secs
#>                              compound_perturbation spearman_coef
#>                                             <char>         <num>
#>  1:          CVD001_24H:BRD-A14014306-001-01-1:4.1   0.224252829
#>  2:       CVD001_24H:BRD-K84595254-001-03-0:4.9444   0.093416734
#>  3:          CVD001_24H:BRD-K41172353-001-01-4:4.7  -0.007440278
#>  4:        CVD001_24H:BRD-K84389640-001-01-5:4.225  -0.013429614
#>  5:       CVD001_24H:BRD-K96188950-001-04-5:4.3967  -0.020549215
#>  6:        CVD001_24H:BRD-K77508012-001-01-9:6.025  -0.063221230
#>  7:       CVD001_24H:BRD-K62810658-001-05-6:4.6768  -0.067773347
#>  8:         CVD001_24H:BRD-K31030218-001-01-1:4.25  -0.077835425
#>  9:       CVD001_24H:BRD-K60476892-001-02-1:4.1072  -0.057954380
#> 10:          CVD001_24H:BRD-K94818765-001-01-0:4.8  -0.074453675
#> 11:         CVD001_24H:BRD-A65142661-034-01-8:5.35  -0.115903134
#> 12:    CVD001_HEPG2_24H:BRD-A14014306-001-01-1:4.1   0.225335666
#> 13:   CVD001_HEPG2_24H:BRD-A65142661-034-01-8:5.35  -0.164686339
#> 14:   CVD001_HEPG2_24H:BRD-K31030218-001-01-1:4.25  -0.125531451
#> 15:    CVD001_HEPG2_24H:BRD-K41172353-001-01-4:4.7   0.005161915
#> 16: CVD001_HEPG2_24H:BRD-K60476892-001-02-1:4.1072  -0.077832390
#> 17: CVD001_HEPG2_24H:BRD-K62810658-001-05-6:4.6768  -0.050201852
#> 18:  CVD001_HEPG2_24H:BRD-K77508012-001-01-9:6.025  -0.094208866
#> 19:  CVD001_HEPG2_24H:BRD-K84389640-001-01-5:4.225  -0.078331814
#> 20: CVD001_HEPG2_24H:BRD-K84595254-001-03-0:4.9444   0.085133810
#> 21:    CVD001_HEPG2_24H:BRD-K94818765-001-01-0:4.8  -0.113418048
#> 22: CVD001_HEPG2_24H:BRD-K96188950-001-04-5:4.3967  -0.037174082
#> 23:     CVD001_HUH7_24H:BRD-A14014306-001-01-1:4.1   0.223169991
#> 24:    CVD001_HUH7_24H:BRD-A65142661-034-01-8:5.35  -0.067119930
#> 25:    CVD001_HUH7_24H:BRD-K31030218-001-01-1:4.25  -0.030139400
#> 26:     CVD001_HUH7_24H:BRD-K41172353-001-01-4:4.7  -0.020042471
#> 27:  CVD001_HUH7_24H:BRD-K60476892-001-02-1:4.1072  -0.038076371
#> 28:  CVD001_HUH7_24H:BRD-K62810658-001-05-6:4.6768  -0.085344842
#> 29:   CVD001_HUH7_24H:BRD-K77508012-001-01-9:6.025  -0.032233593
#> 30:   CVD001_HUH7_24H:BRD-K84389640-001-01-5:4.225   0.051472586
#> 31:  CVD001_HUH7_24H:BRD-K84595254-001-03-0:4.9444   0.101699658
#> 32:     CVD001_HUH7_24H:BRD-K94818765-001-01-0:4.8  -0.035489302
#> 33:  CVD001_HUH7_24H:BRD-K96188950-001-04-5:4.3967  -0.003924349
#>                              compound_perturbation spearman_coef
#>     spearman_pvalue spearman_qvalue pearson_coef pearson_pvalue pearson_qvalue
#>               <num>           <num>        <num>          <num>          <num>
#>  1:              NA              NA  0.226788992             NA             NA
#>  2:              NA              NA  0.078584431             NA             NA
#>  3:              NA              NA -0.012120187             NA             NA
#>  4:              NA              NA -0.016435243             NA             NA
#>  5:              NA              NA -0.025128574             NA             NA
#>  6:              NA              NA -0.052288195             NA             NA
#>  7:              NA              NA -0.059565207             NA             NA
#>  8:              NA              NA -0.079194543             NA             NA
#>  9:              NA              NA -0.054194930             NA             NA
#> 10:              NA              NA -0.070396828             NA             NA
#> 11:              NA              NA -0.104553349             NA             NA
#> 12:   1.361508e-101   2.995318e-100  0.226406681  1.442316e-102  1.586548e-101
#> 13:    1.741438e-54    1.277054e-53 -0.147970604   3.242185e-44   2.377602e-43
#> 14:    3.293334e-32    1.811334e-31 -0.120540610   8.215231e-30   4.518377e-29
#> 15:    6.284641e-01    6.583910e-01 -0.012675137   2.347404e-01   2.582144e-01
#> 16:    2.727173e-13    5.454346e-13 -0.062540590   4.400100e-09   8.800200e-09
#> 17:    2.489663e-06    3.912327e-06 -0.025518430   1.673267e-02   2.045104e-02
#> 18:    8.674931e-19    2.726407e-18 -0.082830711   7.364690e-15   2.025290e-14
#> 19:    1.919930e-13    4.223846e-13 -0.082640252   8.485519e-15   2.074238e-14
#> 20:    1.294659e-15    3.164722e-15  0.068343991   1.415829e-10   3.114825e-10
#> 21:    1.461350e-26    6.429942e-26 -0.110539100   2.640853e-25   1.161975e-24
#> 22:    4.903648e-04    6.742517e-04 -0.045806246   1.737439e-05   2.548244e-05
#> 23:    1.229995e-99    1.352995e-98  0.227171304  2.883308e-103  6.343277e-102
#> 24:    2.994988e-10    5.490811e-10 -0.061136095   9.672983e-09   1.773380e-08
#> 25:    4.713922e-03    5.458226e-03 -0.037848476   3.863289e-04   5.312023e-04
#> 26:    6.024403e-02    6.626843e-02 -0.011565237   2.782846e-01   2.915363e-01
#> 27:    3.561138e-04    5.223003e-04 -0.045849271   1.706068e-05   2.548244e-05
#> 28:    1.101426e-15    3.028921e-15 -0.093611984   1.434812e-18   5.260979e-18
#> 29:    2.507656e-03    3.064913e-03 -0.021745678   4.147826e-02   4.802746e-02
#> 30:    1.376852e-06    2.330056e-06  0.049769766   3.035636e-06   5.137229e-06
#> 31:    1.197236e-21    4.389864e-21  0.088824872   7.237642e-17   2.274688e-16
#> 32:    8.751127e-04    1.132499e-03 -0.030254556   4.557359e-03   5.897759e-03
#> 33:    7.129654e-01    7.129654e-01 -0.004450902   6.765050e-01   6.765050e-01
#>     spearman_pvalue spearman_qvalue pearson_coef pearson_pvalue pearson_qvalue
#>            GSEA spearman_rank pearson_rank GSEA_rank rankProduct_rank
#>           <num>         <num>        <num>     <num>            <num>
#>  1:  0.49159830             1            1         1                1
#>  2:  0.10895716             2            2         2                2
#>  3:  0.00000000             3            3         4                3
#>  4:  0.00000000             4            4         4                4
#>  5: -0.14106687             5            5         9                5
#>  6: -0.12213138             7            6         7                6
#>  7: -0.03895537             8            8         6                7
#>  8:  0.00000000            10           10         4                8
#>  9: -0.14171766             6            7        10                9
#> 10: -0.13105820             9            9         8               10
#> 11: -0.16906703            11           11        11               11
#> 12:  0.49328365            NA           NA        NA               NA
#> 13: -0.33813405            NA           NA        NA               NA
#> 14:  0.00000000            NA           NA        NA               NA
#> 15:  0.00000000            NA           NA        NA               NA
#> 16:  0.00000000            NA           NA        NA               NA
#> 17:  0.23045793            NA           NA        NA               NA
#> 18: -0.24426276            NA           NA        NA               NA
#> 19:  0.00000000            NA           NA        NA               NA
#> 20:  0.21791433            NA           NA        NA               NA
#> 21: -0.26211640            NA           NA        NA               NA
#> 22: -0.28213373            NA           NA        NA               NA
#> 23:  0.48991295            NA           NA        NA               NA
#> 24:  0.00000000            NA           NA        NA               NA
#> 25:  0.00000000            NA           NA        NA               NA
#> 26:  0.00000000            NA           NA        NA               NA
#> 27: -0.28343532            NA           NA        NA               NA
#> 28: -0.30836866            NA           NA        NA               NA
#> 29:  0.00000000            NA           NA        NA               NA
#> 30:  0.00000000            NA           NA        NA               NA
#> 31:  0.00000000            NA           NA        NA               NA
#> 32:  0.00000000            NA           NA        NA               NA
#> 33:  0.00000000            NA           NA        NA               NA
#>            GSEA spearman_rank pearson_rank GSEA_rank rankProduct_rank

# Rank similar CMap perturbations using only Spearman's correlation
rankSimilarPerturbations(diffExprStat, perturbations, method="spearman")
#> Subsetting data based on 8790 intersecting genes (65% of the 13451 input genes)...
#> Comparing against 22 CMap perturbations (2 cell lines) using 'spearman'...
#> Comparison performed in 0.52 secs
#>                              compound_perturbation spearman_coef
#>                                             <char>         <num>
#>  1:          CVD001_24H:BRD-A14014306-001-01-1:4.1   0.224252829
#>  2:       CVD001_24H:BRD-K84595254-001-03-0:4.9444   0.093416734
#>  3:          CVD001_24H:BRD-K41172353-001-01-4:4.7  -0.007440278
#>  4:        CVD001_24H:BRD-K84389640-001-01-5:4.225  -0.013429614
#>  5:       CVD001_24H:BRD-K96188950-001-04-5:4.3967  -0.020549215
#>  6:       CVD001_24H:BRD-K60476892-001-02-1:4.1072  -0.057954380
#>  7:        CVD001_24H:BRD-K77508012-001-01-9:6.025  -0.063221230
#>  8:       CVD001_24H:BRD-K62810658-001-05-6:4.6768  -0.067773347
#>  9:          CVD001_24H:BRD-K94818765-001-01-0:4.8  -0.074453675
#> 10:         CVD001_24H:BRD-K31030218-001-01-1:4.25  -0.077835425
#> 11:         CVD001_24H:BRD-A65142661-034-01-8:5.35  -0.115903134
#> 12:    CVD001_HEPG2_24H:BRD-A14014306-001-01-1:4.1   0.225335666
#> 13:   CVD001_HEPG2_24H:BRD-A65142661-034-01-8:5.35  -0.164686339
#> 14:   CVD001_HEPG2_24H:BRD-K31030218-001-01-1:4.25  -0.125531451
#> 15:    CVD001_HEPG2_24H:BRD-K41172353-001-01-4:4.7   0.005161915
#> 16: CVD001_HEPG2_24H:BRD-K60476892-001-02-1:4.1072  -0.077832390
#> 17: CVD001_HEPG2_24H:BRD-K62810658-001-05-6:4.6768  -0.050201852
#> 18:  CVD001_HEPG2_24H:BRD-K77508012-001-01-9:6.025  -0.094208866
#> 19:  CVD001_HEPG2_24H:BRD-K84389640-001-01-5:4.225  -0.078331814
#> 20: CVD001_HEPG2_24H:BRD-K84595254-001-03-0:4.9444   0.085133810
#> 21:    CVD001_HEPG2_24H:BRD-K94818765-001-01-0:4.8  -0.113418048
#> 22: CVD001_HEPG2_24H:BRD-K96188950-001-04-5:4.3967  -0.037174082
#> 23:     CVD001_HUH7_24H:BRD-A14014306-001-01-1:4.1   0.223169991
#> 24:    CVD001_HUH7_24H:BRD-A65142661-034-01-8:5.35  -0.067119930
#> 25:    CVD001_HUH7_24H:BRD-K31030218-001-01-1:4.25  -0.030139400
#> 26:     CVD001_HUH7_24H:BRD-K41172353-001-01-4:4.7  -0.020042471
#> 27:  CVD001_HUH7_24H:BRD-K60476892-001-02-1:4.1072  -0.038076371
#> 28:  CVD001_HUH7_24H:BRD-K62810658-001-05-6:4.6768  -0.085344842
#> 29:   CVD001_HUH7_24H:BRD-K77508012-001-01-9:6.025  -0.032233593
#> 30:   CVD001_HUH7_24H:BRD-K84389640-001-01-5:4.225   0.051472586
#> 31:  CVD001_HUH7_24H:BRD-K84595254-001-03-0:4.9444   0.101699658
#> 32:     CVD001_HUH7_24H:BRD-K94818765-001-01-0:4.8  -0.035489302
#> 33:  CVD001_HUH7_24H:BRD-K96188950-001-04-5:4.3967  -0.003924349
#>                              compound_perturbation spearman_coef
#>     spearman_pvalue spearman_qvalue spearman_rank
#>               <num>           <num>         <num>
#>  1:              NA              NA             1
#>  2:              NA              NA             2
#>  3:              NA              NA             3
#>  4:              NA              NA             4
#>  5:              NA              NA             5
#>  6:              NA              NA             6
#>  7:              NA              NA             7
#>  8:              NA              NA             8
#>  9:              NA              NA             9
#> 10:              NA              NA            10
#> 11:              NA              NA            11
#> 12:   1.361508e-101   2.995318e-100            NA
#> 13:    1.741438e-54    1.277054e-53            NA
#> 14:    3.293334e-32    1.811334e-31            NA
#> 15:    6.284641e-01    6.583910e-01            NA
#> 16:    2.727173e-13    5.454346e-13            NA
#> 17:    2.489663e-06    3.912327e-06            NA
#> 18:    8.674931e-19    2.726407e-18            NA
#> 19:    1.919930e-13    4.223846e-13            NA
#> 20:    1.294659e-15    3.164722e-15            NA
#> 21:    1.461350e-26    6.429942e-26            NA
#> 22:    4.903648e-04    6.742517e-04            NA
#> 23:    1.229995e-99    1.352995e-98            NA
#> 24:    2.994988e-10    5.490811e-10            NA
#> 25:    4.713922e-03    5.458226e-03            NA
#> 26:    6.024403e-02    6.626843e-02            NA
#> 27:    3.561138e-04    5.223003e-04            NA
#> 28:    1.101426e-15    3.028921e-15            NA
#> 29:    2.507656e-03    3.064913e-03            NA
#> 30:    1.376852e-06    2.330056e-06            NA
#> 31:    1.197236e-21    4.389864e-21            NA
#> 32:    8.751127e-04    1.132499e-03            NA
#> 33:    7.129654e-01    7.129654e-01            NA
#>     spearman_pvalue spearman_qvalue spearman_rank