R/CMap.R
rankSimilarPerturbations.Rd
Compare differential expression results against CMap perturbations.
rankSimilarPerturbations(
    input,
    perturbations,
    method = c("spearman", "pearson", "gsea"),
    geneSize = 150,
    cellLineMean = "auto",
    rankPerCellLine = FALSE,
    threads = 1,
    chunkGiB = 1,
    verbose = FALSE
)
input
Named numeric vector of differentially expressed genes whose names are gene
identifiers and respective values are a statistic that represents significance
and magnitude of differentially expressed genes (e.g. t-statistics); or
character of gene symbols composing a gene set that is tested for enrichment in
reference data (only used if method includes gsea)
perturbations
perturbationChanges object: CMap perturbations (check
prepareCMapPerturbations())
method
Character: comparison method (spearman, pearson or gsea; multiple methods may
be selected at once)
geneSize
Numeric: number of top up-/down-regulated genes to use as gene sets to test for
enrichment in reference data; if a 2-length numeric vector, the first index is
the number of top up-regulated genes and the second index is the number of
down-regulated genes used to create gene sets; only used if method includes
gsea and if input is not a gene set
cellLineMean
Boolean: add rows with the mean of method across cell lines? If
cellLineMean = "auto" (default), rows will be added when data for more than one
cell line is available.
rankPerCellLine
Boolean: rank results based on both individual cell lines and mean scores
across cell lines (TRUE) or based on mean scores alone (FALSE)? If
cellLineMean = FALSE, individual cell line conditions are always ranked.
threads
Integer: number of parallel threads
chunkGiB
Numeric: if second argument is a path to an HDF5 file (.h5 extension), that
file is loaded and processed in chunks of a given size in gibibytes (GiB);
lower values decrease peak RAM usage (see details below)
verbose
Boolean: print additional details?
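As an illustration of the two accepted forms of input and of how geneSize could translate into gene sets, here is a minimal sketch in base R. The gene names, the statistics and the helper topGenes() are hypothetical and not part of the package; the package's internal selection may differ.

```r
# Toy differential expression statistics (hypothetical values), named by gene
stat <- c(TP53 = 4.2, MYC = -3.1, EGFR = 2.5, BRCA1 = -0.4, KRAS = 0.9,
          PTEN = -2.2)

# Hypothetical helper: names of the top up- and down-regulated genes
topGenes <- function(stat, geneSize) {
    # A single value is reused for both directions
    geneSize <- rep(geneSize, length.out = 2)
    sorted   <- sort(stat, decreasing = TRUE)
    list(up   = names(head(sorted, geneSize[1])),
         down = names(tail(sorted, geneSize[2])))
}

# First index: number of up-regulated genes; second: down-regulated genes
sets <- topGenes(stat, geneSize = c(2, 1))
sets$up    # "TP53" "EGFR"
sets$down  # "MYC"

# Alternatively, input may be a gene set (character vector of gene symbols)
geneSet <- c("TP53", "EGFR", "KRAS")
```

With a named numeric vector, the gene sets are derived from the ranked statistics; with a character vector, the gene set is used directly and geneSize is not needed.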
Data table with correlation and/or GSEA score results
If a file path to a valid HDF5 (.h5) file is provided instead of a data matrix,
that file can be loaded and processed in chunks of size chunkGiB, resulting in
decreased peak memory usage.
The default value of 1 GiB (1 GiB = 1024^3 bytes) allows loading chunks of
~10000 columns and 14000 rows (10000 * 14000 * 8 bytes / 1024^3 = 1.04 GiB).
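The chunk size arithmetic can be checked with a short base R sketch. The helpers matrixGiB() and colsPerChunk() are hypothetical names used only for illustration, and the calculation assumes values are stored as 8-byte doubles.

```r
# Approximate memory (in GiB) of a numeric matrix stored as 8-byte doubles
matrixGiB <- function(rows, cols, bytesPerValue = 8) {
    rows * cols * bytesPerValue / 1024^3
}
gib <- matrixGiB(rows = 14000, cols = 10000)
round(gib, 2)  # 1.04

# Hypothetical helper: number of whole columns that fit in a chunk
colsPerChunk <- function(chunkGiB, rows, bytesPerValue = 8) {
    floor(chunkGiB * 1024^3 / (rows * bytesPerValue))
}
nCols <- colsPerChunk(chunkGiB = 1, rows = 14000)
nCols  # 9586
```

Halving chunkGiB roughly halves the number of columns held in memory at once, which is why lower values reduce peak RAM usage.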
When method = "gsea", weighted connectivity scores (WTCS) are calculated
(https://clue.io/connectopedia/cmap_algorithms).
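As a rough illustration of the score described at that link, WTCS combines the enrichment scores of the up- and down-regulated gene sets: it is half their difference when the two scores have opposite signs, and 0 otherwise. The sketch below assumes the enrichment scores are already computed; the function wtcs() and the input values are hypothetical, and the package's own implementation may differ in detail.

```r
# Weighted connectivity score from two enrichment scores (ES)
#   esUp:   ES of the up-regulated gene set
#   esDown: ES of the down-regulated gene set
wtcs <- function(esUp, esDown) {
    ifelse(sign(esUp) != sign(esDown), (esUp - esDown) / 2, 0)
}

wtcs(0.6, -0.4)   # opposite signs: (0.6 - (-0.4)) / 2 = 0.5
wtcs(0.3, 0.2)    # same sign: 0
wtcs(-0.5, 0.3)   # opposite signs: (-0.5 - 0.3) / 2 = -0.4
```

Scores that are exactly 0 therefore arise whenever both gene sets are enriched in the same direction.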
Other functions related to the ranking of CMap perturbations:
as.table.referenceComparison(), filterCMapMetadata(), getCMapConditions(),
getCMapPerturbationTypes(), loadCMapData(), loadCMapZscores(), parseCMapID(),
plot.perturbationChanges(), plot.referenceComparison(),
plotTargetingDrugsVSsimilarPerturbations(), prepareCMapPerturbations(),
print.similarPerturbations()
# Example of a differential expression profile
data("diffExprStat")
if (FALSE) { # \dontrun{
    # Download and load CMap perturbations to compare with
    cellLine <- c("HepG2", "HUH7")
    cmapMetadataCompounds <- filterCMapMetadata(
        "cmapMetadata.txt", cellLine=cellLine, timepoint="24 h",
        dosage="5 \u00B5M", perturbationType="Compound")
    cmapPerturbationsCompounds <- prepareCMapPerturbations(
        cmapMetadataCompounds, "cmapZscores.gctx", "cmapGeneInfo.txt",
        "cmapCompoundInfo_drugs.txt", loadZscores=TRUE)
} # }
perturbations <- cmapPerturbationsCompounds
# Rank similar CMap perturbations (by default, Spearman's and Pearson's
# correlation are used, as well as GSEA with the top and bottom 150 genes of
# the differential expression profile used as reference)
rankSimilarPerturbations(diffExprStat, perturbations)
#> Subsetting data based on 8790 intersecting genes (65% of the 13451 input genes)...
#> Comparing against 22 CMap perturbations (2 cell lines) using 'spearman, pearson, gsea' (gene size of 150)...
#> Comparison performed in 0.78 secs
#> compound_perturbation spearman_coef
#> <char> <num>
#> 1: CVD001_24H:BRD-A14014306-001-01-1:4.1 0.224252829
#> 2: CVD001_24H:BRD-K84595254-001-03-0:4.9444 0.093416734
#> 3: CVD001_24H:BRD-K41172353-001-01-4:4.7 -0.007440278
#> 4: CVD001_24H:BRD-K84389640-001-01-5:4.225 -0.013429614
#> 5: CVD001_24H:BRD-K96188950-001-04-5:4.3967 -0.020549215
#> 6: CVD001_24H:BRD-K77508012-001-01-9:6.025 -0.063221230
#> 7: CVD001_24H:BRD-K62810658-001-05-6:4.6768 -0.067773347
#> 8: CVD001_24H:BRD-K31030218-001-01-1:4.25 -0.077835425
#> 9: CVD001_24H:BRD-K60476892-001-02-1:4.1072 -0.057954380
#> 10: CVD001_24H:BRD-K94818765-001-01-0:4.8 -0.074453675
#> 11: CVD001_24H:BRD-A65142661-034-01-8:5.35 -0.115903134
#> 12: CVD001_HEPG2_24H:BRD-A14014306-001-01-1:4.1 0.225335666
#> 13: CVD001_HEPG2_24H:BRD-A65142661-034-01-8:5.35 -0.164686339
#> 14: CVD001_HEPG2_24H:BRD-K31030218-001-01-1:4.25 -0.125531451
#> 15: CVD001_HEPG2_24H:BRD-K41172353-001-01-4:4.7 0.005161915
#> 16: CVD001_HEPG2_24H:BRD-K60476892-001-02-1:4.1072 -0.077832390
#> 17: CVD001_HEPG2_24H:BRD-K62810658-001-05-6:4.6768 -0.050201852
#> 18: CVD001_HEPG2_24H:BRD-K77508012-001-01-9:6.025 -0.094208866
#> 19: CVD001_HEPG2_24H:BRD-K84389640-001-01-5:4.225 -0.078331814
#> 20: CVD001_HEPG2_24H:BRD-K84595254-001-03-0:4.9444 0.085133810
#> 21: CVD001_HEPG2_24H:BRD-K94818765-001-01-0:4.8 -0.113418048
#> 22: CVD001_HEPG2_24H:BRD-K96188950-001-04-5:4.3967 -0.037174082
#> 23: CVD001_HUH7_24H:BRD-A14014306-001-01-1:4.1 0.223169991
#> 24: CVD001_HUH7_24H:BRD-A65142661-034-01-8:5.35 -0.067119930
#> 25: CVD001_HUH7_24H:BRD-K31030218-001-01-1:4.25 -0.030139400
#> 26: CVD001_HUH7_24H:BRD-K41172353-001-01-4:4.7 -0.020042471
#> 27: CVD001_HUH7_24H:BRD-K60476892-001-02-1:4.1072 -0.038076371
#> 28: CVD001_HUH7_24H:BRD-K62810658-001-05-6:4.6768 -0.085344842
#> 29: CVD001_HUH7_24H:BRD-K77508012-001-01-9:6.025 -0.032233593
#> 30: CVD001_HUH7_24H:BRD-K84389640-001-01-5:4.225 0.051472586
#> 31: CVD001_HUH7_24H:BRD-K84595254-001-03-0:4.9444 0.101699658
#> 32: CVD001_HUH7_24H:BRD-K94818765-001-01-0:4.8 -0.035489302
#> 33: CVD001_HUH7_24H:BRD-K96188950-001-04-5:4.3967 -0.003924349
#> compound_perturbation spearman_coef
#> spearman_pvalue spearman_qvalue pearson_coef pearson_pvalue pearson_qvalue
#> <num> <num> <num> <num> <num>
#> 1: NA NA 0.226788992 NA NA
#> 2: NA NA 0.078584431 NA NA
#> 3: NA NA -0.012120187 NA NA
#> 4: NA NA -0.016435243 NA NA
#> 5: NA NA -0.025128574 NA NA
#> 6: NA NA -0.052288195 NA NA
#> 7: NA NA -0.059565207 NA NA
#> 8: NA NA -0.079194543 NA NA
#> 9: NA NA -0.054194930 NA NA
#> 10: NA NA -0.070396828 NA NA
#> 11: NA NA -0.104553349 NA NA
#> 12: 1.361508e-101 2.995318e-100 0.226406681 1.442316e-102 1.586548e-101
#> 13: 1.741438e-54 1.277054e-53 -0.147970604 3.242185e-44 2.377602e-43
#> 14: 3.293334e-32 1.811334e-31 -0.120540610 8.215231e-30 4.518377e-29
#> 15: 6.284641e-01 6.583910e-01 -0.012675137 2.347404e-01 2.582144e-01
#> 16: 2.727173e-13 5.454346e-13 -0.062540590 4.400100e-09 8.800200e-09
#> 17: 2.489663e-06 3.912327e-06 -0.025518430 1.673267e-02 2.045104e-02
#> 18: 8.674931e-19 2.726407e-18 -0.082830711 7.364690e-15 2.025290e-14
#> 19: 1.919930e-13 4.223846e-13 -0.082640252 8.485519e-15 2.074238e-14
#> 20: 1.294659e-15 3.164722e-15 0.068343991 1.415829e-10 3.114825e-10
#> 21: 1.461350e-26 6.429942e-26 -0.110539100 2.640853e-25 1.161975e-24
#> 22: 4.903648e-04 6.742517e-04 -0.045806246 1.737439e-05 2.548244e-05
#> 23: 1.229995e-99 1.352995e-98 0.227171304 2.883308e-103 6.343277e-102
#> 24: 2.994988e-10 5.490811e-10 -0.061136095 9.672983e-09 1.773380e-08
#> 25: 4.713922e-03 5.458226e-03 -0.037848476 3.863289e-04 5.312023e-04
#> 26: 6.024403e-02 6.626843e-02 -0.011565237 2.782846e-01 2.915363e-01
#> 27: 3.561138e-04 5.223003e-04 -0.045849271 1.706068e-05 2.548244e-05
#> 28: 1.101426e-15 3.028921e-15 -0.093611984 1.434812e-18 5.260979e-18
#> 29: 2.507656e-03 3.064913e-03 -0.021745678 4.147826e-02 4.802746e-02
#> 30: 1.376852e-06 2.330056e-06 0.049769766 3.035636e-06 5.137229e-06
#> 31: 1.197236e-21 4.389864e-21 0.088824872 7.237642e-17 2.274688e-16
#> 32: 8.751127e-04 1.132499e-03 -0.030254556 4.557359e-03 5.897759e-03
#> 33: 7.129654e-01 7.129654e-01 -0.004450902 6.765050e-01 6.765050e-01
#> spearman_pvalue spearman_qvalue pearson_coef pearson_pvalue pearson_qvalue
#> GSEA spearman_rank pearson_rank GSEA_rank rankProduct_rank
#> <num> <num> <num> <num> <num>
#> 1: 0.49159830 1 1 1 1
#> 2: 0.10895716 2 2 2 2
#> 3: 0.00000000 3 3 4 3
#> 4: 0.00000000 4 4 4 4
#> 5: -0.14106687 5 5 9 5
#> 6: -0.12213138 7 6 7 6
#> 7: -0.03895537 8 8 6 7
#> 8: 0.00000000 10 10 4 8
#> 9: -0.14171766 6 7 10 9
#> 10: -0.13105820 9 9 8 10
#> 11: -0.16906703 11 11 11 11
#> 12: 0.49328365 NA NA NA NA
#> 13: -0.33813405 NA NA NA NA
#> 14: 0.00000000 NA NA NA NA
#> 15: 0.00000000 NA NA NA NA
#> 16: 0.00000000 NA NA NA NA
#> 17: 0.23045793 NA NA NA NA
#> 18: -0.24426276 NA NA NA NA
#> 19: 0.00000000 NA NA NA NA
#> 20: 0.21791433 NA NA NA NA
#> 21: -0.26211640 NA NA NA NA
#> 22: -0.28213373 NA NA NA NA
#> 23: 0.48991295 NA NA NA NA
#> 24: 0.00000000 NA NA NA NA
#> 25: 0.00000000 NA NA NA NA
#> 26: 0.00000000 NA NA NA NA
#> 27: -0.28343532 NA NA NA NA
#> 28: -0.30836866 NA NA NA NA
#> 29: 0.00000000 NA NA NA NA
#> 30: 0.00000000 NA NA NA NA
#> 31: 0.00000000 NA NA NA NA
#> 32: 0.00000000 NA NA NA NA
#> 33: 0.00000000 NA NA NA NA
#> GSEA spearman_rank pearson_rank GSEA_rank rankProduct_rank
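The rank columns in the output above are consistent with ranking each score in decreasing order (ties share their average rank) and then ranking the product of the per-method ranks. The sketch below reproduces this for the 11 mean rows using values copied from the output. It is an illustration in base R, not the package's code, which may compute the rank product differently.

```r
# Per-method ranks of the 11 mean rows, copied from the output above
ranks <- data.frame(
    spearman = c(1, 2, 3, 4, 5, 7, 8, 10, 6, 9, 11),
    pearson  = c(1, 2, 3, 4, 5, 6, 8, 10, 7, 9, 11),
    gsea     = c(1, 2, 4, 4, 9, 7, 6, 4, 10, 8, 11))

# Rank product: multiply the ranks of each row, then rank the products
rankProduct <- apply(ranks, 1, prod)
finalRank   <- unname(rank(rankProduct))
finalRank  # 1 2 3 4 5 6 7 8 9 10 11

# GSEA ranks follow from the GSEA scores: higher scores rank first and the
# three tied scores of 0 share the average rank (3 + 4 + 5) / 3 = 4
gsea <- c(0.49159830, 0.10895716, 0, 0, -0.14106687, -0.12213138,
          -0.03895537, 0, -0.14171766, -0.13105820, -0.16906703)
gseaRank <- rank(-gsea)
```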
# Rank similar CMap perturbations using only Spearman's correlation
rankSimilarPerturbations(diffExprStat, perturbations, method="spearman")
#> Subsetting data based on 8790 intersecting genes (65% of the 13451 input genes)...
#> Comparing against 22 CMap perturbations (2 cell lines) using 'spearman'...
#> Comparison performed in 0.53 secs
#> compound_perturbation spearman_coef
#> <char> <num>
#> 1: CVD001_24H:BRD-A14014306-001-01-1:4.1 0.224252829
#> 2: CVD001_24H:BRD-K84595254-001-03-0:4.9444 0.093416734
#> 3: CVD001_24H:BRD-K41172353-001-01-4:4.7 -0.007440278
#> 4: CVD001_24H:BRD-K84389640-001-01-5:4.225 -0.013429614
#> 5: CVD001_24H:BRD-K96188950-001-04-5:4.3967 -0.020549215
#> 6: CVD001_24H:BRD-K60476892-001-02-1:4.1072 -0.057954380
#> 7: CVD001_24H:BRD-K77508012-001-01-9:6.025 -0.063221230
#> 8: CVD001_24H:BRD-K62810658-001-05-6:4.6768 -0.067773347
#> 9: CVD001_24H:BRD-K94818765-001-01-0:4.8 -0.074453675
#> 10: CVD001_24H:BRD-K31030218-001-01-1:4.25 -0.077835425
#> 11: CVD001_24H:BRD-A65142661-034-01-8:5.35 -0.115903134
#> 12: CVD001_HEPG2_24H:BRD-A14014306-001-01-1:4.1 0.225335666
#> 13: CVD001_HEPG2_24H:BRD-A65142661-034-01-8:5.35 -0.164686339
#> 14: CVD001_HEPG2_24H:BRD-K31030218-001-01-1:4.25 -0.125531451
#> 15: CVD001_HEPG2_24H:BRD-K41172353-001-01-4:4.7 0.005161915
#> 16: CVD001_HEPG2_24H:BRD-K60476892-001-02-1:4.1072 -0.077832390
#> 17: CVD001_HEPG2_24H:BRD-K62810658-001-05-6:4.6768 -0.050201852
#> 18: CVD001_HEPG2_24H:BRD-K77508012-001-01-9:6.025 -0.094208866
#> 19: CVD001_HEPG2_24H:BRD-K84389640-001-01-5:4.225 -0.078331814
#> 20: CVD001_HEPG2_24H:BRD-K84595254-001-03-0:4.9444 0.085133810
#> 21: CVD001_HEPG2_24H:BRD-K94818765-001-01-0:4.8 -0.113418048
#> 22: CVD001_HEPG2_24H:BRD-K96188950-001-04-5:4.3967 -0.037174082
#> 23: CVD001_HUH7_24H:BRD-A14014306-001-01-1:4.1 0.223169991
#> 24: CVD001_HUH7_24H:BRD-A65142661-034-01-8:5.35 -0.067119930
#> 25: CVD001_HUH7_24H:BRD-K31030218-001-01-1:4.25 -0.030139400
#> 26: CVD001_HUH7_24H:BRD-K41172353-001-01-4:4.7 -0.020042471
#> 27: CVD001_HUH7_24H:BRD-K60476892-001-02-1:4.1072 -0.038076371
#> 28: CVD001_HUH7_24H:BRD-K62810658-001-05-6:4.6768 -0.085344842
#> 29: CVD001_HUH7_24H:BRD-K77508012-001-01-9:6.025 -0.032233593
#> 30: CVD001_HUH7_24H:BRD-K84389640-001-01-5:4.225 0.051472586
#> 31: CVD001_HUH7_24H:BRD-K84595254-001-03-0:4.9444 0.101699658
#> 32: CVD001_HUH7_24H:BRD-K94818765-001-01-0:4.8 -0.035489302
#> 33: CVD001_HUH7_24H:BRD-K96188950-001-04-5:4.3967 -0.003924349
#> compound_perturbation spearman_coef
#> spearman_pvalue spearman_qvalue spearman_rank
#> <num> <num> <num>
#> 1: NA NA 1
#> 2: NA NA 2
#> 3: NA NA 3
#> 4: NA NA 4
#> 5: NA NA 5
#> 6: NA NA 6
#> 7: NA NA 7
#> 8: NA NA 8
#> 9: NA NA 9
#> 10: NA NA 10
#> 11: NA NA 11
#> 12: 1.361508e-101 2.995318e-100 NA
#> 13: 1.741438e-54 1.277054e-53 NA
#> 14: 3.293334e-32 1.811334e-31 NA
#> 15: 6.284641e-01 6.583910e-01 NA
#> 16: 2.727173e-13 5.454346e-13 NA
#> 17: 2.489663e-06 3.912327e-06 NA
#> 18: 8.674931e-19 2.726407e-18 NA
#> 19: 1.919930e-13 4.223846e-13 NA
#> 20: 1.294659e-15 3.164722e-15 NA
#> 21: 1.461350e-26 6.429942e-26 NA
#> 22: 4.903648e-04 6.742517e-04 NA
#> 23: 1.229995e-99 1.352995e-98 NA
#> 24: 2.994988e-10 5.490811e-10 NA
#> 25: 4.713922e-03 5.458226e-03 NA
#> 26: 6.024403e-02 6.626843e-02 NA
#> 27: 3.561138e-04 5.223003e-04 NA
#> 28: 1.101426e-15 3.028921e-15 NA
#> 29: 2.507656e-03 3.064913e-03 NA
#> 30: 1.376852e-06 2.330056e-06 NA
#> 31: 1.197236e-21 4.389864e-21 NA
#> 32: 8.751127e-04 1.132499e-03 NA
#> 33: 7.129654e-01 7.129654e-01 NA
#> spearman_pvalue spearman_qvalue spearman_rank
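The rows without a cell line in their identifier (rows 1 to 11) hold the mean score across cell lines, which is also why they carry no p-values or q-values. As a quick check in base R, using Spearman coefficients copied from the output above for two compounds:

```r
# Per-cell-line Spearman coefficients, copied from the output above
# (BRD-A14014306-001-01-1 and BRD-A65142661-034-01-8)
hepg2 <- c(0.225335666, -0.164686339)
huh7  <- c(0.223169991, -0.067119930)

# Mean across cell lines, as in the rows added when cellLineMean applies
means <- rowMeans(cbind(hepg2, huh7))
round(means, 6)  # 0.224253 -0.115903
```

These values agree with rows 1 and 11 of the output up to rounding of the printed coefficients.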