[datatable-help] Problem(s) finding p-values for numerous spearman correlations
Izzy_M
izabellamurphy at aol.co.uk
Mon Nov 27 18:22:51 CET 2017
Hi everyone,
I am very, very new to R, and I'm trying to work out the p-values for
thousands of Spearman correlation coefficients.
I have imported a large dataset from a CSV file (366 obs. of 73775
variables) into RStudio. Each column is a word, each row is a date, and each
cell holds the relative frequency of that word on that date. I am trying to
see whether the frequency of any/all of the words increases significantly
over the course of a year.
After some trial and error (and a lot of Googling!), I have some code that
successfully stores the Spearman correlation values in a matrix:
x <- my_data[1:73775]
y <- my_data[1]
corrs3 <- round(cor(x, y, method = "spearman", use="complete.obs"), 3)
This code stores the words in one column of the matrix and their Spearman
value in the second column. However, what I need to do now is calculate
the corresponding p-value for each of the variables. I have been able to
do this for individual variables by running the following code (although I
do get a warning saying "Cannot compute exact p-value with ties", but I've
been told that this isn't a major problem?):
cor.test(1:73775, my_data$romcom, method = "spearman")
However, what I would ideally like to do is store the p-value next to the
Spearman value in the matrix (if that is possible).
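
To make the goal concrete, here is a minimal sketch of the kind of table I am
after, built on a tiny made-up data frame rather than my real data. The names
toy, day, sequel and reboot are invented for illustration, and running one
cor.test() per column is only an assumption about how this could be done; I
have not checked whether it is practical for 73775 columns.

```r
# Toy stand-in for my_data: 366 dates (rows) and 3 made-up word columns.
set.seed(1)
toy <- data.frame(
  day    = 1:366,
  romcom = runif(366),
  sequel = runif(366) + (1:366) / 366,  # built to rise over the year
  reboot = runif(366)
)

words <- setdiff(names(toy), "day")

# One Spearman test per word column against the day index.
# exact = FALSE requests the asymptotic p-value.
tests <- lapply(words, function(w) {
  cor.test(toy$day, toy[[w]], method = "spearman", exact = FALSE)
})

# Put the coefficient and its p-value side by side, one row per word.
result <- data.frame(
  word = words,
  rho  = round(sapply(tests, function(res) unname(res$estimate)), 3),
  p    = sapply(tests, function(res) res$p.value)
)
print(result)
```

Each row of result holds a word, its Spearman coefficient and the matching
p-value, which is the layout I would like for the full dataset.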
The consensus seems to be that Hmisc is the ideal tool for this kind of
thing, so I installed that package, and I've been attempting to run it as
follows:
flattenCorrMatrix <- function(cormat, pmat) {
  ut <- upper.tri(cormat)
  data.frame(
    row = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor = cormat[ut],
    p = pmat[ut]
  )
}
x <- my_data[1:73775]
y <- my_data[1]
library(Hmisc)
res2<-rcorr(as.matrix(my_data[x,y]))
flattenCorrMatrix(res2$r, res2$P)
However, I get an error message, stating:
"Unsupported index type: tbl_df".
And I'm unsure how to fix this.
I've also tried bypassing Hmisc and using the following:
x <- my_data[1:73775]
y <- my_data[1]
corrs3 <- round(cor.test(x, y, method = "spearman", use="complete.obs"), 3)
But this returns the error message:
Error in cor.test.default(x, y, method = "spearman", use = "complete.obs") :
'x' and 'y' must have the same length
More Googling suggested that the "corr.test" function from the psych library
would be better. However, when I use the following code:
x <- my_data[1:73775]
y <- my_data[1]
library("psych")
corr.test(x, y = NULL, use = "pairwise", method="spearman", ci=TRUE)
I get the following error message:
Error: cannot allocate vector of size 40.6 Gb
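
As a rough check on where that figure might come from (this is my assumption
about the cause, not something the error message states): a full
73775 x 73775 matrix of 8-byte numeric values would need about this much
memory.

```r
# Size of one square matrix with a cell for every pair of variables,
# assuming 8 bytes per numeric value.
n_vars <- 73775
bytes  <- n_vars^2 * 8

# Convert to units of 1024^3 bytes.
gib <- bytes / 1024^3
print(round(gib, 1))  # 40.6
```

So the number in the error matches a single matrix covering every pair of
variables, which is far more than the one-column-versus-all comparison I
actually need.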
I'm really out of options now, and I would really appreciate any
suggestions!
Thanks!
--
Sent from: http://r.789695.n4.nabble.com/datatable-help-f2315188.html