[datatable-help] split data table column aka tidyr separate function

Carl Sutton suttoncarl at ymail.com
Mon Dec 19 04:16:21 CET 2016


Hi
I have searched for the last couple of days for a way to do this but have not found a solution. With real data, I have used tidyr to do the task, but:
1. It has used all available memory (12 GB on an older desktop).
2. Future tables will be even larger, so they would need to be split.
3. It is slow, perhaps due to the lack of free memory.
The data is provided in a format such that a variable "name" (and there are several like this) actually contains the variable name and  indices, i.e. var_09 is the ninth level of that variable.   The data analysis needs that level as a separate variable.  Code and toy data set are below.
#  column split test
library(data.table)
library(tidyr)
#  data table for melt and columns split
dt1 <- data.table(a_1 = 1:10, b_2 = 20:29,folks = c("art","brian","ed",
                "rich","dennis","frank", "derrick","paul","fred","numnuts"),
                  a_2 = 2:11, b_1 = 21:30)
melt(dt1, id = "folks")  #  so far so good
dt1[,c("a") := tstrsplit(c(a_1),"_",fixed = TRUE)][,c("a") := tstrsplit(c(a_2),
                          "_",fixed = TRUE)][]
#  That is not producing what I want

#  tidyr gives what I want
df <- data.frame(a_1 = 1:10, b_2 = 20:29,folks = c("art","brian","ed",
                "rich","dennis","frank", "derrick","paul","fred","numnuts"),
                 a_2 = 2:11, b_1 = 21:30)
df %>% gather(value, nums, -folks) %>%
        separate(value, c("varTYpe","varIndex"))

Carl Sutton
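
One possible data.table-only route to the same long table is sketched below. It is a minimal sketch under assumptions, not a tested answer for the real data: it reuses the toy table from the post, melts it to long form, and then splits the melted name column with tstrsplit(). The object name `long` and the output column names `varType` and `varIndex` are chosen here for illustration. Because `:=` adds columns by reference, this may need less memory than the tidyr pipeline, but that has not been measured.

```r
library(data.table)

# Toy data from the post
dt1 <- data.table(a_1 = 1:10, b_2 = 20:29,
                  folks = c("art", "brian", "ed", "rich", "dennis", "frank",
                            "derrick", "paul", "fred", "numnuts"),
                  a_2 = 2:11, b_1 = 21:30)

# Reshape to long form first; variable.factor = FALSE keeps the old column
# names as character so they can be split directly
long <- melt(dt1, id.vars = "folks", variable.name = "value",
             value.name = "nums", variable.factor = FALSE)

# Split e.g. "a_1" into "a" and "1", adding both new columns by reference
long[, c("varType", "varIndex") := tstrsplit(value, "_", fixed = TRUE)]

# Drop the combined name column once it has been split
long[, value := NULL]

print(long)
```

The split is applied to the melted name column rather than to the original wide columns, which is the step the tstrsplit() attempt above leaves out. If the index is needed as a number, `type.convert = TRUE` can be passed to tstrsplit().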