[datatable-help] Shuffle row-wise, column independently
Nicolas Paris
niparisco at gmail.com
Fri Jan 6 01:09:01 CET 2017
Hey,
Thanks for the suggestion, but this didn't work.
Method 1: use of data.table / sample
> set.seed(1); size <- 100000000
> dt <- data.table::data.table("a"=c(1:size),"b"=rep(letters[1:10],size/10)); head(dt)
> system.time(
    dt[,c("a","b"):=list(sample(a),sample(b))]
  ); head(dt)
a b
1: 1 a
2: 2 b
3: 3 c
4: 4 d
5: 5 e
6: 6 f
user system elapsed
10.190 0.252 10.456
a b
1: 26550867 a
2: 37212390 b
3: 57285336 c
4: 90820777 e
5: 20168193 a
6: 89838965 h
Method 2: use of factor / data.table / sample
> set.seed(1); size <- 100000000
> dt <- data.table::data.table("a"=c(1:size),"b"=as.factor(rep(letters[1:10],size/10))); head(dt)
> system.time(
    dt[,c("a","b"):=list(sample(a),sample(b))]
  ); head(dt)
a b
1: 1 a
2: 2 b
3: 3 c
4: 4 d
5: 5 e
6: 6 f
user system elapsed
9.271 0.276 9.559
a b
1: 26550867 a
2: 37212390 b
3: 57285336 c
4: 90820777 e
5: 20168193 a
6: 89838965 h
Method 3: use of .Internal / data.table / factor
> set.seed(1); size <- 100000000
> dt <- data.table::data.table("a"=c(1:size),"b"=as.factor(rep(letters[1:10],size/10))); head(dt)
> system.time(
    dt[,c("a","b"):=list(a[.Internal(sample(size, size, FALSE, NULL))],
                         b[.Internal(sample(size, size, FALSE, NULL))])]
  ); head(dt)
a b
1: 1 a
2: 2 b
3: 3 c
4: 4 d
5: 5 e
6: 6 f
user system elapsed
8.786 0.137 8.935
a b
1: 26550867 a
2: 37212390 b
3: 57285336 c
4: 90820777 e
5: 20168193 a
6: 89838965 h
Method 4 (thanks to banded for pointing it out): set / factor / sample
> set.seed(1); size <- 100000000
> dt <- data.table::data.table("a"=c(1:size),"b"=as.factor(rep(letters[1:10],size/10))); head(dt)
> system.time({
    data.table::set(dt,j="a",value=sample(dt$a))
    data.table::set(dt,j="b",value=sample(dt$b))
  }); head(dt)
a b
1: 1 a
2: 2 b
3: 3 c
4: 4 d
5: 5 e
6: 6 f
user system elapsed
8.790 0.204 9.006
a b
1: 26550867 a
2: 37212390 b
3: 57285336 c
4: 90820777 e
5: 20168193 a
6: 89838965 h
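
The set() approach of Method 4 can be generalized to any number of columns. What follows is only a minimal sketch under a few assumptions: shuffle_columns is a hypothetical helper name that does not appear in this thread, the table is far smaller than the 1e8 rows used above, and each column is indexed with its own sample.int() permutation instead of calling sample() on the column directly (which sidesteps the documented surprise of sample(x) when x is a single number).

```r
library(data.table)

# Hypothetical helper (not from the thread): permute every column
# independently, updating the data.table by reference with set().
shuffle_columns <- function(dt) {
  n <- nrow(dt)
  for (j in names(dt)) {
    # a fresh permutation per column, so columns are shuffled independently
    set(dt, j = j, value = dt[[j]][sample.int(n)])
  }
  invisible(dt)
}

set.seed(1)
size <- 1000L  # assumption: small size, for illustration only
dt <- data.table(a = seq_len(size),
                 b = factor(rep(letters[1:10], size / 10)))
shuffle_columns(dt)
head(dt)
```

Each column keeps exactly the same values as before, only in a different order, and the factor column stays a factor.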
Method 5: use of a data.frame
> set.seed(1); size <- 100000000
> dt <- data.frame("a"=c(1:size),"b"=as.factor(rep(letters[1:10],size/10))); head(dt)
> system.time({
    dt$a <- sample(dt$a); dt$b <- sample(dt$b)
  }); head(dt)
a b
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
6 6 f
user system elapsed
8.755 0.152 8.921
a b
1 26550867 a
2 37212390 b
3 57285336 c
4 90820777 e
5 20168193 a
6 89838965 h
Sadly, data.table does not improve the timings; sample() is the bottleneck.
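
One way to check where the time goes is to time the permutation and the column reordering separately. This is a rough sketch, not a benchmark: the size n is an assumption chosen to run quickly (the runs above used 1e8 rows), and the variable names are illustrative only.

```r
n <- 1e6L  # assumption: far fewer rows than in the runs above
x <- seq_len(n)

set.seed(1)
# Step 1: generate a random permutation of the row indices
t_perm <- system.time(idx <- sample.int(n))
# Step 2: apply that permutation to a column
t_index <- system.time(y <- x[idx])

print(t_perm)
print(t_index)
```

If step 1 dominates at the real table size, the cost lies in generating the permutation and switching the container (data.table, data.frame) will not change much.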
2017-01-05 14:20 GMT+01:00 banded08 <david.awam.jansen at gmail.com>:
> Maybe not the fastest or most efficient, but this should work
>
> for(ii in 1:dim(dt1)[1]) set(dt1, ii, 1:dim(dt1)[2] ,sample(dt1[ii]))