[datatable-help] Can you explain what is going on???

Gerald Jean gerald.jean at dgag.ca
Thu May 14 20:45:09 CEST 2015


Hello,

Thanks Frank, you were right.  I am converting roughly 2000 lines of data.frame code to the data.table way, and this one slipped past me!  By the way, on this data set (4750880 observations) the processing time went from 1 h 45 min to 12.5 minutes.  If we could parallelize this it could run in under a minute; I have 24 processors on the server where it runs.

Thanks again,

Gérald


Gerald Jean, M.Sc. in Statistics
Senior Statistical Advisor

Corporate Actuarial Services,
Modelling and Research
Property and Casualty Insurance
Mouvement Desjardins

Lévis (head office)

418 835-4900, ext. 5527639
1 877 835-4900, ext. 5527639
Fax: 418 835-6657










From: by.hook.or at gmail.com, on behalf of Frank Erickson
Sent: May 14, 2015 13:19
To: Gerald Jean
Cc: datatable-help at lists.r-forge.r-project.org
Subject: Re: [datatable-help] Can you explain what is going on???

Hi Gérald,

Your question is not really data.table-specific, I think. Your
    ttt[ttt == "0"] <- "O"
does not affect the result because you overwrite ttt with
    ttt <- ifelse(...
immediately afterwards, and that ifelse() reads from membre, which still contains the zeros. Maybe you meant to have ttt on the right-hand side of the latter command, instead of membre.
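
A minimal sketch of the difference, on made-up toy vectors (the values of membre and PROV below are assumptions for illustration, not your data):

```r
# Toy data standing in for the real columns (values are invented).
membre <- c("O", "0", "N", "", "0")
PROV   <- c(" QC", " QC", " QC", " QC", " ON")

# Original pattern: the zero fix is applied to ttt, then discarded,
# because ifelse() rebuilds ttt from the untouched membre vector.
ttt <- membre
ttt[ttt == "0"] <- "O"
ttt <- ifelse(PROV != " QC", " OAO",
              ifelse(membre == "", " Ma  ", membre))
n_zero_original <- sum(ttt == "0")

# Suggested change: read from ttt inside ifelse() so the fix is kept.
ttt <- membre
ttt[ttt == "0"] <- "O"
ttt <- ifelse(PROV != " QC", " OAO",
              ifelse(ttt == "", " Ma  ", ttt))
n_zero_fixed <- sum(ttt == "0")

cat("zeros left, original pattern:", n_zero_original, "\n")
cat("zeros left, suggested change:", n_zero_fixed, "\n")
```

In this toy case one of the two zeros sits in a non-QC row, so it is replaced by " OAO" and only the other one comes back. That would be consistent with the 2, 0, 1 counts you printed, though I can't confirm it without seeing your data.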

--Frank


On Thu, May 14, 2015 at 10:04 AM, Gerald Jean <gerald.jean at dgag.ca> wrote:
Hello,

The following code is extracted from a function in which roughly 150 variables of a large data set are transformed using data.table.

The variable "membre" was coming out with one missing value. To understand why, I extracted the code from the function, added a few "cat" statements and ran it directly in the terminal.

ttt.test.sima[, ":="(membre = {
      cat(" Processing: membre", sep = "\n")
      ttt <- membre
      cat(paste(" Class ttt = ", class(ttt), sep = ""), sep = "\n")
      cat(paste(" Length ttt = ", length(ttt), sep = ""), sep = "\n")
      cat(paste(" sum(ttt == 0) = ", sum(ttt == "0"), sep = ""), sep = "\n")
      ttt[ttt == "0"] <- "O"  ## A few capital "O" are coded as zero "0".
      cat(paste(" sum(ttt == 0) = ", sum(ttt == "0"), sep = ""), sep = "\n")
      ttt <- ifelse(PROV != " QC", " OAO",
                    ifelse(membre == "", " Ma  ", membre))
      cat(paste(" sum(ttt == 0) = ", sum(ttt == "0"), sep = ""), sep = "\n")
      merge.levels(factor(ttt, levels = c("O", "N", " Ma  ", " OAO"),
                          labels = c(" Oui", " Non", " Ma ", " OAO")),
                   k = list(" Oui" = c(" Oui", " OAO")))})]
Processing: membre
Class ttt = character
Length ttt = 4750880
sum(ttt == 0) = 2
sum(ttt == 0) = 0
sum(ttt == 0) = 1

I don't understand why, after the "ifelse" statement, the temporary variable "ttt" once again contains a single "0" (zero), which of course produces the missing value in the factor created from it.

Thanks for your support,

Gérald





_______________________________________________
datatable-help mailing list
datatable-help at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
