[datatable-help] Can you explain what is going on???

Gene Leynes gleynes+r at gmail.com
Fri May 22 22:51:49 CEST 2015


Sorry for the off-topic question, but what is "merge.levels"?  That looks
potentially useful.


On Thu, May 14, 2015 at 1:45 PM, Gerald Jean <gerald.jean at dgag.ca> wrote:

> Hello,
>
> thanks Frank, you were right.  I am converting roughly 2000 lines of code
> from data.frames to the data.table way, and this one slipped past me!  By
> the way, on this data set (4750880 observations), the processing time went
> from 1 h 45 min to 12.5 minutes.  If we could parallelize this it would run
> in under a minute; I have 24 processors on the server where it runs.
>
> Thanks again,
>
> Gérald
>
>     *Gerald Jean, M.Sc. in Statistics*
> Senior Statistical Advisor
>
> Corporate Actuarial Services,
> Modelling and Research
> Property and Casualty Insurance
> Mouvement Desjardins
>
> Lévis (head office)
> 418 835-4900, ext. 5527639
> 1 877 835-4900, ext. 5527639
> Fax: 418 835-6657
>
> *From:* by.hook.or at gmail.com [mailto:by.hook.or at gmail.com] *On behalf of*
> Frank Erickson
> *Sent:* May 14, 2015 13:19
> *To:* Gerald Jean
> *Cc:* datatable-help at lists.r-forge.r-project.org
> *Subject:* Re: [datatable-help] Can you explain what is going on???
>
>
>
> Hi Gérald,
>
>
>
> Your question is not really data.table specific, I think. Your
>
> ttt[ttt == "0"] <- "O"
>
> does not affect the result because you overwrite with
>
> ttt <- ifelse(...
>
> immediately afterwards. Maybe you meant to have ttt on the right-hand side
> of the latter command, instead of membre.
>
>
>
> --Frank
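
Frank's point can be reproduced on a toy example. The sketch below is only an illustration: the vectors are made-up stand-ins for the real "membre" and "PROV" columns, the level strings are simplified, and the names ttt_overwritten and ttt_fixed are hypothetical. It assumes that one of the two "0" rows falls outside QC, which would be consistent with the counts 2, 0, 1 in the quoted output.

```r
# Toy stand-ins for the real columns (hypothetical values, not the actual data).
membre <- c("O", "0", "N", "", "0")
PROV   <- c("QC", "QC", "QC", "QC", "ON")

# Original pattern: the fix is applied to ttt, but ifelse() reads membre again,
# so the corrected values are discarded when ttt is reassigned.
ttt <- membre
ttt[ttt == "0"] <- "O"
ttt_overwritten <- ifelse(PROV != "QC", "OAO",
                          ifelse(membre == "", "Ma", membre))

# Suggested pattern: keep using ttt on the right-hand side.
ttt_fixed <- ifelse(PROV != "QC", "OAO",
                    ifelse(ttt == "", "Ma", ttt))

sum(ttt_overwritten == "0")  # 1: the "0" in a QC row comes back from membre
sum(ttt_fixed == "0")        # 0: the replacement is preserved
```

The "0" in the non-QC row disappears either way because that row is set to "OAO"; only the "0" in a QC row is reintroduced when membre is read again.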
>
>
>
>
>
> On Thu, May 14, 2015 at 10:04 AM, Gerald Jean <gerald.jean at dgag.ca> wrote:
>
> Hello,
>
>
>
> the following code is extracted from a function where roughly 150
> variables of a large data set are transformed using data.table.
>
> The variable "membre" was coming out with one missing value. In trying to
> understand why, I extracted the code from the function, added a few "cat"
> statements and ran it directly in the terminal.
>
>
>
> ttt.test.sima[, ":="  (membre = {##
> +       cat(" Processing: membre", sep = "\n")
> +       ttt <- membre
> +       cat(paste(" Class ttt = ", class(ttt), sep = ""), sep = "\n")
> +       cat(paste(" Length ttt = ", length(ttt), sep = ""), sep = "\n")
> +       cat(paste(" sum(ttt == 0) = ", sum(ttt == "0"), sep = ""), sep = "\n")
> +       ttt[ttt == "0"] <- "O"  ## A few capital "O" are coded as zero "0".
> +       cat(paste(" sum(ttt == 0) = ", sum(ttt == "0"), sep = ""), sep = "\n")
> +       ttt <- ifelse(PROV != " QC", " OAO",
> +                     ifelse(membre == "", " Ma  ", membre))
> +       cat(paste(" sum(ttt == 0) = ", sum(ttt == "0"), sep = ""), sep = "\n")
> +       merge.levels(factor(ttt, levels = c("O", "N", " Ma  ", " OAO"),
> +                           labels = c(" Oui", " Non", " Ma ", " OAO")),
> +                    k = list(" Oui" = c(" Oui", " OAO")))})]
>
> Processing: membre
> Class ttt = character
> Length ttt = 4750880
> sum(ttt == 0) = 2
> sum(ttt == 0) = 0
> sum(ttt == 0) = 1
>
>
>
> I don't understand why, after the "ifelse" statement, the temporary
> variable "ttt" is back with a single "0" (zero) in it, resulting of
> course in the missing value in the factor created from it.
>
>
>
> Thanks for your support,
>
>
>
> Gérald
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help