<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
{mso-style-priority:99;
mso-style-link:"Texte de bulles Car";
margin:0cm;
margin-bottom:.0001pt;
font-size:8.0pt;
font-family:"Tahoma","sans-serif";}
span.TextedebullesCar
{mso-style-name:"Texte de bulles Car";
mso-style-priority:99;
mso-style-link:"Texte de bulles";
font-family:"Tahoma","sans-serif";
mso-fareast-language:FR-CA;}
span.EmailStyle20
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 90.0pt 72.0pt 90.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="FR-CA" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Thanks again Nicolas,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">in my case, for the time being, the load process has no options!!! The data is supplied to us by an outside firm, there is one directory per device
and one file per day of usage, the files are zipped, there is 10-12 millions of them. I read them using “fread” this way:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">fread(input = sprintf("zcat %s", x), …)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">works fine, much faster than using read.table.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">All I have left to do before running in parallel on the whole data directories is to find a way to efficiently outputting the resulting aggregated
data sets.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I thought about SQLite, but was told that it puts a lock on the DB when a process is writing to it and, apparently Postgresql handles that more
efficiently. But SQLite has the advantage of being embedded in RSQLite, hence not requiring admin intervention, can’t have everything it seems!!!<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Thanks again and cheers,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Gérald<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<table class="MsoNormalTable" border="0" cellpadding="0" width="640" style="width:480.0pt">
<tbody>
<tr>
<td style="padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><img width="136" height="54" id="_x0000_i1027" src="cid:image001.gif@01D0660F.CA65B4B0"></span><span style="font-family:"Calibri","sans-serif";color:#1F497D"><o:p></o:p></span></p>
</td>
<td style="padding:.75pt .75pt .75pt .75pt"></td>
<td style="padding:.75pt .75pt .75pt .75pt"></td>
</tr>
<tr>
<td width="300" valign="top" style="width:225.0pt;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal"><b><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black">Gerald Jean, M. Sc. en statistiques</span></b><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black"><br>
Conseiller senior en statistiques<br>
<br>
Actuariat corporatif,<br>
Modélisation et Recherche<br>
Assurance de dommages<br>
Mouvement Desjardins</span><span style="font-family:"Calibri","sans-serif";color:#1F497D"><o:p></o:p></span></p>
</td>
<td width="170" valign="top" style="width:127.5pt;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black"><br>
Lévis (siège social)<br>
<br>
418 835-4900,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black">poste 5527639<br>
1 877 835-4900, <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black">poste 5527639<br>
Télécopieur : 418 835-6657</span><span style="font-family:"Calibri","sans-serif";color:#1F497D"><o:p></o:p></span></p>
</td>
<td width="170" valign="top" style="width:127.5pt;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black"><br>
<br>
<br>
<br>
</span><span style="font-family:"Calibri","sans-serif";color:#1F497D"><o:p></o:p></span></p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D;display:none"><o:p> </o:p></span></p>
<table class="MsoNormalTable" border="0" cellpadding="0" width="640" style="width:480.0pt">
<tbody>
<tr>
<td style="padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal"><span style="font-size:7.0pt;font-family:"Verdana","sans-serif";color:black">Faites bonne impression et imprimez seulement au besoin!<br>
<br>
</span><span style="font-size:7.0pt;font-family:"Verdana","sans-serif";color:dimgray">Ce courriel est confidentiel, peut être protégé par le secret professionnel et est adressé exclusivement au destinataire. Il est strictement interdit à toute autre personne
de diffuser, distribuer ou reproduire ce message. Si vous l'avez reçu par erreur, veuillez immédiatement le détruire et aviser l'expéditeur. Merci.</span><span style="font-family:"Calibri","sans-serif";color:#1F497D"><o:p></o:p></span></p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="FR" style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">De :</span></b><span lang="FR" style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Nicolas Paris [mailto:niparisco@gmail.com]
<br>
<b>Envoyé :</b> 24 mars 2015 08:35<br>
<b>À :</b> Gerald Jean<br>
<b>Cc :</b> datatable-help@lists.r-forge.r-project.org<br>
<b>Objet :</b> Re: [datatable-help] Best way to export Huge data sets.<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<blockquote style="margin-left:30.0pt;margin-right:0cm">
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Cambria Math","serif";color:#003333"></span><span style="font-size:11.0pt;font-family:"Tahoma","sans-serif";color:#003333"><o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">thanks for your suggestions. The data.table can’t be transformed in a matrix as it is of mixed types : POSIX.ct columns, character, logical, factor and numeric
columns.</span><span style="font-family:"Tahoma","sans-serif";color:#003333"><o:p></o:p></span></p>
</div>
</blockquote>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Cambria Math","serif";color:#003333"></span><span style="font-size:11.0pt;font-family:"Tahoma","sans-serif";color:#003333">What about casting all as character ? CSV does not make difference between
types as quote is disabled in the config I proposed.<o:p></o:p></span></p>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Tahoma","sans-serif";color:#003333"><o:p> </o:p></span></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Tahoma","sans-serif";color:#003333">About postgresql, I use it, and the faster way to load data is to use the COPY statement. I load 7GB of data in 5 min, but...<o:p></o:p></span></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Tahoma","sans-serif";color:#003333">COPY uses a csv as source. A "binary" file can be used too, but I have never tried.<o:p></o:p></span></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Cambria Math","serif";color:#003333"></span><span style="font-family:"Tahoma","sans-serif";color:#003333">The package RSqlite could help too, Some use this instead of CSV writing. Never tried too.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Cambria Math","serif";color:#003333"></span><span style="font-family:"Tahoma","sans-serif";color:#003333"><o:p></o:p></span></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Tahoma","sans-serif";color:#003333"><o:p> </o:p></span></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Tahoma","sans-serif";color:#003333"> </span><span style="font-size:11.0pt;font-family:"Cambria Math","serif";color:#003333"></span><span style="font-size:11.0pt;font-family:"Tahoma","sans-serif";color:#003333"><o:p></o:p></span></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal"><span style="font-family:"Cambria Math","serif";color:#003333"></span><span style="font-family:"Tahoma","sans-serif";color:#003333"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">2015-03-24 13:03 GMT+01:00 Gerald Jean <<a href="mailto:gerald.jean@dgag.ca" target="_blank">gerald.jean@dgag.ca</a>>:<o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Hello Nicolas,</span><o:p></o:p></p>
<div>
<p class="MsoNormal"><span style="font-family:"Cambria Math","serif";color:#003333"></span><span style="font-family:"Tahoma","sans-serif";color:#003333"><o:p></o:p></span></p>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<div>
<p class="MsoNormal"><span style="font-family:"Cambria Math","serif";color:#003333"></span><span style="font-family:"Tahoma","sans-serif";color:#003333"><o:p></o:p></span></p>
</div>
<p class="MsoNormal">thanks for your suggestions. The data.table can’t be transformed in a matrix as it is of mixed types : POSIX.ct columns, character, logical, factor and numeric columns.<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Admin is currently installing PostgreSQL on the server, I’ll try to go that route.
Too bad data.table doesn’t have, yet, a writing routine as fast as “fread” is for reading!!!</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Thanks,</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Gérald</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<table class="MsoNormalTable" border="0" cellpadding="0" width="640" style="width:480.0pt">
<tbody>
<tr>
<td style="padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><img border="0" width="136" height="54" id="_x0000_i1025" src="cid:image001.gif@01D0660F.CA65B4B0"></span><o:p></o:p></p>
</td>
<td style="padding:.75pt .75pt .75pt .75pt"></td>
<td style="padding:.75pt .75pt .75pt .75pt"></td>
</tr>
<tr>
<td width="300" valign="top" style="width:225.0pt;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black">Gerald Jean, M. Sc. en statistiques</span></b><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black"><br>
Conseiller senior en statistiques<br>
<br>
Actuariat corporatif,<br>
Modélisation et Recherche<br>
Assurance de dommages<br>
Mouvement Desjardins</span><o:p></o:p></p>
</td>
<td width="170" valign="top" style="width:127.5pt;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black"><br>
Lévis (siège social)<br>
<br>
<a href="tel:418%20835-4900" target="_blank">418 835-4900</a>,</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black">poste 5527639<br>
<a href="tel:1%20877%20835-4900" target="_blank">1 877 835-4900</a>, </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black">poste 5527639<br>
Télécopieur : <a href="tel:418%20835-6657" target="_blank">418 835-6657</a></span><o:p></o:p></p>
</td>
<td width="170" valign="top" style="width:127.5pt;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt"><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black"><br>
<br>
<br>
</span><o:p></o:p></p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<table class="MsoNormalTable" border="0" cellpadding="0" width="640" style="width:480.0pt">
<tbody>
<tr>
<td style="padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:7.0pt;font-family:"Verdana","sans-serif";color:black">Faites bonne impression et imprimez seulement au besoin!<br>
<br>
</span><span style="font-size:7.0pt;font-family:"Verdana","sans-serif";color:dimgray">Ce courriel est confidentiel, peut être protégé par le secret professionnel et est adressé exclusivement au destinataire. Il est strictement interdit à toute autre personne
de diffuser, distribuer ou reproduire ce message. Si vous l'avez reçu par erreur, veuillez immédiatement le détruire et aviser l'expéditeur. Merci.</span><o:p></o:p></p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b><span lang="FR" style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">De :</span></b><span lang="FR" style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Nicolas
Paris [mailto:<a href="mailto:niparisco@gmail.com" target="_blank">niparisco@gmail.com</a>]
<br>
<b>Envoyé :</b> 23 mars 2015 17:51<br>
<b>À :</b> Gerald Jean<br>
<b>Cc :</b> <a href="mailto:datatable-help@lists.r-forge.r-project.org" target="_blank">
datatable-help@lists.r-forge.r-project.org</a><br>
<b>Objet :</b> Re: [datatable-help] Best way to export Huge data sets.</span><o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-family:"Tahoma","sans-serif";color:#003333">Some test you can do without many code change :</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-family:"Tahoma","sans-serif";color:#003333">1) transform your data.table as matrix before write</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-family:"Tahoma","sans-serif";color:#003333">2) use write table + this config to save place& time</span><o:p></o:p></p>
</div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-family:"Tahoma","sans-serif";color:#003333">->sep = Pipe (1byte& rarely used)</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-family:"Tahoma","sans-serif";color:#003333">->disable quote (saves "" more than bilion times)</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-family:"Tahoma","sans-serif";color:#003333">->latin1 instead of utf-8</span><o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-family:"Tahoma","sans-serif";color:#003333">3) use chunks (say cut in slice output) and append=T (this may work in parallel)</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-family:"Tahoma","sans-serif";color:#003333"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-family:"Tahoma","sans-serif";color:#003333">If still too long, try installing some database (sqlite) on your 24 core system, and try load it</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-family:"Tahoma","sans-serif";color:#003333"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-family:"Tahoma","sans-serif";color:#003333">hope this helps</span><o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">2015-03-23 14:49 GMT+01:00 Gerald Jean <<a href="mailto:gerald.jean@dgag.ca" target="_blank">gerald.jean@dgag.ca</a>>:<o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Hello,<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">I am currently on a project where I have to read, process, aggregate 10 to 12 millions of files for roughly 10 billions lines of data.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">The files are arranged in roughly 64000 directories, each directory is one client’s data.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">I have written code importing and “massaging” the data per directory. The code is data.table driven. I am running this on a 24 cores machine with 145 Gb of
RAM on a Linux box under RedHat.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">For testing purpose I have parallelized the code, using the doMC package, runs fine and it seems to be fast. But I haven’t tried to output the resulting files,
three per client. A small one, a moderate size one and a large one, over 500Gb estimated.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">My question:</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">what is the best way to output those files without creating bottlenecks??</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">I thought of breaking the list of input directories into 24 threads, supplying a list of lists to “foreach” where one of the components of each sub-list would
be the name of the output files but I am worried that “write.table” would take for ever to write this data to disk, one solution would be to use “save” and keep the output data in Rdata format, but that complicates further analysis by other software.</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Any suggestions???</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">By the way “data.table” sure helped so far in processing that data, thanks to the developpers for such an efficient package,</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Gérald</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> </span><o:p></o:p></p>
<table class="MsoNormalTable" border="0" cellpadding="0" width="640" style="width:480.0pt">
<tbody>
<tr>
<td style="padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><img border="0" width="136" height="54" id="_x0000_i1026" src="cid:image001.gif@01D0660F.CA65B4B0"><o:p></o:p></p>
</td>
<td style="padding:.75pt .75pt .75pt .75pt"></td>
<td style="padding:.75pt .75pt .75pt .75pt"></td>
</tr>
<tr>
<td width="300" valign="top" style="width:225.0pt;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black">Gerald Jean, M. Sc. en statistiques</span></b><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black"><br>
Conseiller senior en statistiques<br>
<br>
Actuariat corporatif,<br>
Modélisation et Recherche<br>
Assurance de dommages<br>
Mouvement Desjardins</span><o:p></o:p></p>
</td>
<td width="170" valign="top" style="width:127.5pt;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black"><br>
Lévis (siège social)<br>
<br>
<a href="tel:418%20835-4900" target="_blank">418 835-4900</a>,</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black">poste 5527639<br>
<a href="tel:1%20877%20835-4900" target="_blank">1 877 835-4900</a>, </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black">poste 5527639<br>
Télécopieur : <a href="tel:418%20835-6657" target="_blank">418 835-6657</a></span><o:p></o:p></p>
</td>
<td width="170" valign="top" style="width:127.5pt;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt"><span style="font-size:8.0pt;font-family:"Verdana","sans-serif";color:black"><br>
<br>
</span><o:p></o:p></p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<table class="MsoNormalTable" border="0" cellpadding="0" width="640" style="width:480.0pt">
<tbody>
<tr>
<td style="padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="font-size:7.0pt;font-family:"Verdana","sans-serif";color:black">Faites bonne impression et imprimez seulement au besoin!<br>
<br>
</span><span style="font-size:7.0pt;font-family:"Verdana","sans-serif";color:dimgray">Ce courriel est confidentiel, peut être protégé par le secret professionnel et est adressé exclusivement au destinataire. Il est strictement interdit à toute autre personne
de diffuser, distribuer ou reproduire ce message. Si vous l'avez reçu par erreur, veuillez immédiatement le détruire et aviser l'expéditeur. Merci.</span><o:p></o:p></p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><br>
_______________________________________________<br>
datatable-help mailing list<br>
<a href="mailto:datatable-help@lists.r-forge.r-project.org" target="_blank">datatable-help@lists.r-forge.r-project.org</a><br>
<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><o:p></o:p></p>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</div>
</div>
</body>
</html>