<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix"><br>
      Hi Matt,<br>
      <br>
      Great.  If you can prepare some dummy data with the appropriate
      properties and a parameter or two to scale up the size (or just
      provide an online large example to download) and a query that gets
      to the right answer but is slow or ugly,   then we've got
      something to chew on ...<br>
      <br>
      Matt<br>
      <br>
      On 28/06/14 10:55, Matthew DeAngelis wrote:<br>
    </div>
    <blockquote
cite="mid:CAMjp+0cc3d9r3mSnM5DRfd9ygTGQgDe2wdbWXOHc2rH-JR617Q@mail.gmail.com"
      type="cite">
      <div dir="ltr">Hi Matt,
        <div><br>
        </div>
        <div>You have the right of it. The problem is somewhat
          complicated, however, because I would want to replace the
          vector scan "DT[word=="good"..." with the keyed lookup
          "DT[J("good")...", which means setting the key to word and
          reordering the rows. Once the rows are sorted by word,
          neighbouring rows are no longer neighbouring words in a
          document, so the lag cannot be taken at query time. Hence the
          two-step process I have now: I key by document and position
          first, create the lag_word column, then key by the word and
          lag_word columns and query by row.</div>
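        <div><br>
        </div>
        <div>The two-step process described above might look roughly
          like the sketch below. The toy table and the use of %in% to
          keep rows with no preceding word are assumptions made for
          illustration, not the actual code or data.</div>
<pre>
```r
library(data.table)

# Toy data (made up): document 1 says "this is good",
# document 2 says "this is not good".
DT = data.table(
  word     = c("this", "is", "good", "this", "is", "not", "good"),
  document = c(1L, 1L, 1L, 2L, 2L, 2L, 2L),
  position = c(1:3, 1:4)
)

# Step 1: key by document and position, then build lag_word with a
# keyed self-join on (document, position - 1).
setkey(DT, document, position)
DT[, lag_word := DT[list(document, position - 1L), word]]

# Step 2: re-key by word and lag_word so the word lookup is a keyed
# join rather than a vector scan, then drop rows preceded by "not".
setkey(DT, word, lag_word)
res = DT[J("good")][!(lag_word %in% "not")]
print(res)
```
</pre>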
        <div><br>
        </div>
        <div><br>
        </div>
        <div>Matt</div>
      </div>
      <div class="gmail_extra"><br>
        <br>
        <div class="gmail_quote">On Fri, Jun 27, 2014 at 3:17 PM, Matt
          Dowle <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:mdowle@mdowle.plus.com" target="_blank">mdowle@mdowle.plus.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000">
              <div><br>
                Hi,<br>
                <br>
                Not sure exactly what you need but looks interesting.<br>
                <br>
                Something a bit like this?<br>
                <br>
                DT[ word == "good", .SD[ lag(word, N) != "not" ], 
                by=document]<br>
                <br>
                Your idea being you don't want to have to repeat all the
                pre and post words alongside each word but rather
                express it in the query. Makes sense.   Leads to
                classifying "not good" and "not very good" as both
                negative phrases I guess.<br>
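                <br>
                One way the query above might be made runnable is
                sketched here. It assumes a data.table version that
                provides shift() (nothing in this thread confirms that)
                and a small made-up table. The word test is moved into j
                because, once i has subset to word == "good", a lag
                would only see the surviving rows.<br>
<pre>
```r
library(data.table)

# Toy data (made up): document 1 says "this is good",
# document 2 says "this is not good". Rows are in position order.
DT = data.table(
  word     = c("this", "is", "good", "this", "is", "not", "good"),
  document = c(1L, 1L, 1L, 2L, 2L, 2L, 2L),
  position = c(1:3, 1:4)
)

N = 1L  # how many positions back to look

# Within each document, keep "good" unless the word N rows earlier is
# "not". shift() pads with NA at the start of each group, and
# NA %in% "not" is FALSE, so a "good" with no earlier word is kept.
res = DT[, .SD[word == "good" & !(shift(word, N) %in% "not")],
         by = document]
print(res)
```
</pre>
                With N = 2L the same call would test the word two
                positions back, as in "not very good".<br>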
                <br>
                Matt
                <div>
                  <div class="h5"><br>
                    <br>
                    <br>
                    On 26/06/14 21:56, Matthew DeAngelis wrote:<br>
                  </div>
                </div>
              </div>
              <blockquote type="cite">
                <div>
                  <div class="h5">
                    <div dir="ltr">Hello data.table gurus,
                      <div><br>
                      </div>
                      <div>I have been using data.table to efficiently
                        work with textual data and I love it for that
                        purpose. I have transformed my data so that it
                        looks something like this:</div>
                      <div><br>
                      </div>
                      <div>
                        <table dir="ltr"
                          style="table-layout:fixed;font-size:13px;font-family:arial,sans,sans-serif;border-collapse:collapse;border:1px
                          solid rgb(204,204,204)" cellpadding="0"
                          cellspacing="0" border="1">
                          <colgroup><col width="100"><col width="100"><col
                              width="100"></colgroup><tbody>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">word</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">document</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">position</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">I</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">1</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                1</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">have</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">1</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                2</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">transformed</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">1</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                3</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">my</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">1</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                4</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">data</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">1</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                5</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">so</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">2</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                1</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">that</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">2</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                2</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">it</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">2</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                3</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">looks</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">2</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                4</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">something</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">2</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                5</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">like</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">2</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                6</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">this</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">2</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                7</td>
                            </tr>
                          </tbody>
                        </table>
                        <br>
                      </div>
                      <div>(I actually use a unique number for each
                        word, so that I am able to use data.table's
                        excellent features to do lightning-fast word
                        counts. This has revolutionized my workflow over
                        looping through text files with Perl.)</div>
                      <div><br>
                      </div>
                      <div>My problem is that I sometimes need to search
                        for phrases or to select words based on their
                        context (for instance, I may want to exclude a
                        word if it is preceded by "not" or followed by a
                        word that changes its meaning). Currently, I am
                        using the solution <a moz-do-not-send="true"
href="http://stackoverflow.com/questions/11397771/r-data-table-grouping-for-lagged-regression"
                          target="_blank">here</a> to create a new
                        column for a word in another position, like
                        this:</div>
                      <div><br>
                      </div>
                      <div>
                        <table dir="ltr"
                          style="table-layout:fixed;font-size:13px;font-family:arial,sans,sans-serif;border-collapse:collapse;border:1px
                          solid rgb(204,204,204)" cellpadding="0"
                          cellspacing="0" border="1">
                          <colgroup><col width="100"><col width="100"><col
                              width="100"><col width="100"></colgroup><tbody>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">word</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">document</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">position</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom">lead_word</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">I</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">1</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">1</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom">have</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">have</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">1</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">2</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom">transformed</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">transformed</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">1</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">3</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom">my</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">my</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">1</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                4</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom">data</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">data</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                1</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">5</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom">NA</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">so</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">2</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">1</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom">that</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">that</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">2</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">2</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom">it</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">it</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">2</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                3</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom">looks</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">looks</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                2</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">4</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom">something</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">something</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">2</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">5</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom">like</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">like</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">2</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                6</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom">this</td>
                            </tr>
                            <tr style="height:21px">
                              <td style="padding:2px
                                3px;vertical-align:bottom">this</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">
                                2</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom;text-align:center">7</td>
                              <td style="padding:2px
                                3px;vertical-align:bottom">NA</td>
                            </tr>
                          </tbody>
                        </table>
                        <br>
                        using a command like:
                        DT[,lead_word:=DT[list(document,position+1),word]]<br>
                        <br>
                      </div>
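                      <div>For reference, a self-contained version of
                        that command on the example table might look
                        like the following. It assumes DT is keyed by
                        document and position, and it uses position + 1L
                        to keep the join column an integer; both are
                        choices made for this sketch.</div>
<pre>
```r
library(data.table)

# The example table from this message.
DT = data.table(
  word     = c("I", "have", "transformed", "my", "data",
               "so", "that", "it", "looks", "something", "like", "this"),
  document = c(rep(1L, 5), rep(2L, 7)),
  position = c(1:5, 1:7)
)

# The self-join below matches on the key columns, so set the key first.
setkey(DT, document, position)

# For every row, look up the word at (document, position + 1).
# Rows with no following word in the same document get NA.
DT[, lead_word := DT[list(document, position + 1L), word]]
print(DT)
```
</pre>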
                      <div>This approach has two problems, however.
                        First, every added column costs memory and time
                        in proportion to the number of rows. I am
                        currently working with a file containing over
                        150 million rows, so adding a column is costly.
                        Second, I may want to check both one and two
                        words ahead, which means adding two columns, and
                        this can quickly get out of hand.</div>
                      <div><br>
                      </div>
                      <div>Is there a better way to use data.table to
                        check the value in the row N positions away from
                        the row of interest, within a group, and select
                        the row based on that value? Perhaps the .I
                        variable could be useful here?</div>
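                      <div><br>
                      </div>
                      <div>One possible direction, sketched under
                        assumptions: instead of adding a full-length
                        column, look up the neighbouring word only for
                        the rows that match the target word, so the
                        extra work scales with the number of matches.
                        The helper name select_by_context, its
                        arguments and the toy data are hypothetical, and
                        DT is assumed to be keyed by document and
                        position.</div>
<pre>
```r
library(data.table)

# Toy data (made up): document 1 says "this is good",
# document 2 says "this is not good".
DT = data.table(
  word     = c("this", "is", "good", "this", "is", "not", "good"),
  document = c(1L, 1L, 1L, 2L, 2L, 2L, 2L),
  position = c(1:3, 1:4)
)
setkey(DT, document, position)

# Hypothetical helper: return (document, position) of rows whose word
# is `target`, unless the word `offset` positions away is in `banned`.
# A negative offset looks backwards, a positive one looks ahead.
select_by_context = function(DT, target, banned, offset) {
  # Locations of the target word only.
  hits = DT[word == target, list(document, position)]
  # Keyed join for just those locations; NA where no such row exists.
  neighbour = DT[list(hits$document, hits$position + offset), word]
  hits[!(neighbour %in% banned)]
}

res = select_by_context(DT, "good", "not", -1L)
print(res)
```
</pre>
                      <div>Checking two positions would mean calling
                        the helper again with a different offset rather
                        than storing another column.</div>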
                      <div><br>
                      </div>
                      <div>I appreciate any suggestions.</div>
                      <div><br>
                      </div>
                      <div><br>
                      </div>
                      <div>Regards,</div>
                      <div>Matt</div>
                    </div>
                  </div>
                </div>
              </blockquote>
              <br>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </body>
</html>