[Traminer-users] large data sets (Gerhard Wuehrer)
Matthias Studer
Matthias.Studer at unige.ch
Fri Nov 21 20:01:56 CET 2014
Hello,
The WeightedCluster library provides a way to work with unique sequences and weight them accordingly. You can have a look at the manual here: http://mephisto.unige.ch/weightedcluster/
You can find some additional comments here:
http://stackoverflow.com/questions/15929936/problem-with-big-data-during-computation-of-sequence-distances-using-tramine
Hope this helps.
Matthias
-----Original Message-----
From: traminer-users-bounces at lists.r-forge.r-project.org [mailto:traminer-users-bounces at lists.r-forge.r-project.org] On behalf of Pollien Alexandre
Sent: Friday, November 21, 2014 15:32
To: traminer-users at lists.r-forge.r-project.org
Subject: Re: [Traminer-users] large data sets (Gerhard Wuehrer)
Hello,
Unless you have a very specific configuration (?), such a huge dataset won't be fully supported by TraMineR in R: you'll get a "cannot allocate vector of size" error, and other calculations will take very long (with much smaller files, I have waited overnight for a pseudo-ANOVA). One essential point is to reduce the file to "unique" sequences: create an ID that designates each distinct sequence and keep only one of each. Once the results of the sequence analysis are obtained (MDS, clusters), you can assign them back to all the identical sequences. Depending on the nature of the data, this can divide the size by 10 to 20.
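To make the "unique sequences" idea concrete, here is a minimal, language-neutral sketch in Python (not TraMineR code). The function names and the toy data are made up for illustration. It collapses identical sequences, keeps a weight for each representative, and keeps an index for mapping results back to every original case. The size helper assumes a full square distance matrix stored as 8-byte doubles. In R, the WeightedCluster manual describes the equivalent aggregation step (I believe the function is wcAggregateCases, but please check the manual).

```python
def aggregate_sequences(sequences):
    """Collapse identical sequences.

    Returns (unique, weights, index):
      unique  - one representative per distinct sequence, in order of first appearance
      weights - number of original cases each representative stands for
      index   - for each original case, the position of its representative in `unique`
    """
    unique, weights, index = [], [], []
    position = {}  # distinct sequence -> position in `unique`
    for seq in sequences:
        key = tuple(seq)
        if key not in position:
            position[key] = len(unique)
            unique.append(key)
            weights.append(0)
        weights[position[key]] += 1
        index.append(position[key])
    return unique, weights, index


def full_matrix_bytes(n, bytes_per_entry=8):
    """Size of an n x n distance matrix, assuming 8-byte doubles."""
    return n * n * bytes_per_entry


# Toy data: five cases, three distinct sequences.
cases = [("A", "A", "B"), ("A", "B", "B"), ("A", "A", "B"), ("C",), ("A", "A", "B")]
unique, weights, index = aggregate_sequences(cases)

# Hypothetical cluster labels, as if computed on the unique sequences only.
unique_clusters = ["c1", "c2", "c3"]

# Assign each original case the result of its representative.
case_clusters = [unique_clusters[i] for i in index]
```

Under the 8-byte assumption, full_matrix_bytes(45000) is 16,200,000,000 bytes (about 16 GB), while a tenfold reduction to 4,500 unique sequences needs 162,000,000 bytes (about 162 MB). Any analysis run on the unique sequences has to use the weights, otherwise frequent sequences are underrepresented.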
I don't think a large number of modalities (states) causes an out-of-memory error, but I'm not sure the analysis will be successful. I have only worked with about 20 states, which I quickly recoded into fewer than 10 because I couldn't achieve anything good with too many modalities. You may simply have to try. Sequence analysis behaves very differently depending on the nature of the data.
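Recoding states into broader categories can be sketched as follows, again in plain Python rather than TraMineR code. The state names and the mapping are invented for the example; the real grouping depends on what the 184 states mean. In TraMineR, I believe seqrecode is the function meant for this, but check its help page.

```python
def recode(sequence, mapping):
    """Replace each state by its broader category; unmapped states are kept as-is."""
    return [mapping.get(state, state) for state in sequence]


# Hypothetical mapping from detailed states to broader categories.
mapping = {
    "full_time": "work",
    "part_time": "work",
    "school": "education",
    "university": "education",
}

seq_a = ["school", "university", "full_time"]
seq_b = ["school", "school", "part_time"]

recoded_a = recode(seq_a, mapping)
recoded_b = recode(seq_b, mapping)

# Number of distinct states before and after recoding.
states_before = len(set(seq_a) | set(seq_b))
states_after = len(set(recoded_a) | set(recoded_b))
```

A side effect worth noting: sequences that differed only in detail become identical after recoding (here seq_a and seq_b), so the reduction to unique sequences removes more cases.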
Alexandre
-----Original Message-----
From: traminer-users-bounces at lists.r-forge.r-project.org [mailto:traminer-users-bounces at lists.r-forge.r-project.org] On behalf of traminer-users-request at lists.r-forge.r-project.org
Sent: Friday, November 21, 2014 12:00
To: traminer-users at lists.r-forge.r-project.org
Subject: Traminer-users Digest, Vol 41, Issue 2
Message: 1
Date: Fri, 21 Nov 2014 09:01:38 +0100
From: "Gerhard Wuehrer" <Gerhard.Wuehrer at jku.at>
To: <traminer-users at lists.r-forge.r-project.org>
Subject: [Traminer-users] large data sets
Message-ID: <546EFF7202000083000AE6A1 at gwia.im.jku.at>
Hi TraMineRs,
what is your experience with large datasets? I mean n = 45,000 sequences with m = 184 possible states; the sequence length varies between 1 and 66. States can repeat several times.
Thank you for your opinion and advice.
Best regards - Gerhard Wuehrer
o. Univ.-Prof. Dkfm. Dr. Gerhard A. Wührer
Institut für Handel, Absatz und Marketing
Johannes Kepler Universität Linz
Altenberger Str. 69
4040 Linz/Austria
tel.: 004373224689401
fax.:004373224689404
mail: gerhard.wuehrer at jku.at
URL: www.marketing.jku.at
_______________________________________________
Traminer-users mailing list
Traminer-users at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/traminer-users