[datatable-help] Analysis of categorical variables with different levels

R_Exp64 deepaksharma64 at gmail.com
Sun Apr 8 07:15:46 CEST 2018


I have been given a data frame with 50+ fields. My job is to find which
fields depend on each other and which are independent. Most of the fields
are categorical variables with different numbers of levels. For example, the
table below has these distinct levels per field: Sex: 2, Color: 4,
Geography: 3, ID: 8. How can I run a regression or correlation analysis on
them? In the table we can see that nearly all Males have Color Red and that
all of them have Geography North. What analysis do I need to do to get this
kind of insight? I have tried a few things but could not figure out the
right way.

The idea is to develop a system where some of the fields are filled in
automatically based on the information entered in other fields. For example,
if we enter Sex as Male, Geography is automatically set to North. Thanks!

Sex     Color   Geography  ID
Male    Red     North      100
Male    Red     North      200
Male    Red     North      300
Male    Red     North      400
Male    Blue    North      500
Female  Green   South      600
Female  Green
Female          East       800
Female  Yellow  North      900
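
One possible starting point, sketched in plain Python rather than R: for each pair of fields, look for values of one field that always occur with a single value of the other. The rows below are transcribed from the table above, with None standing in for blank cells and the ID column left out on the assumption that it is only an identifier. The names value_rules and min_support are hypothetical, chosen for this illustration. Rules found this way only describe the observed rows; a chi-squared test of independence or Cramér's V would be the usual next step for measuring how strong an association is.

```python
from collections import Counter, defaultdict

COLUMNS = ("Sex", "Color", "Geography")

# Rows transcribed from the table; None marks a blank cell.
rows = [dict(zip(COLUMNS, r)) for r in [
    ("Male", "Red", "North"),
    ("Male", "Red", "North"),
    ("Male", "Red", "North"),
    ("Male", "Red", "North"),
    ("Male", "Blue", "North"),
    ("Female", "Green", "South"),
    ("Female", "Green", None),
    ("Female", None, "East"),
    ("Female", "Yellow", "North"),
]]


def value_rules(rows, src, dst, min_support=2):
    """Map each src value to its dst value when that dst value is the only
    one observed with it in at least min_support complete rows."""
    counts = defaultdict(Counter)
    for row in rows:
        s, d = row[src], row[dst]
        if s is not None and d is not None:  # skip rows with a blank cell
            counts[s][d] += 1
    rules = {}
    for s, dst_counts in counts.items():
        if len(dst_counts) == 1 and sum(dst_counts.values()) >= min_support:
            rules[s] = next(iter(dst_counts))
    return rules


# Check every ordered pair of fields and print the candidate auto-fill rules.
for src in COLUMNS:
    for dst in COLUMNS:
        if src != dst:
            rules = value_rules(rows, src, dst)
            if rules:
                print(f"{src} -> {dst}: {rules}")
```

The min_support threshold is there so that a value seen only once (such as Color Blue here) does not produce a rule. With 50+ fields the same pairwise loop applies, though the threshold would need to be chosen with the real data size in mind.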




--
Sent from: http://r.789695.n4.nabble.com/datatable-help-f2315188.html

