Thursday 10 January 2019

Pig: more functions



Step 1: Create a file named "my-friend" with comma-separated rows (sno, name, dob) such as:
1,abc,1892
2,xyz,1894
3,apple,1896
4,orange,1892
Step 2: Upload the file to HDFS (for example, with the HDFS file browser) so that it is available at /user/cloudera/my-friend.
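Step 3: grunt> log = LOAD '/user/cloudera/my-friend' USING PigStorage(',') AS (sno,name,dob);
(PigStorage(',') is needed because the rows are comma-separated; the default field delimiter is a tab.)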
Step 4: grunt> A = FOREACH log GENERATE sno,name,dob;
Step 5: grunt> DESCRIBE A;
Step 6: grunt> B = GROUP A BY dob;
Step 7: grunt> DUMP B;
Result:
(1892,{(4,orange,1892),(1,abc,1892)})
(1894,{(2,xyz,1894)})
(1896,{(3,apple,1896)})
Step 8: grunt> cntd = FOREACH B GENERATE group, COUNT(A) AS cnt;
Step 9: grunt> STORE B INTO 'output';
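
To check what the GROUP and COUNT steps above should produce without a cluster, here is a minimal plain-Python sketch. It is not Pig and not how Pig runs the job; it only imitates the semantics on the same four sample rows. The function names (load, group_by_dob, count_per_group) are made up for this illustration, and the rows are assumed to be comma-separated.

```python
import csv
from collections import defaultdict
from io import StringIO

# Same four rows as the my-friend file from Step 1.
DATA = """1,abc,1892
2,xyz,1894
3,apple,1896
4,orange,1892
"""

def load(text):
    """Rough stand-in for the LOAD step, assuming comma-separated fields."""
    return [tuple(row) for row in csv.reader(StringIO(text)) if row]

def group_by_dob(rows):
    """Rough stand-in for GROUP A BY dob: maps each dob to its bag of tuples."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[2]].append(row)  # field 2 is dob
    return dict(groups)

def count_per_group(groups):
    """Rough stand-in for FOREACH B GENERATE group, COUNT(A).

    Note: Pig's COUNT skips tuples whose first field is null;
    the sample data has no nulls, so len() gives the same answer here.
    """
    return {dob: len(bag) for dob, bag in groups.items()}

rows = load(DATA)
b = group_by_dob(rows)
cntd = count_per_group(b)

for dob, cnt in sorted(cntd.items()):
    print(f"({dob},{cnt})")
```

Running it prints (1892,2), (1894,1) and (1896,1), one per line. DUMP cntd in grunt would be expected to show the same pairs, though the order of rows in Pig output is not guaranteed.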
