Step 1:
create file “my-friend” with some rows like
1,abc,1892
2,xyz,1894
3,apple,1896
4,orange,1892
Step 2: upload to HDFS file browser
Step3: Grunt>
log = LOAD ‘/user/cloudera/my-friend’ AS (sno,name,dob);
Step 4: grunt> A = FOREACH log generate sno,name,dob;
Step 5: describe A;
Step 6: B = GROUP A BY dob;
Step 7: DUMP B;
Result:
(1892,{(4,orange,1892),(1,abc,1892)})
(1894,{(2,xyz,1894)})
(1896,{(3,apple,1896)})
Step 8: cntd = FOREACH B GENERATE group,COUNT(A) as
cnt;
Step 9: STORE
B INTO ‘ouput’;
No comments:
Post a Comment