
[Database] Item-Based Collaborative Filtering (Part 2)


Implementing item-based collaborative filtering with MapReduce:

 

The implementation chains several MapReduce jobs.

Initial data (userid,itemid,grade records):

u1,i101,5.0
u1,i102,3.0
u1,i103,2.5
u2,i101,2.0
u2,i102,2.5
u2,i103,5.0
u2,i104,2.0
u3,i101,2.0
u3,i104,4.0
u3,i105,4.5
u3,i107,5.0
u4,i101,5.0
u4,i103,3.0
u4,i104,4.5
u4,i106,4.0
u5,i101,4.0
u5,i102,3.0
u5,i103,2.0
u5,i104,4.0
u5,i105,3.5
u5,i106,4.0

 

job1: Build the user-item preference matrix

Input: the initial data

map:

key=userid

value=item:grade

reduce:

key=userid

value=item:grade,item:grade,... (all of that user's ratings joined into one line)

Result:

u1 i101:5.0,i102:3.0,i103:2.5
u2 i101:2.0,i102:2.5,i103:5.0,i104:2.0
u3 i107:5.0,i105:4.5,i104:4.0,i101:2.0
u4 i106:4.0,i103:3.0,i101:5.0,i104:4.5
u5 i104:4.0,i105:3.5,i106:4.0,i101:4.0,i102:3.0,i103:2.0
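The grouping done by job1 can be sketched in plain Java without Hadoop. This is only an illustration of the map and reduce logic described above: the class name Job1Sketch, its method names, and the tab between userid and the ratings are assumptions, not the author's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of job1 (no Hadoop); all names here are illustrative.
class Job1Sketch {
    // Map + shuffle: key each record by userid and collect its item:grade pairs.
    static Map<String, List<String>> groupByUser(List<String> records) {
        Map<String, List<String>> byUser = new TreeMap<>();
        for (String record : records) {
            String[] f = record.split(",");   // userid,itemid,grade
            byUser.computeIfAbsent(f[0], k -> new ArrayList<>()).add(f[1] + ":" + f[2]);
        }
        return byUser;
    }

    // Reduce: join one user's pairs into a single output line (tab separator assumed).
    static String format(String user, List<String> prefs) {
        return user + "\t" + String.join(",", prefs);
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "u1,i101,5.0", "u1,i102,3.0", "u1,i103,2.5", "u2,i101,2.0");
        groupByUser(records).forEach((u, p) -> System.out.println(format(u, p)));
    }
}
```

In a real job the shuffle phase does the grouping, so the order of items within a line is not guaranteed, which matches the unordered item lists in the result above.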

 

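job2: Build the item-item co-occurrence matrix

Input: the output of job1

map:

For each user, emit every ordered pair of that user's items, including each item paired with itself. For example, a user who rated i101, i102 and i103 yields i101:i101, i101:i102, i101:i103, i102:i101, and so on.

key=item:item

value=1

reduce:

key=item:item

value=n (the number of users who rated both items)

Result: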

i101:i104 4
i101:i105 2
i101:i106 2
i101:i107 1
i102:i101 3
i102:i102 3
i102:i103 3
i102:i104 2
i102:i105 1
i102:i106 1
i103:i101 4
i103:i102 3
i103:i103 4
i103:i104 3
i103:i105 1
i103:i106 2
i104:i101 4
i104:i102 2
i104:i103 3
i104:i104 4
i104:i105 2
i104:i106 2
i104:i107 1
i105:i101 2
i105:i102 1
i105:i103 1
i105:i104 2
i105:i105 2
i105:i106 1
i105:i107 1
i106:i101 2
i106:i102 1
i106:i103 2
i106:i104 2
i106:i105 1
i106:i106 2
i107:i101 1
i107:i104 1
i107:i105 1
i107:i107 1
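The pair counting in job2 can be sketched in plain Java as follows. It assumes, as the result above suggests, that every ordered pair is counted, including an item paired with itself. Job2Sketch and its method name are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of job2 (no Hadoop); all names here are illustrative.
class Job2Sketch {
    // For each user's item list, emit every ordered pair with value 1 and sum per pair.
    static Map<String, Integer> cooccurrence(List<List<String>> itemsPerUser) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> items : itemsPerUser) {
            for (String a : items) {
                for (String b : items) {
                    counts.merge(a + ":" + b, 1, Integer::sum);   // map emits 1, reduce adds
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Item lists of u1..u5, taken from the job1 result.
        List<List<String>> itemsPerUser = Arrays.asList(
                Arrays.asList("i101", "i102", "i103"),
                Arrays.asList("i101", "i102", "i103", "i104"),
                Arrays.asList("i107", "i105", "i104", "i101"),
                Arrays.asList("i106", "i103", "i101", "i104"),
                Arrays.asList("i104", "i105", "i106", "i101", "i102", "i103"));
        cooccurrence(itemsPerUser).forEach((pair, n) -> System.out.println(pair + " " + n));
    }
}
```

Because both orderings of a pair are emitted, the matrix is symmetric: i101:i104 and i104:i101 both come out as 4.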

 

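job3: Multiply the co-occurrence matrix by the user preference matrix

Input: the outputs of job1 and job2

map:

The two inputs need different handling, so the mapper tells them apart by the directory the input file is in: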

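// The parent directory of the current input split identifies which job produced the record.
// FileSplit is org.apache.hadoop.mapreduce.lib.input.FileSplit; dirName is assumed to be a String field of the mapper.
FileSplit split = (FileSplit) context.getInputSplit();
dirName = split.getPath().getParent().getName();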

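job1 records after the map step (B marks a user rating, written as user,grade):

i101 B:u1,5.0
i102 B:u1,3.0
i103 B:u1,2.5

job2 records after the map step (A marks a co-occurrence count, written as item,count):

i101 A:i101,5

key=item

value=B:u1,5.0 or A:i101,5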
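The tagging step of the job3 mapper can be sketched in plain Java. The directory names "job1" and "job2", the whitespace separator between key and value, and the class name Job3MapSketch are assumptions made for illustration. The sketch also assumes the first item of each item:item pair becomes the key, which makes no difference to the result because the co-occurrence matrix is symmetric.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the job3 map step (no Hadoop); all names here are illustrative.
class Job3MapSketch {
    // Re-key one input line by item and tag it with its origin: B = rating, A = co-occurrence.
    static List<String> map(String dirName, String line) {
        String[] kv = line.split("\\s+", 2);          // key, then the rest of the line
        List<String> out = new ArrayList<>();
        if (dirName.equals("job1")) {                 // hypothetical directory name
            // "u1 <ws> i101:5.0,i102:3.0" becomes one B record per rated item.
            for (String pref : kv[1].split(",")) {
                String[] p = pref.split(":");
                out.add(p[0] + "\tB:" + kv[0] + "," + p[1]);
            }
        } else {
            // "i101:i105 <ws> 2" becomes a single A record keyed by the first item.
            String[] pair = kv[0].split(":");
            out.add(pair[0] + "\tA:" + pair[1] + "," + kv[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        map("job1", "u1\ti101:5.0,i102:3.0,i103:2.5").forEach(System.out::println);
        map("job2", "i101:i101\t5").forEach(System.out::println);
    }
}
```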

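reduce:

For each item key, multiply every A record (co-occurrence count) by every B record (user rating).

key=user

value=item,score

For example, under key i101, A:i105,2 and B:u2,2.0 produce:

u2      i105,4.0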

Result (each score printed below is a running total within one item's reduce group rather than a single product; for example the second line, u1 i105,14.0, is 4.0 + 5.0 × 2, so the individual product is the difference between successive lines):

u2 i105,4.0
u1 i105,14.0
u4 i105,24.0
u3 i105,28.0
u5 i105,36.0
u2 i104,44.0
u1 i104,64.0
u4 i104,84.0
u3 i104,92.0
u5 i104,108.0
u2 i107,110.0
u1 i107,115.0
u4 i107,120.0
u3 i107,122.0
u5 i107,126.0
u2 i106,130.0
u1 i106,140.0
u4 i106,150.0
u3 i106,154.0
u5 i106,162.0
u2 i101,172.0
u1 i101,197.0
u4 i101,222.0
u3 i101,232.0
u5 i101,252.0
u2 i103,260.0
u1 i103,280.0
u4 i103,300.0
u3 i103,308.0
u5 i103,324.0
u2 i102,330.0
u1 i102,345.0
u4 i102,360.0
u3 i102,366.0
u5 i102,378.0
u2 i105,2.5
u1 i105,5.5
u5 i105,8.5
u2 i104,13.5
u1 i104,19.5
u5 i104,25.5
u2 i106,28.0
u1 i106,31.0
u5 i106,34.0
u2 i101,41.5
u1 i101,50.5
u5 i101,59.5
u2 i103,67.0
u1 i103,76.0
u5 i103,85.0
u2 i102,92.5
u1 i102,101.5
u5 i102,110.5
u2 i105,5.0
u1 i105,7.5
u4 i105,10.5
u5 i105,12.5
u2 i104,27.5
u1 i104,35.0
u4 i104,44.0
u5 i104,50.0
u2 i106,60.0
u1 i106,65.0
u4 i106,71.0
u5 i106,75.0
u2 i101,95.0
u1 i101,105.0
u4 i101,117.0
u5 i101,125.0
u2 i103,145.0
u1 i103,155.0
u4 i103,167.0
u5 i103,175.0
u2 i102,190.0
u1 i102,197.5
u4 i102,206.5
u5 i102,212.5
u2 i105,4.0
u4 i105,13.0
u3 i105,21.0
u5 i105,29.0
u2 i104,37.0
u4 i104,55.0
u3 i104,71.0
u5 i104,87.0
u2 i107,89.0
u4 i107,93.5
u3 i107,97.5
u5 i107,101.5
u2 i106,105.5
u4 i106,114.5
u3 i106,122.5
u5 i106,130.5
u2 i101,138.5
u4 i101,156.5
u3 i101,172.5
u5 i101,188.5
u2 i103,194.5
u4 i103,208.0
u3 i103,220.0
u5 i103,232.0
u2 i102,236.0
u4 i102,245.0
u3 i102,253.0
u5 i102,261.0
u3 i105,9.0
u5 i105,16.0
u3 i104,25.0
u5 i104,32.0
u3 i107,36.5
u5 i107,40.0
u3 i106,44.5
u5 i106,48.0
u3 i101,57.0
u5 i101,64.0
u3 i103,68.5
u5 i103,72.0
u3 i102,76.5
u5 i102,80.0
u4 i105,4.0
u5 i105,8.0
u4 i104,16.0
u5 i104,24.0
u4 i106,32.0
u5 i106,40.0
u4 i101,48.0
u5 i101,56.0
u4 i103,64.0
u5 i103,72.0
u4 i102,76.0
u5 i102,80.0
u3 i105,5.0
u3 i104,10.0
u3 i107,15.0
u3 i101,20.0
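The job3 reduce step for a single item key can be sketched in plain Java. It takes the tagged values collected for one item and emits one user / item,score line per (A, B) pair, where each score is the product for that pair alone. Job3Sketch and its method name are hypothetical, and the tab in the output is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the job3 reduce step (no Hadoop); all names here are illustrative.
class Job3Sketch {
    // values: every "A:item,count" and "B:user,grade" record that shares one item key.
    static List<String> reduce(List<String> values) {
        List<String[]> coCounts = new ArrayList<>();   // parsed A records: {item, count}
        List<String[]> ratings = new ArrayList<>();    // parsed B records: {user, grade}
        for (String v : values) {
            String[] f = v.substring(2).split(",");    // drop the "A:" or "B:" tag
            if (v.startsWith("A:")) {
                coCounts.add(f);
            } else {
                ratings.add(f);
            }
        }
        List<String> out = new ArrayList<>();
        for (String[] co : coCounts) {
            for (String[] r : ratings) {
                // A fresh product per pair; nothing is carried over between pairs.
                double score = Integer.parseInt(co[1]) * Double.parseDouble(r[1]);
                out.add(r[0] + "\t" + co[0] + "," + score);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Part of the group for key i101: co-occurrence with i105 and two user ratings.
        reduce(Arrays.asList("A:i105,2", "B:u2,2.0", "B:u1,5.0")).forEach(System.out::println);
    }
}
```

Buffering the A and B records in memory is the simplest way to pair them up; for items with very many ratings a secondary sort that delivers A records first would avoid holding both lists.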

 

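Job4: Sum the partial products of the matrix multiplication

map:

No special processing; each record is passed through unchanged.

key:user

value:item,score

reduce:

Add up the score values that share the same user and item.

key:user

value:item,score

Result: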

u1 i105:15.5
u1 i104:33.5
u1 i107:5.0
u1 i106:18.0
u1 i101:44.0
u1 i103:39.0
u1 i102:31.5
u2 i105:15.5
u2 i104:36.0
u2 i107:4.0
u2 i106:20.5
u2 i101:45.5
u2 i103:41.5
u2 i102:32.5
u3 i105:26.0
u3 i104:38.0
u3 i107:15.5
u3 i106:16.5
u3 i101:40.0
u3 i103:24.5
u3 i102:18.5
u4 i105:26.0
u4 i104:55.0
u4 i107:9.5
u4 i106:33.0
u4 i101:63.0
u4 i103:53.5
u4 i102:37.0
u5 i105:32.0
u5 i104:59.0
u5 i107:11.5
u5 i106:34.5
u5 i101:68.0
u5 i103:56.5
u5 i102:42.5

This result is each user's preference score for every item.
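The summation in Job4 can be sketched in plain Java. It assumes the input lines carry one product per (user, item) contribution in the user / item,score format described for job3. Job4Sketch and its method name are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of Job4 (no Hadoop); all names here are illustrative.
class Job4Sketch {
    // Add up the scores that share the same user and item.
    static Map<String, Double> sum(List<String> lines) {
        Map<String, Double> totals = new TreeMap<>();
        for (String line : lines) {
            String[] kv = line.split("\\s+", 2);       // user, then "item,score"
            String[] itemScore = kv[1].split(",");
            totals.merge(kv[0] + "\t" + itemScore[0],
                    Double.parseDouble(itemScore[1]), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // u1's three contributions to i105: 5.0 x 2 (via i101), 3.0 x 1 (via i102), 2.5 x 1 (via i103).
        List<String> partials = Arrays.asList("u1\ti105,10.0", "u1\ti105,3.0", "u1\ti105,2.5");
        sum(partials).forEach((k, v) -> System.out.println(k + ":" + v));
    }
}
```

The three contributions add up to 15.5, which agrees with the u1 i105:15.5 entry in the Job4 result.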