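Some ideas for article relevance matching: "compress" the corpus, i.e. extract feature words or term frequencies, quantize them, and store them in the database as "column vectors"; then build vector groups from the top N words for querying, i.e. form every combination of 1 to N words, quantize each one, and store them in the database as "row vectors" (MySQL at the moment). To compute or look up similarity, first extract the features, quantize them, and then query the corresponding Long-typed numeric fields, which should be somewhat faster than an ordinary query.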
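The post does not say how a word combination is turned into a Long value. The following is a minimal sketch of one possible reading, assuming that a combination is quantized by sorting its words (so word order does not matter), joining them with `|`, and taking a 64-bit FNV-1a hash as the Long key. `FeatureKeys`, `Quantize`, `Combinations` and `BuildKeys` are hypothetical names for illustration, not the author's `GetWordSecurity`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helpers (not the author's GetWordSecurity): turn feature-word
// combinations into Long keys that could be stored in a numeric DB column.
public static class FeatureKeys
{
    // 64-bit FNV-1a hash of the UTF-8 bytes (assumed quantization scheme).
    public static long Fnv1a64(string text)
    {
        const ulong offsetBasis = 14695981039346656037UL;
        const ulong prime = 1099511628211UL;
        ulong hash = offsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(text))
        {
            unchecked
            {
                hash ^= b;
                hash *= prime; // wraps modulo 2^64, as FNV-1a expects
            }
        }
        return unchecked((long)hash);
    }

    // Quantize one combination; sorting first makes {A,B} and {B,A} share a key.
    public static long Quantize(IEnumerable<string> combination) =>
        Fnv1a64(string.Join("|", combination.OrderBy(w => w, StringComparer.Ordinal)));

    // Every k-word combination of items (each yielded list is a fresh object).
    public static IEnumerable<List<string>> Combinations(IReadOnlyList<string> items, int k, int start = 0)
    {
        if (k == 0)
        {
            yield return new List<string>();
            yield break;
        }
        for (int i = start; i <= items.Count - k; i++)
        {
            foreach (List<string> rest in Combinations(items, k - 1, i + 1))
            {
                rest.Insert(0, items[i]);
                yield return rest;
            }
        }
    }

    // Keys for all combinations of 1..maxSize words, grouped by combination size.
    public static Dictionary<int, HashSet<long>> BuildKeys(IReadOnlyList<string> words, int maxSize)
    {
        var keysBySize = new Dictionary<int, HashSet<long>>();
        for (int k = 1; k <= Math.Min(maxSize, words.Count); k++)
        {
            keysBySize[k] = new HashSet<long>(Combinations(words, k).Select(c => Quantize(c)));
        }
        return keysBySize;
    }
}
```

For seven feature words, `BuildKeys(words, 7)` produces 7, 21, 35, 35, 21, 7 and 1 keys for sizes 1 to 7 (barring hash collisions), 2^N - 1 = 127 in total, so N has to stay small for the "1 to N word" combinations to remain manageable.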
Application example: [all of this is conjecture; I hope it will give fairly good results in practice]
Suppose we query the following features:
```csharp
// Query feature words; the int value is set to 1 for every word
Dictionary<string, int> words = new Dictionary<string, int>();
words.Add("五笔", 1);
words.Add("拼音", 1);
words.Add("笔画", 1);
words.Add("其它", 1);
words.Add("英盘", 1);
words.Add("美盘", 1);
words.Add("法盘", 1);

// GetWordSecurity is the author's own helper; its body is not shown in the original.
//List<Dictionary<int, long>> WordList = new List<Dictionary<int, long>>();
//for (int i = 0; i < 15; i++)
//{
//    WordList.Add(GetWordSecurity(words, i + 1));
//}

// Inspect the data directly, one variable per call
Dictionary<int, long> R1 = GetWordSecurity(words, 1);
Dictionary<int, long> R2 = GetWordSecurity(words, 2);
Dictionary<int, long> R3 = GetWordSecurity(words, 3);
Dictionary<int, long> R4 = GetWordSecurity(words, 4);
Dictionary<int, long> R5 = GetWordSecurity(words, 5);
Dictionary<int, long> R6 = GetWordSecurity(words, 6);
Dictionary<int, long> R7 = GetWordSecurity(words, 7);
Dictionary<int, long> R8 = GetWordSecurity(words, 8);
Dictionary<int, long> R9 = GetWordSecurity(words, 9);
Dictionary<int, long> R10 = GetWordSecurity(words, 10);
Dictionary<int, long> R11 = GetWordSecurity(words, 11);
Dictionary<int, long> R12 = GetWordSecurity(words, 12);
Dictionary<int, long> R13 = GetWordSecurity(words, 13);
Dictionary<int, long> R14 = GetWordSecurity(words, 14);
```
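How the quantized keys of a query are then matched against stored articles is not spelled out either. A minimal in-memory sketch, assuming each article's keys are kept in a key-to-postings map (standing in for an indexed BIGINT column in MySQL) and that an article's score is the sum of the combination sizes of the keys it shares with the query, so sharing a multi-word combination counts more than sharing a single word. `KeyIndex`, `Add` and `Query` are hypothetical names, and the weighting is an assumption, not the author's method.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the MySQL lookup on Long fields.
public class KeyIndex
{
    // key -> every (article, combination size) that produced it;
    // plays the role of an indexed BIGINT column.
    private readonly Dictionary<long, List<(int ArticleId, int Size)>> postings =
        new Dictionary<long, List<(int ArticleId, int Size)>>();

    // Register an article's keys, grouped by combination size.
    public void Add(int articleId, Dictionary<int, HashSet<long>> keysBySize)
    {
        foreach (KeyValuePair<int, HashSet<long>> group in keysBySize)
        {
            foreach (long key in group.Value)
            {
                if (!postings.TryGetValue(key, out List<(int ArticleId, int Size)> list))
                {
                    list = new List<(int ArticleId, int Size)>();
                    postings[key] = list;
                }
                list.Add((articleId, group.Key));
            }
        }
    }

    // Score = sum of combination sizes of shared keys (assumed weighting),
    // so a shared 3-word combination outweighs a shared single word.
    public List<(int ArticleId, int Score)> Query(IEnumerable<long> queryKeys, int top)
    {
        var scores = new Dictionary<int, int>();
        foreach (long key in queryKeys.Distinct())
        {
            if (!postings.TryGetValue(key, out List<(int ArticleId, int Size)> hits))
                continue;
            foreach (var hit in hits)
            {
                scores.TryGetValue(hit.ArticleId, out int current);
                scores[hit.ArticleId] = current + hit.Size;
            }
        }
        return scores
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key)
            .Select(p => (ArticleId: p.Key, Score: p.Value))
            .Take(top)
            .ToList();
    }
}
```

In a MySQL-backed version the dictionary lookup would presumably become an indexed `WHERE key_col IN (...)` query over the stored Long keys, with the scoring done either in SQL (`GROUP BY`) or in application code.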
Original title: From similarity to big-data lookup with Mysql: some ideas for article matching and faster queries
Keywords: MYSQL
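*Special notice: the content above was collected from the web and its copyright belongs to the original author. In case of infringement, please contact us at admin#shaoqun.com (replace # with @).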