
From Similarity to Big-Data Lookup with MySQL: Some Ideas for Article Matching and for Speeding Up Queries


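Some ideas for matching article relevance:

1. "Compress" the corpus. Extract the feature words or word frequencies, quantize them, and store them in the database as a "column vector".
2. Join the first N groups of words into a vector group for querying. That means forming every combination of 1 to N words, quantizing each combination, and storing it in the database as a "row vector" (MySQL for now).
3. To compute or query similarity, first extract the features, then quantize them, then query the Long-valued numeric fields.

This should be somewhat faster than an ordinary query.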
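The lookup side of this idea can be sketched as follows. This is a minimal sketch under my own assumptions. It uses a hypothetical MySQL table `article_feature` with one row per (article, quantized feature) pair and an index on the BIGINT `feature` column, rather than the column/row vector layout described above, which the article does not spell out. The table, column, and `FeatureQuery.Build` names are illustrative, not the author's. The sketch only builds the SQL text and parameter names, and leaves binding the values to whatever MySQL client library is in use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical schema (an assumption, not from the original article):
//   CREATE TABLE article_feature (
//     article_id BIGINT NOT NULL,
//     feature    BIGINT NOT NULL,   -- a quantized word or word combination
//     INDEX idx_feature (feature)
//   );
public static class FeatureQuery
{
    // Builds a parameterized SELECT that ranks articles by how many of the
    // query's quantized features they contain. Returns the SQL text plus a
    // map from parameter name to value for the client library to bind.
    public static (string Sql, Dictionary<string, long> Parameters) Build(IReadOnlyList<long> features, int limit)
    {
        if (features.Count == 0)
            throw new ArgumentException("At least one feature is required.", nameof(features));
        if (limit < 1)
            throw new ArgumentOutOfRangeException(nameof(limit));

        var parameters = new Dictionary<string, long>();
        var names = new List<string>();
        for (int i = 0; i < features.Count; i++)
        {
            string name = "@f" + i;          // one placeholder per feature value
            names.Add(name);
            parameters[name] = features[i];
        }

        string sql =
            "SELECT article_id, COUNT(*) AS hits FROM article_feature " +
            "WHERE feature IN (" + string.Join(", ", names) + ") " +
            "GROUP BY article_id ORDER BY hits DESC LIMIT " + limit;
        return (sql, parameters);
    }
}
```

Because the match is an integer equality test on an indexed column, MySQL can use the index instead of comparing text. That is the kind of speed-up the idea is counting on.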

Application example: [all of this is speculation; hopefully it gives good results in practice]

Suppose we query the following features:

```csharp
using System.Collections.Generic;

// Feature word -> weight (every word has weight 1 here)
Dictionary<string, int> words = new Dictionary<string, int>();
words.Add("五笔", 1);
words.Add("拼音", 1);
words.Add("笔画", 1);
words.Add("其它", 1);
words.Add("英盘", 1);
words.Add("美盘", 1);
words.Add("法盘", 1);

//List<Dictionary<int, long>> WordList = new List<Dictionary<int, long>>();
//for (int i = 0; i < 15; i++)
//{
//    WordList.Add(GetWordSecurity(words, i + 1));
//}

// Inspect the data directly.
// GetWordSecurity is the author's own helper; it is not shown in the article.
Dictionary<int, long> R1 = GetWordSecurity(words, 1);
Dictionary<int, long> R2 = GetWordSecurity(words, 2);
Dictionary<int, long> R3 = GetWordSecurity(words, 3);
Dictionary<int, long> R4 = GetWordSecurity(words, 4);
Dictionary<int, long> R5 = GetWordSecurity(words, 5);
Dictionary<int, long> R6 = GetWordSecurity(words, 6);
Dictionary<int, long> R7 = GetWordSecurity(words, 7);
Dictionary<int, long> R8 = GetWordSecurity(words, 8);
Dictionary<int, long> R9 = GetWordSecurity(words, 9);
Dictionary<int, long> R10 = GetWordSecurity(words, 10);
Dictionary<int, long> R11 = GetWordSecurity(words, 11);
Dictionary<int, long> R12 = GetWordSecurity(words, 12);
Dictionary<int, long> R13 = GetWordSecurity(words, 13);
Dictionary<int, long> R14 = GetWordSecurity(words, 14);
```
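The article does not show `GetWordSecurity` or say exactly how the combinations of 1 to N words are formed. The sketch below is a hypothetical stand-in: `WordCombinations.Combine` is my name, not the author's. It takes the quantized code of each word in input order and assumes a combination is a contiguous window of n words. Each window is collapsed into one `long` with an order-sensitive polynomial combine, and the results are keyed 1..count to match the key layout of the R1 output in the article. Under this assumption, a window longer than the word list yields an empty result. Other readings of "combination", such as all subsets of size n, would need a different loop.

```csharp
using System;
using System.Collections.Generic;

public static class WordCombinations
{
    // Hypothetical stand-in for GetWordSecurity(words, n): one combined code
    // per contiguous window of n word codes, keyed 1..count.
    public static Dictionary<int, long> Combine(IReadOnlyList<long> codes, int n)
    {
        if (n < 1)
            throw new ArgumentOutOfRangeException(nameof(n));

        var result = new Dictionary<int, long>();
        for (int start = 0; start + n <= codes.Count; start++)
        {
            long h = 17;                                 // arbitrary non-zero seed
            for (int k = start; k < start + n; k++)
                h = unchecked(h * 31 + codes[k]);        // order-sensitive polynomial combine, wraps on overflow
            result[start + 1] = h;                       // 1-based key, like R1 in the article
        }
        return result;                                   // empty when n exceeds the number of words
    }
}
```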

 

Quantized data (any quantization method will do):

```
五笔  -8683246507546018072
拼音   5720075168044685354
笔画   6444854990336207024
其它  -4797408270696495584
英盘  -1741849883950345011
美盘   4116094244106799890
法盘   5071717547464226258
```
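The article leaves the quantization method open and does not say how the numbers above were produced. As one possible choice, the sketch below maps each word to a 64-bit FNV-1a hash of its UTF-8 bytes, reinterpreted as a signed `long` so it fits a MySQL `BIGINT` column. FNV-1a is my substitution, so its output will not match the values listed above. `WordQuantizer` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text;

public static class WordQuantizer
{
    private const ulong FnvOffsetBasis = 14695981039346656037UL;
    private const ulong FnvPrime = 1099511628211UL;

    // FNV-1a (64-bit) over the word's UTF-8 bytes: XOR each byte in, then multiply.
    public static long Quantize(string word)
    {
        ulong hash = FnvOffsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(word))
        {
            hash ^= b;
            hash = unchecked(hash * FnvPrime);
        }
        return unchecked((long)hash); // reinterpret the bits as signed to fit BIGINT
    }

    // Quantizes every word, e.g. the keys of the `words` dictionary.
    public static Dictionary<string, long> QuantizeAll(IEnumerable<string> words)
    {
        var result = new Dictionary<string, long>();
        foreach (string w in words)
            result[w] = Quantize(w);
        return result;
    }
}
```

A stable hash matters here. In .NET Core and later, `string.GetHashCode()` is randomized per process, so its values cannot be stored and compared across runs.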

 


Query the following values:

Two-character words (debugger view; each entry is a `System.Collections.Generic.KeyValuePair<int,long>`):

```
Dictionary<int, long> R1 = GetWordSecurity(words, 1);
  [0] {[1, -2963171339501332718]}
  [1] {[2, -2238391517209811048]}
  [2] {[3, 4966089295467037960]}
  [3] {[4, -6281813915328659238]}
  [4] {[5, 922666897348189770]}
  [5] {[6, 3978225284094340343]}
  [6] {[7, -8610574661558066372]}
Dictionary<int, long> R2 = GetWordSecurity(words, 2);
```
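The article does not say how similarity is scored once the stored values are found. As an illustration only, the sketch below computes Jaccard similarity between a query's quantized features and those of a stored article. Jaccard similarity is the number of shared values divided by the number of distinct values in either set. The choice of Jaccard and the `FeatureSimilarity` name are my assumptions.

```csharp
using System.Collections.Generic;

public static class FeatureSimilarity
{
    // Jaccard similarity |A ∩ B| / |A ∪ B| over two sets of quantized features
    // (e.g. the values of R1, R2, ... for a query and for a stored article).
    // Returns a value in [0, 1]; two empty sets are treated as similarity 0.
    public static double Jaccard(IEnumerable<long> a, IEnumerable<long> b)
    {
        var setA = new HashSet<long>(a);
        var setB = new HashSet<long>(b);
        if (setA.Count + setB.Count == 0)
            return 0.0;

        int shared = 0;
        foreach (long x in setA)
            if (setB.Contains(x)) shared++;

        int union = setA.Count + setB.Count - shared;   // inclusion-exclusion
        return (double)shared / union;
    }
}
```

For instance, the feature sets {1, 2, 3} and {2, 3, 4} share two of four distinct values, which gives a similarity of 0.5.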