
[ASP.NET Tutorial] Part 2 of the AliExpress Product Scraping Series: Scraping Products in Practice


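    The previous article analyzed AliExpress product scraping: which product information to collect and how to collect it. This article puts that analysis into practice. The technologies involved were covered last time, so let's go straight to building the project.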

I. Create the solution and write the scraping code

1. Create a solution named "CollectorSolution" and add an empty ASP.NET MVC project named "Collector" to it. The resulting solution structure is shown in the screenshot.

2. In the "Collector" project, add a "CollectingController" controller and its associated view, and change the default route from Home -> Index to Collecting -> Index, as shown in the screenshot.

Change RouteConfig as follows:

    using System.Web.Mvc;
    using System.Web.Routing;

    namespace Collector
    {
        public class RouteConfig
        {
            public static void RegisterRoutes(RouteCollection routes)
            {
                routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

                routes.MapRoute(
                    name: "Default",
                    url: "{controller}/{action}/{id}",
                    defaults: new { controller = "Collecting", action = "Index", id = UrlParameter.Optional }
                );
            }
        }
    }

3. Add the view models "CollectionViewModel", "CollectedProductViewModel", and "CollectedProductImageViewModel", plus a struct that holds the regular expressions, "ParseProductPatterns". The code for each is as follows:

1.> CollectionViewModel

    using System.Collections.Generic;

    namespace Collector.Models
    {
        public class CollectionViewModel
        {
            public CollectionViewModel()
            {
                ProductViews = new List<CollectedProductViewModel>();
            }

            public string CollectionUrl { get; set; }
            public IEnumerable<CollectedProductViewModel> ProductViews { get; set; }
        }
    }

2.> CollectedProductViewModel

    using System.Collections.Generic;

    namespace Collector.Models
    {
        public class CollectedProductViewModel
        {
            public CollectedProductViewModel()
            {
                ProductImages = new List<CollectedProductImageViewModel>();
            }

            public string ProductName { get; set; }
            public decimal ProductPrice { get; set; }
            public decimal ProductDiscountPrice { get; set; }
            public string ProductCurrency { get; set; }
            public string ProductColor { get; set; }
            public string ProductSize { get; set; }
            public IEnumerable<CollectedProductImageViewModel> ProductImages { get; set; }
        }
    }

3.>CollectedProductImageViewModel

    namespace Collector.Models
    {
        public class CollectedProductImageViewModel
        {
            public string ImageUrl { get; set; }
            public int Sort { get; set; }
        }
    }

4.>ParseProductPatterns

    namespace Collector.Models
    {
        public struct ParseProductPatterns
        {
            public static string ProductNamePattern = "(?<=<h1 class=\"product-name\" itemprop=\"name\">).*?(?=</h1>)";
            public static string ProductJsnPattern = @"(?<=var skuProducts=).*?(?=;\s*var skuAttrIds=)";
            public static string ProductImageJsonPattern = "(?<=window.runParams.imageBigViewURL=).*?(?=;)";
            public static string ProductCurrencyPattern = "(?<=window.runParams.currencyCode=\").*?(?=\";)";
            public static string ProductColorPattern =
                "(?<=<a data-role=\"sku\" data-sku-id=\"{0}\" id=\"sku-1-{0}\" title=\").*?(?=\")";
            public static string ProductSizePattern =
                "(?<=<a data-role=\"sku\" data-sku-id=\"{0}\" id=\"sku-2-{0}\" href=\"javascript:void\\(0\\)\"\\s+><span>).*?(?=</)";
        }
    }

These are mostly self-explanatory, so I won't walk through them one by one.
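For readers less familiar with lookarounds, here is a minimal sketch of how two of these patterns behave. Each one places a lazy `.*?` between a lookbehind `(?<=...)` and a lookahead `(?=...)`, so `Match.Value` contains only the text between the two markers. The `PatternDemo` class, the `Extract` helper, and the HTML fragment are made up for illustration and are not taken from a real AliExpress page.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // Copies of two patterns from ParseProductPatterns.
    public const string ProductNamePattern =
        "(?<=<h1 class=\"product-name\" itemprop=\"name\">).*?(?=</h1>)";
    public const string ProductCurrencyPattern =
        "(?<=window.runParams.currencyCode=\").*?(?=\";)";

    // Hypothetical helper: returns the text between the lookbehind and
    // lookahead markers, or an empty string when nothing matches.
    public static string Extract(string pattern, string input)
    {
        var options = RegexOptions.IgnoreCase | RegexOptions.Singleline;
        return Regex.Match(input, pattern, options).Value;
    }

    public static void Main()
    {
        // Hypothetical page fragment, for illustration only.
        var html =
            "<h1 class=\"product-name\" itemprop=\"name\">Yoga Top</h1>" +
            "<script>window.runParams.currencyCode=\"USD\";</script>";

        Console.WriteLine(Extract(ProductNamePattern, html));     // Yoga Top
        Console.WriteLine(Extract(ProductCurrencyPattern, html)); // USD
        Console.WriteLine(Extract(ProductNamePattern, "").Length); // 0
    }
}
```

Because a failed match yields an empty string rather than null, any caller that passes the result to a JSON parser should check for an empty value first.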

4. The view layout is very simple, as shown in the screenshot.

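The scraping URL is an AliExpress product URL; store and category URLs are not supported here. The table displays the scraped product information.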

5. The controller and view code is as follows:

1.> CollectingController

    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Linq;
    using System.Text.RegularExpressions;
    using System.Web.Mvc;
    using Collector.Models;
    using Newtonsoft.Json.Linq;
    using RestSharp;

    namespace Collector.Controllers
    {
        public class CollectingController : Controller
        {
            // GET: Collecting
            public ActionResult Index()
            {
                return View();
            }

            [HttpPost]
            public ActionResult Index(CollectionViewModel collectionView)
            {
                collectionView = ColllectWithParse(collectionView);
                return View(collectionView);
            }

            public CollectionViewModel ColllectWithParse(CollectionViewModel collectionView)
            {
                if (collectionView == null || string.IsNullOrEmpty(collectionView.CollectionUrl))
                {
                    return collectionView;
                }
                var client = new RestClient(collectionView.CollectionUrl);
                var request = new RestRequest(Method.GET);
                var response = client.Execute(request);
                // A failed request may leave Content null; fall back to an empty page.
                var htmlContent = response.Content ?? string.Empty;
                collectionView.ProductViews = ParseProducts(htmlContent);
                return collectionView;
            }

            public IEnumerable<CollectedProductViewModel> ParseProducts(string productHtmlContent)
            {
                var productName = RegexMatchValue(ParseProductPatterns.ProductNamePattern, productHtmlContent);
                var productCuurency = RegexMatchValue(ParseProductPatterns.ProductCurrencyPattern, productHtmlContent);

                var productJson = RegexMatchValue(ParseProductPatterns.ProductJsnPattern, productHtmlContent);
                // JArray.Parse throws on empty input, so stop here when the page has no SKU data.
                if (string.IsNullOrWhiteSpace(productJson))
                {
                    return new List<CollectedProductViewModel>();
                }

                var prodctJsonArray = JArray.Parse(productJson);
                var products =
                    prodctJsonArray.Select(pja =>
                    {
                        var colorWithSizeCode = pja["skuPropIds"].ToString().Split(',');
                        var priceJson = pja["skuVal"];
                        var skuPrice = priceJson["skuPrice"];
                        var price = skuPrice == null ? "0" : skuPrice.ToString();
                        var actSkuPrice = priceJson["actSkuPrice"];
                        var discountPrice = actSkuPrice == null ? "0" : actSkuPrice.ToString();
                        return new
                        {
                            ColorCode = colorWithSizeCode.First(),
                            SizeCode = colorWithSizeCode.Last(),
                            // Parse with the invariant culture so "12.99" does not depend on server locale.
                            Price = Convert.ToDecimal(price, CultureInfo.InvariantCulture),
                            DiscountPrice = Convert.ToDecimal(discountPrice, CultureInfo.InvariantCulture),
                        };
                    }).ToList();

                var collectedImages = ParseProducImages(productHtmlContent);

                var collectedProducts = products.Select(p => new CollectedProductViewModel
                {
                    ProductName = productName,
                    ProductPrice = p.Price,
                    ProductDiscountPrice = p.DiscountPrice,
                    ProductCurrency = productCuurency,
                    ProductColor = SetProductColorWithSize(ParseProductPatterns.ProductColorPattern, p.ColorCode, productHtmlContent),
                    ProductSize = SetProductColorWithSize(ParseProductPatterns.ProductSizePattern, p.SizeCode, productHtmlContent),
                    ProductImages = collectedImages
                }).ToList();
                return collectedProducts;
            }

            private IEnumerable<CollectedProductImageViewModel> ParseProducImages(string productHtmlContent)
            {
                var imagesJson = RegexMatchValue(ParseProductPatterns.ProductImageJsonPattern, productHtmlContent);
                if (string.IsNullOrWhiteSpace(imagesJson))
                {
                    return new List<CollectedProductImageViewModel>();
                }
                var imageJsonArray = JArray.Parse(imagesJson);

                var images = imageJsonArray.ToObject<List<string>>();
                return images.Select((t, i) => new CollectedProductImageViewModel
                {
                    ImageUrl = t,
                    Sort = i
                });
            }

            private string SetProductColorWithSize(string pattern, string colorWithSizeCode, string input)
            {
                var newPattern = string.Format(pattern, colorWithSizeCode);
                return RegexMatchValue(newPattern, input);
            }

            private string RegexMatchValue(string pattern, string input, RegexOptions regexOptions = RegexOptions.IgnoreCase | RegexOptions.Singleline)
            {
                var regex = new Regex(pattern, regexOptions);
                var match = regex.Match(input);
                return match.Value;
            }
        }
    }
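The color and size lookups in the controller work by substituting a SKU code into the `{0}` placeholders of a pattern and then matching the result against the page. The sketch below shows that step in isolation. The `SkuLookupDemo` class, the `Lookup` helper, the SKU codes, and the HTML fragment are all hypothetical; the `Regex.Escape` call is an extra precaution that the controller itself does not apply.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SkuLookupDemo
{
    // Copies of the two SKU patterns from ParseProductPatterns.
    public const string ProductColorPattern =
        "(?<=<a data-role=\"sku\" data-sku-id=\"{0}\" id=\"sku-1-{0}\" title=\").*?(?=\")";
    public const string ProductSizePattern =
        "(?<=<a data-role=\"sku\" data-sku-id=\"{0}\" id=\"sku-2-{0}\" href=\"javascript:void\\(0\\)\"\\s+><span>).*?(?=</)";

    // Hypothetical page fragment with one color link and one size link.
    public const string SampleHtml =
        "<a data-role=\"sku\" data-sku-id=\"193\" id=\"sku-1-193\" title=\"Black\" href=\"javascript:void(0)\"></a>" +
        "<a data-role=\"sku\" data-sku-id=\"100014064\" id=\"sku-2-100014064\" href=\"javascript:void(0)\" ><span>M</span></a>";

    // Fills the {0} placeholders with the SKU code, then runs the match.
    public static string Lookup(string pattern, string skuCode, string html)
    {
        var skuPattern = string.Format(pattern, Regex.Escape(skuCode));
        var options = RegexOptions.IgnoreCase | RegexOptions.Singleline;
        return Regex.Match(html, skuPattern, options).Value;
    }

    public static void Main()
    {
        // Assumed shape of a skuPropIds value: "colorCode,sizeCode".
        var codes = "193,100014064".Split(',');

        Console.WriteLine(Lookup(ProductColorPattern, codes.First(), SampleHtml)); // Black
        Console.WriteLine(Lookup(ProductSizePattern, codes.Last(), SampleHtml));   // M
    }
}
```

One thing to keep in mind with this approach: `string.Format` treats every brace pair as a placeholder, so a pattern that also used a regex quantifier such as `{2,3}` would need its braces doubled. When a product has only one SKU property, `First()` and `Last()` return the same code, and the pattern for the missing property simply yields an empty string.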


2.> Collecting->Index 

    @model Collector.Models.CollectionViewModel
    <!DOCTYPE html>

    <html>
    <head>
        <meta name="viewport" content="width=device-width" />
        <title></title>
        <!-- CSS goes in the document HEAD or added to your external stylesheet -->
        <style type="text/css">
            table.gridtable {
                font-family: verdana,arial,sans-serif;
                font-size: 11px;
                color: #333333;
                border-width: 1px;
                border-color: #666666;
                border-collapse: collapse;
            }

            table.gridtable th {
                border-width: 1px;
                padding: 8px;
                border-style: solid;
                border-color: #666666;
                background-color: #dedede;
            }

            table.gridtable td {
                border-width: 1px;
                padding: 8px;
                border-style: solid;
                border-color: #666666;
                background-color: #ffffff;
            }
        </style>
    </head>
    <body>
        <div>
            @using (Html.BeginForm("Index", "Collecting", FormMethod.Post))
            {
                <table>
                    <tr>
                        <td>Product URL:</td>
                        <td>
                            @Html.TextAreaFor(m => m.CollectionUrl, 4, 0, new { style = "width:1500px;" })
                        </td>
                    </tr>
                    <tr><td colspan="2" style="text-align: right;"><input type="submit" value="Start scraping" /></td></tr>
                </table>
            }
        </div>
        <div>
            <table class="gridtable">
                <thead>
                    <tr>
                        <th width="5%">No.</th>
                        <th width="5%">Image</th>
                        <th width="30%">Product name</th>

                        <th width="10%">Unit price</th>
                        <th width="10%">Reference price</th>
                        <th width="10%">Currency</th>
                        <th width="10%">Color</th>
                        <th width="10%">Size</th>
                    </tr>
                </thead>
                <tbody>
                    @if (Model != null && Model.ProductViews != null)
                    {
                        // Rows are rendered only when there is data; a bare "return" here
                        // would stop rendering before the closing tags are written.
                        var i = 0;
                        foreach (var collectedProduct in Model.ProductViews)
                        {
                            i++;
                            var firstImage = collectedProduct.ProductImages.FirstOrDefault();
                            var firstImageUrl = firstImage == null ? null : firstImage.ImageUrl;
                            <tr>
                                <td align="center">@i</td>
                                <td><img src="@firstImageUrl" width="60" height="60" /></td>
                                <td>@collectedProduct.ProductName</td>
                                <td>@collectedProduct.ProductDiscountPrice</td>
                                <td>@collectedProduct.ProductPrice</td>
                                <td>@collectedProduct.ProductCurrency</td>
                                <td>@collectedProduct.ProductColor</td>
                                <td>@collectedProduct.ProductSize</td>
                            </tr>
                        }
                    }
                </tbody>
            </table>
        </div>
    </body>
    </html>


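Note that this article shows only the tip of the iceberg when it comes to scraping, so I kept the example simple and did not encapsulate things strictly, on either the front end or the back end; please bear that in mind. Also, I am not fond of adding comments to code; in my view, the code is the comment.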

II. Test results: deploy the MVC project to IIS on port 1005 and see how it runs

1. Test with the AliExpress product URL from the previous article:

http://www.aliexpress.com/store/product/Yoga-Tops-Women-Women-Yoga-Shirts-Womens-Sportswear-Gym-Woman-Running-Shirt-Camisetas-Deporte-Mujer-Gym/1025110_32620359354.html?spm=a2g01.8032156.template-section-container.27.wcM8ES&sdom=3514.555719.493653.0_32620359354

The result is shown in the screenshot.

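Scraping it just now, I found that AliExpress no longer discounts the product from the previous article, so there is no discount price.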

2. Scrape one more URL:

http://www.aliexpress.com/store/product/LEVEL-4-shock-Professional-running-intensive-training-without-rims-snow-sports-bra-open-front-zipper-style/1025110_32357688343.html?spm=2114.12010108.1000013.1.uvJqBj

The result is shown in the screenshot.

This product has many variants, so they do not all fit on a single screen.

Source code: https://github.com/haibozhou1011/Collector

Summary:

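That wraps up the AliExpress product scraping series. Overall, the techniques used in scraping are ones most of us work with regularly; the real effort goes into the up-front analysis and working out the rules for extracting product information. Every site has its own patterns, and if you look carefully you will find clues that lead to a breakthrough. If you have a better approach to scraping, please share it with everyone.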