
An Objective-C library for parsing HTML data (scraping web page data)


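  TFHpple (from the hpple library) is a third-party library for parsing HTML data. In my experience it works reasonably well, but the project has to be configured before you can use it.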

  

  Configuration

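1. Add the library. hpple is a wrapper around libxml2, so link libxml2 (libxml2.dylib, or libxml2.tbd in newer Xcode versions) to your target.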

 

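2. Set the header search path: add $(SDKROOT)/usr/include/libxml2 to the target's Header Search Paths build setting so the libxml2 headers are found at compile time.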

   Usage

The following example uses this page:

http://so.gushiwen.org/guwen/book_2.aspx

 

 1. Create a TFHpple object, where data is the data returned by the website:

TFHpple *htmlParser = [[TFHpple alloc] initWithHTMLData:data];
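The step above leaves open where data comes from. A minimal sketch of two options, assuming the hpple sources (TFHpple.h) are in your project: parse an in-memory string, which is handy for testing, or download the page. Both function names are hypothetical. The string variant assumes your hpple version provides initWithHTMLData:encoding:; passing the charset explicitly can help avoid garbled Chinese text when the HTML carries no charset declaration.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: parse HTML held in an NSString (useful for tests).
// The string is encoded as UTF-8, so the parser is told to decode UTF-8.
TFHpple *ParserFromHTMLString(NSString *html) {
    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    return [[TFHpple alloc] initWithHTMLData:data encoding:@"UTF-8"];
}

// Hypothetical helper: download a page and parse it.
// dataWithContentsOfURL: blocks the calling thread, so call this off the
// main thread (or fetch with NSURLSession and parse in its completion handler).
TFHpple *ParserFromURLString(NSString *urlString) {
    NSURL *url = [NSURL URLWithString:urlString];
    NSData *data = url ? [NSData dataWithContentsOfURL:url] : nil;
    return data ? [[TFHpple alloc] initWithHTMLData:data] : nil;
}
```

For example, ParserFromURLString(@"http://so.gushiwen.org/guwen/book_2.aspx") fetches the page used in this post, provided it is still online; it returns nil if the download fails.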

 

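 2. Call searchWithXPathQuery: to extract the data you need. (XPath syntax is not covered here; look it up separately if you are unfamiliar with it.)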

NSArray *elements = [htmlParser searchWithXPathQuery:@"//div[@class='shileft']/div[@class='bookcont']"];

This gives us the table-of-contents data for the Analects (论语).
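The query can be tried without a network connection. The sketch below runs the same XPath against a small hypothetical HTML snippet that only imitates the page's structure (as seen in the raw output later in this post). FindBookContents and kSampleHTML are illustrative names, and initWithHTMLData:encoding: is assumed to exist in your hpple version.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical sample imitating the page: a div.bookcont inside a div.shileft.
static NSString *const kSampleHTML =
    @"<html><body><div class=\"shileft\">"
    @"<div class=\"bookcont\"><ul>"
    @"<span><a href=\"/guwen/bookv_19.aspx\">学而篇</a></span>"
    @"<span><a href=\"/guwen/bookv_20.aspx\">为政篇</a></span>"
    @"</ul></div>"
    @"</div></body></html>";

// Runs the query from step 2: every <div class="bookcont"> that is a direct
// child of a <div class="shileft">. Returns an array of TFHppleElement.
NSArray *FindBookContents(NSData *htmlData) {
    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:htmlData encoding:@"UTF-8"];
    return [parser searchWithXPathQuery:@"//div[@class='shileft']/div[@class='bookcont']"];
}
```

Note that @class='bookcont' compares the whole attribute value, so it would miss an element with class="bookcont wide". In XPath 1.0, contains(concat(' ', normalize-space(@class), ' '), ' bookcont ') also matches elements that carry several classes.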

 

3. Get an element and inspect it:

TFHppleElement *element = [elements firstObject]; // first match, or nil if the query found nothing
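To look at every matching element rather than a single one, a for-in loop avoids index bookkeeping. A minimal sketch; LogElements is a hypothetical name, and it simply logs the properties described in this section.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: log the main TFHppleElement properties of each match
// and return how many elements were visited.
NSUInteger LogElements(NSArray *elements) {
    for (TFHppleElement *element in elements) {
        NSLog(@"tagName: %@", element.tagName);
        NSLog(@"attributes: %@", element.attributes);
        NSLog(@"content: %@", element.content);
        NSLog(@"raw: %@", element.raw);
    }
    return elements.count;
}
```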

 A TFHppleElement has a number of properties; each is briefly described below.

1. raw

@property (nonatomic, copy, readonly) NSString *raw;

 raw is the element's markup, HTML tags included:

<div class="bookcont">
    <ul>
       <span><a href="/guwen/bookv_19.aspx">学而篇</a></span>
       <span><a href="/guwen/bookv_20.aspx">为政篇</a></span>
       <span><a href="/guwen/bookv_21.aspx">八佾篇</a></span>
       <span><a href="/guwen/bookv_22.aspx">里仁篇</a></span>
       <span><a href="/guwen/bookv_23.aspx">公冶长篇</a></span>
       <span><a href="/guwen/bookv_24.aspx">雍也篇</a></span>
       <span><a href="/guwen/bookv_25.aspx">述而篇</a></span>
       <span><a href="/guwen/bookv_26.aspx">泰伯篇</a></span>
       <span><a href="/guwen/bookv_27.aspx">子罕篇</a></span>
       <span><a href="/guwen/bookv_28.aspx">乡党篇</a></span>
       <span><a href="/guwen/bookv_29.aspx">先进篇</a></span>
       <span><a href="/guwen/bookv_30.aspx">颜渊篇</a></span>
       <span><a href="/guwen/bookv_31.aspx">子路篇</a></span>
       <span><a href="/guwen/bookv_32.aspx">宪问篇</a></span>
       <span><a href="/guwen/bookv_33.aspx">卫灵公篇</a></span>
       <span><a href="/guwen/bookv_34.aspx">季氏篇</a></span>
       <span><a href="/guwen/bookv_35.aspx">阳货篇</a></span>
       <span><a href="/guwen/bookv_36.aspx">微子篇</a></span>
       <span><a href="/guwen/bookv_37.aspx">子张篇</a></span>
       <span><a href="/guwen/bookv_38.aspx">尧曰篇</a></span>
    </ul>
</div>

raw output

 

 

2. content is the element's text with the HTML tags stripped:

学而篇               为政篇               八佾篇               里仁篇               公冶长篇               雍也篇               述而篇               泰伯篇               子罕篇               乡党篇               先进篇               颜渊篇               子路篇               宪问篇               卫灵公篇               季氏篇               阳货篇               微子篇               子张篇               尧曰篇

content output

 

 

3. tagName is the element's HTML tag name.

Here the output is simply div.

 

4. attributes is a dictionary of the element's attributes, mapping each attribute name to its value:

class = bookcont;
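To read a single attribute you can index the attributes dictionary or call TFHppleElement's objectForKey: convenience method; both give nil when the attribute is absent. A minimal sketch; AttributeOfFirstMatch is a hypothetical name, and it assumes your hpple version has initWithHTMLData:encoding: and peekAtSearchWithXPathQuery:.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: return attribute `name` of the first element matching
// `xpath`, or nil if nothing matches or the attribute is missing.
NSString *AttributeOfFirstMatch(NSData *htmlData, NSString *xpath, NSString *name) {
    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:htmlData encoding:@"UTF-8"];
    // peekAtSearchWithXPathQuery: returns only the first match (or nil).
    TFHppleElement *element = [parser peekAtSearchWithXPathQuery:xpath];
    // Same result as element.attributes[name]; messaging nil yields nil.
    return [element objectForKey:name];
}
```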

 

 

5. children holds the element's child nodes (each one a TFHppleElement), including whitespace-only text nodes:

The printed children array (with the \U escapes decoded) holds three nodes: a whitespace-only text node, the ul element, and another whitespace-only text node. The ul node's own nodeChildArray alternates between whitespace text nodes and span nodes, one span per chapter, from bookv_19.aspx (学而篇) to bookv_38.aspx (尧曰篇); the ul node also carries its own nodeContent (all the chapter titles with the surrounding whitespace) and raw (its full markup). Each span wraps an a node holding the href attribute and a text node with the chapter title. The entry for the first chapter reads:

{
  nodeChildArray = (
    {
      nodeAttributeArray = (
        {
          attributeName = href;
          nodeContent = "/guwen/bookv_19.aspx";
        }
      );
      nodeChildArray = (
        {
          nodeContent = "学而篇";
          nodeName = text;
        }
      );
      nodeContent = "学而篇";
      nodeName = a;
      raw = "<a href=\"/guwen/bookv_19.aspx\">学而篇</a>";
    }
  );
  nodeContent = "学而篇";
  nodeName = span;
  raw = "<span><a href=\"/guwen/bookv_19.aspx\">学而篇</a></span>";
}

children output
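As the output shows, children mixes element nodes with whitespace-only text nodes, so code that walks it usually skips the text nodes. A minimal sketch under that assumption that recursively collects every link's href; ChapterHrefs and CollectHrefs are hypothetical names, and isTextNode, peekAtSearchWithXPathQuery: and initWithHTMLData:encoding: are assumed to be present in your hpple version.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: walk element.children depth-first, skipping text
// nodes, and append the href of every <a> element found.
static void CollectHrefs(TFHppleElement *element, NSMutableArray<NSString *> *out) {
    for (TFHppleElement *child in element.children) {
        if ([child isTextNode]) {
            continue; // e.g. the "\n    " nodes in the output above
        }
        if ([child.tagName isEqualToString:@"a"]) {
            NSString *href = [child objectForKey:@"href"];
            if (href) {
                [out addObject:href];
            }
        }
        CollectHrefs(child, out);
    }
}

// Hypothetical entry point: hrefs of all links under <div class="bookcont">,
// in document order.
NSArray<NSString *> *ChapterHrefs(NSData *htmlData) {
    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:htmlData encoding:@"UTF-8"];
    TFHppleElement *contents = [parser peekAtSearchWithXPathQuery:@"//div[@class='bookcont']"];
    NSMutableArray<NSString *> *hrefs = [NSMutableArray array];
    if (contents) {
        CollectHrefs(contents, hrefs);
    }
    return hrefs;
}
```

Walking children by hand suits cases where the nesting varies; when it is fixed, a single XPath such as //div[@class='bookcont']//span/a selects the same links directly.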

 

 

6. firstChild is the first of those child nodes, here the whitespace text node before <ul>:

{  nodeContent = "\n    ";  nodeName = text;}
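Because firstChild here is only the whitespace before <ul>, it is rarely what you want. TFHppleElement's firstChildWithTagName: (assumed present in your hpple version) skips to the first child with a given tag. A minimal sketch; FirstListInBookcont is a hypothetical name.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: return the <ul> directly inside <div class="bookcont">,
// or nil if there is none. bookcont.firstChild would be the leading
// whitespace text node instead of the <ul>.
TFHppleElement *FirstListInBookcont(NSData *htmlData) {
    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:htmlData encoding:@"UTF-8"];
    TFHppleElement *bookcont = [parser peekAtSearchWithXPathQuery:@"//div[@class='bookcont']"];
    return [bookcont firstChildWithTagName:@"ul"];
}
```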

 

All of the properties above deal with HTML markup. In practice we usually read the content property and then process the resulting NSString.
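For this page, content is the chapter titles separated by runs of spaces and newlines, so one simple way to process it is to split on whitespace and drop the empty pieces. A minimal sketch using only Foundation; TitlesFromContent is a hypothetical name, and it assumes no title itself contains whitespace.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: split element.content into individual titles.
NSArray<NSString *> *TitlesFromContent(NSString *content) {
    NSArray<NSString *> *parts =
        [content componentsSeparatedByCharactersInSet:
                     [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    // Consecutive separators produce empty strings; keep only real titles.
    NSMutableArray<NSString *> *titles = [NSMutableArray array];
    for (NSString *part in parts) {
        if (part.length > 0) {
            [titles addObject:part];
        }
    }
    return titles;
}
```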

 

That is how we get the data and turn it into the form we need. TFHppleElement is an important class, but its further usage is not covered here.