
Parsing HTML Data in Java (Using the Third-Party Library Jsoup)

Published: 2016-05-25 18:00:27


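Requirements:

When building an API on top of a web service, we need to parse the information contained in its pages.

Project repository: https://github.com/hwding/LibXDUQuery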

 

Preparation:

  • Download the third-party library Jsoup (an excellent HTML parser): https://jsoup.org/download
  • Read the Jsoup API Reference: https://jsoup.org/apidocs/
  • Review the relevant code
  • Get familiar with parsing in Java

 

Page source:

<HTML>
  <HEAD>
    <title>物理实验网络选课系统</title>
    <meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=gb2312">
    <meta content="Microsoft Visual Studio .NET 7.1" name="GENERATOR">
    <meta content="C#" name="CODE_LANGUAGE">
    <meta content="JavaScript" name="vs_defaultClientScript">
    <meta content="http://schemas.microsoft.com/intellisense/ie5" name="vs_targetSchema">
    <link href="../style.css" type="text/css" rel="stylesheet">

  </HEAD>
  <body>

<TABLE cellSpacing="0" cellPadding="0" width="100%" border="0">
  <TR>
    <TD width="560" nowrap><img src='/images/loading.gif' data-original="http://www.cnblogs.com//PhyEws/img/banleft.gif"></TD>
    <td width="258" align="left" nowrap>
      <table cellSpacing="0" cellPadding="0" width="100%" border="0">
        <tr>
          <TD nowrap background="/PhyEws/img/banback.gif" valign="bottom" align=left height=90 width="258" >
            <br>
            <br>
            <div style="FONT-WEIGHT:bold;FONT-SIZE:18pt;COLOR:#666666;FONT-FAMILY:宋体">
              某某大学物理实验教学示范中心

            </div>
            <br>
          </TD>
        </tr>
      </table>
    </td>
    <td align="center" valign="bottom">&nbsp;
      </td>
  </TR>
</TABLE>
<table width="100%" border="0" cellspacing="0" cellpadding="0" align="center" class="localtoolbar">
  <tr style="PADDING-TOP: 2px" height="30" align="center" class="lt0">
    <td class="lt0" nowrap>
      <font color="#000099">
        2016年5月25日 第13周  星期三
      </font>&nbsp;&nbsp;&nbsp;&nbsp;
      <a href="student.aspx" class="menu">首页</a> |
      <a href="select.aspx" class="menu">  查询已选实验</a> |
      <a href="addexpe.aspx" class="menu">开始选课</a> |
      <a href="del.aspx" class="menu">取消选课</a> |
      <a href="statsel.aspx" class="menu">查询可选单元</a> |
      <a href="msg.aspx" class="menu">给教师留言</a> |
      <a href="course.aspx" class="menu">课程安排</a> |
      <a href="expeinfo.aspx" class="menu">实验项目查询</a> |
      <a href="chgstupwd.aspx" class="menu">修改密码</a> |
      <a href="modperinfo.aspx" class="menu">修改个人信息</a> |
      <a href="../logoff.aspx" class="menu">退出</a>
    </td>
  </tr>
</table>

    <TABLE id="Table1" cellSpacing="0" cellPadding="0" width="100%" border="0" height="72%">
      <TR>
        <TD valign="top" nowrap>
          <form name="Form1" method="post" action="select.aspx" id="Form1">
<div>
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUJODg3NTE5NTY0ZGTrVCA59FM5ZwvQdjZCh3bbd3Y15Q==" />
</div>

<div>

  <input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="CF4774E0" />
</div>
            <br>
        <span style='MARGIN-LEFT:50px'><a href='detgrade.aspx' >查询各实验的具体成绩及出勤情况</a></span><br/>


            <br>
            <div align="center">
            某某某
            同学,
            <span id="Introscore" style="color:Red;">您的绪论课成绩为:尚未录入</span>
            <BR>
            您已选的实验如下:<br>
            (注:“归一成绩”栏为期末计算实验综合成绩时,对各实验成绩进行归一化处理后所得成绩)
            <br>
            <br>
            <span id="Orders"><table id="Orders_ctl00" cellspacing="0" rules="all" border="1" style="width:900px;border-collapse:collapse;">
  <tr>
    <th class="tableHeaderText" align="left" style="height:25px;width:15px;">序号</th><th class="tableHeaderText" align="left" style="height:25px;width:280px;">实验项目</th><th class="tableHeaderText" align="left" style="height:25px;width:30px;">实验周次</th><th class="tableHeaderText" align="left" style="height:25px;width:100px;">实验时间</th><th class="tableHeaderText" align="left" style="height:25px;width:80px;">实验日期</th><th class="tableHeaderText" align="left" style="height:25px;width:35px;">上课教室</th><th class="tableHeaderText" align="center" style="height:25px;width:80px;">讲义出处</th><th class="tableHeaderText" align="center" style="height:25px;width:70px;">实验成绩</th><th class="tableHeaderText" align="center" style="height:25px;width:35px;">归一成绩</th><th class="tableHeaderText" align="center" style="height:25px;">备注</th>
  </tr><tr>
    <td class="forumRow" style="height:25px;"><span>1</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">绪论F322(3学时)</a></td><td class="forumRow" align="center" style="height:25px;"><span>2</span></td><td class="forumRow" style="height:25px;"><span>星期五晚上18:30-21:30</span></td><td class="forumRow" style="height:25px;"><span>3/11/2016</span></td><td class="forumRow" align="center" style="height:25px;"><span>F322</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new"></a></td><td class="forumRow" style="height:25px;"><span>95</span></td><td class="forumRow" style="height:25px;"><span></span></td><td class="forumRow" style="height:25px;"><span></span></td>
  </tr><tr>
    <td class="forumRow" style="height:25px;"><span>2</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">利用牛顿环测量平凸透镜曲率半径(3学时)</a></td><td class="forumRow" align="center" style="height:25px;"><span>4</span></td><td class="forumRow" style="height:25px;"><span>星期四晚上18:30-21:30</span></td><td class="forumRow" style="height:25px;"><span>3/24/2016</span></td><td class="forumRow" align="center" style="height:25px;"><span>F321</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">综合设计性物理实验</a></td><td class="forumRow" style="height:25px;"><span>95</span></td><td class="forumRow" style="height:25px;"><span></span></td><td class="forumRow" style="height:25px;"><span>第89页</span></td>
  </tr><tr>
    <td class="forumRow" style="height:25px;"><span>3</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">复摆测量重力加速度实验(3学时)</a></td><td class="forumRow" align="center" style="height:25px;"><span>5</span></td><td class="forumRow" style="height:25px;"><span>星期四晚上18:30-21:30</span></td><td class="forumRow" style="height:25px;"><span>3/31/2016</span></td><td class="forumRow" align="center" style="height:25px;"><span>F323</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">基础物理实验</a></td><td class="forumRow" style="height:25px;"><span>85</span></td><td class="forumRow" style="height:25px;"><span></span></td><td class="forumRow" style="height:25px;"><span>第49页</span></td>
  </tr><tr>
    <td class="forumRow" style="height:25px;"><span>4</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">组装式直流双臂电桥测量低电阻(3学时)</a></td><td class="forumRow" align="center" style="height:25px;"><span>7</span></td><td class="forumRow" style="height:25px;"><span>星期四晚上18:30-21:30</span></td><td class="forumRow" style="height:25px;"><span>4/14/2016</span></td><td class="forumRow" align="center" style="height:25px;"><span>F220</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">基础物理实验</a></td><td class="forumRow" style="height:25px;"><span>80</span></td><td class="forumRow" style="height:25px;"><span></span></td><td class="forumRow" style="height:25px;"><span>第136页</span></td>
  </tr><tr>
    <td class="forumRow" style="height:25px;"><span>5</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">线性、非线性电阻及二极管伏安特性的测定(3学时)</a></td><td class="forumRow" align="center" style="height:25px;"><span>8</span></td><td class="forumRow" style="height:25px;"><span>星期四晚上18:30-21:30</span></td><td class="forumRow" style="height:25px;"><span>4/21/2016</span></td><td class="forumRow" align="center" style="height:25px;"><span>F220</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">基础物理实验</a></td><td class="forumRow" style="height:25px;"><span>70</span></td><td class="forumRow" style="height:25px;"><span></span></td><td class="forumRow" style="height:25px;"><span>第141页</span></td>
  </tr><tr>
    <td class="forumRow" style="height:25px;"><span>6</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">感应法测磁场——交变磁场的分布与测量(3学时)</a></td><td class="forumRow" align="center" style="height:25px;"><span>9</span></td><td class="forumRow" style="height:25px;"><span>星期四晚上18:30-21:30</span></td><td class="forumRow" style="height:25px;"><span>4/28/2016</span></td><td class="forumRow" align="center" style="height:25px;"><span>F221</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">综合设计性物理实验</a></td><td class="forumRow" style="height:25px;"><span>80</span></td><td class="forumRow" style="height:25px;"><span></span></td><td class="forumRow" style="height:25px;"><span>第215页</span></td>
  </tr><tr>
    <td class="forumRow" style="height:25px;"><span>7</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">冲击法测量高阻和电容实验(3学时)</a></td><td class="forumRow" align="center" style="height:25px;"><span>12</span></td><td class="forumRow" style="height:25px;"><span>星期四晚上18:30-21:30</span></td><td class="forumRow" style="height:25px;"><span>5/19/2016</span></td><td class="forumRow" align="center" style="height:25px;"><span>F222</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">下载</a></td><td class="forumRow" style="height:25px;"><span>未录入</span></td><td class="forumRow" style="height:25px;"><span></span></td><td class="forumRow" style="height:25px;"><span></span></td>
  </tr><tr>
    <td class="forumRow" style="height:25px;"><span>8</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">霍耳元件测量磁场实验(3学时)</a></td><td class="forumRow" align="center" style="height:25px;"><span>13</span></td><td class="forumRow" style="height:25px;"><span>星期四晚上18:30-21:30</span></td><td class="forumRow" style="height:25px;"><span>5/26/2016</span></td><td class="forumRow" align="center" style="height:25px;"><span>F221</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">综合设计性物理实验</a></td><td class="forumRow" style="height:25px;"><span>未录入</span></td><td class="forumRow" style="height:25px;"><span></span></td><td class="forumRow" style="height:25px;"><span>第147页</span></td>
  </tr><tr>
    <td class="forumRow" style="height:25px;"><span>9</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">电流场模拟静电场(3学时)</a></td><td class="forumRow" align="center" style="height:25px;"><span>14</span></td><td class="forumRow" style="height:25px;"><span>星期四晚上18:30-21:30</span></td><td class="forumRow" style="height:25px;"><span>6/2/2016</span></td><td class="forumRow" align="center" style="height:25px;"><span>F222</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">基础物理实验</a></td><td class="forumRow" style="height:25px;"><span>未录入</span></td><td class="forumRow" style="height:25px;"><span></span></td><td class="forumRow" style="height:25px;"><span>第153页</span></td>
  </tr>
</table></span>
            </div><br><br>
            <span style='MARGIN-LEFT:50px'><a href='detgrade.aspx' >查询各实验的具体成绩及出勤情况</a></span><br/></form>
        </TD>
      </TR>
    </TABLE>
    <TABLE height=45 id=footer cellSpacing=0 cellPadding=0 width="100%">
 <TBODY>
 <TR Valign=center >
  <TD nowrap id=msviFooter2
  style="FILTER: progid:DXImageTransform.Microsoft.Gradient(startColorStr='#FFFFFF', endColorStr='#0066CC', gradientType='1')"
  width="100%" align=center>&copy;2005 某某大学物理实验教学示范中心&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
 <td>&nbsp;</td>
  </TD>
  </TR>
  </TBODY>
  </TABLE>


  </body>
</HTML>
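
As a rough illustration of how the table of selected experiments (the table with id "Orders_ctl00" in the page source above) might be read with Jsoup, here is a minimal sketch. It is not taken from LibXDUQuery: the class name SelectedExperimentsParser and the method parseRows are hypothetical names chosen for this example, and the HTML string in main is an abbreviated three-column excerpt of the real page. It assumes the Jsoup jar is on the classpath. Note also that the real page declares charset=gb2312, so when fetching it over the network the bytes would need to be decoded with that charset before or during parsing.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of LibXDUQuery.
public class SelectedExperimentsParser {

    /**
     * Returns one String[] per data row of the table with id "Orders_ctl00",
     * with one entry per <td> cell. The header row is skipped because it
     * contains only <th> cells.
     */
    public static List<String[]> parseRows(String html) {
        Document doc = Jsoup.parse(html);
        List<String[]> rows = new ArrayList<>();
        for (Element tr : doc.select("table#Orders_ctl00 tr")) {
            Elements cells = tr.select("td");
            if (cells.isEmpty()) {
                continue; // header row: no <td> cells
            }
            String[] values = new String[cells.size()];
            for (int i = 0; i < cells.size(); i++) {
                // text() returns the combined text of the cell, so the inner
                // <span> or <a> wrapper does not need special handling.
                values[i] = cells.get(i).text();
            }
            rows.add(values);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Abbreviated excerpt of the page: header row plus the first data row,
        // reduced to three columns to keep the example short.
        String html =
              "<span id=\"Orders\"><table id=\"Orders_ctl00\">"
            + "<tr><th>序号</th><th>实验项目</th><th>实验成绩</th></tr>"
            + "<tr><td class=\"forumRow\"><span>1</span></td>"
            + "<td class=\"forumRow\"><a class=\"linkSmallBold\" target=\"_new\">绪论F322(3学时)</a></td>"
            + "<td class=\"forumRow\"><span>95</span></td></tr>"
            + "</table></span>";

        for (String[] row : parseRows(html)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Selecting by the table id rather than by position keeps the sketch independent of the layout tables that surround it on the page. Empty cells (for example the normalized-score column) come back as empty strings, so callers can map "未录入" or "" to "no score yet" as they see fit.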
