UBot Underground

Need Help Getting First Result From Scrape



Hey guys,

 

I scraped a page and need to return only the first result from a list whose contents change from page to page:

<div class="div_flow_charts">

		<div id="trust_flow_chart" class="flow_chart_container" style="width: 170px; height: 170px;"></div>		

		<span class="trustCategories ">

				<div class="category_wrap">

					<ul class="category_list">

							<li class="cat_line society">

								<span class="the_score"><b>17</b></span>

								<span class="the_percentage" style="display:none;">23.21</span>

								<span class="the_category">Society / Philosophy</span>

							</li>

							<li class="cat_line business">

								<span class="the_score"><b>17</b></span>

								<span class="the_percentage" style="display:none;">20.83</span>

								<span class="the_category">Business</span>

							</li>

							<li class="cat_line computers">

								<span class="the_score"><b>16</b></span>

								<span class="the_percentage" style="display:none;">12.49</span>

								<span class="the_category">Computers / Internet / Web Design and Development</span>

							</li>

							<li class="cat_line regional">

								<span class="the_score"><b>15</b></span>

								<span class="the_percentage" style="display:none;">8.81</span>

								<span class="the_category">Regional / Asia</span>

							</li>

							

							 </ul><ul class="category_list"> 

							
							<li class="cat_line games">

								<span class="the_score"><b>15</b></span>

								<span class="the_percentage" style="display:none;">7.97</span>

								<span class="the_category">Games / Video Games / Puzzle</span>

							</li>

							
							<li class="cat_line arts">

								<span class="the_score"><b>13</b></span>

								<span class="the_percentage" style="display:none;">4.06</span>

								<span class="the_category">Arts / Design</span>

							</li>

							<li class="cat_line society">

								<span class="the_score"><b>13</b></span>

								<span class="the_percentage" style="display:none;">3.79</span>

								<span class="the_category">Society / Politics</span>

							</li>

							<li class="cat_line arts">

								<span class="the_score"><b>13</b></span>

								<span class="the_percentage" style="display:none;">3.29</span>

								<span class="the_category">Arts / Music</span>

							</li>

							 </ul><ul class="category_list"> 

							<li class="cat_line computers">

								<span class="the_score"><b>13</b></span>

								<span class="the_percentage" style="display:none;">2.52</span>

								<span class="the_category">Computers / Internet / News and Media</span>

							</li>

							<li class="cat_line business">

								<span class="the_score"><b>12</b></span>

								<span class="the_percentage" style="display:none;">2.19</span>

								<span class="the_category">Business / Business Services</span>

							</li>

							<li class="cat_line society">

								<span class="the_score"><b>12</b></span>

								<span class="the_percentage" style="display:none;">1.71</span>

								<span class="the_category">Society / People</span>

							</li>


							<li class="cat_line recreation">

								<span class="the_score"><b>12</b></span>

								<span class="the_percentage" style="display:none;">1.55</span>

								<span class="the_category">Recreation / Aviation</span>

							</li>

							

							 </ul><ul class="category_list"> 

							<li class="cat_line business">

								<span class="the_score"><b>11</b></span>

								<span class="the_percentage" style="display:none;">0.86</span>

								<span class="the_category">Business / Materials</span>

							</li>

							<li class="cat_line business">

								<span class="the_score"><b>11</b></span>

								<span class="the_percentage" style="display:none;">0.7</span>

								<span class="the_category">Business / Arts and Entertainment</span>

							</li>

							<li class="cat_line recreation">

								<span class="the_score"><b>10</b></span>

								<span class="the_percentage" style="display:none;">0.61</span>

								<span class="the_category">Recreation / Travel</span>

							</li>

							<li class="cat_line home">

								<span class="the_score"><b>10</b></span>

								<span class="the_percentage" style="display:none;">0.42</span>

								<span class="the_category">Home / Cooking</span>

							</li>

							

							 </ul><ul class="category_list"> 

							
							<li class="cat_line arts">

								<span class="the_score"><b>10</b></span>

								<span class="the_percentage" style="display:none;">0.42</span>

								<span class="the_category">Arts / Entertainment</span>

							</li>

							<li class="cat_line computers">

								<span class="the_score"><b>10</b></span>

								<span class="the_percentage" style="display:none;">0.4</span>

								<span class="the_category">Computers / Internet / Abuse</span>

							</li>

							
							<li class="cat_line society">

								<span class="the_score"><b>10</b></span>

								<span class="the_percentage" style="display:none;">0.37</span>

								<span class="the_category">Society</span>

							</li>

							
							<li class="cat_line society">

								<span class="the_score"><b>10</b></span>

								<span class="the_percentage" style="display:none;">0.36</span>

								<span class="the_category">Society / Religion and Spirituality</span>

							</li>

							

							 </ul><ul class="category_list"> 

							
							<li class="cat_line computers">

								<span class="the_score"><b>10</b></span>

								<span class="the_percentage" style="display:none;">0.35</span>

								<span class="the_category">Computers / Internet / On the Web</span>

							</li>

							
							<li class="cat_line sports">

								<span class="the_score"><b>10</b></span>

								<span class="the_percentage" style="display:none;">0.33</span>

								<span class="the_category">Sports / Hockey</span>

							</li>

							
							<li class="cat_line recreation">

								<span class="the_score"><b>10</b></span>

								<span class="the_percentage" style="display:none;">0.28</span>

								<span class="the_category">Recreation / Food</span>

							</li>

							<li class="cat_line arts">

								<span class="the_score"><b>9</b></span>

								<span class="the_percentage" style="display:none;">0.22</span>

								<span class="the_category">Arts / Visual Arts</span>

							</li>

							
							 </ul><ul class="category_list"> 

							<li class="cat_line computers">

								<span class="the_score"><b>9</b></span>

								<span class="the_percentage" style="display:none;">0.17</span>

								<span class="the_category">Computers</span>

							</li>

							<li class="cat_line computers">

								<span class="the_score"><b>9</b></span>

								<span class="the_percentage" style="display:none;">0.17</span>

								<span class="the_category">Computers / Internet / Searching</span>

							</li>


							<li class="cat_line business">

								<span class="the_score"><b>9</b></span>

								<span class="the_percentage" style="display:none;">0.16</span>

								<span class="the_category">Business / Marketing and Advertising</span>

							</li>


							<li class="cat_line games">

								<span class="the_score"><b>9</b></span>

								<span class="the_percentage" style="display:none;">0.12</span>

								<span class="the_category">Games / Gambling</span>

							</li>

							 </ul><ul class="category_list"> 

							<li class="cat_line arts">

								<span class="the_score"><b>8</b></span>

								<span class="the_percentage" style="display:none;">0.08</span>

								<span class="the_category">Arts / Radio</span>

							</li>

							<li class="cat_line sports">

								<span class="the_score"><b>8</b></span>

								<span class="the_percentage" style="display:none;">0.07</span>

								<span class="the_category">Sports / Events</span>

							</li>

					</ul>

				</div>

I'm scraping Majestic's Topical Trust Flow, and the problem is that the result categories change depending on the site you're researching.

 

In this instance I'm after the first result, which happens to be this:

<li class="cat_line society">

								<span class="the_score"><b>17</b></span>

								<span class="the_percentage" style="display:none;">23.21</span>

								<span class="the_category">Society / Philosophy</span>

							</li>

I need to scrape three values from it: the score, the percentage, and the category (the last span).

 

As you can see from the first code box above, the list items have several different classes, so it's hard for me to figure out how to return only the first result. How can I separate the three values I'm after?

 

Any ideas?
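
One possible approach, sketched in Python with only the standard library rather than in UBot itself: every item shares the cat_line class no matter which category class follows it, so you can match on that, stop at the first item, and key the three values by their span class. The names FirstCategoryParser, first_category and SAMPLE_HTML are made up for illustration, and the sketch assumes the item you want is always the first one in the markup, as it is in the snippet above.

```python
from html.parser import HTMLParser

# Trimmed copy of the markup from the post (first two items only).
SAMPLE_HTML = """
<ul class="category_list">
  <li class="cat_line society">
    <span class="the_score"><b>17</b></span>
    <span class="the_percentage" style="display:none;">23.21</span>
    <span class="the_category">Society / Philosophy</span>
  </li>
  <li class="cat_line business">
    <span class="the_score"><b>17</b></span>
    <span class="the_percentage" style="display:none;">20.83</span>
    <span class="the_category">Business</span>
  </li>
</ul>
"""


class FirstCategoryParser(HTMLParser):
    """Collects the three spans of the first <li> whose classes include cat_line."""

    FIELDS = ("the_score", "the_percentage", "the_category")

    def __init__(self):
        super().__init__()
        self.result = {}
        self._in_first_li = False  # True while inside the first cat_line item
        self._done = False         # True once the first item has been closed
        self._field = None         # span class currently being read, if any

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "cat_line" in classes:
            # Match on the shared class, not on society/business/etc.
            self._in_first_li = True
        elif self._in_first_li and tag == "span":
            self._field = next((c for c in classes if c in self.FIELDS), None)

    def handle_data(self, data):
        text = data.strip()
        if self._in_first_li and self._field and text:
            self.result[self._field] = self.result.get(self._field, "") + text

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._in_first_li:
            # First item is complete; ignore everything after it.
            self._in_first_li = False
            self._done = True


def first_category(html):
    """Return a dict with the score, percentage and category of the first item."""
    parser = FirstCategoryParser()
    parser.feed(html)
    return parser.result


if __name__ == "__main__":
    print(first_category(SAMPLE_HTML))
```

The values come back as strings keyed by span class, so the score and percentage can be converted with int() and float() if you need numbers. The same idea (select on cat_line, take the first match, then read the three spans by class) should carry over to whatever selector or regex tool you use inside your bot.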
