UBot Underground

Can I use $page scrape with wildcards?



Hello, I'm dealing with a scrape where each row's data is defined like this:

<a href="#" class="detailLink action-detail-setup" data-recordid="700001189669" data-record-index="0">Bryant</a>

 

And on the second row of the table (which contains 25 results per page):

<a href="#" class="detailLink action-detail-setup" data-recordid="700001275054" data-record-index="1">Thomas</a>

Third row:

<a href="#" class="detailLink action-detail-setup" data-recordid="700001275070" data-record-index="2">Adrian</a>

 

How can I scrape all the rows and save them to a CSV laid out the same way as the table?

I've been trying many variations of scrape attribute and so on, but nothing is being saved :@

Does anyone have an idea how to solve this?


By the way, the page source has the info like this:

 

<tr>
<td class="selection"><input type="checkbox" value="300001275866" class="action-record-selected" data-recordid="300001275866"></td>
<td class="no-padding-on-left no-padding-on-right" title="Quick View"><div class="action-quick-view-list" data-quickview="/UsConsumer/Detail/QuickView?recordId=300001275866" data-detail="/UsConsumer/Detail/All/2a4baaca0ec24c87919043cc10bfd16e/21"> </div></td><td class="no-padding-on-left no-padding-on-right"><div title="Exported" class=""> </div></td>
<td class="FirstName"><a href="#" class="detailLink action-detail-setup" data-recordid="300001275866" data-record-index="21">Michelle</a></td>
<td class="LastName"><a href="#" class="detailLink action-detail-setup" data-recordid="300001275866" data-record-index="21">Bacco</a></td>
<td class="Address">2916 Willow St</td>
<td class="CityState">Anchorage, AK</td><td class="Phone">(907) 276-8467</td>
<td class="AgeRange">40 - 44</td>
</tr>
<tr>
<td class="selection"><input type="checkbox" value="300001275905" class="action-record-selected" data-recordid="300001275905"></td>
<td class="no-padding-on-left no-padding-on-right" title="Quick View"><div class="action-quick-view-list" data-quickview="/UsConsumer/Detail/QuickView?recordId=300001275905" data-detail="/UsConsumer/Detail/All/2a4baaca0ec24c87919043cc10bfd16e/22"> </div></td><td class="no-padding-on-left no-padding-on-right"><div title="Exported" class=""> </div></td>
<td class="FirstName"><a href="#" class="detailLink action-detail-setup" data-recordid="300001275905" data-record-index="22">Robert</a></td>
<td class="LastName"><a href="#" class="detailLink action-detail-setup" data-recordid="300001275905" data-record-index="22">Baer</a></td>
<td class="Address">3444 Wentworth St</td>
<td class="CityState">Anchorage, AK</td><td class="Phone">(907) 278-2457</td>
<td class="AgeRange">65+</td>
</tr>
<tr>
<td class="selection"><input type="checkbox" value="700001275931" class="action-record-selected" data-recordid="700001275931"></td>
<td class="no-padding-on-left no-padding-on-right" title="Quick View"><div class="action-quick-view-list" data-quickview="/UsConsumer/Detail/QuickView?recordId=700001275931" data-detail="/UsConsumer/Detail/All/2a4baaca0ec24c87919043cc10bfd16e/23"> </div></td><td class="no-padding-on-left no-padding-on-right"><div title="Exported" class=""> </div></td>
<td class="FirstName"><a href="#" class="detailLink action-detail-setup" data-recordid="700001275931" data-record-index="23">Glen</a></td>
<td class="LastName"><a href="#" class="detailLink action-detail-setup" data-recordid="700001275931" data-record-index="23">Bailey</a></td>
<td class="Address">1821 Westchester Cir</td>
<td class="CityState">Anchorage, AK</td><td class="Phone">(907) 277-7640</td>
<td class="AgeRange">45 - 49</td>


I guess the clue is the offset variable, because it increments each time I go down from one row to the next.

How can I use the offset from 0 to 25 to grab all the results in the list?
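
To illustrate the wildcard idea outside UBot, here is a minimal Python sketch. It is not UBot code, and the names ANCHOR and scrape_anchors are made up for this example. It assumes the only parts of each link that change are the two numeric attributes, so a single pattern with `\d+` in those positions matches every row and no per-row offset counter is needed.

```python
import re

# Three anchors copied from the first post; only the two numeric attributes differ.
html = '''
<a href="#" class="detailLink action-detail-setup" data-recordid="700001189669" data-record-index="0">Bryant</a>
<a href="#" class="detailLink action-detail-setup" data-recordid="700001275054" data-record-index="1">Thomas</a>
<a href="#" class="detailLink action-detail-setup" data-recordid="700001275070" data-record-index="2">Adrian</a>
'''

# \d+ acts as the wildcard for the changing record id and row index.
# ANCHOR is a hypothetical name chosen for this sketch.
ANCHOR = re.compile(
    r'<a href="#" class="detailLink action-detail-setup" '
    r'data-recordid="(\d+)" data-record-index="(\d+)">([^<]*)</a>'
)


def scrape_anchors(page_html):
    """Return (record_id, row_index, link_text) for every matching anchor."""
    return [(rid, int(idx), text) for rid, idx, text in ANCHOR.findall(page_html)]


rows = scrape_anchors(html)
for rid, idx, text in rows:
    print(idx, rid, text)
```

The row index comes back as part of each match, so it can still be used for ordering or checking that no row was skipped, but it does not have to drive the loop.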


Hi,

 

This UBot code sample is based on the sample table data provided above.

 

Sample code:

set(#trlist, "<tr>
<td class=\"selection\"><input type=\"checkbox\" value=\"300001275866\" class=\"action-record-selected\" data-recordid=\"300001275866\"></td>
<td class=\"no-padding-on-left no-padding-on-right\" title=\"Quick View\"><div class=\"action-quick-view-list\" data-quickview=\"/UsConsumer/Detail/QuickView?recordId=300001275866\" data-detail=\"/UsConsumer/Detail/All/2a4baaca0ec24c87919043cc10bfd16e/21\"> </div></td><td class=\"no-padding-on-left no-padding-on-right\"><div title=\"Exported\" class=\"\"> </div></td>
<td class=\"FirstName\"><a href=\"#\" class=\"detailLink action-detail-setup\" data-recordid=\"300001275866\" data-record-index=\"21\">Michelle</a></td>
<td class=\"LastName\"><a href=\"#\" class=\"detailLink action-detail-setup\" data-recordid=\"300001275866\" data-record-index=\"21\">Bacco</a></td>
<td class=\"Address\">2916 Willow St</td>
<td class=\"CityState\">Anchorage, AK</td><td class=\"Phone\">(907) 276-8467</td>
<td class=\"AgeRange\">40 - 44</td>
</tr>
<tr>
<td class=\"selection\"><input type=\"checkbox\" value=\"300001275905\" class=\"action-record-selected\" data-recordid=\"300001275905\"></td>
<td class=\"no-padding-on-left no-padding-on-right\" title=\"Quick View\"><div class=\"action-quick-view-list\" data-quickview=\"/UsConsumer/Detail/QuickView?recordId=300001275905\" data-detail=\"/UsConsumer/Detail/All/2a4baaca0ec24c87919043cc10bfd16e/22\"> </div></td><td class=\"no-padding-on-left no-padding-on-right\"><div title=\"Exported\" class=\"\"> </div></td>
<td class=\"FirstName\"><a href=\"#\" class=\"detailLink action-detail-setup\" data-recordid=\"300001275905\" data-record-index=\"22\">Robert</a></td>
<td class=\"LastName\"><a href=\"#\" class=\"detailLink action-detail-setup\" data-recordid=\"300001275905\" data-record-index=\"22\">Baer</a></td>
<td class=\"Address\">3444 Wentworth St</td>
<td class=\"CityState\">Anchorage, AK</td><td class=\"Phone\">(907) 278-2457</td>
<td class=\"AgeRange\">65+</td>
</tr>
<tr>
<td class=\"selection\"><input type=\"checkbox\" value=\"700001275931\" class=\"action-record-selected\" data-recordid=\"700001275931\"></td>
<td class=\"no-padding-on-left no-padding-on-right\" title=\"Quick View\"><div class=\"action-quick-view-list\" data-quickview=\"/UsConsumer/Detail/QuickView?recordId=700001275931\" data-detail=\"/UsConsumer/Detail/All/2a4baaca0ec24c87919043cc10bfd16e/23\"> </div></td><td class=\"no-padding-on-left no-padding-on-right\"><div title=\"Exported\" class=\"\"> </div></td>
<td class=\"FirstName\"><a href=\"#\" class=\"detailLink action-detail-setup\" data-recordid=\"700001275931\" data-record-index=\"23\">Glen</a></td>
<td class=\"LastName\"><a href=\"#\" class=\"detailLink action-detail-setup\" data-recordid=\"700001275931\" data-record-index=\"23\">Bailey</a></td>
<td class=\"Address\">1821 Westchester Cir</td>
<td class=\"CityState\">Anchorage, AK</td><td class=\"Phone\">(907) 277-7640</td>
<td class=\"AgeRange\">45 - 49</td>
</tr>", "Global")
clear list(%csvsaveitems)
clear list(%webpagetabitems)
add list to list(%webpagetabitems, $list from text(#trlist, "</tr>"), "Delete", "Global")
loop($list total(%webpagetabitems)) {
   if($comparison($list position(%webpagetabitems), "<", $list total(%webpagetabitems))) {
       then {
           set(#tritem, $trim($replace($next list item(%webpagetabitems), $new line, $nothing)), "Global")
           set(#quickview, $replace regular expression($replace regular expression(#tritem, ".*data-quickview=", $nothing), " data-detail=.*", $nothing), "Global")
           set(#recordid, $replace(#quickview, "/UsConsumer/Detail/QuickView?recordId=", $nothing), "Global")
           set(#detaildata, $replace regular expression($replace regular expression(#tritem, ".*data-detail=", $nothing), "> <\\/div><\\/td><td class=\"no-padding-on-left no-padding-on-right\">.*", $nothing), "Global")
           set(#firstname, $replace regular expression($replace regular expression(#tritem, ".*<td class=\"FirstName\"><a href=\"#\" class=\"detailLink action-detail-setup\" data-recordid=\"\\d\{1,12\}\" data-record-index=\"\\d\{1,2\}\">", $nothing), "<\\/a><\\/td><td class=\"LastName\">.*", $nothing), "Global")
           set(#lastname, $replace regular expression($replace regular expression(#tritem, ".*<td class=\"LastName\"><a href=\"#\" class=\"detailLink action-detail-setup\" data-recordid=\"\\d\{1,12\}\" data-record-index=\"\\d\{1,2\}\">", $nothing), "<\\/a><\\/td><td class=\"Address\">.*", $nothing), "Global")
           set(#address, $replace regular expression($replace regular expression(#tritem, ".*<td class=\"Address\">", $nothing), "<\\/td><td class=\"CityState\">.*", $nothing), "Global")
           set(#citystate, $replace regular expression($replace regular expression(#tritem, ".*<td class=\"CityState\">", $nothing), "<\\/td><td class=\"Phone\">.*", $nothing), "Global")
           set(#phone, $replace($replace regular expression($replace regular expression(#tritem, ".*<td class=\"Phone\">", $nothing), "<\\/td><td class=\"AgeRange\">.*", $nothing), " ", " "), "Global")
           set(#agerange, $replace($replace regular expression(#tritem, ".*<td class=\"AgeRange\">", $nothing), "</td>", $nothing), "Global")
           add item to list(%csvsaveitems, "{#recordid},\"{#firstname}\",\"{#lastname}\",\"{#address}\",\"{#citystate}\",\"{#phone}\",\"{#agerange}\",{#quickview},{#detaildata}", "Delete", "Global")
       }
       else {
       }
   }
}
save to file("c:\\downloads\\pagedetails.csv", %csvsaveitems)

sample-table-to-list-to-csv-002.ubot

 

Kevin
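
For comparison, the same row-by-row idea (split the table into rows, pull each field out by its td class, write one CSV line per row) can be sketched in Python with only the standard library. This is an illustration under assumptions, not a translation of the UBot script above: RowParser, rows_to_csv and the FIELDS column order are names invented here, the sample rows are shortened from the posted source, and every row is assumed to be closed with </tr>.

```python
import csv
import io
from html.parser import HTMLParser

# Column classes taken from the posted table source; the order is an assumption.
FIELDS = ["FirstName", "LastName", "Address", "CityState", "Phone", "AgeRange"]


class RowParser(HTMLParser):
    """Collects one dict per <tr>, keyed by the class of each <td>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None     # dict for the row being read, or None outside <tr>
        self._field = None   # class of the current <td> if it is one we keep

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self._row = {}
        elif self._row is not None and tag == "td":
            cls = attrs.get("class")
            self._field = cls if cls in FIELDS else None
        elif self._row is not None and tag == "input" and "data-recordid" in attrs:
            # The checkbox in the first cell carries the record id.
            self._row["RecordId"] = attrs["data-recordid"]

    def handle_data(self, data):
        if self._row is not None and self._field:
            self._row[self._field] = self._row.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._row is not None:
            self.rows.append({k: v.strip() for k, v in self._row.items()})
            self._row = None


def rows_to_csv(page_html):
    """Parse the table rows and return CSV text with a header line."""
    parser = RowParser()
    parser.feed(page_html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["RecordId"] + FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()


sample = '''<tr>
<td class="selection"><input type="checkbox" value="300001275866" class="action-record-selected" data-recordid="300001275866"></td>
<td class="FirstName"><a href="#" class="detailLink action-detail-setup" data-recordid="300001275866" data-record-index="21">Michelle</a></td>
<td class="LastName"><a href="#" class="detailLink action-detail-setup" data-recordid="300001275866" data-record-index="21">Bacco</a></td>
<td class="Address">2916 Willow St</td>
<td class="CityState">Anchorage, AK</td><td class="Phone">(907) 276-8467</td>
<td class="AgeRange">40 - 44</td>
</tr>
<tr>
<td class="selection"><input type="checkbox" value="300001275905" class="action-record-selected" data-recordid="300001275905"></td>
<td class="FirstName"><a href="#" class="detailLink action-detail-setup" data-recordid="300001275905" data-record-index="22">Robert</a></td>
<td class="LastName"><a href="#" class="detailLink action-detail-setup" data-recordid="300001275905" data-record-index="22">Baer</a></td>
<td class="Address">3444 Wentworth St</td>
<td class="CityState">Anchorage, AK</td><td class="Phone">(907) 278-2457</td>
<td class="AgeRange">65+</td>
</tr>'''

csv_text = rows_to_csv(sample)
print(csv_text)
```

Letting the csv module do the quoting means a value such as "Anchorage, AK" is quoted automatically because it contains a comma, so the columns stay aligned when the file is opened in a spreadsheet.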
