UBot Underground

How To Scrape From Wikipedia?



set(#child,"","Global")
set(#position,0,"Global")
loop while($comparison($find regular expression($scrape attribute($element offset(<(tagname="p" OR id="toc")>,#position),"innertext"),".+"),"> Greater than",$nothing)) {
    set(#child,"{#child}{$new line}{$scrape attribute($element offset(<(tagname="p" OR id="toc")>,#position),"innertext")}","Global")
    increment(#position)
}

 

How this works: Wikipedia puts every paragraph on the page inside a p tag, and the contents box has an ID of toc. The attribute we want to scrape is the innertext of each p, but only for the p tags that come before the element with the id toc. I do not understand the element child/sibling functions too well, so there is probably a better way of doing this. To make sure we only get the p tags before the contents box, the selector matches either a p tag or the id toc, and the loop scrapes the innertext of each match in turn. The toc match will not return any innertext, so once the loop gets a result of nothing it stops looping.
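For anyone who wants to see the same idea outside UBot, here is a rough Python sketch of the logic described above: walk the page in order, keep the text of each p tag, and stop as soon as the element with the id toc turns up. This is not UBot code and not a translation of the script; the names IntroParagraphs, scrape_intro and sample are made up for the illustration, and the sample HTML is a hand-written stand-in, not a real Wikipedia page.

```python
from html.parser import HTMLParser


class IntroParagraphs(HTMLParser):
    """Collect the text of each <p> until an element with id="toc" appears."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished paragraph texts, in page order
        self._buffer = None   # text pieces of the <p> being read, or None
        self.done = False     # set once the contents box is reached

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if dict(attrs).get("id") == "toc":
            # Reached the contents box: ignore everything after it.
            self.done = True
            self._buffer = None
        elif tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            text = "".join(self._buffer).strip()
            if text:  # skip empty paragraphs instead of stopping on them
                self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def scrape_intro(html):
    """Return the paragraphs before the toc element, one per line."""
    parser = IntroParagraphs()
    parser.feed(html)
    return "\n".join(parser.paragraphs)


# Hypothetical stand-in for a page: two intro paragraphs, then the contents box.
sample = (
    '<p>First <b>intro</b> paragraph.</p>'
    '<p>Second intro paragraph.</p>'
    '<div id="toc"><ul><li>Contents</li></ul></div>'
    '<p>Body paragraph after the contents box.</p>'
)
print(scrape_intro(sample))
```

One difference from the UBot loop: the loop stops at the first match that returns no innertext, while this sketch skips empty paragraphs and only stops at the toc element itself.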
