How To Scrape From Wikipedia?

mugglu · August 17, 2015

I have added an image marking the part I want to scrape.

Is it possible to copy everything between two pieces of text, like A "these words I want to scrape" B?

deliter · August 17, 2015

set(#child,"","Global")
set(#position,0,"Global")
loop while($comparison($find regular expression($scrape attribute($element offset(<(tagname="p" OR id="toc")>,#position),"innertext"),".+"),"> Greater than",$nothing)) {
    set(#child,"{#child}{$new line}{$scrape attribute($element offset(<(tagname="p" OR id="toc")>,#position),"innertext")}","Global")
    increment(#position)
}

How this works: Wikipedia puts every paragraph on the page inside a p tag, and the contents box has an id of toc. What we want to scrape is the innertext of each p that comes before the element with the id of toc. I do not understand the element child/sibling functions too well, so there is probably a better way of making sure we only get the p tags before the contents box. What I have done is match elements that are either a p tag or have an id of toc, and scrape the innertext of each one in turn. The id of toc will not return any innertext, so once the loop gets a result of nothing it stops looping.
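For anyone who wants to check the same logic outside UBot, here is a rough Python sketch of it using only the standard library. It is not UBot code and not necessarily how the script above behaves internally: the names `LeadParagraphs` and `lead_text` are made up for illustration, it assumes you already have the page HTML as a string, and it skips empty paragraphs instead of stopping at the first one.

```python
from html.parser import HTMLParser


class LeadParagraphs(HTMLParser):
    """Collect the text of each <p> that appears before the element with id="toc"."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished paragraph texts, in page order
        self._buffer = None    # text pieces of the currently open <p>, else None
        self._done = False     # set to True once the id="toc" element is seen

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if dict(attrs).get("id") == "toc":
            # Contents box reached: ignore everything from here on.
            self._done = True
            self._buffer = None
        elif tag == "p":
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested tags (<b>, <a>, ...) also lands here.
        if self._buffer is not None and not self._done:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            text = "".join(self._buffer).strip()
            if text:  # assumption: empty paragraphs are skipped, not treated as a stop
                self.paragraphs.append(text)
            self._buffer = None


def lead_text(html):
    """Return the paragraphs before id="toc", joined with new lines."""
    parser = LeadParagraphs()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)
```

Calling `lead_text(page_html)` on a page's HTML should give roughly what ends up in #child: one paragraph per line, with nothing from the contents box or below it.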

addy196 · September 6, 2015

:rolleyes:

deliter · September 6, 2015

Rather than "Roll your Eyes", why not post a better solution?
