UBot Underground

how to scrape page after page



Put it in a loop, something along these lines:

loop($list total(%pages)) {
    navigate($next list item(%pages), "Wait")
    add item to list(%emails, $page scrape("var displayEmail = \"", "\";"), "Delete", "Global")
}

OK, I am using the Standard edition and do not know how to use the HTML function. I want to click a link on Craigslist, scrape the email, go back to the main list of cars for sale, then move to the next link, scrape its email, back out, and so forth. I cannot figure out how to get the loop to work properly so that I end up with a list of the 100 or so people listing vehicles and can mass email them.

define scrape for raw pages {
    navigate("http://auburn.craigslist.org/cto/", "Wait")
    comment("replace \'auburn\' and \'cto\' for the area/section you want")
    add list to list(%pages, $scrape attribute(<outerhtml=w"<a href=\"http://*.craigslist.org/cto/*.html\">*</a>">, "fullhref"), "Delete", "Global")
    comment("the above uses the \'*\' wild card option to scrape for the urls")
}
define scrape emails {
    set list position(%pages, 0)
    loop($list total(%pages)) {
        navigate($next list item(%pages), "Wait")
        add item to list(%emails, $page scrape("var displayEmail = \"", "\";"), "Delete", "Global")
    }
}
divider
scrape for raw pages()
scrape emails()

 

I think this is basically what you're looking for. Swap in the area and section you want to scrape, as noted in the comments.
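
One thing to watch when swapping areas, offered as a guess rather than a confirmed diagnosis: %pages and %emails are global lists, so URLs scraped on an earlier run may still be sitting in them when you run again with a different area. A minimal sketch, assuming the clear list command is available in your edition; the name "reset lists" is made up for this example:

define reset lists {
    comment("empty both lists so URLs left over from an earlier run are not visited again")
    clear list(%pages)
    clear list(%emails)
}
divider
reset lists()
scrape for raw pages()
scrape emails()

Note also that the wildcard in the scrape attribute line matches any area but only the cto section, so if you change the section in the navigate URL, change it in the wildcard as well.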

cl_ex.ubot

  • 2 weeks later...

I may be missing something, but I swapped in the area that I want and somehow I keep navigating back to auburn :(

 

navigate("http://auburn.craigslist.org/cto/", "Wait")

 

That's where you want to swap auburn, and also cto, for whichever area and section of CL you want to use. Alternatively, you can load in a list of locations (subdomains) and use something like $next list item.
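
A rough sketch of the list-of-locations idea, not a tested script. It assumes $list from text splits a string on the given delimiter and that {#area} is substituted inside a quoted string, so check both in your version. %areas and #area are names made up for this example, and scrape emails() is the define posted earlier in the thread:

clear list(%areas)
clear list(%pages)
comment("hypothetical list of subdomains, swap in the areas you want")
add list to list(%areas, $list from text("auburn,miami", ","), "Delete", "Global")
set list position(%areas, 0)
loop($list total(%areas)) {
    set(#area, $next list item(%areas), "Global")
    navigate("http://{#area}.craigslist.org/cto/", "Wait")
    add list to list(%pages, $scrape attribute(<outerhtml=w"<a href=\"http://*.craigslist.org/cto/*.html\">*</a>">, "fullhref"), "Delete", "Global")
}
scrape emails()

Clearing %pages first means listing URLs from a previous area are not carried over into the new run.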


 


I did that, but when it starts going through the pages it ends up switching back to auburn... do I need to create a new list, perhaps?


I am so frustrated. I can remove auburn and put in miami, and it either stops functioning completely or starts in miami and switches over to auburn. WTH am I doing wrong?

  • 2 weeks later...

I am creating my own Craigslist bot for real estate, and I am a noob. I was curious why you would want to scrape CL for autos. Are you scraping the info, then emailing the posters to solicit them for a service or something?

 

@merkaba, I'm not in the auto industry but I downloaded your script for CL just to see what it did and that thing was scraping like crazy...interesting to see how others are doing this stuff....

 

Can't wait to dig further into this stuff.

 

Best,

D



If you're doing a lot of scraping, do yourself a favor and get Aymen's HTTP plugins. They're a lot faster; the only real bottleneck is internet speed.

