UBot Underground

[Solved] Scraping Xhr Response



Hi,

 
I just got a developer copy of UBot and I'm trying to learn by scraping Pinterest. Maybe it's not the easiest site to start with, and I'm facing a big issue.
I've scraped the "search people" page successfully, but scraping the followers of those profiles is another story: the data is loaded via Ajax after every scroll. I want to extract the usernames from the response body. I don't mind if the whole body is extracted, as I can use a regex on it.
 

 

I'm a beginner in Javascript/jQuery and I don't know where to start. Can you guys tell me what to look for? I can't find the information, I've already succeeded at extracting the header, but unfortunately not the data.

 

 
Your help is really appreciated. Thank you.
Link to post
Share on other sites

What do you actually want to do: make the page scroll, or scrape the data?

If you want to scroll the page, use the 'run javascript' command:

window.scrollTo(0,document.body.scrollHeight)

If you want to scrape the results/data, you can use a regex or JSONPath.
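
To make the "regex or JSONPath" suggestion concrete, here is a minimal Python sketch (not UBot script) of both approaches. The response body is made up for illustration: the real Pinterest payload is not shown in this thread, so the "data" and "username" keys are assumptions.

```python
import json
import re

# Hypothetical response body; the key names "data" and "username" are assumed.
body = '{"data": [{"username": "alice"}, {"username": "bob"}, {"username": "alice"}]}'

def usernames_by_regex(text):
    # Capture the value of every "username" key found in the raw text.
    return re.findall(r'"username"\s*:\s*"([^"]+)"', text)

def usernames_by_json(text):
    # Parse the body and walk the (assumed) structure instead of pattern matching.
    return [item["username"] for item in json.loads(text)["data"]]

print(usernames_by_regex(body))
print(usernames_by_json(body))
```

The regex version tolerates an unknown surrounding structure, while the JSON version is safer when values may contain escaped quotes.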

Link to post
Share on other sites

I can scroll the page, but I can't scrape more than the first 10 results. All the data that gets loaded is passed through Ajax. Even though I can see it on screen, none of it appears in the source code; it all seems to be loaded through the header.

Link to post
Share on other sites

As I said in my first post "I'm a beginner in Javascript/jQuery and I don't know where to start.". So I don't have any code because I don't know what to do :(

Thank you for taking the time to look into my issue.

I expected a lot from this product and its forum, but unfortunately I'm thinking about quitting and asking for a refund. I went straight for the dev version, but I didn't realize the forum is here for people who make money selling plugins rather than for helping each other.

Link to post
Share on other sites

This is far from perfect but it shows that you can scroll the page and still scrape the followers:

clear list(%followers)
add list to list(%followers,$scrape attribute(<(tagname="a" AND href=r"^/.*?/$")>,"href"),"Delete","Global")
run javascript("window.scrollTo(0,document.body.scrollHeight)")
Link to post
Share on other sites

Thank you for the answer, but what you are trying to do with your code won't work on Pinterest because, except for the first 10 profiles, the data is passed through the header and you can't find it in the HTML source code.

Link to post
Share on other sites

Have you tried it?

That code works on Pinterest, and the data is not passed through the header. The data is in JSON format and is in the HTML source code.

You can scrape it using the browser (HTML), or you can scrape it using HTTP GET requests (JSON).

Link to post
Share on other sites

It works when you search for profiles ("Search -> People"), but once you've clicked on a profile and then clicked on "Followers" to see that profile's followers, the code will not work.

Link to post
Share on other sites

The code works. That's why I need you to post your code here, so we can discuss where the problem is.

Link to post
Share on other sites

Here is my code, but you should test it on this page or similar https://www.pinterest.ca/moderncatmag/followers/ 

clear list(%searchlistfollowers)
set list position(%searchlist,1)
loop($list total(%searchlist)) {
    navigate("https://www.pinterest.ca{$list item(%searchlist,$list position(%searchlist))}","Wait")
    click(<data-reactid=149>,"Left Click","No")
    loop(20) {
        run javascript("window.scrollTo(0,document.body.scrollHeight);")
        wait for browser event("Page Loaded","")
        wait(1)
    }
    add list to list(%searchlistfollowers,$scrape attribute(<href=w"/*/">,"href"),"Delete","Global")
    set list position(%searchlist,$add($list position(%searchlist),1))
    wait(3)
}

Link to post
Share on other sites

You must put the add list to list command inside the page-scroll loop, so that the followers are scraped after every scroll:

clear list(%searchlistfollowers)
add list to list(%searchlist,$list from text("/infographicnowcom
/romper.com","
"),"Delete","Global")
loop($list total(%searchlist)) {
    navigate("https://www.pinterest.ca{$next list item(%searchlist)}","Wait")
    wait for browser event("Everything Loaded","")
    click(<innertext="Followers">,"Left Click","No")
    wait for browser event("Everything Loaded","")
    loop(20) {
        run javascript("window.scrollTo(0,document.body.scrollHeight);")
        wait for browser event("Page Loaded","")
        wait(1)
        add list to list(%searchlistfollowers,$scrape attribute(<(tagname="a" AND href=r"^/.*?/$")>,"href"),"Delete","Global")
    }
    wait(3)
}
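
The thread does not say why scraping inside the loop matters. One plausible reason, assumed here and not confirmed, is that the page keeps only a limited window of follower links in the DOM at any moment. The Python sketch below simulates that hypothetical behaviour (the `visible_links` page model and the window size of 10 are invented for illustration) and compares scraping once at the end with scraping after every scroll.

```python
def visible_links(all_links, scroll_step, window=10):
    # Hypothetical page model: only `window` links exist in the DOM at once.
    start = scroll_step * window
    return all_links[start:start + window]

def scrape_after_scrolling(all_links, scrolls):
    # Scrape a single time, after the last scroll.
    return visible_links(all_links, scrolls - 1)

def scrape_while_scrolling(all_links, scrolls):
    # Scrape after every scroll and accumulate, skipping duplicates.
    collected = []
    seen = set()
    for step in range(scrolls):
        for link in visible_links(all_links, step):
            if link not in seen:
                seen.add(link)
                collected.append(link)
    return collected

links = [f"/user{i}/" for i in range(50)]
print(len(scrape_after_scrolling(links, 5)))   # 10
print(len(scrape_while_scrolling(links, 5)))   # 50
```

Under this assumption, the single scrape at the end only ever sees the last window, which matches the symptom of getting just a handful of results.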

Link to post
Share on other sites

What a dumb error, my bad. I spent 5 days scratching my head and making up theories about what the problem could be, when it was an error in my code. Thanks a lot, Varo.

 

At least I've had a first look at what I'll need in the future to speed up the process by using sockets. Even though I've solved my issue, I still wonder how to do an HTTP POST request to Pinterest. The tutorials I followed all call a URL with something like ?page=x, but this one contains so much data that it's almost impossible for me to guess how to do it :/

 

Thanks again Varo and the others (more than 5000 profiles on my list as I'm typing this YEAAAH :) )

Link to post
Share on other sites

You are welcome Yagami.

 

The HTTP approach is simple: just send an HTTP GET request to Pinterest like this:

https://www.pinterest.ca/resource/UserFollowersResource/get/?source_url=/infographicnowcom/followers/&data={"options":{"hide_find_friends_rep":true,"username":"infographicnowcom"},"context":{},"module":{}}&module_path=App(module=[object Object])&_=1510802889379
https://www.pinterest.ca/resource/UserFollowersResource/get/?source_url=/infographicnowcom/followers/&data={"options":{"bookmarks":["Pz9Nakl5TXpvM09ERXdPVFV4TWpFNU5UWTBNREkxTXpvNU1qSXpNemN5TURNMk9EVTBOemMxT0RBM1gwVT18NDk5NGEyMDg4MzIyNjJlZGYwMjU0ZTUwZTI4OGYwZjk0MTg0M2YwMmY1YjdlZDg2MGU0MWIzNWZkM2QwYTAzOQ=="],"hide_find_friends_rep":true,"username":"infographicnowcom"},"context":{},"module":{}}&module_path=App(module=[object Object])&_=1510802889380

Then read the response with a JSON parser.
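
As a rough illustration of how those URLs are put together, here is a Python sketch (not UBot script) that rebuilds the query string from its parts. The function name `followers_url` is made up. The sketch leaves out the `module_path` and `_` parameters seen in the captured URLs, and whether the server accepts a request without them is an assumption that has not been tested here.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "https://www.pinterest.ca/resource/UserFollowersResource/get/"

def followers_url(username, bookmark=None):
    # Mirror the "data" JSON seen in the captured URLs.
    options = {"hide_find_friends_rep": True, "username": username}
    if bookmark is not None:
        # Later pages carry a bookmark token taken from the previous response.
        options["bookmarks"] = [bookmark]
    data = {"options": options, "context": {}, "module": {}}
    query = {
        "source_url": f"/{username}/followers/",
        "data": json.dumps(data, separators=(",", ":")),
    }
    # urlencode percent-escapes the JSON so it is safe inside a URL.
    return BASE + "?" + urlencode(query)

url = followers_url("infographicnowcom")
print(url)
```

The response to such a request would then go through a JSON parser (`json.loads` in Python) instead of being scraped from the rendered page.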

Link to post
Share on other sites

Thanks. Where is the pagination in your code? I guess that I will have to modify something in there to simulate the scroll?

When you send the first GET request, the server responds with JSON data containing a parameter value that you need to pass as a URL parameter in the next request. In the second URL above, that is the "bookmarks" value.
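
The pagination described here can be sketched as a loop that feeds each response's token into the next request. This Python sketch uses a fake in-memory server, so nothing is actually sent to Pinterest. The response keys `usernames` and `next_bookmark` are hypothetical names, since the real response layout is not shown in this thread.

```python
def collect_followers(fetch_page, max_pages=100):
    # fetch_page(bookmark) must return a dict for one page of results.
    followers = []
    bookmark = None  # the first request carries no bookmark
    for _ in range(max_pages):
        page = fetch_page(bookmark)
        followers.extend(page["usernames"])
        bookmark = page.get("next_bookmark")
        if not bookmark:
            break  # no token returned means there are no more pages
    return followers

# Fake server used only for this example; keys and tokens are invented.
FAKE_PAGES = {
    None: {"usernames": ["a", "b"], "next_bookmark": "tok1"},
    "tok1": {"usernames": ["c"], "next_bookmark": "tok2"},
    "tok2": {"usernames": ["d"], "next_bookmark": None},
}

def fake_fetch(bookmark):
    return FAKE_PAGES[bookmark]

print(collect_followers(fake_fetch))  # ['a', 'b', 'c', 'd']
```

In a real script, `fake_fetch` would be replaced by a function that sends the GET request and parses the JSON body. The `max_pages` cap is there so a server that keeps returning tokens cannot loop forever.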

Link to post
Share on other sites
