UBot Underground

URL string is messed up after scraping



Hi,

I tried searching for this problem, but I don't really know what to search for, so I couldn't find a solution. Please help if it's not too much trouble for you :)

 

So I navigate to a page where I want to scrape a URL. In the UBot browser, the URL looks like this:

 

http://domain.com/v1.php?pid=1&hid=12345&t=0(0)&zz=123&jump=<sub_tracking%20id>&par=a1234-q.w.e.r%09a.sdfghj%097

 

But what I scrape is:

 

http://domain.com/v1.php?pid=1&hid=12345&t=0(0)&zz=123&jump=<sub_tracking[space]id>&par=a1234-q.w.e.r[tab]a.sdfghj[tab]7

 

[space] and [tab] mark where a literal space and tab appear. I added them because the UBot forum automatically strips spaces and tabs inside quotes, but they are really there.

 

Then, because I want to navigate to the scraped URL, I simply use navigate with the scraped URL variable. However, some substitutions occur:

 

&amp; becomes &

&lt; becomes <

&gt; becomes >

%20 becomes [space]

%09 becomes [tab]

 

I think these changes are breaking my URL, so I can't navigate to it anymore. What can be done to make the URL work again?
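The substitutions listed above look like two decoding passes: HTML entities (&amp;, &lt;, &gt;) are decoded first, then percent-escapes (%20, %09). The Python sketch below reproduces that with the standard library. It assumes the page source stores the link with HTML entities, which the thread doesn't show, so raw_href is a hypothetical reconstruction.

```python
import html
from urllib.parse import unquote

# Hypothetical raw href as it might appear in the page source (assumption:
# the actual page HTML is not shown in the thread).
raw_href = ("http://domain.com/v1.php?pid=1&amp;hid=12345&amp;t=0(0)&amp;zz=123"
            "&amp;jump=&lt;sub_tracking%20id&gt;&amp;par=a1234-q.w.e.r%09a.sdfghj%097")

# Pass 1: HTML entity decoding (&amp; -> &, &lt; -> <, &gt; -> >).
# This yields the form shown in the browser.
entity_decoded = html.unescape(raw_href)

# Pass 2: percent-decoding (%20 -> space, %09 -> tab).
# This yields the broken form that gets scraped.
fully_decoded = unquote(entity_decoded)

print(entity_decoded)
print(repr(fully_decoded))  # repr() makes the tab characters visible as \t
```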


Thanks for that. I replaced the %09 with nothing.

 

Then I realized the URLs vary so much that I don't know how many characters I need to replace. Is there a more universal method to replace all of these ASCII characters?
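One universal approach, sketched outside UBot: instead of replacing characters one at a time, re-percent-encode the whole scraped string and leave alone only the characters that give the URL its structure. Below is a minimal Python version using urllib.parse.quote; the SAFE_CHARS set and the reencode_url name are illustrative assumptions, not anything UBot provides.

```python
from urllib.parse import quote

# The broken URL as scraped (literal space and tabs instead of %20 / %09).
scraped = ("http://domain.com/v1.php?pid=1&hid=12345&t=0(0)&zz=123"
           "&jump=<sub_tracking id>&par=a1234-q.w.e.r\ta.sdfghj\t7")

# Structural characters to keep as-is. This set is an assumption; adjust it
# for your URLs. quote() never escapes letters, digits, or "_.-~".
SAFE_CHARS = ":/?&=()"


def reencode_url(url: str) -> str:
    """Percent-encode every character outside SAFE_CHARS (spaces, tabs, <, >, ...)."""
    return quote(url, safe=SAFE_CHARS)


fixed = reencode_url(scraped)
print(fixed)
```

With this safe set, < and > come out as %3C and %3E rather than the literal form shown in the browser; servers normally decode both to the same value. Two caveats: a string that still contains a % escape would have its % encoded again as %25, and if a decoded parameter value itself contained & or =, the original boundaries cannot be recovered by re-encoding.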


Unfortunately, that is not my call to make; I don't think my business partner would be comfortable sharing it.

 

Is there any way to replace these characters easily?

 

Thanks for the help!


I asked because there may be another way to scrape the links without UBot automatically converting certain characters.

 

 

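If the [space] characters were converted from any of these:

%20
%09
%05

you can use this regex:

%[0-9A-Fa-f]{2}

to replace them with [space]. (Percent escapes use hexadecimal digits, so the character class needs A-F as well as 0-9.)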
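A quick Python check of the regex idea: the href string is copied from the first post, and replacing each escape with a space follows the suggestion above. It also shows why a digits-only pattern such as %[0-9]{2} can miss escapes like %3C.

```python
import re

# A percent escape is "%" followed by two HEX digits, so A-F must be allowed.
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

href = ("http://domain.com/v1.php?pid=1&hid=12345&t=0(0)&zz=123"
        "&jump=<sub_tracking%20id>&par=a1234-q.w.e.r%09a.sdfghj%097")

print(PERCENT_ESCAPE.findall(href))  # → ['%20', '%09', '%09']

# Replace every escape with a single space, as suggested in the post.
print(PERCENT_ESCAPE.sub(" ", href))

# A digits-only pattern misses hex escapes such as %3C (<) and %3E (>).
print(re.findall(r"%[0-9]{2}", "%3C%3E"))  # → []
print(PERCENT_ESCAPE.findall("%3C%3E"))    # → ['%3C', '%3E']
```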


Are you scraping these URLs from the page itself, or is this the URL in the address bar? If they are clickable links, click on them and then grab the $url.


Thanks for the response :)

 

These links are from $page scrape.

 

I tried choosing it by attribute and getting the link, but the result is the same as what I get from $page scrape.

 

 

Quote: "I asked because there may be another way to scrape the links without UBot automatically converting certain characters."

 

So it is UBot that converts them into characters?

 

Weird, as I have not come across this when scraping other pages/links.


 

Quote: "So it is UBot that converts them into characters? Weird, as I have not come across this when scraping other pages/links."

 

I think so, when you use $page scrape.

 

Try "choose attribute", then $scrape chosen attribute > href. It should scrape the correct URL.

 

Edit: I just realized that you already tried "choose attribute" then $scrape chosen attribute. In that case, I think the last resort is to replace every wrong character.
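For comparison outside UBot, a standard HTML parser that reads the href attribute decodes only the HTML entities and leaves %20 and %09 alone. This Python sketch uses html.parser from the standard library; the one-link page and the HrefCollector class are hypothetical.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; entity references in
        # attribute values are already unescaped by HTMLParser.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Hypothetical page source containing the link from the first post.
page = ('<a href="http://domain.com/v1.php?pid=1&amp;hid=12345&amp;t=0(0)'
        '&amp;zz=123&amp;jump=&lt;sub_tracking%20id&gt;'
        '&amp;par=a1234-q.w.e.r%09a.sdfghj%097">link</a>')

parser = HrefCollector()
parser.feed(page)
parser.close()
print(parser.hrefs[0])  # entities decoded, %20 / %09 still intact
```

If an attribute scrape returns literal spaces and tabs instead, that suggests an extra percent-decoding step is being applied somewhere in the tool.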
