Scraping All .atom Code

mootoo · September 16, 2018

Hello,

Apologies for the basic question, but I'm trying to scrape a URL from .atom source code.

In case I'm explaining this incorrectly: if I add .atom to the end of a URL on a site I'm scraping, it displays something like the source of the page.

Within that code are URLs that I want to scrape based on particular attributes.

Is regex the only way to sort through this code and extract the URL I want?
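For reference, here is a rough sketch of a non-regex route in Python. It assumes the .atom page is a standard Atom feed (RFC 4287), where URLs sit in the href attribute of link elements and the rel attribute says what kind of link each one is. The function name extract_links and the sample feed are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def extract_links(atom_xml: str, rel: str = "alternate") -> list[str]:
    """Return the href of every Atom <link> whose rel attribute equals `rel`.

    Per the Atom spec, a <link> with no rel attribute counts as "alternate".
    """
    root = ET.fromstring(atom_xml)
    hrefs = []
    # ".//atom:link" is ElementTree's limited XPath: every link element, at any depth.
    for link in root.findall(".//atom:link", ATOM_NS):
        href = link.get("href")
        if href and link.get("rel", "alternate") == rel:
            hrefs.append(href)
    return hrefs


# Hypothetical sample feed, standing in for whatever the site actually returns.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <link rel="self" href="https://example.com/posts.atom"/>
  <entry>
    <title>First post</title>
    <link rel="alternate" href="https://example.com/posts/1"/>
    <link rel="enclosure" href="https://example.com/files/1.mp3"/>
  </entry>
  <entry>
    <title>Second post</title>
    <link href="https://example.com/posts/2"/>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(extract_links(SAMPLE))               # the two post URLs
    print(extract_links(SAMPLE, "enclosure"))  # the .mp3 URL
```

Filtering on an attribute this way avoids having to write a regex that understands quoting and attribute order. The trade-off is that it only works when the page is well-formed XML.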

Thanks

HelloInsomnia · September 16, 2018

Probably either regex or an XPath parser, if one is available. It would really help to see an example, but if you just need a generic URL regex you can try something like this:

(http|https)\:\/\/[a-zA-Z0-9\-\?\#\.\=\/_\~]+
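If it helps, here is a quick way to try that pattern outside UBot, sketched in Python with the regex copied unchanged. The helper name find_urls and the sample snippet are hypothetical. Two caveats show up in practice. Because (http|https) is a capturing group, Python's re.findall() returns only "http" or "https", so the sketch uses finditer() and takes the whole match. Also, the character class has no &, %, + or :, so a URL with a query string like a&b or a port number gets cut short.

```python
import re

# The pattern from the post above, unchanged.
URL_PATTERN = re.compile(r"(http|https)\:\/\/[a-zA-Z0-9\-\?\#\.\=\/_\~]+")


def find_urls(text: str) -> list[str]:
    """Return every full substring of `text` that matches URL_PATTERN.

    finditer() + group(0) is used because findall() would return only the
    captured "http"/"https" group, not the whole URL.
    """
    return [m.group(0) for m in URL_PATTERN.finditer(text)]


# Hypothetical snippet of .atom source.
SNIPPET = (
    '<link rel="alternate" href="https://example.com/posts/1?page=2"/>'
    '<link rel="enclosure" href="http://example.com/a&amp;b"/>'
)

if __name__ == "__main__":
    # The second URL stops at "&", since "&" is not in the character class.
    print(find_urls(SNIPPET))
    # Keep only URLs that look like posts (a crude attribute-free filter).
    print([u for u in find_urls(SNIPPET) if "/posts/" in u])
```

Widening the character class, for example by adding & and %, would keep more of each URL. With XML, though, you would then also need to turn &amp; back into &.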

mootoo · September 16, 2018

Ok, thank you.

I may come back with some regex questions, but I just wanted to make sure there wasn't a feature in UBot that I wasn't aware of.

Thanks again.
