mootoo (Posted September 16, 2018): Hello, apologies for the basic question, but I'm trying to scrape a URL from .atom source code. In case I'm explaining this incorrectly: if I add .atom to the end of a URL on a site that I'm scraping, it displays something like the source of the page. Within that code are URLs that I want to scrape based on particular attributes. Is regex the only way to sort through this code and extract the URL that I want? Thanks
HelloInsomnia (Posted September 16, 2018): Probably regex, or an XPath parser if there is one. It would really help to see an example, but if you just need a generic URL regex you can try something like this: (http|https)\:\/\/[a-zA-Z0-9\-\?\#\.\=\/_\~]+
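For anyone comparing the two approaches outside of UBot, here is a minimal Python sketch (a swapped-in tool, not a UBot feature). It assumes the .atom page is a standard Atom feed in the http://www.w3.org/2005/Atom namespace. The sample feed, its URLs, and the helper names `links_by_attribute` and `all_urls` are hypothetical and only for illustration. The first helper walks the XML and filters `link` elements by an attribute; the second applies the generic regex from the reply above.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the feed uses the standard Atom namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample standing in for the real .atom source.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>First item</title>
    <link rel="alternate" type="text/html"
          href="https://example.com/products/first-item"/>
    <link rel="enclosure" type="image/jpeg"
          href="https://example.com/images/first.jpg"/>
  </entry>
</feed>"""

# The generic URL regex from the reply, unchanged.
URL_PATTERN = re.compile(r"(http|https)\:\/\/[a-zA-Z0-9\-\?\#\.\=\/_\~]+")


def links_by_attribute(feed_xml, attr, value):
    """Return href values of Atom <link> elements where attr == value."""
    root = ET.fromstring(feed_xml)
    return [
        link.get("href")
        for link in root.iterfind(".//atom:link", ATOM_NS)
        if link.get(attr) == value and link.get("href")
    ]


def all_urls(text):
    """Return every substring matching the generic URL regex."""
    # group(0) is the whole match; the pattern's only capture group
    # holds just the scheme, so findall() would return "http"/"https".
    return [m.group(0) for m in URL_PATTERN.finditer(text)]


if __name__ == "__main__":
    print(links_by_attribute(SAMPLE_FEED, "rel", "alternate"))
    print(all_urls(SAMPLE_FEED))
```

The regex route returns every URL in the text, including the namespace URI, so picking one "based on particular attributes" still needs extra filtering. The parser route lets the attribute itself (here `rel` or `type`) do the selecting.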
mootoo (Posted September 16, 2018): Ok, thank you. I may come back with some regex questions, but I just wanted to make sure there wasn't a feature in UBot that I wasn't aware of. Thanks again.