oricoun, Posted December 20, 2017
Hi gang, I have a rather large XML data feed hosted at a URL, for example: https://www.bibloo.sk/_upload/heureka.php?cj
A sample of the XML looks like this:
<ORIGINAL_PRICE>81.49</ORIGINAL_PRICE>
<MANUFACTURER>Franklin & Marshall</MANUFACTURER>
<CATEGORYTEXT>Pánske | Oblečenie | Tričká a tielka | Polo tričká</CATEGORYTEXT>
<EAN>8055526901364</EAN>
I would like to extract all items whose CATEGORYTEXT contains "Polo tričká". Is that even possible? I tried the XPath parser with
//CATEGORYTEXT[contains(text(),"Polo tričká")]
but had no luck, and the bot also freezes because the XML contains 30,000 products. I would appreciate any help. Thanks.
bestmacros, Posted December 20, 2017
Try a regular expression:
(?<=\<CATEGORYTEXT>).*(?=Polo tričká<\/CATEGORYTEXT>)
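For reference, here is a quick sketch of what that pattern captures, written in Python purely for illustration (the thread does not say which regex engine the bot uses, so behaviour there may differ). The first test string is the sample from the question; the second, `in_middle`, is a made-up variant with the phrase in a different position.

```python
import re

# The pattern from the post above, unchanged.
PATTERN = re.compile(r"(?<=\<CATEGORYTEXT>).*(?=Polo tričká<\/CATEGORYTEXT>)")

# Sample line from the original question: the phrase is the last segment.
at_end = "<CATEGORYTEXT>Pánske | Oblečenie | Tričká a tielka | Polo tričká</CATEGORYTEXT>"
# Hypothetical line (not from the feed): the phrase sits in the middle.
in_middle = "<CATEGORYTEXT>Pánske | Polo tričká | Výpredaj</CATEGORYTEXT>"

m = PATTERN.search(at_end)
print(repr(m.group(0)) if m else None)  # the text preceding the phrase
print(PATTERN.search(in_middle))        # None: the lookahead needs the closing tag right after the phrase
```

So the pattern returns the category path in front of "Polo tričká", and only when the phrase is immediately followed by the closing tag.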
oricoun (Author), Posted December 20, 2017
Yeah, I was thinking about that, but the problem is that the position of "Polo tričká" inside CATEGORYTEXT is different every time. I was thinking that XPath is more powerful than regex for scraping large amounts of data.
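One possible way around both problems (the varying position and the freeze on 30,000 products) is to stream the feed instead of loading it whole, and to use a plain substring test on CATEGORYTEXT. The following is a minimal sketch in Python using the standard library's `xml.etree.ElementTree.iterparse`, not something tested against the real feed. It assumes each product is wrapped in a `SHOPITEM` element (an assumption, since the thread only shows the child tags), and the helper name `iter_matching_items` and the second and third sample items are made up for the demo.

```python
import io
import xml.etree.ElementTree as ET


def iter_matching_items(source, needle="Polo tričká", item_tag="SHOPITEM"):
    """Yield {child tag: text} for each item whose CATEGORYTEXT contains needle.

    `source` is a filename or a binary file-like object.
    `item_tag` is an assumed wrapper element name; adjust it to the real feed.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != item_tag:
            continue
        category = elem.findtext("CATEGORYTEXT") or ""
        if needle in category:  # matches wherever the phrase sits in the path
            yield {child.tag: (child.text or "") for child in elem}
        elem.clear()  # drop the item's children once it has been processed


# Small stand-in for the feed; only the first item comes from the thread.
SAMPLE = """<SHOP>
  <SHOPITEM>
    <ORIGINAL_PRICE>81.49</ORIGINAL_PRICE>
    <MANUFACTURER>Franklin &amp; Marshall</MANUFACTURER>
    <CATEGORYTEXT>Pánske | Oblečenie | Tričká a tielka | Polo tričká</CATEGORYTEXT>
    <EAN>8055526901364</EAN>
  </SHOPITEM>
  <SHOPITEM>
    <CATEGORYTEXT>Pánske | Polo tričká | Výpredaj</CATEGORYTEXT>
    <EAN>0000000000001</EAN>
  </SHOPITEM>
  <SHOPITEM>
    <CATEGORYTEXT>Pánske | Oblečenie | Bundy</CATEGORYTEXT>
    <EAN>0000000000002</EAN>
  </SHOPITEM>
</SHOP>"""

matches = list(iter_matching_items(io.BytesIO(SAMPLE.encode("utf-8"))))
for item in matches:
    print(item["EAN"], "|", item["CATEGORYTEXT"])
```

For the live feed, the object returned by `urllib.request.urlopen(url)` can be passed as `source`, so items are filtered as the download arrives. Note that a raw `&` inside the XML has to appear as `&amp;`, otherwise the parser will reject the document.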