grantwood, posted September 27, 2013

Hello,

I am using the Page Scrape command to extract the text below:

<td colspan="3" class="heading small"><strong>Product charges</strong></td>
</tr>
<tr>
<td class="title small">Tuneband for iPhone 4 & iPhone 4S, Black, Grantwood Technology's Armband, Silicone Skin, and Front/Back Screen Protector</td>
<td class="space"> </td>
<td class="quantity small">Qty: 1</td>
<td class="small"></td>
<td class="amount small">$21.99</td>
</tr>
<tr>
<td class="title small">Tuneband for iPhone 5 (NOT FOR IPHONE 5C OR IPHONE 5S), Black, Grantwood Technology's Armband, Silicone Skin, and Front Screen Protector</td>
<td class="space"> </td>
<td class="quantity small">Qty: 1</td>
<td class="small"></td>
<td class="amount small">$22.99</td>
</tr>
<tr>
<td colspan="5" height="25px"><hr></td>
</tr>

I want to use the Find Regular Expression function to extract every occurrence of the <td class="title small"> tag, which should give two matches in this example. The class attribute always contains "title", and sometimes additional class names as well, as in "title small". However, when I use the following regex, only one match is returned. Any ideas?

add list to list(%temp_list, $find regular expression(#temp, "<td class=\"title.*\">"), "Delete", "Global")
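One likely cause, sketched in Python's re module rather than uBot (an assumption: the thread does not say which engine uBot uses, though greedy and lazy quantifiers behave the same way in Python and .NET): if Page Scrape returns the HTML as a single line, the greedy .* runs to the last "> on that line, so everything from the first title cell onward becomes one match. The sample string is a hypothetical, abbreviated single-line version of the snippet above.

```python
import re

# Hypothetical single-line version of the scraped HTML (product names abbreviated).
html = (
    '<tr> <td class="title small">Tuneband for iPhone 4 & iPhone 4S</td> '
    '<td class="quantity small">Qty: 1</td> </tr> '
    '<tr> <td class="title small">Tuneband for iPhone 5</td> '
    '<td class="quantity small">Qty: 1</td> </tr>'
)

# Greedy: .* runs to the end of the line, then backs off only to the LAST '">'.
greedy = re.findall(r'<td class="title.*">', html)

# Lazy: .*? stops at the FIRST '">' after "title".
lazy = re.findall(r'<td class="title.*?">', html)

print(len(greedy))  # 1
print(lazy)         # ['<td class="title small">', '<td class="title small">']
```

Making the quantifier lazy (.*?) ends each match at the first "> after "title", which yields both tags.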
HelloInsomnia, posted September 27, 2013

Here you go:

add list to list(%temp_list, $find regular expression(#temp, "(?<=title\\ssmall\\\"\\>).*?(?=\\<)"), "Delete", "Global")
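How the pattern above works, shown as a Python sketch for illustration only. In a Python raw string the uBot-style escapes \\\" and \\> are not needed, so the pattern is written with plain " and >. The lookbehind requires the text to follow title small"> without including that in the match, the lazy .*? takes as few characters as possible, and the lookahead stops just before the next <, which here is the closing </td>. The sample rows are abbreviated from the question.

```python
import re

# Abbreviated sample rows from the question, assumed to be one line of text.
html = (
    '<tr><td class="title small">Tuneband for iPhone 4 & iPhone 4S</td>'
    '<td class="quantity small">Qty: 1</td>'
    '<td class="amount small">$21.99</td></tr>'
    '<tr><td class="title small">Tuneband for iPhone 5</td>'
    '<td class="quantity small">Qty: 1</td>'
    '<td class="amount small">$22.99</td></tr>'
)

# (?<=title\ssmall">)  lookbehind: must be preceded by the tag's tail, which is not captured
# .*?                  lazy: as few characters as possible
# (?=<)                lookahead: stop right before the next '<' (the closing </td>)
pattern = r'(?<=title\ssmall">).*?(?=<)'

titles = re.findall(pattern, html)
print(titles)  # ['Tuneband for iPhone 4 & iPhone 4S', 'Tuneband for iPhone 5']
```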
grantwood, posted September 27, 2013

Wow! That works perfectly. If I want to extract the quantities and amounts, would I use:

add list to list(%temp_list, $find regular expression(#temp, "(?<=quantity\\ssmall\\\"\\>).*?(?=\\<)"), "Delete", "Global")
add list to list(%temp_list, $find regular expression(#temp, "(?<=amount\\ssmall\\\"\\>).*?(?=\\<)"), "Delete", "Global")
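A Python sketch generalizing this to any cell class. Two things here are assumptions rather than part of the thread: the helper name cell_texts is hypothetical, and it swaps the lookbehind for a capture group, because Python's re only accepts fixed-width lookbehind and the helper needs to match both class="title" and class="title small" (the question says the extra class names vary). Pairing the three lists by position assumes every row has all three cells.

```python
import re

# Abbreviated sample rows from the question.
html = (
    '<tr><td class="title small">Tuneband for iPhone 4 & iPhone 4S</td>'
    '<td class="quantity small">Qty: 1</td>'
    '<td class="amount small">$21.99</td></tr>'
    '<tr><td class="title small">Tuneband for iPhone 5</td>'
    '<td class="quantity small">Qty: 1</td>'
    '<td class="amount small">$22.99</td></tr>'
)


def cell_texts(html, first_class):
    """Hypothetical helper: text of each <td> whose class list starts with first_class.

    Matches class="title" and class="title small", but not class="titlebar".
    A capture group is used because Python's lookbehind must be fixed width.
    """
    pattern = r'<td class="' + re.escape(first_class) + r'(?:\s[^"]*)?">(.*?)<'
    return re.findall(pattern, html)


titles = cell_texts(html, "title")
quantities = cell_texts(html, "quantity")
amounts = cell_texts(html, "amount")

# Pair the three lists row by row (assumes no row is missing a cell).
for title, qty, amount in zip(titles, quantities, amounts):
    print(title, "|", qty, "|", amount)
```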
HelloInsomnia, posted September 27, 2013

Yes, that looks right.
grantwood, posted September 27, 2013

The regex for the quantity returns only one match. Is that because the two occurrences have the same value?
grantwood, posted September 27, 2013

Never mind. The list was set to delete duplicates (the "Delete" option in add list to list). Duh!
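The effect of the "Delete" option can be reproduced with a small Python sketch. add_list_to_list here is a hypothetical stand-in that assumes "Delete" means "skip items already in the list". The regex itself finds both quantities; the second "Qty: 1" is dropped only when the matches are added to the list.

```python
import re

# The two quantity cells from the question: both contain "Qty: 1".
html = ('<td class="quantity small">Qty: 1</td>'
        '<td class="quantity small">Qty: 1</td>')

matches = re.findall(r'(?<=quantity\ssmall">).*?(?=<)', html)


def add_list_to_list(target, items, delete_duplicates):
    """Hypothetical stand-in for uBot's add list to list.

    Assumes "Delete" means: skip any item already present in the target list.
    """
    for item in items:
        if delete_duplicates and item in target:
            continue
        target.append(item)
    return target


print(matches)                                                 # ['Qty: 1', 'Qty: 1']
print(add_list_to_list([], matches, delete_duplicates=True))   # ['Qty: 1']
print(add_list_to_list([], matches, delete_duplicates=False))  # ['Qty: 1', 'Qty: 1']
```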
grantwood, posted June 22, 2015

If line breaks are embedded in the cell text, the current regex produces no matches. For example:

<td class="amount small">
$21.99
</td>

How would you modify the regex to extract the text, including any line breaks, tabs, spaces, etc.? And how would you strip all of those characters (uBot's $trim command only strips spaces), leaving just the amount?

add list to list(%temp_list, $find regular expression(#temp, "(?<=amount\\ssmall\\\"\\>).*?(?=\\<)"), "Don\'t Delete", "Global")
HelloInsomnia, posted June 24, 2015

You can start with this:

set(#temp, "<td class=\"amount small\">
$21.99
</td>", "Global")
add list to list(%temp_list, $find regular expression($trim($replace(#temp, $new line, $nothing)), "(?<=amount\\ssmall\\\"\\>).*?(?=\\<)"), "Don\'t Delete", "Global")

Then, when you use each list item, call $trim on it to remove any leftover spaces. That should handle everything.
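The replace-then-trim approach above, translated into a Python sketch for illustration. The \r\n line endings and the tab inside the cell are assumptions about what the scraped text might contain. One difference worth noting: Python's str.strip() with no argument removes all leading and trailing whitespace, tabs and newlines included, so it covers the characters the question lists; whether uBot's $trim does the same is not established in the thread.

```python
import re

# Hypothetical scraped cell: Windows line breaks plus a tab around the amount.
temp = '<td class="amount small">\r\n\t$21.99\r\n</td>'

# Step 1: remove the line breaks. '.' does not match '\n' by default, so without
# this step the lazy .*? cannot reach the closing '<'.
flattened = temp.replace("\r", "").replace("\n", "")

# Step 2: the same lookaround pattern used in the thread.
items = re.findall(r'(?<=amount\ssmall">).*?(?=<)', flattened)

# Step 3: trim each item. str.strip() with no argument removes all leading and
# trailing whitespace, tabs included.
amounts = [item.strip() for item in items]
print(amounts)  # ['$21.99']
```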
grantwood, posted June 25, 2015

That will work. Thank you! I wish uBot Studio would add a multiline option.
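On the multiline wish: in Python and .NET, the option that lets . match line breaks is called DOTALL (Python) or Singleline (.NET), while the Multiline option only changes how ^ and $ behave. The Python sketch below compares the default behavior with the inline (?s) flag and with a flag-free negated class, [^<]*. Whether uBot's $find regular expression accepts an inline (?s) depends on its underlying engine, which the thread does not state; [^<]* sidesteps that, since a negated class matches line breaks without any option.

```python
import re

# Hypothetical cell with line breaks, as in the June 2015 question.
temp = '<td class="amount small">\n\t$21.99\n</td>'

# Default: '.' stops at '\n', so the lazy match can never reach the '<'.
default = re.findall(r'(?<=amount\ssmall">).*?(?=<)', temp)

# Inline (?s) turns on DOTALL, letting '.' match '\n' as well.
dotall = re.findall(r'(?s)(?<=amount\ssmall">).*?(?=<)', temp)

# Flag-free alternative: a negated class matches line breaks without any option.
negated = re.findall(r'(?<=amount\ssmall">)[^<]*', temp)

print(default)                       # []
print([m.strip() for m in dotall])   # ['$21.99']
print([m.strip() for m in negated])  # ['$21.99']
```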