    PROJECT TITLE
    vertexmedia

    Website Scraper

    • 1 Comment
    • 3 bids
    • $733.33 Avg Bid (USD)
    • Open 5 years ago

    Project description:

    I have a great ubotter, but we seem to be running into some issues scraping a site: font errors, and problems scraping legal symbols and even apostrophes within the text. I need a modular set of bots to help with speed and efficiency. The first bot:

    1: Scrape articles HG.org: https://www.hg.org/articles-for-260-areas-of-law.asp

    2: User input: Articles By Area of Practice: Estate Planning etc…

    3: All scraped data goes into an XAMPP database.

    4: When scraping, the components of the article must be stripped and separated. (The website has <script> tags mixed in for ads and whatnot; they need to be removed.)
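
    A minimal sketch of the <script> removal in step 4, assuming Python and its standard html.parser module (the posting itself does not name a language). The class name ScriptStripper, the function strip_scripts, and the sample HTML are hypothetical and not taken from HG.org. Decoding character references while parsing is one way the legal symbols and apostrophes could be kept intact.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Collects visible text while skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes references such as &rsquo; and &sect;
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def strip_scripts(html_text):
    """Return the text of html_text with script/style content removed."""
    parser = ScriptStripper()
    parser.feed(html_text)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical sample, not real HG.org markup.
sample = (
    "<p>The tenant&rsquo;s rights under &sect; 8</p>"
    "<script>showAd('banner');</script>"
    "<p>remain in force.</p>"
)
print(strip_scripts(sample))
```

    The ad script is dropped and the curly apostrophe and section sign come through as real Unicode characters rather than raw entities.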

    Components for DB:
    A: Scraped URL
    B: <Title>
    C: Description
    D: H2
    E: H2 article body
    F: H2 (bot needs to make this an H3)
    G: H3 article body
    H: H2 (bot needs to make this an H4)
    I: H4 article body
    J: H2 (bot needs to make this an H5)
    K: H5 article body
    L: H2 (bot needs to make this an H6)
    M: H6 article body

    ** For any articles that go past this, just keep the remaining headings at H6.
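
    The heading rule in the component list can be sketched as a small mapping, assuming Python: the first H2 stays an H2, each later H2 drops one level, and anything past the fifth stays at H6. The function names and the sample headings are hypothetical.

```python
def heading_level(position):
    """Map the 1-based position of an H2 in the article to its target level."""
    # 1 -> 2, 2 -> 3, ... 5 -> 6, and everything later is capped at 6.
    return min(position + 1, 6)


def relabel_headings(headings):
    """Pair each heading text with its new tag name, in article order."""
    return [
        (f"h{heading_level(i)}", text)
        for i, text in enumerate(headings, start=1)
    ]


# Hypothetical headings from one article.
for tag, text in relabel_headings(["One", "Two", "Three", "Four", "Five", "Six"]):
    print(tag, text)
```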
    ***** Some articles do not have any H2 tags etc… If possible, handle those as well:

    Make the bot smart enough that, if a piece of text is 8 words or fewer and comes with a double <br><br>, it makes that text an H tag in the corresponding order.
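
    One possible reading of this rule, sketched in Python: split the article body on runs of two or more <br> tags and treat a chunk as a heading when it has 8 words or fewer and is followed by such a break. Whether the double <br> comes before or after the short text on the real pages is an assumption here, and the names split_sections and MAX_HEADING_WORDS and the sample text are hypothetical.

```python
import re

# Two or more consecutive <br>, <br/> or <br /> tags, in any letter case.
DOUBLE_BR = re.compile(r"(?:<br\s*/?>\s*){2,}", re.IGNORECASE)
MAX_HEADING_WORDS = 8


def split_sections(body_html):
    """Return (kind, text) pairs, where kind is 'heading' or 'body'."""
    chunks = DOUBLE_BR.split(body_html)
    sections = []
    for index, chunk in enumerate(chunks):
        text = chunk.strip()
        if not text:
            continue
        # Only chunks that are followed by a double break can be headings.
        followed_by_break = index < len(chunks) - 1
        is_short = len(text.split()) <= MAX_HEADING_WORDS
        kind = "heading" if followed_by_break and is_short else "body"
        sections.append((kind, text))
    return sections


# Hypothetical article body.
body = (
    "What Is a Living Trust?<br><br>"
    "A living trust is a legal document that places your assets into a "
    "trust during your lifetime for your benefit.<br /><BR>"
    "Short closing note."
)
for kind, text in split_sections(body):
    print(kind, "|", text)
```

    The detected headings could then be numbered in order and given their H levels with the same H2 to H6 rule used for real H2 tags.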

    Other known issues: This website blocks an IP address after 100 page views, and the IP stays dead for 30 days, so we need to swap proxies before that happens. I have plenty of proxies to use.
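
    A rough sketch of the proxy swap, assuming Python. It only counts page views and hands out the next proxy; it makes no network requests. The class name ProxyRotator, the default of 99 views (chosen to stay just under the 100 view limit), and the proxy addresses are all hypothetical.

```python
from collections import deque


class ProxyRotator:
    """Hands out a proxy per page view and retires it before the limit."""

    def __init__(self, proxies, max_views=99):
        self._queue = deque(proxies)
        self._max_views = max_views
        self._views = 0  # page views made through the current proxy

    def next_proxy(self):
        """Return the proxy to use for the next page view."""
        if not self._queue:
            raise RuntimeError("all proxies exhausted")
        if self._views >= self._max_views:
            # Retire the current proxy instead of risking a 30 day block.
            self._queue.popleft()
            self._views = 0
            if not self._queue:
                raise RuntimeError("all proxies exhausted")
        self._views += 1
        return self._queue[0]


# Hypothetical proxy addresses, with a tiny limit to show the swap.
rotator = ProxyRotator(
    ["http://proxy-a.example:8080", "http://proxy-b.example:8080"],
    max_views=2,
)
for _ in range(4):
    print(rotator.next_proxy())
```

    A retired proxy is dropped rather than recycled, on the assumption that there are more proxies available than one scraping run needs.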

    ** GETTING THE DATA PROPERLY INTO THE DB IS THE HIGHEST PRIORITY **
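
    A sketch of the storage step, with one column per component A to M. It uses Python's built-in sqlite3 module purely as a stand-in so the example runs on its own; the real target would be the MySQL or MariaDB server that ships with XAMPP. The table name, column names, and sample values are hypothetical. Parameterized inserts let apostrophes and legal symbols be stored as-is instead of breaking the SQL.

```python
import sqlite3

# One column per component A to M from the list above (names are assumptions).
COLUMNS = [
    "url", "title", "description",
    "h2", "h2_body", "h3", "h3_body",
    "h4", "h4_body", "h5", "h5_body",
    "h6", "h6_body",
]


def create_table(conn):
    """Create the articles table if it does not exist yet."""
    cols = ", ".join(f"{name} TEXT" for name in COLUMNS)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS articles (id INTEGER PRIMARY KEY, {cols})"
    )


def save_article(conn, article):
    """Insert one article dict; components it lacks are stored as NULL."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(
        f"INSERT INTO articles ({', '.join(COLUMNS)}) VALUES ({placeholders})",
        [article.get(name) for name in COLUMNS],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the XAMPP database
create_table(conn)
save_article(conn, {
    "url": "https://example.com/article",  # placeholder, not a real HG.org URL
    "title": "Tenant\u2019s rights under \u00a7 8",
    "h2": "Overview",
    "h2_body": "It\u2019s a short example body.",
})
print(conn.execute("SELECT title, h2, h6 FROM articles").fetchone())
```

    With a MySQL connector the placeholders would typically be %s rather than ?, and a utf8mb4 character set on the table and connection is the usual way to keep symbols such as § and curly apostrophes from being mangled.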

    1 Comment

    1. Whiztech

      I can make it. Please contact me on Skype: Sharp Botmaker

    FREELANCER BIDDING (3)
    • lowridertj (Botguru.net Web Automation): In Process
    • Charles Fairbairn (Developer and CEO): In Process
    • Peter Kim (Programmer): In Process

    ABOUT EMPLOYER

    vertexmedia

    Member Since December 5, 2016
    • Location: Orange County
    • Earning: $0.00
    • Project complete: 0