Browse free open source Python Web Scrapers and projects below. Use the toggles on the left to filter open source Python Web Scrapers by OS, license, language, programming language, and project status.

  • 1
    Crawlab

    Distributed web crawler admin platform for spiders management

    Crawlab is a Golang-based distributed web crawler management platform that supports multiple languages, including Python, NodeJS, Go, Java, and PHP, and multiple web crawler frameworks, including Scrapy, Puppeteer, and Selenium. It can be started with a single docker-compose command, in which case you do not need to configure the MongoDB database yourself. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS, and the worker nodes. The master node and worker nodes communicate with each other via gRPC (an RPC framework). Tasks are scheduled by the task scheduler module in the master node and received by the task handler module in the worker nodes, which executes them in task runners. Task runners are processes that run spider or crawler programs; they can also send data through gRPC (integrated in the SDK) to other data sources, e.g. MongoDB.
    Downloads: 0 This Week
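The scheduler, handler, and runner flow described for Crawlab can be illustrated with a minimal Python sketch. This is not Crawlab's code or API: every name here (`Task`, `schedule_tasks`, `handle_tasks`, `run_task`, `run_demo`) is hypothetical, an in-process queue stands in for the gRPC link between master and worker nodes, and a plain list stands in for a data store such as MongoDB.

```python
import queue
import subprocess
import sys
import threading
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    task_id: int
    command: List[str]  # spider command, run as a separate process


def schedule_tasks(task_queue, tasks):
    """Master-side scheduler (hypothetical): hand each task to the workers."""
    for task in tasks:
        task_queue.put(task)


def run_task(task):
    """Task runner: execute the spider as a child process and capture its output."""
    done = subprocess.run(task.command, capture_output=True, text=True, check=False)
    return {
        "task_id": task.task_id,
        "exit_code": done.returncode,
        "output": done.stdout.strip(),
    }


def handle_tasks(task_queue, results):
    """Worker-side handler: pull tasks until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        if task is None:
            break
        results.append(run_task(task))  # the list stands in for a data store


def run_demo(num_workers=2):
    task_queue = queue.Queue()  # stands in for the master-to-worker channel
    results = []
    workers = [
        threading.Thread(target=handle_tasks, args=(task_queue, results))
        for _ in range(num_workers)
    ]
    for worker in workers:
        worker.start()
    # Each "spider" is a one-line Python program that prints a single item.
    tasks = [
        Task(i, [sys.executable, "-c", f"print('item from spider {i}')"])
        for i in range(3)
    ]
    schedule_tasks(task_queue, tasks)
    for _ in workers:
        task_queue.put(None)  # one stop sentinel per worker
    for worker in workers:
        worker.join()
    return sorted(results, key=lambda r: r["task_id"])


if __name__ == "__main__":
    for record in run_demo():
        print(record)
```

Running each spider as a child process mirrors the description of task runners as separate processes, so a crashing spider would surface as a non-zero exit code instead of taking down the worker.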
  • 2
    DSTK - DataScience ToolKit

    DSTK - DataScience ToolKit for All of Us

    DSTK - DataScience ToolKit is free, open source software for statistical analysis, data visualization, text analysis, and predictive analytics. A newer version with a smaller file size can be found at: https://sourceforge.net/projects/dstk3/ It is designed to be straightforward and easy to use, and familiar to SPSS users. While JASP offers more statistical features, DSTK aims to be a broad workbench that also includes text analysis and predictive analytics features. You may still prefer JASP for advanced data editing and RapidMiner for advanced prediction modeling. DSTK is written in C#, Java, and Python to interface with R, NLTK, and Weka. It can be extended with plugins written as R scripts. We have also created plugins for additional statistical functions and for Big Data analytics with Microsoft Azure HDInsight (Spark Server) with Livy. Licenses: R, RStudio, NLTK, SciPy, SKLearn, MatPlotLib, Weka, ... each have their own licenses.
    Downloads: 0 This Week