Web Scraping - Open Source Projects

Scrapling: Ultimate Python Web Scraping Framework

March 01, 2026

Tags: Python, Web Scraping, Web Crawler, scrapy, cloudflare bypass

Discover Scrapling, an adaptive Python web scraping framework that handles everything from single requests to full-scale crawls. It can bypass Cloudflare Turnstile, tracks elements so that selectors survive website changes, and scales with concurrent spiders that support pause/resume. It also offers stealth modes, proxy rotation, AI integration via an MCP server, and performance that the project reports as faster than Scrapy/Parsel. Install it with pip and start scraping in minutes.

Practical Open Source Projects

Agent Reach: One CLI to Power AI Agents Across the Web

February 27, 2026

Tags: Open Source, AI Agent, CLI tool, Web Scraping, free APIs

Agent Reach is a command-line tool that lets your AI agent read Twitter, Reddit, YouTube, GitHub, and more without costly API keys. The project bundles free, open-source scrapers, manages cookie credentials securely, and provides a plug-and-play CLI that works with any coding agent capable of running shell commands. In this article you'll learn why the web-scraping barrier matters for AI, how Agent Reach auto-installs dependencies, how to configure each channel, and how to keep your credentials safe. Whether you're a prompt engineer, a developer, or just curious about building smarter agents, Agent Reach is a first step toward full-internet AI access.

Practical Open Source Projects

Web Scout MCP: DuckDuckGo Web Search & Extraction

January 23, 2026

Tags: Open Source, AI Assistant, Web Scraping, MCP, duckduckgo

Looking for a plug-in that lets your AI assistant browse the web securely? Web Scout MCP brings privacy-focused DuckDuckGo search and streamlined content extraction right into your MCP environment. With an intuitive CLI, easy Docker support, and parallel URL handling, developers can get ready-to-use web search on demand. Read on to see how to install it, integrate it with Claude Desktop or Cursor, and use the DuckDuckGo and extraction tools to fetch clean text from any site.

Practical Open Source Projects

LLM Scraper: Turn Webpages Into Structured Data

July 20, 2025

Tags: Open Source, LLM, Web Scraping, Playwright, TypeScript

Discover LLM Scraper, a TypeScript library that uses Large Language Models to turn webpages into structured data. This open-source project, built on Playwright, supports models such as GPT, Gemini, and Llama, and lets you define schemas with Zod or JSON Schema for type-safe extraction. Learn how to get started, integrate with popular LLMs, and even generate reusable scraping code. Explore features such as multi-modal input (screenshots) and streaming. LLM Scraper is aimed at developers looking for AI-powered web scraping.

Practical Open Source Projects

Google Play Scraper: Extract App Data with Node.js

July 10, 2025

Tags: Open Source, Node.js, Data Extraction, Web Scraping, Google Play API

Discover google-play-scraper, a Node.js module for extracting data from the Google Play Store. This open-source tool lets developers and researchers programmatically fetch app details, reviews, developer information, and more. It installs via npm and comes with clear usage examples, making it a useful resource for anyone who needs to analyze Google Play data. Learn how to use its methods to list apps, search by term, retrieve permissions, and handle data safety information. Practical demonstrations and tips for managing requests are included.

Practical Open Source Projects

Crawlee: Powering Reliable Web Scraping with Node.js

July 09, 2025

Tags: Open Source, Automation, Node.js, Data Extraction, Web Scraping

Discover Crawlee, the powerful Node.js library for web scraping and browser automation. Learn how this open-source tool helps developers build robust and reliable crawlers with features like proxy rotation, bot protection evasion, and support for Puppeteer and Playwright. Whether you're extracting data for AI, LLMs, or general data collection, Crawlee streamlines the process. Explore its capabilities and find out how to get started with installation and basic usage. Ideal for JavaScript and TypeScript developers looking to enhance their data extraction workflows and ensure their crawlers operate efficiently and undetected.

Practical Open Source Projects

MediaCrawler: Open-Source Social Media Data Scraper

July 05, 2025

Tags: Open Source, Python, Web Scraping, Playwright, Social Media Data

Discover MediaCrawler, a powerful open-source Python tool for scraping publicly available data from major Chinese social media platforms like Xiaohongshu, Douyin, Kuaishou, Bilibili, Weibo, Baidu Tieba, and Zhihu. Leveraging Playwright for browser automation, it simplifies data collection for research or analysis without complex reverse engineering. This project is ideal for developers and researchers seeking a robust, easy-to-use solution for media platform data acquisition. Learn about its features, installation, and how it can aid your data-driven projects.

Practical Open Source Projects

Crawlee-Python: The Ultimate Web Scraping Library

June 29, 2025

Tags: Open Source, Python, Automation, Web Scraping, Crawlee

Discover Crawlee-Python, a robust and reliable web scraping and browser automation library. Ideal for data extraction for AI, LLMs, RAG, and GPTs, Crawlee handles everything from downloading various file types to working with BeautifulSoup, Playwright, and raw HTTP. It supports both headful and headless modes, offering proxy rotation and advanced features for building resilient crawlers. This library simplifies complex scraping tasks, ensuring your projects are efficient and effective. Learn how Crawlee revolutionizes web data collection and automation for developers.

Practical Open Source Projects

Firecrawl: Turn Websites into LLM-Ready Data

June 28, 2025

Tags: Open Source, AI Development, Data Extraction, LLM Data, Web Scraping

Discover Firecrawl, an open-source web scraping and crawling solution designed specifically for AI applications. It transforms raw website data into clean, LLM-ready formats and integrates with popular AI tools like LlamaIndex and LangChain. Learn how Firecrawl handles dynamic content, provides reliable data extraction, and supports use cases from AI chats to deep research, making it a useful tool for developers building AI-powered solutions. Start for free and scale as your needs grow.
