Knows.nl

Technical website intelligence, screenshots, and crawl history.

About This Project

This project is built for fun and learning. The main goal is to learn how to crawl, store, and analyze large datasets in a practical setup with MySQL and S3-compatible storage. Parts of it are "vibe-coded". Years ago there was a handcrafted PHP version; it worked, but it was far less structured and slower to maintain than this rebuild. I know this contributes to the noise on the internet. That's fine, and I'm not sorry, but you can always have your site removed.

Why It Exists

  • Learn scalable crawling patterns.
  • Practice data modeling and deduplication.
  • Measure storage growth over time.
  • Keep architecture stateless for container environments.
  • Test ZFS compression and deduplication.
  • Run everything on slow homelab hardware, so it has to be very efficient.
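As one example of the deduplication pattern practiced here, fetched page bodies can be keyed by a content hash so identical responses are stored only once. A minimal sketch (function and variable names are illustrative, not the project's actual schema):

```python
import hashlib

def content_key(body: bytes) -> str:
    """Derive a stable dedup key from a fetched page body."""
    return hashlib.sha256(body).hexdigest()

def dedup_store(fetches):
    """Store each distinct body once; repeat fetches only add a reference.

    `fetches` is an iterable of (url, body) pairs. Returns (store, index)
    where `store` maps content key -> body and `index` maps url -> key.
    """
    store = {}
    index = {}
    for url, body in fetches:
        key = content_key(body)
        store.setdefault(key, body)  # write the blob only on first sight
        index[url] = key
    return store, index
```

In a real crawler the `store` would be the S3 bucket and the `index` a MySQL table, but the hashing idea is the same.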

Current Dataset

URLs: 1
Fetches: 0
Screenshots: 0
Site details: 0

Storage Footprint

MySQL total size: 640.0 KiB (655360 bytes)
S3 total size: 0 B (0 bytes)
S3 total files: 0
S3 snapshot updated: Live/fallback mode
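The human-readable sizes above (e.g. 640.0 KiB for 655360 bytes) follow binary, 1024-based units. A small formatting helper along these lines could produce them (a sketch, not the site's actual code):

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count using binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            # Whole bytes get no decimals; larger units get one.
            return f"{int(size)} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
```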

S3 Breakdown

Prefix  Files  Bytes  Readable
No objects found in tracked prefixes.
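The per-prefix breakdown above can be computed by aggregating an object listing into file counts and byte totals. A sketch under assumptions (the listing would come from an S3 client's paginated object list; here it is just (key, size) pairs):

```python
def breakdown_by_prefix(objects, prefixes):
    """Aggregate (key, size) listings into per-prefix file and byte totals."""
    totals = {p: {"files": 0, "bytes": 0} for p in prefixes}
    for key, size in objects:
        for prefix in prefixes:
            if key.startswith(prefix):
                totals[prefix]["files"] += 1
                totals[prefix]["bytes"] += size
                break  # count each object under its first matching prefix
    return totals
```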

Top MySQL Tables By Size

Table  Rows (est.)  Size
fetch_observations 0 80.0 KiB
links 0 64.0 KiB
removal_requests 0 64.0 KiB
site_enrichments 0 64.0 KiB
change_events 0 48.0 KiB
domain_scores 0 48.0 KiB
jobs 0 48.0 KiB
render_snapshots 0 48.0 KiB
screenshots 0 48.0 KiB
urls 0 48.0 KiB

`Rows (est.)` comes from MySQL metadata and is approximate.
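A table listing like the one above can be read from MySQL's `information_schema.tables`, where `TABLE_ROWS` is an estimate for InnoDB tables. A sketch (the schema name and DB-API cursor are placeholders, not this project's code):

```python
# TABLE_ROWS is approximate for InnoDB, which is why the page
# labels the column "Rows (est.)".
TABLE_SIZE_SQL = """
SELECT table_name,
       table_rows,
       data_length + index_length AS total_bytes
FROM information_schema.tables
WHERE table_schema = %s
ORDER BY total_bytes DESC
LIMIT 10
"""

def top_tables(cursor, schema: str):
    """Run the query with any DB-API cursor; returns (name, rows, bytes) rows."""
    cursor.execute(TABLE_SIZE_SQL, (schema,))
    return cursor.fetchall()
```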