About This Project
This project is built for fun and learning. The main goal is to learn how to crawl, store,
and analyze large datasets in a practical setup using MySQL and S3-compatible storage.
Parts of it are "vibe-coded". Years ago there was a handcrafted PHP version; it worked,
but it was far less structured and slower to maintain than this rebuild. I know this crawler adds to the noise on the internet. That's fine, and I'm not sorry, but you can always request removal of your site.
Why It Exists
- Learn scalable crawling patterns.
- Practice data modeling and deduplication.
- Measure storage growth over time.
- Keep the architecture stateless for container environments.
- Test ZFS compression and deduplication.
- Run everything on slow homelab hardware, so it has to be very efficient.
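One way to approach the deduplication goal is to hash fetched content and skip anything already stored. The sketch below is a hypothetical illustration, not the project's actual implementation; the `DedupStore` class and its in-memory dict are assumptions standing in for a MySQL table with a unique index on the hash column.

```python
import hashlib


def content_key(body: bytes) -> str:
    """Stable key for deduplicating fetched page bodies."""
    return hashlib.sha256(body).hexdigest()


class DedupStore:
    """Minimal in-memory deduplicator. A real setup would back this
    with a unique index on the hash column and an INSERT IGNORE."""

    def __init__(self) -> None:
        self._seen: dict[str, bytes] = {}

    def put(self, body: bytes) -> bool:
        """Store body; return True if it was new, False if a duplicate."""
        key = content_key(body)
        if key in self._seen:
            return False
        self._seen[key] = body
        return True
```

Hashing before storing means identical fetches of the same page cost one row instead of many, which matters on slow homelab disks.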
Current Dataset
| Metric | Count |
| --- | --- |
| URLs | 1 |
| Fetches | 0 |
| Screenshots | 0 |
| Site details | 0 |
Storage Footprint
| Metric | Value |
| --- | --- |
| MySQL total size | 640.0 KiB (655360 bytes) |
| S3 total size | 0 B (0 bytes) |
| S3 total files | 0 |
| S3 snapshot updated | Live/fallback mode |
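The sizes above follow a `value (bytes)` display. A small helper along these lines could produce it; this is an assumed sketch of the formatting, not code taken from the project:

```python
def human_size(n: int) -> str:
    """Render a byte count like the storage tables, e.g.
    655360 -> '640.0 KiB (655360 bytes)'."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(n)
    unit = units[0]
    for u in units[1:]:
        if value < 1024:
            break
        value /= 1024
        unit = u
    if unit == "B":
        return f"{n} B ({n} bytes)"
    return f"{value:.1f} {unit} ({n} bytes)"
```

Keeping the exact byte count in parentheses makes growth measurable over time even when the rounded value does not change.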
S3 Breakdown
| Prefix | Files | Bytes | Readable |
| --- | --- | --- | --- |

No objects found in tracked prefixes.
Top MySQL Tables By Size
| Table | Rows (est.) | Size |
| --- | --- | --- |
| fetch_observations | 0 | 80.0 KiB |
| links | 0 | 64.0 KiB |
| removal_requests | 0 | 64.0 KiB |
| site_enrichments | 0 | 64.0 KiB |
| change_events | 0 | 48.0 KiB |
| domain_scores | 0 | 48.0 KiB |
| jobs | 0 | 48.0 KiB |
| render_snapshots | 0 | 48.0 KiB |
| screenshots | 0 | 48.0 KiB |
| urls | 0 | 48.0 KiB |
`Rows (est.)` comes from MySQL metadata and is approximate.
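That metadata lives in `information_schema.TABLES`, where `TABLE_ROWS` is an estimate and `DATA_LENGTH + INDEX_LENGTH` gives the on-disk footprint. A query along these lines would reproduce the table above; the helper function and schema name are assumptions, not the project's code:

```python
def table_size_query(schema: str) -> str:
    """SQL listing per-table row estimates and on-disk size,
    mirroring the 'Top MySQL Tables By Size' section.
    In real code, pass the schema as a bound parameter."""
    return f"""
        SELECT TABLE_NAME,
               TABLE_ROWS                 AS rows_est,
               DATA_LENGTH + INDEX_LENGTH AS size_bytes
        FROM information_schema.TABLES
        WHERE TABLE_SCHEMA = '{schema}'
        ORDER BY size_bytes DESC
    """
```

Because `TABLE_ROWS` comes from sampled statistics, it can read 0 for small tables that do contain rows; `SELECT COUNT(*)` is the exact (but slower) alternative.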