while (true) {

Has anyone had pleasant real-world experiences with some sort of task orchestration tool? They seem to be in vogue in data engineering, so I might find something useful to me.

I'm writing a Python (Django) project that I currently deploy to Kubernetes. The process "scrapes" data (mostly public APIs and Git repos, so I don't have to deal with adversarial problems). But I'd kinda like to parallelize a bit and get nice logs and analytics of runs, so I can address problems.

Frankly, k8s jobs are mostly fine, and I'm happy not having to run any queue system. If I could find something that triggers k8s jobs and records results to a database, that might be enough.
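
For what it's worth, a minimal sketch of the "trigger a k8s job, record the result" idea could look like the following. Everything here is an assumption for illustration: the function names (`build_job_manifest`, `init_db`, `run_and_record`), the `runs` table layout, and the image name are made up, and SQLite stands in for whatever database you actually use. The `submit` callable is left pluggable; in a real setup it could wrap the official `kubernetes` Python client (`BatchV1Api.create_namespaced_job`) and wait for the Job to finish.

```python
import json
import sqlite3
from datetime import datetime, timezone


def build_job_manifest(name, image, args):
    """Return a batch/v1 Job manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry; failures get recorded instead
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": "scraper", "image": image, "args": list(args)}
                    ],
                }
            },
        },
    }


def init_db(conn):
    """Create the (assumed) table that stores one row per run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "job_name TEXT, started_at TEXT, finished_at TEXT, "
        "status TEXT, manifest TEXT)"
    )


def run_and_record(conn, manifest, submit):
    """Run submit(manifest) and store the outcome.

    submit is expected to block until the job is done and return a
    status string; any exception is recorded as an error status.
    """
    started = datetime.now(timezone.utc).isoformat()
    try:
        status = submit(manifest)
    except Exception as exc:
        status = f"error: {exc}"
    finished = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO runs VALUES (?, ?, ?, ?, ?)",
        (manifest["metadata"]["name"], started, finished, status,
         json.dumps(manifest)),
    )
    conn.commit()
    return status
```

With a stub like `lambda manifest: "succeeded"` passed as `submit`, this runs without a cluster, which makes the recording side easy to try out on its own. Note that Job names have to be unique within a namespace, so a real version would probably append a timestamp or run ID to the name.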

(I have some ClickHouse/OpenTelemetry capabilities; perhaps I should dig more into that anyway. It seems to me that this is the future and I should get more familiar with it.)

I've used Airflow for orchestrating K8s tasks (first using KubernetesPodOperator and then with something newer I can't remember the name of), but that is probably overkill if your tasks are standalone or have very simple dependencies.
 

koala

Noted Temporal, looks interesting; Airflow I already knew, and yeah, I suspect it's too heavyweight for what I want.

edit: I found https://github.com/PrefectHQ/prefect/ while looking at the astral.sh website; also looks good...

On a completely different topic, has anyone tried https://www.cozodb.org/ ?

When I discovered it, it looked really nice. I like the "pure relational" approach and simplicity. However, some time ago they started going "LLM/AI/Vector" and that has soured me on it.

I was prototyping a small CLI tool that would need something like SQLite for persistence, and I was thinking about trying CozoDB instead. But I have never seen it used in the wild, so I dunno.