I want to extract and process the metadata from PNG images and the first line of .safetensors files for LLMs and LoRAs. I could spend ages farting around with sed or awk, but file formats are constantly changing. I’d like a faster way to see a summary of training and a few other details when they are available.
Python is very good for working with JSON. Definitely will get you there faster than awk for anything not completely trivial.
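To make that concrete for the .safetensors half of the question, here is a minimal sketch using only the standard library. It relies on the published safetensors layout: an 8-byte little-endian length, followed by that many bytes of UTF-8 JSON, with free-form string metadata under an optional `__metadata__` key. The function names are my own, and which keys actually show up in `__metadata__` depends entirely on the tool that wrote the file, so treat it as a starting point rather than a finished tool.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        size_bytes = f.read(8)
        if len(size_bytes) < 8:
            raise ValueError("file too short to be a safetensors file")
        # First 8 bytes: header length as an unsigned little-endian 64-bit int.
        (header_len,) = struct.unpack("<Q", size_bytes)
        # Next header_len bytes: the JSON header itself.
        return json.loads(f.read(header_len).decode("utf-8"))


def training_summary(header):
    """Return the optional free-form metadata block (hypothetical helper name).

    Files written without metadata simply yield an empty dict.
    """
    return header.get("__metadata__", {})
```

Something like `training_summary(read_safetensors_header("model.safetensors"))` then gives you a plain dict you can print, filter, or dump back out with `json.dumps(..., indent=2)`. Only the header is read, so it stays fast even on multi-gigabyte files.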
jless to know what the hell I’m looking at and then maybe jq
Have a look at miller
jq, or if I need to do something wacky a one-off python script.
jq
Previously, I coded something in Rust real quick to spit out and manipulate some JSON, but it looks like the jq/yq below would work fine.
I’d probably go Python but I’m an idiot
And there is htmlq too, if you ever need to scrape some stuff from a website :)
Naw, everybody knows that you have to use regex for that
I have a very handy command in my .vimrc for this -
command! JSON setlocal filetype=json | %!jq .
Anytime I’m in a json file that isn’t formatted it’s as simple as typing
:JSON
to have it all sorted.
Specifically this version of yq - there are other versions bundled with distros that look and act very differently and lack the potency of this version.
Seriously, can’t get those 15 minutes back.
For me, a C# developer by trade, this is easily solved with a one command C# call. It’s possible you already have dotnet 6 or 8 on your distro as there are many C# Linux apps now.
https://www.nuget.org/packages/System.Text.Json/9.0.0-preview.4.24266.19
Probably not popular opinion, but pwsh (powershell). It’s got a lot of tooling built in and means I don’t have to learn a different tool just because I’m in a different system.
Big fan of running
cat file.json | ConvertFrom-Json
and just being able to do things quickly!
Nushell is pretty nice.
Yeah, I’ve been learning some nushell. If you’re dealing with data, it’s just a great tool. So many sharp edges in the POSIX shell come from it being stringly typed, so having a strongly typed shell is extremely helpful.
A week ago I would have said jq, but just the other day I discovered nushell and have been loving it. If you deal with structured data often it’s way easier; just bear in mind it’s not POSIX-compatible.
There are probably pre-written awk scripts out there that already do what you want, not that I know where they’d be.
That said, you might be better off using one of the bigger but still fairly commonly installed languages. There’s bound to be things on PyPI (for Python) or CPAN (for Perl) that could be bolted together for example.
If you’re really lucky there might even be something that covers your whole use-case, but I haven’t checked.
Python has built-in JSON parsing, as does (and I know this isn’t gonna be popular) PowerShell.
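For the PNG half of the original question, the standard library is enough as well. This is a rough sketch that walks the PNG chunk list (4-byte big-endian length, 4-byte type, data, 4-byte CRC) and collects the uncompressed `tEXt` chunks as keyword/value pairs. It assumes the tool that wrote the image used `tEXt`; compressed `zTXt` and international `iTXt` chunks are skipped here, CRCs are not verified, and the function name is my own.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(path):
    """Return a dict of keyword -> text for every tEXt chunk in a PNG."""
    texts = {}
    with open(path, "rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            raise ValueError("not a PNG file")
        while True:
            head = f.read(8)
            if len(head) < 8:
                break  # truncated file or end of data
            # Chunk header: data length, then a 4-character chunk type.
            length, chunk_type = struct.unpack(">I4s", head)
            data = f.read(length)
            f.read(4)  # skip the CRC; this sketch does not verify it
            if chunk_type == b"tEXt":
                # tEXt payload is "keyword\0text", both Latin-1 encoded.
                key, _, value = data.partition(b"\x00")
                texts[key.decode("latin-1")] = value.decode("latin-1")
            elif chunk_type == b"IEND":
                break
    return texts
```

If a value turns out to be JSON itself, feeding it to `json.loads` gets you back into dict territory; otherwise it is just a string to print or grep.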