CSV and TSV¶
csvkit¶
csvkit is an excellent set of tools (and python libraries) for working with CSV data. I encounter (and generate) a variety of csv data, and I use many of these frequently. Note that the suite also works with tab-separated values (TSV).
I tend to use:
csvlook - pretty printing of csv files
csvcut - extract columns from CSV
csvformat - convert CSV <=> TSV
csvsql - run SQL queries against CSV data
csvgrep - search by column in CSV data
csvstats - summary statistics for CSV file
Here’s an introduction:
https://source.opennews.org/articles/eleven-awesome-things-you-can-do-csvkit/
json¶
jq - CLI query language for json. Doesn’t handle very large ints.
jsonpp, json_pp - pretty printing,
python -m json.tool
- pretty printing. Doesn’t handle json per lin
jq¶
Here’s some sample json (created using the jo json authoring tool )
{ jo user[name]=jud user[id]=17 ; jo user[name]=joe user[id]=22;} | tee json1
{"user":{"name":"jud","id":17}}
{"user":{"name":"joe","id":22}}
jq has a fully-featured query language, but since I don’t use the more advanced features very often, I just remember how to index down into object, and print out the raw values:
# pretty print with colorization
jq . myfile.json
# extract user.name from every object
jq .user.name myfile.json
"jud"
"joe"
# get the raw (unquoted) values.
jq -r .user.name json1
jud
joe
I nearly always use jq for my json needs. However, I’ve recently been dealing with json that contains very large numeric ids. Unfortunately jq rounds large integers.
echo '{"a":11111222223333344444}' | jq .
{
"a": 11111222223333345000
}
echo '{"a":11111222223333344444}' | python -m json.tool
{
"a": 11111222223333344444
}
So I have been using other tools to pretty print json, like jsonpp and/or json_pp.
The python json.tool pretty printer handles big ints, but doesn’t handle newline delimited json. I just saw the newlinejson package, which could help. Here’s a simple python solution:
#!/usr/bin/env python
from __future__ import print_function
import fileinput
import json
for line in fileinput.input():
print(json.dumps(json.loads(line.rstrip('\r\n')), indent=2))