The Text Processing Cookbook
Tools and techniques for processing text and data on the command line
by Jud Dagnall
- An Introduction
- Filter and Select
- Extraction
- Extracting one or more columns with awk
- Field extraction via perl -anE
- Extract simple fields via cut
- Extract by character position with cut
- Extract fixed-width fields with awk
- Extract fixed-width fields with in2csv
- Convert whitespace-delimited columns to csv
- Cutting columns: other tools
- f - trivial field extractor
- scut - Swiss Army knife of column cutters
- To extract columns from CSV data, use csvcut
- Transformation
- General transformation with perl -pE and -nE
- Create several simple filters rather than one complicated one
- Collapse or replace spaces and newlines
- Convert spaces to newlines with tr or perl
- Remove newlines with perl
- Reshape text with rs
- Merge-sort multiple files of sorted data
- paste: add files side by side
- join: intersect two files
- Concatenate files, skipping header line
- Remove the first n lines of a file with tail
- Sort a file with a header
- Put data into a specific number of columns with pr
- Making data tables with column
- Use column to create a flexible number of columns to fill the width
- Joining all lines with xargs or paste
- Joining/transforming all except the last line with perl
- Transform one column at a time
- Grouping Data
- Frequency Counts and Distributions
- Specialized Tools for Aggregation, Summary, Analysis and Reporting
- CSV and TSV
- JSON
- Sorting
- Generating Data
- Batch and parallel execution with xargs and parallel
- Visualization
- Misc
- Solutions: Putting it all together