Generating Data

Generating columns of data, filled column by column

seq 20 | pr -t -3 | column -t
1  8   15
2  9   16
3  10  17
4  11  18
5  12  19
6  13  20
7  14

Generating columns of data, filled row by row

seq 20 | pr -t -3 -a | column -t
1   2   3
4   5   6
7   8   9
10  11  12
13  14  15
16  17  18
19  20
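When the goal is tab-separated output rather than aligned columns, paste can produce the same row-wise layout on its own. This is a minimal sketch, assuming a POSIX paste; the variable name rows is only for illustration:

```shell
# Each "-" reads the next line from standard input, so three of them
# fill three tab-separated columns row by row.
rows=$(seq 20 | paste - - -)
printf '%s\n' "$rows"
```

The last row is padded with an empty third field, which keeps the field count constant for downstream tools.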

Generating a sequence of letters:

perl -E'say for "a".."d"'
 a
 b
 c
 d
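If perl is not available, awk can probably do the same job by printing characters from their numeric codes. A small sketch, assuming an ASCII-compatible locale; the variable name letters is hypothetical:

```shell
# 97 and 100 are the ASCII codes for "a" and "d";
# %c prints the character that corresponds to a numeric code.
letters=$(awk 'BEGIN { for (i = 97; i <= 100; i++) printf "%c\n", i }')
printf '%s\n' "$letters"
```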

Generating random numbers

10 random numbers between 0 and 19

perl -E'say int(rand(20)) for 1..10'
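For test data it is often handy to get the same "random" numbers on every run. One way to sketch that is with awk and a fixed seed; the seed value 42 and the variable name nums are arbitrary choices, and the exact numbers will differ between awk implementations:

```shell
# srand(seed) fixes the generator state, so the same seed gives the same
# list on a given awk implementation.
nums=$(awk -v seed=42 'BEGIN { srand(seed); for (i = 0; i < 10; i++) print int(rand() * 20) }')
printf '%s\n' "$nums"
```

rand() returns a value in [0, 1), so int(rand() * 20) stays between 0 and 19, the same range as the perl version.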

jot

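jot generates various sequences and random numbers. It is a BSD utility, so it ships with macOS but may need to be installed separately on Linux.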

10 ints between 0 and 100

jot -r 10 0 100

5 floats between 0.000 and 1.000

jot -r 5 0.000 1.000

10 random lowercase letters (character codes 97 to 122)

jot -r -c 10 97 122
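On systems without jot, GNU shuf can stand in for the integer case. This is a rough equivalent of the first jot example, assuming GNU coreutils is installed; the variable name ints is hypothetical:

```shell
# -i 0-100 uses the integers 0..100 as input lines, -r allows repeats,
# and -n 10 limits the output to 10 picks.
ints=$(shuf -r -n 10 -i 0-100)
printf '%s\n' "$ints"
```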

Generating permutations with shuf

shuf, part of coreutils, is useful for generating random permutations.

shuf is best known for generating a “shuffled” version of a file, or selecting random lines. However, it can also be used to generate some sample data quickly given a few input values. The -r flag allows repeats. -n 100 selects 100 samples. -e treats the remaining command line arguments as input lines.

In this example, I want to take 100 random selections from ‘foo’, ‘bar’, or ‘baz’ (head shows only the first 10):

shuf -r -n 100 -e foo bar baz | head
 baz
 baz
 baz
 baz
 bar
 foo
 foo
 bar
 bar
 baz
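To see how evenly the picks are spread, the full output can be tallied instead of truncated. A quick sketch, assuming GNU shuf is on the PATH as shuf; the variable name tally is only for illustration:

```shell
# uniq -c needs identical lines to be adjacent, hence the sort first.
tally=$(shuf -r -n 100 -e foo bar baz | sort | uniq -c)
printf '%s\n' "$tally"
```

The three counts always add up to 100, but the split between the values changes on every run.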

Here’s a more complicated example that generates test scores for random student IDs in random classes. Note that I’m using gshuf here: on macOS, installing coreutils via brew gives the GNU tools a ‘g’ prefix so they don’t conflict with the standard BSD utilities of the same name. I’m also using (…) to combine the output of multiple commands, and converting spaces to tabs during output so I have an actual TSV file:

seq 100 | gshuf -n 100 -r > student_ids.txt
gshuf -n 100 -r -e math science art > classes.txt
seq 50 100 | gshuf -n 100 -r > scores.txt
(
  echo "id class score";
  paste -d " " student_ids.txt classes.txt scores.txt;
) | tr ' ' '\t' > report.tsv
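Since paste joins fields with a tab by default, the tr step can be avoided by writing the header with tabs as well. The following is a variation on the commands above, not a replacement for them; it assumes GNU shuf is installed as shuf (gshuf under Homebrew), and the scratch directory in workdir is a hypothetical choice to keep the current directory clean:

```shell
# Work in a scratch directory so the sketch leaves no files in the current one.
workdir=$(mktemp -d)

seq 100 | shuf -n 100 -r > "$workdir/student_ids.txt"
shuf -n 100 -r -e math science art > "$workdir/classes.txt"
seq 50 100 | shuf -n 100 -r > "$workdir/scores.txt"

{
  # printf writes the header with real tabs; paste separates fields with tabs by default.
  printf 'id\tclass\tscore\n'
  paste "$workdir/student_ids.txt" "$workdir/classes.txt" "$workdir/scores.txt"
} > "$workdir/report.tsv"

head -n 3 "$workdir/report.tsv"
```

This also stays correct if a value ever contains a space, which the tr approach would turn into an extra column.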

Now, with these scores, let’s compute the mean score per class (-s sorts the input before grouping, and -H reads and writes a header line):

cat report.tsv | datamash -sH --wh --group 2 mean 3 | column -t
 GroupBy(class)  mean(score)
 art             73.214285714286
 math            73.461538461538
 science         72.21875
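Where datamash is not installed, the same per-group mean can be approximated with awk. The sketch below runs on a tiny hand-written sample (hypothetical data, not the generated report) so the result can be checked by eye; the names workdir, sample.tsv, and means.tsv are assumptions for illustration:

```shell
workdir=$(mktemp -d)

# Four rows: math scores 80 and 90, art scores 70 and 60.
printf 'id\tclass\tscore\n1\tmath\t80\n2\tart\t70\n3\tmath\t90\n4\tart\t60\n' > "$workdir/sample.tsv"

# Skip the header, accumulate a sum and a count per class, then print each mean.
awk -F'\t' 'NR > 1 { sum[$2] += $3; n[$2]++ }
            END { for (c in sum) printf "%s\t%g\n", c, sum[c] / n[c] }' \
    "$workdir/sample.tsv" | sort > "$workdir/means.tsv"

cat "$workdir/means.tsv"
```

awk does not guarantee any order when iterating over an array, so the trailing sort is what makes the output stable.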

Generating date sequences

By combining seq and GNU date, you can generate ranges of dates. On macOS, the stock date is the BSD version, so you may need to install coreutils and use gdate instead.

for i in $(seq 10); do
    date -I --date "2023-08-15 +${i} day"
done
 2023-08-16
 2023-08-17
 2023-08-18
 2023-08-19
 2023-08-20
 2023-08-21
 2023-08-22
 2023-08-23
 2023-08-24
 2023-08-25
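To make the loop work on both Linux and a Mac with Homebrew coreutils, one option is to pick the command at run time. This is only a sketch: the variable names date_cmd and dates are hypothetical, and the fallback assumes that a plain date on the system is the GNU version:

```shell
# Prefer gdate (Homebrew coreutils on macOS); otherwise fall back to date,
# which is assumed here to be GNU date.
date_cmd=$(command -v gdate || command -v date)

dates=$(for i in $(seq 10); do
    "$date_cmd" -I --date "2023-08-15 +${i} day"
done)
printf '%s\n' "$dates"
```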

See also dseq and another dseq in the dateutils tool collection

One common use for date sequences is to identify missing dates. See the earlier section “join: intersect two files”.
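As a self-contained illustration of finding missing dates, here is a sketch that uses comm rather than join. It assumes GNU date, and the observed dates are made-up sample data with two days deliberately left out; workdir, observed.txt, expected.txt, and missing are hypothetical names:

```shell
workdir=$(mktemp -d)

# Hypothetical observed data: 2023-08-18 and 2023-08-20 are deliberately absent.
printf '%s\n' 2023-08-16 2023-08-17 2023-08-19 2023-08-21 > "$workdir/observed.txt"

# The full expected range, 2023-08-16 through 2023-08-21.
for i in $(seq 6); do
    date -I --date "2023-08-15 +${i} day"
done > "$workdir/expected.txt"

# comm -23 keeps lines that appear only in the first file. Both inputs must be
# sorted, which ISO 8601 dates are when listed chronologically.
missing=$(comm -23 "$workdir/expected.txt" "$workdir/observed.txt")
printf '%s\n' "$missing"
```

With this sample data the output is the two omitted dates, 2023-08-18 and 2023-08-20.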