List files that contain `n` or fewer lines


In a folder, I would like to print the name of every .txt file that contains n=27 or fewer lines. I could do

wc -l *.txt | awk '{if ($1 <= 27){print}}' 

The problem is that many files in the folder are millions of lines long (and the lines are pretty long), so the command wc -l *.txt is very slow. In principle, a process could stop counting as soon as it has seen more than n lines and then proceed to the next file.

What is a faster alternative?

FYI, I am on Mac OS X 10.11.6.


Here is an attempt with awk

#!/bin/awk -f

function printPreviousFileIfNeeded(previousNbLines, previousFILENAME) {
  if (previousNbLines <= n)
  {
    print previousNbLines": "previousFILENAME
  }
}

BEGIN{
  previousNbLines=n+1
  previousFILENAME=NA
}

{
  if (FNR==1)
  {
    printPreviousFileIfNeeded(previousNbLines, previousFILENAME)
    previousFILENAME=FILENAME
  }
  previousNbLines=FNR
  if (FNR > n)
  {
    nextfile
  }
}

END{
  printPreviousFileIfNeeded(previousNbLines, previousFILENAME)
}

which can be called as

awk -v n=27 -f myAwk.awk *.txt 

However, the script fails to print completely empty files. I am not sure how to fix that, and I am not sure my awk script is the right approach.


How's this?

awk 'BEGIN { for(i=1;i<ARGC; ++i) arg[ARGV[i]] }
  FNR==28 { delete arg[FILENAME]; nextfile }
  END { for (file in arg) print file }' *.txt

We copy the list of file name arguments to an associative array, then remove all files which have a 28th line from it. Empty files obviously won't match this condition, so at the end, we are left with all files which have fewer lines, including the empty ones.
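To make the threshold adjustable instead of hard-coding 28, the same idea can be wrapped in a small shell function. This is only a sketch: the function name list_short_files and the awk variable limit are my own, not part of the command above. The loop runs up to ARGC-1, because ARGV[ARGC] is not a file name.

```shell
#!/bin/sh
# list_short_files N FILE...
# Print the name of every FILE that has N or fewer lines.
# (Hypothetical helper name; it wraps the awk approach described above.)
list_short_files() {
    n=$1
    shift
    awk -v limit="$n" '
        # Remember every file name given on the command line.
        BEGIN { for (i = 1; i < ARGC; ++i) arg[ARGV[i]] }
        # A file that reaches line limit+1 is too long: forget it and skip ahead.
        FNR == limit + 1 { delete arg[FILENAME]; nextfile }
        # Whatever is left never reached line limit+1, empty files included.
        END { for (file in arg) print file }
    ' "$@"
}
```

Called as list_short_files 27 *.txt, it should behave like the one-liner. The order in which awk walks the array is unspecified, so pipe the output through sort if you need a stable listing.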

nextfile was a common extension in many Awk variants and was accepted for inclusion in POSIX in 2012. If you need this to work on really old dinosaur OSes (or, good heavens, probably Windows), good luck, and/or try GNU Awk.
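If your Awk really has no nextfile, one possible fallback is to let head do the early exit, since it stops reading after the requested number of lines. The sketch below is an assumption-laden alternative, not the method above; the function name list_short_files_head is made up for illustration.

```shell
#!/bin/sh
# list_short_files_head N FILE...
# Hypothetical fallback that avoids nextfile: head stops reading after N+1
# lines, so huge files are never scanned to the end.
list_short_files_head() {
    n=$1
    shift
    for f in "$@"; do
        # Count at most N+1 lines; reaching N+1 means the file is too long.
        # tr strips the padding that some wc implementations add.
        count=$(head -n "$((n + 1))" < "$f" | wc -l | tr -d '[:space:]')
        if [ "$count" -le "$n" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}
```

Usage would be list_short_files_head 27 *.txt. Two caveats: this starts a pipeline per file, which can be slow in folders with very many files, and wc -l counts newline characters, so a final line without a trailing newline is not counted the way Awk would count it.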

