PERL: Jumping to lines in a huge text file


I have a very large text file (~4 GB). It has the following structure:

S=1
3 lines of metadata of block where S=1
a number of lines of data of this block
S=2
3 lines of metadata of block where S=2
a number of lines of data of this block
S=4
3 lines of metadata of block where S=4
a number of lines of data of this block
etc.

I am writing a Perl program that reads another file; for each of its lines (each of which must contain a number), it searches the huge file for the S-value equal to that number minus 1, and then analyzes the lines of data in the block belonging to that S-value.

The problem is that the text file is huge, so processing every line with a

foreach my $line (<$fh>) {...} loop

is very slow. Since the S-values are strictly increasing, is there any way to jump to the line with the required S-value?

 


are there any methods to jump to a particular line of the required S-value?

Yes. If the file does not change, create an index. This requires reading the file in its entirety once and noting the byte position of every S=# line using tell. Store the positions in a DBM file, with the number as the key and the byte position in the file as the value. Then you can use seek to jump straight to that position and read from there.
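A minimal sketch of that tell/seek index, using a small temporary file in place of the real 4 GB one (the file name and record layout follow the question; the helper name read_block is my own invention). For persistence across runs, %index could be tied to a DBM file, e.g. via DB_File or GDBM_File.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for the huge file: three blocks, each with 3 metadata lines.
my ($tmp, $file) = tempfile();
print $tmp "S=1\nm1\nm2\nm3\nalpha\n"
         . "S=2\nm1\nm2\nm3\nbeta\ngamma\n"
         . "S=4\nm1\nm2\nm3\ndelta\n";
close $tmp;

# Pass 1: note the byte offset of every "S=<n>" line with tell.
my %index;
open my $fh, '<', $file or die "open: $!";
my $pos = tell $fh;
while (my $line = <$fh>) {
    $index{$1} = $pos if $line =~ /^S=(\d+)/;
    $pos = tell $fh;               # offset of the NEXT line
}

# Any time later: seek straight to a block and read only its lines.
sub read_block {
    my ($s) = @_;
    defined $index{$s} or die "no block for S=$s";
    seek $fh, $index{$s}, 0 or die "seek: $!";
    my $header = <$fh>;            # the "S=<n>" line itself
    my @lines;
    while (my $l = <$fh>) {
        last if $l =~ /^S=/;       # next block starts here
        push @lines, $l;
    }
    return @lines;
}

my @block = read_block(2);
print scalar(@block), " lines in block S=2\n";   # 3 metadata + 2 data = 5
```

Building %index still scans the whole file once, but every lookup after that is a single seek instead of a linear search.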

But if you're going to do that, you're better off exporting the data into a proper database such as SQLite. Write a program that inserts the data into the database, and add normal SQL indexes; this will probably be simpler than writing the custom index. Then you can query the data efficiently with ordinary SQL, and make complex queries too. If the file changes, you can either redo the export or use normal INSERT and UPDATE statements to keep the database current. And it will be easy for anyone who knows SQL to work with, as opposed to a bunch of custom indexing and search code.
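A sketch of that export, assuming DBD::SQLite is installed. The table layout and sample data are my own assumptions, and an in-memory database stands in for a real blocks.db file; declaring s as INTEGER PRIMARY KEY gives you the index for free.

```perl
use strict;
use warnings;
use DBI;
use File::Temp qw(tempfile);

# Stand-in for the huge input file.
my ($tmp, $input) = tempfile();
print $tmp "S=1\nm1\nm2\nm3\nalpha\nS=2\nm1\nm2\nm3\nbeta\ngamma\n";
close $tmp;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 0 });
$dbh->do(q{
    CREATE TABLE blocks (
        s        INTEGER PRIMARY KEY,  -- S-value, indexed automatically
        metadata TEXT,
        data     TEXT
    )
});
my $ins = $dbh->prepare(
    'INSERT INTO blocks (s, metadata, data) VALUES (?, ?, ?)');

# One pass over the file: accumulate each block, insert it when the
# next "S=<n>" line (or end of file) is reached.
open my $fh, '<', $input or die "open: $!";
my ($s, @meta, @data);
my $flush = sub {
    $ins->execute($s, join('', @meta), join('', @data)) if defined $s;
};
while (my $line = <$fh>) {
    if ($line =~ /^S=(\d+)/) {
        $flush->();                    # store the previous block
        ($s, @meta, @data) = ($1);
    } elsif (@meta < 3) {
        push @meta, $line;             # first 3 lines are metadata
    } else {
        push @data, $line;
    }
}
$flush->();
$dbh->commit;

# Query one block by S-value -- no file scan needed.
my ($data) = $dbh->selectrow_array(
    'SELECT data FROM blocks WHERE s = ?', undef, 2);
print $data;                           # prints "beta" and "gamma" lines
```

One transaction around all the inserts (AutoCommit => 0 plus a single commit) is what makes the initial bulk load fast in SQLite.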
