How to remove a single row from a CSV in Python without copying the file


There are multiple SO questions addressing some form of this topic, but they all seem terribly inefficient for removing only a single row from a CSV file (usually they involve copying the entire file). If I have a CSV formatted like so:


fname,lname,age,sex
John,Doe,28,m
Sarah,Smith,27,f
Xavier,Moore,19,m

What is the most efficient way to remove Sarah's row? If possible, I would like to avoid copying the entire file.


You have a fundamental problem here. No current filesystem (that I am aware of) provides a facility to remove a bunch of bytes from the middle of a file. You can overwrite existing bytes, or write a new file. So, your options are:

  • Create a copy of the file without the offending line, delete the old one, and rename the new file in place. (This is the option you want to avoid.)
  • Overwrite the bytes of the line with something that will be ignored. Depending on exactly what is going to read the file, a comment character might work, or spaces might work (or possibly even \0). If you want to be completely generic though, this is not an option with CSV files, because there is no defined comment character.
  • As a last desperate measure, you could:
    • read up to the line you want to remove,
    • read the rest of the file into memory,
    • overwrite the line and all subsequent lines with the data you want to keep,
    • and truncate the file at the final position (filesystems usually allow this).

The last option obviously doesn't help much if you are trying to remove the first line, since almost the whole file still gets rewritten (but it is handy if you want to remove a line near the end). It is also horribly fragile: if the process crashes partway through the overwrite, the file is left corrupted.
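For what it's worth, here is a minimal sketch of that last option in Python. The function name `remove_line_in_place` and its zero-based `line_index` parameter (with the header counted as line 0) are my own choices for illustration. It assumes each CSV row sits on exactly one physical line (no quoted fields containing newlines) and that the tail of the file fits in memory.

```python
def remove_line_in_place(path, line_index):
    """Remove one physical line from a file by shifting the tail over it.

    line_index is zero-based and counts the header as line 0.
    Hypothetical helper: assumes no CSV field contains an embedded newline.
    """
    with open(path, "r+b") as f:
        # Read up to the line we want to remove; these bytes stay untouched.
        for _ in range(line_index):
            if not f.readline():
                raise IndexError("file has fewer lines than line_index")
        start = f.tell()  # byte offset where the unwanted line begins

        # Skip over the unwanted line itself.
        if not f.readline():
            raise IndexError("file has fewer lines than line_index")

        # Read the rest of the file into memory.
        tail = f.read()

        # Overwrite the unwanted line and everything after it with the tail.
        f.seek(start)
        f.write(tail)

        # Cut off the leftover bytes at the current (final) position.
        f.truncate()
```

With the sample file from the question saved as, say, `people.csv`, calling `remove_line_in_place("people.csv", 2)` would drop Sarah's row. Nothing here protects against a crash between the write and the truncate, so keep a backup if the data matters.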
