Remove duplicate lines while comparing two files

I’ve been quite busy this whole day with a partially complete database dump and wanted to prepare for tomorrow with some ninja bash voodoo shizzle. I’m doing a braindump here because I know I’ll have forgotten this when I wake up tomorrow 🙂
The command below is the first working example I’ve put together. Please let me know if you know a neater or better solution!

The situation:

I’ve got two files. The first file contains lines which need to be deleted from the second file (if they exist there).

The setup:

Contents of the first file:

This line exists in two files
This line exists in the first file
This line exists in the very first file
The next line is an empty line

Contents of the second file:

This line exists in two files
This line exists in the second file
This line exists in the very second file
The next line is an empty line in the second file

I want to remove the duplicate line from the second file, which in this case is “This line exists in two files”.
 

The solution:

Bash voodoo ninja shizzle:

while IFS= read -r line; do sed -i "/^${line}$/d" second_file; done < first_file

What this does is read the lines of the first file one by one and remove each matching line from the second file.
This results in an edited second file:

This line exists in the second file
This line exists in the very second file
The next line is an empty line in the second file
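
To try this without touching real data, the setup can be rebuilt in a scratch directory. This is a minimal sketch, assuming bash and GNU sed (other sed versions may want an argument after -i); the variable name workdir is made up for illustration. Note that the empty line in the first file turns into the pattern /^$/, so empty lines in the second file get removed as well.

```shell
#!/usr/bin/env bash
# Rebuild the example in a throwaway directory (workdir is a made-up name).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# First file: the lines to remove, ending with an empty line.
printf '%s\n' \
  'This line exists in two files' \
  'This line exists in the first file' \
  'This line exists in the very first file' \
  'The next line is an empty line' \
  '' > first_file

# Second file: the file that gets cleaned up.
printf '%s\n' \
  'This line exists in two files' \
  'This line exists in the second file' \
  'This line exists in the very second file' \
  'The next line is an empty line in the second file' \
  '' > second_file

# The loop from the one-liner, spread over several lines.
# IFS= and -r make read keep leading whitespace and backslashes intact.
while IFS= read -r line; do
  sed -i "/^${line}$/d" second_file   # GNU sed assumed for the bare -i
done < first_file

# Show what is left of the second file.
cat second_file
```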

Important considerations:

  • Do notice the “^” and “$” characters: these represent the start and end of the line. If you omitted them, you would also remove the line “The next line is an empty line in the second file” because it contains “The next line is an empty line” from the first file
  • This is a direct edit of the second file. Be sure to have a backup, since you’re using sed “in place” (option -i)
  • Always be very careful when someone tells you their ninja shizzle, since it’s hardly tested 🙂
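
One more thing to watch: the loop pastes each line into a sed regular expression, so lines containing characters such as “.”, “*”, “[” or “/” can match more than intended or break the sed command. A possible neater alternative, as a minimal sketch, is to let grep do the comparison in one pass: -F treats the patterns as fixed strings, -x only matches whole lines (the job of “^” and “$” above), -v inverts the match and -f reads the patterns from a file. The shortened demo files and the names workdir and second_file.cleaned are made up for illustration.

```shell
#!/usr/bin/env bash
# Small demo files so the sketch runs on its own (workdir is a made-up name).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' \
  'This line exists in two files' \
  'The next line is an empty line' > first_file
printf '%s\n' \
  'This line exists in two files' \
  'This line exists in the second file' \
  'The next line is an empty line in the second file' > second_file

# Keep a copy first, in line with the backup advice above.
cp second_file second_file.bak

# -F fixed strings, -x whole-line match, -v invert, -f patterns from a file.
# The result goes to a new file (second_file.cleaned is a made-up name)
# instead of editing second_file in place.
grep -vxFf first_file second_file > second_file.cleaned

# Show the lines that survived.
cat second_file.cleaned
```

Once the result looks right, mv second_file.cleaned second_file replaces the original. Keep in mind that grep exits with status 1 when no lines are left to print, which matters in scripts that stop on errors.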
