
How to completely erase duplicated lines with Linux tools?

This question is not a duplicate of "How to print only the unique lines in BASH?", because that one suggests removing all copies of the duplicated lines, while this one is about eliminating only the extra copies, i.e., changing 1, 2, 3, 3 into 1, 2, 3 instead of just 1, 2.

This question is really hard to word, but the example is straightforward. If I have a file like this:

    1
    2
    3
    3

After parsing the file and erasing the duplicated lines, it should become:

    1
    2
    3

I know some Python, so I wrote a Python script to do this, saved in a file called clean_duplicates.py.
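A minimal Python sketch of such a script follows. It is not the asker's original: the helper names `remove_duplicate_lines` and `clean_file`, and passing file names on the command line, are assumptions. It keeps the first copy of each line and preserves the original order, which turns 1, 2, 3, 3 into 1, 2, 3:

```python
import sys


def remove_duplicate_lines(lines):
    """Return lines with later repeats removed, keeping the first copy of each.

    dict keys preserve insertion order (guaranteed since Python 3.7), so
    dict.fromkeys() keeps one copy of every line in first-seen order.
    """
    return list(dict.fromkeys(lines))


def clean_file(path):
    """Print the de-duplicated contents of the file at `path` to stdout."""
    with open(path, encoding="utf-8") as handle:
        # splitlines() strips the line terminators, so a last line without a
        # trailing newline still compares equal to an earlier copy of it.
        lines = handle.read().splitlines()
    for line in remove_duplicate_lines(lines):
        print(line)


if __name__ == "__main__":
    # Hypothetical usage: python3 clean_duplicates.py input.txt [more.txt ...]
    # Each file is de-duplicated on its own; with no arguments nothing runs.
    for path in sys.argv[1:]:
        clean_file(path)
```

Unlike uniq, this does not need sorted input, because the dictionary remembers every line seen so far; the trade-off is that all distinct lines are held in memory.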

However, while searching for ways to remove duplicate lines, it seems easier to use tools such as grep, sort, sed and uniq:

  1. How to remove duplicate lines inside a text file?
  2. removing line from list using sort, grep LINUX
  3. Find duplicate lines in a file and count how many time each line was duplicated?
  4. Remove duplicate entries in a Bash script
  5. How to delete duplicate lines in a file without sorting it in Unix?
  6. How to delete duplicate lines in a file…AWK, SED, UNIQ not working on my file


Answer

You may use uniq with the -u/--unique option. As per the uniq man page:

-u / --unique

Don’t output lines that are repeated in the input.
Print only lines that are unique in the INPUT.
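To make the -u semantics concrete, here is a small Python model of them (the helper `uniq_unique` is hypothetical; it imitates the output of uniq -u, not its implementation). Like uniq, it compares only adjacent lines:

```python
from itertools import groupby


def uniq_unique(lines):
    """Model `uniq -u`: output only lines that are not repeated.

    As with uniq, only adjacent lines are compared: groupby() yields each run
    of identical consecutive lines, and a run survives only if its length is 1.
    """
    result = []
    for line, run in groupby(lines):
        # sum(1 for _ in run) counts the run's length without building a list
        if sum(1 for _ in run) == 1:
            result.append(line)
    return result


print(uniq_unique(["1", "2", "3", "3"]))  # ['1', '2']
```

On the question's input, the repeated 3 is removed entirely rather than reduced to a single copy.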

For example:

    cat /tmp/uniques.txt | uniq -u

Or, as pointed out in UUOC (Useless Use of Cat), a better way is to pass the file to uniq directly:

    uniq -u /tmp/uniques.txt

Both of these commands return:

    1
    2

where /tmp/uniques.txt holds the numbers from the question, i.e.:

    1
    2
    3
    3

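Note: uniq only compares adjacent lines, so repeated lines must sit next to each other, which sorting the file guarantees. As mentioned in the documentation:

By default, uniq prints the unique lines in a sorted file; it discards all but one of identical successive input lines, so that the OUTPUT contains unique lines.

Without -u, this default keeps one copy of each repeated line (1, 2, 3, 3 becomes 1, 2, 3); with -u, every copy of a repeated line is dropped (1, 2, 3, 3 becomes 1, 2).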
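A rough Python model of the default behaviour quoted above (the helper `uniq_default` is hypothetical and mimics uniq's output, not its source) shows why adjacency matters: only runs of identical consecutive lines are collapsed, so repeats that are far apart survive.

```python
from itertools import groupby


def uniq_default(lines):
    """Model plain `uniq` (no options): collapse each run of identical
    adjacent lines into one copy. Non-adjacent repeats are left alone."""
    # groupby() yields one (key, run) pair per run of equal consecutive items;
    # keeping only the key keeps exactly one copy of each run.
    return [line for line, _run in groupby(lines)]


print(uniq_default(["1", "2", "3", "3"]))  # ['1', '2', '3']
print(uniq_default(["3", "1", "3"]))       # ['3', '1', '3']: the 3s are not adjacent
```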

If the file is not sorted, sort the content first and then run uniq over the sorted output:

    sort /tmp/uniques.txt | uniq -u
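If the original line order matters, one alternative to sort | uniq -u (a different technique: counting instead of sorting) is to count every line over the whole input first. This sketch uses a hypothetical helper, `unique_anywhere`, that keeps lines occurring exactly once anywhere in the input, in their original order; sort | uniq -u yields the same set of lines, but in sorted order.

```python
from collections import Counter


def unique_anywhere(lines):
    """Keep only lines that occur exactly once anywhere in `lines`,
    preserving their original order (no sorting needed)."""
    # Count every line over the whole input first, so repeats need not be adjacent.
    counts = Counter(lines)
    return [line for line in lines if counts[line] == 1]


print(unique_anywhere(["3", "1", "3", "2"]))  # ['1', '2']
```

For the keep-one-copy result described in the question, the shell counterpart is sort -u (or sort | uniq), which likewise does not need pre-sorted input.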