Remove duplicate rows in Calc?

Dave_Stevens · June 29, 2020, 5:16pm

There's a bug in the storage layout of some data I'm getting from an
archive that results in duplicate rows in Calc 6.4, adjacent in all the
cases I've seen. Is there a simple way to remove duplicates in this
case? Not all rows are duplicated, but as many as 40% are.

Dave

Brian_Barker · June 29, 2020, 5:50pm

Try this:
o Select all the material.
o Go to Data | Filter > | Standard Filter... (in recent LibreOffice
  versions: Data | More Filters | Standard Filter...).
o Change "Field name" to "- none -".
o Click Options.
o Tick or untick "Range contains column labels" as necessary.
o Tick "No duplications".
o OK.
o If desired, copy the filtered material and paste it back or elsewhere.

I trust this helps.

Brian Barker

Johnny_Rosenberg · June 29, 2020, 6:40pm

Here's one suggestion that I found:
https://ask.libreoffice.org/en/question/53569/delete-duplicates-in-calc/

Otherwise I guess you have to write a macro.

Kind regards

Johnny Rosenberg

Steve_Edmonds · June 29, 2020, 7:19pm

That suggestion looks great, though it seems to manage only 8 columns
(conditions). I will have to remember it for smaller sheets.
The process I currently use on my data is to sort it so that duplicates
always appear together. If you need the data back in its original order
and don't have a column with a progression, add an index column first.
Then I tag rows in another column, say with IF(the row below is not the
same).
Then I copy/paste the data to a new sheet (without formulae), sort it
and delete the non-tagged rows.
Then I re-sort back to the original order.

Seems cumbersome but doesn't take long.
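
If the data is also available as plain text, the index / sort / tag /
delete / re-sort steps above can be roughly sketched on the command line.
This is only an illustration, assuming one row per line and the standard
awk, sort and cut tools; the function name dedup_keep_order and the sample
rows are made up here and are not part of the spreadsheet process itself.

```shell
# Hypothetical helper, not from the thread: mirrors the index / sort / tag /
# delete / re-sort steps for a text file with one row per line.
dedup_keep_order() {
    awk '{ print NR "\t" $0 }' "$@" |              # add an index column
        LC_ALL=C sort -s -t "$(printf '\t')" -k2 | # sort so duplicates are adjacent
        awk '{ row = substr($0, index($0, "\t") + 1) }
             NR == 1 || row != prev { print }      # keep one row per group
             { prev = row }' |
        sort -n |                                  # back to the original order
        cut -f2-                                   # drop the index column
}

# Made-up sample rows for illustration.
printf 'b\na\nb\nc\na\n' | dedup_keep_order   # prints b, a, c on separate lines
```

The sketch keeps the first row of each group of duplicates rather than the
last; for exact duplicates the result is the same either way.
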
steve

Steve_Edmonds · June 29, 2020, 7:31pm

Is the data a CSV or text file?
Could you pre-process it, e.g.
with awk '!seen[$0]++' file.txt (from
https://stackoverflow.com/questions/1444406/how-to-delete-duplicate-lines-in-a-file-without-sorting-it-in-unix)
or awk '!_[$0]++' file (from
https://www.unix.com/shell-programming-and-scripting/146404-command-remove-duplicate-lines-perl-sed-awk.html)
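
For reference, a small sketch of what the awk idiom does, next to the
standard uniq tool, which removes only repeats that directly follow each
other and so may be enough when all duplicates are adjacent. The function
names and sample rows below are made up for illustration.

```shell
# Keep the first occurrence of every line, wherever the repeats are.
# seen[$0]++ evaluates to 0 the first time a line appears, so !seen[$0]++
# is true and awk's default action prints the line.
dedup_all() {
    awk '!seen[$0]++' "$@"
}

# Drop only repeats that directly follow each other (the adjacent case).
dedup_adjacent() {
    uniq "$@"
}

# Made-up sample: "a,1" repeats once adjacently and once further down.
printf 'a,1\na,1\nb,2\na,1\n' | dedup_all        # prints a,1 then b,2
printf 'a,1\na,1\nb,2\na,1\n' | dedup_adjacent   # prints a,1 then b,2 then a,1
```

Both read a file given as an argument or standard input, and both compare
whole lines, so rows that differ only in spacing or quoting count as
different.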

steve