What are the essential functions to find duplicate elements within a list?

Scott Nimrod

What are the essential functions to find duplicate elements within a list?

Translated, how can I simplify the following function:

let numbers = [ 3;5;5;8;9;9;9 ]

let getDuplicates = numbers |> List.groupBy id
                            |> List.map snd
                            |> List.filter (fun set -> set.Length > 1)
                            |> List.map (fun set -> set.[0])

I'm sure this is a duplicate. However, I am unable to locate the question on this site.

UPDATE

let getDuplicates numbers =

    numbers |> List.groupBy id
            |> List.choose (fun (k,v) -> match v.Length with
                                         | x when x > 1 -> Some k
                                         | _            -> None)
chryosolo

Simplifying your function:

Whenever you have a filter followed by a map, you can probably replace the pair with a choose. The purpose of choose is to run a function for each value in the list, and return only the items which return Some value (None values are removed, which is the filter portion). Whatever value you put inside Some is the map portion:

let getDuplicates = numbers |> List.groupBy id
                            |> List.map snd
                            |> List.choose( fun( set ) ->
                               if set.Length > 1
                                  then Some( set.[0] )
                                  else None )

We can take it one additional step by removing the map. In this case, keeping the tuple which contains the key is helpful, because it eliminates the need to get the first item of the list:

let getDuplicates = numbers |> List.groupBy id
                            |> List.choose( fun( key, set ) ->
                               if set.Length > 1
                                  then Some key
                                  else None )

Is this simpler than the original? Perhaps. Because choose combines two purposes, it is by necessity more complex than those purposes kept separate (the filter and the map), and this makes it harder to understand at a glance, perhaps undoing the more "simplified" code. More on this later.

Decomposing the concept

Simplifying the code wasn't the direct question, though. You asked about functions useful in finding duplicates. At a high level, how do you find a duplicate? It depends on your algorithm and specific needs:

  • Your given algorithm uses the "put items in buckets based on their value", and "look for buckets with more than one item". This is a direct match to List.groupBy and List.choose (or filter/map)
  • A different algorithm could be to "iterate through all items", "modify an accumulator as we see each", then "report all items which have been seen multiple times". This is kind of like the first algorithm, where something like List.fold is replacing List.groupBy, but if you need to drag some other kind of state along, it may be helpful.
  • Perhaps you need to know how many times there are duplicates. A different algorithm satisfying these requirements may be "sort the items so they are always ascending", and "flag if the next item is the same as the current item". In this case, you have a List.sort followed by a List.toSeq then Seq.windowed:

    let getDuplicates = numbers |> List.sort
                                |> List.toSeq
                                |> Seq.windowed 2
                                |> Seq.choose( function
                                  | [|x; y|] when x = y -> Some x
                                  | _ -> None )
    

    Note that this returns a sequence with [5; 9; 9], informing you that 9 is duplicated twice.

  • These were algorithms mostly based on List functions. There are already two answers, one mutable, the other not, which are based on sets and existence.

My point is, a complete list of functions helpful to finding duplicates would read like a who's who list of existing collection functions -- it all depends on what you're trying to do and your specific requirements. I think your choice of List.groupBy and List.choose is probably about as simple as it gets.

Simplifying for maintainability

The last thought on simplification is to remember that simplifying code will improve the readability of your code to a certain extent. "Simplifying" beyond that point will most likely involve tricks, or obscure intent. If I were to look back at a sample of code I wrote even several weeks and a couple of projects ago, the shortest and perhaps simplest code would probably not be the easiest to understand. Thus the last point -- simplifying future code maintainability may be your goal. If this is the case, your original algorithm modified only keeping the groupBy tuple and adding comments as to what each step of the pipeline is doing may be your best bet:

// combine numbers into common buckets specified by the number itself
let getDuplicates = numbers |> List.groupBy id
                            // only look at buckets with more than one item
                            |> List.filter( fun (_,set) -> set.Length > 1)
                            // change each bucket to only its key
                            |> List.map( fun (key,_) -> key )

The original question comments already show that your code was unclear to people unfamiliar with it. Is this a question of experience? Definitely. But, regardless of whether we work on a team, or are lone wolves, optimizing code (where possible) for quick understanding should probably be close to everyone's top priority. (climbing down off sandbox...) :)

Regardless, best of luck.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Dev

Find duplicate elements in a list

From Dev

Find duplicate elements in a list

From Dev

What is the essential difference between the two functions to find the first negative entry?

From Dev

What is the essential difference between the two functions to find the first negative entry?

From Dev

Get a list of python's essential functions

From Dev

How can I find elements with attributes within a list?

From Dev

List of duplicate elements generator

From Dev

scala duplicate elements in list

From Dev

Java duplicate elements in a list

From Dev

Find elements within a set of elements

From Dev

Find Duplicate Array within Array

From Dev

Comparing elements within a list

From Dev

find duplicates within list

From Dev

How to create a list within a list of duplicate files?

From Dev

Replacing duplicate elements in a list in Python?

From Dev

Separate list according to duplicate elements

From Dev

Find duplicate lists where element order is insignificant but repeat list elements are significant

From Dev

Find tables that are copied / duplicate structure within database

From Dev

Find duplicate values within the same row (Excel)

From Dev

Accessing list elements within mutate

From Dev

Appending to elements within an Rcpp List

From Dev

Searching of elements within a list in Python

From Dev

Accessing list elements within mutate

From Dev

Accessing elements of a matrix within a list

From Dev

Find duplicate item in unordered list

From Dev

python find duplicate objects in a list

From Dev

Dart - find duplicate values on a List

From Dev

Find duplicate item in unordered list

From Dev

find indices of duplicate floats in a list

Related Related

HotTag

Archive