Extract duplicate objects from a List in Java 8


The code below removes duplicates from the original list, but I want to extract the duplicates instead of removing them (the package name is just left over from another project):

Given:

A Person POJO:

    package at.mavila.learn.kafka.kafkaexercises;

    import org.apache.commons.lang3.builder.ToStringBuilder;

    public class Person {

        private final Long id;
        private final String firstName;
        private final String secondName;

        private Person(final Builder builder) {
            this.id = builder.id;
            this.firstName = builder.firstName;
            this.secondName = builder.secondName;
        }

        public Long getId() {
            return id;
        }

        public String getFirstName() {
            return firstName;
        }

        public String getSecondName() {
            return secondName;
        }

        public static class Builder {

            private Long id;
            private String firstName;
            private String secondName;

            public Builder id(final Long builder) {
                this.id = builder;
                return this;
            }

            public Builder firstName(final String first) {
                this.firstName = first;
                return this;
            }

            public Builder secondName(final String second) {
                this.secondName = second;
                return this;
            }

            public Person build() {
                return new Person(this);
            }
        }

        @Override
        public String toString() {
            return new ToStringBuilder(this)
                    .append("id", id)
                    .append("firstName", firstName)
                    .append("secondName", secondName)
                    .toString();
        }
    }

The duplicate-extraction code.

Notice that it filters first by id and then by first name to build a new list. I found this code elsewhere; it is not mine:

    package at.mavila.learn.kafka.kafkaexercises;

    import java.util.List;
    import java.util.Map;
    import java.util.Objects;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;
    import java.util.function.Predicate;
    import java.util.stream.Collectors;

    import static java.util.Objects.isNull;

    public final class DuplicatePersonFilter {

        private DuplicatePersonFilter() {
            //No instances of this class
        }

        public static List<Person> getDuplicates(final List<Person> personList) {
            return personList
                    .stream()
                    .filter(duplicateByKey(Person::getId))
                    .filter(duplicateByKey(Person::getFirstName))
                    .collect(Collectors.toList());
        }

        private static <T> Predicate<T> duplicateByKey(final Function<? super T, Object> keyExtractor) {
            Map<Object, Boolean> seen = new ConcurrentHashMap<>();
            return t -> isNull(seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE));
        }
    }

The test code. If you run this test case you will get [alex, lolita, elpidio, romualdo].

Instead, I would expect to get [romualdo, otroRomualdo] as the extracted duplicates, since they share the same id and firstName:

    package at.mavila.learn.kafka.kafkaexercises;

    import org.junit.Test;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    import java.util.ArrayList;
    import java.util.List;

    import static org.junit.Assert.*;

    public class DuplicatePersonFilterTest {

        private static final Logger LOGGER = LoggerFactory.getLogger(DuplicatePersonFilterTest.class);

        @Test
        public void testList() {

            Person alex = new Person.Builder().id(1L).firstName("alex").secondName("salgado").build();
            Person lolita = new Person.Builder().id(2L).firstName("lolita").secondName("llanero").build();
            Person elpidio = new Person.Builder().id(3L).firstName("elpidio").secondName("ramirez").build();
            Person romualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("gomez").build();
            Person otroRomualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("perez").build();

            List<Person> personList = new ArrayList<>();

            personList.add(alex);
            personList.add(lolita);
            personList.add(elpidio);
            personList.add(romualdo);
            personList.add(otroRomualdo);

            final List<Person> duplicates = DuplicatePersonFilter.getDuplicates(personList);

            LOGGER.info("Duplicates: {}", duplicates);
        }
    }
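The result reported for this test follows from how duplicateByKey works: putIfAbsent returns null the first time a key is seen, so the predicate keeps first occurrences and drops repeats. Below is a minimal sketch of the inverted check. The helper name seenBefore and the class SeenBeforeDemo are hypothetical (not part of the original code), and plain strings stand in for Person objects. Like the original, it relies on a stateful predicate, so it is only meant for sequential streams.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class SeenBeforeDemo {

    private SeenBeforeDemo() {
        // No instances of this class
    }

    // Hypothetical helper: true only when the extracted key was already seen.
    // Keys must be non-null because the backing set is a ConcurrentHashMap key set.
    static <T> Predicate<T> seenBefore(final Function<? super T, ?> keyExtractor) {
        final Set<Object> seen = ConcurrentHashMap.newKeySet();
        // Set.add returns false when the element is already present
        return t -> !seen.add(keyExtractor.apply(t));
    }

    // Returns the second and later occurrences of each name, in encounter order.
    static List<String> repeatedOccurrences(final List<String> names) {
        return names.stream()
                .filter(seenBefore(Function.identity()))
                .collect(Collectors.toList());
    }
}
```

For the names [alex, lolita, elpidio, romualdo, romualdo] this returns only the second romualdo, so inverting the predicate alone still misses the first member of each duplicate pair.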

At my job I was able to get the desired result by using a Comparator with a TreeMap and an ArrayList, but that meant creating a list, filtering it, and then applying the filter again to a newly created list. This looks like bloated (and probably inefficient) code.

Does anyone have a better idea of how to extract the duplicates rather than remove them?

Thanks in advance.

Update:

Thanks, everyone, for your answers.

To remove the duplicates using the same approach with uniqueAttributes:

    public static List<Person> removeDuplicates(final List<Person> personList) {

        return personList.stream().collect(Collectors
                .collectingAndThen(Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(
                        PersonListFilters::uniqueAttributes))),
                        ArrayList::new));
    }

    private static String uniqueAttributes(Person person) {

        if (Objects.isNull(person)) {
            return StringUtils.EMPTY;
        }

        return (person.getId()) + (person.getFirstName());
    }
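One caveat with a concatenated string key: id 1 with first name "1a" and id 11 with first name "a" both produce "11a", so two different people could be treated as the same. A possible variation, sketched below, compares the two fields separately. The class ComparatorDedup and its nested Person stand-in are hypothetical and exist only to keep the example self-contained, and the sketch assumes that id and firstName are never null.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class ComparatorDedup {

    private ComparatorDedup() {
        // No instances of this class
    }

    // Minimal stand-in for the Person class from the question
    static final class Person {
        private final Long id;
        private final String firstName;
        private final String secondName;

        Person(final Long id, final String firstName, final String secondName) {
            this.id = id;
            this.firstName = firstName;
            this.secondName = secondName;
        }

        Long getId() { return id; }
        String getFirstName() { return firstName; }
        String getSecondName() { return secondName; }
    }

    // Keeps one person per (id, firstName) pair; assumes both fields are non-null.
    static List<Person> removeDuplicates(final List<Person> people) {
        final Comparator<Person> byIdThenFirstName = Comparator
                .comparing(Person::getId)
                .thenComparing(Person::getFirstName);
        // The TreeSet treats elements that compare as equal as the same element
        return people.stream().collect(Collectors.collectingAndThen(
                Collectors.toCollection(() -> new TreeSet<>(byIdThenFirstName)),
                ArrayList::new));
    }
}
```

As with the TreeSet version above, the result comes back sorted by the comparator rather than in the original list order.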

 


In this scenario you need to write custom logic to extract the duplicates from the list. The method below returns every Person whose id and first name also appear on another entry in the list:

    // Requires java.util.List, java.util.Objects and java.util.stream.Collectors.
    public static List<Person> extractDuplicates(final List<Person> personList) {

        return personList.stream()
                // Keep a person only if more than one entry shares its id and first name.
                // Counting all matches before deciding avoids adding the same person
                // again for each element scanned after the second match.
                .filter(candidate -> personList.stream()
                        .filter(p -> Objects.equals(p.getId(), candidate.getId())
                                && Objects.equals(p.getFirstName(), candidate.getFirstName()))
                        .count() > 1)
                .collect(Collectors.toList());
    }

Applied to:

    List<Person> l = new ArrayList<>();

    Person alex = new Person.Builder().id(1L).firstName("alex").secondName("salgado").build();
    Person lolita = new Person.Builder().id(2L).firstName("lolita").secondName("llanero").build();
    Person elpidio = new Person.Builder().id(3L).firstName("elpidio").secondName("ramirez").build();
    Person romualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("gomez").build();
    Person otroRomualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("perez").build();

    l.add(alex);
    l.add(lolita);
    l.add(elpidio);
    l.add(romualdo);
    l.add(otroRomualdo);

Output:

[Person [id=4, firstName=romualdo, secondName=gomez], Person [id=4, firstName=romualdo, secondName=perez]] 
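The nested scan above compares every element with every other element, which may become slow for large lists. An alternative worth considering is to group the list by a composite key in one pass and keep only the groups with more than one member. This is a sketch, not the answer's method: the class GroupingDuplicates and its nested Person stand-in are hypothetical and exist only to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class GroupingDuplicates {

    private GroupingDuplicates() {
        // No instances of this class
    }

    // Minimal stand-in for the Person class from the question
    static final class Person {
        private final Long id;
        private final String firstName;
        private final String secondName;

        Person(final Long id, final String firstName, final String secondName) {
            this.id = id;
            this.firstName = firstName;
            this.secondName = secondName;
        }

        Long getId() { return id; }
        String getFirstName() { return firstName; }
        String getSecondName() { return secondName; }
    }

    // Returns every person whose (id, firstName) pair occurs more than once.
    static List<Person> extractDuplicates(final List<Person> people) {
        // A two-element list serves as the composite key: List.equals and
        // List.hashCode compare element by element and tolerate null fields.
        // LinkedHashMap keeps the groups in order of first appearance.
        final Map<List<Object>, List<Person>> groups = people.stream()
                .collect(Collectors.groupingBy(
                        p -> Arrays.<Object>asList(p.getId(), p.getFirstName()),
                        LinkedHashMap::new,
                        Collectors.toList()));

        return groups.values().stream()
                .filter(group -> group.size() > 1)
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }
}
```

Members of the same group come out next to each other, so the result order can differ from the input order when duplicates are not adjacent in the list.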
