why does unordered set mix the values

  • A+
Category:Languages

I am trying to remove duplicates from an vector by using an unordered_set. but my design creates an unordered_set that doesn't maintain the ordering correctly. in this example the "z" is not at the end. What am I doing wrong? thank you in advance.

EDIT: sorry if I wasn't clear with what I was looking for. I want the output to be "e,d,a,b,c,z" I want to keep original ordering but remove duplicates. I currently have it working using about 3 different for loops and an extra copy of the init vector. I was just looking for a STL function that was cleaner if possible.

output produced: e d a b c a a a a b b b b c z printing unordered set e d a z b c

#include <iostream>  #include <iterator>      #include <algorithm>     #include <string> #include <unordered_set> using namespace std;  int main() {     vector<string>terminals = { "e", "d", "a", "b", "c", "a", "a", "a", "a", "b","b", "b", "b", "c", "z" };     for (vector<string>::iterator it = terminals.begin(); it != terminals.end(); it++) // print given vector         cout << *it << " ";     cout << endl;     unordered_set<string> newSet;     copy(terminals.begin(), terminals.end(), inserter(newSet, newSet.end()));     cout << "printing unordered set" << endl;     for (unordered_set<string>::iterator it = newSet.begin(); it != newSet.end(); it++)         cout << *it << " ";     cout << endl;     //system("pause");     return 0; } 

 


std::unordered_set:

Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its value. This allows fast access to individual elements, since once a hash is computed, it refers to the exact bucket the element is placed into.

If you need the unique terminals to be ordered, use a std::set:

#include <iostream> #include <vector> #include <string> #include <set>  int main() {     std::vector<std::string>terminals = { "e", "d", "a", "b", "c", "a", "a", "a", "a", "b","b", "b", "b", "c", "z" };      for(const std::string& terminal : terminals) // print given vector         std::cout << terminal << " ";     std::cout << "/n";;      // populate the set directly from the vectors iterators:     std::set<std::string> newSet(terminals.begin(), terminals.end());;      std::cout << "printing the (ordered) set:" << "/n";;     for(const std::string& terminal : newSet)         std::cout << terminal << " ";     std::cout << "/n";; } 

If you want to maintain the original order, you can't use either set as your final storage, but you can use a std::unordered_set as a cache/blacklist for values you've already inserted in your final storage.

#include <iostream> #include <vector> #include <string> #include <algorithm> #include <unordered_set>  int main() {     std::vector<std::string>terminals = { "e", "d", "a", "b", "c", "a", "a", "a", "a", "b","b", "b", "b", "c", "z" };      for(const std::string& terminal : terminals) // print given vector         std::cout << terminal << " ";     std::cout << "/n";;      std::vector<std::string> newSet; // not really a set anymore     std::unordered_set<std::string> cache; // blacklist      // try to insert all terminals and only when an insert is successful,     // put the terminal in newSet      std::for_each(terminals.begin(), terminals.end(),         [&](const std::string& terminal) {             auto [it, inserted] = cache.insert(terminal);             if(inserted)                 newSet.push_back(terminal);         }     );      std::cout << "printing the vector of unique terminals:" << "/n";;     for(const std::string& terminal : newSet)         std::cout << terminal << " ";     std::cout << "/n";; } 

If you want the original order and don't mind making changes directly to the original terminals vector, you can use std::remove_if combined with an unordered_set which is nice since it doesn't require a new vector. This is an annotated variant of @Marek R's answer:

Read this first: Erase–remove idiom

int main() {     std::vector<std::string>terminals = { "e", "d", "a", "b", "c", "a", "a", "a", "a", "b","b", "b", "b", "c", "z" };      for(const std::string& terminal : terminals) // print given vector         std::cout << terminal << " ";     std::cout << "/n";;      std::unordered_set<std::string> cache; // blacklist      // remove_if() moves all entries in your container, for which the     // UnaryPredicate(*) returns true, to the end of the container. It returns     // an iterator pointing to the first element in the vector that was     // moved - which is a suitable starting point for a subsequent erase().     //     // (*) UnaryPredicate: A callable that returns true or false given a single     //                     value.      // auto past_new_end = std::vector<std::string>::iterator past_new_end     auto past_new_end = std::remove_if(terminals.begin(), terminals.end(),         // this lambda is the UnaryPredicate         [&](const std::string& terminal) {             // insert returns a std::pair<Iterator, bool>             // where the bool (.second in the pair) is false             // if the value was not inserted (=it was already present)             return cache.insert(terminal).second == false;         }     );      std::cout << "display all the entries (now with unspecified values) "                  "that will be erased:/n";     std::copy(past_new_end, terminals.end(),                             std::ostream_iterator<std::string>(std::cout, "<"));     std::cout << "/n";      // erase all the moved entries     terminals.erase(past_new_end, terminals.end());      std::cout << "printing the unique terminals:" << "/n";;     for(const std::string& terminal : terminals)         std::cout << terminal << " ";     std::cout << "/n";; } 

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: