Rationale of restrictive rules for extract and re-insert with map

  • A+
Category:Languages

Since C++17, associative containers support the extraction of a node and its re-insertion (possibly into another container of the same type). The object returned by extract(key) is a node_handle, which is move-only and for map containers has member functions

key_type &key() const; mapped_type &mapped() const; 

which allow to alter not only the mapped type, but also the key. This can be used to change the key without re-allocation (example taken from the documentation for map::extract()):

std::map<int, std::string> map{{1,"mango"}, {2,"papaya"}, {3,"guava"}}; auto handle = map.extract(2); handle.key() = 4; map.insert(move(handle)); 

As far as I understand, map is implemented as binary-search tree, while map::extract() un-links a node from the tree and returns a pointer to it via the node-handle, which takes over the ownership. Upon map::insert(), the node is re-linked into the tree and ownership is taken over by the map again.

Thus, the node (and the stored key and mapped_type) are not re-allocated, moved, or copied in the process. The standard says (my high lighting):

The extract members invalidate only iterators to the removed element; pointers and references to the removed element remain valid. However, accessing the element through such pointers and references while the element is owned by a node_­type is undefined behavior. References and pointers to an element obtained while it is owned by a node_­type are invalidated if the element is successfully inserted.

My question: what is the rationale behind (1) making it UB to access an extracted element by its address and (2) invalidating upon insertion the address taken in the extracted state?

IMHO, this extract-and-insert idiom can be implemented in a way that keeps the address of the element valid at all times (until destruction, which may occur before map-destruction if the element is never re-inserted). The following code

#include <map> #include <string> #include <iostream>  struct immovable_string : std::string {     immovable_string(const char*s) : std::string(s) {}     immovable_string() = default;     immovable_string(immovable_string const&) = delete;     immovable_string(immovable_string &&) = delete;     immovable_string&operator=(immovable_string const&) = delete;     immovable_string&operator=(immovable_string &&) = delete; };  int main() {     std::map<int,immovable_string> map;      map.emplace(1,"mango");     map.emplace(2,"papaya");     map.emplace(3,"guava");      std::cout << "initially:     "               << " address=" << std::addressof(map[2])               << " value=" << map[2] <<'/n';      auto handle = map.extract(2);      std::cout << "after extract: "               << " address=" << std::addressof(handle.mapped())               << " value=" << handle.mapped() <<'/n';      handle.key() = 4;     map.insert(move(handle));      std::cout << "after insert:  "               << " address=" << std::addressof(map[4])               << " value=" << map[4] <<'/n'; } 

compiles (with gcc 8.2.0 using -std=c++17) and gives output

initially:      address=0x7f9e06c02738 value=papaya after extract:  address=0x7f9e06c02738 value=papaya after insert:   address=0x7f9e06c02738 value=papaya 

as expected (the same results are obtained for std::string in place of immovable_string and/or unordered_map instead of map).


Edit

Note that I'm not asking about issues related to modifying the key (map stores pair<const Key,T>).

My question is solely about restrictions for access to the mapped element via pointers or references. The whole idea of the extract-and-insert idiom makes only sense if the element is not moved/copied, i.e. if its address remains valid at all times (which in fact is specified by the standard). Rendering access to the element whilst in the extracted state UB seems strange and renders the extract-and-insert mechanism less useful: think about multi-threaded code with one thread accessing an element whilst another extracts and re-inserts it. This can be implemented w/o any issue yet may invoke UB -- WHY?

Here is a UB scenario (which IMHO is perfectly fine, no UB required):

void somefunc(object*ptr) { ptr->do_something(); } void re_key(map<int,object> &M, int oldKey, int newKey) {     if(M.find(0)!=M.end() && M.find(newKey)==M.end()) {         auto handle = M.extract(0);         handle.key() = newKey;         M.insert(std::move(handle));     } }  map<int,object> M = fillMap(); auto ptr = addressof(M[0]);     // takes initial address thread t1(somefunc,ptr);        // uses said address to access object thread t2(re_key,M,7);          // extracts and inserts an object  

Of course, if the insert() fails, the handle is destroyed and the address invalidated. That's obvious, but the user can to something about that.

 


I think the predominant subtlety in the "extraction" system is that the value_type of a map<K, T> is pair<const K, T> — note the const!

Modifying a const object causes undefined behaviour, so you need to be very careful not to modify something that's known to be const. While the node is part of any map, the key is const. The "magic" of the extraction machinery (and the reason it took so long to specify) is that while the node is extracted, the key is not const.

This basically requires you to look at the problem really hard and convince yourself that a pair<const K, T> can sometimes be interpreted as a pair<K, T> (and bear in mind that pair is a template that users are permitted to specialize!). So to avoid any potential for const objects being modified, there must be a clear sequencing of inserted and extracted states for any node.

There is standard wording to help with the specialization issue in [container.node.overview]p4:

If a user-defined specialization of pair exists for pair<const Key, T> or pair<Key, T>, where Key is the container’s key_type and T is the container’s mapped_type, the behavior of operations involving node handles is undefined.

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: