Two Stage Push & Pop

I probably mentioned this before, but SPSC (single producer, single consumer) queue is one of my favorite structures. If implemented correctly, it’s actually 100% lock-free and is also surprisingly versatile (not all problems require MPMC!). I typically use a slightly modified version of this or an unbounded version, based on code by Dmitry Vyukov.

Both implementations are very simple and possibly lack some of the modern C++ bells and whistles, like emplace, but these should not be hard to add. However, you might occassionally run into a situation where the type you need to store is both fairly expensive to copy and impossible to move (one example might be a static array). I found that in such case, it might be beneficial to use a pattern I call “two stage push/pop”. Simply put, we split each operation in two stages, we initiate a push and get an address we can use to construct our objects (result not yet visible to consumer) and when we’re done we finalize it – at this point data becomes visible. Similarly, for pop, we first completely consume/process data and then signalize we’re done with it and slot can be reused/recycled.

Using Dmitry’s SPSC code as a starting point, here’s how it could be implemented:

T& init_enqueue() 
{ 
	assert(pending_enqueue_ == nullptr);
	node* n = alloc_node(); 
	n->next_ = 0; 
	// pending_enqueue_ is a member variable, but alternatively, you could
	// return node ptr from init_enqueue and pass it to finish_enqueue
	pending_enqueue_ = n;
	return n->value_;
} 
// Publish
void finish_enqueue()
{
	assert(pending_enqueue_ != nullptr);
	node* n = pending_enqueue_;
	pending_enqueue_ = nullptr;
	store_release(&head_->next_, n); 
	head_ = n; 
}

Consume:

T* init_dequeue()
{ 
	assert(pending_dequeue_ == nullptr);
	if(load_consume(&tail_->next_)) 
	{
		pending_dequeue_ = tail_->next;
		return &tail_->next->value_;
	}
	else 
	{
		return nullptr;
	} 
} 
void finish_dequeue()
{
	assert(pending_dequeue_ != nullptr);
	node* n = pending_dequeue_;
	pending_dequeue_ = nullptr;
	store_release(&tail_, n); 
}

Use case:

// Producer
MyExpensiveStruct& data = queue.init_enqueue();
// Read//construct data
Prepare(data);
queue.finish_enqueue();

// Consumer
MyExpensiveStruct* data = queue.init_dequeue();
if(data)
{
	// Consume/process
	Consume(*data);
	queue.finish_dequeue();
}

In that example, data does not have to be copied around at all it’s always produced/consumed 100% in-place. A potential problem with this solution is that both preparing and handling the data can be time-consuming and we’re hogging our slot during that time. It is a SPSC structure, so no danger of another producer/consumer stalling but the other “end” might have to wait (either for the data to be prepared or for node to be recycled). Please note, however, it does not affect the running time (if anything, it’s shorter), merely the time we take to hold our virtual “lock” (for a lack of a better world). Even then, in the worst case it means slightly bigger memory requirements (we’re not recycling nodes as fast as possible). As mentioned, this example is for the SPSC queue, but the idiom itself could be applied to virtuall all kinds of containers, the basic idea is simple, we split our operations in 2 stages: - preparing the data - publishing our results (bumping an index, queueing a new node and so on). (it needs to be said, it gets obviously more complicted for multiple consumers as they now have to be careful not to write to the same slot).