Limiting Concurrency in Go
Go's goroutines make it easy to write embarrassingly parallel programs, but in many "real world" cases resources are limited, and attempting to do everything at once can exhaust them.
In these cases, you need to limit the concurrency of your program to stay within the acceptable or optimal use of those resources. In many languages that use threads (or greenlets), pools are used to limit concurrency. They can be as easy to use as this example in Python:
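from gevent import monkey
monkey.patch_all()  # make blocking sockets (used by requests) cooperative so fetches overlap
from gevent.pool import Pool
from requests import get
concurrency = 5
urls = ["url1", "url2", "..."]
Pool(concurrency).map(get, urls)  # at most `concurrency` requests in flight at once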
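For comparison, the same pool idea can be sketched in Go with a fixed number of worker goroutines pulling work from a channel. This is a minimal sketch, not a standard library API: the helper name poolMap and the fake fetch function are hypothetical, and a real program would make an HTTP request instead of building a string.

```go
package main

import (
	"fmt"
	"sync"
)

// poolMap is a hypothetical helper: it applies fn to every item using a
// fixed pool of `workers` goroutines (workers must be at least 1) and
// returns the results in input order, roughly like Pool(n).map(fn, items).
func poolMap(workers int, items []string, fn func(string) string) []string {
	results := make([]string, len(items))
	indexes := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker pulls indexes until the channel is closed.
			// Workers write to distinct elements, so no lock is needed.
			for i := range indexes {
				results[i] = fn(items[i])
			}
		}()
	}
	for i := range items {
		indexes <- i
	}
	close(indexes)
	wg.Wait() // all workers have returned, so results is complete
	return results
}

func main() {
	urls := []string{"url1", "url2", "url3"}
	// Stand-in for an HTTP fetch (assumption for illustration only).
	fake := func(u string) string { return "fetched " + u }
	for _, r := range poolMap(5, urls, fake) {
		fmt.Println(r)
	}
}
```

Unlike the semaphore idiom, the number of goroutines here is fixed up front rather than one per URL.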
Since Go already has built-in concurrency primitives in the form of goroutines and channels, let's look at an example that uses an idiom borrowed from the net package:
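package main

func main() {
	concurrency := 5
	sem := make(chan bool, concurrency)
	urls := []string{"url1", "url2"}
	for _, url := range urls {
		sem <- true // blocks while `concurrency` goroutines hold a slot
		go func(url string) {
			defer func() { <-sem }() // free this goroutine's slot when it returns
			// get the url
		}(url)
	}
	// Wait for the remaining goroutines by filling the semaphore to capacity.
	for i := 0; i < cap(sem); i++ {
		sem <- true
	}
}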
First, a channel called sem (it acts as a semaphore) is created with a buffer size equal to the desired level of concurrency. As we loop over the urls, we put a bool onto the channel. That send blocks while the channel is full, so a new goroutine is only fired off for a URL once a slot is free. Each goroutine defers a read from the semaphore, which frees its slot when it finishes.
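To see the limit in action, the idiom can be instrumented with a counter of running tasks that records the peak. This is a sketch under stated assumptions: maxInFlight is a hypothetical name, and the sleep stands in for fetching a URL. The peak should never exceed concurrency, although how close it gets depends on scheduling.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// maxInFlight (hypothetical) runs n tasks with the buffered-channel
// semaphore idiom and reports the most tasks observed running at once.
func maxInFlight(n, concurrency int) int64 {
	sem := make(chan bool, concurrency)
	var running, peak int64
	for i := 0; i < n; i++ {
		sem <- true // blocks while `concurrency` tasks hold a slot
		go func() {
			defer func() { <-sem }() // runs after the decrement below
			cur := atomic.AddInt64(&running, 1)
			// Record a new peak with a CAS loop so concurrent updates are not lost.
			for {
				old := atomic.LoadInt64(&peak)
				if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
					break
				}
			}
			time.Sleep(5 * time.Millisecond) // placeholder for real work
			atomic.AddInt64(&running, -1)
		}()
	}
	// Wait for every task by filling the semaphore to capacity.
	for i := 0; i < cap(sem); i++ {
		sem <- true
	}
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println("peak concurrency:", maxInFlight(20, 5))
}
```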
After the last goroutine is fired, up to concurrency goroutines may still be running. To make sure we wait for all of them to finish, we fill the semaphore back up to its capacity. Once that succeeds, we know that every goroutine has read from the semaphore: we've made len(urls) + cap(sem) writes to a channel that can only buffer cap(sem) values, so all len(urls) reads must already have happened.
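The counting argument can be checked directly. This sketch uses a hypothetical runAll helper that counts every send on the semaphore and every goroutine that finishes; by the time the refill loop returns, the finished count already equals len(urls), without any other synchronization.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// runAll (hypothetical) starts one goroutine per url, limited by a
// semaphore of size `concurrency`, then waits by refilling the semaphore.
// It returns the number of sends on the channel and the number of
// goroutines that had finished when the refill loop returned.
func runAll(urls []string, concurrency int) (sends int, finished int64) {
	sem := make(chan bool, concurrency)
	var done int64
	for _, url := range urls {
		sem <- true
		sends++
		go func(url string) {
			defer func() { <-sem }()
			_ = url // a real program would fetch the url here
			atomic.AddInt64(&done, 1)
		}(url)
	}
	for i := 0; i < cap(sem); i++ {
		sem <- true
		sends++
	}
	// sends == len(urls) + cap(sem) and the buffer holds only cap(sem)
	// values, so all len(urls) receives (and the work before them) are done.
	return sends, atomic.LoadInt64(&done)
}

func main() {
	urls := []string{"url1", "url2", "url3", "url4", "url5", "url6", "url7"}
	sends, finished := runAll(urls, 3)
	fmt.Println(sends, finished) // prints "10 7"
}
```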
This is, of course, more verbose than the pool example above, but it's conceptually very simple, and it's flexible: the semaphore writes and reads can be delayed, or placed around several separate throughput-controlled sections of the code, while the code stays readable.
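That flexibility can be sketched with two hypothetical resources, a "fetch" step and a "store" step, each with its own limit. Each semaphore wraps only its own section, so a goroutine holds at most one slot at a time. Because slots are no longer held for a goroutine's whole lifetime, refilling a semaphore can no longer prove that every goroutine finished, so this sketch swaps in sync.WaitGroup for the final wait. It also uses chan struct{}, a common zero-size alternative to chan bool. The names throttled and process, and the sleep durations, are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// throttled (hypothetical) runs work while holding one slot of sem and
// records in *peak the most callers seen inside the section at once.
func throttled(sem chan struct{}, running, peak *int64, work func()) {
	sem <- struct{}{}        // acquire a slot (blocks while the section is full)
	defer func() { <-sem }() // release the slot after work returns
	cur := atomic.AddInt64(running, 1)
	for {
		old := atomic.LoadInt64(peak)
		if cur <= old || atomic.CompareAndSwapInt64(peak, old, cur) {
			break
		}
	}
	work()
	atomic.AddInt64(running, -1)
}

// process (hypothetical) handles every item in its own goroutine. Only the
// fetch and store sections are throttled, each by its own semaphore; the
// sleeps are placeholders for real I/O. It returns each section's peak.
func process(items []string, fetchLimit, storeLimit int) (int64, int64) {
	fetchSem := make(chan struct{}, fetchLimit)
	storeSem := make(chan struct{}, storeLimit)
	var inFetch, inStore, peakFetch, peakStore int64
	var wg sync.WaitGroup
	for _, item := range items {
		wg.Add(1)
		go func(item string) {
			defer wg.Done()
			// Unthrottled work on item could run here without holding a slot.
			throttled(fetchSem, &inFetch, &peakFetch, func() { time.Sleep(2 * time.Millisecond) })
			throttled(storeSem, &inStore, &peakStore, func() { time.Sleep(time.Millisecond) })
		}(item)
	}
	// Slots are released mid-goroutine, so wait on the WaitGroup instead.
	wg.Wait()
	return atomic.LoadInt64(&peakFetch), atomic.LoadInt64(&peakStore)
}

func main() {
	f, s := process(make([]string, 20), 4, 2)
	fmt.Println("peak fetch:", f, "peak store:", s)
}
```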