I know of no book on low-latency C++, and some LL shops code in C under the (sometimes mistaken) view that it is inherently faster. Some HFT teams modify the kernel and/or get into device driver coding, so reading http://lwn.net/Kernel/LDD3/ might help.

Victor doesn't tell us his experience, so his reading ought to include algorithms for sorting and searching, trees, and the Tortoise/Hare cycle detection. He also ought to get wise in the ways of C-style memory management: malloc et al. The malloc family is believed by many developers to be faster and, since there are no constructors to be called, deterministic in how long it takes. This is of course wrong, but you need to understand it to get the job at some places, and sometimes the vagaries of malloc implementations are acceptable.

But... classical malloc suffers from fragmentation, which can both slow the system down and occasionally cause what looks like insufficient memory. The C++ technique of placement new allows you to control how memory is managed; in C terms it's easier, although more work. A common case is a type of object that you create and destroy frequently. malloc will handle this well enough for most purposes, but instead you can write a small collection of functions that work on a linked list of blocks of the right size. Your allocator simply hands back a pointer to a block when malloc'ing, and freeing puts it on the front of the list to be given to the next request. This does consume more memory, since typically once space is allocated to this purpose it stays there until the process ends. But it's unlikely an HFT app will be constrained in RAM use, so it's worth doing.

You also need a decent understanding of basic networking and hardware architecture; I find a depressing percentage of "developers" don't even know what PCI stands for.

Also, be very clear about the difference between high performance, responsive, low latency, and real time. High performance is doing a lot of stuff as quickly as possible.
That's a power plant: megawatts come out, but it can take a long time to start. Distributing a task over a grid/cloud may give you billions of operations per second, but may take seconds or even minutes to start.

Responsive is giving back some response early, even if that slows the system down. When I first turned up at IBM labs they made me use shit where each keystroke waited for the system to respond; if you typed when it wasn't ready, it either threw the keystroke away or didn't put it on the screen. When Microsoft introduced "splash screens" into its apps in the 1990s, users reported this as the best feature of those apps, so much so that Microsoft's development tools included a wizard to let corporate app developers make their own.

Low latency means you get it there quickly, but it is not the same as real time. Real time is where you guarantee that it happens within a defined time. That's different, and increasingly important to the smarter end of HFT shops. HFT is partly driven by the fact that firms are forecasting over very short timescales, and outside that period a signal may not only be less good, but some mean-reversion strategies are pretty much guaranteed to lose you money if you act too late. Thus although the mean is important, the distribution is critical as well: I've seen nice graphs with a tight cluster in the small number of milliseconds and a few outliers literally hundreds of times longer. Those outliers vary from useless to rather expensive.
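The free-list allocator described earlier can be sketched in a few lines. This is only an illustration of the idea, not a production allocator (no thread safety, no alignment beyond what malloc gives you); the `FreeList` name and interface are mine, not from any particular library:

```cpp
#include <cstdlib>
#include <cstddef>

// Sketch of a free-list pool for one object size. Freed blocks are pushed
// onto a singly linked list; allocation pops the head in O(1), falling back
// to malloc only when the list is empty.
struct FreeList {
    struct Node { Node* next; };
    Node* head = nullptr;
    std::size_t size;

    explicit FreeList(std::size_t object_size)
        : size(object_size < sizeof(Node) ? sizeof(Node) : object_size) {}

    void* allocate() {
        if (head) {                 // fast path: reuse a freed block
            Node* n = head;
            head = n->next;
            return n;
        }
        return std::malloc(size);   // slow path: go to the heap
    }

    void deallocate(void* p) {
        // Memory is never returned to the OS until the process ends -
        // the RAM-for-latency trade-off mentioned above.
        Node* n = static_cast<Node*>(p);
        n->next = head;
        head = n;
    }
};
```

The point is that after warm-up, allocate/deallocate is a couple of pointer moves with no search and no fragmentation, which is exactly the determinism malloc is wrongly assumed to have.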
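The mean-versus-distribution point can be made concrete with a toy calculation over made-up latencies (the numbers here are invented for illustration): a tight cluster of 2-3 ms samples plus one outlier hundreds of times longer.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Mean of a sample of latencies in milliseconds.
double mean_ms(const std::vector<double>& ms) {
    return std::accumulate(ms.begin(), ms.end(), 0.0) / ms.size();
}

// Worst case of the same sample - the number an HFT shop actually cares about.
double worst_ms(const std::vector<double>& ms) {
    return *std::max_element(ms.begin(), ms.end());
}
```

For the sample {2, 2, 2, 3, 2, 2, 3, 2, 2, 500}, the mean is 52 ms while the worst case is 500 ms: a summary statistic that hides exactly the outliers that turn a mean-reversion trade into a loss.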