I don’t have much time to structure this text more.
By way of introduction: in September I finish my PhD in programming language theory and, to some extent, theoretical computer science. As a side project, I research scaling problems in server systems and how to measure performance.
A few pointers. Amazon AWS was mentioned a few times. Have you benchmarked the cost of virtualization? This might sound strange at first, but in my server examples the virtualization penalty might be 80% or more lost performance on virtualized hardware. So on AWS you would need at least 32 to 64 cores to compete against a fast 4-core machine in the web server example. I don’t have overhead numbers for Mathematica.
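A back-of-the-envelope version of that estimate, as a minimal Python sketch. It assumes a deliberately naive model: throughput scales linearly with core count and clock speed, and virtualization removes a fixed fraction of each vCPU’s throughput. The 80% penalty is the figure from my server examples; the 2.5 GHz vCPU clock is a hypothetical number, not an AWS specification.

```python
import math


def equivalent_vcpus(native_cores: int, native_ghz: float,
                     vcpu_ghz: float, virt_penalty: float) -> int:
    """vCPUs needed to match `native_cores` under a naive linear model.

    Assumes throughput ~ cores * clock, and that virtualization removes
    `virt_penalty` (a fraction between 0 and 1) of each vCPU's throughput.
    """
    if not 0.0 <= virt_penalty < 1.0:
        raise ValueError("virt_penalty must be in [0, 1)")
    native_throughput = native_cores * native_ghz
    per_vcpu_throughput = vcpu_ghz * (1.0 - virt_penalty)
    # Round up: a fractional vCPU cannot be rented.
    return math.ceil(native_throughput / per_vcpu_throughput)


# Hypothetical inputs: a 4-core desktop at 4.7 GHz vs. 2.5 GHz vCPUs
# with an 80% virtualization penalty.
needed = equivalent_vcpus(4, 4.7, 2.5, 0.80)
print(needed)
```

With these made-up inputs it prints 38, which lands inside the 32 to 64 range above; real workloads rarely scale linearly, so treat it as a lower bound rather than a prediction.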
The underlying problem is that from the 1970s until now, processor performance (transistor count, speed) has mostly followed Moore’s law, while memory speed lags roughly a factor of 1000 behind. So the question is less about the number of cores than about memory speed. And if memory speed is the limiting factor, what matters is main memory, the caches, and the memory bus used for inter-CPU communication.
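One standard way to make the memory argument concrete is the roofline model (my choice of illustration, not something from this thread): attainable performance is the smaller of the compute peak and memory bandwidth times arithmetic intensity (FLOPs per byte loaded). A minimal Python sketch; the 300 GFLOP/s peak and 40 GB/s bandwidth are hypothetical numbers, not any particular CPU.

```python
def attainable_gflops(peak_gflops: float, mem_bw_gbs: float,
                      flops_per_byte: float) -> float:
    """Roofline model: performance is capped either by the compute peak or
    by how fast memory can deliver the bytes the computation needs."""
    return min(peak_gflops, mem_bw_gbs * flops_per_byte)


# Hypothetical machine: 300 GFLOP/s compute peak, 40 GB/s memory bandwidth.
# A streaming kernel doing ~0.25 FLOP per byte loaded is memory bound:
print(attainable_gflops(300, 40, 0.25))  # 10.0
# Doubling the cores (and thus the compute peak) does not help it at all:
print(attainable_gflops(600, 40, 0.25))  # 10.0
# Only a compute-heavy kernel (10 FLOP per byte) reaches the compute peak:
print(attainable_gflops(300, 40, 10))    # 300
```

The second call is the point: for a memory-bound workload, more cores leave the ceiling exactly where it was.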
Here is an example that is clear but sounds paradoxical at the same time. Take an application that runs on a 2- or 4-core machine, and run the same application on a 32-core and on a 64-core machine. You can expect the 2- or 4-core machine to be the fastest, then the 32-core machine, with the 64-core machine the slowest. The 2- or 4-core machine beats the 32-core machine because the clock speed is significantly lower on the 32-core machine, and that matters more when the application doesn’t scale well with the number of cores. The inter-core overhead, interrupts, etc. are so large on the 64-core machine that the 32-core machine is much faster than the 64-core machine.
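The paradox can be reproduced with a toy cost model. This is a hypothetical model for illustration, not a measurement: run time is Amdahl’s law divided by clock speed, plus a coordination cost (`overhead_per_core`) that grows linearly with the number of cores. The clock speeds, the 50% serial fraction, and the overhead constant are all assumptions.

```python
def run_time(cores: int, ghz: float, serial_fraction: float,
             overhead_per_core: float) -> float:
    """Toy model: Amdahl's law scaled by clock speed, plus a coordination
    cost that grows linearly with the number of additional cores."""
    compute = (serial_fraction + (1.0 - serial_fraction) / cores) / ghz
    coordination = overhead_per_core * (cores - 1)
    return compute + coordination


machines = {
    "4-core @ 4.7 GHz": (4, 4.7),
    "32-core @ 2.5 GHz": (32, 2.5),
    "64-core @ 2.5 GHz": (64, 2.5),
}

# An application that scales poorly: half of its work is serial.
times = {name: run_time(n, ghz, 0.5, 0.002)
         for name, (n, ghz) in machines.items()}

# Print machines from fastest to slowest (lower time is better).
for name in sorted(times, key=times.get):
    print(f"{name}: {times[name]:.3f}")
```

Both many-core machines get the same clock here on purpose, so the gap between the 32-core and 64-core results comes entirely from the coordination term, while the 4-core machine wins on clock speed plus poor scaling.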
So, if you run your Mathematica code with 1 to 8 threads, do you get a linear speedup for the first 4 threads? What is the performance gain from turning hyper-threading on or off? If it is a 5% to 8% increase, then a 4-core machine with hyper-threading will probably beat an 8-core machine. In that case, just order an Intel 7700K chip and run it at 4.7 GHz. With fast RAM, that machine costs you about 750 USD (or 700 EUR), and you can order it on Amazon (or Newegg).
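Both questions can be answered from a handful of timings. A minimal Python sketch: the `timings` dict is made-up example data (wall-clock seconds for the same job run with 1, 2, 4 and 8 threads on a hypothetical 4-core/8-thread CPU), and the 90% efficiency threshold for “close to linear” is an arbitrary assumption; replace both with your own choices.

```python
def speedups(times_by_threads: dict[int, float]) -> dict[int, float]:
    """Speedup relative to the single-thread run: T(1) / T(n)."""
    t1 = times_by_threads[1]
    return {n: t1 / t for n, t in sorted(times_by_threads.items())}


def is_near_linear(times_by_threads: dict[int, float], up_to: int,
                   min_efficiency: float = 0.9) -> bool:
    """True if speedup / threads stays above `min_efficiency` up to `up_to`."""
    return all(s / n >= min_efficiency
               for n, s in speedups(times_by_threads).items() if n <= up_to)


def hyperthreading_gain(times_by_threads: dict[int, float],
                        physical_cores: int) -> float:
    """Extra throughput from 2 threads per core instead of 1 (0.05 = 5%)."""
    return (times_by_threads[physical_cores]
            / times_by_threads[2 * physical_cores]) - 1.0


# Hypothetical measurements in seconds (not real Mathematica numbers).
timings = {1: 100.0, 2: 50.5, 4: 25.6, 8: 24.0}

for n, s in speedups(timings).items():
    print(f"{n} threads: speedup {s:.2f}, efficiency {s / n:.0%}")
print("near-linear up to 4 threads:", is_near_linear(timings, 4))
print(f"hyper-threading gain: {hyperthreading_gain(timings, 4):.1%}")
```

With these example numbers the hyper-threading gain comes out around 6.7%, i.e. in the 5% to 8% band where the 4-core argument applies.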
How big is your data set? I assume it is fairly small, say 2 to 10 GB? In performance terms, if your data set is that large, 10 GB can be read in about 1 second (roughly 10 GB/s).
It would also make the thread clearer if you said what the speed benefits are to you. I have to guess it is faster development and debugging turnaround, since if you made more money from speed, you would not still be using old hardware. (Also, what are your current memory speed and exact CPU? Then it would be clearer how much speedup you would get from a 7700K chip.)
How much faster is Mathematica on Windows than on Linux? If the speed is about the same, Linux has a big advantage in the freedom it gives you over the kernel and OS configuration (just compare booting Windows from a RAM disk with doing the same on Linux).
It might be worthwhile to create your own custom language, i.e. a domain-specific language (DSL), for your problem. You then compile that code to Mathematica. If you need more speed later, you compile your DSL to a CUDA / GPU DSL (or C code). You just write a different compiler for each target (or, better, get someone to write those). This has potential for several reasons; for example, you can always run your software on Mathematica and compare results in case your heavily optimized code has a bug.
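A minimal sketch of the “one DSL, several compilers” idea, in Python. The tiny expression language (`Num`, `Var`, `Add`, `Mul`, `Pow`) and the functions `to_wolfram`, `to_c` and `evaluate` are hypothetical names for illustration; a real DSL would need far more (functions, arrays, loops). Each back end walks the same syntax tree and emits source text for its target, and the reference evaluator plays the role of the trusted implementation you compare results against.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: float


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Pow:
    base: "Expr"
    exponent: "Expr"


Expr = Union[Num, Var, Add, Mul, Pow]


def to_wolfram(e: Expr) -> str:
    """Back end 1: emit Wolfram Language (Mathematica) infix syntax."""
    if isinstance(e, Num):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Add):
        return f"({to_wolfram(e.left)} + {to_wolfram(e.right)})"
    if isinstance(e, Mul):
        return f"({to_wolfram(e.left)} * {to_wolfram(e.right)})"
    if isinstance(e, Pow):
        return f"({to_wolfram(e.base)}^{to_wolfram(e.exponent)})"
    raise TypeError(f"unknown node: {e!r}")


def to_c(e: Expr) -> str:
    """Back end 2: emit a C expression (powers go through pow from math.h)."""
    if isinstance(e, Num):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Add):
        return f"({to_c(e.left)} + {to_c(e.right)})"
    if isinstance(e, Mul):
        return f"({to_c(e.left)} * {to_c(e.right)})"
    if isinstance(e, Pow):
        return f"pow({to_c(e.base)}, {to_c(e.exponent)})"
    raise TypeError(f"unknown node: {e!r}")


def evaluate(e: Expr, env: dict[str, float]) -> float:
    """Reference interpreter: the trusted baseline for cross-checking."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Add):
        return evaluate(e.left, env) + evaluate(e.right, env)
    if isinstance(e, Mul):
        return evaluate(e.left, env) * evaluate(e.right, env)
    if isinstance(e, Pow):
        return evaluate(e.base, env) ** evaluate(e.exponent, env)
    raise TypeError(f"unknown node: {e!r}")


# (x + 2) * y^3, compiled to two targets from one syntax tree.
expr = Mul(Add(Var("x"), Num(2)), Pow(Var("y"), Num(3)))
print(to_wolfram(expr))                  # ((x + 2) * (y^3))
print(to_c(expr))                        # ((x + 2) * pow(y, 3))
print(evaluate(expr, {"x": 1, "y": 2}))  # 24
```

Adding a CUDA back end would be one more function with the same shape; the syntax tree and the reference evaluator stay unchanged.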