
 
kmiklas
Topic Author
Posts: 14
Joined: July 8th, 2016, 2:34 pm

Usain: A Low-Latency Market Data Generation Program

August 6th, 2016, 1:18 am

Dear All,

Given the high fees for "non-display use" of tick data, I have written a program that generates ticks and published it as open source. "Usain" lets the user generate ticks at their chosen speed... and it is FAST!

This lets one build algorithms without paying an arm and a leg to the exchanges.

Please take a look and tell me what you think. Be gentle... it's the first release. If any of you would like to contribute, please message me. This is going to be good!

Thank you
Keith

Github link:
https://github.com/kmiklas/usain

 
User avatar
Alan
Posts: 9868
Joined: December 19th, 2001, 4:01 am
Location: California
Contact:

Re: Usain: A Low-Latency Market Data Generation Program

August 8th, 2016, 2:15 pm

Since no one has responded, perhaps you can elaborate on what an "algorithm builder" would expect to learn from testing against a simulation of a pure random walk. (I looked at your link briefly, and that was my impression of what you are simulating.)

I suppose it depends on the purpose of the algorithm. If the purpose is to "beat the market" in some sense, I don't see the point unless the simulated data captures some useful "non-random-walk" features of the real data. If the purpose is to simply spread out some large order passively during the day, well maybe you can learn some things about the distribution of outcomes. So, I think you need to pair what you are doing with a real application that demonstrates ... something. 

My two cents.
 
User avatar
Traden4Alpha
Posts: 23951
Joined: September 20th, 2002, 8:30 pm

Re: Usain: A Low-Latency Market Data Generation Program

August 8th, 2016, 3:31 pm

Actually, a pure random walk data source for "beat the market" applications can help to avoid logical or statistical flaws in at least three ways. First, if the "algorithm" consistently beats the market on random-walk data, something is wrong (e.g., data about the future is leaking into decisions). Second, one can use random data to set statistical thresholds that avoid overfitting. Third, one can use statistical outcomes from random data (e.g., the distribution of drawdowns) to characterize some of the long-term risks of a method.
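
To make the first point concrete, here is a minimal C++ sketch (my own illustration, not part of Usain). toy_signal is a hypothetical stand-in for whatever strategy is under test; on zero-drift random-walk data its mean P&L should hover near zero, so a backtest that reports consistent profits here almost certainly has a flaw such as look-ahead bias:

// sanity_check.cpp -- run a toy strategy on pure random-walk prices (illustrative only).
// If mean P&L across many simulated paths is consistently positive, suspect
// look-ahead bias or another flaw in the backtest, not genuine edge.
#include <iostream>
#include <random>
#include <vector>

// Hypothetical stand-in strategy: +1 = long for the next step, -1 = short.
int toy_signal(const std::vector<double>& px, int t) {
    if (t < 2) return 0;
    return px[t - 1] > px[t - 2] ? 1 : -1;   // naive momentum rule
}

int main() {
    std::mt19937_64 rng(42);
    std::normal_distribution<double> step(0.0, 0.01);  // zero-drift random walk

    const int paths = 1000, n = 10000;
    double total_pnl = 0.0;
    for (int p = 0; p < paths; ++p) {
        std::vector<double> px(n);
        px[0] = 100.0;
        for (int t = 1; t < n; ++t) px[t] = px[t - 1] + step(rng);

        double pnl = 0.0;
        for (int t = 2; t < n; ++t)
            pnl += toy_signal(px, t) * (px[t] - px[t - 1]);  // signal uses only past prices
        total_pnl += pnl;
    }
    std::cout << "mean P&L per path on random-walk data: "
              << total_pnl / paths << '\n';  // should hover near zero
    return 0;
}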
 
User avatar
Alan
Posts: 9868
Joined: December 19th, 2001, 4:01 am
Location: California
Contact:

Re: Usain: A Low-Latency Market Data Generation Program

August 8th, 2016, 3:58 pm

Fair enough -- I would still like to see an actual application of any of the above ...
 
kmiklas
Topic Author
Posts: 14
Joined: July 8th, 2016, 2:34 pm

Re: Usain: A Low-Latency Market Data Generation Program

August 9th, 2016, 2:11 am

Thanks for responding. Often I feel like I'm not being heard on these forums.
Here are three reasons why Usain is extremely useful.

1. Performance and S P E E D. Blazing speed! This program can generate a tick roughly every 2 ns (two nanoseconds)--two billionths of a second--locally on my $1000 rig. It enables a programmer to work, build, develop, and gain experience in an ultra-low-latency environment--at sanctum sanctorum speeds. Just to give you a point of reference:

a. Interactive Brokers tick time (U.S. equities): 250 ms (four ticks per second).
b. oanda.com tick time (FOREX): 50 ms (twenty ticks per second).
c. Interactive Brokers tick time (FOREX): 5 ms (200 ticks per second).
d. iqfeed.com tick time: 1 ms (1,000 ticks per second).
e. Big brokerage tick times (Morgan Stanley, BNY Mellon, JPM, etc.): 1 µs (one microsecond; 1,000,000 ticks per second).
f. HFT hedge funds and other monsters with exchange-colocated equipment: 500 ns tick time (2,000,000 ticks per second).
g. Usain: 2 ns tick time (500,000,000 ticks per second)... perhaps even picosecond tick times (1,000,000,000,000 ticks per second) on heavy hardware.

2. Non-Display Fees and Regulations. How market data is used dramatically affects subscriber fees. For simple "display" usage--such as piping data into NinjaTrader or the like--NYSE charges $16 per non-pro subscriber, $70 for a pro. If the data are consumed through an API and not displayed on a screen (hence the term "non-display fees"), the charge jumps to $20,000. Furthermore, each exchange charges separately, so it can really add up. NYSE reference follows. Usain allows a programmer to develop low-latency algorithms while staying in compliance with exchange regulations and avoiding these astronomical fees.
http://www.nyxdata.com/doc/241907

NOTE: I believe that these fees are a knee-jerk response to the crash of 2008, where HFT was made the scapegoat. The intent is to create a paywall that keeps out all but the wealthiest. IMO that blame is misplaced; the crash of 2008 had other root causes.

3. Incredible Simulations. Allow the user to specify the price process and its parameters as inputs--for example, Geometric Brownian Motion (GBM) with a jump diffusion to simulate spikes--then test algorithms against the specified patterns to see whether they can identify them in the data (a rough sketch of such a generator follows after this list). I'm planning to build this out after I create the Usain client for data processing, and a couple of other features.

4. More to come! Exciting!
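
To make item 3 concrete (and to show how a ticks-per-second figure like those in item 1 can be measured), here is a minimal, self-contained sketch of a GBM-plus-jump tick generator timed with std::chrono. This is illustrative only--not Usain's actual code--and the parameter values and the jump model are assumptions on my part:

// gbm_jump_ticks.cpp -- illustrative only; NOT Usain's actual implementation.
// Generates last-trade ticks from geometric Brownian motion with Poisson jumps,
// then reports the average generation time per tick.
#include <chrono>
#include <cmath>
#include <iostream>
#include <random>

int main() {
    std::mt19937_64 rng(2016);
    std::normal_distribution<double> gauss(0.0, 1.0);
    std::poisson_distribution<int>   njumps(1e-5);          // expected jumps per tick (assumed)
    std::normal_distribution<double> jump_size(0.0, 0.02);  // log-jump size (assumed)

    const double mu = 0.05, sigma = 0.20;                // annualized drift and volatility
    const double dt = 1.0 / (252.0 * 6.5 * 3600 * 1000); // ~1 ms of trading time per tick
    double s = 100.0;                                    // starting price

    const long n = 10000000;                             // ticks to generate
    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < n; ++i) {
        double dlogS = (mu - 0.5 * sigma * sigma) * dt
                     + sigma * std::sqrt(dt) * gauss(rng);
        for (int j = njumps(rng); j > 0; --j) dlogS += jump_size(rng);  // jump component
        s *= std::exp(dlogS);
    }
    auto t1 = std::chrono::steady_clock::now();

    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    std::cout << "final price " << s << ", " << ns / n << " ns per tick ("
              << n / (ns * 1e-9) << " ticks per second)\n";
    return 0;
}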

Sincerely,
Keith Miklas
Last edited by kmiklas on August 9th, 2016, 2:57 am, edited 1 time in total.
 
User avatar
Traden4Alpha
Posts: 23951
Joined: September 20th, 2002, 8:30 pm

Re: Usain: A Low-Latency Market Data Generation Program

August 9th, 2016, 2:40 am

Speed is nice but isn't this an apples to simulated oranges comparison? $20,000 buys real data (which does not necessarily follow GBM + jump diffusion), and real data is what a programmer will eventually need.

P.S. How do you simulate the timestamp dynamics and the price oscillations between bid and ask?
 
kmiklas
Topic Author
Posts: 14
Joined: July 8th, 2016, 2:34 pm

Re: Usain: A Low-Latency Market Data Generation Program

August 9th, 2016, 2:00 pm

Traden4Alpha wrote:
Speed is nice but isn't this an apples to simulated oranges comparison? $20,000 buys real data (which does not necessarily follow GBM + jump diffusion), and real data is what a programmer will eventually need.
True... and herein lies one of the disruptions that I'm driving for. Can market data be decentralized? These fees really chafe; we need to start innovating. How can this data monopoly be undermined?

Traden4Alpha wrote:
P.S. How do you simulate the timestamp dynamics and the price oscillations between bid and ask?
Good question. I'll have to think about this...
...OK. GREAT question. My first thought is to change (again) how prices are simulated. Instead of generating random (last) ticks, perhaps generate random supply and demand, and let the bid/ask ticks flow along lines of least resistance?
Dang... I just committed v2 of the tick generation scheme, using C++'s high_resolution_clock, and now it looks like I'm going to have to rewrite it again.
Excellent feedback. Thank you. I'll be sure to put in a good word for you with the higher-ups.  ;)
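
To make the supply/demand idea concrete, here is one possible sketch (my own assumption, not a committed design): quote a bid and an ask around a latent mid price that drifts with a random order-flow imbalance, and timestamp each tick with std::chrono::high_resolution_clock as mentioned above:

// bid_ask_sketch.cpp -- one possible way to emit bid/ask ticks; illustrative only.
#include <chrono>
#include <cstdint>
#include <iostream>
#include <random>

struct Tick {
    std::int64_t ts_ns;  // timestamp, ns since the high_resolution_clock epoch
    double bid;
    double ask;
};

int main() {
    std::mt19937_64 rng(7);
    std::normal_distribution<double> imbalance(0.0, 1.0);  // net buy minus sell pressure
    double mid = 12.20, half_spread = 0.005;

    for (int i = 0; i < 10; ++i) {
        // Price drifts toward the side with excess demand ("line of least resistance").
        mid += 0.001 * imbalance(rng);

        Tick t;
        t.ts_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                      std::chrono::high_resolution_clock::now().time_since_epoch())
                      .count();
        t.bid = mid - half_spread;
        t.ask = mid + half_spread;

        std::cout << t.ts_ns << " bid=" << t.bid << " ask=" << t.ask << '\n';
    }
    return 0;
}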
 
User avatar
Traden4Alpha
Posts: 23951
Joined: September 20th, 2002, 8:30 pm

Re: Usain: A Low-Latency Market Data Generation Program

August 9th, 2016, 5:02 pm

Traden4Alpha wrote:
Speed is nice but isn't this an apples to simulated oranges comparison? $20,000 buys real data (which does not necessarily follow GBM + jump diffusion), and real data is what a programmer will eventually need.
kmiklas wrote:
True... and herein lies one of the disruptions that I'm driving for. Can market data be decentralized? These fees really chafe; we need to start innovating. How can this data monopoly be undermined?
Real time market data cannot be separated from the trading itself. Whatever entity handles the matching of buyer to sellers will always have privileged and lower-latency access to data about that matching process (i.e., the raw price event data).

The deeper issue is whether one could decentralize the matching process. From a technology standpoint, the answer is clearly yes -- peer-to-peer buying and selling is the original form of commerce. Obviously, one could create an open source buyer-seller matching engine and let a thousand servers host local matches and publish their data at cost.

Yet such a plan seems doomed to fail due to the natural preferences of the buyers and sellers themselves. A buyer would generally prefer a venue with larger numbers of sellers -- there being a greater chance of finding a motivated seller with a low ask price. And a seller would generally prefer a venue with larger numbers of buyers -- there being a greater chance of finding a motivated buyer with a high bid price. Even if one set up a thousand servers, the buyers and sellers themselves would naturally gravitate to the server with the highest volume of other buyers and sellers, recreating centralization.

Moreover, the better prices obtained by buyers and sellers in a centralized high-volume market probably more than make up for the higher data costs enabled by that centralized monopoly market. That's especially true if most buyers and sellers don't care about tick data, especially API-accessible tick data, so the price of that service is irrelevant to them. In fact, many buyers and sellers might prefer a venue that levies onerous tick data prices because said buyers and sellers want to minimize the amount of high-frequency trading that is getting all the best bid and ask events.
 
kmiklas
Topic Author
Posts: 14
Joined: July 8th, 2016, 2:34 pm

Re: Usain: A Low-Latency Market Data Generation Program

August 9th, 2016, 6:24 pm

Traden4Alpha wrote:
The deeper issue is whether one could decentralize the matching process. From a technology standpoint, the answer is clearly yes -- peer-to-peer buying and selling is the original form of commerce.
Agreed, up to this point...
Obviously, one could create an open source buyer-seller matching engine and let a thousand servers host local matches and publish their data at cost.  
Here's where you lost it. You went back to "centralized" thinking with the whole idea of setting up "a thousand servers." You don't need 1,000 servers. There are billions of devices out there that, together, are more than capable of hosting local matches and publishing data. They will supply your processing power. That's the decentralized model--the "exchange" becomes all the laptops, tablets, phones, smartwatches--whatever is out there.

Example: You have a peer-to-peer trading app named, say, "Flipper," installed on your iPhone. You wish to buy 100 shares of Ford. You open Flipper and bid $12.20. This offer jumps to your neighbor, who has the Flipper agent running on his Mac. No match? Your bid propagates through to a mechanic in his shop who has Flipper running on his Windows laptop and is asking $12.20 for a round lot of Ford. The app connects you two directly. You pay him: over PayPal, in cash, by check, online account, in gold, in time, in trade... whatever. Upon receipt of payment, the shares are transferred to your account. Key point? Your order never touched a centralized server. The "market" becomes the network of people bidding and asking across the Flipper network, and prices move in response to bids and asks. Very organic.

Really, at the heart of the matter is the fundamental need for an exchange: to match buyers and sellers. In today's day and age, we really don't need a big fancy exchange; the same matches can easily--and more cheaply--be accomplished with a distributed agent as described above. Just as open outcry exchanges--a seemingly permanent fixture less than 50 years ago--are now irrelevant, methinks that centralized exchanges will also go the way of the horse and buggy. They're just becoming too top-heavy, too overbearing with their heavy fees... like a big bully.

Do you want to be my first beta tester?
 
User avatar
Traden4Alpha
Posts: 23951
Joined: September 20th, 2002, 8:30 pm

Re: Usain: A Low-Latency Market Data Generation Program

August 9th, 2016, 8:38 pm

kmiklas wrote:
The "market" becomes the network of people bidding and asking across the Flipper network, and prices move in response to bids and asks. Very organic.
Do you want to be my first beta tester?
LOL! As I said, the technology is certainly possible and could work as you outlined it. And it might certainly reduce the data fees, but how does it guarantee the best price in the least amount of time?

In any given second, there are only about 15 round lots of Ford being traded in the world. How does my $12.20 bid order find one of those 15 trades somewhere in the entire world? How many hops does it take for my $12.20 bid order to find that "mechanic in his shop"? How many intermediate nodes have to take my order, compute whether there's a match, and either compute the transaction or compute the routing of the next hop for my order? What is the latency of each hop? And how do I know that the "mechanic in his shop" is the best price I could get?

What you propose would consume far more bandwidth and CPU resources, have far worse latency, and provide worse prices than a centralized system. If you don't believe me, start simulating what happens if you try to scale this kind of network. (Oh, and don't forget all the wasted effort when two or more orders get routed to the same matching end-point, but only one order gets the transaction and any other orders must go back to bouncing around the network. It's not like P2P file sharing in which multiple requests for the same file can be satisfied by one node.)
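
Taking up the "start simulating" suggestion, here is a toy sketch (my own illustration, not anyone's committed design) of the search cost in a Flipper-style network in which each order is forwarded to one random peer per hop until it happens to reach one of the few nodes holding a matchable ask:

// p2p_hops.cpp -- toy estimate of search cost in a random-forwarding P2P market.
// Each order hops to one random node at a time until it lands on a node with a match.
#include <iostream>
#include <random>

int main() {
    std::mt19937_64 rng(99);
    const int nodes = 1000000;   // devices running the hypothetical "Flipper" app
    const int sellers = 15;      // nodes currently asking at a matchable price
    std::uniform_int_distribution<int> pick(0, nodes - 1);

    const int trials = 10000;
    long long total_hops = 0;
    for (int t = 0; t < trials; ++t) {
        long long hops = 0;
        while (true) {
            ++hops;
            if (pick(rng) < sellers) break;  // this hop happened to reach a matching node
        }
        total_hops += hops;
    }
    std::cout << "average hops to find a match: "
              << static_cast<double>(total_hops) / trials << '\n';  // roughly nodes / sellers
    return 0;
}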

Aggregation provides cost efficiency, latency minimization, and deeper liquidity -- every buyer and every seller can send their order directly to one location which can match and sort by price far more efficiently than pairwise P2P could.

(P.S. The more interesting issue is whether the operators of intermediate "Flipper" nodes can extract profits from the order flow at the expense of the end-point nodes. If a node receives a $12.20 bid for Ford and happens to know of a $12.15 ask for Ford, can it take the $0.05 difference as profit (buying the $12.15 ask and selling at the $12.20 bid) rather than letting the buyer and seller directly connect to share the benefits of the midpoint price? What stops a well-endowed node from front-running other nodes' orders?)
 
kmiklas
Topic Author
Posts: 14
Joined: July 8th, 2016, 2:34 pm

Re: Usain: A Low-Latency Market Data Generation Program

August 11th, 2016, 1:54 am

Traden4Alpha wrote:
In any given second, there are only about 15 round lots of Ford being traded in the world. How does my $12.20 bid order find one of those 15 trades somewhere in the entire world?
Excellent points; so good, in fact, that I'm going to have to change my architecture.
Perhaps a peer-to-peer pub/sub-type model might work better? The first time you search for "Ford," you basically enter a Flipper VPN made up of all those interested in Ford. There, all Ford data would be aggregated, latency would be minimized, and you'd get that deep liquidity. Basically (get this) an open outcry virtual chat room!

Traden4Alpha wrote:
How many hops does it take for my $12.20 bid order to find that "mechanic in his shop"?
In the subscription model, the mechanic would be in the Ford VPN. At first, when you enter it, it would require a bit of setup time, but then there is no cost; you're in the Flipper VPN for Ford.

Traden4Alpha wrote:
How many intermediate nodes have to take my order, compute whether there's a match, and either compute the transaction or compute the routing of the next hop for my order? What is the latency of each hop? And how do I know that the "mechanic in his shop" is the best price I could get?
In the subscription VPN model, you'd be on a VPN with all parties interested in Ford--sort of like a chat room, where you'd have immediate access to all parties. Thus: zero intermediate nodes, no hops, latency at network speed, and the best posted price in the VPN would be the best price in the entire peer-to-peer exchange.

Traden4Alpha wrote:
(P.S. The more interesting issue is whether the operators of intermediate "Flipper" nodes can extract profits from the order flow at the expense of the end-point nodes. ... What stops a well-endowed node from front-running other nodes' orders?)
The peer-to-peer VPN model would prevent this, as the most current price would be posted in the virtual group and immediately broadcast to all parties.
In this model it really just becomes a routing problem: connecting a group of network addresses with shared information. Each IP address is basically a little decentralized exchange.
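
For what it's worth, here is a minimal in-process sketch of the symbol-keyed pub/sub idea described above; the names and structure are my own assumptions, not a committed Flipper design. Subscribers join a symbol's group, and every new bid or ask is broadcast to all members (in a real system the handlers would be network peers rather than local callbacks):

// symbol_pubsub.cpp -- toy, in-process illustration of a per-symbol pub/sub group.
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Quote { std::string symbol; char side; double price; int size; };  // side: 'B' or 'A'

class SymbolBus {
public:
    using Handler = std::function<void(const Quote&)>;

    // Joining a symbol's group is the "enter the Ford VPN" step.
    void subscribe(const std::string& symbol, Handler h) {
        groups_[symbol].push_back(std::move(h));
    }

    // Every new bid/ask is broadcast to all members of that symbol's group.
    void publish(const Quote& q) {
        for (auto& h : groups_[q.symbol]) h(q);
    }

private:
    std::map<std::string, std::vector<Handler>> groups_;
};

int main() {
    SymbolBus bus;
    bus.subscribe("F", [](const Quote& q) {
        std::cout << "peer 1 sees " << q.side << ' ' << q.size << " @ " << q.price << '\n';
    });
    bus.subscribe("F", [](const Quote& q) {
        std::cout << "peer 2 sees " << q.side << ' ' << q.size << " @ " << q.price << '\n';
    });

    bus.publish({"F", 'B', 12.20, 100});  // the $12.20 bid from the example
    bus.publish({"F", 'A', 12.20, 100});  // the mechanic's matching ask
    return 0;
}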
Again, thank you for your interest. Sorry to be switching up the architecture, but your questions did flush out some very real issues. Somewhere I read, "As iron sharpens iron, so one man sharpens another."
Sincerely, Keith
 
User avatar
Paul
Posts: 9774
Joined: July 20th, 2001, 3:28 pm

Re: Usain: A Low-Latency Market Data Generation Program

August 11th, 2016, 2:03 am

No one has ever regretted taking Traden4Alpha's advice!

P
 
User avatar
dd3
Posts: 246
Joined: June 8th, 2010, 9:02 am

Re: Usain: A Low-Latency Market Data Generation Program

August 22nd, 2016, 9:07 pm

Is rand() good enough? Your code is also a bit of a mess. Check out boost::program_options for better handling of command-line args, and <random> for better random number generators.

I tried something similar but used a Brownian bridge (there is VB code somewhere on this site) to create 'scenarios'; unfortunately I've lost the code to a dead SSD.
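
Picking up both suggestions, here is a minimal sketch (my own, assuming that 'scenarios' means intraday paths pinned to known start and end prices): a Brownian bridge sampled with <random> instead of rand():

// brownian_bridge.cpp -- sketch of dd3's suggestion: <random> instead of rand(),
// and a Brownian bridge that pins a simulated path to known start and end prices.
#include <cmath>
#include <iostream>
#include <random>
#include <vector>

// Build a path of n+1 points from s0 to s1 with volatility sigma over the whole interval.
std::vector<double> brownian_bridge(double s0, double s1, double sigma,
                                    int n, std::mt19937_64& rng) {
    std::normal_distribution<double> gauss(0.0, 1.0);
    std::vector<double> w(n + 1);
    w[0] = 0.0;
    double dt = 1.0 / n;
    // First generate an ordinary Brownian path W_t ...
    for (int i = 1; i <= n; ++i)
        w[i] = w[i - 1] + std::sqrt(dt) * gauss(rng);
    // ... then pin it: B_t = W_t - t * W_1, plus the linear trend from s0 to s1.
    std::vector<double> path(n + 1);
    for (int i = 0; i <= n; ++i) {
        double t = static_cast<double>(i) / n;
        path[i] = s0 + (s1 - s0) * t + sigma * (w[i] - t * w[n]);
    }
    return path;
}

int main() {
    std::mt19937_64 rng(2016);
    auto path = brownian_bridge(100.0, 101.5, 0.8, 390, rng);  // e.g., one point per minute
    for (double p : path) std::cout << p << '\n';
    return 0;
}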