Serving the Quantitative Finance Community

 
lbasse
Topic Author
Posts: 1
Joined: April 8th, 2021, 4:12 pm

Python: socket realtime prices and Pandas dataframes

April 8th, 2021, 4:48 pm

Hi everyone,
I have a Python script that uses the API of a broker platform I use. It connects over localhost to the platform software via a socket, and with a socket.recv() call I receive all the requested price-feed information (time, symbol, bid, ask, last, last quantity, volume, etc.).
Now I want to analyze these records, so I append them to one big Pandas DataFrame called 'priceAll':
....
.... 
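while True:
    msg = mysock.recv(16384)           # raw bytes from the platform socket
    msg_stringa = str(msg, 'utf-8')    # decode the bytes to text

    # Parse the ';'-separated records into up to 33 unnamed columns
    read_df = pd.read_csv(StringIO(msg_stringa), sep=";", error_bad_lines=False,
                          index_col=None, header=None,
                          engine='c', names=range(33),
                          decimal='.')
    # Note: priceDF is not defined in this excerpt; presumably it is built
    # from read_df in the code that was left out.
    priceAll = priceAll.append(priceDF, ignore_index=True).copy()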
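If the connection is a TCP stream socket, recv() returns whatever bytes have arrived so far, so a chunk can end in the middle of a record. A minimal sketch of one common way to handle that, keeping a byte buffer and parsing only complete lines, is below. The helper name `split_complete_lines`, the newline-terminated record format, and the sample records are assumptions, not part of the original script; a local socket pair stands in for the broker platform.

```python
import socket


def split_complete_lines(buffer: bytes, chunk: bytes):
    """Append chunk to buffer; return (complete_lines, leftover_bytes).

    Assumes records are newline-terminated (an assumption about the feed).
    """
    buffer += chunk
    # Everything before the last newline is complete; the tail may be partial.
    *lines, leftover = buffer.split(b"\n")
    return [ln.decode("utf-8").strip() for ln in lines if ln.strip()], leftover


# Demo: a local socket pair stands in for the platform connection.
sender, receiver = socket.socketpair()
sender.sendall(b"09:30:00;ABC;10.0;10.1\n09:30:01;AB")  # second record is cut off
sender.sendall(b"C;10.1;10.2\n")                        # ...and completed here
sender.close()

buffer = b""
records = []
while True:
    chunk = receiver.recv(16384)
    if not chunk:  # empty bytes: the peer closed the connection
        break
    lines, buffer = split_complete_lines(buffer, chunk)
    records.extend(lines)
receiver.close()
```

Only the complete lines in `records` would then be passed to pd.read_csv; the leftover bytes wait in `buffer` for the next recv() call.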
I then read and analyze 'priceAll', making local copies inside other functions, e.g.:
- summing all quantities per symbol and price, to get the volume at each specific price
- summing all quantities per second, to get second-based volumes
- etc.
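The two aggregations listed above can be sketched with groupby as follows. The column names ('time', 'symbol', 'last', 'last_qty') and the sample ticks are made up for illustration; the real column layout of 'priceAll' is not shown in the post.

```python
import pandas as pd

# Hypothetical column names and made-up ticks; the real feed layout is not shown.
ticks = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-04-08 16:00:00.100", "2021-04-08 16:00:00.700",
        "2021-04-08 16:00:01.200", "2021-04-08 16:00:01.900",
    ]),
    "symbol": ["ABC", "ABC", "ABC", "XYZ"],
    "last": [10.0, 10.0, 10.5, 20.0],
    "last_qty": [100, 50, 30, 10],
})

# Volume traded at each price, per symbol.
volume_by_price = ticks.groupby(["symbol", "last"])["last_qty"].sum()

# Volume per second, per symbol: floor each timestamp to the whole second first.
second = ticks["time"].dt.floor("s")
volume_by_second = ticks.groupby(["symbol", second])["last_qty"].sum()
```

Both results are Series with a two-level index, so a single value can be looked up with, for example, `volume_by_price.loc[("ABC", 10.0)]`.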

This whole workflow works quite well (each loop takes approx. 100-200 ms, and I almost never encounter missed values or reading problems), and the DataFrame 'priceAll' grows to approx. 4000-5000 rows by 8 columns. Rows older than 10 minutes are automatically dropped with this:
priceAll = priceAll[(now - priceAll['time']).astype('timedelta64[s]') < 600].copy()
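A possible alternative for the pruning step, sketched under the assumption that the 'time' column holds pandas timestamps: compare the column directly against a cutoff timestamp. This keeps the same rows as the line above (those less than 600 seconds old) without converting the difference through astype('timedelta64[s]'), whose result type is not the same across pandas versions. The sample data and the fixed `now` are made up.

```python
import pandas as pd

# Fixed "now" and made-up rows so the example is reproducible.
now = pd.Timestamp("2021-04-08 16:48:00")
priceAll = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-04-08 16:30:00",   # 18 minutes old: dropped
        "2021-04-08 16:40:00",   # 8 minutes old: kept
        "2021-04-08 16:47:30",   # 30 seconds old: kept
    ]),
    "last": [10.0, 10.2, 10.1],
})

# Keep only rows newer than the 10-minute cutoff.
cutoff = now - pd.Timedelta(minutes=10)
priceAll = priceAll[priceAll["time"] > cutoff].reset_index(drop=True)
```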

MY QUESTION: Is there another workflow more suitable for this purpose using a Pandas DataFrame? I am aware that DataFrames are not exactly the best choice for realtime and "append" tasks, but I cannot find a better solution that is as fast and as simple for handling tabular data (summations, averages, groupbys, etc.).

Thanks in advance!
 
Alan
Posts: 10564
Joined: December 19th, 2001, 4:01 am
Location: California

Re: Python: socket realtime prices and Pandas dataframes

April 9th, 2021, 2:43 pm

Interesting question. I am a Python novice, so I just have a novice suggestion. It sounds like you have seen this advice: better to use a list, because a list append is O(1) (amortized), while a df append is O(len(df)). I would add that you can do sums and averages by simple updating rules, so you don't need a df for those.
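A minimal sketch of this suggestion, assuming ticks arrive in time order: rows go into a deque (cheap appends on the right, cheap removals on the left), a running per-symbol volume is updated on each tick, and a DataFrame is built only when a tabular view is needed. The function name `on_tick`, the field names, and the sample ticks are hypothetical.

```python
from collections import defaultdict, deque

import pandas as pd

rows = deque()                       # one tuple per tick, oldest on the left
volume_by_symbol = defaultdict(int)  # running total over the current window


def on_tick(time, symbol, last, qty, window=pd.Timedelta(minutes=10)):
    """Store one tick, update the running total, and evict expired ticks."""
    rows.append((time, symbol, last, qty))
    volume_by_symbol[symbol] += qty
    # Evict ticks that are `window` old or older, undoing their contribution.
    while rows and time - rows[0][0] >= window:
        _, old_symbol, _, old_qty = rows.popleft()
        volume_by_symbol[old_symbol] -= old_qty


t0 = pd.Timestamp("2021-04-08 16:00:00")
on_tick(t0, "ABC", 10.0, 100)
on_tick(t0 + pd.Timedelta(minutes=5), "ABC", 10.1, 50)
on_tick(t0 + pd.Timedelta(minutes=11), "XYZ", 20.0, 10)  # evicts the first tick

# Build a DataFrame only when groupbys or other tabular operations are needed.
snapshot = pd.DataFrame(list(rows), columns=["time", "symbol", "last", "qty"])
```

The running totals cost O(1) per tick, while the DataFrame is rebuilt from the deque only as often as the analysis actually requires it.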
 
Cuchulainn
Posts: 64298
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster

Re: Python: socket realtime prices and Pandas dataframes

April 9th, 2021, 3:50 pm

Are you using connection-oriented (TCP) or connectionless (UDP) sockets? Feels a bit klingon.
'realtime' is a relative concept.

Guess: what about asynchronous programming (futures)?

https://docs.python.org/3/library/asyncio.html
"Compatibility means deliberately repeating other people's mistakes."
David Wheeler

http://www.datasimfinancial.com
http://www.datasim.nl
 
Cuchulainn
Posts: 64298
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster

Re: Python: socket realtime prices and Pandas dataframes

April 9th, 2021, 4:23 pm

.... and maybe coroutines ..

https://docs.python.org/3/library/async ... #coroutine

That's the kind of approach I would take.

It means that you can be updating a DataFrame while (non-blocked) waiting on the next incoming data packet.
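A rough sketch of the coroutine idea using asyncio streams: one coroutine reads records from the socket, and a second one processes them while the first is waiting for more data. Everything specific here is an assumption for illustration: the newline-terminated record format, the function names, and the `fake_platform` server, which is a local stand-in so the example can run without the broker software.

```python
import asyncio


async def read_feed(host, port, queue):
    """Read newline-terminated records and hand them to the consumer."""
    reader, writer = await asyncio.open_connection(host, port)
    while True:
        line = await reader.readline()  # yields control while waiting for data
        if not line:                    # EOF: the server closed the connection
            break
        await queue.put(line.decode("utf-8").strip())
    await queue.put(None)               # sentinel: no more data
    writer.close()
    await writer.wait_closed()


async def consume(queue, out):
    """Stand-in for the analysis step (e.g. updating a DataFrame)."""
    while True:
        record = await queue.get()
        if record is None:
            break
        out.append(record.split(";"))


async def fake_platform(reader, writer):
    """Local stand-in for the broker platform; sends two made-up records."""
    writer.write(b"09:30:00;ABC;10.0\n09:30:01;ABC;10.1\n")
    await writer.drain()
    writer.close()


async def main():
    # Port 0 lets the OS pick a free port for the stand-in server.
    server = await asyncio.start_server(fake_platform, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    queue, parsed = asyncio.Queue(), []
    try:
        await asyncio.gather(
            read_feed("127.0.0.1", port, queue),
            consume(queue, parsed),
        )
    finally:
        server.close()
    return parsed


parsed = asyncio.run(main())
```

Note that asyncio overlaps waiting on I/O with other work in a single thread; a long CPU-bound DataFrame operation inside `consume` would still hold up the reader until it finishes.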
 
katastrofa
Posts: 10006
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: Python: socket realtime prices and Pandas dataframes

Yesterday, 10:54 pm

Concatenation should be 10x faster: https://pandas.pydata.org/pandas-docs/s ... ppend.html
RTFM ;-D
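For reference, a minimal sketch of the concatenation approach: collect the per-message frames in a plain list and call pd.concat once, instead of calling DataFrame.append on every loop. DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so pd.concat is also the forward-compatible choice. The chunk contents and column names below are made up.

```python
import pandas as pd

# Three made-up chunks standing in for the frames parsed from each recv() call.
chunks = [
    pd.DataFrame({"symbol": ["ABC"], "last": [10.0], "qty": [100]}),
    pd.DataFrame({"symbol": ["ABC", "XYZ"], "last": [10.1, 20.0], "qty": [50, 10]}),
    pd.DataFrame({"symbol": ["XYZ"], "last": [20.1], "qty": [5]}),
]

# Concatenate once, when the combined table is actually needed.
priceAll = pd.concat(chunks, ignore_index=True)
```

With ignore_index=True the result gets a fresh 0..n-1 index, matching the ignore_index=True used with append in the original post.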
 
Cuchulainn
Posts: 64298
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster

Re: Python: socket realtime prices and Pandas dataframes

Today, 8:40 am

katastrofa wrote:
    Concatenation should be 10x faster: https://pandas.pydata.org/pandas-docs/s ... ppend.html
    RTFM ;-D

Does this solve everything? Seems too good to be true.

In short, if append is the answer, what is the question? I missed the latter.