First of all, thank you very much for all your replies and interest in my question; very much appreciated.
Let me add some information about the overall workflow, to better frame my original question:
1) I 'connect' to a localhost server, so I believe it's not a matter of TCP vs. UDP, is it? Then I use socket.send(cmd) to request specific outputs for a list of stock symbols:
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('localhost', 12001))
cmd = (cmd_type + " " + symbs_list + "\n").encode()  # bytes
mysock.sendall(cmd)  # sendall() keeps sending until the whole buffer is out; plain send() may send only part of it
2) After this, inside a 'while True' loop, I keep reading incoming data. The good thing is that the server sends a byte array terminated with '\n\n', which lets my client-side socket code know when 'that' message is finished.
I collect 'that' message into a DF by converting it to a string and parsing the string into columns and rows: columns are comma-separated, while rows are '\n'-separated. The StringIO class (from the io module) helps with that.
After that, I append this DF to a global DF:
....
....
while True:
    if flag_start == False:
        break
    flag_running = True
    msg = mysock.recv(16384)
    msg_stringa = str(msg, 'utf-8')
    read_df = pd.read_csv(StringIO(msg_stringa), sep=";",
                          error_bad_lines=False,  # renamed to on_bad_lines='skip' in pandas >= 1.3
                          index_col=None, header=None,
                          engine='c', names=range(33),
                          decimal='.')
    ....
    ....
    priceAll = priceAll.append(priceDF, ignore_index=True).copy()  # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    ...
    #function_child1(priceAll)
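One caveat on the recv() step: since TCP is a byte stream, a single recv() is not guaranteed to return exactly one complete server message. A minimal sketch of framing on the '\n\n' terminator described above (the function name and the buffer are my own illustration, not part of the original code):

```python
import socket

def read_message(sock: socket.socket, buffer: bytearray) -> str:
    """Accumulate bytes until the server's '\n\n' terminator, then
    return one complete message; leftover bytes stay in `buffer`."""
    while b"\n\n" not in buffer:
        chunk = sock.recv(16384)
        if not chunk:  # server closed the connection
            raise ConnectionError("socket closed by peer")
        buffer.extend(chunk)
    # split off the first complete message; keep the start of the next one
    msg, _, rest = bytes(buffer).partition(b"\n\n")
    buffer[:] = rest
    return msg.decode("utf-8")
```

The returned string can then be fed to StringIO/read_csv exactly as in the loop above.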
3) Now the important part: using the 'priceAll' DF, my tasks are:
- analyze traded volumes by symbol and timestamp (my timestamps are second-based, not milliseconds)
- find big trades (e.g. >10'000 shares/second)
- find specific traded volumes (e.g. 199999 shares, or 111111, or 22222, etc.) per symbol, per second
I successfully complete these using the '.groupby' and '.isin' methods, or simply column-value comparisons, e.g.:
DF1 = priceAll.groupby(['time','symb'], as_index=False).agg({'qty':'sum', 'price':'last'}).copy()
...
DF2 = priceAll.loc[priceAll['qty'].isin(array_of_odd_volume_qtys)].copy()
..
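To make the big-trade screen (the >10'000 shares/second task) concrete, here is a small self-contained sketch; the sample rows and numbers are made up purely to mirror the 'time'/'symb'/'qty'/'price' layout used above:

```python
import pandas as pd

# hypothetical sample ticks mirroring the priceAll layout
priceAll = pd.DataFrame({
    'time':  ['09:30:01', '09:30:01', '09:30:01', '09:30:02'],
    'symb':  ['AAPL', 'AAPL', 'MSFT', 'AAPL'],
    'qty':   [8000, 5000, 300, 100],
    'price': [180.1, 180.2, 410.0, 180.3],
})

# aggregate traded volume per symbol per second (same shape as DF1 above)
DF1 = priceAll.groupby(['time', 'symb'], as_index=False).agg(
    {'qty': 'sum', 'price': 'last'})

# big trades: more than 10'000 shares traded in one second for one symbol
big_trades = DF1.loc[DF1['qty'] > 10_000]
```

Here AAPL trades 8000 + 5000 = 13000 shares in second 09:30:01, so only that row survives the filter.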
4) Print these result DFs to an Excel file in 'realtime' (I know it's a relative concept, right, @Cuchulainn!), meaning that during my trading session I check this file as an ongoing stream of live data, not as an 'ex-post' file to analyze.
For this purpose I use the 'xlwings' Python library, which is quite good: via a COM channel it can update Excel cells while the file is open:
....
xw.Book(file_path).sheets['output_sheet'].range('A1').value = output_D1
...
Since the DF is sorted by time, most recent on top, my Excel table is updated constantly, showing the most recent records at the top of the list and scrolling the previous results down, like a sort of 'Time & Sales' scroller.
Getting back to your replies:
@Cuchulainn: I do not really need an asynchronous or concurrent function to analyze the data (do I?), because the flow is strictly sequential:
- it gets the data of time0
-- it analyzes it
- it gets the data of time1
-- it analyzes it
... and so on ...
I had tried that, but it would not really make things better: if I do not receive data between Time0 and Time1, it does not make sense to run the analysis, as there is no new data to analyze. The 'analyze' step takes about 100-200 ms (which I accept), and then the code goes back to 'socket.recv()' and receives the new data. I am quite sure I would not miss any data while the script is analyzing the previous batch; if the 'analyze' phase took longer, I would only get a lag on the next data to be received, but not miss any. Am I right?
The output I get is used as a screener for manual trading, not algo-trading, so a <500 ms lag is quite acceptable.
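As far as I understand, data arriving while the script is busy in the 'analyze' phase simply queues up in the OS receive buffer, so it is delayed rather than lost (unless that buffer overflows). A tiny illustration of this, using socket.socketpair() as a stand-in for the localhost server connection:

```python
import select
import socket

# socketpair stands in for the localhost server connection
server_side, client_side = socket.socketpair()

# the 'server' sends a tick while the client is busy analyzing;
# meanwhile the bytes sit in the client's kernel receive buffer
server_side.sendall(b"tick1\n\n")

# once the analyze step finishes, select() reports the socket readable
# and recv() returns the buffered data: nothing was missed, only delayed
readable, _, _ = select.select([client_side], [], [], 1.0)
data = client_side.recv(16384) if readable else b""

server_side.close()
client_side.close()
```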
@Alan: thank you for your advice and the link, which I had already seen and read. I am not totally sure (I tried a few options, but maybe not all the possible ones) that I can really use that workflow for my case. As per point 2) above, I directly receive the data as a matrix of 33 columns by 20-to-100ish rows per loop, which I put into a DF. Would a list be a feasible object for such tabular data? Sorry for this basic question; I have more experience with DFs than with lists or dicts, so I might be wrong.
@katastrofa: Thank you too for your advice. It would be helpful in case I needed to append more than one DF at a time into another big DF, while in my case I append only one DF into the big DF per iteration of the 'while True' chunk, where the 'socket.recv()' method reads from the server. I am not appending one single line at a time, but a small DF into the big DF. I haven't tried 'concat'; maybe it's faster than append in my case too. I will have a check.
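For reference, the 'concat' pattern as I understand it: collect each small per-loop DF in a plain Python list and concatenate once (or periodically), instead of growing the big DF with append on every loop. A sketch with made-up chunk data and column names:

```python
import pandas as pd
from io import StringIO

# two hypothetical message chunks as they might arrive from the socket
chunk1 = "AAPL;100;180.1\nAAPL;50;180.2"
chunk2 = "MSFT;30;410.0"

# instead of priceAll = priceAll.append(chunk_df, ...) on every loop,
# collect the small DFs in a list and concatenate in one go
frames = []
for raw in (chunk1, chunk2):
    frames.append(pd.read_csv(StringIO(raw), sep=";", header=None,
                              names=['symb', 'qty', 'price']))

priceAll = pd.concat(frames, ignore_index=True)
```

Each append to a DataFrame copies the whole thing, so the append-per-loop pattern is quadratic in the total number of rows; the list+concat version copies the accumulated data only once per concat.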
@Cuchulainn: Thanks also for the pandas_streaming advice. I haven't tried it yet and I am not sure what it really does; I will have a better look, try it out, and keep you updated about this too.