SERVING THE QUANTITATIVE FINANCE COMMUNITY

 
katastrofa
Topic Author
Posts: 8367
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Open binary file format

November 13th, 2018, 8:56 pm

Can you recommend an open binary format to store large simulation output consisting of columns of numbers and strings? (I can get rid of strings if necessary.)
 
FaridMoussaoui
Posts: 445
Joined: June 20th, 2008, 10:05 am
Location: Genève, Genf, Ginevra, Geneva

Re: Open binary file format

November 13th, 2018, 9:05 pm

 
katastrofa
Topic Author
Posts: 8367
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: Open binary file format

November 14th, 2018, 10:52 am

Thanks!
Could I ask one more question (I'm completely ignorant about this stuff)? I understand this format supports indexing. I need to store a single dataset indexed by, e.g., time, so there's no need for paths. It will be essential for me to read at random from different places. The files will be quite large. I can't say exactly how large at the moment, but up to 0.5 GB (just a few columns: string, int and float). Do you think it will be very slow?
 
FaridMoussaoui
Posts: 445
Joined: June 20th, 2008, 10:05 am
Location: Genève, Genf, Ginevra, Geneva

Re: Open binary file format

November 14th, 2018, 1:15 pm

I can't answer about the performance. But hdf5 is not a database.

Since you want "random" access, you could use a database. One of the DBs used by HFT traders is QuasarDB: https://www.quasardb.net/product
It is not open source but there is a free "community edition".

QuasarDB is a high performance, distributed, transactional, time series database. It can ingest data at very high speed, while giving you immediate access through a powerful, SQL-like, query language. QuasarDB was designed to withstand the most extreme use case that can be found in financial markets, aeronautics, and heavy industry.
 
katastrofa
Topic Author
Posts: 8367
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: Open binary file format

November 14th, 2018, 11:42 pm

Cool! Thank you!
 
ISayMoo
Posts: 2137
Joined: September 30th, 2015, 8:30 pm

Re: Open binary file format

December 27th, 2018, 12:41 pm

Can you recommend an open binary format to store large simulation output consisting of columns of numbers and strings? (I can get rid of strings if necessary.)
Maybe this?

 
katastrofa
Topic Author
Posts: 8367
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: Open binary file format

December 27th, 2018, 7:41 pm

Looks suitable. Thanks for the suggestion :-)
 
ISayMoo
Posts: 2137
Joined: September 30th, 2015, 8:30 pm

Re: Open binary file format

December 27th, 2018, 7:52 pm

A pleasure to meet you :)
 
katastrofa
Topic Author
Posts: 8367
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: Open binary file format

December 27th, 2018, 8:09 pm

Nice to meet you too! :-D
 
katastrofa
Topic Author
Posts: 8367
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: Open binary file format

April 5th, 2019, 4:19 pm

HDF is terribly slow :-(
 
Cuchulainn
Posts: 60241
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam

Re: Open binary file format

April 6th, 2019, 8:41 am

Are object databases still used these days? In the '90s they were kind of hot. Their Achilles' heel: they did not support schema evolution (in contrast to Oracle). You want to be able to read exploration data 20 years later. If you change the OO class hierarchy in the meantime...

https://en.wikipedia.org/wiki/Object_database
 
katastrofa
Topic Author
Posts: 8367
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: Open binary file format

April 6th, 2019, 11:19 am

Farid may know the answer to your question.

HDF turned out to be slower than CSV. I think the problem might be that it doesn't use the OS file system, which is very efficient in modern OSs. It's also pretty hard to configure.
 
FaridMoussaoui
Posts: 445
Joined: June 20th, 2008, 10:05 am
Location: Genève, Genf, Ginevra, Geneva

Re: Open binary file format

April 8th, 2019, 11:01 am

Could you share the part of your code performing the task? Any language but the C# shit.
 
katastrofa
Topic Author
Posts: 8367
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: Open binary file format

April 8th, 2019, 7:19 pm

    void Sinkhole::dump_full_hdf5(const std::string& filename) const {
        HighFive::File out(filename, HighFive::File::ReadWrite | HighFive::File::Create | HighFive::File::Truncate);
        std::vector<seconds_t> time(data_.size());
        static const size_t n_int_cols = 5;
        static const unsigned int deflate_level = 9;
        boost::numeric::ublas::matrix<int64_t, boost::numeric::ublas::row_major> int_data(data_.size(), n_int_cols);
        int row_idx = 0;
        auto time_it = time.begin();
        for (auto it = data_.begin(); it != data_.end(); ++it, ++row_idx, ++time_it) {
            *time_it = it->time;
            int_data(row_idx, 0) = static_cast<int64_t>(it->bot_state);
            int_data(row_idx, 1) = it->ip;
            int_data(row_idx, 2) = static_cast<int64_t>(it->host_id);
            int_data(row_idx, 3) = static_cast<int64_t>(it->local_network_type);
            int_data(row_idx, 4) = static_cast<int64_t>(it->is_fixed);
        }
        static const size_t chunk_size = 100;
        HighFive::DataSetCreateProps time_props;
        time_props.add(HighFive::Chunking({ chunk_size }));
        time_props.add(HighFive::Deflate(deflate_level));
        HighFive::DataSet dataset = out.createDataSet<double>("/time", HighFive::DataSpace::From(time), time_props);
        dataset.write(time);
        HighFive::DataSetCreateProps int_data_props;
        int_data_props.add(HighFive::Chunking({ chunk_size, n_int_cols }));
        int_data_props.add(HighFive::Deflate(deflate_level));
        dataset = out.createDataSet<int64_t>("/int_data", HighFive::DataSpace::From(int_data), int_data_props);
        dataset.write(int_data);
        std::vector<std::string> int_data_cols({ "bot_state", "ip", "host_id", "local_network_type", "is_fixed" });
        dataset = out.createDataSet<std::string>("/int_data_cols", HighFive::DataSpace::From(int_data_cols));
        dataset.write(int_data_cols);
    }
It uses the HighFive library (<highfive/H5File.hpp>). I've already written my own binary format, which is faster than my attempt at HDF5 above.
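
For readers wondering what a hand-rolled binary format with random access by time might look like, here is a minimal sketch. It is not the format mentioned in the post, which isn't shown in the thread. The Record layout and the function names are assumptions made up for illustration, rows are assumed to be sorted by time, and the file holds the machine's native byte layout, so it is not portable across architectures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical fixed-width record. The field names echo the posted code,
// but this layout is an assumption made for the sketch.
struct Record {
    double time;           // sort key: rows are assumed sorted by time
    std::int64_t ip;
    std::int64_t host_id;
};
static_assert(sizeof(Record) == 24, "sketch assumes a padding-free layout");

// Write all rows back to back, so row i starts at byte i * sizeof(Record).
void write_records(const std::string& path, const std::vector<Record>& rows) {
    // Precondition of the lookup below: rows are ordered by time.
    assert(std::is_sorted(rows.begin(), rows.end(),
                          [](const Record& a, const Record& b) { return a.time < b.time; }));
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(rows.data()),
              static_cast<std::streamsize>(rows.size() * sizeof(Record)));
    if (!out) throw std::runtime_error("write failed: " + path);
}

// Number of whole records in the file, derived from its size.
std::uint64_t record_count(std::ifstream& in) {
    in.seekg(0, std::ios::end);
    return static_cast<std::uint64_t>(in.tellg()) / sizeof(Record);
}

// Read one row by index: a single seek followed by a single read.
Record read_record(std::ifstream& in, std::uint64_t index) {
    Record r{};
    in.seekg(static_cast<std::streamoff>(index * sizeof(Record)));
    in.read(reinterpret_cast<char*>(&r), sizeof(Record));
    if (!in) throw std::runtime_error("read failed");
    return r;
}

// Binary search on the file itself: index of the first row with time >= t,
// or record_count(in) if every row is earlier than t.
std::uint64_t lower_bound_by_time(std::ifstream& in, double t) {
    std::uint64_t lo = 0;
    std::uint64_t hi = record_count(in);
    while (lo < hi) {
        const std::uint64_t mid = lo + (hi - lo) / 2;
        if (read_record(in, mid).time < t) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}
```

With fixed-width rows, finding a timestamp costs about log2(N) small reads, which the OS page cache tends to absorb well. Variable-length strings do not fit this layout directly; one option is to store them in a separate file and keep only an offset in the record.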
 
FaridMoussaoui
Posts: 445
Joined: June 20th, 2008, 10:05 am
Location: Genève, Genf, Ginevra, Geneva

Re: Open binary file format

April 9th, 2019, 8:19 am

Thanks. I will be back to you if I find something meaningful.
Is "seconds_t" defined as std::chrono::seconds or another structure?
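
A side note on the posted code, offered as a possible factor rather than a diagnosis: HDF5 applies the deflate filter to each chunk separately, and the code uses 100-row chunks at deflate level 9. The arithmetic below shows how small those chunks are and how many of them a file would contain. The row count of 10,000,000 is a made-up figure for illustration, since the thread does not give one, and `chunks_for` is a helper name invented for this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Values taken from the posted code.
constexpr std::size_t chunk_rows = 100;  // chunk_size
constexpr std::size_t n_int_cols = 5;    // n_int_cols

// Uncompressed size of one chunk in each dataset.
constexpr std::size_t time_chunk_bytes = chunk_rows * sizeof(double);
constexpr std::size_t int_chunk_bytes = chunk_rows * n_int_cols * sizeof(std::int64_t);

// Number of chunks needed to hold `rows` rows (rounded up).
constexpr std::uint64_t chunks_for(std::uint64_t rows) {
    return (rows + chunk_rows - 1) / chunk_rows;
}

static_assert(time_chunk_bytes == 800, "100 doubles per /time chunk");
static_assert(int_chunk_bytes == 4000, "100 x 5 int64 values per /int_data chunk");

// Hypothetical row count, chosen only to make the numbers concrete.
constexpr std::uint64_t example_rows = 10000000;
static_assert(chunks_for(example_rows) == 100000, "chunks per dataset");
```

Under that assumed row count, each dataset would be split into 100,000 chunks of at most a few kilobytes, each compressed and indexed on its own. Larger chunks or a lower deflate level might be worth trying before concluding that the format itself is the bottleneck.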