Serving the Quantitative Finance Community

 
tags
Topic Author
Posts: 3162
Joined: February 21st, 2010, 12:58 pm

C++ tricks

August 1st, 2021, 4:08 pm

Hello. I need to cherry-pick a few data points from this tiny online CSV that contains some bits of the US EIA's Weekly Petroleum Status Report.
The structure of the file never changes.
As you can see, there is no row ID, row identifier or anything like that. The first 2 columns ("STUB_1", "STUB_2") relate to the measure and geography data dimensions respectively, but there are several lines with the same combination of "STUB_1" and "STUB_2".
From column 3 to the last column, values correspond to this week, prev week, prev year, this week (4wk ave), this week (4wk ave).
So I know in advance that I will need a few data points at the intersection of a selected number of rows and column 3 (this week) and column 4 (prev week).
What is the best way to do this in C++, please? (Untold questions include: would you read the file row by row? By the way, is the file first saved to disk? Would you loop over rows and columns? Are there NumPy-like tools in C++ nowadays? ...)
I'm willing to do this consistently with the way real and modern C++ programmers would do it.
Any help much appreciated.
(No, I don't want to use Python instead.)
 
Alan
Posts: 2958
Joined: December 19th, 2001, 4:01 am
Location: California

Re: C++ tricks

August 2nd, 2021, 3:24 pm

maybe the top solution here?
 
tags
Topic Author
Posts: 3162
Joined: February 21st, 2010, 12:58 pm

Re: C++ tricks

August 2nd, 2021, 6:19 pm

Thank you for your input, Alan.
So reading line by line is the way to go? And while I'm acquiring each new line, I spot the 3rd and 4th elements and assign them to some variable / add them to some container?
Just in case my question is not crystal-clear, I'm mostly interested in "how to do it" and "why that how is the right approach" in C++ (e.g. one is better off reading the file all at once and then going through each line as it is more memory efficient; one shall not use conditions like "if row M and column N then catch the value" because it is ugly and there is a better way to do it; use boost::tokenizer because it is handy and fast; etc.).
Thank you all!
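
For what it's worth, the line-by-line idea described above could be sketched roughly as follows. This is only a sketch under assumptions: every field is double-quoted and comma-separated, no field spans several lines, and the wanted rows are identified by their 0-based position after the header. The helper names `split_quoted` and `pick_rows` are made up for illustration.

```cpp
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split one CSV line whose fields are all double-quoted.
// Commas inside quotes (e.g. "12,200") stay part of the field.
std::vector<std::string> split_quoted(const std::string& line) {
    std::vector<std::string> fields;
    std::string current;
    bool in_quotes = false;
    for (char c : line) {
        if (c == '"') {
            if (in_quotes) fields.push_back(current);  // closing quote ends the field
            current.clear();
            in_quotes = !in_quotes;
        } else if (in_quotes) {
            current += c;
        }
        // Characters outside quotes (the separating commas) are ignored.
    }
    return fields;
}

// Hypothetical helper: skip the header, then keep columns 3 and 4
// (indices 2 and 3) of the rows whose 0-based position is in `wanted`.
std::vector<std::pair<std::string, std::string>>
pick_rows(std::istream& in, const std::set<std::size_t>& wanted) {
    std::vector<std::pair<std::string, std::string>> picked;
    std::string line;
    std::getline(in, line);  // header line, not needed here
    for (std::size_t row = 0; std::getline(in, line); ++row) {
        if (wanted.count(row) == 0) continue;  // not one of the selected rows
        const auto fields = split_quoted(line);
        if (fields.size() >= 4) picked.emplace_back(fields[2], fields[3]);
    }
    return picked;
}
```

Calling `pick_rows(std::cin, {0, 4})` would return the "this week" / "prev week" strings for the first and fifth data rows. Keeping the set of wanted rows in a container avoids a chain of "if row M and column N" conditions, and reading from a `std::istream` means the same function works on a file saved to disk or on piped input.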
 
Alan
Posts: 2958
Joined: December 19th, 2001, 4:01 am
Location: California

Re: C++ tricks

August 4th, 2021, 1:42 pm

Sorry, but my contribution here is in helping you google. I know a little C but even less C++.
I will guess there are many "right" approaches.
 
tags
Topic Author
Posts: 3162
Joined: February 21st, 2010, 12:58 pm

Re: C++ tricks

August 10th, 2021, 5:57 pm

This is a question about Visual Studio 2019. How does one know which version of nmake to use to build a given library?
VS19 comes with 4 different flavors of nmake.exe in separate Tools subfolders.

\Hostx86\x86
\Hostx86\x64
\Hostx64\x86
\Hostx64\x64


It probably relates to the 32/64-bit options. Can you please explain Hostx86 vs Hostx64, and how these two relate to x86 and x64, and/or point to online documentation?

 
 
mateuszb
Posts: 1
Joined: October 16th, 2022, 6:49 am
Location: California

Re: C++ tricks

December 10th, 2022, 12:58 am

I know this is late but perhaps someone finds this useful:
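#include <algorithm>  // std::copy
#include <cassert>
#include <cstdint>    // uint32_t
#include <iostream>
#include <iterator>   // std::ostream_iterator
#include <string>
#include <utility>    // std::move
#include <vector>

using namespace std;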

namespace {
enum class State {
    QUOTED,
    UNQUOTED
};

}

int main()
{
    string line;
    State state = State::UNQUOTED;

    using Field = string;
    using Record = vector<Field>;
    vector<Record> records;

    string headerLine;
    getline(cin, headerLine);

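    // p: start of the current quoted field, q: its end, r: index of the current character.
    uint32_t p, q, r;

    while (getline(cin, line)) {
        p = q = r = 0;
        // Offsets are per line, so start each line unquoted
        // (this assumes a quoted field never spans several lines).
        state = State::UNQUOTED;
        Record record;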

        for (auto c : line) {
            switch (state) {
                case State::UNQUOTED:
                    switch (c) {
                        case '\"':
                            state = State::QUOTED;
                            q = p = r + 1;
                            break;
                            
                        default:
                            //assert(!"error");
                            break;
                    }
                    break;

                case State::QUOTED:
                    switch (c) {
                        case '\"':
                            state = State::UNQUOTED;
                            q = r;
                            assert(q >= p);  // an empty field ("") is valid
                            record.emplace_back(line.substr(p, (q - p)));
                            break;

                        default:
                            break;
                    }
                    break;
            }
            ++r;
        }
        records.emplace_back(std::move(record));
    }

    auto recno = 0;
    for (auto& r : records) {
        cout << "Record " << ++recno << " with " << r.size() << " fields: ";
        copy(r.cbegin(), r.cend(), ostream_iterator<string>(cout, "|"));
        cout << endl;
    }

    return 0;
}
mb@mbp untitled % clang++ main.cpp -o main
mb@mbp untitled % curl -s "https://ir.eia.gov/wpsr/table9.csv" | ./main | head
Record 1 with 8 fields: Crude Oil Production |Domestic Production|12,200|12,100|11,700|11,100|12,125|11,550|
Record 2 with 8 fields: Crude Oil Production |Alaska|450|444|454|512|448|450|
Record 3 with 8 fields: Crude Oil Production |Lower 48|11,700|11,700|11,200|10,600|11,700|11,100|
Record 4 with 8 fields: Refiner Inputs and Utilization |Crude Oil Inputs|16,585|16,638|15,785|14,436|16,446|15,613|
Record 5 with 8 fields: Refiner Inputs and Utilization |East Coast (PADD 1)|760|809|742|612|803|723|
Record 6 with 8 fields: Refiner Inputs and Utilization |Midwest (PADD 2)|3,982|3,859|3,960|3,472|3,841|3,821|
Record 7 with 8 fields: Refiner Inputs and Utilization |Gulf Coast (PADD 3)|8,954|9,135|8,350|7,922|8,971|8,402|
Record 8 with 8 fields: Refiner Inputs and Utilization |Rocky Mountain (PADD 4)|573|586|574|550|579|546|
Record 9 with 8 fields: Refiner Inputs and Utilization |West Coast (PADD 5)|2,315|2,249|2,159|1,880|2,252|2,123|
Record 10 with 8 fields: Refiner Inputs and Utilization |Gross Inputs|17,162|17,106|16,287|14,692|16,955|16,095|
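
The fields come back as text with thousands separators (e.g. "12,200"), so a conversion step is needed before doing any arithmetic on them. A minimal sketch, assuming ',' only ever appears as a thousands separator and '.' as the decimal point; `parse_thousands` is a made-up name, not part of the program above.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: turn a field such as "12,200" into 12200.0.
// Assumes ',' is only a thousands separator and '.' the decimal point.
double parse_thousands(std::string field) {
    // Erase-remove idiom: drop every ',' from the copy of the field.
    field.erase(std::remove(field.begin(), field.end(), ','), field.end());
    std::size_t used = 0;
    // std::stod throws std::invalid_argument if no number can be parsed.
    const double value = std::stod(field, &used);
    if (used != field.size()) {
        throw std::invalid_argument("trailing characters in: " + field);
    }
    return value;
}
```

With this, something like `parse_thousands(r[2]) - parse_thousands(r[3])` would give the week-on-week change for a record `r`, provided both fields are numeric.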


Hope this helps ;)