
 
User avatar
Traden4Alpha
Posts: 23951
Joined: September 20th, 2002, 8:30 pm

Re: Habemus codex et ordenadum

April 3rd, 2018, 5:44 pm

I'm glad it worked! (P.S. Did you double-check that the returned header really is the release time and not just the current time? I could see some URL-serving DBs just slapping the current time on whatever they send, regardless of the timestamp of the underlying data.)
 
User avatar
tagoma
Topic Author
Posts: 18354
Joined: February 21st, 2010, 12:58 pm

Re: Habemus codex et ordenadum

April 3rd, 2018, 5:57 pm

I'm glad it worked! (P.S. Did you double-check that the returned header really is the release time and not just the current time? I could see some URL-serving DBs just slapping the current time on whatever they send, regardless of the timestamp of the underlying data.)
Yes! Thanks for the warning. Actually, I checked with several links to zipped CSVs available on the USDA/FAS webpage. The date/time returned was the one expected.
In contrast to r.headers['Last-modified'] in my code snippet above, the command r.headers['date'] returns the date/time at which the USDA/FAS server was queried.
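For reference, a minimal sketch of the check being discussed: read the Last-Modified header (the release time of the file) rather than Date (the time the server answered). The helper names are my own, and I use the standard-library urllib rather than requests so the snippet has no third-party dependency; the actual USDA/FAS URL is not shown.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen

def parse_http_date(value):
    """Parse an RFC 7231 HTTP date such as 'Tue, 03 Apr 2018 17:44:00 GMT'."""
    return parsedate_to_datetime(value) if value else None

def release_time(url):
    """HEAD a URL and return its Last-Modified timestamp (None if absent)."""
    req = Request(url, method="HEAD")
    with urlopen(req) as resp:
        # 'Last-Modified' is the publication time of the resource;
        # 'Date' is merely when the server generated this response.
        return parse_http_date(resp.headers.get("Last-Modified"))
```

A HEAD request avoids downloading the zipped CSV just to inspect its timestamp.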
 
User avatar
Traden4Alpha
Posts: 23951
Joined: September 20th, 2002, 8:30 pm

Re: Habemus codex et ordenadum

April 3rd, 2018, 8:27 pm

I'm glad it worked! (P.S. Did you double-check that the returned header really is the release time and not just the current time? I could see some URL-serving DBs just slapping the current time on whatever they send, regardless of the timestamp of the underlying data.)
Yes! Thanks for the warning. Actually, I checked with several links to zipped CSVs available on the USDA/FAS webpage. The date/time returned was the one expected.
In contrast to r.headers['Last-modified'] in my code snippet above, the command r.headers['date'] returns the date/time at which the USDA/FAS server was queried.
That seems like good evidence that the date is what you wanted.
 
User avatar
tagoma
Topic Author
Posts: 18354
Joined: February 21st, 2010, 12:58 pm

Re: Habemus codex et ordenadum

May 12th, 2018, 7:52 pm

OK. So I'm dealing with commodity fundamentals and related data (S/D balances, the COT report, global economic data, weather, a few price series). It is a small-scale project (in a professional environment, though).

Data sources are mostly internet websites via APIs, plus CSV files/spreadsheets I download from the internet. I also have a Bloomberg terminal (you know, the cost you don't wish to see on your P&L sheet) I can get data from.

I'm on my own on this one, IT being there only to restrict what I can do (see below). And everything (else) must be license-free.

I first thought: OK, I will go with Python for the ETL and the forecasting models, and PostgreSQL to gather and consolidate all the data, create a light MDM, and serve it to Power BI.

Unfortunately, IT is reluctant to let me run PostgreSQL or the like on my machine.

So I'm looking for broad suggestions on an alternative setup for my data. Maybe you know some Python frameworks able to "simulate" a database/server? Or maybe I should pull all the data into the BBG terminal? (I think users can do that.)

Merci!
 
User avatar
outrun
Posts: 4573
Joined: April 29th, 2016, 1:40 pm

Re: Habemus codex et ordenadum

May 12th, 2018, 8:11 pm

If you're single-user (for now?) then maybe HDF5? http://www.h5py.org
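For a single-user store, a minimal h5py sketch might look like the following. The dataset path, attribute names, and sample values are illustrative assumptions, and h5py (plus numpy) must be installed separately:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "fundamentals.h5")
values = np.array([101.2, 99.8, 100.5])

# Write one series; intermediate groups ("corn/us") are created on the
# fly, and per-series metadata goes into HDF5 attributes.
with h5py.File(path, "w") as f:
    dset = f.create_dataset("corn/us/exports", data=values)
    dset.attrs["source"] = "USDA/FAS"
    dset.attrs["as_of"] = "2018-05-12"

# Read it back.
with h5py.File(path, "r") as f:
    stored = f["corn/us/exports"][:]
    source = f["corn/us/exports"].attrs["source"]
```

The hierarchical group/dataset layout maps naturally onto nested dimensions (product, geography, ...), but unlike a database server it offers no concurrent writers or access control.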

Having a database server like you want is, however, much more convenient: you can have scripts running on various machines filling it up without worrying about locking, access rights, etc. You can make a data model with relations, unique keys, ...

I would buy a small server myself and trust that I would get the money back. A Synology NAS can run Postgres. You need a development network isolated from the corporate network, and local admin rights, so that you can be productive and use good tools. Maybe even have it at home and work from there?
 
User avatar
tagoma
Topic Author
Posts: 18354
Joined: February 21st, 2010, 12:58 pm

Re: Habemus codex et ordenadum

May 12th, 2018, 8:35 pm

Thank you for the suggestions, outrun.
For the time being, I cannot work from home. Unfortunately, I'm not authorized to set up my own machinery at work either. (But yeah! As IT started with "we have had issues in the past with people using their own database within the company IT environment...", my first thought was that I didn't care, as I would bring my own stuff to work!)
I agree a data model would make it easier to update, maintain, etc.
I'm having a deeper look at HDF5, which I have used only once, for very basic data manipulations during a Kaggle competition!

Note: as for querying the BBG terminal via Python, I see their API only supports Python 2.6/2.7. Ah!..
 
User avatar
outrun
Posts: 4573
Joined: April 29th, 2016, 1:40 pm

Re: Habemus codex et ordenadum

May 12th, 2018, 8:49 pm

It's the same everywhere, I'm sure you'll be able to negotiate some more flexibility with IT.

For versioning with Python, virtualenv gives you good control; it's great for ensuring you have the right versions running.
 
User avatar
tagoma
Topic Author
Posts: 18354
Joined: February 21st, 2010, 12:58 pm

Re: Habemus codex et ordenadum

May 12th, 2018, 11:23 pm

Now looking into sqlite3. Realizing it comes with Python.
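Since sqlite3 ships with the standard library, a file-backed database needs no server process and no installation. A minimal sketch (table and series names are illustrative):

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; in practice you would use
# a file path such as "fundamentals.db" to persist between sessions.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE observations (series TEXT, ts TEXT, value REAL)")
con.execute(
    "INSERT INTO observations VALUES (?, ?, ?)",
    ("corn_us_exports", "2018-05-01", 123.4),
)
con.commit()
rows = con.execute(
    "SELECT value FROM observations WHERE series = ?", ("corn_us_exports",)
).fetchall()
```

Parameterized queries (the `?` placeholders) avoid both SQL-injection issues and string-formatting bugs when loading scraped data.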
 
User avatar
outrun
Posts: 4573
Joined: April 29th, 2016, 1:40 pm

Re: Habemus codex et ordenadum

May 13th, 2018, 12:12 pm

Now looking into sqlite3. Realizing it comes with Python.
What functionality do you wish for your storage?

sqlite3 is a good choice if your database structure is simple and you want to stick with SQL going forward. Using the SQLAlchemy library instead of SQL statements hardcoded in strings will make your code more portable and less dependent on the specific SQL database backend (SQLite, PostgreSQL, ...).

https://www.sqlalchemy.org
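A sketch of that portability point, assuming SQLAlchemy (1.4+) is installed; table and column names are illustrative. Only the engine URL is backend-specific:

```python
from sqlalchemy import (Column, Float, Integer, MetaData, String, Table,
                        create_engine, insert, select)

# The engine URL is the only backend-specific part: swap "sqlite://"
# for e.g. "postgresql://user:pass@host/db" without touching the rest.
engine = create_engine("sqlite://")  # in-memory SQLite for this sketch
metadata = MetaData()

observations = Table(
    "observations", metadata,
    Column("id", Integer, primary_key=True),
    Column("series", String, nullable=False),
    Column("value", Float),
)
metadata.create_all(engine)  # emits the right CREATE TABLE for the backend

with engine.begin() as conn:  # begin() commits on successful exit
    conn.execute(insert(observations).values(series="corn_us_exports",
                                             value=123.4))

with engine.connect() as conn:
    rows = conn.execute(select(observations.c.value)).fetchall()
```

Starting on SQLite and migrating to PostgreSQL later (if IT relents) then means changing one connection string rather than rewriting every query.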
 
User avatar
tagoma
Topic Author
Posts: 18354
Joined: February 21st, 2010, 12:58 pm

Re: Habemus codex et ordenadum

May 13th, 2018, 12:54 pm

Now looking into sqlite3. Realizing it comes with Python.
What functionality do you wish for your storage?

sqlite3 is a good choice if your database structure is simple and you want to stick with SQL going forward. Using the SQLAlchemy library instead of SQL statements hardcoded in strings will make your code more portable and less dependent on the specific SQL database backend (SQLite, PostgreSQL, ...).

https://www.sqlalchemy.org
Thanks for suggesting SQLAlchemy. I first thought it required an installer and discarded it, but I now see it is a standard Python package.
I am not sure I understand your question "What functionality do you wish for your storage?"
Still, my data will probably have four dimensions, namely product, geography, flow, and marketing campaign, with each dimension probably having nested sub-categories (e.g. along the geography dimension: US, US state, US county). Thus, a dimension may be the total of its inner categories.
I also need to specify the source of each data point and an "as of" date for each data point, and to keep track of changes in data points (e.g. the May forecast for a specific item can differ from the April forecast) -- I call it "versioning" but I'm not sure that is the correct concept/word.
 
User avatar
outrun
Posts: 4573
Joined: April 29th, 2016, 1:40 pm

Re: Habemus codex et ordenadum

May 13th, 2018, 5:34 pm

Yes, SQLAlchemy is a popular package, widely used.

By functionality I mean: what's wrong with a bunch of CSV files? In general, any choice of SQL database (except SQLite) probably offers all you need: access control, indices to speed things up, views, triggers, interfaces to lots of tools and programming languages, live backup options. The only exception I can come up with is perhaps a NoSQL database like MongoDB, for performance reasons when things get huge. I have used PostgreSQL and MySQL databases with close to a billion records without any problems.

I often have projects similar to yours (timeseries)!

I typically have a symbol table where each timeseries gets a single row with a unique id and descriptive info like data source, expiration date, currency of the quote, country, exchange, frequency of the timeseries... many, many fields.

And then a second data table that has {timeseries_id, timestamp, value, last_updated}

The last_updated field is for versioning.
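The two-table layout described above can be sketched with the stdlib sqlite3 module; any field names beyond the quoted {timeseries_id, timestamp, value, last_updated} are illustrative assumptions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE symbol (
    id     INTEGER PRIMARY KEY,
    name   TEXT UNIQUE,
    source TEXT,
    freq   TEXT
);
CREATE TABLE observation (
    timeseries_id INTEGER REFERENCES symbol(id),
    timestamp     TEXT,
    value         REAL,
    last_updated  TEXT,   -- versioning: keep every revision of a point
    PRIMARY KEY (timeseries_id, timestamp, last_updated)
);
""")
con.execute("INSERT INTO symbol (name, source, freq) "
            "VALUES ('corn_us_exports', 'USDA', 'monthly')")
sid = con.execute(
    "SELECT id FROM symbol WHERE name = 'corn_us_exports'").fetchone()[0]
# The same observation revised in April and again in May: both rows kept.
con.execute("INSERT INTO observation VALUES (?, '2018-05-01', 52.0, '2018-04-10')",
            (sid,))
con.execute("INSERT INTO observation VALUES (?, '2018-05-01', 54.5, '2018-05-10')",
            (sid,))
latest = con.execute("""
    SELECT value FROM observation
    WHERE timeseries_id = ? AND timestamp = '2018-05-01'
    ORDER BY last_updated DESC LIMIT 1
""", (sid,)).fetchone()
```

Splitting descriptive metadata (symbol) from the observations keeps the big table narrow, and last_updated in the primary key is what preserves every revision.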
 
User avatar
tagoma
Topic Author
Posts: 18354
Joined: February 21st, 2010, 12:58 pm

Re: Habemus codex et ordenadum

May 13th, 2018, 8:28 pm

Yes, SQLAlchemy is a popular package, widely used.

By functionality I mean: what's wrong with a bunch of CSV files? In general, any choice of SQL database (except SQLite) probably offers all you need: access control, indices to speed things up, views, triggers, interfaces to lots of tools and programming languages, live backup options. The only exception I can come up with is perhaps a NoSQL database like MongoDB, for performance reasons when things get huge. I have used PostgreSQL and MySQL databases with close to a billion records without any problems.

I often have projects similar to yours (timeseries)!

I typically have a symbol table where each timeseries gets a single row with a unique id and descriptive info like data source, expiration date, currency of the quote, country, exchange, frequency of the timeseries... many, many fields.

And then a second data table that has {timeseries_id, timestamp, value, last_updated}

The last_updated field is for versioning.
Thanks for this additional food for thought!
I need a bit of time to gauge the different options for a relatively light but robust data ecosystem (= headache-free).
In the meantime, I shall start with something technically simplistic and restricted in the number of series. I really need to deliver (market research, forecasts) without delay!
 
User avatar
tagoma
Topic Author
Posts: 18354
Joined: February 21st, 2010, 12:58 pm

Re: Habemus codex et ordenadum

May 14th, 2018, 10:38 am

BTW, the BBG helpdesk told me their API also works with Python 3.
 
User avatar
outrun
Posts: 4573
Joined: April 29th, 2016, 1:40 pm

Re: Habemus codex et ordenadum

May 15th, 2018, 8:37 am

BTW, the BBG helpdesk told me their API also works with Python 3.
That makes sense!
BTW, a friend of mine optimized BB data collection (don't download the same things repeatedly) and ended up saving the company a couple of hundred thousand per year!
 
User avatar
tagoma
Topic Author
Posts: 18354
Joined: February 21st, 2010, 12:58 pm

Re: Habemus codex et ordenadum

May 17th, 2018, 4:12 pm

I now have a Windows (7) command prompt popping up each time I start my computer. It all started after I began using the ICE Chat thing -- a nightmare in every way. It is Java (can you believe it (Cuch?)?).
I then tried to get rid of the beast, deleting (savagely, I reckon) all the files relating to that app, e.g. in user/data and /user/AppData.
Unfortunately, the prompt keeps popping up at computer start.
Where else on my disk is the ICE thing likely to have written?
(I spent some 30 minutes with ICE support (a human) but it was useless.)
Any suggestion welcome. Merci!