Serving the Quantitative Finance Community

 
User avatar
stratamad
Topic Author
Posts: 0
Joined: April 2nd, 2009, 9:25 pm

How to Build & Automate a Clean and Sanitary FX Database

April 30th, 2010, 12:52 pm

Greetings Forum Members,

Our internal quant group is in the process of building an emerging-market currency database. We have done a whole slew of financial database projects in the past, but we have never done one exactly like this, so we wanted to cross-check some of our ideas with the forum. Generally, we would be interested to know your thoughts on the following questions/answers.

1. What are some of the best "official" sources for currency data?
a. Our plan is to build the backbone using data from Datastream. We will also use other sources (BBG, Reuters, central banks) for cross-checks and to fill in gaps. We are also developing a method to automatically recommend a best source when primary sources fail.

2. What are some common problems with raw currency data from Datastream/BBG/central banks/etc.?
a. Generally, we see gaps, repeating daily closing spots, outliers, inliers and more. Quality and error types vary from source to source, period to period, currency to currency. Right now we are using spot and 1-M forward data, but an expansion will end up including 2-M/3-M/etc., intraday, bid/ask and much more.
b. We have talent internally with a feel for this type of data and knowledge of the currency (and derivatives) markets from past projects. We are staffing the following skill sets: macro-econ, FX trading experience, stats, programmers/systems engineers (SAS, R, Python, Perl, MySQL, Oracle).

3. What are some techniques for doing outlier analysis/error detection/quality assurance on FX time-series data?
a. We are going to use some standard statistical techniques like interquartile range, modified z-score tests and more. Any insights on what has/hasn't worked well in the past?
b. Of course, visual inspection will also be used. Any pitfalls to watch out for?
c. Generally, we are looking for obvious manipulations, inconsistencies, and gross discrepancies. Large jumps and anomalies will be checked against supporting events.

4. How would you go about automating the QA/updating/etc. of the process?
a. On other database projects we have automated QA/updating/etc. tasks in Perl, VBA, SQL, etc. Once we have a set methodology we will start coding it up and testing. Does anyone have ideas or prebuilt systems they could share?
b. Generally, we are looking to leverage SAS, R, Perl and Python for the programming side.

5. From an infrastructure-design perspective, what would be some best practices so the database is scalable and future-proof?

Thanks in advance for your input!

Regards,
Textexactly
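[Editorial note: the two screens named in 3a (interquartile range and modified z-score) can be sketched as below. This is a minimal illustration applied to daily log returns, not the poster's implementation; the thresholds (1.5 × IQR, |z| > 3.5) are common textbook defaults, and all function names are made up for this sketch.]

```python
# Sketch of two standard screens for a daily spot series: an IQR fence
# and a modified z-score (median / MAD based, so it tolerates the outliers
# it is hunting). Thresholds are conventional defaults, not recommendations.
import math

def log_returns(spots):
    """Daily log returns from a list of closing spot prices."""
    return [math.log(b / a) for a, b in zip(spots, spots[1:])]

def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])

def iqr_flags(xs, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    s = sorted(xs)
    n = len(s)
    q1, q3 = s[n // 4], s[(3 * n) // 4]
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [not (lo <= x <= hi) for x in xs]

def modified_z_flags(xs, cutoff=3.5):
    """Flag points whose modified z-score 0.6745*(x - med)/MAD exceeds cutoff."""
    m = median(xs)
    mad = median([abs(x - m) for x in xs])  # median absolute deviation
    if mad == 0:
        return [False] * len(xs)  # constant window: the score is undefined
    return [abs(0.6745 * (x - m) / mad) > cutoff for x in xs]
```

Note the `mad == 0` branch: on a window of repeated closes (exactly the stale-quote problem mentioned in 2a) the modified z-score degenerates, which is one reason a separate stale-data check is usually run alongside it.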
 
User avatar
bearrito
Posts: 1
Joined: March 14th, 2010, 2:24 am

How to Build & Automate a Clean and Sanitary FX Database

May 4th, 2010, 12:15 am

Bumping this. I have no insight into these questions.
 
User avatar
farmer
Posts: 63
Joined: December 16th, 2002, 7:09 am

How to Build & Automate a Clean and Sanitary FX Database

May 4th, 2010, 10:29 am

Quote
Originally posted by: stratamad
c. Generally, we are looking for obvious manipulations

Huh?

Anyway, I think you are confusing the problems. Your database should just be a historical archive. All events timestamped and stored with origin and details, so you can travel back in time. Then you have the freedom to decide later what data you might want.
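[Editorial note: the append-only, "travel back in time" archive farmer describes might look like the sketch below. It uses SQLite only to keep the example self-contained; the table and column names are invented for illustration, not from the thread.]

```python
# Append-only archive: every observation carries its source and a capture
# timestamp; corrections are new rows, never UPDATEs, so the database
# state "as of" any past moment can be reconstructed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fx_archive (
        pair        TEXT NOT NULL,   -- e.g. 'USDTRY'
        obs_date    TEXT NOT NULL,   -- the date the quote is for
        field       TEXT NOT NULL,   -- 'spot', 'fwd_1m', ...
        value       REAL NOT NULL,
        source      TEXT NOT NULL,   -- 'Datastream', 'BBG', ...
        captured_at TEXT NOT NULL    -- when this row was recorded
    )
""")

def as_of(conn, pair, obs_date, field, captured_before):
    """Latest (value, source) for the observation as known at `captured_before`."""
    return conn.execute(
        """SELECT value, source FROM fx_archive
           WHERE pair = ? AND obs_date = ? AND field = ? AND captured_at <= ?
           ORDER BY captured_at DESC LIMIT 1""",
        (pair, obs_date, field, captured_before)).fetchone()
```

With this layout, "cleaning" never destroys information: a later, corrected capture simply shadows the earlier one for as-of queries made after it.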
Antonin Scalia Library http://antoninscalia.com
 
User avatar
bearrito
Posts: 1
Joined: March 14th, 2010, 2:24 am

How to Build & Automate a Clean and Sanitary FX Database

May 4th, 2010, 12:27 pm

Quote
Originally posted by: farmer
Quote
Originally posted by: stratamad
c. Generally, we are looking for obvious manipulations
Huh? Anyway, I think you are confusing the problems. Your database should just be a historical archive. All events timestamped and stored with origin and details. So you can travel back in time. Then you have the freedom to decide later what data you might want.

To this point, you could always roll out two DBs, one raw and one clean. I do this regularly. However, this only makes sense if your cleaning operations are a bottleneck. You basically trade off space for time.
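[Editorial note: the raw/clean split bearrito describes is essentially a batch promotion job: read the raw store, apply scrub rules, write a separate clean store, so readers never pay the cleaning cost. A minimal, hypothetical sketch, with rows as plain dicts and rules as functions:]

```python
# A rule takes a row and returns the row unchanged, a corrected row,
# or None to drop it. Promotion applies the rules in order to every
# raw row and collects the survivors into the "clean" store.
def promote(raw_rows, rules):
    clean = []
    for row in raw_rows:
        for rule in rules:
            row = rule(row)
            if row is None:
                break  # row rejected; skip remaining rules
        if row is not None:
            clean.append(row)
    return clean

# Example rule: drop non-positive spots (obviously bad ticks).
drop_nonpositive = lambda row: row if row["spot"] > 0 else None
```

The space-for-time trade-off is explicit here: every row exists twice (raw and clean), but queries against the clean store never re-run the rules.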
 
User avatar
stratamad
Topic Author
Posts: 0
Joined: April 2nd, 2009, 9:25 pm

How to Build & Automate a Clean and Sanitary FX Database

May 4th, 2010, 3:41 pm

Farmer:

Regarding manipulations: data from BBG has often been changed. For example, you will sometimes come across 4 or 5 days in a row where the currency closing spot price is exactly the same each day, down to several decimal places. This is highly unlikely. BBG has applied some set of rules to the data, and those would need to be accounted for. Also, they often have not adjusted prices correctly, especially for equities regarding corporate actions. We did a big project looking at 5K+ series of BBG data and found lots and lots of errors. We checked these errors by referencing at least 3 or 4 sources. We also used one of our clients' internal databases as a cross-check; this data is comprised of prices coming from their trading desk. Sometimes BBG was significantly off. We have already been paid good money several times to remedy these types of errors for banks, hedge funds, etc. who require accurate information. Thanks again!

Bearrito:

Like the point about two DBs. We will discuss.

FYI guys - I am the salesperson, so please excuse me if my understanding is not as thorough as yours.
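[Editorial note: the stale-quote pattern described above, several consecutive days with an identical close to full precision, is mechanical to detect. A minimal sketch; the function name and the default run length of 4 are illustrative choices, not the team's rule:]

```python
# Flag every run of `min_run` or more consecutive identical closing spots.
# Returns (start_index, run_length) pairs so the run can be checked
# against a second source before anything is corrected.
def stale_runs(spots, min_run=4):
    runs, start = [], 0
    for i in range(1, len(spots) + 1):
        if i == len(spots) or spots[i] != spots[start]:
            if i - start >= min_run:
                runs.append((start, i - start))
            start = i
    return runs
```

Exact equality is deliberate here: a genuinely unchanged market close is plausible for a day or two, but repeats to several decimal places over many days almost always mean a vendor carried the last value forward.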
 
User avatar
ntxman
Posts: 0
Joined: May 8th, 2010, 2:33 am

How to Build & Automate a Clean and Sanitary FX Database

May 9th, 2010, 5:48 am

Textexactly,

I can tell you from years (nay, decades) of installing/configuring/programming databases of all flavors (Oracle, Sybase, Microsoft, Progress, etc.) that you will be terribly dissatisfied if you run this database system on anything other than a Unix platform (yes, that leaves out SQL Server). You won't be able to automate all the tasks needed to maintain multiple instances, plus the security you get with a correctly installed Unix system cannot be matched. Combine this with the fact that such a system will be something of a mutant, in that it's both a report database (designed mostly to fetch and format data calls into readable reports for end users) and a transaction-based database (reading live or nearly live data flow (trades) from your providers and processing same, mostly write operations and data manipulation). In fact, setting the database (engine) parameters in these types of systems is always a trade-off in performance, with lots of whining end users.

You mentioned "manipulations, inconsistency, and gross discrepancies"; in industry parlance this is called "data scrubbing". Database companies are loath to do this; in fact, the big DB vendors will tell you flat out they won't do it. Not only does it add horrible expense to the client, it becomes an issue of 'ownership' and scope creep. This is evidenced by the fact that you can see the same types of errors across multiple data vendors; none of them seem to want to tackle the job of supplying 'pristine' data.

One choice for managing your 'golden' database (one which you're satisfied is as error-free as you can make it) is to 'dump and load' from the golden to the 'production' database. This can be time-consuming; it may take several hours to complete and may fail in the process (constraint issues, disk-space issues, etc.), leaving you a hell of a mess to walk into when you arrive at the office the next morning.

I've had good luck 'mirroring' the database instances: doing data writes to a dedicated DB server that does the transaction-heavy work, and having the report-heavy server mirror this data as it flows in. This also allows me to keep end users on a 'report' server tuned for that purpose, which solves my performance issues and helps keep data refreshes more manageable.

At any rate, hope you find this useful.

ntxman
 
User avatar
stratamad
Topic Author
Posts: 0
Joined: April 2nd, 2009, 9:25 pm

How to Build & Automate a Clean and Sanitary FX Database

May 9th, 2010, 3:47 pm

Hi ntxman,

Thanks for the detailed response; yes, this should be helpful. I will share it with the team. It looks really solid for us getting more and more business scrubbing data and building databases. We have some comparative advantages in handling this type of work. We also usually follow up by helping the client do backtesting, portfolio optimization, econometric modelling, etc.

Follow this thread, as we will get some technical responses from the team and post again soon. Thanks!!!!
Last edited by stratamad on May 8th, 2010, 10:00 pm, edited 1 time in total.