April 30th, 2010, 12:52 pm
Greetings Forum Members,

Our internal quant group is building an emerging market currency database. We have done a whole slew of financial database projects in the past, but never one exactly like this, so we wanted to cross-check some of our ideas with the forum. We would be interested in your thoughts on the following questions (our current answers are included).

1. What are some of the best "official" sources for currency data?
a. Our plan is to build the backbone using data from Datastream. We will also use other sources (BBG, Reuters, central banks) for cross-checks and to fill in gaps. We are also developing a method to automatically recommend the best alternative source when the primary source fails.

2. What are some common problems with raw currency data from Datastream/BBG/central banks/etc.?
a. Generally, we see gaps, repeated daily closing spots, outliers, inliers and more. Quality and error types vary from source to source, period to period, and currency to currency. Right now we are using spot and 1-M forward data, but the expansion will eventually include 2-M/3-M/etc. forwards, intraday, bid/ask and much more.
b. We have people internally with a feel for this type of data and knowledge of the currency (and derivatives) markets from past projects. We are staffing the following skill sets: macroeconomics, FX trading experience, statistics, and programmers/systems engineers (SAS, R, Python, Perl, MySQL, Oracle).

3. What are some techniques for outlier analysis/error detection/quality assurance on FX time series data?
a. We are going to use some standard statistical techniques such as the interquartile range, modified z-score tests and more. Any insights on what has/hasn't worked well in the past?
b. Of course, visual inspection will also be used. Any pitfalls to watch out for?
c. Generally, we are looking for obvious manipulations, inconsistencies and gross discrepancies. Large jumps and anomalies will be checked against supporting events.

4. How would you go about automating the QA/updating/etc. of the process?
a. On other database projects we have automated QA/updating/etc. tasks in Perl, VBA, SQL, etc. Once we have a set methodology we will start coding it up and testing. Does anyone have ideas or prebuilt systems they could share?
b. Generally, we are looking to leverage SAS, R, Perl and Python for the programming side.

5. From an infrastructure design perspective, what are some best practices for making the database scalable and future-proof?

Thanks in advance for your input!

Regards,
Textexactly
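For question 3, here is a minimal sketch of the two tests mentioned (interquartile range and modified z-score), assuming daily closing prices arrive as a plain Python list. The checks run on log returns rather than price levels, because FX levels trend over time and a level-based fence would flag ordinary drift. The 3.5 cutoff is the value usually suggested for the modified z-score and 1.5 is the usual Tukey fence multiplier; both would likely need tuning per currency. The function names are illustrative, not from any existing system.

```python
import math
import statistics


def log_returns(prices):
    """Daily log returns from a list of closing prices (all prices must be > 0)."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def modified_z_scores(values):
    """Modified z-score: 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # MAD is zero when more than half the values are identical (e.g. a hard
        # peg); the score is undefined, so return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def iqr_bounds(values, k=1.5):
    """Tukey fences [Q1 - k*IQR, Q3 + k*IQR] using statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(prices, z_threshold=3.5, k=1.5):
    """Indices into `prices` whose incoming return fails either test."""
    rets = log_returns(prices)
    scores = modified_z_scores(rets)
    lo, hi = iqr_bounds(rets, k)
    flagged = []
    for i, (r, z) in enumerate(zip(rets, scores)):
        if abs(z) > z_threshold or not (lo <= r <= hi):
            flagged.append(i + 1)  # return i ends at price index i + 1
    return flagged
```

A single bad print produces two large returns (the jump away and the jump back), so both the bad point and the day after it get flagged. That spike-then-reversal pattern is itself a useful hint that the value is a data error rather than a genuine move, which is worth checking before looking for supporting events.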
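For the gaps and repeated daily closing spots mentioned in question 2, here is a standard-library sketch, assuming each series is a dict mapping `datetime.date` to a closing price. The business-day calendar here is simply Monday to Friday; a real version would need each market's holiday calendar, which this sketch does not include. The `min_run` default of 3 and the function names are hypothetical choices for illustration.

```python
from datetime import date, timedelta


def business_days(start, end):
    """Mon-Fri dates from start to end inclusive (no holiday calendar applied)."""
    d = start
    while d <= end:
        if d.weekday() < 5:
            yield d
        d += timedelta(days=1)


def find_gaps(series):
    """Weekdays between the first and last observation that have no value."""
    if not series:
        return []
    dates = sorted(series)
    return [d for d in business_days(dates[0], dates[-1]) if d not in series]


def find_stale_runs(series, min_run=3):
    """Runs of >= min_run consecutive observations with an identical close.

    Exact equality is intentional: a stale feed repeats the same print.
    Returns (first_date, last_date, run_length) tuples.
    """
    dates = sorted(series)
    runs, start = [], 0
    for i in range(1, len(dates) + 1):
        # Close the current run at the end of the data or when the value changes.
        if i == len(dates) or series[dates[i]] != series[dates[start]]:
            if i - start >= min_run:
                runs.append((dates[start], dates[i - 1], i - start))
            start = i
    return runs
```

Pegged or tightly managed currencies can repeat the same close legitimately, so stale-run hits are probably better routed to review than treated as automatic errors.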
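For the automatic best-source recommendation in question 1, one possible rule, sketched under assumptions: walk the sources in priority order (Datastream first) and accept the first one whose print is within a tolerance of the median across all available sources for that currency and date; if none qualifies, take whichever source is closest to the median. The 0.5% tolerance, the source labels and the `pick_source` name are all hypothetical.

```python
import statistics


def pick_source(quotes, priority, tol=0.005):
    """Choose one value for a currency/date from several sources.

    quotes:   dict source -> price (None if that source has no print)
    priority: source names in order of preference, primary first
    tol:      max relative deviation from the cross-source median (hypothetical)
    Returns (source, price), or (None, None) if no source has a value.
    """
    available = {s: p for s, p in quotes.items() if p is not None}
    if not available:
        return None, None
    if len(available) == 1:
        # Nothing to cross-check against; accept the only print.
        return next(iter(available.items()))
    med = statistics.median(available.values())
    for source in priority:
        price = available.get(source)
        if price is not None and abs(price / med - 1) <= tol:
            return source, price
    # No preferred source agrees with the consensus: fall back to the closest one.
    source = min(available, key=lambda s: abs(available[s] / med - 1))
    return source, available[source]
```

With only two sources the median is just their midpoint, so the consensus check only becomes informative with three or more sources per currency and date.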
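For question 5, one commonly used layout (offered as a suggestion, not a tested design) is a long, narrow table keyed by source, currency, tenor, field and observation time, so that adding 2-M/3-M forwards, bid/ask or intraday data means adding rows rather than altering columns. The sketch uses SQLite through Python's built-in `sqlite3` module only because it is self-contained; the table and column names are hypothetical, and MySQL or Oracle would need type adjustments (e.g. VARCHAR key columns). QA results go in a separate table so checks never edit the vendor data in place.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS raw_quote (
    source     TEXT NOT NULL,  -- e.g. 'Datastream', 'BBG', 'Reuters'
    currency   TEXT NOT NULL,  -- ISO 4217 code
    tenor      TEXT NOT NULL,  -- 'SPOT', '1M', '2M', ...
    field      TEXT NOT NULL,  -- 'MID', 'BID', 'ASK', ...
    obs_time   TEXT NOT NULL,  -- ISO 8601 date or timestamp
    value      REAL NOT NULL,
    loaded_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (source, currency, tenor, field, obs_time)
);
CREATE TABLE IF NOT EXISTS qa_flag (
    source     TEXT NOT NULL,
    currency   TEXT NOT NULL,
    tenor      TEXT NOT NULL,
    field      TEXT NOT NULL,
    obs_time   TEXT NOT NULL,
    check_name TEXT NOT NULL,  -- e.g. 'modified_z', 'stale_run'
    detail     TEXT,
    PRIMARY KEY (source, currency, tenor, field, obs_time, check_name)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and ensure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def load_quotes(conn, rows):
    """Load vendor quotes as 6-tuples (source, currency, tenor, field, obs_time, value).

    Reloading the same key replaces the earlier row, so vendor revisions
    overwrite rather than duplicate (INSERT OR REPLACE is SQLite syntax).
    """
    conn.executemany(
        "INSERT OR REPLACE INTO raw_quote "
        "(source, currency, tenor, field, obs_time, value) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
```

Because every series shares one shape, the QA checks above can be written once and run over any currency, tenor or field by filtering `raw_quote`.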