Serving the Quantitative Finance Community

 
powerforward
Topic Author
Posts: 3
Joined: May 27th, 2015, 2:27 am

Advice on Mortgage Project

April 3rd, 2016, 10:27 pm

Hi, I'm a 4th-year undergrad currently trying to formulate a project on mortgage-backed securities. I'm hoping to use multiple areas of math/data science to build an amateur prepayment model. Based on my limited knowledge, I need two pieces:

1) Interest Rate Model: Probably going to be a simple 1-factor CIR. I don't think I'm ready for more complicated models. Planning to read up on it in Brigo's book.

2) Prepayment Model: This is where the problem is. I can't find a large amount of relevant data to construct a data-driven prepayment model. Bloomberg can only break it down into geographical regions, which is a very small amount of data; thus one can't run regressions, kernel analysis, and other data science tools. Can anyone suggest other ways to source relevant data?

Any advice would be highly appreciated, and feel free to correct me. Thank you.
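For reference, the 1-factor CIR short-rate model mentioned above is dr = kappa*(theta - r) dt + sigma*sqrt(r) dW. A minimal simulation sketch follows, assuming an Euler discretization with full truncation (one common choice among several, not something specified in the post). The function name and all parameter values are illustrative assumptions, not calibrated numbers.

```python
import math
import random


def simulate_cir(r0, kappa, theta, sigma, horizon_years, steps, seed=None):
    """Simulate one CIR short-rate path with an Euler full-truncation scheme.

    r0: starting rate, kappa: mean-reversion speed, theta: long-run mean,
    sigma: volatility. All names here are illustrative.
    """
    rng = random.Random(seed)
    dt = horizon_years / steps
    r = r0                      # internal state, may dip slightly below zero
    path = [max(r0, 0.0)]       # reported rates are floored at zero
    for _ in range(steps):
        r_pos = max(r, 0.0)     # truncate inside drift and diffusion terms
        dw = rng.gauss(0.0, math.sqrt(dt))
        r = r + kappa * (theta - r_pos) * dt + sigma * math.sqrt(r_pos) * dw
        path.append(max(r, 0.0))
    return path


if __name__ == "__main__":
    # Hypothetical parameters: 30 years of monthly steps.
    rates = simulate_cir(r0=0.02, kappa=0.5, theta=0.05, sigma=0.08,
                         horizon_years=30, steps=360, seed=42)
    print(len(rates), min(rates), max(rates))
```

Monthly steps are used here because mortgage cash flows are monthly; a finer grid would reduce discretization error at the cost of more steps.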
 
Alan
Posts: 3050
Joined: December 19th, 2001, 4:01 am
Location: California

April 4th, 2016, 1:41 pm

I would first try the agencies. For example, a little googling turns up http://www.freddiemac.com/news/finance/ ... taset.html
 
bearish
Posts: 5906
Joined: February 3rd, 2011, 2:19 pm

April 9th, 2016, 2:52 pm

The Freddie Mac data set actually looks quite interesting - thanks for pointing it out. To the OP I would remark that the ambition level for your "amateur prepayment model" appears to be higher than that of some organizations I am familiar with, which trade billions of dollars' worth of MBS relying on prepayment models built at a much higher aggregation level (like just coupon, vintage, maturity and agency).
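To make "a much higher aggregation level" concrete, here is a minimal sketch of rolling loan-level records up to coupon/vintage cohorts and computing one month's prepayment speed per cohort. It uses the standard definitions SMM = prepaid principal / (beginning balance - scheduled principal) and CPR = 1 - (1 - SMM)^12. The record field names are hypothetical and are not the column names of the Freddie Mac dataset; the sample loans are made up purely for illustration.

```python
from collections import defaultdict


def smm_to_cpr(smm):
    """Annualize a single monthly mortality rate into a CPR."""
    return 1.0 - (1.0 - smm) ** 12


def cohort_smm(loans):
    """Aggregate one month of loan records into SMM per (coupon, vintage).

    Each loan is a dict with hypothetical keys: coupon, vintage,
    begin_balance, scheduled_principal, prepaid_principal.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # key -> [prepaid, base]
    for loan in loans:
        key = (loan["coupon"], loan["vintage"])
        totals[key][0] += loan["prepaid_principal"]
        totals[key][1] += loan["begin_balance"] - loan["scheduled_principal"]
    return {key: prepaid / base
            for key, (prepaid, base) in totals.items() if base > 0}


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = [
        {"coupon": 4.0, "vintage": 2012, "begin_balance": 100000.0,
         "scheduled_principal": 200.0, "prepaid_principal": 998.0},
        {"coupon": 4.0, "vintage": 2012, "begin_balance": 100000.0,
         "scheduled_principal": 200.0, "prepaid_principal": 0.0},
    ]
    for key, smm in cohort_smm(sample).items():
        print(key, smm, smm_to_cpr(smm))
```

A pool-level model would then fit these cohort speeds against a handful of drivers, rather than modeling each borrower separately.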
 
powerforward
Topic Author
Posts: 3
Joined: May 27th, 2015, 2:27 am

April 27th, 2016, 5:55 am

Alan: I tried Fannie and Freddie. The agencies' data is only part of the overall data. Furthermore, how can I even tell whether the loans are in a specific MBS component, or are even big enough to impact the pool? Does Bloomberg do this? Since I'm a student, I don't have access to CoreLogic, Black Knight, or similar services.

Bearish: I was afraid of this scenario. Most IB FI research uses, at most, pool-level data. I might have to change to a mathematical, no-arbitrage-driven project if I don't find data soon.

Overall:

1) It seems to me that most people find fundamental factors for prepayment modeling and then use quant models to predict future behavior; the data is used for historical, mean-reverting calculations.

2) When I search the internet, I see few papers on loan-level modeling; most are pool-level. Why?

3) Is it better to calibrate a math model using data or to employ a purely data-driven model?
 
Alan
Posts: 3050
Joined: December 19th, 2001, 4:01 am
Location: California

April 27th, 2016, 1:44 pm

If the agencies don't have what you need, but some commercial service does, you should contact them. Sometimes a commercial service will support an academic project if it serves the purpose of showing 'here's an example of what you can do with our data'.

Also, you can pester your dept. head, who can then pester your school's library about acquiring access to the data you need. Or maybe a local prof. has a grant and is interested in your project. Bottom line: take the initiative and get aggressive about what you need, assuming it exists at all.
 
Traden4Alpha
Posts: 3300
Joined: September 20th, 2002, 8:30 pm

April 27th, 2016, 2:22 pm

Based on the kinds of junk mail I get, it's clear that data on individual mortgages is available in the public records on real estate transactions and registered liens against such property. Easily gathering enough of it for longitudinal models is another matter. Perhaps one or a few jurisdictions have particularly easy access to public records with enough data for a proof of concept for this kind of project.

Whether it's better to calibrate a math model using data or employ a purely data-driven model depends on the types of errors you are willing to make. The math model has a better chance of having explicable parameters, but the model might be wrong or missing a lot of factors. The data-driven model might seem to explain a greater fraction of the variance, at the risk of being overfitted or inscrutable.

Good luck!