Help with code design in C++

qck · February 22nd, 2007, 7:25 pm

My main difficulty is that I can't find a good way to "structure" the software.

The software is a library for simulating stochastic differential equations with various numerical schemes. The implementation has to be efficient (we are going to run benchmark comparisons on the schemes), hence the use of templates to avoid function pointers. Don't be too harsh on me, as this is the first "real life" problem I'm trying to solve in C++.

We want to be able to write at the console:

    ./nsde.exe -m ou -mp 2 -s euler -sp 0 10 0.1 -f maximum_bb -n 1000

where:

    -m   model (ou: dXt = -c Xt dt + dBt)
    -mp  model parameters (c = 2)
    -s   scheme (here euler)
    -sp  scheme parameters (here starting point x0 = 0, ending time T = 10, discretization step delta = 0.1)
    -f   functional used (here the maximum of the path, computed using Brownian bridge interpolation)
    -n   number of simulations (here 1000)

The program should take the ou model, simulate the path 1000 times with the Euler scheme, compute the value of the functional for each path, and store the results in a file.

My idea is to define a class FOR EACH (parametric) diffusion model, like:

    class ou {
        double c;
    public:
        double drift(double x) { return -c*x; }
        double diffu(double x) { return 1; }  // diffusion coefficient
        .....
    };

(I'm leaving out things like constructors for simplicity.)

and a class for the path of the process:

    class path {
    public:
        velarray<double> values(maxsize);
        velarray<double> times(maxsize);
        static long maxsize;
        long size;            // actual length of the path
        double x0, delta, T;  // starting point, delta of discretization, ending time
    };

and call the numerical scheme with:

    typedef double (D::*pmfD)(double);
    template<class D, pmfD drift, pmfD diffu>
    void euler(path& thepath, const D& thed, double x0, double delta, double T);

I have to decide whether it is a good idea to use this class path, since it restricts the reusability of the schemes. The point is that some functionals of the path require a path with constant spacing, while some schemes return values on this constant grid augmented with extra values in between. So I could define another class pathconst and define the conversion via copy assignment.

A functional is, for example:

    double maximum_bb(path thepath);

I will also include a class named custom that holds pointers to functions, and a template specialization for it, so that processes defined at run time can be handled as well.

There are some "problems" with this approach:

A) As you add more schemes, you have to add more functions and precomputed coefficients to each diffusion process class (like ou). One scheme, for example, uses the function drift' + drift^2 (and for efficiency reasons I will work this function out by hand and plug in the simplified form directly, together with precomputed coefficients). By precomputed coefficients I mean, for example, something of the form c1 = exp(c^2 + c/3). I'm just searching for a more elegant approach (I fear the class will become a burden). Should I add friend classes (with the precomputed coefficients as private data) to better organize the class? For example, I could define a class

    class ou_milstein {
        double func1(double x) { return ... }
        .....
    };

and make it a friend of class ou. A pro is that some numerical schemes need an initialization / optimization procedure, which could be performed when the object of type ou_milstein is created. The cons are:

1. I have to put all the functions needed by a given scheme in its own class (a friend of ou), like ou_milstein for Milstein. That would result in a lot of functions (like the drift) being duplicated.

2. OR I could create just "some" classes for groups of numerical schemes that need the same functions. But then I would need to pass more than one class to each numerical scheme, so the calling syntax of the scheme risks becoming quite complicated.

OR I avoid all this and put everything together in the class ou.

B) There are schemes that apply only to a particular diffusion process (like ou). In this situation my idea is to define these schemes as friends of the class, so that the functions they require do not have to go in the public part of class ou (again, to avoid the burden). For the other schemes this is not needed: they only have to access the functions defined for them (see point A) in the public part of the ou class.

C) How do I easily write the interface part of the test program for the library, the part that reads

    -m ou -mp 2 -s euler -sp 0 10 0.1 -f maximum_bb -n 1000

and performs the simulation, while avoiding hundreds of switch-cases? At the end of Mark Joshi's book I see there is "the factory pattern" for similar situations, but I have not studied it (yet). Do you think it is appropriate?

Do you have better ideas on how I could implement this library? Do you already see problems I can't see? I would like to have something that is efficient, but reusable and extendable if needed (not an easy problem to solve, I know). I saw someone mention "the visitor pattern"; could it help me out?

Again, thank you VERY much in advance for your help!

Best Regards,
StephQ
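A minimal sketch of the template-based approach described in this post, assuming a model type that exposes drift(x) and diffu(x) as in the ou class above. The path struct, the euler signature and the use of std::vector are illustrative choices, not the poster's actual code; passing the model as a template parameter (instead of pointers to member functions) is one way to let the compiler inline the calls.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical model class: Ornstein-Uhlenbeck, dX_t = -c X_t dt + dB_t.
class ou {
    double c_;
public:
    explicit ou(double c) : c_(c) {}
    double drift(double x) const { return -c_ * x; }
    double diffu(double) const { return 1.0; }  // diffusion coefficient
};

// Hypothetical path container: times[i] and values[i] belong together.
struct path {
    std::vector<double> times;
    std::vector<double> values;
};

// Euler-Maruyama scheme. Model is any type with drift(x) and diffu(x);
// the calls are resolved at compile time, so they can be inlined.
template <class Model, class Rng>
path euler(const Model& m, double x0, double T, double delta, Rng& rng) {
    assert(delta > 0.0 && T >= 0.0);
    std::normal_distribution<double> gauss(0.0, 1.0);
    const std::size_t n = static_cast<std::size_t>(std::llround(T / delta));
    const double sqrt_delta = std::sqrt(delta);

    path p;
    p.times.reserve(n + 1);
    p.values.reserve(n + 1);

    double x = x0;
    p.times.push_back(0.0);
    p.values.push_back(x);
    for (std::size_t i = 1; i <= n; ++i) {
        // X_{k+1} = X_k + drift(X_k) * delta + diffu(X_k) * sqrt(delta) * Z
        x += m.drift(x) * delta + m.diffu(x) * sqrt_delta * gauss(rng);
        p.times.push_back(static_cast<double>(i) * delta);
        p.values.push_back(x);
    }
    return p;
}
```

With the command-line values from the post, a call would look like `std::mt19937 rng(42); path p = euler(ou(2.0), 0.0, 10.0, 0.1, rng);`, which produces 101 points on a constant grid.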

mj · February 22nd, 2007, 10:58 pm

The factory pattern is designed for polymorphic objects and virtual functions. It is very hard to do this kind of thing at run time with generic programming (this is why I prefer polymorphism to templates).

Why use the command line? Use an Excel plug-in, e.g. xlw.sourceforge.net

When I have tried to do SDEs in a generic fashion in the past, I came to the conclusion that the downsides outweigh the benefits: there are too many fiddly special cases to make it worthwhile.

Cuchulainn · February 23rd, 2007, 6:28 am

Quote: My main difficulty is that I can't find a good way to "structure" the software.

I know the problem.

Quote: The software consists in a library for the simulation of stochastic differential equations according to some numerical schemes. The implementation has to be efficient (we are going to do benchmark comparisons on them) so the use of templates to avoid function pointers.

I have built an SDE framework using templates, function classes AND OO polymorphism in conjunction with the main GOF patterns. I implement the FDM schemes using Visitor. If C++ is new to you, I would advise starting with a 1-factor, linear SDE and progressing from there. It takes time to learn C++, so take it step by step.

Quote: Someone I saw mentioned "the visitor pattern", could it help me out?

Visitor is a really useful pattern. In this context I use it to model various FDM schemes for various SDE classes.

Quote: avoid function pointers.

Why?
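For readers who have not met the pattern, here is a minimal sketch of Visitor applied to schemes and SDE classes. It is not the framework described in this post: the class names (sde, ou_sde, gbm_sde, sde_visitor, euler_step) and the second model are assumptions made for illustration. Each scheme is a visitor with one visit overload per SDE class, so a new scheme can be added without touching the SDE classes.

```cpp
#include <cassert>

class ou_sde;   // dX = -c X dt + dW            (hypothetical)
class gbm_sde;  // dX = mu X dt + sigma X dW    (hypothetical)

// A scheme is a visitor: one overload per concrete SDE class.
class sde_visitor {
public:
    virtual ~sde_visitor() = default;
    virtual void visit(const ou_sde&) = 0;
    virtual void visit(const gbm_sde&) = 0;
};

class sde {
public:
    virtual ~sde() = default;
    virtual void accept(sde_visitor& v) const = 0;
};

class ou_sde : public sde {
public:
    double c;
    explicit ou_sde(double c_) : c(c_) {}
    void accept(sde_visitor& v) const override { v.visit(*this); }
};

class gbm_sde : public sde {
public:
    double mu, sigma;
    gbm_sde(double mu_, double sigma_) : mu(mu_), sigma(sigma_) {}
    void accept(sde_visitor& v) const override { v.visit(*this); }
};

// One Euler step expressed as a visitor. Inputs: current state x,
// time step dt, Brownian increment dw. Output: result.
class euler_step : public sde_visitor {
public:
    double x, dt, dw;
    double result = 0.0;
    euler_step(double x_, double dt_, double dw_) : x(x_), dt(dt_), dw(dw_) {
        assert(dt_ > 0.0);
    }
    void visit(const ou_sde& s) override { result = x - s.c * x * dt + dw; }
    void visit(const gbm_sde& s) override {
        result = x + s.mu * x * dt + s.sigma * x * dw;
    }
};
```

Usage would be along the lines of `ou_sde m(2.0); euler_step step(1.0, 0.1, dw); m.accept(step);`, after which `step.result` holds the next state. The double dispatch costs two virtual calls per step, which matters if the step is the benchmarked inner loop.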

Cuchulainn · February 23rd, 2007, 6:38 am

Quote (originally posted by mj): The factory pattern is designed for polymorphic objects and virtual functions, it is very hard to do this stuff at run time for generic programming (this is why I prefer polymorphism to templates). When I have tried to do SDEs in a generic fashion in the past, I have come to the conclusion that the downsides outweigh the benefits with too many fiddly special cases to make it worthwhile.

I have a hierarchy of templated SDEs that are implemented using classes for vector-valued and matrix-valued functions. Thus, I combine the OO and generic paradigms. It's very flexible. For example, you need run-time structures for the paths (OO approach) but compile-time ones for SDEs (generic). The design is very close to the mathematical description of an SDE, and all the Gamma patterns fall into place as well. Without the combination templates/OO the solution would not even be wrong. But now it just feels right.

Quote: I prefer polymorphism to templates

They are two sides of the one coin; they serve different purposes. Comparing them in this way is one possibility, but there are other scenarios. Of course, (subtype) polymorphism has a possible run-time performance penalty, while templates are most efficient in this regard. So I do not choose; I use both paradigms, and it's not a problem.

qck · February 23rd, 2007, 10:21 am

Quote: If C++ is new I would advise to start on a 1-factor, linear SDE and progress from there. It takes time to learn C++ so take it step by step.

At the moment I'm just working with 1-dimensional SDEs, with arbitrary drift and diffusion functions. I will not work with the multidimensional case.

Quote: Visitor is a really useful pattern. In this context I use it to model various FDM schemes for various SDE classes.

Could you talk about this a little more? Is your code (or part of it) available?

Quote: avoid function pointers. Why?

My understanding is that the template approach helps the compiler inline functions and so avoid function-call overhead, while calls through function pointers are hard to inline. At least this is the explanation in the book I cited above and in Mark Joshi's book. This is (I think) surely the case if I compile some code (this library) and specify the function externally. Tests with the qsort algorithm with an arbitrary comparison function (C-style approach vs. template approach) show that there is indeed a difference.
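The qsort comparison mentioned above can be sketched as follows. The function names are made up for illustration. In the C-style version the comparator reaches std::qsort as a function pointer; in the template version the comparator type is part of the std::sort instantiation, so its call operator is visible to the compiler. Whether this gives a measurable difference depends on the compiler and optimization settings, so it should be timed rather than assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// C-style comparator: std::qsort calls it through a function pointer.
int compare_double(const void* a, const void* b) {
    const double x = *static_cast<const double*>(a);
    const double y = *static_cast<const double*>(b);
    return (x > y) - (x < y);  // -1, 0 or 1
}

// Function object: its type is a template argument of std::sort,
// so operator() can be inlined into the sorting loop.
struct less_double {
    bool operator()(double x, double y) const { return x < y; }
};

void sort_c_style(std::vector<double>& v) {
    std::qsort(v.data(), v.size(), sizeof(double), compare_double);
    assert(std::is_sorted(v.begin(), v.end()));  // debug-build check
}

void sort_template_style(std::vector<double>& v) {
    std::sort(v.begin(), v.end(), less_double());
    assert(std::is_sorted(v.begin(), v.end()));  // debug-build check
}
```

Both functions produce the same ordering; a benchmark would call each on copies of the same large vector and compare wall-clock times.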

qck · February 23rd, 2007, 10:27 am

Quote: why use the command line? use an excel plug in eg xlw.sourceforge.net

Well, I hope to use this library in the future as a building block for more interesting things. But at the moment this software will be used to benchmark schemes for SDEs. The command line is needed because I will run batches of sequential simulations to evaluate different scenarios.

Because of this, and because I'm quite a newbie in C++, I prefer to avoid for the moment more advanced techniques that could result in performance penalties if you fail to master them correctly. The implementation problem remains the same anyway: how to avoid a nightmare of switch-cases in the selection part of the code (where the correct combination of model / scheme / functional gets selected). There is no performance problem here, as this part will not be benchmarked at all.
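One common way to avoid the switch-cases in the selection code is a registry that maps the name given on the command line to a factory function. The sketch below assumes a polymorphic model base class; model, ou_model, model_registry and default_models are hypothetical names, and only the "ou" entry from the example command line is registered.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical polymorphic interface for a diffusion model.
struct model {
    virtual ~model() = default;
    virtual double drift(double x) const = 0;
    virtual double diffu(double x) const = 0;
};

struct ou_model : model {
    double c;
    explicit ou_model(double c_) : c(c_) {}
    double drift(double x) const override { return -c * x; }
    double diffu(double) const override { return 1.0; }
};

using params = std::vector<double>;  // values following -mp
using model_factory = std::function<std::unique_ptr<model>(const params&)>;

// Maps the string after -m to a function that builds the model.
class model_registry {
    std::map<std::string, model_factory> makers_;
public:
    void add(const std::string& name, model_factory f) {
        assert(static_cast<bool>(f));  // factory must be callable
        makers_[name] = std::move(f);
    }
    std::unique_ptr<model> create(const std::string& name,
                                  const params& p) const {
        auto it = makers_.find(name);
        if (it == makers_.end())
            throw std::invalid_argument("unknown model: " + name);
        return it->second(p);
    }
};

// Adding a new model is one more add() call, not one more case label.
model_registry default_models() {
    model_registry r;
    r.add("ou", [](const params& p) -> std::unique_ptr<model> {
        if (p.size() != 1)
            throw std::invalid_argument("ou expects one parameter");
        return std::unique_ptr<model>(new ou_model(p[0]));
    });
    return r;
}
```

For `-m ou -mp 2` the dispatch becomes `default_models().create("ou", {2.0})`. The same idea applies to schemes and functionals. If virtual calls in the inner loop are a concern, the registered function could instead run a whole templated simulation, so that the run-time lookup happens once per run.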

Cuchulainn · February 23rd, 2007, 11:05 am

Quote: My understanding is that the template approach helps the compiler to inline functions to avoid function calling overhead, while function pointers are hard to inline

This statement is partially correct, partially incorrect. Function pointers _are_ efficient; in fact this is the way assembler code works (relative jumps). They are not a problem. Performance bottlenecks are caused by other things, e.g. bad design or incorrect data structures.

Quote: I prefer to avoid for the moment the use of more advanced techniques that could result in performance penalties if you fail to master them correctly.

I would not worry, yet! Get it working, then get it right, then get it optimised (in that order).

Cuchulainn · February 23rd, 2007, 11:16 am

Quote: "Visitor is a really useful pattern. In this context I use it to model various FDM schemes for various SDE classes." Could you talk about this a little more? Is your code (or part of it) available?

It will be published officially this year in the MC/C++ book (Wiley). I can give some tips in the meantime.

qck · February 25th, 2007, 11:00 am

Quote: I would not worry, yet! Get it working, then get it right, then get it optimised (in that order)

Well, I already have working code for (almost) everything in C. The problem is that it got quite messy. Now the problem is to "get it right" in the sense of C++ "code design". I will post more details soon about the way I'm implementing the library, so that you can help me with some useful tips. I would appreciate that.

Quote: This statement is partially correct, partially incorrect. Function pointers _are_ efficient, in fact this is the way assembler code works (relative jumps).

Do you have something about it in your 2006 book?

Best Regards,
StephQ

twofish · February 25th, 2007, 3:35 pm

The main thing that you might want to consider is to define an abstract base class diffusion_model that defines the basic interfaces, and then have each specific model be a subclass of that main class. If you use protected member functions, this lets you define new models without having to use friend functions. (Using friend functions is almost always a warning sign that something is wrong.)

Also, don't worry too much about performance hits. If you code for readability and maintainability, performance usually follows. You might take a look at QuantLib or any other large C++ application. The way you really learn to write code is by learning how to read code.
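A minimal sketch of this suggestion, assuming the drift/diffu interface from the first post. Only the name diffusion_model comes from the post; rate(), ou_exact and conditional_mean are hypothetical. The protected accessor lets an OU-specific extension reach the parameter c through inheritance, with no friend declaration.

```cpp
#include <cassert>
#include <cmath>

// Abstract base class: the interface every generic scheme relies on.
class diffusion_model {
public:
    virtual ~diffusion_model() = default;
    virtual double drift(double x) const = 0;
    virtual double diffu(double x) const = 0;
};

// Ornstein-Uhlenbeck model, dX_t = -c X_t dt + dB_t.
class ou : public diffusion_model {
public:
    explicit ou(double c) : c_(c) {}
    double drift(double x) const override { return -c_ * x; }
    double diffu(double) const override { return 1.0; }
protected:
    double rate() const { return c_; }  // visible to subclasses only
private:
    double c_;
};

// OU-specific extension (hypothetical): quantities that only make sense
// for this model live in a subclass and use the protected accessor.
class ou_exact : public ou {
public:
    using ou::ou;  // inherit the constructor
    // E[X_{t+dt} | X_t = x] = x * exp(-c * dt) for the OU process.
    double conditional_mean(double x, double dt) const {
        assert(dt >= 0.0);
        return x * std::exp(-rate() * dt);
    }
};
```

Generic schemes only see diffusion_model, while a scheme that applies only to the OU process can take an ou_exact and use the extra members.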

Cuchulainn · March 3rd, 2007, 10:22 am

Steph,

Quote: Well, I already have working code for (almost) everything in C. The problem is that it got quite messy.

It is good to have working code. Then you know the requirements. The (only?) solution now is to do a complete redesign using the nice algorithms from the C solution. I would not try to stick the pattern onto the C code.

Quote: "This statement is partially correct, partially incorrect. Function pointers _are_ efficient, in fact this is the way assembler code works (relative jumps)." Do you have something about it in your 2006 book?

Yes, please have a look at the index (p. 415) and look up "function pointers"; there are about 6 subtopics on this.

Quote: Also don't worry too much about performance hits. If you code for readability and maintainability, performance usually follows.

I agree; I have never seen a well-designed app that did not perform well, and if it did not, I could replace one strategy by another one. And watch out for the Observer pattern.

Quote: The way you really learn to write code is learning how to read code.

Caveat: if the code uses DP and the design is _not_ documented in UML 2.0, then it will take a long time to understand the design intent behind the code. What developers end up doing is reverse engineering the code; this wastes time...