What an interesting problem, frenchX!

I would guess that your data both under-estimates and over-estimates the variance of outcomes due to scope modulation and cancellation effects. That is, many projects show lower overrun levels than they actually incurred because the company cut the scope of the system mid-project, while some projects show horrible overruns because of scope creep. The data also under-estimates the cost-overrun risk due to survivor effects -- your data lacks all the projects with 5000% overruns that were cancelled outright -- which suggests that a coherent risk measure must include cancellation risk. (The first sketch below illustrates how dropping cancelled projects biases the estimate.)

The other issue is that most of the variance in outcomes probably has two causes. The first is the intrinsic structure of technical feasibility risks in the project: how many "tough engineering/scientific problems" are embedded in it, and how they are coupled to each other. The more feasibility risks, the worse the overrun, since the project's overrun is roughly the maximum overrun across those risks. And the greater the coupling between the risks, the worse the overrun, because a change in one part of the system forces a redesign of another part. If a project calls for several different new technologies, all tightly coupled to each other, then the likelihood of significant overruns would be quite high. (The second sketch below makes this max-over-coupled-risks intuition concrete.)

The second cause is the management culture of the company and whether it promotes transparency and honesty with regard to go/no-go and scope decisions. To the extent that project teams can and do hide problems with the project, maintain over-optimistic forecasts of the completion date or cost (i.e., a perpetual "80% complete" state), or tend to say "yes" to every scope increase, the chance of overruns is much higher. This issue would cause heteroskedasticity in your data if the data come from a number of firms significantly smaller than the sample size (e.g., data on 10 to 30 projects from 3 firms really has an effective sample size closer to 3).

But to answer your original question, I'd think about doing a meta-analysis of the literature, starting with this fount of wisdom. A quick glance shows that many of these papers have decent sample sizes. That might give you a good idea of the family of distributions to use; you can then estimate the parameters of that distribution from your data, including the meta-risk that the distribution of overruns may be worse than the maximum-likelihood fit suggests because of your small sample size. (The third sketch below shows one way to quantify that parameter uncertainty.)
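Here is a minimal simulation of the survivor-effect point, with entirely made-up parameters (lognormal overruns, a cancellation threshold at a 400% overrun), showing how a dataset of completed projects hides the worst outcomes:

```python
# Survivor-bias sketch: projects whose overrun would cross a cancellation
# threshold are killed and never appear in the "completed projects" data.
# All numbers here are illustrative assumptions, not calibrated values.
import numpy as np

rng = np.random.default_rng(0)

# Assume true overrun ratios are lognormal: 1.0 = on budget, 2.0 = 100% over.
true_overruns = rng.lognormal(mean=0.3, sigma=0.8, size=100_000)

# Assume firms cancel anything heading past a ratio of 5.0 (400% overrun).
observed = true_overruns[true_overruns < 5.0]

print(f"true mean overrun ratio:     {true_overruns.mean():.2f}")
print(f"observed mean overrun ratio: {observed.mean():.2f}")
print(f"true 95th percentile:        {np.percentile(true_overruns, 95):.2f}")
print(f"observed 95th percentile:    {np.percentile(observed, 95):.2f}")
print(f"fraction cancelled (unseen): {(true_overruns >= 5.0).mean():.1%}")
```

The gap between the "true" and "observed" rows is exactly the risk your dataset cannot see, which is why cancellation has to enter the risk measure.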
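To make the feasibility-risk argument concrete, here is a toy Monte Carlo. It assumes each tough problem contributes a lognormal overrun, the project overrun is the worst of them, and coupling adds a rework term proportional to the other risks' excess overruns. This is a deliberately crude model of my verbal argument, not an established result:

```python
# Toy model: project overrun = worst single-risk overrun plus rework that
# coupled components impose on each other. Parameters are illustrative.
import numpy as np

def overrun_q95(n_risks, coupling, sigma=0.5, n_sims=200_000, seed=0):
    rng = np.random.default_rng(seed)
    base = rng.lognormal(mean=0.0, sigma=sigma, size=(n_sims, n_risks))
    worst = base.max(axis=1)                 # worst single risk dominates
    excess = np.clip(base - 1.0, 0.0, None)  # each risk's overrun beyond budget
    # Rework: every other risk's excess spills over, scaled by coupling strength.
    rework = coupling * (excess.sum(axis=1) - excess.max(axis=1))
    return np.percentile(worst + rework, 95)

for n in (1, 3, 6):
    for c in (0.0, 0.5):
        print(f"risks={n}, coupling={c}: 95th-pct overrun ratio = "
              f"{overrun_q95(n, c):.2f}")
```

Even this crude version shows both effects: the 95th percentile rises with the number of risks (the max of more draws is larger) and rises again with the coupling strength.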
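For the fitting step, a bootstrap on top of the maximum-likelihood fit is one simple way to expose that meta-risk. A sketch, assuming a lognormal family and a made-up 15-project sample, using scipy's `lognorm.fit`:

```python
# Fit a candidate overrun distribution to a small sample, then bootstrap the
# fit to see how uncertain the tail estimate really is. The data are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical overrun ratios for 15 completed projects (1.0 = on budget).
data = np.array([1.05, 1.1, 1.2, 1.25, 1.3, 1.4, 1.5, 1.6,
                 1.8, 2.0, 2.2, 2.5, 3.0, 4.0, 6.5])

# Maximum-likelihood lognormal fit, with the location pinned at zero.
shape, loc, scale = stats.lognorm.fit(data, floc=0)
print(f"MLE 95th-pct overrun ratio: "
      f"{stats.lognorm.ppf(0.95, shape, loc, scale):.2f}")

# Bootstrap: refit on resampled data and collect the implied 95th percentile.
q95 = []
for _ in range(2000):
    resample = rng.choice(data, size=len(data), replace=True)
    s, lc, sc = stats.lognorm.fit(resample, floc=0)
    q95.append(stats.lognorm.ppf(0.95, s, lc, sc))
print(f"bootstrap 90% interval for that percentile: "
      f"[{np.percentile(q95, 5):.2f}, {np.percentile(q95, 95):.2f}]")
```

With only 15 points the interval will be wide, which is the quantitative version of "the true distribution may be worse than the MLE suggests."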