Serving the Quantitative Finance Community

 
User avatar
mensa0
Posts: 0
Joined: January 20th, 2004, 8:56 am

The Matchstick Problem

September 25th, 2004, 7:39 am

I see this thread has been pretty dormant since I last visited. Too bad, 'cause I think the answer has to do with the simulations, not the mathematics in the earlier posts. I assume that where the matchstick is broken is uniformly distributed. Only in "real life" would we expect more breaks near the middle; in this thought experiment, the breaks are evenly distributed along the matchstick's length.

Now run a simulation of, say, N = 100,000 breaks as many of you have done, but assume you only had an 8-bit processor. The maximum ratio you could have would be .9999999/.0000001 = 9,999,999, which would occur with probability 1/N. With a "nine-bit" processor, your maximum ratio would be ten times as large, but with the same probability of occurrence in the simulation. How about what you're using? 32-bit? 64-bit?

I believe the simulations that produce results in the mid-twenties are a result of the maximum number of digits in the calculation of the ratio. A 64-bit processor will give you a larger average of the ratios than a 32-bit processor, etc., since larger ratios (observations) are possible. The mid-twenties value is the average value for your (limited) discrete approach to simulating a continuous variable.

I stand by my original post: the mean is unbounded (infinite).

Any thoughts?
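One way to poke at this numerically -- a quick sketch of my own in Python, not tied to any particular processor width -- is to watch the sample average of long/short drift upward as N grows under uniform breaks, consistent with an unbounded population mean:

```python
import random

def mean_ratio(n, seed=0):
    """Average of (long piece)/(short piece) over n uniform break points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = rng.random()                      # break point, uniform on (0, 1)
        total += max(u, 1 - u) / min(u, 1 - u)
    return total / n

for n in (10**3, 10**4, 10**5):
    print(n, round(mean_ratio(n), 2))
```

The average never settles down: each tenfold increase in N typically adds a few more units to the result, because ever-smaller short pieces (and hence ever-larger ratios) become reachable.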
Last edited by mensa0 on September 24th, 2004, 10:00 pm, edited 1 time in total.
 
User avatar
Athletico
Posts: 14
Joined: January 7th, 2002, 4:17 pm

The Matchstick Problem

September 26th, 2004, 5:44 pm

This is a brilliant problem -- I suspect zerdna's analysis is probably closest to what the original author had in mind when he invented it. If the problem simply asked for a population mean, the result would be trivial: 2\int_0^{0.5}\frac{1-x}{x}\,f(x)\,dx, where f(x)\,dx is the measure of integration, as others have noted. The simplest prior, f(x) = const, makes the integral diverge. But by specifying N and asking for the "average" ratio, the author is practically asking how fast the integral diverges. This is solved by zerdna's approximation of the mean of the sample averages, based on order statistics.

It's analogous to the problem where a price is driven by some process with infinite quadratic variation. If we were to compute the volatility for this process in the standard way, sampling N close-to-close log-price changes, we'd get a finite, normal-looking number. Naturally we'd like to know as much about the model-dependent sampling distribution as possible.

By the way, the physics of the situation suggests that a uniform distribution assumption is very wrong, as SciLabGuru pointed out. AlexandreC's quadratic seems like a very reasonable ansatz, but of course that makes the integral converge and the problem ceases to be interesting mathematically.
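For what it's worth, here is a back-of-the-envelope version of the order-statistics argument (my own sketch, assuming a uniform break point). The short piece x is uniform on (0, 1/2) with density 2, and among N draws its expected minimum is roughly 1/(2N); truncating the divergent integral there gives an estimate of the sample average \overline{R}_N of the ratio:

```latex
% Truncate the divergent integral at the expected smallest short piece
% among N draws, x_min ~ 1/(2N):
\begin{aligned}
\mathbb{E}\!\left[\overline{R}_N\right]
  &\approx 2\int_{1/(2N)}^{1/2} \frac{1-x}{x}\,dx
   = 2\Bigl[\ln x - x\Bigr]_{1/(2N)}^{1/2} \\
  &= 2\ln N - 1 + \tfrac{1}{N}.
\end{aligned}
```

For N = 10^5 this is 2\ln(10^5) - 1 \approx 22.0 -- consistent with the mid-twenties values reported by the simulations in this thread, and it shows the "average" growing only logarithmically in N.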
 
User avatar
smg
Posts: 1
Joined: August 26th, 2004, 9:20 pm

The Matchstick Problem

September 27th, 2004, 12:07 am

Somewhat surprising, the confusion -- leading even to insult -- caused by this simple problem. The statement of the problem allows for assumptions about the distribution from which the random variable (the "breaking point" of the match, numerically a value between 0 and 1) is drawn, and consequently about the distribution function for the smallest (or longest) piece. A quick reading through this thread suggests the latter is the main cause of the "controversy". There should be none, really.

Here are three quick-'n-dirty estimates for the answer to the problem:

I. Naive discrete approach. Assumption: all breaking points are equally spaced.

II. Naive continuous approach. Assumption: all breaking points are equally likely.

III. Monte Carlo simulation. A few seconds of coding in Mathematica give you a toy model for the experiment (see attached zipped .nb file). I don't know what generator is behind Random[], but found it interesting to see the final answer ranging from 19.793 to 38.451 over just 10 runs, with 8 of them giving values above 22. As an auxiliary check on Random[], I did a quick Monte Carlo simulation to estimate Pi, and got a fractional error always less than 0.1%, using the same 10^5 points, over 10 runs. It seems that Random[] generates quite a few points closer to 1 and 0 than the 10^(-5) threshold considered above.

Some remarks on real-world attempts:

- The assumption of a uniform PDF is the most natural from a purely mathematical viewpoint: there is absolutely no reason for certain breaking points to be more likely than others. If one takes the problem as pertaining to a real-life situation where one breaks the matches with one's hands, then a constant PDF is a very poor choice for several obvious reasons: matches tend to break close to their midpoint, and arbitrarily small break points are impossible to achieve.
Note, however, that most people are either right-handed or left-handed (most are RH, of course, but this is irrelevant), which means that when one is trying to break a match "in half", one of the hands will be applying more force than the other, and the most likely point for the match to break is not its midpoint; it is close to it, but it's not it. Neglecting possible differences between RH and LH people (and the tiny fraction of people who are ambidextrous and would, arguably, be more likely to approach the half cut on average), I think the PDF for the break point would be some leptokurtic curve with some skewness (left or right, depending on how one marks the break point in the (0,1) interval).

- In a real-life situation, it should be clear that the threshold for "closeness" of the breaking point to either end of the match is likely of the same order of magnitude as the match length (i.e., larger than 0.1, for a match of unit size), for a regular-sized match (about 1-2 inches). Obviously, the longer the match, the easier it is to break it at a fractionally smaller distance from one of its ends.

- One other issue one must worry about in a real-life situation is the structural consistency of matches. Even assuming a single brand, it's obvious that matches are not perfect objects (nothing is, by the way), and if 10^5 same-brand matches were subjected to a perfectly symmetrical breaking procedure (say, by an infinitely accurate ambidextrous robot), they would not all break at the midpoint. The observed PDF for this process would be different from that due to single-handedness bias. I suspect a leptokurtic curve with a "very small" skewness. In the real world, of course, the observed PDF for a large number of matches (being broken up by people) would be a functional of the PDF for imperfect matches in perfect hands and the PDF for perfect matches in imperfect hands. I still think the observed PDF would have high kurtosis and a "noticeable" skewness.
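As a quick numerical aside (my own sketch, not the attached notebook): any break-point density that vanishes fast enough at the ends makes the mean ratio finite. Taking Beta(2, 2), i.e. the quadratic density 6x(1-x), as a hypothetical stand-in for the quadratic ansatz mentioned earlier in the thread, the population mean works out to exactly 7/2, and a seeded Monte Carlo run lands close to it:

```python
import random

def mean_ratio_beta(n, seed=1):
    """Average long/short when the break point has density 6x(1-x),
    i.e. Beta(2, 2) -- a stand-in for the quadratic ansatz (exact mean: 3.5)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = rng.betavariate(2, 2)   # break point, peaked at the midpoint
        total += max(u, 1 - u) / min(u, 1 - u)
    return total / n

print(round(mean_ratio_beta(10**5), 3))
```

Convergence is still slow (the second moment of the ratio is infinite even here), but the estimate no longer drifts upward with N the way it does under a uniform PDF.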
This is getting too long... I had my 40 minutes of forum "fun" (20 of those 40 were spent getting the LaTeX bits to be accepted...) and shall leave the rest to others.
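For completeness, estimate I. above (equally spaced breaking points) can be sketched in a few lines of Python (my own sketch, not the attached Mathematica notebook). With break points on a grid of M equal steps, the average ratio depends on the grid spacing and grows roughly like 2 ln M:

```python
def discrete_mean(m):
    """Estimate I. (naive discrete): break points at k/m, k = 1..m-1,
    all equally weighted; returns the average of long/short."""
    total = 0.0
    for k in range(1, m):
        u = k / m
        total += max(u, 1 - u) / min(u, 1 - u)
    return total / (m - 1)

for m in (100, 10**4, 10**5):
    print(m, round(discrete_mean(m), 2))
```

With M = 10^5 steps this lands in the low twenties, again matching the Monte Carlo runs -- and, like them, it keeps growing as the grid is refined.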
Attachments
matchstick.zip
(1.41 KiB) Downloaded 53 times