2014-04-16

Co.Labs

As Black Box Pings Go Silent, Here's How Data Can Narrow The Search For Malaysia Air Flight MH370

Our evolving data model shows the most likely locations for the plane's remains, according to recent evidence.



We're now entering the sixth week of the search for MH370, and the fifth week of FastCoLab’s MH370 model. There have been two major developments since the last installment of this series.

First, several possible black box pings were detected close to where the first version of the model [1] showed MH370 was most likely to be, further supporting the model's assumptions.

[Image: MH370 Monte Carlo data model, Version 2 start]
[Image: MH370 Monte Carlo data model, Version 1 start]

Second, recent reports that the co-pilot’s cell phone was on in the cockpit suggest the plane may have been flying low enough to be within cellular reception. This could be consistent with a subtle sign of foul play: pilots are well aware that cell phones cause interference (static) with other communications equipment in the cockpit, and they have a strong job-performance incentive to turn their phones off, independent of the regulations they are required to follow. [2]

But now that any black box pings have been silent for a week, it’s time to reevaluate where the plane could be and establish some actual coordinates to search on the seafloor. If you're catching up, here are the first three posts about this project.

[Image: Version 2, 2.5% ping arc error]
[Image: Version 2, 5% ping arc error]

We left off with two model versions that effectively used two speeds: a slower scenario at 5/6 of cruising speed [3] and a more standard scenario at full cruising speed. We explored rules governing how the plane tended to turn on its own, as well as how the ping arc and its error affected where the plane headed each hour.
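Footnote [3]'s arithmetic for the two speed scenarios can be checked in a few lines of Python. This is a minimal sketch: the cruise speed below is a hypothetical round figure for a Boeing 777, not the value used in the model's notebook.

```python
import math

# Hypothetical cruise ground speed in km/h, for illustration only.
CRUISE_KMH = 905.0

def hourly_step_km(speed_fraction: float, cruise_kmh: float = CRUISE_KMH) -> float:
    """Distance flown in one hour at a given fraction of cruising speed."""
    return speed_fraction * cruise_kmh

# The two speed scenarios: full cruising speed and the slower 5/6 cruising speed.
SCENARIOS = {"cruise": 1.0, "five_sixths": 5.0 / 6.0}

# Footnote [3]: 5 hours at cruise covers the same ground as 6 hours at 5/6 cruise,
# so the slower scenario spreads the same distance over one extra hour.
fast_km = 5 * hourly_step_km(SCENARIOS["cruise"])
slow_km = 6 * hourly_step_km(SCENARIOS["five_sixths"])
print(f"{fast_km:.1f} km vs {slow_km:.1f} km")
```

Whatever cruise speed you plug in, the two totals match, so the slower scenario changes how far the plane travels each hour, not where the last ping arc is reached.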

This initially picked a lot of the proverbial low-hanging fruit about the plane's possible paths, but to estimate more precisely where the plane is, we first need to increase the number of simulations. Many Monte Carlo models use hundreds of thousands or even millions of trials. We increase ours by an order of magnitude, to 10,000, and the new figures show the vast range of ocean those trials span. That spread isn't particularly useful for the final likelihood, since a few outliers will not dramatically shift the average plane location (our estimate of its most likely location), but it does show the scale of the area recovery operations face.
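To see why a handful of outliers barely moves the average while still stretching the map, here is a toy Monte Carlo in Python. It is a hedged illustration, not the MH370 model. Each hypothetical trial returns only a final latitude. About 99% of trials cluster near 38 S, and about 1% scatter anywhere in a wide band.

```python
import random
import statistics

def toy_trial(rng: random.Random) -> float:
    """One hypothetical trial's final latitude (toy distribution, not the MH370 model).

    About 1% of trials are aberrant paths scattered across a wide band;
    the rest cluster tightly around 38 S.
    """
    if rng.random() < 0.01:
        return rng.uniform(-60.0, 10.0)  # aberrant path: anywhere in a wide band
    return rng.gauss(-38.0, 1.0)         # typical path

def run_trials(n: int, seed: int = 370) -> list:
    """Run n toy trials with a fixed seed so results are reproducible."""
    rng = random.Random(seed)
    return [toy_trial(rng) for _ in range(n)]

for n in (1_000, 10_000):
    lats = run_trials(n)
    print(f"{n:>6} trials: mean {statistics.mean(lats):.2f}, "
          f"range {min(lats):.1f} to {max(lats):.1f}")
```

The mean stays close to the cluster, while the range of final latitudes is set almost entirely by the rare outliers.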

[Image: Version 1, 2.5% ping arc error]
[Image: Version 1, 5% ping arc error]

There’s a lot of ocean to search, and at this point any reduction in the likely search area is especially valuable. Since we’re making more specific estimates, it’s helpful to describe the results in more detail than before.

In Version 2 of the model, with 5% ping arc error, we do see some aberrations that send the plane every which way (see the middle of the figure), but they're a small minority. For the most part, MH370 winds up between 35 S and 50 S latitude and 75 E and 95 E longitude. Initially heading west and slightly south, MH370 picks up a good number of the possible headings around the circle after the first hour following its disappearance, headings that differ from its initial direction (see the yellow dots in all of the above figures, which mark MH370’s location about one hour after disappearance). Yet the plane never completely reverses course, which is exactly how we imagine a Boeing 777 would behave. And note again that essentially no possible routes lie along the Great Circle path.

With 2.5% error, the range of final MH370 locations is much tighter: 75 E to 90 E longitude and 35 S to 47.5 S latitude. This is much more manageable and, in my opinion, more realistic too, since with a standard deviation of 2.5% of the ping arc radius (whether applied to the last ping or to every ping), the pings already overlap each other.
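The 2.5% and 5% figures are standard deviations relative to the ping arc radius. The Python sketch below shows that perturbation. It assumes Gaussian error and uses a hypothetical 4,000 km arc radius; the real arc radii come from Inmarsat's data and are not reproduced here.

```python
import random
import statistics

def sample_arc_radius(radius_km: float, error_frac: float, rng: random.Random) -> float:
    """Draw a perturbed ping arc radius.

    Assumption: the error is Gaussian with standard deviation error_frac * radius_km.
    """
    return rng.gauss(radius_km, error_frac * radius_km)

RADIUS_KM = 4000.0  # hypothetical arc radius, for illustration only

rng = random.Random(0)
for frac in (0.025, 0.05):
    draws = [sample_arc_radius(RADIUS_KM, frac, rng) for _ in range(10_000)]
    spread = statistics.stdev(draws)
    # Roughly 95% of draws should land within two standard deviations of the arc.
    within = sum(abs(d - RADIUS_KM) <= 2 * frac * RADIUS_KM for d in draws) / len(draws)
    print(f"{frac:.1%} error: stdev {spread:.0f} km, {within:.0%} within 2 sd")
```

Doubling the error doubles the width of the band around each arc, which is why the 5% runs scatter final locations over a larger area.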

Version 1 shows similar results, but farther north. With 5% error, updating only to the last ping each time, the bulk of the final locations fall between 25 S and 40 S latitude and 85 E and 105 E longitude. With just 2.5% error, they fall between 25 S and 40 S latitude and 87.5 E and 102.5 E longitude.

In both model versions, the relatively consistent area in which MH370 ends up means we should be able to bound a search area that is small relative to what has been searched to date.

Calculating The Averages

At 10,000 simulations, we're pushing the limits of what we can effectively visualize on the map. Any more and it would look cluttered, and we still wouldn't be able to see how often the final locations and final ping locations land on top of each other. So rather than make a probability map or heat-color scale, let's get a bit more quantitative (and brave) and report actual latitude and longitude values.

We’ll use the final ping arc locations rather than the last-hour estimates at ¼, ½, ¾ and 59/60 of an hour. Here’s why: if we averaged over the partial final hour, we'd get roughly 30 minutes' worth of distance. Yet there's no reason to believe that is a better value than 10 or 50 minutes, and it would simply leave us systematically off in most cases. So we'll ignore the end dots, use only the final locations, and then say "search within an hour's radius of this spot." That radius should also cover many of the other likely locations within an hour's distance.

The most obvious realization from our results to date is that MH370’s final locations are bimodal in both longitude and latitude (see the histograms in the IPython notebook), with the two modes corresponding to the northern and southern routes MH370 could have taken. We saw from previous model versions that MH370 is more likely to take the southern route, but the histograms show just how much more likely.

It turns out that in both Version 1 and Version 2, MH370 goes south over 80% of the time. This is important to know because on the map the dots pile on top of each other, so we can't see how much more often MH370 goes south, even though we had already recognized the overall trend.

Next, we separate the southern tracks from the northern ones so we can average the most likely location along the more likely southern track; otherwise we'd get a useless mishmash of the two that reflects neither result. We use the histograms to decide exactly where to make the split. Since there are two clear distributions in latitude (north and south) and an equivalent pair in longitude, we truncate each coordinate to its larger distribution and keep only the latitude/longitude pairs that fall in the larger distribution for both latitude and longitude. Both of those larger distributions belong to the southern scenario.
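The split-and-filter step might look like the Python sketch below. Two assumptions apply. First, the split thresholds are read off a histogram by eye, as the text describes. Second, the points are synthetic stand-ins with a hypothetical 80/20 south/north mix and made-up cluster centers, not the model's output.

```python
import math
import random
from collections import Counter

def text_histogram(values, bin_width):
    """Print a crude histogram so the valley between the two modes can be read off by eye."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    for left in sorted(counts):
        print(f"{left:7.1f} {'#' * (counts[left] // 100)}")

def larger_side(values, threshold):
    """Flag each value that lies on the more populated side of threshold."""
    below = sum(v < threshold for v in values)
    keep_below = below >= len(values) - below
    return [(v < threshold) == keep_below for v in values]

def dominant_mode(points, lat_split, lon_split):
    """Keep (lat, lon) pairs that fall in the larger mode for BOTH coordinates."""
    keep_lat = larger_side([lat for lat, _ in points], lat_split)
    keep_lon = larger_side([lon for _, lon in points], lon_split)
    return [p for p, a, b in zip(points, keep_lat, keep_lon) if a and b]

# Synthetic final locations: hypothetical southern and northern clusters.
rng = random.Random(370)
points = [
    (rng.gauss(-38.0, 2.0), rng.gauss(87.0, 2.0)) if rng.random() < 0.8
    else (rng.gauss(30.0, 2.0), rng.gauss(70.0, 2.0))
    for _ in range(10_000)
]

text_histogram([lat for lat, _ in points], bin_width=5.0)
southern = dominant_mode(points, lat_split=0.0, lon_split=78.5)
print(f"kept {len(southern)} of {len(points)} ({len(southern) / len(points):.0%})")
```

Requiring both coordinates to sit in their larger mode means a stray point that is southern in latitude but northern in longitude is dropped rather than allowed to drag the average.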

Now that we've filtered out the northern tracks, which are less likely in aggregate, here are the averages along with their variances (in square degrees), rounded to the nearest hundredth [4]:

Where to Look for MH370

Version 2, 2.5% ping arc error

mean latitude: -38.56

mean longitude: 86.51

median latitude: -38.61

median longitude: 86.44

latitude variance: 0.63

longitude variance: 1.65

[Image: Version 2 of the model – where to search]
[Image: Version 1 of the model – where to search]

Version 2, 5% ping arc error

mean latitude: -37.17

mean longitude: 88.79

median latitude: -37.87

median longitude: 88.57

latitude variance: 42.44

longitude variance: 9.00

Version 1, 2.5% ping arc error

mean latitude: -28.19

mean longitude: 98.71

median latitude: -28.62

median longitude: 98.66

latitude variance: 22.95

longitude variance: 1.93

Version 1, 5% ping arc error

mean latitude: -25.75

mean longitude: 98.78

median latitude: -28.80

median longitude: 98.77

latitude variance: 168.58

longitude variance: 7.87

Because we're searching an area, we needn't worry much about having averaged latitude and longitude separately rather than pairwise (as a single latitude/longitude result). Averaging them separately also lets us check for systematic differences between the final distributions of latitudes and longitudes.
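The per-coordinate statistics above can be computed with Python's standard library, as sketched below. The sketch uses population variance, which is numpy's `var` default. We don't know which convention the original notebook used, although with thousands of trials the difference is negligible. The four demo points are made up.

```python
import statistics

def summarize(points):
    """Per-coordinate summary of (lat, lon) pairs, rounded to hundredths as in footnote [4].

    Latitude and longitude are summarized independently, not pairwise.
    """
    columns = {
        "latitude": [lat for lat, _ in points],
        "longitude": [lon for _, lon in points],
    }
    out = {}
    for name, vals in columns.items():
        out[f"mean {name}"] = round(statistics.mean(vals), 2)
        out[f"median {name}"] = round(statistics.median(vals), 2)
        # Population variance (ddof=0), an assumption that matches numpy's var() default.
        out[f"{name} variance"] = round(statistics.pvariance(vals), 2)
    return out

# Made-up demo points, not model output.
demo = [(-38.0, 86.0), (-39.0, 87.0), (-38.5, 86.5), (-40.0, 88.0)]
for key, value in summarize(demo).items():
    print(f"{key}: {value}")
```

A large gap between a mean and its median, as in the Version 1, 5% latitudes above, is a hint that some off-mode points survived the filtering.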

With the fairly fancy ellipse function we already use, which remains highly accurate at low latitudes, we can easily create an hour’s flying radius without the distortion that drawing an ordinary circle would introduce on our non-spherical Earth.
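Readers who want to draw their own one-hour rings can use the simpler Python stand-in below. It walks a fixed distance along every bearing using the standard great-circle destination formula. Unlike the model's ellipse function, it assumes a spherical Earth with a mean radius of 6,371 km. The 905 km one-hour distance is a hypothetical round figure for a Boeing 777 at cruise. An ellipsoidal geodesic library such as GeographicLib would be more accurate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)
HOUR_KM = 905.0           # hypothetical one-hour flying distance for a 777 at cruise

def destination(lat_deg, lon_deg, bearing_deg, dist_km, radius_km=EARTH_RADIUS_KM):
    """Point reached by travelling dist_km from a start point along an initial bearing."""
    phi1 = math.radians(lat_deg)
    lam1 = math.radians(lon_deg)
    theta = math.radians(bearing_deg)
    delta = dist_km / radius_km  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    lon2 = (math.degrees(lam2) + 540.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return math.degrees(phi2), lon2

def hour_ring(lat_deg, lon_deg, dist_km=HOUR_KM, n_points=72):
    """Polygon of points one hour's flight from the center (every 5 degrees by default)."""
    return [destination(lat_deg, lon_deg, 360.0 * k / n_points, dist_km)
            for k in range(n_points)]

# Ring around the Version 2, 2.5% mean location reported above.
ring = hour_ring(-38.56, 86.51)
lats = [lat for lat, _ in ring]
lons = [lon for _, lon in ring]
print(f"latitude span {max(lats) - min(lats):.1f} deg, "
      f"longitude span {max(lons) - min(lons):.1f} deg")
```

At 38.56 S, the ring spans more degrees of longitude than of latitude. A naive circle drawn directly in latitude/longitude space would miss exactly this distortion.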

The above figures show one-hour-radius circles around the means and medians of Versions 1 and 2 with 2.5% and 5% ping arc error. Depending on which scenario you trust most, the dashed one-hour circles around the mean/median dots show the most likely regions where the plane is. Note that the variance increases for the rougher estimates: Version 1 is rougher than Version 2, and 5% error is rougher than 2.5% error. Fortunately, the circles form a partial Venn diagram, with a fair amount of overlap among all four.

Search the Circles, Starting at the Center

We started by averaging the results to figure out where to begin looking. Displaying those averages narrows down where MH370 might be, but as the plots show, the up-to-one-hour uncertainty at the end means we unfortunately can't assign better likelihoods without more information. A good search strategy, then, would be to pick the most likely scenario, start at the center of its circle, and then move to the sections where the other circles overlap, so that the areas of ocean our model considers more likely to hold MH370 are covered first.
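The Python sketch below turns that strategy into an ordered list of grid cells. The centers are the mean locations reported above. Everything else is an assumption for illustration, not the model's own method: the 905 km one-hour radius, the spherical-Earth haversine distance, the 1-degree grid, and the tie-breaking rule. That rule ranks cells by the number of overlapping circles first, then by distance from the trusted center.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical approximation (assumption)
HOUR_KM = 905.0           # hypothetical one-hour flying distance

# Mean locations (lat, lon) reported above for each scenario.
CENTERS = {
    "V2, 2.5%": (-38.56, 86.51),
    "V2, 5%": (-37.17, 88.79),
    "V1, 2.5%": (-28.19, 98.71),
    "V1, 5%": (-25.75, 98.78),
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on a sphere."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def circles_covering(lat, lon):
    """Number of one-hour circles that contain the point."""
    return sum(haversine_km(lat, lon, c_lat, c_lon) <= HOUR_KM
               for c_lat, c_lon in CENTERS.values())

def search_order(trusted, step_deg=1.0, lat_range=(-50.0, -15.0), lon_range=(70.0, 112.0)):
    """Start at the trusted center, then visit covered cells by overlap count, then distance."""
    t_lat, t_lon = CENTERS[trusted]
    n_lat = int(round((lat_range[1] - lat_range[0]) / step_deg))
    n_lon = int(round((lon_range[1] - lon_range[0]) / step_deg))
    cells = []
    for i in range(n_lat + 1):
        for j in range(n_lon + 1):
            lat = lat_range[0] + i * step_deg
            lon = lon_range[0] + j * step_deg
            hits = circles_covering(lat, lon)
            if hits:
                cells.append((-hits, haversine_km(lat, lon, t_lat, t_lon), lat, lon))
    cells.sort()  # most overlaps first, then nearest to the trusted center
    return [(t_lat, t_lon, circles_covering(t_lat, t_lon))] + [
        (lat, lon, -neg_hits) for neg_hits, _, lat, lon in cells
    ]

plan = search_order("V2, 2.5%")
for lat, lon, hits in plan[:5]:
    print(f"{lat:7.2f} {lon:7.2f}  circles: {hits}")
```

Passing a different key to `search_order` reorders the plan around whichever scenario you trust most.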

Additional Information, Black Box Pings and Search Areas

Australian Government Search Locations as of the first week of April: Link 1, Link 2, Link 3

Daily Mail Infographic of Black Box Ping Locations: Link 1

[1] Completed March 23, and updated March 30, so 1-2 weeks ahead depending on how you count it.

[2] Although this is not really an issue in the main cabin, which is why the FAA in the United States has now largely relaxed the so-called "Kindle Rule."

[3] The distance covered in 5 hours at cruising speed, spread over 6 hours, works out to 5/6 cruising speed; this gives the plane’s location at the last ping arc.

[4] If we were hardcore, we’d track and report significant figures, but I don’t think that’s necessary here. Rounding is still important, because we don’t know the locations nearly as precisely as Python calculates them. Inmarsat et al. didn’t even release error margins, and they are a legitimate engineering and technology company.

[Image: Flickr user CamponeZ]





2 Comments

  • It would be quite a service to your readers if you would take the time to fact-check your article, especially on such an important development in the story as the pingers going silent.

    'But now that any black box pings have been silent for a week.'

    I am constantly amazed that the media in general keeps misinforming the world in every way possible on this story. Anyway, the maker basically guarantees one month of 160 dB pinger performance on the required frequency.

    Beyond that, however, they will generally perform at 157 dB for another month, then drop further to 150 dB, with a possible 1 kHz drift in the signal, for another month.

    So while the device’s peak performance is over, it still transmits the ping, just at a much lower power level. I am sure you can dig up this information.

  • Thanks for the comment; I was not aware of this technical point about the black box. But it is a fact that we haven't heard anything that could be the black box for over a week now. Whether or not the pingers can still be detected was outside the scope of that mention. The point was only that, for the time being, without any recent potential pings to go on, this model gives a good inference on where to focus the search.