Monday, April 27, 2015

Mapping a World Route

Introduction

In a previous post, I found short and long routes through all fifty USA states using simulation in R.  However, I did not use R's visual and mapping capabilities to display the results.  In this post, I use these capabilities and also display a "short" route through each of the world's country capitals.

Visualizing the USA State Route

After trying out various mapping packages in R, I settled on using RgoogleMaps.  It is easy to use and also has great maps provided by Google.  Here is an automatically generated map of the shortest route I found that starts in Washington and stops in each of the 50 state capitals only once.


The map is cleaner, more visually appealing, and also easier to create since I do not have to manually draw lines.  This becomes especially important when increasing the number of plots and lines, as in the next scenario.

Visualizing a World Route

I created a list of world country capitals along with their associated latitudes and longitudes, largely using the list found here.  I also added a few disputed territories, Greenland, Taiwan, and some other smaller territories to get a list of 197 stopping points in the world.  My intention was not to make a political statement in including some countries and territories and excluding others.  I merely want to have a set of countries which reasonably represents "all countries" of the world for the purposes of finding and displaying a route to "all countries" of the world.

Using an algorithm to find a route from each country's capital to the remaining closest country's capital, I got the following route:

  • Minimum Distance: 104,018 miles
  • Minimum Route: United States of America, Canada, Bahamas, Cuba, Jamaica, Haiti, Dominican Republic, Saint Kitts and Nevis, Antigua and Barbuda, Dominica, Saint Lucia, Saint Vincent and the Grenadines, Grenada, Trinidad and Tobago, Barbados, Guyana, Suriname, Venezuela, Colombia, Ecuador, Panama, Costa Rica, Nicaragua, Honduras, El Salvador, Guatemala, Belize, Mexico, Peru, Bolivia, Paraguay, Uruguay, Argentina, Chile, Brazil, Cape Verde, Senegal, Gambia, Guinea-Bissau, Guinea, Sierra Leone, Liberia, Cote d'Ivoire, Ghana, Togo, Benin, Nigeria, Equatorial Guinea, Cameroon, Gabon, Sao Tome and Principe, Republic of the Congo, Democratic Republic of the Congo, Angola, Namibia, Botswana, South Africa, Swaziland, Mozambique, Lesotho, Zimbabwe, Zambia, Malawi, Tanzania, Kenya, Uganda, Rwanda, Burundi, South Sudan, Ethiopia, Djibouti, Yemen, Eritrea, Sudan, Egypt, Israel, Palestine, Jordan, Syria, Lebanon, Cyprus, Turkey, Romania, Bulgaria, Macedonia, Albania, Montenegro, Bosnia and Herzegovina, Serbia, Hungary, Slovakia, Austria, Czech Republic, Germany, Denmark, Norway, Sweden, Estonia, Finland, Latvia, Lithuania, Belarus, Ukraine, Moldova, Poland, Croatia, Slovenia, San Marino, Vatican City, Italy, Monaco, Switzerland, Liechtenstein, Luxembourg, Belgium, Netherlands, United Kingdom, France, Andorra, Spain, Portugal, Morocco, Algeria, Tunisia, Malta, Libya, Greece, Armenia, Georgia, Azerbaijan, Iran, Turkmenistan, Tajikistan, Uzbekistan, Kyrgyzstan, Kazakhstan, Afghanistan, Pakistan, India, Nepal, Bhutan, Bangladesh, Myanmar, Thailand, Laos, Viet Nam, Cambodia, Malaysia, Singapore, Indonesia, Brunei  Darussalam, Philippines, Taiwan, South Korea, North Korea, China, Mongolia, Japan, Palau, Timor-Leste, Papua New Guinea, Solomon Islands, Nauru, Kiribati, Marshall Islands, Micronesia, Tuvalu, Fiji, Tonga, Samoa, Vanuatu, New Zealand, Australia, Sri Lanka, Maldives, Seychelles, Somalia, Comoros, Madagascar, Mauritius, Oman, United Arab Emirates, Qatar, Bahrain, Saudi Arabia, Kuwait, Iraq, Russia, Ireland, Iceland, Greenland, Mauritania, Mali, Burkina Faso, Niger, Chad, Central African Republic

I had originally tried to manually visualize this route by plotting and drawing lines.  This was way too difficult and time consuming, so I figured there must be a better way using R.  Using RgoogleMaps, I came up with the following visualization:


This worked really well except when it came to the international dateline.  The plotting function interpreted the countries of Tonga and Samoa as being on the other side of the world from the other islands they are close to, because they have negative longitudes while the other islands have positive longitudes.  However, they are quite close geographically, which is why they were selected by the algorithm as being next in the route.  To fix this, I gave them a positive longitude by adding 360 to their actual longitude.  This corrected the problem visually.





For fun, I wrote a program to show the progression of the route from stop to stop.  Here is the result:




Happy travels!



Wednesday, April 1, 2015

Traveling State to State Using Simulation

Introduction

I know, I know... People have already blogged about the shortest/longest routes in traveling from state to state.  So nothing here is likely original.  But I thought I would do my own analysis with my own set of rules as a fun exercise in simulation.  Below is my method, my findings, and conclusions.

Method

I wrote an R program to simulate traveling from state to state.  I used the latitude/longitude for each state capital as the destination point in each state.  Always starting with the state of Washington (my home state), the program would then proceed to "move" from state to state until each state had been stopped in once and only once.  The program calculated the distance traveled "as the crow flies" between the states and then added it to the total.  The final output of the program listed the travel route (e.g., "Washington, Oregon, California,....") and the total mileage (e.g., 17650.37).

First Attempt: The Random Walk

My first program chose each successive state randomly.  I ran this simulation 100,000 times.  Here are the results:
  • Minimum distance: 41,077 miles
  • Minimum route: Washington, Oregon, Alaska, Hawaii, Arizona, California, Montana, New York, Connecticut, Idaho, Kansas, Mississippi, West Virginia, Wisconsin, Maryland, Wyoming, Utah, Ohio, Massachusetts, Rhode Island, North Carolina, New Jersey, Delaware, Florida, Louisiana, Oklahoma, Iowa, Minnesota, Indiana, Alabama, Tennessee, Nevada, Pennsylvania, Maine, Vermont, Kentucky, Colorado, Nebraska, New Mexico, New Hampshire, Virginia, South Dakota, North Dakota, South Carolina, Arkansas, Illinois, Georgia, Michigan, Missouri, Texas

  • Maximum distance: 73,221 miles
  • Maximum route: Washington, Indiana, South Dakota, Louisiana, West Virginia, Wisconsin, Montana, Florida, Connecticut, Alaska, Virginia, Oklahoma, Colorado, Pennsylvania, Wyoming, Delaware, Texas, Michigan, Massachusetts, Idaho, Vermont, South Carolina, Utah, New Hampshire, Nevada, Ohio, Arkansas, Kentucky, Mississippi, Nebraska, Alabama, New Mexico, Maryland, North Dakota, Illinois, Maine, New Jersey, California, Missouri, Minnesota, North Carolina, Hawaii, Rhode Island, Kansas, New York, Arizona, Iowa, Georgia, Oregon, Tennessee

  • Average distance: 59,025 miles




Observations:

Given that we are choosing 50 states from 50 states (well, 49 from 49 since we start with Washington each time), there are 3 * 10^64 permutations of possible state to state routes available.  Even my simulation of 100,000 routes (supposing they are unique) covers only 1/(3 *10^60) possibilities.  That is practically 0%.  So the chances of stumbling across the "shortest" or "longest" routes randomly are pretty much non-existent. 

The "shortest" route randomly generated starts off well, staying largely on the west coast.  But then it jumps to New York and Connecticut, then back to Idaho.  It continues to jump to far off capitals when there are closer capitals available.  Similarly, the "longest" route does not always go to the farthest away capital.  Clearly, to find the "shortest" and "longest" routes we need give the computer guidance in selecting the next capital to visit.

Second Attempt: The Closest/Farthest State

I revised the program to always choose the closest state capital for the next stop.  As there is only one closest state capital to each state capital, there was only one possible route to take.  Similarly, I had the program select the farthest state capital for the next stop, resulting in one unique route.

  • Minimum distance: 17,648 miles
  • Minimum route: Washington, Oregon, Idaho, Montana, Utah, Wyoming, Colorado, New Mexico, Arizona, Nevada, California, South Dakota, North Dakota, Minnesota, Wisconsin, Illinois, Missouri, Kansas, Nebraska, Iowa, Indiana, Kentucky, Ohio, West Virginia, Virginia, Maryland, Delaware, New Jersey, Pennsylvania, New York, Connecticut, Rhode Island, Massachusetts, New Hampshire, Vermont, Maine, Michigan, Tennessee, Georgia, Alabama, Florida, South Carolina, North Carolina, Mississippi, Louisiana, Arkansas, Oklahoma, Texas, Alaska, Hawaii
To visualize this, here is a map:


  • Maximum distance: 76,124 miles
  • Maximum route: Washington, Hawaii, Maine, Alaska, Florida, Oregon, Massachusetts, California, Rhode Island, Nevada, New Hampshire, Arizona, Vermont, Idaho, Connecticut, Utah, New York, Montana, New Jersey, New Mexico, Delaware, Colorado, Maryland, Wyoming, Virginia, North Dakota, South Carolina, South Dakota, North Carolina, Texas, Pennsylvania, Oklahoma, West Virginia, Nebraska, Georgia, Minnesota, Louisiana, Michigan, Mississippi, Wisconsin, Alabama, Iowa, Ohio, Kansas, Kentucky, Arkansas, Indiana, Missouri, Tennessee, Illinois

Observations:
While the maximum route only increased by about 3,000 miles, the minimum route decreased by over 23,000 miles!  However, one can tell by looking at the map that this is clearly not the most efficient path to follow.  By selecting the closest capital, the program often got itself stuck in a corner and then had to jump a long distance to get to then next state capital.  Or it had to cross states it had already visited.  The jumps from California to South Dakota, Maine to Michigan, North Carolina to Mississippi, and Texas to Alaska make this very clear.

What this shows is that going to the closest state capital may not in fact lead to the shortest route overall.  In other words, it might make more sense to go to a capital a little farther away in order to shorten the overall route.  However, it is also obvious that at some point, going to a capital farther away will make the route longer.  For example, going to the capital farthest away most likely will not decrease the overall route distance.  So one should go to a capital that is close, but not the closest, in some cases to shorten the overall route...

Third Attempt: The Top n Closest States

To give the program more flexibility in selecting the next stop, I had it find the top n closest states, and then it chose randomly from that set for it's next stop.  For example, if it was finding the top 2 closest states for Washington, it would find Oregon and Idaho and randomly choose one or the other.  Some routes started with "Washington, Oregon,...." while others started with "Washington, Idaho,...".  In this way, the program could go to a state that was close, but not always the closest, in hopes of shortening the overall route. Using the top 2 closest states produced the best results.   I ran this simulation 100,000 times.

For n=2:
  • Minimum distance: 16,245 miles
  • Average distance: 22,360 miles
  • Minimum route: Washington, Idaho, Utah, Wyoming, Colorado, South Dakota, North Dakota, Minnesota, Iowa, Nebraska, Kansas, Missouri, Illinois, Wisconsin, Michigan, Indiana, Kentucky, Ohio, West Virginia, North Carolina, Virginia, Maryland, Delaware, Pennsylvania, New Jersey, Connecticut, New York, New Hampshire, Massachusetts, Rhode Island, Vermont, Maine, Tennessee, Alabama, Florida, Georgia, South Carolina, Louisiana, Mississippi, Arkansas, Oklahoma, Texas, New Mexico, Arizona, Nevada, California, Montana, Oregon, Alaska, Hawaii
To visualize...



Observations:
While on average, these routes were longer than always going to the closest state, on occasion they were shorter.  The shortest route by this method was 1,400 miles shorter than the previous method.  The shortest routes all circled around the United States, either along the top or the bottom going east, and then came back west on the opposite side as going east.  Finally, they all went to Alaska and then ended with Hawaii.  This makes sense as it means that very little ground is covered twice, and the states farthest away are left for last and are reached by the closest remaining states.

Still, one can see from the map that there are some inefficiencies, lines crossed, states covered twice, etc.  So it is not perfect...

Fourth Attempt: Bubble Sort Style Algorithm

I combined the previous attempts into a bubble sort style algorithm.  I had the computer generate a route randomly and then calculate the total mileage. I then had the program swap states and recalculate the mileage.  If the mileage decreased, the new route was kept, and the process was repeated.  The program did this until every possible state swap had been attempted and no more swaps occurred.  That is, no more swapping could reduce the total mileage.  Then the final route was output.  I ran this simulation 1,500 times each for the shortest and longest routes.
  • Minimum distance: 15,173 miles
  • Average distance: 19,760 miles
  • Minimum route: Washington, Oregon, Idaho, Nevada, California, Arizona, Texas, Oklahoma, Kansas, Nebraska, Iowa, Missouri, Arkansas, Mississippi, Louisiana, Alabama, Florida, Georgia, Tennessee, Kentucky, West Virginia, South Carolina, North Carolina, Virginia, Maryland, Delaware, Pennsylvania, New Jersey, Connecticut, Rhode Island, Massachusetts, New Hampshire, Maine, Vermont, New York, Michigan, Ohio, Indiana, Illinois, Wisconsin, Minnesota, North Dakota, South Dakota, Wyoming, Colorado, New Mexico, Utah, Montana, Alaska, Hawaii



  • Maximum distance: 81,927 miles
  • Average distance: 81,240 miles
  • Maximum route: Washington, North Carolina, Idaho, Virginia, Nebraska, Rhode Island, Arizona, Massachusetts, Illinois, Arkansas, Vermont, Oklahoma, New Hampshire, Missouri, New York, New Mexico, Connecticut, Colorado, Pennsylvania, Hawaii, Delaware, Nevada, Maryland, Wyoming, West Virginia, Oregon, Kentucky, Montana, South Carolina, South Dakota, Tennessee, Alaska, Georgia, Iowa, Florida, North Dakota, Alabama, Minnesota, Mississippi, Wisconsin, Louisiana, Michigan, Texas, Maine, Kansas, New Jersey, California, Ohio, Utah, Indiana

Observations:
This method was a great improvement over the previous methods.  The total mileage decreased on average and the shortest route was 1,100 miles shorter than the previous shortest route.  This route also had no crisscrossing of paths, the first time I had seen this occur (and which would be expected in the shortest route possible).  The longest route of 81,927miles  was 5,803 miles longer than the previous record of 76,124 miles.  All in all, this method was much better than prior attempts.

 Fifth Attempt: Insertion Sort Style Algorithm

The previous attempt, despite being very successful, was incredibly slow.  It took over 12 hours to get 1,500 results.  If I wanted to do a larger simulation, I would need to have a faster algorithm.  Also, being so close to breaking 15,000 miles, I wanted to see if there was a route that was in the 14,000s.  Similarly, being so close to 82,000 miles, I wanted to see if there was a longest route that was in the 82,000s.

I knew bubble sort wasn't the fastest algorithm, so I looked at other sorting algorithms to determine which would be best for my problem.  It seemed that insertion sort was going to be the best and most appropriate for my needs, so I rewrote the program to use insertion sort.  The program created a route randomly.  It then took a given state and inserted it into each other possible position in the route.  If the mileage decreased, the insertion was kept, and the program moved on to the next state.  When it got to the end of the route, it repeated the process starting at the beginning of the updated route.  This was done until no more insertion changes were made.  Then the final route and mileage was output.  The time a complete process took did not decrease dramatically, so I did 500 simulations each for finding the longest and shortest routes.

  • Minimum distance:  14,589 miles
  • Average distance: 16, 156 miles
  • Minimum route: Washington, Oregon, California, Nevada, Idaho, Utah, Arizona, New Mexico, Colorado, Wyoming, Oklahoma, Texas, Louisiana, Mississippi, Arkansas, Missouri, Illinois, Indiana, Ohio, West Virginia, Kentucky, Tennessee, Georgia, Alabama, Florida, South Carolina, North Carolina, Virginia, Maryland, Delaware, New Jersey, Connecticut, Rhode Island, Massachusetts, New Hampshire, Maine, Vermont, New York, Pennsylvania, Michigan, Wisconsin, Minnesota, Iowa, Kansas, Nebraska, South Dakota, North Dakota, Montana, Alaska, Hawaii


  • Maximum distance: 80,576 miles
  • Average distance: 79,634 miles
  • Maximum route: Washington, Florida, South Dakota, Tennessee, Alaska, Georgia, Minnesota, Alabama, Wisconsin, Oklahoma, Indiana, Texas, Michigan, Louisiana, North Dakota, Mississippi, Maine, Nebraska, Connecticut, Missouri, Rhode Island, Hawaii, Ohio, Kansas, South Carolina, Idaho, Delaware, Nevada, New York, Wyoming, New Jersey, Colorado, Maryland, California, North Carolina, Montana, Massachusetts, New Mexico, Virginia, Utah, West Virginia, Iowa, Vermont, Oregon, Kentucky, Arizona, New Hampshire, Arkansas, Pennsylvania, Illinois

Insertion sort didn't work so well for finding the longest route, so I tried the bubble sort method again, simply to try and break 82,000 miles.  After 1,000 simulations, it still hadn't broken 82,000 and had only improved by 1 mile.
  • Maximum distance: 81,928 miles
  • Maximum route: Washington, Kentucky, Utah, Pennsylvania, Kansas, Massachusetts, Arizona, Rhode Island, Nebraska, Tennessee, Iowa, Ohio, California, Maryland, Nevada, Virginia, Idaho, Indiana, Texas, Vermont, Arkansas, Michigan, Louisiana, Wisconsin, Mississippi, Minnesota, Alabama, North Dakota, Florida, Alaska, Georgia, South Dakota, South Carolina, Montana, North Carolina, Oregon, West Virginia, Wyoming, New York, Oklahoma, Maine, Missouri, New Jersey, Hawaii, Delaware, Colorado, Connecticut, New Mexico, New Hampshire, Illinois
Observations:
While my simulation easily broke into the 14,000s, it could not break into the 82,000s.  Perhaps with more time, more computing power, and a further refined algorithm this would eventually happen.  But a large gain in either direction was very unlikely at this point, so I decided to be content with my findings.

 Conclusion

The shortest route from state capital to state capital seems to be about 14,500 miles, while the longest route seems to be about 82,000 miles.  I'm sure that a shorter route exists than what I found, but this is not likely to be much shorter given that very few miles were gained in these final attempts.  As for the longer route, there probably is a slightly longer route, but I am skeptical that one can break 82,000 miles.  However, I'd be happy to be proven wrong...

Overall, this exercise shows that simulation is a very powerful tool in solving problems, particularly when implemented well (i.e., goal directed and iterative).

Happy travels!