What follows is the description of my personal experience of mapping the unmappable, through a classic “trial and error” learning.
Let’s start from the problem. Let’s say we want to produce something about alternative currencies. It is pretty attractive in these hard times, while we are watching the social catastrophe it is happening in countries like Greece, Spain and Italy if there are alternatives to the money that we are using.
Let’s say we ideally want to do a map for an overview of the European landscape and some kind of trend chart to show the average evolution of these over the time. As explained in a previous post, in which I played a bit with Tableu, we need a spreadsheet to which work on.
From my perspective the solutions are two: the evergreen “copy and paste”, or a more professional scraper. While with the first solution I am not sure to get some nice results to be put on an Excel table, using the second one may more likely lead to a well organized spreadsheet.
Scraping the link
According to the book Scraping for journalists by Paul Bradshaw, one of the best tools to use in this case, and that can (sometimes) work without some programming skills is Outwit hub. It needs to be downloaded from this site.
First of all I have based my work on the database of the Complementary Currency Resourse center. This site is pretty tricky as once you click on one of the links used to navigate the site the url doesn’t change. A scraper most of times need a specific url and, as we want just an European overview of local currencies, it is necessary to have a look at the html version of the site (right click and inspect element), going with the cursor over the code correspondeted to the highlighted part, and select the part of the “europe” link, which is the 4th <li> tag under the <ul class = “sideMenu2”> one. Here it is possible to find the real one, which is
Now that we have the proper link we can work on it with Outwit Hub, just inserting it into the url space on the top of the software page. Then it is just enough to click on the table option on the left and that’s it, isn’t it?
Trial and error
Because in this way we have just part of the list that we need. In fact at the bottom of the page there are the links for the rest (51-100, 101-108). So, what’s the solution? Just have a look at the html as before and we can find out in the <table width “99%” align “center”> tag that there are a couple of different url. Looking at them we can notice that in the middle they have 1&s and 2&s, so the first must be the one with 0&s.
So we have to copy and paste these links in Outwit. I personally did this three times, exported the results in excel and then copy and paste to have an organic list. But there must be a better and more professional solution.
At this point it would be worth to eliminate the “stopped” category as, as the category’s name suggests, they have stopped to work as currencies anymore and other useless columns, with the aim to have just the “community“, “local exchange system” and the “url” ones.
Here we notice a nice trouble: in the local exchange system column, both the name of the currency and the type of it are in the same cells, separated just by a semicolon.
The solution? Easy, using at least my old school version of Excel . Here‘s the exact procedure explained by the Microsoft staff.
Then what I did was to insert the country column aside the community one, because i was expecting to use tableau Public like I did the time before, but I indeed wanted to change approach and I opted for a new tool, Google Fusion tables….