First assignment: exploring data journalism and visualizations

For this assignment I have chosen to explore the field of data journalism: how to visualize data and make content interactive.

Completeness of a story

This decision came from my past experience as a journalist at a traditional local newspaper, the Corriere Adriatico. There I found my work terribly limiting, as all I was supposed to do was report what was happening in town. Whenever I had data to deal with, I was unable to visualize it to make my stories less dry. Worse, I was unable to build a story on the data at all because of my editors’ lack of trust in these practices. Data journalism represents to me the possibility of making a story more complete and more attractive to readers. This is why I don’t really like to separate journalism that uses data from traditional journalism: this is a path all traditional media should follow.

Useful for this purpose was the “Introduction to infographics and data visualizations” course held by Alberto Cairo, through which, before starting this course, I learned how to use the Tableau Public software and how to judge a visualization.

What it means to do data journalism today

I really enjoyed the comparison drawn by Simon Rogers, editor of the Guardian Datablog, between data journalism and punk, because it describes the reality of this field pretty well: anyone can do it. The difference is that making punk music involves buying an instrument, while the web is full of free, useful tools that can be used to tell stories with numbers. The ones I have used most are Tableau Public, Datawrapper and Infogr.am for visualizations, and Google Docs and Outwit Hub as scraping tools.

Data describes reality. Really?

The interesting thing about data is that everyone expects it to be the best medium for describing reality. It is even the medium scientists use to describe the universe. But how is it collected? This reading was really helpful in answering that question, describing how surveys and procedures may be flawed or incomplete, sometimes undermining the reliability of the information collected. A good example from my own work is the data on people killed by drones in Pakistan during the Obama administration. I based my work on the figures collected by the Bureau of Investigative Journalism, while different figures are collected by the New America Foundation. Who can we trust? Additionally, the data I used gives both a minimum and a maximum number of deaths, because the Bureau’s information derives from different sources that claim different numbers.

Communities of practice

Across the web it is possible to find a wealth of communities built around data journalism. One of these is surely Hacks and Hackers; I met the Birmingham group during an introductory lesson on data analysis. Another is the Data Driven Journalism group. Being subscribed to its mailing list, I asked the group, a bit worried about coming across as ignorant, what would be the best tool for creating an interactive product that merges data with other forms of media, like video. They responded enthusiastically, recommending a few tools I already knew.

A second assignment proposal 

A future project I would like to realize is a focus on European farm subsidies. More than once the European Parliament has been accused of not making its funding of farm activities transparent. What I would like to do is get the data from an interesting website called farmsubsidies.org, which aims to collect information on this topic, and make a series of visualizations to show where this money goes and who really gets this financial help.


From scraping to visualizing the drone killings in Pakistan: the Outwit Hub (level 2)/Datawrapper combination

I started to look at the number of people killed by drones in Pakistan because I wanted to produce a multimedia interactive project using Zeega, a new online tool that can combine videos, pictures and sound uploaded to the cloud. However, I couldn’t stand working with that tool, but I’ll talk about it in a future post.

One of the organisations that collects data from various sources is the Bureau of Investigative Journalism, which has a special section with data on Pakistan, Yemen and Somalia from 2009 to now. The problem is that the data is not in a downloadable form; it is just written down as a sort of list.

The three levels of Outwit Hub 

What can we do? A solution was suggested by Paul Bradshaw, who recommended using the =importXml formula in Google Docs. I had a go with that, but the HTML of the site is not formatted in a way I could deal with, surely due to my lack of experience. Reading the code, I thought that perhaps Outwit Hub could be the proper means of reaching my target. I have used this tool in the past and it proved really useful, mostly because, if you are lucky, it doesn’t require programming literacy. The potential of the software can be divided into three levels:

  • level 1: it is enough to paste the link and click on the table, list or guess buttons on the left. This is what I did in a previous post.
  • level 2: we need to look at the code with the scraper function. Once we have found the information we need in the code, it is enough to insert the tags that come before and after it in the “marker before” and “marker after” sections.
  • level 3: use regular expressions.

As regex looked difficult to learn, I aimed for level 2 of Outwit.

After an initial struggle I found a way to get two columns, one for the date and one for the killings. Seeing that the code before the month of each attack is <p><strong>, I put that in the “marker before” field and the year of the attacks as the “marker after”, because I obviously don’t need it.

As for the number of people killed, I put a strange polygonal symbol as the “marker before” and the word “killed” as the “marker after”.
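For anyone who prefers scripting to clicking, the same “marker before”/“marker after” logic can be sketched in a few lines of Python. This is only an illustration of the idea, not what Outwit Hub does internally: the URL is a placeholder, and the lozenge character is my guess at the strange polygonal symbol.

```python
# A minimal sketch of "level 2" marker scraping in Python.
# The URL is hypothetical; the markers mirror the ones described
# above, with "\u25ca" (a lozenge) standing in for the polygonal symbol.
import re
import requests

URL = "https://example.com/drone-strikes-2012"  # placeholder page

html = requests.get(URL).text

def between(text, before, after):
    """Return every substring found between two literal markers."""
    pattern = re.escape(before) + r"(.*?)" + re.escape(after)
    return re.findall(pattern, text, flags=re.DOTALL)

dates = between(html, "<p><strong>", "2012")  # month/day fragments
killed = between(html, "\u25ca", "killed")    # figures before the word "killed"

for date, number in zip(dates, killed):
    print(date.strip(), number.strip())
```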


And that’s it. As a result we get a table with more than two columns; the others contain the number injured in the strikes and the children killed. Repeating the same process five times, once for each year’s page since 2009, we get five tables with the two columns we are interested in: date and number killed.

Good old Excel

What we have to do now is merge the tables into one. I initially thought of testing the merge function of Google Fusion Tables, but I wasn’t confident the results would suit what I needed, so I used Excel.

The first column contains a code followed by the date, while the second has the minimum and maximum numbers of people killed, separated by a “-”. Highlighting each column, I used the “Text to Columns” function to separate them. As a result, the month and the day were separated too. That’s not bad news anyway: by inserting the year in a column to the left and using the CONCATENATE function to merge the three columns, I had what I was looking for: a date column and, well, two columns with the killings, the minimum and the maximum.

Why two columns? Because the Bureau relies on different sources, and data collection on such dramatic events often produces different figures, so they include a range from the minimum estimated number of people killed to the maximum.

Anyway, I had to make sure that the CONCATENATE function also included the slashes between year, month and day. This is possible by inserting a column, clicking in the first cell, writing “=CONCATENATE”, selecting the day column in the text 1 box, writing “/” in text 2, and repeating that for the month and year columns, which gives something like =CONCATENATE(A2,"/",B2,"/",C2).

But we’re not finished, as we need to clean the spreadsheet a bit. In fact we have some extra words, like “total”, in the killed columns, and a few cells in the maximum column are empty; this happens when the number of victims is certain, so no range is given. First of all we need to highlight the killed columns and change their format from “general” to “numbers”, so the words disappear. Then, for each empty cell, copy the figure from the minimum column.
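For the record, the whole Excel routine could also be scripted. Below is a minimal pandas sketch of the same steps, assuming the five yearly tables were exported as CSVs with hypothetical names and column headers; it shows the logic, not the procedure I actually followed.

```python
# A pandas sketch of the cleanup, assuming five yearly CSV exports
# with hypothetical columns "day", "month", "year" and "killed".
import pandas as pd

frames = [pd.read_csv(f"pakistan_{year}.csv") for year in range(2009, 2014)]
df = pd.concat(frames, ignore_index=True)  # merge the five tables into one

# Rebuild the date from its three pieces, as CONCATENATE did.
df["date"] = (df["year"].astype(str) + "/"
              + df["month"].astype(str) + "/"
              + df["day"].astype(str))

# Split "4-7"-style ranges into minimum and maximum killed.
killed = df["killed"].astype(str).str.split("-", expand=True)
df["min_killed"] = pd.to_numeric(killed[0], errors="coerce")  # stray words become NaN
df["max_killed"] = pd.to_numeric(killed[1], errors="coerce")

# Where the count was certain there is no range: copy the minimum.
df["max_killed"] = df["max_killed"].fillna(df["min_killed"])

df[["date", "min_killed", "max_killed"]].to_csv("killed_by_drones.csv", index=False)
```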

The spreadsheet should look like this.

[Image: the killed-by-drones spreadsheet]

Probably there is a better way to do this operation.

Now we have five uniform spreadsheets, which can be merged by copy and paste.

Visualizing with Datawrapper

The next step is to visualize the data. Because I have already used Tableau Public, this time I want to take advantage of the online resource Datawrapper.

It is a simple-to-use tool that offers different kinds of visualizations and also gives readers the possibility of getting the data the creator used. This is a review by journalism.co.uk.

The process is divided into four steps.

  • First of all we have to copy and paste our spreadsheet.
  • Secondly, we choose the row or column to use as labels and indicate our source.
  • In the next step we select the kind of visualization. For data over a period of time I personally prefer the line graph. Selecting it, though, what we see is a mess, because the line for the maximum number of dead covers the minimum. The solution may be to use the highlight function to spotlight the minimum.
  • The fourth step allows us to publish our work. I personally decreased the height and increased the width, so the trend is clearer.

This is the result.

From scraping to mapping local currencies: the Outwit Hub/Fusion Tables combination (part 2)

What follows is a description of my personal experience of mapping the unmappable, through classic “trial and error” learning.

Google Fusion Tables is a really useful and versatile tool that can be used for multiple tasks, like merging two tables or even creating charts. However, I have personally used it mostly to map specific locations, like cities.

Once on the Google Drive website it is enough to click on “Create/Fusion Table” and import the spreadsheet from our computer. After the preview, and having inserted a description of the project, we see this page.

[Image: the local currencies Fusion Table]

At this point I’d suggest seeing what the map visualizes by clicking on the “Map” button at the top.

There are supposed to be more than 100 locations, but only about half of them are actually visualized. What’s the reason? It’s really simple: many of them are not proper locations. There are places called “locale”, “francophone”, “movement national” or “All communities”. They simply describe in general terms where these systems can work, so, while some of them are proper, mappable locations, others are not.

So, what to do? If we desperately need this map, I’d naively suggest modifying each of them so that we have at least a country as a location. In this way “Scheme UK wide” becomes just “UK”, and “Francaise” becomes “Belgium” (it refers to the French-speaking part).
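With more than a hundred rows this substitution is tedious by hand, so it could also be done with a small lookup script. Here is a minimal Python sketch, where the file name, column name and lookup entries are all illustrative:

```python
# Normalize vague location labels to mappable countries.
# "locations.csv", the "location" column and the lookup entries
# are all hypothetical examples.
import csv

LOOKUP = {
    "Scheme UK wide": "UK",
    "Francaise": "Belgium",   # refers to the French-speaking part
    "All communities": None,  # genuinely unmappable: drop the row
}

with open("locations.csv", newline="") as src, \
     open("locations_clean.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        loc = LOOKUP.get(row["location"], row["location"])
        if loc is not None:  # skip rows with no usable location
            row["location"] = loc
            writer.writerow(row)
```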

Does it look better? Yes. Is it what we had in mind? Absolutely not. Still, as I found this topic really interesting, as I wrote in part 1, I really intended to show it. The worst part came when I studied the trends of these currencies a little. What I wanted to make was a trend chart showing how many local systems were created each year, and I thought of using the numbers shown on this page.

However, digging into the site, I found that these numbers don’t refer to the systems’ dates of creation, but just to the dates on which they joined the website. Which means that other important systems may exist but, as they might not know about this website, they are not registered. So what I wanted to show might be just a part of the whole picture.

Do you think it is another failure? Probably, but this experience leads to another open question: the lack of direct data, and the impossibility of collecting it in some circumstances.

In case you are interested in the map: I obviously couldn’t embed it in this WordPress site, so this is just a snapshot. Enjoy!

[Image: snapshot of the local currencies map]

Trial and error with waste: a Birmingham Mail experience

When it comes to learning how to do journalism, one of the best ways is surely experience in the field. At least this has worked for me so far, initially with my Italian “newspaper” prose and the interpretation of local regulations, and now with data analysis.


It was a Friday when I learned that the Birmingham Mail might be interested in an article based on a dataset I had found on the Defra (Department for Environment, Food and Rural Affairs) website, and that they wanted something before a budget discussion to be held the following Tuesday. A three-day deadline, a dataset and quotes to chase. Not so bad, but given my inexperience in dealing with data, I had plenty of reasons to be anxious.

The data provides annual figures on waste management in all the British regions. As can be seen, there are four tables, of which the first is probably the most important, as it gives all (or, as we’ll see later, most) of the management details. There is not that much data, but it can be tricky for an inexpert eye like mine and, as I realized at the end of the day, it took me too long to analyse it, visualize it and write a simple story.

In each table there’s a big red phrase at the bottom recommending not to sum up the figures, as that may result in double counting; at first, I obviously ignored it. Basically, the information the dataset provides can be divided into three parts: the total waste collected by local authorities, and the household and commercial portions of that total. The household part represents what councils collect from citizens’ dwellings, while the commercial part is the waste produced by industrial and commercial activities. For every category there are many entries, among them the recycled and non-recycled parts, which are what we are interested in.

Knowing something vague about European recycling targets to be met soon in Italy, what I wanted to see was Birmingham’s position in a recycling-rate ranking. Having both the total amount of waste collected and the amount recycled in tonnes, it is possible to calculate the percentage of the latter (total waste recycled / total waste) and sort the data from smallest to largest. To be honest this is redundant, as the rate is already calculated in table 2, the one I was considering, but it’s good practice anyway. Doing this we can see that Birmingham is in 6th position, the smallest rate after Westminster and some others that are not part of the main island. It’s a story! And it was, until I spoke with a member of the City Council.
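Out of curiosity, the same ranking is a few lines in pandas; the file, sheet and column names below are assumptions about the Defra spreadsheet, not its real layout.

```python
# A sketch of the recycling-rate ranking; file, sheet and column
# names are assumed, not Defra's actual headers.
import pandas as pd

df = pd.read_excel("defra_waste.xlsx", sheet_name="Table 2")

df["recycling_rate"] = df["total_recycled_tonnes"] / df["total_waste_tonnes"] * 100

# Sort from the smallest rate upwards and look at the bottom of the list.
ranking = df.sort_values("recycling_rate")[["authority", "recycling_rate"]]
print(ranking.head(10))
```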

The cold sweat: always read the dataset description! 

At this point it is worth saying why the data is so tricky. In table 1 we have about 380 local authorities, and in table 2 just 120. And in the first one Birmingham was in 28th position! Why?

While I was thinking that my story was irreparably flawed, I read the “notes for tables” part of the dataset, in which the reason is clearly explained. Some authorities are described as collection authorities, while the others are unitary and disposal authorities. Some of the collection authorities are included in other authorities like Russian dolls (matryoshka), which explains the big red warning. The solution? The same notes explain that it is enough to exclude the collection authorities, and in fact our Birmingham City Council went back to 6th position.
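In the pandas sketch above, this fix would be one extra filter before sorting (again, the “authority_type” column and its “Collection” value are assumed names):

```python
# Continuing the earlier sketch: drop collection authorities to
# avoid the double counting flagged in the notes ("authority_type"
# and "Collection" are assumed names).
unitary = df[df["authority_type"] != "Collection"]
ranking = unitary.sort_values("recycling_rate")[["authority", "recycling_rate"]]
print(ranking.head(10))
```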


Always seek a response  

At this point, having already missed the Tuesday deadline for several reasons, I asked the Council for a quote and was warned by one of its spokespeople that considering the total waste, rather than just the household portion, was wrong.

In fact, almost all legislation on this topic considers just household waste, as it is collected directly by the Council, while commercial waste is dealt with largely by the companies themselves through private contractors. What is described in the spreadsheet is just the small part the Council collects from these companies, amounting to just under 20% of the total, while the rest remains in the shade, as the City Council does not hold such information.

The story is broken

At this point the disappointment was softened by the possibility of reshaping the story from “the second worst on the mainland outside London” to the smaller-scale “the worst in the West Midlands”, which was published anyway.

However, this experience has been invaluable, considering what I learned about reading and interpreting datasets and dealing with local authorities.

My first business canvas

So the time to give some shape to your Enterprise project has come.

In fact, what has been done so far is just thinking and imagining yourself and your life fitting into an enterprise, but when it comes to planning its various elements, you need to focus on rather more detail than just saying “I want to be a freelancer!”.

One of the best tools that can help at this point is surely a business canvas. It forces you to concentrate on the different aspects of what you have in mind, as well as the parties that will be involved.

For a sample canvas visit this website; it also gives some advice on how to draw a good one and how to deal with a lack of ideas.

As for my freelancing activity, what it mostly needs is a strong network of media activists, newspapers, governmental and non-governmental organizations, and much more.

In case you want to see my canvas, here you are:

[Image: my business canvas]

As I said, now is the time to focus on the details. What can really help at this point is a sort of timeline in which we set every single step of our project. Not just general items like “build a contacts network”, but even “say hello on Facebook to Mr. Smith of Friends of the Earth” or “take part in that environmental journalism workshop”, and so on. The more specific and simple, the better.

If you are too lazy to draw a timeline on paper, check out this online tool; it just needs a quick registration and may even help fight deforestation (!).

Enterprise project decision (maybe)

Think of an enterprise project that you would carry out during this course.

Simple. I mean, the recipe shouldn’t be complicated: just take what you love most, add the reasons that moved you to join the course, a wedge of imagination and a dusting of passion. Or that’s what I thought initially. But then, after letting my mind operate on its own, disparate projects started to crowd in, and I needed a big selection process to come up with an idea, even if I am still not sure.

The idea is to set up a freelance journalism business covering the environment, eco-sustainability and development, using mainly data and visualization.


A freelance business is probably, as the word itself suggests, the way creative people ideally want to support their work. No office and no fixed timetable may be a good incentive to focus more on one’s own development rather than a company’s.

But the downside is the risk of being unable to find someone interested in buying your product, especially in journalism, where the freelance competition is tough. However, considering my interest in covering a topic somewhat neglected by the mainstream, I aim to collaborate with NGOs or other organizations to collect and study data and find interesting stories.

In case no organization supports me, I have thought about the growing popularity of micropublishing, which is basically a way to publish your own work yourself, finding in an interested niche of people one of my sources of revenue.


This is probably the most viable of the projects that have come to my mind recently, even if, again, others may be better.

What do you think?

Here we are! My first post from the BCU

Hi everybody,

as its description declares, I have started this blog as part of my MA Online Journalism course at the well-known Birmingham City University. To be completely honest, my instinct was to keep everything I learn secret, like a martial artist who doesn’t want to show her/his most powerful techniques, but this attitude simply doesn’t fit in the new media world.


It’s likely wiser for every journalist wannabe to have a sort of showcase of her/his abilities and to practise them frequently across as many media as possible, as the successful video-journalism entrepreneur Adam Westbrook suggests in this old article:

“You have no excuse. Get a blog. Get writing. Get used to it. Blog about what you’re learning, or what you want to learn. Use it to get involved in the debate about the future of journalism”.

And considering that a lot of experts have the same idea about learning to do good journalism, who is Cristian Giulietti to claim the opposite?

So, here we are, hoping for the best!