Digital Journalism Two Workbook- Ellie Phillips

15 May Digital Journalism Two Workbook- Ellie Phillips

Monday 28th January 2019
Digital Journalism 2 – 2pm

• Data – facts and statistics collected together for reference or analysis.
• Records or measurements of something. These are often numbers – but not always.
• Gathered and analyzed in a rigorous, rule-governed way, using consistent units within each dataset.
• Concerned with things that can be verified – FACTS.
• Normally refers to or represents something other than itself.
Why do Journalists use data?
• It’s a great source of stories.
• Provide context and background.
• Explain developments (e.g. the next financial crisis)
• Create personalized calculators (e.g. how tax changes may affect you)
• Improve reports.
• Dispel myths.
Issues affected by many misconceptions
(Remember, it’s always about people.)
Investigative data journalism (IDJ)
• Plenty of time.
• Advanced data skills.
• Find hidden meanings.

General data journalism (GDJ)
• Hours or days available.
• Modest data skills.

Real-time data journalism
• Based on algorithms that automatically create news from data sources.

Monday 4th February 2019
Data Journalism 2 – 11 till 1pm
Visualizing Data
Ways of visualizing data

• Static infographics (print & web)
• Standalone charts/charts as stories
• Interactive charts
• Interactive data visualization
• Scrolling stories ‘Scrollytelling’
• Interactive tools and dashboards
• Animation
(Adds a specific element according to the way we want to show the data that is gathered)
Data Visualization: Find The Story
Text as Data

• State of the Union (SOTU) speeches
For data to have any meaning it must be consistent, so that comparisons can be made. Textual analysis can provide you with lots of clues about things.
How do specific keywords and concepts relate? Is it an issue, or do they think people want to hear about these issues? Why was it talked about here?
The Defense Department used to be called the War Department. Countries used to be referred to as female but are now referred to as ‘it.’
Nationalism.
(Find your data and a way of representing it – then you have to find your story that relates to the data but does not necessarily come directly from it.)
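
The idea of text as data can be sketched in a few lines of Python. This is only an illustration: the two texts below are made-up placeholders, not real speeches, and the function name `keyword_rates` is hypothetical. It counts chosen keywords per 1,000 words, so that texts of different lengths can be compared on a consistent basis.

```python
import re
from collections import Counter

def keyword_rates(text, keywords):
    """Return occurrences of each keyword per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {k: 0.0 for k in keywords}
    counts = Counter(words)
    return {k: 1000 * counts[k] / total for k in keywords}

# Hypothetical placeholder texts (NOT real speeches)
speeches = {
    "speech_a": "War and defense of the nation. The nation stands ready for war.",
    "speech_b": "Jobs and the economy. The economy grows and the nation grows.",
}

for name, text in speeches.items():
    print(name, keyword_rates(text, ["war", "nation", "economy"]))
```

Using a rate rather than a raw count is one way of keeping the comparison consistent between a short text and a long one.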

Workshop – 3pm to 4pm
Infographic is visual
Week 9 & 10 – working on our projects (piece of journalism which the infographic supports)

Coursework 1 – 80% // Coursework 2 – 20%

lynda.com

Monday 11th February 2019
Broadcast Journalism 2 – Facts, Stats and Lies
Extrapolation – using data and facts from the past to try to predict behavior in the future.
Don’t just publish raw data. Tell a story!
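
A rough sketch of extrapolation in Python, assuming the simplest possible case: a straight-line trend that is expected to continue. The yearly figures are hypothetical placeholders, not real data, and `fit_line` is an illustrative helper name.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical past figures (illustrative only, not real data)
years = [2015, 2016, 2017, 2018]
values = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(years, values)

# Extrapolate one year beyond the data, assuming the past trend continues
prediction_2019 = slope * 2019 + intercept
print(f"Extrapolated value for 2019: {prediction_2019:.1f}")
```

The prediction is only as good as the assumption that the trend carries on unchanged.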

Florence Nightingale, 1857 – Rose Diagram of Causes of Death in the Army
• Health improvements raised life expectancy in the UK by 20 years, saving millions of lives.
Improved hygiene: change in weather conditions? Changes in warfare?
“There are three kinds of falsehoods: lies, damned lies and statistics.” – Mark Twain

Data Terminology
• Datapoint: one piece of data.
This apple costs 20p
• Dataset: a whole array of data
Cost of all the fruit and veg in the supermarket
• Variable: one dimension of the dataset.
Cost of apples in the supermarket
(Pokemon Go – GPS data)
• Raw data: unprocessed, detailed, has not been manipulated or analysed.
• Aggregated data: grouped or combined from several measurements.
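
The terms above can be shown on a tiny example in Python. Only the 20p apple comes from these notes; the other rows and the field names (`item`, `type`, `price_p`) are hypothetical, chosen just for illustration.

```python
from collections import defaultdict

# Dataset: a whole array of data. Prices are in pence.
# Only the 20p apple is from the notes; the other rows are hypothetical.
dataset = [
    {"item": "apple", "type": "fruit", "price_p": 20},
    {"item": "banana", "type": "fruit", "price_p": 15},
    {"item": "carrot", "type": "veg", "price_p": 10},
    {"item": "leek", "type": "veg", "price_p": 50},
]

# Datapoint: one piece of data (this apple costs 20p)
datapoint = dataset[0]["price_p"]

# Variable: one dimension of the dataset (here, every price)
variable = [row["price_p"] for row in dataset]

# Aggregated data: raw rows grouped by type and combined into totals
totals = defaultdict(int)
for row in dataset:
    totals[row["type"]] += row["price_p"]

print(datapoint, variable, dict(totals))
```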
Average – Mean – Median – Mode
• Average (Arithmetic) Mean: add all and divide by number of entries
• Median: the middle number when the values are sorted or, if there is an even number of values, the sum of the two middle numbers divided by 2
• Mode: most frequent or common value
30, 56, 65, 70, 84, 90, 90, 91, 92
All of them are accurate, but depending on the person’s intentions they would decide which one to use.
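
The three averages for the numbers above can be checked with Python's built-in `statistics` module. This is a minimal sketch; the variable names are just for illustration.

```python
import statistics

values = [30, 56, 65, 70, 84, 90, 90, 91, 92]

mean = statistics.mean(values)      # add all values, divide by how many there are
median = statistics.median(values)  # middle value of the sorted list
mode = statistics.mode(values)      # most frequent value

print(f"mean={mean:.1f} median={median} mode={mode}")
# prints: mean=74.2 median=84 mode=90
```

The single low value (30) pulls the mean well below the median and the mode, which is why the choice of average changes the story.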
“All statistics are a summary of a more complicated truth.” – Tim Harford

Use accurate, good sources; ask a good number of people, and make sure they are representative of the population.
Mona Chalabi
“The best financial advice for most people would fit on an index card.” – Harold Pollack, professor at the University of Chicago 2013
1 – Observe your feelings
2 – Understand the claim
• What does it mean?
• Is this a causal relationship?
• What’s being left out?
3 – Get the backstory
4 – Put things in perspective
• Is that a big number?
• What is the historical trend?
• Beware ‘statistical significance’
5 – Embrace imprecision
6 – Be curious
• Go another click
• Treat surprises as a mystery
FT’s Tim Harford: Citizen’s Guide to Statistics

(Homework – Listen to the FT discussion between Sarah O’Connor and Tim Harford, then write a 300-word story that includes the following:
Summarise the main points made about how we approach statistical claims – be careful to attribute properly.
Discuss any particular reservations, points of agreement or comments you have on the points that Harford makes.
Consider your own approach to statistical claims and give an example where you did not realize you might have been misled.
Describe how new information on understanding statistical claims has changed the way you approached a CURRENT news story – news this week or the coming week.)

Monday 25th February 2019
Digital Journalism – Big Data Journalism
Introduction to Data Science
Big Data

Big data is information assets with the four Vs:
• Volume: how much?
• Velocity: growth rate?
• Variety: types of data?
• Veracity: reliability/consistency?

While all four Vs are growing, Variety is becoming the single biggest driver of big-data investments.

Data Security and Governance
Big data environments currently need a complex security architecture. Security mechanisms (encryption, obfuscation, loggers, monitors) must protect data at rest and data in transit.

  1. Get data
  2. Clean, Prepare & Manipulate Data
  3. Train Model
  4. Test Data
  5. Improve

Phase 2 typically represents 80% of the whole analytic process.
Know your data sources
• IoT: Device, Network and Sensor Data
• Provenance of data can be an issue… get consent!
• Use “reliable” open source data repositories
Kaggle, data.gov.uk, etc.

Qualitative and Quantitative

Data Integration
• Combining all that data and reconciling it so that it can be used to create reports can be incredibly difficult.
• Vendors offer a variety of ETL and data integration tools designed to make the process easier.
• Many enterprises have not solved the data integration problem yet.

Data Cleansing/Wrangling
Extract, Transform, Load (ETL) is data pre-processing, an essential step in organizing, cleaning & unifying data for a data warehouse.
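
A very small ETL sketch in Python, using only the standard library. The raw CSV text, the column names and the `transform` function are hypothetical and only illustrate the idea; a real data warehouse would replace the plain list used for the "load" step.

```python
import csv
import io

# Extract: hypothetical raw CSV with messy values (illustrative only)
raw = """region,households
 London ,1200
north west,  950
London,n/a
"""

def transform(rows):
    """Clean each row: tidy the region name, drop rows without a valid number."""
    cleaned = []
    for row in rows:
        count = row["households"].strip()
        if not count.isdigit():
            continue  # skip rows with missing or invalid numbers
        cleaned.append({
            "region": row["region"].strip().title(),
            "households": int(count),
        })
    return cleaned

rows = list(csv.DictReader(io.StringIO(raw)))  # Extract
warehouse = transform(rows)                    # Transform, then Load into a list
print(warehouse)
```

Here the third row is dropped because "n/a" is not a number, and the region names are unified so that " London " and "London" would count as the same place.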
Generating Useful Insights: Skills
R
• Easy to learn
• Statistics based functions
Python
• Relatively easy to learn
• Requires knowledge of programming fundamentals

Finding Data Sets
Search for central government sources
Office for National Statistics – elections data

You can find data for anything that you wish to search for – American Government, European Data etc

Monday 4th March 2019
Digital Journalism 2 – FightHoax
Dataset: check LSBU data on mental health (dropouts, mental health cases, increases/decreases.)

“How the idea of FightHoax was born.” – Donald Trump

What is FightHoax?
• It is a news analysis algorithm that collects and analyzes all the necessary data that the user needs.

Methodology

Author – Bias – Emotions – Facts – Title – Syntax/Grammar

Understanding who wrote the piece you are reading is a crucial first step towards creating an overall objective view of the information.
Authors with a more consistent digital footprint are generally more reliable due to their expertise and extensive coverage of specific subjects, whereas authors with no digital information create a “hole” in our objective analysis.

Bias

The next aspect to consider is the bias of both the publication and the author who wrote the news piece you are consuming. This is closely tied to human nature and to the group or individual perspectives that make up a group view, rather than focusing on the subjective side of the bias.
Journalism has hitherto been an objective study and application of knowledge.

Emotions

The emotions are a natural continuation of the human and psychological aspect of both the publication’s and the author’s biases.

Facts

When it comes to the facts, there are two correlating aspects that we need to consider: the source itself and the content we’re reading. Where we are reading and what we are reading.
Data Journalism in steps

  1. Find the data
  2. Format them
  3. Visualize them
  4. Tell the story

Creating My Infographic:

Stage 1:

My first idea was to do an infographic on young people who suffer from mental health conditions and compare the figures with the number of dropouts from higher education.

Sources I used to find such data:

Prevalence of mental health issues within the student-aged population

https://www.ons.gov.uk/search?:uri=search&q=mental%20health%20students&page=2

https://digital.nhs.uk/data-and-information/publications/statistical/mental-health-of-children-and-young-people-in-england/2017/2017

To gather more recent information, I submitted a Freedom of Information (FOI) request, as the figures on the site for mental health issues in young people were a few years old. However, the request was denied because the data contained personal information.

Stage 2: 

I then had to come up with a new idea, since the data I needed for my infographic was not available. I looked on the Office for National Statistics website to see what interesting data they had and what kind of infographic and story I could create from it.

This is when I found data on internet access in households and how it has increased over the years as technology has advanced. I then found data on online crime showing that there has been an increase in online fraud.

Sources I used for this stage:

https://www.ons.gov.uk/peoplepopulationandcommunity/householdcharacteristics/homeinternetandsocialmediausage/datasets/internetaccesshouseholdsandindividualsreferencetables

https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/datasets/crimeinenglandandwalesexperimentaltables

Stage 3: 

Comparing the data showed that the increase in households using the internet correlates with an increase in online fraud in recent years. As more people use the internet, fraudulent activity has risen.

I then used this to create my infographic, using the data that I had gathered.
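
One way to check a comparison like this is to calculate a correlation coefficient. The sketch below is in Python and the yearly figures are hypothetical placeholders, NOT the ONS data from my sources; `pearson` is an illustrative helper name. A value close to +1 means the two series rise together, which on its own does not answer the "Is this a causal relationship?" question from the statistics checklist.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical yearly figures (placeholders, NOT the real ONS numbers)
internet_access_pct = [80, 83, 86, 89, 90]
fraud_incidents_k = [100, 110, 125, 130, 140]

r = pearson(internet_access_pct, fraud_incidents_k)
print(f"r = {r:.2f}")  # close to +1 means the two series rise together
```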

Stage 4:

Creating my infographic took me a few days, since I wanted to make sure that all the data on it was relevant to my story and to the story the infographic was trying to convey.

What I used to create my Infographic:

Live Make infographics

Writing the Story

Stage 1:

When it came to writing my story, I tried to use enough of the data I had gathered to support my statement. The point of my infographic and story was that, due to the increase in internet activity, there was also an increase in online fraud.

I started by writing my standfirst, summing up the whole point of my infographic.

Stage 2: 

After I had finished the standfirst, I tried to piece together my story so that it made sense and was relevant to the infographic.

Ellie Phillips
elliejphillips@hotmail.com