Unleashing Big Data’s Potential for journalism, economy and research

Written by Alan Akbik, Ari Asmi, Adriana Homolova, Michèle B. Nuijten & Philipp-Andreas Schmidt on 4 April 2018 in Opinion Plus

The extent to which Text and Data Mining is revolutionising the way both public and private sector researchers work has yet to be fully realised by EU policymakers, argue data mining experts.



Text and Data Mining (TDM) lets us make sense of the vast amount of data that is out there. Understanding this data is critical to advancing our knowledge in climate change research, breaking corruption scandals in the press, discovering breakthrough medical treatments and training computers to improve customers’ experience online.

However, there are concerns that the current reform of EU copyright rules could limit who gets to use TDM and how they get to use it.

To build strong scientific datasets or to train Artificial Intelligence (AI) algorithms, we as researchers need to gather data from a broad range of sources, including scientific publications to which we have acquired lawful access through licensing agreements and data that is publicly available on the internet (and not behind a paywall).


We need to make sure that our right to read this data includes the right to understand and analyse it.

We want EU policymakers to understand that TDM is not about copying or re-using creative works without paying. TDM is about understanding the works we have legally accessed to identify patterns, facts, and correlations locked within these works, such as the tone of scientific or journalistic articles or how many times specific words are used.

TDM does not harm rightsholders. In fact, the more data analytics takes place, the more TDM users will request lawful access, increasing the demand for subscriptions to articles.

Many research projects are public-private partnerships. In fact, the European Commission’s Horizon 2020 programme – the largest research programme globally – envisages collaboration between public and private entities as they take “great ideas from the lab to the market”.


This programme usually requires that approved projects have another source of funding, typically private funding. If the private partner of a Horizon 2020-funded consortium cannot use TDM on the same basis as a public partner, this would greatly restrict the ability to fund AI projects at a time when such research is critical to growing the EU’s digital economy.

Journalists also do not qualify as non-commercial beneficiaries, yet they need tools to make sense of the growing amount of information at their disposal.

TDM technologies have helped uncover crucial stories with significant impact on society and democracy, such as the Panama Papers.


With the growing threat of fake news, which we know can best be tackled by algorithms and data analytics tools, we should not undermine the quality of journalism in Europe by raising unjustified copyright barriers.

Being able to verify the data used in AI is critical to understanding and addressing errors, bias, and needed improvements. Building adequate datasets is the first step in a TDM-based research project, and it can take several weeks, sometimes months.

Access to datasets once the research is completed is necessary to verify any findings. But to do that, we need to be able to safely store incidental copies of the datasets on secure servers.

However, this is something that is not allowed in the copyright reform as it stands today. Without any backup information that would allow the public to verify research conducted in Europe, we risk losing citizens’ trust in science.

Like the European Commission, we have big ambitions for Europe when it comes to Artificial Intelligence. We also want Europe to lead the global AI agenda and adopt a future-proof copyright reform that will unleash big data’s potential for journalism, the economy and research in Europe.

To achieve this ambition, we need an equally ambitious TDM exception.

About the authors

Alan Akbik is a research scientist at Zalando Research, working mostly on natural language processing

Ari Asmi is a researcher at the Institute for Atmospheric and Earth System Research at the University of Helsinki

Adriana Homolova is an investigative journalist and leader of the Elvis, Map me tender project

Michèle B. Nuijten is an assistant professor at the Meta-Research Centre at the University of Tilburg

Philipp-Andreas Schmidt works in Government Affairs at Bayer AG


Partner content

This content is published by the Parliament Magazine on behalf of our partners.
