Unleashing big data’s potential for journalism, economy and research
The extent to which Text and Data Mining is revolutionising the way both public and private sector researchers work has yet to be fully realised by EU policymakers, argue data mining experts.
Text and Data Mining (TDM) lets us make sense of the vast amount of data that is out there. Understanding this data is critical to advancing our knowledge in climate change research, breaking corruption scandals in the press, discovering breakthrough medical treatments and training computers to improve customers’ experience online.
However, there are concerns that the current reform of EU copyright rules could limit who gets to use TDM and how they get to use it.
To build strong scientific datasets, or to train our Artificial Intelligence (AI) algorithms, we as researchers need to gather data from a broad range of sources, including scientific publications to which we have acquired lawful access through licensing agreements and data that is publicly available on the internet (and not behind a paywall).
We need to make sure that our right to read this data includes the right to understand and analyse it.
We want EU policymakers to understand that TDM is not about copying or re-using creative works without paying. TDM is about understanding the works we have legally accessed to identify patterns, facts, and correlations locked within these works, such as the tone of scientific or journalistic articles or how many times specific words are used.
TDM does not harm rightsholders. In fact, the more data analytics takes place, the more TDM users will request lawful access, increasing the demand for subscriptions to articles.
Many research projects are public-private partnerships. In fact, the European Commission’s Horizon 2020 programme – the largest research programme globally – envisages collaboration between public and private entities as they take “great ideas from the lab to the market”.
This programme usually requires that approved projects have another source of funding, typically private funding. If the private partner of a Horizon 2020 funded consortium cannot use TDM on the same basis as a public partner, this would greatly restrict the ability to fund AI projects at a time when such research is a critical element of growing the EU’s digital economy.
Journalists do not qualify as non-commercial beneficiaries either, yet they need tools to make sense of the growing amount of information at their disposal.
TDM technologies have helped uncover crucial stories with significant impact on society and democracy, such as the Panama Papers.
With the growing threat of fake news, which we know can be best tackled by algorithms and data analytics tools, we should not undermine the quality of journalism in Europe by raising unjustified copyright barriers.
Being able to verify the data used in AI is critical to understanding and addressing errors and bias, and to identifying needed improvements. Building adequate datasets is the first step in conducting a TDM-based research project, and this can take several weeks, sometimes months.
Access to datasets once the research is completed is necessary to verify any findings. But to do that, we need to be able to safely store incidental copies of the datasets on secure servers.
However, the copyright reform as it stands today does not allow this. Without any backup information that would allow the public to verify research conducted in Europe, we risk losing citizens’ trust in science.
Like the European Commission, we have big ambitions for Europe when it comes to Artificial Intelligence. We also want Europe to lead the global AI agenda and adopt a future-proof copyright reform that will unleash big data’s potential for journalism, economy and research in Europe.
To achieve this ambition, we need an equally ambitious TDM exception.
This content is published by the Parliament Magazine on behalf of our partners.