A Deeper Look into the EU Text and Data Mining Exceptions: Harmonisation, Data Ownership, and the Future of Technology


Thomas Margoni & Martin Kretschmer

Abstract
There is global attention on new data analytic methods. Data scraping (a typical first step for advanced data analytics), text and data mining (TDM, the extraction of knowledge from data) and machine learning (ML, often also simply referred to as Artificial Intelligence or AI) are seen as critical technologies. The legal issues involved in the regulation of data range from privacy and data protection (such as the GDPR) to proprietary approaches (such as copyright, database rights, or proposed new rights in data themselves).
This paper focusses on one specific intervention, the introduction of two exceptions for text and data mining in the Directive on Copyright in the Digital Single Market (CDSM). Art. 3 is a mandatory exception for text and data mining (TDM) for the purposes of scientific research; Art. 4 permits text and data mining by anyone, but allows rightsholders to "contract-out", for example by preventing TDM use of publicly available online content through "machine-readable means".
We trace the context of using the lever of copyright law to enable emerging technologies and support innovation. Within the EU copyright intervention, elements that may underpin a transparent legal framework for AI are identified, such as the possibility of retention of (permanent) copies for further verification. On the other hand, we identify several pitfalls, including an excessively broad definition of TDM which makes the entire field of data-driven AI development dependent on an exception. We analyse the implications of limiting the scope of the exceptions to the right of reproduction (which leaves the communication of research results in a grey zone). We also argue that the limitation of Art. 3 to certain beneficiaries remains problematic; and that the requirement of lawful access is difficult to operationalize.
In conclusion, we argue that there should be no need for a TDM exception for the act of extracting informational value from protected works. The EU's CDSM provisions paradoxically may favour the development of biased AI systems due to price and accessibility conditions for accessing training data that offer the wrong incentives. We also identify some old and new areas of the EU acquis which will play a crucial role in the future relationship of EU copyright law with technological innovation.

I. Introduction
The Directive on Copyright in the Digital Single Market (CDSM) 1 incorporates a number of provisions (32 Articles and 86 Recitals) intended to modernise EU copyright law and to make it "fit for the digital age". 2 The Directive's reach is impressive: it covers exceptions and limitations […]
[…] major (and a few minor 8 ) differences: it is available to any type of beneficiary for any type of use, but can be expressly reserved by rightholders; in other words, it may be the object of an "opt-out" or "contract-out".
This paper focuses on these two new additions to the list of EU copyright exceptions and argues that their formulation, although underpinned by a strategic innovation policy goal, is conceptually wrong, theoretically flawed and normatively unambitious. Even worse, by employing an overly broad definition of text and data mining, the provisions under analysis regulate by way of a narrow exception not only TDM but all forms of modern data-driven digital analytics that rely on "training" on data. This is a vast field that includes most forms of modern Artificial Intelligence (AI) applications relying on machine learning in areas as varied as Natural Language Processing (NLP), image recognition and classification, content filtering and robotics (hereinafter generally referred to as AI). 9 The paper further argues that the implications of accepting the principle that EU AI can be developed only thanks to an exception or after securing proper authorisation reach far beyond the rationale and the evidence considered during the drafting phase of the new Directive. 10 The general regulation of technology, especially of a technology as pervasive as AI, exceeds the goals of copyright law. This is commonly accepted in AI policy and legislative venues where the role of copyright is often seen as secondary. However, the creation of property rights in data, i.e., in the building blocks necessary for erecting the complex structure called AI, is equivalent to the implementation of a system of authorisations that AI developers need to secure before engaging in their product development. Allocating the right to authorise or forbid the use of traditionally unprotected mere facts and data essential for AI development to certain market actors creates not only a market structure but also a system of social and moral values within which this technology will be compelled to evolve.
In other words, by devising the rules that regulate access to a certain technology and by allocating ownership in the elements necessary to develop it, we are shaping that technology and its impact on society for the years to come. 11 These rules, today, in the EU, clearly state that firms, governments, citizens, journalists and anyone else who is not a research or cultural organisation acting for research purposes have to obtain a specific authorisation from rightsholders to develop AI. Outside the EU they do not. In summary, the paper claims that a narrowly framed EU copyright exception may have become the formal recognition that in the digital environment EU copyright has achieved such an unprecedented hegemonic role in regulating knowledge production and circulation that it covers not only original expressions, as commonly accepted in copyright law and theory, but also mere facts and data. 14 This is the likely effect of the insertion of Arts. 3 and 4 into the current Acquis Communautaire, characterised among other things by a rather low originality standard […]
This unsighted approach to law making may favour the development of opaque AI systems or AI "black boxes", 23 an expression referring to a type of automated decision-making tool (e.g. AI) […]
19 For common usage, see https://en.wikipedia.org/wiki/Machine_learning (visited 1 July 2021): "Machine learning (ML) is the study of computer algorithms that improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence".
20 The Impact Assessment discusses how exceptions may affect researchers and rightholders as well as the social and fundamental rights impact of certain provisions (although the latter two elements appear underdeveloped in comparison to the former), but in general does not consider broader industrial, innovation and cultural policy issues, see […]
[…] explain the complex statistical process leading to those decisions but would make it possible to scrutinise the original training data for mistakes, omissions or bias and to replicate or reverse engineer those decisions and therefore to ensure a transparent, accountable and possibly unbiased decision-making process. 28 The fitness of a modern copyright system in this complex technological scenario needs to be assessed in the light of its ability to perform a balancing function in this fast-developing environment. Ensuring the undistorted availability of training data in order to produce more accurate results (efficiency), as well as securing their permanent accessibility in order to ensure […]
[…] Twitter which can benefit from copyright laws that permit them to engage in this type of activity without prior authorisation, therefore significantly reducing the cost of AI development. Another example that shows the problematic and likely unintended consequences ensuing from the formulation of Arts. 3 and 4 CDSM is the exclusion from their ambit of journalistic enquiry and the possibility to text-and-data mine online archives to verify the accuracy of certain facts and thus to combat fake news. 35 In this respect, it can be observed how the EC Impact Assessment focused its analysis on the needs of the publishing industry on the one hand and of academic and commercial research on the other. This is undoubtedly a very important aspect that needed to be addressed. However, this property-based approach to AI regulation failed to identify the deeper implications for society, the economy and democracy.
There is an array of activities that from an economic and moral point of view seem at least as deserving as research conducted by research and cultural organisations, which are nevertheless excluded from the ambit of the EU TDM exception (or which remain in a sort of undefined status which depends on whether rightholders will reserve their use). Due to the specific characteristics of the Acquis, and particularly the right of reproduction, these activities are captured under the exclusive prerogatives of rightholders and thus cannot be lawfully performed.

II.2. The "exceptionalism" of EU copyright law and the right of reproduction
Art. 2 of Directive 2001/29/EC (InfoSoc Directive) 36 defines reproduction as any "direct or indirect, temporary or permanent reproduction by any means and in any form, in whole or in part". As for most acts performed digitally, to "text-and-data-mine" information it is usually necessary to make (at least temporary or indirect or transient) copies, that is to say, to reproduce the original material in a way that triggers Art. 2.
This paper agrees with propositions already formulated in the literature that in a properly designed copyright framework there should be no need for a TDM exception, as the extraction of factual information from protected content is external to copyright's remit. 37 Support for this thesis can be found in internationally recognised principles such as the idea-expression and fact-expression dichotomy, that is to say in the postulate that copyright protects original expressions, whereas ideas, principles, procedures, facts and data as such are not protected. […]
The advice of the LAB seems to have been largely ignored in the adopted text. However, its message should not be completely lost. To rebalance the amplitude currently enjoyed by the right of reproduction, the most direct intervention would be to redefine it, i.e. to modify Art. 2 InfoSoc. However, this seems a highly unlikely course of action at present. 45 Looking for alternatives, whereas the "exceptionalist" rhetoric of EU copyright law has been criticised above for carrying not only semantic but also meaningful prescriptive implications, a broad and possibly flexible TDM exception, or perhaps even better a "computational uses exception", could be an acceptable compromise. This would need to be wider than the current EU TDM one(s) and wider than what was known as "option four". 46 However, this door also appears to have been firmly shut after the contentious approval of the CDSM. 47
Remaining within the field of exceptions, a useful contribution could be found in a technology-oriented interpretation of an existing provision which, while not specifically drafted for TDM, the CDSM has confirmed as capable of covering certain TDM activities: Art. 5(1) InfoSoc. 48 While not specific to computational uses, Art. 5(1) was implemented with the goal of enabling certain technological uses (mainly internet browsing 49 ) and of rebalancing the excessive scope afforded to the right of reproduction. It is also the only mandatory exception of the whole InfoSoc Directive, which has the important advantage of favouring cross-border uses.
Before proceeding to an analysis of Art […]
Art. 5(1) requires that the act of reproduction be (1) temporary; (2) transient or incidental; (3) an integral and essential part of a technological process; (4) carried out for the sole purpose of enabling ... a lawful use of a work; and (5) without independent economic significance.
51 Some Member States took full advantage of this opportunity (e.g. France, Estonia, Germany), whereas others did not (e. […]
Electronic copy available at: https://ssrn.com/abstract=3886695
Regarding conditions (1) and (2), the Infopaq I Court clarified that temporary and transient acts of reproduction are "intended to enable the completion of a technological process of which it forms an integral and essential part". In those circumstances those acts of reproduction "must not exceed what is necessary for the proper completion of that technological process", it being understood that "that process must be automated so that it deletes that act automatically, without human intervention, once its function of enabling the completion of such a process has come to an end". 53 In Infopaq II the CJEU offered some further insights on the proper interpretation of the remaining conditions:
(3) The concept of integral and essential part of a technological process requires the temporary acts of reproduction to be carried out entirely in the context of the implementation of the technological process. This concept also assumes that the completion of the temporary act of reproduction is necessary, in that the technological process concerned could not function correctly and efficiently without that act. This condition is satisfied notwithstanding the fact that initiating and terminating that process involves human intervention. 54
(4) Temporary acts of reproduction must pursue a sole purpose, namely, to enable [… 55 ] the lawful use of a protected work, which is in turn fulfilled when such use is authorised by the rightholder or where it is not restricted by the applicable legislation. 56
(5) Temporary acts of reproduction do not have an independent economic significance provided that the implementation of those acts does not enable the generation of an additional profit distinct or separable from the economic advantage derived from the lawful use of the work; and the acts of temporary reproduction do not lead to a modification of that work. 57
The Court also importantly clarified that as long as the conditions of Art. 5(1) as interpreted above are met, the three-step test of Art. 5(5) is satisfied.
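The cumulative character of the five conditions of Art. 5(1) can be illustrated with a minimal sketch. It is offered only as an aid to reading the test, not as a statement of how any court applies it: the class `ReproductionAct`, its field names and the two helper functions are hypothetical labels introduced here, and each boolean stands in for a legal assessment that in practice requires interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReproductionAct:
    """Hypothetical description of one act of copying in a data capture process."""
    temporary: bool                        # condition (1)
    transient_or_incidental: bool          # condition (2)
    integral_to_process: bool              # condition (3)
    sole_purpose_lawful_use: bool          # condition (4)
    no_independent_economic_significance: bool  # condition (5)


def failed_conditions(act: ReproductionAct) -> list[int]:
    """Return the numbers of the Art. 5(1) conditions that the act does not meet."""
    checks = [
        act.temporary,
        act.transient_or_incidental,
        act.integral_to_process,
        act.sole_purpose_lawful_use,
        act.no_independent_economic_significance,
    ]
    return [number for number, met in enumerate(checks, start=1) if not met]


def covered_by_art_5_1(act: ReproductionAct) -> bool:
    """The conditions are cumulative: a single failure takes the act outside the exception."""
    return not failed_conditions(act)
```

Under these assumptions, a copy that is deleted automatically at the end of the process would satisfy all five fields, whereas a permanent copy would fail at least conditions (1) and (2), however well it performs on the others.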
A very brief description of the facts of the Infopaq case may be helpful to properly situate these conditions within a data capture process which shares many logical steps with more modern […]
[…] TDC: 73 % "forthcoming sale of the telecommunications group TDC, which is expected to be bought". 59
The Court found that the exception of Art. 5(1) only exempts the activities listed in points 1) to 4) above, whereas the activity of point 5), i.e., printing, constitutes a permanent act of reproduction which is therefore not covered by an exception for temporary copies. When this activity reproduces the original work in part as defined by Art. 2 InfoSoc, it has the potential to constitute a copyright infringement. In the same dispute, the Court of Justice clarified that it cannot be excluded that even 11 consecutive words, when representing the author's own intellectual creation, may qualify as an Art. 2 reproduction in part, i.e., as copyright infringement.
The conditions 1) to 4), which as the CJEU pointed out must be interpreted strictly as they derogate from the general rule, 60 are not always easy to meet in TDM processes nor is their interpretation always straightforward. That said, within a copyright framework that does not offer many other alternatives, Art. 5(1) represents an important ally as an enabler of technological development. This is an aspect acknowledged by the CJEU itself, when it states that the function of Art. 5(1) is to "allow and ensure the development and operation of new technologies, and safeguard a fair balance between the rights and interests of rights holders and of users of protected works who wish to avail themselves of those technologies". 61 The statement's ethos seems to offer a comfortable safe harbour for modern TDM and data-driven AI processes. However, while the proposition seems directed towards a technology-enabling goal, it is far less easy to imagine how the rights and interests of users of protected works who wish to avail themselves of new technologies, and the very development of such new technologies, can be safeguarded by a strict interpretation of the already narrowly defined five conditions of Art. 5(1).

II.3.a) Eroding lawful uses
The possibility should be considered that Arts. 3 and 4 CDSM may have narrowed even further the scope of Art. 5(1) InfoSoc. This is due to condition 4) and the concept of lawful use.
A lawful use is a use authorised by the rightholder (e.g., via a licence) or not restricted by the applicable legislation. 62 In Infopaq II the Court states that "[…] the parties in the main proceedings do not dispute that in itself summary writing is lawful and does not require consent from the rightholders", that "such an activity is not restricted by European Union legislation" and finally that "it is apparent from the statements of both Infopaq and the DDF that the drafting of that summary is not an activity which is restricted by Danish legislation". These statements need closer scrutiny.
59 Infopaq II.
Regarding the first one, it seems that the Court is satisfied with the fact that the parties in the main proceedings do not dispute the issue of summary preparation, which allows the Court to avoid, on a procedural ground, a particularly tricky legal question. Regarding the second and third statements, it would be interesting (albeit beyond the scope of this paper) to verify whether it is domestic law which does not provide for a right of adaptation that covers the creation of summaries (or at least of summaries which reproduce in part the original work, such as 11 consecutive words); whether domestic law does it, but a specific exception excuses the activity; […]
It follows that if Art. 5(1) is only available when a certain use is not restricted by applicable legislation, the recognition that TDM is a reserved use of rightholders (excused when performed by research and cultural organisations for research purposes or when it is not contracted out) means that temporary acts of reproduction performed for TDM purposes outside the scope of Arts. 3 and 4 CDSM are no longer permitted, as they do not meet the condition of lawful use. This is an odd and probably unforeseen effect of the provision, since the very same CDSM states that Art. 5(1) should continue to apply to TDM (Rec. 9). It seems difficult to find a logical explanation for the described situation. Certainly, the crucial function of Art. 5(1), i.e., safeguarding the right of users of protected works to avail themselves of new technologies, seems incompatible with it. If user rights and technological development are to be safeguarded under EU copyright law, the formalistic interpretation embraced by the CJEU in certain cases needs to be abandoned in favour of a teleological approach to EU copyright law, which the same Court has adopted in other judgments.

II.3.b) The function of permanent reproductions in computational uses and in the development of trusted AI systems
Retaining permanent copies represents a crucial tool to mitigate the black box of AI (discussed at the beginning of section II). Greater transparency may enable trust in AI systems that make decisions affecting in ever more sophisticated ways the life and the rights of individuals. There are two types of fundamental reproductions in TDM and machine learning whose persistence needs to be ensured.
The first type is the one created by text and data analysis which corresponds to the "memory" of the AI application, also known as the "trained model". As has been explained in more detail […]
[…] containing the author's own intellectual creation or a substantial extraction of the database) can not only be stored but also shared (e.g. communicated to the public) for any purpose. As we will see, Arts. 3 and 4 CDSM have failed to fully address this first type of permanent copies.
A second type of permanent copy is the one necessary for verification purposes. For something to be called scientific, it has to be based on replicable results, which in turn can only be achieved if the data, methods and analysis of the experiment are available for verification. This aspect is central to scientific enquiry, where in recent decades the so-called "reproducibility crisis" of scientific results has emerged. This phenomenon affects both social and hard sciences and has been extensively explored in the literature, which has identified both sector-specific and more general issues at its basis. 64 […]
[…] often contained in literary works or other types of texts (text mining) or in structured and/or unstructured datasets (data mining). TDM is simply external to copyright's scope. 70 The Berne Convention (BC) itself is based on the tenet that only original expressions are protected and not underlying ideas or facts. 71 This can be inferred not only from the general principles and the national traditions underpinning the BC, 72 but also from its literal meaning.
Art. 2 ("Protected Works") clarifies that every production in the literary, scientific and artistic domain is protected, whatever may be the mode or form of its expression. 73 The […] axioms. 74 This is confirmed, among others, by the WIPO guides to the Berne Convention themselves, which clearly state that "The scientific work is protected by copyright not because of the scientific character of its contents … but because they are books and films" and that ideas are not protected but "it is the form of expression which is capable of protection and not the idea itself". 75 Similarly, that "only concrete original expressions of ideas are [protected] may be deduced from the basic meaning of the generic expression "production." A mere idea is obviously not yet a production; it is only transformed into a production when it is developed into a concrete form of expression". 76 Furthermore, Art. 2(8) bars protection for news of the day or miscellaneous facts having the character of mere items of press information, confirming that facts are explicitly excluded from copyright's ambit. They do not contain the minimum elements of intellectual creation and thus do not qualify as works. 77 The same principle underpins other articles in the international copyright framework such as Arts. 10(2) TRIPs and 5 WCT, which clarify that copyright in compilations of data "do not extend to the data or material itself", or Art. 1(2) EU Software Directive, which clarifies that copyright protection for software applies "to the expression in any form of a computer program. Ideas and principles which underlie any element of a computer program, including those which underlie its interfaces, are not protected by copyright". 78
Similarly, the CJEU confirms that single words cannot be considered original expressions since words considered in isolation are not an intellectual creation of the author who employs them 79 and that "keywords, syntax, commands and combinations of commands, options, defaults and iterations consist of words, figures or mathematical concepts which, considered in isolation, are not, as such, an intellectual creation of the author of the computer program". 80
It is not only creativity that is protected by excluding ideas, facts and principles from protection.
Freedom of expression, i.e., the ability to freely express and receive one's ideas and opinions is a fundamental right recognised in all modern democratic constitutions and therefore any form of limitations to that right should be resisted and limited to specific cases identified by law.
The law bears the crucial task of striking a balance between freedom of expression and other concurring rights, and in copyright theory this is done also by establishing that while ideas and […]
[…] copyrightable elements, therefore there should be no need for a copyright exception in order to use those elements, even when they are contained in protected works. Nonetheless, in practice, in the current state of EU copyright law a clarification of the legality of TDM was necessary and an exception, if properly formulated, could have been one of the ways to achieve this goal, albeit not the best one.

II.5. The enacted EU TDM exception(s): Practical considerations
The main criticisms against the current formulation of Arts. 3 and 4 (understood against the misguided theoretical approach) can be structured according to the following elements: 1) definition; 2) beneficiaries; 3) rights; 4) technological overridability; and 5) access to original sources. Two additional characteristics can be seen as functional to safeguarding the exception's scope: 1) contractual overridability (which will be addressed together with point 4 above); and 2) storage of copies for verifiability.

II.5.a) Definitions
As seen in the first part of this article, whereas a broad definition of TDM activities could be seen as functional to cover a broader set of free uses, its insertion into the current Acquis may have produced the opposite effect. We refer to the analysis developed above.

II.5.b) Beneficiaries
Art. 3 introduces a double limitation: TDM under the exception can only be performed by research organisations and cultural heritage institutions and only for the purpose of scientific research. Therefore, a commercial enterprise will not be able to benefit from the exception. Nor will a university acting for any purpose other than scientific research. Other purposes commonly accepted as fundamental in democratic societies also appear to be excluded, such as journalism, criticism or review. 84 In the opinion of the drafters of the Directive, the current wording is thought to be less restrictive than the "non-commercial" limitation. 85 It seems, however, that Art. 3's double limitation is very close to the non-commercial requirement and in certain respects even more restrictive, in the sense that a "non-commercial" limitation would arguably allow a business acting for non-commercial scientific research purposes to benefit from the exception, something that is not possible under Art. 3 (although Public-Private Partnerships are explicitly allowed). This is a major limitation to the efficacy of the exception that excludes important economic sectors and SMEs from benefiting from a critically important tool to compete on the global markets. This limitation appears in contrast with fundamental rights such as the freedom of expression and the freedom to conduct a business, even though the same preparatory material excludes such a contrast. 86
Art. 4, which is not a direct emanation of "Option 4", but which may nevertheless have benefited from its assessment, is not limited to certain beneficiaries and is thus potentially available to all. It is characterised, however, by the additional element of being subject to "opt-out" by rightholders, a provision that may very well frustrate its efficacy. It would be important, during the national implementation phase, to clearly identify how this opt-out should be performed in the light of the general guidance offered by Art. 4. It would also be important to develop public awareness around the need not to restrict TDM unnecessarily.

II.5.c) Rights
Another significant limitation found in both Arts. 3 and 4 is that they only exempt potential infringements of the right of reproduction but not of the right of distribution or communication to the public, nor of the (unharmonized) right of adaptation.
This means that in all the cases when the results of an act of TDM include a protected part of the original "mined" work (and as seen above excerpts as short as 11 consecutive words could be protected) these results cannot be communicated to the public or redistributed. In certain areas this will not represent a cause of concern; however, in other areas, e.g., Natural Language Processing (NLP), the fact that certain models trained on a number of copyright protected corpora (i.e. texts) could include reproductions in part means that those models, the result of the research conducted by the research organisation, cannot be redistributed or communicated publicly. Outside textual sources, e.g. in the case of audio-visual works, it may be even more difficult to establish when this threshold is reached. The question of whether a trained model can be considered an adaptation of the original corpora is excluded ratione materiae from the EU assessment, but is an aspect that will need to be clarified.

II.5.d) Contractual and technological overridability
Arts. 3 and 4 state that contractual provisions intended to limit the TDM exception shall be unenforceable. This is an important provision, as access to databases is often based on acceptance of Terms of Use that limit TDM. Nevertheless, if the same contractual provision contrary to the TDM exception is expressed through an effective technological measure, there is no equivalent provision safeguarding the enjoyment of the exception. The approach taken by the CDSM is convoluted at best. Art. 7(2), second sentence, reads "The first, third and fifth subparagraphs of Article 6(4) of Directive 2001/29/EC shall apply to Articles 3 to 6 of this Directive". In short, this means that the TDM exceptions are inserted in the list of exceptions for which the InfoSoc Directive establishes that: 1) if a user with legal access to a work is entitled to an exception; and 2) that exception cannot be enjoyed due to the presence of an effective technological measure; and 3) rightholders have not voluntarily taken any measures to ensure that said user can enjoy the illegitimately restricted exception; then 4) Member States shall take appropriate measures to ensure that rightholders make available to said beneficiaries of the exception the means of enjoying it.
It is important to note that subparagraph 4 of Art. 6(4) does not find application in this case.
Subparagraph 4 establishes that the reported mechanism (the obligation on MS to facilitate the enjoyment of an exception illegitimately restricted by rightholders via effective technological measures) is excluded when rightholders make available works to the public on agreed contractual terms in such a way that members of the public may access them from a place and at a time individually chosen by them, thereby rendering largely ineffective the entire provision.
Even though the CDSM recognises the importance of excluding subparagraph 4, it is the entire mechanism of Art. 6(4) InfoSoc that has proven highly ineffective due to its convoluted formulation and ultimately to the fact that it places the burden of reclaiming legitimate uses allowed by the law but illegitimately restricted by technological locks on the shoulders of end users. Illustratively, in the UK, where the UK Intellectual Property Office (IPO) has set up a specific complaints procedure, 87 a total of 11 applications have been filed since 2003: 9 failed as they related to computer programmes (an excluded category), 1 was rejected on the basis of the subparagraph 4 mechanism, and 1 led to a voluntary solution. 88
87 See https://www.gov.uk/government/publications/technological-protection-measures-tpmscomplaints-process.
88 See https://www.gov.uk/government/publications/complaints-to-secretary-of-state-under-s296zeunder-the-copyright-designs-and-patents-act-1988. These are data from 2015. A FOI request to the UK IPO revealed that since 2015, an additional two requests were filed, one rejected (due to paragraph 4 exemption) and one resolved on a voluntary basis. Ironically, this latter request, the only one that has somehow had a successful outcome in almost two decades, was based on the since repealed private copy exception.

II.5.e) Lawful access to original sources
Art. 3 requires lawful access to the works that will form part of data analysis. Not much justification can be found in the preamble of the Directive. Some more details about the role of the "lawful access" requirement can be found again in the Impact Assessment: "… the "lawful access" condition, i.e. [by the fact that] the exception would not affect publishers' ability to continue to authorise or prohibit access to their content and to generate revenues from selling subscriptions to universities and other research organisations". 89 It has been argued that TDM should be considered licit even when access to the training data does not fulfil the lawful access requirement. 90 The arguments to support such a position are multiple. As Prof. Carroll puts it: "copies are made only for computational research and the durable outputs of any text and data mining analysis would be factual data and would not contain enough of the original expression in the analysed articles to be copies that count. Reference copies would be kept and shared only for reproducibility purposes or for further computational research and would not be otherwise made available". 91 Whereas this argument is developed within the US copyright framework which, as briefly discussed above, operates quite differently in relation to some of the elements of EU copyright law here scrutinised, it seems that the same rationale could also find application under EU law. Furthermore, it has been pointed out how the lawful access limitation could subject TDM research to private ordering 92 as well as severely impair other fundamental rights such as the freedom of information and to inform the public about specific undisclosed but publicly relevant issues, especially when these […] exceptions only to the right of reproduction. This appears to be an important area in need of clarification during the phase of national implementation. Additionally, Art. 3(4) establishes that "Member States shall encourage rightholders, research organisations and cultural heritage institutions to define commonly agreed best practices concerning the application of the obligation and of the measures referred to in paragraphs 2 and 3 respectively". These obligations relate to safe storage provision and security and integrity measures. It would be important to ensure that Art. 3 becomes effective as soon as it is transposed into domestic law, regardless of when the commonly agreed best practices are adopted.

III. EU Copyright law and data property
The position embraced in the CDSM about the proprietarisation of mere facts and data is ambivalent. Whereas when considered as such they seem excluded from protection, when they are contained in a protected work, they become the object of exclusivity. The reason is to be found in the well-known ubiquity of copies in the digital environment. Whereas this reason is global, EU copyright law has developed an idiosyncratic approach characterised by a relatively low level of originality, the protection of qualifying non-original databases (and therefore of factual data) and by a broadly defined and broadly interpreted right of reproduction that is able to capture most types of digital uses. This formalistic approach to computational uses should be wholly rejected.
Electronic copy available at: https://ssrn.com/abstract=3886695

III.1. Two futures separated by a common provision
The current EU copyright framework seems to be caught in between two possible futures. This unenviable situation may be connected to certain underlying and unresolved contradictions. Two seem particularly pressing. The first is common to many copyright systems worldwide and is caused by the well-known inadequacy of rules devised in the past, sometimes a remote and analogue past, to regulate modern digital practices. After all, the problems here under discussion are intimately related to the advent of digital technologies and the EU's reaction to this advent. As evidenced by the roadmap proposed in the Green Paper of 1988, 94 this reaction interpreted technology mainly as a challenge, which it certainly was, but failed to see it also as an opportunity. It is a known aspect of (not only EU) copyright history that throughout the 1980s and 90s, faced with the paradigm-shifting changes brought by digital technologies and under pressure from a content industry that witnessed an unexpected dramatic shift in business models and a potentially steep decline in revenues, EU copyright law tightened its defences, made rights broader, demoted free uses to exceptions which had to be found in a closed but not mandatory list and shielded this new reality behind encryption, i.e. technological protection measures. 95 The striking erosion of free uses and of the public domain can be seen as a direct consequence of this tension. However, this also disrupted the fine balance that copyright used to strike. Consequently, economic, social and cultural initiatives often clashed against rules that lost the ability to channel innovation while maintaining incentives for investment and protecting the moral dimension of creativity.
The second contradiction is idiosyncratic to the EU legal order and is caused by the inadequacy of national copyright rules to regulate the circulation of information in a single market made up of 27 harmonised but still distinct and territorial copyright markets. This situation is exacerbated by the only partial power that the EU has (had) to regulate copyright, a power which largely relies (or relied) on internal market attributions as a legal basis. As explained elsewhere, 96 this limited allocation of competences has led to a patchwork of at least 12 Directives (and 2 Regulations) which, with few exceptions, have harmonised EU copyright law "vertically", i.e. only in relation to certain rights or certain subject matter. 97 One of the few directives that has taken a "horizontal" approach (Directive 2001/29/EC, InfoSoc Directive) has done so following an unambitious and to a certain extent contradictory legislative technique based, as already discussed, on the full harmonisation of only certain aspects of copyright (mostly rights) while leaving MS ample discretion with regard to other aspects (mostly exceptions). 98 This approach has resulted in further fragmentation and uncertainty, since having diverging rules within a market that proclaims to be single produces tensions. Use of a copyright protected work may be considered lawful in one MS but not in another. 99 It is also in the light of these considerations that the CDSM aimed to regulate in a mandatory manner and with rules of full or almost full harmonisation at least certain elements of EU copyright law such as the TDM exception. This is certainly laudable. However, whereas the 2019 Directive is timidly but clearly moving in the right direction regarding the second of the above identified tensions, thanks to the mandatory nature of a number of provisions such as Arts. 3 and 4, it fails to properly address the problems connected with the first tension. In other words, the challenge of digital technologies, after more than three decades, remains a challenge for EU copyright law.

III.2. Non-original property
In order to offer an overview of the issue of property in mere facts and data, a brief mention should be made of other instances where EU copyright law has moved towards a process of propertization of non-personal data. This will offer additional support to the critique here developed concerning the inability (or unwillingness) to address technology as an opportunity.
The Sui Generis Database Right (SGDR) naturally stands out as a unique EU creature that protects against substantial extractions of data in both original and non-original qualifying databases, thereby de facto protecting data under certain circumstances. This approach to data property was rejected in almost any other legal order due to its anti-competitive and anti-information […] Certainly, it has contributed a fair amount of work for national and EU courts and has been used in ways that have negatively impacted on consumers' rights and access to knowledge. 101 Nonetheless, as has been pointed out, it may be extremely difficult to repeal EU legislation, even when, in the words of its drafters, it has failed to deliver. 102 An important observation for an accurate outline of the SGDR is that only substantial investments in obtaining, verifying or presenting data count. Created data do not qualify for protection. After all, it is a database right, not a data right. To remedy this void of protection, a data producer right has been proposed and, at least for the moment, abandoned. 103
100 The Commission's own assessment is revealing: "Despite providing some benefits at the stakeholder level, the sui generis right continues to have no proven impact on the overall production of databases in Europe, nor on the competitiveness of the EU database industry." https://ec.europa.eu/digital-single-

III.3. Is the solution to the problem outside the problem?
A final element in the account of the EU approach to data property and its implications for (AI) technology is placed outside the realm of copyright law and allied rights. The new Public Sector Information (PSI) directive of 2019, also referred to as the Open Data directive, regulates the reuse of information held by Public Sector Bodies (PSB). 104 Whereas it would be out of the scope of this article to explore such an important legislative tool in detail, a few specific elements are worth mentioning. First, within the broad principle of re-use by default which has gained more and more strength in the evolution of PSI legislation, the Open Data directive specifically includes research data resulting from public funding under its ambit (Art. 10). This is an important expansion of the scope of the Directive over its predecessors and has a direct impact on the issue of transparency, accountability, and replicability of EU science, contributing to making it a reference at the international level. A second important element of the new Directive relates to the adoption by the Commission (via a future implementing act) of a list of high-value datasets held by public sector bodies and public undertakings to be made available free of charge (Art. 14). As the Commission itself puts it, "these datasets … have a high commercial potential and can speed up the emergence of value-added EU-wide information products. They will also serve as key data sources for the development of Artificial Intelligence". 105 A final element of the Directive is found in Art. 1(6) and reads: "The right for the maker of a database provided for in Article 7(1) of Directive 96/9/EC", which corresponds to the aforementioned SGDR, "shall not be exercised by public sector bodies in order to prevent the re-use of documents or to restrict re-use beyond the limits set by this Directive".
The ambit of application of the PSI Open Data Directive is limited to Public Sector Bodies and, since the new Directive, to certain public undertakings. It is perhaps not a purely provocative exercise to consider whether a proper regulatory framework would be one where similar rules in relation to the training of AI should apply generally to any type of data or works. 106 Whereas there would certainly be strong opposition to such a framework, it cannot be accepted that choices affecting both the public and private elements of the life of individuals be made by an AI developed without the guarantees of openness, transparency and accountability. We would not accept laws made by a parliament if it operated in secrecy. Nor would we accept the determinations of a court of justice or an administrative authority if they were not supported by the reasons set out in publicly available decisions or through accountable decision-making processes. Why should we adopt a different standard when the same decisions are made through the use of a technology that we are only starting to understand?

IV. Conclusions
This paper endeavoured to offer a novel insight into some of the least apparent but far-reaching implications of Arts. 3 and 4 CDSM. In doing so, the paper followed a double approach. In the first instance, it focused on the legal changes affecting the regulation of mere facts and data brought by the new CDSM, critically assessing their fitness for the task. Subsequently, the paper also attempted to offer a complementary conceptual and normative reading of the copyright theory behind a TDM exception. In this process, the paper pointed out that Arts. 3 and 4 are agnostic to any theoretical element, an aspect we term "theory-less law making" which is recurrent in EU copyright law. The paper also attempted to identify some of the reasons (seeing digitisation as a threat) at the basis of this approach. The paper argues that technology is not exogenous to […]
105 See https://ec.europa.eu/digital-single-market/en/european-legislation-reuse-public-sectorinformation (last visited 1 July 2021).
[…] the light of recent CJEU case law that appears to establish that any fundamental rights limitation to copyright has to be found within Art. 5. 110 At the Member States level, a main source of potential flexibility has traditionally been the right of adaptation, the only major economic right not yet the object of horizontal legislative harmonising interventions. 111 Despite some initial doubt, the CJEU clarified that the right of adaptation is not harmonised. However, the right of reproduction is, and in the light of cases such as Allposters 112 and Pelham, 113 it seems that the space for MS to regulate autonomously an adaptation right (including its limits and exceptions) has shrunk considerably. And yet, it seems that the fundamental function of so-called transformative uses comfortably resides within a right that perhaps more than others determines the external boundary of how far copyright law can and should extend. 114 MS interested in enabling computational uses should consider this option.
The Open Data Directive briefly discussed above and especially the national Open Access guidelines it mandates will likewise represent a fundamental area of intervention to ensure that research data held by public sector bodies fuels innovation. The opportunity to extend similar obligations also to privately held databases seems an essential condition to develop open, transparent and accountable AI. No AI trained on unverifiable data, i.e., "black box" AI, should be used by public authorities. Arguably there seems to be a timid indication of this also in the recent AI Regulation proposal.
Finally, non-EU countries, which are not bound by the rigidity of EU law in this area, can be divided into two main categories: those that have enacted a broad and/or flexible approach (US, Canada, Singapore, South Korea, Japan, Israel 115 ), and those that have not yet done so. In the light of the above, a technology-enabling exception or a computational uses provision appears to be one of the most urgent additions to national copyright laws that countries concerned with cultural and technological sovereignty should pursue. For the UK, which was bound by the InfoSoc Directive until very recently (and will follow the "old" rule until domestic law changes 116 ), the future seems a choice between the need to maintain a level playing field with the EU neighbour and