Language technologies in law firm practice and the challenges resulting therefrom

The use of modern language technologies in law firm practice and the challenges of data confidentiality, intellectual property rights and the processing of related personal data

authors: Anna Kotarska, Wojciech Wołoszyk

Nowadays, running a law firm that serves business entities inevitably involves working with foreign-language or multilingual documents. Foreign trade, cross-border transactions, foreign shareholders in Polish limited companies, Brexit, international arbitration: almost every professional legal representative encounters these terms more and more frequently in the course of their career. Each of these issues means one thing: the need to involve a professional translator (in-house or third-party), sometimes a sworn translator, or for the legal representatives themselves to translate court decisions, letters, opinions, or contracts.

Large law firms often have their own translation departments which deal with the challenges of multilingual legal services professionally and with the use of professional tools. However, most law firms in the market do not have such support within their internal resources. Their lawyers often have to manage on their own. In such situations, given costs and especially time, which is always limited in this industry, one is tempted to use free on-line machine translation tools (commonly referred to as “automatic translators”). However, in most cases, this will not be the best idea.

The most obvious choice, but not the only one, is Google Translate. Other widely used tools include Microsoft Translator and the European DeepL. Admittedly, the quality of the translations these tools produce is steadily improving and, if used for information purposes only, is often sufficient. Nevertheless, quality is not the biggest pitfall here, although this aspect can also be very treacherous.

First, it should be noted that the use of such tools when providing any professional services that entail the obligation to maintain professional secrecy and to ensure the security of confidential information and of personal and sensitive data will very likely amount to a violation of professional standards and corporate codes of ethics. For example, in the Polish context reference may be made to Articles 21-23 of the Code of Ethics for Attorneys-at-Law[1], §19 of the Code of Ethics for Advocates or §§ 5 and 27 of the Professional Code for Sworn Translators.

Before using any such tool in professional activity, one should read the terms and conditions of service in detail. This particularly applies to provisions granting the service provider a licence to use proprietary content and, more importantly, content belonging to customers which is transmitted, stored, sent, received or made available using the services. Interestingly, for Google’s services, the term “content” also covers the emails a user sends and receives via Gmail.

Currently, the most popular and almost universally available cloud-based machine translation systems are DeepL (free and PRO versions), Smartcat, Google Translate (free and paid versions) and Microsoft Translator. Services such as Yandex or services from the Far East are less well known. Further on in this paper, the authors present extracts from each service’s terms and conditions that identify the risks to confidential and sensitive data which each user of the service explicitly accepts. The EU machine translation service, eTranslation, belongs to a different category of solutions because of, among other things, its non-commercial character and specific features; those that are important for legal translation are presented later.

It might seem that free machine translation tools, which cannot be used to provide commercial services, are not used in professional businesses that provide translation services. Unfortunately, such an assumption would prove to be false. As an expert witness in jurislinguistics and an editor of legal translations, this paper’s co-author has repeatedly had to deal with the disastrous results of the irresponsible use of such systems[2]. At the same time, the translation units of the European Commission, the European Parliament and the Court of Justice of the European Union, as well as external service providers working with them, use machine translation systems in their daily work.

Analysis of the terms and conditions of selected machine translation systems:

DeepL Free:

Privacy Policy[3]: 3. “Please note that our translation service may not be used for texts containing personal data of any kind.”

DeepL Pro[4] — Terms and Conditions

“8.1.3. Customer is obligated to observe all legal requirements for the collection, processing and use of data which is transmitted to and processed by DeepL for Customer in connection with the provision of its services under this Agreement. In particular, Customer shall immediately agree with DeepL on a data processing agreement (which shall be provided by DeepL) if Customer intends to transmit personal data to DeepL using the Products. Customer guarantees not to collect, process or use any personal data in connection with the Products without the express consent of the data subject or sufficient other legal authorisation. DeepL will reasonably co-operate with Customer in order to assist Customer in implementing such required legal authorisations.

8.1.1. Customer may use the Products solely for the purpose agreed between the Parties. In particular, Customer may not, and will not allow third parties (including Internal Users and End Users) to use the Products, translations created using the Products, Documentation or other data, information or service provided by DeepL unless expressly authorised by DeepL in written form

in connection with or for the purpose of operating critical infrastructure such as electrical power stations, military or defence equipment, medical appliances or other equipment whose failure or impairment would result in unforeseeable economical or physical damages, including but not limited to critical infrastructure in terms of the European Directive 2008/114/EC;

(…)

to transmit any data to DeepL which may not be transmitted to or processed by DeepL due to data protection laws, contractual or statutory confidentiality obligations, export restrictions or other statutory provisions or third-party rights.”

The above provisions and terms and conditions of service preclude the use of this tool to translate documents that contain confidential data, personal data (in the case of the free version without any exceptions, and in the PRO version only under strict formal conditions) and documents relating to critical infrastructure.

Smartcat:

Smartcat Terms of Use[5]

“VI. USER CONTENT

Smartcat does not display, reproduce and distribute Users’s content to or with any third parties apart from other users authorised by the User who owns the content and Smartcat’s affiliates, subsidiaries and suppliers who are responsible for functional maintenance of the Platform.

To the extent it is necessary in order for Smartcat to provide the User with any of the services and features of the Platform and in order to enable usage of the Platform, the User hereby grants Smartcat, its affiliates, subsidiaries and the aforementioned suppliers, a non-exclusive, royalty-free, transferable right to use, display, reproduce and distribute such content exclusively for the purpose of maintaining the functionality of the Platform features.

VII. DATA PROTECTION

You hereby provide written authorisation to Smartcat to transfer your personal data collected by Smartcat from you pursuant to providing the Services or Supplementary Services to third party service and analytics providers for the purpose of providing, analysing and modifying the Services, Supplementary Services and the Platform, including cross-border transfers outside European Economic Area.

Smartcat Customer Agreement[6]

CONFIDENTIALITY AND NON-DISCLOSURE

4.1 Restrictions. Smartcat acknowledges that, in order to perform the Services or to provide Supplementary Services, it shall be necessary for Customer to disclose to Smartcat certain Confidential Information (defined below) of Customer. Smartcat agrees that it shall not disclose, transfer, use, copy, or allow access to any such Confidential Information to any third parties, except as authorised by Customer. Customer hereby authorises Smartcat to provide Confidential Information to Suppliers, translation service providers, marketing services providers and infrastructure and development service providers, including those located in jurisdictions without adequate protection of personal data, on the terms established by Smartcat provided that Smartcat shall implement technical and organisational security measures in respect of processing of such data.”

The above terms and conditions preclude the practical use of Smartcat tools for the translation of documents containing confidential and personal data. Since a free licence to use the translated content is granted to the platform owner, its affiliates and subcontractors, the translation of texts copyrighted by third parties without their express prior consent is also precluded. It should also be noted that using this tool is to be treated as hiring a third-party translator or translation agency, with whom an appropriate confidentiality and personal data processing agreement should be concluded before granting access to documentation containing personal or confidential data.

Google Translate (free version), Privacy Policy[7] and Terms of Service[8]:

“We use different technologies to process your information for these purposes. We use automated systems that analyse your content to provide you with things like customised search results, personalised ads, or other features tailored to how you use our services. And we analyse your content to help us detect abuse such as spam, malware, and illegal content. We also use algorithms to recognise patterns in data.

We provide personal information to our affiliates and other trusted businesses or persons to process it for us, based on our instructions and in compliance with our Privacy Policy and any other appropriate confidentiality and security measures. For example, we use service providers to help us with customer support.

We will share personal information outside of Google if we have a good-faith belief that access, use, preservation, or disclosure of the information is reasonably necessary to:

Meet any applicable law, regulation, legal process, or enforceable governmental request. We share information about the number and type of requests we receive from governments in our Transparency Report.
Enforce applicable Terms of Service, including investigation of potential violations.
Detect, prevent, or otherwise address fraud, security, or technical issues.
Protect against harm to the rights, property or safety of Google, our users, or the public as required or permitted by law.”

“If you choose to upload or share content, please make sure you have the necessary rights to do so and that the content is lawful.

(…)

We need your permission if your intellectual property rights restrict our use of your content. You provide Google with that permission through this license. (…)

This license covers your content if that content is protected by intellectual property rights.

(…)

This license is worldwide, which means it’s valid anywhere in the world.

(…)

This license allows Google to:

host, reproduce, distribute, communicate, and use your content — for example, to save your content on our systems and make it accessible from anywhere you go;
publish, publicly perform, or publicly display your content, if you’ve made it visible to others;
modify your content, such as reformatting or translating it;
sublicense these rights to:
- other users to allow the services to work as designed, such as enabling you to share photos with people you choose;
- our contractors who’ve signed agreements with us that are consistent with these terms, only for the limited purposes described in the Purpose section below.”

Google’s Terms of Service and Privacy Policy preclude the use of this popular machine translation tool to translate documents containing personal data, sensitive data, confidential information or third-party copyrighted content.

Microsoft Translator – YOUR CONTENT[9]:

“When you share Your Content with other people, you understand that they may be able to, on a worldwide basis, use, save, record, reproduce, broadcast, transmit, share and display Your Content for the purpose that you made Your content available on the Services, without compensating you. If you do not want others to have that ability, do not use the Services to share Your Content. You represent and warrant that for the duration of these Terms, you have (and will have) all the rights necessary for Your Content that is uploaded, stored, or shared on or through the Services and that the collection, use, and retention of Your Content will not violate any law or rights of others.”

(…)

“To the extent necessary to provide the Services to you and others, to protect you and the Services, and to improve Microsoft products and services, you grant to Microsoft a worldwide and royalty-free intellectual property license to use Your Content, for example, to make copies of, retain, transmit, reformat, display, and distribute via communication tools Your Content on the Services. If you publish Your Content in areas of the Service where it is available broadly on-line without restrictions, Your Content may appear in demonstrations or materials that promote the Service.”

The provisions of the Microsoft Services Agreement preclude the use of this tool to translate documents containing personal data, sensitive data, confidential information or third-party copyrighted content.

These provisions and terms and conditions can be summarised as follows: the providers either expressly exclude the use of their products for the translation of documents containing confidential information, personal data or copyrighted content of third parties (DeepL Pro), or reserve the right to transmit such content and data to an unspecified group of third parties outside the European Union (Smartcat), or grant themselves comprehensive rights for further use and sharing of such data (Google, Microsoft). It is also advisable to bear in mind the origin of the different tools and the ownership structure of the entities involved. As an example, the Smartcat platform was developed by a Russian company with co-financing from the Russian state through the Skolkovo Foundation. As regards the companies co-financed by this entity, which also operate in the US market, including Silicon Valley, the FBI has raised serious concerns about the true motives behind their actions[10], seeing them as a way of accessing classified information that could pose a threat to national security.

What does this mean for a Polish lawyer? If a Polish lawyer cooperates with public administrative authorities and has access to strictly confidential information relevant to state security or public safety, the use of such tools may have very serious implications.

A perfect illustration of such concerns is a report by the Australian Strategic Policy Institute (ASPI) published in 2019[11], which unequivocally points to the use of state-owned enterprises in China that provide machine translation services to collect data on non-Chinese users. The report’s author, Samantha Hoffman, says that the most valuable tools in China’s data collection campaign are technologies that users interact with entirely voluntarily and for their own benefit, of which machine translation services are a prominent example. The entire process is done through GTCOM[12], which Hoffman describes as a “cross-language big data” business that offers software and hardware for machine translation, the use of which later collects extensive data. She estimates that GTCOM, which works with both corporate and government clients, serves the equivalent of up to five billion words of plain text a day in 65 languages and in more than 200 countries. GTCOM is a subsidiary of a Chinese state-owned enterprise under the Central Propaganda Department’s direct supervision, and therefore data collection is assumed to be an active and continuous process serving largely non-commercial purposes. The report explicitly indicates that the machine translation system offered by GTCOM is used, among other things, to collect data for Chinese military intelligence.

It is quite conceivable that other countries’ intelligence services also willingly use voluntarily provided data and information and process them for their own purposes. Therefore, the possible use of such software when providing services to central government administration bodies should be treated as a potential threat to the security of classified information relevant to the Polish state’s interests. However, this issue is currently completely disregarded and not subject to any control. The use of widely available machine translation tools in legal professional practice should be subject to far-reaching regulation and control.

It should be stressed that completely eliminating the use of machine translation tools is an outdated approach that does not stand up to the realities of the sector or to technological trends that are also promoted by the European Union. For many years, the EC Directorates-General (DGT and DG CONNECT) have been carrying out intensive research, development, and deployment work together with research centres, academic institutions and the private sector to develop and disseminate the use of modern language technologies that go considerably beyond mere machine translation, the technology on which the authors focus in this study. Staff at the Directorate-General for Translation (DGT) use such tools in their daily work. This also applies to contractors working with DGT.

However, for lawyers, it is crucial to understand the differences between the various machine translation tools, to exclude the use of certain types of them — with a particular focus on freeware solutions — and to completely eliminate open-source tools in professional practice.

There is another aspect of using such tools that is worth remembering, namely intellectual property rights. In fact, the risks described above do not only apply to content containing confidential and personal data. Even if content covered by the intellectual property rights of the customer or third parties is uploaded into the translation systems, this will lead to an infringement of such rights. The consequence of using such a tool is that its provider is granted a very broad licence for further use of the uploaded content.

Furthermore, it is also worth bearing in mind that a foreign-language text produced using a machine translation tool will not be protected by copyright, which can also have far-reaching consequences. According to fairly unequivocal legal opinions, a machine translation has no author, belongs to the public domain and is not protected by copyright. In principle, copyright in works generated by computer software cannot be attributed to the user of the software. At the same time, legal authors and commentators reject the concept of indirect authorship of the software developer, considering that a different view would give rise to virtually insoluble difficulties concerning the practical implementation of the software developer’s rights. According to D. Flisak, no copyrightable material will be created if a machine actually replaces a human. In this context, the author gives the example of translation software, noting that the output in the form of a translated text is not a product of human intellectual effort but of automated logic.

A similar view can also be found in foreign literature. A. Ramalho argues that, where there is no human author, the product cannot be original and, therefore, cannot be protected, thus becoming part of the public domain. While the author’s view should be endorsed, it is difficult to accept all the arguments she has presented. As European legal systems currently stand, AI products will not be protected because they have no human creator, but this does not automatically mean that they cannot be original[13].

The authors argue that other recipients of translation services, outside the legal sector, should also have at least basic knowledge of the issues that may prove critical. Professionals and experts should therefore openly share their knowledge in this area, raise awareness of the risks and propose, insofar as practicable, the best and safest solutions.

Although the market is dominated by commercial operators, it also offers tools, at no charge to the user, that allow machine translations to be delivered safely while respecting professional standards and third-party rights. The European Commission has developed the eTranslation[14] service, which currently supports 31 languages (including all official languages of the European Union) and is available free of charge to a specific group of users. These include the staff of the EU institutions, the national administrations of the Member States, Iceland and Norway, staff and students of linguistic faculties at higher education institutions, as well as individuals implementing projects funded under the Connecting Europe Facility (CEF). This tool, also known as the CEF.AT Platform (from “automatic translation”), is integrated with more than 90 different EC information systems and on-line services. These include the e-Justice Portal, ODR (Online Dispute Resolution), EUR-Lex, N-Lex, SOLVIT, and solutions such as AT4LEX (“Authoring Tool for Legal Texts”) and CheckLex (“Tool for verifying the authenticity of a document”).[15]

Two recent developments concerning the eTranslation service should be mentioned. The first is that, since the end of March 2020, the tool has been available to small and medium-sized enterprises (SMEs) and micro-entrepreneurs. Interestingly, eligibility is established by declaration: the user indicates a user category when creating an account in the service.[16] The second is that, due to Brexit, UK users lost access to eTranslation at the beginning of this year.

So what distinguishes eTranslation from the systems whose terms of service have been analysed above? Firstly, the service is operated over TESTA (Trans-European Services for Telematics between Administrations), a private IP-based network of the EU. TESTA is a telecommunications interconnection platform for secure information exchange between European and Member State administrations; it is currently handled under the ISA² European programme. An appropriate approach to the confidentiality and privacy of the transmitted text data, and its adequate protection, is crucial for the service in question. The high level of data security is a distinguishing feature and one of the main advantages of the eTranslation service. This is why the “made in Europe” solution should be recommended to public administration staff as suitable for translations, especially of the institutions’ internal documents. At the same time, it should be made clear that the output will be a preliminary translation for information or illustrative purposes (gisting) and will require professional human editing, i.e., post-editing. The distinction between these two products, human translation and machine translation, is also reflected in the separate ISO standards that apply to them, i.e., ISO 17100 and ISO 18587, respectively. In this context, it is worth noting the latest information posted by the European Patent Office on its website, indicating the type of translation it offers and the ensuing implications.

Another legally important feature of the eTranslation service is that the user retains the rights to the translation. The service allows both translation of text fragments of up to 2,500 characters and asynchronous translation of entire files. In the former case, the translation is available on-line, while in the latter case, the user can additionally request that the translation file be sent to the email address associated with the EU login. The system stores the translations in the user’s individual account for 24 hours, after which the files are deleted. There is also a “delete after download” option which, when ticked, will delete the text as soon as it is delivered to the user. In such a case, the email address provided for delivery is stored until the document is sent and is then reduced to its domain only.

Naturally, the above description of the tool is not exhaustive. The co-author has selected only those elements which she believes to be relevant or interesting to users working with legal texts. She provided a detailed description of the eTranslation service in an article for “IT w Administracji” magazine, September 2020 issue[17].

One of the solutions that allow personal data to remain confidential is to use a text anonymisation tool. As part of the EU project MAPA (Multilingual Anonymisation for Public Administrations)[18], an international consortium led by a Spanish company, Pangeanic, is currently (01.2020-12.2021) developing a free tool for the public sector, with a particular focus on the de-identification of legal or medical texts. The tool is intended to ensure compliance with the European Union’s General Data Protection Regulation (GDPR) for all official EU languages. At this point, it should be made clear that, from a “technological” point of view, a solution that conceals or removes, for example, 98% of the data requiring de-identification may be considered effective. As with machine translation, it is still impossible to achieve a perfect product in every case. The beta version of the tool for the Polish language should be available at the turn of the first and second quarters of this year. A pilot project involving a Polish public sector body might also be undertaken to adapt the software to the body’s needs. Following project completion, the software is expected to be made available under an open-source licence. At present, it is possible to test Pangea Masker, a similar product, which additionally allows the user to adjust the level of sensitivity and choose specific tags.[19] In the second half of the year, a conference is planned in Poland to present the progress of the MAPA project and to demonstrate how the tool works. Preliminary information has already been presented at the JURIX 2020 conference and summarised in an article published in the e-book entitled “Legal Knowledge and Information Systems”[20].
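
To make the idea of de-identification before translation more concrete, the following is a minimal sketch in Python. It is not the MAPA or Pangea Masker implementation; the pattern names, the regular expressions and the `mask`/`unmask` helpers are assumptions introduced purely for illustration. It replaces a few easily recognisable identifiers with placeholders and keeps a local mapping so that they can be restored in the translated text.

```python
import re

# Hypothetical patterns, for illustration only. Real de-identification tools
# rely on trained language models to find names, addresses, case numbers, etc.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PESEL": re.compile(r"\b\d{11}\b"),  # 11-digit national ID number
    "PHONE": re.compile(r"(?<!\w)\+?\d{2,3}(?:[ -]\d{3}){2,3}\b"),
}


def mask(text):
    """Replace each match with a numbered placeholder.

    Returns the masked text and a mapping placeholder -> original value,
    which stays on the user's machine and is never sent to the service.
    """
    mapping = {}
    for label, pattern in PATTERNS.items():
        def repl(match, label=label):
            key = f"[{label}_{len(mapping) + 1}]"
            mapping[key] = match.group(0)
            return key
        text = pattern.sub(repl, text)
    return text, mapping


def unmask(text, mapping):
    """Restore the original values in a (translated) text."""
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text


sample = ("Contact Jan Kowalski at jan.kowalski@example.com, "
          "PESEL 44051401359, tel. +48 600 700 800.")
masked, mapping = mask(sample)
print(masked)
```

In this sketch the email address, the ID number and the phone number are masked, but the name "Jan Kowalski" is not, because no simple pattern can recognise it. This illustrates why even a dedicated tool cannot guarantee a perfect result and why the masked text still needs to be reviewed before it leaves the law firm.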

The above description of commonly available language technologies and the associated risks is not exhaustive. It is intended to educate legal practitioners that the apparent ease of use of certain tools can have far-reaching negative consequences. Thus, the minimum requirement for diligence is to be familiar with the terms and conditions of service and to avoid careless use of the illusory benefits of the Internet offering many options of “free” translation, for which we end up paying with our client’s data and our very own violation of professional and ethical standards.

Where demand for translation services is high and regular, it is worth using a professional translation agency certified for translation services in general (ISO 17100), legal translation (ISO 20771) and information security (ISO 27001).

[1] See “Księga bezpieczeństwa komunikacji elektronicznej” [“The Electronic Communications Security Book”] prepared by the National Bar Association of Attorneys-at-Law, which analyses in detail email services and cloud storage services from the perspective of compliance with professional ethics rules:

Part 1: https://kirp.pl/ksiega-bezpieczenstwa-juz-dostepna/

Part 2: https://kirp.pl/czesc-druga-ksiegi-bezpieczenstwa-juz-dostepna/

[2] https://tlumaczenia-prawnicze.eu/lsps-responsibility-for-the-process-of-translation-and-the-rules-of-using-machine-translation/

[3] https://www.deepl.com/privacy.html accessed 22.02.2021.

[4] https://www.deepl.com/pro-license.html#pro accessed 22.02.2021.

[5] https://www.smartcat.ai/terms/ Smartcat Terms of Use dated 10.11.2020, accessed 22.02.2021.

[6] https://www.smartcat.ai/customer-agreement/ Smartcat Customer Agreement dated 10.11.2020, accessed 22.02.2021.

[7] https://policies.google.com/privacy

[8] https://policies.google.com/terms Terms of Service dated 31.03.2020, accessed 22.02.2021.

[9] https://www.microsoft.com/en/servicesagreement/ Microsoft Services Agreement, effective: 01.10.2020, accessed 22.02.2021.

[10] Boston Business Journal https://www.bizjournals.com/boston/blog/startups/2014/04/fbis-boston-office-warns-businesses-of-venture.html

[11] Report No. 21/2019: Dr Samantha Hoffman, The Chinese Communist Party’s data-driven power expansion, https://www.aspi.org.au/report/engineering-global-consent-chinese-communist-partys-data-driven-power-expansion

[12] Global Tone Communication Technology Co. Ltd

[13] P. P. Juściński, Prawo autorskie w obliczu rozwoju sztucznej inteligencji, ZNUJ. PPWI 2019, No. 1, [Copyright law in the age of advances in artificial intelligence]

[14] https://ec.europa.eu/cefdigital/wiki/display/CEFDIGITAL/eTranslation

[15] eTranslation reuse: https://ec.europa.eu/cefdigital/wiki/display/CEFDIGITAL/Reuse

[16] https://webgate.ec.europa.eu/etranslation/public/welcome.html

[17] Issue theme: “Technologie języka w sektorze publicznym” [“Language technologies in the public sector”] and “Zastosowanie usługi eTranslation” [“Application of the eTranslation service”]

https://itwadministracji.pl/wydanie/wrzesien-2020/

[18] https://mapa-project.eu/

[19] AI-Powered anonymization – PangeaMT (pangeanic.com)

[20] Automatic Removal of Identifying Information in Official EU Languages for Public Administrations: The MAPA Project in the conference publication “Legal Knowledge and Information Systems JURIX 2020: The Thirty-third Annual Conference, Brno, Czech Republic, December 9-11, 2020”, IOS Press BV, pp. 223-226
