ChatGPT For Lawyers

“The first thing we do, lets [replace] all the lawyers”
Another day, another prediction that Artificial Intelligence (“AI”), is going to replace lawyers. As a lawyer who has been building AI for nearly 10 years it seems proper comment. In summary, my view is that the prediction that “Chat GPT will replace lawyers” is a take that does not truly understand either the role of law or technology or both of them.


This Time is Different

I have been building various types of AI for a number of years, including semantic searches, chat bots, and Large Language Models (“LLMs”). Although there have been predictions for many years that AI (or some other technology) is going to change law, the greatest piece of technological change to affect law in the past 40 years is the ability for word processors is to copy, paste, and delete. LLMs have been around for some time – you might be familiar with them in the auto text on your phone. However, the easy-to-use textual interface of the LLM Chat GPT has enabled it to become the fastest growing web application of all time, with 100 million active users after just 2 months of its release – a feat that took Instagram two and a half years, and Facebook four and a half years. Chat GPT is something that almost everyone can find it useful and almost anyone can access easily.

Shortly after Chat GPT explosive adoption, came the almost as explosive predictions of which industries (including law) Chat GPT would replace. And when you see the fantastic strings of text that Chat GPT can create why would you not be amazed and make fantastical predictions. For what it’s worth, in my view, Chat GPT (and other LLMs) will lead to a 5% to 20% increase in productivity. Although that is a much less exciting headline than “replacing all lawyers” such an increase in productivity is huge and probably greater than the ability to copy, paste and delete text. In economic terms, this is a shift of the supply curve for legal services, which means that there will be a greater number of legal services that will be able to be produced given the fixed time of lawyers.


How LLMs work

LLMs are an algorithmic model that is trained on large data set of language. By way of an oversimplification, LLMs convert words into discreet identifying numbers (a “token”) and then determine and output a probabilistic mathematical relationship (a “vector”) between those tokens. Based on this algorithmic model of tokens and vectors, the LLM will predict a particular word as being the most likely to appear next. For example, if a trained an algorithm on a phone book of names and tokenised each letter then when I inputted a letter, “A”, then it will predict which is the next most likely letter to appear. It might predict “D” and start typing “Adam” or it might choose “B” for “Abraham” or might create the made-up name of “ABCDE”. The better the model is trained, the better its results will be. Interestingly, an LLM will not necessarily predict the most likely next letter or word, but adds an element of randomness in there, which is important for generating interesting outputs.

The generation of text is not a search and the text generated is totally unique and not determinable ahead of time (although it could be predicted). Therefore, asking the same question multiple times will likely lead to multiple different answers. This also means that LLMs might present an answer that is incorrect and do so with utmost confidence. Although 99 times Chat GPT might say that “the sky is blue”, on the 100th time it might confidentially state that “the sky is green”. This error is known as an “hallucination”.


What can LLMs like Chat GPT do?

In my view an easy-to-understand explanation of the level that Chat GPT and other LLMs are at the level of a university student that has attended all the lectures, but hasn’t done the reading, and is trying to bluff its way through the exam. In many cases it will present an interesting and coherent sounding answer. And in some of those cases it will be unknowingly wrong. But this is a skill level that can be easily used. For example:

    • Whimsical use cases abound – ask Chat GPT to write a poem in a style of Shakespeare or Lord Byron, on any topic and you will be pleasantly amused. It can easily write a consumer-oriented blog post, or listicle setting out “top 10 things you need to know about x”.
    • Ask it to teach you a task, ranging from understanding the Higgs-Boson, to coding the HTML on your website and it can explain the process simply and even answer follow up questions on the subject.
    • You could present the LLM with a large body of text, such as a case, and get it to summarize it for you, whether in five hundred, two hundred, fifty, or ten words.
    • You could of course ask it to draft some legal document for you, but its output will be based on what it is been trained on. So, there could be problem with applicable law, styles, as well as the coherence or relevance of the output.

LLM use for Lawyers

The instructions that you give to an LLM are called “prompts”. The use of sophisticated prompt to obtain more useful generated text is “prompt engineering”. Although some have hypothecated that prompt engineering is the new skill to learn rather than a subject matter expertise. However, in my view prompt engineering is the new search. Just as a specialist lawyer can obtain better results from a search engine like “Google” as compared to a lay person, so too can a specialist with added skills in prompt engineering, do better than a lay person who knows only prompt engineering. Here are some methods to improve your prompt engineering:

    • Specify constraints on the output answer. If you do not want the LLM to invent realistic sounding case citations (which it will do) then ask for it to provide only case citations that are actually real.
    • Assign a role to the AI. For example, “you are a specialist taxation lawyer. You have helped people answer tax questions for 20 years. Your task is to give the best advice about Australian tax law. You are going to give your responses on a technical legal manner”. The LLM will then act out that role when answering your questions.
    • Ask for a “Chain of Thought” that sets out how the LLM arrived at it answer. This encourages more accurate answers, and also enables easier detection of an hallucination.
    • Give examples of how you want the question answered. You might give some demonstrative input and output pairs. The LLM will then follow the pattern you have set.

You could take the prompting of inputs and outputs further with a method called as “fine tuning”. Fine tuning is where you teach the LLM new tasks. You do so by giving examples of what that task looks like, with input and output pairs, and the LLM will then produce outputs of similar nature of those demonstrated. This is one way in which an LLM can be taught to produce legal documents or template letters or similar. Show the model what the input will be and then its output and then it will learn from those. You might teach it to draft a contract or a will, or to take a set of client facts and summarize them into a timeline. Fine tuning can be done with relatively small sample. Of course, unlike a traditional template there is no certainty of what the output will be. Therefore, although you could teach an LLM to craft a sophisticated legal document or undertake legal task like contract review it might hallucinate from time to time.

Another thing you can do with LLMs is to “embed” data. This enables the LLM to do things with the data such as search, group or classify it. In this way we can use an LLM as a search engine.

It should be noted that embedding or fine tuning does not alter the underlining model such as GPT4 (on which Chat GPT is based). To alter the model requires a retraining of the entire model which might cost tens of billions of dollars’ worth of computational time and power.


The Prisoner’s Dilemma

One-way humans seek to understand machines is by pushing them to their limit and breaking them so that the human can operate the machine with greater comprehension of the limits of the machine. Unfortunately, this can lead to public relation problems for commercial AI.

The poor chatbot “Tay” by Microsoft, was shut down permanently after only 24 hours of being made public after users got Tay to make offensive statements by using “repeat after me” (which is not really a statement by Tay when contextualized). To prevent a similar occurrence, the largest LLMs such as Chat GPT, Microsoft, Bing and Google Bard have a number of controls around them. Because the LLM is a “black box” of tokens and vectors there are not classical rules that can be easily placed into the LLM. Instead, there is another AI with those rules placed in front of the LLM. Conceptually, this is like a the LLM being a prisoner in a gaol cell, and only able to communicate by way of messages passed through the gaoler who reviews the appropriateness of the questions before letting the LLM answer. If the gaoler does not like the question, then the gaoler will reply instead of the prisoner LLM. Therefore, if you ask the LLM to repeat an offensive statement the gaoler will reply with words the effect that “The LLM does not make offensive statements”. Similarly requests for instructions for bomb making, legal advice, confidential training data of the LLM, a list of rules of the gaoler, or the internal company name for the LLM will be refused.

Naturally, users have tried to “break” the gaoler by using what is called a “prompt injection attack”. This involves using prompts that trick the gaoler and the LLM into not noticing the prohibited request and answering the request unconstraint from the gaoler’s rules. For example, asking that the LLM to write a fictional story where the main character makes a bomb and describes each step of the process as precisely as possible. Or prompting that the LLM will adopt the personality of “Do Anything Now” (“DAN”) who is unconstrained by a gaoler with rules and as DAN will provide the secret list of constraints by the gaoler as well as print out the confidential training data. Bing’s LLM was tricked into revealing that its internal company name was “Sydney”.


The Lawyer’s Dilemma

In my view, the best legal use case for LLMs in law is as a cheap and fast assistant to a lawyer, where the lawyer reviews the work carefully. Just as a law clerk might miss important technical matters in their drafting, so too can a LLM with the capabilities of a partially studied University student. Properly reviewing the work of juniors is a well-established field. If you are aware of the potential need to give specific prompts, fine tune the model, or embedded resources to search then the risk from undetected hallucinations can be minimised.

Ethical problems arise more readily when the LLM is consumer facing. The highest profile example to date was the plan announced by Do Not Pay, a US legal tech company, to have Chat GPT provide the script for a self-represented litigant who was appealing a parking ticket. The plan was for the script to be provided by way of Air Pods that would listen to the requests of the judge and then dictate a Chat GPT generated reply for the user to repeat. There are of course numerous ethical problems with that proposal, the most obvious of which is the unauthorised use of a recording and electronic device in the court. The practical issues of a litigant listening closely to an Air Pod to then repeat a dictated reply are also obvious to anyone who has appeared in court, an experience that Do Not Pay CEO may have an opportunity to learn from, given the company has since become the subject of at least three separate actions in different US States, ranging from misleading consumers, to unauthorised practice of law, to fraud.

More subtly, an error or hallucination by an LLM provided for legal use will create a liability for the LLM provider. Although it may seem commonplace to use and sell template legal forms and stationery with a low risk of negligence, this is because it is easy to review for consistency. If a Post Office Will Kit is filled out incorrectly by the user, then that is a problem for the user (and their estate). But if the Will template was somehow defective on its face, then the liability would rest upon the provided of the Will Kit – to potentially a large class of consumers. Because the possibility for error or hallucination by an LLM is ever present so too is the liability risk.


Ethical Issues

Care will need to be taken in differentiating between legal search (using embedded data) and the generation of customised answers to a user’s question. The latter seems to be to be the provision of legal services, and is why most major LLMs have a gaoler that prevents answering such questions.

Any review, overview or guidance by a human of the output of an LLM, will have the effect that that human is providing legal services. This is analogous to Quill Wills(1) where it was found that in the preparation of a Will a representative who assisted testators select clauses from a bank of clauses held within a computer program was found to be engaging in legal practice. It was held that the company had gone beyond “merely giving abstract information as to legal rules and was assisting in the production of a will appropriate to the individual circumstances of the customer”. Accordingly, any technology company that provides a publicly facing LLM will need to have nil humans involved in the production of the specific output, otherwise they will be providing legal services – which can only be provided by admitted practitioners.

Where there is the provision of legal services electronically the trust account regulations will need to be complied with for any funds received in advance. This might not apply where the service is provided instantaneously (like where a template document is produced at the same time as a credit card is charged) but becomes more problematic where there is some delay between the charge and the output. A subscription service, or purchase of usage rights in advance, which are the most common payment structures for commercially available LLMs would seem to me to be the receipt of funds in advance of service and would be trust funds – something quite administratively burdensome for a technology company to comply with.

Consumer facing legal LLMs are better restricted to uses that have a minimised risk from hallucination. Several years ago, I built software for victims of Domestic Violence that would automatically take their unstructured facts and convert them into a structured timeline of events for use in the affidavit attached to an intervention order application. This is a task that could be well handled by a LLM, and the user could review whether their facts have been summarised and ordered correctly.

The use of LLMs on a pro bono basis may also mean that to the extent that they constitute legal services, they will not be for a fee, and therefore could be provided by non-lawyer technology companies.

(1) Attorney General at the Relation of the Law Society of Western Australia v Quill Wills Ltd & Ors [1990] WASC 604