This study examines the suitability of large pre-trained language models for analyzing and evaluating legal texts in education, showing that they are not yet mature in this domain.
Multilingual local models possess prior knowledge of the basic concepts of the "Gutachtenstil" used for teaching purposes. Given carefully selected examples, they can classify components of this argumentative style, but they still lag behind simpler models that are not based on language models.
Large language models are particularly well-suited for evaluating and grading free-text assignments, as they already contain extensive domain knowledge and do not need to be trained separately. In our experiments, they outperform simpler methods when evaluating English texts. However, this performance has not yet carried over to the evaluation of complex German-language legal essays.
In this article by recode.law e.V., Olesja Kaltenecker and Jeremias Forssman take a closer look at the DeepWrite project. Based on an interview with Christian Braun, Simon Alexander Nonn and Sarah Großkopf from the DeepWrite research project at the University of Passau, they examine the project itself, its strengths and opportunities, as well as the challenges that remain. With careful preparation, AI can provide appropriate feedback, especially for shorter student solutions. Both the accuracy of the content and the appraisal style ("Gutachtenstil"), as well as grammatical and lexical correctness, are taken into account. Assessing long solutions, however, remains a challenge, not least with regard to the coherence of the argumentation across an entire exam.
This paper by Yujin Kang in the Korean Design Forum (한국디자인포럼) is based on a "Survey to determine the UX needs of law students", conducted in the winter semester 2023/24 by the Department of Law at the University of Passau. The study analyzes the participants' responses with regard to both the user experience and the integration of artificial intelligence into such a learning platform. Concerning the user experience, the focus lies on the user interface and the design system, which reflect the requirements and preferences of future users. An appealing appearance of the learning platform encourages users to work with it over longer periods and helps to maintain their attention. In addition, the article deals with the theoretical consideration of the AI and design process and highlights the importance of Human-Computer Interaction (HCI) from the perspective of User Experience Design by comparing the user interfaces of the large language model ChatGPT in the GPT-2 and GPT-3.5 versions.
In her blog post on fiete.ai, project collaborator Veronika Hackl outlines the basics of prompting for AI-generated feedback in the educational context. The feedback-prompting process consists of three steps: defining the objectives, formulating the prompt, and evaluating the output. The post introduces various prompting techniques: Zero-Shot Prompting for simple feedback generation, Few-Shot Prompting for example-based learning, Chain-of-Thought Prompting for transparent evaluations, and Tree-of-Thoughts Prompting for multiple perspectives. Additionally, advanced concepts such as hyperparameter tuning, for example adjusting the temperature, and RAG systems (Retrieval-Augmented Generation) are explained; RAG enables the integration of proprietary documents, such as teaching materials, into the feedback process. The post concludes with an overview of current developments and challenges in AI-generated feedback, including integration into learning management systems and handling technical requirements.
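To make the Few-Shot Prompting and temperature settings described above concrete, here is a minimal sketch using the OpenAI Python SDK. The model name, rubric, and example feedback pairs are illustrative assumptions, not material from the blog post.

```python
# Minimal Few-Shot Prompting sketch for feedback generation.
# Model name, rubric, and example pairs are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 1: definition of objectives -- encoded as a system message.
system_msg = (
    "You are a writing tutor. Give short, constructive feedback on the "
    "student's answer, covering content accuracy and style."
)

# Step 2: prompt formulation -- two worked examples make this 'few-shot'.
few_shot = [
    {"role": "user", "content": "Student answer: Inflation is when prices rise."},
    {"role": "assistant", "content": "Feedback: Correct in essence, but define "
     "inflation as a sustained rise in the general price level."},
]

response = client.chat.completions.create(
    model="gpt-4",   # placeholder model name
    messages=[{"role": "system", "content": system_msg},
              *few_shot,
              {"role": "user", "content": "Student answer: Demand curves slope up."}],
    temperature=0.2,  # low temperature -> more deterministic feedback
)

# Step 3: output evaluation -- here simply printed for manual review.
print(response.choices[0].message.content)
```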
In this article in JuS (Juristische Schulung), research assistants Christian Braun, Sarah Großkopf and Simon A. Nonn discuss whether ChatGPT can replace university lecturers, especially with regard to teaching legal reasoning skills and the "Gutachtenstil" by means of AI feedback.
In recent years, the Covid-19 pandemic and the rapid advances in digitalization that accompanied it have shown that higher-education didactics is undergoing change, and that this process can and should be actively shaped in order to maintain the future viability and competitiveness of universities. A large part of this is the use of innovative technologies and tools, such as artificial intelligence (AI), in particular large language models (LLMs) and natural language processing (NLP), to create digital teaching and learning spaces for students of future generations.
This study reports the intraclass correlation coefficients (ICC) of feedback ratings produced by OpenAI's GPT-4, a large language model (LLM), across various iterations, time frames, and stylistic variations. The model was used to rate responses to macroeconomics tasks in higher education (HE) based on their content and style. Statistical analysis was performed to determine the absolute agreement and consistency of the ratings across all iterations, as well as the correlation between the content and style ratings. The findings reveal high interrater reliability, with ICC scores ranging from 0.94 to 0.99 for the different time periods, indicating that GPT-4 is capable of producing consistent ratings. The prompt used in the study is also presented and explained.
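As an illustration of how such an interrater-reliability analysis can be computed, the following is a minimal sketch using the pingouin library, treating repeated GPT-4 iterations as "raters" of the same responses. The ratings, column names, and number of iterations are invented placeholders, not the study's data or code.

```python
# Sketch of an interrater-reliability check with pingouin's ICC;
# the data and column names are illustrative, not the study's data.
import pandas as pd
import pingouin as pg

# Long format: one row per (response, rating iteration) pair.
df = pd.DataFrame({
    "response":  ["r1", "r1", "r1", "r2", "r2", "r2", "r3", "r3", "r3"],
    "iteration": ["run1", "run2", "run3"] * 3,
    "rating":    [8.0, 8.5, 8.0, 5.0, 5.5, 5.0, 9.0, 9.0, 9.5],
})

# The result table includes both absolute-agreement and consistency ICCs.
icc = pg.intraclass_corr(data=df, targets="response",
                         raters="iteration", ratings="rating")
print(icc[["Type", "ICC", "CI95%"]])
```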
In this study we (Abdullah Al Zubaer, Michael Granitzer and Jelena Mitrović) investigate the effectiveness of GPT-3.5 and GPT-4 for argument mining in the legal domain, focusing on prompt formulation and example selection using state-of-the-art embedding models from OpenAI and sentence transformers. Our experiments demonstrate that relatively small domain-specific models outperform GPT-3.5 and GPT-4 in classifying premises and conclusions, indicating a gap in these models' performance on complex legal texts. We also observe comparable performance between the two embedding models, with the local model showing a slight advantage when selecting examples for prompts. Our results indicate that the structure of prompts significantly impacts the performance of GPT models and should be taken into account when designing them.
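The example-selection step can be sketched with the sentence-transformers library: candidate sentences are embedded, and those most similar to the query sentence are picked as few-shot examples for the prompt. The model name, sentences, and labels below are illustrative assumptions, not the paper's actual pipeline.

```python
# Sketch of embedding-based few-shot example selection;
# model name, sentences, and labels are placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder local model

# Labelled candidate sentences from which few-shot examples are drawn.
candidates = [
    ("The defendant signed the contract on 1 May.",  "premise"),
    ("Therefore, a valid contract was concluded.",   "conclusion"),
    ("The claimant paid the agreed purchase price.", "premise"),
]
query = "Hence, the claim for damages is well-founded."

cand_emb  = model.encode([s for s, _ in candidates], convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Pick the k candidates most similar to the query as few-shot examples.
scores = util.cos_sim(query_emb, cand_emb)[0]
top_k = scores.argsort(descending=True)[:2]
for idx in top_k:
    sent, label = candidates[int(idx)]
    print(f"{label}: {sent}  (similarity={scores[idx].item():.2f})")
```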
In this interview series, the Federal Agency for Civic Education presents three projects funded by the Federal Ministry of Education and Research. As part of this series, Veronika Hackl introduces the DeepWrite project to readers.
From 24 to 26 June 2022, ELSA-Passau organised its second June conference under the motto "Smart Law". The conference set out to address nothing less than the future of law and the digitalisation of the legal profession. The attached conference report discusses, among other things, the DeepWrite project.