ThaiAcadLaws: A Legal Text Classification on Academic Obligation Domain using Text-to-Text Translating Encoder with GPT-3 Fine-tuning Decoder

Chantarat Kingsaeng, Lawankorn Mookdarsanit, Pakpoom Mookdarsanit

Abstract


This paper contributes a legal AI system for the Thai academic obligation domain, named ThaiAcadLaws, as a new paradigm in Thai NLP research. Thai higher education defines academic obligations in five task categories: teaching, research, academic service, preservation of arts and culture, and other tasks. Accordingly, the ThaiAcadLaws architecture, which consists of a text-to-text translating encoder and a GPT-3 fine-tuning decoder, was designed to classify texts into these five target classes. The obligation dataset contained 2,560 Thai texts, with 512 texts per class. The text-to-text translating encoder was based on SCB-MT-EN-TH 2020 and translated the texts in a zero-shot manner, while the GPT-3 fine-tuning decoder learned from the translated texts in a few-shot manner. ThaiAcadLaws achieved a state-of-the-art average accuracy of 0.75 for legal text classification.
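The abstract describes a two-stage pipeline: translate each Thai text, then fine-tune GPT-3 to map the translated text onto one of five classes, and report an averaged accuracy. The Python sketch below illustrates the data side of such a pipeline under stated assumptions; it is not the authors' code. `translate_th_to_en` is a hypothetical placeholder (it returns its input unchanged) standing where an SCB-MT-EN-TH 2020 translation model would be called. The label strings, the `SEPARATOR` value, and the prompt/completion JSONL layout are illustrative choices modeled on the data format of OpenAI's legacy GPT-3 fine-tuning, not details taken from the paper. "Averaged accuracy" is assumed here to mean the mean of per-class accuracies.

```python
import json
from collections import Counter

# The five academic-obligation classes from the abstract (label strings are illustrative).
CLASSES = ["teaching", "research", "academic service", "arts and culture", "other"]
SEPARATOR = "\n\n###\n\n"  # assumed prompt terminator, not taken from the paper


def translate_th_to_en(text: str) -> str:
    """Hypothetical placeholder for the SCB-MT-EN-TH 2020 translation step.

    A real pipeline would run a translation model here; this sketch returns
    the input unchanged so it stays runnable without any model.
    """
    return text


def to_finetune_record(thai_text: str, label: str) -> str:
    """Turn one labelled Thai text into a prompt/completion JSONL line (assumed format)."""
    if label not in CLASSES:
        raise ValueError(f"unknown class: {label!r}")
    prompt = translate_th_to_en(thai_text) + SEPARATOR
    # Completion starts with a space and ends with "\n", which can serve as a stop sequence.
    return json.dumps({"prompt": prompt, "completion": " " + label + "\n"})


def parse_label(completion: str) -> str:
    """Map a raw model completion back onto one of the five classes ("other" if unmatched)."""
    label = completion.strip().lower()
    return label if label in CLASSES else "other"


def averaged_accuracy(y_true, y_pred) -> float:
    """Mean of per-class accuracies (recall per class); equals plain accuracy on balanced data."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / totals[c] for c in totals) / len(totals)


if __name__ == "__main__":
    # Balanced dataset as in the abstract: 5 classes x 512 texts = 2,560 texts.
    assert len(CLASSES) * 512 == 2560
    print(to_finetune_record("สอนวิชาภาษาไทย", "teaching"))
    y_true = ["teaching", "research", "academic service", "arts and culture", "other"]
    y_pred = [parse_label(" Teaching\n"), "research", "other", "arts and culture", "other"]
    print(averaged_accuracy(y_true, y_pred))  # 0.8
```

Because every class has 512 texts, the mean of per-class accuracies coincides with plain accuracy provided the evaluation split is also balanced, so under this reading the reported 0.75 would also be the overall fraction of correctly classified texts.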



