Sentence Simplification in Punjabi Language

Work in progress: creating a sentence simplification corpus for Punjabi in the Shahmukhi script

Introduction

In text simplification, building aligned monolingual parallel datasets for a specific language and script is a significant undertaking. We present the Punjabi Simplification (PUSIM) corpus, the first of its kind, which focuses on Punjabi written in the Shahmukhi script.

Sentence Simplification

Sentence simplification refers to the process of making complex or intricate sentences easier to understand while retaining the original meaning. This is particularly useful for individuals who may have difficulty comprehending complex language, such as non-native speakers, individuals with learning disabilities, or those with limited literacy skills. Sentence simplification techniques may involve breaking down complex sentence structures, substituting difficult words with simpler synonyms, and rephrasing convoluted phrases to improve clarity and readability without altering the underlying message or content of the text. Simplified sentences are often employed in educational materials, instructional texts, and academic resources to facilitate comprehension and accessibility for a wider audience.
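The two techniques named above, lexical substitution and sentence splitting, can be illustrated with a minimal Python sketch. This is a toy example and not the method used to build PUSIM: the lexicon `SIMPLER_SYNONYMS`, the function names, and the comma-plus-conjunction splitting rule are all assumptions made for illustration, and English is used only so that the example is easy to follow.

```python
import re
from typing import List

# Hypothetical toy lexicon mapping harder words to simpler synonyms.
# A real system would need a curated Punjabi (Shahmukhi) resource instead.
SIMPLER_SYNONYMS = {
    "utilize": "use",
    "commence": "start",
    "purchase": "buy",
}


def substitute_words(sentence: str) -> str:
    """Replace each word found in the lexicon with its simpler synonym."""
    def swap(match: "re.Match") -> str:
        word = match.group(0)
        simple = SIMPLER_SYNONYMS.get(word.lower())
        if simple is None:
            return word  # word is not in the lexicon, keep it unchanged
        # Keep the capitalization of the first letter.
        return simple.capitalize() if word[0].isupper() else simple

    return re.sub(r"[A-Za-z]+", swap, sentence)


def split_on_conjunction(sentence: str) -> List[str]:
    """Split a compound sentence at ', and ' or ', but ' into shorter ones."""
    parts = re.split(r",\s+(?:and|but)\s+", sentence.rstrip("."))
    return [part[0].upper() + part[1:] + "." for part in parts if part]


def simplify(sentence: str) -> List[str]:
    """Apply lexical substitution first, then sentence splitting."""
    return split_on_conjunction(substitute_words(sentence))


print(simplify("Students utilize the library, and they purchase books."))
```

Rule-based steps like these only show the idea. They cannot guarantee that meaning is preserved, which is why a manually simplified, aligned corpus is needed.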

PUSIM Corpus Creation

Two annotators created the corpus, which took almost 500 person-hours. Both are native Punjabi speakers with fluent proficiency, and one is a linguist and Punjabi expert. This will be the first simplification corpus in Punjabi, and it will be released publicly after publication.

Goals and Objectives

Our goal is to use the PUSIM corpus to develop a sentence simplification system that can support various natural language processing (NLP) tasks, such as machine translation. To achieve this, we have three objectives:

1. Create a simplified corpus of the Punjabi language in which the original meaning is preserved.
2. Produce further simplified versions of complex sentences, using simple vocabulary and sentence structure.
3. Produce content that is understandable to Punjabi speakers and language learners.

Related Work