Language Identification Project
I wanted to work through some practice problems and build a small portfolio of tech projects that showcase what I have recently learned at the University of Edinburgh. I have taken several courses on natural language processing (NLP) at uni. NLP is a broad term covering a variety of tasks, for example text classification, named entity recognition, machine translation, and language modeling. This project is a simple text classification task: given a piece of text, a model predicts which language it is written in. You can find the code on my GitHub, and the steps I took to create the project are below. I hope this helps your learning too.

Step 1: Find data

Data is foundational to any project that involves prediction. You need data, and preferably a lot of it. I wanted to predict language, and one corpus of language data I already knew about from school was the Europarl dataset. This is a corpus of European Parliament proceedings that are translated into the languages of the E