November 2015 ~ N. Deiva Sundaram

Saturday, 28 November 2015

Ambiguity and Linguistics - A Written Conversation

9:59 PM

N. Deiva Sundaram

Mr. Velmurugan Subramanian, a computer scientist working in the United States who has been working with great enthusiasm on Tamil computational linguistics, raised a question on another forum. I have compiled here the views I expressed in response ... Those interested in linguistics may read on.

Mr. Velmurugan Subramanian:

-----------------------------------------------------------------

Consider the sentence "அவன் பழத்தைத்தின்றிருக்கக்கூடாது." ("He should not have eaten the fruit.")

What is the ground for the complaint here?

1. The fruit

2. The act of eating

Can present-day Tamil usage answer this?

N. Deiva Sundaram:

--------------------------------------

(1) There are many reasons why ambiguity arises in a sentence. A word may have more than one meaning (lexical ambiguity). Or the sentence structure may have more than one meaning (structural ambiguity). Or there may be confusion in deciding which speech act is involved. Or there may be confusion in knowing what is being foregrounded (focus of the utterance). Computational linguistics is a field devoted to resolving all of these. To know the meaning of a sentence, therefore, we need the sentence itself (text), the sentences that precede and follow it (co-text), and the non-linguistic external situation (context). Halliday's Functional Grammar is very useful here. Along with the meaning given by the words and structure of the sentence (semantics), background knowledge (pragmatics) is also needed. The basic aim of linguistics is to explain these.
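Structural ambiguity, as distinguished above from lexical ambiguity, can be made concrete with a small sketch. The grammar and lexicon below are hypothetical toy data, not part of the original discussion, and the English sentence is chosen only because its two readings are well known. The function counts how many distinct parse trees a CKY-style chart finds.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "NP", "PP"),
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}

def count_parses(words, root="S"):
    """Count distinct parse trees for `words` with a CKY-style chart."""
    n = len(words)
    # chart[(i, j)][label] = number of trees covering words[i:j] rooted in label
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for label in LEXICON.get(word, []):
            chart[(i, i + 1)][label] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # every split point inside the span
                for parent, left, right in BINARY_RULES:
                    ways = chart[(i, k)][left] * chart[(k, j)][right]
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)][root]

sentence = "I saw the man with the telescope".split()
print(count_parses(sentence))  # 2
```

Under this toy grammar the sentence receives two parses: one attaches "with the telescope" to the verb phrase, the other to "the man". The words and the grammar alone cannot choose between them; that choice needs co-text or context.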

(2) This difference in meaning also exists between active voice and passive voice sentences. The passive voice serves to foreground the action, or the object or person on which the action is performed. The meaning given by the linguistic structure of the sentence, both its words and its grammar, is sent to the knowledge or cognition domain of the human brain (cognition domain - domain where knowledge is represented). This context-independent sentence meaning is sent to the cognition domain, and the context-dependent meaning is obtained there. Chomsky calls the structure that is sent to the cognition domain the logical form. Only when this structure reaches the domain where knowledge is stored (domain where knowledge is represented) does the full meaning of the sentence, its real meaning (intended meaning), become known. That is why computational linguists hold that the full meaning is obtained only when both kinds of knowledge, knowledge about linguistic structure and knowledge about knowledge itself, are given to the computer, and they are working toward that. In linguistics too, theories and research methods continue to be put forward for this purpose.

Mr. Velmurugan Subramanian:

----------------------------------------------------------------------

When such contexts are absent, can the following properties of a sentence in a language

1. not yielding more than one meaning (or being less ambiguous)

2. yielding multiple conclusions on the basis of logic (logical conclusions)

be considered strengths of that language?

N. Deiva Sundaram:

-------------------------------------

(3) Yielding more than one meaning and being grounded in logic are both strengths of natural languages. Expressing the whole wide world with a limited stock of words and grammar is the distinctive feature, and the creativity, of natural languages. It is the human brain that has this ability. But this ability is unique to the human species (human species specific - genetically determined, biological endowment). The view of Chomsky and others is that this capacity of natural languages can never be given to the computer in full. That is why we create artificial programming languages, in which each word has one meaning and each construction has one interpretation, and use them to write the programs that drive the computer. Linguists have been trying to explain this capacity of natural languages for the past 100 to 150 years. They have pursued research on many fronts and put forward many kinds of linguistic theories and formalisms (linguistic formalism). They have not yet fully succeeded. Chomsky continues to revise and develop his theories and formalisms (1957, 1965, 1970, 1975, 1979, 1992 ....). Many others keep putting forward alternative theories and formalisms (Generalized Phrase Structure Grammar, Lexical Functional Grammar, Head-driven Phrase Structure Grammar, Tree Adjoining Grammar, Dependency Grammar, Tagmemic Grammar, Systemic Grammar, Cognitive Grammar, Word Grammar ....). My reason for mentioning all this is to bring out the aims and applications of linguistics.
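The contrast with programming languages, where each construction has exactly one interpretation, can be seen in a short sketch. It uses Python's standard `ast` module; the expression `2 + 3 * 4` is an example chosen here, not one from the original discussion.

```python
import ast

# A programming-language expression has exactly one parse, fixed by the
# language's precedence rules: 2 + 3 * 4 always groups as 2 + (3 * 4).
tree = ast.parse("2 + 3 * 4", mode="eval")
top = tree.body                                # the outermost operation
print(type(top.op).__name__)                   # Add
print(type(top.right.op).__name__)             # Mult
print(eval(compile(tree, "<expr>", "eval")))   # 14
```

No context is consulted and no second reading exists: the grammar of the language rules out the alternative grouping (2 + 3) * 4 unless the programmer writes the parentheses.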

Wednesday, 25 November 2015

Functional Linguistics Seminar - 26, 27 November 2015

12:07 AM

N. Deiva Sundaram

Seminars are being held at the Department of Linguistics, Bharathiar University, Coimbatore, on Thursday and Friday, 26 and 27 November 2015. The first day is devoted to functional linguistics and the second day to tribal languages. I am taking part in the seminars on both days.

Monday, 23 November 2015

On the Tamil Computational Linguistics Centre, five years ago ....

10:41 PM

N. Deiva Sundaram

http://www.thehindu.com/features/education/college-and-university/centre-for-tamil-computing-mooted/article124105.ece

Centre for Tamil computing mooted

A proposal for the establishment of two Centres for the Development of Computational Linguistics (CDCL) in Tamil will be prepared soon, said M. Anandakrishnan, educationist and chairman, Board of Governors, IIT-Kanpur.

Speaking at a conference on Tamil computing organised by the Department of Tamil, University of Madras, in Chennai last week, he said though the government had recognised the need for Tamil computing more than ten years ago, not much progress had been made in the field.

He noted that only one professor in one department in one university in Tamil Nadu (N. Deiva Sundaram of University of Madras) was working in a field which needed inputs from many fields including computer science, physics, anatomy, and Tamil linguistics.

Establishing a CDCL would cost only around Rs. 7 crore and the Centre would spur the growth of research in this field by becoming a focal point for inter-disciplinary work, he said.

Welcoming the idea, IT Minister Poongothai Aladi Aruna said the State government would look at the detailed proposal and take a decision. The government had instituted the Tamil Virtual University, which had an average of around 2.5 lakh visitors to the home page each year, and around 7,000 students registered in it, to provide a platform for Tamil computing.

G. Thiruvasagam, vice-chancellor, University of Madras, said the university would constitute a panel of three experts who would look at the software developed by individuals at the Oriental Research Institute on the first Saturday of each month between 10 a.m. and 1 p.m.

He said the university would also publish an expanded Tamil lexicon by the end of the year with the first volume coming out by the end of March. The classic A.C. Chettiar English-Tamil dictionary which had been out of print for the last 10 years would also be reprinted in the next two months with 25,000 copies in the first run, he announced.

N. Deiva Sundaram, professor, Tamil department, said in the IT world, Tamil needed to be updated and the three-day seminar had brought experts from different countries to look at specific needs in the field.

Keywords: Centre for the Development of Computational Linguistics, Tamil computing, M. Anandakrishnan, chairman, Board of Governors, IIT-Kanpur, Department of Tamil, University of Madras, N. Deiva Sundaram, Oriental Research Institute, Tamil lexicon, A.C. Chettiar English-Tamil dictionary

Friday, 13 November 2015

The Role of Tamil Teachers in the Growth of Tamil Computing

5:25 AM

N. Deiva Sundaram

Mr. Thiruvalluvan Ilakkuvanar raised a question in a mailing-list group. Given its importance, I reproduce it here along with my explanation.

Those interested in both fields wish to see an effort to bring Tamil studies and computing together. When the INFITT (Uttamam) conference was held in Chidambaram, I told Dr. Deiva Sundaram that, since organisations like INFITT have for some years concentrated on the development of computing technology, and since a Tamil-oriented computing organisation is needed, he should found an organisation under his leadership. He said he was aware that such an idea existed, but said nothing more. At the INFITT conference in Singapore, during a seminar session itself, I said that if a Tamil-oriented organisation is started it should not be regarded as a rival body, and that INFITT, the Kanithamizh Sangam and similar bodies should work to develop Tamil within computing, while the organisation for Tamil computing should work to develop computing knowledge within Tamil studies.

In its early days the Kanithamizh Sangam offered no way for people from Tamil studies to take part. My late friend Anto Peter gradually raised their share of participation. But the present president, my friend Anandan, when told at a seminar he recently conducted very well that some people from Tamil departments wished to become life members, said that the Sangam was to be turned into a wholly technology-oriented organisation and that therefore no one from Tamil departments would be admitted as a member.

It is therefore a pressing need of the times that Dr. Deiva Sundaram, recipient of the Kanithamizh award, is to set up a research institute for Tamil computational linguistics. I wish this institute every success.
Two requests in this connection.
1. I request that an organisation of general members attached to the institute also be started, so that enthusiasts can take part.
2. I request that the institute be set up not as a body of linguists alone, but in a way that also engages scholars of Tamil literature.
With appreciation
-------------------------------------------------------------------------------------------------

My (N. Deiva Sundaram's) explanation:

Dear friend Mr. Thiruvalluvan Ilakkuvanar, I have seen your letter.
// In its early days the Kanithamizh Sangam offered no way for people from Tamil studies to take part. My late friend Anto Peter gradually raised their share of participation. But the present president, my friend Anandan, when told at a seminar he recently conducted very well that some people from Tamil departments wished to become life members, said that the Sangam was to be turned into a wholly technology-oriented organisation and that therefore no one from Tamil departments would be admitted as a member. //

Even now I believe that Mr. Anandan would not have said so. I am a Tamil teacher, yet I am a member of the Kanithamizh Sangam. I am in INFITT too. I also give whatever help I can to the Tamil Virtual Academy.

I say again: Tamil computational linguistics and language technology is a new, multidisciplinary field that draws on computer science, Tamil linguistics, linguistics, statistics, mathematics and other disciplines. Within it, the research and work of people from Tamil linguistics and linguistics is the most fundamental. Their task is to design and supply a grammar of Tamil suited to the computer. Saying what form that design must take to suit the computer (statisticians and mathematicians have a part in this too), and programming it once it is designed, is the task of people from computer science. If this is clear, the debate about whose work matters more will not arise at all. In my team, three people are from Tamil language and linguistics and three are from computer science.

But I have myself experienced a very few people, countable on the fingers (Mr. Valli Anandan is not among them), rating those from Tamil departments rather low. They think Tamil scholars should be used in their work merely as a pickle on the side. That is their ignorance. But we should not blame computer scientists in general for those few. Computer science experts, including Mr. Velmurugan Subramanian in the United States, Prof. A. G. Ramakrishnan in Bangalore, Prof. Nagarajan of SSN College of Engineering and Prof. Soman of Amrita University, speak with full awareness of the role of scholars of Tamil and of linguistics.

But one thing I state firmly: Tamil teachers who know the structure and grammar of Tamil have the major share in Tamil computational linguistics. The domain of computational linguistics and language technology is language and knowledge of language. I am a member of INFITT. I am also a member of the Kanithamizh Sangam. At the same time, over the past few years I have founded a Kanini Tamil Peravai (Computer Tamil Forum) and conducted two conferences on its behalf. Its work will expand further. It is not a rival association either. Tamil teachers, linguistics teachers, computer science teachers and enthusiasts are all part of it.

Wednesday, 11 November 2015

Centre for Computational Linguistics (CCL)

8:22 PM

N. Deiva Sundaram

This proposal for the establishment of a Centre for Computational Linguistics (CCL) in Chennai by the Government of Tamil Nadu is submitted by Prof. N. Deiva Sundaram (Former Director, Linguistic Studies Unit, University of Madras).

The proposal contains the following parts:

1. A brief note on the importance of Computational Linguistics in the development of Indian languages into e-languages (electronic languages)

2. Vision and Mission Statement and Objectives of the proposed CCL.

1. Importance of Computational Linguistics in the development of Indian languages

Information Technology, Communication and Language:

The role of Information Technology in the present stage of social development (the “Information Society”) is indisputable. Its importance keeps growing with globalization. Man-Machine (computer) communication is needed for most tasks in every field, from day-to-day activities to large industrial and business processes.

In this communication, alongside non-verbal means such as charts, pictures and diagrams, the verbal means, that is, language, plays the key role. Language is the prime vehicle in which information is encoded, by which it is accessed and through which it is disseminated.

Language is represented in both graphical (“written”) and sound (“spoken”) media. Thus, to interact with the computer, we may have to use both the written and the spoken forms of a language. That is, we should be able to send a request to the computer and get a response through the written language, the spoken language, or both.

For the computer to process natural language, both to understand it and to generate it, knowledge of natural language must be provided to it. The human brain's capacity for natural language cognition should be given to the computer. The science that deals with this is called Natural Language Processing (NLP).

Natural Language Processing (NLP):

Natural Language Processing refers to the interaction between the computer and natural languages, i.e., making the computer identify, understand, process and produce the utterances of natural human language, as the human brain does.

The study of language processing within the human brain is part of the science of cognition, or human intelligence. Likewise, language processing by the computer (NLP) is part of the science of Artificial Intelligence (AI).

The components of Natural Language Understanding (NLU) systems convert samples of human language into more computer-readable representations, such as parse trees or first-order logic, which are easier for computers to manipulate. Natural Language Generation (NLG) systems convert information from computer databases into readable human language.
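The two directions described here, from text to a logic-style representation and from stored data back to text, can be illustrated with a deliberately tiny sketch. The function names, the single sentence pattern and the sample record are all hypothetical; real NLU and NLG systems rest on full grammars and lexicons, not on one regular expression and one template.

```python
import re

def to_logic(sentence):
    """Toy NLU: map 'Subject verbs Object.' to a predicate-style string."""
    match = re.fullmatch(r"(\w+) (\w+)s (\w+)\.?", sentence.strip())
    if match is None:
        return None  # the sentence is outside the toy pattern
    subject, verb, obj = (part.lower() for part in match.groups())
    return f"{verb}({subject}, {obj})"

def to_text(record):
    """Toy NLG: render a database-style record as an English sentence."""
    return f"{record['name']} teaches {record['subject']} in {record['place']}."

print(to_logic("Kumar reads Tamil."))
print(to_text({"name": "Kumar", "subject": "linguistics", "place": "Chennai"}))
```

The first call yields `read(kumar, tamil)`, a form a program can query or reason over; the second turns a record back into a sentence a person can read.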

The knowledge of human language that the human brain needs in order to engage in complex language behavior can be separated into the following six categories:

1. Phonetics and Phonology – the study of linguistic sounds

2. Morphology – the study of the meaningful components of words

3. Syntax – the study of the structural relationship between words

4. Semantics – the study of meaning

5. Pragmatics – the study of how language is used to accomplish goals

6. Discourse – the study of linguistic units larger than a single utterance

For the computer, the most important problem in applying this linguistic knowledge in NLP is resolving ambiguity at each of these levels.

Computational Linguistics (CL):

The science which deals with the various theories, models and algorithms to resolve the different ambiguities in Natural Language Processing is called Computational Linguistics (CL).
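One of the simplest algorithms of this kind picks the sense of an ambiguous word whose dictionary gloss shares the most words with the surrounding context (a simplified form of the Lesk method). The sketch below is only an illustration: the two-sense inventory for "bank", the glosses and the stop-word list are hypothetical toy data, where a real system would draw on a lexical database such as WordNet.

```python
# Hypothetical two-sense inventory for "bank" (toy data).
SENSES = {
    "bank/finance": "institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a river or stream of water",
}
STOPWORDS = {"a", "an", "and", "the", "of", "or", "that", "to", "on", "in", "he", "she"}

def content_words(text):
    """Lower-case the text, drop full stops and stop words, return a set."""
    return {w for w in text.lower().replace(".", "").split() if w not in STOPWORDS}

def disambiguate(context):
    """Pick the sense whose gloss shares most content words with the context."""
    context_set = content_words(context)
    scores = {sense: len(context_set & content_words(gloss))
              for sense, gloss in SENSES.items()}
    return max(scores, key=scores.get)

print(disambiguate("She sat on the bank of the river."))      # bank/river
print(disambiguate("He went to the bank to deposit money."))  # bank/finance
```

The sketch resolves lexical ambiguity only, and only when the context happens to share a word with a gloss; structural and pragmatic ambiguity need different models.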

The above mentioned models and theories are all drawn from the standard toolkits of Computer Science, Mathematics, and Linguistics. Thus, the science of Computational Linguistics is a multi-disciplinary one. It is concerned with the computational aspects of the human language faculty. It has applied and theoretical components.

Theoretical Computational Linguistics:

It develops formal models simulating aspects of the human language faculty and implements them as computer programmes. These programmes constitute the basis for the evaluation and further development of the theories. This forms the theoretical component of Computational Linguistics.

Applied Computational Linguistics / Language Technology (LT):

With the help of the theories, models and algorithms of CL, various applications such as Word processor, Text-to-Speech (TTS), Automatic Speech Recognizer (ASR), Optical Character Recognizer (OCR), and Automatic Machine Translation System are being developed. This comes under the field of Language Technology (LT) or the applied component of Computational Linguistics.

The science of Computational Linguistics and its applied part, “Language Technology”, thus help both to verify linguistic theories that attempt to represent the human language faculty and to let us communicate with the computer (the “Man-Machine Interface”) in order to carry out various tasks.

We need the help of the Language Technology enterprise for the following:

- to ensure “the rights of the people to benefit from the opportunity to easily access and effectively process information”,

- to help industries “in the globalization of the economy to effectively communicate and manage information in an international context”,

- to offer people the means “to better communicate, to provide them with the possibility of accessing information in a more natural way, to support more effective ways of exchanging information and control its growing mass”,

- to provide easy access to multilingual information systems and to offer the possibility to handle the information they carry in a meaningful way.

Language Technology helps not only Man-Machine communication but also communication among ourselves through the computer (“Man-Machine-Man”). The latter can take place through written language, spoken language, or both. Automatic Speech Recognizer (ASR), Text-to-Speech (TTS), Optical Character Recognizer (OCR) and Machine Translation are some of the software systems that support these types of communication.

Such communication software saves time and energy. Moreover, it helps differently abled persons. OCR and TTS may help visually challenged persons to hear digital materials without the help of human readers. ASR may help hearing-impaired persons to read digital speech.

E-language planning for Tamil:

(E-language) Status Planning:

With the advent of globalization in the later part of the 20th century, time and space shrank considerably, territorial boundaries lost their literal relevance, and planners came under a pressing need to think globally and act locally. To meet the urgent needs of the globalized environment, language planners have started to realize that attaining e-language status is mandatory for any modern language, since it enables the speech community to participate effectively with the world community in the era of globalization.

A language attains e-language status only when it is ready for all the communication and information exchange activities through electronic media that globalization demands. To reach that stage, the language should be equipped with all computing tools and systems, from Word Processor to Man-Machine Interface.

Attaining e-language status has become obligatory in order to avail the benefits of globalization without further loss of time; otherwise the language, along with its nation state, would be pushed decades back, depriving its citizens of the scientific and technological advancements that globalization brings.

(E-language) Corpus Planning:

Once e-language status is planned, the natural next step is Corpus Planning, from the phonological to the discourse level. All the tools and facilities, namely computer encoding, font development, keyboard development, corpus development, Morphological Parsers, Syntactic Parsers, Semantic Analyzers, and Pragmatic and Discourse Analyzers, should be developed in the target language.
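Computer encoding, the first item in this list, is worth a concrete look because everything else builds on it. The sketch below uses Python's standard `unicodedata` module to show how the word தமிழ் is stored in Unicode. The `letters` helper is a hypothetical simplification introduced here: it attaches combining marks to the preceding base character, which is enough for this word but is not a full grapheme-cluster algorithm.

```python
import unicodedata

def letters(text):
    """Group code points into written letters: a base character plus any
    combining marks (vowel signs, pulli) that follow it."""
    groups = []
    for ch in text:
        if groups and unicodedata.category(ch).startswith("M"):
            groups[-1] += ch      # mark: attach to the previous base character
        else:
            groups.append(ch)     # base character: start a new letter
    return groups

word = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"  # the word தமிழ் ("Tamil")
for ch in word:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
print(len(word), "code points,", len(word.encode("utf-8")), "UTF-8 bytes")
print(letters(word))
```

The word has three written letters but five code points and fifteen UTF-8 bytes. Fonts, keyboards, spell-checkers and parsers all have to agree on which of these units they operate on, which is why encoding heads the list.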

(E-language) Acquisition Planning:

Under Acquisition Planning, e-learning facilities, namely e-dictionaries, e-thesauri, e-grammars and e-lessons, should be developed in the target language to enable non-native learners to study the language or apply it in the desired field.

E-language Planning and Computational Linguistics:

“E-language Planning” here refers to the planning of the various activities needed to make a natural language suitable for Language Technology.

E-language Planning activities may be divided into two parts, namely:

1. To undertake the necessary steps to develop various real-world applications for the target language, availing the avenues of Language Technology;

2. To undertake the necessary research, analysis and development of tools for the target language from the perspective of Natural Language Processing and Computational Linguistics.

Computational Linguistics Centres in other parts of the world:

Computational Linguistics and Language Technology have become an important branch of Applied Linguistics. Various theories and models have been developed and applied to various natural languages. Most universities in the developed and developing countries have started separate centres for this branch.

This development has led to the emergence of many innovative applications, such as Automatic Speech Recognizers, Speech Synthesizers and Machine Translation systems, for many natural languages. These all contribute much to the present globalization process. Multinational bodies such as the European Parliament and multinational industries such as Ford enjoy the benefits of the achievements of Computational Linguistics and Language Technology. The problems arising from multilingual situations are also being solved with the help of these developments.

Computational Linguistics and Language Technology also help to solve the problems arising from the “Digital Divide”. If computer applications are available in local languages, local people can enjoy the benefits of computers and gain access to the enormous knowledge available on the Internet.

To achieve the Vision and Mission statements and Objectives set out below, the understanding and adoption of the science of Computational Linguistics and Language Technology are essential. Realizing this necessity, the present proposal is submitted for the establishment of a Centre for Computational Linguistics in Chennai, Tamil Nadu. At present there is no such separate centre for Computational Linguistics in Tamil Nadu. Thus, the proposed Centre will be unique.

Centre for Computational Linguistics (CCL)

Vision Statement:

To make Tamil an “e-language” suitable for all language computing tasks, so as to meet the expectations and challenges of globalization and to help bridge the gap in Tamil society caused by the “Digital Divide”.

Mission statement:

1. To undertake the study of the linguistic system of the Tamil language from the computational linguistic perspective.

2. To present the linguistic system of the Tamil language in a form which can be processed by the computer.

3. To develop various language computing tools, from Word Processor to Man-Machine Interface, adopting the present state of the art in Language Technology.

Objectives:

1. To test and verify various modern linguistic formalisms with the help of computational linguistic theories.

2. To develop a computational grammar for the Tamil language.

3. To develop various types of linguistically annotated electronic corpora for the Tamil language.

4. To develop various language computing tools such as Morphological Parsers, Word-class Taggers, Syntactic Parsers, tools for semantic analysis etc.

5. To develop various types of lexical databases for the Tamil language adopting different formalisms such as WordNet, Generative Lexicon etc.

6. To develop various application software for Indian languages, from Auto Text-checking tools to an automatic Machine Translation system.

7. To develop language teaching materials for e-learning facilities.

Major activities of the proposed Centre for Computational Linguistics:

The proposed Centre for CL will engage in the following activities:

1. Research and Development (R&D)

2. Teaching & Training

3. Industrial Collaboration

4. Coordination with other Institutions

1. Research and Development (R&D):

The R&D wing of CCL will, at the initial stage, mainly concentrate on the following:

a) Study of the developments of various formalisms and algorithms in Computational Linguistics

b) Application of the Computational Linguistic formalisms to Tamil language

c) Development of Electronic Corpus with proper linguistic and non-linguistic annotations

d) Development of various linguistic analysis tools such as Morphological Parser, Word-class Tagger, Syntactic Parser, Semantic Analyzer, Word Sense Disambiguator and Concordancer

e) Development of Lexical Database for Tamil language based on WordNet, Generative Lexicon etc.

f) Development of the phonological analysis tools (including acoustic phonetic analysis) needed for Text-to-Speech (TTS) and Automatic Speech Recognizer (ASR) systems, as well as TTS and ASR themselves, for Tamil language.

g) Development of Auto-text checkers (Spell-checking, Sandhi checking, Grammar checking etc.)

h) Development of Machine Translation System for Indian languages.

i) Development of Optical Character Recognizer (both Off-line and On-line)

j) Development of Electronic dictionaries

k) Development of Computer-aided Language Teaching (CALT) / Learning (CALL) materials
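Of the analysis tools listed above, the Concordancer is the simplest to sketch: it lists every occurrence of a word together with its immediate context (keyword in context). The function below is a minimal illustration on whitespace-separated tokens; the function name, the window width and the sample text are assumptions made here, and a tool for Tamil would first need proper tokenization and morphological analysis.

```python
def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines: `width` tokens either side of each hit."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

text = ("Language is the prime vehicle in which information is encoded "
        "and language is the medium through which it is disseminated")
for line in concordance(text.split(), "language"):
    print(line)
```

Run on a large annotated corpus, the same idea lets a lexicographer or grammarian inspect how a word or suffix is actually used before writing a dictionary entry or a grammar rule.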

2. Teaching and Training:

Under this, the following tasks will be undertaken:

a) M.A. and M.Phil. courses in Computational Linguistics

b) Part-time Certificate and Diploma courses in Computational Linguistics

c) Ph.D. Programmes in Computational Linguistics

d) Short-term Training courses for researchers and teachers

3. Industrial collaboration:

Under this, the CCL will help software companies interested in NLP / CL / LT with their ventures in developing language software systems.

4. Coordination with other Institutions:

There are many institutions in various parts of India involved in Computational Linguistics and Language Technology. In many IITs and technology universities, the Computer Science or Electronics departments have been involved in developing some of the language computing tools. There are also some consortia formed by the Ministry of Human Resources / Ministry of Information Technology for specific tasks such as Corpus Development and Machine Translation. The proposed Centre at Chennai will make efforts to coordinate with them for further development and to avoid duplication.

Infrastructure:

The proposed Centre for Computational Linguistics will have the following labs and other service units.

1. Language Technology Lab - 3 (Corpus Lab, Research Lab, Students Lab)

2. Speech Technology Lab - 1

3. CALL Lab (Computer-aided Language Learning Lab) - 1

4. Sound Recording Lab - 1

5. Library - 1

6. Video Conferencing Hall - 1

7. Smart class rooms - 6

8. Meeting Hall - 1

9. Conference Hall - 1

10. Faculty members rooms -

11. Research Scholars rooms - 2

12. Recreation rooms - 2

13. Administrative staff rooms - 5

Academic Staff: (1 + 11 + 11 + 32 = 55)

Faculty (56)

1. Chair - Director - 1

2. Senior Research Fellow (Professor cadre) - 4

a. Computational Linguistics - 1

b. Language Technology - 1

c. Speech Technology - 1

d. Corpus Linguistics - 1

3. Fellow (Associate Professor cadre) - 8

a. Computational Linguistics - 2

b. Language Technology - 2

c. Speech Technology - 2

d. Corpus Linguistics - 2

4. Associate (Assistant Professor cadre) - 18

a. Computational Linguistics - 3

b. Language Technology - 3

c. Speech Technology - 3

d. Corpus Linguistics - 3

e. Statistical Computational Linguistics - 3

f. Tamil Linguistics - 3

5. Project Fellow - 25

a. Computational Linguistics - 5

b. Language Technology - 5

c. Speech Technology - 5

d. Corpus Linguistics - 5

e. Tamil Linguistics - 5

Adjunct Faculty (8)

a. Senior Research Fellow - 4

b. Fellow - 4

Technical Staff: (31)

a. System Administrator - 5

b. System Analyst - 5

c. Speech Lab Engineer - 1

d. Senior Hardware Engineer - 1

e. Junior Hardware Engineer - 2

f. Programmers - 5

g. Electrical Engineer - 1

h. Data Entry operator - 5

i. Speech Lab Technician - 1

j. Electrician - 2

k. Librarian - 1

l. Deputy Librarian - 1

m. Assistant Librarian - 1

Administrative staff: (43)

a. Administrative officer - 1

b. Accountant - 1

c. Personal Secretary - 2

d. Section Officer - 5

e. Assistant Section Officer - 5

f. Assistant - 5

g. Steno - 4

h. Typist - 5

i. Sergeant - 2

j. Lab Attender - 5

k. Office Assistant Watchman - 4

l. Watchman - 4

m. Sweeper - 3

n. Driver - 2

Output at the end of the fifth year:

At the end of the fifth year, the following works for Tamil language will be completed.

Resource and Research Tools:

1. Linguistically annotated Corpus

2. Parallel Corpus

3. Computational Lexicon

4. Morphological Parser

5. Word-class Taggers

6. Syntactic Parser

7. Semantic Analyzer

8. Acoustic Phonetic analysis

9. Character (graphemic) analysis

Application software:

1. Unicode Fonts

2. Keyboards

3. Spell-checkers

4. Grammar-checkers

5. Automatic Speech Recognizer (ASR)

6. Text-to-Speech (TTS)

7. Optical Character Recognizer (OCR)

8. Information Retrieval and Extraction

9. Machine Translation System (Tamil - Hindi - English)

The total financial estimate for all of the above is around Rs. 50 crore over five years.
