Let's Get Acquainted
Vaaku2Vec
State-of-the-Art Language Modelling and Text Classification in Malayalam Language
What is Vaaku2Vec?
Vaaku2Vec is a word embedding library that can be used for building language models and for text classification.
Text classifi... what?
Word embedding is one of the techniques used to build artificial intelligence. The contexts in which words are used are learned first, and this knowledge is then converted into vectors, a mathematical form that a computer can work with easily. Besides the context of each word, a few other features are also used to help with this. For example:
In the sentence “Richu air gun unbox cheythu” (Richu unboxed the air gun), the phrase “air gun” appears between “Richu” and “unbox”. After reading the sentence, the computer stores this piece of information for later use.
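The idea of remembering which words sit next to each other can be sketched in a few lines of Python. This is only an illustration of the general context-window idea, not code from Vaaku2Vec: the function name `context_pairs`, the window size of 1, and joining “air gun” into a single token `air_gun` are all assumptions made for the example.

```python
def context_pairs(sentence, window=1):
    """Return (center, context) word pairs that lie within `window` positions."""
    words = sentence.split()
    pairs = []
    for i, center in enumerate(words):
        lo = max(0, i - window)
        hi = min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, words[j]))
    return pairs


# Assumption: "air gun" is written as one token so it stays together.
sentence = "Richu air_gun unbox cheythu"
pairs = context_pairs(sentence, window=1)

for center, context in pairs:
    print(center, "->", context)
```

Among the printed pairs, `air_gun` shows up with both `Richu` and `unbox` as neighbours, which is exactly the kind of information described above. An embedding model is then trained on a very large number of such pairs.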
That was a bit heavy, but I think I caught some of it. Where is this used?
The vector data obtained this way can be used in many ways. It helps surface items similar to the ones we search for on the Amazon website. It has found applications everywhere from Siri and Alexa on our smartphones to predicting the next word in our keyboard suggestions.
If reading all this made you wonder whether it is also used when you search on Google, you are thinking along the right lines. The technique itself originated at Google.
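To make the "similar items" use case a little more concrete, here is a minimal sketch of how similarity between word vectors is commonly measured, using cosine similarity. The three-dimensional vectors below are made-up numbers for illustration only (real embeddings have hundreds of dimensions and are learned from data), and the helper names `cosine_similarity` and `most_similar` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, invented for this example.
toy_vectors = {
    "air_gun": [0.9, 0.1, 0.0],
    "toy_gun": [0.8, 0.2, 0.1],
    "keyboard": [0.0, 0.1, 0.9],
}


def most_similar(word, vectors):
    """Return the other word whose vector is closest to `word`."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return max(scored, key=lambda pair: pair[1])[0]


print(most_similar("air_gun", toy_vectors))
```

With these toy numbers, `toy_gun` comes out as the nearest neighbour of `air_gun`, while `keyboard` points in a very different direction.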
This is great! Who made it?
Unsurprisingly, the underlying technique came out of research at Google itself. It was introduced in 2013 by Tomas Mikolov and his team. This is their paper: Distributed Representations of Words and Phrases and their Compositionality (2013)
The Vaaku2Vec discussed in this blog post, on the other hand, was developed by Kamal K Raj and Adam Shamsudeen. Kamal and Adam are members of IndicNLP. The project started in early 2019.
Hold on, if Word2Vec already exists, why Vaaku2Vec?
As its GitHub repo says, Malayalam is a language with inflections and agglutinations. That is:
In Malayalam, ഇത് (this) + ആണ് (is) can be joined into ഇതാണ് (this is).
The algorithms need to be adapted to work with this. That is the job Kamal and Shamsudeen took on. On top of that, they have also tested these algorithms on several Malayalam datasets (text classification) and shown that they work.
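A small sketch shows why joined words are a problem for a naive word-level approach. It does not show how Vaaku2Vec itself handles agglutination; it only assumes a vocabulary built by splitting text on spaces, with a made-up one-sentence "training text" and a hypothetical helper `is_known`.

```python
# Assumption: a word-level vocabulary built by splitting on whitespace.
# The sentence roughly means "this is a book", with the words kept separate.
training_text = "ഇത് ഒരു പുസ്തകം ആണ്"
vocabulary = set(training_text.split())


def is_known(word, vocab):
    """True if the exact word form was seen while building the vocabulary."""
    return word in vocab


print(is_known("ഇത്", vocabulary))    # the separate word "this"
print(is_known("ആണ്", vocabulary))    # the separate word "is"
print(is_known("ഇതാണ്", vocabulary))  # the joined form "this is"
```

Both separate words are in the vocabulary, but the joined form ഇതാണ് is not, even though it carries the same meaning. A model that only looks up whole words would treat it as something it has never seen.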
Awesome, so where can I get it?
There is also a demo of it on this website.
I have downloaded it. What do I do now?
The first step is to get a good understanding of it. For that, we are sharing a link we came across while researching for this blog post:
Illustrated Word2Vec
Once you have understood this, you can follow up on any new ideas that come to mind, or pick up one of the tasks listed in the TODO section of this project.
The Manglish version was contributed by Sreeram Venkitesh
Subscribe to Maker Broadcast to hear about news like this right away.