My research interests lie at the intersection of machine learning and natural language processing. I am particularly interested in word embeddings, a component of network architectures for statistical language modelling that maps words or phrases from a corpus to vectors of real numbers. Currently, I am working on cross-lingual word embeddings, which embed multiple languages into a single semantic space. That said, I am not restricted to this topic; I also enjoy working with multiple languages, or more specifically, connecting languages and transferring resources from one language to another.
Daejeon, South Korea
Research Assistant at Users & Information Lab, KAIST
Research Assistant at Semantic Web Research Center (SWRC), KAIST
Research Assistant at Intelligent Information System Lab, Korea University
Intern at the IT department of HSBC Bank, Algiers, Algeria
Volunteer at FOREM, an NGO dedicated to humanitarian action and solidarity, Algiers, Algeria
Ph.D. in Computer Science at the Korea Advanced Institute of Science and Technology (KAIST)
Master's in Computer Science at Korea University
B.S. in Mathematics and Computer Science at the University of Science and Technology Houari Boumediene (USTHB)
Data processing, Statistical analysis, Programming, Database management, etc.
English (Fluent), French (Fluent), Arabic (Fluent), Korean (Intermediate)
Java, Python, Matlab, etc.
My research interest lies in the area of natural language processing (NLP), where I have worked on statistical approaches and computational models to extract semantic information from text. I am particularly interested in closing the gap between widely spoken languages like English and other European languages, for which there exists an abundance of NLP resources and technologies, and less-resourced languages, such as Arabic and Korean, that often lack even the most basic NLP resources and tools.
During my master's, I worked at the Intelligent Information System Laboratory (IIS Lab) as a research/student assistant. My research focused on NLP tasks and machine learning as I prepared to pursue a doctorate. In the laboratory, I was involved in several projects, such as the Brain Korea 21 Plus (BK21+), a human resource development program funded by the Korean Ministry of Education.
My master's thesis focused on retrieving snippets from search engines, embedding them using Latent Semantic Analysis (LSA), and then clustering the resulting vectors with Agglomerative Hierarchical Clustering (AHC). I published a paper on this work at the end of my master's; its central idea was how to use LSA efficiently for clustering and topic analysis.
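The pipeline described above can be sketched as follows. This is a minimal illustration with made-up toy snippets, not the thesis code; scikit-learn's standard TF-IDF, truncated SVD, and agglomerative clustering components stand in for the actual implementation.

```python
# Sketch of the thesis pipeline: embed text snippets with LSA
# (TF-IDF followed by truncated SVD), then group the low-rank
# vectors with agglomerative hierarchical clustering.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import AgglomerativeClustering

# Toy stand-ins for snippets retrieved from a search engine.
snippets = [
    "neural networks learn image features",
    "deep neural networks classify image data",
    "stock market prices fell sharply",
    "investors track stock market prices",
]

# LSA: sparse term-document matrix reduced to a dense semantic space.
tfidf = TfidfVectorizer().fit_transform(snippets)
vectors = TruncatedSVD(n_components=2).fit_transform(tfidf)

# AHC on the LSA vectors; the two topics (tech vs. finance)
# should fall into separate clusters.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(vectors)
print(labels)  # snippets 0-1 grouped together, 2-3 together
```

Working in the reduced LSA space rather than on raw TF-IDF counts is what makes the clustering pick up on shared topics rather than exact word overlap.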
During my first year of doctoral studies, I was part of the Semantic Web Research Center (SWRC) at KAIST. In that time, I completed several tasks that extracted knowledge from resource-rich languages, such as English, and transferred it to less-resourced ones, such as Korean or French. One of my main tasks was entity linking. The entity linking module is a statistical model that relies on a knowledge base (KB) to extract entities from text, identify them as a person, organization, or place, and link them to their corresponding entries in the KB. Many of these entities are highly ambiguous, so a strong model is needed to ensure every entity is correctly linked using its context. I also took part in the Open Knowledge Base and Question-Answering (OKBQA) hackathon, which has been running for the past three years and centers on knowledge base construction and applications. My contribution was integrating the entity linking module into the platform, where it was used for the question-answering task. I also adapted a state-of-the-art entity linking model, AGDISTIS, to Korean for comparison purposes.
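The context-based disambiguation step can be illustrated with a toy sketch. The mini knowledge base and the word-overlap scorer below are hypothetical simplifications for illustration only, not the actual statistical module.

```python
# Toy entity linking: given a mention's surrounding context, pick the
# knowledge-base entry whose description best overlaps that context.
kb = {
    "Paris (city)": {"type": "place",
                     "desc": "capital city of France in Europe"},
    "Paris Hilton": {"type": "person",
                     "desc": "American media personality and businesswoman"},
}

def link(context: str) -> str:
    """Return the KB entry sharing the most words with the context."""
    ctx = set(context.lower().split())
    return max(kb, key=lambda e: len(ctx & set(kb[e]["desc"].lower().split())))

print(link("Paris is the capital of France"))  # -> "Paris (city)"
```

A real linker replaces the word-overlap score with a learned statistical model, but the shape of the problem is the same: one surface form, several candidate entries, and context deciding between them.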
In the second year of my doctorate, I shifted my focus to a less application-oriented laboratory and joined the Users & Information Laboratory (U&I Lab). My current research topic is multilingual word embedding, in which we embed several languages into a single space from which we can retrieve the relatedness between any two tokens. Using this method, I have collaborated with other researchers on tasks such as topic analysis. One current collaboration involves clustering political tweets in three languages (English, French, and German) and analysing the results to find politicians with similar points of view.
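To illustrate the idea: once words from several languages share one vector space, relatedness between any two tokens reduces to a similarity between their vectors. The three-dimensional vectors below are made-up toy values, not trained embeddings.

```python
# Cross-lingual relatedness as cosine similarity in a shared space.
import numpy as np

# Toy shared-space embeddings for English and French words.
embeddings = {
    "dog":   np.array([0.90, 0.10, 0.05]),  # English
    "chien": np.array([0.85, 0.15, 0.05]),  # French for "dog"
    "car":   np.array([0.10, 0.90, 0.20]),  # English, unrelated
}

def relatedness(w1: str, w2: str) -> float:
    """Cosine similarity between two tokens in the shared space."""
    a, b = embeddings[w1], embeddings[w2]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(relatedness("dog", "chien"))  # high: cross-lingual synonyms
print(relatedness("dog", "car"))    # low: unrelated concepts
```

Because similarity is computed in one space regardless of source language, the same machinery supports downstream tasks such as clustering multilingual tweets by topic.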
Finally, my goal is to work on new languages and to develop and improve techniques that can serve as a bridge between research on different languages, from machine translation to all sorts of other NLP tasks that rely on a multilingual environment.