Master's Thesis from the year 2017 in the subject Computer Science - Applied, grade: 4.6/5, Eötvös Loránd University, course: Master's Degree in Computer Science, language: English, abstract: This thesis proposes a general pipeline architecture for extracting one-on-one dialogues from many different IRC channels, extending the state-of-the-art work on the Ubuntu IRC channel. Furthermore, this thesis takes advantage of the pipeline's results and evaluates ESA on the different extracted dialogues.
The power of an intelligent program to perform its task well depends primarily on the quantity and quality of the knowledge it has about that task. Advanced techniques and applications in Artificial Intelligence depend heavily on data, which is growing rapidly and is increasingly available on the web. However, for a computer to manipulate information, that information must be in a form that is easy for a computer to process. That is, much of the available unstructured data must be collected and post-processed to produce structured information. Recent advances in Data-Driven Dialogue Systems used the published Ubuntu IRC channel conversations to extract one-on-one dialogues for use with Deep Learning methods. A Dialogue System performing a best-response task can use a model trained on such dialogues. In addition, Natural Language Processing techniques such as Semantic Analysis have made remarkable progress; Wikipedia-Based Explicit Semantic Analysis (ESA) is one example, in which interpretation was improved in the presence of both polysemy and synonymy.