Sentiment Classification for Under-Resourced Language Using Word2Vec Neural Network: Amharic Language Social Media Text
Zewdie Mossie 1 and Jenq-Haur Wang 2
1 Department of Computer Science and Information Engineering, National Taipei University of Technology, Taiwan
2 Department of International Graduate Program in Electrical Engineering and Computer Science, National Taipei University of Technology, Taipei, Taiwan
Sentiment classification has become a popular task for social network texts, in which people express opinions on different issues that can be analyzed to produce useful knowledge. However, many computational linguistic resources are available only for English. In recent years, with the emergence of social media platforms, opinion-rich resources have become abundant for under-resourced languages, creating a need for sentiment analysis in those languages. Most existing research focuses on extracting effective lexical and syntactic features, while limited work has been done on semantic features, which can contribute to both under-resourced and well-resourced languages. In this paper, we propose a sentiment classification approach based on Word2Vec for Amharic text in the political domain. Word2Vec trains neural network models to learn vector representations of words that capture deep semantic relationships. First, we cluster similar features together and apply n-gram language modeling to check sentiment-bearing co-occurring terms (COT). Word2Vec and TF-IDF are used to learn the word representations as candidate feature vectors. Second, Gradient-Boosting Tree (GBT) and Random Forest classifiers are trained and tested on the Apache Spark platform. In our experiments, we use Amharic, a language of Ethiopia, and apply standard natural language pre-processing techniques to crawled Facebook datasets, categorizing the texts into positive and negative opinions. Experimental results show that feature extraction with Word2Vec performs better with the GBT classifier, achieving an average accuracy of 82.29%. Therefore, our proposed approach can successfully discriminate between posts and comments expressing positive and negative opinions.
Amharic text Sentiment, Word2Vec Semantic, Social Media, Under-resourced Language
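The n-gram and TF-IDF steps mentioned in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' Spark pipeline: the tokenized posts are hypothetical placeholders, the bigram case stands in for the n-gram model, and the plain `log(N / df)` form of IDF is one common variant chosen here for simplicity.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs (the n = 2 case of an n-gram model)."""
    return list(zip(tokens, tokens[1:]))


def tf_idf(docs):
    """Compute TF-IDF weights for each tokenized document.

    tf  = term count / document length
    idf = log(N / document frequency)   (assumed variant)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already tokenized posts (placeholders, not real data).
posts = [
    ["good", "policy", "good", "leader"],
    ["bad", "policy", "bad", "decision"],
    ["good", "policy"],
]

# Count co-occurring term pairs across all posts.
cooccurring = Counter(pair for post in posts for pair in bigrams(post))
weights = tf_idf(posts)
```

In this toy corpus the term "policy" appears in every post, so its IDF and therefore its weight is zero, while rarer terms such as "leader" receive the largest weights.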
Ontological Approach for Knowledge Extraction from Clinical Documents
Raxit Goswami and
Research Department, ezDI Inc, Kentucky, USA
In clinical natural language processing (NLP), knowledge extraction is an important task for developing a highly accurate information retrieval system. Approaches used to develop such systems include rule-based methods, statistical methods, shortest-path algorithms, and hybrids of these. Accuracy and coverage are the most important parameters when comparing approaches: some methodologies have good accuracy but low coverage, and vice versa. In this paper, we focus on extracting domain relationships from clinical documents, for example the relationship between ‘Disease’ and ‘Procedure’ or between ‘Symptom’ and ‘Disease’, using three approaches: (i) Statistical, (ii) Shortest Path, and (iii) Shortest Path Using Body System. All three approaches use our existing NLP system to extract entities from unstructured documents. The Statistical approach applies a probabilistic algorithm to clinical documents, whereas the Shortest Path algorithm uses an ontological knowledge base for the hierarchical relationships between entities. This ontological knowledge base is built upon the curated Unified Medical Language System (UMLS). The Shortest Path Using Body System approach uses the domain relationships as well as the hierarchical relationships. The output of these approaches is validated by a domain expert, and the validated relationships are used to enrich our ontological knowledge base. We present the details of each approach along with comparative results, then analyze the results and outline further work.
Knowledge Extraction, Clinical information retrieval, Relationship Extraction, Clinical Document, Medical knowledge base, Ontology, Clinical NLP (Natural Language Processing)
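The idea of relating two entities through the shortest path in a concept hierarchy can be illustrated with a breadth-first search. This is a minimal sketch, not the authors' system: the concept names and edges are hypothetical placeholders rather than actual UMLS content, and the hierarchy is treated as an undirected graph for simplicity.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Breadth-first search over an undirected concept graph.

    Returns the list of concepts from source to target,
    or None when the two concepts are not connected.
    """
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if source not in graph or target not in graph:
        return None
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parents to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical (child, parent) edges of a toy hierarchy.
edges = [
    ("pneumonia", "lung disease"),
    ("lung disease", "respiratory system"),
    ("bronchoscopy", "lung procedure"),
    ("lung procedure", "respiratory system"),
    ("appendectomy", "abdominal procedure"),
    ("abdominal procedure", "digestive system"),
]

path = shortest_path(edges, "pneumonia", "bronchoscopy")
unrelated = shortest_path(edges, "pneumonia", "appendectomy")
```

In this toy graph the disease and the procedure meet at a shared body-system node, which is one way to picture why a body-system constraint could help filter candidate relationships; the second query returns `None` because the two concepts share no ancestor.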
Transfer Learning for Recognition of Surgical Workflow
, Kunhua Zhong1,2,3
and Yuwen Chen1,2,3
1 Chengdu Computing Institute of the Chinese Academy of Sciences, Chengdu, China
2 Chongqing Institute of Green and Intelligent, Chongqing, China
3 University of Chinese Academy of Sciences, Beijing, China
Computer-assisted surgery occupies an important position in modern surgery and continues to stimulate progress in methodology and technology. In recent years, many computer vision-based methods have been applied to surgical workflow recognition tasks. Training these methods requires a large amount of annotated data. However, annotating surgical data requires expert knowledge and is therefore difficult and time-consuming. In this paper, we focus on the problem of data deficiency and propose a transfer learning method to compensate for the small amount of labeled training data. We propose an unsupervised method for pre-training a Convolutional De-Convolutional (CDC) network on sequencing surgical workflow frames; the network performs convolution in space (for semantic abstraction) and de-convolution in time (for frame-level resolution) simultaneously. Then, through transfer learning, we fine-tune only the Convolutional De-Convolutional network to classify the surgical phase. Experiments validating the model show that it can effectively extract surgical features and determine the surgical phase. The accuracy, recall, and precision of our model reach 91.4%, 78.9%, and 82.5%, respectively.
Convolutional De-Convolutional (CDC), transfer learning, surgical phase
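The "de-convolution in time" mentioned in the abstract above restores frame-level resolution from a temporally coarser sequence. The following is a minimal sketch of a one-dimensional transposed convolution in plain Python, assuming scalar per-step scores; the kernel values and stride are hypothetical, and this is not the authors' CDC network.

```python
def transposed_conv1d(x, kernel, stride):
    """1-D transposed convolution.

    Each input value scatters a scaled copy of the kernel into the
    output, `stride` positions apart; overlapping copies are summed.
    """
    out_len = (len(x) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, value in enumerate(x):
        for j, w in enumerate(kernel):
            out[i * stride + j] += value * w
    return out


# Two coarse time steps upsampled to four frame-level values.
coarse_scores = [1.0, 3.0]
frame_scores = transposed_conv1d(coarse_scores, kernel=[1.0, 1.0], stride=2)
```

With a stride of 2 and a kernel of length 2 the copies do not overlap, so each coarse value is simply repeated for two frames; a learned kernel would instead interpolate between neighbouring steps.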
TensorFlow 2.0 and Kubeflow for Scalable and Reproducible Enterprise AI
, Holger Kyas2,3
1 IBM Center for Open Source Data and AI Technologies, San Francisco, CA, USA
2 Berne University of Applied Sciences, Berne, Switzerland
3 Helvetia Insurance Switzerland, Basel, Switzerland
Towards the end of 2015, Google released TensorFlow, which started out as just another numerical library but has grown to become a de facto standard in AI technologies. TensorFlow received a lot of hype at its initial release, in no small part because it was released by Google. Despite the hype, there have been complaints about usability as well, for example that debugging was possible only after a static execution graph had been constructed. In addition, neural networks had to be expressed as a set of linear algebra operations, which many practitioners considered too low-level. PyTorch and Keras addressed many of these flaws and gained a lot of ground. TensorFlow 2.0 successfully addresses these complaints and promises to become the go-to framework for many AI problems. This paper introduces the most prominent changes in TensorFlow 2.0 targeted towards ease of use, then introduces TensorFlow Extended (TFX) Pipelines and KubeFlow to illustrate the latest movements in the TensorFlow and Kubernetes ecosystems towards simplification for large-scale enterprise AI adoption.
Artificial Intelligence, TensorFlow, Keras, Kubernetes, KubeFlow, TFX, TFX Pipelines
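The usability complaint about static execution graphs can be pictured with a toy example in plain Python. This sketch deliberately uses no TensorFlow API: the `Deferred` class is a hypothetical stand-in for a graph node, contrasted with ordinary line-by-line (eager) evaluation.

```python
class Deferred:
    """A toy graph node: it records an operation but computes nothing
    until run() is called on it (or on a node that depends on it)."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

    def run(self):
        args = [node.run() for node in self.inputs]
        return self.fn(*args)


# Deferred (graph) style: the whole computation is described first.
x = Deferred(lambda: 3.0)
y = Deferred(lambda a: a * a, x)
z = Deferred(lambda a: a + 1.0, y)
# At this point y holds no value that a debugger could inspect.
graph_result = z.run()

# Eager style: every line is evaluated as soon as it is written.
x_e = 3.0
y_e = x_e * x_e  # the intermediate value is available immediately
z_e = y_e + 1.0
```

Both styles compute the same result; the difference is when intermediate values exist, which is why errors in the deferred style tend to surface only once the finished graph is executed.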
Building The First Arabic Dataset for Sentiment Analysis in Syrian Dialect Out of Facebook Platform
Nasser Nasser 1 and Ali Arous 2
1 Department of Software Engineering, Tishreen University, Latakia, Syria
2 Department of Software Engineering, Tishreen University, Latakia, Syria
Despite not being as competitive as its English counterpart, Sentiment Analysis in Arabic has witnessed a surge of progress in the past few years. However, most of the resources in this area are still limited in size, domain-specific, or not publicly available. In this paper, we address the sparsity of available resources for the different dialects of Arabic by building a multi-domain dataset for Sentiment Analysis dedicated to the Syrian Levantine dialect. The dataset was gathered from public Facebook content and consists of 10,000 annotated comments collected from posts in different domains, including Education, Sport, Services, Technology and Culture. We carried out a set of experiments to validate the usefulness of our dataset and performed feature engineering for the top classifiers. From the experimental results, we highlight useful insights about the best-performing classifiers and the most viable features.
Arabic Text Mining, Sentiment Analysis, Opinion Mining, Modern Standard Arabic, Dialectal Arabic
Accounting narrative obfuscation in financial statements
Jörg Hering, Jens Hölscher and Phyllis Alexander, Department of Accounting, Finance and Economics, Bournemouth University, 89 Holdenhurst Road, Bournemouth BH8 8EB, United Kingdom
The study examines the presence and success of accounting narrative obfuscation in financial statements filed with the United States Securities and Exchange Commission (SEC). Based on more than 50,000 "Footnotes" sections in annual reports on Form 10-K submitted between 1993 and 2016, the study finds that company officials are not able to "bury" negative corporate information in financial statements. Using textual sentiment analysis, the study provides evidence that capital market participants are well aware of the information content disclosed in the "Footnotes" sections of annual reports. Measuring "Key Word Density" (disclosure tone) in the notes to the financial statements ("Item 8"), the study reveals that investors react to changes in textual characteristics and adjust their market expectations accordingly. In addition, it is shown that investors react to changes in this subsection of the annual report much more strongly and in a timelier fashion than to changes in the entire Form 10-K filing. Furthermore, the results indicate that company officials report truthful information in the "Footnotes" sections of annual reports, representing accurate corporate disclosures.
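The abstract above does not define how "Key Word Density" is computed. As a rough sketch, assuming it means the share of words in a text that belong to a tone word list, a measure of that kind could look as follows; the word list and the sample sentence are hypothetical placeholders, not taken from the study.

```python
import re


def keyword_density(text, keywords):
    """Fraction of word tokens in `text` that appear in `keywords`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in keywords)
    return hits / len(tokens)


# Hypothetical negative-tone word list and footnote sentence.
negative_words = {"loss", "impairment", "litigation", "default"}
note = "The company recorded an impairment loss related to pending litigation."
density = keyword_density(note, negative_words)
```

Here three of the ten tokens are on the list, giving a density of 0.3; comparing such values across filing years is one way changes in disclosure tone could be tracked.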
Analysis of Echo Characteristics for Time-Varying Scatterers
and Dejun Feng3
1 State Key Laboratory of Complex Electromagnetic Environmental Effects on Electronics and Information System, National University of Defense Technology, Changsha, China
2 College of Electrical Science, National University of Defense Technology, Changsha, China
Phase modulation is a technique in which the phase of a signal varies in proportion to a modulating signal; it is commonly applied in the field of communications. Current processing methods mainly use active devices to intercept, modulate, and repeat the signal, but these devices are complicated and require a certain processing time. In this paper, a phase modulation method based on the phase-switched screen (PSS) is studied and the echo characteristics are analyzed. The realization of PSS time-varying modulation is also discussed. Simulation results demonstrate the effectiveness of the proposed method.
Linear frequency modulation (LFM), frequency spectrum shifting, phase-switched screen (PSS)
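The spectrum-shifting effect named in the keywords can be illustrated numerically. The sketch below makes several simplifying assumptions that are not taken from the paper: the PSS is modelled as an ideal periodic switch between reflection coefficients +1 and -1 with a 50% duty cycle, the incident signal is a single complex tone rather than an LFM pulse, and all sizes are arbitrary.

```python
import cmath

N = 256        # number of samples (assumed)
TONE_BIN = 40  # DFT bin of the incident tone (assumed)
PERIOD = 16    # switching period in samples, i.e. a shift of N / PERIOD bins


def dft(signal):
    """Naive discrete Fourier transform, adequate for a short signal."""
    n = len(signal)
    return [
        sum(s * cmath.exp(-2j * cmath.pi * k * i / n)
            for i, s in enumerate(signal))
        for k in range(n)
    ]


# Incident tone and the +1 / -1 switching waveform of the screen.
incident = [cmath.exp(2j * cmath.pi * TONE_BIN * i / N) for i in range(N)]
switching = [1.0 if (i % PERIOD) < PERIOD // 2 else -1.0 for i in range(N)]

# The echo is the incident signal multiplied by the switching waveform.
echo = [a * s for a, s in zip(incident, switching)]
spectrum = [abs(v) for v in dft(echo)]

# The two strongest spectral lines of the echo.
strongest = sorted(range(N), key=lambda k: spectrum[k], reverse=True)[:2]
```

Because the switching waveform has zero mean, the energy at the original tone frequency vanishes and moves to sidebands offset by odd multiples of the switching frequency, with the strongest pair at bins 40 - 16 and 40 + 16.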
Improvement of Intrusion Detection System’s Accuracy Using Gradient Boosting Trees on Kyoto 2016 Dataset
and Morihiro Hayashida2
Erectrical Group - WORKS Co., Ltd, Masuda, Shimane, Japan
National Institute of Technology, Matsue College, Matsue, Shimane, Japan
As computers become more widespread, they are increasingly exposed to threats such as cyber-attacks. In recent years, attacks have gradually changed, and security software must be updated frequently. Network-based intrusion detection systems (NIDSs) have been developed to detect such attacks. However, it is difficult for a signature-based NIDS, which decides whether an access is abnormal based on known attacks, to detect unknown attacks. Hence, the Kyoto 2016 dataset was constructed for evaluation, and machine learning methods including support vector machines and random forests were applied to it. In this paper, we additionally examine a deep neural network and gradient boosting tree methods, and perform computational experiments on the Kyoto 2016 dataset. The results suggest that the gradient boosting tree method XGBoost outperforms the other machine learning classifiers and that its elapsed classification time is significantly shorter.
Network-based Intrusion Detection System, Gradient Boosting Tree, Neural Network
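The core idea behind gradient boosting trees, fitting each new tree to the residual errors of the current ensemble, can be sketched with one-split trees (stumps) in plain Python. This is a bare illustration under squared loss and is not XGBoost, which adds regularization, second-order gradient information, and many engineering optimizations; the single feature and the labels below are hypothetical and not drawn from the Kyoto 2016 dataset.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosting(xs, ys, rounds=20, learning_rate=0.5):
    """Fit stumps sequentially, each one on the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and all stump outputs, then threshold at 0.5."""
    base, learning_rate, stumps = model
    score = base
    for threshold, left_value, right_value in stumps:
        score += learning_rate * (left_value if x <= threshold else right_value)
    return 1 if score >= 0.5 else 0


# Hypothetical single feature per connection; label 1 marks an attack.
xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_boosting(xs, ys)
```

Each round halves the remaining residual in this toy setting, so the ensemble score approaches 0 for the low-valued samples and 1 for the high-valued ones.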