A Survey of Social Network - Word Embedding Approach for Hate Speeches Detection

Authors

  • Bayu Nugroho, Universitas Islam Negeri Sunan Ampel

DOI:

https://doi.org/10.29080/systemic.v7i2.1771

Keywords:

Word embedding, hate speeches, online detection

Abstract

Word embedding is a technique for representing words, and by extension sentences, as vectors in a vector space. These representations are used to build models suited to a particular task involving the sentences. Examples include a model of similarity between sentences or words, a model of Twitter user connectivity, and a model of tweet demographics. Word embedding is helpful in sentiment analysis research because it turns sentences into a mathematically tractable form. The resulting representations can then serve as input to other computational processes.
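To make the idea of a similarity model concrete, here is a minimal Python sketch. The vectors in `EMBEDDINGS` are hypothetical, hand-picked 3-dimensional values rather than vectors learned from a corpus. Averaging word vectors into a sentence vector is assumed here only as one simple baseline; it is not presented as the method used in the surveyed works. Cosine similarity then scores related words and sentences higher than unrelated ones.

```python
import math

# Hypothetical 3-dimensional embeddings chosen by hand for illustration only;
# real embeddings are learned from a corpus and usually have far more dimensions.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad": [-0.7, 0.1, 0.2],
    "movie": [0.0, 0.9, 0.3],
    "film": [0.1, 0.8, 0.4],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction, -1 = opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sentence_vector(sentence):
    """Average the vectors of known words (an assumed baseline, one of many options)."""
    vectors = [EMBEDDINGS[w] for w in sentence.lower().split() if w in EMBEDDINGS]
    if not vectors:
        raise ValueError("no known words in sentence")
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


if __name__ == "__main__":
    # Word-level similarity: "good" is closer to "great" than to "bad".
    print(cosine_similarity(EMBEDDINGS["good"], EMBEDDINGS["great"]))
    print(cosine_similarity(EMBEDDINGS["good"], EMBEDDINGS["bad"]))
    # Sentence-level similarity using averaged word vectors.
    print(cosine_similarity(sentence_vector("good movie"), sentence_vector("great film")))
```

Cosine similarity is used here rather than Euclidean distance because it compares only the direction of two vectors and ignores their length.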

Published

2022-12-31

How to Cite

Nugroho, B. (2022). A Survey of Social Network - Word Embedding Approach for Hate Speeches Detection. Systemic: Information System and Informatics Journal, 7(2), 36–41. https://doi.org/10.29080/systemic.v7i2.1771

Section

Articles