Abstract—With the rapid growth of Twitter, researchers have been drawn to social media data as a source for analysis. Twitter is one of the most widely used social media platforms for expressing thoughts. This paper presents an approach to analyzing tweets that combines two techniques: sentiment mining, which classifies the highly unstructured data on Twitter, and data mining. Twitter has 500 million users. Tweets are limited to 140 characters, so users rely on shorthand notation; for example, “ok” may be written as “kk”. Tweets also contain misspellings and grammatical errors, which complicates their analysis.

Keywords—Social network, natural language processing, machine learning, sentiment analysis, data mining

I. Introduction

Nowadays social media plays an important role in modern life.
Online social media platforms such as Twitter and Facebook, along with many enterprise social networks, have become very popular in the last few years. People spend a large amount of time on social media interacting with others, and the number of users is increasing day by day. People post their opinions, views, thoughts, and events on social media and also share posts.
Users post comments, and others can follow them. Twitter has become the most popular online microblogging service. It allows users to send image and text-based posts of up to 140 characters. For many companies, social media is a medium for analytics on huge amounts of user data, on which many prediction models are built. These prediction models help to launch new businesses or new ideas. This is the positive side of social media.
On the other side, people share their personal information on social media sites, and that information is misused. Social media is an easy target for distributing fake and false information. People post comments related to crime, and such posts can increase violence in public. The Crime Detection System (CDS) detects whether a post is related to crime or not.
If the post is related to crime, it is further classified into a subtype of crime: 1) crime against the person, 2) crime against property, 3) crime against the country, and 4) other. Two approaches are used in the CDS. One is sentiment mining, which detects crime directly from the post or comment; the other is data mining, which uses structured and historical data to find the intensity of the crime. Sentiment mining is used for unstructured and real-time data. Data mining is the practice of examining large preexisting databases in order to generate new information.
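The two-stage decision described above (crime-related or not, then subtype) can be illustrated with a minimal sketch. It assumes a simple keyword lookup, which is not necessarily how the CDS makes this decision; the keyword lists, the short subtype labels, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Optional, Set

# Hypothetical keyword lists: the paper does not specify any vocabulary.
SUBTYPE_KEYWORDS = {
    "person": {"assault", "murder", "kidnap"},
    "property": {"theft", "burglary", "robbery", "vandalism"},
    "country": {"terrorism", "treason", "espionage"},
}
# Generic crime words that fit none of the named subtypes (also assumed).
GENERIC_CRIME_WORDS = {"crime", "criminal", "illegal", "fraud"}


def tokenize(post: str) -> Set[str]:
    """Lowercase the post and strip surrounding punctuation from each word."""
    words = {w.strip(".,!?;:\"'()").lower() for w in post.split()}
    return words - {""}


def classify_post(post: str) -> Optional[str]:
    """Return a crime subtype, "other", or None if the post is not crime-related."""
    words = tokenize(post)
    # Stage 2 is checked first: any subtype keyword implies a crime-related post.
    for subtype, keywords in SUBTYPE_KEYWORDS.items():
        if words & keywords:
            return subtype
    # Crime-related, but matching none of the three named subtypes.
    if words & GENERIC_CRIME_WORDS:
        return "other"
    return None
```

A keyword table keeps the sketch short; a learned classifier could take the place of `classify_post` without changing the four-way output.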
The CDS helps to reduce crime. The remainder of the paper proceeds as follows: Section 2 reviews related work in the crime detection research area; Section 3 presents the architecture of the proposed system; Section 4 describes the observations and results; finally, Section 5 presents the conclusions.

II. Related Work

Previous work [1] focuses on streaming data from Twitter, classifying each tweet as malicious or not. Crime prediction using social posts and data analysis has been a trending research field. The tweets are fetched using the Twitter API and then analyzed using machine learning. Data pre-processing is then performed [2]. Stop words are removed with the help of the Stanford NLP libraries [5].
Twitter comments contain misspellings, elisions, and grammatical errors [3]. To make sense of a comment, it is transformed into a canonical form consistent with the dictionary or grammar. Analyzing data from social media sites is one of the new buzzwords: business strategies, technical concepts, world health issues, election campaigns, inventions, and entertainment can all be studied using sentiment analysis. The sentiment mining system classifies a tweet without knowledge of its prior background. Sentiment mining uses a negation algorithm to identify whether a comment is positive or negative. Text mining aims to accurately extract, identify, and analyze information from unstructured data sources.
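To make the normalization and negation steps concrete, the following is a minimal sketch, not the implementation used in the cited works. It assumes a small shorthand dictionary for canonicalization and a simple rule in which a negation word flips the polarity of the next sentiment word; the word lists and the function names `normalize` and `polarity` are hypothetical.

```python
import re

# Hypothetical lookup tables; a real system would use far larger resources.
SHORTHAND = {"kk": "ok", "u": "you", "gr8": "great", "thx": "thanks"}
POSITIVE = {"good", "great", "ok", "happy", "safe"}
NEGATIVE = {"bad", "angry", "hate", "unsafe"}
NEGATIONS = {"not", "no", "never"}


def normalize(tweet):
    """Lowercase, tokenize, and map shorthand tokens to a canonical form."""
    tokens = re.findall(r"[a-z0-9']+", tweet.lower())
    return [SHORTHAND.get(token, token) for token in tokens]


def polarity(tweet):
    """Label a tweet positive, negative, or neutral with simple negation handling."""
    score = 0
    negate = False
    for token in normalize(tweet):
        # A negation word (or an n't contraction) flips the next sentiment word.
        if token in NEGATIONS or token.endswith("n't"):
            negate = True
            continue
        value = 1 if token in POSITIVE else -1 if token in NEGATIVE else 0
        if value:
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With these tables, "kk" is first rewritten to "ok" and then counted as a positive word, while "not good" is scored as negative because the negation flips the polarity of "good".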
Past studies of aggressive behavior on uncomfortable days show clear correlations between location, day, time, and criminal activity. Specific tweet attributes such as username, location, time, and retweet count are extracted [6], and these attributes are used to find the intensity of crime. For data mining, the Naive Bayes algorithm is used.

III. Proposed System

The proposed system collects data from the Twitter social networking site and processes it using NLP techniques. We use two approaches: one is sentiment mining and the other is data mining.
Sentiment mining is used for unstructured and real-time data, whereas data mining is used for structured and historical data.
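As an illustration of the data mining side, the following is a minimal sketch of a Naive Bayes classifier over categorical tweet attributes, with Laplace smoothing. It is a sketch under stated assumptions rather than the system's actual implementation: the class name, the attribute values, the "high"/"low" intensity labels, and the four training rows are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical attributes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength

    def fit(self, rows, labels):
        # Assumes every row provides the same set of attributes.
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # value_counts[label][attribute][value] -> number of training rows
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        self.attribute_values = defaultdict(set)
        for row, label in zip(rows, labels):
            for attribute, value in row.items():
                self.value_counts[label][attribute][value] += 1
                self.attribute_values[attribute].add(value)
        return self

    def log_posteriors(self, row):
        """Unnormalized log posterior for each class."""
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / self.total)  # log prior
            for attribute, value in row.items():
                seen = self.value_counts[label][attribute][value]
                n_values = len(self.attribute_values[attribute])
                score += math.log(
                    (seen + self.alpha) / (count + self.alpha * n_values)
                )
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)


# Made-up training rows: attributes are bucketed into categories beforehand.
rows = [
    {"location": "downtown", "time": "night", "retweets": "high"},
    {"location": "downtown", "time": "night", "retweets": "low"},
    {"location": "suburb", "time": "day", "retweets": "low"},
    {"location": "suburb", "time": "night", "retweets": "low"},
]
labels = ["high", "high", "low", "low"]  # hypothetical crime-intensity labels

model = CategoricalNaiveBayes().fit(rows, labels)
```

Calling `model.predict({"location": "downtown", "time": "night", "retweets": "high"})` returns the intensity label with the largest posterior. Numeric attributes such as retweet count are assumed to be bucketed into categories before training.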