Introduction

 

 

Text mining is basically defined as the conversion of large volumes of text or
documents into useful numbers. It is used to extract useful or meaningful
information from raw data with the help of various algorithms and of patterns
in the data. Text mining is applied to unstructured or semi-structured data
such as emails and text messages. For example, it is used to filter out spam
emails by identifying text that is common in such emails. Once information has
been retrieved from the documents, it can be used in data mining projects
(clustering and factoring, graphics, predictive data mining).
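As a rough illustration of what turning documents into "useful numbers" can look like, the sketch below counts how often each word occurs in each document. The sample documents and the names `documents`, `vocabulary` and `to_counts` are invented for this example; real projects usually weight the counts and use a library rather than hand-written code.

```python
from collections import Counter

# Hypothetical sample documents, invented for illustration only.
documents = [
    "win a free prize now",
    "meeting notes for the project",
    "free prize inside claim now",
]

# Build a sorted vocabulary from every word seen in the documents.
vocabulary = sorted({word for doc in documents for word in doc.split()})

def to_counts(doc):
    """Return the word counts of one document as a vector over the vocabulary."""
    counts = Counter(doc.split())
    # A Counter returns 0 for words that do not occur in the document.
    return [counts[word] for word in vocabulary]

# One row of numbers per document, one column per vocabulary word.
matrix = [to_counts(doc) for doc in documents]

print(vocabulary)
for row in matrix:
    print(row)
```

Each row can then be passed on to a clustering or predictive step, which is the point at which text mining hands over to ordinary data mining.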

Text mining is similar to data mining, except that text mining works on raw or
unstructured text such as emails, HTML or full-text documents, while data
mining works on structured data.

 

Some common steps in text mining include removing certain keywords like “THE”,
punctuation marks etc. from the important data to improve search quality.
These steps are covered in the section on preprocessing text.

 

Text mining is used for various educational, research and industrial purposes,
such as social media analysis, research papers and sentiment analysis.

                    

 

 

 

PREPROCESSING TEXT

NEED FOR PREPROCESSING TEXT

1)    To reduce the size of the text document:

 i)   by eliminating words according to their frequency;

 ii)  by eliminating common words or stop words like “the”, “and”, etc.

2)    To improve the efficiency and performance of the information retrieval
system in text mining.

3)    To save the administrator a significant amount of time and space
resources.

 

 

 

 

WAYS OF PREPROCESSING TEXT

1)    Tokenization

Tokenization is the process of breaking textual content into meaningful words,
terms or symbols, which are known as tokens. Tokens are separated using full
stops, commas and whitespace. Tokenization depends on the language used: for
English it is a simple task, while for languages like Chinese and Korean it is
a difficult task to perform.

Eg: “TEXT MINING IS THE PROCESS OF RETRIEVAL OF IMPORTANT INFORMATION FROM
UNSTRUCTURED DATA”.

Output: “TEXT, MINING, IS, THE, PROCESS, OF, RETRIEVAL, OF, IMPORTANT,
INFORMATION, FROM, UNSTRUCTURED, DATA”.
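The example above can be reproduced with a minimal sketch that splits on whitespace, full stops and commas. The function name `tokenize` is chosen here for illustration, and the approach assumes English-like text; it would not be adequate for languages such as Chinese.

```python
import re

def tokenize(text):
    """Split text into tokens on runs of whitespace, full stops and commas."""
    # re.split can leave empty strings at the edges, so they are filtered out.
    return [token for token in re.split(r"[\s.,]+", text) if token]

sentence = ("TEXT MINING IS THE PROCESS OF RETRIEVAL OF IMPORTANT "
            "INFORMATION FROM UNSTRUCTURED DATA.")
tokens = tokenize(sentence)
print(tokens)
```

Other punctuation (question marks, quotes, hyphens) is left attached to the neighbouring word in this sketch, so a practical tokenizer would need a wider set of separators.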

 

 

2)    Stop Word Removal

The major aim of stop word removal is to reduce the dimensionality of the text
by removing certain prepositions, articles and pronouns that are not necessary
for text mining. This reduces the text data significantly and helps in
optimizing the data. Lists of stop words are available online. Another way is
to build a stop word list based on the frequency of words across a number of
documents.

Some methods of stop word removal are:

i)    Term Based Random Sampling (TBRS)

ii)   Zipf’s Law

iii)  Mutual Information Method

iv)   Based on a precompiled list
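Two of the ideas above can be sketched briefly: filtering with a precompiled list, and building a list from how many documents a word appears in. The stop word set, the sample documents and the function names are invented for this illustration. The frequency threshold is a simplification and is not an implementation of TBRS, Zipf's law or the mutual information method.

```python
from collections import Counter

# Hypothetical precompiled stop word list; real lists are much longer.
PRECOMPILED_STOP_WORDS = {"the", "is", "of", "from", "and", "a"}

def remove_stop_words(tokens, stop_words):
    """Keep only tokens whose lowercase form is not in the stop word list."""
    return [token for token in tokens if token.lower() not in stop_words]

def frequent_words(documents, min_fraction):
    """Return words that occur in at least min_fraction of the documents."""
    doc_freq = Counter()
    for doc in documents:
        # A set is used so each word is counted once per document.
        doc_freq.update(set(doc.lower().split()))
    threshold = min_fraction * len(documents)
    return {word for word, count in doc_freq.items() if count >= threshold}

tokens = ("TEXT MINING IS THE PROCESS OF RETRIEVAL OF IMPORTANT "
          "INFORMATION FROM UNSTRUCTURED DATA").split()
filtered = remove_stop_words(tokens, PRECOMPILED_STOP_WORDS)
print(filtered)

docs = ["the cat sat", "the dog ran", "a bird sang the song"]
derived = frequent_words(docs, min_fraction=1.0)
print(derived)
```

In the first call the thirteen tokens shrink to eight, which shows the size reduction that motivates this step. The second call treats a word as a stop word only if it appears in every sample document.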

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Application of Text Mining

 

 

The main objective of text mining is to reduce the time spent on finding
information by filtering out unnecessary data from the main keywords or
important data. It is used to provide better services to users by giving
proper feedback. It is also used by businesses to analyze their consumer base
and provide services accordingly by targeting potential customers.

 

 

1)    SPAM IDENTIFICATION

 

As filtering based on IP address is not sufficient, certain text mining
techniques are used to detect salting. Salting is basically adding certain
information to a message to make it look like original or official content.
Email service providers use text mining to separate spam and promotional
messages from the rest of the important messages, thus saving users time and
resources. This can also be used for further filtering of messages according
to the suitable age group. It is used to provide protection against phishing
and spamming.
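The idea of identifying text that is common in spam can be sketched as a simple phrase count. The phrase list, the threshold and the function names below are assumptions made for this example. It does not detect salting, and real mail providers rely on far more elaborate statistical models.

```python
import re

# Hypothetical phrases assumed to be common in spam; not from any real filter.
SPAM_PHRASES = ["free prize", "click here", "act now", "winner"]

def spam_score(message):
    """Count how many of the known spam phrases occur in the message."""
    # Lowercase and collapse runs of whitespace so spacing tricks do not matter.
    text = re.sub(r"\s+", " ", message.lower())
    return sum(1 for phrase in SPAM_PHRASES if phrase in text)

def is_spam(message, threshold=2):
    """Flag a message when it contains at least `threshold` spam phrases."""
    return spam_score(message) >= threshold

print(is_spam("You are a WINNER! Click   here to claim your free prize"))
print(is_spam("Minutes from today's project meeting are attached"))
```

The first message matches three phrases and is flagged, while the second matches none. A fixed phrase list is easy to evade, which is one reason filters are usually trained on labelled examples instead.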

                                                           

2)    SENTIMENT ANALYSIS

 

Sentiment analysis is used to identify positive, negative or neutral reviews
about a subject. Consider deciding whether to watch a TV series based on the
reviews of viewers. The text of the reviews is analyzed and, according to the
keywords used, the emotion of the writer is identified, which can be used for
marking reviews of the show as positive or negative. It also looks at
individual words and phrases to identify how negative or positive they are.

Consider this statement: “I LOVED THE NEW MOBILE. BUT IT IS VERY EXPENSIVE AND
DOES NOT HAVE GREAT BATTERY LIFE”.

According to the first sentence the customer seems impressed, but overall the
customer has a negative impression of the product.

Sentiment analysis is also used to give an indication about products. For
example, while reading reviews about a hotel you may come across the word
ROTTEN, which creates a negative impression of the hotel.
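A keyword-based scorer for the mobile phone review might look like the sketch below. The word scores in `LEXICON`, the negation list and the function names are all assumptions chosen for this illustration, and the negation rule is deliberately crude.

```python
import re

# Hypothetical word scores, chosen only for this illustration.
LEXICON = {"loved": 2, "great": 2, "good": 1,
           "expensive": -2, "rotten": -3, "bad": -1}
NEGATIONS = {"not", "no", "never"}

def sentence_score(sentence):
    """Sum the word scores of one sentence.

    A negation word flips the sign of the next scored word that follows it.
    """
    score = 0
    negate = False
    for word in re.findall(r"[a-z']+", sentence.lower()):
        if word in NEGATIONS:
            negate = True
        elif word in LEXICON:
            score += -LEXICON[word] if negate else LEXICON[word]
            negate = False
    return score

def review_sentiment(review):
    """Return the per-sentence scores and an overall label for a review."""
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    scores = [sentence_score(s) for s in sentences]
    total = sum(scores)
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return scores, label

review = ("I LOVED THE NEW MOBILE. BUT IT IS VERY EXPENSIVE AND "
          "DOES NOT HAVE GREAT BATTERY LIFE.")
print(review_sentiment(review))
```

With these assumed scores the first sentence comes out positive and the second strongly negative, so the review as a whole is labelled negative, matching the reading given above. A review containing ROTTEN would be pushed towards negative in the same way.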

                                                

3)    IN BIOMEDICAL DOMAINS

 

Year by year, the amount of research in medical fields is increasing
significantly, so the necessity of text mining is evident. Text mining is used
for quickly sorting out the necessary data from the medical records that are
available. In fields like cancer treatment, text mining means improving the
diagnosis, treatment and prevention of cancer by mining databases.

Another important use of text mining is mining EHRs (Electronic Health
Records), which is used to search a patient's previous records of certain
diseases and medical history.

Text mining is also used for comparing gene markers with previous records and
identifying different patterns in genes for checking diseases.

 

4)    SOCIAL MEDIA PLATFORMS

 

Social media are a rich source of unstructured data. Social media is used for
connecting people, i.e. for interactions and conversations. Some well-known
platforms are Twitter, Facebook and Orkut. Data can be gathered using APIs.

 

 
