REPORT

ABSTRACT

Machine learning is a branch of artificial intelligence built on fairly sophisticated algorithms. It gives computers the ability to learn from the surrounding data and make decisions. Instead of building heavy machines, we build algorithms that reduce the need for complex hand-written algorithms and help the computer become more independent. In this setting, pattern recognition becomes one of the most important challenges.
It is used by most algorithms to make optimized decisions. It is mainly the study of how to observe the environment, pick out what is relevant from the whole environment, and take a particular decision based on the observations. This report discusses different machine learning techniques, as well as the pattern recognition process, design cycle, applications and models.

INTRODUCTION

Different types of machines use different machine learning algorithms, and building these algorithms is a challenge for scientists.
Different algorithms give machines different learning experiences. This experience depends not only on the nature of the algorithm but also on the data structures used and on theories of cognitive and genetic structures. Many of these ideas are borrowed from current work on neural networks and the cognitive sciences. Overall, learning means improving performance with respect to some defined measure, which tells us whether the machine has learned something. There are two main types of learning algorithm: supervised and unsupervised.
Humans have developed strong abilities to sense the environment, such as recognizing handwriting, taste, colour and faces. We need to make machines do the same kind of analysis. Pattern recognition was developed in the 1960s, but in spite of all these years of research the goal of designing a general-purpose pattern recognizer has still not been accomplished.

SUPERVISED LEARNING

A supervised algorithm is given both the inputs and the corresponding outputs, and generalizes from them so that it can handle all possible inputs. After analyzing the training data it produces an inferred function that can be used to map new examples. It follows these steps:
1) Determining the type of training examples – The user should know what kind of data should be used for the training set.
2) Gathering a training set – A set of input-output pairs is gathered.
3) Determining the input features of the learned function – The accuracy of the learned function depends on how the input is represented. The input object is transformed into a feature vector containing features that describe it.
4) Determining the structure of the learned function and the corresponding learning algorithm.
5) Completing the design and running the learning algorithm on the gathered training set.
6) Evaluating the accuracy of the learned function.

Supervised learning falls into two categories:
1) Classification, which applies to nominal responses that take only a few values.
2) Regression, for responses that are real numbers.
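The six steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the data values, the names TRAINING_SET, TEST_SET, predict_1nn and accuracy, and the choice of a 1-nearest-neighbor rule as the learning algorithm are all made up for this example and do not come from the report.

```python
import math

# Steps 1-3: a small, made-up training set of (feature vector, label) pairs.
# Each input object is already represented as a 2-element feature vector.
TRAINING_SET = [
    ((1.0, 1.2), "A"),
    ((0.8, 1.0), "A"),
    ((1.1, 0.9), "A"),
    ((3.0, 3.2), "B"),
    ((3.1, 2.9), "B"),
    ((2.9, 3.0), "B"),
]

# Labeled examples held back from training and used only in step 6.
TEST_SET = [
    ((1.0, 1.0), "A"),
    ((3.0, 3.0), "B"),
    ((0.9, 1.1), "A"),
    ((2.8, 3.1), "B"),
]


def predict_1nn(training_set, x):
    """Steps 4-5: return the label of the training example closest to x."""
    _, label = min(training_set, key=lambda pair: math.dist(pair[0], x))
    return label


def accuracy(training_set, test_set):
    """Step 6: fraction of test examples whose predicted label is correct."""
    correct = sum(predict_1nn(training_set, x) == y for x, y in test_set)
    return correct / len(test_set)


if __name__ == "__main__":
    print(predict_1nn(TRAINING_SET, (1.05, 1.1)))
    print(accuracy(TRAINING_SET, TEST_SET))
```

Because the labels here are nominal ("A" or "B"), this is a classification example. A regression version would store real-valued responses instead and, for instance, return the response of the nearest neighbor rather than its label.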
The supervised algorithms are as follows:

SVM – Support vector machines (SVMs) are models used to analyze data for classification and regression analysis. They have good speed and memory usage when there are few support vectors. The default linear scheme is easy to interpret, but with a kernel it is difficult to see how the data is being classified.
Naïve Bayes – It has good speed and memory usage for simple distributions but performs poorly on large datasets and with kernel distributions.
Nearest Neighbor – Nearest neighbor can use either categorical or continuous predictors, but not both at the same time.
Its predictions tend to be poor in high dimensions, and with linear search it does not perform any fitting.
Discriminant Analysis – It is accurate when the modelling assumptions are satisfied; otherwise the accuracy varies.

UNSUPERVISED LEARNING

The machine receives inputs but obtains neither target outputs nor rewards from the environment. It can nevertheless develop a framework based on the knowledge it builds up from the environment.
The unsupervised algorithms are as follows:
1. Hierarchical clustering – Vectors are given as input and a dendrogram is returned as output. It creates a multilevel cluster tree.
2. K-means clustering – It is more efficient than hierarchical clustering. In this algorithm each observation is assigned to the cluster whose mean is nearest to it.

SUPERVISED VS UNSUPERVISED LEARNING

1) In a supervised algorithm the classes are predetermined, whereas in an unsupervised algorithm the basic task is to develop the classification labels.
2) In a supervised algorithm the data can be divided into segments, and the machine then searches for patterns and builds mathematical models based on the data. In an unsupervised algorithm the data is divided into clusters based on similarity.
3) In a supervised algorithm the models are evaluated on the basis of their predictive capacity in relation to measures of variance in the data itself, whereas in an unsupervised algorithm the machine is told in advance how many clusters should be formed.

PROBLEMS ENCOUNTERED DURING LEARNING

Learning depends entirely on the machine and the algorithm. Since the machine relies on the information it perceives from the environment, it should be ready to face the challenges it comes across. Such problems affect the learning process of the machine.
As different inputs give different outputs, it becomes important for the machine to produce appropriate and optimized output. The problems faced during learning are:

1) BIAS – The machine tends to prefer one hypothesis over another. Say, for example, we have two agents N and P. Since each agent has its own hypothesis, predicted by taking all the data into consideration, it becomes difficult for the learning agent to decide which one is best. Until the learning agent can choose between the two hypotheses, it cannot resolve the disagreement. In order to come to a conclusion, the agent needs a bias.
A good bias is one that works well in the practical environment, favoring the hypothesis that best suits the data.
2) NOISE – In the real world, data is never perfect (that is, free of noise). Noise arises when some attributes have missing values or have been assigned inappropriate values. Handling this noise is important for the learning algorithm.
3) PATTERN RECOGNITION – Pattern recognition is used to classify objects (2D or 3D) and abstract multidimensional patterns into categories.
There are many pattern recognition systems, for example for character and handwriting recognition, speech and speaker recognition, document analysis, fingerprint identification, white blood cell classification, and military target recognition. Machines use pattern recognition techniques to identify objects for sorting, inspection, and assembly. The design of a pattern recognition system requires the following modules: sensing, feature extraction and selection, decision making, and system performance evaluation. The availability of low-cost, high-resolution sensors (e.g., CCD cameras, microphones and scanners) and data sharing over the Internet have resulted in huge repositories of digitized documents (text, speech, image and video). The need for efficient archiving and retrieval of this data has fostered the development of pattern recognition algorithms in new application domains.

PATTERN RECOGNITION GOALS

1) Hypothesize the models that describe the two populations.
2) Process the data to get rid of the noise in it.
3) Choose the model that best represents the pattern.

AREAS IN PATTERN RECOGNITION

1) Template matching: The pattern to be recognized is matched against a stored template while taking into account all translation, rotation and scale changes.
2) Statistical pattern recognition: It focuses on the statistical properties of the patterns.
3) Artificial neural networks: They are based on biological neural models.
4) Syntactic pattern recognition: Its decisions are based on logical rules and grammars.

STEPS INVOLVED IN PATTERN RECOGNITION

1) Data acquisition and sensing: Measurement of physical variables. Important issues: bandwidth, resolution, sensitivity, distortion, SNR, latency, etc.
2) Pre-processing: Removing noise from the data and separating the patterns of interest from the background.
3) Feature extraction: Finding a new representation in terms of features.
4) Model learning and estimation: Learning a mapping between features and pattern categories.
5) Classification: Using features and learned models to assign a pattern to a category.
6) Post-processing: Evaluating confidence in decisions, exploiting context to improve performance, and combining experts.

ISSUES FOR DESIGNING THE SYSTEM OF PATTERN RECOGNITION

- Definition of pattern classes.
- Sensing environment.
- Pattern representation.
- Feature extraction and selection.
- Cluster analysis.
- Selection of training and test examples.
- Performance evaluation.

DESIGNING PATTERN RECOGNITION SYSTEM

Designing a pattern recognition system has the following steps:
Step 1) Data collection: The first step is to collect the training and test data. The question that arises is whether the data collected contains an adequate set of values.
Step 2) Feature selection: In this step we study the data in terms of its domain dependence and prior information, its computational cost and feasibility, features that take similar values for similar patterns and different values for different patterns, features that are invariant with respect to translation, rotation and scale, and features that are robust with respect to occlusion, distortion, deformation, and variations in the environment.
Step 3) Model selection: In this step we select the model based on the following criteria: its domain dependence and prior information, the design criteria, whether it is parametric or non-parametric, how it handles features with missing values, and its computational complexity.
The various models are: template, decision-theoretic or statistical, syntactic or structural, neural, and hybrid. Using these models we can judge how close we are to the final model that captures the underlying patterns.
Step 4) Learning: In this phase we decide how to learn the rules from the provided data. Learning is of three types:
Supervised learning – Here a category label is provided for every pattern in the training set.
Unsupervised learning – The machine itself forms clusters and groups based on the input patterns.
Reinforcement learning – Here the learner receives feedback on whether a decision is right or wrong, even though the correct category is not specified.
Step 5) Evaluation – This is the final step, in which we estimate the performance of the system on the training data and on data it will see in the future. We also assess the problems caused by overfitting.

PATTERN RECOGNITION MODELS

This section explains techniques for analyzing multidimensional data of various types and scales, along with algorithms for projection, dimensionality reduction, clustering and classification of data.
Pattern recognition models can be designed using the following approaches:
1) Template matching – The patterns are represented in the form of pixels, curves, etc., and the recognition function is a correlation or distance measure between patterns. The typical criterion for this approach is the classification error.
2) Statistical pattern recognition – The patterns are represented in the form of features, and the recognition function is the discriminant function. The typical criterion for this approach is the classification error.
3) Syntactic or structural – The patterns are represented in the form of primitives, and the recognition function consists of rules and a grammar. The typical criterion for this approach is the acceptance error.
4) Neural network – The patterns are represented in the form of pixels or features, and the recognition function is the network function. The typical criterion for this approach is the mean square error.

PATTERN RECOGNITION APPLICATIONS

Pattern recognition has applications in the following areas:
· machine learning
· statistics
· mathematics
· computer science
· biology

Some examples of pattern recognition applications are as follows:
· Bioinformatics – It is used in sequence analysis, with DNA or protein sequences as the input.
Here the pattern classes are known types of genes.
· Data mining – It is used in searching for meaningful patterns, with points in a multidimensional space as the input. Here the pattern classes are compact and well-separated clusters.
· Document image analysis – It is used in optical character recognition, with a document image as the input. Here the pattern classes are alphanumeric characters and words.
· Document classification – It is used in Internet search, with a text document as the input. The patterns are classified into semantic categories.
· Industrial automation – It is used in printed circuit board inspection, with an intensity image as the input. The pattern classes are defective and non-defective, depending on the nature of the pattern.
· Multimedia database retrieval – Internet search is one of the major applications, with video clips as the input and patterns classified by video genre.
· Biometric recognition – Personal identification uses biometric recognition, with fingerprints, iris or face as the input. The pattern classes are the authorized users for access control.
· Remote sensing – It is applied in forecasting crop yields, with a multispectral image as the input and land-use categories and crop growth patterns as the classes.
· Speech recognition – Telephone directory enquiry uses speech recognition, with a speech waveform as the input and spoken words as the classes.
· Medical – Computer-aided diagnosis uses pattern recognition with microscopic images as the input.
· Military – Automatic target recognition has optical or infrared images as the input and target types as the classes.
· Natural language processing – It is used in information extraction, with sentences as the input and parts of speech as the pattern classes.

STATISTICAL PATTERN RECOGNITION

Statistical pattern recognition is used to cover all stages of an investigation, from problem formulation and data collection through to discrimination and classification, assessment of results and interpretation.

The steps involved in statistical pattern recognition are described below:
1) Formulating the problem – Gaining a clear understanding of the aims of the investigation and planning the remaining stages of the process.
2) Data collection – Measuring all the appropriate variables and recording details of the data collection procedure.
3) Initial examination of the data – Verifying the data, calculating summary statistics and producing plots in order to get a feel for its structure.
4) Feature selection / feature extraction – Selecting variables from the measured set that are appropriate for the given task. New variables may also be obtained by a linear or nonlinear transformation of the original set (feature extraction). To some extent, the division between feature extraction and classification is artificial.
5) Unsupervised pattern classification / clustering – This exploratory analysis of the data may provide a successful conclusion to the study, or it may act as preprocessing for supervised learning.
6) Applying discrimination or regression procedures as appropriate – Here the classifier is designed using the training set.
7) Assessment of results – The trained classifier is applied to an independent test set of labeled patterns.
8) Interpretation – Analyzing the results may pose further hypotheses that require further data collection.

This cycle can be terminated at different stages: the question posed may be answered by the initial examination of the data, or it may be discovered later that the data cannot answer the question, which then has to be reformulated.
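Step 5 of the procedure above, clustering as an exploratory or preprocessing stage, can be illustrated with k-means, the nearest-mean algorithm named earlier in this report. The sketch below is a plain alternating assign-and-update loop written for illustration. The function names, the sample points, the choice of Euclidean distance and the hand-picked initial means are all assumptions made for this example; the report does not specify how the means are initialized.

```python
import math


def nearest_mean_index(point, means):
    """Index of the mean closest to point (Euclidean distance)."""
    return min(range(len(means)), key=lambda i: math.dist(point, means[i]))


def k_means(points, initial_means, max_iterations=100):
    """Alternate between assigning points to their nearest mean and
    recomputing each mean, until no assignment changes."""
    means = [tuple(m) for m in initial_means]
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [nearest_mean_index(p, means) for p in points]
        if new_assignment == assignment:
            break  # no observation changed cluster, so the means are stable
        assignment = new_assignment
        for i in range(len(means)):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # keep the old mean if a cluster ends up empty
                means[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, assignment


# Made-up unlabeled observations forming two visibly separate groups.
POINTS = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
# Assumption: the number of clusters (two) and the starting means are given.
INITIAL_MEANS = [(1.0, 1.0), (8.0, 8.0)]

if __name__ == "__main__":
    final_means, labels = k_means(POINTS, INITIAL_MEANS)
    print(final_means)
    print(labels)
```

The cluster indices returned by k_means could then serve as provisional labels for the supervised stage in step 6. The result depends on the number of clusters and on the starting means, both of which are supplied by the caller here.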
Statistical pattern recognition approach

In this approach each pattern is represented by d features and is viewed as a point in a d-dimensional space. The aim is to choose features that allow pattern vectors belonging to different categories to occupy compact and disjoint regions of the d-dimensional feature space. How well patterns from different classes can be separated determines how effective the representation space is. Given training data from the different classes, the aim is to establish decision boundaries that separate the patterns belonging to different classes. In the statistical decision theory approach, the decision boundaries are determined by the probability distributions of the patterns belonging to each class, and these distributions must be either specified or learnt. The discriminant analysis approach can also be used for classification: we first specify a parametric form for the decision boundary and then choose the best boundary of that form based on the training patterns. Such boundaries can be constructed using, for example, the mean square error criterion.
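The statistical decision theory approach described above can be sketched for the simplest case. The example assumes a single feature (d = 1), a normal distribution for each class with its mean and variance learnt from the training data, and equal prior probabilities. None of these choices comes from the report, and the class names, sample values and function names are hypothetical. It illustrates the decision theory route only, not the discriminant analysis route with a mean square error criterion.

```python
import math
import statistics


def fit_gaussian(samples):
    """Learn the mean and variance of one class from its training samples."""
    return statistics.fmean(samples), statistics.pvariance(samples)


def log_density(x, mean, variance):
    """Log of the univariate normal density at x."""
    return -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)


def classify(x, models, priors):
    """Choose the class with the largest log prior plus log likelihood."""
    return max(models, key=lambda c: math.log(priors[c]) + log_density(x, *models[c]))


# Made-up one-dimensional training data for two classes.
TRAINING = {"low": [1.0, 2.0, 3.0], "high": [5.0, 6.0, 7.0]}
MODELS = {label: fit_gaussian(samples) for label, samples in TRAINING.items()}
PRIORS = {"low": 0.5, "high": 0.5}  # assumption: both classes equally likely

if __name__ == "__main__":
    for x in (3.0, 3.9, 4.1, 5.5):
        print(x, classify(x, MODELS, PRIORS))
```

With equal priors and equal learnt variances, as in this made-up data, the decision boundary lies midway between the two class means, at x = 4. Changing the priors or the variances moves the boundary, which is the sense in which it depends on the class distributions.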
According to Vapnik's philosophy, if only a restricted amount of information is available for solving some problem, one should try to solve that problem directly and never solve a more general problem as an intermediate step, because the available information may be sufficient for a direct solution but insufficient for the more general problem.

RESULTS & DISCUSSION

Pattern recognition is a field of study that has developed significantly since the 1960s. It has been very much an interdisciplinary subject, covering developments in the areas of statistics, engineering, artificial intelligence, computer science, psychology and physiology, among others. It has a huge number of applications, in bioinformatics, data mining, document classification, document image analysis, industrial automation, multimedia database retrieval, biometric recognition, remote sensing, speech recognition, medicine, the military and natural language processing.

CONCLUSION

We have seen how the performance of learning algorithms and some classifiers is measured, and we have analyzed the evaluation methods and the metrics they use by defining a formal framework. We conclude that the performance of classifiers is measured on the basis of classification accuracy.
Some methods can be used to evaluate classifiers or algorithms in general, while others are applicable only to a few algorithms. We have also seen how important pattern recognition is in the field of artificial intelligence. It is gaining importance because human beings have limits in recognizing patterns. The report also shows how the statistical approach covers the various stages of an investigation, from formulating the problem to interpreting the results.