Abstract-This paper surveys the literature on detecting phishing attacks. Phishing costs Internet users billions of dollars per year. It exploits weaknesses on the user side, which is vulnerable to such attacks. The phishing problem is broad, and no single solution mitigates all vulnerabilities effectively; thus multiple techniques are implemented.
In this paper, we discuss three approaches for detecting phishing websites. The first analyzes various features of the URL; the second checks the legitimacy of a website by determining where it is hosted and who manages it; the third uses visual-appearance-based analysis to check the genuineness of a website. We make use of machine learning techniques and algorithms to evaluate these different features of URLs and websites.
This paper presents an overview of these approaches.
Keywords- phishing, security, blacklist, whitelist, URL, anti-phishing, web-page
I. Introduction
Phishing is a social engineering attack that aims at exploiting the weakness found in the system at the user’s end. For example, a system may be technically secure against password theft, but an unaware user may leak his/her password when the attacker sends a false password-update request through a forged (phished) website. To address this issue, a layer of protection must be added on the user side. Due to the broad nature of the phishing problem, the survey begins by categorizing anti-phishing solutions as:
a) Using Blacklist approach: In computing, a blacklist or block list is a basic access control mechanism that allows through all elements (URLs, email addresses, users, domain names, etc.) except those explicitly mentioned. Items on the list are denied access.
b) Using Whitelist approach: This works exactly opposite to the blacklist approach: only the items on the list are allowed through whatever gate is in use.
c) Using Heuristic approach: In this approach, a signature database of known attacks is built and used by antivirus systems or intrusion detection systems to scan a web page. A website is considered a phishing website if its heuristic patterns match the signatures in the database.
d) Using Visual similarity approach: Generally, phishing websites have a visual appearance similar to that of their targets but different URLs.
A user can easily become a victim of a phishing attack because of the high visual resemblance of the phishing website to the target legitimate website in features such as page layout, images, text content, font colour, and font size.
An attacker fools the user in multiple ways:
1) Visual Appearance: The attacker copies the HTML source code of the legitimate site to build his own fake website. The fake website exactly resembles the genuine website.
2) Embedded Objects: The attacker uses embedded objects such as images and scripts to hide the textual content and HTML code from phishing detection techniques.
3) Address Bar: The attacker usually covers the URL with images or scripts so that the user cannot identify the URL.
4) Favicon: A favicon (also called a shortcut icon, website icon, tab icon, URL icon, or bookmark icon) is a file that contains one or more small icons. These icons are associated with a particular website.
If the favicon shown in the address bar does not belong to the current website, then it is considered a phishing attempt.
II. Literature survey
A. Blacklist approach:
a) In [11], Joby James, Sandhya L, and Ciza Thomas proposed the following:
1) Blacklist membership: A large percentage of phishing websites are present in blacklists. In the browsing context, blacklists are databases that contain the IP addresses, domain names and URLs of malicious websites that should be avoided, whereas whitelists contain sites that are safe.
2) DNS-based blacklists: Here the user submits a query representing the IP address or the domain name to the blacklist provider’s DNS server, and the response is an IP address that indicates whether the query was present in the blacklist.
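As an illustration of the DNS-based lookup described in 2), the following Python sketch builds the query name and interprets the answer. It assumes the common DNSBL convention in which an IPv4 address is queried with its octets reversed and prepended to the provider's zone, and in which any answer in 127.0.0.0/8 means that the entry is listed. The zone `dnsbl.example.org` and all function names are hypothetical and are not taken from the cited work.

```python
import ipaddress
import socket
from typing import Callable, Optional


def dnsbl_query_name(target: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address or a domain name."""
    try:
        ip = ipaddress.IPv4Address(target)
    except ValueError:
        # Not an IPv4 address: treat it as a domain name and prepend it as-is.
        return f"{target.rstrip('.').lower()}.{zone}"
    # IPv4 address: reverse the octets, as in a reverse-DNS lookup.
    reversed_octets = ".".join(reversed(str(ip).split(".")))
    return f"{reversed_octets}.{zone}"


def default_resolver(name: str) -> Optional[str]:
    """Resolve a name to an IPv4 string, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def is_listed(
    target: str,
    zone: str,
    resolver: Callable[[str], Optional[str]] = default_resolver,
) -> bool:
    """Return True if the blacklist zone answers with a 127.x.x.x address."""
    answer = resolver(dnsbl_query_name(target, zone))
    return answer is not None and answer.startswith("127.")
```

The resolver is passed in as a parameter so that the lookup logic can be exercised without network access; in practice `default_resolver` would be used against the zone of a real blacklist provider.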
b) Zheng developed a system in which individuals who chose to contribute data (blacklists) submitted them to a centralized sharing infrastructure. A single blacklist was then generated using ranking scores for the provided blacklists and other scores generated for each blacklist contributor.
c) In [12], Mohsen Sharifi and Seyed Hossein Siadati proposed a method that generates and maintains a list of phishing websites. It takes a website and extracts details from it, such as the name of the company to which it belongs. These details are then used as input to a search engine, which returns the top search results. The domain of the webpage is then compared with the domains in the search results to check whether it is legitimate.
The name of this page is automatically added to the list if it is identified as phishing.
d) Sheng conducted a study of the effectiveness of anti-phishing tools that work on the basis of blacklists. It was observed that the tools were ineffective in protecting users against fresh phishes at zero hour. It was found that these tools combined with some heuristics caught more phishes at zero hour than blacklists alone.
e) In [13], Pawan Prakash, Manish Kumar, Ramana Rao Kompella, and Minaxi Gupta (2010) proposed a predictive blacklist approach to detect phishing websites. It identifies new phishing URLs using heuristics and an appropriate matching algorithm. The heuristics create new URLs by combining parts of known phished websites from the available blacklist. The matching algorithm then calculates a score for a URL.
If this score is more than a given threshold value, the website is flagged as a phishing website. The score is evaluated by matching various parts of the URL against the URLs available in the blacklist.
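The threshold test on the match score can be sketched as follows. This is a simplified illustration, not the matching algorithm of the cited work: it assumes that a URL is split into four parts (host, directory, file name, query string), that each part found in some blacklisted URL contributes a fixed weight, and that the weights and the threshold of 50 are hypothetical values chosen only for the example.

```python
from urllib.parse import urlsplit

# Hypothetical weights (summing to 100) and threshold; assumptions for illustration.
WEIGHTS = {"host": 40, "directory": 30, "filename": 20, "query": 10}
THRESHOLD = 50


def url_parts(url: str) -> dict:
    """Split a URL into the four parts that are matched against the blacklist."""
    split = urlsplit(url)
    directory, _, filename = split.path.rpartition("/")
    return {
        "host": (split.hostname or "").lower(),
        "directory": directory,
        "filename": filename,
        "query": split.query,
    }


def match_score(url: str, blacklist: list) -> int:
    """Sum the weights of the URL parts that also occur in a blacklisted URL."""
    candidate = url_parts(url)
    known = [url_parts(entry) for entry in blacklist]
    score = 0
    for part, weight in WEIGHTS.items():
        value = candidate[part]
        # Empty parts (for example, a missing query string) never count as a match.
        if value and any(value == entry[part] for entry in known):
            score += weight
    return score


def is_phishing(url: str, blacklist: list, threshold: int = THRESHOLD) -> bool:
    """Flag the URL when its score is more than the threshold."""
    return match_score(url, blacklist) > threshold
```

Under these assumptions, a URL that reuses the host of one blacklisted entry and the directory of another exceeds the threshold even though the exact URL never appeared in the blacklist, which is the kind of recombined URL the predictive approach is meant to catch.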