Tuesday, January 31, 2012

Spelling Corrector Algorithm - Java Code

I was amazed at how Google does spelling correction so well. Type in a search like 'speling' and Google comes back in 0.1 seconds or so with Did you mean: spelling. Impressive!

In one of my past projects, I was working on a RealEstate infrastructure where we had to support spelling corrections for City/State searches. At that point, we used Solr's spelling corrector. But I wanted to dig into the algorithm and write something simple and works without Solr. Just did some research online, went through some papers publishings and wrote an algorithm that serves what we are getting from Solr.

The code maintains a static dictionary of correct spellings in Hashmap, which we load on-boot. Below is the algorithm written in Java.

It works in O(n), n being the length of the word.

Spelling Corrector - Java code


package main.java.algo;

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// References
//  - http://norvig.com/spell-correct.html
//  - http://raelcunha.com/spell-correct.php
//  - http://surguy.net/articles/scala-spelling.xml


// Components

//- dictionary (pre populated), map with word to weight
//- edits : deletes, transpose, replace, inserts
//- candidates : loop edits and populate. map to weight to word

public class SpellingCorrector {
 
 public static void main(String[] args) {

  dictionary.put("spelling", 1);
  dictionary.put("ashish", 1);
  dictionary.put("param", 1);
  
  String input = "spiling";
  String correct = correct(input);
  System.out.println(correct);
 }
 
 
 // word to count map - how may times a word is present - or a weight attached to a word
 public static Map<String, Integer> dictionary = new HashMap<String, Integer>();

 public static String correct(String word) {
  
  if(dictionary.containsKey(word))
   return word;
  
  // getting all possible edits of input word
  List<String> edits = edits(word);
  
  // Here we can either iterate through list of edits and find the 1st word that is in dictionary and return.
  // Or we can go to below logic to return the word with most weight (that we would have already stored in dictionary map)
  
  // map to hold the possible matches
  Map<Integer, String> candidates = new HashMap<Integer, String>();

  // keep checking the dictionary and populate the possible matches
  // it stores the count (or weight) of word and the actual word
  for(String s : edits) {
   if(dictionary.containsKey(s)) {
    candidates.put(dictionary.get(s), s);
   }
  }
  
  // one we have found possible matches - we want to return most popular (most weight) word
  if(candidates.size() > 0)
   return candidates.get(Collections.max(candidates.keySet()));
  
  
  // If none matched.
  // We will pick every word from edits and again do edits (using same previous logic) on that to see if anything matches
  // We don't do recursion here because we don't the loop to continue infinitely if none matches
  // If even after doing edits of edits, we don't find the correct word - then return.
  
  for(String s : edits) {
   
   List<String> newEdits = edits(s);
   for(String ns : newEdits) {
    if(dictionary.containsKey(ns)) {
     candidates.put(dictionary.get(ns), ns);
    }
   }
  }
  if(candidates.size() > 0)
   return candidates.get(Collections.max(candidates.keySet()));
  else 
   return word;
 }
 
 // Word EDITS
 // Getting all possible corrections c of a given word w. 
 // It is the edit distance between two words: the number of edits it would take to turn one into the other
 
 public static List<String> edits(String word) {
  
  if(word == null || word.isEmpty())
   return null;
  
  List<String> list = new ArrayList<String>();
  
  String w = null;
  
  // deletes (remove one letter)
  for (int i = 0; i < word.length(); i++) {
   w = word.substring(0, i) + word.substring(i + 1); // ith word skipped
   list.add(w);
  }
  
  // transpose (swap adjacent letters)
  // note here i is less than word.length() - 1
  for (int i = 0; i < word.length() - 1; i++) { // < w.len -1 :: -1 because we swapped last 2 elements in go.
   w = word.substring(0, i) + word.charAt(i + 1) + word.charAt(i) + word.substring(i + 2); // swapping ith and i+1'th char
   list.add(w);
  }
  
  // replace (change one letter to another)
  for (int i = 0; i < word.length(); i++) {
   for (char c = 'a'; c <= 'z'; c++) {
    w = word.substring(0, i) + c + word.substring(i + 1); // replacing ith char with all possible alphabets
    list.add(w);
   }
  }
  
  // insert (add a letter)
  // note here i is less than and EQUAL to
  for (int i = 0; i <= word.length(); i++) { // <= because we want to insert the new char at the end as well
   for (char c = 'a'; c <= 'z'; c++) {
    w = word.substring(0, i) + c + word.substring(i); // inserting new char at ith position
    list.add(w);
   }
  }
  
  return list;
 }
 
}

73 comments:

  1. Thank you very much very helpful indeed

    ReplyDelete
  2. // one we have found possible matches - we want to return most popular (most weight) word
    if(candidates.size() > 0)
    return candidates.get(Collections.max(candidates.keySet()));

    I think this step would serve better if you use the Bayes theorem to find which correction is more likely to occur instead of just assuming the most frequent word in the dictionary to be the correction.

    ReplyDelete
  3. Dictionary populated can be in anothe class.

    ReplyDelete
  4. Thank you for sharing such a nice story with us. I really enjoyed very much And please keep updating like this with this site.


    Java Training

    ReplyDelete
    Replies
    1. I have read your blog its very attractive and impressive. I like it your blog.

      Java Training in Chennai Core Java Training in Chennai Core Java Training in Chennai

      Java Online Training Java Online Training JavaEE Training in Chennai Java EE Training in Chennai

      Delete
  5. Great work, programming was very easily understand and more important coding are provided on this post and this is very valuable in my studies,all coding easily understand and develop more skills,thanks for sharing this post.
    php training

    ReplyDelete
  6. Another great articles and very interesting to read,thanks for sharing that wonderful useful information,given programming coding was very excellent and i can easily observe all provided information.
    sas

    ReplyDelete
  7. your blog is really helpful and useful it really helped me during coding process and it is useful thanks for sharing those information.

    digital marketing training in chennai

    ReplyDelete
  8. Thank you for your post. This was really an appreciating one. You done a good job. Keep on blogging like this unique information with us.

    SAS Training in Chennai

    ReplyDelete
  9. This only considers words at an edit distance of 1 from the original word, is that correct?

    ReplyDelete
  10. Great blog..Step by step explanation is too good to understand..Its very useful for me to understand..Keep on sharing..
    SEO Training in Chennai

    ReplyDelete
  11. Wow amazing i saw the article with execution models you had posted. It was such informative. Really its a wonderful article. Thank you for sharing and please keep update like this type of article because i want to learn more relevant to this topic.

    SAP Training in Chennai

    ReplyDelete
  12. Thank you for sharing the explanation which is related with spelling corrector. It is much essential for typing through the document. So please keep update like this.

    SAP Training in Chennai

    ReplyDelete
  13. thanks for sharing wonderful information keep sharing more.
    Software Testing Training in Chennai

    ReplyDelete

  14. Really very nice blog information for this one and more technical skills are improve,i like that kind of post.



    SAP training in Chennai

    ReplyDelete
  15. Thanks for the good words! Really appreciated. Great post. I ve been commenting a lot on a few blogs recently, but I had nt thought about my approach until you brought it up.

    SAP training in Chennai

    ReplyDelete

  16. All are saying the same thing repeatedly, but in your blog I had a chance to get some useful and unique information, I love your writing style very much, I would like to suggest your blog in my dude circle, so keep on updates.

    SAP training in Chennai

    ReplyDelete
  17. Thanks you for sharing the unique content. you have done a great job. thank you for sharing such a unique content.
    Seo Company in Chennai

    ReplyDelete
  18. Great and useful article. Creating content regularly is very tough. Your points are motivated me to move on.


    SEO Company in Chennai

    ReplyDelete
  19. how to do phrase and sentence spell check
    I mean context-sensitive

    ReplyDelete
  20. You can easily fix the sentence by correcting the sentence. This site professionally helps a variety of things that greatly appeals to writers who needs some assistance in the completion of their documents like correct my sentence grammar service.

    ReplyDelete
  21. I really appreciate the information shared above. It’s of great help. If someone want to learn Online (Virtual) instructor lead live training in TECHNOLOGY , kindly Contact MaxMunus
    MaxMunus Offer World Class Virtual Instructor-led training on TECHNOLOGY. We have industry expert trainer. We provide Training Material and Software Support. MaxMunus has successfully conducted 1,00,000 + training in India, USA, UK, Australia, Switzerland, Qatar, Saudi Arabia, Bangladesh, Bahrain and UAE etc.
    For Demo Contact us.
    Pratik Shekhar
    MaxMunus
    E-mail: pratik@maxmunus.com
    Ph:(0) +91 9066268701
    www.MaxMunus.com

    ReplyDelete
  22. wow really superb you had posted one nice information through this. Definitely it will be useful for many people. So please keep update like this.
    Open Span Training Institute in Chennai | Best Open Span Training in Velachery | Microsoft Excel Training in Chennai

    ReplyDelete
  23. Really it was an awesome article...very interesting to read..You have provided an nice article....Thanks for sharing..
    Android Training in Chennai
    Ios Training in Chennai

    ReplyDelete
  24. I really enjoyed your post..thanks for sharing fantastic post..Keep sharing your post regularly..
    Java Training in Chennai | Web Designing Training Institute in Chennai | DotNet Training Institute in Chennai

    ReplyDelete
  25. Those guidelines additionally worked to become a good way to recognize that other people online have the identical fervor like mine to grasp great deal more around this condition.

    java training in bangalore

    ReplyDelete
  26. Thanks for sharing this information, it helped me a lot in finding valuable resources for my career.
    IEEE Project Center in Chennai | IEEE Project Center in Velachery

    ReplyDelete
  27. Needed to compose you a very little word to thank you yet again regarding the nice suggestions you’ve contributed here.

    oracle training in Bangalore

    ReplyDelete
  28. Thank you for sharing such a nice and interesting blog with us. In your blog, I had a chance to get some useful and unique information. I would like to suggest your blog in my dude circle. Please keep on updates. Hope it might be much useful for us. Keep on updating...
    IEEE Project Center in Chennai | IEEE Project Center in Velachery

    ReplyDelete
  29. Nice and good article.. it is very useful for me to learn and understand easily..
    Robotics Project Center in Chennai | IEEE Robotics Project Center in Chennai

    ReplyDelete
  30. I found some useful information in your blog.
    Oracle Training in Chennai
    In your blog more technical skills are improve.Thanks for the infrmation.
    Selenium Training in Chennai
    thanks for the information.

    ReplyDelete
  31. Those guidelines additionally worked to become a good way to recognize that other people online have the identical fervor like mine to grasp great deal more around this condition.

    oracletraininginbangalore

    ReplyDelete
  32. Nice and good article.. it is very useful for me to learn and understand easily.. thanks for sharing your valuable information and time.. please keep updating more.Cloud Computing Project Center in Chennai | IEEE Cloud Computing Projects in Velachery

    ReplyDelete
  33. Hi, i really enjoyed to read your article.. i got clear idea through your views and ideas.. thanks for sharing your post..
    Image Processing Project Center in Chennai | Image Processing Project Center in Velachery

    ReplyDelete
  34. Impressive blog with lovely information. really very useful article for us thanks for sharing such a wonderful blog...
    Best Mobile Computing Project Center in Chennai | Mobile Computing Project Center in Velachery

    ReplyDelete
  35. Awesome blog. Your articles really impressed for me, because of all information so nice and unique..
    JAVA Summer Course training Institute in Chennai|JAVA Summer Course training Institute in St. Thomas Mount

    ReplyDelete
  36. Thanks for posting your interesting and informative post with useful content.keep updating
    Summer Courses in Chennai | Summer Camp Training in Saidapet | Summer Classes in Saidapet

    ReplyDelete
  37. Great article, really very helpful content you made. Thank you, keep sharing.

    Best Software Company in USA | Austere Technology Solutions

    ReplyDelete
  38. keep sharing your information regularly for my future reference. This content creates a new hope and inspiration with in me. Thanks for sharing article like this. Graphic Designing Summer Courses in Velachery | Graphic Designing Summer Courses in Chennai | Graphic Designing Summer Courses in Taramani

    ReplyDelete
  39. It is a one of the great discussion which is very essential for me as well. I must follow the handy discussion and sure that the content will be very useful to me as well. Keep it up. Multimedia Training Institute in Chennai | MatLab Training Institute in Chennai | PCB Training Institute in Chennai | Hardware & Networking Training in Chennai

    ReplyDelete
  40. You have done a great job, really the concept of big data was superb, its very interesting and easy to understand also.. Keep updating such a nice blog.. Linux Exam Center in Chennai | CCNA Exam Center in Chennai | CCNP Exam Center in Chennai | Tally ERP9 Exam Center in Chennai

    ReplyDelete
  41. Great Blog.. The information you shared is very effective for learners I have got some important suggestions from it, Thanks for sharing such a nice blog.
    UIPath Exam Center in Chennai | Automation Anywhere Exam Center in Chennai | Blue Prism Exam Center in Chennai

    ReplyDelete
  42. Awesome Blog with informative concept. Really I feel happy to see this useful blog, Thanks for sharing such a nice blog. Linux Exam Center in Chennai | CCNA Exam Center in Chennai | CCNP Exam Center in Chennai

    ReplyDelete
  43. Awesome Blog with useful Concept. you shares such an Excellent Topic which I searched long days. Kindly Keep Blogging.. UIPath Certification Exam Center in Chennai | BluePrism Exam Center in Chennai | Automation Exam Center in Chennai

    ReplyDelete
  44. Needed to compose you a very little word to thank you yet again regarding the nice suggestions you’ve contributed here.

    aws training in chennai

    advanced aws training in chennai

    ReplyDelete
  45. Awesome blog. Your articles really impressed for me, because of all information so nice and unique.
    Android Project Center in Chennai | Android Project Center in Velachery

    ReplyDelete
  46. It’s great to come across a blog every once in a while that isn’t the same out of date rehashed material. Fantastic read.

    Digital Marketing Training in Mumbai

    Six Sigma Training in Dubai

    Six Sigma Abu Dhabi

    ReplyDelete
  47. Wow...Excellent informative blog, really helpful. Thank you.

    Best CMA Training in hyd | ISFS

    ReplyDelete
  48. Very nice post here and thanks for it .I always like and such a super contents of these post.Excellent and very cool idea and great content of different kinds of the valuable information's.
    Good discussion. Thank you.
    Anexas
    Six Sigma Training in Abu Dhabi
    Six Sigma Training in Dammam
    Six Sigma Training in Riyadh

    ReplyDelete