Tuesday, January 31, 2012

Spelling Corrector Algorithm - Java Code

I was amazed at how Google does spelling correction so well. Type in a search like 'speling' and Google comes back in 0.1 seconds or so with Did you mean: spelling. Impressive!

In one of my past projects, I was working on a RealEstate infrastructure where we had to support spelling corrections for City/State searches. At that point, we used Solr's spelling corrector. But I wanted to dig into the algorithm and write something simple and works without Solr. Just did some research online, went through some papers publishings and wrote an algorithm that serves what we are getting from Solr.

The code maintains a static dictionary of correct spellings in Hashmap, which we load on-boot. Below is the algorithm written in Java.

It works in O(n), n being the length of the word.

Spelling Corrector - Java code

```package main.java.algo;

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// References
//  - http://norvig.com/spell-correct.html
//  - http://raelcunha.com/spell-correct.php
//  - http://surguy.net/articles/scala-spelling.xml

// Components

//- dictionary (pre populated), map with word to weight
//- edits : deletes, transpose, replace, inserts
//- candidates : loop edits and populate. map to weight to word

public class SpellingCorrector {

public static void main(String[] args) {

dictionary.put("spelling", 1);
dictionary.put("ashish", 1);
dictionary.put("param", 1);

String input = "spiling";
String correct = correct(input);
System.out.println(correct);
}

// word to count map - how may times a word is present - or a weight attached to a word
public static Map<String, Integer> dictionary = new HashMap<String, Integer>();

public static String correct(String word) {

if(dictionary.containsKey(word))
return word;

// getting all possible edits of input word
List<String> edits = edits(word);

// Here we can either iterate through list of edits and find the 1st word that is in dictionary and return.
// Or we can go to below logic to return the word with most weight (that we would have already stored in dictionary map)

// map to hold the possible matches
Map<Integer, String> candidates = new HashMap<Integer, String>();

// keep checking the dictionary and populate the possible matches
// it stores the count (or weight) of word and the actual word
for(String s : edits) {
if(dictionary.containsKey(s)) {
candidates.put(dictionary.get(s), s);
}
}

// one we have found possible matches - we want to return most popular (most weight) word
if(candidates.size() > 0)
return candidates.get(Collections.max(candidates.keySet()));

// If none matched.
// We will pick every word from edits and again do edits (using same previous logic) on that to see if anything matches
// We don't do recursion here because we don't the loop to continue infinitely if none matches
// If even after doing edits of edits, we don't find the correct word - then return.

for(String s : edits) {

List<String> newEdits = edits(s);
for(String ns : newEdits) {
if(dictionary.containsKey(ns)) {
candidates.put(dictionary.get(ns), ns);
}
}
}
if(candidates.size() > 0)
return candidates.get(Collections.max(candidates.keySet()));
else
return word;
}

// Word EDITS
// Getting all possible corrections c of a given word w.
// It is the edit distance between two words: the number of edits it would take to turn one into the other

public static List<String> edits(String word) {

if(word == null || word.isEmpty())
return null;

List<String> list = new ArrayList<String>();

String w = null;

// deletes (remove one letter)
for (int i = 0; i < word.length(); i++) {
w = word.substring(0, i) + word.substring(i + 1); // ith word skipped
}

// note here i is less than word.length() - 1
for (int i = 0; i < word.length() - 1; i++) { // < w.len -1 :: -1 because we swapped last 2 elements in go.
w = word.substring(0, i) + word.charAt(i + 1) + word.charAt(i) + word.substring(i + 2); // swapping ith and i+1'th char
}

// replace (change one letter to another)
for (int i = 0; i < word.length(); i++) {
for (char c = 'a'; c <= 'z'; c++) {
w = word.substring(0, i) + c + word.substring(i + 1); // replacing ith char with all possible alphabets
}
}

// note here i is less than and EQUAL to
for (int i = 0; i <= word.length(); i++) { // <= because we want to insert the new char at the end as well
for (char c = 'a'; c <= 'z'; c++) {
w = word.substring(0, i) + c + word.substring(i); // inserting new char at ith position
}
}

return list;
}

}

```

1. Thank you very much very helpful indeed

2. // one we have found possible matches - we want to return most popular (most weight) word
if(candidates.size() > 0)
return candidates.get(Collections.max(candidates.keySet()));

I think this step would serve better if you use the Bayes theorem to find which correction is more likely to occur instead of just assuming the most frequent word in the dictionary to be the correction.

3. Dictionary populated can be in anothe class.

4. This is the best lead I've come across on how to build a grammar checker.

5. Thank you for sharing such a nice story with us. I really enjoyed very much And please keep updating like this with this site.

Java Training

Java Training in Chennai Core Java Training in Chennai Core Java Training in Chennai

Java Online Training Java Online Training JavaEE Training in Chennai Java EE Training in Chennai

6. Great work, programming was very easily understand and more important coding are provided on this post and this is very valuable in my studies,all coding easily understand and develop more skills,thanks for sharing this post.
php training

7. Another great articles and very interesting to read,thanks for sharing that wonderful useful information,given programming coding was very excellent and i can easily observe all provided information.
sas

8. your blog is really helpful and useful it really helped me during coding process and it is useful thanks for sharing those information.

digital marketing training in chennai

9. Thank you for your post. This was really an appreciating one. You done a good job. Keep on blogging like this unique information with us.

SAS Training in Chennai

10. This only considers words at an edit distance of 1 from the original word, is that correct?

11. Great blog..Step by step explanation is too good to understand..Its very useful for me to understand..Keep on sharing..
SEO Training in Chennai

12. Wow amazing i saw the article with execution models you had posted. It was such informative. Really its a wonderful article. Thank you for sharing and please keep update like this type of article because i want to learn more relevant to this topic.

SAP Training in Chennai

13. Thank you for sharing the explanation which is related with spelling corrector. It is much essential for typing through the document. So please keep update like this.

SAP Training in Chennai

14. thanks for sharing wonderful information keep sharing more.
Software Testing Training in Chennai

15. Really very nice blog information for this one and more technical skills are improve,i like that kind of post.

SAP training in Chennai

16. Thanks for the good words! Really appreciated. Great post. I ve been commenting a lot on a few blogs recently, but I had nt thought about my approach until you brought it up.

SAP training in Chennai

17. All are saying the same thing repeatedly, but in your blog I had a chance to get some useful and unique information, I love your writing style very much, I would like to suggest your blog in my dude circle, so keep on updates.

SAP training in Chennai

18. Thanks you for sharing the unique content. you have done a great job. thank you for sharing such a unique content.
Seo Company in Chennai

19. Great and useful article. Creating content regularly is very tough. Your points are motivated me to move on.

SEO Company in Chennai

20. Excellent Article

21. how to do phrase and sentence spell check
I mean context-sensitive

22. Excellent Article...
AWS training in chennai

23. You can easily fix the sentence by correcting the sentence. This site professionally helps a variety of things that greatly appeals to writers who needs some assistance in the completion of their documents like correct my sentence grammar service.

24. I really appreciate the information shared above. It’s of great help. If someone want to learn Online (Virtual) instructor lead live training in TECHNOLOGY , kindly Contact MaxMunus
MaxMunus Offer World Class Virtual Instructor-led training on TECHNOLOGY. We have industry expert trainer. We provide Training Material and Software Support. MaxMunus has successfully conducted 1,00,000 + training in India, USA, UK, Australia, Switzerland, Qatar, Saudi Arabia, Bangladesh, Bahrain and UAE etc.
Pratik Shekhar
MaxMunus
E-mail: pratik@maxmunus.com
Ph:(0) +91 9066268701
www.MaxMunus.com

25. wow really superb you had posted one nice information through this. Definitely it will be useful for many people. So please keep update like this.
Open Span Training Institute in Chennai | Best Open Span Training in Velachery | Microsoft Excel Training in Chennai

26. Really it was an awesome article...very interesting to read..You have provided an nice article....Thanks for sharing..
Android Training in Chennai
Ios Training in Chennai

27. I really enjoyed your post..thanks for sharing fantastic post..Keep sharing your post regularly..
Java Training in Chennai | Web Designing Training Institute in Chennai | DotNet Training Institute in Chennai

28. Those guidelines additionally worked to become a good way to recognize that other people online have the identical fervor like mine to grasp great deal more around this condition.

java training in bangalore

29. Thanks for sharing this information, it helped me a lot in finding valuable resources for my career.
IEEE Project Center in Chennai | IEEE Project Center in Velachery

30. Needed to compose you a very little word to thank you yet again regarding the nice suggestions you’ve contributed here.

oracle training in Bangalore

31. Thank you for sharing such a nice and interesting blog with us. In your blog, I had a chance to get some useful and unique information. I would like to suggest your blog in my dude circle. Please keep on updates. Hope it might be much useful for us. Keep on updating...
IEEE Project Center in Chennai | IEEE Project Center in Velachery

32. Nice and good article.. it is very useful for me to learn and understand easily..
Robotics Project Center in Chennai | IEEE Robotics Project Center in Chennai

33. I found some useful information in your blog.
Oracle Training in Chennai
In your blog more technical skills are improve.Thanks for the infrmation.
Selenium Training in Chennai
thanks for the information.

34. Those guidelines additionally worked to become a good way to recognize that other people online have the identical fervor like mine to grasp great deal more around this condition.

oracletraininginbangalore

35. Nice and good article.. it is very useful for me to learn and understand easily.. thanks for sharing your valuable information and time.. please keep updating more.Cloud Computing Project Center in Chennai | IEEE Cloud Computing Projects in Velachery

36. I am so happy to read such a wonderful post..Thank you...Embedded Project Center in Chennai | Embedded Project Center in Velachery

37. Interesting post! This is really helpful for me. I like it! Thanks for sharing!
No.1 Image Processing Project Center in Chennai | No.1 Image Processing Project Center in Velachery

38. Hi, i really enjoyed to read your article.. i got clear idea through your views and ideas.. thanks for sharing your post..
Image Processing Project Center in Chennai | Image Processing Project Center in Velachery

39. Impressive blog with lovely information. really very useful article for us thanks for sharing such a wonderful blog...
Best Mobile Computing Project Center in Chennai | Mobile Computing Project Center in Velachery

40. Nice Article..Good Content... Share the Post...Thank u...
Austere Technologies |Internet Of Things

41. Awesome blog. Your articles really impressed for me, because of all information so nice and unique..
JAVA Summer Course training Institute in Chennai|JAVA Summer Course training Institute in St. Thomas Mount

42. Thanks for posting your interesting and informative post with useful content.keep updating
Summer Courses in Chennai | Summer Camp Training in Saidapet | Summer Classes in Saidapet

43. VERY INFORMATIVE BLOG. KEEP SHARING SUCH A GOOD ARTICLES.

Digital Transformation Services | Austere Technology Solutions

44. This is really great informative blog. Keep sharing.

Best Mobility Services | Austere Technology Solutions

45. Great article, really very helpful content you made. Thank you, keep sharing.

Best Software Company in USA | Austere Technology Solutions

46. Nice blog with excellent information. Thank you, keep sharing.

Best Cloud Solutions | Austere Technology Solutions

47. keep sharing your information regularly for my future reference. This content creates a new hope and inspiration with in me. Thanks for sharing article like this. Graphic Designing Summer Courses in Velachery | Graphic Designing Summer Courses in Chennai | Graphic Designing Summer Courses in Taramani

48. wow...nice blog, very helpful information. Thanks for sharing.

Best Software Security Services | Austere Technology Solutions

49. It is a one of the great discussion which is very essential for me as well. I must follow the handy discussion and sure that the content will be very useful to me as well. Keep it up. Multimedia Training Institute in Chennai | MatLab Training Institute in Chennai | PCB Training Institute in Chennai | Hardware & Networking Training in Chennai

50. You have done a great job, really the concept of big data was superb, its very interesting and easy to understand also.. Keep updating such a nice blog.. Linux Exam Center in Chennai | CCNA Exam Center in Chennai | CCNP Exam Center in Chennai | Tally ERP9 Exam Center in Chennai

51. Great Blog.. The information you shared is very effective for learners I have got some important suggestions from it, Thanks for sharing such a nice blog.
UIPath Exam Center in Chennai | Automation Anywhere Exam Center in Chennai | Blue Prism Exam Center in Chennai

52. Awesome Blog with informative concept. Really I feel happy to see this useful blog, Thanks for sharing such a nice blog. Linux Exam Center in Chennai | CCNA Exam Center in Chennai | CCNP Exam Center in Chennai

53. Awesome Blog with useful Concept. you shares such an Excellent Topic which I searched long days. Kindly Keep Blogging.. UIPath Certification Exam Center in Chennai | BluePrism Exam Center in Chennai | Automation Exam Center in Chennai

54. Needed to compose you a very little word to thank you yet again regarding the nice suggestions you’ve contributed here.

aws training in chennai

55. Awesome blog. Your articles really impressed for me, because of all information so nice and unique.
Android Project Center in Chennai | Android Project Center in Velachery

56. It’s great to come across a blog every once in a while that isn’t the same out of date rehashed material. Fantastic read.

Digital Marketing Training in Mumbai

Six Sigma Training in Dubai

Six Sigma Abu Dhabi

57. Very informative blog, really helpful. Thank you.

cs course eligibility | ISFS

58. Wow...Excellent informative blog, really helpful. Thank you.

Best CMA Training in hyd | ISFS

59. Excellent informative blog, keep for sharing.

Best System Integration services | Massil Technologies

60. Wow...Excellent informative blog. Thank you.

Best CA Training in hyderabad | ISFS

61. Very nice post here and thanks for it .I always like and such a super contents of these post.Excellent and very cool idea and great content of different kinds of the valuable information's.
Good discussion. Thank you.
Anexas
Six Sigma Training in Abu Dhabi
Six Sigma Training in Dammam

62. Best informative blog. Thank you.

cima courses in hyderabad | ISFS

63. Your article gives lots of information to me. Thanks for sharing.
acca course in hyderabad | ISFS

64. Amazing article. Your blog helped me to improve myself in many ways thanks for sharing this kind of wonderful informative blogs in live. I have bookmarked more article from this website. Such a nice blog you are providing ! Kindly Visit Us @ Best Travels in Madurai | Tours and Travels in Madurai

65. Good job in presenting the correct content with the clear explanation. The content looks real with valid information. Good Work

DevOps is currently a popular model currently organizations all over the world moving towards to it. Your post gave a clear idea about knowing the DevOps model and its importance.

Good to learn about DevOps at this time.

devops training in chennai | devops training in chennai with placement | devops training in chennai omr | devops training in velachery | devops training in chennai tambaram | devops institutes in chennai | devops certification in chennai | trending technologies list 2018

66. Thanks for sharing such a valuable information sap institute in Hyderabad

67. thanks for giving this useful informations
aws training center in chennai
aws training in chennai

68. Great Post with lots of useful informations. Excellent blog very much interesting...
SAP Training in Chennai | AWS Training in Chennai | SAP Training | AWS Training

69. Attend The Analytics Courses in Bangalore From ExcelR. Practical Analytics Courses in Bangalore Sessions With Assured Placement Support From Experienced Faculty. ExcelR Offers The Analytics Courses in Bangalore.
ExcelR Analytics Courses in Bangalore

70. Attend The Data Analytics Courses From ExcelR. Practical Data Analytics Courses Sessions With Assured Placement Support From Experienced Faculty. ExcelR Offers The Data Analytics Courses.
ExcelR Data Analytics Courses

71. Nice blog. I feel really happy to have seen your webpage and look forward to so many more entertaining times reading here. Thanks once more for all the details.
No:1
Dot Net Training Academy in kanchipuram

72. I am a new user of this site so here i saw multiple articles and posts posted by this site,I curious more interest in some of them hope you will give more information on this topics in your next articles.

73. I am glad that I saw this post. It is informative blog for us and we need this type of blog thanks for share this blog, Keep posting such instructional blogs and I am looking forward for your future posts.
Cyber Security Projects for Final Year

JavaScript Training in Chennai

Project Centers in Chennai

JavaScript Training in Chennai

74. do you think that this code is also suitable to use for converting the short form words into its full word?