Thursday, October 6, 2011

Inverted index - Lucene Data Structure

Inverted index

An inverted index is an index data structure storing a mapping from content, such as words, to its locations in a database file, or in a document or a set of documents. The purpose of an inverted index is to allow fast full-text searches, at the cost of increased processing when a document is added to the database. It is the most popular data structure used in search engines.

Inverted Index Example

T0 = "it is what it is", T1 = "what is it" and T2 = "it is a banana".
We have the following inverted file index, where the integers in {} refer to the documents (T0, T1, etc.) in which the term is present:

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

A term search for the terms "what", "is" and "it" would give
-> {0,1} intersect {0,1,2} intersect {0,1,2} = {0,1}.

With the same texts, we get the following full inverted index, where the pairs are document numbers and local word numbers. Like the document numbers, local word numbers also begin with zero. So, "banana": {(2, 3)} means the word "banana" is in the third document (T2), and it is the fourth word in that document (position 3).

"a": {(2, 2)}
"banana": {(2, 3)}
"is": {(0, 1), (0, 4), (1, 1), (2, 1)}
"it": {(0, 0), (0, 3), (1, 2), (2, 0)}
"what": {(0, 2), (1, 0)}

If we run a phrase search for "what is it", we get hits for all three words in both documents 0 and 1, but the terms occur consecutively only in document 1.

Inverted Index Applications

The inverted index data structure is a central component of a typical search engine indexing algorithm. A goal of a search engine implementation is to optimize the speed of the query: find the documents where word X occurs. First a forward index is developed, which stores lists of words per document; it is then inverted into an inverted index. Querying the forward index would require sequential iteration through each document and each word to verify a matching document. The time, memory, and processing resources to perform such a query are not always technically realistic. Instead of listing the words per document as in the forward index, the inverted index lists the documents per word.

With the inverted index created, the word-to-document mapping can be stored in a hash map, and a query can be resolved by jumping directly to a word's postings (via random access) in the inverted index.
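To make this concrete, here is a minimal, illustrative Java sketch (class and method names are my own, not Lucene's API) that builds a document-level inverted index and answers a multi-term query by intersecting posting sets:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InvertedIndexDemo {

	// builds word -> set of document ids, like the postings {0, 1, 2} above
	static Map<String, TreeSet<Integer>> buildIndex(String[] docs) {
		Map<String, TreeSet<Integer>> index = new HashMap<String, TreeSet<Integer>>();
		for (int docId = 0; docId < docs.length; docId++) {
			for (String word : docs[docId].split("\\s+")) {
				TreeSet<Integer> postings = index.get(word);
				if (postings == null) {
					postings = new TreeSet<Integer>();
					index.put(word, postings);
				}
				postings.add(docId);
			}
		}
		return index;
	}

	// intersects the posting sets of all query terms
	static Set<Integer> search(Map<String, TreeSet<Integer>> index, String... terms) {
		Set<Integer> result = null;
		for (String term : terms) {
			Set<Integer> postings = index.get(term);
			if (postings == null)
				return Collections.emptySet();
			if (result == null)
				result = new TreeSet<Integer>(postings);
			else
				result.retainAll(postings);
		}
		return result;
	}

	public static void main(String[] args) {
		String[] docs = { "it is what it is", "what is it", "it is a banana" };
		Map<String, TreeSet<Integer>> index = buildIndex(docs);
		System.out.println(search(index, "what", "is", "it")); // [0, 1]
	}
}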

References -
http://nlp.stanford.edu/IR-book/html/htmledition/a-first-take-at-building-an-inverted-index-1.html
http://en.wikipedia.org/wiki/Search_engine_indexing
http://en.wikipedia.org/wiki/Inverted_index
http://lucene.apache.org/java/2_3_2/fileformats.html

Thursday, September 29, 2011

Creating a Custom LRU Cache in Java

Caching is an extremely useful technique for improving performance. There are many commercial and open-source caching packages available for Java. These packages require setting up the whole cache infrastructure in the dev, QA, and prod environments and then hooking it into the code. This is fine, but sometimes we just want a quick and simple solution that addresses what we need.

In one of my past projects, we ran a Gallery promotion every few days, and under peak traffic the site started to melt because of read timeouts. One option was to use a cache tool like Memcached. But did we really need that? We quickly wrote a custom LRU cache that fit all the needs. The solution was well accepted and lives in production today.

Here I'll illustrate the construction of a custom LRU cache. I used LinkedHashMap's accessOrder and removeEldestEntry capabilities to implement the LRU behavior of the cache.

// Generics are used here for illustration; a static cache cannot use the class's type parameters.
// For real usage, make the cache static. This makes get, put, and everything else static as well.
// Also, the methods have to be synchronized, since we delete old elements and update entries by access order. Under multiple threads this would lead to race conditions.
public class LRUCache<K, V> {

 private final int CACHE_SIZE;
 private final int initialCapacity = 16;
 private final float loadFactor = 0.75F;

 public LRUCache(int size) {
  this.CACHE_SIZE = size;
 }

 // LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder)
 // accessOrder - to maintain in order of elements from least-recently accessed to most-recently. Invoking the put or get method results in an access.
 public LinkedHashMap<K, V> cache = new LinkedHashMap<K, V>(initialCapacity, loadFactor, true) {

  private static final long serialVersionUID = 1L;

  // The removeEldestEntry(Map.Entry) - is a method from LinkedHashMap, that should be overridden to impose a policy for removing OLD mappings automatically when new mappings are added to the map.
  // Returns true if this map should remove its eldest entry. This method is invoked by put and putAll after inserting a new entry into the map. 
  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
   boolean ifRemove = this.size() > CACHE_SIZE;
   return ifRemove;
  }
  
 };

 // Adds an entry to this cache. The new entry becomes the MRU (most recently used) entry. If an entry with the specified key already exists in the
 // cache, it is replaced by the new entry. If the cache is full, the LRU (least recently used) entry is removed from the cache.
 // it has to be synchronized, since we are deleting old elements and updating as per access order. Under multiple threads - it will be an issue.
 public synchronized void put(K key, V value) {
  if (value == null)
   return;
  else
   cache.put(key, value);
 }
 
 // Retrieves an entry from the cache. The retrieved entry becomes the MRU (most recently used) entry.
 public synchronized V get(K key) {
  return cache.get(key);
 }

 public synchronized void clear() {
  cache.clear();
 }
 
 
 // Test routine for the LRUCache class.
 public static void main(String[] args) {
  LRUCache<String, String> c = new LRUCache<String, String>(3);
  c.put("1", "one"); // 1
  c.put("2", "two"); // 2 1
  c.put("3", "three"); // 3 2 1
  c.put("4", "four"); // 4 3 2
  c.get("3");
  for (Map.Entry<String, String> e : c.cache.entrySet()) {
   System.out.println(e.getKey() + " : " + e.getValue());
  }
 }
}



Sunday, September 18, 2011

Breadth-First Binary Tree Traversal: Implementation & Complexity Analysis

In a breadth-first traversal, every node is enqueued exactly once and dequeued exactly once, and a constant amount of work is done per node.

Time Complexity - O ( n )
Space Complexity - O ( w ), where w is the maximum width of the tree - the queue holds at most one level at a time. For a full bottom level, w is about n/2, so the worst case is O ( n ).

Any tree traversal - Inorder, PreOrder, PostOrder, BFS, DFS - visits each node exactly once, so all of them run in O ( n ) time; they differ only in visiting order and in auxiliary space (O ( h ) for the depth-first variants, O ( w ) for BFS).
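The Node class imported by the traversal code below is not shown in these posts; here is a minimal sketch of what it presumably looks like (data is an Object so values like 0.5 fit, matching how the traversal code uses it):

package main.java.algo.datastructures.binarytree;

public class Node {

	public Object data;
	public Node left;
	public Node right;

	public Node(Object data) {
		this.data = data;
	}
}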

package main.java.algo.datastructures.binarytree.traversal;

import java.util.LinkedList;
import java.util.Queue;

import main.java.algo.datastructures.binarytree.Node;

public class BFSTreeTraversal {
	
	public void mirrorTreeWithOutRecursion(Node root) {

		if (root == null)
			return;

		Queue<Node> queue = new LinkedList<Node>();

		queue.add(root);

		Node current = null;

		while (!queue.isEmpty()) {
			current = queue.poll();
			
			// do the processing on a node
			process(current);

			if (current.left != null)
				queue.add(current.left);
			if (current.right != null)
				queue.add(current.right);
		}
		return;
	}
	
	// process
	// example - here we are processing to create mirror of a tree
	public void process(Node n) {
		Node temp = n.left;
		n.left = n.right;
		n.right = temp;
	}

}

Depth-First Binary Tree Traversal

Time Complexity - O ( n )
Space Complexity - O ( h )

Every node is pushed onto the stack exactly once and popped exactly once, and a constant amount of work is done per node, so the total work is O ( n ).
The stack never holds more than one root-to-leaf path (plus pending siblings), so the space is O ( h ): O ( log n ) for a balanced tree, O ( n ) in the worst case of a fully skewed tree.

Any tree traversal - Inorder, PreOrder, PostOrder, BFS, DFS - visits each node exactly once, so all of them run in O ( n ) time.


package main.java.algo.datastructures.binarytree.traversal;

import java.util.Stack;

import main.java.algo.datastructures.binarytree.Node;

public class DFSTreeTraversal {
	
	public void mirrorTreeWithOutRecursion(Node root) {

		if (root == null)
			return;

		Stack<Node> stack = new Stack<Node>();

		stack.push(root);

		Node current = null;

		while (!stack.isEmpty()) {
			current = stack.pop();
			
			// do the processing on a node
			process(current);

			if (current.left != null)
				stack.push(current.left);
			if (current.right != null)
				stack.push(current.right);
		}
		return;
	}
	
	// process
	// example - here we are processing to create mirror of a tree
	public void process(Node n) {
		Node temp = n.left;
		n.left = n.right;
		n.right = temp;
	}

}

Pre-Order Binary Tree Traversal

Time Complexity - O ( n )
Space Complexity - O ( h )

Each node is visited exactly once and a constant amount of work is done per visit, so the total work is O ( n ).
The recursion (or the explicit stack) goes at most as deep as the height of the tree, so the auxiliary space is O ( h ): O ( log n ) for a balanced tree, O ( n ) for a fully skewed one.

Any tree traversal - Inorder, PreOrder, PostOrder, BFS, DFS - visits each node exactly once, so all of them run in O ( n ) time.

package main.java.algo.datastructures.binarytree.traversal;

import java.util.Stack;
import main.java.algo.datastructures.binarytree.Node;

/**
 ** Given a binary search tree
       4 
      / \ 
     2   5 
    / \ 
   1   3
  / 
0.5
 
** Produces the output "4 2 1 0.5 3 5". 

** http://codepad.org/QEVUNzYU
**/

public class PreOrder {
	
	
	/** ************************** PreOrder : Recursive ************************** **/

	public void preOrderRecursive(Node node) {
		
		if (node == null)
			return;
		
		// first deal with the node
		System.out.print(node.data);

		// then recur on both subtrees
		preOrderRecursive(node.left);
		preOrderRecursive(node.right);

	}
	
	/** ************************** PreOrder : Iterative ************************** **/

	
	public void preOrderIterative(Node n) {
		
		// using a stack of nodes
		Stack<Node> s = new Stack<Node>();
		
		// what preOrderRecursive(n.left) does
		preOrderPushLeftChildsToStack(s, n);

		// what happens when preOrderRecursive(n.right) is called
		while (!s.isEmpty()) {
			
			Node t = s.pop();
			
			// preOrderRecursive(n.right) is called
			t = t.right;
			// what preOrderRecursive(n.left) does
			preOrderPushLeftChildsToStack(s, t);	
		}
	}
	
	// it does what preOrderRecursive(n.left) was doing
	// note - in preorder - the processing of a node is done as soon as it is discovered.
	public void preOrderPushLeftChildsToStack(Stack<Node> s, Node n) {
		while (n != null) {
			
			// processing on a node
			process(n);
			
			s.push(n);
			n = n.left;
		}
	}
	
	public void process(Node n) {
		System.out.println(n.data);
	}

}

Post-Order Binary Tree Traversal

Time Complexity - O ( n )
Space Complexity - O ( h )

Each node is visited exactly once and a constant amount of work is done per visit, so the total work is O ( n ).
The recursion (or the explicit stack) goes at most as deep as the height of the tree, so the auxiliary space is O ( h ): O ( log n ) for a balanced tree, O ( n ) for a fully skewed one.

Any tree traversal - Inorder, PreOrder, PostOrder, BFS, DFS - visits each node exactly once, so all of them run in O ( n ) time.

Given a binary tree, print out the nodes of the tree according to a
bottom-up "postorder" traversal -- both subtrees of a node are printed
out completely before the node itself is printed, and each left subtree
is printed before the right subtree.

package main.java.algo.datastructures.binarytree.traversal;

import java.util.Stack;

import main.java.algo.datastructures.binarytree.Node;

/**

 ** So the tree...
       4 
      / \ 
     2   5 
    / \ 
   1   3
  / 
0.5
 
** Produces the output "1 3 2 5 4" 
** 
** http://codepad.org/QEVUNzYU
**/

public class PostOrder {
	

	/** ************************** PostOrder : Recursive ************************** **/


	//Produces the output "1 3 2 5 4".
	public void postOrderRecursive(Node node) {
		
		if (node == null)
			return;

		// first recur on both subtrees
		postOrderRecursive(node.left);
		postOrderRecursive(node.right);

		// then deal with the node
		System.out.print(node.data);
	}
	
	
	/** ************************** PostOrder : Iterative ************************** **/

	/**
				       4 
				      / \ 
				     2   5 
				    / \ 
				   1   3
	   
	   
	 **/
	
	
	// Why do we need a VISITED flag in POST ORDER?
	
	// First remember that in post order, when we start reading from the stack, we peek first rather than pop as in inorder and preorder.
	
	//	Let's consider what happens if there were NO visited flag -
	//	
	//	1) After the 1st postOrderPushLeftChildsToStack. Stack -> 4,2,1
	//	2) Peek 1 - pushLeftChilds(1.right) - since 1 has no right child, the stack stays the same. Stack -> 4,2,1
	//	3) Pop 1, Process 1. Stack -> 4,2
	//	4) Now Stack -> 4,2. Peek 2 - pushLeftChilds(2.right) - added 3. Stack -> 4,2,3
	//	5) Pop 3, Process 3. Stack again becomes -> 4,2
	//	
	//	If we had not kept a visited flag, it would again go to step 4 and fall into an infinite loop of steps 4 and 5.
	//	By keeping the visited flag, nodes 1 and 3 are now marked visited. So when it goes to step 4 again after step 5,
	//	there is no unvisited child of 2 that could be added to the stack. So even after step 4, Stack -> 4,2.
	//	Then 2 is popped and the stack becomes Stack -> 4.
	//	
	//	Thus, with the visited flag, we avoid infinite loops, as in DFS graph traversal.
	 
	
	class BTNode {
		Object data;
		BTNode left;
		BTNode right;
		
		// post order is achieved with help of DFS, using a visited flag
		boolean visited;
		
	}

	public void postOrderIterative(BTNode n) {
		// using a stack of nodes
		Stack<BTNode> s = new Stack<BTNode>();
		
		// what postOrderRecursive(n.left) does
		postOrderPushLeftChildsToStack(s, n);

		// what happens when postOrderRecursive(n.right) is called
		while (!s.isEmpty()) {
			BTNode t = s.peek(); // Note it is peek, not pop.
			t = t.right;
			// what postOrderRecursive(n.left) does
			postOrderPushLeftChildsToStack(s, t);
			
			// processing on a node
			BTNode top = s.pop(); // the node to process is the one on top of the stack
			process(top);
		}
	}

	// it does what postOrderRecursive(n.left) was doing
	// pick only non-visited nodes next time. Once picked, mark them as visited.
	public void postOrderPushLeftChildsToStack(Stack<BTNode> s, BTNode n) {
		while (n != null && n.visited == false) {
			n.visited = true;
			s.push(n);
			n = n.left;
		}
	}
	
	
	
	public void process(Node n) {
		System.out.println(n.data);
	}
	
	public void process(BTNode n) {
		System.out.println(n.data);
	}
}

InOrder Binary Tree Traversal

Time Complexity - O ( n )
Space Complexity - O ( h )

For an element at depth d, we do d work to reach it before processing it, but over the whole traversal each node is pushed and popped exactly once.

Thus the total work is O ( n ), and the recursion (or the explicit stack) is at most as deep as the height of the tree: O ( h ) space, which is O ( log n ) for a balanced tree and O ( n ) for a fully skewed one.

Any tree traversal - Inorder, PreOrder, PostOrder, BFS, DFS - visits each node exactly once, so all of them run in O ( n ) time.


package main.java.algo.datastructures.binarytree.traversal;

import java.util.Stack;

import main.java.algo.datastructures.binarytree.Node;

/**
 ** Given a binary search tree (aka an "ordered binary tree"), 
 ** iterate over the nodes to print them out in increasing order. 
 ** 
 ** So the tree...
       4 
      / \ 
     2   5 
    / \ 
   1   3
  / 
0.5
 
** Produces the output "0.5 1 2 3 4 5". 
** 
** This is known as an "inorder" traversal of the tree.
** 
** http://codepad.org/QEVUNzYU
**/

public class InOrder {
		

	/** ************************** InOrder : Recursive ************************** **/

	
	//For each node, the strategy is: recur left, print the node data, recur right.
	public void inOrderRecursive(Node node) {
		
		if (node == null)
			return;
		 
		// left, node itself, right 

		if (node.left != null)
			inOrderRecursive(node.left);
		
		System.out.println(node.data);
		
		if (node.right != null)
			inOrderRecursive(node.right);

	}
	
	/** ************************** InOrder : Iterative ************************** **/

	
	public void inOrderIterative(Node n) {
		
		// using a stack of nodes
		Stack<Node> s = new Stack<Node>();
		
		// what inOrderRecursive(n.left) does
		inOrderPushLeftChildsToStack(s, n);

		// what happens when inOrderRecursive(n.right) is called
		while (!s.isEmpty()) {
			
			Node t = s.pop();
			
			// processing on a node
			process(t);
			
			// inOrderRecursive(n.right) is called
			t = t.right;
			// what inOrderRecursive(n.left) does
			inOrderPushLeftChildsToStack(s, t);	
		}
	}

	// it does what inOrderRecursive(n.left) was doing
	public void inOrderPushLeftChildsToStack(Stack<Node> s, Node n) {
		while (n != null) {
			s.push(n);
			n = n.left;
		}
	}
	
	public void process(Node n) {
		System.out.println(n.data);
	}

}

Stack - concepts & Java Implementation

A stack is a last-in-first-out (LIFO) data structure.
It supports push O(1), pop O(1), peek O(1), isEmpty O(1), and iteration O(n).
All stack operations except iteration run in constant time O(1).

package main.java.algo.datastructures;


/**************************************************
 **  A generic stack, implemented using a linked list.
 **************************************************/

import java.util.Iterator;
import java.util.NoSuchElementException;

public class Stack<T> implements Iterable<T> {
    
	// helper linked list class
    // Note - next pointer links to the element in direction OPPOSITE to which elements go out.
	// It is the next element to be moved out (popped or dequeued)
    // In Queue e.g. {1,2,3,4,5}, elements move out in <-, so next points to element on right 
    // In Stack e.g. {1,2,3,4,5}, elements move out in ->, next points to element on left
    private class Node {
        private T data;
        private Node next;
    }
	
	private int size;     // size of the stack
    private Node top;     // top of stack

    // constructor
    public Stack() {
        top = null;
        size = 0;
    }

    public boolean isEmpty() {
        return top == null;
    }

    public int size() {
        return size;
    }

   /****
     ** Add the item to the stack.
     **/
    public void push(T item) {        
        Node n = new Node();
        n.data = item;
        n.next = top;
        top = n;
        
        size++;
    }

   /****
     ** Delete and return the item most recently added to the stack.
     ** Throw an exception if no such item exists because the stack is empty.
     **/
	public T pop() {
		if (isEmpty())
			throw new RuntimeException("Stack underflow");
		T data = top.data; // save item to return
		top = top.next; // delete first node
		size--;
		return data; // return the saved item
	}


   /****
     ** Return the item most recently added to the stack.
     ** Throw an exception if no such item exists because the stack is empty.
     **/
	public T peek() {
		if (isEmpty())
			throw new RuntimeException("Stack underflow");
		return top.data;
	}

   /****
     ** Return string representation.
     **/
    public String toString() {
        StringBuilder s = new StringBuilder();
        for (Object item : this)
            s.append(item + " ");
        return s.toString();
    }
       

   /****
     ** Return an iterator to the stack that iterates through the items in LIFO order.
     **/
	public Iterator<T> iterator() {
		return new ListIterator();
	}

	// an iterator, doesn't implement remove() since it's optional
	private class ListIterator implements Iterator<T> {
		private Node current = top;

		public boolean hasNext() {
			return current != null;
		}

		public void remove() {
			throw new UnsupportedOperationException();
		}

		public T next() {
			if (!hasNext())
				throw new NoSuchElementException();
			T item = current.data;
			current = current.next;
			return item;
		}
	}
}
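A quick usage sketch of the class above (StackDemo is a hypothetical class, assumed to live in the same package so this custom Stack is used rather than java.util.Stack):

package main.java.algo.datastructures;

public class StackDemo {
	public static void main(String[] args) {
		Stack<String> s = new Stack<String>();
		s.push("a");
		s.push("b");
		s.push("c");
		System.out.println(s.pop());  // c - LIFO: the most recently pushed item comes out first
		System.out.println(s.peek()); // b - peek does not remove
		System.out.println(s);        // b a - iteration is in LIFO order
	}
}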

Queue - Concepts & Java Implementation

A queue is a first-in-first-out (FIFO) data structure.
It supports enqueue, dequeue, peeking at the front item, isEmpty, and iteration in FIFO order.
All queue operations except iteration run in constant time.

package main.java.algo.datastructures;

/**************************************
 **  A generic queue, implemented using a linked list.
 **************************************/

import java.util.Iterator;
import java.util.NoSuchElementException;

public class Queue<T> implements Iterable<T> {
    private int size;         // number of elements on queue
    private Node first;    // beginning of queue
    private Node last;     // end of queue

    // helper linked list class
    // Note - next pointer links to the element in direction OPPOSITE to which elements go out.
	// It is the next element to be moved out (popped or dequeued)
    // In Queue e.g. {1,2,3,4,5}, elements move out in <-, so next points to element on right 
    // In Stack e.g. {1,2,3,4,5}, elements move out in ->, next points to element on left

    private class Node {
        private T item;
        private Node next;
    }

    public Queue() {
        first = null;
        last  = null;
    }

    public boolean isEmpty() {
        return first == null;
    }

    public int size() {
        return size;     
    }

   /****
     ** Return the item least recently added to the queue.
     ** Throw an exception if the queue is empty.
     **/
	public T peek() {
		if (isEmpty())
			throw new RuntimeException("Queue underflow");
		return first.item;
	}

   /****
     ** Add the item to the queue.
     **/
	
    // In Queue e.g. {1,2,3,4,5}, elements move out in <-, so next points to element on right 
	public void enqueue(T item) {
		Node x = new Node();
		x.item = item;
		if (isEmpty()) {
			first = x;
			last = x;
		} else {
			last.next = x;
			last = x;
		}
		size++;
	}

   /****
     ** Remove and return the item on the queue least recently added.
     ** Throw an exception if the queue is empty.
     **/
	public T dequeue() {
		if (isEmpty())
			throw new RuntimeException("Queue underflow");
		T item = first.item;
		first = first.next;
		size--;
		if (isEmpty())
			last = null; // to avoid loitering
		return item;
	}

   /****
     ** Return string representation.
     **/
    public String toString() {
        StringBuilder s = new StringBuilder();
        for (T item : this)
            s.append(item + " ");
        return s.toString();
    } 
 

   /****
     ** Return an iterator that iterates over the items on the queue in FIFO order.
     **/
    public Iterator<T> iterator()  {
        return new ListIterator();  
    }

    // an iterator, doesn't implement remove() since it's optional
	private class ListIterator implements Iterator<T> {
		private Node current = first;

		public boolean hasNext() {
			return current != null;
		}

		public void remove() {
			throw new UnsupportedOperationException();
		}

		public T next() {
			if (!hasNext())
				throw new NoSuchElementException();
			T item = current.item;
			current = current.next;
			return item;
		}
	}
	
}
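A quick usage sketch of the class above (QueueDemo is a hypothetical class, assumed to live in the same package):

package main.java.algo.datastructures;

public class QueueDemo {
	public static void main(String[] args) {
		Queue<String> q = new Queue<String>();
		q.enqueue("a");
		q.enqueue("b");
		q.enqueue("c");
		System.out.println(q.dequeue()); // a - FIFO: the least recently added item comes out first
		System.out.println(q.peek());    // b - peek does not remove
		System.out.println(q);           // b c - iteration is in FIFO order
	}
}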



Linked List Implementation - Java




package main.java.algo.datastructures;

public class LinkedLists {

	Node header = null;

	// header node is created as the object is created.
	// so header will never be null
	public LinkedLists() {
		header = new Node();
	}

	public int size() {
		int size = 0;
		ListIterator itr = new ListIterator(header);
		itr.advance(); // skip the dummy header node so it is not counted
		while (itr.isvalid()) {
			itr.advance();
			size++;
		}
		return size;
	}

	public void insert(Object o) {
		if (o == null)
			return;
		Node n = new Node(o);
		n.next = header.next;
		header.next = n;
	}

	public void remove(Object o) {
		ListIterator itr = new ListIterator(header);
		Node prev = null;
		Node current = null;
		while (itr.isvalid()) {
			prev = itr.retrieveNode();
			itr.advance();
			current = itr.retrieveNode();
			if (current == null) // reached the end without a match
				break;
			if (current.data.equals(o)) {
				prev.next = current.next;
				break;
			}
		}
	}

	public boolean isEmpty() {
		return (header.next == null);
	}

	public void makeEmpty() {
		header.next = null;
	}

	public Object getFirst() {
		Node first = header.next;
		if (first != null)
			return first.data;
		return null;
	}

	public boolean find(Object o) {
		ListIterator itr = new ListIterator(header);
		while (itr.isvalid()) {
			itr.advance();
			Object data = itr.retrieveData();
			if (data != null && data.equals(o))
				return true;
		}
		return false;
	}

	public Object findPrevious(Object o) {
		ListIterator itr = new ListIterator(header);
		Object prev = null;
		while (itr.isvalid()) {
			prev = itr.retrieveData();
			itr.advance();
			Object data = itr.retrieveData();
			if (data != null && data.equals(o))
				return prev;
		}
		return null;
	}
}

/************Linked List Iterator**************/


class ListIterator {

	Node current = null;

	public ListIterator(Node n) {
		current = n;
	}

	public boolean isvalid() {
		return current != null;
	}

	public Object retrieveData() {
		if (isvalid())
			return current.data;
		return null;
	}

	public Node retrieveNode() {
		if (isvalid())
			return current;
		return null;
	}

	public void advance() {
		if (isvalid())
			current = current.next;
	}
}

/************ Linked List Node **************/

class Node {

	Object data = null;
	Node next = null;

	public Node() {
		this(null, null);
	}

	public Node(Object o) {
		this(o, null);
	}

	public Node(Object o, Node n) {
		this.data = o;
		this.next = n;
	}

}
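A quick usage sketch (LinkedListsDemo is a hypothetical class in the same package):

package main.java.algo.datastructures;

public class LinkedListsDemo {
	public static void main(String[] args) {
		LinkedLists list = new LinkedLists();
		list.insert("c");
		list.insert("b");
		list.insert("a"); // insert adds at the front, so the list is now a -> b -> c
		System.out.println(list.size());    // 3
		System.out.println(list.find("b")); // true
		list.remove("b");
		System.out.println(list.find("b")); // false
	}
}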

HashMap - Concepts & Implementation

It is internally an ARRAY of objects (each holding a key and a value),
and the index of the array at which an object is stored is determined by hashing.

Each slot of the array contains a link to a singly-linked list containing key-value pairs with the same hash.
New key-value pairs are added to the end of the list. The lookup algorithm searches through the list to find the matching key.
Initially the table slots contain nulls; a list is created when a value with a certain hash is added for the first time.

Collision - resolution by chaining
What if the hash function returns the same hash value for different keys? This effect is called a collision.
Each slot of the hash table contains a link to another data structure (i.e. a linked list), which stores
key-value pairs with the same hash. When a collision occurs, this data structure is searched for the key-value pair that matches the key.

Dynamic resizing
The basic underlying data structure used to store the hash table is an array.
The load factor is the ratio between the number of stored items and the array's size.
A hash table can either be of constant size or be dynamically resized when the load factor exceeds some threshold.

HashMap does not guarantee that the order of the map will remain constant over time.

COMPLEXITY of insertion, removal and lookup operations is constant on average - O(1)

Counting the number of words in a paragraph - a classic job for hashing.
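As a quick illustration of that word-count use case (using java.util.HashMap here, not the custom class below; WordCountDemo is a hypothetical name):

import java.util.HashMap;
import java.util.Map;

public class WordCountDemo {
	public static void main(String[] args) {
		String paragraph = "it is what it is";
		Map<String, Integer> counts = new HashMap<String, Integer>();
		// each word hashes to its bucket, so lookup and update are O(1) on average
		for (String word : paragraph.split("\\s+")) {
			Integer c = counts.get(word);
			counts.put(word, (c == null) ? 1 : c + 1);
		}
		System.out.println(counts); // e.g. {what=1, it=2, is=2}
	}
}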

package main.java.algo.datastructures;

// THIS CAN BE USED FOR HASHMAP / HASHTABLE / HASHSET
// NOTE - the additional features of HASHMAP / HASHTABLE / HASHSET are explained in the HashTable_HashMap_HashSet post

public class HashMap<K, V> {

	// underlying data structure is an array for hashtable/hashmap
	// it's just that the elements are stored at indexes calculated from the hashcode
	Entry<K, V>[] map;
	
	// this is for dynamic resizing - if array grew to its threshold - we expand it
	int threshold = 0;
	float loadFactor = 0.75f;
	int filledSize = 0;
	
	public static final int DEFAULT_CAPACITY = 128;
	
	// an array can only have an int-sized length,
	// and we are dynamically increasing the size. To avoid integer overflow (size > max integer value), we cap it.
	public static final int MAX_CAPACITY = Integer.MAX_VALUE;

	// if the size needs to grow beyond Integer.MAX_VALUE, then we should not use an array. Probably the next possible solution is to use a linked list as the underlying data structure.
	
	// How about achieving a LinkedHashMap - where order also matters?
	// Then we can use the same approach, but keep 2 pointers in the Entry object - next (the insertion-order link: next of the current tail points to the newly added element) and sibling (if the hashcode of a new element matches an existing one, we create a new object and point the sibling link to it).
	
	int tableSize = DEFAULT_CAPACITY;

	public HashMap() {
		map = new Entry[tableSize];
		threshold = (int) (tableSize * loadFactor);
	}

	public void put(K key, V value) throws Exception {
		
		if(filledSize == MAX_CAPACITY)
			throw new Exception("Array overflow Exception. Max array size reached");
		
		int hash = calcIndexForPut(key);
		// for the 1st entry on any hashcode index
		if (map[hash] == null) {
			map[hash] = new Entry<K,V>(key, value, null);
			filledSize++;
		}
		else {
			Entry<K, V> n = map[hash];
			//elements are stored as linked-lists, so traverse through the list to add the element
			while (n.next != null && !genericEquals(n.key, key)) {
				n = n.next;
			}
			// if the key was already present, just update the same key with new value
			if (genericEquals(n.key, key)) {
				n.value = value;
			} else {
				// this key was not present on hashcode - add it as a new element to the linked list
				n.next = new Entry<K, V>(key, value, null);
				filledSize++;
			}
		}
		//dynamic resizing
		if(filledSize > threshold)
			resize();
	}

	public V get(K key) {
		// elements are stored on indexes calculated by hashcode
		int hashIndex = calcIndexForGet(key);
		if(hashIndex == -1)
			return null;
		Entry<K, V> e = map[hashIndex];
		
		//no element on hascode index.
		if (e == null)
			return null;
		else {
			while (e.next != null && !genericEquals(e.key,key))
				e = e.next;
			if (genericEquals(e.key,key))
				return e.value;
		}
		return null;
	}

	public void remove(K key) {
		int hashIndex = calcIndexForGet(key);
		if(hashIndex == -1)
			return;
		Entry<K, V> e = map[hashIndex];
		if (e == null)
			return;
		else {
			// search for element
			// it is found - do prev.next = current.next;
			Entry<K, V> prevNode = null;
			
			// traverse till element is found
			while (e.next != null && !genericEquals(e.key,key)) {
				prevNode = e;
				e = e.next;
			}
			if (genericEquals(e.key,key)) {
				// if it was the first element in the linked list
				if (prevNode == null)
					map[hashIndex] = e.next;
				else // if somewhere inbetween the list
					prevNode.next = e.next;
				
				filledSize--;
				// note - when the size grows above the threshold, we resize by enlarging and re-copying the array. Shouldn't we also shrink when enough items are removed and the count falls below some threshold? That is trickier here, because the indexes would not stay the same if we re-copied around missing elements.
			}
		}
	}

	// The hash function is a very important part of hash table design. A hash function is considered good if it provides a uniform distribution of hash values.
	// It decides at which index of the array the key-value object is stored.
	
	// The underlying array initially has 128 slots, and each slot holds a chain of key-value pairs.
	// To fit elements into the current size we use the remainder of division by the table size.
	// The hashcode is masked to a non-negative value first, since Java hashcodes may be negative.
	private int calcIndexForPut(K key) {
		return (key.hashCode() & 0x7fffffff) % tableSize;
	}
	
	// hashmap allows null keys. So a real implementation needs special handling of the null key when calculating the hashcode.
	
	// put, get and remove must all agree on the index for a key, and resize() re-inserts
	// every entry using the new table size, so lookup can simply reuse the same
	// hash-to-index calculation as put.
	private int calcIndexForGet(K key) {
		return calcIndexForPut(key);
	}
	
	
	// How to check if two generic values are equal - is K key1 equal to K key2?
	// Note - equal hashcodes do NOT imply equal keys, so we must fall back to equals().
	private <T> boolean genericEquals(T t1, T t2) {
		if (t1 == t2)
			return true;
		return t1 != null && t1.equals(t2);
	}
	
	
	
	// an array that resizes itself as needed while still providing O(1) access
	// Each resize takes O(n) time, but happens so rarely that put/get/remove remain O(1) amortized

	public void resize() throws Exception {
		int newSize = tableSize + DEFAULT_CAPACITY;
		tableSize = newSize; // all index calculations must now use the new size
		threshold = (int) (newSize * loadFactor);
		// there is only this way - create a new array of the resized length and re-insert the old entries
		Entry<K, V>[] oldArray = map;
		map = new Entry[newSize];
		filledSize = 0; // put() re-increments the count as entries are re-inserted
		for (int i = 0; i < oldArray.length; i++) {
			// walk the whole chain in each slot - .next is reconstructed by put(),
			// since indexes change with the new table size
			for (Entry<K, V> e = oldArray[i]; e != null; e = e.next) {
				put(e.key, e.value);
			}
		}
	}
	
	// COMPLEXITY analysis - Dynamic resizing
	// resizing is done all at once, and the operation which triggers it takes O(n) time to complete, where n is the number of entries in the table.
	// This fact may make dynamically-sized hash tables inappropriate for real-time applications.

}


class Entry<K, V> {
	K key;
	V value;
	Entry<K, V> next;

	public Entry(K key, V value, Entry<K, V> next) {
		this.key = key;
		this.value = value;
		this.next = next;
	}

	public boolean equals(Object obj) {
		try {
			if (this == obj)
				return true;
			else if (obj == null || this.getClass() != obj.getClass())
				return false;
			Entry<K, V> o = (Entry<K, V>) obj;
			// Equals for generics - It is implemented by the actual object passed in place of  generic. E.g. If K is of type String. Then String.equals will be applicable here.
			if (this.key.equals(o.key) && (this.value.equals(o.value)))
				return true;
		} catch (Exception e) {
			return false;
		}
		return false;
	}


	public int hashCode() {
		if (this.key == null)
			return 0;
		int hash = 7;
		hash = (31 * hash) + this.key.hashCode();
		hash = (31 * hash) + this.value.hashCode();
		return hash;
	}

}
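A quick usage sketch of the custom map (HashMapDemo is a hypothetical class in the same package, so this HashMap shadows java.util.HashMap):

package main.java.algo.datastructures;

public class HashMapDemo {
	public static void main(String[] args) throws Exception {
		HashMap<String, Integer> ages = new HashMap<String, Integer>();
		ages.put("alice", 30);
		ages.put("bob", 25);
		ages.put("alice", 31); // same key - the value is updated in place
		System.out.println(ages.get("alice")); // 31
		ages.remove("bob");
		System.out.println(ages.get("bob"));   // null
	}
}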

Graph - Breadth First Search



package main.java.algo.datastructures.graphs;

import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

public class BreadthFirstSearch {
	
	
	public void doBFS(Graph g, GNode root) {

		GNode n = root;
		List<GNode> adjacents = null;
		
		// using a queue
		// Note in Java - Queue is an Interface and not a class like Stack. LinkedList internally implements it.
		// so this is how we use a queue
		// Queue provides add and poll methods for enqueue and dequeue respectively
		Queue<GNode> q = new LinkedList<GNode>();
		// in BFS - we process the siblings before the children. Example - process nodes at the same depth in a tree before going to the next level.
		
		q.add(n);
		
		while (!q.isEmpty()) {
			
			// dequeue
			n = q.poll();
			if (n == null)
				continue;
			
			adjacents = g.getAdjacents(n); // get all nodes that are linked to n. This method getAdjacents() is implemented in my Graph class
			
			for (GNode node : adjacents) {
				
				// In a graph, pick an element only once, to avoid cycles and loops
				if (node.state == State.UNVISITED) {
					
					// once picked, mark it as visited and put it in the queue for processing
					node.state = State.VISITED;
					q.add(node);

					//if we want to process an edge
					processEdge(n, node);
				}
			}
			
			// we have picked all the connecting edges of vertex n. Now we can process the vertex itself.
			// we are done with that vertex, so we mark it as processed.
			processVertex(n);
			n.state = State.PROCESSED;
		}
	}

	public void processEdge(GNode a, GNode b) {
		System.out.println("Edge processd in : " + a.toString() + ", " + b.toString());
	}
	
	public void processVertex(GNode n) {
		System.out.println("Vertex processd : " + n.toString());
	}

}

Graphs - Depth First Search



package main.java.algo.datastructures.graphs;

import java.util.List;
import java.util.Stack;

public class DepthFirstSearch {

	public void doDFS(Graph g, GNode root) {

		GNode n = root;
		List<GNode> adjacents = null;
		
		//using a stack
		Stack<GNode> s = new Stack<GNode>();
		// with a stack - we drill down an edge first - finishing a whole leg before picking up the siblings
		// in BFS - we use a Queue (LinkedList internally) instead. The stack here plays the role of the recursion stack in a recursive DFS: it grows with the depth of the search, while the BFS queue grows with the width of the frontier.
		
		s.push(n);
		
		while (!s.isEmpty()) {
			
			n = s.pop();
			if (n == null)
				continue;
			
			adjacents = g.getAdjacents(n); // get all nodes that are linked to n. This method getAdjacents() is implemented in my Graph class
			
			for (GNode node : adjacents) {
				
				// In a graph, pick an element only once, to avoid cycles and loops
				if (node.state == State.UNVISITED) {
					
					// once picked, mark it as visited and put it in the stack for processing
					node.state = State.VISITED;
					s.push(node);
					
					//if we want to process an edge
					processEdge(n, node);
					
				}
			}
			// we have picked all the connecting edges of vertex n. Now we can process the vertex itself.
			// we are done with that vertex, so we mark it as processed.
			processVertex(n);
			n.state = State.PROCESSED;
		}
	}

	// this can be used to process edge - like check if 2 nodes are connected
	public void processEdge(GNode a, GNode b) {
		System.out.println("Edge processd in : " + a.toString() + ", " + b.toString());
	}
	
	public void processVertex(GNode n) {
		System.out.println("Vertex processd : " + n.toString());
	}

}

Graph Implementation


package main.java.algo.datastructures.graphs;

import java.util.ArrayList;
import java.util.List;

public class Graph {

	public GNode root;
	public ArrayList<GNode> nodes = new ArrayList<GNode>();
	public int[][] adjMatrix;
	public final int size;

	public Graph(int size) {
		this.size = size;
		adjMatrix = new int[size][size];
	}

	public void add(Object val) {
		GNode n = new GNode(val);
		nodes.add(n);
	}

	public void connect(GNode a, GNode b) {

		int aIndex = nodes.indexOf(a);
		int bIndex = nodes.indexOf(b);

		if (aIndex == -1 || bIndex == -1)
			return;

		adjMatrix[aIndex][bIndex] = 1;
		adjMatrix[bIndex][aIndex] = 1;

	}

	public void removeConnection(GNode a, GNode b) {

		int aIndex = nodes.indexOf(a);
		int bIndex = nodes.indexOf(b);

		if (aIndex == -1 || bIndex == -1)
			return;

		adjMatrix[aIndex][bIndex] = 0;
		adjMatrix[bIndex][aIndex] = 0;

	}

	public List<GNode> getAdjacents(GNode n) {
		ArrayList<GNode> list = new ArrayList<GNode>();
		
		int nIdx = nodes.indexOf(n);
		
		int[] adjacentsOfN = adjMatrix[nIdx];

		// a 1 in column i of n's row means the node at index i is adjacent to n
		for (int i = 0; i < adjacentsOfN.length; i++) {
			if (adjacentsOfN[i] == 1)
				list.add(nodes.get(i));
		}

		return list;

	}

}

// a graph node does not have next or left/right pointers
// adjacency is controlled by the adjacency matrix in Graph
class GNode {

	public Object data;
	public State state;

	public GNode(Object o) {
		data = o;
		state = State.UNVISITED;
	}

}

enum State {
	UNVISITED, VISITED, PROCESSED;
}
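A quick usage sketch tying the three classes together (GraphDemo is a hypothetical class in the same package):

package main.java.algo.datastructures.graphs;

public class GraphDemo {
	public static void main(String[] args) {
		Graph g = new Graph(3);
		g.add("A");
		g.add("B");
		g.add("C");

		GNode a = g.nodes.get(0);
		GNode b = g.nodes.get(1);
		GNode c = g.nodes.get(2);

		g.connect(a, b); // A - B
		g.connect(b, c); // B - C

		// visits A, then B, then C, printing each edge and vertex as it is processed
		new BreadthFirstSearch().doBFS(g, a);
	}
}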

Quick Sort - Algorithm & Implementation

Algorithm

The divide-and-conquer strategy is used in quicksort.

1) Find Pivot:
We take the VALUE of the middle element as the pivot value, but it can be any
value that is in the range of the values being sorted, even if it is not present in
the array. It is the value of an element, not an index (because indexes keep changing as we move items to the left and right of the pivot).

2) Partition.
Rearrange the elements in such a way that all elements
less than the pivot go to the left part of the array and all
elements greater than the pivot go to the right part of the array.
Values equal to the pivot can stay in either part. Notice that the
array may be divided into non-equal parts.

3) Sort both parts.
Apply quicksort algorithm recursively to the left and the right parts.

package main.java.algo.sorting;

public class QuickSort {

	public static void main(String[] args) {
		
		int[] array = { 1, 12, 5, 26, 0, 14, 3, 7, 2 };
		
		new QuickSort().quickSort(array);
		
		for (int a : array) {
			System.out.println(a);
		}
	}

	public void quickSort(int[] arr) {
		quickSort(arr, 0, arr.length - 1);
	}

	void quickSort(int a[], int startIdx, int endIdx) {
		
		int lIdx = startIdx;
		int rIdx = endIdx;
		int pivotElement;

		// this is the base case of recursion on which the recursion looping exits
		if (endIdx < startIdx)
			return;

		// Note - we are not changing startIdx and endIdx. lIdx and rIdx are being changed when things are moved around. 
		// startIdx and endIdx remain constant and are used to check the start and end of array.
		
		// Arbitrarily establishing partition element as the midpoint of the array.
		 
		pivotElement = a[(startIdx + endIdx) / 2];

		// loop through the array until indices cross
		while (lIdx <= rIdx) {
			/**
			 ** find the first element that is greater than or equal to the
			 ** partition element starting from the left Index.
			 **/
			// note - here lIdx is checked against endIdx (not pivotIdx - because we compare pivot value and pivot position keeps on changing)

			while ((lIdx < endIdx) && (a[lIdx] < pivotElement))
				++lIdx;

			/**
			 ** find an element that is smaller than or equal to the partition
			 ** element starting from the right Index.
			 **/
			// note - here rIdx is checked against startIdx (not pivotIdx - because we compare pivot value and pivot position keeps on changing)
			while ((rIdx > startIdx) && (a[rIdx] > pivotElement))
				--rIdx;

			// if the indexes have not crossed, swap
			// we kept = condition because we want left and right to increment & cross each other so that while condition can be exited
			if (lIdx <= rIdx) {
				swap(a, lIdx, rIdx);
				++lIdx;
				--rIdx;
			}
		}

		/**
		 ** If the right index has not reached the left side of array must now
		 ** sort the left partition.
		 **/
		if (startIdx < rIdx)
			quickSort(a, startIdx, rIdx);

		/**
		 ** If the left index has not reached the right side of array must now
		 ** sort the right partition.
		 **/
		if (lIdx < endIdx)
			quickSort(a, lIdx, endIdx);

	}
	
	private void swap(int[] array, int idx1, int idx2) {
		int tmp = array[idx1];
		array[idx1] = array[idx2];
		array[idx2] = tmp;
	}

}


// heap sort & quick sort - which is better? - no direct answer
// heap sort cannot be used on sequential-access data like a linked list; quick sort can be - but performs poorly. Best for sequential data is merge sort.
// quick sort uses O(log n) space. Merge sort uses O(n) space.


COMPLEXITY ANALYSIS

Quicksort is a divide and conquer algorithm. In a divide and conquer sorting algorithm the original data is
separated into two parts (divide) which are individually sorted (conquered) and then combined.

Time Complexity - O(n log n)

- In one pass, elements are arranged to the left and right of the pivot. This takes O(n).
- Since it is divide and conquer, the recursion goes about log n levels deep.
- Complexity = (work per level) x (number of levels) = O(n) x O(log n) = O(n log n)

Space complexity - O(log n)

- The partitioning logic as written here requires O(1) extra space.
- The stack frames created for the recursion add up: the recursion goes about log n levels deep, so O(log n) stack space is used.
- Quicksort's sequential and localized memory references work well with a cache.

Best Case - O(n log n)

- Proof sketch:
- Occurs when the pivot divides the array into 2 equal parts. If the array contains n elements, the first pass needs O(n).
- Sorting the remaining two sub-arrays takes 2 * T(n/2).
- Continuing likewise, the kth step costs 2^k * T(n/2^k).
- This unrolls to a performance of O(n log n). (There is a mathematical proof.)

Worst case - O(n^2)

- Occurs when the pivot is the smallest element, dividing the array into parts of size 1 and n-1.
- Proof sketch:
- The worst case occurs when the elements are in such an order that every pivot picked (this also depends on the pivot logic) turns out to be the smallest element.
- In that case, only one element is split off in each pass, and the recursion must be called again for the n-1 elements to the pivot's right.
- So the cost is O(n) + O(n-1) + O(n-2) + ... + O(1), which is O(n^2). (There is a mathematical proof.)
- This behavior is very rare, so we don't usually consider it when we talk about quicksort's complexity in general.
- To avoid any such worst-case occurrence, many people prefer to apply quicksort to a randomly shuffled array.

Average Case - O(n log n)

- On a randomly shuffled array, it gives O(n log n). (There is a mathematical proof.)

- Quicksort is often faster in practice than other O(n log n) algorithms.

Merge Sort - Algorithm & Implementation

First it divides the data set in half, and sorts each half separately.
Next, the first elements of the two lists are compared.
The lesser element is removed from its list and added to the final result list.

Why divide & conquer ?
1) A small list will take fewer steps to sort than a large list.
2) Fewer steps are required to construct a sorted list from two sorted lists than from two unsorted lists.

package main.java.algo.sorting;

public class MergeSort {
	
	public static void main(String[] args) {
		
		int[] data = {18,  6,  9,  1,  4, 15, 12,  5,  6,  7, 11};

		
		MergeSort m = new MergeSort();
		int[] sorted = m.mergeSort(data);
		for(int a: sorted) {
			System.out.println(a);
		}		
	}
	
	/***
	 *** answer[] = ms ({18,  6,  9,  1,  4, 15, 12,  5,  6,  7, 11})

				left[] = ms ({18,  6,  9,  1,  4})
							
							left[] = ms ({18,  6})
								
									left[] = ms({18}) = 18
									right[]	= ms({6}) = 6
									merge(left,right) = merge(18,6) 
									= {6,18}
							
							right[] = ms ({9,1,4})
									
									left[] = ms({9}) = 9
									right[]	= ms({1,4})
											
												left[] = ms ({1}) = 1
												right[] ms ({4}) = 4
												merge(left,right) = merge(1,4) 
												= {1,4}
											
									merge(left,right) = merge({9},{1,4}) 
									= {1,4,9}
									
							merge(left,right) = merge({6,18},{1,4,9}) 
							={1,4,6,9,18}
							
				right[] = ms ({15, 12,  5,  6,  7, 11})
							....
							= {5,6,7,11,12,15}
							
				merge(left,right) = merge({1,4,6,9,18},{5,6,7,11,12,15}) 
				
				= {1,4,5,6,7,9,11,12,15,18}
				
				
	 *** 
	 *** ***/

	public int[] mergeSort(int[] data) {
		// since it is recursion - always remember to write the base case
		if(data == null || data.length == 1)
			return data;
		
		int middleIdx = data.length/2;
		
		//start-index : inclusive, end-index : exclusive
		//Divide array into left & right till we reach the smallest array (with only one element)
		int left[] = mergeSort(subArray(data, 0, middleIdx));
		int right[] = mergeSort(subArray(data, middleIdx, data.length));
		
		//consider the array represented as a binary tree
		//then at every level - sort using merge sort.
		//when the recursion moves backwards - we sort smallest groups, merging them into bigger and so on.. till we get the elements in the input array.
		int res[] =  merge(left, right);
		for(int a: res) {
			System.out.println(a);
		}
		System.out.println("===============");
		return res;
	}
	
	/***
	 *** We merge from bottom to top, means smaller sub arrays gets merged (and sorted).
	 *** On the very 1st merge(..) call - there will be only one element in each input array. left={18} and right={6} gives sorted {6,18}, and so on..
	 *** So left[] and right[] params at any point are sorted in itself and we need to merge them.
	 *** Thus, the smallest element among left[] and right[] - must sit at top of one of the lists. 
	 *** It can be removed and put into result[]. The 2nd smallest again lies in the top position in one of the left or right arrays. 
	 *** After removing elements, if one list is exhausted - blindly copy the elements from the other list to the result.
	 ***/
	private int[] merge(int[] left, int[] right) {

		if (left == null || left.length == 0)
			return right;
		else if (right == null || right.length == 0)
			return left;

		int resultSize = left.length + right.length;
		int[] result = new int[resultSize];
		int lIdx = 0, rIdx = 0, rsltIdx = 0;
		while (rsltIdx < resultSize) {
			if (rIdx == right.length) {
				result[rsltIdx] = left[lIdx];
				lIdx++;
			} else if (lIdx == left.length) {
				result[rsltIdx] = right[rIdx];
				rIdx++;
			} else if (left[lIdx] < right[rIdx]) {
				result[rsltIdx] = left[lIdx];
				lIdx++;
			} else if (right[rIdx] < left[lIdx]) {
				result[rsltIdx] = right[rIdx];
				rIdx++;
			} else {
				//same elements on top in left & right
				//moving only one element
				result[rsltIdx] = left[lIdx];
				lIdx++;
			}
			rsltIdx++;
		}
		return result;
	}
	
	
	//start-index : inclusive, end-index : exclusive
	private int[] subArray(int[] data, int startIdx, int endIdx) {
		int[] subArray = new int[endIdx - startIdx];
		//populating the sub-array
		//dataIdx < endIdx : since end-index is exclusive
		for (int dataIdx = startIdx, subArrayIdx = 0; dataIdx < endIdx; dataIdx++, subArrayIdx++) {
			subArray[subArrayIdx] = data[dataIdx];
		}
		return subArray;
	}
	
	
	
}

// Sorting in-place - is possible but complicated -- use a linked list instead of an array


/***
 *** COMPLEXITY ANALYSIS
 *** 
 *** MergeSort is a divide and conquer algorithm. 
 *** 
 *** If the running time of merge sort for a list of length n is T(n), then the recurrence is T(n) = 2T(n/2) + O(n), which unrolls to 2^k T(n/2^k) plus O(n) work per level
 *** 
 *** Time Complexity - O(n log n)
 *** 
 ***  	- The recursion goes log n levels deep -- O(log n)
 *** 		- At every level, the merge function is called to merge a total of n elements (pieces of size [k] and [n-k]) -- O(n)
 *** 		- Total = O(n) x O(log n) = O(n log n)
 *** 
 *** Space complexity = O(n) 
 *** 
 *** 		- O(n) - used in merge function for the result array
 *** 		- O(log n) - used in stack for log n recurrsive calls
 *** 		- O(n) + O(log n)  =  O(n) 
 *** 	
 *** Best Case = O(n log n) 
 *** 
 *** 
 *** Worst case = O(n log n) 
 *** 
 *** 
 *** Average Case - O(n log n) 
 *** 

 *** 
 *** ***/



**********************************************************************************

Comparisons
heapsort has the same time complexity as merge sort, but requires O(1) space for sorting [once the heap is created], as compared to O(n), and can be faster in practice.

**********************************************************************************

Sorting Linked list

- Merge sort is best for any sequential data (lists, hard disks)
- A linked-list merge sort can be implemented so that it requires only O(1) extra space.
- Because of the slow random access of linked list nodes -
- quick sort - performs poorly
- heap sort - impossible

**********************************************************************************

Online Sorting

Merge sort's merge operation is useful in online sorting, where the list to be sorted is received a piece at a time, instead of all at the beginning.
In this application, we sort each new piece that is received using any sorting algorithm, and then merge it into our sorted list so far using the merge operation.

However, this approach can be expensive in time and space if the received pieces are small compared to the sorted list — a better approach in this case is to
store the list in a self-balancing binary search tree and add elements to it as they are received.
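As a small illustrative sketch of the merge-based approach (my own example; mergeInto and OnlineMergeDemo are hypothetical names):

package main.java.algo.sorting;

import java.util.Arrays;

public class OnlineMergeDemo {

	// merges a freshly received batch into the already-sorted list so far
	static int[] mergeInto(int[] sortedSoFar, int[] batch) {
		Arrays.sort(batch); // sort the new piece with any sorting algorithm
		int[] result = new int[sortedSoFar.length + batch.length];
		int i = 0, j = 0, k = 0;
		// the standard merge step of merge sort
		while (i < sortedSoFar.length && j < batch.length)
			result[k++] = (sortedSoFar[i] <= batch[j]) ? sortedSoFar[i++] : batch[j++];
		while (i < sortedSoFar.length)
			result[k++] = sortedSoFar[i++];
		while (j < batch.length)
			result[k++] = batch[j++];
		return result;
	}

	public static void main(String[] args) {
		int[] sorted = {};
		sorted = mergeInto(sorted, new int[] { 9, 1, 4 });
		sorted = mergeInto(sorted, new int[] { 7, 2 });
		System.out.println(Arrays.toString(sorted)); // [1, 2, 4, 7, 9]
	}
}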

**********************************************************************************

Merge sort parallelizes better; the most trivial way of parallelizing merge sort achieves close to linear speedup, while there is no obvious way to parallelize heapsort at all.


Bubble Sort - Algorithm & Implementation

package main.java.algo.sorting;

public class BubbleSort {

	/***
	 *** Algorithm
	 *** 
	 *** Compare each pair of adjacent elements from the beginning of an array
	 *** and, if they are in reversed order, swap them. If at least one swap has
	 *** been done, repeat step 1.
	 *** 
	 *** In every pass, the biggest elements keep moving towards the right.
	 *** After 1st pass, we'll have the biggest in rightmost.
	 *** In 2nd pass, we'll have 2nd biggest on last -1 position
	 *** 
	 ***/

	public void bubbleSort(int[] arr) {
		boolean swapped = true;
		int j = 0;
		int tmp;
		while (swapped) {
			swapped = false;
			j++;
			// in every one for loop, the largest element is pushed towards the right,
			for (int i = 0; i < arr.length - j; i++) {
				if (arr[i] > arr[i + 1]) {
					tmp = arr[i];
					arr[i] = arr[i + 1];
					arr[i + 1] = tmp;
					swapped = true;
				}
			}
		}
	}

}

/***
 *** Average and worst case complexity of bubble sort is O(n^2). It also makes O(n^2) swaps in the worst case - when the list starts out sorted in descending order.
 *** Bubble sort is adaptive: for an almost-sorted array it runs in O(n). ***/
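A quick usage sketch (BubbleSortDemo is a hypothetical class, assumed to live in the same package):

package main.java.algo.sorting;

public class BubbleSortDemo {
	public static void main(String[] args) {
		int[] arr = { 5, 1, 4, 2, 8 };
		new BubbleSort().bubbleSort(arr);
		for (int a : arr)
			System.out.print(a + " "); // prints: 1 2 4 5 8
	}
}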

Heap Sort in java

This post is in Continuation to my previous post on Binary Heap: Concepts & Implementation in Java

package main.java.algo.sorting.heap;


/*** Heap Sort is another implementation of selection sort, using the right data structure.
 *** It optimizes the way the minimum element is picked: from O(n) to find the min in selection sort, down to O(log n) with a heap. ***/

public class HeapSort {
	
	public int[] heapSort(int[] data) throws HeapException {
		BinaryMinHeap_PriorityQueue heap = new BinaryMinHeap_PriorityQueue(data.length);
		
		//first create heap from raw array
		for (int i = 0; i < data.length; i++) {
			heap.insert(data[i]);
		}
		
		//extract min value using heap's remove min.
		for (int i = 0; i < data.length; i++) {
			data[i] = heap.extractMin();
		}
		return data;
	}

}

/***COMPLEXITY:
 *** create heap:: inserting one element is O(log n). Creating heap from n elements is O(n log n).
 *** extract min:: for one element is O(log n). For extracting n elements, worst case is O(n log n).
 *** So, complexity is :: O(n log n) + O(n log n) = O(n log n)
 *** ***/

// Heapsort relies strongly on random access, and its poor locality of reference makes it very slow on media with long access times

// just fyi -- heap works on the same funda
// A tree sort is a sort algorithm that builds a binary search tree from the keys to be sorted, and then traverses the tree (in-order) so that the keys come out in sorted order.

Binary Heap: Concepts & Implementation in Java





package main.java.algo.sorting.heap;

/****** This class is about Priority Queue, which is implemented using Binary Min Heap. 
 *** So here we'll implement Binary Min Heap being used as Priority Queue******/

public class BinaryMinHeap_PriorityQueue {

	/***
	 *** Heap Property:
	 *** - It is a complete Tree
	 *** - It is a Binary Tree
	 *** - Every node is smaller than its left and right child. Known as MIN-HEAP.
	 *** 
	 *** - If every node is bigger than its left and right child. Known as MAX-HEAP.
	 *** 
	 *** Heap is Loosely Sorted
	 *** - In a heap, we know the root is always smaller than its left and right children, but there is no guarantee that the left child is smaller than the right.
	 *** - Good for maintaining the min or max at run time.
	 *** - Good for maintaining the median at run time (if we keep 2 heaps - one a MAX HEAP holding the smaller half of the
	 *** 		elements, example {1,2,3,4,5}, and the other a MIN HEAP holding the greater half, {6,7,8,9,10})
	 ***
	 *** Heaps are data structures that enable us to represent Binary trees without any pointers. 
	 *** Thus no extra memory is required to store pointers in heaps, as we do it in a normal Binary tree.
	 *** 
	 *** A heap is a complete binary tree, which leads to the idea of storing it using an array. 
	 *** By utilizing array-based representation, we can reduce memory costs while tree navigation remains quite simple.
	 *** 
	 *** Though it saves memory, it is less flexible than using pointers. 
	 *** We cannot move the nodes around by just changing pointers. So it does not provide the benefits 
	 *** of a Binary Search Tree, but works out well for heaps. Thus it is not good for searching: without 
	 *** BST ordering we cannot do an O(log n) search, but we can always do a linear search when required.
	 *** 
	 ***/
	
	
	/***
	 *****************************************
	                      1 {0}

		             /             \
		
		         3 {1}                6 {2}
		
		        /      \              /   
		
		     5 {3}      9 {4}      8 {5}      

	{i} -- this is the index in array
	
	array representation : {1,3,6,5,9,8}
	index :                [0,1,2,3,4,5]
	
	Index of LEFT Child of an element at index i :: Left(i) = (2 * i) + 1;
	Left child of array[1] is array[3];
	
	Index of RIGHT Child of an element at index i :: Right(i) = (2 * i) + 2;
	Right child of array[1] is array[4];
	
	Index of PARENT of an element at index i :: Parent(i) = (i - 1) / 2 (integer division);
	Parent of array[4] is array[1];
	Parent of array[5] is array[2];
	
	
	*****************************************
	***/
	
	
	int[] data;
	int heapSize;
	
	public BinaryMinHeap_PriorityQueue(int size) {
		data = new int[size];
		heapSize = 0;
	}
		
	
	public int getLeftChildindex(int nodeIndex) { 
		return (2 * nodeIndex) + 1;
	}
	
	public int getRightChildindex(int nodeIndex) { 
		return (2 * nodeIndex) + 2;
	}
	
	public int getParentindex(int nodeIndex) { 
		return (nodeIndex - 1) / 2;
	}
	
	
	/***********************INSERTION*********************/
	
	/*** INSERTION ALGO:
	 *** 
	 *** 1) Insert the new element to the end of array
	 *** 2) Keep shifting it UP - till the heap property is not achieved. 
	 *** Shifting up means - compare the node with its parent, if they are not as per heap property - swap them.
	 *** 
	 *** 
	 *** Insert -2 into the above heap --
	 *** 
	                      1 {0}

		             /             \
		
		         3 {1}                6 {2}
		
		        /      \              /   \
		
		     5 {3}      9 {4}      8 {5}   -2 {6}
		    
	array representation : {1,3,6,5,9,8,-2}
		     
	Heap property is broken, so keep shifting new element UP
	
	
		                  1 {0}

		             /             \
		
		         3 {1}                -2 {2}
		
		        /      \              /   \
		
		     5 {3}      9 {4}      8 {5}   6 {6}
		     
	array representation : {1,3,-2,5,9,8,6}	     
		     
	Heap property is still broken, so keep shifting new element UP
		     		       
		     		       
		     		       -2 {0}

		             /             \
		
		         3 {1}                1 {2}
		
		        /      \              /   \
		
		     5 {3}      9 {4}      8 {5}   6 {6}
		     
	array representation : {-2,3,1,5,9,8,6}
	Now the heap property is achieved. Items in Order. No more shifting required.  
	
	
	COMPLEXITY
	Complexity of the insertion operation is O(h), where h is heap's height 
	AND h = log n, where n is number of elements in a heap.
	Thus, complexity O(h) = O(log n)
	 
	 ***
	 ***/
	
	
	public void insert (int value) throws HeapException {
		if(heapSize == data.length)
			throw new HeapException("Heap Overflow");
		heapSize++;
		int currentIndex = heapSize - 1;
		data[currentIndex] = value;
		bubbleUP(currentIndex);
	}
	
	public void bubbleUP(int nodeIndex) {
		if (nodeIndex == 0)
			return;
		int indexOfParent = getParentindex(nodeIndex);
		// check the bound first, then the heap property
		if (indexOfParent >= 0 && data[indexOfParent] > data[nodeIndex]) {
			int tmp = data[indexOfParent];
			data[indexOfParent] = data[nodeIndex];
			data[nodeIndex] = tmp;
			nodeIndex = indexOfParent;
			bubbleUP(nodeIndex);
		} else
			return;
	}
	
	
	public void insertWithoutRecursion(int value) throws HeapException {
		if (heapSize == data.length)
			throw new HeapException("Heap Overflow");
		
		heapSize++;
		
		int currentIndex = heapSize - 1;
		data[currentIndex] = value;
		
		int tmp;
		int indexOfParent = getParentindex(currentIndex);
		
		while (indexOfParent >= 0 && data[indexOfParent] > data[currentIndex]) {
			tmp = data[indexOfParent];
			data[indexOfParent] = data[currentIndex];
			data[currentIndex] = tmp;
			currentIndex = indexOfParent;
			indexOfParent = getParentindex(currentIndex);
		}

	}
	
	
	/***********************REMOVE MINIMUM*********************/
	
	/***REMOVE MINIMUM ALGO:
	 *** 
	 *** Min in a MIN-HEAP is always the root element
	 *** 
	 *** 1) copy the last element in the array to the root
	 *** 2) decrease the heapsize by 1
	 *** 3) bubbleDOWN till the heap property is achieved
	 *** 		- If there are no children, sifting down is over.
	 *** 		- If there is one child, check the heap property with it and shift down if required.
	 *** 		- If there are 2 children, check the heap property and if not met, swap with smaller of the children.
	 *** 
	 *** 
	 *** *** Remove minimum from this heap --
	 *** 
	                      1 {0}

		             /             \
		
		         3 {1}                6 {2}
		
		        /      \              /   
		
		     5 {3}      9 {4}      8 {5}   
		    
	array representation : {1,3,6,5,9,8}
		     
	Copy the last value array[5] = 8 to the root and decrease heapSize by 1.
	
	
	                      8 {0}

		             /             \
		
		         3 {1}                6 {2}
		
		        /      \                
		
		     5 {3}      9 {4}       
		     
	array representation : {8,3,6,5,9}	     
		     
	Heap property is still broken, so keep shifting down element 8
		     		       
		     		       
	                      3 {0}

		             /             \
		
		         8 {1}                6 {2}
		
		        /      \                
		
		     5 {3}      9 {4}       
		     
	Heap property is still broken, so keep shifting down element 8
			     
		     	          3 {0}

		             /             \
		
		         5 {1}                6 {2}
		
		        /      \                
		
		     8 {3}      9 {4}       
		     
	array representation : {3,5,6,8,9}
	Now the heap property is achieved. Items in Order. No more shifting required.  
	
	
	COMPLEXITY
	Complexity of the removal operation is O(h), where h is heap's height 
	AND h = log n, where n is number of elements in a heap.
	Thus, complexity O(h) = O(log n)
	 
	 *** 
	 *** ***/
	
	public int extractMin() throws HeapException {
		if (heapSize == 0)
			throw new HeapException("Heap Underflow");
		int min = data[0];
		removeMin();
		return min;
	}
	
	public void removeMin() {
		if(heapSize == 0)
			return;
		data[0] = data[heapSize -1];
		heapSize--;
		if(heapSize > 0)
			bubbleDOWN(0);
	}

	public void bubbleDOWN(int nodeIndex) {
		int leftChildIndex = getLeftChildindex(nodeIndex);
		int rightChildIndex = getRightChildindex(nodeIndex);
		int smallerValueIndex = -1;
		if (leftChildIndex < heapSize && rightChildIndex < heapSize) {
			smallerValueIndex = (data[leftChildIndex] - data[rightChildIndex]) < 0 ? leftChildIndex : rightChildIndex;
		} else if (leftChildIndex < heapSize) {
			smallerValueIndex = leftChildIndex;
		} else if (rightChildIndex < heapSize) {
			smallerValueIndex = rightChildIndex;
		} else {
			return;
		}
		if (smallerValueIndex >= 0 && data[smallerValueIndex] < data[nodeIndex]) {
			int tmp = data[nodeIndex];
			data[nodeIndex] = data[smallerValueIndex];
			data[smallerValueIndex] = tmp;
			nodeIndex = smallerValueIndex;
			bubbleDOWN(nodeIndex);
		}
	}
	

	/***********************CREATE HEAP *********************/

	public static BinaryMinHeap_PriorityQueue makeHeap(int[] array) throws HeapException {
		BinaryMinHeap_PriorityQueue heap = new BinaryMinHeap_PriorityQueue(array.length);
		for (int i = 0; i < array.length; i++) {
			heap.insert(array[i]);
		}
		return heap;
	}
	
}




class HeapException extends Exception {
	public HeapException(String message) {
		super(message);
	}
}
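
A quick usage sketch of the priority queue above (sample values are mine) - repeatedly extracting the min yields the elements in sorted order, which is exactly what the Heap Sort post above exploits:

class BinaryMinHeapDemo {
	public static void main(String[] args) throws HeapException {
		BinaryMinHeap_PriorityQueue heap = new BinaryMinHeap_PriorityQueue(6);
		int[] values = { 5, 1, 9, 3, 8, 6 };
		for (int v : values)
			heap.insert(v);
		for (int i = 0; i < values.length; i++)
			System.out.print(heap.extractMin() + " "); // prints 1 3 5 6 8 9
	}
}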

ASCII Conversions in Java




package main.java.bits;

public class BitsASCIIConverter {

	public static int binaryToDecimal(String binary) {
		int i = Integer.parseInt(binary, 2);
		return i;
	}

	public static String decimalToBinary(int i) {
		String b = Integer.toBinaryString(i);
		return b;
	}

	public static int hexToDecimal(String binary) {
		int i = Integer.parseInt(binary, 16);
		return i;
	}

	public static String decimalToHex(int i) {
		String b = Integer.toHexString(i);
		// another way
		// Integer.toHexString(0x10000 | i).substring(1).toUpperCase();
		return b;
	}

	// convert the number B3AD to decimal:
	// one can split the conversion into D (13 in decimal), A (10), 3 (3) and B (11), then get the final result by
	// multiplying each decimal value by 16^p, where 'p' is the corresponding position from right to left,
	// beginning with 0. In this case we have 13*(16^0) + 10*(16^1) + 3*(16^2) + 11*(16^3), which is equal to 45997 in the decimal system.
	
	public static int hexTodecimal(String s) {
		String digits = "0123456789ABCDEF";
		s = s.toUpperCase();
		int val = 0;
		for (int i = 0; i < s.length(); i++) {
			char c = s.charAt(i);
			int d = digits.indexOf(c);
			val = 16 * val + d;
		}
		return val;
	}

	// precondition: d is a nonnegative integer
	public static String decimal2hex(int d) {
		String digits = "0123456789ABCDEF";
		if (d == 0)
			return "0";
		String hex = "";
		while (d > 0) {
			int digit = d % 16; // rightmost digit
			hex = digits.charAt(digit) + hex; // string concatenation
			d = d / 16;
		}
		return hex;
	}

	public static void main(String[] args) {
		System.out.println(hexToDecimal("AA"));
	}

}
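
The ASCII side of the conversions is just a cast between char and int (a minimal sketch of mine, separate from the class above):

class AsciiDemo {
	public static void main(String[] args) {
		int code = 'A';      // char to its ASCII/Unicode code: 65
		char ch = (char) 97; // code back to char: 'a'
		System.out.println(code + " " + ch);
	}
}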

Java Serialization and DeSerialization

Object serialization is the process of saving an object's state to a sequence of bytes
as well as the process of rebuilding those bytes into a live object at some future time.

Common use - to send objects over the network

Core components -
FileOutputStream
ObjectOutputStream
FileInputStream
ObjectInputStream

The object to be persisted must implement the Serializable interface OR inherit that implementation from its object hierarchy.
For the complete object to be serialized, all members of the object should be serializable.
The object to be persisted must mark all nonserializable fields transient.

serialVersionUID
If, after serializing an object, we make a small change to the class (add one field) and try to deserialize, we get an InvalidClassException.
All Serializable classes automatically get a unique identifier, serialVersionUID, which becomes different after we change the class,
and during deserialization, if the serialVersionUIDs are different, an exception is thrown.

To avoid this - if we know that the changes we make to the class will be compatible after deserialization - we can specify the serialVersionUID in the class.
If we hardcode serialVersionUID, and the change we make is compatible, such that the (Employee) ois.readObject() cast is fine, we won't get any exceptions.
The JDK provides a "serialver MyClass" utility, which prints the serialVersionUID from the command line.
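
Declaring it is a one-liner (the value is arbitrary; it just has to stay the same across compatible versions of the class) - the Employee class below does exactly this:

private static final long serialVersionUID = 1L;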


package main.java.concepts;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// http://java.sun.com/developer/technicalArticles/Programming/serialization/


public class Serialization {

	public static final String dir = "/Users/ashishsharma/development/code/workspace/Samples/files/";

	public static void main(String[] args) throws IOException,
			ClassNotFoundException {
		Serialization s = new Serialization();
		s.serialize();
		s.deserialize();
	}

	public void serialize() {
		// creating an object that is to be serialized
		Employee emp = new Employee();
		
		emp.name = "Ashish";
		emp.id = 123;
		emp.role = "Manager";
		Department d = new Department();
		d.deptName = "Engineering";
		emp.dept = d;

		// file path to which the serialized object should be written
		String file = dir + "emp.ser";
		
		FileOutputStream fos = null;
		ObjectOutputStream oos = null;
		
		try {
		
			// these are the 2 core components that serialize
			// any operations with these, throw IOException
			fos = new FileOutputStream(file);
			oos = new ObjectOutputStream(fos);
			
			// this actually writes the object to bytes into the file
			oos.writeObject(emp);
		
		} catch (IOException e1) {
			e1.printStackTrace();
		} finally {
			try {
				// FileOutputStream and ObjectOutputStream MUST be closed
				if (oos != null)
					oos.close();
				if (fos != null)
					fos.close();
			} catch (IOException e2) {
				e2.printStackTrace();
			}
		}
	}

	public void deserialize() throws IOException, ClassNotFoundException {
		
		String file = dir + "emp.ser";
		FileInputStream fis = null;
		ObjectInputStream ois = null;
		
		try {
			// these are the 2 core components that read the serialized object
			// any operations with these, throw IOException
			fis = new FileInputStream(file);
			ois = new ObjectInputStream(fis);
			
			// ObjectInputStream.readObject() - returns Object(). It must be type-casted
			// It throws ClassNotFoundException
			// For a JVM to be able to deserialize an object, it must be able to find the bytecode for the class it is deserializing and typecasting to.
			Employee e = (Employee) ois.readObject();
			
			System.out.println(e.name);
			System.out.println(e.id);
			
			
			System.out.println("Role: " + e.role);
			
			// Department class was not serializable
			// so we made the dept member of Employee transient
			// thus dept was never written to the serialized object
			// when we read back the serialized object, e.dept will not be there and will be null
			if(e.dept != null)
				System.out.println(e.dept.deptName);
			
			e.doJob();
			
		} catch (Exception e1) {
			e1.printStackTrace();
		} finally {
			try {
				// FileInputStream and ObjectInputStream MUST be closed
				if (ois != null)
					ois.close();
				if (fis != null)
					fis.close();
			} catch (IOException e2) {
				e2.printStackTrace();
			}
		}
	}
}


class Employee implements Serializable {

	private static final long serialVersionUID = 1L;
	
	public String name = null;
	public int id = 0;
	
	public static String role = "HardCodedStaticRole";
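	// note: role is static, so it belongs to the class and is NOT part of the
	// serialized instance state; the value printed after deserialization comes
	// from the loaded class itself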
	
	// all members of an Object should be serializable.
	// here Department is not serializable - it would throw java.io.NotSerializableException at runtime.
	// we avoid that by telling the JVM that this member should not be serialized, using the TRANSIENT modifier.
	// in that case, the transient member is not written to the object bytes, while the rest of the object is serialized.
	public transient Department dept;

	public void doJob() {
		System.out.println("Name is: " + name + ". ID is: " + id);
	}
}

class Department {
	
	public String deptName = null;
}

// NOTES
// Serializable - is a MARKER INTERFACE
// An empty interface having no methods or fields/constants is called a marker interface or a tag interface.
// Why Marker Interface? - the class might only need to flag that it belongs to that particular type and everything else is handled/done by some other api.

// Parent class fields are only serialized if the base class itself is serializable.


Recursion - How to approach in algorithms

package main.java.concepts;

public class Recursion {

	// Every recursion should have the following characteristics.
	//
	// 1) A simple base case which we know a solution for and a return value. This is to break the recursion loop. This is generally a hardcoded known value for fxn(0) or fxn(1) or fxn(null)
	// 2) A way of getting our problem closer to the base case. I.e. a way to chop out part of the problem to get a somewhat simpler problem.
	// 3) A recursive call which passes the simpler problem back into the method.
	
	// approach
	// solve for f(0) or f(1) or f(null)
	// solve for f(2)
	// solve for f(3) using f(2) - this will give a logic for sub-problem
	// generalize for f(n)
	// test
	
	// recursion vs iteration
	
	// recursive - calls itself (possibly more than once), with different parameters, and defines an exit clause that is guaranteed to be reached
	// iterative - includes a loop, which iterates a pre-determined number of times, or checks for an exit clause every time through. 
	
	// 1) recursion uses more memory - each recursive call adds a new layer to the stack, which means that if your algorithm has O(n) recursive calls then it uses O(n) memory. 
	// 	- Recursion may be slower, and use greater resources, because of the extra function calls. 
	// 2) Recursion may lead to simpler, shorter, easier-to-understand functions. 
	
	// use iteration or recursion 
	// with which it is easy to implement, maintain the code in a few months or years (and should not blow up performance) 
	// If you run into performance issues, then profile your code, and then and only then look into optimizing by moving over to an iterative implementation. 
	// You may want to look into memoization and dynamic programming.
	
	// when easy to use recursion - 
	// “Design an algorithm to compute the nth	”; “Write code to list the first n	”; “Implement a method to compute all	”; 
	
	public static void main(String[] args) {
		Recursion r = new Recursion();
		r.myCounter(4);
	}

	// recursive method is comprised of an if-else statement where the base case
	// returns one value and the non-base case(s) recursively call(s) the same
	// method with a smaller parameter or set of data.

	public void myCounter(int c) {
		if (c == 0)
			return;
		System.out.println(c);
		myCounter(--c);
		System.out.println(c);

	}

	// Note: Forgetting the base case leads to infinite recursion. It won't run
	// an infinite loop, instead, it will eventually run out of stack memory and
	// get a run-time exception called a stack overflow. The size of a
	// stack may be quite large, but limited. Therefore too deep recursion can
	// result in Stack Overflow. recursion has a serious disadvantage of using
	// large amount of memory

	public int myFactorial(int n) {
		// base case covers 0 as well, so myFactorial(0) terminates
		if (n <= 1)
			return 1;
		else {
			return (n * (myFactorial(n - 1)));
		}
	}
}
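
Following up on the memoization pointer above - a minimal sketch of mine (not from the original post) of caching the results of a recursive Fibonacci so each sub-problem is solved once:

import java.util.HashMap;
import java.util.Map;

class MemoFib {
	private final Map<Integer, Long> cache = new HashMap<Integer, Long>();

	public long fib(int n) {
		if (n <= 1)
			return n; // base case
		Long cached = cache.get(n);
		if (cached != null)
			return cached; // sub-problem already solved - reuse instead of recursing again
		long result = fib(n - 1) + fib(n - 2);
		cache.put(n, result);
		return result;
	}
}

This turns the naive O(2^n) recursion into O(n) calls, at the cost of O(n) memory for the cache.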

Java Memory Leak Causes



package main.java.concepts;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.ArrayList;

public class MemoryLeaks {
	
	// OutOfMemoryError
	// A memory leak is where an object that will never be used again still hangs around in memory and doesn't get garbage collected, because some reference to it still exists even though no one is going to use it.
	
	/*** ***************************************** ***/
	
	// 1. Long-living (static) objects.  Static field holding object reference - remains always in memory
	// very bad especially if "final" - once memory allocated cannot be even changed or nullified in the code or finalize() method.
	// Usage of static classes should be minimized as they stay in memory for the lifetime of the application.
	static final ArrayList<Object> list = new ArrayList<Object>(100);
	
	/*** ***************************************** ***/

	public static void main(String[] args) {

		/*** ***************************************** ***/
		
		// 2. (Unclosed) open streams ( file , network etc... )
		try {
		    BufferedReader br = new BufferedReader(new FileReader(new File("xyz")));
		} catch (Exception e) {
		    e.printStackTrace();
		}
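
		// a possible fix (a sketch): keep the reference and close it in a finally block
		BufferedReader br2 = null;
		try {
			br2 = new BufferedReader(new FileReader(new File("xyz")));
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			try {
				if (br2 != null)
					br2.close();
			} catch (Exception e) {
				e.printStackTrace();
			}
		}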
		
		/*** ***************************************** ***/

		// 3. Unclosed connections
		try {
		    //Connection conn = MyConnectionFactory.getConnection();
		} catch (Exception e) {
		    e.printStackTrace();
		}
		// Also, while using a single JDBC connection, calling close() on the connection object closes the db connection along with the associated Statement and ResultSet objects, and all are garbage collected. 
		// If a connection pool is used, a request to the database is made using one of the existing connections in the pool. In this case, on calling close() on the connection object, the database connection just returns back to the pool, it is not actually closed. So merely closing the connection does not automatically close the ResultSet and Statement objects. As a result, ResultSet and Statement will not become eligible for garbage collection.
		// Whatever be the connection usage - JDBC Statement and ResultSet objects must be explicitly closed in a finally block

		/*** ***************************************** ***/

		// 4. web application objects stored in application scope stay in memory till the application is restarted
		// getServletContext().setAttribute("SOME_MAP", map);

		/*** ***************************************** ***/

		// 5. HTTP is a request-response-based stateless protocol; state that must survive across requests is stored in the HttpSession.
		// The data stored in HttpSession stays in memory as long as the user is logged in. Putting too much data (e.g. a shopping cart) into HttpSession leads to the "Overstuffed Session" anti-pattern.
		// Use of HttpSessions should be minimized and used only for state that cannot realistically be kept on the request object
		// Remove objects from HttpSession if they are no longer used
		// Long-living data should be migrated to the business tier
		
		/*** ***************************************** ***/

		// Memory Overheads - "new" allocates a fresh object every time;
		// Boolean.valueOf(true) and a plain string literal would reuse cached instances
		Boolean b = new Boolean(true);
		String s = new String("why new?");

		/*** ***************************************** ***/

		// Detection of memory leaks
		// 1. jConsole, jProbe Memory Analyzer, IBM's MAT eclipse plugin for memory analyze 
		// 2. Runtime.getRuntime().freeMemory
		
	}

}

Simple JDBC

package main.java.concepts;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class JDBCDemo {

	//	Just an extra TIP - Signature of main
	
	//	public- main(..) is the first method called by java environment when a program is executed so it has to accessible from java environment. Hence the access specifier has to be public.
	//	static: Java environment should be able to call this method without creating an instance of the class , so this method must be declared as static.
	//	void: main does not return anything so the return type must be void
	//	The argument String indicates the argument type which is given at the command line and arg is an array for string given during command line.

	public static void main(String[] args) throws ClassNotFoundException, SQLException {
		
		Connection con = null;
		Statement st = null;
		ResultSet rs = null;

		try {

			Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
			con = DriverManager.getConnection("localhost:3403", "ashish", "ashish");
			st = con.createStatement();
			rs = st.executeQuery("select * from emp");

			while (rs.next()) {
				System.out.println(rs.getString(1));
				System.out.println(rs.getInt(2));
				System.out.println(rs.getString(3));
			}
		} catch (SQLException se) {
			se.printStackTrace();
		} finally {
			// close in reverse order of creation: ResultSet, then Statement, then Connection
			if (rs != null)
				rs.close();
			if (st != null)
				st.close();
			if (con != null)
				con.close();
		}
	}

}
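
One step beyond plain Statement (a sketch of mine, not part of the original post): PreparedStatement binds parameters instead of concatenating them into the SQL string, which avoids SQL injection and lets the driver reuse the query plan.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class PreparedStatementDemo {

	// the Connection comes from the caller; same illustrative emp table as above
	static void printEmp(Connection con, int id) throws SQLException {
		PreparedStatement ps = con.prepareStatement("select * from emp where id = ?");
		ResultSet rs = null;
		try {
			ps.setInt(1, id); // bind the first '?' placeholder
			rs = ps.executeQuery();
			while (rs.next())
				System.out.println(rs.getString(1));
		} finally {
			// close in reverse order, as in the main example
			if (rs != null)
				rs.close();
			ps.close();
		}
	}
}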

Tuesday, September 13, 2011

Java Generics Tutorial



package main.java.concepts;

import java.util.ArrayList;
import java.util.List;

// Generics in Java allow "a type or method to operate on objects of various types while providing compile-time type safety"

public class Generics {
	
	public static void main(String[] args) throws Exception {
		Generics g = new Generics();
		
		System.out.println("Test without Generic");
		g.withOutGeneric();
		
		System.out.println("Generic Type Erasure");
		g.genericTypeErasure();
		
		System.out.println("Generic demo");
		g.doit();
	}
	
	//****************************************************

	
	// code compiles without error, it throws a runtime exception (java.lang.ClassCastException) when executing the third line of code
	public void withOutGeneric() {
		try {
			List v = new ArrayList();
			v.add("test");
			Integer i = (Integer) v.get(0);
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
	

	//****************************************************

	
	public void doit() throws Exception {
		
		// MapKeyVal - used for Integer, String
		MapKeyVal<Integer, String> intStrOne = new MapKeyVal<Integer, String>(1, "one");
		MapKeyVal<Integer, String> intStrTwo = new MapKeyVal<Integer, String>(2, "two");
		MapKeyVal<Integer, String> intStrThree = new MapKeyVal<Integer, String>(3, "three");

		// MyList - used for MapKeyVal with Integer, String
		MyList<MapKeyVal<Integer, String>> listIntStr = new MyList<MapKeyVal<Integer,String>>(5);
		listIntStr.addElem(intStrOne);
		listIntStr.addElem(intStrTwo);
		listIntStr.addElem(intStrThree);
		listIntStr.print();
		
		// MapKeyVal - used for String, String
		MapKeyVal<String, String> strStrOne = new MapKeyVal<String, String>("ek", "one");
		MapKeyVal<String, String> strStrTwo = new MapKeyVal<String, String>("do", "one");
		MapKeyVal<String, String> strStrThree = new MapKeyVal<String, String>("teen", "one");

		// MyList - used for MapKeyVal with String, String
		MyList<MapKeyVal<String, String>> listStrStr = new MyList<MapKeyVal<String,String>>(5);
		listStrStr.addElem(strStrOne);
		listStrStr.addElem(strStrTwo);
		listStrStr.addElem(strStrThree);
		listStrStr.print();
		
		// with generic, return type matches
		MapKeyVal<String, String> s = listStrStr.getElem(0);
		String sVal = s.getValue();
		System.out.println(s.getClass().getName());
		System.out.println(sVal.getClass().getName());
		
		// testing generics with method and constructor
		TestChild child = new TestChild("ashish");
		Util u = new Util(child);
		
		
		// 
		ArrayList<TestChild> childList = new ArrayList<TestChild>();
		childList.add(new TestChild("c-aaa"));
		childList.add(new TestChild("c-bbb"));


		ArrayList<TestParent> parentList = new ArrayList<TestParent>();
		parentList.add(new TestParent("pp-aaa"));
		parentList.add(new TestParent("pp-bbb"));
		
		System.out.println("? extends : "+u.inspectExtendBounds(childList).getClass().getName());
		
		System.out.println("? super : "+u.inspectSuperBounds(parentList).getClass().getName());
		
	}
	
	//****************************************************

	// Type Erasure

	// Generics are checked at compile-time for type-correctness. 
	// At run time, generic type information is then removed and they will be just like of type Object
	// ArrayList<Integer> and ArrayList<Float> will be converted to the non-generic type ArrayList, which can contain arbitrary objects. 
	// The compile-time check guarantees that the resulting code is type-correct.
// One version of the class or function is compiled, and works for all type parameters
	public void genericTypeErasure() {
		ArrayList<Integer> li = new ArrayList<Integer>();
		ArrayList<Float> lf = new ArrayList<Float>();
		if (li.getClass() == lf.getClass()) // evaluates to true
			System.out.println("Equal");
	}

}

//****************************************************


class Util {
	
	// constructors can declare Type variables as classes do.
	// <Item> here is the type declaration with the constructor
// But anything declared with the constructor has local scope within that constructor.
// Item will not be available outside it.
	public <Item> Util(Item I) {
		System.out.println("In Generic Constructor");
		inspect(I);
	}
	
	// 
	
	// methods can declare Type variables as classes do.
	// <U> is the type declaration with the method
	// But anything declared with the method has local scope within that method.
	// U will not be available outside this method.
	public <U> void inspect(U u) {
		System.out.println(u.getClass().getName());
	}
	
	// wildcard "? extends T"- to specify anything, but with this bound condition - all which extend T
	
	public <T> T inspectExtendBounds(ArrayList<? extends T> l) {
		return l.get(0);
	}

	// wildcard "? super T"- to specify anything, but with this bound condition - all which are super classes of T

	public <T> T inspectSuperBounds(ArrayList<? super T> l) {
		return (T) l.get(0); // downcasting super to child
	}
	
}

//****************************************************


// A class is generic if it declares one or more type variables.
// These type variables are known as the type parameters of the class.
// Here T is type variable which is accessible in whole class
class MyList<T> {

	// in java, array of generic like T[] is not allowed. we cannot have T[] t = new T[];
	// because we cannot instantiate T... as T t = new T()
	// so we use an Object to hold all the data and while returning we typecast it to T
	Object[] list;
	int index = 0;

	public MyList(int size) {
		list = new Object[size];
	}

	public void addElem(T elem) {
		list[index] = elem;
		index++;
	}

	public T getElem(int i) throws Exception {
		if (i >= index)
			throw new Exception("IndexOutOfBound");
		return (T) list[i];
	}

	public void print() {
		for (Object o : list) {
			if(o != null)
				System.out.println(o.toString());
		}
	}
}

//****************************************************

// This class declares more than one Type variables
class MapKeyVal<K, V> {

	// type variables
	// type variables are used just like any other variables in a class
	K key;
	V val;

	// passing variables of Type declared by class
	public MapKeyVal(K k, V v) {
		key = k;
		val = v;
	}

	// returning variables of Type declared by class
	public K getKey() {
		return key;
	}

	public V getValue() {
		return val;
	}

	// using variables of Type in a logic
	public String toString() {
		return "(" + key + "," + val + ")";
	}
}

//****************************************************

class TestParent {
	String name = "";

	public TestParent(String n) {
		name = n;
	}
	
	public String toString() {
		return name;
	}
}

//****************************************************

class TestChild extends TestParent {

	public TestChild(String n) {
		super(n);
	}

	public void print() {
		System.out.println(name);
	}
}

// *************************************
// NOTES
// *************************************

// static with Generics

// Because there is only one copy of a generic class at runtime, static variables are shared among all instances of the class,
// regardless of their type parameter. As a result, the class's type parameter cannot be used in the declaration of static variables
// or in static methods. Static variables and static methods are "outside" of the scope of the class's parameterized types.
// In other words, static variables are shared even between instances created with different type parameters.
// (A static method may still declare its own type parameter - see the sketch below.)

// wrong -> public <T> static void myMethod() {...}   // the type parameter must come after the modifiers
// wrong -> static T t;                                // T is the class's type parameter
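
// correct - a static method declaring its OWN type parameter (a minimal sketch;
// class and method names are mine):

class StaticGenericsDemo {
	public static <T> T firstOrNull(java.util.List<T> list) {
		return list.isEmpty() ? null : list.get(0);
	}
}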

// *************************************

// Instantiation of a variable of generic Type.. new T()

// instantiating a Java class of a parameterized type is impossible because instantiation requires a call to a constructor,
// which is unavailable if the type is unknown.
//	 
// following code will not compile:
//
//	 T instantiateElementType(List<T> arg)
//	 {
//	   return new T(); //causes a compile error
//	 }

// *************************************

/*
 * More WildCard Examples - 
 * ***********************************

 *  Illegal call.
 * 
  	public static <T> T writeAll(Collection<T> coll, Sink<T> snk) {
	    T last;
	    return last;
	}
	...
	Sink<Object> s;
	Collection<String> cs;
	String str = writeAll(cs, s); // Illegal call. cs and s should be of the same type

*************************************
*
* Call is OK, but wrong return type.

  	public static <T> T writeAll(Collection<? extends T> coll, Sink<T> snk) {...}
	...
	String str = writeAll(cs, s);  
	
	// Here T is inferred as Object (from Sink<Object>), but str expects a String - wrong return type
	
*************************************
*
* Works.

	public static <T> T writeAll(Collection<T> coll, Sink<? super T> snk) {
    ...
	}
	String str = writeAll(cs, s); // Yes! 
	
	// T passed as String, returned as String

*************************************
*
* Unknown types can be passed as <?>

*/

Java - Split Files : By Lines & Size



package main.java.concepts;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class FileSplit {


	public static void main(String[] args) throws IOException {
		FileSplit f = new FileSplit();

		String path = "/Users/ashishsharma/Desktop/splitdir/TestSplit";

		f.split(path, 300);
	}

// split [OPTION] [INPUT [PREFIX]]

// it creates file-parts suffixed by aa ab ac, etc

// split -l 300 /Desktop/mydir/TestSplit.txt TestSplitPart_
// split -b 50M filename part

// to join the files back
// cat xaa xab xac > filename



	public void split(String filePath, int splitByLines) throws IOException {

		File f = new File(filePath);
		BufferedReader br = new BufferedReader(new FileReader(f));

		String NEWLINE = System.getProperty("line.separator");

		String line = null;
		StringBuilder sb = new StringBuilder();
		int lineCount = 0;
		int fileCount = 0;
		while ((line = br.readLine()) != null) {
			sb.append(line).append(NEWLINE);
			lineCount++;
			if (lineCount >= splitByLines) {
				write(sb.toString(), filePath + "_" + ++fileCount);
				lineCount = 0;
				sb = new StringBuilder();
			}
		}
		// write whatever is left over after the last full chunk
		if (sb.length() > 0)
			write(sb.toString(), filePath + "_" + ++fileCount);
		br.close();
	}

	public void write(String text, String filePath) throws IOException {
		File f = new File(filePath);
		FileWriter writer = new FileWriter(f);
		writer.write(text);
		writer.close();
	}

}


// with bytes
class FileSplitter {

	public static void main(String args[]) throws Exception {
		FileInputStream fis = new FileInputStream(args[0]);
		int size = 1024;
		byte buffer[] = new byte[size];

		int count = 0;
		while (true) {
			int i = fis.read(buffer, 0, size);
			if (i == -1)
				break;

			String filename = args[1] + count;
			FileOutputStream fos = new FileOutputStream(filename);
			fos.write(buffer, 0, i);
			fos.flush();
			fos.close();

			++count;
		}
		fis.close();
	}
}
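
And to join the parts back in Java rather than with cat (a sketch of mine; part names are whatever the splitter produced):

import java.io.FileInputStream;
import java.io.FileOutputStream;

class FileJoiner {

	// usage: java FileJoiner <output> <part1> <part2> ...
	public static void main(String[] args) throws Exception {
		FileOutputStream fos = new FileOutputStream(args[0]);
		byte[] buffer = new byte[1024];
		for (int p = 1; p < args.length; p++) {
			FileInputStream fis = new FileInputStream(args[p]);
			int n;
			while ((n = fis.read(buffer)) != -1)
				fos.write(buffer, 0, n);
			fis.close();
		}
		fos.close();
	}
}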

Java - Externalizable Interface



package main.java.concepts;

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

// Externalizable - interface is used to achieve serialization, but with our own custom writing and reading logic.
// Unlike plain Serializable, the class must also have a public no-arg constructor - deserialization creates the instance through it and then calls readExternal.

// Example - read and write PDF files with a Java application.
// If you know how to write and read PDF (the sequence of bytes required),
// you could provide the PDF-specific protocol in the writeExternal and readExternal methods.

public class ExternalizableSample implements Externalizable {

	// illustrative fields (an assumption - the original stub had none)
	private String name;
	private int version;

	// public no-arg constructor is required - deserialization instantiates
	// the object through it and then calls readExternal
	public ExternalizableSample() {
	}

	@Override
	public void writeExternal(ObjectOutput out) throws IOException {
		// we fully control what goes on the stream, and in what order
		out.writeUTF(name == null ? "" : name);
		out.writeInt(version);
	}

	@Override
	public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
		// must read back exactly in the order written
		name = in.readUTF();
		version = in.readInt();
	}

}

Java - Overriding Equals & Hashcode



package main.java.concepts;

Contract between equals and hashcode:

// 1) If two objects are equal according to the equals(object) , then hashCode() method on each of the two objects must produce the same integer result.
// -- Safest is to use the same fields for equals and hashcode. It makes sure above property is valid.
// 2) If two objects are UNEQUAL according to the equals() method, then hashCode() method on each of the two objects may OR may not give same integer results

// Whenever a.equals(b), then a.hashCode() must be same as b.hashCode().

// If you override equals, then you MUST override the hashcode.

// Use the same set of fields that you use to compute equals() to compute hashCode().


public class Equals_Hashcode {

	private String employeeId = null;
	private String name = null;
	private int age = 0;

	@Override
	public boolean equals(Object obj) {

		// Rule 1: Reflexive - an object is always equal to itself
		if (this == obj)
			return true;

		// Rule 5: Null comparison, plus the class check
		if (obj == null || this.getClass() != obj.getClass())
			return false;

		Equals_Hashcode otherObj = (Equals_Hashcode) obj;

		// (assumes employeeId and name are non-null; a null-safe version would
		// guard them the same way hashCode below does)
		if (this.employeeId.equals(otherObj.employeeId) && this.name.equals(otherObj.name) && this.age == otherObj.age)
			return true;

		return false;
	}

	// FYI - hashCode value of a Java String is computed as (String.hashCode()): s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
	@Override
	public int hashCode() {
		int hashCode = 7;
		hashCode = (31 * hashCode) + (this.employeeId == null ? 0 : this.employeeId.hashCode());
		hashCode = (31 * hashCode) + (this.name == null ? 0 : this.name.hashCode());
		hashCode = (31 * hashCode) + this.age; // an Integer's hashCode is just its value
		return hashCode;
	}

}

As per the equals method in Java, it should be:

1) Reflexive : an object must be equal to itself.
2) Symmetric : if a.equals(b) is true then b.equals(a) must be true.
3) Transitive : if a.equals(b) is true and b.equals(c) is true then a.equals(c) must be true.
4) Consistent : multiple invocations of equals() must return the same value until any of the properties used in the comparison are modified. So if two objects are equal in Java, they will remain equal until any of their properties is modified.
5) Null comparison : comparing any object to null must return false and must not result in a NullPointerException. For example, a.equals(null) must be false; accepting an unknown object, which could be null, is actually a Java coding best practice to avoid NullPointerException.



Note: You should override equals and not overload it.
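
To see why overloading bites (a sketch of the pitfall, my example): the overload is chosen at compile time from the declared type, so collections and any code that handles Object silently call Object.equals instead.

class OverloadPitfall {

	private final int id;

	OverloadPitfall(int id) {
		this.id = id;
	}

	// OVERLOAD - different parameter type, does NOT replace Object.equals
	public boolean equals(OverloadPitfall other) {
		return other != null && this.id == other.id;
	}

	public static void main(String[] args) {
		Object a = new OverloadPitfall(1);
		Object b = new OverloadPitfall(1);
		// declared type is Object, so Object.equals (reference check) runs: prints false
		System.out.println(a.equals(b));
	}
}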

Java Enums - Basics to Advanced



package main.java.concepts;


// http://blog.sanaulla.info/2009/12/01/working-with-java-enumerated-types-enums/

public class Enums {
	
	//to access enums in a class
	public Day myDayEnumObj;
	public Enums(Day d) {
		myDayEnumObj = d;
	}

	
	// Note: All enums implicitly extend java.lang.Enum. 
	// Since Java does not support multiple inheritance, an enum cannot extend anything else.
	private enum Day {
		
	    // each constant implicitly calls a constructor:

		SUNDAY(0), 
		MONDAY(1), 
		TUESDAY(2), 
		WEDNESDAY(3);

		int dayValue;
		
		// constructor
		// You cannot invoke an enum constructor yourself.
		// thus we can have multiple values like MONDAY(1,2,3) - then declare 3 fields like dayValue and a constructor taking them
		// if it is only MONDAY without any value, the default constructor is called.
		// enum constructor can never be public
		
		private Day(int v) {
			dayValue = v;
		}
	}
	
	
	public static void main(String[] args) {
		
		// Can never use "new" with any enum, even inside the enum class itself
		// here is how to access enum values

		System.out.println(Day.MONDAY);
		
		System.out.println(Day.MONDAY.dayValue);
		
		
		Day myDay = Day.TUESDAY;
		
		System.out.println(myDay);
		
		System.out.println(myDay.dayValue);
		
		// this is how to access enums created in a class
		Enums myEnumClass = new Enums(Day.SUNDAY);
		System.out.println(myEnumClass.myDayEnumObj);
	}
	
	
	
	
	// another type - enums with abstract method
	
	public enum Day2 {
		
	    // each constant implicitly calls a constructor:

		SUNDAY(0){ 
		     String getInformation(){ //Method overriden by the Enum constants
		         return "day 12344";
		       }
		     },
		MONDAY(1){ 
		     String getInformation(){ //Method overriden by the Enum constants
		         return "day 2222";
		       }
		     };

		int dayValue;
		
		// constructor
		// You cannot invoke an enum constructor yourself.
		// thus we can have multiple values like MONDAY(1,2,3) - then declare 3 fields like dayValue and a constructor taking them
		// if it is only MONDAY without any value, the default constructor is called.
		// enum constructor can never be public
		
		private Day2(int v) {
			dayValue = v;
		}
		
		
		// if there is an abstract method within the enum class, then all the enum constants (SUNDAY, MONDAY) must implement that abstract method
		abstract String getInformation();
	}

}
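
For completeness, a quick sketch of the helpers every enum gets from java.lang.Enum (my example, using a standalone enum):

class EnumHelpersDemo {

	enum Color { RED, GREEN, BLUE }

	public static void main(String[] args) {
		for (Color c : Color.values()) // all constants, in declaration order
			System.out.println(c + " ordinal=" + c.ordinal());
		Color g = Color.valueOf("GREEN"); // lookup by name (throws if unknown)
		System.out.println(g);
	}
}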