Friday, 14 November 2008

Sorting Objects using Comparator

Sorting is common functionality required by many applications. Questions like: How do we sort in Java? or What should we use as sorting algorithm? need to answered before we can perform any kind of sorting. Definitely we're not the first ones who require this feature and others have already done it before for us. So we can simple use the Java API provided to perform sorting.

The following article explains the concept of sorting and how to use the Java API to solve this problem and closes with some programming tips.

Concept behind sorting

In order to be able to sort elements we need to be able to compare them. What does this mean? If you have two things and you need to pick one of them, which one would you pick? Let's make this more realistic. If you're not rich and you can choose between €1 and €100, which one would you choose? I don't know what would you do, but I would go for the hundred. Why? Because 100 is greater than 1! I was able to make this decision because I was able to compare the choice at hand. If instead of numbers I said choose between A and B, where A and B can be anything, would you be able to do an informed decision? No, it would be pure gamble - a matter of luck. There is no way to compare A and B without further knowledge.

In order to be able to sort two elements (or more), you need to be able to compare them. Similar to the problem discussed above, we need to be able to evaluate elements and then sort them based on this value. For example, if we need to sort the following list: {8, 5, 9}, we know that 5 is less than 8 so these two have to be swapped. Sorting numbers is very simple and straight forward. We cannot say the same for any other object (Java and real). For example, if we have a list of boxes, how would we sort them? If we're sorting by size, we first determine the size of each and then using the size value (a number) as sorting criteria. So in order to sort any kind of object, all we need to do is to convert the object into a number that we can compare with. So if we have the following list of objects: {a, b, c, d} (the letters in the previous list is used only to define the object name and has no effect on the sorting) with sorting values {4, 3, 7, 5} respectively, we know that object b should come first while object c should be placed last and so on.

How do we implement this in Java?

Java provides a set of classes and interfaces which we can use to sort lists and arrays. Most of the following examples will use lists but the same concept can be applied for arrays. A final example will show this.

Let's start by creating a list of Integers and sort these using the Collections.sort method. The Collections class (part of the Java Collection Framework) provides a list of static methods which we can use when working with collections such as list, set and the like. So in a nutshell, we can sort a list by simply calling: java.util.Collections.sort(the list) as shown in the following example:


import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Example1 {
  public static void main(String[] args) {
    List<Integer> ints = new ArrayList<Integer>();
    ints.add(4);
    ints.add(3);
    ints.add(7);
    ints.add(5);

    Collections.sort(ints);
    System.out.println(ints);
  }
}

The above class creates a list of four integers and, using the collection's sort method, sorts this list (in one line of code) without us having to worry about the sorting algorithm.

Java was able to sort this list because it knows how to compare integers as the Integer class implements the Comparable interface. Note that, as from Java 1.5, the primitive int value is autoboxed into an Integer (the object wrapper for int) before added to the list. Thus in order to be able to make use of the collection's sort method, all we need to do is implement the comparable interface by our objects.

Let's take this example one step further and create our own class. Let's create a list of students and sort them by their grade (which is a number). The student class will have two fields: name and grade. In order to be able to use the sort method from the collections class, as we did in the above example (Example1), we have to implement the comparable interface and its method, as illustrated below:


public class Student1 implements Comparable<Student1> {

  private int grade;
  private String name;

  public Student1(String name, int grade) {
    setName(name);
    setGrade(grade);
  }

  @Override
  public int compareTo(Student1 o) {
    return grade - o.grade;
  }

  @Override
  public String toString() {
    return name + " " + grade;
  }

  // Getters and setters are removed for brevity
} 

The compareTo method should return a negative integer, zero, or a positive integer if this student's grade is less than, equal to, or greater than the specified/given student's grade. The simplest way to do it is to subtract the grades of these students. Why and how would that help sorting? If this student's grade is larger than the grade of the given student, then a positive number is returned, while if the grades are equal, zero is returned. This follows the method's contract/specifications. Note that our job is to provide information to the sorting algorithm and not sorting the objects ourselves.

The toString method was overridden so that we can see readable results when we print the list. Otherwise the output will look something like: [Student1@19821f, Student1@addbf1, Student1@42e816, Student1@9304b1] (instead of: [Joe Vella 47, Paul Galea 52, Albert Attard 65, Mary Borg 93]) which is not readable.

Let's modify the previous class (Example 1) and sort a list of students.


import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Example2 {
  public static void main(String[] args) {
    List<Student1> students = new ArrayList<Student1>();
    students.add(new Student1("Albert Attard", 65));
    students.add(new Student1("Mary Borg", 93));
    students.add(new Student1("Joe Vella", 47));
    students.add(new Student1("Paul Galea", 52));

    Collections.sort(students);
    System.out.println(students);
  }
}

Making full use of the API

How can we sort the student by both name and grade? In order to achieve this using the comparable interface (above) we have to add more fields to the student class in order to be able to determine which field we're sorting on. The following approach is not recommended and included here only for demonstration and comparisons purposes. Changes are shown in bold.


public class Student2 implements Comparable<Student2> {

  private int grade;
  private String name;
  private int sortBy;

  public Student2(String name, int grade, int sortBy) {
    setName(name);
    setGrade(grade);
    setSortBy(sortBy);
  }

  @Override
  public int compareTo(Student2 o) {
    switch (sortBy) {
    case 1: // Sort by name
      return name.compareTo(o.name);
    default: // Sort by grade by default
      return grade - o.grade;
    }
  }

  // Getters, setters and toString method are removed for 
  // brevity
}

We had to add a new field in the student class, called sortBy, which we have to set to 1 in order to sort the students by their name. Any other value will sort the students by their grade as illustrated in the following example. Ideally we use enums instead of int as the data type of the sortBy field which will prevent illegal values. Changes are shown in bold.


import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Example3 {
  public static void main(String[] args) {
    List<Student2> students = new ArrayList<Student2>();
    students.add(new Student2("Albert Attard", 65, 1));
    students.add(new Student2("Mary Borg", 93, 1));
    students.add(new Student2("Joe Vella", 47, 1));
    students.add(new Student2("Paul Galea", 52, 1));
    
    // Sort by name
    Collections.sort(students);
    System.out.println(students);
    
    // Change these to sort them by grade
    for(Student2 student:students){
      student.setSortBy(0);
    }
    Collections.sort(students);
    System.out.println(students);
  }
}

This approach has two main pitfalls. First, the student class has to include fields and methods that are not related to the student object. Sorting properties are not student properties. The other issue is that for any new fields or sorting order we have to change the student class and emend the compareTo method accordingly, adding complexity to a method making it harder to maintain. Also, the sorting method is bound with the student class and cannot be used alone (apart from the object). The sorting state are saved with the object's state which is not what we want. For every student we have an instance of the sortby field, which we cannot make static. Why? If we have two lists, one to be sorted by name and the other by grade, then the static value will be shared by all instance of student and may effect the sorting outcome.

Java provides another way to compare objects. Instead of implementing the comparable interface (Student2), we can implement the Comparator interface. What's the difference? The main difference between these two interfaces is that the comparable interface defines one method compareTo, which takes one parameter. This parameter is compared with this object (the instance of student). In other words, the student object has a method which makes it able to compare to another student. On the other hand, the comparator interface defines one method (in reality two, but we're not interested from the second one) that takes two parameters (of the same type) and returns the comparison of these two objects (an int exactly the same as the method compareTo from the comparable interface). As such, the comparator allows us to remove the unnecessary fields and methods from the student class and move these elsewhere as illustrated below.

With reference to the Student1 class defined above.


import java.util.Comparator;

public class StudentGradeComparator1 implements Comparator<Student1> {

  @Override
  public int compare(Student1 a, Student1 b) {
    return a.getGrade() - b.getGrade();
  }
}

How can we use this? The collections class has an overloaded version of the sort method (http://java.sun.com/javase/6/docs/api/java/util/Collections.html#sort(java.util.List, java.util.Comparator)). The overloaded version accepts two parameters: the list to be sorted and an instance of comparator. The comparator outcome will be used by the sorting algorithm to determine the elements' magnitude and relation. Changes from the previous examples, are shown in bold.


import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Example4 {
  public static void main(String[] args) {
    List<Student1> students = new ArrayList<Student1>();
    students.add(new Student1("Albert Attard", 65));
    students.add(new Student1("Mary Borg", 93));
    students.add(new Student1("Joe Vella", 47));
    students.add(new Student1("Paul Galea", 52));

    Collections.sort(students, new StudentGradeComparator1());
    System.out.println(students);
  }
}

How can we sort by name? We can create another class that implements comparator and compare the student class as required. All we need to do is replace the compare method as shown below.


  @Override
  public int compare(Student1 a, Student1 b) {
    return a.getName().compareTo(b.getName());
  }

Making the difference

Let's analyse the StudentGradeComparator1 class created above. This class has no fields, thus we can say that this class is stateless. Do we need to create more than one instance of this class? No. We don't need to have more than one instance of this class as the fields of this class (that are none) never change. So we can have one instance a always use it. This is referred to the singleton pattern. In simple terms, this pattern prevents the user to create more than one instance of this object. For more information about this, please refer to: http://java.sun.com/developer/JDCTechTips/2006/tt0113.html article. Why should we have it? Every object created consumes memory. Since our class (StudentGradeComparator1) is stateless, there is no need to enable/allow the user to create hundreds or thousands instance of this class as these will be identical. Changes are shown in bold.


import java.util.Comparator;

public class StudentGradeComparator2 implements Comparator<Student1> {

  private static final StudentGradeComparator2 instance = 
      new StudentGradeComparator2();

  public static StudentGradeComparator2 getInstance() {
    return instance;
  }

  private StudentGradeComparator2() {
  }

  @Override
  public int compare(Student1 a, Student1 b) {
    return a.getGrade() - b.getGrade();
  }
}

This got more complex, but in reality it got simpler. The student class only contain code related to the student, while the sorting classes contain the code required by the comparator. This means that each class is doing only thing.

How can we use it? Instead of creating an instance of the comparator, we call the get instance method: Collections.sort(students, StudentGradeComparator2.getInstance()) as shown in bold below:


import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Example5 {
  public static void main(String[] args) {
    List students = new ArrayList();
    students.add(new Student1("Albert Attard", 65));
    students.add(new Student1("Mary Borg", 93));
    students.add(new Student1("Joe Vella", 47));
    students.add(new Student1("Paul Galea", 52));

    Collections.sort(students, 
        StudentGradeComparator2.getInstance());
    System.out.println(students);
  }
}

How can we sort the students by their grade but in descending order? Taking this another level, we can have two instances of the comparator class (not singleton anymore): one to sort ascending and the other descending. Here we're extending the singleton concept to prevent the user from freely creating instances. We can remove the get instance method and allow access to the static fields as shown below. Since, the class is radically changed, no changes are highlighted.


import java.util.Comparator;

public class StudentGradeComparator3 implements Comparator {

  public static final StudentGradeComparator3 ASC = 
      new StudentGradeComparator3(1);

  public static final StudentGradeComparator3 DESC = 
      new StudentGradeComparator3(-1);

  private final int order;

  private StudentGradeComparator3(int order) {
    this.order = order;
  }

  @Override
  public int compare(Student1 a, Student1 b) {
    return order * (a.getGrade() - b.getGrade());
  }
}

Here we removed the get instance method and included two instances ASC and DESC. We added an integer (named order) with values 1 or -1 which is used to switch the sorting polarity. Note that when a number is multiplied by a negative number, its sign (negative or positive) is changed. Changing the sign will change the sorting order. This field is constant (not static) and is set through the solo private constructor by the two static fields.


import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Example6 {
  public static void main(String[] args) {
    List<Student1> students = new ArrayList<Student1>();
    students.add(new Student1("Albert Attard", 65));
    students.add(new Student1("Mary Borg", 93));
    students.add(new Student1("Joe Vella", 47));
    students.add(new Student1("Paul Galea", 52));

    Collections.sort(students, StudentGradeComparator3.ASC);
    System.out.println(students);

    Collections.sort(students, StudentGradeComparator3.DESC);
    System.out.println(students);
  }
}

Arrays

We can easily adopt the above example for arrays. Instead of using collection's sort method we use the array's version (http://java.sun.com/javase/6/docs/api/java/util/Arrays.html#sort(T[], java.util.Comparator)).


import java.util.Arrays;

public class Example7 {
  public static void main(String[] args) {
    Student1[] students = { new Student1("Albert Attard", 65),
        new Student1("Mary Borg", 93), 
        new Student1("Joe Vella", 47),
        new Student1("Paul Galea", 52) };

    Arrays.sort(students, StudentGradeComparator3.ASC);
    System.out.println(students);
  }
}

The same comparator is used to sort the array of students.

Conclusion

This article discusses how to perform sorting without having to worry about the sorting algorithm. From the collections Java doc:

"...the sorting algorithm is a modified mergesort (in which the merge is omitted if the highest element in the low sublist is less than the lowest element in the high sublist). This algorithm offers guaranteed n log(n) performance. The specified list must be modifiable, but need not be resizable. This implementation dumps the specified list into an array, sorts the array, and iterates over the list resetting each element from the corresponding position in the array. This avoids the n2 log(n) performance that would result from attempting to sort a linked list in place...

One thing to take away from this article is don't reinvent the wheel. Before implementing something, have a look around as most probably others did it before you

No comments:

Post a Comment