Sunday, July 11, 2010

Puzzle 61 - Set for Optimization?

I'm back with an optimization question. Lately I've been dealing with some huge volume of data and I've realized how performance can start to be a serious problem as the data increases. Here is a small illustration with relatively less data (50,000 values). Hopefully the question should be self explanatory from the code.

package com.test; import java.util.List; import java.util.ArrayList; class MyCollection{ public static void populateList(List<Long> l, int multiple){ for(int i=0; i<50000; i++){ Long value = Long.valueOf(i*multiple); l.add(value); } } public static void main(String[] args) { List<Long> listOf2 = new ArrayList<Long>(); populateList(listOf2, 2); List<Long> listOf3 = new ArrayList<Long>(); populateList(listOf3, 3); long startTimestamp, endTimestamp; List<Long> commonA = null, commonB = null, commonC = null; //First Attempt - Runs in 60 seconds startTimestamp = System.currentTimeMillis(); commonA = new ArrayList<Long>(listOf2); commonA.retainAll(listOf3); endTimestamp = System.currentTimeMillis(); System.out.println("Execution Time : " + (endTimestamp-startTimestamp)/1000); //Second Attempt - Runs in 73 seconds - //There are fewer elements in commonB shouldn't this run faster? startTimestamp = System.currentTimeMillis(); commonB = new ArrayList<Long>(listOf3); commonB.retainAll(listOf2); endTimestamp = System.currentTimeMillis(); System.out.println("Execution Time : " + (endTimestamp-startTimestamp)/1000); System.out.println("Are Equal : " + (commonA.equals(commonB))); //Third Attempt - Runs in 2 seconds startTimestamp = System.currentTimeMillis(); /* This part has been intentionally left blank. * That is because I need to have a question for the puzzle, * and I felt like leaving out the Third part would be the right * thing do to. * I've done most of the hard work so this should be easy to fill. * Yes this must run 10 times faster than the solution I have provided * and should be done just as many lines (3 to 5 lines of code should be fine!). * Well I never said anything about life being fair, did I? * */ endTimestamp = System.currentTimeMillis(); System.out.println("Execution Time : " + (endTimestamp-startTimestamp)/1000); System.out.println("Are Equal : " + (commonA.equals(commonC))); } }


Just keep in mind that both the "Are Equal" print statements print true and that you don't use the reference commonA or commonB when writing the code for the third part. You are free to make any other changes (in the place where the comments are there).

Update - The commonA, commonB were incorrectly pointing to the objects listOf2/listof3. That has been corrected to create new Objects -- it now reads commonA = new ArrayList(listOf2); instead of commonA = listOf2;.
Thanks to Colin Hebert for pointing it out.

17 comments:

  1. You forgot to reset your list between the first and the second attempt. That's why your test isn't correct.

    At line 28 and 35 you pass a reference so you modify the original List at each time.

    To make it work faster on the third attempt you just have to write :
    commonC = listOf3;

    ReplyDelete
  2. @Colin - Eeeks! Your right. Not my intention at all. I've corrected the puzzle and I'm hoping I've got it right this time.

    ReplyDelete
  3. For this particular case we need to find the numbers that divide by 2 and 3 smaller then 50000 * 2.

    commonC = new ArrayList();
    for ( int i = 0; i < ( 50000 * 2 ) / 3 ; i+=2 )
    {
    Long value = Long.valueOf(i*3);
    commonC.add( value );
    }

    ReplyDelete
  4. Convert lists to arrays:

    Long[] array2 = listOf2.toArray(new Long[listOf2.size()]);
    Long[] array3 = listOf3.toArray(new Long[listOf3.size()]);

    for (Long item : array2) { if (Arrays.binarySearch(array3, item) >= 0) commonC.add(item); }

    ReplyDelete
  5. The data set you are using is a Set and retainAll is a set operation. If you use a set it will run 2000x faster or more. By tuning the JVM you can get each operation down to 12 milli-seconds.

    ReplyDelete
  6. Long[] array = new Long[16667];
    for(int i=0; i<array.length; i++) {
    array[i] = Long.valueOf(i*6);
    }
    commonC = Arrays.asList(array);

    ReplyDelete
  7. final Set setC = new HashSet(listOf3);
    commonC = new LinkedList();
    for (final Long value : listOf2){
    if (setC.contains(value)){
    commonC.add(value);
    }
    }

    Execution Time : 4297
    Execution Time : 4437
    Are Equal : true
    Execution Time : 36
    Are Equal : true

    ReplyDelete
  8. Since both lists are sorted, following solution works:

    ...
    commonC = new ArrayList();

    int i = 0;

    for (Long elem : listOf2) {
    long res = elem.longValue() - listOf3.get(i).longValue();

    if (res >= 0) {

    if (res == 0) {
    commonC.add(elem);
    }

    i++;
    }
    }
    ...

    But in my environment is not in 10 times, but almost 1000 times faster:


    Execution Time : 4234
    Execution Time : 4149
    Are Equal : true #16667
    Execution Time : 11
    Are Equal : true #16667

    (I'm displaying milliseconds instead of seconds and after results of comparison total number of elements in collection is provided).
    With 100000 elements results even better:

    Execution Time : 16288
    Execution Time : 16608
    Are Equal : true #33334
    Execution Time : 18
    Are Equal : true #33334

    ReplyDelete
  9. First of all - great blog!

    I have a guess, but unfortunately based on a Set, not a List. To this particular case it doesn't matter, but in general case it makes my guess unacceptable.

    Here is the code:

    commonC = new LinkedList(listOf2);
    Set b = new HashSet(listOf3);
    commonC.retainAll(b);

    And the measurements:

    [A] Execution Time : 146
    [B] Execution Time : 153
    [B] Are Equal : true
    [C] Execution Time : 0 (505 milis)
    [C] Are Equal : true

    ReplyDelete
  10. //Well, I got it to 0 (about 10M nano). My //original "number 1" was about 16 seconds though.
    //I use a version of "merge-sort" tailored for this case
    //The general case would be a bit more complex - but the only real requirement is that the lists are sorted.


    commonC = new ArrayList(listOf2.size());
    int j=0;
    for(int i=0; i current3) {
    j++;
    current3 = listOf3.get(j);
    }
    if(current2.longValue() == current3.longValue()) {
    commonC.add(current2);
    }
    }

    ReplyDelete
  11. Oh, and I forgot. The factor is x2000, not x10.

    ReplyDelete
  12. Another solution which does not depend on order of list elements:


    Set commonE = new HashSet(listOf2);

    commonC = new ArrayList();

    for (Long value : listOf3) {
    if (commonE.contains(value)) {
    commonC.add(value);
    }
    }

    Results of execution on my PC:

    For 50000 elements:

    Execution Time : 4238
    Execution Time : 4158
    Are Equal : true #16667
    Execution Time : 27
    Are Equal : true #16667

    For 100000 elements:

    Execution Time : 16288
    Execution Time : 16609
    Are Equal : true #33334
    Execution Time : 39
    Are Equal : true #33334

    ReplyDelete
  13. Attempts 1 and 2 are the same.
    new ArrayList(listOf2).retainAll(listOf3) is the same as new ArrayList(listOf3).retainAll(listOf2).

    Btw both attempts run in slightly over 4 seconds on my simple laptop. Why does it take 60 sec when you run it?

    ReplyDelete
  14. @Wouter - Umm I got a 5 year old P4 with 1G ram and stuff runs kinda really slow on it. If someone find the code running fast enough on their machine, I suggest increasing the LOOP value from 50,000 to 150,000 or higher. The performance contrast should be obvious.

    @All - There are lot of other comments (with answers). I'll be approving everything this weekend. Comments are hidden to prevent 'spoilers' for anyone reading the puzzle!

    ReplyDelete
  15. Hi there! My try is:

    startTimestamp = System.currentTimeMillis();
    HashSet hashSet = new HashSet(listOf2);
    commonC = new ArrayList();
    for (Long val : listOf3) {
    if (!hashSet.add(val)) { commonC.add(val); }
    }

    ReplyDelete
  16. Collections.sort(listOf3);
    commonC = new ArrayList();
    int result = 0;
    for (Long long1 : listOf2) {
    result = Collections.binarySearch(listOf3, long1);
    if (result >= 0) {
    commonC.add(long1);
    }
    }

    ReplyDelete
  17. commonC = new ArrayList();
    for (Long _l : listOf2)
    {
    if ((_l % 3) == 0)
    commonC.add(_l);
    }

    ReplyDelete

Solution for this question?