05-09-2002 10:52 AM
I seem to have coded myself into a corner. I am working on a fairly complex set of financial reports written mainly in awk. The reports work but take a very long time to complete. I have found that my main problem is the sort. The sort example I use was taken from the O'reilly Book "Sed & Awk" on page 222. It is very compact and works well when the array is small but when processing about 5000 accounts can take several minutes to finish. My data is actually accumulated in a number of parallel arrays. I also have an array which looks like this:
The first field is the account number and the second field is the
index in the original arrays. I then sort this array and output the original parallel array values using the second field of this sorted array to access the data. The reports now print in account number order just as they should. My only problem is the speed of the sort. I am thinking about writing the data to a file and then calling the sort program but maybe someone has a better idea.
Thanks in advance, Neil
Solved! Go to Solution.
05-09-2002 11:09 AM
I had to pull out my copy of 'Sed & Awk' to see what you were talking about but as soon as I did I saw what your problem is. This is a classic 'bubble sort' and its performances degrades with the square of the number of elements.
Before I give you a good answer, I suppose I should point out that all that parallel array nonsense (excuse me - code) could have been avoided with the use of perl (specifically the Class:Struct module) which could have kept all your data related to one account in a nice box (a struct/record).
The good news is that in my awk code library, I have a sort function that will do the trick and should do in at most a few seconds what takes your bubblesort minutes. (It does assume that your arrays are numeric indexed rather than associative and the lowest index is 1).
use it like this:
Arg 1 = the array to sort
Arg 2 = 1 (because this is a recursive function it needs an array lower bounds; should always be 1)
Arg 3 = number of elements in the array
This should fix you, Clay
05-09-2002 11:53 AM
I've tried to learn Perl but I seem to get bogged down everytime I try. Clay, I put in your sort function and sorts that were taking 15 minutes are now done in 1 or 2 seconds!!! I can only say wow!!! I am truly amazed because your sort is only a few lines longer than mine. My users are going to be impressed with 'my' work.
Thanks again, Neil
05-09-2002 12:00 PM
I'm glad you liked the sort function. The fundamental problem with your sort is that the exchange interval (1) is so small that repeated passes are required to sort the data.
I actually have a better function that combines the best features of quicksort and bubblesort by switching to bubblesort when the array boundaries are small. Bubblesort excels in this environment. Since you are already in the 1-2 second range and seem happy, I don't think I will bother.
05-10-2002 07:29 AM
I just saw your last post. Would you mind posting your better sort function? I would really like to squeeze all the speed out of this program that I can.
05-10-2002 08:06 AM
The 4th argument is the 'bubblepoint' and whenever a quicksort partition size falls to this value, bubblesort is used instead.
The 'bubblepoint' should probably be kept in the range of 4 to 10 - 6 is an all-around good value for the vast majority of datasets.