Insertion Sort, Merge Sort and Quick Sort

Introduction

Insertion Sort Algorithm

Straight Insertion Sort:

Think of the n elements to be sorted as two tables: an ordered table and an unordered table. At the beginning, the ordered table contains only one element, while the unordered table contains the remaining n-1 elements. In each step, the first element is taken out of the unordered table and inserted into its proper position in the ordered table, producing a new, longer ordered table. Repeating this n-1 times completes the sort.

What we should do:

  • Take the first number in the unordered region and find its corresponding position in the ordered region.
  • Insert it there; if necessary, shift the affected elements of the ordered region one position to the right. (A short trace follows.)
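
For example, sorting [5, 2, 4, 1]: the ordered region starts as [5]. Inserting 2 shifts 5 right, giving [2, 5 | 4, 1]; inserting 4 shifts 5 right, giving [2, 4, 5 | 1]; inserting 1 shifts 5, 4 and 2 right, giving [1, 2, 4, 5].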

Merge Sort Algorithm

Operation

Merge already-ordered subsequences to obtain a completely ordered sequence: first make each subsequence ordered, then merge the ordered segments. Merging two ordered tables into one ordered table is called a two-way merge.
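
For example, a two-way merge of [1, 3, 5] and [2, 4] repeatedly removes the smaller of the two front elements, producing 1, 2, 3, 4, 5.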

Application scenarios

Its speed is second only to quicksort, and it is a stable sorting algorithm. It is typically used for sequences that are unordered overall but whose subsegments are already relatively ordered.

Quick Sort Algorithm

Operation

After one pass of partitioning, the data to be sorted is divided into two independent parts such that every element of one part is smaller than every element of the other. Each part is then quicksorted recursively in the same way.
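
For example, one partition of [3, 7, 1, 5, 2] with pivot 3 leaves 3 in its final position, with {1, 2} before it and {5, 7} after it (each side in some internal order).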

Application scenarios

Quicksort is unstable. It needs no auxiliary array (only O(log n) stack space for the recursion). Its typical application scenario is large-scale in-memory sorting, where its practical performance is usually better than merge sort's.

Code

Algorithm Code

Insertion Sort Algorithm

/*Insertion Sorting*/

void InsertSortUp(int A[], int n)
{
    for (int i = 1; i < n; i++) {
        int key = A[i];              /* element to insert into A[0..i-1] */
        int j;
        /* test j >= 0 before reading A[j] to avoid an out-of-bounds access */
        for (j = i - 1; j >= 0 && A[j] > key; j--)
            A[j + 1] = A[j];         /* shift larger elements right */
        A[j + 1] = key;
    }
}

Merge Sort Algorithm

/*MergeSort*/

void Merge(int A[], int p, int q, int r)
{
    int n1 = q - p + 1, n2 = r - q;
    int L[n1], R[n2];                    /* copies of the two halves */
    int i, j;
    for (i = 0; i < n1; i++)
        L[i] = A[p + i];
    for (j = 0; j < n2; j++)
        R[j] = A[q + 1 + j];

    i = 0; j = 0;
    for (int k = p; k <= r; k++) {
        if (i == n1) {                   /* left half exhausted */
            while (j < n2) A[k++] = R[j++];
        } else if (j == n2) {            /* right half exhausted */
            while (i < n1) A[k++] = L[i++];
        } else if (L[i] <= R[j]) {       /* <= keeps the sort stable */
            A[k] = L[i++];
        } else {
            A[k] = R[j++];
        }
    }
}

void MergeSortUp(int A[], int p, int r)
{
    if (p < r) {
        int q = p + (r - p) / 2;         /* midpoint; avoids overflow of p + r */
        MergeSortUp(A, p, q);
        MergeSortUp(A, q + 1, r);
        Merge(A, p, q, r);
    }
}

Quick Sort Algorithm

/*QuickSort*/
void swap(int A[], int i, int j)
{
    int x = A[i];
    A[i] = A[j];
    A[j] = x;
}

/* Partition A[p..r] around the pivot A[p]; return the pivot's final index. */
int partition(int A[], int p, int r)
{
    int x = A[p];                        /* pivot: the first element */
    int i = p;                           /* A[p+1..i] holds elements < pivot */
    for (int j = p + 1; j <= r; j++) {
        if (A[j] < x) {
            i = i + 1;
            swap(A, i, j);
        }
    }
    swap(A, p, i);                       /* drop the pivot into place */
    return i;
}

void QuickSortUp(int A[], int p, int r)
{
    if (p < r) {
        int q = partition(A, p, r);
        QuickSortUp(A, p, q - 1);
        QuickSortUp(A, q + 1, r);
    }
}

Test Code

/*Test*/
#include<ctime>
#include<stdlib.h>
#include<stdio.h>
#define size 10000
#include"InsertSort.cpp"
#include"MergeSort.cpp"
#include"QuickSort.cpp"

int A[size];
clock_t start,finish;
double runtime_insert,runtime_merge,runtime_quick;
/*Generating test data set*/
void tstdata(int n)
{
    FILE *fp;
    if ((fp = fopen("tstdata.txt", "w+")) == NULL) {
        printf("can't open the file.\n");
        return;
    }
    srand(time(NULL));
    for (int i = 1; i <= n; i++) {
        if (i != n) fprintf(fp, "%d ", rand());
        else        fprintf(fp, "%d\n", rand());
    }
    fclose(fp);
}
/*output function*/
int output(const char *filename, int A[])
{
    FILE *fp;
    if ((fp = fopen(filename, "w+")) == NULL) {
        printf("can't open the file.\n");
        return -1;
    }
    for (int i = 0; i < size; i++) {
        if (i != size - 1) fprintf(fp, "%d ", A[i]);
        else               fprintf(fp, "%d\n", A[i]);
    }
    fclose(fp);
    return 0;
}
int main()
{
    int i = 0;

    /*generating test data*/
    /*tstdata(size);
    printf("Data set has been created.\n");*/

    /*get the test data*/
    FILE *fp;
    if ((fp = fopen("tstdata.txt", "r")) == NULL) {
        printf("can't open the file.\n");
        return 1;                        /* don't read from a NULL stream */
    }
    while (fscanf(fp, "%d", &A[i]) != EOF)
        i++;
    fclose(fp);
    for (i = 0; i < size; i++)
        printf("%d ", A[i]);
    printf("\n");
    printf("Array has been created.\n");

    /*copy the test data set*/
    int A1[size], A2[size], A3[size];
    for (int i = 0; i < size; i++) {
        A1[i] = A[i];
        A2[i] = A[i];
        A3[i] = A[i];
    }

    /*Insert Sorting*/
    printf("Insert Sorting...\n");

    start = clock();
    InsertSortUp(A1, size);
    finish = clock();

    output("InsertSortUp.txt", A1);
    runtime_insert = (double)(finish - start) / CLOCKS_PER_SEC;
    printf("Insert Sort has been finished.\nTime Cost:%lf\n", runtime_insert);

    /*Merge Sorting*/
    printf("Merge Sorting...\n");

    start = clock();
    MergeSortUp(A2, 0, size - 1);
    finish = clock();

    output("MergeSortUp.txt", A2);
    runtime_merge = (double)(finish - start) / CLOCKS_PER_SEC;
    printf("Merge Sort has been finished.\nTime Cost:%lf\n", runtime_merge);

    /*Quick Sorting*/
    printf("Quick Sorting...\n");

    start = clock();
    QuickSortUp(A3, 0, size - 1);
    finish = clock();

    output("QuickSortUp.txt", A3);
    runtime_quick = (double)(finish - start) / CLOCKS_PER_SEC;
    printf("Quick Sort has been finished.\nTime Cost:%lf\n", runtime_quick);

    return 0;
}

Test

I prepared 12 test datasets for the experiments, with data sizes of 10,000, 50,000 and 100,000:

  • For each data size, there is a randomly generated dataset (to test the average case) and a reverse-ordered dataset (to test the worst case).

  • To reduce measurement error, two datasets of each kind were prepared, giving 3 × 2 × 2 = 12 datasets in total.

  • The correctness of the algorithms was verified manually on small inputs, so here we focus only on comparing running times.

Test Data

Random test data are generated by the "generating test data" part of the main function (commented out above).

To test the worst case of the sorting algorithms, the corresponding "reverse order" datasets are produced with the quicksort algorithm; a sketch is given below.
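
A minimal sketch of how such a reversed dataset can be produced (the helper name tstdata_reversed and its file handling are illustrative, not part of the original program):

/* Hypothetical helper: sort A ascending with the existing QuickSortUp(),
   then write it to fname back to front, yielding a descending dataset. */
void tstdata_reversed(int A[], int n, const char *fname)
{
    QuickSortUp(A, 0, n - 1);              /* ascending order first */
    FILE *fp = fopen(fname, "w+");
    if (fp == NULL) { printf("can't open the file.\n"); return; }
    for (int i = n - 1; i >= 0; i--)       /* emit largest to smallest */
        fprintf(fp, (i != 0) ? "%d " : "%d\n", A[i]);
    fclose(fp);
}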

Result

Dataset                                     Insertion Sort (s)  Merge Sort (s)  Quick Sort (s)
S1:  10,000, random array (average case)    0.082               0.001           0.002
S2:  10,000, random array (average case)    0.069               0.002           0.001
S3:  10,000, reversed array (worst case)    0.469               0.003           0.554
S4:  10,000, reversed array (worst case)    0.463               0.003           0.546
S5:  50,000, random array (average case)    1.718               0.009           0.009
S6:  50,000, random array (average case)    1.881               0.010           0.009
S7:  50,000, reversed array (worst case)    8.540               0.015           /
S8:  50,000, reversed array (worst case)    7.956               0.015           /
S9:  100,000, random array (average case)   7.023               0.022           0.017
S10: 100,000, random array (average case)   6.860               0.020           0.018
S11: 100,000, reversed array (worst case)   31.549              0.027           /
S12: 100,000, reversed array (worst case)   30.989              0.029           /

Note: for the 50,000- and 100,000-element reversed datasets ("/" in the table), the quicksort program could not finish; with the first element as pivot, a reversed input drives the recursion depth to about n, which most likely overflows the call stack.

Algorithm complexity analysis

Insertion Sort Algorithm

  • Best condition (already sorted array)
    Each element is compared once and moved twice (into and out of the temporary variable), i.e. about n - 1 comparisons and 2(n - 1) moves: O(n).
  • Worst condition (reversed array)
    The i-th insertion compares with and shifts about i elements, so roughly n(n - 1)/2 comparisons and n(n - 1)/2 moves in total: O(n^2).

Therefore, the time complexity of straight insertion sort is O(n^2), and its running time depends on the initial order of the sequence to be sorted.

Merge Sort Algorithm

According to the recursion-tree (iterative drawing) method, the complexity of merge sort is O(n log n), and its running time is independent of the initial order of the sequence to be sorted.
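
As a worked recurrence: merging two sorted halves of total length n costs cn, so T(n) = 2T(n/2) + cn. The recursion tree has about log2(n) levels, each costing cn in total, hence T(n) = O(n log n).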

Quick Sort Algorithm

  • Worst condition
    For an input in order or in reverse order, each partition fixes the position of only one element, so the worst-case time complexity is O(n^2).
  • Average condition
    O(n log n). The more evenly the pivot divides the two sides, the smaller the running time. (A worked recurrence follows this list.)
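
As a worked recurrence: in the worst case T(n) = T(n-1) + cn, which sums to c(n + (n-1) + ... + 1) = O(n^2); with even splits T(n) = 2T(n/2) + cn, the same recurrence as merge sort, which gives O(n log n).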

Optimizations

Sentinels in merge sort

In merge sort, when merging two sorted sequences, the code above handles the exhaustion of either side like this:

if (i == n1) {
    while (j < n2) A[k++] = R[j++];
} else if (j == n2) {
    while (i < n1) A[k++] = L[i++];
}

If a sentinel, i.e. the maximum value INT_MAX, is appended to the right end of each of the two sequences to be merged, there is no need to test whether a sequence has been exhausted, which effectively reduces the number of comparisons. A sketch is given below.
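
A minimal sketch of Merge() rewritten with sentinels, assuming no input element equals INT_MAX:

#include <limits.h>

/* Merge with sentinels: one extra INT_MAX slot per side removes the
   i == n1 / j == n2 tests from the merge loop entirely. */
void MergeSentinel(int A[], int p, int q, int r)
{
    int n1 = q - p + 1, n2 = r - q;
    int L[n1 + 1], R[n2 + 1];
    for (int i = 0; i < n1; i++) L[i] = A[p + i];
    for (int j = 0; j < n2; j++) R[j] = A[q + 1 + j];
    L[n1] = INT_MAX;                 /* sentinels: never smaller than */
    R[n2] = INT_MAX;                 /* any real element              */

    int i = 0, j = 0;
    for (int k = p; k <= r; k++) {   /* exactly n1 + n2 iterations */
        if (L[i] <= R[j]) A[k] = L[i++];
        else              A[k] = R[j++];
    }
}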

How to choose the pivot in quicksort

The choice of pivot has a significant impact on the running time of quicksort. From the reversed test data above, we can see that quicksort degrades to O(n^2) if the pivot selected in each round is the maximum or minimum value.

Picking the pivot at random in each round usually yields better results.

I adopted the median-of-three method: select the median of the first, middle and last elements as the pivot, which effectively avoids the worst condition.

/*median-of-three pivot rule*/
int choosePivotMedianOfThree(int a[], int l, int r)
{
    int mid;
    if ((r - l + 1) % 2 == 0)
        mid = l + (r - l + 1) / 2 - 1;   /* even length: left of center */
    else
        mid = l + (r - l + 1) / 2;       /* odd length: exact center */

    /* a[x] is the median iff (a[x]-a[y]) and (a[x]-a[z]) differ in sign;
       this sign test assumes the differences do not overflow int */
    if ((a[l] - a[mid]) * (a[l] - a[r]) <= 0)
        return l;
    else if ((a[mid] - a[l]) * (a[mid] - a[r]) <= 0)
        return mid;
    else
        return r;
}

Then swap the selected pivot with the first element of the subarray, so the existing partition() can be reused.
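
A minimal sketch of this glue (the wrapper name partitionM3 is mine, not from the original code):

/* Hypothetical wrapper: pick the median-of-three pivot, move it to the
   front, then reuse the existing partition(), which expects the pivot
   at A[p]. */
int partitionM3(int A[], int p, int r)
{
    int piv = choosePivotMedianOfThree(A, p, r);
    swap(A, p, piv);
    return partition(A, p, r);
}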

Improve the stability

The reason quicksort is unstable is the last step of partition(): the pivot A[p] is swapped with the element at position i, which can carry an element past another of equal value.

Example (elements of equal value are distinguished by '):

  • 3 1 3' 5 2 6 1'
  • 3 1 2 5 3' 6 1'
  • 3 1 2 1' 3' 6 5
    (No problem so far.)
  • 1' 1 2 3 3' 6 5
    (The final swap reverses the order of 1' and 1.)

Solution: at the last step of each partition, instead of the single swap, shift the segment A[p+1..i] one position to the left and place the pivot at position i; the elements smaller than the pivot, including equal-valued ones, then keep their relative order. A sketch follows.
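
A minimal sketch of this stabilized final step (partitionRotate is my own name; note that it only removes the instability of the final swap, not of the swaps inside the loop):

/* Same loop as partition(), but ending in a left rotation of A[p..i]
   instead of one swap, so the elements smaller than the pivot keep
   their relative order. */
int partitionRotate(int A[], int p, int r)
{
    int x = A[p];
    int i = p;
    for (int j = p + 1; j <= r; j++) {
        if (A[j] < x) {
            i = i + 1;
            swap(A, i, j);
        }
    }
    for (int k = p; k < i; k++)      /* shift A[p+1..i] left by one ... */
        A[k] = A[k + 1];
    A[i] = x;                        /* ... and drop the pivot into slot i */
    return i;
}

On the example above this yields 1 2 1' 3 3' 6 5, keeping 1 in front of 1'.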

Summary

  • In this experiment, I learned three important algorithms: insertion sort, merge sort and quicksort, and came to understand their principles and applicable scenarios.

  • I find merge sort more robust than quicksort: its running time does not depend on the initial order of the sequence, and it is a stable sort. Perhaps because the datasets were not large enough, I have not yet fully seen the time advantage of quicksort.

  • The worst-case time complexity can differ greatly from the average-case complexity, so both should be taken into account when analyzing an algorithm in the future.