Automated Decryption of Polyalphabetic Ciphers
Vijai Gandikota
February 25, 2002

Contents
AUTOMATED DECRYPTION OF POLYALPHABETIC CIPHERS
DESCRIPTION OF THE PROGRAMMING ENVIRONMENT
DETAILED DESCRIPTION OF THE APPROACH
    Determination of Number of alphabets
        Kasiski Method
        IOC Calculation
    Cipher text alphabets’ frequency determination
    Decryption
        Standard frequency based allocation
        Determination of “e” and “t”
        Determination of “h”, “a”, “i” and confirmation of “e”
        Determination of “n”, “c”, “r”, “s”
        Determination of “l” and confirmation of “s”
    Time of run
TESTING
RESULTS
DISCUSSION
ISSUES ENCOUNTERED
PROMISING IDEAS OF IMPROVEMENT
REFERENCES
CODE LISTING

Description of the programming environment
The following table lists the various aspects of the programming environment used to develop this application.
Hardware                         Sun Ultra-4
OS Version                       5.7
Processor Type                   Sparc
Language                         C++
Compiler                         gcc
Source code file                 Split1.cpp
Final Version Location           /home/vijai/version3/version4/version5
Libraries                        math.h, stdio.h, stdlib.h, string.h, ctype.h
Input encrypted file             encrypt.txt
Output decrypted file            decrypted.txt
Execution command                a.out <encrypted_filename>
Other programs used for testing  freqcnt.cpp  : for IOC verification
                                 patid.cpp    : for verification of factors generated using the Kasiski method
                                 testcnt.cpp  : to test and compare the output decrypted file with the original plain text file
                                 test2000.cpp : to test and compare the first 2000 characters in the original and the decrypted file
                                 location     : /home/vijai/CS691/version3/version4/version5/testing

Detailed Description of the approach

Determination of Number of alphabets
The number of alphabets was determined through a two-pronged approach, the Kasiski method and the index of coincidence (IOC), to improve the accuracy of the prediction.

Kasiski Method
Algorithm
1. Determine the number of characters in the text and load the text into memory.
2. Identify repeated patterns of three or more characters, up to 20 characters (an arbitrary limit), in the encrypted text.
3. Store the locations of the repeated patterns in a file (later in memory).
4. Store the distances between each occurrence in memory.
5. Find all the factors of each distance and record the number of times each factor occurs. (This was implemented by dividing each distance by 1, 2, 3, 4 or 5 and checking for a zero remainder.)
6. Find the factor that appears most often and predict it as the most likely candidate for the number of alphabets.
7. Compare this with the value returned from IOC. As IOC showed a lot of variance in its predictions during testing, Kasiski was used to override IOC in case of conflict, based on the number of alphabets predicted by the two.
   Optimization of the prediction of the number of alphabets
   i.   If IOC predicts 1 or 2, set the number of alphabets to that value.
   ii.  If Kasiski predicts 2 and IOC predicts 4, and the difference in the number of factors for 2 and 4 alphabets in the Kasiski prediction is less than 160, choose 4 (as 2 is a factor of 4).
   iii. If Kasiski predicts 3 or 5, set the number to that value.
8. Use the predicted value to further check the number of alphabets.
   i.   Read the cipher text into the columns of a multidimensional array, where the number of columns equals the number of alphabets predicted. That is, if the number of alphabets is 2, read the first character into the first set, the second character into the second set, the third character into the first set, and so on.
   ii.  For each set of characters determine the IOC. If it does not match English, report which set is failing the test and return after generating an error condition and specifying the two differing results. (Code beyond this, to go back to Kasiski and IOC re-calculation, could not be implemented due to lack of time.) If the IOC matches English for all sets, proceed to the next step.

IOC Calculation
1. Obtain the text loaded in memory.
2. Determine the number of times each character appears in the cipher text and store it.
3. Determine the IOC by computing the IC formula for the determined frequencies of individual characters:

   $$IC = \sum_{\lambda=a}^{z} \frac{Freq_\lambda \,(Freq_\lambda - 1)}{n\,(n-1)}$$

   where $Freq_\lambda$ is the frequency of occurrence of each character and $n$ is the total number of characters.
4. Predict the number of alphabets by assuming ranges about the optimal IOC published in the literature for 1, 2, 3, 4 and 5 alphabets. Please refer to the function “kasiski” in the code for the exact ranges specified.
5. Return this value to the Kasiski method for comparison and more accurate prediction.

Cipher text alphabets’ frequency determination
1. For each of the mono-alphabetic cipher-text sets, determine the frequency of each character.
2. Initialize a third-dimension column for each set to store the suggested plain text characters.
3.
   Each character of the cipher text is denoted by its position in the alphabet. For example, “a” in the cipher text has location 0 in the array, and the contents of location 0 hold the frequency of occurrence of “a”. Similarly, “z” has location 25.

Decryption
Note: No attempt is made to recreate or preserve the case of the characters from the original plain text file.

Standard frequency based allocation
For each set of mono-alphabetic cipher text characters, make initial guesses based on the frequency distribution of characters in English and assign plain text characters to the cipher characters in rank order. This is intended to identify “e” and “t”, the two most frequently occurring characters in that order, while taking a chance on the rest of the characters.

Determination of “e” and “t”
As mentioned above, in each set the two most frequently occurring characters are assigned the positional values of “e” and “t” in the English alphabet (i.e. 4 and 19, “a” being 0). These are then used to determine h, i, a, n, c, r and s through the identification of digrams.

Determination of “h”, “a”, “i” and confirmation of “e”
1. Find the occurrences of the character identified as “t” in the cipher text and find the frequency of each character that occurs after it.
2. Take the top three characters: the first should point to “h”, the second to “i” and the third to “e”. While the frequency of the digram “th” is clearly far higher than that of any other digram with “t” as first character, “ti” and “te” have closer frequencies.
3. Identify the characters for “i” and “e” and compare them with the “e” identified above. The character that does not match the character for “e” represents “i”. (Note that, because of the time lost to the incorrect assumption described under Issues Encountered, the code that distinguishes “i” from “e” currently works only when the number of alphabets is 1. Given more time, this could have been implemented for the other numbers of alphabets, which should raise accuracy.)
4. Subsequently find the occurrences of “t” and obtain the frequencies of all characters appearing before the cipher character for “t”.
5. The most frequently occurring of these is the cipher text character representing “a”.
6. Assign all the characters thus identified their true values in the multidimensional array called alpha_freq[][][].

Determination of “n”, “c”, “r”, “s”
1. Similar to the identification of digrams for “t” above, identify all digrams with “e” as the first character. The top two frequencies determine “r” and “s”, and the next two frequencies determine “n” and “c”.
2. However, “n” has an individual frequency much greater than “c” in the English language. Use this to distinguish “n” and “c”.
3. “r” and “s” have similar frequencies of occurrence in English; therefore, at this time they are assigned in the order of the frequencies with which they appear, the larger one being “r”.

Determination of “l” and confirmation of “s”
1. As this section was initially implemented using the incorrect assumption described under Issues Encountered, it has been implemented only for the case where the number of alphabets is identified as 1.
2. Determine the frequency distribution of all characters that occur as pairs (e.g. “ee”, “ll”, “ss”). The characters with the top two frequencies among the common pairs represent “s” and “l”.
3. Compare the “s” identified here with the “r” and “s” identified above to make a more accurate prediction of “r” and “s”.
4. Given more time, this could be modified and used in conjunction with the characters identified above to verify the accuracy of the above predictions and to identify “l” for every number of alphabets used in the key.

Time of run
Code for measuring the time taken to run the application was not implemented, as in all cases program execution completed within 10 seconds.

Testing
1.
   Testing for number of alphabets in key
   Exhaustive testing was performed on the number of alphabets guessed by the Kasiski method and the IOC calculation method.
2. Testing and comparison of plain text file and decrypted file
   Tests were also performed on the decrypted text file, and comparisons were made with the original text file provided by the instructor.

Results
The following table lists the comparison results of the original texts and the decrypted texts.

Text File   Number of alphabets    % Correct characters    % Correct characters in
            used for encryption    in the complete text    the first 2000 characters
Text0       1                      17.85                   48.95
Text0       2                      15.70                   21.8
Text1       1                       6.87                    7.65
Text1       2                       5.88                    5.95
Text1       4                       5.82                    7.00
Text2       1                       7.74                   13.10
Text3       1                      24.8                    51.00
Text3       3                       5.32                    5.35
Text3       5                       5.70                    5.75
Text4       1                      12.80                   33.1
Text6       1                      13.03                   18.95

Discussion
1. As shown in the above table, between 5.3% and 25% of characters (1 to 7 characters of the alphabet) have been accurately identified in the encrypted texts. Moreover, in the first 2000 characters of the decrypted texts, up to 51% of characters (up to 13 characters of the alphabet) are correct. This is promising, as a little more intelligence incorporated into the program could allow it to decrypt a larger set of characters consistently.
2. Due to the use of frequency distributions, a number of characters fall in the range of other characters with the same frequency distributions. Therefore it is noticed that the IOC of the decrypted texts is very close to the IOC of the English language published in the literature [1]. As stated above, further intelligence incorporated into the program, through a larger set of digram and trigram comparisons as well as common pair comparisons than used here, can more accurately identify the correct characters.
3. In some cases, for example Text0 with 5 alphabets and Text6 with 5 alphabets, the IOC and the Kasiski method yield widely varying results, as suggested before in class.
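As an illustration of the IC formula given under IOC Calculation, the following is a minimal C++ sketch of the computation. The function name index_of_coincidence is hypothetical and does not appear in Split1.cpp; the sketch assumes ASCII input, folds case and ignores non-letters, which matches how the program counts characters. It is not the code used to produce the results above.

```cpp
#include <array>
#include <cctype>
#include <string>

// Hypothetical helper, for illustration only (not part of Split1.cpp).
// Computes IC = sum over letters of Freq*(Freq-1) / (n*(n-1)),
// where n is the number of letters in `text`.
double index_of_coincidence(const std::string& text) {
    std::array<long, 26> freq{};  // zero-initialised letter counts
    long n = 0;
    for (unsigned char ch : text) {
        const int lower = std::tolower(ch);
        if (lower >= 'a' && lower <= 'z') {  // count ASCII letters only
            ++freq[lower - 'a'];
            ++n;
        }
    }
    if (n < 2) return 0.0;  // the formula is undefined for fewer than two letters
    double sum = 0.0;
    for (long f : freq) sum += static_cast<double>(f) * (f - 1);
    return sum / (static_cast<double>(n) * (n - 1));
}
```

Values near the figure commonly cited for English (roughly 0.066) suggest a single alphabet, while values approaching 1/26 (about 0.038) suggest several; small samples move the result noticeably, which is one source of the variance in IOC-based predictions.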
   The discrepancy is especially pronounced because of the variation in the range of values produced by the IOC calculation.

Issues Encountered
Incorrect assumption: Some backtracking and re-coding had to be performed because the digram and trigram comparisons were initially, and incorrectly, made on the sets of monoalphabetic cipher text characters separated out for each alphabet in the key. (Adjacent characters within a separated set are not adjacent in the original text, so they do not form digrams.) This was identified in time and corrected, but it cost a lot of time. Since a number of validations of character predictions were implemented here (e.g. checking and rechecking whether an identified character is correct), a dramatic increase was seen in the number of characters identified when the number of alphabets in the key was 1.

Promising Ideas of Improvement
Given more time, the following could have been implemented, and more testing conducted, to improve the decryption of poly-alphabetic ciphers.

A. Algorithm Modifications
1. Calculation of number of alphabets: While evaluation and selection of the best guess for the number of alphabets was implemented, more selection criteria may be added to predict the number of alphabets accurately in all cases despite the inaccuracies in the IOC calculations. Another addition that was planned but could not be implemented due to lack of time was to loop back to the Kasiski and IOC prediction if the IOC calculated on the individual sets of mono-alphabetic ciphered characters did not match English. This would have greatly reduced the scope for inaccurate predictions.
2. Prediction of the characters: While predicting the characters, the approach can be modified to compare digrams of the first two identified characters (“e” and “t”) with “h” to better ascertain the accuracy of “t”. Having done this, we can implement similar comparisons for the other characters, such as “n”, “c”, “r” and “s”, sought to be decrypted in this program.
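The digram comparison described here can be sketched in C++ as follows. Both function names (follower_counts and most_frequent_follower) are hypothetical and are not taken from Split1.cpp; the sketch assumes a single cipher alphabet and ASCII text, and simply tallies which letter most often follows a given cipher letter, as the program does when it looks for the partner of “t” in “th”.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only (not part of Split1.cpp).
// Counts how often each letter immediately follows `first` in `text`,
// ignoring case. Index 0 corresponds to 'a', index 25 to 'z'.
std::array<int, 26> follower_counts(const std::string& text, char first) {
    std::array<int, 26> counts{};  // zero-initialised
    const int target = std::tolower(static_cast<unsigned char>(first));
    for (std::size_t i = 0; i + 1 < text.size(); ++i) {
        const int cur = std::tolower(static_cast<unsigned char>(text[i]));
        const int nxt = std::tolower(static_cast<unsigned char>(text[i + 1]));
        if (cur == target && nxt >= 'a' && nxt <= 'z') {
            ++counts[nxt - 'a'];
        }
    }
    return counts;
}

// Returns the index of the most frequent follower, or -1 if there is none.
// Ties are resolved in favour of the earlier letter.
int most_frequent_follower(const std::array<int, 26>& counts) {
    int best = -1;
    int best_count = 0;
    for (int i = 0; i < 26; ++i) {
        if (counts[i] > best_count) {
            best_count = counts[i];
            best = i;
        }
    }
    return best;
}
```

Calling follower_counts with the cipher letter guessed for “t” and checking whether most_frequent_follower agrees with the letter guessed for “h” is one way to cross-check the two guesses against each other.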
   Such comparisons would allow proper identification of characters whose frequencies are very similar to each other.

B. Code Modifications
1. Reduction of code through better reuse of code.
2. Better memory management, through reuse of memory locations and reduction in total memory usage.
3. Addition of more validations to check for invalid inputs at runtime, to prevent buffer overflows.

C. Additions to algorithms
1. More digram testing and comparisons, to identify more characters than has been implemented.
2. Addition of trigram, tetragram and pentagram comparisons.

References
[1] Charles Pfleeger, Security in Computing, Second Edition, Prentice Hall PTR.
[2] Polyalphabetic Ciphers, http://www.cs.nps.navy.mil/curricula/tracks/security/notes/chap01_1.html

Code Listing

/* This program takes the encrypted text and the suggested number of alphabets
   used in the encryption program and splits the text into sets.
   Other functions have been added to analyze the text, to predict the number
   of alphabets used to encipher a message, and to decrypt the text.
   Program Author: Vijai Gandikota
   Contact       : gandikotav@hotmail.com
   Date          : February 10, 2002
*/

// Header File Declarations
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

// Constant Declarations
#define MAX_ROWS 24000
#define MAX_COLS 5
#define ALPHABET 26

int char_cnt;

// Function Declarations
int ioc(char char_split_array[MAX_ROWS][MAX_COLS], int, int, int, char*);
int kasiski(int, char **, int*, int*);
double iocenc(char*, int);
void bsort(float*, int);

int main(int argc, char **argv)
{
    FILE *f1;
    int cnt, alpha_cnt, i, j, ctr, num_rows, total_num_rows, num_cols = 0;
    int c; // int rather than char, so that EOF returned by fgetc() can be detected
    char_cnt = alpha_cnt = 0;

    if (argc != 2) {
        printf("\n\t Please enter one filename with: %s \n", argv[0]);
        exit(1);
    }
    f1 = fopen(argv[1], "r");
    if (f1 == NULL) { // added check: the file may be missing or unreadable
        printf("\n\t Could not open file: %s \n", argv[1]);
        exit(1);
    }
    char_cnt = 0;
    /*
    while(!feof(f1)){
        c=fgetc(f1);
        if(c!=' ' && c!='\n' && ((c>='a') && (c<='z') || (c>='A') && (c<='Z')))
            char_cnt=char_cnt+1; //count the number of characters
    }
    rewind(f1);
    printf("Enter the number of used in encryption : ");
    scanf("%d", &alpha_cnt);
    fflush(stdin);
    */
    kasiski(argc, argv, &char_cnt, &alpha_cnt);

    // Added validation: the arrays below are sized by these two values
    if (char_cnt < 1 || alpha_cnt < 1 || alpha_cnt > MAX_COLS) {
        printf("\n\t Invalid character count (%d) or number of alphabets (%d)\n", char_cnt, alpha_cnt);
        exit(1);
    }

    char char_array[char_cnt]; // create an array of the number of characters (gcc variable-length array)
    i = 0;
    printf("***************************************************************");
    printf("\n DECRYPTION OF POLYALPHABETIC CIPHER ENCRYPTED TEXT \n");
    printf("***************************************************************");
    printf("\nNumber of characters in the encrypted text (char_cnt) : %d\n", char_cnt);
    printf("Number of alphabets used (alpha_cnt) is %d \n", alpha_cnt);

    // read all the letters from the file into the array, never past its end
    while (i < char_cnt && (c = fgetc(f1)) != EOF) {
        if (isalpha(c)) {
            char_array[i] = (char)c;
            i = i + 1;
        }
    }
    fclose(f1);
    //for(j=0;j<char_cnt;j++) printf(" %c ", char_array[j]);

    // Remember that the total number of columns is equal to the alpha_cnt
    if (char_cnt % alpha_cnt == 0)
        total_num_rows = char_cnt / alpha_cnt;
    else
        total_num_rows = (char_cnt / alpha_cnt) + 1;
    if (total_num_rows > MAX_ROWS) { // added check: the text must fit the fixed-size array
        printf("\n\t Text too long: %d rows needed, %d available\n", total_num_rows, MAX_ROWS);
        exit(1);
    }

    //char char_split_array[total_num_rows][alpha_cnt];//Declare the array
    char char_split_array[MAX_ROWS][MAX_COLS]; //Declare the array

    // Initialize the array
    for (i = 0; i < total_num_rows; i++) {
        for (j = 0; j < alpha_cnt; j++) {
            char_split_array[i][j] = ' ';
        }
    }
    printf("Total number of characters per set = %d \n", total_num_rows);

    // reset the values of i and j for later use
    i = 0;
    j = 0;

    // split the char_array into alpha_cnt number of sets
    for (num_cols = 0; num_cols < alpha_cnt; num_cols++) {
        for (ctr = num_cols, num_rows = 0; ctr < char_cnt && num_rows < total_num_rows; ctr = ctr + alpha_cnt, num_rows++) {
            char_split_array[num_rows][num_cols] = char_array[ctr];
        }//end for ctr
    }//end for num_cols

    // To print the char_split_array on screen for debugging
    /*
    for(i=0;i<total_num_rows;i++){
        for (j=0;j<alpha_cnt;j++){
            if(char_split_array[i][j]!=' ') printf(" %c ", char_split_array[i][j]);
            c=char_split_array[i][j];
        }
        if(c!=' ') printf("\n");
    }
    Now we have sets of characters
    Now we have to do the following
    1. Analyze each set for frequency distributions by storing all the
       distinct characters in an array of 26 by 1
    2. Account for the case where more than one alpha_cnt is suggested
    3. Identify unused letters
    4. Try to identify the most common characters t, e
    5. Try to identify digrams
    6. Try to identify common pairs
    7. Try to identify trigrams
    8. Try to identify tetragrams and pentagrams
    */

    // Call the function that takes over and does all this now
    ioc(char_split_array, alpha_cnt, total_num_rows, char_cnt, char_array);
    return 0;
}//end main

/* This program identifies the percentage of occurrence of characters appearing
   in an encrypted text file.
   IOC
   Program Author: Vijai Gandikota
   Contact       : gandikotav@hotmail.com
   Date          : February 10, 2002
*/
int ioc(char char_split_array[MAX_ROWS][MAX_COLS], int alpha_cnt, int total_num_rows, int char_cnt, char* ptr)
{
    FILE *fd;
    /* printf(" %d = alpha_cnt, %d total_num_rows, %d, char_cnt\n",alpha_cnt,total_num_rows, char_cnt); */
    char c, firstchar='y';
    int i,j,k,cnt=0, pos_ptr=0;
    // For each of the 26 letters in each set: [..][..][0] stores the frequency
    // and [..][..][1] stores the suggested plain text letter
    float alpha_freq[26][alpha_cnt][2];
    for(i=0;i<26;i++){
        for(j=0;j<alpha_cnt;j++){
            for(k=0;k<2;k++){
                alpha_freq[i][j][k]=0;
            }
        }
    }
    int rows_per_col[alpha_cnt];
    for(i=0; i<alpha_cnt;i++){
        for(j=0;j<total_num_rows;j++){
            c=char_split_array[j][i];
            //printf(" %c ", c);
            if(isalpha((unsigned char)c)) {
                cnt=cnt+1; //count the number of characters
                // Index 0..25 for 'a'..'z', upper and lower case counted together
                alpha_freq[tolower((unsigned char)c)-'a'][i][0]+=1;
            }//endif
        }//end second for
        // The last row of a column is blank when the text does not fill it
        if(c==' ') rows_per_col[i]=total_num_rows-1;
        else rows_per_col[i]=total_num_rows;
        /* printf(" Number of actual rows in col %d is : %d\n", i, rows_per_col[i]); */
    }//end first for

    char ptrchar=' ';
    char ptrcharnext=' ';
    char ptrcharprev=' ';
    float g1,g2, ind_ioc=0;
    double sum[alpha_cnt];
    for(i=0;i<alpha_cnt;i++) sum[i]=0;
    i=0;j=0;k=0;
    int t=0; int e=0;
    int g_freq=0; int g_freq2=0;
    g1=g2=0;
    int gt1=0; int gt2=0; int gs=0; int gl=0;
    int l=0; int m=0;
    int gth=0; int gth1=0;
    int e1,e2,e3,e4, er,es,en,ec,th,ti,te,t1,t2,t3;
    e1=e2=e3=e4=er=es=en=ec=t1=t2=t3=th=ti=te=0;
    float cmp[26];
    // English letters in ascending order of frequency
    int alphabet[26]={'z','q','j','x','k','v','b','y','w','g','f','p','u','m','c','l','d','h','r','i','n','s','o','a','t','e'};
    int char_table_t[26], char_table_e[26], char_table_aa[26];
    for(j=0;j<26;j++){
        cmp[j]=0;
        char_table_t[j]=0;  // table to store freq of digrams with T
        char_table_e[j]=0;  // table to store freq of digrams with E
        char_table_aa[j]=0; // table to store freq of common pairs eg AA, BB CC
    }

    for(j=0;j<alpha_cnt;j++){ // for each set of chars in char_split_array
        // Added: reset the running maxima so that values from an earlier set
        // do not carry over into this one
        g1=g2=0;
        g_freq=g_freq2=0;
        for(i=0;i<26;i++) {
            // One term of the IC sum: Freq*(Freq-1) / (n*(n-1))
            ind_ioc=(alpha_freq[i][j][0]/rows_per_col[j])*((alpha_freq[i][j][0]-1)/(rows_per_col[j]-1));
            sum[j]=sum[j]+ind_ioc;
            // Convert the count to a percentage of the set
            alpha_freq[i][j][0]=(alpha_freq[i][j][0]/rows_per_col[j])*100;
            if(alpha_freq[i][j][0]==0){ //Finding unused chars
                alpha_freq[i][j][1]=99;
            }
            if(alpha_freq[i][j][0]>g1){ //Finding T and E
                g2=g1;
                g1=alpha_freq[i][j][0];
                g_freq2=g_freq;
                g_freq=i;
            }//end if
            else {
                if(alpha_freq[i][j][0]>g2){
                    g2=alpha_freq[i][j][0];
                    g_freq2=i;
                }//endif
            }//end else
            cmp[i]=alpha_freq[i][j][0];
            //printf(" %d %d \n", g_freq, g_freq2);
        }//endfor i
        /* for(l=0;l<26;l++) printf(" %f ", cmp[l]); */
        //printf("\n###\n");
        bsort(cmp, ALPHABET);
        // for(l=0;l<26;l++) printf(" %f ", cmp[l]);
        // Map each cipher letter to the English letter of the same frequency rank
        for(l=0;l<26;l++){
            for(m=0;m<26;m++){
                if(alpha_freq[l][j][0]==cmp[m]) {
                    alpha_freq[l][j][1]=alphabet[m]-'a';
                    break;
                }
            }// end for
        }//end for
        for(l=0;l<26;l++) cmp[l]=0;
        l=m=0; // was "l==m==0;", a comparison with no effect
        alpha_freq[g_freq][j][1]=4;   // set the g_freq+1 th char in the jth set to e
        alpha_freq[g_freq2][j][1]=19; // set the g_freq2+1 th char in jth set to t
        //printf(" e = %d t= %d \n", g_freq, g_freq2);
        // we have found t and e so far

        /**********************************************************************
         FINDING DIGRAMS WITH T (H and A)
         *********************************************************************/
        for(l=0;l<26;l++) char_table_t[l]=0;
        if(alpha_cnt!=1){
            for(l=j;l<char_cnt-1;l=l+alpha_cnt){
                ptrchar=*(ptr+l);
                ptrcharnext=*(ptr+l+1);
                if(ptrchar=='a'+g_freq2){
                    // Count the letter that follows T (upper and lower case together)
                    if(isalpha((unsigned char)ptrcharnext))
                        char_table_t[tolower((unsigned char)ptrcharnext)-'a']++;
                }//end if
            }//end for
            gth=gth1=l=m=0;
            for(l=0;l<26;l++){
                //printf(" %d ", char_table_t[l]);
                if(char_table_t[l]>gth){
                    gth=char_table_t[l];
                    gth1=l;
                }
            }//end for
            //found the numerical value corresponding to h
            //printf("\n H should be %d \n", gth1);
            //check to see if that char in alpha_freq has numeric val of h
            if(alpha_freq[gth1][j][1]!=7){
                for(m=0;m<26;m++){
                    if(alpha_freq[m][j][1]==7){
                        alpha_freq[m][j][1]=alpha_freq[gth1][j][1]; // swap them
                        alpha_freq[gth1][j][1]=7;                   // and set that char to h
                    }//end if
                }// end for
            }//end if
            //SO I HAVE FOUND H!!!
            gth=gth1=l=m=0;
            /**********************************************************************
             Now to find A from AT
             **********************************************************************/
            for(l=0;l<26;l++) char_table_t[l]=0;
            for(l=j;l<char_cnt;l=l+alpha_cnt){
                ptrchar=*(ptr+l);
                if(l!=0){
                    ptrcharprev=*(ptr+l-1);
                    if(ptrchar=='a'+g_freq2){
                        // Count the letter that precedes T (upper and lower case together)
                        if(isalpha((unsigned char)ptrcharprev))
                            char_table_t[tolower((unsigned char)ptrcharprev)-'a']++;
                    }//end if
                }//end if l not equal to 0
            }//end for
            gth=gth1=l=m=0;
            for(l=0;l<26;l++){
                //printf(" %d ", char_table_t[l]);
                if(char_table_t[l]>gth){
                    gth=char_table_t[l];
                    gth1=l;
                }
            }//end for
            //found the numerical value corresponding to a
            //printf("\n A should be %d \n", gth1);
            //check to see if that char in alpha_freq has numeric val of a
            if(alpha_freq[gth1][j][1]!=0){
                for(m=0;m<26;m++){
                    if(alpha_freq[m][j][1]==0){
                        alpha_freq[m][j][1]=alpha_freq[gth1][j][1]; // swap them
                        alpha_freq[gth1][j][1]=0;                   // and set that char to a
                    }//end if
                }// end for
            }//end if
            //SO I HAVE FOUND A!!!
            gth=gth1=l=m=0;
            for(l=0;l<26;l++) char_table_t[l]=0;

            /***********************************************************
             Now to find R from the occurrences of ER in the complete text
             **********************************************************/
            for(l=j;l<char_cnt-1;l=l+alpha_cnt){
                ptrchar=*(ptr+l);
                ptrcharnext=*(ptr+l+1);
                if(ptrchar=='a'+g_freq){
                    // Count the letter that follows E (upper and lower case together)
                    if(isalpha((unsigned char)ptrcharnext))
                        char_table_t[tolower((unsigned char)ptrcharnext)-'a']++;
                }//end if
            }//end for
            gth=gth1=l=m=0;
            /*
            for(l=0;l<26;l++){
                //printf(" %d ", char_table_t[l]);
                if(char_table_t[l]>gth){
                    gth=char_table_t[l];
                    gth1=l;
                }
            }*/
            // Keep the four most frequent followers of E:
            // e1..e4 hold the counts, er/es/en/ec the matching letter indices
            for(k=0;k<26;k++){
                if(char_table_t[k]>e1){
                    e4=e3;e3=e2;e2=e1;e1=char_table_t[k];
                    ec=en;en=es;es=er;er=k;
                }
                else if(char_table_t[k]>e2){
                    e4=e3;e3=e2;e2=char_table_t[k];
                    ec=en;en=es;es=k;
                }
                else if(char_table_t[k]>e3){
                    e4=e3;e3=char_table_t[k];
                    ec=en;en=k;
                }
                else if(char_table_t[k]>e4){
                    e4=char_table_t[k];
                    ec=k; // was "e4=k;", which overwrote the count and never recorded the index
                }
            }//end for k

            // Of the third and fourth followers, the one with the higher
            // individual frequency is taken as n (13) and the other as c (2)
            if(alpha_freq[en][j][0]>alpha_freq[ec][j][0]){
                if(alpha_freq[en][j][1]!=13){
                    for(m=0;m<26;m++){
                        if(alpha_freq[m][j][1]==13){
                            alpha_freq[m][j][1]=alpha_freq[en][j][1]; // swap them
                            alpha_freq[en][j][1]=13;                  // and set that char to n
                        }// end if
                    }// end for
                }//end if
                if(alpha_freq[ec][j][1]!=2){
                    for(m=0;m<26;m++){
                        if(alpha_freq[m][j][1]==2){
                            alpha_freq[m][j][1]=alpha_freq[ec][j][1]; // swap them
        alpha_freq[ec][j][1]=2; // and then set that char to c
      }// end if
    }// end for
  }//end if
}
else {
  if(alpha_freq[ec][j][1]!=13){
    for(m=0;m<26;m++){
      if(alpha_freq[m][j][1]==13){
        alpha_freq[m][j][1]=alpha_freq[ec][j][1]; // swap them
        alpha_freq[ec][j][1]=13; // and then set that char to n
      }// end if
    }// end for
  }//end if
  if(alpha_freq[en][j][1]!=2){
    for(m=0;m<26;m++){
      if(alpha_freq[m][j][1]==2){
        alpha_freq[m][j][1]=alpha_freq[en][j][1]; // swap them
        alpha_freq[en][j][1]=2; // and then set that char to c
      }// end if
    }// end for
  }//end if
}
//Found N and C
// er is the index of the cipher letter corresponding to R;
// check whether that char in alpha_freq already has the numeric value of r (17)
if(alpha_freq[er][j][1]!=17){
  for(m=0;m<26;m++){
    if(alpha_freq[m][j][1]==17){
      alpha_freq[m][j][1]=alpha_freq[er][j][1]; // swap them
      alpha_freq[er][j][1]=17; // and then set that char to r
    }// end if
  }// end for
}//end if
//SO I HAVE FOUND R!!!
// es is the index of the cipher letter corresponding to S;
// check whether that char in alpha_freq already has the numeric value of s (18)
if(alpha_freq[es][j][1]!=18){
  for(m=0;m<26;m++){
    if(alpha_freq[m][j][1]==18){
      alpha_freq[m][j][1]=alpha_freq[es][j][1]; // swap them
      alpha_freq[es][j][1]=18; // and then set that char to s
    }// end if
  }// end for
}//end if
//SO I HAVE FOUND S!!!
gth=gth1=l=m=0;
k=ec=en=er=es=e1=e2=e3=e4=0;
}//end if
/*************************************************************************
//FINDING Digrams with T as the first character.
Looking for TH, TI and TE with frequencies in that order
*************************************************************************/
if(alpha_cnt==1){
// I just realized that the following will only work for this case.
// I had thought that I was decrypting up to 10 chars in all cases,
// but that is not so, as this section operates on char_split_array
// columns and not on the entire array.
// I can still do it for the case where the number of alphabets > 1,
// but I have to rearrange the following code and operate it on the
// complete set of encrypted characters with t and e marked.
// So I am going to start writing that function and try to finish it.
// Now to find h, i
for(k=0;k<rows_per_col[j];k++){
  if(((char_split_array[k][j])==('a'+g_freq2)) || ((char_split_array[k][j])==('A'+g_freq2))){
    // tally the letter that follows T, ignoring case
    int nxt=char_split_array[k+1][j];
    if(nxt>='a' && nxt<='z') char_table_t[nxt-'a']++;
    else if(nxt>='A' && nxt<='Z') char_table_t[nxt-'A']++;
  }//end if
}//end for k
// keep the three highest counts in t1..t3 and their letter indices in th, ti, te
for(k=0;k<26;k++){
  if(char_table_t[k]>t1){
    t3=t2;t2=t1;t1=char_table_t[k];
    te=ti;ti=th;th=k;
  }
  else if(char_table_t[k]>t2){
    t3=t2;t2=char_table_t[k];
    te=ti;ti=k;
  }
  else if(char_table_t[k]>t3){
    t3=char_table_t[k];
    te=k;
  }
}//end for k
printf("For Set %d th=%d, ti=%d te= %d\n",j,th,ti,te);
if(te==g_freq) { // e matches in both the calculations
  if(alpha_freq[th][j][1]==0){
    alpha_freq[th][j][1]=7;
    alpha_freq[ti][j][1]=8;
  }
}
else {
  if(ti==g_freq) { // ti is actually te and te is ti
    if(alpha_freq[th][j][1]==0){
      alpha_freq[th][j][1]=7;
      alpha_freq[te][j][1]=8;
    }//end if
  }// end if
}//end else
// Now that we have the frequencies of the digrams with t,
// get the letters with the top three frequencies. The third should be e,
// the first will be h and the second will be i.
// If the second is not i, then the second has to be e
// and the third will positively be i.
printf("\nChar table T %d \n",j);
for(k=0;k<26;k++){
  // printf(" %d,", char_table_t[k]);
  char_table_t[k]=0;
}
printf("\n");
/************************************************************************
FINDING Digrams with E as the first character. Looking for ER, ES, EN, EC
ER=143, ES= 132, EN=99.2, EC=80.9
After identifying candidates for en and ec use the fact that freq of n
is greater than that of c.
*************************************************************************/
for(k=0;k<rows_per_col[j];k++){
  if(((char_split_array[k][j])==('a'+g_freq)) || ((char_split_array[k][j])==('A'+g_freq))){
    // tally the letter that follows E, ignoring case
    int nxt=char_split_array[k+1][j];
    if(nxt>='a' && nxt<='z') char_table_e[nxt-'a']++;
    else if(nxt>='A' && nxt<='Z') char_table_e[nxt-'A']++;
  }//end if
}//end for k
// keep the four highest counts in e1..e4 and their letter indices in er, es, en, ec
for(k=0;k<26;k++){
  if(char_table_e[k]>e1){
    e4=e3;e3=e2;e2=e1;e1=char_table_e[k];
    ec=en;en=es;es=er;er=k;
  }
  else if(char_table_e[k]>e2){
    e4=e3;e3=e2;e2=char_table_e[k];
    ec=en;en=es;es=k;
  }
  else if(char_table_e[k]>e3){
    e4=e3;e3=char_table_e[k];
    ec=en;en=k;
  }
  else if(char_table_e[k]>e4){
    e4=char_table_e[k];
    ec=k; // index of the fourth-highest count
  }
}//end for k
// Thus at this time we are tentatively guessing that ec, en, es and er
// point to array locations whose index indicates the cipher
// letters corresponding to c, n, s and r.
// Compare the individual letter freqs of the lower two to get n and c.
printf(" er=%d es=%d en=%d ec=%d\n",er,es,en,ec);
if(alpha_freq[en][j][0]>alpha_freq[ec][j][0]){
  alpha_freq[en][j][1]=13;
  alpha_freq[ec][j][1]=2;
}
else {
  alpha_freq[ec][j][1]=13;
  alpha_freq[en][j][1]=2;
}
// r & s will be too close, so for now just guess that the lower one is s.
// Then find all common pairs and from that find s, and compare to check
// that the previous guess was correct; then the other char is r.
// Also find l from the common pairs.
printf("\n Char Table E\n");
for(k=0;k<26;k++){
  // printf(" %d,", char_table_e[k]);
  char_table_e[k]=0;
}
printf("\n");
/*********************************************************************
Find Common Pairs of Characters
the top two frequencies should correspond to SS and LL in that order
**********************************************************************/
for(l=0;l<rows_per_col[j];l++){
  if(char_split_array[l][j]==char_split_array[l+1][j]){
    // tally the doubled letter, ignoring case
    int pr=char_split_array[l][j];
    if(pr>='a' && pr<='z') char_table_aa[pr-'a']++;
    else if(pr>='A' && pr<='Z') char_table_aa[pr-'A']++;
  } //end if
} //end for l
//FINDING THE TOP TWO FREQUENCIES
for(m=0;m<26;m++){
  //printf(" %d ", char_table_aa[m]);
  if(char_table_aa[m]>gt1){
    gt2=gt1; gt1=char_table_aa[m];
    gl=gs; gs=m;
  }
  else {
    if(char_table_aa[m]>gt2){
      gt2=char_table_aa[m];
      gl=m;
    }
  }
  //printf(" %d and %d \n", gs, gl);
}//end for m
//printf("\nThe chars with top two freq for this set are %d and %d \n", gs, gl);
// First check to see that s obtained here and in the E
// digrams is the same
if(gs==es) {
  printf("\ngs and es are equal\n");
  if(alpha_freq[gs][j][1]==0) alpha_freq[gs][j][1]=18;
  if(alpha_freq[er][j][1]==0) alpha_freq[er][j][1]=17;
}//end if
else {
  printf("\n gs (%d) and es (%d) are not equal\n",gs,es);
  if(gs==er){
    printf("\ngs and er are equal\n");
    if(alpha_freq[er][j][1]==0) alpha_freq[er][j][1]=18;
    if(alpha_freq[es][j][1]==0) alpha_freq[es][j][1]=17;
  }//end if
  else printf("\n gs (%d) and er (%d) are also not equal\n",gs,er);
}//end else
// we can be reasonably confident about L as it comes a clear second to S
if(alpha_freq[gl][j][1]==0) alpha_freq[gl][j][1]=11;
//Thus we have identified s,n,c,e,r,t,l
printf("\nChar Table AA\n");
for(k=0;k<26;k++){
  //printf(" %d ", char_table_aa[k]);
  char_table_aa[k]=0; // resetting table to store freq of common pairs
}
printf("\n");
}//end if
g1=g2=gs=gl=gt1=gt2=g_freq=g_freq2=0;
l=m=th=ti=te=t1=t2=t3=e1=e2=e3=e4=es=er=en=ec=0;
}//end for j
//g_freq and its counterparts are used to identify e and t
i=j=l=m=0;
fd=fopen("decrypted.txt","w");
char decrypted_array[char_cnt];
//Replace the encrypted text in each of the sets by the suggested plain text
for(j=0;j<alpha_cnt;j++){
  for(i=0;i<total_num_rows;i++){
    if(char_split_array[i][j]!=' '){
      l=char_split_array[i][j]-'a';
      char_split_array[i][j]='a'+int(alpha_freq[l][j][1]);
      //printf(" %c ", char_split_array[i][j]);
    }//end if
  }//endfor
  //printf("\n++++++++++++++++++++++++++++++\n");
}//end for
l=m=0;
for(i=0;i<total_num_rows;i++){
  for(j=0;j<alpha_cnt;j++){
    if(char_split_array[i][j]!=' '){
      decrypted_array[l]=char_split_array[i][j];
      fputc(decrypted_array[l],fd);
      if(l%6==0) fputc(' ',fd);
      l++;
    }//end if
  }//end for
}//end for
double check_ioc=iocenc(decrypted_array,char_cnt);
printf("\n THE IOC OF THE DECRYPTED TEXT IS : %f \n",check_ioc);
printf(" The name of the decrypted text file is decrypted.txt\n\n");
printf(" Frequency dist. table of the encrypted text. No. of sets=No. of alphabets \n\n");
printf("----------------------------------------------------------------------------");
printf("\n CHAR Frequency Suggest\n");
// The following prints all the sets
for(i=0;i<26;i++){
  for(j=0;j<alpha_cnt;j++){
    if(j==0) printf(" %d ", i);
    printf(" % f %d", alpha_freq[i][j][0], int(alpha_freq[i][j][1]));
  }
  printf("\n");
}
/* // Debug Printing
//printf(" Index of Coincidence = %f\n", sum);
for(j=0;j<alpha_cnt;j++) {
  printf(" The IC of the %d set is : %f\n", j, sum[j]);
}
*/
for(j=0;j<alpha_cnt;j++) {
  if(sum[j]<0.059) {
    printf("ERROR!!! The IC %f of set %d is not matching English\n",sum[j], j);
    printf("ERROR!!! The Number of alphabets calculated is incorrect\n");
    return 1;
  }
}
fclose(fd);
}//end ioc

/*
This function identifies the various repeated patterns of three or more
characters appearing in an encrypted text file. It identifies the positions
and uses that to get the likely number of alphabets used in the
polyalphabetic cipher used.
Program Author: Vijai Gandikota
Contact : gandikotav@hotmail.com
Date : February 10, 2002
*/
int kasiski(int argcnum, char**argval, int *char_cnt1, int *alpha_cnt)
{
FILE *f1,*f2;
char f_name[25], c, ch_equal_flg='n',firstchar='y';
int ctr1,ctr2,i,j,k,cnt=0,pos,prev_pos,pos2=0; // cnt must start at 0: it counts the letters read
int diffarray[800][10];
int factor_array[5];
for(ctr1=0;ctr1<5;ctr1++){
  factor_array[ctr1]=0;
}
ctr1=0;
//printf(" \n %d =the pointer variable value \n", *char_cnt1);
for(ctr2=0;ctr2<800;ctr2++){
  for(ctr1=0;ctr1<10;ctr1++){
    diffarray[ctr2][ctr1]=0;
    //printf("diffarray[%d][%d]=%d\n",ctr2,ctr1,diffarray[ctr2][ctr1]);
  }
}
ctr1=0;ctr2=0;
//clear previous copies of the tempstore file
system("rm -f tempstore");
system("clear");
//get the name of encrypted file
/*
printf("Enter encrypted text file name: ");
scanf("%s", f_name);
fflush(stdin);
*/
if (argcnum < 2) {
  printf(" Please enter the file to be decrypted\n");
  return 1;
}
if (argcnum > 2) {
  printf(" Incorrect Number of Filenames\n");
  return 1;
}
//Open the file that is given on the command line to decrypt
f1=fopen(argval[1],"r");
//Check that the file was indeed opened
if (f1==NULL) {
  printf("unable to open the encrypted file\n");
  return 1;
}
while(!feof(f1)) {
  c=fgetc(f1);
  if(c!=' ' && c!='\n' && (((c>='a') && (c<='z')) || ((c>='A') && (c<='Z'))))
    cnt=cnt+1; //count the number of characters
}
*char_cnt1=cnt;
//printf("number of characters in the encrypted file :%d\n", *char_cnt1);
/* fclose(f1); f1=fopen(argval[1], "r");//reopen the file */
rewind(f1);
//f2=fopen("tempstore","w");//open a file to store char matches and positions
// Read all the characters from the encrypted file into an array
char ch[cnt]; //create an array
i=0;
int newint=0;
while(!feof(f1) && i<cnt){
  c=fgetc(f1);//get each char
  if(c!=' ' && c!='\n' && (((c>='a') && (c<='z')) || ((c>='A') && (c<='Z'))))
    ch[i++]=c; //assign to array if not a space
}
if(cnt>1000) i=1000; //MARK2: consider at most the first 1000 characters
//printf("number of characters considered in the encrypted file :%d\n",i);
//for(int z=0;z<i;z++) printf("ch[%d] = %c\n",z,ch[z]);
/* For every set of 3 or more consecutive characters, compare them to
   every later set of the same number of consecutive characters and see
   if they are equal */
char char_comp[i-1]; // This contains the character sets from ch to compare
//fprintf(f2,"No. of chars, positions repeated in string");
for (j=3;j<21 && j<i;j++){ //Start of outermost loop looking at number of characters
  //We compare a maximum of 20 characters
  //printf("number of characters considering at a time = %d\n", j);
  for(pos=0;pos<(i-j+1);pos++){
    //printf("current pos position is %d\n", pos);
    for(k=0;k<j;k++){ //For picking set of char
      //printf("picking char %d\n", k);
      char_comp[k]=ch[pos+k];
    }//End for picking char
    firstchar='y';
    for(pos2=pos+1;pos2<i-j+1;pos2++){
      //printf("pos2 pointer in ch at %d\n",pos2);
      ch_equal_flg='n'; //Reset the character equal flag
      for(k=0;k<j;k++){
        if(ch[pos2+k]!=char_comp[k] || ch[pos2+k]==' '){ //not equal: exit this for loop
          ch_equal_flg='n';
          break;
        }
        ch_equal_flg='y';
      }//endfor k
      if(ch_equal_flg=='y'){
        if(firstchar=='y'){
          // first repeat of this pattern: start a new row holding its length
          ctr2=ctr2+1;
          ctr1=0;
          if(ctr2<800) diffarray[ctr2][ctr1]=j; // stay inside diffarray
          ctr1=ctr1+1;
          prev_pos=pos;
          //fprintf(f2,"\n%d,%d",j,pos);
          firstchar='n';
        }//endif
        // record the distance from the previous occurrence
        if(ctr2<800 && ctr1<10) diffarray[ctr2][ctr1]=pos2-prev_pos; // stay inside diffarray
        ctr1=ctr1+1;
        //fprintf(f2,",%d",pos2-prev_pos);
        prev_pos=pos2;
      }//endif ch_equal
    }//end for pos2
  }//end for pos
}//end for j
fclose(f1);
/* Now here we will analyze the recorded distances and identify the possible factors */
float modtest=0;
for(ctr2=0;ctr2<800;ctr2++){
  for(ctr1=0;ctr1<10;ctr1++){
    if(diffarray[ctr2][ctr1]!=0){
      //printf(" %d ", diffarray[ctr2][ctr1]);
      for(i=0;i<5;i++){
        modtest=diffarray[ctr2][ctr1]%(i+1);
        if(modtest==0){
          factor_array[i]=factor_array[i]+1;
        }
      }//endfor
    }
  }
  //printf("\n");
}
int greatest=0;
int greatint=0;
printf("\nKasiski Factor Array: Frequency of factors 2 - 5\n");
for(i=1;i<5;i++) {
  printf("f[%d] = %d, ", i+1, factor_array[i]);
  if(greatest<factor_array[i]){
    greatest=factor_array[i];
    greatint=i;
  }
}
printf("\n");
/*
if (greatint==1 &&(factor_array[1]-factor_array[3]<120)) {
  greatint=3;
  //printf("Though it is shown that 2 is more likely, that in actuality may not be correct as the numbers for 4 and 2 are so close. It is our suggestion that 4 is probably the right answer\n");
}
greatint=greatint+1;
//printf("\nGreatest = %d, alpha_cnt = %d\n", greatest, greatint);
*/
int alpha_num=0;
double iocsum=0;
iocsum=iocenc(ch, cnt);
if(iocsum>=0.060){
  alpha_num=1;
  //printf("Number of Enciphering alphabets is 1\n");
}
if(iocsum>=0.0479 && iocsum<0.060){
  alpha_num=2;
  //printf("Number of Enciphering alphabets is approx 2\n");
}
if(iocsum>=0.046140 && iocsum<0.0479){
  alpha_num=3;
  //printf("Number of Enciphering alphabets is approx 3\n");
}
if(iocsum>0.044 && iocsum<0.046140){
  alpha_num=4;
  //printf("Number of Enciphering alphabets is approx 4\n");
}
if(iocsum>=0.041116 && iocsum<=0.044){
  alpha_num=5;
  //printf("Number of Enciphering alphabets is approx 5\n");
}
if(iocsum<0.0420){
  alpha_num=5;
  printf("Number of Enciphering alphabets is difficult to predict assume 5\n");
}
printf("\nIOC Calculated = %f\n", iocsum);
if (greatint==1 && (factor_array[1]-factor_array[3]<160) && (alpha_num!=(greatint+1))) {
  // the counts for factors 2 and 4 are close and the IOC does not point to 2, so prefer 4
  greatint=3;
}
greatint=greatint+1;
//printf("\nGreatest = %d, alpha_cnt = %d\n", greatest, greatint);
if(alpha_num!=greatint){
  printf("IOC No. of Alphabets (%d)!=Kasiski(%d)\n", alpha_num, greatint);
  if(alpha_num==1){ // Then that is the correct answer
    greatint=1;
    printf(" greatint = %d", greatint);
  }
  //if(alpha_num==5 && greatint==2) greatint=5;
  //return 1;
}
//return greatint;
*alpha_cnt=greatint;
//fclose(f2);
return 0;
}//end kasiski

/*
This function identifies the percentage of occurrence of characters
appearing in an encrypted text file.
IOCENC
Program Author: Vijai Gandikota
Contact : gandikotav@hotmail.com
Date : February 10, 2002
*/
double iocenc (char *ch, int cntr)
{
int i,cnt=0;
double charcnt[26];
for(i=0;i<26;i++) charcnt[i]=0; //initialize the array
char c, firstchar='y';
/*
FILE *f1,*f2;
char f_name[25], c, firstchar='y';
//get the name of encrypted file
//printf("Enter encrypted text file name: ");
//scanf("%s", f_name);
//fflush(stdin);
//Open the file that is given on the command line to decrypt
//f1=fopen(f_name,"r");
f1=fopen("encrypt.txt","r");
*/
/* while(!feof(f1)) { c=fgetc(f1); */
for(i=0;i<cntr;i++){
  c=ch[i];
  if(c!=' ' && c!='\n' && (((c>='a') && (c<='z')) || ((c>='A') && (c<='Z')))) {
    cnt=cnt+1; //count the number of characters
    // tally the letter, ignoring case: 'a'/'A' -> charcnt[0] ... 'z'/'Z' -> charcnt[25]
    if(c>='a') charcnt[c-'a']=charcnt[c-'a']+1;
    else charcnt[c-'A']=charcnt[c-'A']+1;
  }//endif
}//end for
if(cnt!=cntr) {printf("NO OF CHARS DONT MATCH"); return 1;}
double sum = 0;
for(i=0;i<26;i++) {
  // probability of drawing the same letter twice without replacement
  charcnt[i]=(charcnt[i]/cnt)*((charcnt[i] - 1)/(cnt-1));
  // printf("charcnt[%d] = %f\n",i,charcnt[i]);
  sum=sum+charcnt[i];
}//endfor
/*printf(" Index of Coincidence = %f\n", sum);*/
return sum;
/*
int alpha_num=0;
if(sum>=0.060){ alpha_num=1; //printf("Number of Enciphering alphabets is 1\n");
}
if(sum>=0.0479 && sum<0.060) { alpha_num=2; //printf("Number of Enciphering alphabets is approx 2\n");
}
if(sum>=0.046140 && sum<0.0479) { alpha_num=3; //printf("Number of Enciphering alphabets is approx 3\n");
}
if(sum>0.044 && sum<0.046140) { alpha_num=4; //printf("Number of Enciphering alphabets is approx 4\n");
}
if(sum>=0.041116 && sum<=0.044){ alpha_num=5; //printf("Number of Enciphering alphabets is approx 5\n");
}
if(sum<0.0420){ alpha_num=0; //printf("Number of Enciphering alphabets is difficult to predict\n");
}
return alpha_num;
//fclose(f1);
*/
}//end iocenc

void bsort(float* ptr, int n)
{
void order(float*, float*);
int j, k;
for(j=0;j<n-1;j++)
  for(k=j+1;k<n;k++)
    order(ptr+j, ptr+k); //order the pointer contents
}

void order(float* numb1, float* numb2)
{
if(*numb1 > *numb2) {
  float temp = *numb1;
  *numb1=*numb2; // Swapping operation
  *numb2=temp;
}
}