Automated Decryption of Polyalphabetic Ciphers

Vijai Gandikota
February 25, 2002
Contents
AUTOMATED DECRYPTION OF POLYALPHABETIC CIPHERS
  DESCRIPTION OF THE PROGRAMMING ENVIRONMENT
  DETAILED DESCRIPTION OF THE APPROACH
    Determination of Number of alphabets
      Kasiski Method
      IOC Calculation
    Cipher text alphabets’ frequency determination
    Decryption
      Standard frequency based allocation
      Determination of “e” and “t”
      Determination of “h”, “a”, “i” and confirmation of “e”
      Determination of “n”, “c”, “r”, “s”
      Determination of “l” and confirmation of “s”
    Time of run
  TESTING
  RESULTS
  DISCUSSION
  ISSUES ENCOUNTERED
  PROMISING IDEAS OF IMPROVEMENT
  REFERENCES
  CODE LISTING
Description of the programming environment
The following table lists the various aspects of the programming environment used to
develop this application.
Hardware               : Sun Ultra-4
OS Version             : 5.7
Processor Type         : Sparc
Language               : C++
Compiler               : gcc
Source code file       : Split1.cpp
Final Version Location : /home/vijai/version3/version4/version5
Libraries              : math.h, stdio.h, stdlib.h, string.h, ctype.h
Input encrypted file   : encrypt.txt
Output decrypted file  : decrypted.txt
Execution command      : a.out <encrypted_filename>
Other programs used    : freqcnt.cpp  : for IOC verification
for testing              patid.cpp    : for verification of factors generated using Kasiski method
                         testcnt.cpp  : to test and compare output decrypted file with the original plain text file
                         test2000.cpp : to test and compare the first 2000 characters in the original and the decrypted file
                         location     : /home/vijai/CS691/version3/version4/version5/testing
Detailed Description of the approach
Determination of Number of alphabets
The number of alphabets was determined through a two-pronged approach to improve the
accuracy of the prediction.
Kasiski Method
Algorithm
1. Determine the number of characters in the text and load the text into memory
2. Identify repeated patterns of three or more characters, up to a maximum length of
20 characters (an arbitrary limit), in the encrypted text.
3. Store the locations of the repeated patterns in a file (later in memory).
4. Store the distances between each occurrence in memory.
5. Find all the factors of each distance and record the number of times each
factor occurs. (This was implemented by dividing each distance by 1, 2, 3, 4 and 5
and checking for a remainder of 0.)
6. Find the factor that appeared the greatest number of times and predict that as the
most likely candidate for the number of alphabets.
7. Compare this with the value returned from IOC. As IOC showed a lot of variance
in its predictions during testing, Kasiski was used to override IOC in the case of
conflicts, depending on the number of alphabets predicted by the two.
Optimization of the prediction of the number of alphabets
i. If IOC predicts 1 or 2, set the number of alphabets to that value.
ii. If Kasiski is 2 and IOC is 4, and the difference in the number of factors for 2
and 4 alphabets in the Kasiski prediction is less than 160, choose 4 (as two is a
factor of 4).
iii. If Kasiski predicts 3 or 5, set the number to that value.
8. Use this predicted value to further check the number of alphabets predicted.
i. Read the cipher text into the columns of a multidimensional array, where
the number of columns is the same as the number of alphabets predicted.
That is, if the number of alphabets is 2, read the first character into the
first set, the second character into the second set, the third character into
the first set and so on.
ii. For each set of characters determine the IOC. If it does not match, specify
which set is failing the test and return after generating an error condition
and specifying the two differing results. (Code beyond this, to go back to
Kasiski and IOC re-calculation, could not be implemented due to lack of
time.) If the IOC matches English for all sets, proceed to the next step.
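The factor-counting idea in steps 2 to 6 can be sketched as follows. This is a minimal illustration rather than the code from the listing: it looks only at repeated trigrams (the report also considers longer patterns), it skips the trivial factor 1, and the names kasiski_factor_counts and most_likely_key_length are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// For each candidate key length f in 2..max_len, count how many distances
// between consecutive occurrences of the same trigram are divisible by f.
// The factor 1 is skipped because it divides every distance.
std::vector<int> kasiski_factor_counts(const std::string& text, int max_len = 5) {
    std::vector<int> counts(max_len + 1, 0);
    std::map<std::string, std::size_t> last_seen; // trigram -> previous position
    for (std::size_t i = 0; i + 3 <= text.size(); ++i) {
        const std::string tri = text.substr(i, 3);
        auto it = last_seen.find(tri);
        if (it != last_seen.end()) {
            const std::size_t dist = i - it->second;
            for (int f = 2; f <= max_len; ++f) {
                if (dist % static_cast<std::size_t>(f) == 0) ++counts[f];
            }
            it->second = i; // measure the next distance from this occurrence
        } else {
            last_seen[tri] = i;
        }
    }
    return counts;
}

// Pick the factor seen most often; fall back to 1 when nothing repeats.
// Ties go to the smaller factor.
int most_likely_key_length(const std::vector<int>& counts) {
    int best = 1;
    int best_count = 0;
    for (std::size_t f = 2; f < counts.size(); ++f) {
        if (counts[f] > best_count) {
            best_count = counts[f];
            best = static_cast<int>(f);
        }
    }
    return best;
}
```

Because every distance divisible by 4 is also divisible by 2, the count for 2 can never be lower than the count for 4 in this sketch, which is why a tie-breaking rule such as rule ii above is needed.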
IOC Calculation
1. Obtain the text loaded in memory
2. Determine the number of times each character appears in the cipher text and
store the counts.
3. Determine the IOC by computing the IC formula for the determined frequencies
of individual characters.
$$IC = \sum_{\lambda=a}^{z} \frac{Freq_{\lambda}\,(Freq_{\lambda}-1)}{n\,(n-1)}$$
where $Freq_{\lambda}$ is the frequency of occurrence of each character and $n$
is the total number of characters.
4. Predict the number of alphabets by assuming ranges about the optimal IOC values
published in the literature for 1, 2, 3, 4 and 5 alphabets. Please refer to the function
“kasiski” in the code for the exact ranges specified.
5. Return this value to Kasiski method for comparison and more accurate prediction.
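As an illustration of steps 2 and 3, the IC formula can be computed directly from the letter counts. The sketch below is not taken from the listing; the function name index_of_coincidence is hypothetical, and non-letter characters are simply ignored.

```cpp
#include <cctype>
#include <string>

// Index of coincidence: sum over letters of f*(f-1), divided by n*(n-1),
// where f is the count of one letter and n is the total number of letters.
double index_of_coincidence(const std::string& text) {
    long freq[26] = {0};
    long n = 0;
    for (unsigned char ch : text) {
        if (std::isalpha(ch)) {
            ++freq[std::tolower(ch) - 'a'];
            ++n;
        }
    }
    if (n < 2) return 0.0; // the formula is undefined for fewer than two letters
    double sum = 0.0;
    for (int i = 0; i < 26; ++i) {
        sum += static_cast<double>(freq[i]) * static_cast<double>(freq[i] - 1);
    }
    return sum / (static_cast<double>(n) * static_cast<double>(n - 1));
}
```

For example, the text "aabb" has two letters with count 2 each and n = 4, giving (2 + 2) / 12, or about 0.33.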
Cipher text alphabets’ frequency determination
1. For each set of mono-alphabetic cipher-text sets, determine the frequency of each
character.
2. Initialize a 3rd dimension column with each set to store the suggested plain text
characters.
3. Each character of the cipher text is denoted by its positional notation. For example
“a” in the cipher text has location 0 in the array and the contents of the location 0
contain the frequency of the occurrence of “a”. Similarly “z” has the location 25.
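A compact way to picture the per-set frequency tables is shown below. It is a sketch under the assumption that letters are dealt to the sets in round-robin order, as described for the Kasiski check; the name per_alphabet_counts is hypothetical, and it uses standard containers instead of the three-dimensional array of the listing.

```cpp
#include <array>
#include <cctype>
#include <string>
#include <vector>

// counts[s][k] is the number of times the letter with positional value k
// ('a' = 0 ... 'z' = 25) occurs in set s, where the i-th letter of the
// cipher text belongs to set i % alpha_cnt.
std::vector<std::array<int, 26>> per_alphabet_counts(const std::string& cipher,
                                                     int alpha_cnt) {
    std::vector<std::array<int, 26>> counts;
    if (alpha_cnt < 1) return counts; // nothing sensible to split into
    counts.resize(static_cast<std::size_t>(alpha_cnt));
    for (auto& set_counts : counts) set_counts.fill(0);
    std::size_t pos = 0; // index among letters only; other characters are skipped
    for (unsigned char ch : cipher) {
        if (!std::isalpha(ch)) continue;
        ++counts[pos % static_cast<std::size_t>(alpha_cnt)][std::tolower(ch) - 'a'];
        ++pos;
    }
    return counts;
}
```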
Decryption
Note: No attempt is made to try to recreate or preserve the case of the characters from the
original plain text file.
Standard frequency based allocation
For each set of mono-alphabetic cipher text characters, make initial guesses based on
the frequency distribution of characters in English, assigning plain text characters
to the cipher characters in order of frequency. This is intended to identify “e” and “t”,
the most and second most frequently occurring characters respectively, while taking a
chance on the rest of the characters.
Determination of “e” and “t”
As mentioned above, in each set the two most frequently occurring characters are assigned
the positional values of e and t in the English alphabet (i.e. 4 and 19, a being 0).
These are then used for the determination of h, i, a, n, c, r and s through the
identification of digrams.
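The selection of the two most frequent cipher characters in one set can be sketched as a single pass over that set's 26 counts. The helper name top_two is hypothetical; the first index returned would be treated as the cipher character for “e” and the second as the one for “t”.

```cpp
#include <array>
#include <utility>

// Returns {index of the most frequent letter, index of the second most
// frequent letter} for one set's frequency table. Ties keep the earlier letter.
std::pair<int, int> top_two(const std::array<int, 26>& freq) {
    int first = -1;
    int second = -1;
    for (int i = 0; i < 26; ++i) {
        if (first < 0 || freq[i] > freq[first]) {
            second = first; // the old maximum becomes the runner-up
            first = i;
        } else if (second < 0 || freq[i] > freq[second]) {
            second = i;
        }
    }
    return {first, second};
}
```

Note that the maxima are local to the call, so each set is ranked independently of the sets processed before it.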
Determination of “h”, “a”, “i” and confirmation of “e”
1. Find the occurrence of the identified character for “t” in the cipher text and find
the frequency of occurrence of each character that occurs after it.
2. Take the top three characters: the first should point to “h”, the second to “i” and
the third to “e”. While the frequency of the digram “th” is clearly far higher than
that of any other digram with “t” as first character, “ti” and “te” have closer
frequency distributions.
3. Identify the characters for “i” and “e” and compare with the “e” identified
above. The character that doesn’t match the character for “e” represents “i”.
(Please note that, due to the initial loss of time caused by the inaccurate
assumption stated below, the code that distinguishes “i” from “e” currently
works only for the case when the number of alphabets is 1. Given more time this
could have been implemented for the other cases, which should raise the
accuracy.)
4. Subsequently determine the occurrence of “t” and obtain the frequencies of all
characters appearing before the cipher character for “t”.
5. The most occurring character is the cipher text character representing “a”.
6. Assign all the characters thus identified, their true values in the multidimensional
array called alpha_freq[][][]
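The digram step can be sketched as a tally of the characters that follow a chosen cipher character. The sketch assumes the cipher text has already been reduced to lowercase letters only, and the name follower_counts and its parameters are hypothetical. In a polyalphabetic text the follower of a character in set col belongs to the next set, so the winning cipher letter has to be interpreted with that set's alphabet.

```cpp
#include <array>
#include <string>

// Scan positions col, col + alpha_cnt, col + 2*alpha_cnt, ... of the full
// cipher text. Whenever the letter there equals `target` (for example the
// cipher character guessed to be "t"), count the letter that follows it.
std::array<int, 26> follower_counts(const std::string& text, int alpha_cnt,
                                    int col, char target) {
    std::array<int, 26> counts{};
    if (alpha_cnt < 1 || col < 0) return counts;
    for (std::size_t i = static_cast<std::size_t>(col); i + 1 < text.size();
         i += static_cast<std::size_t>(alpha_cnt)) {
        if (text[i] != target) continue;
        const char next = text[i + 1];
        if (next >= 'a' && next <= 'z') ++counts[next - 'a'];
    }
    return counts;
}
```

Counting the characters before the target instead (step 4) only requires reading text[i - 1] for i > 0 in place of text[i + 1].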
Determination of “n”,”c”,”r”, “s”
1. Similar to the identification of digrams for t above, identify all digrams with e as
the first character. The top two frequencies determine “r” and “s” and the lower
two frequencies determine “n” and “c”.
2. However “n” has an individual frequency much greater than “c” in English
language. Use this to distinguish “n” and “c”.
3. “r” and “s” have similar frequency of occurrence in English therefore at this time
they are assigned in the order of frequencies with which they appear, the larger
one being “r”.
Determination of “l” and confirmation of “s”
1. As this section was initially implemented using the inaccurate assumption listed
below it only has been implemented for the case where the number of alphabets is
identified as 1.
2. Determine the frequency distribution of all characters that occur as doubled pairs
(e.g. “ee”, “ll”, “ss”). The characters with the top two frequencies among the common
pairs represent “s” and “l”.
3. Use the “s” identified here to compare with the “r” and “s” identified above to
make a more accurate prediction of “r” and “s”.
4. Given more time this could be modified and used in conjunction with the
characters identified above to verify the accuracy of the above predictions and to
identify “l” in all the cases of the number of alphabets used in the key.
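For the single-alphabet case described above, counting doubled letters is a one-pass scan. The sketch below assumes lowercase, letters-only input and uses the hypothetical name double_letter_counts; overlapping pairs are each counted, so "aaa" contributes two pairs.

```cpp
#include <array>
#include <string>

// counts[k] is the number of positions where the letter with positional
// value k is immediately followed by the same letter.
std::array<int, 26> double_letter_counts(const std::string& text) {
    std::array<int, 26> counts{};
    for (std::size_t i = 0; i + 1 < text.size(); ++i) {
        const char a = text[i];
        if (a >= 'a' && a <= 'z' && a == text[i + 1]) ++counts[a - 'a'];
    }
    return counts;
}
```

This only makes sense when one alphabet is used, since with several alphabets two identical adjacent cipher letters generally stand for different plain text letters.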
Time of run
Code for testing the time taken to run the application was not implemented as in all cases
the program execution was completed within 10 seconds.
Testing
1. Testing for Number of alphabets in key
Exhaustive testing was performed on the number of alphabets guessed by the
Kasiski Method and the IOC calculation method.
2. Testing and comparison of plain text file and decrypted file
Tests were also performed on the decrypted text file and comparisons were made
with the original text file provided by the instructor.
Results
The following table lists the comparison results of the original texts and the decrypted
texts.
Text File   Number of alphabets    % Correct characters    % Correct characters in
            used for encryption    in the complete text    the first 2000 characters
Text0       1                      17.85                   48.95
            2                      15.70                   21.8
Text1       1                      6.87                    7.65
            2                      5.88                    5.95
            4                      5.82                    7.00
Text2       1                      7.74                    13.10
Text3       1                      24.8                    51.00
            3                      5.32                    5.35
            5                      5.70                    5.75
Text4       1                      12.80                   33.1
Text6       1                      13.03                   18.95
Discussion
1. As shown in the above table, between 5.3% and 25% of the characters (1 to 7
characters from the alphabet) have been accurately identified in the encrypted texts.
Moreover, in the first 2000 characters of the decrypted texts, up to 51% of the
characters (up to 13 characters from the alphabet) have been accurately identified.
This is promising, as a little more intelligence incorporated into the program could
allow it to decrypt a larger set of characters consistently.
2. Due to the use of frequency distributions, a number of characters fall in the range
of other characters with similar frequency distributions. It is therefore noticed
that the IOC of the decrypted texts is very close to the IOC of the English
language published in the literature [1]. As stated above, further intelligence
incorporated into the program through a larger set of digram and trigram
comparisons, as well as common pair comparisons, than used here can more
accurately identify the correct characters.
3. In some cases, for example Text0 with 5 alphabets and Text6 with 5
alphabets, the IOC and the Kasiski method yield widely varying results, as
suggested before in the class. This is especially so due to the variations in the
range of resulting numbers from the IOC calculation.
Issues Encountered
Incorrect assumption: Some backtracking and re-coding had to be performed as the
digram and trigram comparisons were initially incorrectly made on the sets of monoalphabetic cipher text characters separated out for each alphabet in the key. This was
identified in time and corrected. However this caused a lot of time loss.
Since a number of validations of character predictions were implemented (e.g.
checking and rechecking whether an identified character is correct), a dramatic
increase was seen in the number of characters identified when the number of
alphabets in the key was 1.
Promising Ideas of Improvement
Given more time, the following could have been implemented, and more testing conducted,
to improve the results in the decryption of poly-alphabetic ciphers.
A. Algorithm Modifications
1. Calculation of number of alphabets: While evaluation and selection of the
best guess for the number of alphabets was implemented, more
selection criteria may be added to predict the number of alphabets
accurately in all cases despite the inaccuracies in the IOC calculations.
Another addition that was planned but could not be implemented due to
lack of time was to loop back to the Kasiski and IOC prediction if the
IOC calculated on the individual sets of mono-alphabetic ciphered
characters did not match English. This would have largely eliminated
the scope for inaccurate predictions.
2. Prediction of the characters: While predicting the characters the approach
can be modified to compare digrams of the first two identified characters
(“e” and “t”) with “h” to better ascertain the accuracy of “t”. Having done
this we can implement similar comparisons for the other characters like
“n”, “c”, “r” and “s” sought to be decrypted in this program. This would
allow proper identification of the characters that have frequencies very
similar to each other.
B. Code Modifications
1. Reduction of code through a better reuse of code
2. Better Memory management, through reuse of memory locations and
reduction in total memory usage
3. Addition of more validations to check for invalid inputs at runtime to
prevent buffer overflows.
C. Additions to algorithms
1. More digram testing and comparisons to identify more characters than
what has been implemented.
2. Addition of trigram, tetragram and pentagram comparisons
References
[1] Security in Computing, Charles Pfleeger, Second Edition, Prentice Hall PTR
[2] Polyalphabetic Ciphers,
http://www.cs.nps.navy.mil/curricula/tracks/security/notes/chap01_1.html
Code Listing
/*
This program takes the encrypted text and the suggested number of alphabets used in
encryption program and splits the text into sets
Other functions have been added in to analyze the text to predict the number of alphabets
used to encipher a message and to decrypt the text.
Program Author: Vijai Gandikota
Contact : gandikotav@hotmail.com
Date : February 10, 2002
*/
// Header File Declarations
#include<math.h>
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<ctype.h>
//Constant Declarations
#define MAX_ROWS 24000
#define MAX_COLS 5
#define ALPHABET 26
int char_cnt;
// Function Declarations
int ioc(char char_split_array[MAX_ROWS][MAX_COLS], int, int, int, char* );
int kasiski(int, char **, int*, int*);
double iocenc(char*, int);
void bsort(float*, int);
int main (int argc, char **argv)
{
FILE *f1;
int cnt, alpha_cnt, i, j, ctr, num_rows,total_num_rows,num_cols=0;
char c;
char_cnt=alpha_cnt=0;
if (argc != 2)
{
printf("\n\t Please enter one filename with: %s \n", argv[0]);
exit(1);
}
f1=fopen(argv[1], "r");
if (f1 == NULL) // stop early if the encrypted file cannot be opened
{
perror(argv[1]);
exit(1);
}
char_cnt=0;
/*
while(!feof(f1)){
c=fgetc(f1);
if(c!=' ' && c!='\n' && ((c>='a') && (c<='z') || (c>='A') && (c<='Z')))
char_cnt=char_cnt+1; //count the number of characters
}
rewind(f1);
printf("Enter the number of used in encryption : ");
scanf("%d", &alpha_cnt);
fflush(stdin);
*/
kasiski(argc, argv, &char_cnt, &alpha_cnt);
char char_array[char_cnt];//create an array of the number of characters
i=0;
printf("***************************************************************");
printf("\n DECRYPTION OF POLYALPHABETIC CIPHER ENCRYPTED TEXT \n");
printf("***************************************************************");
printf("\nNumber of characters in the encrypted text (char_cnt) : %d\n",char_cnt);
printf("Number of alphabets used (alpha_cnt) is %d \n", alpha_cnt);
//read all the characters from the file into the array
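int ch; // fgetc returns int so that EOF can be told apart from valid characters
while(i<char_cnt && (ch=fgetc(f1))!=EOF){ // stop at end of file or when the array is full
if(isalpha(ch)) // keep letters only
{
char_array[i]=(char)ch;
i=i+1;
}
}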
//for(j=0;j<char_cnt;j++) printf(" %c ", char_array[j]);
//Remember that the total number of columns is equal to the alpha_cnt
if (char_cnt%alpha_cnt==0)
total_num_rows=char_cnt/alpha_cnt;
else
total_num_rows=(char_cnt/alpha_cnt)+1;
//char char_split_array[total_num_rows][alpha_cnt];//Declare the array
char char_split_array[MAX_ROWS][MAX_COLS];//Declare the array
//Initialize the array
for(i=0;i<total_num_rows;i++){
for(j=0;j<alpha_cnt;j++){
char_split_array[i][j]=' ';
}
}
printf("Total number of characters per set = %d \n",total_num_rows);
//rest the values of i and j for later use
i=0;
j=0;
//split the char_array into alpha_cnt number of sets
for(num_cols=0;num_cols<alpha_cnt;num_cols++){
for(ctr=num_cols, num_rows=0;ctr<char_cnt && num_rows<total_num_rows;ctr=ctr+alpha_cnt,
num_rows++){
char_split_array[num_rows][num_cols]=char_array[ctr];
}//end for ctr
}//end for num_cols
// To print the char_split_array on screen for debugging
/*
for(i=0;i<total_num_rows;i++){
for (j=0;j<alpha_cnt;j++){
if(char_split_array[i][j]!=' ')
printf(" %c ", char_split_array[i][j]);
c=char_split_array[i][j];
}
if(c!=' ')
printf("\n");
}
Now we have sets of characters
Now we have to do the following
1. Analyze each set for frequency distributions by storing all the
distinct characters in an array for 26 by 1
2. Account for the case where more than one alpha_cnt is suggested
3. Identify unused letters
4. Try to identify the most common characters t, e
5. Try to identify digrams
6. Try to identify Common Pairs
7. Try to identify trigrams
8. Try to identify tetragrams and pentagrams
*/
// Call the function that takes over and does all this now
ioc(char_split_array, alpha_cnt, total_num_rows, char_cnt, char_array);
}//end main
/* This program identifies the percentage of occurrence of various
characters appearing in an encrypted text file.
IOC
Program Author: Vijai Gandikota
Contact : gandikotav@hotmail.com
Date : February 10, 2002
*/
int ioc(char char_split_array[MAX_ROWS][MAX_COLS], int alpha_cnt, int total_num_rows, int
char_cnt, char* ptr)
{
FILE *fd;
/*
printf(" %d = alpha_cnt, %d total_num_rows, %d, char_cnt\n",alpha_cnt,total_num_rows,
char_cnt);
*/
char c, firstchar='y';
int i,j,k,cnt=0, pos_ptr=0;
float alpha_freq[26][alpha_cnt][2];//for each of the 26 alphabets in each set
//stores the freq and makes possible sure shot suggestions
for(i=0;i<26;i++){
for(j=0;j<alpha_cnt;j++){
for(k=0;k<2;k++){
alpha_freq[i][j][k]=0;
}}}
int rows_per_col[alpha_cnt];
for(i=0; i<alpha_cnt;i++){
for(j=0;j<total_num_rows;j++){
c=char_split_array[j][i];
//printf(" %c ", c);
if(c!=' ' && c!='\n' && ((c>='a') && (c<='z') || (c>='A') && (c<='Z')))
{
cnt=cnt+1; //count the number of characters
// c is known to be a letter here; map it to 0..25 regardless of case
alpha_freq[tolower((unsigned char)c)-'a'][i][0]=alpha_freq[tolower((unsigned char)c)-'a'][i][0]+1;
}//endif
}//end second for
if(c==' ')
rows_per_col[i]=total_num_rows-1;
else
rows_per_col[i]=total_num_rows;
/*
printf(" Number of actual rows in col %d is : %d\n", i, rows_per_col[i]);
*/
}//end first for
char ptrchar=' ';
char ptrcharnext=' ';
char ptrcharprev=' ';
float g1,g2, ind_ioc=0;
double sum[alpha_cnt];
for(i=0;i<alpha_cnt;i++)
sum[i]=0;
i=0;j=0;k=0;
int t=0;
int e=0;
int g_freq=0;
int g_freq2=0;
g1=g2=0;
int gt1=0;
int gt2=0;
int gs=0;
int gl=0;
int l=0;
int m=0;int gth=0; int gth1=0;
int e1,e2,e3,e4, er,es,en,ec,th,ti,te,t1,t2,t3;
e1=e2=e3=e4=er=es=en=ec=t1=t2=t3=th=ti=te=0;
float cmp[26];
int
alphabet[26]={'z','q','j','x','k','v','b','y','w','g','f','p','u','m','c','l','d','h','r'
,'i','n','s','o','a','t','e'};
int char_table_t[26], char_table_e[26],char_table_aa[26];
for(j=0;j<26;j++){
cmp[j]=0;
char_table_t[j]=0; // table to store freq of digrams with T
char_table_e[j]=0; // table to store freq of digrams with E
char_table_aa[j]=0; // table to store freq of common pairs eg AA, BB CC
}
for(j=0;j<alpha_cnt;j++){ // for each set of chars in char_split_array
g1=g2=0; g_freq=g_freq2=0; // reset the running maxima so each set is ranked on its own
for(i=0;i<26;i++)
{
ind_ioc=(alpha_freq[i][j][0]/rows_per_col[j])*((alpha_freq[i][j][0]-1)/(rows_per_col[j]-1));
sum[j]=sum[j]+ind_ioc;
alpha_freq[i][j][0]=(alpha_freq[i][j][0]/rows_per_col[j])*100;
if(alpha_freq[i][j][0]==0){
//Finding unused chars
alpha_freq[i][j][1]=99;
}
if(alpha_freq[i][j][0]>g1){ //Finding T and E
g2=g1;
g1=alpha_freq[i][j][0];
g_freq2=g_freq;
g_freq=i;
}//end if
else {
if(alpha_freq[i][j][0]>g2){
g2=alpha_freq[i][j][0];
g_freq2=i;
}//endif
}//end else
cmp[i]=alpha_freq[i][j][0];
//printf(" %d %d \n", g_freq, g_freq2);
}//endfor i
/*
for(l=0;l<26;l++)
printf(" %f ", cmp[l]);
*/
//printf("\n###\n");
bsort(cmp, ALPHABET);
//for(l=0;l<26;l++) printf(" %f ", cmp[l]);
for(l=0;l<26;l++){
for(m=0;m<26;m++){
if(alpha_freq[l][j][0]==cmp[m])
{
alpha_freq[l][j][1]=alphabet[m]-'a';
break;
}
}// end for
}//end for
for(l=0;l<26;l++)
cmp[l]=0;
l=m=0;
alpha_freq[g_freq][j][1]=4; //set the g_freq+1 th char in the jth set to e
alpha_freq[g_freq2][j][1]=19; // set the g_freq2+1 th char in jth set to t
//printf(" e = %d t= %d \n", g_freq, g_freq2);
// we have found t and e so far
/**********************************************************************
FINDING DIGRAMS WITH T (H and A)
*********************************************************************/
for(l=0;l<26;l++) char_table_t[l]=0;
if(alpha_cnt!=1){
for(l=j;l<char_cnt-1;l=l+alpha_cnt){
ptrchar=*(ptr+l);
ptrcharnext=*(ptr+l+1);
if(ptrchar=='a'+g_freq2){
// tally the letter that follows, ignoring case; non-letters are skipped
if(isalpha((unsigned char)ptrcharnext)) char_table_t[tolower((unsigned char)ptrcharnext)-'a']++;
}//end if
}//end for
gth=gth1=l=m=0;
for(l=0;l<26;l++){
//printf(" %d ", char_table_t[l]);
if(char_table_t[l]>gth){
gth=char_table_t[l];
gth1=l;
}
}//end for
//found the numerical value corresponding to h
//printf("\n H should be %d \n", gth1);
//check to see if that char int alpha_freq has numeric val of h
if(alpha_freq[gth1][j][1]!=7){
//then do something
for(m=0;m<26;m++){
if(alpha_freq[m][j][1]==7){
alpha_freq[m][j][1]=alpha_freq[gth1][j][1];// swap them
alpha_freq[gth1][j][1]=7; // and the set that char to h
}//end if
}// end for
}//end if
//SO I HAVE FOUND H!!!
gth=gth1=l=m=0;
/**********************************************************************
// Now to find A from AT
**********************************************************************/
for(l=0;l<26;l++) char_table_t[l]=0;
for(l=j;l<char_cnt;l=l+alpha_cnt){
ptrchar=*(ptr+l);
if(l!=0){
ptrcharprev=*(ptr+l-1);
if(ptrchar=='a'+g_freq2){
// tally the letter that precedes, ignoring case; non-letters are skipped
if(isalpha((unsigned char)ptrcharprev)) char_table_t[tolower((unsigned char)ptrcharprev)-'a']++;
}//end if
}//end if l not equal to 0
}//end for
gth=gth1=l=m=0;
for(l=0;l<26;l++){
//printf(" %d ", char_table_t[l]);
if(char_table_t[l]>gth){
gth=char_table_t[l];
gth1=l;
}
}//end for
//found the numerical value corresponding to a
//printf("\n A should be %d \n", gth1);
//check to see if that char in alpha_freq has numeric val of a
if(alpha_freq[gth1][j][1]!=0){
//then do something
for(m=0;m<26;m++){
if(alpha_freq[m][j][1]==0){
alpha_freq[m][j][1]=alpha_freq[gth1][j][1];// swap them
alpha_freq[gth1][j][1]=0; // and the set that char to a
}//end if
}// end for
}//end if
//SO I HAVE FOUND A!!!
gth=gth1=l=m=0;
for(l=0;l<26;l++) char_table_t[l]=0;
/***********************************************************
Now to find R from the occurrences of ER in the complete text
**********************************************************/
for(l=j;l<char_cnt-1;l=l+alpha_cnt){
ptrchar=*(ptr+l);
ptrcharnext=*(ptr+l+1);
if(ptrchar=='a'+g_freq){
// tally the letter that follows, ignoring case; non-letters are skipped
if(isalpha((unsigned char)ptrcharnext)) char_table_t[tolower((unsigned char)ptrcharnext)-'a']++;
}//end if
}//end for
gth=gth1=l=m=0;
/*
for(l=0;l<26;l++){
//printf(" %d ", char_table_t[l]);
if(char_table_t[l]>gth){
gth=char_table_t[l];
gth1=l;
}
}*/
for(k=0;k<26;k++){
if((char_table_t[k])>e1){
e4=e3;e3=e2;e2=e1;e1=char_table_t[k];
ec=en;en=es;es=er;er=k;
}
else { if((char_table_t[k])>e2){
e4=e3;e3=e2;e2=char_table_t[k];
ec=en;en=es;es=k;
} //end if
else { if(char_table_t[k]>e3) {
e4=e3;e3=char_table_t[k];
ec=en;en=k;
}//end if
else { if(char_table_t[k]>e4){
e4=char_table_t[k];
ec=k; // record the index of the fourth-ranked letter (was e4=k, which overwrote the count)
}//end if
}// end else
}//end else
}//end else
}//end for k
if(alpha_freq[en][j][0]>alpha_freq[ec][j][0]){
if(alpha_freq[en][j][1]!=13){
//then do something
for(m=0;m<26;m++){
if(alpha_freq[m][j][1]==13){
alpha_freq[m][j][1]=alpha_freq[en][j][1];// swap them
alpha_freq[en][j][1]=13; // and then set that char to n
}// end if
}// end for
}//end if
if(alpha_freq[ec][j][1]!=2){
//then do something
for(m=0;m<26;m++){
if(alpha_freq[m][j][1]==2){
alpha_freq[m][j][1]=alpha_freq[ec][j][1];// swap them
alpha_freq[ec][j][1]=2; // and then set that char to c
}// end if
}// end for
}//end if
}
else {
if(alpha_freq[ec][j][1]!=13){
//then do something
for(m=0;m<26;m++){
if(alpha_freq[m][j][1]==13){
alpha_freq[m][j][1]=alpha_freq[ec][j][1];// swap them
alpha_freq[ec][j][1]=13; // and then set that char to n
}// end if
}// end for
}//end if
if(alpha_freq[en][j][1]!=2){
//then do something
for(m=0;m<26;m++){
if(alpha_freq[m][j][1]==2){
alpha_freq[m][j][1]=alpha_freq[en][j][1];// swap them
alpha_freq[en][j][1]=2; // and then set that char to c
}// end if
}// end for
}//end if
}
//Found N and C
//found the numerical value corresponding to R
//check to see if that char in alpha_freq has numeric val of r
if(alpha_freq[er][j][1]!=17){
//then do something
for(m=0;m<26;m++){
if(alpha_freq[m][j][1]==17){
alpha_freq[m][j][1]=alpha_freq[er][j][1];// swap them
alpha_freq[er][j][1]=17; // and the set that char to r
}// end if
}// end for
}//end if
//SO I HAVE FOUND R!!!
//found the numerical value corresponding to S
//check to see if that char in alpha_freq has numeric val of s
if(alpha_freq[es][j][1]!=18){
//then do something
for(m=0;m<26;m++){
if(alpha_freq[m][j][1]==18){
alpha_freq[m][j][1]=alpha_freq[es][j][1];// swap them
alpha_freq[es][j][1]=18; // and the set that char to s
}// end if
}// end for
}//end if
//SO I HAVE FOUND S!!!
gth=gth1=l=m=0;
k=ec=en=er=es=e1=e2=e3=e4=0;
}//end if
/*************************************************************************
//FINDING Digrams with T as the first character.
Looking for TH, TI and TE
with frequencies in that order
*************************************************************************/
if(alpha_cnt==1){
// I have just realized that the following will only work for
// this case. I had thought that I was decrypting up to 10 chars
// in all cases, but that is not so, as this section operates
// on char_split_array columns and not on the entire array.
// I can still do it for the case where the number of alphabets > 1,
// but I have to rearrange the following code and run it on the
// complete set of encrypted characters with t and e marked.
// So I am going to start writing that function and try to finish it.
// Now to find h, i
for (k=0;k<rows_per_col[j];k++){
if(((char_split_array[k][j])==('a'+g_freq2)) ||
((char_split_array[k][j])==('A'+g_freq2))){
switch(char_split_array[k+1][j]){
case 'a':char_table_t[0]++;break;
case 'A':char_table_t[0]++;break;
case 'b':char_table_t[1]++;break;
case 'B':char_table_t[1]++;break;
case 'c':char_table_t[2]++;break;
case 'C':char_table_t[2]++;break;
case 'd':char_table_t[3]++;break;
case 'D':char_table_t[3]++;break;
case 'e':char_table_t[4]++;break;
case 'E':char_table_t[4]++;break;
case 'f':char_table_t[5]++;break;
case 'F':char_table_t[5]++;break;
case 'g':char_table_t[6]++;break;
case 'G':char_table_t[6]++;break;
case 'h':char_table_t[7]++;break;
case 'H':char_table_t[7]++;break;
case 'i':char_table_t[8]++;break;
case 'I':char_table_t[8]++;break;
case 'j':char_table_t[9]++;break;
case 'J':char_table_t[9]++;break;
case 'k':char_table_t[10]++;break;
case 'K':char_table_t[10]++;break;
case 'l':char_table_t[11]++;break;
case 'L':char_table_t[11]++;break;
case 'm':char_table_t[12]++;break;
case 'M':char_table_t[12]++;break;
case 'n':char_table_t[13]++;break;
case 'N':char_table_t[13]++;break;
case 'o':char_table_t[14]++;break;
case 'O':char_table_t[14]++;break;
case 'p':char_table_t[15]++;break;
case 'P':char_table_t[15]++;break;
case 'q':char_table_t[16]++;break;
case 'Q':char_table_t[16]++;break;
case 'r':char_table_t[17]++;break;
case 'R':char_table_t[17]++;break;
case 's':char_table_t[18]++;break;
case 'S':char_table_t[18]++;break;
case 't':char_table_t[19]++;break;
case 'T':char_table_t[19]++;break;
case 'u':char_table_t[20]++;break;
case 'U':char_table_t[20]++;break;
case 'v':char_table_t[21]++;break;
case 'V':char_table_t[21]++;break;
case 'w':char_table_t[22]++;break;
case 'W':char_table_t[22]++;break;
case 'x':char_table_t[23]++;break;
case 'X':char_table_t[23]++;break;
case 'y':char_table_t[24]++;break;
case 'Y':char_table_t[24]++;break;
case 'z':char_table_t[25]++;break;
case 'Z':char_table_t[25]++;break;
default : break;
} //end case
}//end if
}//end for k
for(k=0;k<26;k++){
if((char_table_t[k])>t1){
t3=t2;t2=t1;t1=char_table_t[k];
te=ti;ti=th;th=k;
}
else { if((char_table_t[k])>t2){
t3=t2;t2=char_table_t[k];
te=ti;ti=k;
} //end if
else { if(char_table_t[k]>t3){
t3=char_table_t[k];
te=k;
}//end if
}//end else
}//end else
}//end for k
printf("For Set %d th=%d, ti=%d te= %d\n",j,th,ti,te);
if(te==g_freq) {
// e matches in both the calculations
if(alpha_freq[th][j][1]==0){
alpha_freq[th][j][1]=7;
alpha_freq[ti][j][1]=8;
}
}
else {
if(ti==g_freq) {
// ti is actually te and te is ti
if(alpha_freq[th][j][1]==0){
alpha_freq[th][j][1]=7;
alpha_freq[te][j][1]=8;
}//end if
}// end if
}//end else
// Now that we have the frequencies of the digrams with t
// Get the alphabets with the top three frequencies third should be e
// First will be h and second will be i
//if second is not i then second has to
// be e and third will positively be i
printf("\nChar table T %d \n",j);
for(k=0;k<26;k++){
// printf(" %d,", char_table_t[k]);
char_table_t[k]=0;
}
printf("\n");
/************************************************************************
FINDING Digrams with E as the first character.Looking for ER, ES, EN, EC
ER=143, ES= 132, EN=99.2, EC=80.9
After identifying candidates for en and ec use the fact that freq of n is
greater than that of c.
*************************************************************************/
for(k=0;k<rows_per_col[j];k++){
if(((char_split_array[k][j])==('a'+g_freq)) || ((char_split_array[k][j])==('A'+g_freq))){
switch(char_split_array[k+1][j]){
case 'a':char_table_e[0]++;break;
case 'A':char_table_e[0]++;break;
case 'b':char_table_e[1]++;break;
case 'B':char_table_e[1]++;break;
case 'c':char_table_e[2]++;break;
case 'C':char_table_e[2]++;break;
case 'd':char_table_e[3]++;break;
case 'D':char_table_e[3]++;break;
case 'e':char_table_e[4]++;break;
case 'E':char_table_e[4]++;break;
case 'f':char_table_e[5]++;break;
case 'F':char_table_e[5]++;break;
case 'g':char_table_e[6]++;break;
case 'G':char_table_e[6]++;break;
case 'h':char_table_e[7]++;break;
case 'H':char_table_e[7]++;break;
case 'i':char_table_e[8]++;break;
case 'I':char_table_e[8]++;break;
case 'j':char_table_e[9]++;break;
case 'J':char_table_e[9]++;break;
case 'k':char_table_e[10]++;break;
case 'K':char_table_e[10]++;break;
case 'l':char_table_e[11]++;break;
case 'L':char_table_e[11]++;break;
case 'm':char_table_e[12]++;break;
case 'M':char_table_e[12]++;break;
case 'n':char_table_e[13]++;break;
case 'N':char_table_e[13]++;break;
case 'o':char_table_e[14]++;break;
case 'O':char_table_e[14]++;break;
case 'p':char_table_e[15]++;break;
case 'P':char_table_e[15]++;break;
case 'q':char_table_e[16]++;break;
case 'Q':char_table_e[16]++;break;
case 'r':char_table_e[17]++;break;
case 'R':char_table_e[17]++;break;
case 's':char_table_e[18]++;break;
case 'S':char_table_e[18]++;break;
case 't':char_table_e[19]++;break;
case 'T':char_table_e[19]++;break;
case 'u':char_table_e[20]++;break;
case 'U':char_table_e[20]++;break;
case 'v':char_table_e[21]++;break;
case 'V':char_table_e[21]++;break;
case 'w':char_table_e[22]++;break;
case 'W':char_table_e[22]++;break;
case 'x':char_table_e[23]++;break;
case 'X':char_table_e[23]++;break;
case 'y':char_table_e[24]++;break;
case 'Y':char_table_e[24]++;break;
case 'z':char_table_e[25]++;break;
case 'Z':char_table_e[25]++;break;
default : break;
} //end case
}//end if
}//end for k
for(k=0;k<26;k++){
if((char_table_e[k])>e1){
e4=e3;e3=e2;e2=e1;e1=char_table_e[k];
ec=en;en=es;es=er;er=k;
}
else { if((char_table_e[k])>e2){
e4=e3;e3=e2;e2=char_table_e[k];
ec=en;en=es;es=k;
} //end if
else { if(char_table_e[k]>e3) {
e4=e3;e3=char_table_e[k];
ec=en;en=k;
}//end if
else { if(char_table_e[k]>e4){
e4=char_table_e[k];
ec=k; // record the index of the fourth-ranked letter (was e4=k, which overwrote the count)
}//end if
}// end else
}//end else
}//end else
}//end for k
// thus at this time we are tentatively guessing that ec, en, es and er
// point to array locations whose indices indicate the cipher
// alphabets corresponding to c, n, s and r
// compare the individual letter freqs of the lower two to get n and c
printf(" er=%d es=%d en=%d ec=%d\n",er,es,en,ec);
if(alpha_freq[en][j][0]>alpha_freq[ec][j][0]){
alpha_freq[en][j][1]=13;
alpha_freq[ec][j][1]=2;
}
else {
alpha_freq[ec][j][1]=13;
alpha_freq[en][j][1]=2;
}
// r & s will be too close so for now just guess that the lower one is s
// Then find all common pairs & from that find s and compare to check
// that previous guess was correct then the other char is r
// also find l from the common pairs
printf("\n Char Table E\n");
for(k=0;k<26;k++){
// printf(" %d,", char_table_e[k]);
char_table_e[k]=0;
}
printf("\n");
/*********************************************************************
Find Common Pairs of Characters the top two frequencies should correspond to
SS and LL in that order
**********************************************************************/
for(l=0;l<rows_per_col[j];l++){
if(char_split_array[l][j]==char_split_array[l+1][j]){
switch(char_split_array[l][j]){
case 'a':char_table_aa[0]++;break;
case 'A':char_table_aa[0]++;break;
case 'b':char_table_aa[1]++;break;
case 'B':char_table_aa[1]++;break;
case 'c':char_table_aa[2]++;break;
case 'C':char_table_aa[2]++;break;
case 'd':char_table_aa[3]++;break;
case 'D':char_table_aa[3]++;break;
case 'e':char_table_aa[4]++;break;
case 'E':char_table_aa[4]++;break;
case 'f':char_table_aa[5]++;break;
case 'F':char_table_aa[5]++;break;
case 'g':char_table_aa[6]++;break;
case 'G':char_table_aa[6]++;break;
case 'h':char_table_aa[7]++;break;
case 'H':char_table_aa[7]++;break;
case 'i':char_table_aa[8]++;break;
case 'I':char_table_aa[8]++;break;
case 'j':char_table_aa[9]++;break;
case 'J':char_table_aa[9]++;break;
case 'k':char_table_aa[10]++;break;
case 'K':char_table_aa[10]++;break;
case 'l':char_table_aa[11]++;break;
case 'L':char_table_aa[11]++;break;
case 'm':char_table_aa[12]++;break;
case 'M':char_table_aa[12]++;break;
case 'n':char_table_aa[13]++;break;
case 'N':char_table_aa[13]++;break;
case 'o':char_table_aa[14]++;break;
case 'O':char_table_aa[14]++;break;
case 'p':char_table_aa[15]++;break;
case 'P':char_table_aa[15]++;break;
case 'q':char_table_aa[16]++;break;
case 'Q':char_table_aa[16]++;break;
case 'r':char_table_aa[17]++;break;
case 'R':char_table_aa[17]++;break;
case 's':char_table_aa[18]++;break;
case 'S':char_table_aa[18]++;break;
case 't':char_table_aa[19]++;break;
case 'T':char_table_aa[19]++;break;
case 'u':char_table_aa[20]++;break;
case 'U':char_table_aa[20]++;break;
case 'v':char_table_aa[21]++;break;
case 'V':char_table_aa[21]++;break;
case 'w':char_table_aa[22]++;break;
case 'W':char_table_aa[22]++;break;
case 'x':char_table_aa[23]++;break;
case 'X':char_table_aa[23]++;break;
case 'y':char_table_aa[24]++;break;
case 'Y':char_table_aa[24]++;break;
case 'z':char_table_aa[25]++;break;
case 'Z':char_table_aa[25]++;break;
default : break;
} //end case
} //end if
} //end for l
//FINDING THE TOP TWO FREQUENCIES
for(m=0;m<26;m++){
//printf(" %d ", char_table_aa[m]);
if(char_table_aa[m]>gt1){
gt2=gt1;
gt1=char_table_aa[m];
gl=gs; gs=m;
}
else {
if(char_table_aa[m]>gt2){
gt2=char_table_aa[m];
gl=m;
}
}
//printf(" %d and %d \n", gs, gl);
}//end for m
//printf("\nThe chars with top two freq for this set are %d and %d \n", gs, gl);
// First check to see that s obtained here and in the E digrams is same
if(gs==es) {
printf("\ngs and es are equal\n");
if(alpha_freq[gs][j][1]==0)
alpha_freq[gs][j][1]=18;
if(alpha_freq[er][j][1]==0)
alpha_freq[er][j][1]=17;
}//end if
else {
printf("\n gs (%d) and es (%d) are not equal\n",gs,es);
if(gs==er){
printf("\ngs and er are equal\n");
if(alpha_freq[er][j][1]==0)
alpha_freq[er][j][1]=18;
if(alpha_freq[es][j][1]==0)
alpha_freq[es][j][1]=17;
}//end if
else
printf("\n gs (%d) and er (%d) are also not equal\n",gs,er);
}//end else
// we can be reasonably confident about L as it comes a clear second
// to S
if(alpha_freq[gl][j][1]==0)
alpha_freq[gl][j][1]=11;
//Thus we have identified s,n,c,e,r,t,l
printf("\nChar Table AA\n");
for(k=0;k<26;k++){
//printf(" %d ", char_table_aa[k]);
char_table_aa[k]=0; // resetting table to store freq of common pairs
}
printf("\n");
}//end if
g1=g2=gs=gl=gt1=gt2=g_freq=g_freq2=0;
l=m=th=ti=te=t1=t2=t3=e1=e2=e3=e4=es=er=en=ec=0;
}//end for j
//g_freq and its counterparts are used to identify e and t
i=j=l=m=0;
fd=fopen("decrypted.txt","w");
char decrypted_array[char_cnt];
//Replace the encrypted text in each of the sets by the suggested plaintext letters
for (j=0;j<alpha_cnt;j++){
for(i=0;i<total_num_rows;i++){
if(char_split_array[i][j]!=' '){
l=char_split_array[i][j]-'a';
char_split_array[i][j]='a'+int(alpha_freq[l][j][1]);
//printf(" %c ", char_split_array[i][j]);
}//end if
}//endfor
//printf("\n++++++++++++++++++++++++++++++\n");
}//end for
l=m=0;
for(i=0;i<total_num_rows;i++){
for(j=0;j<alpha_cnt;j++){
if(char_split_array[i][j]!=' '){
decrypted_array[l]=char_split_array[i][j];
fputc(decrypted_array[l],fd);
if(l%6==0) fputc(' ',fd);
l++;
}//end if
}//end for
}//end for
double check_ioc=iocenc(decrypted_array,char_cnt);
printf("\n THE IOC OF THE DECRYPTED TEXT IS : %f \n",check_ioc);
printf(" The name of the decrypted text file is decrypted.txt\n\n");
printf(" Frequency dist. table of the encrypted text. No. of sets=No. of alphabets\n\n");
printf("----------------------------------------------------------------------------");
printf("\n CHAR Frequency Suggest\n");
// The following prints all the sets
for(i=0;i<26;i++){
for(j=0;j<alpha_cnt;j++){
if(j==0) printf(" %d ", i);
printf(" % f %d", alpha_freq[i][j][0], int(alpha_freq[i][j][1]));
}printf("\n");}
/*
// Debug Printing
//printf(" Index of Coincidence = %f\n", sum);
for(j=0;j<alpha_cnt;j++)
{
printf(" The IC of the %d set is : %f\n", j, sum[j]);
}
*/
for(j=0;j<alpha_cnt;j++)
{
if(sum[j]<0.059)
{
printf("ERROR!!! The IC %f of set %d is not matching English\n",sum[j], j);
printf("ERROR!!! The Number of alphabets calculated is incorrect\n");
return 1;
}
}
fclose(fd);
}//end ioc
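The 52-way switch statements and the hand-unrolled "top three / top four" loops above all do the same two jobs: tally which letter follows a given letter, ignoring case, and rank the tallies. The following is a minimal sketch of those two steps for the single-alphabet case only. The names `letter_index`, `count_followers` and `top_letters` are hypothetical and do not appear in the program; an ASCII-compatible character set is assumed.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: 'a'..'z' and 'A'..'Z' map to 0..25, anything else to -1.
// Assumes an ASCII-compatible character set.
int letter_index(char c)
{
    if (c >= 'a' && c <= 'z') return c - 'a';
    if (c >= 'A' && c <= 'Z') return c - 'A';
    return -1;
}

// Tally the letters that immediately follow `first` (case-insensitive) in `text`.
std::array<int, 26> count_followers(const std::string& text, char first)
{
    std::array<int, 26> table{};            // all counts start at zero
    const int want = letter_index(first);
    if (want < 0) return table;             // `first` is not a letter
    for (std::size_t k = 0; k + 1 < text.size(); ++k) {
        if (letter_index(text[k]) != want) continue;
        const int next = letter_index(text[k + 1]);
        if (next >= 0) ++table[next];
    }
    return table;
}

// Indices of the n largest counts, highest first.
// Ties keep alphabetical order because the sort is stable.
std::vector<int> top_letters(const std::array<int, 26>& table, std::size_t n)
{
    std::vector<int> order(26);
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&table](int a, int b) { return table[a] > table[b]; });
    order.resize(std::min<std::size_t>(n, order.size()));
    return order;
}
```

With these helpers, `top_letters(count_followers(text, 'e'), 4)` would give the candidates that the listing stores in er, es, en and ec. Because every slot is filled by the same sort, a count cannot be overwritten by an index.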
/* This function identifies the various repeated patterns of
three or more characters appearing in an encrypted text file.
It identifies the positions and uses that to get the likely
number of alphabets used in the polyalphabetic cipher used.
Program Author: Vijai Gandikota
Contact : gandikotav@hotmail.com
Date : February 10, 2002
*/
int kasiski(int argcnum, char**argval, int *char_cnt1, int *alpha_cnt)
{
FILE *f1,*f2;
char f_name[25], c, ch_equal_flg='n',firstchar='y';
int ctr1,ctr2,i,j,k,cnt=0,pos,prev_pos,pos2=0; // cnt must start at 0 before counting
int diffarray[800][10];
int factor_array[5];
for(ctr1=0;ctr1<5;ctr1++){
factor_array[ctr1]=0;
}
ctr1=0;
//printf(" \n %d =the pointer variable value \n", *char_cnt1);
for(ctr2=0;ctr2<800;ctr2++){
for(ctr1=0;ctr1<10;ctr1++){
diffarray[ctr2][ctr1]=0;
//printf("diffarray[%d][%d]=%d\n",ctr2,ctr1,diffarray[ctr2][ctr1]);
}}
ctr1=0;ctr2=0;
//clear previous copies of the tempstore file
//get the name of encrypted file
system("rm -f tempstore");
system("clear");
/*
printf("Enter encrypted text file name: ");
scanf("%s", f_name);
fflush(stdin);
*/
if (argcnum < 2)
{
printf(" Please enter the file to be decrypted\n");
return 1;
}
if (argcnum >2 ){
printf(" Incorrect Number of Filenames\n");
return 1;
}
//Open the file that is given on the command line to decrypt
f1=fopen(argval[1],"r");
/* To check if the file was indeed opened. This code is not yet working
if (!(f1=fopen(f_name, "r")))
{
printf("unable to open the encrypted file");
return 1;
}
*/
while(!feof(f1)) {
c=fgetc(f1);
if(c!=' ' && c!='\n' && ((c>='a') && (c<='z') || (c>='A') && (c<='Z')))
cnt=cnt+1; //count the number of characters
}
*char_cnt1=cnt;
//printf("number of characters in the encrypted file :%d\n", *char_cnt1);
/*
fclose(f1);
f1=fopen(argval[1], "r");//reopen the file
*/
rewind(f1);
//f2=fopen("tempstore","w");//open a file to store char matches and positions
// Read all the characters from the encrypted file into an array
char ch[cnt]; //create an array
i=0;
int newint=0;
while(!feof(f1) && i<cnt){
c=fgetc(f1);//get each char
if(c!=' ' && c!='\n' && ((c>='a') && (c<='z') || (c>='A') && (c<='Z')))
ch[i++]=c; //assign to array if not a space
}
if(cnt>1000) i=1000;
//MARK2
//printf("number of characters considered in the encrypted file :%d\n",i);
//for(int z=0;z<i;z++) printf("ch[%d] = %c\n",z,ch[z]);
/*For every run of 3 or more consecutive characters, compare it with each
later run of the same length and see if they are equal*/
char char_comp[i-1]; // This contains the character sets from ch to compare
//fprintf(f2,"No. of chars, positions repeated in string");
for (j=3;j<21 && j<i;j++){
//Start of outer most loop looking at number of characters
//We compare a maximum of 20 characters
//printf("number of characters considering at a time = %d\n", j);
for(pos=0;pos<(i-j+1);pos++){
//printf("current pos position is %d\n", pos);
for(k=0;k<j;k++){//For picking the set of chars to compare
//printf("picking char %d\n", k);
char_comp[k]=ch[pos+k];
}//End for picking char
firstchar='y';
for(pos2=pos+1;pos2<i-j+1;pos2++){
//printf("pos2 pointer in ch at %d\n",pos2);
ch_equal_flg='n'; //Reset the character equal flag
for(k=0;k<j;k++){
//printf("comparing ch[%d] %c with char_comp[%d] %c\n",pos2+k,ch[pos2+k],pos+k,ch[pos+k],k, char_comp[k]);
if(ch[pos2+k]==char_comp[k]){//Checking for equality to continue
//printf("character ch[%d] is same as character ch[%d]\n",pos2+k,pos+k);
}//endif
if(ch[pos2+k]!=char_comp[k] || ch[pos2+k]==' '){//Checking nonequal to exit this for loop
//printf("character sets not equal exiting this for loop\n");
ch_equal_flg='n';
break;
}
ch_equal_flg='y';
}//endfor k
if(ch_equal_flg=='y'){
if(firstchar=='y'){
ctr2=ctr2+1;
ctr1=0;
diffarray[ctr2][ctr1]=j;
ctr1=ctr1+1;
prev_pos=pos;
//fprintf(f2,"\n%d,%d",j,pos);
//fprintf(f2,"\n%d",j);
firstchar='n';
}//endif
diffarray[ctr2][ctr1]=pos2-prev_pos;
ctr1=ctr1+1;
//fprintf(f2,",%d",pos2-prev_pos);
prev_pos=pos2;
}//endif ch_equal
}//end for pos2
}//end for pos
}//end for j
fclose(f1);
/* Now here we will analyze the tempstore file and identify the possible
factors*/
float modtest=0;
for(ctr2=0;ctr2<800;ctr2++){
for(ctr1=0;ctr1<10;ctr1++){
if(diffarray[ctr2][ctr1]!=0)
//printf(" %d ", diffarray[ctr2][ctr1]);
for(i=0;i<5;i++){
modtest=diffarray[ctr2][ctr1]%(i+1);
if(modtest==0){
factor_array[i]=factor_array[i]+1;
}
}//endfor
}
//printf("\n");
}
int greatest=0;
int greatint=0;
printf("\nKasiski Factor Array: Frequency of factors 2 - 5\n");
for(i=1;i<5;i++)
{
printf("f[%d] = %d, ", i+1, factor_array[i]);
if(greatest<factor_array[i]){
greatest=factor_array[i];
greatint=i;
}
}
printf("\n");
/*
if (greatint==1 &&(factor_array[1]-factor_array[3]<120))
{
greatint=3;
//printf("Though it is shown that 2 is more likely, that in actuality may not be correct as the numbers for 4 and 2 are so close. It is our suggestion that 4 is probably the right answer\n");
}
greatint=greatint+1;
//printf("\nGreatest = %d, alpha_cnt = %d\n", greatest, greatint);
*/
int alpha_num=0;
double iocsum=0;
iocsum=iocenc(ch, cnt);
if(iocsum>=0.060){ alpha_num=1;
//printf("Number of Enciphering alphabets is 1\n");
}
if(iocsum>=0.0479 && iocsum<0.060) {alpha_num=2;
//printf("Number of Enciphering alphabets is approx 2\n");
}
if(iocsum>=0.046140 && iocsum<0.0479) {alpha_num=3;
//printf("Number of Enciphering alphabets is approx 3\n");
}
if(iocsum>0.044 && iocsum<0.046140) {alpha_num=4;
//printf("Number of Enciphering alphabets is approx 4\n");
}
if(iocsum>=0.041116 && iocsum<=0.044){ alpha_num=5;
//printf("Number of Enciphering alphabets is approx 5\n");
}
if(iocsum<0.0420){ alpha_num=5;
printf("Number of Enciphering alphabets is difficult to predict assume 5\n");
}
printf("\nIOC Calculated = %f\n", iocsum);
if (greatint==1 &&(factor_array[1]-factor_array[3]<160) && (alpha_num!=(greatint+1)))
{
greatint=3;
/*printf("Though it is shown that 2 is more likely, that in actuality may not be correct as the numbers for 4 and 2 are so close. It is our suggestion that 4 is probably the right answer\n");*/
}
greatint=greatint+1;
//printf("\nGreatest = %d, alpha_cnt = %d\n", greatest, greatint);
if(alpha_num!=greatint){
printf("IOC No. of Alphabets (%d)!=Kasiski(%d)\n", alpha_num, greatint);
if(alpha_num==1){
// THen that is the correct answer
greatint=1;
printf(" greatint = %d", greatint);
}
//if(alpha_num=5 && greatint==2) greatint=5;
//return 1;
}
//return greatint;
*alpha_cnt=greatint;
//fclose(f2);
return 0; // success; the error paths above return 1
}//end kasiski
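The factor-counting idea in kasiski() can be sketched more compactly as follows. This is a simplified variant and not the routine above: it looks only at repeated sequences of one fixed length (three letters by default) rather than lengths 3 to 20, and it measures the distance between successive occurrences of each sequence. The name `kasiski_factor_counts` and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// For each candidate period f in 2..max_len, counts[f] is the number of
// distances between successive repeats of a `gram`-letter sequence that f divides.
// counts[0] and counts[1] are unused and stay at zero.
std::vector<int> kasiski_factor_counts(const std::string& text,
                                       std::size_t gram = 3,
                                       std::size_t max_len = 5)
{
    std::vector<int> counts(max_len + 1, 0);
    std::map<std::string, std::size_t> last_seen;   // sequence -> last start position
    for (std::size_t pos = 0; pos + gram <= text.size(); ++pos) {
        const std::string piece = text.substr(pos, gram);
        auto it = last_seen.find(piece);
        if (it == last_seen.end()) {
            last_seen.emplace(piece, pos);          // first sighting, nothing to measure
            continue;
        }
        const std::size_t distance = pos - it->second;
        for (std::size_t f = 2; f <= max_len; ++f)
            if (distance % f == 0) ++counts[f];
        it->second = pos;                           // measure the next repeat from here
    }
    return counts;
}
```

The candidate with the largest count plays the role of greatint+1 in the listing. Using a map keyed by the sequence avoids the fixed 800 by 10 diffarray, so the number of repeats found is not limited by the array size.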
/* This function identifies the percentage of occurrence of various
characters appearing in an encrypted text file.
IOCENC
Program Author: Vijai Gandikota
Contact : gandikotav@hotmail.com
Date : February 10, 2002
*/
double iocenc (char *ch, int cntr)
{
int i,cnt=0;
double charcnt[26];
for(i=0;i<26;i++)
charcnt[i]=0; //initialize the array
char c, firstchar='y';
/*
FILE *f1,*f2;
char f_name[25], c, firstchar='y';
//get the name of encrypted file
//printf("Enter encrypted text file name: ");
//scanf("%s", f_name);
//fflush(stdin);
//Open the file that is given on the command line to decrypt
//f1=fopen(f_name,"r");
f1=fopen("encrypt.txt","r");
*/
/*
while(!feof(f1)) {
c=fgetc(f1);
*/
for(i=0;i<cntr;i++){
c=ch[i];
if(c!=' ' && c!='\n' && ((c>='a') && (c<='z') || (c>='A') && (c<='Z')))
{
cnt=cnt+1; //count the number of characters
switch(c){
case 'a': charcnt[0]=charcnt[0]+1; break;
case 'A': charcnt[0]=charcnt[0]+1; break;
case 'b': charcnt[1]=charcnt[1]+1; break;
case 'B': charcnt[1]=charcnt[1]+1; break;
case 'c': charcnt[2]=charcnt[2]+1; break;
case 'C': charcnt[2]=charcnt[2]+1; break;
case 'd': charcnt[3]=charcnt[3]+1; break;
case 'D': charcnt[3]=charcnt[3]+1; break;
case 'e': charcnt[4]=charcnt[4]+1; break;
case 'E': charcnt[4]=charcnt[4]+1; break;
case 'f': charcnt[5]=charcnt[5]+1; break;
case 'F': charcnt[5]=charcnt[5]+1; break;
case 'g': charcnt[6]=charcnt[6]+1; break;
case 'G': charcnt[6]=charcnt[6]+1; break;
case 'h': charcnt[7]=charcnt[7]+1; break;
case 'H': charcnt[7]=charcnt[7]+1; break;
case 'i': charcnt[8]=charcnt[8]+1; break;
case 'I': charcnt[8]=charcnt[8]+1; break;
case 'j': charcnt[9]=charcnt[9]+1; break;
case 'J': charcnt[9]=charcnt[9]+1; break;
case 'k': charcnt[10]=charcnt[10]+1; break;
case 'K': charcnt[10]=charcnt[10]+1; break;
case 'l': charcnt[11]=charcnt[11]+1; break;
case 'L': charcnt[11]=charcnt[11]+1; break;
case 'm': charcnt[12]=charcnt[12]+1; break;
case 'M': charcnt[12]=charcnt[12]+1; break;
case 'n': charcnt[13]=charcnt[13]+1; break;
case 'N': charcnt[13]=charcnt[13]+1; break;
case 'o': charcnt[14]=charcnt[14]+1; break;
case 'O': charcnt[14]=charcnt[14]+1; break;
case 'p': charcnt[15]=charcnt[15]+1; break;
case 'P': charcnt[15]=charcnt[15]+1; break;
case 'q': charcnt[16]=charcnt[16]+1; break;
case 'Q': charcnt[16]=charcnt[16]+1; break;
case 'r': charcnt[17]=charcnt[17]+1; break;
case 'R': charcnt[17]=charcnt[17]+1; break;
case 's': charcnt[18]=charcnt[18]+1; break;
case 'S': charcnt[18]=charcnt[18]+1; break;
case 't': charcnt[19]=charcnt[19]+1; break;
case 'T': charcnt[19]=charcnt[19]+1; break;
case 'u': charcnt[20]=charcnt[20]+1; break;
case 'U': charcnt[20]=charcnt[20]+1; break;
case 'v': charcnt[21]=charcnt[21]+1; break;
case 'V': charcnt[21]=charcnt[21]+1; break;
case 'w': charcnt[22]=charcnt[22]+1; break;
case 'W': charcnt[22]=charcnt[22]+1; break;
case 'x': charcnt[23]=charcnt[23]+1; break;
case 'X': charcnt[23]=charcnt[23]+1; break;
case 'y': charcnt[24]=charcnt[24]+1; break;
case 'Y': charcnt[24]=charcnt[24]+1; break;
case 'z': charcnt[25]=charcnt[25]+1; break;
case 'Z': charcnt[25]=charcnt[25]+1; break;
default :
printf("error: this character should not be here");
break;
}//end case
}//endif
}//end for
if(cnt!=cntr) {printf("NO OF CHARS DONT MATCH"); return 1;}
double sum = 0;
for(i=0;i<26;i++)
{
charcnt[i]=(charcnt[i]/cnt)*((charcnt[i] - 1)/(cnt-1));
//printf("charcnt[%d] = %f\n",i,charcnt[i]);
sum=sum+charcnt[i];
}//endfor
/*printf(" Index of Coincidence = %f\n", sum);*/
return sum;
/*
int alpha_num=0;
if(sum>=0.060){ alpha_num=1;
//printf("Number of Enciphering alphabets is 1\n");
}
if(sum>=0.0479 && sum<0.060) {alpha_num=2;
//printf("Number of Enciphering alphabets is approx 2\n");
}
if(sum>=0.046140 && sum<0.0479) {alpha_num=3;
//printf("Number of Enciphering alphabets is approx 3\n");
}
if(sum>0.044 && sum<0.046140) {alpha_num=4;
//printf("Number of Enciphering alphabets is approx 4\n");
}
if(sum>=0.041116 && sum<=0.044){ alpha_num=5;
//printf("Number of Enciphering alphabets is approx 5\n");
}
if(sum<0.0420){ alpha_num=0;
//printf("Number of Enciphering alphabets is difficult to predict\n");
}
return alpha_num;
//fclose(f1);
*/
}//end iocenc
void bsort(float* ptr, int n)
{
void order(float*, float*);
int j, k;
for(j=0;j<n-1;j++)
for(k=j+1;k<n;k++)
order(ptr+j, ptr+k); //order the pointer contents
}
void order(float* numb1, float* numb2)
{
if(*numb1 > *numb2)
{
float temp = *numb1;
*numb1=*numb2;
// Swapping operation
*numb2=temp;
}
}
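bsort() and order() together sort n floats into ascending order in place by comparing every pair. As a sketch of an alternative, assuming a standard C++ library is available, the same result can be obtained from `std::sort`. The wrapper name `sort_ascending` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>

// Sort the n floats starting at ptr into ascending order, in place.
void sort_ascending(float* ptr, std::size_t n)
{
    std::sort(ptr, ptr + n);
}
```

The pairwise approach in bsort() makes on the order of n*n comparisons, whereas `std::sort` makes on the order of n log n. For the 26-entry frequency tables used here the difference is negligible.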