Sheet Structure Storage

JChem for Excel Structure Storage
Specification
Structure strings are stored on a hidden sheet. The name of the hidden sheet is:
"__JChemStructureSheet"
The first row on the sheet contains headers. The default and required columns are the
following:
Hash
StructureStringLength
StructureStringFormat
StructureString
Hash:
Contains the MD5 Hash of the structure string.
StructureStringLength:
The number of cells the structure string occupies.
StructureStringFormat:
The chemical file format of the structure string. Structures can be stored in any
format that the Marvin API supports.
StructureString:
The compressed and encoded representation of the structure string.
Structure strings are GZipped and then converted to their Base64 representation.
The encoded string in each cell is prefixed with “JChemExcel”. The prefix is needed
because the Base64 encoded representation could start with a character that Excel does
not accept as plain cell content (for example “+”, which Excel interprets as the start
of a formula).
An Excel cell can contain at most 32767 characters. If the structure string (or any
other data) does not fit in a single cell, a new column has to be added with the same
header name.
Note: If a structure string is split, each part should start with the prefix “JChemExcel”.
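The splitting rule above can be sketched as follows. This is a minimal illustration, not the product's implementation: the class and method names are hypothetical, and it assumes that the Base64 string may be cut at any character position and that the parts are later concatenated in column order. Since the prefix is part of the cell value, it is counted against the 32767-character limit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of JChem for Excel.
public static class StructureCellSplitter
{
    public const string Prefix = "JChemExcel";
    public const int MaxCellLength = 32767; // Excel's per-cell character limit

    // Splits an already Base64-encoded string into prefixed cell values.
    // Each cell holds at most maxCellLength characters including the prefix.
    public static List<string> SplitIntoCells(string encoded, int maxCellLength = MaxCellLength)
    {
        if (encoded == null)
            throw new ArgumentNullException(nameof(encoded));
        int payloadSize = maxCellLength - Prefix.Length;
        if (payloadSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxCellLength));

        var cells = new List<string>();
        for (int offset = 0; offset < encoded.Length; offset += payloadSize)
        {
            int length = Math.Min(payloadSize, encoded.Length - offset);
            cells.Add(Prefix + encoded.Substring(offset, length));
        }
        return cells; // cells.Count is the value for StructureStringLength
    }
}
```

Each returned value would go into its own StructureString column of the same row, and the number of returned values is what StructureStringLength records.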
Pseudocode example: creating a new workbook and adding a single
small structure in SMILES
1. Create the workbook.
2. Add the structure sheet named "__JChemStructureSheet".
2.1. Fill the first row with the default headers.
2.1.1. A1: Hash, B1: StructureStringLength, C1: StructureStringFormat, D1:
StructureString
3. Add a sheet called Sheet1 (or any other name).
4. Create the MD5 hash from the structure string.
5. GZip the structure string.
6. Create the Base64 representation of the GZipped structure string. (The code in the
section below also places the 4-byte length of the uncompressed string in front of the
GZipped bytes before encoding.)
7. Append the Base64 encoded, compressed structure string after the “JChemExcel”
prefix.
8. On the structure sheet set the cell values in the second row:
8.1. A2: MD5 hash of the structure string.
8.2. B2: 1
8.3. C2: MOL (the format name of the stored structure string)
8.4. D2: The resulting string from step 7.
9. On Sheet1 set cell formula:
9.1. A1: =JCSYSStructure("<MD5 hash of the structure string>")
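Steps 4 and 9 can be sketched as below. The class and method names are hypothetical. The specification does not say how the structure string is turned into bytes or how the MD5 digest is written as text, so UTF-8 input and lowercase hexadecimal output are assumptions here and should be checked against a workbook produced by JChem for Excel.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of JChem for Excel.
public static class StructureHash
{
    // MD5 of the structure string.
    // Assumption: UTF-8 bytes in, lowercase hex out (neither is stated in the specification).
    public static string Md5Hex(string structure)
    {
        if (structure == null)
            throw new ArgumentNullException(nameof(structure));
        using (var md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(structure));
            var sb = new StringBuilder(digest.Length * 2);
            foreach (byte b in digest)
                sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }

    // Formula text for the visible sheet (step 9).
    // Excel formulas need straight double quotes around the string argument.
    public static string StructureFormula(string hash)
    {
        return "=JCSYSStructure(\"" + hash + "\")";
    }
}
```

The value returned by Md5Hex would be written to the Hash column on the structure sheet, and the same value is passed to StructureFormula to build the formula for Sheet1.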
Code for compressing and Base64 encoding structure strings
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public class GzipCompressor
{
    // Returns Base64( [4-byte length of the UTF-8 input] + [GZip data] ).
    // The "JChemExcel" prefix is not added here.
    public static string CompressBase64(string text)
    {
        if (text == null)
            throw new ArgumentNullException("text", "text is null");
        byte[] buffer = Encoding.UTF8.GetBytes(text);
        byte[] compressed;
        using (var ms = new MemoryStream())
        {
            // Disposing the GZipStream writes the final GZip block;
            // leaveOpen = true keeps the MemoryStream usable afterwards.
            using (var zip = new GZipStream(ms, CompressionMode.Compress, true))
            {
                zip.Write(buffer, 0, buffer.Length);
            }
            compressed = ms.ToArray();
        }
        byte[] gzBuffer = new byte[compressed.Length + 4];
        // Bytes 0-3: length of the uncompressed data, in BitConverter's byte order.
        Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
        // Bytes 4 and up: the GZip data.
        Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
        return Convert.ToBase64String(gzBuffer);
    }
    // Reverses CompressBase64. The input must not include the "JChemExcel" prefix.
    public static string DecompressBase64(string compressedText)
    {
        if (compressedText == null)
            throw new ArgumentNullException("compressedText",
                "compressedText is null");
        byte[] gzBuffer = Convert.FromBase64String(compressedText);
        // Bytes 0-3 hold the length of the uncompressed data.
        int msgLength = BitConverter.ToInt32(gzBuffer, 0);
        byte[] buffer = new byte[msgLength];
        using (var ms = new MemoryStream(gzBuffer, 4, gzBuffer.Length - 4))
        using (var zip = new GZipStream(ms, CompressionMode.Decompress))
        {
            // Stream.Read may return fewer bytes than requested, so read in a loop.
            int total = 0;
            while (total < msgLength)
            {
                int read = zip.Read(buffer, total, msgLength - total);
                if (read == 0)
                    break; // end of the GZip data
                total += read;
            }
        }
        return Encoding.UTF8.GetString(buffer);
    }
}
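Reading a structure back from the hidden sheet reverses the steps described earlier. The following is a minimal sketch under stated assumptions rather than the product's own reader: the class and method names are hypothetical, the cell values are assumed to be passed in column order, and the parts are assumed to form one Base64 string once their prefixes are removed. The 4-byte length header matches what CompressBase64 writes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of JChem for Excel.
public static class StructureCellReader
{
    private const string Prefix = "JChemExcel";

    // Rebuilds the structure string from the StructureString cells of one row.
    public static string ReadStructure(IEnumerable<string> cells)
    {
        if (cells == null)
            throw new ArgumentNullException(nameof(cells));

        // Strip the prefix from every part and join the Base64 payloads.
        var base64 = new StringBuilder();
        foreach (string cell in cells)
        {
            if (cell == null || !cell.StartsWith(Prefix, StringComparison.Ordinal))
                throw new FormatException("Cell value lacks the JChemExcel prefix.");
            base64.Append(cell, Prefix.Length, cell.Length - Prefix.Length);
        }

        byte[] data = Convert.FromBase64String(base64.ToString());
        if (data.Length < 4)
            throw new FormatException("Encoded data is too short.");

        // Skip the 4-byte length header, then decompress the rest.
        using (var input = new MemoryStream(data, 4, data.Length - 4))
        using (var zip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            zip.CopyTo(output); // reads until the end of the GZip data
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }
}
```

For a structure stored in a single cell, the sequence passed in contains just that one value.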