Reading, Writing and using Lists
The following AppleScript uses ChemDraw to calculate a variety of molecular
properties and then stores them as individual values. These can then be used,
as demonstrated rather trivially by the display dialog command.
tell application "CS ChemDraw Ultra"
set the_SMILES to SMILES of selection
set Elem_Anal to Elemental Analysis of selection
set Exact_mass to Exact Mass of selection
set Mol_Form to Molecular Formula of selection
set Mol_weight to Molecular Weight of selection
set Chem_props to "SMILES " & the_SMILES & return & "Chem Analysis " ¬
& Elem_Anal & return & "Molecular Formula " ¬
& Mol_Form & return & "Molecular Weight " & Mol_weight
display dialog Chem_props
end tell
This is fine if all you have to do is calculate the properties for a single
molecule, but what if you want to perform the calculation on a list of
structures? Suppose you have a file containing a series of structures in SMILES
format. The file should look like this: a tab-delimited list with each SMILES
string followed by the compound name.
c1ccccc1 benzene
Ic1ccccc1 iodobenzene
O=C1CCCCC1 cyclohexanone
NC1CCCCC1 cyclohexamine
CN(C)c1cccnc1 3-dimethylaminopyridine
N1(c2ccccc2)CCNCC1 phenylpiperazine
You can download the file here (control-click on the link and choose "Download
Linked File..."). What we need to do now is have the user choose a file, read
the contents and then store the data in a list. A list is just a group of
values placed between {}, for example {1,2,3} or {1,"b","hello",{1,3,5}}. As
you can see, you can mix types and even have a list within a list. In the
script below we first define the list we will read the molecules into, then get
the user to choose a file, and finally read the contents of the file into
theData.
set mol_list to {}
set theData to ""
set theFile to (choose file with prompt "Select the file:" of type {"TEXT"}) as alias
open for access theFile
set theData to read theFile using delimiter return
close access theFile
If you copy and paste the above text into Script Editor, compile it, select
"Event Log" and click "Run", you can choose the temp_mac.txt file and you
should see a result like the one shown below. Each line is read into the list
as a separate value:-
{"c1ccccc1 benzene", "Ic1ccccc1 iodobenzene", "O=C1CCCCC1 cyclohexanone",
"NC1CCCCC1 cyclohexamine", "CN(C)c1cccnc1 3-dimethylaminopyridine",
"N1(c2ccccc2)CCNCC1 phenylpiperazine"}
Having read the file, we will of course want to write out the results at some
point, so this seems a good time to think about the file we will be saving to.
We want to save the results in the same folder as the file we read in, and we
do this with the help of a simple sub-routine. We pass "theFile" to the
sub-routine, which returns the folder in which it resides. It is then a simple
task to append the output file name.
set the_file_path to GetParentPath(theFile)
set theSaveFile to the_file_path & "test2.smi"
on GetParentPath(theFile)
tell application "Finder" to return container of theFile as text
end GetParentPath
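The same folder path can also be worked out from the path text alone, which is handy for trying the idea without involving the Finder. The following is a minimal sketch, not the method used in the rest of this tutorial: the handler name parent_path_of and the sample path "Macintosh HD:Users:me:temp_mac.txt" are made up for illustration, and the handler assumes a colon-delimited path that names a file rather than a folder.

```applescript
-- Hypothetical handler: return the enclosing folder of a colon-delimited
-- path such as "Macintosh HD:Users:me:temp_mac.txt" without using the Finder.
-- Assumes the path names a file (no trailing colon).
on parent_path_of(path_text)
	set old_delims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ":"
	-- drop the last component (the file name) and rejoin the rest with ":"
	set parent_text to (text items 1 thru -2 of path_text) as text
	set AppleScript's text item delimiters to old_delims
	return parent_text & ":"
end parent_path_of

-- Sample path, made up for this sketch
set the_file_path to parent_path_of("Macintosh HD:Users:me:temp_mac.txt")
set theSaveFile to the_file_path & "test2.smi"
-- theSaveFile is now "Macintosh HD:Users:me:test2.smi"
```

With a real alias from choose file, the call would be parent_path_of(theFile as text).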
Now that we have all the data in a list we can begin to manipulate it. First
we need to get the SMILES strings. At the moment the first item in the list is
"c1ccccc1 benzene", and we need to separate the two terms. First change the
text item delimiter to tab; then a simple repeat loop splits each item in
theData and copies the result to the end of a new list called "mol_list".
Remember to change the delimiter back!
set text item delimiters to tab
repeat with i from 1 to count of theData
set theLine to text items of item i of theData
copy theLine to the end of mol_list
end repeat
set text item delimiters to ""
The result is a list of lists:-
{{"c1ccccc1", "benzene"}, {"Ic1ccccc1", "iodobenzene"}, {"O=C1CCCCC1",
"cyclohexanone"}, {"NC1CCCCC1", "cyclohexamine"}, {"CN(C)c1cccnc1",
"3-dimethylaminopyridine"}, {"N1(c2ccccc2)CCNCC1", "phenylpiperazine"}}
We can now select both the "SMILES" and "name" of each item of "mol_list" and
use ChemDraw to calculate the properties.
set the_compound to item i of mol_list
set the_SMILES to item 1 of the_compound
set the_name to item 2 of the_compound
--display dialog the_SMILES
--display dialog the_name
set the clipboard to the_SMILES
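The splitting and selection steps can be tried on their own, without a file on disk or ChemDraw running. The sketch below is only an illustration under that assumption: the two sample lines are typed in by hand, and the names old_delims and name_list are introduced here and do not appear in the main script.

```applescript
-- Two sample lines standing in for the contents of the file
set theData to {"c1ccccc1" & tab & "benzene", "Ic1ccccc1" & tab & "iodobenzene"}
set mol_list to {}

-- Split each line on tabs, saving and restoring the delimiters
set old_delims to AppleScript's text item delimiters
set AppleScript's text item delimiters to tab
repeat with i from 1 to count of theData
	-- each line becomes a two-item list: {SMILES, name}
	copy (text items of item i of theData) to the end of mol_list
end repeat
set AppleScript's text item delimiters to old_delims

-- Walk the list of lists and pull out the two fields of each compound
set name_list to {}
repeat with i from 1 to count of mol_list
	set the_compound to item i of mol_list
	set the_SMILES to item 1 of the_compound
	set the_name to item 2 of the_compound
	copy the_name to the end of name_list
end repeat
-- name_list is now {"benzene", "iodobenzene"}
```

Saving the old delimiters and restoring them afterwards, rather than resetting them to "", leaves any delimiter set elsewhere in a longer script untouched.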
However, getting ChemDraw to create the chemical structure from the SMILES
string is not straightforward: there is no "Paste SMILES" command in the
AppleScript dictionary. So we put the SMILES string on the clipboard and script
the menus to paste it. The rest of the ChemDraw commands you have seen before.
We then combine all the different data items for a single compound into a
list, "mol_props_list", and add that list to the end of "all_mols_list".
tell application "CS ChemDraw Ultra"
activate
if enabled of menu item "Paste" then do menu item "SMILES" of menu "Paste Special" of menu "Edit"
set the_CD_SMILES to SMILES of selection
set Elem_Anal to Elemental Analysis of selection
set Exact_mass to Exact Mass of selection
set Mol_Form to Molecular Formula of selection
set Mol_weight to Molecular Weight of selection
copy the_SMILES to the end of mol_props_list
copy the_name to the end of mol_props_list
copy the_CD_SMILES to the end of mol_props_list
copy Elem_Anal to the end of mol_props_list
copy Exact_mass to the end of mol_props_list
copy Mol_Form to the end of mol_props_list
copy Mol_weight to the end of mol_props_list
if enabled of menu item "Paste" then do menu item "Clear" of menu "Edit"
--display dialog (item 3 of mol_props_list)
end tell
copy mol_props_list to the end of all_mols_list
It only remains to convert each list to tab-delimited text and then save the
result. The repeat loop does the conversion and the sub-routine adds each line
to the file. It is probably worth mentioning that keeping regularly used
snippets of code as sub-routines certainly helps the cut and paste school of
programming!
repeat with i from 1 to num_compounds
set mol_list to item i of all_mols_list
-- convert list to text
set old_delim to AppleScript's text item delimiters
set AppleScript's text item delimiters to tab
set mol_list to mol_list as text
--set mol_list to mol_list & "\n" needs UNIX line endings
set mol_list to mol_list & "
"
set AppleScript's text item delimiters to old_delim
my write_to_file(mol_list, theSaveFile, true)
end repeat
on write_to_file(this_data, target_file, append_data)
try
set the target_file to the target_file as text
set the open_target_file to open for access file target_file with write permission
if append_data is false then set eof of the open_target_file to 0
write this_data to the open_target_file starting at eof
close access the open_target_file
return true
on error
try
close access file target_file
end try
return false
end try
end write_to_file
The result should look something like this:-
c1ccccc1 benzene c1ccccc1 C, 92.26; H, 7.74 78.0469501926 C6H6 78.11184
Ic1ccccc1 iodobenzene Ic1ccccc1 C, 35.32; H, 2.47; I, 62.21 203.9435931605 C6H5I 204.00837
O=C1CCCCC1 cyclohexanone O=C1CCCCC1 C, 73.43; H, 10.27; O, 16.30 98.0731649431 C6H10O 98.143
NC1CCCCC1 cyclohexamine NC1CCCCC1 C, 72.66; H, 13.21; N, 14.12 99.1047994225 C6H13N 99.17412
CN(C)c1cccnc1 3-dimethylaminopyridine CN(c1cnccc1)C C, 68.82; H, 8.25; N, 22.93 122.0843983314 C7H10N2 122.1677
N1(c2ccccc2)CCNCC1 phenylpiperazine N1(CCNCC1)c2ccccc2 C, 74.03; H, 8.70; N, 17.27 162.1156984598 C10H14N2 162.23156
UNIX rears its head again
The problem is that SMILES often arrive as UNIX files, and there are two
different line ending conventions in Mac OS X: Mac-style (lines end with
return: "\r" or ASCII character 13) and Unix-style (lines end with line-feed:
"\n" or ASCII character 10).
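One way to cope with either convention is to let AppleScript split the text into paragraphs, since a paragraph ends at a return, a line feed, or a return followed by a line feed. The following is a minimal sketch of that idea, not necessarily the approach the original script took: the sample text is built by hand, and the linefeed constant assumes AppleScript 2.0 or later (ASCII character 10 is the older spelling).

```applescript
-- Sample text with one Mac-style and one Unix-style line ending
set raw_text to "c1ccccc1" & tab & "benzene" & return & ¬
	"Ic1ccccc1" & tab & "iodobenzene" & linefeed & ¬
	"O=C1CCCCC1" & tab & "cyclohexanone"

-- "paragraphs" splits on return, line feed, or return plus line feed
set theData to paragraphs of raw_text
set line_count to count of theData
-- line_count is 3, one item per molecule regardless of line ending
```

For a real file the equivalent would be paragraphs of (read theFile). A file that ends with a line break may produce an empty final item, which the later loop would need to skip.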