Building a substructure searchable database
Anyone who has had to store or search a collection of chemical structures
rapidly realises that they need a software tool with a little chemical
intelligence. Whilst there are a number of commercial databases, they tend to be
rather expensive: fine for large corporations but not suitable for a single
chemist or small group. Today I'm going to show you how to use OpenBabel to
build a substructure searchable database. Whilst in theory you could use MySQL,
I'm going to assume that you don't actually want to become a database
administrator, so instead I'm going to use FileMaker, which is not free (there is
a free trial) but is incredibly easy to use. We will use OpenBabel to run the
search on an external file and pass the results to FileMaker, which will display
the selected records. The easiest way to get OpenBabel, if you have not done so
already, is to install ChemSpotlight.
First we need some structures. I've provided a file (you will need to unzip it)
containing about 650 substituted acetophenones; each line of the file contains a
SMILES string and an identification number.
- c1(c(cccc1)Br)C(C)=O ID_NUM_00000067
- c1(cc(ccc1)Br)C(C)=O ID_NUM_00000083
- c1(C(C)=O)ccc(Br)cc1 ID_NUM_00000105
- c1(c(c(c(F)c(c1F)F)F)F)C(C)=O ID_NUM_00000296
- c1(c(cccc1)F)C(C)=O ID_NUM_00000320
- c1(c(cccc1F)F)C(C)=O ID_NUM_00000328
- c1(cc(ccc1)F)C(C)=O ID_NUM_00000338
- c1(C(C)=O)ccc(F)cc1 ID_NUM_00000354
....
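If you want to check the file before importing it, the layout above is easy to read programmatically. The following is only a minimal Python sketch, assuming each line holds a SMILES string followed by whitespace (a tab in the downloaded file) and then the identifier; the function name parse_smiles_lines is my own and not part of OpenBabel or FileMaker.

```python
from typing import List, Tuple


def parse_smiles_lines(text: str) -> List[Tuple[str, str]]:
    """Return (smiles, id_num) pairs from the text of a SMILES file."""
    records = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        records.append((parts[0], parts[1]))
    return records


# Two lines taken from the listing above, tab separated.
sample = (
    "c1(c(cccc1)Br)C(C)=O\tID_NUM_00000067\n"
    "c1(cc(ccc1)Br)C(C)=O\tID_NUM_00000083\n"
)
records = parse_smiles_lines(sample)
for smiles, id_num in records:
    print(id_num, smiles)
```

Counting the pairs returned is a quick way to confirm that FileMaker imported every record.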
Now open FileMaker and from the File menu select "New Database", select the
"Create a new empty file" radio button, call the file SMILES_Database.fp7 and
save it. We now need to create two fields in the database, SMILES and ID_NUM.
Now click OK, and from the FileMaker File menu select Import Records>File,
navigate to the SMILES file and import it as shown below.
We now need to set up a related record search within FileMaker. First define
another field as before, called Find_List; this time click on the Options button
and in the "Storage" tab select "Use global storage (one value for all records)".
We can now set up the relationship. In the Define Database window click on
the "Relationships" tab and then click on the edit relationships button
(outlined in red below). From the two dropdown menus select SMILES_Database; in
the first window select "Find_List" and in the second "ID_NUM", then click OK.
You should be prompted to give the relationship a name; call it "SMILES_Link"
and click OK.
We now need to set up the FileMaker part of the search. Click on "Scripts" in
the FileMaker main menu and in the box that appears select "New".
Call the script "Find_Related" and from the list on the left select "Go to
Related Record". If you then double click on the line in the script box you can
modify it to show only related records; select the table "SMILES_Link" and
display using the current layout.
If you now copy and paste a selection of ID_NUM values into the Find_List box,
one per line, you can see how the related records search works. If you then
select "Find_Related" from the Scripts menu, the result should be a "Found Set"
of only those records whose ID_NUM was in the Find_List field.
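To make the matching behaviour concrete, here is a rough Python model of what the relationship does. It is only a sketch of my understanding, assuming FileMaker treats each return-separated line in Find_List as a separate match value for ID_NUM; the function related_records and the dictionary layout are invented for illustration.

```python
from typing import Dict, List


def related_records(find_list: str, records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the records whose ID_NUM appears on any line of find_list."""
    # Treat carriage returns and newlines alike: one key per line.
    keys = {k.strip() for k in find_list.replace("\r", "\n").split("\n") if k.strip()}
    return [rec for rec in records if rec["ID_NUM"] in keys]


records = [
    {"SMILES": "c1(c(cccc1)Br)C(C)=O", "ID_NUM": "ID_NUM_00000067"},
    {"SMILES": "c1(cc(ccc1)Br)C(C)=O", "ID_NUM": "ID_NUM_00000083"},
    {"SMILES": "c1(C(C)=O)ccc(Br)cc1", "ID_NUM": "ID_NUM_00000105"},
]

# Two identifiers pasted into the global field, separated by a return.
found = related_records("ID_NUM_00000067\rID_NUM_00000105", records)
for rec in found:
    print(rec["ID_NUM"], rec["SMILES"])
```

The point of the global field is that the list of keys is the same whichever record you are on, so the found set depends only on what was pasted in.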
We now define another field as before, called SMILES_Query; this time click on
the Options button and in the "Storage" tab select "Use global storage (one value
for all records)". This will hold the text string we use to do the substructure
search. We now need to set up the files OpenBabel will use to do the searching.
Firstly, rename the downloaded file acetophenones.tab to acetophenones.smi;
whilst FileMaker needs a tab delimited file to import, the file is actually a
SMILES file (unfortunately the same extension, .smi, is used for self-mounting
images). You now need to decide where you are going to store all the files, since
we will need explicit paths to the files to do the searching. For now
let's assume you have a folder on your desktop called Chem_Database and into this
you have put both acetophenones.smi and SMILES_Database.fp7. OpenBabel can
search SMILES files directly, but it is MUCH, MUCH faster if you first create a
fast search index. To do this you can either use iBabel, a GUI for OpenBabel, or
issue this command in the "Terminal".
/usr/local/bin/babel /Users/your_user_name/Desktop/Chem_Database/acetophenones.smi \
-ofs -xFP2 /Users/your_user_name/Desktop/Chem_Database/acetophenones.fs
This creates a fast search file using FP2, a fingerprint that indexes
linear fragments of up to 7 atoms. The index can be searched using a SMILES
string; for example, to identify all records containing iodobenzene, type in the
following.
/usr/local/bin/babel /Users/your_user_name/Desktop/Chem_Database/acetophenones.fs \
-osmi -xt -s'Ic1ccccc1'
3 candidates from fingerprint search phase
ID_NUM_00045320
ID_NUM_00060283
ID_NUM_00094998
3 molecules converted
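If you would rather drive these two commands from a script than type them in, something like the following Python sketch could be used. It simply reproduces the command-line options shown above; the function names, the assumption that babel lives in /usr/local/bin, and the assumption that every identifier starts with ID_NUM_ are mine. Nothing is run until you call run_search yourself.

```python
import subprocess
from pathlib import Path
from typing import List

BABEL = "/usr/local/bin/babel"  # assumed install location, as in the text


def index_command(smi_path: str, babel: str = BABEL) -> List[str]:
    """Arguments that build a fast search index next to the .smi file."""
    fs_path = Path(smi_path).with_suffix(".fs")
    return [babel, smi_path, "-ofs", "-xFP2", str(fs_path)]


def search_command(fs_path: str, query: str, babel: str = BABEL) -> List[str]:
    """Arguments that search the index and print only the titles."""
    return [babel, fs_path, "-osmi", "-xt", "-s" + query]


def extract_ids(output: str, prefix: str = "ID_NUM_") -> List[str]:
    """Keep only the identifier lines, dropping the status messages."""
    lines = (line.strip() for line in output.splitlines())
    return [line for line in lines if line.startswith(prefix)]


def run_search(fs_path: str, query: str) -> List[str]:
    """Run the search (requires OpenBabel) and return the matching IDs."""
    done = subprocess.run(search_command(fs_path, query),
                          capture_output=True, text=True, check=True)
    return extract_ids(done.stdout)


# The terminal output shown above, used here without calling babel.
example_output = """3 candidates from fingerprint search phase
ID_NUM_00045320
ID_NUM_00060283
ID_NUM_00094998
3 molecules converted
"""
ids = extract_ids(example_output)
print(ids)
```

Passing the arguments as a list means the SMILES query never goes through a shell, so no quoting is needed at this stage.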
We can use an AppleScript within FileMaker to run this sort of query, put the
results into the Find_List field and then run the "Find_Related" script
we wrote earlier. So, back in the FileMaker database, create a new script called
"Substructure_search". Scroll down to the bottom of the left hand list and select
"Perform AppleScript", double click on the command and enter the following
AppleScript text, making sure you get all the paths correct for your machine.
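-- Read the query string from the global field
set the_smarts to (cell "SMILES_Query" of current record)
-- Build the shell command (the string must stay on a single line)
set the_script to "/usr/local/bin/babel /Users/YOUR_USER_NAME/Desktop/Chem_Database/acetophenones.fs -osmi -xt -s'" & the_smarts & "'"
-- Run the search; the trailing comment is an optional error fallback, left disabled
set the_results to do shell script the_script --& " || echo ERROR" without altering line endings
--display dialog the_results
-- Put the list of matching IDs into the global field
set cell "Find_List" to the_results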
This script takes the contents of the cell "SMILES_Query" (put a valid SMILES
string, e.g. Ic1ccccc1, in the field) and uses it to construct the shell command
"the_script". The do shell script command then calls OpenBabel to do the actual
search and returns the_results (a list of record IDs). This list is then put in
"Find_List", and the related records search can then be run, for example by
adding a Perform Script step for "Find_Related" after the Perform AppleScript
step.
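One thing to watch is that the query is pasted straight into a shell command between single quotes, so a stray quote character in the field would break the command. AppleScript provides "quoted form of" for this purpose; the same idea is sketched below in Python using shlex.quote. This is only an illustration of the quoting problem, not part of the FileMaker script, and the function name shell_search_string and the example path are my own.

```python
import shlex


def shell_search_string(fs_path: str, query: str,
                        babel: str = "/usr/local/bin/babel") -> str:
    """Build a single shell command string with every variable part quoted."""
    return " ".join([
        shlex.quote(babel),
        shlex.quote(fs_path),
        "-osmi",
        "-xt",
        "-s" + shlex.quote(query),  # the quoted query stays attached to -s
    ])


fs_file = "/Users/your_user_name/Desktop/Chem_Database/acetophenones.fs"

# A normal query and a deliberately awkward one containing a single quote.
good = shell_search_string(fs_file, "c1(C(C)=O)ccccc1")
awkward = shell_search_string(fs_file, "Ic1ccccc1' ; echo oops")

print(good)
print(awkward)

# Splitting the string the way a shell would gives back one -s argument.
print(shlex.split(awkward)[-1])
```

However odd the contents of the query field, the shell sees it as a single argument to -s rather than as extra commands.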