Haley Beech (Olsen Lab, MIT)
Why a reluctant experimentalist was convinced that learning BigSMILES and using CRIPT are worthwhile endeavors.
I learned about BigSMILES, the machine-readable language used to describe polymers as stochastic structures [1], on a chilly January day in 2019. One of my starter projects as a first-year graduate student was to translate a table of over 300 polymers into the corresponding BigSMILES strings as an initial beta test of the language (you can find this list in the SI if you want to get really bored really quickly) [2]. This was not what I had envisioned myself doing in grad school. I’m primarily an experimentalist, so the promise of machine learning capabilities made me roll my eyes and itch to get back in the lab where Python code and Jupyter notebooks couldn’t bother me. Nevertheless, I dutifully learned about SMILES and the ways it was adapted to capture messy polymers of all shapes and sizes.
For a step-by-step guide to BigSMILES, I made a mini tutorial, which is linked to this post [3]. There is a blank version you can try to fill out on your own first and an accompanying answer key [4]. It would definitely help to take a cursory glance at the SMILES language rules first if you’re not already familiar with them [5]; plus, knowing SMILES will give you street cred with your chemist friends. Alternatively, you could pick your top 300 favorite polymers and translate them into a handy BigSMILES table. I watched an entire season of The Great British Bake Off while completing that task, which should tell you something about how easy it is to use the language once you learn the basic rules.
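To give a flavor of what these strings look like, here is a minimal Python sketch (it is not taken from the tutorial). The strings follow the BigSMILES convention of wrapping repeat units in curly braces and marking attachment points with bonding descriptors such as `[$]`, `[<]`, and `[>]`. The helper `repeat_units` is a hypothetical illustration that only handles simple linear cases; it is not an official parser and not part of CRIPT.

```python
import re

# A few common polymers written as BigSMILES strings.
# Curly braces enclose a stochastic object; "[]" marks an empty terminal
# bonding descriptor, and "[$]", "[<]", "[>]" mark where repeat units connect.
EXAMPLES = {
    "polyethylene": "{[][$]CC[$][]}",
    "polystyrene": "{[][$]CC(c1ccccc1)[$][]}",
    "poly(ethylene glycol)": "{[][<]CCO[>][]}",
    "ethylene/1-butene copolymer": "{[][$]CC[$],[$]CC(CC)[$][]}",
}

# Assumed patterns for this sketch only (simple descriptors, optional index).
_TERMINAL_START = re.compile(r"^\[[$<>]?\d*\]")
_TERMINAL_END = re.compile(r"\[[$<>]?\d*\]$")
_DESCRIPTOR = re.compile(r"\[[$<>]\d*\]")


def repeat_units(bigsmiles):
    """Return the bare SMILES of each repeat unit in the first stochastic object.

    Hypothetical helper: ignores end groups, nesting, and ladder descriptors.
    """
    start = bigsmiles.index("{")
    end = bigsmiles.index("}", start)
    body = bigsmiles[start + 1:end]
    # Strip the terminal bonding descriptors at either end of the object.
    body = _TERMINAL_START.sub("", body, count=1)
    body = _TERMINAL_END.sub("", body, count=1)
    # Repeat units come before any ";" (end groups) and are comma-separated.
    units = body.split(";")[0].split(",")
    # Remove the bonding descriptors to leave plain SMILES fragments.
    return [_DESCRIPTOR.sub("", unit).strip() for unit in units]


for name, string in EXAMPLES.items():
    print(f"{name}: {string} -> {repeat_units(string)}")
```

Once you recognize the braces and descriptors, the part in the middle is just ordinary SMILES, which is why writing a string by hand quickly becomes routine.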
Not convinced BigSMILES is for you? Neither was I. But then came CRIPT (the Community Resource for Innovation in Polymer Technology), which has just started realizing the potential of the BigSMILES language. Despite the Night of the Living Dead connotations of the acronym, CRIPT is helping data live longer and more useful lives than ever before.
Here are five reasons not to hate CRIPT, from a semi-cynical almost-fifth-year grad student who likes running experiments a whole lot more than writing Python code.
#5) You don’t even have to know BigSMILES to use the database.
There’s a handy Ketcher-based [6] structure drawing tool in the CRIPT app [7] that will let you draw your structure and spit out the BigSMILES string so you don’t have to think about it! I’d still recommend taking a look at the tutorial and learning the basics of the language because it usually takes less time to write the string than to draw the molecule, but if you don’t have time for that, this is a nice shortcut.
#4) The support team is ready to help and learn with you.
The developers are super responsive and eager to work with you to improve the data input workflow. Seriously, they’re not even paying me to say this. They work super hard to make sure everyone, even cranky experimentalists, can easily use the database. Feel free to reach out to them with questions or concerns!
#3) Sharing is intrinsic to the data structure.
While it’s not quite far enough along to be my default sharing mechanism yet, I can easily see how CRIPT will simplify collaborations by allowing the members of a particular group to easily upload and view new data. My ideal vision would be to have a completely private space for my day-to-day notes (no one needs to read about the reaction that failed four times in a row), with the ability to share any given entry with a specific group in one click. This will be super useful when it comes time to publish, since everything will already be grouped into shareable projects.
#2) Python skills are not required.
With the ever-improving user interface and upload capabilities, you can continue using your ancient Excel spreadsheets. With a bit of organization, they can be uploaded directly for safekeeping.
#1) Big things are coming.
I think of CRIPT as an investment. I would love to one day be able to simply download an associated data file with a manuscript instead of scraping data off of a plot to directly compare to mine. CRIPT is set up to do this. I would love to have a way to seamlessly work between my lab notebook and the database. CRIPT is set up to do this. I would love to be able to have my data automatically plotted as soon as it’s entered. CRIPT is set up to do this, too.
CRIPT will obviously be incredibly useful for big data projects and machine learning applications if it gains widespread acceptance in the polymer community. However, this hinges on the scientists and engineers who actually generate the data being willing to upload to and use the database. I spent a long time fighting the tide because it is rather annoying to spend hours learning a new language, rearranging spreadsheets, and struggling with Python to upload data that will primarily benefit scientists with far more computational expertise than I have. However, I hope this short post has started to convince you that it’s really not too hard to get over the initial learning curve, and that there are benefits for experimentalists who hop on the CRIPT train, too. Even if the benefits aren’t immediately evident, there are worse ways to spend time than watching the next season of Bake Off and uploading the data for your upcoming manuscript to CRIPT.
1. Lin, Tzyy-Shyang, et al. “BigSMILES: a structurally-based line notation for describing macromolecules.” ACS Central Science 5.9 (2019): 1523-1531. DOI: 10.1021/acscentsci.9b00476.
2. Supporting Information for Lin et al. (2019).
3. Beech, H. and Olsen, B. “BigSMILES tutorial.” Slideshare.
4. Beech, H. and Olsen, B. “BigSMILES workbook.” Slideshare.
5. https://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
6. https://lifescience.opensource.epam.com/ketcher/
7. https://criptapp.org/