Written by Sam Hessenauer, CTO & Co-founder of Nanome
Since announcing MARA in December 2023, we’ve made incredible progress developing it.
Overcoming Initial Challenges in MARA’s Development: Addressing LLM Limitations and Enhancing Functional Integration
In Q1 2024, a few pieces of the puzzle still had to be solved to bridge the gap between our vision of MARA and the prototypes we’d built so far. LLMs carry intrinsic biases from their pre-training data, have limited context windows (the maximum input length, measured in tokens), are trained only on a snapshot of data up to a point in time, and hallucinate in ways that blur the line between fact and fiction. LLMs also cannot use tools on their own: you still needed software engineering to convert their outputs into a rigid structure (known as output parsing) and to trigger the appropriate functionality.
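To make output parsing concrete, here is a minimal, generic Python sketch; it is not MARA’s implementation. It assumes the LLM was prompted to reply with a JSON object of the form `{"tool": ..., "arguments": {...}}`. The tool names `molarity` and `dilution_volume` and the sample reply are hypothetical stand-ins chosen only because they are simple, deterministic functions.

```python
import json
import re
from typing import Any, Callable, Dict

# Hypothetical, deterministic tools (simple lab arithmetic as stand-ins).
def molarity(moles: float, volume_l: float) -> float:
    """Concentration in mol/L."""
    return moles / volume_l

def dilution_volume(stock_m: float, target_m: float, final_volume_l: float) -> float:
    """Volume of stock (L) needed for a dilution, from C1*V1 = C2*V2."""
    return target_m * final_volume_l / stock_m

TOOLS: Dict[str, Callable[..., float]] = {
    "molarity": molarity,
    "dilution_volume": dilution_volume,
}

def parse_tool_call(llm_output: str) -> Dict[str, Any]:
    """Pull a JSON tool call out of free-form LLM text and validate its structure."""
    # Greedy match from the first '{' to the last '}', so nested objects survive.
    match = re.search(r"\{.*\}", llm_output, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in LLM output")
    call = json.loads(match.group(0))  # raises json.JSONDecodeError (a ValueError) if malformed
    if call.get("tool") not in TOOLS:
        raise ValueError(f"unknown tool: {call.get('tool')!r}")
    if not isinstance(call.get("arguments"), dict):
        raise ValueError("'arguments' must be a JSON object")
    return call

def dispatch(llm_output: str) -> float:
    """Trigger the function the LLM asked for, with the arguments it supplied."""
    call = parse_tool_call(llm_output)
    return TOOLS[call["tool"]](**call["arguments"])

if __name__ == "__main__":
    reply = 'Sure, I will use a tool: {"tool": "molarity", "arguments": {"moles": 0.5, "volume_l": 2.0}}'
    print(dispatch(reply))  # 0.25
```

Frameworks such as LangChain package this parse-and-dispatch loop for you, but the underlying idea is the same: the LLM proposes a call, and deterministic code validates and executes it.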
Enhancing LLM Efficacy with RAG and Agents: Pioneering Solutions for Bias Reduction and Tool Utilization
LangChain and others began addressing these issues head-on, most notably through Retrieval-Augmented Generation (RAG) and Agents. RAG lets you chunk and vectorize content, then inject the most relevant chunks or snippets into the prompt. This reduces reliance on the LLM’s pre-trained biases and its tendency to hallucinate, so the model acts as a flexible summarizer of authentic, correct data that you have selected. Agents equip the LLMs themselves with deterministic tools: we can write code functions that contain logic, describe how each one should be used, and let the LLM reason about which tool to use.
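The RAG flow described above (chunk, vectorize, retrieve, inject) can be sketched in plain Python. This is a toy illustration under stated assumptions: it uses bag-of-words counts instead of a learned embedding model, fixed-size word chunks, and a made-up prompt template, and the sample documents are invented for the example. A production system, LangChain-based or otherwise, would typically use an embedding model and a vector store instead.

```python
import math
import re
from collections import Counter
from typing import List

def chunk(text: str, size: int = 40) -> List[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def vectorize(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question."""
    q = vectorize(question)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)[:k]

def build_prompt(question: str, context: List[str]) -> str:
    """Inject retrieved snippets so the LLM answers from supplied data, not memory."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{joined}\n\nQuestion: {question}"
    )

if __name__ == "__main__":
    # Invented sample records for illustration only.
    docs = [
        "Compound A-17 was active against kinase X in the primary screen.",
        "The assay buffer was prepared at pH 7.4 before each run.",
        "Compound B-03 was inactive in the kinase X assay.",
    ]
    chunks = [c for d in docs for c in chunk(d)]
    question = "What pH was the assay buffer?"
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

The prompt that reaches the LLM contains only the retrieved snippet plus an instruction to stay within it, which is what lets the model act as a summarizer of your data rather than of its pre-training.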
Scalable Solutions and Reliable Execution: Advanced Problem-Solving from Prototype to Platform
We ran into (and solved) many challenges last year: scaling from a handful of cheminformatics tools to hundreds, reliable planning and execution, workflow creation, hallucination mitigation, and much more. We even evaluate what’s being done on the fly and flag anything that doesn’t look right. That, together with clear communication of which tools are being used and what their inputs and outputs are, gives our scientist users the confidence they need to evaluate the legitimacy of our new platform’s outputs. This is an area where other software falls short.
MARA brings a fully internal, ChatGPT-like interface to your scientific databases and informatics tools. It dramatically lowers the barrier for scientists to rapidly answer questions, expand their hypotheses, and trigger advanced informatics workflows like never before.
The Core of MARA: Tools, Knowledge, and Workflow Integration: Building a Versatile and Dynamic Scientific Environment
MARA is driven by three core concepts: Tools, the Knowledge Base, and Workflows. Because MARA is limited only by the tools at its disposal, you can bring in your own internal tools via:
- REST API: Create a tool that can hit any API endpoint and return any type, from values to images to files.
- SQL Query Templates: Create a tool from a preset SQL query and its inputs, or describe the query in natural language. This is great for repeatable SQL queries where only the input parameters vary.
- Python Snippets: Quickly type out your latest idea in Python using our embedded editor and turn it into a tool that runs in a sandboxed container (lambda-style).
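As a rough illustration of the SQL Query Templates idea, the sketch below wraps a preset, parameterized query as a callable tool using Python’s built-in `sqlite3` module. The `SqlTemplateTool` class, the `assays` table, its columns, and the rows are all hypothetical placeholders, and this is not MARA’s tool format. The key design point is that inputs are bound as query parameters rather than formatted into the SQL string, so a repeatable query stays safe no matter what values an LLM or user supplies.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

@dataclass
class SqlTemplateTool:
    """Hypothetical wrapper: a named, described SQL query whose only variation is its inputs."""
    name: str
    description: str        # what an LLM agent would read when choosing a tool
    query: str              # uses named placeholders such as :target
    params: Tuple[str, ...]

    def run(self, conn: sqlite3.Connection, **kwargs: Any) -> List[Dict[str, Any]]:
        missing = set(self.params) - set(kwargs)
        if missing:
            raise ValueError(f"missing parameters: {sorted(missing)}")
        # Bind values as parameters; never string-format inputs into the SQL.
        cur = conn.execute(self.query, {p: kwargs[p] for p in self.params})
        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, row)) for row in cur.fetchall()]

def demo_connection() -> sqlite3.Connection:
    """In-memory database with placeholder rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE assays (compound TEXT, target TEXT, ic50_nm REAL)")
    conn.executemany(
        "INSERT INTO assays VALUES (?, ?, ?)",
        [("CMPD-1", "KinaseA", 12.0), ("CMPD-2", "KinaseA", 450.0), ("CMPD-3", "KinaseB", 30.0)],
    )
    return conn

potent_compounds = SqlTemplateTool(
    name="potent_compounds_for_target",
    description="List compounds with IC50 below a threshold (nM) for a given target.",
    query=(
        "SELECT compound, ic50_nm FROM assays "
        "WHERE target = :target AND ic50_nm < :max_ic50 ORDER BY ic50_nm"
    ),
    params=("target", "max_ic50"),
)

if __name__ == "__main__":
    conn = demo_connection()
    print(potent_compounds.run(conn, target="KinaseA", max_ic50=100))
```

Only `target` and `max_ic50` change between calls; the query text itself is fixed, which is what makes this kind of tool repeatable and easy for an agent to invoke correctly.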
For every conversation, MARA also automatically curates a dataset that can be used immediately for data analysis. Users can chat with their data or modify the dataset in natural language, making it easy for more users to focus on the science rather than the menial parts of data science.
While MARA is primarily a web-based platform, it can also be accessed via API, making it easy to trigger tools and workflows from your company’s internal tools or directly from within an interactive Jupyter notebook. In the web interface, we’ve included a few quality-of-life features for scientists:
- Directly upload molecular structure files such as PDB, mmCIF, SDF, and more
- Draw a small molecule using a 2D chemical drawing tool
- View molecular structure files such as mmCIF, PDB, and SDF in an embedded molecular viewer
Ensuring Security and Control with MARA: An Enterprise-Ready Platform for Scientific Inquiry
MARA is designed as an enterprise-ready platform, deployed entirely behind your organization’s IT firewall and powered by the latest open-source large language models. That means you can rest easy knowing that every question and conversation your scientists have with MARA stays fully under your organization’s control.
This Scientific Co-pilot is part of an entirely new class of tools called LLM-enabled applications, and we expect to see many of the world’s industries get rapidly transformed by these tools over the next decade.
Validating MARA’s Impact: Early Pilots and Positive Feedback: Showcasing Future Possibilities and Real-World Applications
We’ve been running an early pilot with several customers, and the feedback has been very positive. Now that we’ve made significant progress, we’d like to start sharing what’s possible with the world. Keep an eye on our socials as we showcase impressive use cases that are now simple to achieve with the MARA platform.
If you or your organization are interested in using MARA, even just for a taste of the future, please reach out to us. We’d love to hear your feedback and make rapid improvements that will accelerate our journey toward our ultimate goal: equipping every scientist with their own Jarvis for Molecules.