This topic was automatically generated from Slack. You can find the original thread here.
Does anyone have thoughts on how to get rid of all the junk data from the Connect APIs for integrations like Gmail? You may have thoughts here. We're trying to use Pipedream for our integrations, but most if not all of the data returned is a base64 mess (millions of tokens) or completely unstructured content that is useless for the AI itself.
For example, when retrieving from the list mail endpoint in Gmail, we get a wide variety of base64-encoded data and HTML representations, all sitting right beside the plaintext.
There is also a lot of metadata and mail routing/sending information that isn't useful to a model.
Typically, a single call can return more than 2M tokens if it is not pre-filtered down to the fields of interest. So I am trying to find a general way to extract the useful information (sender info, mail content, attachments, etc.) from the Gmail endpoint, and the equivalent fields of interest from other Pipedream Connect API endpoints.
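A rough sketch of that kind of pre-filtering, for illustration only. It assumes the response contains message objects shaped like the Gmail API's `users.messages.get` resource in `full` format (a `payload` with `headers`, nested `parts`, and URL-safe base64 `body.data`); whether a given Pipedream Connect action returns exactly this shape is an assumption, and `slim_message`, `walk_parts`, and `decode_body` are hypothetical helper names, not part of any Pipedream or Google library.

```python
import base64


def decode_body(data: str) -> str:
    """Decode URL-safe base64 body data, restoring any stripped padding."""
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


def walk_parts(part: dict):
    """Yield a MIME part and all of its nested parts, depth first."""
    yield part
    for child in part.get("parts", []):
        yield from walk_parts(child)


def slim_message(message: dict) -> dict:
    """Keep only sender info, plain-text content, and attachment metadata."""
    payload = message.get("payload", {})
    # Header names are case-insensitive, so normalize them to lowercase.
    headers = {h["name"].lower(): h["value"] for h in payload.get("headers", [])}

    text_chunks, attachments = [], []
    for part in walk_parts(payload):
        body = part.get("body", {})
        if part.get("filename"):
            # Record attachment metadata only; the raw bytes are never included.
            attachments.append({
                "filename": part["filename"],
                "mimeType": part.get("mimeType"),
                "size": body.get("size"),
            })
        elif part.get("mimeType") == "text/plain" and body.get("data"):
            text_chunks.append(decode_body(body["data"]))
        # text/html parts and everything else are dropped on purpose.

    return {
        "id": message.get("id"),
        "from": headers.get("from"),
        "to": headers.get("to"),
        "subject": headers.get("subject"),
        "date": headers.get("date"),
        "text": "\n".join(text_chunks).strip(),
        "attachments": attachments,
    }
```

Running each message through a filter like this before it reaches the model drops the HTML duplicate, the routing headers, and the encoded attachment bytes, which is where most of the token count tends to come from.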
So yes, preferably, if we could get structured Markdown data with all of the common fields of interest that would be helpful for an LLM, that would be an incredible improvement!
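To make the request concrete, here is a minimal sketch of what such Markdown output could look like. It assumes the message has already been reduced to a flat dict with hypothetical keys (`subject`, `from`, `to`, `date`, `text`, `attachments`); `to_markdown` is an illustrative name, not an existing Pipedream option.

```python
def to_markdown(mail: dict) -> str:
    """Render a reduced mail dict as a compact Markdown block for an LLM."""
    lines = [f"## {mail.get('subject') or '(no subject)'}", ""]

    # Emit only the header fields that are actually present.
    for label, key in (("From", "from"), ("To", "to"), ("Date", "date")):
        if mail.get(key):
            lines.append(f"- **{label}:** {mail[key]}")

    attachments = mail.get("attachments") or []
    if attachments:
        names = ", ".join(a["filename"] for a in attachments)
        lines.append(f"- **Attachments:** {names}")

    # Blank line, then the plain-text body.
    lines += ["", (mail.get("text") or "").strip()]
    return "\n".join(lines).strip() + "\n"
```

For example, `to_markdown({"subject": "Hi", "from": "a@example.com", "text": "Body"})` produces a heading, a single `From` bullet, and the body text.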
When you get a chance, can you take a look at this? I think we should figure out how to optionally return plain text or Markdown for all of the Gmail components (and probably others as well). We already do this for Notion and some of the Gmail triggers.
I was looking into some of the Gmail actions, and most currently have a withTextPayload prop, which converts the payload from base64 to plaintext. Is this option not handling it well enough for you? We could probably add conversion to Markdown; I just want to understand whether you’ve already used this or not.