How to Eliminate Junk Data from Connect APIs like Gmail for Integrations Using Pipedream?

This topic was automatically generated from Slack. You can find the original thread here.

Does anyone have any thoughts on how to get rid of all the junk data from the Connect APIs for integrations like Gmail? You may have thoughts here. We're trying to use Pipedream for our integrations, but most if not all of the data returned is a base64 mess (millions of tokens) or completely unstructured content that is useless to the AI itself.

Can you please be more precise about what you want, other than “get rid of all the junk data from the Connect APIs”?

For example, are you looking for markdown or plaintext responses from some of the Gmail actions or triggers?

Hey, sure thing.

For example, when retrieving from the list mail endpoint in Gmail, we get a wide variety of base64-encoded data and HTML representations, all sitting right beside the plaintext.

There is also a bunch of metadata and mail routing/sending information that isn't useful for a model.

Typically a single call can result in >2M tokens if it is not pre-filtered down to the fields of interest. So I am trying to find a general way to extract the useful information (sender info, mail content, attachments, etc.) from the Gmail endpoint, and the equivalent fields of interest from other Pipedream Connect API endpoints.
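As a rough illustration of that kind of pre-filtering, here is a minimal Python sketch. It assumes the input is a Gmail API message resource in the `full` format (a `payload` with `headers`, nested `parts`, and base64url-encoded `body.data`). The function names and the set of headers to keep are made up for this example and are not part of Pipedream.

```python
import base64

# Assumption: these are the only headers worth passing to a model.
KEEP_HEADERS = {"from", "to", "subject", "date"}


def decode_body(data: str) -> str:
    """Decode base64url body data, restoring any stripped padding."""
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


def walk_parts(part: dict):
    """Yield a MIME part and all of its nested parts, depth first."""
    yield part
    for child in part.get("parts", []):
        yield from walk_parts(child)


def slim_message(message: dict) -> dict:
    """Keep a few headers, the first text/plain body, and attachment names."""
    payload = message.get("payload", {})
    headers = {
        h["name"].lower(): h["value"]
        for h in payload.get("headers", [])
        if h["name"].lower() in KEEP_HEADERS
    }
    text = ""
    attachments = []
    for part in walk_parts(payload):
        body = part.get("body", {})
        if part.get("filename"):
            # Record the name only; the attachment bytes are left out.
            attachments.append(part["filename"])
        elif part.get("mimeType") == "text/plain" and body.get("data") and not text:
            text = decode_body(body["data"])
    return {"id": message.get("id"), **headers, "text": text, "attachments": attachments}
```

Running each message through something like `slim_message` before it reaches the model drops the HTML part, the routing headers, and the attachment bytes, which is where most of the tokens tend to go.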

So yes, preferably, if we could get structured markdown data with all of the common fields of interest that would be helpful for an LLM, that would be an incredible improvement!
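To make "structured markdown" concrete, here is one possible shape, sketched in Python. The input dict (keys `subject`, `from`, `to`, `date`, `text`, `attachments`) is a hypothetical format chosen for this example, not something Pipedream returns today.

```python
def to_markdown(email: dict) -> str:
    """Render a small dict of email fields as a compact markdown block."""
    lines = [f"## {email.get('subject') or '(no subject)'}", ""]
    for label, key in (("From", "from"), ("To", "to"), ("Date", "date")):
        if email.get(key):
            lines.append(f"- **{label}:** {email[key]}")
    attachments = email.get("attachments") or []
    if attachments:
        lines.append(f"- **Attachments:** {', '.join(attachments)}")
    body = (email.get("text") or "").strip()
    if body:
        # Blank line separates the metadata list from the message body.
        lines += ["", body]
    return "\n".join(lines)
```

Missing fields are simply skipped, so the output stays short when a message has no attachments or no recipient header.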

When you get a chance, can you take a look at this? I think we should figure out how to optionally return plain text or markdown for all of the Gmail components (and probably others as well). We already do this for Notion and some of the Gmail triggers.

I was looking into some of the Gmail actions, and most currently have a withTextPayload prop, which converts the payload from base64 to plaintext. Is this option not handling it well enough for you? We could probably add conversion to markdown; I just want to understand whether you’ve already used this or not.
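One gap a base64-to-plaintext step can leave: if a message only carries an HTML part, the decoded text is still full of markup. Below is a rough sketch of stripping it with Python's standard library. This is not how withTextPayload is implemented; the class, the function name, and the tag lists are assumptions for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    SKIP = {"script", "style", "head"}  # content inside these is dropped
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3"}  # start a new line

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, one block per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)
```

This only recovers text, not structure. A real markdown conversion would also need to map links, lists, and headings.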

I think we have that in the triggers only, right? Do we have it in the actions? We should probably expand the usage of it to any that don’t have it.

You’re right, it’s mostly in the triggers, but we also have it in find-email, which is the only action that actually returns an email.