One of the integration pipelines I maintain for a client leverages AWS Athena as a middle tier to facilitate the transfer of data from NetSuite to Tableau. Recently this stopped working and I had to figure out why.
Here is a super simplified version of the data flow. In reality this flow is much more complicated!
- A NetSuite saved search is used as a data source
- Celigo is used as the iPaaS to run a daily flow to grab the data from this saved search and dump it into S3
- This data is made available to Athena, which creates a custom database view
- Tableau then connects the Athena and uses the view as its data source
Tableau started raising the following error every time someone tried to view the workbook
An error has been thrown from the AWS Athena client. [ErrorCode: INTERNAL_ERROR_QUERY_ENGINE] Amazon Athena experienced an internal error while executing this query. Please contact AWS support for further assistance. You will not be charge for this query. We apologize for the inconvenience.At least they are sorry?
Not super helpful. Also this client doesn’t pay for an AWS support plan, so they could not even reach out to AWS support as directed! So that error message is extra unhelpful.
After a little triage I was able to discover this error in Athena (shown with some identifying data omitted)
GENERIC_INTERNAL_ERROR: java.lang.ClassFormatError: Illegal field name “lambda_fake name/for field_123″ in class com/Facebook/presto/$gen/CursorProcessor_XXXXXXX”
That at least pointed me in the direction of the field name. Originally I thought someone changed the name of the field in the saved search in NetSuite, but after some digging that did not appear to be the case. So what happened?
The NetSuite saved search had a forward slash “/” in one field name (which one can glean from the error message above) and that was now the root cause of the error in Athena (aka the “fake name/for field” field). Athena used that field as part of a “CREATE VIEW” statement, and the view was failing to compile due to the “/” character in a source field name.
This had all been working for almost a year though, so what changed?
AWS automatically upgraded my client to the Athena V2 engine and I believe that is what triggered this error. I did not see any reference to this in the upgrade notes/docs, but removing that “/” fixed the issue. Normally I would never put a “/” in a field name when designing a software system, but when this integration was built it worked fine we all just glossed over it : )
The final resolution involved changing the field name in NetSuite to no longer have the “/”, re-running the integration, re-running the AWS crawler to update the table definition, and then changing the “CREATE VIEW” SQL inside of Athena to reference the new field name, sans “/” (e.g. it was now just “fake name for field”). Oddly enough the view itself still has a field name with the “/” in it, but that part works just fine. Leaving it there let us only modify this part of the flow, otherwise we would need to update logic in Tableau Prep and also in a lot of the data visualizations in Tableau itself.
BONUS: The full integration pipeline
I will probably due another post on this, but for those curious, here is the full end to end integration flow:
A few points:
- Using an iPaaS tool like Celigo eliminates the need to build/maintain the API connection to NetSuite (currently an XML/SOAP connection). Most iPaaS are “no code” solutions and lack things like version control, easy testing, mocking, observability, etc.. but they save an incredible amount of time as they instantly connect to hundreds of cloud applications.
- AWS Glue serves its purpose. The crawlers can be really finicky, but with a simple dataset it works fine and a good job accommodating larger datasets. In this case the NetSuite export is about 60,000 rows with about 40 columns and grows daily.
- Athena works well for querying CSV or JSON datasets! I use it quite often in bespoke integrations
- Tableau Prep into Tableau Online piece was supposed to be automated, however Tableau wanted to charge this client an absurd amount of money to allow them to automate Prep using their product called Conductor. My client just has someone manually run the Prep piece once a week and it takes them about 5 mins. It breaks my rule of 5, but it is easy to teach once someone has Prep installed.