Issue 2000 - Upgrade EMR to 7.2.0 #3490

Open · patchwork01 wants to merge 22 commits into develop from 2000-upgrade-emr
Conversation

patchwork01 (Collaborator) commented Oct 15, 2024

Make sure you have checked all steps below.

Issue

  • My PR addresses the following issues and references them in the PR title. For example, "Issue 1234 - My Sleeper
    PR"

Tests

  • My PR adds the following tests OR does not need testing for this extremely good reason:
    • Covered by existing tests
    • Ran quick system test suite in AWS

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it, or I have linked to a
    separate issue for that below.

@patchwork01 patchwork01 marked this pull request as draft October 16, 2024 07:37
Base automatically changed from 2001-upgrade-datasketches to develop October 16, 2024 08:27
@patchwork01 patchwork01 force-pushed the 2000-upgrade-emr branch 2 times, most recently from ead7b52 to 1920d09, on October 16, 2024 12:43
@patchwork01 patchwork01 added the pr-base-for-stacking Base for stacked pull requests (a dependency for others) label Oct 16, 2024
@rtjd6554 rtjd6554 removed their assignment Oct 16, 2024
patchwork01 (Collaborator, Author) commented:

Bulk import on non-persistent EMR is currently failing on this branch. It passes on EMR Serverless, but on a non-persistent EMR on EC2 cluster we get this exception:

ERROR TaskResultGetter: Exception while getting task result
java.io.EOFException: null
  at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:331) ~[spark-core_2.12-3.5.1-amzn-0.jar:3.5.1-amzn-0]
  at org.apache.spark.serializer.SerializerHelper$.deserializeFromChunkedBuffer(SerializerHelper.scala:52) ~[spark-core_2.12-3.5.1-amzn-0.jar:3.5.1-amzn-0]
  at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:108) ~[spark-core_2.12-3.5.1-amzn-0.jar:3.5.1-amzn-0]
  at org.apache.spark.scheduler.TaskResultGetter$$anon$3.$anonfun$run$1(TaskResultGetter.scala:84) ~[spark-core_2.12-3.5.1-amzn-0.jar:3.5.1-amzn-0]
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [scala-library-2.12.17.jar:?]
  at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1971) [spark-core_2.12-3.5.1-amzn-0.jar:3.5.1-amzn-0]
  at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:72) [spark-core_2.12-3.5.1-amzn-0.jar:3.5.1-amzn-0]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
  at java.lang.Thread.run(Thread.java:840) [?:?]

Looking at the source code of KryoDeserializationStream, it appears Spark is swallowing the real exception from Kryo without logging it.
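
For reference, the readObject logic in KryoDeserializationStream looks roughly like the abridged sketch below (paraphrased from the Spark 3.5 spark-core sources; the exact code may differ slightly). A Kryo "Buffer underflow" error is translated into a bare EOFException, so the original KryoException and its message are discarded, which would explain the "java.io.EOFException: null" with no cause in the trace above.

  override def readObject[T: ClassTag](): T = {
    try {
      kryo.readClassAndObject(input).asInstanceOf[T]
    } catch {
      // DeserializationStream treats EOFException as its normal stopping condition,
      // so the underlying KryoException is neither wrapped nor logged before being dropped.
      case e: KryoException
        if e.getMessage.toLowerCase(Locale.ROOT).contains("buffer underflow") =>
        throw new EOFException
    }
  }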

Labels: one-review-required, pr-base-for-stacking (Base for stacked pull requests, a dependency for others)
Projects: None yet
Development: Successfully merging this pull request may close these issues: Upgrade to EMR 7
2 participants