Issue 2000 - Upgrade EMR to 7.2.0 #3490

Open · patchwork01 wants to merge 22 commits into develop from 2000-upgrade-emr
Conversation

patchwork01 (Collaborator) commented Oct 15, 2024

Make sure you have checked all steps below.

Issue

  • My PR addresses the following issues and references them in the PR title. For example, "Issue 1234 - My Sleeper
    PR"

Tests

  • My PR adds the following tests OR does not need testing for this extremely good reason:
    • Covered by existing tests
    • Ran quick system test suite in AWS

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it, or I have linked to a
    separate issue for that below.

@patchwork01 patchwork01 marked this pull request as draft October 16, 2024 07:37
Base automatically changed from 2001-upgrade-datasketches to develop October 16, 2024 08:27
@patchwork01 patchwork01 force-pushed the 2000-upgrade-emr branch 2 times, most recently from ead7b52 to 1920d09, on October 16, 2024 12:43
@patchwork01 patchwork01 added the pr-base-for-stacking Base for stacked pull requests (a dependency for others) label Oct 16, 2024
@rtjd6554 rtjd6554 removed their assignment Oct 16, 2024
patchwork01 (Collaborator, Author) commented:

Bulk import on non-persistent EMR is currently failing on this branch. It passes on EMR Serverless, but on a non-persistent EMR on EC2 cluster we get this exception:

ERROR TaskResultGetter: Exception while getting task result
java.io.EOFException: null
  at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:331) ~[spark-core_2.12-3.5.1-amzn-0.jar:3.5.1-amzn-0]
  at org.apache.spark.serializer.SerializerHelper$.deserializeFromChunkedBuffer(SerializerHelper.scala:52) ~[spark-core_2.12-3.5.1-amzn-0.jar:3.5.1-amzn-0]
  at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:108) ~[spark-core_2.12-3.5.1-amzn-0.jar:3.5.1-amzn-0]
  at org.apache.spark.scheduler.TaskResultGetter$$anon$3.$anonfun$run$1(TaskResultGetter.scala:84) ~[spark-core_2.12-3.5.1-amzn-0.jar:3.5.1-amzn-0]
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [scala-library-2.12.17.jar:?]
  at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1971) [spark-core_2.12-3.5.1-amzn-0.jar:3.5.1-amzn-0]
  at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:72) [spark-core_2.12-3.5.1-amzn-0.jar:3.5.1-amzn-0]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
  at java.lang.Thread.run(Thread.java:840) [?:?]

Looking at the source code of KryoDeserializationStream, it appears Spark is swallowing the real exception from Kryo without logging it.
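
For reference, the readObject logic in KryoDeserializationStream looks roughly like the abridged sketch below (paraphrased from the Spark 3.5 spark-core sources; the exact code may differ slightly). A Kryo "Buffer underflow" error is translated into a bare EOFException, so the original KryoException and its message are discarded, which would explain the "java.io.EOFException: null" with no cause in the trace above.

  override def readObject[T: ClassTag](): T = {
    try {
      kryo.readClassAndObject(input).asInstanceOf[T]
    } catch {
      // DeserializationStream treats EOFException as its normal stopping condition,
      // so the underlying KryoException is neither wrapped nor logged before being dropped.
      case e: KryoException
        if e.getMessage.toLowerCase(Locale.ROOT).contains("buffer underflow") =>
        throw new EOFException
    }
  }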

Labels: one-review-required, pr-base-for-stacking (Base for stacked pull requests, a dependency for others)
Projects: None yet
Development: Successfully merging this pull request may close these issues: Upgrade to EMR 7
2 participants