[Impeller] ComputeTest.CanCreateComputePass test leaks memory

This issue has been tracked since 2023-03-17.

@dnfield

./out/host_profile/impeller_unittests --gtest_filter=Compute/ComputeTest.CanCreateComputePass* --gtest_repeat=1000

Impeller command buffer could not be committed!
Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
Domain: MTLCommandBufferErrorDomain Code: out of memory
ComputePass: completed
<<<<<<<
[ERROR:flutter/fml/backtrace.cc(108)] Caught signal SIGABRT during program execution.
Frame 0: 0x10044753b fml::KillProcess()
Frame 1: 0x10044750c fml::LogMessage::~LogMessage()
Frame 2: 0x100446363 impeller::ValidationLog::~ValidationLog()
Frame 3: 0x100ccba50 ___ZN8impeller16CommandBufferMTL16OnSubmitCommandsENSt21_LIBCPP_ABI_NAMESPACE8functionIFvNS_13CommandBuffer6StatusEEEE_block_invoke
Frame 4: 0x7ff826cd3690 MTLDispatchListApply
Frame 5: 0x7ff826cd3bb6 -[_MTLCommandBuffer didCompleteWithStartTime:endTime:error:]
Frame 6: 0x7ff839cdf1b0 -[IOGPUMetalCommandBuffer didCompleteWithStartTime:endTime:error:]
Frame 7: 0x7ff826cd3830 -[_MTLCommandQueue commandBufferDidComplete:startTime:completionTime:error:]
Frame 8: 0x7ff839ce0422 __62-[IOGPUMetalCommandBuffer fillCommandBufferArgs:commandQueue:]_block_invoke.82
Frame 9: 0x7ff839ce845c IOGPUNotificationQueueDispatchAvailableCompletionNotifications
Frame 10: 0x7ff839ce8546 __IOGPUNotificationQueueSetDispatchQueue_block_invoke
...

dnfield wrote this answer on 2023-03-17

I think a teardown is probably missing. I can't reproduce the out-of-memory error, but in a debug unopt build I consistently hit the following failure after 240 iterations:

Repeating all tests (iteration 240) . . .

Note: Google Test filter = Compute/ComputeTest.CanCreateComputePass*
[==========] Running 2 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 2 tests from Compute/ComputeTest
[ RUN      ] Compute/ComputeTest.CanCreateComputePass/Metal
[FATAL:flutter/impeller/base/validation.cc(31)] Could not setup the command queue.
[ERROR:flutter/fml/backtrace.cc(108)] Caught signal SIGABRT during program execution.
Frame 0: 0x19e81a2c8 abort
Frame 1: 0x100cd8864 fml::LogMessage::~LogMessage()
Frame 2: 0x100cd8808 fml::LogMessage::~LogMessage()
Frame 3: 0x100cd8890 fml::LogMessage::~LogMessage()
Frame 4: 0x100caeca8 impeller::ValidationLog::~ValidationLog()
Frame 5: 0x100caedec impeller::ValidationLog::~ValidationLog()
Frame 6: 0x10320806c impeller::ContextMTL::ContextMTL()
Frame 7: 0x1032093a4 impeller::ContextMTL::ContextMTL()
Frame 8: 0x103209ea4 impeller::ContextMTL::Create()
Frame 9: 0x100aae728 impeller::PlaygroundImplMTL::PlaygroundImplMTL()
Frame 10: 0x100aaf5d8 impeller::PlaygroundImplMTL::PlaygroundImplMTL()
Frame 11: 0x100aac3cc std::_LIBCPP_ABI_NAMESPACE::make_unique[abi:v15000]<>()
Frame 12: 0x100aac1f8 impeller::PlaygroundImpl::Create()
Frame 13: 0x100a92058 impeller::Playground::SetupContext()
Frame 14: 0x103023d4c impeller::ComputePlaygroundTest::SetUp()
Frame 15: 0x103a88d38 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
Frame 16: 0x103a5bd00 testing::internal::HandleExceptionsInMethodIfSupported<>()
Frame 17: 0x103a5bbc8 testing::Test::Run()
Frame 18: 0x103a5ca30 testing::TestInfo::Run()
Frame 19: 0x103a5e078 testing::TestSuite::Run()
Frame 20: 0x103a6aba4 testing::internal::UnitTestImpl::RunAllTests()
Frame 21: 0x103a903c0 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
Frame 22: 0x103a6a6d4 testing::internal::HandleExceptionsInMethodIfSupported<>()
Frame 23: 0x103a6a540 testing::UnitTest::Run()
Frame 24: 0x101314bb8 RUN_ALL_TESTS()
Frame 25: 0x101314aac main
Frame 26: 0x19e5b7e50 start

[1]    71006 abort      ./out/host_debug_unopt_arm64/impeller_unittests  --gtest_repeat=1000

dnfield wrote this answer on 2023-03-18

I'm now getting the OOM message consistently after a few more tries.

I'm suspicious of https://developer.apple.com/forums/thread/120931. I ran the test through Instruments and see a whole bunch of debug classes around the MTLCommandBuffer/Encoder objects.

dnfield wrote this answer on 2023-03-18

If I run the test case with https://github.com/flutter/engine/blob/625ea5395042ad64df28ef188376100e3c243ade/impeller/renderer/backend/metal/command_buffer_mtl.mm#L114-L120 commented out, it passes and doesn't seem to leak memory. Allocation profiling suggests that Apple is adding a bunch of debug instrumentation, and that instrumentation may be what is leaking.

This test case is slightly unrealistic, because we would never actually create so many command queues in a real application. But it is concerning that this is causing such sharp increases in memory usage and loss of GPU access.
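
One way to check the earlier missing-teardown hypothesis without involving Metal is to count live contexts across repeated fixture cycles, much as `--gtest_repeat` does with the real test. The following is a minimal C++ sketch, not Impeller code: `FakeContext`, `FakeFixture`, `Registry`, and `LeakedAfter` are hypothetical names invented for illustration, and the "leak" is simulated by a global registry that keeps a strong reference to each context.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU context that owns a command queue.
// This is NOT impeller::ContextMTL; it only counts live instances.
class FakeContext {
 public:
  FakeContext() { ++Counter(); }
  ~FakeContext() { --Counter(); }
  FakeContext(const FakeContext&) = delete;
  FakeContext& operator=(const FakeContext&) = delete;

  static std::size_t LiveCount() { return Counter(); }

 private:
  static std::size_t& Counter() {
    static std::size_t count = 0;
    return count;
  }
};

// Hypothetical global that retains contexts, simulating something that
// still holds a strong reference after the test has finished.
std::vector<std::unique_ptr<FakeContext>>& Registry() {
  static std::vector<std::unique_ptr<FakeContext>> retained;
  return retained;
}

// Hypothetical fixture with the same SetUp/TearDown shape as a test fixture.
class FakeFixture {
 public:
  explicit FakeFixture(bool release_in_teardown)
      : release_in_teardown_(release_in_teardown) {}

  void SetUp() { context_ = std::make_unique<FakeContext>(); }

  void TearDown() {
    if (release_in_teardown_) {
      context_.reset();  // Balanced: the context is destroyed here.
    } else {
      // Simulated missing teardown: the context stays alive in the registry.
      Registry().push_back(std::move(context_));
    }
  }

 private:
  bool release_in_teardown_;
  std::unique_ptr<FakeContext> context_;
};

// Runs `iterations` SetUp/TearDown cycles and returns how many contexts
// created during those cycles are still alive afterwards.
std::size_t LeakedAfter(std::size_t iterations, bool release_in_teardown) {
  const std::size_t before = FakeContext::LiveCount();
  for (std::size_t i = 0; i < iterations; ++i) {
    FakeFixture fixture(release_in_teardown);
    fixture.SetUp();
    fixture.TearDown();
  }
  assert(FakeContext::LiveCount() >= before);
  return FakeContext::LiveCount() - before;
}
```

With a balanced teardown, `LeakedAfter(240, true)` returns 0. With the simulated missing teardown, `LeakedAfter(240, false)` returns 240, one retained context per iteration. A counter like this only shows whether objects outlive the fixture. It says nothing about memory that the driver allocates internally, which is what the allocation profiling above points to.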

dnfield wrote this answer on 2023-03-18

@chinmaygarde for input on whether the error information is worth this potential leak (or maybe he can find some other reason this results in a leak)

dnfield wrote this answer on 2023-03-18

I decided to look at that block in particular because allocation profiling was showing a lot of MTLCommandBufferDescriptor objects. I still see plenty of them after this change, but they seem to be tied to less memory.
