{"affected":[{"ecosystem_specific":{"binaries":[{"libnss_slurm2_23_02":"23.02.5-150100.3.11.2","libpmi0_23_02":"23.02.5-150100.3.11.2","libslurm39":"23.02.5-150100.3.11.2","perl-slurm_23_02":"23.02.5-150100.3.11.2","slurm_23_02":"23.02.5-150100.3.11.2","slurm_23_02-auth-none":"23.02.5-150100.3.11.2","slurm_23_02-config":"23.02.5-150100.3.11.2","slurm_23_02-config-man":"23.02.5-150100.3.11.2","slurm_23_02-cray":"23.02.5-150100.3.11.2","slurm_23_02-devel":"23.02.5-150100.3.11.2","slurm_23_02-doc":"23.02.5-150100.3.11.2","slurm_23_02-lua":"23.02.5-150100.3.11.2","slurm_23_02-munge":"23.02.5-150100.3.11.2","slurm_23_02-node":"23.02.5-150100.3.11.2","slurm_23_02-pam_slurm":"23.02.5-150100.3.11.2","slurm_23_02-plugin-ext-sensors-rrd":"23.02.5-150100.3.11.2","slurm_23_02-plugins":"23.02.5-150100.3.11.2","slurm_23_02-rest":"23.02.5-150100.3.11.2","slurm_23_02-slurmdbd":"23.02.5-150100.3.11.2","slurm_23_02-sql":"23.02.5-150100.3.11.2","slurm_23_02-sview":"23.02.5-150100.3.11.2","slurm_23_02-torque":"23.02.5-150100.3.11.2","slurm_23_02-webdoc":"23.02.5-150100.3.11.2"}]},"package":{"ecosystem":"SUSE:Linux Enterprise High Performance Computing 15 SP1-LTSS","name":"slurm_23_02","purl":"pkg:rpm/suse/slurm_23_02?distro=SUSE%20Linux%20Enterprise%20High%20Performance%20Computing%2015%20SP1-LTSS"},"ranges":[{"events":[{"introduced":"0"},{"fixed":"23.02.5-150100.3.11.2"}],"type":"ECOSYSTEM"}]}],"aliases":[],"details":"This update for slurm_23_02 fixes the following issues:\n\n- Updated to version 23.02.5 with the following changes:\n\n  * Bug Fixes:\n\n    + Revert a change in 23.02 where `SLURM_NTASKS` was no longer set in the\n      job's environment when `--ntasks-per-node` was requested. The method by\n      which it is now set, however, is different and should be more accurate\n      in more situations.\n    + Change the pmi2 plugin to honor the `SrunPortRange` option. This matches\n      the new behavior of the pmix plugin in 23.02.0. 
Note that neither of these\n      plugins makes use of the `MpiParams=ports=` option; previously both\n      were limited only by the system's ephemeral port range.\n    + Fix regression in 23.02.2 that caused `slurmctld -R` to crash on startup\n      if a node features plugin is configured.\n    + Fix and prevent recurring reservations from overlapping.\n    + `job_container/tmpfs` - Avoid attempts to share BasePath between nodes.\n    + With `CR_Cpu_Memory`, fix node selection for jobs that request gres and\n      `--mem-per-cpu`.\n    + Fix a regression from 22.05.7 in which some jobs were allocated too few\n      nodes, thus overcommitting cpus to some tasks.\n    + Fix a job being stuck in the completing state if the job ends while the\n      primary controller is down or unresponsive and the backup controller has\n      not yet taken over.\n    + Fix `slurmctld` segfault when a node registers with a configured\n      `CpuSpecList` while the `slurmctld` configuration has the node without\n      `CpuSpecList`.\n    + Fix cloud nodes getting stuck in the `POWERED_DOWN+NO_RESPOND` state\n      after not registering by `ResumeTimeout`.\n    + `slurmstepd` - Ensure cleanup of the spooldir for containers without a\n      `config.json` is not skipped.\n    + Fix `scontrol` segfault when the 'completing' command is requested\n      repeatedly in interactive mode.\n    + Properly handle a race condition between `bind()` and `listen()` calls\n      in the network stack when running with `SrunPortRange` set.\n    + Federation - Fix revoked jobs being returned regardless of the\n      `-a`/`--all` option for privileged users.\n    + Federation - Fix canceling pending federated jobs from non-origin\n      clusters, which could leave federated jobs orphaned from the origin\n      cluster.\n    + Fix sinfo segfault when printing multiple clusters with the `--noheader`\n      option.\n    + Federation - Fix clusters not syncing if clusters are added to a\n      federation before they have registered with the 
dbd.\n    + `node_features/helpers` - Fix node selection for jobs requesting\n      changeable features with the `|` operator, which could prevent jobs\n      from running on some valid nodes.\n    + `node_features/helpers` - Fix inconsistent handling of `&` and `|`,\n      where an AND'd feature was sometimes AND'd to all sets of features\n      instead of just the current set. E.g. `foo|bar&baz` was interpreted\n      as `{foo,baz}` or `{bar,baz}` instead of how it is documented:\n      `{foo} or {bar,baz}`.\n    + Fix job accounting so that when a job is requeued its allocated node\n      count is cleared. After the requeue, sacct will correctly show that\n      the job has 0 `AllocNodes` while it is pending or if it is canceled\n      before restarting.\n    + `sacct` - `AllocCPUS` now correctly shows 0 if a job has not yet\n      received an allocation or if the job was canceled before getting one.\n    + Fix Intel OneAPI autodetect: detect the `/dev/dri/renderD[0-9]+` GPUs,\n      and do not detect `/dev/dri/card[0-9]+`.\n    + Fix node selection for jobs that request `--gpus` and a number of\n      tasks fewer than GPUs, which resulted in incorrectly rejecting these\n      jobs.\n    + Remove `MYSQL_OPT_RECONNECT` completely.\n    + Fix cloud nodes in the `POWERING_UP` state disappearing (getting set to\n      `FUTURE`) when an `scontrol reconfigure` happens.\n    + `openapi/dbv0.0.39` - Avoid assert / segfault on missing coordinators\n      list.\n    + `slurmrestd` - Correct memory leak while parsing OpenAPI specification\n      templates with server overrides.\n    + Fix overwriting user node reason with system message.\n    + Prevent deadlock when `rpc_queue` is enabled.\n    + `slurmrestd` - Correct OpenAPI specification generation bug where\n      fields with overlapping parent paths would not get generated.\n    + Fix memory leak as a result of a partition info query.\n    + Fix memory leak as a result of a job info query.\n    + For step 
allocations, fix `--gres=none` sometimes not ignoring gres\n      from the job.\n    + Fix `--exclusive` jobs incorrectly gang-scheduling where they shouldn't.\n    + Fix allocations with `CR_SOCKET`, gres not assigned to a specific\n      socket, and block core distribution potentially allocating more sockets\n      than required.\n    + Revert a change in 23.02.3 where Slurm would kill a script's process\n      group as soon as the script ended instead of waiting as long as any\n      process in that process group held the stdout/stderr file descriptors\n      open. That change broke some scripts that relied on the previous\n      behavior. Setting time limits for scripts (such as\n      `PrologEpilogTimeout`) is strongly encouraged to avoid Slurm waiting\n      indefinitely for scripts to finish.\n    + Fix `slurmdbd -R` not returning an error under certain conditions.\n    + `slurmdbd` - Avoid potential NULL pointer dereference in the mysql\n      plugin.\n    + Fix regression in 23.02.3 which broke X11 forwarding for hosts when\n      MUNGE sends a localhost address in the encode host field. 
This occurs\n      when the node hostname is mapped to 127.0.0.1 (or similar) in\n      `/etc/hosts`.\n    + `openapi/[db]v0.0.39` - Fix memory leak on parsing error.\n    + `data_parser/v0.0.39` - Fix updating qos for associations.\n    + `openapi/dbv0.0.39` - Fix updating values for associations with null\n      users.\n    + Fix minor memory leak with `--tres-per-task` and licenses.\n    + Fix cyclic socket cpu distribution for tasks in a step where\n      `--cpus-per-task` < usable threads per core.\n    + `slurmrestd` - For `GET /slurm/v0.0.39/node[s]`, change the format of\n      the node's energy field `current_watts` to a dictionary to account for\n      an unset value instead of dumping 4294967294.\n    + `slurmrestd` - For `GET /slurm/v0.0.39/qos`, change the format of the\n      QOS field 'priority' to a dictionary to account for an unset value\n      instead of dumping 4294967294.\n    + `slurmrestd` - For `GET /slurm/v0.0.39/job[s]`, the `return_code`\n      field in `v0.0.39_job_exit_code` will be set to -127 instead of\n      being left unset when the job does not have a relevant return code.\n\n  * Other Changes:\n\n    + Remove the `--uid` / `--gid` options from the `salloc` and `srun`\n      commands. 
These options\n      have not worked correctly since the CVE-2022-29500 fix in combination\n      with some changes made in 23.02.0.\n    + Add the `JobId` to `debug()` messages indicating when\n      `cpus_per_task/mem_per_cpu` or `pn_min_cpus` are being automatically\n      adjusted.\n    + Change the log message warning for rate-limited users from verbose to\n      info.\n    + `slurmstepd` - Clean up the per-task generated environment for\n      containers in the spooldir.\n    + Format batch, extern, interactive, and pending step IDs into strings\n      that are human readable.\n    + `slurmrestd` - Reduce memory usage when printing out job CPU frequency.\n    + `data_parser/v0.0.39` - Add `required/memory_per_cpu` and\n      `required/memory_per_node` to `sacct --json`, `sacct --yaml`, and\n      `GET /slurmdb/v0.0.39/jobs` from slurmrestd.\n    + `gpu/oneapi` - Store cores correctly so CPU affinity is tracked.\n    + Allow `slurmdbd -R` to work if the root assoc id is not 1.\n    + Limit periodic node registrations to 50 instead of the full `TreeWidth`.\n      Since unresolvable `cloud/dynamic` nodes must disable fanout by setting\n      `TreeWidth` to a large number, this would cause all nodes to register at\n      once.\n  ","id":"SUSE-RU-2023:4335-1","modified":"2023-11-02T01:00:43Z","published":"2023-11-02T01:00:43Z","references":[{"type":"ADVISORY","url":"https://www.suse.com/support/update/announcement/-2023-4335/suse-ru-20234335-1/"},{"type":"REPORT","url":"https://bugzilla.suse.com/1215437"},{"type":"WEB","url":"https://www.suse.com/security/cve/CVE-2022-29500"}],"related":["CVE-2022-29500"],"summary":"Recommended update for slurm_23_02","upstream":["CVE-2022-29500"]}