determined-ai/determined: Determined: Deep Learning Training Platform
Stars: 2021
Watchers: 62
Forks: 274
Issues: 84
determined-ai's Other Repos
determined-ai/determined: Determined: Deep Learning Training Platform
Last Updated: 2023-01-28
determined-ai/gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
Last Updated: 2022-05-23
determined-ai/devcluster: A developer tool for running the determined cluster.
Last Updated: 2022-05-21
determined-ai/DeepSpeed: DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
Last Updated: 2022-05-28
determined-ai/environments: Determined AI public environments
Last Updated: 2022-05-21
determined-ai/yogadl: yogadl, the flexible data layer
Last Updated: 2022-12-15
determined-ai/works-with-determined: This repository contains example integrations between Determined and other ML products
Last Updated: 2022-05-21
determined-ai/lunch_and_learn: Determined's Lunch & Learn Series
Last Updated: 2022-05-22
determined-ai/webui-coding-challenge:
Last Updated: 2021-12-28
determined-ai/public-model-zoo:
Last Updated: 2021-12-28
determined-ai/tensorflow-wheels: Tensorflow wheel builder
Last Updated: 2022-05-21
determined-ai/pedl-examples:
Last Updated: 2021-12-28
determined-ai/tf-keras-cvae-starter-code:
Last Updated: 2021-12-28
determined-ai/horovod: Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Last Updated: 2022-05-22
determined-ai/vae-starter-code:
Last Updated: 2021-12-28
determined-ai/determined-examples:
Last Updated: 2021-12-28
determined-ai/OReilly:
Last Updated: 2021-12-28
determined-ai/public_assets:
Last Updated: 2021-12-28
determined-ai/scheduler-plugins: Repository for out-of-tree scheduler plugins based on scheduler framework.
Last Updated: 2022-05-31
determined-ai/nccl: Optimized primitives for collective multi-GPU communication
Last Updated: 2022-05-24
determined-ai/petal:
Last Updated: 2021-09-29
determined-ai/command-batch-inference:
Last Updated: 2021-09-29
determined Recent Issues
Issue Title
State
Comments
Created Date
Updated Date
Closed Date
[feat] Adding an option to specify docker volumes besides bind_mounts.
open
1
2023-01-23
2023-01-28
-
[question] tensorflow runtime error
open
1
2023-01-20
2023-01-28
-
[feat] Support for configuring elevated privileges for notebooks
open
6
2023-01-19
2023-01-28
-
[bug] Failure to delete experiments in k8s if experiment podspec was invalid
open
2
2023-01-18
2023-01-28
-
[question] the runing trial may terminate accidentally
closed
11
2023-01-10
2023-01-22
2023-01-18
the accuracy of imagenet decreases a lot [question]
closed
13
2023-01-04
2023-01-09
2023-01-11
[question] how to convert bn to syncbn
closed
1
2022-12-26
2023-01-28
2023-01-03
The key(min_checkpoint_period) in yaml file seems to have some error [question]
closed
5
2022-12-22
2023-01-26
2023-01-03
[question] some question about tensorflow configuration in model_def.py
closed
0
2022-12-19
2023-01-02
2022-12-19
[question] Set http_proxy in master
closed
0
2022-12-16
2023-01-02
2022-12-16
[question] Is there an example master.yaml ?
closed
1
2022-12-16
2023-01-14
2022-12-16
[question] The chepoint file state_dict.pth can not directly be loaded by the original model
closed
0
2022-12-09
2022-12-21
2022-12-12
How to set 'records_per_epoch' in the yaml.
closed
2
2022-12-08
2022-12-21
2022-12-09
some questions about entrypoint:
closed
3
2022-11-29
2022-12-03
2022-11-30
[bug] With det deployment, after the machine restarts, the task disappears
closed
11
2022-11-17
2022-12-03
2022-11-29
[question] How can I commit a running container of a shell job to a image in determined-ai?
closed
4
2022-11-16
2023-01-14
2022-11-16
[question] Can I set up a service in determined and expose the port to other clients to call?
open
2
2022-11-15
2022-11-22
-
[bug] "CUDA out of memory" error when trying to run distributed training
closed
4
2022-11-14
2023-01-15
2022-12-22
[question] How can I mount a folder from a remote computer to the determined agent?
closed
12
2022-11-07
2023-01-09
2022-11-18
[bug] Cannot select alternative metrics for main plot on trial page of multi-trial experiment
closed
1
2022-10-31
2022-12-13
2022-11-01
[question] Any user can see the tasks created by other users?
closed
2
2022-10-19
2023-01-13
2022-10-20
[bug] The Jupyter Lab Webpage doesn't show anything.
open
1
2022-10-16
2023-01-24
-
[feat] support experiment artifact tracking
open
0
2022-10-06
2023-01-11
-
[question] Multi-GPU Error for Custom Optimizer
closed
9
2022-09-30
2022-12-11
2022-10-13
[bug] Dashboard miss items.
open
4
2022-09-30
2023-01-28
-
[bug] Tensoboard event file upload failed.
closed
14
2022-09-30
2023-01-29
2022-11-19
Pytorch Lightning v1.7.7 support
closed
1
2022-09-29
2023-01-15
2022-10-12
[question] State in "DELETING"
closed
2
2022-09-22
2023-01-14
2022-09-23
[question] Fluentbit container not launched in task container defaults network_mode
closed
3
2022-09-20
2023-01-14
2022-09-28
[feat] Filters/sorting set in one workspace/projects table (or uncategorized) are also applied to tables in other workspaces/projects
open
2
2022-09-16
2023-01-20
-
[bug] the "total checkpoint size" box on experiment page includes deleted checkpoints in the total
closed
4
2022-09-16
2023-01-16
2022-09-20
[bug] sorting of workload table by metric column is broken
closed
4
2022-09-16
2023-01-16
2022-09-22
[bug] Reducer process only 1/x samples in multi-GPU setting
closed
5
2022-09-16
2023-01-28
2022-09-16
[feat] Track each task GPU utilization (and other information) and display it on the WebUI
open
5
2022-09-16
2023-01-15
-
[bug] Cannot see job information on WebUI
open
6
2022-09-16
2023-01-24
-
[bug] Checkpointing with LR Scheduler throws an error for multi-GPU Pytorch Trial
closed
4
2022-09-14
2023-01-27
2022-09-16
help with mmclassification and mmsegmentation
closed
2
2022-09-06
2022-12-29
2022-09-07
Lower Barrier to Entry by making the CLA sign off a webpage
open
1
2022-09-01
2023-01-31
-
[bug] zsh: command not found: det
closed
5
2022-09-01
2022-12-30
2022-10-24
[bug] exp.wait() fails because missing `config` key in response
closed
3
2022-08-24
2022-11-07
2022-08-25
[bug] A question about the reasoning correctness of training results
closed
2
2022-08-19
2023-01-21
2022-08-22
: unrecognized arguments: model_def:MyTrial
closed
4
2022-08-18
2023-01-30
2022-08-19
[bug] Can not use vscode ssh
closed
1
2022-08-15
2023-01-15
2022-10-24
[bug] Here are two questions about the correctness of the model results that you need to answer, thank you!
closed
2
2022-08-10
2023-01-14
2022-09-06
[bug] det tensorboard - link is not URL-encoded
closed
1
2022-08-04
2023-01-13
2022-08-10
[feat] Use IMDS v2 for EC2 instances
closed
3
2022-07-30
2022-12-18
2022-11-29
PyTorch 1.12 Support
open
1
2022-07-21
2022-12-09
-
det deploy local cluster-up fails (wrong determinedai/determined-agent image pull)
closed
7
2022-07-18
2023-01-29
2022-07-19
det deploy local cluster-down does not stop fluent
closed
1
2022-07-17
2022-11-07
2022-07-25
Here is a question about the operation of distributed tasks that you need to help answer.
closed
1
2022-07-15
2022-12-10
2022-07-15