Releases: Qihoo360/hbox
v1.9.5
v1.9.4
v1.9.3
Bug Fixes and Other Changes
- Fix CLI parsing for --conf, --input and --output
- Prevent RunJar from unjarring the client into a temp dir
- Open HBOX_CLIENT_OPTS flags
Full Changelog: v1.9.2...v1.9.3
v1.9.2
Bug Fixes and Other Changes
- Fix missing MPI logs.
- Add role and rank in job history page.
Full Changelog: v1.9.1...v1.9.2
v1.9.1
Bug Fixes and Other Changes
- support switching the submitting user in Azkaban.
Full Changelog: v1.9.0...v1.9.1
v1.9.0
Major Features And Improvements
- support full mobile deployment of OpenMPI on AM.
- format Java code using palantir-java-format.
- support emitting opentelemetry traces.
Bug Fixes and Other Changes
- fix hbox-logs issue and improve MPI orted status monitoring.
- fix PATH and LD_LIBRARY_PATH for MPI jobs.
- fix hadoop ugi and shellcheck error.
Full Changelog: v1.8.1...v1.9.0
v1.8.0
What's Changed
- Support GPU training
- Support multiple versions of YARN (>= 2.6)
- add faq by @yuyajian in #15
- fix the board history display by @jiarunying in #17
- reset worker num when less than inputfile number by @liyuance in #35
- xlearning container allocate port from 20000 to 30000 by default by @FANNG1 in #45
- add docker support by @SuperbDong in #64
- Tensorflow example hangs when using multiple workers by @lshmouse in #62
New Contributors
- @yuyajian made their first contribution in #15
- @jiarunying made their first contribution in #17
- @liyuance made their first contribution in #35
- @FANNG1 made their first contribution in #45
- @SuperbDong made their first contribution in #64
- @lshmouse made their first contribution in #62
Full Changelog: v1.7.2...v1.8.0
Full Changelog: v1.4...v1.8.0
v1.8.0-beta2
What's Changed
- add faq by @yuyajian in #15
- fix the board history display by @jiarunying in #17
- reset worker num when less than inputfile number by @liyuance in #35
- xlearning container allocate port from 20000 to 30000 by default by @FANNG1 in #45
- add docker support by @SuperbDong in #64
- Tensorflow example hangs when using multiple workers by @lshmouse in #62
New Contributors
- @yuyajian made their first contribution in #15
- @jiarunying made their first contribution in #17
- @liyuance made their first contribution in #35
- @FANNG1 made their first contribution in #45
- @SuperbDong made their first contribution in #64
- @lshmouse made their first contribution in #62
Full Changelog: https://github.com/Qihoo360/hbox/commits/v1.8.0-beta2
XLearning 1.4
Release XLearning 1.4
Major Features And Improvements
- Support running applications in Docker
- Support MPI applications
- ClusterDef is available for the TensorFlow Distribution Strategy API
- Allow the amount of memory to be set separately for the chief and estimator workers in TensorFlow applications
- Specify the Yarn node label for job execution
- Upload the output with multiple threads
- Allow incremental upload of intermediate results
- Support regular-expression matching for input paths
Bug Fixes and Other Changes
- The memory usage adjustment prompt is displayed only when the application's final status is succeeded.
XLearning 1.3
Release XLearning 1.3
Major Features And Improvements
- Support lightLDA; see examples/lightLDA for usage
- Support xflow; see examples/xflow for usage
- Support user-defined environment variables, set through submitted configuration parameters
- Set the last worker as the estimator role of a distributed TensorFlow application if the user sets tf-evaluator to true; see examples/tfEstimators for usage
- Define the single worker index that saves the output by setting output-index
- Port reservation mechanism optimization
- Local data container allocation priority mechanism
- Display resource application and usage information
- ps role function expansion: more convenient rendering of metrics and usage information, and output upload
Bug Fixes and Other Changes
- In distributed mode, a Container got stuck waiting for the port addresses of the remaining machines after another Container failed
- After the workers are allocated, redundant container requests are released, and a remove request operation is added
- Application failed because the input environment variable was too long in PLACEHOLDER mode
- Control over the conditions used to judge a job execution as failed
- The wrong status code was returned when the Container exited successfully