2021-02-21 14:13:06 +00:00
|
|
|
# Copyright (c) 2020 Frank Harrison <frank@doublethefish.com>
|
2021-02-28 20:13:57 +00:00
|
|
|
# Copyright (c) 2021 Pierre Sassoulas <pierre.sassoulas@gmail.com>
|
2021-06-17 08:21:08 +00:00
|
|
|
# Copyright (c) 2021 Marc Mueller <30130371+cdce8p@users.noreply.github.com>
|
mapreduce| Fixes -jN for map/reduce Checkers (e.g. SimilarChecker)
This integrate the map/reduce functionality into lint.check_process().
We previously had `map` being invoked, here we add `reduce` support.
We do this by collecting the map-data by worker and then passing it to a
reducer function on the Checker object, if available - determined by
whether they confirm to the `mapreduce_checker.MapReduceMixin` mixin
interface or nor.
This allows Checker objects to function across file-streams when using
multiprocessing/-j2+. For example SimilarChecker needs to be able to
compare data across all files.
The tests, that we also add here, check that a Checker instance returns
and reports expected data and errors, such as error-messages and stats -
at least in a exit-ok (0) situation.
On a personal note, as we are copying more data across process
boundaries, I suspect that the memory implications of this might cause
issues for large projects already running with -jN and duplicate code
detection on. That said, given that it takes a long time to perform
lints of large code bases that is an issue for the [near?] future and
likely to be part of the performance work. Either way but let's get it
working first and deal with memory and perforamnce considerations later
- I say this as there are many quick wins we can make here, e.g.
file-batching, hashing lines, data compression and so on.
2020-03-26 11:41:22 +00:00
|
|
|
|
|
|
|
# Licensed under the GPL: https://www.gnu.org/licenses/old-licenses/gpl-2.0.html
|
2021-07-01 10:47:58 +00:00
|
|
|
# For details: https://github.com/PyCQA/pylint/blob/main/LICENSE
|
mapreduce| Fixes -jN for map/reduce Checkers (e.g. SimilarChecker)
This integrate the map/reduce functionality into lint.check_process().
We previously had `map` being invoked, here we add `reduce` support.
We do this by collecting the map-data by worker and then passing it to a
reducer function on the Checker object, if available - determined by
whether they confirm to the `mapreduce_checker.MapReduceMixin` mixin
interface or nor.
This allows Checker objects to function across file-streams when using
multiprocessing/-j2+. For example SimilarChecker needs to be able to
compare data across all files.
The tests, that we also add here, check that a Checker instance returns
and reports expected data and errors, such as error-messages and stats -
at least in a exit-ok (0) situation.
On a personal note, as we are copying more data across process
boundaries, I suspect that the memory implications of this might cause
issues for large projects already running with -jN and duplicate code
detection on. That said, given that it takes a long time to perform
lints of large code bases that is an issue for the [near?] future and
likely to be part of the performance work. Either way but let's get it
working first and deal with memory and perforamnce considerations later
- I say this as there are many quick wins we can make here, e.g.
file-batching, hashing lines, data compression and so on.
2020-03-26 11:41:22 +00:00
|
|
|
import abc
|
|
|
|
|
|
|
|
|
|
|
|
class MapReduceMixin(metaclass=abc.ABCMeta):
|
2021-04-26 11:59:44 +00:00
|
|
|
"""A mixin design to allow multiprocess/threaded runs of a Checker"""
|
mapreduce| Fixes -jN for map/reduce Checkers (e.g. SimilarChecker)
This integrate the map/reduce functionality into lint.check_process().
We previously had `map` being invoked, here we add `reduce` support.
We do this by collecting the map-data by worker and then passing it to a
reducer function on the Checker object, if available - determined by
whether they confirm to the `mapreduce_checker.MapReduceMixin` mixin
interface or nor.
This allows Checker objects to function across file-streams when using
multiprocessing/-j2+. For example SimilarChecker needs to be able to
compare data across all files.
The tests, that we also add here, check that a Checker instance returns
and reports expected data and errors, such as error-messages and stats -
at least in a exit-ok (0) situation.
On a personal note, as we are copying more data across process
boundaries, I suspect that the memory implications of this might cause
issues for large projects already running with -jN and duplicate code
detection on. That said, given that it takes a long time to perform
lints of large code bases that is an issue for the [near?] future and
likely to be part of the performance work. Either way but let's get it
working first and deal with memory and perforamnce considerations later
- I say this as there are many quick wins we can make here, e.g.
file-batching, hashing lines, data compression and so on.
2020-03-26 11:41:22 +00:00
|
|
|
|
|
|
|
@abc.abstractmethod
|
|
|
|
def get_map_data(self):
|
2021-04-26 11:59:44 +00:00
|
|
|
"""Returns mergable/reducible data that will be examined"""
|
mapreduce| Fixes -jN for map/reduce Checkers (e.g. SimilarChecker)
This integrate the map/reduce functionality into lint.check_process().
We previously had `map` being invoked, here we add `reduce` support.
We do this by collecting the map-data by worker and then passing it to a
reducer function on the Checker object, if available - determined by
whether they confirm to the `mapreduce_checker.MapReduceMixin` mixin
interface or nor.
This allows Checker objects to function across file-streams when using
multiprocessing/-j2+. For example SimilarChecker needs to be able to
compare data across all files.
The tests, that we also add here, check that a Checker instance returns
and reports expected data and errors, such as error-messages and stats -
at least in a exit-ok (0) situation.
On a personal note, as we are copying more data across process
boundaries, I suspect that the memory implications of this might cause
issues for large projects already running with -jN and duplicate code
detection on. That said, given that it takes a long time to perform
lints of large code bases that is an issue for the [near?] future and
likely to be part of the performance work. Either way but let's get it
working first and deal with memory and perforamnce considerations later
- I say this as there are many quick wins we can make here, e.g.
file-batching, hashing lines, data compression and so on.
2020-03-26 11:41:22 +00:00
|
|
|
|
|
|
|
@classmethod
|
|
|
|
@abc.abstractmethod
|
|
|
|
def reduce_map_data(cls, linter, data):
|
2021-04-26 11:59:44 +00:00
|
|
|
"""For a given Checker, receives data for all mapped runs"""
|