third_party.pylibs.pylint.src/pylint/checkers/mapreduce_checker.py

# Copyright (c) 2020 Frank Harrison <frank@doublethefish.com>
# Copyright (c) 2021 Pierre Sassoulas <pierre.sassoulas@gmail.com>
# Copyright (c) 2021 Marc Mueller <30130371+cdce8p@users.noreply.github.com>

# Licensed under the GPL: https://www.gnu.org/licenses/old-licenses/gpl-2.0.html
# For details: https://github.com/PyCQA/pylint/blob/main/LICENSE
import abc


class MapReduceMixin(metaclass=abc.ABCMeta):
    """A mixin designed to allow multiprocess/threaded runs of a Checker."""

    @abc.abstractmethod
    def get_map_data(self):
        """Returns mergeable/reducible data that will be examined"""

    @classmethod
    @abc.abstractmethod
    def reduce_map_data(cls, linter, data):
        """For a given Checker, receives data for all mapped runs"""
Apply copyrite --contribution-threshold 2021-02-21 14:13:06 +00:00
Update copyright notice with copyrite 2021-02-28 20:13:57 +00:00
Bump pylint to 2.9.0-dev1, update changelog 2021-06-17 08:21:08 +00:00
mapreduce| Fixes -jN for map/reduce Checkers (e.g. SimilarChecker)
This integrates the map/reduce functionality into lint.check_process(). We previously had `map` being invoked; here we add `reduce` support. We do this by collecting the map-data by worker and then passing it to a reducer function on the Checker object, if available. Availability is determined by whether the Checker conforms to the `mapreduce_checker.MapReduceMixin` mixin interface or not. This allows Checker objects to function across file-streams when using multiprocessing/-j2+. For example, SimilarChecker needs to be able to compare data across all files.
The tests, which we also add here, check that a Checker instance returns and reports expected data and errors, such as error-messages and stats, at least in an exit-ok (0) situation.
On a personal note, as we are copying more data across process boundaries, I suspect that the memory implications of this might cause issues for large projects already running with -jN and duplicate code detection on. That said, given that it takes a long time to perform lints of large code bases, that is an issue for the [near?] future and likely to be part of the performance work. Either way, let's get it working first and deal with memory and performance considerations later. I say this as there are many quick wins we can make here, e.g. file-batching, hashing lines, data compression and so on.
2020-03-26 11:41:22 +00:00
Fix copyright links (#4647): fix link in license header; update link to astroid bump_changelog 2021-07-01 10:47:58 +00:00
Changes after black update 2021-04-26 11:59:44 +00:00
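
The commit message for the map/reduce change describes collecting map-data per worker and handing it to a reducer on the Checker. A minimal sketch of that flow follows, assuming the reducer receives a list with one entry per worker. `DuplicateLineChecker`, `process_file`, `FakeLinter` and `run_workers` are hypothetical names invented for illustration and are not part of pylint; the mixin is copied in so the sketch runs without pylint installed, and the workers are simulated sequentially rather than with real processes.

```python
import abc


class MapReduceMixin(metaclass=abc.ABCMeta):
    """Copy of the mixin from this file, so the sketch is self-contained."""

    @abc.abstractmethod
    def get_map_data(self):
        """Returns mergeable/reducible data that will be examined"""

    @classmethod
    @abc.abstractmethod
    def reduce_map_data(cls, linter, data):
        """For a given Checker, receives data for all mapped runs"""


class DuplicateLineChecker(MapReduceMixin):
    """Hypothetical checker: reports lines that occur in more than one file."""

    def __init__(self):
        # (filename, stripped line) pairs gathered by this worker only
        self.seen = []

    def process_file(self, filename, text):
        # Hypothetical hook for this sketch, not a pylint API.
        for line in text.splitlines():
            stripped = line.strip()
            if stripped:
                self.seen.append((filename, stripped))

    def get_map_data(self):
        # "map" step: hand back this worker's picklable data
        return self.seen

    @classmethod
    def reduce_map_data(cls, linter, data):
        # "reduce" step: merge every worker's data, then analyse across files
        files_per_line = {}
        for worker_data in data:
            for filename, line in worker_data:
                files_per_line.setdefault(line, set()).add(filename)
        for line, files in sorted(files_per_line.items()):
            if len(files) > 1:
                linter.messages.append((line, sorted(files)))


class FakeLinter:
    """Stand-in for the linter object; it only collects messages."""

    def __init__(self):
        self.messages = []


def run_workers(file_batches):
    """Simulate -jN: one checker per batch, then a single reduce call."""
    all_map_data = []
    for batch in file_batches:
        checker = DuplicateLineChecker()
        for filename, text in batch:
            checker.process_file(filename, text)
        all_map_data.append(checker.get_map_data())
    linter = FakeLinter()
    DuplicateLineChecker.reduce_map_data(linter, all_map_data)
    return linter.messages
```

No single simulated worker sees both files here, so the cross-file duplicate can only be found in the reduce step. That is the situation the commit message describes for SimilarChecker.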