Hacker News

Python's dynamic nature can make it quite difficult to express some things correctly. That, or the type checkers have issues when it comes to understanding what would be considered safe in other languages. Years ago when I knew far less about types and programming, I never had such problems in for example Java. It was sometimes stupid, but I always found a way to express things. Although it could also be, that I merely want more out of inference and and safety. For example recently I wanted a pipeline of steps, but the steps could have any input and output type, as long as that type aligns with the previous step's types and the type checker should also know what the final output type is, and I additionally wanted it to work so that I don't have to add all the steps at once, so that I can construct the pipeline step by step. Tried for hours, but didn't find a working solution that type checks. Also tried with the help of LLMs, which gave superficially looking great code for this, but then there was always some type error somewhere, and they struggled to fix that. Ultimately, I gave up on the type checking between steps and output type of the pipeline, as I realized, that I invested hours into something that might be impossible or way waaay too much work for what I get from it. I would not have spent any time on this without type annotating and would have simply gone with a dynamic solution.

whilenot-dev 2 days ago [ - ]

That doesn't sound like it'd have something to do with the dynamic nature of python. Type checking is a static analysis of the source code, so if you'd want something to be inferred dynamically, then you'll have to make use of generics:

  from typing import Callable
  
  
  class Pipeline[T]:
    def __init__(self, value: T) -> None:
      self._value = value
  
    def step[U](self, cb: Callable[[T], U]) -> 'Pipeline[U]':
      return Pipeline(cb(self._value))
  
    def terminate(self) -> T:
      return self._value
  
  
  def _float_to_int(value: float) -> int:
    return int(value)
  
  
  def _int_to_str(value: int) -> str:
    return str(value)
  
  
  def main() -> None:
    result = Pipeline(3.14)\
      .step(_float_to_int)\
      .step(_int_to_str)\
      .terminate()
    
    print(result)

  
  if __name__ == '__main__':
    main()

You could further constrain the generic type through type variables: https://docs.python.org/3/library/typing.html#typing.TypeVar

zelphirkalt 2 days ago [ - ]

I think this pipeline implementation does some things different from what I wanted (but did not precisely describe. It seems that each step is run right away, as it is "added", rather than collected and run when `terminate` is called. Also each step can only consume the result of the previous step, not the results of earlier steps. This can be worked around, by ending the pipeline and then starting multiple pipelines from the result of the first pipeline, if needed. I think you would need to import Generic and write something like `class Pipeline(Generic[T]):` as well? Or is `class Pipeline[T]:` a short form of that?

In my experiment I wanted to get a syntax like this:

    pipeline = Pipeline()

    ...some code here...

    pipeline.add_step(Step(...some meta data..., ...actual procedure to run...))

So then I would need generics for `Step` too and then Pipeline would need to change result type with each call of `add_step`, which seems like current type checkers cannot statically check.

I think your solution circumvents the problem maybe, because you immediately apply each step. But then how would the generic type work? When is that bound to a specific type?

whilenot-dev 2 days ago [ - ]

> Or is `class Pipeline[T]:` a short form of that?

Yes, since 3.12.

> Pipeline would need to change result type with each call of `add_step`, which seems like current type checkers cannot statically check.

Sounds like you want a dynamic type with your implementation (note the emphasis). Types shouldn't change at runtime, so a type checker can perform its duty. I'd recommend rethinking the implementation.

This is the best I can do for now, but it requires an internal cast. The caller side is type safe though, and the same principle as above applies:

    from functools import reduce
    from typing import cast, Any, Callable, Mapping, TypeVar


    def _id_fn[T](value: T) -> T:
        return value


    class Step[T, U]:
        def __init__(
            self,
            metadata: Mapping[str, Any],
            procedure: Callable[[T], U],
        ) -> None:
            self._metadata = metadata
            self._procedure = procedure

        def run(self, value: T) -> U:
            return self._procedure(value)
        

    TInput = TypeVar('TInput')
    
    
    class Pipeline[TInput, TOutput = TInput]:
        def __init__(
            self,
            steps: tuple[*tuple[Step[TInput, Any], ...], Step[Any, TOutput]] | None = None,
        ) -> None:
            self._steps: tuple[*tuple[Step[TInput, Any], ...], Step[Any, TOutput]] = (
                steps or (Step({}, _id_fn),)
            )
        
        def add_step[V](self, step: Step[TOutput, V]) -> 'Pipeline[TInput, V]':
            steps = (
                *self._steps,
                step,
            )
        
            return Pipeline(steps)

        def run(self, value: TInput) -> TOutput:
            return cast(
                TOutput,
                reduce(
                    lambda acc, val: val.run(acc),
                    self._steps,
                    value,
                ),
            )


    def _float_to_int(value: float) -> int:
        return int(value)


    def _int_to_str(value: int) -> str:
        return str(value)


    def main() -> None:
        step_a = Step({}, _float_to_int)
        step_b = Step({}, _int_to_str)

        foo = Pipeline[float]()\
            .add_step(step_a)\
            .add_step(step_b)\
            .run(3.14)
        print(foo)

        bar = Pipeline[float]()\
            .run(3.14)
        print(bar)


    if __name__ == '__main__':
        main()

zelphirkalt 2 days ago [ - ]

Thanks for your efforts. I didn't expect and couldn't expect anyone to invest time into trying to make this work. I might try your version soon.