CapPython
Caja is an object-capability subset of JavaScript. Joe-E is an object-capability subset of Java. Would it be feasible or worthwhile to do a similar subset of Python?
Possible base rules for a Cajita-like Python:
"_*" (private) attributes may only be accessed through self
Assignment to attributes may only be done through self
- i.e. There are no publicly assignable attributes.
If we allowed assigning to public attributes, object holders could overwrite public methods, and objects would need to defend themselves from this. We could introduce a base class that disables assignment by default by defining __setattr__, but this would not protect non-CapPython code, which would not be required to use this base class.
- This would make Python much easier to optimise, because the set of fields on an object would then be statically known (if the superclasses are statically known). An object could be represented as a struct instead of a dict, which I suspect would be the single most important thing you could do to improve Python's performance.
self variables may not be assigned to.
No inheritance. Allowing access to the superclass's __init__ method complicates things.
getattr: remove it or wrap it
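The first two rules are simple enough to check statically. Below is a minimal sketch of such a check, assuming Python's standard ast module rather than the compiler.ast module that the actual implementation uses. The function name check_attribute_rules and the sample source are made up for illustration. The sketch treats any variable called self as the receiver; a real verifier would also have to confirm that self is the first parameter of a method defined in a class body.

```python
import ast


def check_attribute_rules(source):
    """Return sorted (lineno, message) pairs for violations of the two rules."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Attribute):
            continue
        # Simplification: any name spelled "self" counts as the receiver.
        via_self = isinstance(node.value, ast.Name) and node.value.id == "self"
        if node.attr.startswith("_") and not via_self:
            problems.append((node.lineno, "private attribute %r read outside self" % node.attr))
        if isinstance(node.ctx, (ast.Store, ast.Del)) and not via_self:
            problems.append((node.lineno, "attribute %r assigned outside self" % node.attr))
    return sorted(problems)


SAMPLE = """
class Counter(object):
    def __init__(self):
        self._count = 0
    def bump(self, other):
        self._count += 1
        return other._count
def poke(c):
    c.value = 1
"""

problems = check_attribute_rules(SAMPLE)
```

Here the accesses through self pass, while other._count and c.value = 1 are reported.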
Problem areas:
- Lists and dicts are mutable. Would want immutable versions, but how would that work?
- E's list and dict literals produce immutable types by default. E provides freeze() and diverge() methods for converting between the mutable and immutable types.
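One way this might look in Python is sketched below. The names freeze and diverge are borrowed from E; nothing here is an existing Python API except tuple, list, dict and types.MappingProxyType (available in Python 3.3 and later, so this sketch does not run on the Python 2 versions discussed on this page). The conversion is shallow: elements that are themselves mutable stay mutable.

```python
from types import MappingProxyType


def freeze(value):
    """Return a read-only counterpart of a list or dict (shallow)."""
    if isinstance(value, list):
        return tuple(value)
    if isinstance(value, dict):
        # Copy first so later changes to the original do not show through.
        return MappingProxyType(dict(value))
    return value


def diverge(value):
    """Return a fresh mutable copy of a frozen list or dict (shallow)."""
    if isinstance(value, tuple):
        return list(value)
    if isinstance(value, MappingProxyType):
        return dict(value)
    return value


frozen_list = freeze([1, 2])
frozen_dict = freeze({"a": 1})
mutable_again = diverge(frozen_list)
mutable_again.append(3)  # does not affect frozen_list
```

This does not address the harder question of making literals immutable by default, which would need a rewriter or a change to the language.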
- Python has no direct equivalent to E's argument/variable guards, so it is awkward to write methods that check their argument types
Bound methods provide the attributes im_class, im_func and im_self, which we would want to be private
- Could disallow using bound methods as values and syntactically require that methods are invoked. This would be quite a severe restriction because bound methods are very useful for event handlers.
Functions provide the attributes func_globals etc., which we would want to be private
Generators provide the attribute gi_frame which we would want to be private, and others, including gi_running, which we might also want to make private
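A short illustration of why these attributes matter. The Account and Impostor classes are invented for the example. It uses the names __self__, __func__ and __globals__, which Python 2.6 and later accept as alternative spellings of im_self, im_func and func_globals, so the same sketch runs on Python 3.

```python
class Account(object):
    def __init__(self, balance):
        self._balance = balance

    def get_balance(self):
        return self._balance


class Impostor(object):
    _balance = "stolen"


account = Account(10)
bound = account.get_balance            # a bound method passed around as a value

receiver = bound.__self__              # im_self: the object itself leaks out
raw_function = bound.__func__          # im_func: the underlying function leaks out

# The raw function accepts any object as self, so it can read private state
# belonging to an object of a different class.
stolen = raw_function(Impostor())

# func_globals: the namespace of the module that defined the method.
module_namespace = raw_function.__globals__
```

Anyone holding the bound method can reach the receiver, the raw function and the defining module's globals, which is why all three would need to be treated as private.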
- Imports
- Exports from modules are not explicit in Python. All top-level module definitions are exported. Furthermore, all imports are also exported.
- Methods can be acquired from a class (unbound methods) instead of from an object (bound methods), e.g.
class C(object):
    def method(self):
        pass

C.method(C()) # succeeds
C.method("string") # raises TypeError

We might want to check syntactically that classes are only ever used as constructors. That would be an annoying limitation because it would prevent classes-as-constructors being passed around to be used like functions. We could instead rewrite C as C.__new__ (assuming C is a new-style class), or, for a verifier, require the programmer to do the same when C is not invoked directly.
Class objects also participate in isinstance.
- We need to prevent direct access to the functions that methods are made from, because these functions are allowed to access private attributes. However, C.method does not return the original function, it returns an unbound method (which wraps the function), which checks that its first argument is of type C, as shown in the example. Is it unsafe to allow access to unbound methods? In the presence of inheritance, it provides a way to invoke the base class's methods, which may have been overridden in the derived class.
- Ways in which method functions could escape:
- By reading the function's variable within the class scope.
By defining __metaclass__ in the class scope.
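The first escape route can be demonstrated in a few lines. Purse, Victim and captured are hypothetical names used only for this sketch.

```python
captured = []


class Purse(object):
    def __init__(self, amount):
        self._amount = amount

    def _drain(self):
        amount, self._amount = self._amount, 0
        return amount

    # Inside the class body, _drain is still an ordinary function bound to an
    # ordinary variable, so it can be handed to anything in scope.
    captured.append(_drain)


class Victim(object):
    def __init__(self):
        self._amount = 7


victim = Victim()
taken = captured[0](victim)   # the method function runs with a foreign self
```

Every attribute access in this code goes through self, so the base rules alone would accept it. A checker would also have to restrict how method names are used inside the class scope.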
- Sources of non-determinism:
The repr() of many objects contains the object's memory address. str() and repr() are also accessible via the % operator on strings. A partial fix would be to require all CapPython classes to derive from a version of object that overrides the __str__ method so that it does not return the object's address.
Comparison of objects is also based on memory address by default. cmp(object(), object()) sometimes returns 1 and sometimes returns -1.
- dict item ordering
Out-of-stack exceptions (RuntimeError: maximum recursion depth exceeded). Joe-E handles this by stopping these exceptions from being caught or from being handled via try...finally. Caja does not attempt to handle this: Caja does not try to be fully deterministic.
The __del__ method could expose garbage collector behaviour. In CPython it does not expose GC behaviour because the GC is only for freeing cycles and it will not free a cycle containing an object with a __del__ method.
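A minimal sketch of the partial fix for the repr() problem, assuming a hypothetical base class called CapObject. It overrides __repr__ rather than __str__, because the default __str__ falls back to __repr__, so one override covers str(), repr() and both %s and %r.

```python
class CapObject(object):
    def __repr__(self):
        # No memory address, so the text is the same on every run.
        return "<%s>" % type(self).__name__


class Widget(CapObject):
    pass


widget = Widget()
formatted = "%s %r" % (widget, widget)
```

The fix is only partial: in unrestricted Python a caller can still invoke object.__repr__(widget) directly, and the sketch does nothing about comparison, dict ordering or the other sources listed above.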
Is it OK to allow classes to define __getattr__ or __getattribute__?
Decorators: In general, decorators are not safe because they can capture method functions (which are statically allowed to write to self._*) and pass in other objects as self. Some decorators, such as @property, may be safe.
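To make the decorator concern concrete, here is a sketch with a hypothetical hostile decorator called capture. The class names are also invented.

```python
stash = []


def capture(func):
    stash.append(func)    # the decorator receives the raw method function
    return func


class Safe(object):
    def __init__(self):
        self._secret = "s3cret"

    @capture
    def _reveal(self):
        return self._secret

    @property
    def size(self):
        return len(self._secret)


class Other(object):
    _secret = "other"


leaked = stash[0](Other())   # called with an arbitrary object as self
size = Safe().size
```

Note that a property object exposes its getter through the fget attribute, so @property is probably only safe if access to the class object is restricted as well.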
Sometimes it is necessary for methods in a class C to access private members of instances of C other than through self. The Java/C++ definition of "private" permits this and is based on static types whereas the E/lambda definition of "private" disallows this and is based on scope.
Example 1: A RemoteProxy needs to look inside other RemoteProxys that it receives as arguments in order to pass their object IDs across the wire.
Example 2: E's MintMaker example. In E this is implemented using sealers and unsealers.
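For reference, a sealer/unsealer pair can be sketched in Python with closures, following the usual shared-slot construction. The function name make_sealer_unsealer is made up. The sketch only provides real encapsulation if closure internals are unreachable, which is not true of unrestricted Python (a function's __closure__ attribute exposes the cells), so it assumes the kind of restrictions discussed on this page.

```python
def make_sealer_unsealer():
    """Return a (seal, unseal) pair that share one private slot."""
    slot = []  # holds at most one value, and only during an unseal call

    def seal(value):
        def box():
            # Only the matching unseal ever looks in this slot.
            slot.append(value)
        return box

    def unseal(box):
        del slot[:]
        box()
        if not slot:
            raise ValueError("box was not sealed by the matching sealer")
        return slot.pop()

    return seal, unseal


seal, unseal = make_sealer_unsealer()
box = seal("gold")
contents = unseal(box)
```

A box made by one pair cannot be opened by another pair's unseal, because it deposits its value in a slot the other unsealer never reads.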
- Inheritance:
Python does not have "final" classes like Java has. If we have isinstance, access to a class C lets you create objects that masquerade as being instances of C but have different behaviour.
Accessing private attributes (usually methods) via super(C, self)._method can be statically allowed, but use of super on non-self objects should not be allowed because it lets outsiders get access to versions of methods that have been overridden.
It would be necessary to allow super(C, self).__init__(...) calls.
print provides authority to write to the global stdout stream. In general, it should not be allowed. Code should not rely on it for printing real output. However, it is useful as a debugging tool. It could be changed to send output to a debugging stream.
- Exceptions can leak authority; they can contain references to authority-carrying objects. We could check exceptions when they are raised or when they are caught.
Python's variable binding rules are crazy. For example, (lambda: ([x for x in (1,2)], x))() is different from (lambda: (list(x for x in (1,2)), x))(). x is a free variable in the latter but not the former, because loop variables are bound in generators but not in list comprehensions (where they are assigned and escape, only to become bound by the lambda in this example). If the variable were named open instead of x, we would have to get the rules exactly right to be sure that the program cannot access the built-in open function. This is a good example of where a rewriter can be safer than a verifier. A rewriter can apply variable renaming ("alpha conversion" is the fancy name), so we can be sure that two variables are not accidentally the same variable.
Python has unfortunate variable binding semantics. Would you really want to use Python for obj-cap programming given this?
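The two expressions above can be compared directly. The helper outcome is a made-up name. The results depend on the interpreter: the difference described here applies to Python 2, whereas Python 3 gives list comprehensions their own scope, so there both expressions treat the trailing x as a free variable.

```python
def outcome(thunk):
    """Run thunk and return its result, or "NameError" if it raises one."""
    try:
        return thunk()
    except NameError:
        return "NameError"


# x is deliberately not defined anywhere outside the two lambdas.
with_list_comprehension = outcome(lambda: ([x for x in (1, 2)], x))
with_generator = outcome(lambda: (list(x for x in (1, 2)), x))
```

On Python 2 the first call returns a value, because the loop variable escapes into the lambda's scope, while the second raises NameError. That the answer changes between language versions is a further argument for renaming variables in a rewriter instead of reasoning about the rules in a verifier.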
There are two ways to do subsetting, with a verifier or with a rewriter. Are there any advantages to not requiring rewriting?
If code requires rewriting, it might have bugs that only occur when the rewriting is applied, such as accessing C.method when C is rewritten to C.__new__, and we might not detect them when running the code without rewriting.
- There is no inherent runtime cost to rewriting. It depends what rewriting is done.
Unsealers
class RemoteProxy(object):
    def __init__(self, connection, object_index):
        self._connection = connection
        self._index = object_index
    def _get_connection_and_index(self):
        return self._connection, self._index
    def cap_invoke(self, args):
        self._connection.send_message(self._index, args)

# Use of this unbound private method could be statically allowed,
# and would be treated as part of the class definition:
unseal_remote = RemoteProxy._get_connection_and_index
# This class object is attenuated so that it can only be used as a constructor.
# This is done implicitly by the rewriter after the class definition:
RemoteProxy = RemoteProxy.__new__
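To try the idea out in ordinary Python, the sketch below swaps the rewriter's RemoteProxy.__new__ step for a hand-written wrapper function, constructor_only, which is an assumption of this example and not part of CapPython. FakeConnection is a stand-in invented for the demonstration.

```python
def constructor_only(cls):
    """Return a function that builds instances of cls and exposes nothing else."""
    def construct(*args, **kwargs):
        return cls(*args, **kwargs)
    return construct


class FakeConnection(object):
    def __init__(self):
        self.sent = []

    def send_message(self, index, args):
        self.sent.append((index, args))


class RemoteProxy(object):
    def __init__(self, connection, object_index):
        self._connection = connection
        self._index = object_index

    def _get_connection_and_index(self):
        return self._connection, self._index

    def cap_invoke(self, args):
        self._connection.send_message(self._index, args)


# Taken while the class object is still in hand, as in the listing above.
unseal_remote = RemoteProxy._get_connection_and_index
# Stand-in for the rewriter's attenuation step.
RemoteProxy = constructor_only(RemoteProxy)

connection = FakeConnection()
proxy = RemoteProxy(connection, 3)
proxy.cap_invoke(("hello",))
unsealed = unseal_remote(proxy)
```

Code that holds only the attenuated RemoteProxy can create proxies but cannot reach _get_connection_and_index through it. In unrestricted Python the class is still reachable via type(proxy), so this relies on the other restrictions being enforced.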
Implementation
A Bazaar repository is on Launchpad: https://code.launchpad.net/cappython
One half (lint.py) annotates the Python AST with variable binding information; this is not CapPython-specific. Python's variable bindings rules are esoteric so this is relatively complicated. The other half is much simpler and performs some checks on the annotated AST.
It uses the compiler.ast module, but this is deprecated in Python 2.6 and scheduled for removal in 3.0 (see this commit). What is the replacement?
See also
Brett Cannon looked into securing Python; see discussion on e-lang and the paper Controlling Access to Resources Within The Python Interpreter (Brett Cannon and Eric Wohlstadter, 2007)
RPython, a more easily compilable subset of Python used by PyPy
Old versions of Python provide the rexec module, which can be used in conjunction with the Bastion wrapper object
Zope's RestrictedPython. This does not provide encapsulation. It allows getattr/setattr to be wrapped on a per-module basis, but it does not distinguish between objects created by the module/class and other objects. I think it is more like the Valija version of Caja than Cajita.
List of object-capability languages on the E wiki
Blog post: Introducing CapPython
