I am currently working on an important part of the upcoming Apache Camel 2.9 and 3.0 versions: the refactoring of camel-core to make it ready for future growth. So you may ask why refactor something that is working so well. From the user perspective you are right. Camel is easy to use and at first sight the architecture of the core is quite clear.
So let's first look at the architecture. There are several packages in camel with certain responsibilities:
- org.apache.camel : Camel API. Almost everyone will use this
- spi : Service provider interfaces. This is for third party extensions that want to hook into camel to change functionality like JMX management
- model: This is where the biggest part of the Java DSL lives
- builder: Builder pattern implementations for the Java DSL
- components: Several components that are shipped with camel-core like file and bean component
- impl: Implementation classes
- processor: Runtime elements of camel routes that do the processing. They are created by the model classes
- management: JMX management functionality
- converter: Type converters
- util: Utility packages that work on the API and provide convenience methods
So as I said the architecture looks quite sound. Almost everything has its place.
There are three main problems though:
- camel-core has grown considerably. While it had about 30k lines of code in 2008, it now (2011) has about 70k lines of code
- There are no rules in place which packages in camel-core may use which other packages
- Probably mainly as a consequence of the second point, there are a lot of cyclic dependencies in camel-core
So I started by analyzing the current state of camel-core with a tool called Structure101. When given jars or a classes directory it can visualize dependencies and compute a metric called XS (Excess). The higher the number, the worse the code is. It also lets you dig into each dependency and see exactly which classes and methods are involved. So for each dependency you can drill down to the line of code that causes it.
So camel 2.8 looks like this on the top level of packages:
The nodes of this graph are the packages. The edges are dependencies from one class in package A to another class in package B. A dependency cycle means that package A depends (possibly transitively) on package B while package B also depends on package A. As a result you cannot understand or change one package independently of the other, because a change in one may cause unwanted effects in the other.
In the case of camel all top level packages are involved in a dependency cycle. This means that every package depends on every other in some way. While this was tolerable when camel was smaller, the current size and growth rate mean that it is becoming ever harder to change something without breaking some other functionality.
I have also attached the complete Structure101 report of camel 2.8 that shows the XS number and some details of how it is composed. The excess value in camel 2.8 is about 43,000 or 28% and the top level packages are 10% entangled.
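To make the notion of a package cycle concrete, here is a minimal sketch of how one could be found with a depth-first search over a package graph. This is not how Structure101 works internally; the class name PackageCycleFinder and the example edges in main are assumptions made up for illustration and do not reflect the real dependencies in camel-core.

```java
import java.util.*;

// Hypothetical sketch: finds one dependency cycle in a package graph via depth-first search.
class PackageCycleFinder {

    /** Returns the packages forming one cycle (first == last), or an empty list if there is none. */
    static List<String> findCycle(Map<String, Set<String>> deps) {
        Set<String> done = new HashSet<>();       // packages that are fully explored
        Deque<String> path = new ArrayDeque<>();  // packages on the current search path
        for (String start : deps.keySet()) {
            List<String> cycle = visit(start, deps, done, path);
            if (!cycle.isEmpty()) return cycle;
        }
        return Collections.emptyList();
    }

    private static List<String> visit(String pkg, Map<String, Set<String>> deps,
                                      Set<String> done, Deque<String> path) {
        if (done.contains(pkg)) return Collections.emptyList();
        if (path.contains(pkg)) {
            // pkg is already on the current path: cut out the part that forms the cycle
            List<String> cycle = new ArrayList<>();
            boolean inCycle = false;
            for (Iterator<String> it = path.descendingIterator(); it.hasNext();) {
                String p = it.next();
                if (p.equals(pkg)) inCycle = true;
                if (inCycle) cycle.add(p);
            }
            cycle.add(pkg);
            return cycle;
        }
        path.push(pkg);
        for (String dep : deps.getOrDefault(pkg, Collections.emptySet())) {
            List<String> cycle = visit(dep, deps, done, path);
            if (!cycle.isEmpty()) return cycle;
        }
        path.pop();
        done.add(pkg);
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Illustrative edges only, not the actual camel-core dependencies.
        Map<String, Set<String>> deps = new LinkedHashMap<>();
        deps.put("model", Collections.singleton("processor"));
        deps.put("processor", Collections.singleton("impl"));
        deps.put("impl", Collections.singleton("model"));
        System.out.println(findCycle(deps));
    }
}
```

Removing any single edge of the reported cycle is enough to break it, which is why it helps to drill down to the weakest dependency first.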
So facing this situation I have several goals:
- Overall the goal is to have less coupling in camel between modules and higher cohesion inside a module
- Create a small API that is independent of the rest of camel-core. This API should contain everything a component developer needs to create a component
- Propose dependency rules for camel-core that make sure that no tangles will be created
- Split up camel-core into several projects with clear unidirectional dependencies
- Camel 2.9 should be highly compatible with Camel 2.8
- Camel 3.0 should get rid of deprecated packages but otherwise have only few incompatibilities
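One simple form such dependency rules could take is a strict layering: a package may only depend on packages in a lower layer. The sketch below assumes this form. The class name LayerRules and the layer order are hypothetical and are not the rules actually proposed for camel-core.

```java
import java.util.*;

// Hypothetical sketch of a layering rule: a package may only depend on packages in lower layers.
class LayerRules {

    // The layer order is an assumption for illustration, most independent package first.
    static final List<String> LAYERS = Arrays.asList("api", "spi", "support", "impl", "component");

    /** True if 'from' may depend on 'to', meaning 'to' sits in a strictly lower layer. */
    static boolean isAllowed(String from, String to) {
        int f = LAYERS.indexOf(from);
        int t = LAYERS.indexOf(to);
        if (f < 0 || t < 0) {
            throw new IllegalArgumentException("unknown package: " + from + " or " + to);
        }
        return t < f;
    }

    /** Returns every observed dependency, written "from -> to", that breaks the layering. */
    static List<String> violations(Map<String, Set<String>> observed) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : observed.entrySet()) {
            for (String to : e.getValue()) {
                if (!isAllowed(e.getKey(), to)) {
                    result.add(e.getKey() + " -> " + to);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> observed = new LinkedHashMap<>();
        observed.put("impl", Collections.singleton("api")); // allowed: points downwards
        observed.put("api", Collections.singleton("impl")); // violation: points upwards
        System.out.println(violations(observed));
    }
}
```

Because every allowed dependency points to a strictly lower layer, a cycle is impossible as long as no violation is reported: a cycle would need at least one edge pointing upwards.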
To reach these goals it is necessary to disentangle the current packages by doing refactorings.
I already started these refactorings in camel 2.9-SNAPSHOT. The first big steps were:
- refactor the management code to remove cycles inside the code. In 2.8 it contained API as well as implementation. This is nicely split up now
- Introduce a support package to contain base classes that implement the API and are expected to be extended by many other classes. They form some kind of extended API. I already moved some important classes there
So after these steps the cycles in camel-core are still present but are much weaker now. The first small cycle that has really been removed involved the main package, which is no longer referenced by any other package.
This improvement is also supported by the Structure101 report for the snapshot. It shows that the excess value dropped to 33,000 (22%) and the camel-core top level packages are now only 7% entangled. So this already shows a big drop in complexity. At the same time the current refactorings leave stub classes marked @Deprecated at the old places, so the upcoming camel 2.9.0 should still be highly compatible with camel 2.x.
Going forward I will continue to work on the refactoring with the goal to at least remove the dependency cycle from the camel API to the rest of camel. This would then allow us to have a separate camel-api jar that components could be based on.
If you want to dig into the structure of camel yourself you can get an evaluation version of Structure101 from Headway Software. If you are a committer in an Apache project you can get a free license of Structure101. Many thanks to Headway for sponsoring Apache projects in this way.
I would be very interested in discussing proposals for further refactorings and what the future architecture of Apache Camel should look like.
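The stub approach can be sketched roughly as follows. To keep the example in a single file, nested classes stand in for the two packages. The names NewServiceSupport and OldServiceSupport are hypothetical and are not taken from the camel code base.

```java
// Hypothetical sketch of a compatibility stub; all class names are illustrative.
class DeprecatedStubDemo {

    // Stands in for the class at its new home, for example in a support package.
    static class NewServiceSupport {
        private boolean started;

        public void start() {
            started = true;
        }

        public boolean isStarted() {
            return started;
        }
    }

    /**
     * Stands in for the class left behind at the old location.
     * It adds no behaviour and only keeps the old name compiling.
     *
     * @deprecated use {@link NewServiceSupport} instead
     */
    @Deprecated
    static class OldServiceSupport extends NewServiceSupport {
    }

    public static void main(String[] args) {
        // User code written against the old name keeps working unchanged.
        OldServiceSupport legacy = new OldServiceSupport();
        legacy.start();
        System.out.println(legacy.isStarted());
    }
}
```

Since the stub extends the moved class, code that still uses the old type gets the new implementation and only sees a deprecation warning at compile time. Once the deprecated stubs are dropped, the old location no longer needs to exist.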