Thursday, September 1, 2011

The irony of not testing

Some time ago I wanted to do something with jQuery. But our customer was still using a really obsolete version, 1.1.3.1, released back in 2008. At that time the latest stable version was 1.6, and the functionality I needed had been introduced in jQuery 1.3. As I had no intention of writing something that had already been written (and fixed multiple times over its existence), I asked the lead programmer of this project whether there was any chance of replacing jQuery with a newer version.

The answer was simple: "NO". As strange as it may sound, I kind of understood him, and it wasn't a surprise to me. It was just a sad realization of the truth. To change the jQuery library I would need to upload it into the core module, which was referenced by 9 other big modules that depended on it. And as NONE of these 9 modules had automated tests of their behavior, no one would know whether this change would introduce bugs (they already had workarounds and fixes for the old jQuery version, and god knows how those would interact with the latest version). So if you look at it: my small task/requirement could have killed all 9 modules that had been working nicely for at least 3 years (and it would require regression testing EVERYTHING, which was extremely costly).

The irony was that the lack of useful tests might force me into introducing custom code that was not mature enough (from a performance and memory-consumption point of view) and probably buggy (even with some testing), instead of just using standard code that has been proven over time and repaired/maintained constantly.

The lesson learned from this is: test your code. Without tests your code might become so big that the task of testing it will take you a whole year (because you need to learn all the business logic behind it to see that this strange behavior is the correct one). Without tests you might be unable to upgrade your existing libraries and will be forced to create custom workarounds for something that has already been done by somebody else. And in comparison to your code, the existing one might be more performant, more mature AND under active maintenance.

Sunday, July 24, 2011

Spring proxy - calling methods within the same service

One of the biggest issues of Spring AOP is that when you use its proxies to add some AOP functionality (like, for example, transactions or security), calls to a method within the same bean won't trigger the advised AOP functionality.

So if you have a service with two methods, where method A() has NO @Transactional annotation and method B() HAS a @Transactional annotation, and the non-transactional method A() calls the transactional method B() during its execution, then Spring won't start any transaction. This is because the Spring proxy will redirect the call to A() to the service object, but the call to B() won't be executed on the proxy (which knows how and when to start the transaction) but instead on the actual service object, which has no code to start the transaction (only a @Transactional annotation on the method B()).
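Why exactly the inner call bypasses the proxy is easier to see with a small self-contained sketch. This is not Spring code, just a plain JDK dynamic proxy standing in for the Spring AOP proxy (all names here are mine):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class SelfInvocationDemo {

    interface Service {
        void a();
        void b();
    }

    static class ServiceImpl implements Service {
        public void a() {
            b(); // internal call: goes straight to this.b(), bypassing any proxy
        }
        public void b() {
        }
    }

    // runs a() through a JDK dynamic proxy and returns which calls the "advice" saw
    static String run() {
        final StringBuilder advised = new StringBuilder();
        final Service target = new ServiceImpl();

        Service proxy = (Service) Proxy.newProxyInstance(
                Service.class.getClassLoader(),
                new Class<?>[] { Service.class },
                new InvocationHandler() {
                    public Object invoke(Object p, Method m, Object[] args) throws Throwable {
                        advised.append(m.getName()).append(";"); // the "advice" (think: open a transaction)
                        return m.invoke(target, args);
                    }
                });

        proxy.a(); // only this outer call passes through the proxy
        return advised.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // the advice fired for a() only, never for the inner b()
    }
}
```

The "advice" fires only for the outer call to a(); the internal this.b() never touches the proxy, which is exactly why the @Transactional annotation on B() has no effect in this scenario.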

To overcome this problem I have implemented a simple solution that injects the proxy of the current bean instance; you then execute advised methods through this instance proxy variable. All you have to do is use a @ThisInstance annotation and register a custom BeanPostProcessor:
<bean class="sk.yourpackage.ThisInstanceBeanPostProcessor"/>
After this you annotate a setter or field with the @ThisInstance annotation and Spring will inject a proxy instance of this bean as the setter parameter or field value (if no proxy is created, it will inject the actual unproxied service). For example:
@ThisInstance
private MyService thisInstance;
After this you change your code from:
this.B();
into
thisInstance.B();
And that is the whole configuration. It works for private, protected, ... methods and fields, and for beans with any scope.
 
This is the code of the annotation:
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.METHOD})
public @interface ThisInstance { }
and this is the code for bean post processor:
public class ThisInstanceBeanPostProcessor implements BeanPostProcessor, Ordered {

    private final Set<Class<? extends Annotation>> annotationTypes = new LinkedHashSet<Class<? extends Annotation>>();

    private int order = Ordered.LOWEST_PRECEDENCE;

    public ThisInstanceBeanPostProcessor() {
        this.annotationTypes.add(ThisInstance.class);
    }

    @Override
    public Object postProcessBeforeInitialization(Object bean, String beanName) throws BeansException {
        return bean;
    }

    @Override
    public Object postProcessAfterInitialization(Object bean, String beanName) throws BeansException {
        Object targetBean = getTargetBean(bean);

        injectCurrentInstance(targetBean, bean);

        return bean;
    }

    private Object getTargetBean(Object bean) {
        Object target = bean;

        if (target instanceof Advised) {
            target = ((Advised) target).getTargetSource();

            if (target instanceof TargetSource) {
                try {
                    target = ((TargetSource) target).getTarget();
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            }
        }

        return target;
    }

    private void injectCurrentInstance(Object targetBean, Object thisInstance) {
        if (annotationTypes.isEmpty()) {
            return;
        }

        Class<?> beanClass = targetBean.getClass();

        do {
            // for each interface - look for injection annotations
            for (Class<?> beanInterface : beanClass.getInterfaces()) {
                for (Method method : beanInterface.getMethods()) {
                    for (Class<? extends Annotation> annotationType : annotationTypes) {
                        if (method.getAnnotation(annotationType) != null) {
                            invokeMethod(targetBean, method, thisInstance);
                            break;
                        }
                    }
                }
            }

            // for each method - look for injection annotations
            for (Method method : beanClass.getDeclaredMethods()) {
                for (Class<? extends Annotation> annotationType : annotationTypes) {
                    if (method.getAnnotation(annotationType) != null) {
                        invokeMethod(targetBean, method, thisInstance);
                        break;
                    }
                }
            }

            // for each field - look for injection annotations
            for (Field field : beanClass.getDeclaredFields()) {
                for (Class<? extends Annotation> annotationType : annotationTypes) {
                    if (field.getAnnotation(annotationType) != null) {
                        setFieldValue(targetBean, field, thisInstance);
                        break;
                    }
                }
            }

            beanClass = beanClass.getSuperclass();

        } while (!Object.class.equals(beanClass));
    }

    private void invokeMethod(Object object, Method method, Object... values) {
        boolean isAccessible = method.isAccessible();

        try {
            method.setAccessible(true);
            method.invoke(object, values);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            method.setAccessible(isAccessible);
        }
    }

    private void setFieldValue(Object object, Field field, Object value) {
        boolean isAccessible = field.isAccessible();

        try {
            field.setAccessible(true);
            field.set(object, value);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            field.setAccessible(isAccessible);
        }
    }

    /**
     * Allows you to add a custom annotation type.
     *
     * @param annotationType custom annotation type
     */
    public void setAnnotationType(Class<? extends Annotation> annotationType) {
        this.annotationTypes.add(annotationType);
    }

    /**
     * Allows you to define custom annotation types.
     *
     * @param annotationTypes custom annotation types
     */
    public void setAnnotationTypes(Set<Class<? extends Annotation>> annotationTypes) {
        this.annotationTypes.clear();
        this.annotationTypes.addAll(annotationTypes);
    }

    @Override
    public int getOrder() {
        return order;
    }

    public void setOrder(int order) {
        this.order = order;
    }
}

Saturday, May 21, 2011

Some geecon 2011 notes

A week has passed and I have forced myself to put together some notes I scribbled during my stay there:

Java 7

What have we found: the good part is that Oracle is keeping to the schedule and there is a big chance that this year we will finally see a new Java version. I am still curious whether this will somehow change the so-often-heard claim that "Java is dead", but otherwise this is good. So what interesting functionality can we find in Java 7:
  • support for dynamic languages - so your Groovy, Scala, JRuby will once again be faster, better, cooler
  • Project Coin - so you can write something like this: List<String> list = new ArrayList<>();
  • fork-join - a concurrency library that is not bad, but I would maybe concentrate more on actor-based concurrency (for example Akka). You could try this library a long time ago, as it was available as jsr166y.jar
  • after many, many years you'll be able to use strings in a switch statement (although everybody knows that switch-case is bad and we should use the strategy pattern instead :D)
  • automatic closing of Closeable resources opened at the beginning of a try statement
  • one catch block for multiple exceptions
For me it was a disappointment that the collection usage simplification won't make it into Java 7 (although I knew this beforehand, I still had some hope).
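A few of the features listed above, combined in one compilable sketch (my own example, not from the conference):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class Java7Features {

    // strings in a switch statement
    static String classify(String language) {
        switch (language) {
            case "java":
            case "scala":
                return "jvm";
            default:
                return "other";
        }
    }

    // try-with-resources (the reader is closed automatically) plus a multi-catch block
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException | RuntimeException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        List<String> results = new ArrayList<>(); // the Project Coin diamond operator
        results.add(classify("java"));
        results.add(firstLine("line one\nline two"));
        System.out.println(results);
    }
}
```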

To see how this will work in action you can look at this quite old article.
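And since fork-join was mentioned: here is a minimal divide-and-conquer sum using the fork-join classes that ended up in java.util.concurrent (the example and its threshold value are mine):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinSum extends RecursiveTask<Long> {

    private static final int THRESHOLD = 1000;

    private final long[] values;
    private final int from;
    private final int to;

    ForkJoinSum(long[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) { // small enough: sum directly
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = (from + to) / 2;
        ForkJoinSum left = new ForkJoinSum(values, from, mid);
        ForkJoinSum right = new ForkJoinSum(values, mid, to);
        left.fork();                          // schedule the left half asynchronously
        return right.compute() + left.join(); // compute the right half, then combine
    }

    public static long sum(long[] values) {
        return new ForkJoinPool().invoke(new ForkJoinSum(values, 0, values.length));
    }
}
```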


Spring

Besides the presentation on the whole Spring ecosystem, we heard something about the long-awaited Spring 3.1. They added caching support (quite old information), as they had no more patience to wait for the never-coming Java standard. It should also be fully compatible with Java 7. It is possible that the execution and scheduling framework will use the new fork-join library. For me it was quite interesting how easily (with a few lines) you can integrate with many network and social solutions: email, XMPP, RSS/Atom, Twitter. Twitter can also be used for one-to-one communication. Spring also checks whether you have reached the Twitter message limit (the limit check works even when you are writing messages from multiple devices/platforms). We also learned that Twitter often makes incompatible changes, and as Spring relies on 3rd party libraries for this integration, they can't control how quickly those changes are adopted. This is also why they announced Spring Social (also quite old information), where all code will be written and adapted by their internal team. So by using Spring Social you should almost always be up to date with the current Twitter API. The last interesting thing (which some people still don't know about) is Spring Data, which simplifies working with JDBC, Hibernate, JPA and NoSQL databases.


Hadoop

This was quite a good introduction to the whole Hadoop ecosystem. From what I know, Hadoop is based on some research papers from Google. You can use it to execute tasks over massive amounts of data, where the task can be divided into smaller tasks and the results can then be combined to produce one global result. The heart of it all is HDFS (the Hadoop distributed file system), which is based on GFS (the Google file system). It presents a distributed file system layer positioned above your physical disks. On top of it operates the MapReduce framework. To write your custom map-reduce tasks/jobs you have to implement the interfaces Mapper and Reducer and extend the class MapReduceBase, plus define a job configuration (but you are not limited to Java, although it seems that Java is one of the most performant options). Another interesting module is HBase, which sits on top of HDFS and provides functionality equivalent to Google's Bigtable (from my point of view Hadoop is just one big loot stolen from Google that is provided as open source, but this is good). HBase can be used as input or output for MapReduce jobs. The good thing about Hadoop is that you can process tasks over massive new data without the risk that it will blow up because the data needs to be reindexed first (as you might see in standard database solutions). New information for me was that Cloudera provides a simplification of the installation process and paid business support (which can be a good argument for your customers).
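The Mapper/Reducer split can be illustrated without the Hadoop jars. The following is only the conceptual shape of a word count in plain Java, not Hadoop's actual API (which adds job configuration, input splits and distributed execution):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountShape {

    // "map" phase: emit a (word, 1) pair for every word in the input line
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (word.length() > 0) {
                pairs.add(new SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    // "reduce" phase: combine the emitted counts per word into one global result
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (Map.Entry<String, Integer> pair : pairs) {
            Integer soFar = counts.get(pair.getKey());
            counts.put(pair.getKey(), (soFar == null ? 0 : soFar) + pair.getValue());
        }
        return counts;
    }
}
```

In real Hadoop the map output is additionally partitioned and shuffled between nodes before the reduce phase, which is what makes it scale over HDFS.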


Code generation

There was a presentation of some frameworks that generate code from annotations. Lombok provides annotations to simplify your work: @Getter, @Setter, @Synchronized (even when it seems stupid, it does it better than plain synchronized), @EqualsAndHashCode, ... The annotations are replaced by code during the compilation process (to make this work, the process must be implemented for each platform: Windows, Linux, ...). Another solution was Spring Roo, which uses AspectJ and *.aj files (which you probably already know). There were also some interesting annotations from Groovy: @Delegate (when used on a field, the encapsulating class gets all its methods, which delegate to that object; quite cool, and from what I know there is no such thing in Java), @Lazy (lazy initialization), @Singleton (a proper singleton implementation). Quite new for me was contract-driven programming, where you can define a precondition that must be met so that a method can execute without exception, and a postcondition; to be honest, I am not sure what happens if this postcondition is violated (as I don't expect it to do some rollback). The sad thing is that the IDE support is sometimes not good (for example for Lombok in IDEA). I've also learned a Java command that was new to me, javap, which shows you the skeleton of a compiled class (everything except method implementation details).


Refactoring

I'll just summarize some interesting practices (different from the ones so often heard):
  • set a timer to 2 hours and start your work. review your work. is it better -> commit it. is it worse -> revert it
  • do incremental changes
  • the code with the most revisions is the best candidate for test coverage
  • if you want to use new libraries, but your project tech lead won't allow you to put them into production, put them into test code :D
  • have fun, for example give some small awards to those who are most active in code maintenance (like a picture on the wall "maintainer of the week", or badges "100% test code coverage guru")
  • a good argument for unit tests: when you write them, other colleagues can't *** up your code


New J2EE

I have seen only a small glimpse of it, but it seems very readable and simple. Although some Spring guys have been making fun of J2EE, as from their point of view (if we omit GlassFish) nobody supports the whole standard (even though it was released more than a year ago).


GridGain

After some live demos I left the presentation with the impression that this is a cool framework for distributed programming in Scala (at least from the maintenance point of view). The distribution of code is almost automatic (not much more than compiling the code is required), new nodes execute jobs instantly after they start, and there seemed to be no complex configuration. So the main thing you have to focus on is your business logic (MapReduce/Actors).


Neo4j

This graph-oriented database (yeees, you're right: NoSQL) was presented by the most charismatic presenter of the whole conference. Compared to other NoSQL databases it has some interesting features. It supports transactions. As it is a graph-oriented database, it stores objects and maps the relations between them. There is also a query language defined for it, so you don't have to figure out how to retrieve the data from a key-value store, and there is no need to write map-reduce functions. It is good for data retrieval in situations where a normal database would blow up because of the required joins (also, the speed of querying seems to be unaffected by the amount of data, which is quite the opposite of what you see in relational databases). So, for example, working with hierarchies is a great use case for Neo4j (you can also define how deep you would like to go, to prevent an out-of-memory error once you retrieve too much data). There is heavy use of caching (at this point I am not sure how the first query, before anything is cached, would perform). But there are also some things you must be aware of. Neo4j doesn't scale well (across multiple nodes) when your data is heavily interconnected, as in this case there is great traffic between the nodes, which slows down the whole query process. To assure great scalability you should be able to split your data into separate data buckets, where each node would host one or more complete data buckets.


Hibernate

From my point of view it is really hard to say anything new about Hibernate. But there was a good summary presentation about the common pitfalls/mistakes.


Node.js

This was part of the workshop day. It contained a small introduction to node.js (if you review the presentation on the node.js page you'll get almost the same information), plus a small but clear explanation of JavaScript prototyping. After this we downloaded a great practical example (a mortal combat client/server written in node.js; the code can be found here). Once we got it running we were asked to implement the login screen. Oh, and yes, for those who would like to try it out, I would like to quote one piece of advice that was given to me: "are you trying to run it on cygwin? forget it! It's no good."

my notes:
- if you hate JavaScript I advise you to see this presentation (which has nothing to do with geecon :))
- node.js has good integration with HTML5 WebSockets (if you don't know why they are the new hype, read this article)


So in the end I would like to give a small summary. When I left geecon I had some mixed feelings. From my point of view it is hard to say anything new about Java, as most of the interesting stuff is now happening in different languages: Groovy, Scala, Ruby, JavaScript. But somehow they have managed to make it quite an interesting conference. The only thing I was maybe missing were some electric plugs (as they are normally not present in multiplex cinemas) (yeah I know, I should have bought a mac :() but otherwise it was OK.

notes for me:
- never sit down. you'll fall asleep
- don't eat polish eggs

Monday, May 16, 2011

Some hints I have learned about ActiveVOS (BPMS solution) so far

If you expect many long-running processes (or, what is worse, human tasks), set the system logging level to none and define logging for each of your processes separately
Process logs can take up a lot of your database storage, especially the logs for human tasks.
Not that any single log is big on its own: one log record takes only a few bytes. The problem is that there are many of them.
From our measurements we had approximately 5500 log records for one human task, which could consume almost 1MB of logical data storage. As the system processes use the default logging level, setting it to none will rid you of this problem (as you'll almost never need the logs from system processes; even the support never wants them).

Try to avoid architectural designs based on manual process fixing in case of failure
One of the biggest advantages of ActiveVOS is that you can set the logging to full level for any of your processes, and once there is a failure you can return to any point of it, fix some data and continue from there as if nothing had happened.
The disadvantage is that once you overuse this "pattern", the maintenance cost of your solution will heavily increase, as you'll always need some additional guys who will fix your failed processes from the moment they arrive at the office (but that of course depends on the number of processes you're dealing with).
I am not saying that you should completely avoid this pattern, but I suggest you always check whether there isn't some additional automatic solution.

Be prepared that some things won't work
Nothing is perfect, and ActiveVOS (even though it provides a cool solution) has its flaws. So be prepared that in some cases you'll end up with a workaround. To name some that we needed, here is a short list:
  • The XQuery constructs "instance of" and "typeswitch" do not work properly once you want to check an element type that comes from a hierarchy of types. The problem is that the underlying Saxon library does not provide this functionality. The same goes for functions like "rename".
  • If you're using asynchronous calls with WS-Addressing, then the timeout policy is not applied if you define it on a reply activity.
  • If multiple WSDL files in one process use the same namespace, they must be merged. I don't know how it is in newer versions (we last used this feature in version 6.2), but eventing can show some unexpected behavior once used in a clustered environment (but as said, this might already be fixed).
But don't be afraid, as these are minor issues for which you can surely find a workaround, and they might be fixed over time.

Don't use human tasks as mere notifications for a FINAL task that needs to be done
From a colleague's project I know that people forget to close these FINAL TASK "notifications" once the work is finished, as for them it's not that important (they already have a new notification/task to take care of).
Even the management doesn't care, as everything has been done/finished, so (from the business point of view) who cares that the process was not closed.
This creates not only confusion about which tasks are finished and which are not (sure, the user knows), but also heavy usage of the database, as the unfinished system processes that run the human tasks can consume much of the database storage and can't be deleted if we want to decrease the database size.

Try to get all your important business data out of your processes and into a separate database before the process ends
Currently there is no archiving mechanism (that I would know of) that would allow you to archive only a part of your processes (for example the completed ones) and leave the others intact.
The only thing you can do is archive your ActiveVOS database as a whole.
And then restore it as a whole.
This can be very tricky, as it prohibits any incremental recoveries or backups (you can write them on your own, but then you'll lose the support, and which customer would be willing to lose that), and creating a backup hardly works when you have a great mix of completed and running processes. You'll probably hit this problem once you have many finished processes, as they can consume a lot of database storage (in one big project we're already over 600GB !!! of Oracle database). The best way to prepare yourself for this is to create some persisting activities in your processes that gather all the meaningful information and store it in a separate database. This way, once you encounter problems with database space, you can easily delete some of your old completed processes.
Oh, and don't forget to put them in the first version of your processes, as your deletes can only be applied to processes "older than" (so forget defining a time range from - to) and you wouldn't be happy if the oldest processes, the ones without the persistence mechanism, had to be deleted/sacrificed for the sake of the database (as you must delete the newer versions).

Friday, May 6, 2011

Javascript localization hell

Not long ago I visited hell. The hell was called JavaScript localization. The headache started when I wanted to sort some strings. To do so I needed to compare them first. This would seem a simple task if you were an English gentleman with no fear of diacritics. But as I live and work in a country where diacritics are ever present, I needed to process them properly. The method localeCompare worked differently in every browser. Also, when the method indexOf was used to search through text with diacritics using a search expression without diacritics, it did not work properly.

So I once again ended up with something I don't like very much: I wrote a custom JavaScript util that can be used like this:
localeHelper.diacritiqueComparison('čerešne', 'citron'); // = 1
localeHelper.diacritiqueComparison('čerešne', 'hrozno'); // = -1

// this is case insensitive
localeHelper.containsDiacritiqueText('štart', 'st'); // = true
localeHelper.containsDiacritiqueText('štart', 'št'); // = true
localeHelper.containsDiacritiqueText('štart', 'tar'); // = true
localeHelper.containsDiacritiqueText('trend', 'tr'); // = true
localeHelper.containsDiacritiqueText('trend', 'ťr'); // = false (as we explicitly want the symbol 'ť' and not 't')
localeHelper.containsDiacritiqueText('trend', 'b'); // = false (symbol 'b' is not present)
The final code looked like this:
function LocaleHelper() { // constructor

    var i;

    // source: http://en.wikipedia.org/wiki/Alphabets_derived_from_the_Latin
    // supported languages:
    // austro-bavarian, belarusian, croatian, czech, dutch, estonian, finnish, french, german, hungarian, irish, italian,
    // latvian, lithuanian, polish, portuguese, romanian, slovak, sorbian, spanish, swedish, turkish
    //
    // todo: not sure how to or if I should process the german letter 'ß'
    var localUpperVowelList = "ÁÀÂÄĂĀÃÅĄÆÉÈĖÊËĚĒĘÍÌİÎÏĪĮÓÒÔÖÕŐŒÚÙÛÜŬŪŰŮŲÝŸ";
    var latinUpperVowelList = "AAAAAAAAAAEEEEEEEEIIIIIIIOOOOOOOUUUUUUUUUYY";

    var localLowerVowelList = "áàâäăāãåąæéèėêëěēęıíìîïīįóòôöõőœúùûüŭūűůųýÿ";
    var latinLowerVowelList = "aaaaaaaaaaeeeeeeeeiiiiiiiooooooouuuuuuuuuyy";

    var localUpperConsonantList = "ĆČÇĎĐĞĢĶĹĻŁĽŃŇÑŅŔŘŚŠŞȘŤŢṬŹŻŽ";
    var latinUpperConsonantList = "CCCDDGGKLLLLNNNNRRSSSSTTTZZZ";

    var localLowerConsonantList = "ćčçďđğģķĺļłľńňñņŕřśšşșťţṭźżž";
    var latinLowerConsonantList = "cccddggkllllnnnnrrsssstttzzz";

    this.charMap = {};
    for (i = 0; i < localUpperVowelList.length; i++) {
        this.charMap[localUpperVowelList.charAt(i)] = latinUpperVowelList.charAt(i);
    }
    for (i = 0; i < localLowerVowelList.length; i++) {
        this.charMap[localLowerVowelList.charAt(i)] = latinLowerVowelList.charAt(i);
    }
    for (i = 0; i < localUpperConsonantList.length; i++) {
        this.charMap[localUpperConsonantList.charAt(i)] = latinUpperConsonantList.charAt(i);
    }
    for (i = 0; i < localLowerConsonantList.length; i++) {
        this.charMap[localLowerConsonantList.charAt(i)] = latinLowerConsonantList.charAt(i);
    }
}


LocaleHelper.prototype = {

    removeCharDiacritique : function(charToProcess) {
        var result = this.charMap[charToProcess];
        if ((result == undefined) || (result == null)) {
            result = charToProcess;
        }

        return result;
    },

    localeCharCompare : function(charA, charB) {
        var newCharA = this.removeCharDiacritique(charA);
        var newCharB = this.removeCharDiacritique(charB);

        return (newCharA == newCharB) ? 0 : ((newCharA < newCharB) ? -1 : 1);

        // removed: doesn't work on every browser
        // return charA.localeCompare(charB);
    },

    isLatinLetter : function(character) {
        return (character >= 'a' && character <= 'z') || (character >= 'A' && character <= 'Z');
    },

    // case sensitivity is used only when the words have the same letters (they are read the same way)
    diacritiqueComparison : function(textA, textB) {
        // todo: note: in the Lithuanian alphabet the 'y' character comes just before 'j' - so this algorithm won't work properly there

        var result = 0;

        var caseDiff = 0; // difference in case sensitiveness
        var minLength = Math.min(textA.length, textB.length);
        for (var i = 0; i < minLength; i++) {
            var charA = textA.charAt(i);
            var charB = textB.charAt(i);
            var lowerA = charA.toLocaleLowerCase();
            var lowerB = charB.toLocaleLowerCase();

            result = this.localeCharCompare(lowerA, lowerB);
            if (result == 0 && lowerA != lowerB) {
                result = (lowerA < lowerB) ? -1 : 1;
            }

            if (result == 0) {
                if (caseDiff == 0 && charA != charB) { // the first (leftmost) difference in case is the only important one
                    caseDiff = (charA < charB) ? -1 : 1;
                }
            } else {
                break;
            }
        }

        if (result == 0) {
            if (textA.length != textB.length) {
                result = (textA.length < textB.length) ? -1 : 1;
            } else {
                result = caseDiff; // if the strings are identical let the case sensitive difference decide
            }
        }

        return result;
    },

    containsDiacritiqueText : function(fullText, searchText) {
        var textA = fullText;
        var textB = searchText;

        var result = false;

        if (textB.length == 0) {
            result = true;
        } else if (textA.length >= textB.length) {
            for (var i = 0; i < textA.length - textB.length + 1; i++) {
                var found = true;

                for (var j = 0; j < textB.length; j++) {
                    var charA = textA.charAt(i + j).toLocaleLowerCase();
                    var charB = textB.charAt(j).toLocaleLowerCase();

                    if (charA != charB) {
                        if (this.localeCharCompare(charA, charB) != 0) {
                            found = false;
                            break;
                        } else if (!this.isLatinLetter(charB)) {
                            found = false;
                            break;
                        }
                    }
                }

                if (found === true) {
                    result = true;
                    break;
                }
            }
        }

        return result;
    }
};


var localeHelper = new LocaleHelper();

Wednesday, May 4, 2011

Comparing strings with diacritics in JavaScript

Today I got an assignment from a customer to write a JavaScript method for sorting texts. This would have been quite a simple task if the texts were English. Unfortunately, our Slovak (and likewise Czech) alphabet has one nasty thing: diacritics.

If you try to sort these texts:
  • "cudzí", "čučoriedka", "ťava", "tŕň", "trstina"
standard text comparison gives you this sequence:
  • "cudzí", "trstina", "tŕň", "čučoriedka", "ťava"
And that is not exactly the best ordering (by default uppercase letters without diacritics come first, followed by lowercase letters without diacritics, then uppercase letters with diacritics and finally lowercase letters with diacritics).

Plain comparison is therefore not sufficient. So I tried to find something more suitable and came across quite a nice method:
  • textA.localeCompare(textB)
Using it I achieved a slightly better, but still not sufficient result:
  • "čučoriedka", "cudzí", "ťava", "tŕň", "trstina"
The problem with this method is that it completely ignores diacritics, so it never puts a character with a diacritic after the same character without one; these characters have identical standing in the comparison (which caused the character 'č' to end up before 'c', 'ť' before 't' and 'ŕ' before 'r').

On top of that came problems with capital letters. I tried to find some suitable solution on the internet, but unfortunately none was sufficient (plus I didn't feel like enumerating all the characters with diacritics: I am a perfectionist and I wanted it to work for all languages derived from the Latin alphabet).

And so I arrived at something that all enthusiastic programmers normally do (and which is usually considered a mistake): I'll write the method myself. And this is what came out in the end:
function diacritiqueComparison(textA, textB) {
    var result = 0;

    var caseDiff = 0; // difference in case sensitiveness
    var minLength = Math.min(textA.length, textB.length);
    for (var i = 0; i < minLength; i++) {
        var charA = textA.charAt(i);
        var charB = textB.charAt(i);
        var lowerA = charA.toLocaleLowerCase();
        var lowerB = charB.toLocaleLowerCase();

        result = lowerA.localeCompare(lowerB);
        if (result == 0 && lowerA != lowerB) {
            result = (lowerA < lowerB) ? -1 : 1;
        }

        if (result == 0) {
            if (caseDiff == 0 && charA != charB) { // the first (leftmost) difference in case is the only important one
                caseDiff = (charA < charB) ? -1 : 1;
            }
        } else {
            break;
        }
    }

    if (result == 0) {
        if (textA.length != textB.length) {
            result = (textA.length < textB.length) ? -1 : 1;
        } else {
            result = caseDiff; // if the strings are identical let the case sensitive difference decide
        }
    }

    return result;
}
After using it I finally achieved quite satisfactory results:
  • "cudzí", "čučoriedka", "trstina", "tŕň", "ťava"
An additional note from the following day: damn the whole JavaScript. The localeCompare method works differently in every browser; it even seems to me that there are differences between IE versions (thanks to which I had to program even this method myself; and yes, in the end I did enumerate the characters with diacritics :(). Where will all this end.

Thursday, April 28, 2011

XSS - prevent path manipulation

A few days ago I was asked by a customer to add validation to a file download functionality.

They needed to verify that a file requested from the web page is in the working directory or in one of its subdirectories, and to log when it isn't. This is the final code (if you know of a better/existing solution, just let me know):

public static String correctFilePath(String filePath, boolean allowSubDirAccess) {
    String newFilePath = filePath;

    if (StringUtils.isNotBlank(newFilePath)) {
        String normalizedFilePath = new File(newFilePath).getPath();

        if (StringUtils.startsWith(normalizedFilePath, File.separator)) {
            log.error("attempt to access the root directory: " + filePath);
        }

        List<String> allowedPathElements = new ArrayList<String>();
        for (String pathElement : StringUtils.split(normalizedFilePath, File.separator)) {
            pathElement = pathElement.trim();
            if (!".".equals(pathElement)) {
                if (pathElement.contains(":")) {
                    log.error("attempt to access the root directory: " + filePath);
                } else if ("..".equals(pathElement)) {
                    if (allowedPathElements.size() > 0) {
                        allowedPathElements.remove(allowedPathElements.size() - 1);
                    } else {
                        log.error("attempt to access parent of working directory: " + filePath);
                    }
                } else {
                    allowedPathElements.add(pathElement);
                }
            }
        }

        if (!allowSubDirAccess) {
            if (allowedPathElements.size() > 1) {
                log.error("attempt to access files not located in working directory: " + filePath);
            }

            // guard against inputs like "/" that normalize to no elements at all
            newFilePath = allowedPathElements.isEmpty() ? "" :
                    allowedPathElements.get(allowedPathElements.size() - 1);
        } else {
            newFilePath = StringUtils.join(allowedPathElements, File.separator);
        }
    }

    return newFilePath;
}
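Since Java 7, much of this normalization can be delegated to java.nio: Path.normalize collapses "." and ".." segments, after which an absolute path or a leftover ".." prefix reveals a traversal attempt. A minimal sketch of that standard-library route (not the customer's code, and without the logging concerns above; `isInsideWorkingDir` is a name of my own choosing):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathCheck {

    /** Returns true when the requested path stays inside the working directory. */
    public static boolean isInsideWorkingDir(String requested) {
        Path normalized = Paths.get(requested).normalize();
        // An absolute path, or a path that still starts with ".." after
        // normalization, would escape the working directory.
        return !normalized.isAbsolute() && !normalized.startsWith("..");
    }

    public static void main(String[] args) {
        System.out.println(isInsideWorkingDir("docs/../files/report.pdf")); // true
        System.out.println(isInsideWorkingDir("../../etc/passwd"));         // false
    }
}
```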

Wednesday, January 26, 2011

Trouble with Hibernate proxies

In the following article I will describe one of the cases in which a problematic Hibernate proxy can be created.

It is actually quite simple. When you have a large tree structure, you don't always want to load every object it contains. You want some parts to be fetched only when they are needed, or fetched by specific selects (for example using LEFT JOIN FETCH).
To achieve this, lazy loading (FetchType.LAZY) is defined on some associations/fields, so they are loaded from the database only when actually needed. And sometimes someone does this on a field whose class is the root of an inheritance hierarchy (this is the main stumbling block). For example, a field of type Animal, from which the classes Cat, Dog and Mouse are derived, is marked this way.

When you load an entity that has a lazy-loaded field, Hibernate creates a proxy for that field (it does this for all lazy-loaded fields). Only when some getter/setter is called on the proxy does Hibernate run the mechanism that fetches all the data from the database and creates the (so far unloaded) object. So the field now points to the proxy, and the proxy in turn points to (wraps) the actually loaded object we expected. All getter and setter calls are then delegated through the proxy to this object.

And now comes the interesting part: the whole reason for lazy loading is to avoid unnecessary selects into tables whose data is never used. That also means no select is executed for the lazy-loaded field itself. And here is the important thing: at the time the proxy is created, Hibernate does not know the actual type of the object behind it, because it has not run the select that would tell it the object is, say, a Dog. Since it cannot know whether it will be a Dog, Cat, Mouse or Animal, Hibernate creates a proxy of the same type as declared on the field. In our case it creates a proxy of type Animal. Even if an object of type Dog is later loaded behind it, instanceof Dog on the proxy returns false. The reason is that although the proxy wraps a Dog, the proxy itself is of type Animal (and our field references the proxy, not the concrete object behind it). By the way, just as a curiosity: if Animal, Dog, Mouse and Cat were interfaces, the proxy would implement ALL of them (so instanceof would return true for any of them).
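The interface curiosity can be demonstrated with the JDK's own dynamic proxies. Note the contrast: Hibernate proxies entity classes with bytecode-generated subclasses of the declared type, which is exactly why the class-based case fails; this self-contained sketch only illustrates the interface half of the story:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyDemo {

    interface Animal { String name(); }
    interface Dog extends Animal { }

    public static void main(String[] args) {
        // Trivial handler: every delegated method call just returns "rex".
        InvocationHandler handler = (p, method, methodArgs) -> "rex";

        // A JDK proxy implementing both interfaces at once.
        Object proxy = Proxy.newProxyInstance(
                ProxyDemo.class.getClassLoader(),
                new Class<?>[] { Animal.class, Dog.class },
                handler);

        System.out.println(proxy instanceof Animal); // true
        System.out.println(proxy instanceof Dog);    // true
        // With concrete classes, a generated subclass of Animal could never
        // also be an instanceof Dog, which is the problem described above.
    }
}
```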

So, just one more thing and we are done :D.

p.s.: this problem is more or less caused by bad design
p.s.2: plus this is a specific thing that cannot be shown by an example, because from an example you would not see that any problem exists at all.

So, let's continue. I hope you now understand the reason why Hibernate creates such proxies?

And now comes another interesting thing: Hibernate has one key rule. Once an entity has been loaded into a session (becomes managed), that entity must be represented (within that session) by one single object. In short, if you ran the same select for the same entity twice within one session, you would get the identical object both times. Not an object with the same field values, but EXACTLY THE SAME object. A change made to the object from the first select is therefore visible in the object from the second (since it is one object). This also means that when the entity is represented by a proxy, any select for it will (within that session) return that proxy too. Which already smells a bit like a problem.

The following situation can occur: someone loads an object that contains a reference to an entity of type Animal. And since this reference/field is lazy-loaded, a proxy is created for the entity. Let's say method A did this. Later, somewhere else in the code (but still within the same transaction/session), method B is called. Method B (written by someone other than the author of method A) queries this Animal entity. Hibernate, however, returns the proxy, behind which it now loads the concrete Dog object (since both method A and method B asked for the same entity, it MUST be the same object). Unfortunately, the proxy is of type Animal, and since we need to modify Dog's fields, we have reached a dead end. Worst of all, in this case not even a LEFT JOIN FETCH on the Animal will help.

This is probably the most common way this problem manifests itself (especially when several people work on the code). You load an object that you know is of a certain type, but all you get is a proxy of the superclass type, and you really have no idea why. And maybe only by chance do you discover that some earlier code by somebody else (perhaps validation code) caused this entity to be stored in the Hibernate session as a proxy, and on every subsequent query Hibernate returns nothing but that proxy.

Done :D

Well, almost. One more problem remains. Since we need to actually work with this object, the proxy gets in our way. Such a problem is often discovered at a stage of the project where a more radical redesign is itself a problem, so it has to be solved in some other way.

Three basic solutions come to mind, the first two of which are almost ugly:
  1. load and process the entity in a new session/transaction (an unrealistic solution for most cases)
  2. evict the loaded managed entity from the session cache
    • by evicting all objects from the session cache (a slightly drastic solution)
    • or by loading the problematic object, evicting it from the session cache and loading it again (a slightly silly solution)
  3. use a util class that takes care of it (my preferred variant):

public class ProxyUtil {

    @SuppressWarnings("unchecked")
    public static <T> T deproxy(final T object) {
        if (object == null) {
            return null;
        }
        return (object instanceof HibernateProxy) ?
                (T) ((HibernateProxy) object).getHibernateLazyInitializer().getImplementation() :
                object;
    }
}

This utility takes any object as input. If the object is a Hibernate proxy, it returns the object stored behind it; otherwise it returns the object unchanged.

And now that really is all.