Saturday, May 21, 2011

Some geecon 2011 notes

A week has passed and I have finally forced myself to put together the notes I scribbled during my stay there:

Java 7

What have we found: the good part is that Oracle is keeping to the schedule and there is a big chance that this year there will finally be a new Java version. I am still curious whether this will somehow change the so often repeated claim that "java is dead", but otherwise this is good. So what interesting functionality can we find in Java 7:
  • support for dynamic languages - so your Groovy, Scala, JRuby will once again be faster, better, cooler
  • project coin: so you can write something like this: List<String> list = new ArrayList<>();
  • fork-join - a concurrency library that is not bad, but I would maybe concentrate more on concurrency via actors (for example Akka). You could have tried this library a long time ago, as it was available as jsr166y.jar
  • after many, many years you'll be able to use strings in a switch statement (although everybody knows that switch-case is bad and we should use the strategy pattern instead :D)
  • automatic closing of Closeable resources opened at the beginning of a try statement (try-with-resources)
  • one catch block for multiple exceptions
For me it was a disappointment that the collection usage simplification won't make it into Java 7 (although I knew this before, I still had some hope).
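
To make the list above a bit more concrete, here is a minimal sketch of the new syntax; the class and the notes.txt file name are made up for illustration, and it only combines the diamond operator, strings in switch, try-with-resources and multi-catch:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class Java7Sketch {

    public static void main(String[] args) {
        // project coin: the diamond operator
        List<String> talks = new ArrayList<>();
        talks.add("geecon");

        // strings in a switch statement
        switch (talks.get(0)) {
            case "geecon":
                System.out.println("conference note");
                break;
            default:
                System.out.println("something else");
        }

        // try-with-resources: the reader is closed automatically,
        // and one catch block handles multiple exception types
        try (BufferedReader reader = new BufferedReader(new FileReader("notes.txt"))) {
            System.out.println(reader.readLine());
        } catch (IOException | RuntimeException e) {
            System.out.println("could not read the notes: " + e.getMessage());
        }
    }
}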

To see how this will work in action, you can look at this quite old article.


Spring

Besides the presentation on the whole Spring ecosystem, we heard something about the long awaited Spring 3.1. They added caching support (quite old information by now), as they ran out of patience waiting for the never coming Java standard. It should also be fully compatible with Java 7, and it is possible that the execution and scheduling framework will use the new fork-join library. For me it was quite interesting how easily (with a few lines) you can integrate with many network and social solutions: email, XMPP, RSS/Atom, Twitter. Twitter can also be used for one to one communication, and Spring even checks whether you have reached the Twitter message limit (the check works even when you are writing messages from multiple devices/platforms). We also learned that Twitter often makes incompatible changes, and as Spring relies on 3rd party libraries for this integration, it can't control how fast those changes are adopted. This is also why they announced Spring Social (also quite old information), where all the code will be written and adapted by their internal team, so by using Spring Social you should almost always be up to date with the current Twitter API. The last interesting thing (which some people still don't know about) is Spring Data, which simplifies working with JDBC, Hibernate, JPA and NoSQL databases.
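
To illustrate the caching support mentioned above, here is a minimal sketch of Spring 3.1 declarative caching, assuming a CacheManager and <cache:annotation-driven/> are configured elsewhere; the service class and the cache name "tweets" are invented for this example:

import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class TimelineService {

    // the result of this expensive call is stored in the "tweets" cache;
    // subsequent calls with the same messageId are served from the cache
    @Cacheable("tweets")
    public String loadMessageText(long messageId) {
        return callRemoteApi(messageId);
    }

    private String callRemoteApi(long messageId) {
        // stand-in for the real (slow) integration call: remote Twitter API, database, ...
        return "message number " + messageId;
    }
}

The first call with a given messageId does the expensive work; repeated calls with the same argument come straight from the cache.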


Hadoop

This was quite a good introduction into the whole Hadoop ecosystem. From what I know, Hadoop is based on some research papers from Google. You can use it to execute tasks over massive amounts of data, where the task can be divided into smaller tasks and the results can then be combined to produce one global result. The heart of it all is HDFS (Hadoop Distributed File System), which is based on GFS (Google File System). It presents a distributed file system layer positioned above your physical disks. On top of it operates the MapReduce framework. To write your custom map-reduce tasks/jobs you implement the Mapper and Reducer interfaces and extend the MapReduceBase class, plus define a job configuration (but you are not limited to Java - although it seems that Java is one of the most performant options). Another interesting module is HBase, which lies on top of HDFS and provides functionality equivalent to Google's Bigtable (from my point of view Hadoop is just one big loot stolen from Google that is provided as open source - but this is good). HBase can be used as input or output for MapReduce jobs. The good thing about Hadoop is that you can process tasks over massive new data without the risk that it will blow up because it needs to be reindexed first (as you might see in standard database solutions). New information for me was that Cloudera provides a simplified installation process and paid business support (which can be a good argument for your customers).
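
To give an idea of how such a job looks, here is the classic word-count sketch written against the old mapred API mentioned above (implement Mapper and Reducer, extend MapReduceBase); the class names are mine, and the JobConf that wires the two classes together and points to the HDFS input/output paths is omitted:

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCount {

    public static class WordMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            // emit (word, 1) for every word on the input line
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, ONE);
            }
        }
    }

    public static class WordReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            // sum all the ones emitted for this word
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }
}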


Code generation

There was a presentation of some frameworks that generate code from annotations. Lombok provides annotations to simplify your work: @Getter, @Setter, @Synchronized (even when it seems stupid, it does it better than plain synchronized), @EqualsAndHashCode, ... The annotations are replaced by code during the compilation process (to make this work, the processing must be written for each platform: Windows, Linux, ...). Another solution was Spring Roo, which uses AspectJ and *.aj files (which you probably already know). There were also some interesting annotations from Groovy: @Delegate - when used on a field, the encapsulating class gets all its methods and they delegate to that object (quite cool - as far as I know there is no such thing in Java), @Lazy - lazy initialization, @Singleton - a proper singleton implementation. Quite new for me was contract driven programming, where you define a precondition that must be met so that a method can execute without an exception, and a postcondition - to be honest I am not sure what happens if this postcondition is violated (as I don't expect it will do some rollback). The sad thing is that the IDE support is sometimes not good (for example for Lombok in IDEA). I've also learned a new java command: javap, which shows you the skeleton of a compiled class (everything except method implementation details).
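
A minimal Lombok sketch of the annotations mentioned above (the ConferenceNote class and its fields are invented for illustration):

import lombok.EqualsAndHashCode;
import lombok.Getter;
import lombok.Setter;
import lombok.Synchronized;

// @Getter/@Setter on the class generate accessors for all fields,
// @EqualsAndHashCode generates equals() and hashCode() from the fields
@Getter
@Setter
@EqualsAndHashCode
public class ConferenceNote {

    private String topic;
    private String speaker;
    private int rating;

    @Synchronized
    public void rate(int newRating) {
        // @Synchronized locks on a generated private lock object
        // instead of on "this", which is considered safer
        this.rating = newRating;
    }
}

Running javap on the compiled class should show the generated getters, setters, equals and hashCode, even though none of them appear in the source.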


Refactoring

I'll just summarize some interesting practices (different from the ones so often heard):
  • set a timer to 2 hours and start your work. Review your work: is it better -> commit it; is it worse -> revert
  • do incremental changes
  • the code with most revisions is the best candidate for test coverage
  • if you want to use new libraries but your project tech lead won't allow you to put them into production, put them into the test code :D
  • have fun, for example give small awards to those who are most active in code maintenance (like a picture on the wall for the "maintainer of the week", or badges such as "100% test code coverage guru")
  • a good argument for unit tests: when you write them, other colleagues can't *** up your code


New J2EE

I have seen only a small glimpse of it, but it seems very readable and simple. Although some Spring guys were making fun of it, as from their point of view (if we omit Glassfish) nobody supports the whole standard (even though it has been released for more than a year now).


GridGain

After some live demos I left the presentation with the impression that this is a cool framework for distributed programming in Scala (at least from the maintenance point of view). The distribution of code is almost automatic (not much more than compiling the code is required), new nodes execute jobs instantly after they start, and there seemed to be no complex configuration. So the main thing you have to focus on is your business logic (MapReduce/Actors).


Neo4j

This graph oriented database (yeees, you're right: NoSQL) was presented by the most charismatic presenter of the whole conference. Compared to other NoSQL databases it has some interesting features. It supports transactions. As it is a graph oriented database, it stores objects and the relations between them. There is also a query language defined for it, so you don't have to figure out how to retrieve the data from your key-value store, and there is no need to write map-reduce functions. It is good for data retrieval in situations where a normal database would blow up because of the required joins (also, the speed of querying seems to be unaffected by the amount of data, which is quite the opposite of what you see with relational databases). So for example working with hierarchies is a great candidate for Neo4j (you can also define how deep you want to go, to prevent an out of memory error when you would otherwise retrieve too much data). There is heavy use of caching (at this point I am not sure how the very first query, before anything is cached, would perform). But there are also some things you must be aware of. Neo4j doesn't scale well (across multiple nodes) when your data is heavily interconnected, as in that case there is heavy traffic between the nodes that slows down the whole query. To get good scalability you should be able to split your data into separate data buckets, where each node would host one or more complete buckets.
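
For illustration, a minimal sketch of the embedded Java API as it looked around the Neo4j 1.x releases (the store path, property names and relationship type are made up): two nodes connected by a relationship, all inside a transaction:

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class Neo4jSketch {

    public static void main(String[] args) {
        GraphDatabaseService graphDb = new EmbeddedGraphDatabase("target/geecon-db");

        Transaction tx = graphDb.beginTx();
        try {
            Node parent = graphDb.createNode();
            parent.setProperty("name", "root department");

            Node child = graphDb.createNode();
            child.setProperty("name", "child department");

            // relationships are first-class citizens, no join table needed
            parent.createRelationshipTo(child, DynamicRelationshipType.withName("PARENT_OF"));

            tx.success();
        } finally {
            tx.finish();
        }

        graphDb.shutdown();
    }
}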


Hibernate

From my point of view it is really hard to say anything new about Hibernate, but there was a good summary presentation about the common pitfalls/mistakes.


Node.js

This was part of the workshop day. It contained a small introduction to node.js (if you review the presentation on the node.js page you'll get almost the same information), plus a small but clear explanation of JavaScript prototyping. After that we downloaded a great practical example (a mortal combat client/server written in node.js - the code can be found here). Once we got it running, we were asked to implement the login screen. Oh and yes, for those who would like to try it out, I'd like to quote one piece of advice that was given to me: "Are you trying to run it on cygwin? Forget it! It's no good."

my notes:
- if you hate javascript I advise you to see this presentation (which has nothing to do with geecon :))
- node.js has good integration with HTML5 web sockets (if you don't know why they are the new hype, read this article)


So in the end I would like to sum things up. When I left geecon I had some mixed feelings. From my point of view it is hard to say anything new about Java, as most of the interesting stuff is now happening in other languages: Groovy, Scala, Ruby, JavaScript. But somehow they managed to make it quite an interesting conference. The only thing I was maybe missing were some electric sockets (as they are normally not present in multiplex cinemas) (yeah I know, I should have bought a Mac :() but otherwise it was OK.

notes for me:
- never sit down. you'll fall asleep
- don't eat polish eggs

Monday, May 16, 2011

Some hints I have learned about ActiveVOS (BPMS solution) so far

If you expect many long running processes (or, what is worse, human tasks), set the system logging level to none and define the logging for each of your processes separately
Process logs can take up a lot of your database storage, especially the logs for human tasks.
Not that they would be big if we looked at them separately - one log record takes only a few bytes. The problem is that there are many of them.
From our measurements we had approximately 5500 log records for one human task, which could consume almost 1MB of logical data storage. As the system processes use the default logging level, setting it to none will get rid of this problem (you'll almost never need the logs from system processes - even the support never asks for them).

Try to avoid an architectural design based on manual process fixing in case of failure
One of the biggest advantages of ActiveVOS is that you can set the logging to full level for any of your processes, and once there is a failure you can return to any point of it, fix some data and continue from there as if nothing happened. The disadvantage is that once you overuse this "pattern", the maintenance cost of your solution will increase heavily, as you'll always need some additional people to fix your failed processes from the moment they arrive at the office (but that of course depends on the number of processes you're dealing with). I am not saying that you should completely avoid this pattern, but I suggest you always check whether there isn't an automatic solution as well.

Be prepared that some things won't work
Nothing is perfect and ActiveVOS (even though it provides a cool solution) has some flaws. So be prepared that in some cases you'll end up with a workaround. Just to name some that we needed, here is a short list:
  • The XQuery constructs "instance of" and "typeswitch" do not work properly once you want to check an element type that comes from a hierarchy of types. The problem is that the Saxon library underneath does not provide this functionality. The same goes for functions like "rename".
  • If you're using asynchronous calls with WS-Addressing, the timeout policy does not work if you define it on the reply activity.
  • If multiple WSDL files in one process use the same namespace, they must be merged. I don't know how it is in newer versions (we last used this feature in version 6.2), but the eventing can show some unexpected behavior when used in a clustered environment (but as said, this might already be fixed).
But don't be afraid, as these are minor issues for which you can surely find a workaround, and they might be fixed over time.

Don't use human tasks as mere notifications for a FINAL task that needs to be done
From a colleague's project I know that people forget to close these FINAL TASK "notifications" once the work is finished, as for them it's not that important any more (they already have a new notification/task to deal with).
Even the management doesn't care, as everything has been done/finished, so (from the business point of view) who cares that the process was not closed.
This creates not only confusion about which tasks are finished and which are not (sure, the user knows), but also heavy usage of the database, as the unfinished system processes that run the human tasks can consume a lot of database storage and can't be deleted if we want to decrease the database size.

Try to get all your important business data out of your processes and store it in a separate database before the process ends
Currently there is no archiving mechanism (as far as I know) that would allow you to archive only a part of your processes (for example the completed ones) and leave the others intact. The only thing you can do is archive your ActiveVOS database as a whole, and then restore it as a whole.
This can be very tricky, as it prohibits any incremental backup or recovery (you can write one on your own, but then you'll lose the support - and which customer would be willing to lose that), and it hardly works when you have a great mix of completed and running processes. You'll probably run into this problem once you have many finished processes, as they can consume a lot of database storage (on one big project we're already over 600GB !!! of Oracle database). The best way to prepare yourself for this is to create some persisting activities in your processes that gather all the meaningful information and store it in a separate database. That way, once you encounter problems with database space, you can easily delete some of your old completed processes.
Oh, and don't forget to put them in the very first version of your processes, as deletes can only target processes "older than" a given point (so forget about defining a time range from - to), and you wouldn't be happy if the oldest processes, the ones without the persistence mechanism, had to be deleted/sacrificed for the sake of the database (you can't pick just the newer ones).

Friday, May 6, 2011

Javascript localization hell

Not long ago I visited hell. The hell was called JavaScript localization, and the headache started when I wanted to sort some strings. To do so I needed to compare them first. This would seem like a simple task if you were an English gentleman with no fear of diacritics. But as I live and work in a country where diacritics are ever present, I needed to handle them properly. The localeCompare method worked differently in every browser. Also, when the indexOf method was used to search through text with diacritics using a search expression without diacritics, it did not work properly.

So I once again ended up with something I don't like very much: I wrote my own JavaScript util that can be used like this:
localeHelper.diacritiqueComparison('čerešne', 'citron'); // = 1
localeHelper.diacritiqueComparison('čerešne', 'hrozno'); // = -1

// this is case insensitive
localeHelper.containsDiacritiqueText('štart', 'st'); // = true
localeHelper.containsDiacritiqueText('štart', 'št'); // = true
localeHelper.containsDiacritiqueText('štart', 'tar'); // = true
localeHelper.containsDiacritiqueText('trend', 'tr'); // = true
localeHelper.containsDiacritiqueText('trend', 'ťr'); // = false (as we explicitly want the symbol 'ť' and not 't')
localeHelper.containsDiacritiqueText('trend', 'b'); // = false (symbol 'b' is not present)
The final code looked like this:
function LocaleHelper() { // constructor

    var i;

    // source: http://en.wikipedia.org/wiki/Alphabets_derived_from_the_Latin
    // supported languages:
    // austro-bavarian, belarusian, croatian, czech, dutch, estonian, finnish, french, german, hungarian, irish, italian,
    // latvian, lithuanian, polish, portuguese, romanian, slovak, sorbian, spanish, swedish, turkish
    //
    // todo: not sure how to or if I should process the german letter 'ß'
    var localUpperVowelList = "ÁÀÂÄĂĀÃÅĄÆÉÈĖÊËĚĒĘÍÌİÎÏĪĮÓÒÔÖÕŐŒÚÙÛÜŬŪŰŮŲÝŸ";
    var latinUpperVowelList = "AAAAAAAAAAEEEEEEEEIIIIIIIOOOOOOOUUUUUUUUUYY";

    var localLowerVowelList = "áàâäăāãåąæéèėêëěēęıíìîïīįóòôöõőœúùûüŭūűůųýÿ";
    var latinLowerVowelList = "aaaaaaaaaaeeeeeeeeiiiiiiiooooooouuuuuuuuuyy";

    var localUpperConsonantList = "ĆČÇĎĐĞĢĶĹĻŁĽŃŇÑŅŔŘŚŠŞȘŤŢṬŹŻŽ";
    var latinUpperConsonantList = "CCCDDGGKLLLLNNNNRRSSSSTTTZZZ";

    var localLowerConsonantList = "ćčçďđğģķĺļłľńňñņŕřśšşșťţṭźżž";
    var latinLowerConsonantList = "cccddggkllllnnnnrrsssstttzzz";

    // map of "character with diacritic" -> "plain latin equivalent"
    this.charMap = [];
    for (i = 0; i < localUpperVowelList.length; i++) {
        this.charMap[localUpperVowelList.charAt(i)] = latinUpperVowelList.charAt(i);
    }
    for (i = 0; i < localLowerVowelList.length; i++) {
        this.charMap[localLowerVowelList.charAt(i)] = latinLowerVowelList.charAt(i);
    }
    for (i = 0; i < localUpperConsonantList.length; i++) {
        this.charMap[localUpperConsonantList.charAt(i)] = latinUpperConsonantList.charAt(i);
    }
    for (i = 0; i < localLowerConsonantList.length; i++) {
        this.charMap[localLowerConsonantList.charAt(i)] = latinLowerConsonantList.charAt(i);
    }
}


LocaleHelper.prototype = {

    // returns the plain latin equivalent, or the character itself when it has no diacritic
    removeCharDiacritique : function(charToProcess) {

        var result = this.charMap[charToProcess];
        if ((result == undefined) || (result == null)) {
            result = charToProcess;
        }

        return result;
    },

    // compares single characters with their diacritics stripped
    localeCharCompare : function(charA, charB) {

        var newCharA = this.removeCharDiacritique(charA);
        var newCharB = this.removeCharDiacritique(charB);

        return (newCharA == newCharB) ? 0 : ((newCharA < newCharB) ? -1 : 1);

        // removed: doesn't work on every browser
        // return charA.localeCompare(charB);
    },

    isLatinLetter : function(character) {
        return (character >= 'a' && character <= 'z') || (character >= 'A' && character <= 'Z');
    },


    // case sensitivity is used only when the words have the same letters (they are read the same way)
    diacritiqueComparison : function(textA, textB) {
        // todo: note: in the Lithuanian alphabet the 'y' character comes just before 'j' - so this algorithm won't work properly there

        var result = 0;

        var caseDiff = 0; // difference in case sensitiveness
        var minLength = Math.min(textA.length, textB.length);
        for (var i = 0; i < minLength; i++) {
            var charA = textA.charAt(i);
            var charB = textB.charAt(i);
            var lowerA = charA.toLocaleLowerCase();
            var lowerB = charB.toLocaleLowerCase();

            result = this.localeCharCompare(lowerA, lowerB);
            if (result == 0 && lowerA != lowerB) {
                result = (lowerA < lowerB) ? -1 : 1;
            }

            if (result == 0) {
                if (caseDiff == 0 && charA != charB) { // first most left difference in case is the only important one
                    caseDiff = (charA < charB) ? -1 : 1;
                }
            } else {
                break;
            }
        }

        if (result == 0) {
            if (textA.length != textB.length) {
                result = (textA.length < textB.length) ? -1 : 1;
            } else {
                result = caseDiff; // if the strings are identical let the case sensitive difference decide
            }
        }

        return result;
    },

    containsDiacritiqueText : function (fullText, searchText) {

        var textA = fullText;
        var textB = searchText;

        var result = false;

        if (textB.length == 0) {
            result = true;
        } else if (textA.length >= textB.length) {
            for (var i = 0; i < textA.length - textB.length + 1; i++) {
                var found = true;

                for (var j = 0; j < textB.length; j++) {
                    var charA = textA.charAt(i + j).toLocaleLowerCase();
                    var charB = textB.charAt(j).toLocaleLowerCase();

                    if (charA != charB) {
                        if (this.localeCharCompare(charA, charB) != 0) {
                            found = false;
                            break;
                        } else if (!this.isLatinLetter(charB)) {
                            found = false;
                            break;
                        }
                    }
                }

                if (found === true) {
                    result = true;
                    break;
                }
            }
        }

        return result;
    }
};


var localeHelper = new LocaleHelper();

Wednesday, May 4, 2011

Comparing strings with diacritics in JavaScript

Today I got an assignment from a customer to write a JavaScript method for sorting texts. This would be quite a simple task if they were English texts. Unfortunately, our Slovak (and also Czech) alphabet has one nasty thing: diacritics.

So if you try to sort these texts:
  • "cudzí", "čučoriedka", "ťava", "tŕň", "trstina"
with the standard text comparison you get this order:
  • "cudzí", "trstina", "tŕň", "čučoriedka", "ťava"
And that is not exactly the best ordering (by default, uppercase letters without diacritics come first, followed by lowercase letters without diacritics, then uppercase letters with diacritics and finally lowercase letters with diacritics).

The classic comparison is therefore not sufficient. So I tried to find something more suitable and came across quite a nice method:
  • textA.localeCompare(textB)
Using it I got a slightly better, but still not sufficient, result:
  • "čučoriedka", "cudzí", "ťava", "tŕň", "trstina"
The problem with this method is that it ignores diacritics completely, so it never puts a character with a diacritic after its counterpart without one; in the comparison both characters have an identical position (which is why 'č' ended up before 'c', 'ť' before 't' and 'ŕ' before 'r').

On top of that came problems with capital letters. I tried to find a suitable solution on the internet, but unfortunately none was sufficient (plus I didn't feel like enumerating all the characters with diacritics - I am a perfectionist and wanted it to work for all languages derived from the Latin alphabet).

And so I arrived at something that all enthusiastic programmers normally do (and which is usually considered a mistake): I'll write the method myself. This is what came out of it in the end:
function diacritiqueComparison(textA, textB) {
    var result = 0;

    var caseDiff = 0; // difference in case sensitiveness
    var minLength = Math.min(textA.length, textB.length);
    for (var i = 0; i < minLength; i++) {
        var charA = textA.charAt(i);
        var charB = textB.charAt(i);
        var lowerA = charA.toLocaleLowerCase();
        var lowerB = charB.toLocaleLowerCase();

        result = lowerA.localeCompare(lowerB);
        if (result == 0 && lowerA != lowerB) {
            result = (lowerA < lowerB) ? -1 : 1;
        }

        if (result == 0) {
            if (caseDiff == 0 && charA != charB) { // first most left difference in case is the only important one
                caseDiff = (charA < charB) ? -1 : 1;
            }
        } else {
            break;
        }
    }

    if (result == 0) {
        if (textA.length != textB.length) {
            result = (textA.length < textB.length) ? -1 : 1;
        } else {
            result = caseDiff; // if the strings are identical let the case sensitive difference decide
        }
    }

    return result;
}
After using it I was finally getting quite satisfactory results:
  • "cudzí", "čučoriedka", "trstina", "tŕň", "ťava"
Additional note from the following day: damn this whole JavaScript thing. The localeCompare method works differently in every browser, and it even seems to me that there are differences between IE versions (thanks to which I had to program even this method myself - and yes, in the end I did enumerate the characters with diacritics :( ). Where will this all end?