Summary: | Implement Code refactoring | ||
---|---|---|---|
Product: | [Applications] kdevelop | Reporter: | Sebastien <slaout> |
Component: | Language Support: CPP (old) | Assignee: | kdevelop-bugs-null |
Status: | RESOLVED FIXED | ||
Severity: | wishlist | CC: | aldoluciano, bluedzins, david.nolden.kde, esigra, ghcsilva, hattons, heithecker, killerfox512, mcguire |
Priority: | NOR | ||
Version: | 4.0.0 | ||
Target Milestone: | --- | ||
Platform: | Mandrake RPMs | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: |
Description
Sebastien
2003-10-27 13:12:30 UTC
I agree. KDevelop needs work on refactoring. Some more ideas:
- Rename attribute/variable/methods (and all references to them)
- Change method signature
- Determine 'const' methods
- Deprecate selected methods (so gcc prints a warning etc. when the method is used)
These should be accessible by selecting methods/attributes from within the class view tab and when the cursor is on a method/attribute in the editor. The first two, rather simple suggestions would greatly aid my development, and that of others as well. The last two are things I've heard KDE devs might like to have: an easy way to mark a method as const if it is indeed const, and one-click deprecation of methods.

Though not refactoring strictly speaking, I have this addition to the list, since it is very close to "Change method signature" functionality-wise.
- "Implement this", i.e. I'm in a header file, the cursor is in a line defining a method which has not yet been implemented, e.g.
    int f(int x);
  I would then invoke the "Implement this" feature (shortcut, RMB popup) and in the corresponding implementation file the body of the method would magically appear:
    int myclass::f(int x)
    {
        <-- cursor blinking here, ready to edit.
    }

> I'm in a header file, the cursor is in a line defining a method which has not yet been
> implemented, e.g. int f(int x); I would then invoke the "Implement this" feature
> (shortcut, RMB popup) and in the corresponding implementation file the body of the
> method would magically appear:
Move the cursor over your method declaration, press F2 and enjoy the magic ;).
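For readers wondering what such an "Implement this" action has to produce, here is a minimal sketch in C++. It is not KDevelop's code: `MethodDecl` and `makeImplementationStub` are hypothetical names, and the sketch assumes the IDE has already extracted the return type, class name, method name and parameter list from the declaration under the cursor.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of a declared method; not a KDevelop type.
struct MethodDecl {
    std::string returnType;  // e.g. "int"; empty for constructors
    std::string className;   // e.g. "myclass"
    std::string name;        // e.g. "f"
    std::string params;      // e.g. "int x"
    bool isConst;            // true for "int f(int x) const;"
};

// Build the text of an empty out-of-line definition for the declaration.
// The blank indented line is where the editor would place the cursor.
std::string makeImplementationStub(const MethodDecl& d)
{
    assert(!d.className.empty() && !d.name.empty());
    std::string stub;
    if (!d.returnType.empty())
        stub += d.returnType + " ";
    stub += d.className + "::" + d.name + "(" + d.params + ")";
    if (d.isConst)
        stub += " const";
    stub += "\n{\n    \n}\n";
    return stub;
}
```

Calling it with `{"int", "myclass", "f", "int x", false}` yields the `int myclass::f(int x)` skeleton from the request above. The hard part in a real IDE is the parsing that fills in `MethodDecl`, which this sketch deliberately leaves out.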
Also, make it easier to manipulate files. For example, renaming files:
- Right-click in the Files tab, or in the automake tab... to have this possibility
- Rename .h and .cpp together
- Replace all #include "old.h" with #include "new.h"

Refactoring support is one of the best time-saving improvements in IDEs in recent years. The best (and most complete) IDE here is IntelliJ IDEA, and it could be used as an example of fantastic refactoring support (and source code analysis, code navigation, GUI building, etc. It's an awesome IDE). Almost all refactoring techniques are described in depth in: http://www.amazon.com/exec/obidos/ASIN/0201485672/qid=1101824641/sr=2-1/ref=pd_ka_b_2_1/103-9328337-3678200 Here is some online info (not as good and clear as the book): http://www.refactoring.com/ Other open source C/C++ IDEs like the Eclipse CDT project included basic refactoring support from the very beginning (like the "rename method" technique). Other IDEs like NetBeans (4.0), JBuilder (9, 10), etc. already have advanced refactoring support. My votes within refactoring are for:
- Rename method / property
- Generate getters / setters
- Extract method
- Move code
- Change signature

I fully agree that refactoring is a killer feature of an IDE. Unfortunately, as you have noticed, only the Java IDEs have implemented it. Why? Because C++ is awfully difficult to parse and thus to refactor. I would be curious to see how far Eclipse has gone, but developing a C++ refactoring library would be a fantastic project that should be shared between IDEs. The only C++ refactorer I know of is closed source: http://www.xref-tech.com/xrefactory/main.html However, there are encouraging projects:
- using the parser of Source Navigator, you could already get a lot of information on the C++ code. The problem is that it creates a local database of all sources, but that would be an acceptable trade-off to me: http://sourcenav.sf.net/
- gcc-xml could have provided this feature but they did not: http://www.gccxml.org/HTML/FAQ.html
- there should be a way to use the parser of OpenC++ to extract the syntax tree and refactor after that: http://www.csg.is.titech.ac.jp/%7Echiba/openc%2B%2B.html
So, a lot of potential, but nothing. Refactoring C++ is really, really difficult.

Not only Java IDEs implement it. Visual SlickEdit has it (http://www.slickedit.com). There is also Ref++ (http://www.ideat-solutions.com/refpp/), a plugin that adds C++ refactoring to VS. On the other hand, the next VS 2005 will include refactoring support. I *don't know* if it'll be available for C++ or just C#, but if Microsoft is adding support for it, my guess is that sooner or later all its OO languages will include it. As you said, parsing C++ is very hard. But the value of refactoring is incredible. Not only is it a killer productivity tool, it also lets you definitely increase the quality of your code.

Concerning refactoring: "kscope" ( http://kde-apps.org/content/show.php?content=9910 ) looks like it could be a useful addition to kdevelop, maybe as some sort of KPart; the author even mentions interest in adding "kscope" to kdevelop. I agree with the above comments, refactoring tools in kdevelop would bring it to a new level.

I like Jonas Jacobi's idea of an "implement this" feature. You could also set up a template for comments included above the method, e.g.
    //===================================================
    //
    // Method: %C::%M
    //
    // Description:
    //
    //---------------------------------------------------
    %D
    //===================================================
%C, %M and %D would then be replaced with the class name, method name and method declaration respectively.
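The placeholder substitution suggested above is simple to sketch. The following C++ function is only an illustration, not an existing KDevelop feature: the name `expandTemplate` is hypothetical, and the handling of `%%` (a literal percent sign) and of unknown placeholders (left untouched) are assumptions that the comment above does not specify.

```cpp
#include <cstddef>
#include <string>

// Expand %C (class name), %M (method name) and %D (method declaration)
// in a comment template. Assumptions of this sketch: "%%" yields a literal
// '%', and unknown placeholders or a trailing '%' are copied unchanged.
std::string expandTemplate(const std::string& tmpl,
                           const std::string& className,
                           const std::string& methodName,
                           const std::string& declaration)
{
    std::string out;
    for (std::size_t i = 0; i < tmpl.size(); ++i) {
        if (tmpl[i] != '%' || i + 1 == tmpl.size()) {
            out += tmpl[i];
            continue;
        }
        switch (tmpl[++i]) {
        case 'C': out += className; break;
        case 'M': out += methodName; break;
        case 'D': out += declaration; break;
        case '%': out += '%'; break;
        default:  out += '%'; out += tmpl[i]; break;  // unknown: keep as-is
        }
    }
    return out;
}
```

A single left-to-right pass is used instead of repeated search-and-replace so that a class or method name that itself contains a `%` sequence is never expanded a second time.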
Personally, I really wouldn't mind it all that much whether such a library would need to build a huge database of the source tree, I think that's a worthwhile tradeoff. One should probably try to contact the major opensource IDE projects and try to get a team of people together who would like to focus their efforts on actually designing and thus developing an opensource cross platform refactoring *library*, such a library could be based on similar projects (i.e. ctags etc.) but could ultimately be used by all IDE projects easily. That way, everybody would benefit from such an effort. Also, it would probably be worthwhile to think about supporting a scripting language such as perl, so that the library itself will only provide the backend, but not the actual parsing logics. Indeed, I think it might be a good idea to actually start defining the goals and requirements of such an effort, so that people can actually start making design drafts. Also, I'm really convinced that such a project would become quite popular if it's started under the umbrella of the KDE/KDevelop projects. Given the amount of votes for this 'feature' (*PROJECT* that is) it should really not be surprising to see many people joining this effort pretty soon. I'm really convinced that if we start a corresponding sourceforge project, set up a forum and point everybody to it, we would very soon be able to get this rolling. So, as a start: anybody who's interested in helping, please drop a note here :-) Re. Michael's comment, I'd like to help design and code such a refactoring library. I suppose we should wait for some interest and then start a code analysing and refactoring library project. So, I'm in to work on this. 
Hi, I did play quite a bit with this idea and I my thoughts so far have lead to the following conclusions, feel free to comment :-) a) it would be crucial to really make this a highly portable library that does not have any unreasonable dependencies, thus it must really not depend on something like KDE/KDevelop (probably not even QT!), rather it should provide an open interface (possibly providing a network'ed IPC interface?), so that it can be used by/with any (IDE) project on any platform, this would also ensure that there is a huge group of potential contributors. A network'ed IPC interface would also add the possibility to introduce a server/client architecture very soon, with all its benefits. b) one should really not attempt to write a parser for something as complex as C++, rather the key to success would be to have an open source compiler such as gcc/openc++ provide its COMPLETE parse tree as some sort of machine-readable (XML/RDF) output (there are already compilers (or patches for gcc) to do exactly this (will post references later!)) c) such XML/RDF output should then be used to populate a SQL database for each unit/source file, so that a cross-referencing database with symbols (namespaces, classes/structs,functions etc.) can be created. Then, it would mainly come down to knowing some basic SQL in order to be able to do actual refactoring(=SQL queries!). Also, we should concentrate on not using any specific SQL database, rather there are free SQL DB abstraction layers/libraries, that will support a whole variety of backends/databases (PostGres, mySQL, mSQL,BerkeleyDB, SQLite). 
d) it would probably be preferable to develop some sort of "meta"-refactoring language (possibly, SQL oriented) that would allow people to define macros (low- and high-level ones) for refactoring, for example: "RENAME FUNC void doStuff(void) IN FILE src/main.cxx to doStuff2" could automatically comprise: - select rows for corresponding source file and scope/namespace - locate/find symbols for function doStuff - rename function in prototype & declaration - determine callers (look up cross-references via SQL query) - optionally rename the callers accordingly during such an operation, the affected source files would be 'locked', so that other refactorings do not interfere with them. Users would then be offered the possibility to flush the changes from the DB to the corresponding source files. e) ultimately, it would probably not be desirable to hardcode such things in C/C++ for each refactoring action individually, rather a good approach might be to really embed some scripting interpreter such as perl into the library, so that each refactoring action/macro can be implemented using perl, that way the library would become highly maintainable and extendable. Eventually, this approach would also open up the possibility to easily add support for additional programming languages to such a library in the future. f) finally, it might become desirable to design an architecture with a high level of concurrency in mind (threaded design), so that the library can later on be used in a non-blocking mode (whenever possible), as well as support SMP platforms where applicable. Using CORBA for this goal, would ultimately also satisfy the requirements for a network'ed IPC interface. 
g) as such a library would ultimately be modifying the source code on the file system, it would initially probably be a good requirement to have the lib only work with source code that is being managed using a source management system such as CVS/SVN, that way we would not necessarily have to take care of the whole "UNDO" complexities when things are screwed up, and we would make sure that every step could be easily reverted quickly. more to come soon... :-) As promised, some related references: http://introspector.sourceforge.net (a patched gcc to output the parse tree in XML format) http://www-user.uni-bremen.de/~strasser/metacpp/ (another patched gcc to provide its parse tree as XML) A bunch of misc resources (unsorted) related to the overall idea can be found at the introspector author's blog: http://rdfintrospector.blogspot.com/ > Hi, > I did play quite a bit with this idea and I my thoughts so far > have lead to the following conclusions, feel free to comment :-) Erm, not that I want to discourage your enthusiasm, but have you ever heard of the "as much as necessary, as little as possible" principle? Meaning that you get the best results if you concentrate on what you really need, and dismiss the rest. Now a cross-platform, toolkit-independent, multilanguage, scriptable and networked refactoring library would maybe be a super cool project, but keep in mind also causes much more work to implement it. Removing dependences on Qt or even C++ already makes it a major effort, let alone things that are not really necessary for most people. Really, what advantages has the use of CORBA, perl, and SQL statements when you just want to create a refactoring dialog that calls some refactoring function and be done with it? In my opinion, the mentioned library would not be the first one to fail because of underestimated complexity. By the way, 300 votes for a bug doesn't mean that there are people willing to help. Might not be one single person doing actual work. 
I think the best thing to do would to actually start your Sourceforge project, commit some initial code and get people around you. If there is really this potential, they might just join in. Yes, maybe cooperation with other projects would be a good idea. How about you take responsibility for getting some of them together, and then we can see how it goes? (And the bottom line is... "if you don't do it, nobody does". And don't, in any case, expect me to code C voluntarily.) Now for a few of the other comments: > b) one should really not attempt to write a parser for something as complex > as C++, rather the key to success would be to have an open source compiler > such as gcc/openc++ provide its COMPLETE parse tree as some sort of > machine-readable (XML/RDF) output (there are already compilers (or patches > for gcc) to do exactly this (will post references later!)) Nice theory, but in practice it does not work. gcc, as a compiler, can only process valid source files, and if you have some code segment "anything->" without right side and semicolon, gcc based solutions just fail. IDEs want to work even when the code is not completely valid, so gcc is no option. Besides, someone in fact has attempted to write a C++ parser, and has totally succeeded, which means better C++ support in KDevelop4 and a portable parser that also can be reused by other projects and makes refactoring easier. > such XML/RDF output should then be used to populate a SQL database for each > unit/source file, so that a cross-referencing database with symbols > (namespaces, classes/structs,functions etc.) can be created. Then, it would > mainly come down to knowing some basic SQL in order to be able to do actual > refactoring(=SQL queries!). The database is already there, it's named "PCS store". 
> f) finally, it might become desirable to design an architecture with a high > level of concurrency in mind (threaded design), so that the library can later > on be used in a non-blocking mode (whenever possible), as well as support SMP > platforms where applicable. Using CORBA for this goal, would ultimately also > satisfy the requirements for a network'ed IPC interface. Please, please tell me what positive effect CORBA (ugh) has if you just want to move and rename a few code elements. I don't get it. Maybe other people see more potential in your ideas. > Erm, not that I want to discourage your enthusiasm, > but have you ever heard of the "as much as necessary, as little as possible" principle? > Meaning that you get the best results if you concentrate on what you really need, and dismiss the rest. Jakob, thanks for your posting-I do appreciate your comments. And yes, I also do see that my comments may have triggered inaccurate perceptions. So, let me clarify: yes, I am aware of the philosophy you mentioned above, and I do fully agree with your opinion that my above posting may appear to make things REALLY tricky- and unnecessarily so. > Now a cross-platform, toolkit-independent, multilanguage, scriptable and networked refactoring > library would maybe be a super cool project, but keep in mind also causes much more work to implement it. Yes, thus I even second many of your objections. Simply because most of the things I mentioned really seem to have barely any relevance with regards to an IDE-confined environment (which seems to be the perspective you are coming from) and the functionality you seem to envision. 
However, as I said above-I did already spend some time playing around with the idea of a multi-platform refactoring library and experienced different shortcomings in the various approaches, shortcomings which I tried to address by generalizing the requirements for such a project, of course I may have been prone to "over-generalizing" a bit ;-) So, while I can assure you that I do fully understand that pretty much none of the above is likely to be implemented in any sort of related effort anytime soon, I considered this the place to collect ideas and suggestions-basically, to conduct some sort of "collective brainstorming", nothing more and nothing less. Yes, I do also consider many of these ideas pretty overwhelming, however we were not yet talking of actual features or even actual code, rather we were merely talking of ideas and suggestions for a *possible* effort to approach such a project. And I am fully aware of the fact that people may have different visions and thus different requirements, my posting and your response are actually a pretty convincing proof for this. That's exactly why it is crucial to talk about our ideas in ADVANCE, *before* we actually start coding something, only in order to THEN find that we may have conflicting ideas,visions and plans. That is also why I appreciate your asking for clarification, simply because people will naturally have different ideas and expectations, for very different reasons-so it is only fair to discuss the details at lengths beforehand. On the other hand, you are actually talking of REMOVING dependencies: > Removing dependences on Qt or even C++ already makes it a major effort, let alone things that are not really necessary for most people. Personally, I do not really see how we can REMOVE something that does not yet exist? 
Thus, I wasn't talking of literally "removing" a dependency, but rather of coming up with design proposals that honor already sufficiently generic requirements, preferably satisfying all potential contributor's goals. So, I think something like (dropping) QT should actually only be a factor at all if there is any code existing that may turn out to be of relevance, something which I do not yet see-but maybe I missed something? BUT, if we are still talking about coming up with *IDEAS* for such a library, THEN -I think- it would be kind of unfortunate to honestly consider a QT based implementation, simply BECAUSE OF the various different requirements, AND QT's footprint (it isn't exactly lightweight is it?). For example, think about a GNOME/GTK based IDE whose users might eventually like to make use of such a refactoring library, but would certainly not want to have to install QT for a LIBRARY only? However, if there is indeed an existing code base that may be relevant, I would of course agree that it would be kind of pointless to start from scratch, after all its people who are actually contributing CODE who will end up being the decision makers. (BTW, I did actually not mention any recommendations as to favor a C-based vs. a C++ based approach) > By the way, 300 votes for a bug doesn't mean that there are people willing to help. > Might not be one single person doing actual work. I am very well aware of this, too. Nevertheless, these votes do illustrate some interest in the overall functionality, thus providing motivation for potential contributors (after all, it did motivate me to some extent!). > Really, what advantages has the use of CORBA, perl, and SQL statements when you just want to create > a refactoring dialog that calls some refactoring function and be done with it? basically, none-again, we are talking of different requirements. 
While my view may seem extremely complex to you, your view seems simplistic to me :-) Of course, your odds to get something done are significantly better if you are able to make complex things simple. > Nice theory, but in practice it does not work. gcc, as a compiler, can only process valid source files, That's of course right-but if you check out the aforementioned book by Martin Fowler on refactoring, you'll notice that for the majority of more advanced refactorings, the ability to deal with complete and valid source code will actually become a crucial necessity sooner or later. So, it makes only sense to use a C++ compiler to provide the parsing functionality. The functionality a separate C++ parser can offer is unlikely to be as feature-rich as the one of a "true" (compiler's) parser. Again, refactoring is much more than source code completion or renaming symbols-proper refactoring simply REQUIRES a reliable and complete database of the source code involved. I would rather use an existing compiler's parser (gcc) that I know is being permanently improved upon, than trying to come up with my own half-baked parser that may work under certain circumstances/limitations or not. Personally, I am pretty convinced that it is easier to address the shortcomings of a compiler based approach, than addressing the shortcomings of a handwritten parser, which in itself would be a totally new project. As I said already, I am seeing this whole thing as pretty modular, consisting of various layers: a) parser b) parser to RDF/XML converter c) RDF/XML to SQL converter d) SQL backend e) refactoring "modules" (possibly scripted, alternatively hard-coded in C++) f) library backend, to be used with IDEs -providing some sort of API or meta-refactoring language > and if you have some code segment "anything->" without right side and semicolon, gcc based solutions just fail. IMO, this is again too simplistic. 
Of course a C++ compiler will not provide a complete parse tree as soon as it encounters something as invalid/incomplete as "anything->". However, you do not necessarily have to pass such incomplete statements to it; only the corresponding context is relevant, and thus the DB would merely need to be queried with the proper context in mind. In this case, it would be queried for a symbol matching "anything" in order to retrieve its public members.

> IDEs want to work even when the code is not completely valid, so gcc is no option.

Yes and no: you seem to have misconceptions about the approach I outlined above. Assuming that you parse source code, provide its parse tree in RDF/XML format and populate a SQL database with the corresponding symbols, you can indeed dynamically update the SQL database at runtime. As I said, I did already play around with this; it works basically like this:
1) parse the source code, put its symbols into the SQL DB
2) establish a live link to the refactoring library
3) whenever a source file is manipulated, update the temporary SQL DB so that it reflects the latest valid changes

So, when I add this at runtime to an arbitrary source file:

    /***************************
    class CTest {
    public:
        void doStuff() {}
        bool getStuff() { return _stuff; }
    private:
        bool _stuff;
    };

    int main() {
        CTest * ct = new CTest();
        return 0;
    }
    ***************************/

the parser can indeed temporarily add the corresponding symbols and honor them accordingly. So if you then type:

    ct->

it would mainly come down to informing the library about the event, so that it can query the temporary DB for ct's valid symbols (doStuff() and getStuff() in this case) and provide a combo box.
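To make the lookup step concrete, here is a rough C++ sketch of the query side of that idea. It is an assumption-laden stand-in: an in-memory map replaces the SQL database, and the class `SymbolDb` and its member functions are hypothetical names, not part of any existing tool mentioned in this thread.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for the "temporary SQL DB" described above.
class SymbolDb {
public:
    // Record a public member of a class (populating or updating the DB).
    void addPublicMember(const std::string& cls, const std::string& member) {
        members_[cls].push_back(member);
    }

    // Record which class a variable refers to, e.g. "ct" -> "CTest".
    void addVariable(const std::string& var, const std::string& cls) {
        variables_[var] = cls;
    }

    // Answer a "var->" completion request: the public members of the
    // variable's class in sorted order, or an empty list if unknown.
    std::vector<std::string> complete(const std::string& var) const {
        auto v = variables_.find(var);
        if (v == variables_.end())
            return {};
        auto m = members_.find(v->second);
        if (m == members_.end())
            return {};
        std::vector<std::string> result = m->second;
        std::sort(result.begin(), result.end());
        return result;
    }

private:
    std::map<std::string, std::vector<std::string>> members_;
    std::map<std::string, std::string> variables_;
};
```

With the CTest example registered, `complete("ct")` would return doStuff() and getStuff(). The sketch ignores scopes, namespaces and inheritance, which is exactly where the more complex cases begin.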
Of course, there are indeed far more complex cases (think about nested namespaces or intra-namespace symbol lookups) > Besides, someone in fact has attempted to write a C++ parser, and has totally succeeded, > which means better C++ support in KDevelop4 and a portable parser that also can be reused by other projects and makes refactoring easier. That's good to hear-any references (URLs)? Is it a standalone parser, or is it bound to KDevelop? Also, does it depend on QT? We will see how things are working out. So, honestly-many of the things I mentioned were indeed based on my personal background and ideas. With the exception of a SQL database, I consider hardly any of these ideas to be crucial for the success of such a project. Even though I do indeed also see the advantage of a scripted backend for individual refactorings( simply because people/users could easily provide/contribute refactorings without having to recompile the source code). So, most of my ramblings should be mainly considered some sort of brainstorming. The fact that you do not see the necessity for using a SQL based database for the parse tree, shows again only that users like you and me have different ideas and expectations for such a lib. But, obviously I am not the only one who can envision an extremely broad area of application for a parse tree <-> SQL layer (Eric and others mentioned already ideas concerning source code analysis). This is exactly one possible way where a refactoring library and a code analysis tool could benefit from the same set of data, so no need to use different data creation schemes. That's exactly why you find me favoring a heavily modular (and admittedly also more complicated) approach to the overall idea, simply because I do personally also have multiple areas of application. 
So, it's really not that I am totally underestimating the complexity of the effort, rather I am very aware of it, and I also understand that even a pure C++/QT based approach would be highly complex. But likewise, I am also aware of the level of redundancy and potential pitfalls that could be avoided by a truly generic approach and design. Again, you seem to be seeing primarily a IDE/KDEVELOP based approach, whereas I do see how other/related projects might also like to benefit from such functionality, WITHOUT having to rely on KDevelop/KDE/QT (not necessarily strictly refactoring only). > The database is already there, it's named "PCS store". Okay, interesting-but this might be another example of the aforementioned redudancy: while I do have to admit that I am not at all familiar with this "PCS store" approach employed by KDevelop, I may ask you how you think programs such as ctags,etags, kscope etc. work? Right, basically all of these tools will attempt to build some sort of symbol database that they can query for symbols that are of interest to them. So, while you may object "hey, but they are doing different things", all of these tools could often basically use one common GENERIC data source easily for the majority of their work IF it was available: a symbolic parse tree representation (possibly but not necessarily compiler-provided). Thus, IF there was one generic tool available that 1) parses source code and 2) provides a well-documented and easily accessible database to such symbols, most if not all of such programs could easily make use of such a tool, rather than having to re-invent the wheel and use their own "database". > Please, please tell me what positive effect CORBA (ugh) has if you just want to move and rename a few code elements. simple: NONE, none at all! 
:-) Well again-I was merely making up my mind, referring to my own visions, ideas and requirements, being fully aware of the fact that something like this isn't going to be important to the majority of people who are mainly interested in an IDE-oriented refactoring lib. However, the motivation behind my mentioning CORBA was mainly due to my personal background at work, where we use several networked machines to do distributed work (i.e. distcc-based compilations), likewise a "CORBA'fied" refactoring library might not only support threaded/concurrent parsing/refactoring for a single machine, but also for networked machines in some sort of LAN. > I don't get it. Maybe other people see more potential in your ideas. I don't think the problem is in NOT seeing the potential, rather I think the problem might be to see the necessity in a more modular approach than you (and possibly others?) seem to envision. To repeat myself: I think it is absolutely out of question that it should primarily be crucial to get this thing rolling, regardless of any fancy ideas and complicated features (like the ones I just mentioned). However, in the long run it may indeed be a good idea to keep some of the more advanced ideas in mind when it comes to extending existing design and architectures. For example, you may not see the benefit in a networking interface for such a library, but personally I do indeed see the advantage of possibly running a dedicated refactoring server, specifically for this task-enabling users to do pretty complex things, without affecting the workstation's performance. Also, a SQL DB of a huge project (think Linux kernel) may indeed require a good amount of resources. 
Likewise, a network interface might eventually come in handy as soon as you want to enable IDEs to provide "live update" functionality for things like intellisense (auto-completion), so that an IDE may connect to the server and send/retrieve networked queries via some sort of networked refactoring shell, where you use some simple refactoring dialect to query the actual SQL DB. Thus, you would no longer need to manually provide some sort of event loop, rather you could simply feed all IDE-supported events directly to the library's network interface, and see what it returns. Thus, basically enabling IDEs to make use of such functionality without having to recompile sources. CORBA was merely meant to provide a generic solution to the possible requirement of concurrency, distribution/networking and straight-forward interfacing. Yet I have to fully agree that it would add quite an undesirable dependecy if pursued. As to the reasons for mentioning Perl, this was my original idea back in autumn last year-not so much because it would be my personal favorite, rather because it 1) offers SQL support out of the box, 2) can be easily embedded in any C/C++ application, 3) has a high level of regex support which would definitely come in handy for some of the more complex refactorings one may consider. So, even though my initial attempts were indeed concentrated on using Perl for individual refactorings, I would nevertheless still consider other scriptings languages such as Python/Ruby, too. Likewise, the reasons for using RDF/XML as intermediate source format were mainly meant to provide a generic solution to a generic problem. Eventually, it may indeed be the case that compilers will by default be able to output such info in some sort of XML format. Thus, it would only make sense to keep the whole thing maximally general. Of course, providing your own parser does offer its advantages, too-even though it does naturally also add redundancy... 
In conclusion, my posting was not so much meant to be too feature-specific, rather the main message was supposed to be that a high level of modularization and multi-platform'ness seem pretty desirable to me. > I think the best thing to do would to actually start your Sourceforge project, > commit some initial code and get people around you. > If there is really this potential, they might just join in. I would love to do so and see this happen, however this discussion has also proven that we first need to get the basics straight. Obviously, my personal ideas vary greatly from yours. Maybe Eric will chime in and provide his stance on things. As I said already, I would find it kind of unfortunate to start a project just to see how the potential contributors disagree even on the basics. > Yes, maybe cooperation with other projects would be a good idea. > How about you take responsibility for getting some of them together, > and then we can see how it goes? > (And the bottom line is... "if you don't do it, nobody does". I think, before we actually start trying to attract people it is required to come up with some BASIC concept, detailing our goals, requirements and a simple roadmap for the next 6-12 months. > And don't, in any case, expect me to code C voluntarily. As I said already, I didn't mean to imply we should favor C over C++ So, keep the flames comin' :-) This is nice to see some discussions around this idea :-) I registered a project on SourceForge and set up a Wiki so further discussions and planning can be made in a somewhat more "appropriate" environment, as this is aiming to be independent of kdevelop or the kde project in general. I took the liberty of naming it "carl" as in "code analysis and refactoring library". http://carl.sourceforge.net Here are some of my views of this project's goals: - This is a refactoring library. 
Code *must* be of good quality and follow good design principles (this is also implicitly a vote for C++ over C, even though I'm a C guy myself);
- It has to be flexible enough to add new refactorings through a plugin/external interface;
- It should be extensible to other 3rd-gen languages (those that share similar constructs with C++ and Java), but this should not be an essential goal for an initial release;
- It has to be portable and limit external dependencies;
- This is a library, so it must not directly depend on user input. The development tool will take care of asking and forwarding relevant information between the library and the user;
- A pass through this library must not change the user coding style. This implies that comments have to be preserved and stay at their (relative) positions, as do other tokens.

I'm not optimistic about external tools suitable for our code parsing needs, by my last requirement. I also have my doubts that SQL queries would be complete enough to handle moderately complex refactorings. They can be good for renamings or the like, but I think it is absolutely necessary to work with whole code hierarchies.

The common interaction between the caller (IDE or whatever) and the library would essentially be:

1. Caller calls a "BuildSyntaxTree" on each (or a subset of) project files.
2. The library parses the file, builds a hybrid structure (syntax tree, code flow hints, scope and usage pointers, etc.) and returns a handle to the caller.
3. The caller then calls a refactoring, let's say "split temporary variable", with a word position index: Refactor(handle, "SplitTempVar", row, column).
4. The library looks for the corresponding token/structure at (row,column) using its 'code flow hints' data and calls the corresponding refactoring. SplitTempVar example:
   4.1 The token retrieves its scope through a pointer
   4.2 The code hierarchy is traversed to find occurrences of direct assignment
   4.3 On a direct assignment, a new token is declared and further references to the previous token are changed to this new token
   4.4 Repeat up to the end of the token's scope
5. The caller asks the library to render new file content from the handle and updates the file accordingly

As for CORBA, SQL, etc., I think the library code quality should be good enough that these can be integrated transparently if we ever need them. I personally think this is over-engineering at this point and it would be better to begin with something simple. After all, if it isn't trivial to add or change a function, we'll just refactor our code...

Sorry for the rather lengthy reply, but I agree it's nice to see something happening :-) Also, maybe it's really a good idea to actually start a project in order to get this started. By the way, interesting name for a project, what was your second name? ;-)

Just to get things straight:

- Personally I prefer C++ over C, so that is really not an issue at all. I think C would make things even more complicated.
- I also envision a very flexible and modular approach. Again, most of the things I mentioned are probably not desirable initially, but it would indeed be good if the design is modular enough to allow certain features to be added without too much headache.
- While I do agree that support for programming languages other than C/C++ should not matter anytime soon, I also think that it would eventually be nice to keep the architecture and design generic enough to allow contributors to add support for additional programming languages to such a library. Java support would definitely be cool.
- I do fully agree concerning the portability requirements Eric mentioned; however, personally I would really NOT mind having a reasonable amount of external dependencies if they add significant functionality (i.e. think of boost). Such dependencies should definitely not be something as huge as a hardly relevant GUI library, though.
- I would also recommend a heavily library-oriented approach where it is made sure that the library will basically "serve the IDE".
- Likewise, I also think that it is crucial to have such a library maintain comments and style.

Concerning the use of external tools, I was specifically referring to compiler-based parsers previously, and this only for the reason that for quite a time there was obviously no free C/C++ parser that could be easily used for our purposes. Given the added overhead of using a compiler for the parsing task, I would definitely also consider alternatives. However, so far I am not yet aware of existing alternatives, other than of course writing your own parser. And then it's really a matter of weighing the pros against the cons of a compiler-based approach, which would also have crucial advantages a handwritten parser would not have.

So, Eric, why exactly do you have doubts concerning the use of a compiler-based approach? For the same reasons Jajob mentioned yesterday? I may indeed be wrong, but as long as there is no standalone C/C++ parser with reasonable dependencies available, I still consider the compiler-based approach the most promising option. How would you currently try to address and satisfy the parsing requirements? Would you in fact start writing your own C/C++ parser, specifically for refactoring purposes?

Also, concerning SQL: exactly how familiar are you with it? I am asking because I have indeed spent quite some time playing with the idea, and even simulating various refactorings from the Fowler book. Basically, I ended up learning that most (if not all!) refactorings could often be realized using SQL directly. Of course, this would require a correspondingly verbose database structure that honors the code's structure and layout. But maybe you can provide a quick example of a "moderately complex refactoring"? I think it might actually help to clear things up, so I would love to share my view on using an SQL-based approach for such more complicated refactorings. Maybe you can even pick one of the refactorings from refactoring.com, so that we directly use a real-life example? Likewise, I'd like to ask you for an example of working with "whole code hierarchies", or rather for an example of why the SQL approach could not do exactly that.

Apart from that, I think it's a good idea to actually start some sort of roadmap. Let's say we can agree on the following initial requirements:

1) C/C++ parser (pre-processed and raw)
2) some sort of database backend (SQL or not)
3) a database schema for the source code (mapping source code to tables, columns/rows)
4) implementation of individual low-level routines required for refactoring ("find symbol", "find callers", "lookup namespace/scope" etc.)
5) implementation of higher-level/macro routines for refactoring (rename/move/remove symbol etc.)

...then we should determine how we should initially approach each requirement.

> - It has to be flexible enough to add new refactorings through a plugin/external interface;
Eric, I think the idea of using a plugin-based approach is actually pretty interesting; on the other hand, I think this would add more complexity than simply using a scripting language for the very same purpose. Also, I am not aware of any multi-platform plugin libraries. Apart from that, a module/plugin-based approach would again make CORBA an interesting option eventually.
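To make the plugin question a bit more concrete, here is a minimal sketch of what a name-keyed registry behind the proposed Refactor(handle, "SplitTempVar", row, column) call might look like. Everything in it is an assumption for illustration: `SourceHandle`, `RefactoringRegistry`, `commentOutLine` and `makeDemoRegistry` are hypothetical names, the handle holds plain source lines rather than a real syntax tree, and the registered "refactoring" is a toy, not SplitTempVar.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the handle that "BuildSyntaxTree" would return.
// A real handle would wrap a syntax tree; here it is only the source lines.
struct SourceHandle {
    std::vector<std::string> lines;
};

// A refactoring receives the handle and a 0-based (row, column) position
// and returns true if it changed the handle.
using Refactoring =
    std::function<bool(SourceHandle&, std::size_t row, std::size_t column)>;

// Name-keyed registry: "plugins" register themselves here, and the caller
// dispatches by name, as in Refactor(handle, "SplitTempVar", row, column).
class RefactoringRegistry {
public:
    void add(const std::string& name, Refactoring refactoring) {
        assert(refactoring);  // reject empty std::function objects early
        table_[name] = std::move(refactoring);
    }

    // Returns false for unknown names or when the refactoring made no change.
    bool refactor(SourceHandle& handle, const std::string& name,
                  std::size_t row, std::size_t column) const {
        const auto it = table_.find(name);
        return it != table_.end() && it->second(handle, row, column);
    }

private:
    std::map<std::string, Refactoring> table_;
};

// Toy "plugin" for demonstration only: comments out the line at `row`.
bool commentOutLine(SourceHandle& handle, std::size_t row,
                    std::size_t /*column*/) {
    if (row >= handle.lines.size()) return false;
    handle.lines[row].insert(0, "// ");
    return true;
}

// Builds a registry containing only the toy plugin above.
RefactoringRegistry makeDemoRegistry() {
    RefactoringRegistry registry;
    registry.add("CommentOutLine", commentOutLine);
    return registry;
}
```

A caller would build a `SourceHandle`, ask the registry for a refactoring by name, and render the lines back to the file afterwards. Whether the callables come from compiled plugins or from an embedded scripting language is hidden behind the `Refactoring` type, so the choice discussed above would not have to leak into the IDE-facing interface.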
Related threads from the gcc mailing list concerning the use of gcc to output its parse tree for use with other programs:

http://gcc.gnu.org/ml/gcc/2005-05/msg00642.html
http://gcc.gnu.org/ml/gcc/2004-11/msg01032.html
http://gcc.gnu.org/ml/gcc/2003-04/msg01401.html
http://gcc.gnu.org/ml/gcc/2002-08/msg00859.html
http://gcc.gnu.org/ml/gcc/2001-12/msg01386.html
http://gcc.gnu.org/ml/gcc/2000-10/msg00528.html
http://gcc.gnu.org/ml/gcc/2000-08/msg00320.html
http://gcc.gnu.org/ml/gcc/2000-08/msg00035.html

Related and possibly interesting project: http://sourceforge.net/projects/cpptool

> - A pass through this library must not change the user coding style.
> This implies that comments have to be preserved and stay at their (relative) positions, as do other tokens.
> I'm not optimistic about external tools suitable for our code parsing needs, by my last requirement.

If you check out OpenC++ ( http://opencxx.sourceforge.net/ ) you'll find that it is specifically meant to be used as a parser backend for (C++) refactoring purposes. Basically, it is a pretty powerful toolkit that can be used to write a highly customized parser for C/C++ source code. So you don't really get a ready-to-run parser if you download their tarball; rather, you'll still have to tell the OpenC++ library what exactly it is supposed to do. Currently I am still pretty convinced that it should be easier to customize such a parser than to write your own parser from scratch. Any opinions?

Concerning the preservation of user style/formatting, I think this would mainly come down to extending the underlying parser to explicitly look for (and report): 1) whitespace, 2) tabs, 3) carriage returns (preferably for Win32 & *nix folks). Equally, the parser should also honor comments the same way (/**/ and //). That way, the parser would provide some sort of "meta" information for all tokens it encounters.
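As a rough illustration of that kind of per-token "meta" information, the sketch below attaches the preceding whitespace and comments to each token, so that rendering the token stream reproduces the input byte for byte. It is a toy under stated assumptions, not a C++ lexer and not OpenC++'s API: `Token`, `TokenStream`, `tokenize`, `render` and `checkRoundTrip` are hypothetical names, tokens are just identifier/number runs or single characters, and string literals containing `//` or `/*` would be mis-lexed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Token {
    std::string leading;  // whitespace and comments before the token, verbatim
    std::string text;     // identifier/number run, or one other character
    std::size_t row;      // 0-based line of the token's first character
    std::size_t column;   // 0-based column of the token's first character
};

struct TokenStream {
    std::vector<Token> tokens;
    std::string trailing;  // whitespace/comments after the last token
};

TokenStream tokenize(const std::string& src) {
    TokenStream out;
    std::size_t i = 0, row = 0, column = 0;

    // Move one character from src into sink, keeping row/column up to date.
    auto take = [&](std::string& sink) {
        const char c = src[i++];
        sink += c;
        if (c == '\n') { ++row; column = 0; } else { ++column; }
    };
    auto isWordChar = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    };

    for (;;) {
        // Collect spaces, tabs, CR/LF and both comment styles as "trivia".
        std::string trivia;
        while (i < src.size()) {
            if (std::isspace(static_cast<unsigned char>(src[i]))) {
                take(trivia);
            } else if (src.compare(i, 2, "//") == 0) {
                while (i < src.size() && src[i] != '\n') take(trivia);
            } else if (src.compare(i, 2, "/*") == 0) {
                take(trivia);
                take(trivia);
                while (i < src.size() && src.compare(i, 2, "*/") != 0) take(trivia);
                for (int k = 0; k < 2 && i < src.size(); ++k) take(trivia);
            } else {
                break;
            }
        }
        if (i >= src.size()) { out.trailing = trivia; break; }

        Token token{trivia, "", row, column};
        if (isWordChar(src[i])) {
            while (i < src.size() && isWordChar(src[i])) take(token.text);
        } else {
            take(token.text);
        }
        out.tokens.push_back(std::move(token));
    }
    return out;
}

// Concatenating trivia and token text gives back the original source.
std::string render(const TokenStream& stream) {
    std::string result;
    for (const Token& token : stream.tokens) {
        result += token.leading;
        result += token.text;
    }
    return result + stream.trailing;
}

// Convenience check: formatting and comments survive a tokenize/render pass.
void checkRoundTrip(const std::string& src) {
    assert(render(tokenize(src)) == src);
}
```

With a representation like this, a rename only has to change `text` on the affected tokens; the surrounding spacing, tabs, line endings and comments are carried along untouched.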
So, we could store positional information as well as style/formatting information for all encountered tokens, too. Consequently, it would even be possible to use such a database to do pretty advanced reformatting, possibly even rule-based (i.e. "reformat all methods for class XY in file header.cxx"). Basically, one could provide wizards to do a style analysis and optionally even allow developers to automatically enforce a certain [spacing/commenting] style within a project.

By the way, I spent last weekend checking out additional resources and found it actually pretty enlightening to see how other people (and projects) have attempted to approach their goals. Among others, I also found several new and interesting resources, for example: http://www.nobugs.org/developer/parsingcpp

Maybe it's a good time now to start populating the wiki with some of these things. I think it's a good idea to at least start defining some goals?

Michael, after all your posts, I'm not as pessimistic as before about the use of an existing parser :-) In fact, I'm overly optimistic that this tool could be coded well enough that the underlying parts (parser/database/structures/refactoring API) can be abstracted enough to not matter anyway. Maybe I overvalue the use of patterns and our ability to refactor our own code, but I think we should set an example of good and extensible coding styles.

I've thought again about the SQL backend, and must say I've "seen the light" as to its flexibility to handle most, if not all, refactorings. I'm however not sure it's the right tool for everything, but with some abstraction in its use, I don't think it matters anyway.

I'm not as verbose in this bug report discussion as Michael, but I do think it's time for this project to actually move on. I think our first goal should be to fill up the wiki with refactoring algorithms/possible (SQL?) implementations and project goals.
This will help us determine the required data structures and work our way up to a functional product. I won't post in this bug report anymore, unless it's directly related to KDevelop (i.e. KDevelop refactoring dialogs or whatever). http://carl.sourceforge.net

>> b) one should really not attempt to write a parser for something as complex
>> as C++, rather the key to success would be to have an open source compiler
>> such as gcc/openc++ provide its COMPLETE parse tree as some sort of
>> machine-readable (XML/RDF) output (there are already compilers (or patches
>> for gcc) to do exactly this (will post references later!))
>Nice theory, but in practice it does not work. gcc, as a compiler, can only
>process valid source files, and if you have some code segment "anything->"
>without right side and semicolon, gcc based solutions just fail. IDEs want to
>work even when the code is not completely valid, so gcc is no option. Besides,
>someone in fact has attempted to write a C++ parser, and has totally succeeded,
>which means better C++ support in KDevelop4 and a portable parser that also can
>be reused by other projects and makes refactoring easier.

Yet, even providing refactoring ONLY when the project is in a valid state would be really neat. In fact, even eclipse (for the simpler Java) requires a valid state in many cases. Finally, another pointer might be http://www.cs.umd.edu/~jfoster/cqual/

Another idea: replace all bool arguments of a function by an enum, so that func(true,false) becomes func(Class::doItSilently | Class::doItThisWay)

I hope that kdevelop will settle down eventually so we can work on polishing.

*** Bug 96406 has been marked as a duplicate of this bug. ***
*** Bug 1728 has been marked as a duplicate of this bug. ***
*** Bug 87718 has been marked as a duplicate of this bug. ***
*** Bug 123462 has been marked as a duplicate of this bug. ***

Since this seems to be a catch-all bug about refactoring, I add an idea here: move assignments in constructors to the initializer list. For example:

    Foo::Foo() { someMember = 12; }

should become:

    Foo::Foo() : someMember( 12 ) { }

This should take into account the ordering of the member variables in the class declaration, since otherwise GCC will spit out warnings.

Hello, this seems to be a long-running wishlist. I am Ramón Zarazúa, and implementing this for kdevelop 4 is my GSoC project for 2009. I can post important updates on this bug.

Not sure whether this should be changed to "invalid" or "resolved", but there are simply so many items and so much discussion in here that it doesn't have much worth. Some of the most important refactoring features have already been implemented, especially "rename ...", thus I'm marking this as fixed. For more specific refactoring ideas, new wishes should be opened, so there is a correct place to discuss, and something to mark as "resolved" once it's implemented.

@Ramon: You should post your updates in some more visible place, like for example a blog :)