Quantitative analysis of Boehm's GC Quantitative analysis of Boehm's GC

Quantitative analysis of Boehm's GC

  • 期刊名字:哈尔滨工业大学学报
  • 文件大小:893kb
  • 论文作者:GUAN Xue-tao,ZHANG Yuan-rui,GO
  • 作者单位:MicroProcessor R & D Center
  • 更新时间:2020-12-06
  • 下载次数:
论文简介

Journal of Harbin Institue of Technology (New Series), Vol. 10, No. 3, 2003Quantitative analysis of Boehm's GCGUAN Xue tao° ; ZHANG Yuan-rui, GOU Xiao gang, CHENG Xu( MicroProcessor R & D Center, Peking University , Beijing 100871 , China, * E-mail:chengxu@ mprc. pku. edu. cn)Abstract: The term garbage collection describes the automated process of finding previously allocated memorythat is no longer in use. in order to make the memory available to satisfy subsequent allocation requests. Wehave reviewed existing papers and implementations of GC, and especially analyzed Boehm's C codes,which isa real-time mark-sweep GC running under Linux and ANSI C standard. In this paper, we will quantitatively an-alyze the performance of diferent configurations of Boehm' 8 collector subjected to diferent workloads. Reportedmeasurements demonstrale that a refined garbage ollector is a viable alternative to traditional explicit memorymanagenent techniques, even for low-level languages. It is more a trade-off for certain system than an ll-or-nothing proposition.Key words: conservative; garbage ollection; incremental; mark-sweep; real-timeCLC number: TP391Document code: AArticle ID: 1005-91 13(2003 )03-0259-060 INTRODUCTIONthe performance comparison between systems with cer-tain collection and one with C' s traditional malloe andGarbage collection is a hotspot at the end of thefree though the comparison is considerably dependenteighties and the beginning of the nineties in the twenti-on the underlying system and the relevant benchmarks.eth century, and most papers have emerged for variousIn this paper, we report the quantitative performn-collection strategies.ance analysis of several different configurations of Boe-Traditional techniques include reference counting,hms ineremental mark-sweep garbage collection systemnark-sweep and copyingAnd many variations arsubjected to several different workloads. Reportedpossible. Advanced techniques have been developed tomeasurements demonstrate that a refined garbage col-ameliorate the situation in two different ways: first, bylector is a viable altemative to traditional explicit mem-spltting the work traditionally performed in a singleory management techniques, even for low-level langua-collector invocation into finer-grained parts; second ,ges like ANSI C.by exploiting statistical observations on object life-times. Generational collection, hardware assisted col-1 BOEHMS REAL TIME COLLECTORlection and incremental garbage collection are all ad-vanced choices. Generally, different collectors showFor real-time computer systems, high reliabilitydifferent advantages and disadvantages.and fault tolerance of both software and hardware are ofGarbage collection is not an all-or-nothing propo-utmost importance. Real-time software engineer is moresition. It is a trade-off for certain system. On oneconcerned with ensuring the system 's reliability thanhand, it can reduce the cost of development of a soft-optimizing its throughput.ware system and the occurrence of dynamic memoryTo meet real-time system requests, the CC re-management errors in both prototype and productionsearch community has put forth a lot of proposals for re-components. It can offer storage throughputs that ex.al-time garbage collection. The real-time engineer mustceed the capabilities of traditional memory managementbe able to derive the worst-case execution time for eachtechniques and enhance the sofware modeling and re-task in the systerm. Tasks interact with the garbage col-usability. But, on the other hand, it may increase thlector by allocating new memory and by fetching ancormplexity and reduce the overall system performance.storing to memory locations contained within heap-allo-And in the interactive and real-time system, it maycated objects. The delays associated with memory allo-suspend applications periodically.cation and memory access must be bounded by con-As is known to all, performance is essential tcstants中国煤r llow the real-timesystem software and its improvement needs expenses ofschedu化二be analyzed or pre-space, time or hardware. Many papers have showedcomputYHCNMHGrunsonCPU,its.Received 2003 -09 -05.259●Journal of Harbin Institute of Technology (New Series), Vol. 10, No. 3, 2003computation must be modeled as a periodic task withdoes the most frequent allocations and the requiredbounded execution time and a fixed period of execu-memories are always small. The variance of its malloction. The performance impact of garbage collection onbytes is the smallest, but the variance of its malloethe complete system must be small enough that it doestime is the biggest. Cksum, Diff, and Sort representot prevent system from meeting its real-time dead-for the programs, whose malloc times are of a lttledifference, but the memory requirements are distinct,Boehm’s collector is a real-time collctor. Ifrom low to high. Eppstein does not do much malloc,mainly uses mark -sweep algorithm to reclaim garbagewhile Pei does frequent malloc. But the variance ofproduced by user or system, and do allocations dynam-their memory requirements is small , respectively.ically.Tab.1 Benchmark introductionThis collector has two important features : conser-Program' V ersionIntroductionvative and incremental. It can operate with only mini-mal information about the layout of the client programsGNU groff - front end for the groff documentdata. Instead of relying on compiler provided informa-foratting system. Normaly it runs the torfftion on the location of pointers, they assume that anyGroff1.17.2 program and a post processor appropriatebit patterm that could be a valid pointer is a valid point-for the slected device. Called Subprograms:er. It can also take care to interleave collections withtroff version 1.17.2. grops version 1.17.2the main computation, in order to avoid distractingA block sorting file compressor. It com.pauses for both real-time and interactive applications.esss files using the Burows-Wheeler blockConceptually,Boehm’s ollector operates roughlysorting text compression algorithm, andin four phases : Preparation phase clears all mark bits,Bzip210.1huffman coding. Generally compression isindicating that all objects are potentially unreachable.considered better than more conventionalMark phase marks all objects that can be reachable viaL27MZ78-based compression. Performancechains of pointers from variables. Sweep phase scansis close to the PPM family of statisticalthe heap for inaccessible, and hence umarked, ob-compression.jects, and returms them to an appropriate free list forBelongs to lextutils 2.0.14. Sort lines of textreuse. And finalization phase inserts unreachable ob-Sort2.0.14jects into the queues, which have been registered forBelongs to textutils 20.14. Checksum andfinalization outside the collector.Chksum 2.0.14count bytes in a file.2 PERFORMANCE OF BOEHM'S GCBelongs to difuils 2.7.2. Find differ-Diff2.7.2ences between two flesEarly versions of this collector were developed asSimulates the detection and de screening ofa part of research projects supported in part by the Na-Halftone ilustrations algorithm, it takestional Science Foundation. and the Defense AdvancePei1.254 grey-scale images as inputs and produces aResearch Projects Agency. Much of the code was re-de -screened gray-scale output image for ec-written by Hans-J. Boehm, at Xerox PARC.ah halfone iluslration detected.2.1 Benchmark SelectionImplementation of Eppstein' S AigorithmRecently, we have ported several allocation- inten-Eppstein1.that eenunerates (by increasing weight) thesive C programs to our garbage- collected system.K shorteet paths in weighted graphs.Tab. 1 is the description of the benchmarks.In order to analyze these programs in our environ-The malloc time in the table is measured withoutment, it is necessary to :(1)Add ollection's library into the compiling li-garbage collction. And they are all within 10 ns,braries of certain program. Also, the program shouldwhich is very short. The least malloe time in the meas-ured system takes roughly 3 ns.include collection’s header.(2) Replace all occurrences of malloc with appro-2.3 Comparison between GC and NonGCThe allocator of traditional explicit memory man-priate invocations of new.agement is distributed as part of the GNU libe distribu-(3 ) Replace all occurrences of free with delete.2.2 Benchmark Allocation Statistics' 中国煤化工diferet aleaion algo: of the requested memo-Tab.2 shows the behavior of each benchmarkC NM H Gise more time and moreMaloc. These benchmarks show distinguished behaviorsIYHspace.of the memory allocations. Bzaip2 does a very lttleTab. 3 shows program execution time for GC andmalloc, but allocating a very large memory each timeNonGC.and the variance of the malloc bytes is also big. Groff●260.Journal of Harbin Institute of Technology (New Series), Vol. 10, No. 3, 2003User indicates user time, system is the systemdistincttime and wall is the total time. Program with garbageMINHINCR = 16collection takes a litler longer than without GC. ForV Minimum heap increment, in blocks ofexample, Diff uses 0.02 s to do malloc and uses 0.05HBLKSIZEto do malloe with GC.MAXHINCR =512Here we have program size of GC and NonGC.V Maximum heap increment , in blocksProgram size comparison: difference mean =TIME_ LIMIT =5041 443 bytes.V pause timesTab. 4 shows the program size with GC. Obvious-GC incremental =0ly, the size becomes larger than the traditional malloc.v Using incremental collectionThe difference mean is 41 443 on average.CC_ full freq =4Tab.2 Benchmark llocation statistisV Number of partial collections between fullBzip2 Cksum Diff Eppetein Grof Pei SortcollectionsMallocGC quiet =056115209. 94E5 7 54426TimesV Disable statistics outputMalloe bytes66.84E5 581424 026 18 342 1. 94E5GC_ word GC free. space_ divisor =4neanTab.4 Program size of GC and NonGCMalloc bytes 33.60E6 1 0244096 48000 24576 32776 2. 45E7Bzip2DifEppsteinMallc bytes g12Clibe89 049251 58926 354minGe129 690293 97867 159Malloc bytees 7.52E6 9113 16358 80523 1.84E7 2.58E6 2.45E7difference40 64142 38940 845SUMalle bye2.08E12 20520 3.31E5 1.15E8 2495 s. 28E64.74E12PeiConvolvePeiExtractPeiMakevarianceGlibe317 430144 744147 422Malloetime 9.72 3.88 3.80 6.52 3.76 3.94 3.74359 035186 349188 995mean/ ns41 60541 573Malloe timemax/ns253l57 14487 82We try to make sure that we allocate at least NMalloc time2.98 2.98 2.98 2.98 2.98 2.98 2. 98GC_ free_ space_ divisor bytes between collections ,min/nswhere N is the heap size plus a rough estimate of theMalle time817485root set size. Increasing its value will use less space ,but more collction time. Decreasing it will appreciablyTab.3 Program execution time of GC and NonGCdecrease collection time at the expense of space. GCfree_ space_ divisor = 1 will effectively disable colle-tions.User SystemWallUserSystemThe fllowing configurations are all compared with2.820.053.090.040.270.01benchmarks.G2.872.990.060.101. Findleak: #undef FIND LEAK2. Noninter: #undef AlL_ INTERIOR_ POINT-EpposteinPeERSUser System wall3. Nonstub: #undef STUBBORN ALOCClibe 0.010.06 0.640.301.184. Heapinc8: #define MINHINCR 160.020.760.324.135. Heapinc32: #define MINHINCR 166. Time60: #define TME LIMITT 607. Time80: #define TIME LIMIT 802.4 Behavior of Boehm ' s Collector8. Freespc2: GC. free. space_ divisor =2We have following configures here with initial9. Freespc3: GC free space divisor =3their values:10. Freespe5: GC free space. _divisor =5STUBBORN ALLOCV Define stubborn allocation primitives中国煤化工: divisor=6_divisor =7FIND LEAKALL INTERIOR _POINTERS5CNMH_divisor=8MERGE SIZES14. Increment: GC incremental = 115. Freq2: GC full freq =2 & GC_ incrementalV Round up some object sizes, so that fewer●261.Jourral of Harbin Institute of Technology ( New Series), Vol. 10, No.3, 2003=1so increases. For with the upper time increases, the16. Freq3: GC_ full freq =3 & GC. incrementalmarking time may also increase to push more objects in=he mark stack. So failure to finish a collection may be17. Freq5: GC_ full freq =5 & GC_ incrementalincreased. This causes heap to grow lager.FREO. Ia∞p心18. Freq6: GC_ full freq =6 & GC_ incremental19. Freq7: GC_ full_ freq=7 & GC incremental20. Freq8: GC_ full_ freq=8 & CC_ incrementalAnd we only choose four programs to compare:Bzip2, Diff, Pei and Eppstein.Statistic 1: max heap size for each configurationFigs.1 ~4 describe the max heap size of each pro-gram during. garbage collction. FREQ reflects theFig.3 Max heap size for Peiheap increasing times, and heap size is the multipleWhen GC free space, divisor increases, the fre-times of 65 536. For example, benchmark has heapquency of collection. is also increased, but each time ,size 64,which is actually 64 * 65 536 bytes.From tables, we know that if the collector allowsthe collector may not colleet enough areas for allocationincremental garbage collection, then the heap will be-needs, thus, expanding heap a litle as a result in pro-come about 2 ~ 5 times larger than the heap with world-gram' Eppstein.stop-collection. In the incremental mode,the. collectorcoreclaims less garbage than in the .stop-the-world mode ,so it makes heap grow faster. The configuration Freq3,Freq5,Freq6, Freq7 and Freq8 are all in incrementalmode. The result is most obvious in program Eppstein., HHE) hespsoeheap siczeFig. 4 Max heap size for EppsteinStatistic 2: collection timesWe look at program Bzip2 and Diff under configu-ration Freq2. The general program may trigger 5 timesof full mark, and the last column is the average bytes515020allocated between each invocation ( as shown inTab. 5).Fig. 1 Max heap size for Bzip2Tab. 5 Collection time of Bzip2 and DifProgramConfigColeetion times Bytes AllocatedBzip2Freq23600 140Diff2460 744Statistic 3: reclaimed bytesFig. 5 mainly shows the reclaimed bytes duringeach collection. In Fig. 5,the negative number does中国煤化工er than zero, but just ahasp si-ldicates that we reclaimYHC N M H Grich may not satisfly theFig. 2 Max heap size for Diffapplying of allocation.Statistic 4: object size distributionIf the pause time increases to 80 ms, the heap al-This measurement is on the benchmark. In most万数据Journal of Harbin Instiute of Technology (New Series), Vol. 10, No. 3, 2003cases, an objeet' s size is smaller than 400 bytes, andstead of relying on compiler provided information on thea few objects are lager. This reflects the original sizelocation of pointers, they assume that any bit pattermthat required by the user. In the collector, we actuallythat could be a valid pointer is a valid pointer. Gener-merge size first and align it before the allocation takesally, this is safe only under the assumption that objectsplace. And this can reduce the distinctions of objects’do not move.size, thus saving spaces( as shown in Fig. 6).It uses a two-level tree data structure to aid in fastpointer identification. Two kinds of objects in the col-lector are called pointer-free object and normal object1 500respectively. Pointer-free object is also called atomicobject, which contains no pointers. Normal object is营1000also called composite object, which contains pointers.Fig.9 is comparison between atomic objects and com-500position objects in use. Fig. 9 relects the result of pro-gram Bzip2. In the program, the atomic objects are-12500000 137000000 2875000000 4375000000nearly 0 bytes, meaning that most objects containspointers. This may produce overhead for conservativeFig. 5 Reclaimed bytes of each collectiongarbage collctor to check the candidate pointers.rnddl 3stennf开200|150|0o{n北DiFig. 6 Object size distributionStatistic 5: wasted areasWe know that the actually object size is largerthan the original one for merging and alignment rea-sons. So each allocation can produce wasted areas.There is a phenomenon, that the average allocation sizeis lager, the wasted area size is smaller. The mallocbytes mean of Bzip2 is 683 989,and itsmax wastedarea is4 732 bytes. The malloc bytes mean of Diff is180 484 4732 :872 1:536142. 232 4, the smallest, but its max wasted area canbe up to 11 536 bytes. The malloc bytes mean of Pei isPenn- Bbun2 .11一 rpe cali342. 040 5 and its wasted areas are mostly small. AndEppstein has the smallest wasted areas. Compared withFig.7 Comparison of wasted areasprogram' s malloc and free behaivor analysis, we couldconclude that more allocations means more wastedStatistic 8: false pointers measurementbytes( as shown in Fig. 7).Boehm' s collector maintains two different kinds ofStatistic 6: time measurementsblacklisting. A page may be black listed for interiorCollection time is the time of a whole garbage col-pointer references if it was the target of a near misslection. The result is that it mainly takes 20 ms to fin-from a location that requires interior pointer recogni-ish this. And the initial time for preparation phasetion, e. g. the stack. If the near miss came from atakes less than 10 ms. Some times finalizer is evoked ,sourcepointer recognition,and it also takes less than 10 ms. They are all withit is bl中国煤化工rage blacklisted in50 ms defined in bencbmark( as shown in Fig. 8). .this wa;YHCNMHGbject,80longasitStatistic 7: object kinds comparisonis not tne IISL Page UI a large uujcct.Boehm' s collector is a conservative garbage colThe configuration is noninter. False pointers fromlector, which can operate with only minimal informa-the stack are more dangerous, but they are not many,tion about the layout of the client program's data. In.263.Journal of Harbin Institute of Technology (New Series), Vol. 10, No. 3, 2003luckily. Tab. 6 shows the fact.Tab. 6 Statistics of false pointersBzip2Dif EppsteinPe3 CONCLUSIONSFalse pointers with-9163Now, there is no pure-mono-GC, i.e. ,any mod-out interior pointererm GC is the integration of several GC mechanisms andsome tricks. Every method is only an idea inside theFalse pointers from2the stackreal GC.As discussed above, we have known the featuresof the garbage collectors. We can ' t imagine a colectorOur future. works are concentrating in the deeply a-have all advantages of others, so we must make trade-nalysis of the Boehm s source code and performance.off. This trade-off is based with the target, or the per-We plan to do the collection research on our embeddedformance to which we attach great importance.A plat-system, and there are two directions , described as fol-form can evaluate some performance, but for the con-lowing:crete system,such as embedded system and hardware-(1) Adding GCMM to the MMU, and implemen-assisted garbage-collected system, it can' ↑describeting and analyzing hardware-assistant collectors, espe-the distinguishing parts, or can' t satisfy the require-cially the real-time features.ments, then the results are questionable, and it just(2) Inserting collectors to the embedded operatingpossible is our future work.system afforded with the compiler, and analyzing the 0-erall performance,especially the real-time features.30Both directions are practical works, and will formthe indeed useful materials.《20+++++++.++HReferences:[1] BARRA K. Mark/Sweep Compaction for Substantially nes-鲁10++ted Beta Objects, NCC-Note DTEK/03/88[R].[s. 1.]:Norwegian Computing Center, 1988.[2] BARTLETT J F. Compacting Garbage Collection with Am-biguous Roots, Technical Report 88/2[R].[s. I. ]: DECWestern Research Laboratory, 1988.3] HOSKING A L, ELIOT J,MOSS B, et al. Stefanovic: Acomparative performance evaluation of write barrier imple-1号89gmentations[J]. 0OPSLA 92 Proceedings, ACM SICPLAN00Notices, 1992 ,27(10): 92 - 109. .[4] LIEBERMAN H, HEWITT C. A real-time garbage ollc-tor based on the lifetimes of objects[ J ]. Communications ofFig.8 Collection timethe ACM,1983 ,26(6): 419 -429.[5] MORRIS L F. A time-and spac-fficient garbage collectioncomposition objectsalgorithm[ J]. Communications of the ACM, 1978 ,21(8):3000000662 - 665.6] WILSON P R. Uniprocessor garbage cllection techniques[ A]. Proceedings of the Intermational Workshop on Memo-200000ry Management[C].[s. l.]:[s. n.], 1992. 1 -42.[7] ZORN B. Comparing mark and-sweep and stop and copy100000garbage collection[ A]. Proceedings of the ACM Symposi-um on Lisp and Functional Programming[C].[s. L.]:[s.atomic objectsn.] , 1990. 87-98.3DATEFig.9 Object kinds comparison中国煤化工MYHCNMHG.264.

论文截图
版权:如无特殊注明,文章转载自网络,侵权请联系cnmhg168#163.com删除!文件均为网友上传,仅供研究和学习使用,务必24小时内删除。