jdk – 生命是一次奇遇

[Repost] PhantomReference and Finalization

jojoster / 2016-04-05, 2016-11-14 / Tech

Soft, Weak, and Phantom References

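Soft references (SoftReference) are typically used for in-memory caches. The JVM tries to keep softly reachable objects in memory and clears them only when memory runs low. According to the javadoc, neither the timing nor the order of clearing is guaranteed; implementations are merely encouraged to avoid clearing recently created or recently used references first.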
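To make the caching use case concrete, here is a minimal sketch of a cache whose values are held through soft references. The class name `SoftCache` and the `loader` parameter are hypothetical, chosen only for illustration; the sketch is not thread-safe and never removes stale map entries.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache: values may be discarded by the GC when memory runs low. */
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();
    private final Function<K, V> loader; // recomputes a value that is missing or was cleared

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        // get() returns null once the collector has cleared the reference
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

A caller always gets a value back; whether it came from the cache or was recomputed depends on what the collector has done in the meantime.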

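Weak references (WeakReference) are the type I use most often. Typical uses are weakly held listeners, or attaching extra information to an object you do not own (with a WeakHashMap). They are very helpful for reducing coupling between classes.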

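Translator's note: at this point I wondered why WeakHashMap reduces coupling. Consider the alternative. WeakHashMap takes care of releasing entries by itself; without it, the implementation becomes considerably more complex. Entries must be removed from the Map once the object is no longer in use, so whoever manages the container has to expose a cleanup method for callers, or introduce a listener mechanism. For widely used API code (the JDK API, for example), adding such methods can make the API noticeably harder to use.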
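The "extra information" use case can be sketched as follows. The class `Annotations` and its methods are hypothetical names for illustration. Note that WeakHashMap compares keys with equals()/hashCode(), is not thread-safe, and that a value must not strongly reference its own key, or the entry can never be released.

```java
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical side table that attaches notes to objects it does not own. */
class Annotations {
    // Keys are held weakly: an entry becomes eligible for removal once
    // nothing else strongly references its key. No explicit remove() is needed.
    private final Map<Object, String> notes = new WeakHashMap<>();

    void annotate(Object target, String note) {
        notes.put(target, note);
    }

    String noteFor(Object target) {
        return notes.get(target);
    }

    int size() {
        return notes.size();
    }
}
```

The annotated class needs no cleanup method and no knowledge of the table, which is the decoupling the note above describes.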

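Phantom references (PhantomReference) are suited to pre-mortem cleanup before garbage collection, such as releasing resources. Unfortunately, many developers use finalize() for this, which is not a good idea. Unless finalizers are written carefully with respect to which thread runs them and when, they can seriously hurt performance and even compromise the application's data integrity.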

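The constructor of a phantom reference requires a ReferenceQueue, on which the reference is enqueued once its referent becomes "phantom reachable". "Phantom reachable" means the object is unreachable except through the phantom reference itself. The confusing part is that although the phantom reference keeps holding the referent in a private field (unlike soft and weak references, which are cleared), its get() method always returns null. This ensures that an object in this state can never become strongly reachable again.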

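From time to time you can call poll() on the ReferenceQueue to check whether the referents of any PhantomReferences have become phantom reachable. To do anything useful, a common approach is to subclass java.lang.ref.PhantomReference so that the reference itself carries the resources that must be released before garbage collection. On the JDK versions this article was written for, the referent's memory is reclaimed only after the phantom reference is cleared or becomes unreachable itself (since Java 9, phantom references are cleared automatically when they are enqueued).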
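The polling pattern described above might look like the sketch below. `PhantomDemo` and `drain` are hypothetical names. `System.gc()` is only a hint, so the number printed at the end is not guaranteed; the phantom reference object itself must stay reachable or it will never be enqueued.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class PhantomDemo {
    /** Drains the queue without blocking; returns how many references were found. */
    static int drain(ReferenceQueue<Object> queue) {
        int found = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            // A real program would release the associated resource here.
            ref.clear(); // lets the referent's memory be reclaimed on older JDKs
            found++;
        }
        return found;
    }

    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object resource = new Object();
        PhantomReference<Object> ref = new PhantomReference<>(resource, queue);

        System.out.println(ref.get()); // get() on a phantom reference is always null

        resource = null;   // drop the only strong reference
        System.gc();       // a hint only; collection may or may not happen
        Thread.sleep(100);
        System.out.println("references enqueued so far: " + drain(queue));
    }
}
```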

`PhantomReference` and `finalization` in Detail

The most common misconception about PhantomReference is that it was designed to "fix" the object resurrection problem caused by finalizers. For example, one often reads:

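Phantom references avoid a fundamental problem with finalize(): a finalize() method can "resurrect" its object by creating a new strong reference to it. As a result, an object that overrides finalize() has to survive at least two separate garbage collection cycles before it is actually reclaimed.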

However, an object can be resurrected through a phantom reference as well. Consider the following code:

Reference ref = referenceQueue.remove();  // ref is our PhantomReference instance
Field f = Reference.class.getDeclaredField("referent");
f.setAccessible(true);
System.out.println("I see dead objects! --> " + f.get(ref)); // This is obviously a very bad practice.

				
					

					

			

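So the Reference type does hold a reference to the dead object through its referent field (Reference#referent); the garbage collector simply makes an exception for this particular field. This also contradicts another common belief: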

A phantom reference is enqueued only when the object has been physically removed from memory.

So which is it? The javadoc says:

Phantom references are most often used for scheduling pre-mortem cleanup

So if phantom references were not designed to fix finalizer resurrection (a serious problem, as Lazarus, Jesus, and many others have shown), what are they for?

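finalize() is invoked on a thread owned by the JVM (in HotSpot, a dedicated finalizer thread), so even a simple single-threaded application may run into concurrency problems, for example by forgetting to guard shared state with synchronization. With phantom references, you choose the thread that dequeues the references (in a single-threaded program, that thread can do so periodically).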

What About WeakReference?

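Weak references also seem suitable for pre-mortem cleanup. The difference lies in when the reference is enqueued: a PhantomReference is enqueued after finalization, a WeakReference before. For objects without a non-trivial finalize() method, this makes no difference.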

For objects that do perform cleanup in finalize(), things are a little different.

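(PhantomReference's get() method always returns null.) You therefore have to store whatever state the cleanup needs yourself. For example, to clean up an array entry by setting it to null, you need to keep track of the entry's index. For this kind of work, subclass PhantomReference and create instances of that subclass.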
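The array-index example can be sketched like this. `SlotReference`, `SlotTable`, and the `String` stand-in for a resource are all hypothetical; the sketch assumes a slot is not reused before its old reference has been cleaned up.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

/** Phantom reference that remembers which array slot belongs to its referent. */
class SlotReference extends PhantomReference<Object> {
    final int slot;

    SlotReference(Object owner, ReferenceQueue<Object> queue, int slot) {
        super(owner, queue);
        this.slot = slot;
    }
}

/** Hypothetical table of per-object resources, cleaned up by polling. */
class SlotTable {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final SlotReference[] refs; // keeps the references themselves reachable
    private final String[] resources;   // stand-in for whatever needs releasing

    SlotTable(int capacity) {
        refs = new SlotReference[capacity];
        resources = new String[capacity];
    }

    SlotReference register(Object owner, int slot, String resource) {
        refs[slot] = new SlotReference(owner, queue, slot);
        resources[slot] = resource;
        return refs[slot];
    }

    boolean isOccupied(int slot) {
        return resources[slot] != null;
    }

    /** Call periodically from the thread that owns the table. */
    int cleanUp() {
        int cleaned = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            int slot = ((SlotReference) ref).slot; // get() would only return null
            resources[slot] = null;
            refs[slot] = null;
            ref.clear();
            cleaned++;
        }
        return cleaned;
    }
}
```

Because cleanUp() runs on whichever thread calls it, the table needs no locking as long as that is the only thread touching it.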

Let's go one step further.

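Imagine a developer who writes a cleanup hook for an object (through finalize() or through a [Weak|Phantom]Reference), and a method is invoked on that object while the only strong reference to it lives on the thread stack (a local variable, for example). The following can then happen: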

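For performance reasons, the JVM may determine that the object is about to become unreachable, so the cleanup may run concurrently with the method that is still executing. This can lead to unpredictable results, for instance when the cleanup modifies internal state that the method also uses. The situation is very rare, and it can be fixed as follows: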

Object method() {
    synchronized (this) {
        // do work here
    }
    return result;
}

public void finalize() {
    synchronized (this) {
        // do work here
    }
}

				
					

					

			

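This applies only to objects that are referenced solely from the thread stack and for which one of the following holds:
– The object overrides finalize().
– A [Weak|Soft|Phantom]Reference points to the object and is enqueued in a ReferenceQueue that another thread dequeues from.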

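In summary, the safest cleanup mechanism is a PhantomReference with a ReferenceQueue, with the cleanup performed on the same thread. If a separate thread does the cleanup, synchronized blocks are required.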

Exploring Java, Part 1: The IO Class Library

jojoster / 2015-03-09, 2016-11-14 / Tech

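After a review at work, I suddenly realized how much I still do not understand even in Java, the language I use most. That is a real regret, so today I am taking the time to write these bits down and share them.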

The first batch of topics I want to cover:

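Java IO: its overall structure and evolution, along with a look at some resumable-transfer code written by a colleague.
Java's exception mechanism: checked versus unchecked (runtime) exceptions.
The Java Commons packages, especially some thoughts on the collection package. Container utilities, after all, rarely go beyond a few basics: storing, retrieving, sorting, safety, validation, and so on.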

Enough preamble. Let's start today's topic and look at the overall structure of IO.

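Architecturally, the IO system has two major parts: IO and NIO ("New IO", which adds non-blocking operations). Classic IO predates JDK 1.4; NIO arrived in JDK 1.4, and parts of the IO code were reworked on top of it. For example, FileInputStream gained a getChannel() method for NIO support, and InputStreamReader (and therefore FileReader) decodes through a StreamDecoder built on java.nio.charset.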

1. Think in IO

In terms of implementation, IO falls into two broad categories, byte streams and character streams:

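Byte streams. Files are read and written in units of bytes; put plainly, you work with byte and byte[]. Treated as unsigned integers, normal return values of read() fall within [0, 255] (with -1 signalling end of stream). A bounded range of values has its advantages; a representative one is that a simple zip-like compressor can be built on a stream using a Huffman tree. Operating one byte at a time is inefficient, and buffering improves throughput considerably. Byte streams do have one problem: when handling text files, dealing with the encoding takes a lot of extra code, as in this example: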
```
// BUFFER_SIZE is a constant defined elsewhere in the original code
try (FileInputStream is = new FileInputStream("F:\\books\\base\\vim常用指令.txt")) {
    byte[] buff = new byte[BUFFER_SIZE];
    int readSize;
    while ((readSize = is.read(buff)) != -1) {
        System.out.println(readSize);
        // Decode only the bytes actually read. A multi-byte GBK character
        // split across two reads will still come out garbled.
        System.out.print(new String(buff, 0, readSize, "GBK"));
    }
}
```

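Character streams. These operate in units of characters; internally a Reader uses a char or char[] as its buffer, which makes text files much easier to handle. Unless a charset is specified, the platform default encoding is used. It took me a long time to find the code, which is buried deep: from Reader to InputStreamReader, then StreamDecoder, then Charset in the nio package. In the end the file.encoding system property takes precedence; it can also be read via System.getProperties(). On Chinese Windows 7 the value I got was "file.encoding=GB18030".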

/**
     * Returns the default charset of this Java virtual machine.
     *
     * <p> The default charset is determined during virtual-machine startup and
     * typically depends upon the locale and charset of the underlying
     * operating system.
     *
     * @return  A charset object for the default charset
     *
     * @since 1.5
     */
    public static Charset defaultCharset() {
        if (defaultCharset == null) {
            synchronized (Charset.class) {
                java.security.PrivilegedAction pa =
                    new GetPropertyAction("file.encoding");
                String csn = (String)AccessController.doPrivileged(pa);
                Charset cs = lookup(csn);
                if (cs != null)
                    defaultCharset = cs;
                else
                    defaultCharset = forName("UTF-8");
            }
        }
        return defaultCharset;
    }
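For comparison with the byte-stream example, here is a minimal sketch of reading text through a character stream with an explicitly chosen charset, which avoids depending on the platform default shown above. `TextReading` and `readAll` are hypothetical names; I/O errors are rethrown unchecked purely to keep the sketch short.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TextReading {
    /** Reads the whole stream as text, decoding with the given charset. */
    static String readAll(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            // The decoder keeps partial multi-byte sequences between reads,
            // so characters are never split the way raw byte chunks can be.
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println("platform default: " + Charset.defaultCharset());
        byte[] data = "vim \u5e38\u7528\u6307\u4ee4".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
    }
}
```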

Now for the byte streams in detail.

1. InputStream and OutputStream are two abstract classes; every byte-oriented stream extends one of these base classes.

FileInputStream opens a stream on a local file. It is commonly used and has three constructors:
public FileInputStream(File file)
public FileInputStream(String name)
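public FileInputStream(FileDescriptor fdObj) deserves a note: it is rarely called directly. A FileDescriptor is essentially a handle to an open file, so one file stream can be created from another stream's descriptor. The two streams are then effectively the same stream: once one is closed, the other can no longer be read.
PipedInputStream must be used together with PipedOutputStream, across two or more threads, much like a producer/consumer model: PipedOutputStream writes data into a shared buffer array and notifies PipedInputStream to read it. Two caveats: (a) read() on PipedInputStream blocks the current thread while the buffer is empty, so calling it on the main thread makes the program hang until data arrives; (b) if the thread that owns the PipedOutputStream terminates, the pipe becomes "broken" and PipedInputStream's read() throws an exception. "A pipe is said to be broken if a thread that was providing data bytes to the connected piped output stream is no longer alive."
FilterInputStream cannot be instantiated directly (its constructor is protected) and is the parent of BufferedInputStream and others. Its subclasses could have been written without it, since nearly all of its methods simply delegate to the wrapped stream. Its purpose is mostly to express an abstraction: the data returned by an InputStream is repackaged or processed in some way, for varying reasons, hence the varying subclasses.
LineNumberInputStream is a failed attempt at bridging byte and character streams and has been deprecated. The reason is that it detects line breaks on the raw byte stream without regard to encoding. Whether or not it can be made to work, it is flawed at the level of abstraction; the feature belongs with the character streams, where LineNumberReader provides it. See the LineNumberReader write-up for details.
DataInputStream reads bytes directly from the underlying stream and assembles or converts them into primitive types, as in this method: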
```
public final int readInt() throws IOException {
    int ch1 = in.read();
    int ch2 = in.read();
    int ch3 = in.read();
    int ch4 = in.read();
    if ((ch1 | ch2 | ch3 | ch4) < 0)
        throw new EOFException();
    return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));
}
```
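Integer types can be assembled this way. For float and double, the Float and Double classes provide native methods (Float.intBitsToFloat and Double.longBitsToDouble) that reinterpret the assembled bits as an IEEE 754 value. This is a bit-level reinterpretation rather than anything operating-system specific; the byte order on the stream is always big-endian.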
```
public final double readDouble() throws IOException {
    return Double.longBitsToDouble(readLong());
}
```
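The only method with a relatively complex implementation is readUTF(): it reads a two-byte length followed by that many bytes of modified UTF-8, so the data must have been written in exactly that format by DataOutputStream's writeUTF(). In practice DataInputStream should be paired with DataOutputStream; otherwise it is of limited use.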
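The pairing of DataOutputStream and DataInputStream can be illustrated with an in-memory round trip. `DataRoundTrip` and its two methods are hypothetical names; the values must be read back in exactly the order and with exactly the types they were written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class DataRoundTrip {
    /** Encodes an int (4 bytes), a double (8 bytes), and a string (2-byte length + modified UTF-8). */
    static byte[] write(int i, double d, String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(i);
            out.writeDouble(d);
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Decodes the fields in the same order they were written. */
    static String read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int i = in.readInt();
            double d = in.readDouble();
            String s = in.readUTF();
            return i + " " + d + " " + s;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] encoded = write(42, 1.5, "hi");
        System.out.println(encoded.length + " bytes: " + read(encoded));
    }
}
```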

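BufferedInputStream allocates a buffer of 8192 bytes by default to improve efficiency. The API is used in exactly the same way; it simply reduces the number of reads against the underlying source. It wraps an ordinary InputStream and calls that stream's read() to refill the buffer only once the buffer has been exhausted, which is why reading through BufferedInputStream is more efficient.
An interesting detail: the class uses an AtomicReferenceFieldUpdater to update and replace the volatile byte[] buffer; its compareAndSet method performs an atomic compare-and-update.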

/**
     * Atomic updater to provide compareAndSet for buf. This is
     * necessary because closes can be asynchronous. We use nullness
     * of buf[] as primary indicator that this stream is closed. (The
     * "in" field is also nulled out on close.)
     */
    private static final 
        AtomicReferenceFieldUpdater<BufferedInputStream, byte[]> bufUpdater = 
        AtomicReferenceFieldUpdater.newUpdater
        (BufferedInputStream.class,  byte[].class, "buf"); // create the atomic updater
...
/**
     * Fills the buffer with more data, taking into account
     * shuffling and other tricks for dealing with marks.
     * Assumes that it is being called by a synchronized method.
     * This method also assumes that all data has already been read in,
     * hence pos > count.
     */
    private void fill() throws IOException {
        byte[] buffer = getBufIfOpen();
        if (markpos < 0)
            pos = 0;        /* no mark: throw away the buffer */
        else if (pos >= buffer.length)    /* no room left in buffer */
            if (markpos > 0) {    /* can throw away early part of the buffer */
                int sz = pos - markpos;
                System.arraycopy(buffer, markpos, buffer, 0, sz);
                pos = sz;
                markpos = 0;
            } else if (buffer.length >= marklimit) {
                markpos = -1;    /* buffer got too big, invalidate mark */
                pos = 0;    /* drop buffer contents */
            } else {        /* grow buffer */
                int nsz = pos * 2;
                if (nsz > marklimit)
                    nsz = marklimit;
                byte nbuf[] = new byte[nsz];
                System.arraycopy(buffer, 0, nbuf, 0, pos);
                // Compare and swap: install nbuf only if buf is still the array
                // read above; otherwise leave buf untouched and fail.
                if (!bufUpdater.compareAndSet(this, buffer, nbuf)) {
                    // Can't replace buf if there was an async close.
                    // Note: This would need to be changed if fill()
                    // is ever made accessible to multiple threads.
                    // But for now, the only way CAS can fail is via close.
                    // assert buf == null;
                    throw new IOException("Stream closed");
                }
                buffer = nbuf;
            }
        count = pos;
        int n = getInIfOpen().read(buffer, pos, buffer.length - pos);
        if (n > 0)
            count = n + pos;
    }

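PushbackInputStream is characterized by its unread() method, which pushes a byte or byte array back into the stream so that it is read again. The ads with URLs randomly inserted into online novels could be implemented this way: slip a URL byte array into the stream while reading. Quite convenient.
ByteArrayInputStream works entirely in memory: all the data it reads sits in a backing array. Its constructors are: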
```
public ByteArrayInputStream(byte buf[])
public ByteArrayInputStream(byte buf[], int offset, int length)
```
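StringBufferInputStream is deprecated because it converts characters into bytes incorrectly, ignoring the encoding. Notably, most of its methods are synchronized. A class in Swing still references it.
ObjectInputStream: there is more to say about this one.
1. It implements two interfaces. ObjectInput defines the types that can be read. ObjectStreamConstants defines the constants of the serialization stream format: when readObject() reads from a serialized stream it needs a marker to tell what kind of item comes next, and those markers (the TC_* constants) are defined here as bytes, alongside the short stream magic number and version.
2. It has an inner class, BlockDataInputStream, which buffers primitive data while reading to improve efficiency. This also causes problems; see http://www.tuicool.com/articles/v6RNNr. Take care when serializing and deserializing: the article recommends read(byte[], off, len) over the plain read(byte[]), since the latter can produce garbled or incorrect content, and audible noise in audio or video data in particular. Either way, check the returned byte count, because a read may deliver fewer bytes than requested. ObjectInputStream decides what it is reading from single marker bytes, so every byte must be exact.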
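Returning to PushbackInputStream from the list above, a short sketch may make unread() clearer: it implements a "peek" that looks at the next byte and then pushes it back. `PushbackPeek`, `peek`, and `next` are hypothetical helper names, and I/O errors are rethrown unchecked only to keep the example compact.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

class PushbackPeek {
    /** Returns the next byte without consuming it, or -1 at end of stream. */
    static int peek(PushbackInputStream in) {
        try {
            int b = in.read();
            if (b != -1) {
                in.unread(b); // push the byte back so the next read sees it again
            }
            return b;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Consumes and returns the next byte, or -1 at end of stream. */
    static int next(PushbackInputStream in) {
        try {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PushbackInputStream in =
            new PushbackInputStream(new ByteArrayInputStream(new byte[] {7, 8}));
        System.out.println(peek(in) + " " + next(in) + " " + next(in) + " " + next(in));
    }
}
```

The default pushback buffer holds a single byte; pushing back a longer array requires the constructor that takes a buffer size.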

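OutputStream: nearly every InputStream has a corresponding OutputStream that provides the matching functionality. Only write(int) is abstract; the other methods have default implementations.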

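FileOutputStream: a FileDescriptor is essentially a handle, and since a handle may be shared by several streams, FileDescriptor (in the JDK version quoted here) has an incrementAndGetUseCount method that increments a reference counter in a thread-safe way. Also worth noting: FileOutputStream has a constructor for appending: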

public FileOutputStream(File file, boolean append)
    throws FileNotFoundException
{
    String name = (file != null ? file.getPath() : null);
    SecurityManager security = System.getSecurityManager();
    if (security != null) {
        security.checkWrite(name);
    }
    if (name == null) {
        throw new NullPointerException();
    }
    fd = new FileDescriptor();
    fd.incrementAndGetUseCount();
    this.append = append;
    if (append) {
        openAppend(name);
    } else {
        open(name);
    }
}
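From the caller's side, the append constructor quoted above is used as in the following sketch. `AppendDemo` and `appendLine` are hypothetical names, and the temporary file name in `main` is made up for the demonstration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class AppendDemo {
    /** Appends one line to the file and returns the resulting file length in bytes. */
    static long appendLine(File file, String line) {
        // The second argument 'true' opens the file in append mode;
        // with 'false' (or the one-argument constructor) it would be truncated.
        try (FileOutputStream out = new FileOutputStream(file, true)) {
            out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file.length();
    }

    public static void main(String[] args) {
        File file = new File(System.getProperty("java.io.tmpdir"),
                             "append-demo-" + System.nanoTime() + ".txt");
        System.out.println(appendLine(file, "first"));
        System.out.println(appendLine(file, "second"));
        file.delete();
    }
}
```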

PipedOutputStream must be used together with PipedInputStream, as described above.
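A minimal sketch of the piped pair in a producer/consumer arrangement follows. `PipeDemo` and `transfer` are hypothetical names. The producer closes its end explicitly when done, so the reader sees a normal end of stream instead of a broken pipe.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class PipeDemo {
    /** Sends the message through a pipe from a producer thread to the calling thread. */
    static String transfer(String message) {
        try (PipedOutputStream out = new PipedOutputStream();
             PipedInputStream in = new PipedInputStream(out)) {
            Thread producer = new Thread(() -> {
                try {
                    out.write(message.getBytes(StandardCharsets.UTF_8));
                    out.close(); // signals end of stream to the reader
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            producer.start();

            // The consumer blocks in read() until data arrives or the writer closes.
            ByteArrayOutputStream received = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                received.write(b);
            }
            producer.join();
            return new String(received.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transfer("hello through the pipe"));
    }
}
```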

[Migrated] Locating Files in Java

jojoster / 2015-03-09, 2016-11-14 / Tech

In Java there are the following ways to locate a class or resource:

XXX.class.getResource(String resourceName)
XXX.class.getClassLoader().getResource(String resourceName)
Thread.currentThread().getContextClassLoader().getResource()
XXX.class.getProtectionDomain().getCodeSource().getLocation() – yields only the location the class was loaded from
System.getProperty("user.dir") – yields only the directory Java was started from

A few remarks on these approaches:

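With Class.getResource, a leading "/" on the name makes a big difference: with it, the name is resolved from the classpath root; without it, relative to the package of the class.
With ClassLoader.getResource, the name must not start with "/".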
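The difference between the lookups can be tried out with a class that is always available. The sketch below uses java.lang.String as the anchor purely for illustration; `ResourceLookup` and its method names are hypothetical. The exact URLs printed depend on the JDK version.

```java
import java.net.URL;

class ResourceLookup {
    /** No leading slash: resolved relative to the package of String (java/lang). */
    static URL viaClassRelative() {
        return String.class.getResource("String.class");
    }

    /** Leading slash: resolved from the root of the class path. */
    static URL viaClassAbsolute() {
        return String.class.getResource("/java/lang/String.class");
    }

    /** ClassLoader lookups always start at the root and take no leading slash. */
    static URL viaClassLoader() {
        return ClassLoader.getSystemClassLoader().getResource("java/lang/String.class");
    }

    public static void main(String[] args) {
        System.out.println(viaClassRelative());
        System.out.println(viaClassAbsolute());
        System.out.println(viaClassLoader());
        System.out.println(System.getProperty("user.dir")); // JVM start directory
    }
}
```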

Problems Using RMIJdbc

jojoster / 2015-01-21, 2016-11-14 / Tech

A recent project used rmijdbc, and I ran into two problems along the way:

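When jdbc-odbc reads a memo-type field, long text mixed with digits tends to get truncated.
When the RMI client connects and runs queries, it opens a large number of ports. Most are used once and then abandoned, and the operating system leaves them in the "TIME_WAIT" state.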

